Class Nysiis

  • All Implemented Interfaces:
    Encoder, StringEncoder

    public class Nysiis
    extends java.lang.Object
    implements StringEncoder
    Encodes a string into a NYSIIS value. NYSIIS is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.

    NYSIIS features an accuracy increase of 2.7% over the traditional Soundex algorithm.

    Algorithm description:

     1. Transcode first characters of name
       1a. MAC ->   MCC
       1b. KN  ->   NN
       1c. K   ->   C
       1d. PH  ->   FF
       1e. PF  ->   FF
       1f. SCH ->   SSS
     2. Transcode last characters of name
       2a. EE, IE          ->   Y
       2b. DT,RT,RD,NT,ND  ->   D
     3. First character of key = first character of name
     4. Transcode remaining characters by following these rules, incrementing by one character each time
       4a. EV  ->   AF  else A,E,I,O,U -> A
       4b. Q   ->   G
       4c. Z   ->   S
       4d. M   ->   N
       4e. KN  ->   N   else K -> C
       4f. SCH ->   SSS
       4g. PH  ->   FF
       4h. H   ->   If previous or next is nonvowel, previous
       4i. W   ->   If previous is vowel, previous
       4j. Add current to key if current != last key character
     5. If last character is S, remove it
     6. If last characters are AY, replace with Y
     7. If last character is A, remove it
     8. Collapse all strings of repeated characters
     9. Add original first character of name as first character of key
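
The boundary rules listed above (steps 1, 2 and 5 to 7) can be sketched with plain `java.util.regex` and `StringBuilder` calls. This is an illustrative sketch under stated assumptions, not the class's actual implementation: the class name `NysiisEdgeRulesSketch`, the method names `transcodeEnds` and `trimTail`, the regular expressions, and the length guards are hypothetical. Only the rule table itself comes from the description.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch only; names, regexes and guards are assumptions.
final class NysiisEdgeRulesSketch {
    // Step 1: rules anchored at the start of the name.
    private static final Pattern PAT_MAC = Pattern.compile("^MAC");
    private static final Pattern PAT_KN = Pattern.compile("^KN");
    private static final Pattern PAT_K = Pattern.compile("^K");
    private static final Pattern PAT_PH_PF = Pattern.compile("^(PH|PF)");
    private static final Pattern PAT_SCH = Pattern.compile("^SCH");
    // Step 2: rules anchored at the end of the name.
    private static final Pattern PAT_EE_IE = Pattern.compile("(EE|IE)$");
    private static final Pattern PAT_DT_ETC = Pattern.compile("(DT|RT|RD|NT|ND)$");

    /** Applies steps 1a-1f and 2a-2b, in the listed order, to an upper-cased name. */
    static String transcodeEnds(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        s = PAT_MAC.matcher(s).replaceFirst("MCC");
        s = PAT_KN.matcher(s).replaceFirst("NN");
        s = PAT_K.matcher(s).replaceFirst("C");
        s = PAT_PH_PF.matcher(s).replaceFirst("FF");
        s = PAT_SCH.matcher(s).replaceFirst("SSS");
        s = PAT_EE_IE.matcher(s).replaceFirst("Y");
        s = PAT_DT_ETC.matcher(s).replaceFirst("D");
        return s;
    }

    /**
     * Applies steps 5-7 to an already built key. The length guards are an
     * assumption of this sketch so that the first character is never removed.
     */
    static String trimTail(String key) {
        StringBuilder sb = new StringBuilder(key);
        // Step 5: drop a trailing S.
        if (sb.length() > 1 && sb.charAt(sb.length() - 1) == 'S') {
            sb.setLength(sb.length() - 1);
        }
        // Step 6: a trailing AY becomes Y (delete the A).
        int n = sb.length();
        if (n > 2 && sb.charAt(n - 2) == 'A' && sb.charAt(n - 1) == 'Y') {
            sb.deleteCharAt(n - 2);
        }
        // Step 7: drop a trailing A.
        if (sb.length() > 1 && sb.charAt(sb.length() - 1) == 'A') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }
}
```

For instance, `transcodeEnds("Schmidt")` applies rule 1f and then rule 2b, and `trimTail("MAY")` applies rule 6. The per-character rules of step 4 are not covered by this sketch.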
     

    This class is immutable and thread-safe.

    Since:
    1.7
    See Also:
    NYSIIS on Wikipedia, NYSIIS on dropby.com, Soundex
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char[] CHARS_A  
      private static char[] CHARS_AF  
      private static char[] CHARS_C  
      private static char[] CHARS_FF  
      private static char[] CHARS_G  
      private static char[] CHARS_N  
      private static char[] CHARS_NN  
      private static char[] CHARS_S  
      private static char[] CHARS_SSS  
      private static java.util.regex.Pattern PAT_DT_ETC  
      private static java.util.regex.Pattern PAT_EE_IE  
      private static java.util.regex.Pattern PAT_K  
      private static java.util.regex.Pattern PAT_KN  
      private static java.util.regex.Pattern PAT_MAC  
      private static java.util.regex.Pattern PAT_PH_PF  
      private static java.util.regex.Pattern PAT_SCH  
      private static char SPACE  
      private boolean strict
      Indicates the strict mode.
      private static int TRUE_LENGTH  
    • Constructor Summary

      Constructors 
      Constructor Description
      Nysiis()
      Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
      Nysiis​(boolean strict)
      Creates an instance of the Nysiis encoder with the specified strict mode: if true, encoded strings have a maximum length of 6; if false, encoded strings may have arbitrary length.
    • Method Summary

      Modifier and Type Method Description
      java.lang.Object encode​(java.lang.Object obj)
      Encodes an Object using the NYSIIS algorithm.
      java.lang.String encode​(java.lang.String str)
      Encodes a String using the NYSIIS algorithm.
      boolean isStrict()
      Indicates the strict mode for this Nysiis encoder.
      private static boolean isVowel​(char c)
      Tests if the given character is a vowel.
      java.lang.String nysiis​(java.lang.String str)
      Retrieves the NYSIIS code for a given String object.
      private static char[] transcodeRemaining​(char prev, char curr, char next, char aNext)
      Transcodes the remaining parts of the String.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • CHARS_A

        private static final char[] CHARS_A
      • CHARS_AF

        private static final char[] CHARS_AF
      • CHARS_C

        private static final char[] CHARS_C
      • CHARS_FF

        private static final char[] CHARS_FF
      • CHARS_G

        private static final char[] CHARS_G
      • CHARS_N

        private static final char[] CHARS_N
      • CHARS_NN

        private static final char[] CHARS_NN
      • CHARS_S

        private static final char[] CHARS_S
      • CHARS_SSS

        private static final char[] CHARS_SSS
      • PAT_MAC

        private static final java.util.regex.Pattern PAT_MAC
      • PAT_KN

        private static final java.util.regex.Pattern PAT_KN
      • PAT_K

        private static final java.util.regex.Pattern PAT_K
      • PAT_PH_PF

        private static final java.util.regex.Pattern PAT_PH_PF
      • PAT_SCH

        private static final java.util.regex.Pattern PAT_SCH
      • PAT_EE_IE

        private static final java.util.regex.Pattern PAT_EE_IE
      • PAT_DT_ETC

        private static final java.util.regex.Pattern PAT_DT_ETC
      • strict

        private final boolean strict
        Indicates the strict mode.
    • Constructor Detail

      • Nysiis

        public Nysiis()
        Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
      • Nysiis

        public Nysiis​(boolean strict)
        Creates an instance of the Nysiis encoder with the specified strict mode:
        • true: encoded strings have a maximum length of 6
        • false: encoded strings may have arbitrary length
        Parameters:
        strict - the strict mode
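
The effect of the two constructors on key length can be sketched as follows. This is a minimal illustration, not the library's code: the class `StrictModeSketch` and its `applyStrict` method are hypothetical, and the value 6 for `TRUE_LENGTH` is assumed from the "maximum length of 6" wording above.

```java
// Illustrative sketch only; class and method names are assumptions.
final class StrictModeSketch {
    // Assumed from the documented maximum length of 6 in strict mode.
    private static final int TRUE_LENGTH = 6;

    private final boolean strict;

    /** Mirrors the no-argument constructor: strict mode is on. */
    StrictModeSketch() {
        this(true);
    }

    /** Mirrors the boolean constructor: the caller picks the mode. */
    StrictModeSketch(boolean strict) {
        this.strict = strict;
    }

    boolean isStrict() {
        return strict;
    }

    /** Truncates the key to TRUE_LENGTH characters only when strict mode is on. */
    String applyStrict(String key) {
        if (strict && key.length() > TRUE_LENGTH) {
            return key.substring(0, TRUE_LENGTH);
        }
        return key;
    }
}
```

Because the `strict` flag is final and set once in the constructor, an instance of this sketch has no mutable state, which is consistent with the immutability noted for the class.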
    • Method Detail

      • isVowel

        private static boolean isVowel​(char c)
        Tests if the given character is a vowel.
        Parameters:
        c - the character to test
        Returns:
        true if the character is a vowel, false otherwise
      • transcodeRemaining

        private static char[] transcodeRemaining​(char prev,
                                                 char curr,
                                                 char next,
                                                 char aNext)
        Transcodes the remaining parts of the String. The method operates on a sliding window, looking at 4 characters at a time: [i-1, i, i+1, i+2].
        Parameters:
        prev - the previous character
        curr - the current character
        next - the next character
        aNext - the after next character
        Returns:
        a transcoded array of characters, starting from the current position
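
A sketch of how rules 4a to 4i from the class description might map onto this four-character window is shown below. It is an assumption-laden illustration, not the actual method body: the class name `TranscodeRemainingSketch` is hypothetical, and the sketch returns `NN` for `KN` so that each replacement has the same length as the text it matches (rule 4j would then collapse the repeated character to a single `N`).

```java
// Illustrative sketch only; it follows the documented rule table,
// not necessarily the library's real implementation.
final class TranscodeRemainingSketch {

    static boolean isVowel(char c) {
        return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
    }

    static char[] transcodeRemaining(char prev, char curr, char next, char aNext) {
        // 4a. EV -> AF, else any vowel -> A
        if (curr == 'E' && next == 'V') {
            return new char[] {'A', 'F'};
        }
        if (isVowel(curr)) {
            return new char[] {'A'};
        }
        // 4b-4d. Single-character substitutions
        if (curr == 'Q') {
            return new char[] {'G'};
        }
        if (curr == 'Z') {
            return new char[] {'S'};
        }
        if (curr == 'M') {
            return new char[] {'N'};
        }
        // 4e. KN -> N (written as NN here, see lead-in), else K -> C
        if (curr == 'K') {
            return next == 'N' ? new char[] {'N', 'N'} : new char[] {'C'};
        }
        // 4f. SCH -> SSS
        if (curr == 'S' && next == 'C' && aNext == 'H') {
            return new char[] {'S', 'S', 'S'};
        }
        // 4g. PH -> FF
        if (curr == 'P' && next == 'H') {
            return new char[] {'F', 'F'};
        }
        // 4h. H -> previous character if previous or next is not a vowel
        if (curr == 'H' && (!isVowel(prev) || !isVowel(next))) {
            return new char[] {prev};
        }
        // 4i. W -> previous character if previous is a vowel
        if (curr == 'W' && isVowel(prev)) {
            return new char[] {prev};
        }
        // No rule matched: keep the current character.
        return new char[] {curr};
    }
}
```

A caller would slide this window over the name one position at a time, passing a filler character such as a space for `next` and `aNext` when the window runs past the end of the input.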
      • encode

        public java.lang.Object encode​(java.lang.Object obj)
                                throws EncoderException
        Encodes an Object using the NYSIIS algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type String.
        Specified by:
        encode in interface Encoder
        Parameters:
        obj - Object to encode
        Returns:
        An object (or a String) containing the NYSIIS code which corresponds to the given String.
        Throws:
        EncoderException - if the parameter supplied is not of type String
        java.lang.IllegalArgumentException - if a character is not mapped
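
The type check described for this method can be sketched as below. To keep the example self-contained, it uses a hypothetical stand-in exception (`EncoderExceptionStandIn`) instead of the real `EncoderException`, and a placeholder `encodeString` that only upper-cases its input and does not perform NYSIIS encoding. All names in the sketch are assumptions.

```java
import java.util.Locale;

// Illustrative sketch only; it shows the Object-to-String delegation pattern.
final class ObjectEncodeSketch {

    /** Hypothetical stand-in for the library's checked EncoderException. */
    static final class EncoderExceptionStandIn extends Exception {
        EncoderExceptionStandIn(String message) {
            super(message);
        }
    }

    /** Placeholder for the String overload; this is NOT a NYSIIS encoding. */
    static String encodeString(String str) {
        return str.toUpperCase(Locale.ROOT);
    }

    /** Rejects anything that is not a String, then delegates to the String overload. */
    static Object encode(Object obj) throws EncoderExceptionStandIn {
        // In this sketch, null also fails the instanceof test and is rejected.
        if (!(obj instanceof String)) {
            throw new EncoderExceptionStandIn(
                    "Parameter supplied is not of type java.lang.String");
        }
        return encodeString((String) obj);
    }
}
```

Callers that already hold a `String` can use the `String` overload directly and avoid handling the checked exception.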
      • encode

        public java.lang.String encode​(java.lang.String str)
        Encodes a String using the NYSIIS algorithm.
        Specified by:
        encode in interface StringEncoder
        Parameters:
        str - A String object to encode
        Returns:
        A NYSIIS code corresponding to the String supplied
        Throws:
        java.lang.IllegalArgumentException - if a character is not mapped
      • isStrict

        public boolean isStrict()
        Indicates the strict mode for this Nysiis encoder.
        Returns:
        true if the encoder is configured for strict mode, false otherwise
      • nysiis

        public java.lang.String nysiis​(java.lang.String str)
        Retrieves the NYSIIS code for a given String object.
        Parameters:
        str - String to encode using the NYSIIS algorithm
        Returns:
        A NYSIIS code for the String supplied