Class Nysiis
- java.lang.Object
-
- org.apache.commons.codec.language.Nysiis
-
- All Implemented Interfaces:
Encoder, StringEncoder
public class Nysiis extends java.lang.Object implements StringEncoder
Encodes a string into a NYSIIS value. NYSIIS is an encoding used to relate similar names, but can also be used as a general-purpose scheme to find words with similar phonemes. NYSIIS features an accuracy increase of 2.7% over the traditional Soundex algorithm.
Algorithm description:
1. Transcode first characters of name
   1a. MAC -> MCC
   1b. KN -> NN
   1c. K -> C
   1d. PH -> FF
   1e. PF -> FF
   1f. SCH -> SSS
2. Transcode last characters of name
   2a. EE, IE -> Y
   2b. DT, RT, RD, NT, ND -> D
3. First character of key = first character of name
4. Transcode remaining characters by following these rules, incrementing by one character each time
   4a. EV -> AF else A, E, I, O, U -> A
   4b. Q -> G
   4c. Z -> S
   4d. M -> N
   4e. KN -> N else K -> C
   4f. SCH -> SSS
   4g. PH -> FF
   4h. H -> If previous or next is nonvowel, previous
   4i. W -> If previous is vowel, previous
   4j. Add current to key if current != last key character
5. If last character is S, remove it
6. If last characters are AY, replace with Y
7. If last character is A, remove it
8. Collapse all strings of repeated characters
9. Add original first character of name as first character of key
This class is immutable and thread-safe.
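The edge rules in steps 1 and 2 of the algorithm description can be written as anchored regular expressions. The following is a minimal sketch of those two steps only, not the library's implementation: the class name `NysiisEdgeRulesSketch`, the method `transcodeEdges`, and the pattern constants are hypothetical, and the input is assumed to be upper-case letters only.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of steps 1 and 2 only; not the commons-codec source.
final class NysiisEdgeRulesSketch {
    // Step 1: patterns anchored at the start of the name.
    private static final Pattern MAC = Pattern.compile("^MAC");
    private static final Pattern KN = Pattern.compile("^KN");
    private static final Pattern K = Pattern.compile("^K");
    private static final Pattern PH_PF = Pattern.compile("^(PH|PF)");
    private static final Pattern SCH = Pattern.compile("^SCH");
    // Step 2: patterns anchored at the end of the name.
    private static final Pattern EE_IE = Pattern.compile("(EE|IE)$");
    private static final Pattern DT_ETC = Pattern.compile("(DT|RT|RD|NT|ND)$");

    // Assumes the name is already upper-cased and contains letters only.
    static String transcodeEdges(String upperCaseName) {
        String s = upperCaseName;
        s = MAC.matcher(s).replaceFirst("MCC");   // 1a
        s = KN.matcher(s).replaceFirst("NN");     // 1b, applied before 1c so KN is not turned into CN
        s = K.matcher(s).replaceFirst("C");       // 1c
        s = PH_PF.matcher(s).replaceFirst("FF");  // 1d, 1e
        s = SCH.matcher(s).replaceFirst("SSS");   // 1f
        s = EE_IE.matcher(s).replaceFirst("Y");   // 2a
        s = DT_ETC.matcher(s).replaceFirst("D");  // 2b
        return s;
    }
}
```

Under these assumptions, `transcodeEdges("SCHMIDT")` yields `SSSMID`: rule 1f rewrites the prefix and rule 2b rewrites the suffix. The remaining steps (3 to 9) are not covered by this sketch.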
- Since:
- 1.7
- See Also:
- NYSIIS on Wikipedia, NYSIIS on dropby.com, Soundex
-
-
Field Summary
Fields
Modifier and Type                          Field         Description
private static char[]                      CHARS_A
private static char[]                      CHARS_AF
private static char[]                      CHARS_C
private static char[]                      CHARS_FF
private static char[]                      CHARS_G
private static char[]                      CHARS_N
private static char[]                      CHARS_NN
private static char[]                      CHARS_S
private static char[]                      CHARS_SSS
private static java.util.regex.Pattern     PAT_DT_ETC
private static java.util.regex.Pattern     PAT_EE_IE
private static java.util.regex.Pattern     PAT_K
private static java.util.regex.Pattern     PAT_KN
private static java.util.regex.Pattern     PAT_MAC
private static java.util.regex.Pattern     PAT_PH_PF
private static java.util.regex.Pattern     PAT_SCH
private static char                        SPACE
private boolean                            strict        Indicates the strict mode.
private static int                         TRUE_LENGTH
-
Constructor Summary
Constructors
Constructor               Description
Nysiis()                  Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
Nysiis(boolean strict)    Creates an instance of the Nysiis encoder with the specified strict mode. true: encoded strings have a maximum length of 6. false: encoded strings may have arbitrary length.
-
Method Summary
Modifier and Type          Method                                                              Description
java.lang.Object           encode(java.lang.Object obj)                                        Encodes an Object using the NYSIIS algorithm.
java.lang.String           encode(java.lang.String str)                                        Encodes a String using the NYSIIS algorithm.
boolean                    isStrict()                                                          Indicates the strict mode for this Nysiis encoder.
private static boolean     isVowel(char c)                                                     Tests if the given character is a vowel.
java.lang.String           nysiis(java.lang.String str)                                        Retrieves the NYSIIS code for a given String object.
private static char[]      transcodeRemaining(char prev, char curr, char next, char aNext)     Transcodes the remaining parts of the String.
-
-
-
Field Detail
-
CHARS_A
private static final char[] CHARS_A
-
CHARS_AF
private static final char[] CHARS_AF
-
CHARS_C
private static final char[] CHARS_C
-
CHARS_FF
private static final char[] CHARS_FF
-
CHARS_G
private static final char[] CHARS_G
-
CHARS_N
private static final char[] CHARS_N
-
CHARS_NN
private static final char[] CHARS_NN
-
CHARS_S
private static final char[] CHARS_S
-
CHARS_SSS
private static final char[] CHARS_SSS
-
PAT_MAC
private static final java.util.regex.Pattern PAT_MAC
-
PAT_KN
private static final java.util.regex.Pattern PAT_KN
-
PAT_K
private static final java.util.regex.Pattern PAT_K
-
PAT_PH_PF
private static final java.util.regex.Pattern PAT_PH_PF
-
PAT_SCH
private static final java.util.regex.Pattern PAT_SCH
-
PAT_EE_IE
private static final java.util.regex.Pattern PAT_EE_IE
-
PAT_DT_ETC
private static final java.util.regex.Pattern PAT_DT_ETC
-
SPACE
private static final char SPACE
- See Also:
- Constant Field Values
-
TRUE_LENGTH
private static final int TRUE_LENGTH
- See Also:
- Constant Field Values
-
strict
private final boolean strict
Indicates the strict mode.
-
-
Constructor Detail
-
Nysiis
public Nysiis()
Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
-
Nysiis
public Nysiis(boolean strict)
Creates an instance of the Nysiis encoder with the specified strict mode:
true: encoded strings have a maximum length of 6
false: encoded strings may have arbitrary length
- Parameters:
strict - the strict mode
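The effect of the strict flag on key length can be pictured with a small helper. This is a sketch under assumptions: `StrictModeSketch`, `applyStrictMode`, and `MAX_STRICT_LENGTH` are hypothetical names, and truncation is assumed to be how the documented maximum length of 6 is enforced.

```java
// Hypothetical helper illustrating the documented effect of strict mode.
final class StrictModeSketch {
    // Maximum key length in strict mode, as stated in the constructor documentation.
    private static final int MAX_STRICT_LENGTH = 6;

    // Assumption: strict mode shortens the key by cutting it after 6 characters.
    static String applyStrictMode(String key, boolean strict) {
        if (strict && key.length() > MAX_STRICT_LENGTH) {
            return key.substring(0, MAX_STRICT_LENGTH);
        }
        return key;
    }
}
```

With `strict` set to `false`, the key is returned unchanged regardless of its length.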
-
-
Method Detail
-
isVowel
private static boolean isVowel(char c)
Tests if the given character is a vowel.
- Parameters:
c - the character to test
- Returns:
true if the character is a vowel, false otherwise
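A vowel test of this kind might look like the sketch below. It assumes the vowel set is A, E, I, O, U (the set named in rule 4a of the algorithm description) and that the character has already been upper-cased; `VowelSketch` is a hypothetical class name.

```java
// Hypothetical sketch; the vowel set is taken from rule 4a of the algorithm description.
final class VowelSketch {
    // Assumes the character is already upper-case; lower-case letters return false.
    static boolean isVowel(char c) {
        return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
    }
}
```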
-
transcodeRemaining
private static char[] transcodeRemaining(char prev, char curr, char next, char aNext)
Transcodes the remaining parts of the String. The method operates on a sliding window, looking at 4 characters at a time: [i-1, i, i+1, i+2].
- Parameters:
prev - the previous character
curr - the current character
next - the next character
aNext - the after next character
- Returns:
- a transcoded array of characters, starting from the current position
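The sliding window [i-1, i, i+1, i+2] can be sketched as a loop that collects the four characters at each position. Two details here are assumptions rather than documented behavior: the loop starts at index 1 (the first character is handled separately in step 3 of the algorithm description), and positions past the end of the input are padded with a space. `SlidingWindowSketch`, `windows`, and `PAD` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the 4-character sliding window; no transcoding is performed.
final class SlidingWindowSketch {
    // Assumption: positions beyond the end of the input are represented by a space.
    private static final char PAD = ' ';

    // Returns one 4-character window (prev, curr, next, aNext) per position i >= 1.
    static List<String> windows(String s) {
        char[] chars = s.toCharArray();
        List<String> result = new ArrayList<>();
        for (int i = 1; i < chars.length; i++) {
            char prev = chars[i - 1];
            char curr = chars[i];
            char next = i + 1 < chars.length ? chars[i + 1] : PAD;
            char aNext = i + 2 < chars.length ? chars[i + 2] : PAD;
            result.add(new String(new char[] {prev, curr, next, aNext}));
        }
        return result;
    }
}
```

A transcoding step would receive each window's four characters as `prev`, `curr`, `next`, and `aNext` and return the replacement characters for the current position.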
-
encode
public java.lang.Object encode(java.lang.Object obj) throws EncoderException
Encodes an Object using the NYSIIS algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type String.
- Specified by:
encode in interface Encoder
- Parameters:
obj - Object to encode
- Returns:
- An object (or a String) containing the NYSIIS code which corresponds to the given String.
- Throws:
EncoderException - if the parameter supplied is not a String
java.lang.IllegalArgumentException - if a character is not mapped
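The type check described for the Object overload follows a common pattern: reject anything that is not a String, otherwise delegate to the String overload. The sketch below is dependency-free and purely illustrative: `SketchEncoderException` is a hypothetical stand-in for EncoderException, and `encodeString` is a placeholder that only upper-cases its input rather than computing a NYSIIS code.

```java
import java.util.Locale;

// Hypothetical sketch of the Object-to-String dispatch; not the commons-codec source.
final class ObjectEncodeSketch {
    // Stand-in for EncoderException so the sketch compiles without the library.
    static final class SketchEncoderException extends Exception {
        SketchEncoderException(String message) {
            super(message);
        }
    }

    // Placeholder for the String overload; it only upper-cases the input.
    static String encodeString(String str) {
        return str.toUpperCase(Locale.ENGLISH);
    }

    // Rejects non-String arguments (including null), then delegates to the String overload.
    static Object encode(Object obj) throws SketchEncoderException {
        if (!(obj instanceof String)) {
            throw new SketchEncoderException("Parameter supplied is not of type java.lang.String");
        }
        return encodeString((String) obj);
    }
}
```

Callers that already hold a String can call the String overload directly and avoid the checked exception.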
-
encode
public java.lang.String encode(java.lang.String str)
Encodes a String using the NYSIIS algorithm.
- Specified by:
encode in interface StringEncoder
- Parameters:
str - A String object to encode
- Returns:
- A Nysiis code corresponding to the String supplied
- Throws:
java.lang.IllegalArgumentException - if a character is not mapped
-
isStrict
public boolean isStrict()
Indicates the strict mode for this Nysiis encoder.
- Returns:
true if the encoder is configured for strict mode, false otherwise
-
nysiis
public java.lang.String nysiis(java.lang.String str)
Retrieves the NYSIIS code for a given String object.
- Parameters:
str - String to encode using the NYSIIS algorithm
- Returns:
- A NYSIIS code for the String supplied
-
-