Class ColognePhonetic

  • All Implemented Interfaces:
    Encoder, StringEncoder

    public class ColognePhonetic
    extends java.lang.Object
    implements StringEncoder
    Encodes a string into a Cologne Phonetic value.

    Implements the Kölner Phonetik (Cologne Phonetic) algorithm issued by Hans Joachim Postel in 1969.

    The Kölner Phonetik is a phonetic algorithm optimized for the German language. It is related to the well-known Soundex algorithm.

    Algorithm

    • Step 1:

      After preprocessing (conversion to upper case, transcription of Germanic umlauts, removal of non-alphabetic characters), each letter of the supplied text is replaced by its phonetic code according to the following table.
      (Source: Wikipedia (de): Kölner Phonetik -- Buchstabencodes)
      Letter               Context                                           Code
      A, E, I, J, O, U, Y                                                    0
      H                                                                      -
      B                                                                      1
      P                    not before H                                      1
      D, T                 not before C, S, Z                                2
      F, V, W                                                                3
      P                    before H                                          3
      G, K, Q                                                                4
      C                    at onset before A, H, K, L, O, Q, R, U, X         4
      C                    before A, H, K, O, Q, U, X except after S, Z      4
      X                    not after C, K, Q                                 48
      L                                                                      5
      M, N                                                                   6
      R                                                                      7
      S, Z                                                                   8
      C                    after S, Z                                        8
      C                    at onset except before A, H, K, L, O, Q, R, U, X  8
      C                    not before A, H, K, O, Q, U, X                    8
      D, T                 before C, S, Z                                    8
      X                    after C, K, Q                                     8

      Example:

      "Müller-Lüdenscheidt" => "MULLERLUDENSCHEIDT" => "60550750206880022"
    • Step 2:

      Runs of identical consecutive code digits are collapsed into a single digit.

      Example:

      "60550750206880022" => "6050750206802"
    • Step 3:

      All "0" codes are removed except at the beginning. As a result, two or more identical digits can become adjacent once the "0" codes between them are removed; such digits are not collapsed again.

      Example:

      "6050750206802" => "65752682"

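Steps 2 and 3 operate only on the digit string produced by Step 1, so they can be sketched on their own. The class `KoelnerSteps` and its two methods are hypothetical helpers for illustration, not part of the ColognePhonetic API:

```java
/** Hypothetical helpers for Steps 2 and 3, applied to a Step 1 digit string. */
final class KoelnerSteps {

    private KoelnerSteps() {
    }

    /** Step 2: collapse every run of identical consecutive digits into one digit. */
    static String collapse(String codes) {
        StringBuilder out = new StringBuilder(codes.length());
        for (int i = 0; i < codes.length(); i++) {
            char c = codes.charAt(i);
            // Append only if the digit differs from the last one kept.
            if (out.length() == 0 || out.charAt(out.length() - 1) != c) {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Step 3: drop every '0' except one at the very beginning. */
    static String dropZeros(String codes) {
        StringBuilder out = new StringBuilder(codes.length());
        for (int i = 0; i < codes.length(); i++) {
            char c = codes.charAt(i);
            if (c != '0' || i == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `dropZeros(collapse("60550750206880022"))` gives "65752682". The order of the two helpers matters: removing the zeros first would let Step 2 merge digits such as the two 5s in "505", which the documented order keeps as "55".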
    This class is thread-safe.

    Since:
    1.5
    See Also:
    Wikipedia (de): Kölner Phonetik (in German)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char[] AEIJOUY  
      private static char[] AHKLOQRUX  
      private static char[] AHKOQUX  
      private static char CHAR_IGNORE  
      private static char[] CKQ  
      private static char[] CSZ  
      private static char[] DTX  
      private static char[] FPVW  
      private static char[] GKQ  
      private static char[] SZ  
    • Constructor Summary

      Constructors 
      Constructor Description
      ColognePhonetic()  
    • Method Summary

      Modifier and Type Method Description
      private static boolean arrayContains(char[] arr, char key)
      java.lang.String colognePhonetic(java.lang.String text)
      Implements the Kölner Phonetik algorithm.
      java.lang.Object encode(java.lang.Object object)
      Encodes an "Object" and returns the encoded content as an Object.
      java.lang.String encode(java.lang.String text)
      Encodes a String and returns a String.
      boolean isEncodeEqual(java.lang.String text1, java.lang.String text2)
      Compares the first encoded string to the second encoded string.
      private char[] preprocess(java.lang.String text)
      Converts the string to upper case and replaces Germanic umlaut characters. The following characters are mapped: Ä (capital A, umlaut mark), Ü (capital U, umlaut mark), Ö (capital O, umlaut mark) and ß (small sharp s, German).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • AEIJOUY

        private static final char[] AEIJOUY
      • CSZ

        private static final char[] CSZ
      • FPVW

        private static final char[] FPVW
      • GKQ

        private static final char[] GKQ
      • CKQ

        private static final char[] CKQ
      • AHKLOQRUX

        private static final char[] AHKLOQRUX
      • SZ

        private static final char[] SZ
      • AHKOQUX

        private static final char[] AHKOQUX
      • DTX

        private static final char[] DTX
    • Constructor Detail

      • ColognePhonetic

        public ColognePhonetic()
    • Method Detail

      • arrayContains

        private static boolean arrayContains(char[] arr,
                                             char key)
      • colognePhonetic

        public java.lang.String colognePhonetic(java.lang.String text)

        Implements the Kölner Phonetik algorithm.

        In contrast to the original description of the algorithm, this implementation performs the encoding in a single pass.

        Parameters:
        text - The source text to encode
        Returns:
        the corresponding encoding according to the Kölner Phonetik algorithm
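As a rough illustration of encoding in one pass, the sketch below applies the letter table of Step 1 and folds Steps 2 and 3 into the same loop. It is a hypothetical re-implementation (class `KoelnerPhonetikSketch`), not the Commons Codec source. It assumes that "before" and "after" refer to neighbouring letters once non-letters are removed, that "at onset" means the first letter, and that ß maps to S; the library may treat such edge cases differently.

```java
/**
 * Hypothetical single-pass sketch of the documented Koelner Phonetik steps.
 * It follows the letter table and Steps 1-3; it is not the library source.
 */
final class KoelnerPhonetikSketch {

    private KoelnerPhonetikSketch() {
    }

    /** Preprocessing: upper-case, transcribe umlauts and sharp s, keep only A-Z. */
    static char[] letters(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            char c = Character.toUpperCase(ch);
            switch (c) {
                case '\u00C4': c = 'A'; break; // capital A, umlaut mark
                case '\u00D6': c = 'O'; break; // capital O, umlaut mark
                case '\u00DC': c = 'U'; break; // capital U, umlaut mark
                case '\u00DF': c = 'S'; break; // small sharp s (assumed to map to S)
                default: break;
            }
            if (c >= 'A' && c <= 'Z') {
                sb.append(c); // non-letters such as '-' are dropped
            }
        }
        return sb.toString().toCharArray();
    }

    /** True if c is a real letter (not the '\0' "no neighbour" marker) contained in set. */
    private static boolean in(char c, String set) {
        return c != '\0' && set.indexOf(c) >= 0;
    }

    /** Step 1: code for the letter at index i ("" for H, "48" for X not after C, K, Q). */
    static String code(char[] s, int i) {
        char c = s[i];
        char prev = i > 0 ? s[i - 1] : '\0';
        char next = i + 1 < s.length ? s[i + 1] : '\0';
        if (in(c, "AEIJOUY")) return "0";
        if (c == 'H') return "";
        if (c == 'B') return "1";
        if (c == 'P') return next == 'H' ? "3" : "1";
        if (c == 'D' || c == 'T') return in(next, "CSZ") ? "8" : "2";
        if (in(c, "FVW")) return "3";
        if (in(c, "GKQ")) return "4";
        if (c == 'C') {
            if (i == 0) return in(next, "AHKLOQRUX") ? "4" : "8"; // C at onset
            return (in(prev, "SZ") || !in(next, "AHKOQUX")) ? "8" : "4";
        }
        if (c == 'X') return in(prev, "CKQ") ? "8" : "48";
        if (c == 'L') return "5";
        if (c == 'M' || c == 'N') return "6";
        if (c == 'R') return "7";
        return "8"; // S and Z are the only letters left
    }

    /** Steps 1-3 in one loop: code each letter, skip repeats, drop non-leading zeros. */
    static String encode(String text) {
        char[] s = letters(text);
        StringBuilder out = new StringBuilder(s.length);
        char last = '\0';      // previous Step 1 digit; H leaves it unchanged
        boolean first = true;  // true until the first Step 1 digit is seen
        for (int i = 0; i < s.length; i++) {
            for (char d : code(s, i).toCharArray()) {
                // Step 2: skip a repeated digit. Step 3: keep '0' only at the start.
                if (d != last && (d != '0' || first)) {
                    out.append(d);
                }
                last = d;
                first = false;
            }
        }
        return out.toString();
    }
}
```

With these assumptions, `KoelnerPhonetikSketch.encode("Müller-Lüdenscheidt")` returns "65752682", the same value as the worked example in the class description. The loop remembers the previous Step 1 digit in `last`; because `last` is updated even when a "0" is dropped, two equal digits separated by a vowel are both kept, as Step 3 describes.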
      • encode

        public java.lang.Object encode(java.lang.Object object)
                                throws EncoderException
        Description copied from interface: Encoder
        Encodes an "Object" and returns the encoded content as an Object. The Objects here may just be byte[] or Strings depending on the implementation used.
        Specified by:
        encode in interface Encoder
        Parameters:
        object - An object to encode
        Returns:
        An "encoded" Object
        Throws:
        EncoderException - An encoder exception is thrown if the encoder experiences a failure condition during the encoding process.
      • encode

        public java.lang.String encode(java.lang.String text)
        Description copied from interface: StringEncoder
        Encodes a String and returns a String.
        Specified by:
        encode in interface StringEncoder
        Parameters:
        text - the String to encode
        Returns:
        the encoded String
      • isEncodeEqual

        public boolean isEncodeEqual(java.lang.String text1,
                                     java.lang.String text2)
        Compares the first encoded string to the second encoded string.
        Parameters:
        text1 - source text to encode before testing for equality.
        text2 - source text to encode before testing for equality.
        Returns:
        true if the encoding of the first string equals the encoding of the second string, false otherwise
      • preprocess

        private char[] preprocess(java.lang.String text)
        Converts the string to upper case and replaces Germanic umlaut characters. The following characters are mapped:
        • Ä (capital A, umlaut mark)
        • Ü (capital U, umlaut mark)
        • Ö (capital O, umlaut mark)
        • ß (small sharp s, German)
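A minimal sketch of this preprocessing, assuming the umlauts map to their base letters and ß maps to S (the target letters are not stated above, and `PreprocessSketch` is a hypothetical name). Like the description, it only upper-cases and maps characters; non-letters such as '-' are left in place.

```java
/** Hypothetical per-character preprocessing sketch; not the library source. */
final class PreprocessSketch {

    private PreprocessSketch() {
    }

    static char[] preprocess(String text) {
        char[] chrs = text.toCharArray();
        for (int i = 0; i < chrs.length; i++) {
            // Character.toUpperCase is a 1:1 mapping, so the sharp s stays unchanged here.
            char c = Character.toUpperCase(chrs[i]);
            switch (c) {
                case '\u00C4': c = 'A'; break; // capital A, umlaut mark
                case '\u00DC': c = 'U'; break; // capital U, umlaut mark
                case '\u00D6': c = 'O'; break; // capital O, umlaut mark
                case '\u00DF': c = 'S'; break; // small sharp s (assumed target 'S')
                default: break;
            }
            chrs[i] = c;
        }
        return chrs;
    }
}
```

The sketch upper-cases one character at a time so that the returned array keeps the input's length. This matters for ß: `String.toUpperCase` expands "ß" to "SS", while `Character.toUpperCase('ß')` returns 'ß' unchanged, so the character has to be mapped explicitly. The choice does not change the phonetic code, because "SS" yields "88", which Step 2 collapses to "8".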