Class PhoneticEngine


  • public class PhoneticEngine
    extends java.lang.Object
    Converts words into potential phonetic representations.

    Encoding is a two-stage process. First, the word is converted into a phonetic representation that takes the likely source language into account. This representation is then converted into a pan-European 'average' representation, which allows different versions of essentially the same word from different languages to be compared.

    This class is intentionally immutable and thread-safe. If you wish to alter the settings for a PhoneticEngine, you must make a new one with the updated settings.
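    The "make a new one" pattern described above can be illustrated with a small stand-in class. This is a minimal sketch, not the PhoneticEngine source: ImmutableEngineSketch and withMaxPhonemes are hypothetical names, plain strings stand in for NameType and RuleType, and the numeric values are arbitrary.

```java
/** Hypothetical stand-in for an immutable settings holder shaped like PhoneticEngine. */
final class ImmutableEngineSketch {
    // Every field is final, so an instance cannot change after construction.
    private final String nameType;  // placeholder for NameType (assumption)
    private final String ruleType;  // placeholder for RuleType (assumption)
    private final boolean concat;
    private final int maxPhonemes;

    ImmutableEngineSketch(String nameType, String ruleType, boolean concat, int maxPhonemes) {
        this.nameType = nameType;
        this.ruleType = ruleType;
        this.concat = concat;
        this.maxPhonemes = maxPhonemes;
    }

    String getNameType() { return nameType; }
    String getRuleType() { return ruleType; }
    boolean isConcat() { return concat; }
    int getMaxPhonemes() { return maxPhonemes; }

    /** "Changing" one setting means building a new instance from the old one's getters. */
    static ImmutableEngineSketch withMaxPhonemes(ImmutableEngineSketch source, int maxPhonemes) {
        return new ImmutableEngineSketch(
                source.getNameType(), source.getRuleType(), source.isConcat(), maxPhonemes);
    }

    public static void main(String[] args) {
        ImmutableEngineSketch original = new ImmutableEngineSketch("name-type", "rule-type", true, 20);
        ImmutableEngineSketch updated = withMaxPhonemes(original, 10);
        // The original instance keeps its value; only the copy carries the new setting.
        System.out.println(original.getMaxPhonemes());
        System.out.println(updated.getMaxPhonemes());
    }
}
```

    Because no instance is ever modified, instances of such a class can be shared between threads without synchronization.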

    Ported from phoneticengine.php

    Since:
    1.6
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      (package private) static class  PhoneticEngine.PhonemeBuilder
      Utility for manipulating a set of phonemes as they are being built up.
      private static class  PhoneticEngine.RulesApplication
      A function closure capturing the application of a list of rules to an input sequence at a particular offset.
    • Constructor Summary

      Constructors 
      Constructor Description
      PhoneticEngine(NameType nameType, RuleType ruleType, boolean concat)
      Generates a new, fully-configured phonetic engine.
      PhoneticEngine(NameType nameType, RuleType ruleType, boolean concat, int maxPhonemes)
      Generates a new, fully-configured phonetic engine.
    • Method Summary

      Modifier and Type Method Description
      private PhoneticEngine.PhonemeBuilder applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder, java.util.Map<java.lang.String,java.util.List<Rule>> finalRules)
      Applies the final rules to convert from a language-specific phonetic representation to a language-independent representation.
      java.lang.String encode(java.lang.String input)
      Encodes a string to its phonetic representation.
      java.lang.String encode(java.lang.String input, Languages.LanguageSet languageSet)
      Encodes an input string into an output phonetic representation, given a set of possible origin languages.
      Lang getLang()
      Gets the Lang language guessing rules being used.
      int getMaxPhonemes()
      Gets the maximum number of phonemes the engine will calculate for a given input.
      NameType getNameType()
      Gets the NameType being used.
      RuleType getRuleType()
      Gets the RuleType being used.
      boolean isConcat()
      Returns whether multiple phonetic encodings are concatenated or only the first one is kept.
      private static java.lang.String join(java.lang.Iterable<java.lang.String> strings, java.lang.String sep)
      Joins some strings with an internal separator.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NAME_PREFIXES

        private static final java.util.Map<NameType,java.util.Set<java.lang.String>> NAME_PREFIXES
      • lang

        private final Lang lang
      • nameType

        private final NameType nameType
      • ruleType

        private final RuleType ruleType
      • concat

        private final boolean concat
      • maxPhonemes

        private final int maxPhonemes
    • Constructor Detail

      • PhoneticEngine

        public PhoneticEngine(NameType nameType,
                              RuleType ruleType,
                              boolean concat)
        Generates a new, fully-configured phonetic engine.
        Parameters:
        nameType - the type of names it will use
        ruleType - the type of rules it will apply
        concat - whether it will concatenate multiple encodings
      • PhoneticEngine

        public PhoneticEngine(NameType nameType,
                              RuleType ruleType,
                              boolean concat,
                              int maxPhonemes)
        Generates a new, fully-configured phonetic engine.
        Parameters:
        nameType - the type of names it will use
        ruleType - the type of rules it will apply
        concat - whether it will concatenate multiple encodings
        maxPhonemes - the maximum number of phonemes that will be handled
        Since:
        1.7
    • Method Detail

      • join

        private static java.lang.String join(java.lang.Iterable<java.lang.String> strings,
                                             java.lang.String sep)
        Joins some strings with an internal separator.
        Parameters:
        strings - Strings to join
        sep - String to separate them with
        Returns:
        a single String consisting of each element of strings interleaved by sep
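        The contract above (each element of strings, interleaved with sep) can be sketched as follows. The method is private and its body is not shown here, so this is only one straightforward way to meet the documented behavior; JoinSketch is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical sketch of joining strings with a separator between elements. */
final class JoinSketch {
    static String join(Iterable<String> strings, String sep) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = strings.iterator();
        // Append the first element without a separator in front of it.
        if (it.hasNext()) {
            sb.append(it.next());
        }
        // Every later element is preceded by the separator.
        while (it.hasNext()) {
            sb.append(sep).append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), "-")); // prints a-b-c
    }
}
```

        An empty input yields the empty string, and the separator never appears at the start or end of the result.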
      • applyFinalRules

        private PhoneticEngine.PhonemeBuilder applyFinalRules(PhoneticEngine.PhonemeBuilder phonemeBuilder,
                                                              java.util.Map<java.lang.String,java.util.List<Rule>> finalRules)
        Applies the final rules to convert from a language-specific phonetic representation to a language-independent representation.
        Parameters:
        phonemeBuilder - the current phonemes
        finalRules - the final rules to apply
        Returns:
        the resulting phonemes
      • encode

        public java.lang.String encode(java.lang.String input)
        Encodes a string to its phonetic representation.
        Parameters:
        input - the String to encode
        Returns:
        the encoding of the input
      • encode

        public java.lang.String encode(java.lang.String input,
                                       Languages.LanguageSet languageSet)
        Encodes an input string into an output phonetic representation, given a set of possible origin languages.
        Parameters:
        input - String to phoneticise; a String with dashes or spaces separating each word
        languageSet - set of possible origin languages
        Returns:
        a phonetic representation of the input; a String containing '-'-separated phonetic representations of the input
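        The input and output shapes described above (words separated by dashes or spaces going in, '-'-separated representations coming out) can be sketched as below. This shows only the splitting and joining, under the assumption that each word is encoded independently; it does not reproduce the engine's phonetic rules, language handling, or concat behavior. PerWordEncodeSketch, encodeWords, and the upper-casing placeholder encoder are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch: split on spaces or dashes, encode each word, join with '-'. */
final class PerWordEncodeSketch {
    static String encodeWords(String input, Function<String, String> wordEncoder) {
        List<String> encoded = new ArrayList<>();
        // Runs of whitespace and dashes are treated as a single word boundary.
        for (String word : input.trim().split("[\\s-]+")) {
            if (!word.isEmpty()) {
                encoded.add(wordEncoder.apply(word));
            }
        }
        return String.join("-", encoded);
    }

    public static void main(String[] args) {
        // Placeholder encoder (assumption): a real encoder would apply phonetic rules.
        Function<String, String> placeholder = w -> w.toUpperCase(Locale.ROOT);
        System.out.println(encodeWords("van-der  Berg", placeholder)); // prints VAN-DER-BERG
    }
}
```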
      • getLang

        public Lang getLang()
        Gets the Lang language guessing rules being used.
        Returns:
        the Lang in use
      • getNameType

        public NameType getNameType()
        Gets the NameType being used.
        Returns:
        the NameType in use
      • getRuleType

        public RuleType getRuleType()
        Gets the RuleType being used.
        Returns:
        the RuleType in use
      • isConcat

        public boolean isConcat()
        Returns whether multiple phonetic encodings are concatenated or only the first one is kept.
        Returns:
        true if multiple phonetic encodings are returned, false if only the first one is kept
      • getMaxPhonemes

        public int getMaxPhonemes()
        Gets the maximum number of phonemes the engine will calculate for a given input.
        Returns:
        the maximum number of phonemes
        Since:
        1.7
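        How the limit is enforced is not stated here. The sketch below assumes the simplest policy, in which distinct candidates are collected in order and collection stops once the limit is reached; the engine may behave differently. MaxPhonemesSketch and collect are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of capping the number of distinct phonemes that are kept. */
final class MaxPhonemesSketch {
    static Set<String> collect(Iterable<String> candidates, int maxPhonemes) {
        // LinkedHashSet keeps insertion order and silently ignores duplicates.
        Set<String> kept = new LinkedHashSet<>();
        for (String candidate : candidates) {
            if (kept.size() >= maxPhonemes) {
                break; // limit reached: remaining candidates are dropped (assumed policy)
            }
            kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList("a", "b", "a", "c", "d"), 3)); // prints [a, b, c]
    }
}
```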