Class Rule


  • public class Rule
    extends java.lang.Object
    A phoneme rule.

    A rule has a pattern, a left context, a right context, an output phoneme, a set of languages to which it applies, and a logical flag indicating whether all of those languages must be in play. A rule matches if:

    • the pattern matches at the current position
    • the string up until the beginning of the pattern matches the left context
    • the string from the end of the pattern matches the right context
    • logical is ALL and all languages are in scope; or
    • logical is any other value and at least one language is in scope
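
    The last two conditions can be read as a set test. The following is a minimal sketch of that reading, not the library's code: the class name, the method name and the boolean requireAll (standing in for the ALL flag) are hypothetical, and it assumes that "all languages are in scope" means every language of the rule is among the active languages.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration of the language-scope conditions; not part of Commons Codec.
final class LanguageScopeSketch {

    /**
     * @param ruleLanguages   languages the rule applies to
     * @param activeLanguages languages currently in play
     * @param requireAll      true models the ALL flag, false models any other value
     */
    static boolean inScope(Set<String> ruleLanguages, Set<String> activeLanguages, boolean requireAll) {
        if (requireAll) {
            // Assumption: every language of the rule must be active.
            return activeLanguages.containsAll(ruleLanguages);
        }
        // Otherwise a single shared language is enough.
        return !Collections.disjoint(ruleLanguages, activeLanguages);
    }

    public static void main(String[] args) {
        Set<String> rule = Set.of("german", "polish");
        System.out.println(inScope(rule, Set.of("german"), false));
        System.out.println(inScope(rule, Set.of("german"), true));
    }
}
```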

    Rules are typically generated by parsing rules resources. In normal use there is no need to construct rules explicitly.

    Rules are immutable and thread-safe.

    Rules resources

    Rules are typically loaded from resource files. These are UTF-8 encoded text files. They are systematically named following the pattern:

    org/apache/commons/codec/language/bm/${NameType#getName}_${RuleType#getName}_${language}.txt
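
    As an illustration of the naming pattern, the sketch below builds a resource name from three plain strings. The class and method are hypothetical, and the short names "gen", "approx" and "any" are assumed example values for NameType#getName, RuleType#getName and the language.

```java
// Hypothetical helper mirroring the documented naming pattern; not part of Commons Codec.
final class ResourceNameSketch {

    static String resourceName(String nameType, String ruleType, String language) {
        // ${NameType#getName}_${RuleType#getName}_${language}.txt under the bm package path
        return String.format("org/apache/commons/codec/language/bm/%s_%s_%s.txt",
                nameType, ruleType, language);
    }

    public static void main(String[] args) {
        // "gen", "approx" and "any" are assumed example values.
        System.out.println(resourceName("gen", "approx", "any"));
    }
}
```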

    The format of these resources is the following:

    • Rules: whitespace-separated, double-quoted strings. Each row should have 4 columns, interpreted as:
      1. pattern
      2. left context
      3. right context
      4. phoneme
    • End-of-line comments: Any occurrence of '//' will cause all text following on that line to be discarded as a comment.
    • Multi-line comments: Any line starting with '/*' will start multi-line commenting mode. This will skip all content until a line ending in '*/' is found.
    • Blank lines: All blank lines will be skipped.
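
    The format above can be read with a small line-oriented parser. The following is a minimal sketch under stated assumptions, not the parser used by the library: the class and method names are hypothetical, quoted strings are assumed to contain no whitespace, and a '/*' line that also ends in '*/' is treated as a complete one-line comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical reader for the documented resource format; not part of Commons Codec.
final class RuleResourceSketch {

    /** Returns one String[4] {pattern, left context, right context, phoneme} per rule row. */
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        boolean inBlockComment = false;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (inBlockComment) {
                    // Skip everything until a line ending in "*/".
                    if (line.endsWith("*/")) {
                        inBlockComment = false;
                    }
                    continue;
                }
                if (line.startsWith("/*")) {
                    // Sketch's choice: a comment closed on the same line does not open the mode.
                    inBlockComment = !line.endsWith("*/");
                    continue;
                }
                int comment = line.indexOf("//");
                if (comment >= 0) {
                    line = line.substring(0, comment); // discard end-of-line comment
                }
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // blank lines are skipped
                }
                String[] cols = line.split("\\s+");
                if (cols.length != 4) {
                    throw new IllegalArgumentException(
                            "Expected 4 columns but found " + cols.length + ": " + line);
                }
                for (int k = 0; k < cols.length; k++) {
                    cols[k] = stripQuotes(cols[k]);
                }
                rows.add(cols);
            }
        }
        return rows;
    }

    /** Removes one pair of surrounding double quotes, if present. */
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String text = "// header\n\"sch\" \"\" \"[ei]\" \"S\" // trailing comment\n";
        for (String[] row : parse(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

    Because any occurrence of '//' starts a comment, this sketch, like the format description, cannot represent a column value that itself contains '//'.
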
    Since:
    1.6
    • Constructor Detail

      • Rule

        public Rule​(java.lang.String pattern,
                    java.lang.String lContext,
                    java.lang.String rContext,
                    Rule.PhonemeExpr phoneme)
        Creates a new rule.
        Parameters:
        pattern - the pattern
        lContext - the left context
        rContext - the right context
        phoneme - the resulting phoneme
    • Method Detail

      • contains

        private static boolean contains​(java.lang.CharSequence chars,
                                        char input)
      • createResourceName

        private static java.lang.String createResourceName​(NameType nameType,
                                                           RuleType rt,
                                                           java.lang.String lang)
      • createScanner

        private static java.util.Scanner createScanner​(NameType nameType,
                                                       RuleType rt,
                                                       java.lang.String lang)
      • createScanner

        private static java.util.Scanner createScanner​(java.lang.String lang)
      • endsWith

        private static boolean endsWith​(java.lang.CharSequence input,
                                        java.lang.CharSequence suffix)
      • getInstance

        public static java.util.List<Rule> getInstance​(NameType nameType,
                                                       RuleType rt,
                                                       Languages.LanguageSet langs)
        Gets rules for a combination of name type, rule type and languages.
        Parameters:
        nameType - the NameType to consider
        rt - the RuleType to consider
        langs - the set of languages to consider
        Returns:
        a list of Rules that apply
      • getInstance

        public static java.util.List<Rule> getInstance​(NameType nameType,
                                                       RuleType rt,
                                                       java.lang.String lang)
        Gets rules for a combination of name type, rule type and a single language.
        Parameters:
        nameType - the NameType to consider
        rt - the RuleType to consider
        lang - the language to consider
        Returns:
        a list of Rules that apply
      • getInstanceMap

        public static java.util.Map<java.lang.String,​java.util.List<Rule>> getInstanceMap​(NameType nameType,
                                                                                                RuleType rt,
                                                                                                Languages.LanguageSet langs)
        Gets rules for a combination of name type, rule type and languages.
        Parameters:
        nameType - the NameType to consider
        rt - the RuleType to consider
        langs - the set of languages to consider
        Returns:
        a map containing all Rules that apply, grouped by the first character of the rule pattern
        Since:
        1.9
      • getInstanceMap

        public static java.util.Map<java.lang.String,​java.util.List<Rule>> getInstanceMap​(NameType nameType,
                                                                                                RuleType rt,
                                                                                                java.lang.String lang)
        Gets rules for a combination of name type, rule type and a single language.
        Parameters:
        nameType - the NameType to consider
        rt - the RuleType to consider
        lang - the language to consider
        Returns:
        a map containing all Rules that apply, grouped by the first character of the rule pattern
        Since:
        1.9
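
        Grouping by the first character of the pattern lets a caller fetch only the candidate rules for the character at the current input position. The sketch below shows the idea on plain pattern strings; the class and method are hypothetical and stand in for lists of Rule objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of grouping by first pattern character; not part of Commons Codec.
final class RuleGroupingSketch {

    static Map<String, List<String>> groupByFirstChar(List<String> patterns) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String pattern : patterns) {
            if (pattern.isEmpty()) {
                throw new IllegalArgumentException("Pattern must not be empty");
            }
            // The key is the first character of the pattern, kept as a String.
            String key = pattern.substring(0, 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(pattern);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = groupByFirstChar(List.of("sch", "s", "tz"));
        String input = "stz";
        // Only patterns starting with the character at position 0 need to be tried.
        System.out.println(grouped.get(input.substring(0, 1)));
    }
}
```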
      • parsePhoneme

        private static Rule.Phoneme parsePhoneme​(java.lang.String ph)
      • parsePhonemeExpr

        private static Rule.PhonemeExpr parsePhonemeExpr​(java.lang.String ph)
      • parseRules

        private static java.util.Map<java.lang.String,​java.util.List<Rule>> parseRules​(java.util.Scanner scanner,
                                                                                             java.lang.String location)
      • pattern

        private static Rule.RPattern pattern​(java.lang.String regex)
        Attempts to compile the regex into direct string ops, falling back to Pattern and Matcher in the worst case.
        Parameters:
        regex - the regular expression to compile
        Returns:
        an RPattern that will match this regex
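
        The idea of compiling simple expressions into direct string operations can be sketched as follows. This is an illustration only: it uses java.util.function.Predicate in place of RPattern, the class and method names are hypothetical, it assumes "find" semantics for unanchored expressions, and it ignores the special handling of line terminators by '$'.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration of the string-op shortcut; not the library's RPattern.
final class RPatternSketch {

    static Predicate<CharSequence> compile(String regex) {
        boolean anchoredStart = regex.startsWith("^");
        boolean anchoredEnd = regex.endsWith("$");
        String content = regex.substring(anchoredStart ? 1 : 0,
                anchoredEnd ? regex.length() - 1 : regex.length());

        if (isLiteral(content)) {
            // No metacharacters left: plain string operations are sufficient.
            if (anchoredStart && anchoredEnd) {
                return in -> in.toString().equals(content);
            }
            if (anchoredStart) {
                return in -> in.toString().startsWith(content);
            }
            if (anchoredEnd) {
                return in -> in.toString().endsWith(content);
            }
            return in -> in.toString().contains(content);
        }

        // Worst case: fall back to a compiled regular expression.
        Pattern compiled = Pattern.compile(regex);
        return in -> compiled.matcher(in).find();
    }

    /** True if the string contains none of the regex metacharacters checked here. */
    private static boolean isLiteral(String s) {
        for (int k = 0; k < s.length(); k++) {
            if ("\\[](){}.*+?|^$".indexOf(s.charAt(k)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compile("^ab").test("abc"));
        System.out.println(compile("[ei]$").test("ea"));
    }
}
```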
      • startsWith

        private static boolean startsWith​(java.lang.CharSequence input,
                                          java.lang.CharSequence prefix)
      • stripQuotes

        private static java.lang.String stripQuotes​(java.lang.String str)
      • getLContext

        public Rule.RPattern getLContext()
        Gets the left context. This is a regular expression that must match to the left of the pattern.
        Returns:
        the left context RPattern
      • getPattern

        public java.lang.String getPattern()
        Gets the pattern. This is a string literal that must match exactly.
        Returns:
        the pattern
      • getPhoneme

        public Rule.PhonemeExpr getPhoneme()
        Gets the phoneme. If the rule matches, this is the phoneme associated with the pattern match.
        Returns:
        the phoneme
      • getRContext

        public Rule.RPattern getRContext()
        Gets the right context. This is a regular expression that must match to the right of the pattern.
        Returns:
        the right context RPattern
      • patternAndContextMatches

        public boolean patternAndContextMatches​(java.lang.CharSequence input,
                                                int i)
        Decides whether the pattern and contexts match the input at a given position. It is a match if lContext matches the input up to i, pattern matches at i, and rContext matches from the end of the pattern match to the end of the input.
        Parameters:
        input - the input character sequence
        i - the int position within the input
        Returns:
        true if the pattern and left/right context match, false otherwise
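
        The check described above can be sketched with java.util.regex. This is a minimal sketch, not the library's implementation: the class is hypothetical, the contexts are passed as plain regex strings rather than RPattern objects, and it assumes the left context is anchored at the end of the text before i while the right context is anchored at the start of the text after the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of pattern-and-context matching; not part of Commons Codec.
final class ContextMatchSketch {

    static boolean patternAndContextMatches(String pattern, String lContext, String rContext,
                                            CharSequence input, int i) {
        if (i < 0) {
            throw new IndexOutOfBoundsException("Cannot match pattern at negative index " + i);
        }
        int patternEnd = i + pattern.length();
        if (patternEnd > input.length()) {
            return false; // not enough input left for the pattern
        }
        // The pattern is a literal that must match exactly at position i.
        if (!input.subSequence(i, patternEnd).toString().equals(pattern)) {
            return false;
        }
        CharSequence left = input.subSequence(0, i);
        CharSequence right = input.subSequence(patternEnd, input.length());
        // Assumption: right context is anchored at the start of the remaining input.
        boolean rightOk = rContext.isEmpty()
                || Pattern.compile("^(?:" + rContext + ")").matcher(right).find();
        // Assumption: left context is anchored at the end of the preceding input.
        boolean leftOk = lContext.isEmpty()
                || Pattern.compile("(?:" + lContext + ")$").matcher(left).find();
        return rightOk && leftOk;
    }

    public static void main(String[] args) {
        System.out.println(patternAndContextMatches("sch", "", "[ei]", "schein", 0));
        System.out.println(patternAndContextMatches("sch", "", "[ei]", "schule", 0));
    }
}
```

        A real implementation would compile the context expressions once per rule instead of on every call; the sketch recompiles them only to stay short.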