Class Rule
- java.lang.Object
- org.apache.commons.codec.language.bm.Rule
public class Rule extends java.lang.Object
A phoneme rule. Rules have a pattern, a left context, a right context, an output phoneme, a set of languages to which they apply, and a logical flag indicating whether all languages must be in play. A rule matches if:
- the pattern matches at the current position
- the string up until the beginning of the pattern matches the left context
- the string from the end of the pattern matches the right context
- logical is ALL and all languages are in scope; or
- logical is any other value and at least one language is in scope
Rules are typically generated by parsing rules resources. In normal use, there will be no need for the user to explicitly construct their own.
Rules are immutable and thread-safe.
Rules resources
Rules are typically loaded from resource files. These are UTF-8 encoded text files. They are systematically named following the pattern:
org/apache/commons/codec/language/bm/${NameType#getName}_${RuleType#getName}_${language}.txt
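As a rough illustration of the naming pattern above, the sketch below assembles a resource path from three plain strings. The class RuleResourceNames, the helper resourceName and the sample values gen, approx and french are assumptions made for illustration; in the library the first two parts come from NameType#getName and RuleType#getName.

```java
final class RuleResourceNames {
    // Hypothetical helper: mirrors the documented naming pattern using plain strings.
    static String resourceName(String nameType, String ruleType, String language) {
        return String.format("org/apache/commons/codec/language/bm/%s_%s_%s.txt",
                nameType, ruleType, language);
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not taken from this page.
        System.out.println(resourceName("gen", "approx", "french"));
    }
}
```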
The format of these resources is the following:
- Rules: whitespace separated, double-quoted strings. There should be 4 columns to each row, and these will be interpreted as:
- pattern
- left context
- right context
- phoneme
- End-of-line comments: Any occurrence of '//' will cause all text following on that line to be discarded as a comment.
- Multi-line comments: Any line starting with '/*' will start multi-line commenting mode. This will skip all content until a line ending in '*/' is found.
- Blank lines: All blank lines will be skipped.
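The format described above can be sketched with a small standalone reader. This is a simplified illustration, not the library's parser: the class RuleLineParser and its methods are hypothetical, and the sketch assumes that a multi-line comment is only closed by a line after the one that opened it.

```java
import java.util.ArrayList;
import java.util.List;

final class RuleLineParser {
    /** Returns the four unquoted columns of every rule row; comments and blank lines are skipped. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        boolean inMultilineComment = false;
        for (String rawLine : lines) {
            if (inMultilineComment) {
                // Skip everything until a line ending in "*/" is found.
                if (rawLine.endsWith("*/")) {
                    inMultilineComment = false;
                }
                continue;
            }
            if (rawLine.startsWith("/*")) {
                inMultilineComment = true;
                continue;
            }
            // Discard an end-of-line comment, if any.
            String line = rawLine;
            int commentStart = line.indexOf("//");
            if (commentStart >= 0) {
                line = line.substring(0, commentStart);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue; // blank line
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected 4 columns but found " + parts.length);
            }
            for (int k = 0; k < parts.length; k++) {
                parts[k] = stripQuotes(parts[k]);
            }
            rows.add(parts); // pattern, left context, right context, phoneme
        }
        return rows;
    }

    /** Removes one leading and one trailing double quote, if present. */
    static String stripQuotes(String s) {
        if (s.startsWith("\"")) {
            s = s.substring(1);
        }
        if (s.endsWith("\"")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }
}
```

An empty context is written as a pair of double quotes, which the sketch turns into an empty string.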
- Since:
- 1.6
-
-
Nested Class Summary
Nested Classes: modifier and type, class, description
- static class Rule.Phoneme
- static interface Rule.PhonemeExpr
- static class Rule.PhonemeList
- static interface Rule.RPattern
  A minimal wrapper around the functionality of Pattern that we use, to allow for alternate implementations.
-
Field Summary
Fields: modifier and type, field, description
- static java.lang.String ALL
- static Rule.RPattern ALL_STRINGS_RMATCHER
- private static java.lang.String DOUBLE_QUOTE
- private static java.lang.String HASH_INCLUDE
- private Rule.RPattern lContext
- private java.lang.String pattern
- private Rule.PhonemeExpr phoneme
- private Rule.RPattern rContext
- private static java.util.Map<NameType,java.util.Map<RuleType,java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.util.List<Rule>>>>> RULES
-
Constructor Summary
Constructors: constructor, description
- Rule(java.lang.String pattern, java.lang.String lContext, java.lang.String rContext, Rule.PhonemeExpr phoneme)
  Creates a new rule.
-
Method Summary
Methods: modifier and type, method, description
- private static boolean contains(java.lang.CharSequence chars, char input)
- private static java.lang.String createResourceName(NameType nameType, RuleType rt, java.lang.String lang)
- private static java.util.Scanner createScanner(java.lang.String lang)
- private static java.util.Scanner createScanner(NameType nameType, RuleType rt, java.lang.String lang)
- private static boolean endsWith(java.lang.CharSequence input, java.lang.CharSequence suffix)
- static java.util.List<Rule> getInstance(NameType nameType, RuleType rt, java.lang.String lang)
  Gets rules for a combination of name type, rule type and a single language.
- static java.util.List<Rule> getInstance(NameType nameType, RuleType rt, Languages.LanguageSet langs)
  Gets rules for a combination of name type, rule type and languages.
- static java.util.Map<java.lang.String,java.util.List<Rule>> getInstanceMap(NameType nameType, RuleType rt, java.lang.String lang)
  Gets rules for a combination of name type, rule type and a single language.
- static java.util.Map<java.lang.String,java.util.List<Rule>> getInstanceMap(NameType nameType, RuleType rt, Languages.LanguageSet langs)
  Gets rules for a combination of name type, rule type and languages.
- Rule.RPattern getLContext()
  Gets the left context.
- java.lang.String getPattern()
  Gets the pattern.
- Rule.PhonemeExpr getPhoneme()
  Gets the phoneme.
- Rule.RPattern getRContext()
  Gets the right context.
- private static Rule.Phoneme parsePhoneme(java.lang.String ph)
- private static Rule.PhonemeExpr parsePhonemeExpr(java.lang.String ph)
- private static java.util.Map<java.lang.String,java.util.List<Rule>> parseRules(java.util.Scanner scanner, java.lang.String location)
- private static Rule.RPattern pattern(java.lang.String regex)
  Attempts to compile the regex into direct string ops, falling back to Pattern and Matcher in the worst case.
- boolean patternAndContextMatches(java.lang.CharSequence input, int i)
  Decides if the pattern and context match the input starting at a position.
- private static boolean startsWith(java.lang.CharSequence input, java.lang.CharSequence prefix)
- private static java.lang.String stripQuotes(java.lang.String str)
-
-
-
Field Detail
-
ALL_STRINGS_RMATCHER
public static final Rule.RPattern ALL_STRINGS_RMATCHER
-
ALL
public static final java.lang.String ALL
- See Also:
- Constant Field Values
-
DOUBLE_QUOTE
private static final java.lang.String DOUBLE_QUOTE
- See Also:
- Constant Field Values
-
HASH_INCLUDE
private static final java.lang.String HASH_INCLUDE
- See Also:
- Constant Field Values
-
RULES
private static final java.util.Map<NameType,java.util.Map<RuleType,java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.util.List<Rule>>>>> RULES
-
lContext
private final Rule.RPattern lContext
-
pattern
private final java.lang.String pattern
-
phoneme
private final Rule.PhonemeExpr phoneme
-
rContext
private final Rule.RPattern rContext
-
-
Constructor Detail
-
Rule
public Rule(java.lang.String pattern, java.lang.String lContext, java.lang.String rContext, Rule.PhonemeExpr phoneme)
Creates a new rule.
- Parameters:
- pattern - the pattern
- lContext - the left context
- rContext - the right context
- phoneme - the resulting phoneme
-
-
Method Detail
-
contains
private static boolean contains(java.lang.CharSequence chars, char input)
-
createResourceName
private static java.lang.String createResourceName(NameType nameType, RuleType rt, java.lang.String lang)
-
createScanner
private static java.util.Scanner createScanner(NameType nameType, RuleType rt, java.lang.String lang)
-
createScanner
private static java.util.Scanner createScanner(java.lang.String lang)
-
endsWith
private static boolean endsWith(java.lang.CharSequence input, java.lang.CharSequence suffix)
-
getInstance
public static java.util.List<Rule> getInstance(NameType nameType, RuleType rt, Languages.LanguageSet langs)
Gets rules for a combination of name type, rule type and languages.
- Parameters:
- nameType - the NameType to consider
- rt - the RuleType to consider
- langs - the set of languages to consider
- Returns:
- a list of Rules that apply
-
getInstance
public static java.util.List<Rule> getInstance(NameType nameType, RuleType rt, java.lang.String lang)
Gets rules for a combination of name type, rule type and a single language.
- Parameters:
- nameType - the NameType to consider
- rt - the RuleType to consider
- lang - the language to consider
- Returns:
- a list of Rules that apply
-
getInstanceMap
public static java.util.Map<java.lang.String,java.util.List<Rule>> getInstanceMap(NameType nameType, RuleType rt, Languages.LanguageSet langs)
Gets rules for a combination of name type, rule type and languages.
- Parameters:
- nameType - the NameType to consider
- rt - the RuleType to consider
- langs - the set of languages to consider
- Returns:
- a map containing all Rules that apply, grouped by the first character of the rule pattern
- Since:
- 1.9
-
getInstanceMap
public static java.util.Map<java.lang.String,java.util.List<Rule>> getInstanceMap(NameType nameType, RuleType rt, java.lang.String lang)
Gets rules for a combination of name type, rule type and a single language.
- Parameters:
- nameType - the NameType to consider
- rt - the RuleType to consider
- lang - the language to consider
- Returns:
- a map containing all Rules that apply, grouped by the first character of the rule pattern
- Since:
- 1.9
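To make the "grouped by the first character of the rule pattern" return value more concrete, here is a minimal standalone sketch of that kind of grouping. It works on plain pattern strings rather than Rule objects; the class RuleGrouping and its method are hypothetical, and the sketch assumes every pattern is non-empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RuleGrouping {
    /** Groups patterns under a one-character key taken from the start of each pattern. */
    static Map<String, List<String>> groupByFirstCharacter(List<String> patterns) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String pattern : patterns) {
            // Assumption: patterns are never empty, so substring(0, 1) is safe.
            String key = pattern.substring(0, 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(pattern);
        }
        return grouped;
    }
}
```

With such a map, a caller positioned at index i of the input only needs to inspect the list stored under the character at i, instead of scanning every rule.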
-
parsePhoneme
private static Rule.Phoneme parsePhoneme(java.lang.String ph)
-
parsePhonemeExpr
private static Rule.PhonemeExpr parsePhonemeExpr(java.lang.String ph)
-
parseRules
private static java.util.Map<java.lang.String,java.util.List<Rule>> parseRules(java.util.Scanner scanner, java.lang.String location)
-
pattern
private static Rule.RPattern pattern(java.lang.String regex)
Attempts to compile the regex into direct string ops, falling back to Pattern and Matcher in the worst case.
- Parameters:
- regex - the regular expression to compile
- Returns:
- an RPattern that will match this regex
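The idea of replacing a regex with direct string operations can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the library's code: the class SimplePatterns and the interface StringMatcher are hypothetical, and only anchored or unanchored alphanumeric literals are optimized here. Everything else falls back to java.util.regex.

```java
import java.util.regex.Pattern;

final class SimplePatterns {
    /** Hypothetical stand-in for a minimal matcher interface. */
    interface StringMatcher {
        boolean isMatch(CharSequence input);
    }

    static StringMatcher compile(String regex) {
        boolean anchoredAtStart = regex.startsWith("^");
        boolean anchoredAtEnd = regex.endsWith("$");
        String content = regex.substring(anchoredAtStart ? 1 : 0,
                anchoredAtEnd ? regex.length() - 1 : regex.length());

        // Only plain letters and digits are treated as a literal in this sketch.
        boolean literal = content.chars().allMatch(Character::isLetterOrDigit);
        if (literal) {
            if (anchoredAtStart && anchoredAtEnd) {
                return input -> input.toString().equals(content);
            }
            if (anchoredAtStart) {
                return input -> input.toString().startsWith(content);
            }
            if (anchoredAtEnd) {
                return input -> input.toString().endsWith(content);
            }
            return input -> input.toString().contains(content);
        }

        // Worst case: compile a real regular expression and search with find().
        Pattern compiled = Pattern.compile(regex);
        return input -> compiled.matcher(input).find();
    }
}
```

The string operations avoid creating a Matcher for the common case of short literal contexts, while the fallback keeps full regex support.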
-
startsWith
private static boolean startsWith(java.lang.CharSequence input, java.lang.CharSequence prefix)
-
stripQuotes
private static java.lang.String stripQuotes(java.lang.String str)
-
getLContext
public Rule.RPattern getLContext()
Gets the left context. This is a regular expression that must match to the left of the pattern.
- Returns:
- the left context Pattern
-
getPattern
public java.lang.String getPattern()
Gets the pattern. This is a string literal that must match exactly.
- Returns:
- the pattern
-
getPhoneme
public Rule.PhonemeExpr getPhoneme()
Gets the phoneme. If the rule matches, this is the phoneme associated with the pattern match.
- Returns:
- the phoneme
-
getRContext
public Rule.RPattern getRContext()
Gets the right context. This is a regular expression that must match to the right of the pattern.
- Returns:
- the right context Pattern
-
patternAndContextMatches
public boolean patternAndContextMatches(java.lang.CharSequence input, int i)
Decides if the pattern and context match the input starting at a position. It is a match if the lContext matches input up to i, pattern matches at i, and rContext matches from the end of the match of pattern to the end of input.
- Parameters:
- input - the input String
- i - the int position within the input
- Returns:
- true if the pattern and left/right context match, false otherwise
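The matching condition described above can be sketched in a few lines of standalone Java. The class SimpleRule is hypothetical and uses java.util.regex directly. It assumes the left context is anchored to the end of the text before position i and the right context to the start of the text after the pattern, which is one reading of "to the left of" and "to the right of" and is not confirmed by this page.

```java
import java.util.regex.Pattern;

final class SimpleRule {
    private final String pattern;
    private final Pattern lContext;
    private final Pattern rContext;

    SimpleRule(String pattern, String lContext, String rContext) {
        this.pattern = pattern;
        // Assumption: contexts must touch the pattern, so anchor them accordingly.
        this.lContext = Pattern.compile(lContext + "$");
        this.rContext = Pattern.compile("^" + rContext);
    }

    boolean patternAndContextMatches(CharSequence input, int i) {
        if (i < 0) {
            throw new IndexOutOfBoundsException("Negative position: " + i);
        }
        int end = i + pattern.length();
        if (end > input.length()) {
            return false; // the pattern does not fit at position i
        }
        // The pattern is a literal that must match exactly at position i.
        if (!input.subSequence(i, end).toString().equals(pattern)) {
            return false;
        }
        // Left context is checked against input[0, i), right context against input[end, length).
        return lContext.matcher(input.subSequence(0, i)).find()
                && rContext.matcher(input.subSequence(end, input.length())).find();
    }
}
```

For example, a rule with pattern "sch", an empty left context and right context "[ei]" matches "schein" at position 0 but not "schade", because the character after the pattern is neither e nor i.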
-
-