Package org.apache.lucene.analysis.cjk
Class CJKWidthCharFilter
- java.lang.Object
  - java.io.Reader
    - org.apache.lucene.analysis.CharFilter
      - org.apache.lucene.analysis.charfilter.BaseCharFilter
        - org.apache.lucene.analysis.cjk.CJKWidthCharFilter
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, java.lang.Readable
public class CJKWidthCharFilter extends BaseCharFilter
A CharFilter that normalizes CJK width differences:
- Folds fullwidth ASCII variants into the equivalent Basic Latin
- Folds halfwidth Katakana variants into the equivalent kana
NOTE: this char filter is the exact counterpart of CJKWidthFilter.
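The first folding rule can be illustrated with plain Java. The following is a minimal sketch, not this class's implementation: it assumes only the Unicode layout in which the fullwidth ASCII variants U+FF01 to U+FF5E sit at a fixed distance of 0xFEE0 from U+0021 to U+007E. The class and method names (`FullwidthFoldSketch`, `foldFullwidthAscii`) are hypothetical.

```java
public class FullwidthFoldSketch {
    // Hypothetical helper: folds fullwidth ASCII variants (U+FF01..U+FF5E)
    // into Basic Latin (U+0021..U+007E) by subtracting the fixed distance
    // 0xFEE0. All other characters pass through unchanged.
    static String foldFullwidthAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                out.append((char) (c - 0xFEE0));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fullwidth "Lucene", an ordinary space, and a fullwidth digit nine.
        String input = "\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45 \uFF19";
        System.out.println(foldFullwidthAscii(input)); // prints "Lucene 9"
    }
}
```

Halfwidth Katakana needs a lookup table instead of a fixed distance, which is presumably the role of the KANA_NORM field listed below.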
-
-
Field Summary
Fields (modifier and type, then field name):
- private static int HW_KATAKANA_SEMI_VOICED_MARK
- private static int HW_KATAKANA_VOICED_MARK
- private int inputOff
- private static byte[] KANA_COMBINE_SEMI_VOICED
- private static byte[] KANA_COMBINE_VOICED
- private static char[] KANA_NORM
- private int prevChar
-
Fields inherited from class org.apache.lucene.analysis.CharFilter
input
-
-
Constructor Summary
Constructors:
- CJKWidthCharFilter(java.io.Reader in): Default constructor that takes a Reader.
-
Method Summary
Methods (all are concrete instance methods):
- private int combineVoiceMark(int ch, int voiceMark): Returns the combined char if the voice mark was successfully combined, otherwise the original char.
- int read()
- int read(char[] cbuf, int off, int len)
-
Methods inherited from class org.apache.lucene.analysis.charfilter.BaseCharFilter
addOffCorrectMap, correct, getLastCumulativeDiff
-
Methods inherited from class org.apache.lucene.analysis.CharFilter
close, correctOffset
-
Field Detail
-
KANA_NORM
private static final char[] KANA_NORM
-
KANA_COMBINE_VOICED
private static final byte[] KANA_COMBINE_VOICED
-
KANA_COMBINE_SEMI_VOICED
private static final byte[] KANA_COMBINE_SEMI_VOICED
-
HW_KATAKANA_VOICED_MARK
private static final int HW_KATAKANA_VOICED_MARK
- See Also:
- Constant Field Values
-
HW_KATAKANA_SEMI_VOICED_MARK
private static final int HW_KATAKANA_SEMI_VOICED_MARK
- See Also:
- Constant Field Values
-
prevChar
private int prevChar
-
inputOff
private int inputOff
-
-
Method Detail
-
read
public int read() throws java.io.IOException
- Overrides:
read in class java.io.Reader
- Throws:
java.io.IOException
-
combineVoiceMark
private int combineVoiceMark(int ch, int voiceMark)
Returns the combined char if the voice mark was successfully combined with ch, otherwise the original char.
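The "combined, otherwise original" contract can be sketched with the JDK alone. This is an illustration under stated assumptions, not the private method's actual logic, which presumably uses the KANA_COMBINE_VOICED and KANA_COMBINE_SEMI_VOICED tables. The sketch swaps in `java.text.Normalizer` (NFC composition) and the combining marks U+3099 and U+309A instead. The names `VoiceMarkSketch` and `combine` are hypothetical.

```java
import java.text.Normalizer;

public class VoiceMarkSketch {
    // Combining (not halfwidth) marks, chosen so that NFC can compose them.
    static final int COMBINING_VOICED = 0x3099;
    static final int COMBINING_SEMI_VOICED = 0x309A;

    // Hypothetical helper: returns the precomposed kana if ch and the mark
    // compose into a single code point under NFC, otherwise returns ch.
    static int combine(int ch, int combiningMark) {
        String pair = new String(new int[] {ch, combiningMark}, 0, 2);
        String composed = Normalizer.normalize(pair, Normalizer.Form.NFC);
        boolean single = composed.codePointCount(0, composed.length()) == 1;
        return single ? composed.codePointAt(0) : ch;
    }

    public static void main(String[] args) {
        // KA (U+30AB) + voiced mark composes to GA (U+30AC).
        System.out.println(Integer.toHexString(combine(0x30AB, COMBINING_VOICED))); // 30ac
        // HA (U+30CF) + semi-voiced mark composes to PA (U+30D1).
        System.out.println(Integer.toHexString(combine(0x30CF, COMBINING_SEMI_VOICED))); // 30d1
        // A (U+30A2) has no voiced form, so the original char comes back.
        System.out.println(Integer.toHexString(combine(0x30A2, COMBINING_VOICED))); // 30a2
    }
}
```

A table lookup avoids allocating strings per character, which matters inside a Reader's read loop. The Normalizer version is only meant to make the expected results easy to check.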
-
read
public int read(char[] cbuf, int off, int len) throws java.io.IOException
- Specified by:
read in class java.io.Reader
- Throws:
java.io.IOException
-
-