All Implemented Interfaces: Closeable, AutoCloseable

public final class CJKBigramFilter
extends org.apache.lucene.analysis.TokenFilter
CJK types are set by these tokenizers, but you can also use
CJKBigramFilter(TokenStream, int) to explicitly control which
of the CJK scripts are turned into bigrams.
In all cases, all non-CJK input is passed through unmodified.
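
The behavior described above can be illustrated with a small, JDK-only sketch. It is a simplified model, not Lucene's implementation: the class `CjkBigramSketch`, its methods, and the numeric flag values are hypothetical, and it detects scripts with `Character.UnicodeScript` rather than relying on the token types set by a tokenizer. It assumes each CJK character arrives as its own token, joins adjacent selected characters into overlapping bigrams, and, as a choice made for this sketch, emits a lone CJK character unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified illustration only; this is not Lucene's implementation. */
final class CjkBigramSketch {
    // Hypothetical flag values for this sketch; the values of the real constants are not given here.
    static final int HAN = 1;
    static final int HIRAGANA = 2;
    static final int KATAKANA = 4;
    static final int HANGUL = 8;

    /** Maps a code point to one of the sketch's script flags, or 0 for non-CJK input. */
    static int scriptFlag(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        if (script == Character.UnicodeScript.HAN) return HAN;
        if (script == Character.UnicodeScript.HIRAGANA) return HIRAGANA;
        if (script == Character.UnicodeScript.KATAKANA) return KATAKANA;
        if (script == Character.UnicodeScript.HANGUL) return HANGUL;
        return 0;
    }

    /** True if the token is a single code point whose script is selected by flags. */
    static boolean isSelectedCjk(String token, int flags) {
        if (token.isEmpty() || token.codePointCount(0, token.length()) != 1) {
            return false;
        }
        return (scriptFlag(token.codePointAt(0)) & flags) != 0;
    }

    /** Emits the pending run as overlapping bigrams (or a single unigram) and clears it. */
    static void flush(List<String> run, List<String> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            out.add(run.get(i) + run.get(i + 1));
        }
        run.clear();
    }

    /** Bigrams runs of selected CJK tokens; every other token passes through unmodified. */
    static List<String> bigram(List<String> tokens, int flags) {
        List<String> out = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String token : tokens) {
            if (isSelectedCjk(token, flags)) {
                run.add(token);
            } else {
                flush(run, out); // a non-selected token ends the current run
                out.add(token);
            }
        }
        flush(run, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("日", "本", "語", "abc", "の");
        System.out.println(bigram(tokens, HAN));
    }
}
```

With only `HAN` selected, the input tokens 日, 本, 語, abc, の become 日本, 本語, abc, の: the Han run is bigrammed, while the Latin token and the unselected Hiragana token pass through. With the real class, the scripts are selected by combining the constants with bitwise OR, for example `new CJKBigramFilter(in, CJKBigramFilter.HAN | CJKBigramFilter.HIRAGANA)`.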
| Modifier and Type | Field | Description |
|---|---|---|
| static String | DOUBLE_TYPE | When we emit a bigram, it is marked as this type. |
| static int | HAN | Bigram flag for Han ideographs. |
| static int | HANGUL | Bigram flag for Hangul. |
| static int | HIRAGANA | Bigram flag for Hiragana. |
| static int | KATAKANA | Bigram flag for Katakana. |
| static String | SINGLE_TYPE | When we emit a unigram, it is marked as this type. |

| Constructor | Description |
|---|---|
| CJKBigramFilter(org.apache.lucene.analysis.TokenStream in) | |
| CJKBigramFilter(org.apache.lucene.analysis.TokenStream in, int flags) | Create a new CJKBigramFilter, specifying which writing systems should be bigrammed. |

| Modifier and Type | Method | Description |
|---|---|---|
| boolean | incrementToken() | |
| void | reset() | |

Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public static final int HAN
public static final int HIRAGANA
public static final int KATAKANA
public static final int HANGUL
public static final String DOUBLE_TYPE
public static final String SINGLE_TYPE
public CJKBigramFilter(org.apache.lucene.analysis.TokenStream in)
public boolean incrementToken()
throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void reset()
throws IOException
Overrides: reset in class org.apache.lucene.analysis.TokenFilter
Throws: IOException

Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.