Package org.apache.lucene.analysis.cn
Class ChineseTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.cn.ChineseTokenizer
- All Implemented Interfaces: Closeable, AutoCloseable
@Deprecated public final class ChineseTokenizer extends org.apache.lucene.analysis.Tokenizer
Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
Tokenizes Chinese text as individual Chinese characters.
The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic.
For example, if the Chinese text "C1C2C3C4" is to be indexed:
- The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
- The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.
Therefore the index created by CJKTokenizer is much larger.
The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not work.
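The two token streams in the example above can be reproduced with a minimal sketch in plain Java. It does not use Lucene and is not the implementation of either tokenizer. The class TokenizationSketch and its methods unigrams and bigrams are hypothetical names introduced here for illustration, and the sketch assumes the input contains only Chinese characters from the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the token output described above, not Lucene's code.
public class TokenizationSketch {

    // One token per character, like the ChineseTokenizer output C1, C2, C3, C4.
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens, like the CJKTokenizer output C1C2, C2C3, C3C4.
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Chinese characters standing in for C1C2C3C4 (written as escapes
        // so the source compiles regardless of file encoding).
        String text = "\u4e2d\u6587\u5206\u8bcd";
        System.out.println(unigrams(text)); // four single-character tokens
        System.out.println(bigrams(text));  // three overlapping two-character tokens
    }
}
```

With the single-character tokens, a query for C1 alone matches an indexed token directly. With the overlapping pairs, no indexed token equals C1, which is the kind of search the text above describes as failing.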
- Version: 1.0
Constructor Summary
Constructors (all deprecated):
- ChineseTokenizer(Reader in)
- ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
- ChineseTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)
-
Method Summary
Methods (all are concrete instance methods, and all are deprecated):
- void end()
- boolean incrementToken()
- void reset()
- void reset(Reader input)
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Detail
-
ChineseTokenizer
public ChineseTokenizer(Reader in)
Deprecated.
-
ChineseTokenizer
public ChineseTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)
Deprecated.
-
ChineseTokenizer
public ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
Deprecated.
-
-
Method Detail
-
incrementToken
public boolean incrementToken() throws IOException
Deprecated.
- Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws: IOException
-
end
public final void end()
Deprecated.
- Overrides: end in class org.apache.lucene.analysis.TokenStream
-
reset
public void reset() throws IOException
Deprecated.
- Overrides: reset in class org.apache.lucene.analysis.TokenStream
- Throws: IOException
-
reset
public void reset(Reader input) throws IOException
Deprecated.
- Overrides: reset in class org.apache.lucene.analysis.Tokenizer
- Throws: IOException