All Implemented Interfaces: Closeable, AutoCloseable

Direct Known Subclasses: CharTokenizer, ChineseTokenizer, CJKTokenizer, ClassicTokenizer, EdgeNGramTokenizer, EmptyTokenizer, ICUTokenizer, JapaneseTokenizer, KeywordTokenizer, MockTokenizer, NGramTokenizer, PathHierarchyTokenizer, ReversePathHierarchyTokenizer, SentenceTokenizer, StandardTokenizer, UAX29URLEmailTokenizer, WikipediaTokenizer

public abstract class Tokenizer extends TokenStream

This is an abstract class; subclasses must override TokenStream.incrementToken().

NOTE: Subclasses overriding TokenStream.incrementToken() must
call AttributeSource.clearAttributes() before
setting attributes.
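
The clear-before-set rule in the NOTE above can be illustrated without Lucene on the classpath. The following is a minimal, library-free sketch: `ClearBeforeSetModel`, its `term` and `type` fields, and the `describe` helper are hypothetical stand-ins invented for this example, not Lucene classes or methods. It assumes a whitespace-delimited input and a `type` attribute that only numeric tokens set explicitly.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Library-free model of the clear-then-set contract; NOT a Lucene class. */
final class ClearBeforeSetModel {
    static final String DEFAULT_TYPE = "word";

    private final Reader input;

    // Hypothetical stand-ins for token attributes.
    final StringBuilder term = new StringBuilder();
    String type = DEFAULT_TYPE;

    ClearBeforeSetModel(Reader input) {
        this.input = input;
    }

    /** Models clearAttributes(): every attribute returns to its default value. */
    void clearAttributes() {
        term.setLength(0);
        type = DEFAULT_TYPE;
    }

    /** Models incrementToken(): returns false once the input is exhausted. */
    boolean incrementToken() throws IOException {
        clearAttributes(); // must happen before any attribute is set
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = input.read(); // skip leading whitespace
        }
        if (c == -1) {
            return false;
        }
        boolean allDigits = true;
        while (c != -1 && !Character.isWhitespace(c)) {
            term.append((char) c);
            allDigits &= Character.isDigit(c);
            c = input.read();
        }
        if (allDigits) {
            type = "num"; // only numeric tokens set the type explicitly
        }
        return true;
    }

    /** Tokenizes the text and renders each token as term/type. */
    static String describe(String text) {
        ClearBeforeSetModel t = new ClearBeforeSetModel(new StringReader(text));
        StringBuilder out = new StringBuilder();
        try {
            while (t.incrementToken()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(t.term).append('/').append(t.type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

In this model, `ClearBeforeSetModel.describe("42 apples")` returns `42/num apples/word`. If the `clearAttributes()` call were removed, the second token would carry over state left behind by the first, which is the kind of leak the NOTE guards against.
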

Nested classes/interfaces inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State

| Modifier and Type | Field | Description |
|---|---|---|
| protected Reader | input | The text source for this Tokenizer. |

| Modifier | Constructor | Description |
|---|---|---|
| protected | Tokenizer() | Deprecated. Use Tokenizer(Reader) instead. |
| protected | Tokenizer(Reader input) | Construct a token stream processing the given input. |
| protected | Tokenizer(AttributeSource source) | Deprecated. Use Tokenizer(AttributeSource, Reader) instead. |
| protected | Tokenizer(AttributeSource.AttributeFactory factory) | Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead. |
| protected | Tokenizer(AttributeSource.AttributeFactory factory, Reader input) | Construct a token stream processing the given input using the given AttributeFactory. |
| protected | Tokenizer(AttributeSource source, Reader input) | Construct a token stream processing the given input using the given AttributeSource. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | By default, closes the input Reader. |
| protected int | correctOffset(int currentOff) | Return the corrected offset. |
| void | reset(Reader input) | Expert: Reset the tokenizer to a new reader. |
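
The three methods summarized above all operate on the `input` Reader. The sketch below models that behavior using only `java.io`, under stated assumptions: `OffsetMappingReader` is a hypothetical stand-in for CharStream, `PrefixStrippedReader` is an invented reader that pretends a fixed number of characters was removed upstream, and `InputLifecycleModel` is a simplified model of the tokenizer side. None of these are Lucene classes, and the real implementation may differ in detail.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for CharStream: a Reader that can map offsets back to the original text. */
abstract class OffsetMappingReader extends FilterReader {
    OffsetMappingReader(Reader in) {
        super(in);
    }

    abstract int correctOffset(int currentOff);
}

/** Hypothetical reader that pretends a fixed-length prefix was stripped before tokenization. */
final class PrefixStrippedReader extends OffsetMappingReader {
    private final int strippedChars;

    PrefixStrippedReader(Reader in, int strippedChars) {
        super(in);
        this.strippedChars = strippedChars;
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + strippedChars; // shift back to the original coordinates
    }
}

/** Library-free model of the input handling described above; NOT a Lucene class. */
final class InputLifecycleModel {
    private Reader input;

    InputLifecycleModel(Reader input) {
        this.input = input;
    }

    /** Delegates to the reader when it can map offsets; otherwise returns the offset unchanged. */
    int correctOffset(int currentOff) {
        return (input instanceof OffsetMappingReader)
                ? ((OffsetMappingReader) input).correctOffset(currentOff)
                : currentOff;
    }

    /** Points the same instance at a new reader so it can be reused. */
    void reset(Reader input) {
        this.input = input;
    }

    /** Closes the current input reader. */
    void close() throws IOException {
        input.close();
    }

    /** Runs the plain and offset-mapping cases and returns both corrected offsets. */
    static String demo() {
        InputLifecycleModel m = new InputLifecycleModel(new StringReader("abc"));
        int plain = m.correctOffset(3); // plain Reader: offset is returned as-is
        m.reset(new PrefixStrippedReader(new StringReader("abc"), 5));
        int mapped = m.correctOffset(3); // offset-mapping reader: delegated
        try {
            m.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return plain + " " + mapped;
    }
}
```

`InputLifecycleModel.demo()` returns `3 8`: the plain reader leaves the offset untouched, while the offset-mapping reader shifts it by the five characters it claims to have stripped.
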

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from class TokenStream: end, incrementToken, reset

protected Reader input

@Deprecated protected Tokenizer()
Deprecated. Use Tokenizer(Reader) instead.

protected Tokenizer(Reader input)

@Deprecated protected Tokenizer(AttributeSource.AttributeFactory factory)
Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead.

protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)

@Deprecated protected Tokenizer(AttributeSource source)
Deprecated. Use Tokenizer(AttributeSource, Reader) instead.

protected Tokenizer(AttributeSource source, Reader input)

public void close() throws IOException
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
Overrides: close in class TokenStream
Throws: IOException

protected final int correctOffset(int currentOff)
If input is a CharStream subclass, this method calls CharStream.correctOffset(int); otherwise it returns currentOff.
Parameters: currentOff - offset as seen in the output
See Also: CharStream.correctOffset(int)

public void reset(Reader input) throws IOException
Throws: IOException

Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.