Package org.apache.lucene.analysis
Class MockTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.MockTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
public class MockTokenizer extends org.apache.lucene.analysis.Tokenizer
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead of one of those tokenizers, to get extra checks. This tokenizer has the following behavior:
- An internal state machine is used to check consumer consistency. These checks can be disabled with setEnableChecks(boolean).
- For convenience, it optionally lowercases the terms it outputs.
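This page does not list the exact transitions the internal state machine enforces. As a rough illustration only, the sketch below models a consumer-workflow checker for the standard TokenStream call order: reset(), then incrementToken() until it returns false, then end(), then close(). The class WorkflowChecker, its State enum, and the rule that end() requires incrementToken() to have returned false are hypothetical choices for this sketch, not Lucene's implementation.

```java
/**
 * Hypothetical model of a consumer-workflow checker, illustrating the call
 * order reset() -> incrementToken()* -> end() -> close(). This is NOT
 * Lucene's MockTokenizer implementation; the states and rules are assumptions.
 */
class WorkflowChecker {
  enum State { CREATED, RESET, INCREMENTING, EXHAUSTED, ENDED, CLOSED }

  private State state = State.CREATED;
  private boolean enableChecks = true;

  /** Mirrors the idea of setEnableChecks(boolean): when false, violations are ignored. */
  void setEnableChecks(boolean enableChecks) {
    this.enableChecks = enableChecks;
  }

  State state() {
    return state;
  }

  void reset() {
    // Only a freshly created stream may be reset in this simplified model.
    check(state == State.CREATED, "reset()");
    state = State.RESET;
  }

  /** Records a call to incrementToken() that returned {@code hasToken}. */
  void incrementToken(boolean hasToken) {
    // Tokens may only be pulled after reset() and before the stream is exhausted.
    check(state == State.RESET || state == State.INCREMENTING, "incrementToken()");
    state = hasToken ? State.INCREMENTING : State.EXHAUSTED;
  }

  void end() {
    // Assumption for this sketch: end() requires the stream to be fully consumed.
    check(state == State.EXHAUSTED, "end()");
    state = State.ENDED;
  }

  void close() {
    check(state == State.ENDED, "close()");
    state = State.CLOSED;
  }

  private void check(boolean ok, String call) {
    if (enableChecks && !ok) {
      throw new IllegalStateException(call + " called in state " + state);
    }
  }

  public static void main(String[] args) {
    WorkflowChecker checker = new WorkflowChecker();
    checker.reset();
    checker.incrementToken(true);   // first token
    checker.incrementToken(false);  // stream exhausted
    checker.end();
    checker.close();
    System.out.println("final state: " + checker.state()); // final state: CLOSED
  }
}
```

In this sketch, turning checks off lets a deliberately unusual consumer (for example, one that calls end() before incrementToken() has returned false) run without an IllegalStateException, which is the kind of situation where the documentation suggests disabling the real checks.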
Field Summary
- static int DEFAULT_MAX_TOKEN_LENGTH
- static int KEYWORD: Acts similar to KeywordTokenizer.
- static int SIMPLE: Acts like LetterTokenizer.
- static int WHITESPACE: Acts similar to WhitespaceTokenizer.
Constructor Summary
- MockTokenizer(Reader input)
- MockTokenizer(Reader input, int pattern, boolean lowerCase)
- MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength)
- MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength)
Method Summary
- void close()
- void end()
- boolean incrementToken()
- protected boolean isTokenChar(int c)
- protected int normalize(int c)
- protected int readCodePoint()
- void reset()
- void reset(Reader input)
- void setEnableChecks(boolean enableChecks): Toggles consumer-workflow checking. If your test consumes token streams normally, leave this enabled.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Field Detail
WHITESPACE
public static final int WHITESPACE
Acts similar to WhitespaceTokenizer.
See Also:
- Constant Field Values

KEYWORD
public static final int KEYWORD
Acts similar to KeywordTokenizer. TODO: KeywordTokenizer returns an "empty" token for an empty reader...
See Also:
- Constant Field Values

SIMPLE
public static final int SIMPLE
Acts like LetterTokenizer.
See Also:
- Constant Field Values

DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
See Also:
- Constant Field Values
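Each pattern constant is described by comparison with an existing tokenizer: WHITESPACE with WhitespaceTokenizer (split on whitespace), SIMPLE with LetterTokenizer (runs of letters), and KEYWORD with KeywordTokenizer (the whole input as one token). The sketch below shows, under those assumptions, how such a pattern could drive a token-character predicate combined with optional lowercasing. PatternSketch, its tokenize method, and the constant values 0, 1, and 2 are hypothetical; the real values are listed under Constant Field Values, and maxTokenLength is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of pattern-driven tokenization. The constant values are
 * placeholders, not necessarily MockTokenizer's actual constant values.
 */
final class PatternSketch {
  static final int WHITESPACE = 0; // hypothetical value
  static final int KEYWORD = 1;    // hypothetical value
  static final int SIMPLE = 2;     // hypothetical value

  /** Decides whether a code point belongs inside a token for the given pattern. */
  static boolean isTokenChar(int pattern, int codePoint) {
    switch (pattern) {
      case WHITESPACE: return !Character.isWhitespace(codePoint); // split on whitespace
      case KEYWORD:    return true;                               // never split
      case SIMPLE:     return Character.isLetter(codePoint);      // keep runs of letters
      default: throw new IllegalArgumentException("unknown pattern: " + pattern);
    }
  }

  /** Splits text into tokens, optionally lowercasing each code point. */
  static List<String> tokenize(String text, int pattern, boolean lowerCase) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      i += Character.charCount(cp);
      if (isTokenChar(pattern, cp)) {
        current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
      } else if (current.length() > 0) {
        tokens.add(current.toString()); // a non-token char ends the current token
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());   // flush the last token
    }
    return tokens;
  }

  public static void main(String[] args) {
    String text = "Hello, World 42";
    System.out.println(tokenize(text, WHITESPACE, true)); // [hello,, world, 42]
    System.out.println(tokenize(text, SIMPLE, true));     // [hello, world]
    System.out.println(tokenize(text, KEYWORD, false));   // [Hello, World 42]
  }
}
```

For an empty string, this sketch returns no tokens under KEYWORD, whereas the TODO on KEYWORD notes that KeywordTokenizer returns an "empty" token for an empty reader.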
Constructor Detail
MockTokenizer
public MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength)

MockTokenizer
public MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength)

MockTokenizer
public MockTokenizer(Reader input, int pattern, boolean lowerCase)

MockTokenizer
public MockTokenizer(Reader input)
Method Detail
incrementToken
public final boolean incrementToken() throws IOException
Specified by:
- incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
- IOException
readCodePoint
protected int readCodePoint() throws IOException
Throws:
- IOException
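readCodePoint() has no description on this page. Its name suggests reading one Unicode code point at a time from the input Reader; a common way to do that is to read a char and, if it is a high surrogate, read the next char and combine the pair. The sketch below assumes that behavior and returns -1 at end of input. CodePointReader and the choice to throw on an unpaired high surrogate are hypothetical, not taken from MockTokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical readCodePoint()-style helper: returns one Unicode code point
 * per call, joining UTF-16 surrogate pairs, or -1 at end of input.
 */
class CodePointReader {
  private final Reader input;

  CodePointReader(Reader input) {
    this.input = input;
  }

  int readCodePoint() throws IOException {
    int ch = input.read();
    if (ch == -1) {
      return -1;                                   // end of input
    }
    if (!Character.isHighSurrogate((char) ch)) {
      return ch;                                   // a BMP character is its own code point
    }
    int low = input.read();
    if (low == -1 || !Character.isLowSurrogate((char) low)) {
      // Assumption for this sketch: treat a broken pair as malformed input.
      throw new IOException("unpaired high surrogate");
    }
    return Character.toCodePoint((char) ch, (char) low);
  }

  public static void main(String[] args) throws IOException {
    // "a", U+1F600 (stored as the surrogate pair D83D DE00), "b"
    CodePointReader reader = new CodePointReader(new StringReader("a\uD83D\uDE00b"));
    for (int cp = reader.readCodePoint(); cp != -1; cp = reader.readCodePoint()) {
      System.out.println(Integer.toHexString(cp)); // 61, 1f600, 62 (one per line)
    }
  }
}
```

Working in code points rather than chars means a predicate such as isTokenChar(int c) or normalize(int c) sees a supplementary character as a single value instead of two surrogate halves.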
isTokenChar
protected boolean isTokenChar(int c)

normalize
protected int normalize(int c)
reset
public void reset() throws IOException
Overrides:
- reset in class org.apache.lucene.analysis.TokenStream
Throws:
- IOException
close
public void close() throws IOException
Specified by:
- close in interface AutoCloseable
Specified by:
- close in interface Closeable
Overrides:
- close in class org.apache.lucene.analysis.Tokenizer
Throws:
- IOException
reset
public void reset(Reader input) throws IOException
Overrides:
- reset in class org.apache.lucene.analysis.Tokenizer
Throws:
- IOException
end
public void end() throws IOException
Overrides:
- end in class org.apache.lucene.analysis.TokenStream
Throws:
- IOException
setEnableChecks
public void setEnableChecks(boolean enableChecks)
Toggles consumer-workflow checking. If your test consumes token streams normally, leave this enabled.