Class ClassicAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.ReusableAnalyzerBase
      - org.apache.lucene.analysis.StopwordAnalyzerBase
        - org.apache.lucene.analysis.standard.ClassicAnalyzer

All Implemented Interfaces:
Closeable, AutoCloseable
public final class ClassicAnalyzer extends StopwordAnalyzerBase
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required Version compatibility when creating ClassicAnalyzer:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords
- As of 2.9, StopFilter preserves position increments
- As of 2.4, Tokens incorrectly identified as acronyms are corrected (see LUCENE-1068)
ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
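To make the filter order concrete, here is a minimal plain-Java model of the chain described above. It is only a sketch and does not use Lucene: whitespace splitting stands in for ClassicTokenizer, the ClassicFilter step is omitted, and the class name `FilterChainSketch`, the `analyze` method, and the three-word stop list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of the documented filter order; NOT the Lucene implementation.
class FilterChainSketch {
    // Hypothetical stand-in for STOP_WORDS_SET (the real set is larger).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "and", "a"));

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        // Assumption: whitespace splitting replaces ClassicTokenizer's grammar.
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields one empty string; skip it
            }
            // LowerCaseFilter step: normalize case before the stop-word check.
            String lower = token.toLowerCase(Locale.ROOT);
            // StopFilter step: drop tokens found in the stop set.
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick and the Dead")); // prints [quick, dead]
    }
}
```

Because lowercasing runs before the stop-word check in this model, "The" is matched against the lowercase entry "the" and removed.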

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
ReusableAnalyzerBase.TokenStreamComponents

Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static int | DEFAULT_MAX_TOKEN_LENGTH | Default maximum allowed token length |
| static Set<?> | STOP_WORDS_SET | An unmodifiable set containing some common English words that are usually not useful for searching. |

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords

Constructor Summary

| Constructor | Description |
| --- | --- |
| ClassicAnalyzer(Version matchVersion) | Builds an analyzer with the default stop words (STOP_WORDS_SET). |
| ClassicAnalyzer(Version matchVersion, File stopwords) | Deprecated. Use ClassicAnalyzer(Version, Reader) instead. |
| ClassicAnalyzer(Version matchVersion, Reader stopwords) | Builds an analyzer with the stop words from the given reader. |
| ClassicAnalyzer(Version matchVersion, Set<?> stopWords) | Builds an analyzer with the given stop words. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected ReusableAnalyzerBase.TokenStreamComponents | createComponents(String fieldName, Reader reader) | Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer. |
| int | getMaxTokenLength() | |
| void | setMaxTokenLength(int length) | Set maximum allowed token length. |

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream

Field Detail

DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
- See Also: Constant Field Values

STOP_WORDS_SET
public static final Set<?> STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.

Constructor Detail

ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion, Set<?> stopWords)
Builds an analyzer with the given stop words.
- Parameters:
  - matchVersion - Lucene version to match; see the Version compatibility notes above
  - stopWords - stop words

ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (STOP_WORDS_SET).
- Parameters:
  - matchVersion - Lucene version to match; see the Version compatibility notes above

ClassicAnalyzer
@Deprecated public ClassicAnalyzer(Version matchVersion, File stopwords) throws IOException
Deprecated. Use ClassicAnalyzer(Version, Reader) instead.
Builds an analyzer with the stop words from the given file.
- Parameters:
  - matchVersion - Lucene version to match; see the Version compatibility notes above
  - stopwords - File to read stop words from
- Throws: IOException
- See Also: WordlistLoader.getWordSet(Reader, Version)

ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion, Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
  - matchVersion - Lucene version to match; see the Version compatibility notes above
  - stopwords - Reader to read stop words from
- Throws: IOException
- See Also: WordlistLoader.getWordSet(Reader, Version)
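
The Reader-based constructor delegates stop word loading to WordlistLoader. As a rough illustration of what such a reader might supply, the plain-Java sketch below assumes one stop word per line with surrounding whitespace trimmed. It does not call Lucene; the class name `StopwordReaderSketch`, the method `readStopwords`, and the choice to skip blank lines are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch of reading a stop word list; NOT the WordlistLoader source.
class StopwordReaderSketch {
    // Assumed format: one stop word per line, leading/trailing whitespace ignored.
    static Set<String> readStopwords(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // assumption: blank lines are skipped
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch easy to call.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readStopwords(new StringReader("the\n  and \n\nof\n"));
        System.out.println(words.size()); // prints 3
    }
}
```

With the real API, a Reader over the same kind of text would be passed directly as the `stopwords` argument, and the constructor's declared IOException would need to be handled by the caller.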

Method Detail

setMaxTokenLength
public void setMaxTokenLength(int length)
Set maximum allowed token length. If a token is seen that exceeds this length then it is discarded. This setting only takes effect the next time tokenStream or reusableTokenStream is called.
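
The key point above is that over-long tokens are dropped entirely rather than truncated. The plain-Java sketch below models only that rule. It does not use Lucene, and it does not model the timing rule about tokenStream or reusableTokenStream; the class name `MaxTokenLengthSketch` and the method `discardLong` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the "discard tokens that exceed the limit" rule.
class MaxTokenLengthSketch {
    static List<String> discardLong(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // "Exceeds" is read as strictly longer, so a token whose length
            // equals the limit is kept; longer tokens are dropped whole.
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("id", "supercalifragilistic", "token");
        System.out.println(discardLong(tokens, 5)); // prints [id, token]
    }
}
```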

getMaxTokenLength
public int getMaxTokenLength()
- See Also: setMaxTokenLength(int)

createComponents
protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
Description copied from class: ReusableAnalyzerBase
Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
- Specified by: createComponents in class ReusableAnalyzerBase
- Parameters:
  - fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
  - reader - the reader passed to the Tokenizer constructor
- Returns: the ReusableAnalyzerBase.TokenStreamComponents for this analyzer.