Package org.apache.lucene.analysis
Class WordlistLoader
java.lang.Object
    org.apache.lucene.analysis.WordlistLoader
public class WordlistLoader extends Object
Loader for text files that represent a list of stopwords.
See Also:
  to obtain instances
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
Constructor Summary
  WordlistLoader()
Method Summary
static CharArraySet getSnowballWordSet(Reader reader, CharArraySet result)
    Reads stopwords from a stopword list in Snowball format.
static CharArraySet getSnowballWordSet(Reader reader, Version matchVersion)
    Reads stopwords from a stopword list in Snowball format.
static CharArrayMap<String> getStemDict(Reader reader, CharArrayMap<String> result)
    Reads a stem dictionary.
static CharArraySet getWordSet(Reader reader, String comment, CharArraySet result)
    Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace).
static CharArraySet getWordSet(Reader reader, String comment, Version matchVersion)
    Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace).
static CharArraySet getWordSet(Reader reader, CharArraySet result)
    Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace).
static CharArraySet getWordSet(Reader reader, Version matchVersion)
    Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace).
Method Detail
getWordSet
public static CharArraySet getWordSet(Reader reader, CharArraySet result) throws IOException
Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Each line of the Reader should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Parameters:
  reader - Reader containing the wordlist
  result - the CharArraySet to fill with the reader's words
Returns:
  the given CharArraySet with the reader's words
Throws:
  IOException
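The one-word-per-line format can be illustrated with a minimal, stdlib-only sketch. It is not Lucene's implementation: `PlainWordlistSketch` and `readWordSet` are hypothetical names, and a plain `java.util.Set<String>` stands in for `CharArraySet`. It mirrors the documented behavior of adding every line, trimmed, to the caller's set and returning that same set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Set;

/** Hypothetical stdlib-only sketch of the one-word-per-line format (a Set stands in for CharArraySet). */
final class PlainWordlistSketch {
  private PlainWordlistSketch() {}

  /** Adds every line of the reader, trimmed, to result and returns the same set. */
  static Set<String> readWordSet(Reader reader, Set<String> result) throws IOException {
    // Closing the BufferedReader also closes the wrapped reader.
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        // Blank lines are not skipped here, so a blank line adds an empty string.
        result.add(line.trim());
      }
    }
    return result;
  }
}
```

For example, reading `"the\n  and \nof\n"` into an empty `HashSet` yields a set containing `the`, `and` and `of`.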
getWordSet
public static CharArraySet getWordSet(Reader reader, Version matchVersion) throws IOException
Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Each line of the Reader should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Parameters:
  reader - Reader containing the wordlist
  matchVersion - the Lucene Version
Returns:
  a new CharArraySet with the reader's words
Throws:
  IOException
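The lowercase requirement follows from ordering: a lowercasing filter rewrites each token before stop-word removal sees it, so a mixed-case list entry such as `The` never matches. The hypothetical `LowercaseStopSketch.analyze` below models that two-step chain with plain Java collections. It assumes lowercasing runs before the stop check and a case-sensitive stop set; it does not use Lucene's `LowerCaseFilter` or `StopFilter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of a "lowercase, then remove stop words" token chain. */
final class LowercaseStopSketch {
  private LowercaseStopSketch() {}

  /** Lowercases each token first, then drops tokens that appear in stopWords. */
  static List<String> analyze(List<String> tokens, Set<String> stopWords) {
    List<String> out = new ArrayList<>();
    for (String token : tokens) {
      String lower = token.toLowerCase(Locale.ROOT); // lowercasing happens before the stop check
      if (!stopWords.contains(lower)) { // case-sensitive lookup
        out.add(lower);
      }
    }
    return out;
  }
}
```

With the stop set `{the}`, the tokens `The Cat` become `[cat]`; with `{The}`, nothing is removed and the result is `[the, cat]`.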
getWordSet
public static CharArraySet getWordSet(Reader reader, String comment, Version matchVersion) throws IOException
Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Each line of the Reader should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Parameters:
  reader - Reader containing the wordlist
  comment - the string representing a comment
  matchVersion - the Lucene Version
Returns:
  a new CharArraySet with the reader's words
Throws:
  IOException
getWordSet
public static CharArraySet getWordSet(Reader reader, String comment, CharArraySet result) throws IOException
Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Each line of the Reader should contain only one word. The words must be lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Parameters:
  reader - Reader containing the wordlist
  comment - the string representing a comment
  result - the CharArraySet to fill with the reader's words
Returns:
  the given CharArraySet with the reader's words
Throws:
  IOException
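Comment filtering can be sketched the same way. This sketch assumes a line is a comment when its raw text begins with the `comment` string (for example `#`); the text above does not define the rule precisely, so treat it as an illustration. `CommentWordlistSketch` is a hypothetical name and `Set<String>` again stands in for `CharArraySet`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Set;

/** Hypothetical stdlib-only sketch of a word list with comment lines. */
final class CommentWordlistSketch {
  private CommentWordlistSketch() {}

  /** Adds every non-comment line, trimmed, to result and returns the same set. */
  static Set<String> readWordSet(Reader reader, String comment, Set<String> result) throws IOException {
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        // Assumption: a comment line is one whose raw text begins with the comment string,
        // so an indented marker is not treated as a comment.
        if (!line.startsWith(comment)) {
          result.add(line.trim());
        }
      }
    }
    return result;
  }
}
```

With `comment = "#"`, the input `"#English stop words\nthe\n a \n#end\n"` yields `the` and `a`.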
getSnowballWordSet
public static CharArraySet getSnowballWordSet(Reader reader, CharArraySet result) throws IOException
Reads stopwords from a stopword list in Snowball format. The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Parameters:
  reader - Reader containing a Snowball stopword list
  result - the CharArraySet to fill with the reader's words
Returns:
  the given CharArraySet with the reader's words
Throws:
  IOException
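The three Snowball rules translate into a short loop: cut each line at the first `|`, split what remains on runs of whitespace, and keep the non-empty pieces. The sketch below is a stdlib-only illustration with hypothetical names (`SnowballWordlistSketch`, `readSnowballWordSet`), not Lucene's code, and a `Set<String>` stands in for `CharArraySet`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Set;

/** Hypothetical stdlib-only sketch of the Snowball stop-word format. */
final class SnowballWordlistSketch {
  private SnowballWordlistSketch() {}

  /** Adds every word outside '|' comments to result and returns the same set. */
  static Set<String> readSnowballWordSet(Reader reader, Set<String> result) throws IOException {
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        int bar = line.indexOf('|');
        if (bar >= 0) {
          line = line.substring(0, bar); // drop a whole-line or trailing comment
        }
        for (String word : line.split("\\s+")) {
          if (!word.isEmpty()) { // split yields "" for an empty line or leading whitespace
            result.add(word);
          }
        }
      }
    }
    return result;
  }
}
```

For the input `"| French stop words\nde  la   | articles\n  le les\n\n"`, the result contains `de`, `la`, `le` and `les`.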
getSnowballWordSet
public static CharArraySet getSnowballWordSet(Reader reader, Version matchVersion) throws IOException
Reads stopwords from a stopword list in Snowball format. The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Parameters:
  reader - Reader containing a Snowball stopword list
  matchVersion - the Lucene Version
Returns:
  a new CharArraySet with the reader's words
Throws:
  IOException
getStemDict
public static CharArrayMap<String> getStemDict(Reader reader, CharArrayMap<String> result) throws IOException
Reads a stem dictionary. Each line contains:
  word\tstem
(i.e. two words separated by a tab).
Returns:
  stem dictionary that overrules the stemming algorithm
Throws:
  IOException
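A stem dictionary in the `word<TAB>stem` format can be read into an ordinary `Map<String, String>`, which here stands in for `CharArrayMap<String>`. Splitting with a limit of 2 keeps everything after the first tab as the stem. `StemDictSketch` is a hypothetical name, and rejecting lines without a tab is an assumption of this sketch; the documentation above does not say how malformed lines are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;

/** Hypothetical stdlib-only sketch of the "word TAB stem" dictionary format. */
final class StemDictSketch {
  private StemDictSketch() {}

  /** Puts one word-to-stem entry per line into result and returns the same map. */
  static Map<String, String> readStemDict(Reader reader, Map<String, String> result) throws IOException {
    try (BufferedReader br = new BufferedReader(reader)) {
      String line;
      while ((line = br.readLine()) != null) {
        String[] wordAndStem = line.split("\t", 2); // limit 2: everything after the first tab is the stem
        if (wordAndStem.length != 2) {
          // Assumption: lines without a tab are rejected; the original text does not specify this.
          throw new IOException("expected word<TAB>stem but got: " + line);
        }
        result.put(wordAndStem[0], wordAndStem[1]);
      }
    }
    return result;
  }
}
```

Reading `"running\trun\nmice\tmouse\n"` maps `running` to `run` and `mice` to `mouse`.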