Uses of Class
org.apache.lucene.analysis.Tokenizer
Packages that use Tokenizer:

- org.apache.lucene.analysis: API and code to convert text into indexable/searchable tokens.
- org.apache.lucene.analysis.ar: Analyzer for Arabic.
- org.apache.lucene.analysis.cjk: Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
- org.apache.lucene.analysis.cn: Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
- org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
- org.apache.lucene.analysis.icu.segmentation: Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
- org.apache.lucene.analysis.in: Analysis components for Indian languages.
- org.apache.lucene.analysis.ja: Analyzer for Japanese.
- org.apache.lucene.analysis.ngram: Character n-gram tokenizers and filters.
- org.apache.lucene.analysis.path: Analysis components for path-like strings such as filenames.
- org.apache.lucene.analysis.ru: Analyzer for Russian.
- org.apache.lucene.analysis.standard: Standards-based analyzers implemented with JFlex.
- org.apache.lucene.analysis.wikipedia: Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis
Subclasses of Tokenizer in org.apache.lucene.analysis:

- CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
- EmptyTokenizer: Emits no tokens.
- KeywordTokenizer: Emits the entire input as a single token.
- LetterTokenizer: A tokenizer that divides text at non-letters.
- LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together.
- MockTokenizer: Tokenizer for testing.
- WhitespaceTokenizer: A tokenizer that divides text at whitespace.

Fields in org.apache.lucene.analysis declared as Tokenizer:

- protected Tokenizer ReusableAnalyzerBase.TokenStreamComponents.source

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer:

- TokenStreamComponents(Tokenizer source): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
- TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
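The character-oriented splitting that LetterTokenizer and WhitespaceTokenizer describe can be previewed without Lucene on the classpath. The sketch below is plain Java, not the Lucene API; the class and method names (CharSplitSketch, splitAtNonLetters) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplitSketch {
    // LetterTokenizer-style behavior: emit maximal runs of letters,
    // discarding everything else (digits, punctuation, whitespace).
    static List<String> splitAtNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // "it's 2024, really" splits into the letter runs: it, s, really.
        System.out.println(splitAtNonLetters("it's 2024, really"));
        // LowerCaseTokenizer behavior would be the same split with each
        // token lower-cased; WhitespaceTokenizer would test for whitespace
        // instead of non-letters.
    }
}
```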
Uses of Tokenizer in org.apache.lucene.analysis.ar
Subclasses of Tokenizer in org.apache.lucene.analysis.ar:

- ArabicLetterTokenizer: Deprecated (3.1). Use StandardTokenizer instead.
Uses of Tokenizer in org.apache.lucene.analysis.cjk
Subclasses of Tokenizer in org.apache.lucene.analysis.cjk:

- CJKTokenizer: Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
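The overlapping-bigram indexing that the cjk package describes (groups of two adjacent Han characters) reduces to a sliding window of width two. A minimal plain-Java sketch, not the Lucene API; BigramSketch is an illustrative name:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // CJK-bigram-style behavior: emit every overlapping pair of
    // adjacent characters in the input.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four Han characters yield three overlapping bigrams.
        System.out.println(bigrams("一二三四")); // [一二, 二三, 三四]
    }
}
```

The real filter additionally restricts bigramming to Han/Hiragana/Katakana/Hangul runs; this sketch bigrams everything to keep the window logic visible.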
Uses of Tokenizer in org.apache.lucene.analysis.cn
Subclasses of Tokenizer in org.apache.lucene.analysis.cn:

- ChineseTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality.
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart:

- SentenceTokenizer: Tokenizes input text into sentences.
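Sentence-level splitting of this kind can be previewed with the JDK's own sentence-boundary iterator; this is a behavioral sketch using java.text.BreakIterator, not the Lucene SentenceTokenizer itself, and SentenceSplitSketch is an illustrative name:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitSketch {
    // Split text at the sentence boundaries the JDK locates,
    // trimming the trailing space each segment carries.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sentences("One sentence. Another one."));
    }
}
```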
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation:

- ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
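The word-boundary analysis ICUTokenizer applies can be previewed with the JDK's java.text.BreakIterator, whose word instance follows Unicode word-boundary rules. This is a rough stand-in, not the ICU tokenizer: Uax29Sketch and the letter-or-digit filter are assumptions of this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Uax29Sketch {
    // Walk the word boundaries the JDK reports and keep only segments
    // that contain a letter or digit (dropping spaces and punctuation),
    // which is roughly what a word tokenizer emits.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String candidate = text.substring(start, end);
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world 42"));
    }
}
```

ICUTokenizer goes further than this sketch, e.g. dictionary-based segmentation for scripts written without spaces.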
Uses of Tokenizer in org.apache.lucene.analysis.in
Subclasses of Tokenizer in org.apache.lucene.analysis.in:

- IndicTokenizer: Deprecated (3.6). Use StandardTokenizer instead.
Uses of Tokenizer in org.apache.lucene.analysis.ja
Subclasses of Tokenizer in org.apache.lucene.analysis.ja:

- JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram
Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

- EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s).
- NGramTokenizer: Tokenizes the input into n-grams of the given size(s).
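Both n-gram behaviors above reduce to substring windows. A minimal plain-Java sketch of the two (not the Lucene classes; NGramSketch and its method names are illustrative, and real output order can differ by Lucene version):

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // NGramTokenizer-style behavior: every substring whose length is
    // between minGram and maxGram, at every position.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    // EdgeNGramTokenizer-style behavior (front edge): only the prefixes
    // of length minGram..maxGram.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            out.add(text.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 2, 2));      // [ab, bc]
        System.out.println(edgeNgrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```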
Uses of Tokenizer in org.apache.lucene.analysis.path
Subclasses of Tokenizer in org.apache.lucene.analysis.path:

- PathHierarchyTokenizer: Tokenizer for path-like hierarchies.
- ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies.
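A path-hierarchy tokenizer emits each ancestor prefix of the input path, so a filter on any ancestor matches all its descendants. A minimal plain-Java sketch of that behavior (not the Lucene class; PathHierarchySketch is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;

public class PathHierarchySketch {
    // PathHierarchyTokenizer-style behavior: emit the prefix ending just
    // before each delimiter, then the full path.
    static List<String> pathTokens(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                out.add(path.substring(0, i));
            }
        }
        out.add(path);
        return out;
    }

    public static void main(String[] args) {
        // "/usr/share/doc" yields /usr, /usr/share, /usr/share/doc.
        System.out.println(pathTokens("/usr/share/doc", '/'));
    }
}
```

ReversePathHierarchyTokenizer does the mirror image, emitting suffixes, which suits domain names like www.example.com.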
Uses of Tokenizer in org.apache.lucene.analysis.ru
Subclasses of Tokenizer in org.apache.lucene.analysis.ru:

- RussianLetterTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality.
Uses of Tokenizer in org.apache.lucene.analysis.standard
Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

- ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
- UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia:

- WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax.