Package org.apache.lucene.misc
Class HighFreqTerms
- java.lang.Object
  - org.apache.lucene.misc.HighFreqTerms
public class HighFreqTerms extends Object
The HighFreqTerms class extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency. If used with the -t flag, it also reports their total tf (total number of occurrences), sorted by highest total tf first.
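Document frequency and total tf can rank the same terms differently, which is why the -t option changes the ordering. The minimal Java sketch below illustrates the distinction on a toy in-memory corpus. It does not use Lucene at all; the class DocFreqVsTotalTf and its methods are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Lucene's API: compares document frequency
// (number of documents containing a term) with total tf (number of occurrences).
class DocFreqVsTotalTf {

    // Document frequency: each term counts at most once per document.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Total tf: every occurrence counts, including repeats within a document.
    static Map<String, Long> totalTermFreqs(List<List<String>> docs) {
        Map<String, Long> ttf = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                ttf.merge(term, 1L, Long::sum);
            }
        }
        return ttf;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("lucene", "index", "lucene", "lucene"),
                List.of("index", "search"),
                List.of("index", "lucene"));
        Map<String, Integer> df = docFreqs(docs);
        Map<String, Long> ttf = totalTermFreqs(docs);
        for (String term : df.keySet()) {
            System.out.println(term + " df=" + df.get(term) + " totalTf=" + ttf.get(term));
        }
    }
}
```

With this corpus, "index" has the highest document frequency (3), while "lucene" has the highest total tf (4: three occurrences in the first document and one in the third).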
Field Summary
Fields (Modifier and Type, Field, Description)
- static int DEFAULTnumTerms
- static int numTerms
Constructor Summary
Constructors (Constructor, Description)
- HighFreqTerms()
Method Summary
Methods (Modifier and Type, Method, Description)
- static org.apache.lucene.misc.TermStats[] getHighFreqTerms(IndexReader reader, int numTerms, String field)
- static long getTotalTermFreq(IndexReader reader, Term term)
- static void main(String[] args)
- static org.apache.lucene.misc.TermStats[] sortByTotalTermFreq(IndexReader reader, org.apache.lucene.misc.TermStats[] terms): Takes an array of TermStats.
Field Detail
DEFAULTnumTerms
public static final int DEFAULTnumTerms
- See Also:
- Constant Field Values
numTerms
public static int numTerms
Method Detail
getHighFreqTerms
public static org.apache.lucene.misc.TermStats[] getHighFreqTerms(IndexReader reader, int numTerms, String field) throws Exception
- Parameters:
reader
numTerms
field
- Returns:
- TermStats[] ordered by terms with highest docFreq first.
- Throws:
Exception
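Selecting the top n terms by document frequency is a bounded top-k problem. A common way to solve it is a min-heap capped at n entries, sketched below in plain Java. This is an illustration under stated assumptions, not the internal code of HighFreqTerms: term statistics come from a hypothetical term -> docFreq map instead of an IndexReader, and TopDocFreqSketch and its nested Stat class are made-up names (Stat is not org.apache.lucene.misc.TermStats).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not HighFreqTerms' internal code: picks the numTerms
// terms with the highest document frequency from a term -> docFreq map.
class TopDocFreqSketch {

    // Illustrative (term, docFreq) pair; not org.apache.lucene.misc.TermStats.
    static final class Stat {
        final String term;
        final int docFreq;

        Stat(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    static List<Stat> topByDocFreq(Map<String, Integer> docFreqs, int numTerms) {
        // Min-heap on docFreq: the root is the weakest entry kept so far.
        PriorityQueue<Stat> heap =
                new PriorityQueue<>(Comparator.comparingInt((Stat s) -> s.docFreq));
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(new Stat(e.getKey(), e.getValue()));
            if (heap.size() > numTerms) {
                heap.poll(); // evict the entry with the smallest docFreq
            }
        }
        // poll() yields ascending docFreq; reverse so the highest comes first.
        List<Stat> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("index", 3, "lucene", 2, "search", 1, "term", 5);
        for (Stat s : topByDocFreq(df, 2)) {
            System.out.println(s.term + "\t" + s.docFreq);
        }
    }
}
```

Each term costs O(log n) heap work, so one pass over T distinct terms is O(T log n), and memory stays proportional to n rather than T. With the sample map, main prints "term" (5) followed by "index" (3).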
sortByTotalTermFreq
public static org.apache.lucene.misc.TermStats[] sortByTotalTermFreq(IndexReader reader, org.apache.lucene.misc.TermStats[] terms) throws Exception
Takes an array of TermStats. For each term, it looks up the tf in each document that contains the term and stores the total in the output array of TermStats. The output array is sorted by highest total tf first.
- Parameters:
reader
terms - TermStats[]
- Returns:
- TermStats[]
- Throws:
Exception
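The behavior described for sortByTotalTermFreq (sum each term's per-document tf, then order by that total) can be sketched as follows. The postings map (term -> docId -> tf) is a hypothetical stand-in for reading term frequencies from an IndexReader, and SortByTotalTfSketch and its Stat class are illustrative names, not Lucene types. This version builds new Stat objects for the output array instead of mutating the input; the description above does not say which approach HighFreqTerms takes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical sketch, not Lucene code. postings maps term -> (docId -> tf)
// and stands in for the per-document term frequencies stored in an index.
class SortByTotalTfSketch {

    // Illustrative stand-in for TermStats; not the Lucene class.
    static final class Stat {
        final String term;
        final int docFreq;
        long totalTermFreq;

        Stat(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    static Stat[] sortByTotalTermFreq(Map<String, Map<Integer, Integer>> postings, Stat[] terms) {
        Stat[] out = new Stat[terms.length];
        for (int i = 0; i < terms.length; i++) {
            Stat s = new Stat(terms[i].term, terms[i].docFreq);
            // Sum the tf of every document that contains the term.
            for (int tf : postings.getOrDefault(s.term, Map.of()).values()) {
                s.totalTermFreq += tf;
            }
            out[i] = s;
        }
        // Highest total tf first.
        Arrays.sort(out, Comparator.comparingLong((Stat s) -> s.totalTermFreq).reversed());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = Map.of(
                "index", Map.of(0, 1, 1, 1, 2, 1),
                "lucene", Map.of(0, 3, 2, 1));
        // Input ordered by docFreq, as a top-n-by-docFreq pass would return it.
        Stat[] byDocFreq = { new Stat("index", 3), new Stat("lucene", 2) };
        for (Stat s : sortByTotalTermFreq(postings, byDocFreq)) {
            System.out.println(s.term + "\tdocFreq=" + s.docFreq + "\ttotalTf=" + s.totalTermFreq);
        }
    }
}
```

Here the input is ordered "index" (docFreq 3), "lucene" (docFreq 2), and the output is reordered to "lucene" (total tf 4), "index" (total tf 3).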
getTotalTermFreq
public static long getTotalTermFreq(IndexReader reader, Term term) throws Exception
- Throws:
Exception
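getTotalTermFreq has no description here. Judging from its name, its long return type, and the description of sortByTotalTermFreq, it presumably returns the sum of a term's tf over all documents that contain it. The sketch below shows that accumulation over a hypothetical PostingsCursor interface (not a Lucene type); TotalTermFreqSketch and ArrayPostings are likewise illustrative names.

```java
// Hypothetical sketch: PostingsCursor is an illustrative interface, not a
// Lucene type. It walks the documents that contain one term.
class TotalTermFreqSketch {

    interface PostingsCursor {
        boolean next(); // advance to the next document containing the term
        int freq();     // tf of the term in the current document
    }

    // Array-backed cursor for demonstration: one tf value per matching document.
    static final class ArrayPostings implements PostingsCursor {
        private final int[] freqs;
        private int pos = -1;

        ArrayPostings(int... freqs) {
            this.freqs = freqs;
        }

        public boolean next() {
            return ++pos < freqs.length;
        }

        public int freq() {
            return freqs[pos];
        }
    }

    // Sum of tf over all documents containing the term. A long accumulator
    // avoids the overflow an int sum would hit past Integer.MAX_VALUE.
    static long totalTermFreq(PostingsCursor postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalTermFreq(new ArrayPostings(3, 1))); // prints 4
    }
}
```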