Class DirectoryTaxonomyReader
- java.lang.Object
  - org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader
- All Implemented Interfaces:
Closeable, AutoCloseable, TaxonomyReader
public class DirectoryTaxonomyReader extends Object implements TaxonomyReader
A TaxonomyReader which retrieves stored taxonomy information from a Directory.
Reading from the on-disk index on every method call is too slow, so this implementation employs caching: some methods cache recent requests and their results, while other methods prefetch all the data into memory and then provide answers directly from in-memory tables. See the documentation of individual methods for comments on their performance.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader
TaxonomyReader.ChildrenArrays
-
-
Field Summary
-
Fields inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader
INVALID_ORDINAL, ROOT_ORDINAL
-
-
Constructor Summary
Constructor | Description
DirectoryTaxonomyReader(org.apache.lucene.store.Directory directory) | Open for reading a taxonomy stored in a given Directory.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
void | decRef() | Expert: decreases the refCount of this TaxonomyReader instance.
protected void | ensureOpen() |
TaxonomyReader.ChildrenArrays | getChildrenArrays() | getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object which can be used together to efficiently enumerate the children of any category.
Map<String,String> | getCommitUserData() | Retrieve user committed data.
int | getOrdinal(CategoryPath categoryPath) | getOrdinal() returns the ordinal of the category given as a path.
int | getParent(int ordinal) | getParent() returns the ordinal of the parent category of the category with the given ordinal.
int[] | getParentArray() | getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy.
CategoryPath | getPath(int ordinal) | getPath() returns the path name of the category with the given ordinal.
boolean | getPath(int ordinal, CategoryPath result) | getPath() returns the path name of the category with the given ordinal.
int | getRefCount() | Expert: returns the current refCount for this taxonomy reader.
int | getSize() | getSize() returns the number of categories in the taxonomy.
void | incRef() | Expert: increments the refCount of this TaxonomyReader instance.
protected org.apache.lucene.index.IndexReader | openIndexReader(org.apache.lucene.store.Directory directory) |
boolean | refresh() | refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed.
void | setCacheSize(int size) | setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath).
void | setDelimiter(char delimiter) | setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components.
String | toString(int max) |
-
-
-
Constructor Detail
-
DirectoryTaxonomyReader
public DirectoryTaxonomyReader(org.apache.lucene.store.Directory directory) throws IOException
Open for reading a taxonomy stored in a given Directory.
- Parameters:
directory - The Directory in which the taxonomy lives. Note that the taxonomy is read directly from that directory (not from a subdirectory of it).
- Throws:
org.apache.lucene.index.CorruptIndexException - if the Taxonomy is corrupted.
IOException - if another error occurred.
-
-
Method Detail
-
openIndexReader
protected org.apache.lucene.index.IndexReader openIndexReader(org.apache.lucene.store.Directory directory) throws org.apache.lucene.index.CorruptIndexException, IOException
- Throws:
org.apache.lucene.index.CorruptIndexException
IOException
-
ensureOpen
protected final void ensureOpen() throws org.apache.lucene.store.AlreadyClosedException
- Throws:
org.apache.lucene.store.AlreadyClosedException - if this IndexReader is closed
-
setCacheSize
public void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath).
Currently, if the given size is smaller than the current size of a cache, the cache will not shrink; instead, it will be limited to its current size.
- Parameters:
size - the new maximum cache size, in number of entries.
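The size-limiting behavior described above can be illustrated with a small stand-alone cache. This is a minimal sketch, not the library's implementation: it assumes a least-recently-used eviction policy (the text only says that recent requests are cached), and the class name `BoundedCacheSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one of the reader's caches; not thread-safe.
class BoundedCacheSketch<K, V> {
    private int maxSize;
    private final LinkedHashMap<K, V> map;

    BoundedCacheSketch(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > BoundedCacheSketch.this.maxSize;
            }
        };
    }

    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }

    int maxSize() { return maxSize; }

    // Mirrors the documented behavior: a smaller limit never shrinks the
    // cache; the limit is clamped to the number of entries currently held.
    void setMaxSize(int newMax) {
        maxSize = Math.max(newMax, map.size());
    }
}
```

With two entries cached, calling `setMaxSize(1)` in this sketch leaves both entries in place and caps the cache at two.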
-
setDelimiter
public void setDelimiter(char delimiter)
setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components. Do not use this method unless you really know what you are doing.
If you do use this method, make sure you call it before any other method that actually queries the taxonomy. Moreover, make sure you always pass the same delimiter to all LuceneTaxonomyWriter and LuceneTaxonomyReader objects you create.
-
getOrdinal
public int getOrdinal(CategoryPath categoryPath) throws IOException
Description copied from interface: TaxonomyReader
getOrdinal() returns the ordinal of the category given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).
If the given category wasn't found in the taxonomy, INVALID_ORDINAL is returned.
- Specified by:
getOrdinal in interface TaxonomyReader
- Throws:
IOException
-
getPath
public CategoryPath getPath(int ordinal) throws org.apache.lucene.index.CorruptIndexException, IOException
Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal. The path is returned as a new CategoryPath object - to reuse an existing object, use TaxonomyReader.getPath(int, CategoryPath).
A null is returned if a category with the given ordinal does not exist.
- Specified by:
getPath in interface TaxonomyReader
- Throws:
org.apache.lucene.index.CorruptIndexException
IOException
-
getPath
public boolean getPath(int ordinal, CategoryPath result) throws org.apache.lucene.index.CorruptIndexException, IOException
Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal. The path is written to the given CategoryPath object (which is cleared first).
If a category with the given ordinal does not exist, the given CategoryPath object is not modified, and the method returns false. Otherwise, the method returns true.
- Specified by:
getPath in interface TaxonomyReader
- Throws:
org.apache.lucene.index.CorruptIndexException
IOException
-
getParent
public int getParent(int ordinal)
Description copied from interface: TaxonomyReader
getParent() returns the ordinal of the parent category of the category with the given ordinal.
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back. However, implementations are expected to provide a much more efficient implementation:
getParent() should be a very quick method, as it is used during the facet aggregation process in faceted search. Implementations will most likely want to serve replies to this method from a pre-filled cache.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
- Specified by:
getParent in interface TaxonomyReader
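The equivalence described above, between the slow path-based route and a lookup in a pre-filled table, can be checked on a toy taxonomy. This is a minimal sketch under stated assumptions: paths are modeled as plain string lists rather than CategoryPath objects, the class `ParentLookupSketch` and its methods are hypothetical, and INVALID_ORDINAL is assumed to be -1.

```java
import java.util.Arrays;
import java.util.List;

// Toy taxonomy held entirely in memory; not the library's data structures.
class ParentLookupSketch {
    static final int ROOT_ORDINAL = 0;
    static final int INVALID_ORDINAL = -1; // assumed value

    // Paths indexed by ordinal; the root is the empty path.
    static final List<List<String>> PATHS = Arrays.asList(
            Arrays.<String>asList(),               // 0: root
            Arrays.asList("Author"),               // 1
            Arrays.asList("Author", "Mark Twain"), // 2
            Arrays.asList("Date"),                 // 3
            Arrays.asList("Date", "2010"));        // 4

    // Pre-filled parent table: PARENTS[i] is the parent ordinal of category i.
    static final int[] PARENTS = {INVALID_ORDINAL, 0, 1, 0, 3};

    // Slow route: ordinal -> path, drop the last component, path -> ordinal.
    static int slowParent(int ordinal) {
        List<String> path = PATHS.get(ordinal);
        if (path.isEmpty()) {
            return INVALID_ORDINAL; // the root has no parent
        }
        return PATHS.indexOf(path.subList(0, path.size() - 1));
    }

    // Fast route: a single array read. An out-of-range ordinal throws
    // ArrayIndexOutOfBoundsException, as the text describes.
    static int fastParent(int ordinal) {
        return PARENTS[ordinal];
    }
}
```

Both routes return the same ordinal for every category in this toy taxonomy; only the array read is cheap enough to call once per document during aggregation.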
-
getParentArray
public int[] getParentArray()
getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy.
The caller can hold on to the array it got indefinitely - it is guaranteed that no-one else will modify it. The other side of the same coin is that the caller must treat the array it got as read-only and not modify it, because other callers might have gotten the same array too, and getParent() calls are also answered from the same array.
The getParentArray() call is extremely efficient, merely returning a reference to an array that already exists. For a caller that plans to call getParent() for many categories, using getParentArray() and the array it returns is a somewhat faster approach because it avoids the overhead of method calls and volatile dereferencing.
If you use getParentArray() instead of getParent(), remember that the array you got is (naturally) not modified after a refresh(), so you should always call getParentArray() again after a refresh().
- Specified by:
getParentArray in interface TaxonomyReader
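A caller that needs many parent lookups can work directly on the returned array, for example to walk from a category up to the root. The following is a minimal sketch that takes the parent array as a plain `int[]` argument; the class `ParentArrayWalkSketch` and its methods are hypothetical, and INVALID_ORDINAL is assumed to be -1.

```java
// Helpers that treat the parent array as read-only, as the text requires.
class ParentArrayWalkSketch {
    static final int INVALID_ORDINAL = -1; // assumed value

    // Number of ancestors between the category and the root, inclusive of the root.
    static int depth(int[] parents, int ordinal) {
        int depth = 0;
        for (int p = parents[ordinal]; p != INVALID_ORDINAL; p = parents[p]) {
            depth++;
        }
        return depth;
    }

    // True if 'ancestor' appears on the chain of parents above 'ordinal'.
    static boolean isDescendantOf(int[] parents, int ordinal, int ancestor) {
        for (int p = parents[ordinal]; p != INVALID_ORDINAL; p = parents[p]) {
            if (p == ancestor) {
                return true;
            }
        }
        return false;
    }
}
```

Because the array is a snapshot, a caller using helpers like these would fetch the array again after each refresh() rather than keep passing the old reference.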
-
refresh
public boolean refresh() throws IOException, InconsistentTaxonomyException
Description copied from interface: TaxonomyReader
refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed. Calling refresh() is more efficient than close()ing the old instance and opening a new one.
If there were no changes since this instance was opened or last refreshed, then this call does nothing. Note, however, that this is still a relatively slow method (as it needs to verify whether there have been any changes on disk to the taxonomy), so it should not be called too often needlessly. In faceted search, the taxonomy reader's refresh() should be called only after a reopen() of the main index.
Refreshing the taxonomy might fail in some cases, for example if the taxonomy was recreated since this instance was opened or last refreshed. In this case an InconsistentTaxonomyException is thrown, suggesting that in order to obtain up-to-date taxonomy data a new TaxonomyReader should be opened. Note: This TaxonomyReader instance remains unchanged and usable in this case, and the application can continue to use it, and should still call Closeable.close() on it when it is no longer needed.
It should be noted that refresh() is similar in purpose to IndexReader.reopen(), but the two methods behave differently. refresh() refreshes the existing TaxonomyReader object, rather than opening a new one in addition to the old one as reopen() does. The reason is that in a taxonomy, one can only add new categories and cannot modify or delete existing categories; therefore, there is no reason to keep an old snapshot of the taxonomy open - refreshing the taxonomy to the newest data and using this new snapshot in all threads (whether new or old) is fine. This saves us needing to keep multiple copies of the taxonomy open in memory.
- Specified by:
refresh in interface TaxonomyReader
- Returns:
true if anything has changed, false otherwise.
- Throws:
IOException
InconsistentTaxonomyException
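One way an application might react to a failed refresh is to fall back to opening a new reader and closing the old one. The sketch below shows that control flow only. It does not use the Lucene classes: the nested `Taxonomy` interface and `InconsistentTaxonomyException` class are hypothetical stand-ins, the `opener` parameter is an assumption about how a new reader would be obtained, and closing the old reader immediately is just one possible policy (the text notes that the old instance stays usable until closed).

```java
import java.util.function.Supplier;

class RefreshSketch {
    // Stand-in for the exception the text describes; not the Lucene class.
    static class InconsistentTaxonomyException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    // Stand-in for the two reader operations this sketch needs.
    interface Taxonomy {
        boolean refresh() throws InconsistentTaxonomyException;
        void close();
    }

    // Returns the reader the application should use from now on.
    static Taxonomy refreshOrReopen(Taxonomy current, Supplier<Taxonomy> opener) {
        try {
            current.refresh(); // in-place update; the same object stays in use
            return current;
        } catch (InconsistentTaxonomyException e) {
            // The taxonomy was recreated: open a new reader first, then
            // release the old one, which remained usable until this point.
            Taxonomy fresh = opener.get();
            current.close();
            return fresh;
        }
    }
}
```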
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
getSize
public int getSize()
Description copied from interface: TaxonomyReader
getSize() returns the number of categories in the taxonomy.
Because categories are numbered consecutively starting with 0, it means the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
- Specified by:
getSize in interface TaxonomyReader
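The gap between the number of inserted categories and getSize() can be seen in a toy model of ordinal assignment. This is a minimal sketch, assuming ancestors receive ordinals before their descendants; the class `TaxonomySizeSketch` is hypothetical and models paths as string lists rather than CategoryPath objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy writer-side model: assigns consecutive ordinals, adding ancestors first.
class TaxonomySizeSketch {
    private final Map<List<String>, Integer> ordinals = new HashMap<>();

    TaxonomySizeSketch() {
        ordinals.put(new ArrayList<String>(), 0); // the root always gets ordinal 0
    }

    // Adds the category and any missing ancestors; returns the category's ordinal.
    int add(String... components) {
        List<String> path = Arrays.asList(components);
        for (int len = 1; len <= path.size(); len++) {
            List<String> prefix = new ArrayList<>(path.subList(0, len));
            if (!ordinals.containsKey(prefix)) {
                ordinals.put(prefix, ordinals.size()); // next free ordinal
            }
        }
        return ordinals.get(path);
    }

    int size() {
        return ordinals.size();
    }
}
```

In this model, inserting only the two categories Author/Mark Twain and Date/2010 yields a size of 5: the root, the two top-level categories, and the two inserted leaves.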
-
getCommitUserData
public Map<String,String> getCommitUserData() throws IOException
Description copied from interface: TaxonomyReader
Retrieve user committed data.
- Specified by:
getCommitUserData in interface TaxonomyReader
- Throws:
IOException
- See Also:
TwoPhaseCommit.commit(Map)
-
getChildrenArrays
public TaxonomyReader.ChildrenArrays getChildrenArrays()
Description copied from interface: TaxonomyReader
getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose arrays can be used together to efficiently enumerate the children of any category.
The caller can hold on to the object it got indefinitely - it is guaranteed that no-one else will modify it. The other side of the same coin is that the caller must treat the object which it got (and the arrays it contains) as read-only and not modify it, because other callers might have gotten the same object too.
Implementations should have O(getSize()) time for the first call or after a refresh(), but O(1) time for further calls. In neither case should there be a need to read new data from disk. These guarantees are most likely achieved by calculating this object (based on the getParentArray()) when first needed, and later (if the taxonomy was not refreshed) returning the same object (without any allocation or copying) when requested.
The reason we have one method returning one object, rather than two methods returning two arrays, is to avoid race conditions in a multi-threaded application: We want to avoid the possibility of returning one new array and one old array, as those could not be used together.
- Specified by:
getChildrenArrays in interface TaxonomyReader
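The O(getSize()) computation from the parent array can be sketched as a single pass that builds two linked arrays. This is a minimal illustration, not the library's code: it assumes the two arrays are a "youngest child" array and an "older sibling" array (the text above does not name them), that INVALID_ORDINAL is -1, and that the root is ordinal 0; the class `ChildrenArraysSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Holds both arrays in one object so they are always a consistent pair.
class ChildrenArraysSketch {
    static final int INVALID_ORDINAL = -1; // assumed value

    final int[] youngestChild; // highest-ordinal child of each category
    final int[] olderSibling;  // next lower-ordinal child of the same parent

    ChildrenArraysSketch(int[] parents) {
        int n = parents.length;
        youngestChild = new int[n];
        olderSibling = new int[n];
        Arrays.fill(youngestChild, INVALID_ORDINAL);
        Arrays.fill(olderSibling, INVALID_ORDINAL);
        // One pass over all non-root categories, in increasing ordinal order.
        for (int i = 1; i < n; i++) {
            int parent = parents[i];
            olderSibling[i] = youngestChild[parent]; // previous youngest becomes older sibling
            youngestChild[parent] = i;               // i is now the youngest child
        }
    }

    // Enumerates children from youngest to oldest by following the sibling links.
    List<Integer> childrenOf(int ordinal) {
        List<Integer> children = new ArrayList<>();
        for (int c = youngestChild[ordinal]; c != INVALID_ORDINAL; c = olderSibling[c]) {
            children.add(c);
        }
        return children;
    }
}
```

Keeping both arrays as final fields of one object reflects the design point above: a caller can never observe one array from before a refresh paired with one from after it.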
-
toString
public String toString(int max)
-
decRef
public void decRef() throws IOException
Expert: decreases the refCount of this TaxonomyReader instance. If the refCount drops to 0, then this reader is closed.
- Specified by:
decRef in interface TaxonomyReader
- Throws:
IOException
-
getRefCount
public int getRefCount()
Expert: returns the current refCount for this taxonomy reader.
- Specified by:
getRefCount in interface TaxonomyReader
-
incRef
public void incRef()
Expert: increments the refCount of this TaxonomyReader instance. RefCounts are used to determine when a taxonomy reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef(), in a finally clause; otherwise the reader may never be closed.
- Specified by:
incRef in interface TaxonomyReader
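The incRef()/decRef() pairing in a finally clause can be sketched with a small reference-counted object. This is a minimal illustration, not the reader itself: the class `RefCountedSketch` is hypothetical, and it assumes the count starts at 1 on behalf of whoever opened the object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted resource with the same three operations.
class RefCountedSketch {
    // Assumption: the opener holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        // The resource is closed as soon as the last reference is released.
        if (refCount.decrementAndGet() == 0) {
            closed = true;
        }
    }

    int getRefCount() {
        return refCount.get();
    }

    boolean isClosed() {
        return closed;
    }

    // Borrowing pattern: every incRef() is matched by a decRef() in finally,
    // so the count is restored even if the work in between throws.
    static int useShared(RefCountedSketch reader) {
        reader.incRef();
        try {
            return reader.getRefCount(); // placeholder for real work with the reader
        } finally {
            reader.decRef();
        }
    }
}
```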
-
-