Class LaoBreakIterator
- java.lang.Object
  - com.ibm.icu.text.BreakIterator
    - org.apache.lucene.analysis.icu.segmentation.LaoBreakIterator
- All Implemented Interfaces:
Cloneable
public class LaoBreakIterator extends com.ibm.icu.text.BreakIterator
Syllable iterator for Lao text.
This breaks Lao text into syllables according to: "Syllabification of Lao Script for Line Breaking", Phonpasit Phissamay, Valaxay Dalolay, Chitaphone Chanhsililath, Oulaiphone Silimasak, Sarmad Hussain, Nadir Durrani; Science Technology and Environment Agency, CRULP.
- http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
- http://www.panl10n.net/Presentations/Cambodia/Phonpassit/LineBreakingAlgo.pdf
Most work is accomplished with RBBI (RuleBasedBreakIterator) rules; however, some additional special logic is needed that cannot be coded in a grammar, and this is implemented here.
For example, what appears to be a final consonant might instead be part of the next syllable. Because rules match greedily, the first match can absorb that consonant and leave behind an illegal sequence that matches no rule.
Take for instance the text ກວ່າດອກ. The first rule greedily matches ກວ່າດ, but then ອກ is encountered, which is illegal. What LaoBreakIterator does, according to the paper:
1. Backtrack and remove the ດ from the last syllable, placing it on the current syllable.
2. Verify the modified previous syllable (ກວ່າ) is still legal.
3. Verify the modified current syllable (ດອກ) is now legal.
4. If step 2 or 3 fails, restore the ດ to the last syllable and skip the current character.
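The backtracking steps above can be sketched in isolation. This is a minimal illustration and not the class's actual implementation: the `LaoBacktrackSketch` class, the `repair` helper and the `isLegal` predicate are hypothetical names, and the predicate stands in for whatever rule-based check decides that a syllable is legal. The sketch returns the repaired pair on success and `null` when either verification fails, in which case the caller keeps the original previous syllable and skips the current character.

```java
import java.util.Set;
import java.util.function.Predicate;

final class LaoBacktrackSketch {

    /**
     * Hypothetical helper: tries to move the last character of {@code previous}
     * onto the front of {@code current}. Returns {newPrevious, newCurrent} if both
     * pieces are legal according to {@code isLegal}, otherwise null.
     */
    static String[] repair(String previous, String current, Predicate<String> isLegal) {
        if (previous.isEmpty()) {
            return null; // nothing to backtrack over
        }
        // Remove the last code point from the previous syllable and
        // prepend it to the current one.
        int cut = previous.offsetByCodePoints(previous.length(), -1);
        String shortenedPrevious = previous.substring(0, cut);
        String extendedCurrent = previous.substring(cut) + current;

        // Both modified syllables must be legal for the repair to hold.
        if (isLegal.test(shortenedPrevious) && isLegal.test(extendedCurrent)) {
            return new String[] { shortenedPrevious, extendedCurrent };
        }
        // Either check failed: the caller restores the original split.
        return null;
    }

    public static void main(String[] args) {
        // Assumption for the demo: only these two syllables count as legal.
        // They are the two syllables of the example text, written as escapes.
        Set<String> legal = Set.of(
                "\u0E81\u0EA7\u0EC8\u0EB2",  // first syllable after the repair
                "\u0E94\u0EAD\u0E81");       // second syllable after the repair

        // Greedy match took five characters, leaving an illegal two-character rest.
        String[] fixed = repair("\u0E81\u0EA7\u0EC8\u0EB2\u0E94", "\u0EAD\u0E81", legal::contains);
        System.out.println(fixed == null ? "repair failed" : String.join(" | ", fixed));
    }
}
```

Passing the legality check in as a predicate keeps the sketch independent of any particular rule set; in the real class that role would be played by the supplied break rules.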
Finally, LaoBreakIterator also takes care of the second concern mentioned in the paper. This is the issue of combining marks being in the wrong order (typos).
- WARNING: This API is experimental and might change in incompatible ways in the next release.
Constructor Summary
Constructors
Constructor                                                        Description
LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules)
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type    Method                            Description
Object               clone()                           Clone method.
int                  current()
int                  first()
int                  following(int offset)
CharacterIterator    getText()
int                  last()
int                  next()
int                  next(int n)
int                  previous()
void                 setText(String newText)
void                 setText(CharacterIterator text)
Methods inherited from class com.ibm.icu.text.BreakIterator
getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, isBoundary, preceding, registerInstance, registerInstance, unregister
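The methods summarized above follow the usual break-iterator protocol: set the text, call `first()`, then call `next()` until it returns `DONE`. The sketch below shows that loop. Because the ICU and Lucene classes need external jars, it uses the JDK's `java.text.BreakIterator` as a stand-in, which is an assumption made only to keep the example self-contained. `com.ibm.icu.text.BreakIterator` declares the same `setText(String)`, `first()`, `next()` and `DONE` members, so the same loop shape should carry over to a `LaoBreakIterator` instance. `BoundaryLoopSketch` and `segments` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BoundaryLoopSketch {

    /** Collects the text between consecutive boundaries reported by the iterator. */
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns the following boundary, or DONE when the text is exhausted.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in iterator: JDK word boundaries, not Lao syllables.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(segments(words, "hello world"));
    }
}
```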
Method Detail
-
current
public int current()
- Specified by: current in class com.ibm.icu.text.BreakIterator
-
first
public int first()
- Specified by: first in class com.ibm.icu.text.BreakIterator
-
following
public int following(int offset)
- Specified by: following in class com.ibm.icu.text.BreakIterator
-
getText
public CharacterIterator getText()
- Specified by: getText in class com.ibm.icu.text.BreakIterator
-
last
public int last()
- Specified by: last in class com.ibm.icu.text.BreakIterator
-
next
public int next()
- Specified by: next in class com.ibm.icu.text.BreakIterator
-
next
public int next(int n)
- Specified by: next in class com.ibm.icu.text.BreakIterator
-
previous
public int previous()
- Specified by: previous in class com.ibm.icu.text.BreakIterator
-
setText
public void setText(CharacterIterator text)
- Specified by: setText in class com.ibm.icu.text.BreakIterator
-
setText
public void setText(String newText)
- Overrides: setText in class com.ibm.icu.text.BreakIterator
-
clone
public Object clone()
Clone method. Creates another LaoBreakIterator with the same behavior and current state as this one.
- Overrides: clone in class com.ibm.icu.text.BreakIterator
- Returns: The clone.