Package org.apache.lucene.analysis.hunspell
A Java implementation of Hunspell stemming and spell-checking algorithms (Hunspell), and a stemming TokenFilter (HunspellStemFilter) based on it.
For dictionaries, see e.g. the LibreOffice repository or Titus Wormer's collection (UTF).
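
To illustrate what affix-based stemming means here, the following is a minimal sketch and not the code in this package. It assumes a Hunspell-style suffix rule can be reduced to two strings: the characters stripped from the stem and the characters appended in their place. The names SuffixStemSketch, SuffixRule and stems are hypothetical. Real rules are also tied to flags on dictionary entries and to conditions, both of which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy suffix-stripping stemmer. Hypothetical; not the Stemmer class of this package. */
final class SuffixStemSketch {

  /** Simplified suffix rule: surface form = stem without `strip`, followed by `append`. */
  static final class SuffixRule {
    final String strip;
    final String append;

    SuffixRule(String strip, String append) {
      this.strip = strip;
      this.append = append;
    }
  }

  /** Returns the dictionary words from which the given word can be derived. */
  static List<String> stems(String word, Set<String> dictionary, List<SuffixRule> rules) {
    List<String> result = new ArrayList<>();
    if (dictionary.contains(word)) {
      result.add(word); // the word itself is a dictionary entry
    }
    for (SuffixRule rule : rules) {
      if (word.length() > rule.append.length() && word.endsWith(rule.append)) {
        // Undo the rule: drop the appended suffix, then restore the stripped characters.
        String candidate =
            word.substring(0, word.length() - rule.append.length()) + rule.strip;
        if (dictionary.contains(candidate) && !result.contains(candidate)) {
          result.add(candidate);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> dictionary = Set.of("pony", "cat", "walk");
    List<SuffixRule> rules =
        List.of(new SuffixRule("", "s"), new SuffixRule("y", "ies"), new SuffixRule("", "ed"));
    System.out.println(stems("ponies", dictionary, rules)); // prints [pony]
    System.out.println(stems("walked", dictionary, rules)); // prints [walk]
  }
}
```

A word may yield several stems when more than one rule leads back to a dictionary entry, which matches the "one or more stems" wording used for Stemmer in the class summary.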

Interface Summary
Interface: Description
AffixCondition: Checks the "condition" part of affix definition, as in
DictEntries: An object representing homonym dictionary entries.
FragmentChecker: An oracle for quickly checking that a specific part of a word can never be a valid word.
GeneratingSuggester.AffixProcessor
NGramFragmentChecker.NGramConsumer: A callback for n-gram ranges in words.
SortingStrategy.EntryAccumulator
SortingStrategy.EntrySupplier
Stemmer.CaseVariationProcessor
Stemmer.RootProcessor
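
As an illustration of what checking the "condition" part of an affix definition can involve, here is a minimal sketch. It assumes the Hunspell affix-file convention that a condition is built from literal letters, "." and bracket sets such as [^aeiou], and that a lone "." means "no condition". Under that assumption a condition can be reused as a regular expression anchored at one end of the stem. AffixConditionSketch and its methods are hypothetical names and do not reflect how AffixCondition is implemented.

```java
import java.util.regex.Pattern;

/** Toy condition checker. Hypothetical; not the AffixCondition interface of this package. */
final class AffixConditionSketch {
  // null means "always true" (the condition was a lone ".")
  private final Pattern pattern;

  private AffixConditionSketch(Pattern pattern) {
    this.pattern = pattern;
  }

  /** Condition that must match the end of the stem, as for a suffix rule. */
  static AffixConditionSketch forSuffix(String condition) {
    if (condition.equals(".")) {
      return new AffixConditionSketch(null);
    }
    // ASSUMPTION: the condition only uses letters, '.', and [..] / [^..] sets,
    // so it is already valid regex syntax.
    return new AffixConditionSketch(Pattern.compile("(?:" + condition + ")\\z"));
  }

  /** Condition that must match the start of the stem, as for a prefix rule. */
  static AffixConditionSketch forPrefix(String condition) {
    if (condition.equals(".")) {
      return new AffixConditionSketch(null);
    }
    return new AffixConditionSketch(Pattern.compile("\\A(?:" + condition + ")"));
  }

  /** Returns true if the affix rule carrying this condition may apply to the stem. */
  boolean acceptsStem(String stem) {
    return pattern == null || pattern.matcher(stem).find();
  }

  public static void main(String[] args) {
    AffixConditionSketch consonantThenY = forSuffix("[^aeiou]y");
    System.out.println(consonantThenY.acceptsStem("pony")); // prints true
    System.out.println(consonantThenY.acceptsStem("play")); // prints false
  }
}
```

Compiling the condition once and reusing it per stem keeps the check cheap; a condition containing regex metacharacters other than those listed would need escaping, which the sketch does not attempt.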

Class Summary
Class: Description
AffixedWord: An object representing the analysis result of a simple (non-compound) word.
AffixedWord.Affix: An object representing a prefix or a suffix applied to a word stem.
CheckCompoundPattern
CompoundRule
ConvTable: ICONV or OCONV replacement table.
DictEntry: An object representing *.dic file entry with its word, flags and morphological data.
Dictionary: In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
Dictionary.Breaks: Possible word breaks according to BREAK directives.
Dictionary.DefaultAsUtf8FlagParsingStrategy: Used to read flags as UTF-8 even if the rest of the file is in the default (8-bit) encoding.
Dictionary.DoubleASCIIFlagParsingStrategy: Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded as two ASCII characters whose codes must be combined into a single character.
Dictionary.FlagParsingStrategy: Abstraction of the process of parsing flags taken from the affix and dic files.
Dictionary.NumFlagParsingStrategy: Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded in its numerical form.
Dictionary.SimpleFlagParsingStrategy: Simple implementation of Dictionary.FlagParsingStrategy that treats the chars in each String as individual flags.
EntrySuggestion: Suggestion to add/edit dictionary entries to generate a given list of words created by WordFormGenerator.compress(java.util.List<java.lang.String>, java.util.Set<java.lang.String>, java.lang.Runnable).
FlagEnumerator: A structure similar to BytesRefHash, but specialized for sorted char sequences used for Hunspell flags.
FlagEnumerator.Lookup
GeneratingSuggester: A class that traverses the entire dictionary and applies affix rules to check if those yield correct suggestions similar enough to the given misspelled word.
GeneratingSuggester.Weighted<T extends java.lang.Comparable<T>>
Hunspell: A spell checker based on Hunspell dictionaries.
HunspellStemFilter: TokenFilter that uses hunspell affix rules and words to stem tokens.
HunspellStemFilterFactory: TokenFilterFactory that creates instances of HunspellStemFilter.
ISO8859_14Decoder
ModifyingSuggester: A class that modifies the given misspelled word in various ways to get correct suggestions.
NGramFragmentChecker: A FragmentChecker based on all character n-grams possible in a certain language, keeping them in a relatively memory-efficient, but probabilistic data structure.
RepEntry
Root<T extends java.lang.CharSequence>
SortingStrategy: The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
Stemmer: Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word.
Stemmer.StemCandidateProcessor
Suggester: A generator for misspelled word corrections based on Hunspell flags.
SuggestibleEntryCache: A cache allowing for CPU-cache-friendlier iteration over WordStorage entries that can be used for suggestions.
Suggestion
TrigramAutomaton: An automaton allowing to achieve the same results as non-weighted GeneratingSuggester.ngramScore(int, java.lang.String, java.lang.String, boolean), but faster (in O(s2.length) time).
WordFormGenerator: A utility class used for generating possible word forms by adding affixes to stems (WordFormGenerator.getAllWordForms(String, String, Runnable)), and suggesting stems and flags to generate the given set of words (WordFormGenerator.compress(List, Set, Runnable)).
WordFormGenerator.AffixEntry
WordFormGenerator.FlagSet
WordFormGenerator.State
WordStorage: A data structure for memory-efficient word storage and fast lookup/enumeration.
WordStorage.Builder
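
The three flag encodings named by the Dictionary.FlagParsingStrategy implementations can be pictured with the sketch below. It is an illustration only: FlagParsingSketch and its methods are hypothetical names, the comma separator for numeric flags follows the Hunspell "FLAG num" file convention rather than anything stated above, and placing the first ASCII character in the high byte when combining two characters is an assumption about how "combined into a single character" could be done.

```java
import java.util.Arrays;

/** Toy flag parsers. Hypothetical; not the FlagParsingStrategy classes of this package. */
final class FlagParsingSketch {

  /** Simple form: every char of the raw string is one flag. */
  static char[] parseSimple(String raw) {
    return raw.toCharArray();
  }

  /** Numeric form: decimal numbers, assumed to be comma-separated, e.g. "101,7". */
  static char[] parseNumeric(String raw) {
    if (raw.isEmpty()) {
      return new char[0];
    }
    String[] parts = raw.split(",");
    char[] flags = new char[parts.length];
    for (int i = 0; i < parts.length; i++) {
      flags[i] = (char) Integer.parseInt(parts[i].trim());
    }
    return flags;
  }

  /** Double-ASCII form: two ASCII chars per flag, merged into one char. */
  static char[] parseDoubleAscii(String raw) {
    if (raw.length() % 2 != 0) {
      throw new IllegalArgumentException("Expected an even number of chars: " + raw);
    }
    char[] flags = new char[raw.length() / 2];
    for (int i = 0; i < flags.length; i++) {
      char first = raw.charAt(2 * i);
      char second = raw.charAt(2 * i + 1);
      // ASSUMPTION: first char goes to the high byte, second to the low byte.
      flags[i] = (char) ((first << 8) | second);
    }
    return flags;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parseSimple("AB")));
    System.out.println(parseNumeric("101,7").length);
    System.out.println(parseDoubleAscii("AaBb").length);
  }
}
```

Representing every flag as a single char, whatever its textual encoding, lets the rest of the code compare and sort flags uniformly.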

Enum Summary
Enum: Description
AffixKind
TimeoutPolicy: A strategy determining what to do when Hunspell API calls take too much time.
WordCase
WordCase.CharCase
WordContext

Exception Summary
Exception: Description
SuggestionTimeoutException: An exception thrown when a Hunspell.suggest(java.lang.String) call takes too long, if TimeoutPolicy.THROW_EXCEPTION is used.