Class WordFormGenerator

    • Constructor Detail

      • WordFormGenerator

        public WordFormGenerator(Dictionary dictionary)
    • Method Detail

      • toString

        private java.lang.String toString(AffixKind kind,
                                          IntsRef input)
      • strip

        private java.lang.String strip(int affixId)
      • getAllWordForms

        public java.util.List<AffixedWord> getAllWordForms(java.lang.String root,
                                                           java.lang.Runnable checkCanceled)
        Generate all word forms for all dictionary entries with the given root word. The result order is stable but not specified. This is equivalent to "unmunch" from the "hunspell-tools" package.
        Parameters:
        checkCanceled - a callback invoked periodically; it can interrupt the generation by throwing an exception
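Every public method here takes a checkCanceled callback. The sketch below illustrates that contract with a hypothetical CancelableGeneration class (not part of Lucene): the generator calls checkCanceled.run() once per produced form, and the caller aborts the work by throwing an unchecked exception from the callback. The CanceledException type and the deadline helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a long-running generator; not Lucene code. */
final class CancelableGeneration {
  /** Thrown by the callback to abort work; any unchecked exception would do. */
  static final class CanceledException extends RuntimeException {
    CanceledException(String msg) {
      super(msg);
    }
  }

  /** Returns a callback that throws once the given wall-clock budget is exhausted. */
  static Runnable deadline(long budgetMillis) {
    long end = System.nanoTime() + budgetMillis * 1_000_000L;
    return () -> {
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - end > 0) {
        throw new CanceledException("time budget exhausted");
      }
    };
  }

  /** Appends each suffix to each stem, polling checkCanceled once per produced form. */
  static List<String> generate(List<String> stems, List<String> suffixes, Runnable checkCanceled) {
    List<String> out = new ArrayList<>();
    for (String stem : stems) {
      for (String suffix : suffixes) {
        checkCanceled.run(); // the callback decides whether to abort by throwing
        out.add(stem + suffix);
      }
    }
    return out;
  }
}
```

Passing () -> {} disables cancellation; passing something like deadline(500) bounds the running time instead, with the exception propagating to the caller.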
      • getAllWordForms

        public java.util.List<AffixedWord> getAllWordForms(java.lang.String stem,
                                                           java.lang.String flags,
                                                           java.lang.Runnable checkCanceled)
        Generate all word forms for the given root pretending it has the given flags (in the same format as the dictionary uses). The result order is stable but not specified. This is equivalent to "unmunch" from the "hunspell-tools" package.
        Parameters:
        checkCanceled - a callback invoked periodically; it can interrupt the generation by throwing an exception
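To show what "unmunch" means for a single root and flag string, here is a toy model of Hunspell suffix rules (SFX flag strip add condition). The ToyUnmunch class, its one-character flag format, and the regex-based conditions are assumptions for illustration only; real Hunspell also has prefixes, cross-products, continuation flags, and other flag formats, none of which are modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Hunspell suffix rules; not Lucene's implementation. */
final class ToyUnmunch {
  /** One suffix rule: remove `strip`, append `add`, if the stem ends with `condition` (a regex). */
  record SuffixRule(String strip, String add, String condition) {
    /** Returns the derived form, or null if the rule does not apply to this stem. */
    String applyTo(String stem) {
      if (!stem.matches(".*" + condition) || !stem.endsWith(strip)) return null;
      return stem.substring(0, stem.length() - strip.length()) + add;
    }
  }

  private final Map<Character, List<SuffixRule>> rules = new LinkedHashMap<>();

  void addRule(char flag, String strip, String add, String condition) {
    rules.computeIfAbsent(flag, f -> new ArrayList<>()).add(new SuffixRule(strip, add, condition));
  }

  /** Returns the stem followed by every distinct form produced by the rules of the given flags. */
  List<String> allWordForms(String stem, String flags) {
    List<String> forms = new ArrayList<>();
    forms.add(stem);
    for (char flag : flags.toCharArray()) {
      for (SuffixRule rule : rules.getOrDefault(flag, List.of())) {
        String form = rule.applyTo(stem);
        if (form != null && !forms.contains(form)) forms.add(form);
      }
    }
    return forms;
  }
}
```

With an S flag holding the rules (y, ies, [^aeiou]y), ("", s, [aeiou]y), and ("", s, [^y]), the root "try" yields "tries" while "play" yields "plays": the conditions select which rule of a flag fires.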
      • getAllWordForms

        private java.util.List<AffixedWord> getAllWordForms(DictEntry entry,
                                                            char[] encodedFlags,
                                                            java.lang.Runnable checkCanceled)
      • sortAndDeduplicate

        private static char[] sortAndDeduplicate(char[] flags)
      • deduplicate

        private static char[] deduplicate(char[] flags)
      • canStemToOriginal

        protected boolean canStemToOriginal(AffixedWord derived)
        A sanity check that the word form generated by affixation in getAllWordForms(String, String, Runnable) is actually accepted by the spell-checker and analyzed as a form of the original dictionary entry. It can be overridden when such a check is unnecessary or can be done more efficiently.
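Because the check is protected, a subclass can replace it. The sketch below mirrors that override pattern in plain Java with two hypothetical classes, CheckedGenerator and TrustingGenerator. The stemmer function stands in for the spell-checker's analysis and is an assumption, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical generator with an overridable sanity check; not Lucene code. */
class CheckedGenerator {
  /** Stand-in for the spell-checker's analysis: maps a word to the stems it is analyzed as. */
  private final Function<String, Set<String>> stemmer;
  /** Number of times the default check ran, exposed for illustration. */
  int checks = 0;

  CheckedGenerator(Function<String, Set<String>> stemmer) {
    this.stemmer = stemmer;
  }

  /** Default check: the derived form must be analyzed back to the original stem. */
  protected boolean canStemToOriginal(String derived, String original) {
    checks++;
    return stemmer.apply(derived).contains(original);
  }

  /** Keeps only the candidate forms that pass the (possibly overridden) check. */
  List<String> keepValid(String original, List<String> candidates) {
    List<String> kept = new ArrayList<>();
    for (String candidate : candidates) {
      if (canStemToOriginal(candidate, original)) kept.add(candidate);
    }
    return kept;
  }
}

/** Subclass for setups where affixation is trusted, so the costly check is skipped. */
class TrustingGenerator extends CheckedGenerator {
  TrustingGenerator(Function<String, Set<String>> stemmer) {
    super(stemmer);
  }

  @Override
  protected boolean canStemToOriginal(String derived, String original) {
    return true; // skip the analysis entirely
  }
}
```

The trade-off is the usual one: the override saves an analysis per generated form, at the cost of letting through forms the spell-checker would reject or attribute to a different entry.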
      • isForbiddenWord

        private boolean isForbiddenWord(char[] chars,
                                        int offset,
                                        int length)
      • expand

        private java.util.List<AffixedWord> expand(AffixedWord stem,
                                                   char[] flags,
                                                   java.lang.Runnable checkCanceled)
      • shouldConsiderAtAll

        private boolean shouldConsiderAtAll(char[] flags)
      • updateFlags

        private char[] updateFlags(char[] flags,
                                   char toRemove,
                                   char[] toAppend)
      • generateAllSimpleWords

        public void generateAllSimpleWords(java.util.function.Consumer<AffixedWord> consumer,
                                           java.lang.Runnable checkCanceled)
        Traverse the whole dictionary and derive all word forms via affixation (as in getAllWordForms(String, String, Runnable)) for each entry. The iteration order is undefined. Only "simple" words are returned: no compounding flags are processed. Upper- and title-case variations are not returned, even if the spell-checker accepts them.
        Parameters:
        consumer - the object that receives each derived word form
        checkCanceled - a callback invoked periodically; it can interrupt the traversal and generation by throwing an exception
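Instead of building one large list, the traversal pushes each derived form to a Consumer, so the caller decides whether to collect, count, or write the forms out. Here is a minimal sketch of that push-style design, using a hypothetical ToyDictionaryTraversal whose flags simply append a suffix (no strip strings, conditions, prefixes, or compounding); it is not Lucene code.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Toy dictionary traversal that pushes every derived form to a Consumer; not Lucene code. */
final class ToyDictionaryTraversal {
  /** stem -> flags, one character per flag (a hypothetical simplification). */
  private final Map<String, String> entries;
  /** flag -> suffix that the flag appends (no strip string or condition). */
  private final Map<Character, String> suffixes;

  ToyDictionaryTraversal(Map<String, String> entries, Map<Character, String> suffixes) {
    this.entries = entries;
    this.suffixes = suffixes;
  }

  void generateAllSimpleWords(Consumer<String> consumer, Runnable checkCanceled) {
    for (Map.Entry<String, String> entry : entries.entrySet()) {
      checkCanceled.run(); // give the caller a chance to abort between entries
      String stem = entry.getKey();
      consumer.accept(stem); // the bare stem is a word form too
      for (char flag : entry.getValue().toCharArray()) {
        String suffix = suffixes.get(flag);
        if (suffix != null) consumer.accept(stem + suffix);
      }
    }
  }
}
```

Because Map.of iteration order is unspecified, the emitted order is undefined as well, which matches the contract above; collecting into a sorted set gives a deterministic view when one is needed.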
      • compress

        public EntrySuggestion compress(java.util.List<java.lang.String> words,
                                        java.util.Set<java.lang.String> forbidden,
                                        java.lang.Runnable checkCanceled)
        Given a list of words, try to produce a smaller set of dictionary entries (with some flags) that would generate these words. This is equivalent to "munch" from the "hunspell-tools" package. The algorithm tries to minimize the number of dictionary entries to add or change, the number of flags involved, and the number of additional, unrequested words generated. All the mentioned words are in the dictionary format and case: no ICONV/OCONV/IGNORE conversions are applied.
        Parameters:
        words - the list of words to generate
        forbidden - the set of words to avoid generating
        checkCanceled - a callback invoked periodically; it can interrupt the generation by throwing an exception
        Returns:
        the information about suggested dictionary entries and overgenerated words, or null if the algorithm couldn't generate anything
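The trade-offs described for compress (fewer entries, fewer flags, little overgeneration, no forbidden words) can be illustrated with a greedy set-cover heuristic. This is a swapped-in technique, not the algorithm compress actually uses: the hypothetical ToyMunch only creates new entries (it never edits existing ones), models each flag as a plain list of appended suffixes, and returns a Suggestion record that only loosely mirrors the entries-plus-overgenerated-words result.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy greedy "munch" (not Lucene's algorithm): cover requested words with few entries. */
final class ToyMunch {
  /** Suggested entries (stem -> flags) plus the unrequested words they also generate. */
  record Suggestion(Map<String, String> entries, Set<String> extraGenerated) {}

  static Suggestion compress(
      List<String> words, Set<String> forbidden, Map<Character, List<String>> flagSuffixes) {
    Set<String> requested = new LinkedHashSet<>(words);
    Set<String> uncovered = new LinkedHashSet<>(words);
    Map<String, String> entries = new TreeMap<>();
    Set<String> extra = new TreeSet<>();
    Map<Character, List<String>> sortedFlags = new TreeMap<>(flagSuffixes); // deterministic order

    while (!uncovered.isEmpty()) {
      String bestStem = null;
      String bestFlags = "";
      Set<String> bestCover = Set.of();
      Set<String> bestOver = Set.of();
      // Every requested word is a candidate stem; an uncovered word always covers itself,
      // so each round covers at least one word and the loop terminates.
      for (String stem : requested) {
        StringBuilder flags = new StringBuilder();
        Set<String> cover = new LinkedHashSet<>();
        Set<String> over = new LinkedHashSet<>();
        if (uncovered.contains(stem)) cover.add(stem);
        for (Map.Entry<Character, List<String>> e : sortedFlags.entrySet()) {
          List<String> forms = e.getValue().stream().map(s -> stem + s).toList();
          boolean useful = forms.stream().anyMatch(uncovered::contains);
          boolean safe = forms.stream().noneMatch(forbidden::contains);
          if (useful && safe) { // add a flag only if it helps and never yields a forbidden word
            flags.append(e.getKey());
            for (String f : forms) {
              if (uncovered.contains(f)) cover.add(f);
              else if (!requested.contains(f)) over.add(f);
            }
          }
        }
        // Prefer more newly covered words; break ties by less overgeneration.
        boolean better = cover.size() > bestCover.size()
            || (cover.size() == bestCover.size() && over.size() < bestOver.size());
        if (bestStem == null || better) {
          bestStem = stem;
          bestFlags = flags.toString();
          bestCover = cover;
          bestOver = over;
        }
      }
      entries.put(bestStem, bestFlags);
      extra.addAll(bestOver);
      uncovered.removeAll(bestCover);
    }
    return new Suggestion(entries, extra);
  }
}
```

For example, with the words walk, walks, talk, talks, a flag V that appends both "s" and "ed", and "talked" forbidden, the sketch suggests walk/V (overgenerating "walked") plus bare entries for talk and talks, because talk/V would produce the forbidden word.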
      • isCompatibleWithPreviousAffixes

        private boolean isCompatibleWithPreviousAffixes(AffixedWord stem,
                                                        AffixKind kind,
                                                        char flag)