Class HighFrequencyDictionary

  • All Implemented Interfaces:
    Dictionary

    public class HighFrequencyDictionary
    extends java.lang.Object
    implements Dictionary
    A Dictionary whose terms are taken from the given field of a Lucene index and appear in at least a given fraction of the index's documents.

    Threshold is a value in [0..1] representing the minimum fraction of the total number of documents in which a term must appear.

    Based on LuceneDictionary.

    • Field Detail

      • field

        private java.lang.String field
      • thresh

        private float thresh
    • Constructor Detail

      • HighFrequencyDictionary

        public HighFrequencyDictionary​(IndexReader reader,
                                       java.lang.String field,
                                       float thresh)
        Creates a new Dictionary, pulling source terms from the specified field in the provided reader.

        Terms appearing in fewer than the thresh fraction of documents are excluded. thresh is a fraction in [0..1], not a percentage in [0..100].
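
        The exclusion rule above can be sketched without any Lucene dependency. The following is a minimal illustration, not the library's implementation: the class ThresholdSketch and the methods minDocCount and frequentTerms are hypothetical names, and the choice to truncate thresh * numDocs to an integer is an assumption. The inclusive comparison follows the statement that only terms appearing in fewer than the threshold fraction are excluded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free sketch of a document-frequency threshold filter.
public class ThresholdSketch {

    // Converts a fractional threshold in [0..1] into a minimum document count.
    // Assumption: the product is truncated toward zero; the range check is a
    // guard added for this sketch only.
    static int minDocCount(float thresh, int numDocs) {
        if (thresh < 0f || thresh > 1f) {
            throw new IllegalArgumentException("thresh must be in [0..1]: " + thresh);
        }
        return (int) (thresh * (float) numDocs);
    }

    // Keeps only the terms whose document frequency reaches the minimum count.
    // docFreqs maps each term to the number of documents that contain it.
    static List<String> frequentTerms(Map<String, Integer> docFreqs, int numDocs, float thresh) {
        int min = minDocCount(thresh, numDocs);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() >= min) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 100 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 60);
        docFreqs.put("index", 25);
        docFreqs.put("zyzzyva", 1);

        // A threshold of 0.25 requires a term to appear in at least 25 documents.
        System.out.println(frequentTerms(docFreqs, 100, 0.25f)); // prints [lucene, index]
    }
}
```

        With thresh = 0.25 and 100 documents, a term needs a document frequency of at least 25 to be kept, so the rare term is dropped while a term at exactly the cutoff survives.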

    • Method Detail

      • getEntryIterator

        public final InputIterator getEntryIterator()
                                             throws java.io.IOException
        Description copied from interface: Dictionary
        Returns an iterator over all the entries.
        Specified by:
        getEntryIterator in interface Dictionary
        Returns:
        Iterator
        Throws:
        java.io.IOException