Package nltk_lite :: Module probability :: Class FreqDist

Type FreqDist

object --+
         |
        FreqDist


A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:
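>>> from nltk_lite import tokenize
>>> from nltk_lite.probability import FreqDist
>>> sent = 'the cat sat on the mat'
>>> fdist = FreqDist()
>>> for word in tokenize.whitespace(sent):
...     fdist.inc(word)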

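The loop above relies on nltk_lite. As a point of comparison only, the same counts can be sketched with the standard library's `collections.Counter`, which behaves like the mapping from samples to counts described earlier: samples that were never recorded report a count of zero. This is an illustration under assumptions, not how FreqDist is implemented; `str.split()` stands in for `tokenize.whitespace`, and the sentence is made up.

```python
from collections import Counter

# Illustrative input; str.split() splits on runs of whitespace,
# standing in here for tokenize.whitespace.
sent = "the cat sat on the mat"

counts = Counter()
for word in sent.split():
    counts[word] += 1  # plays the role of fdist.inc(word)

print(counts["the"])  # 2
print(counts["dog"])  # 0: unseen samples have count zero
```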
Method Summary
  __init__(self)
Construct a new, empty FreqDist.
boolean __contains__(self, sample)
Return True if the given sample occurs one or more times in this frequency distribution.
string __repr__(self)
Return a string representation of this FreqDist.
string __str__(self)
Return a string representation of this FreqDist.
int B(self)
Return the total number of sample values (or bins) that have counts greater than zero.
int count(self, sample)
Return the count of a given sample.
float freq(self, sample)
Return the frequency of a given sample.
None inc(self, sample, count=1)
Increment this FreqDist's count for the given sample.
any or None max(self)
Return the sample with the greatest number of outcomes in this frequency distribution.
int N(self)
Return the total number of sample outcomes that have been recorded by this FreqDist.
int Nr(self, r, bins=None)
Return the number of samples with count r.
list samples(self)
Return a list of all samples that have been recorded as outcomes by this frequency distribution.
sequence of any sorted_samples(self)
Return the samples sorted in decreasing order of frequency.
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

Method Details

__init__(self)
(Constructor)

Construct a new, empty FreqDist, in which the count for every sample is zero.
Overrides:
__builtin__.object.__init__

__contains__(self, sample)
(In operator)

Parameters:
sample - The sample to search for.
           (type=any)
Returns:
True if the given sample occurs one or more times in this frequency distribution.
           (type=boolean)

__repr__(self)
(Representation operator)

Returns:
A string representation of this FreqDist.
           (type=string)
Overrides:
__builtin__.object.__repr__

__str__(self)
(Informal representation operator)

Returns:
A string representation of this FreqDist.
           (type=string)
Overrides:
__builtin__.object.__str__

B(self)

Returns:
The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N().
           (type=int)
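The distinction between B() and N() can be sketched with a `collections.Counter` standing in for a FreqDist. The helpers `bins_used` and `total_outcomes` are hypothetical names that mirror the two definitions; they are not part of nltk_lite.

```python
from collections import Counter

def bins_used(counts):
    """Mirror of B(): number of distinct samples with a count above zero."""
    return sum(1 for c in counts.values() if c > 0)

def total_outcomes(counts):
    """Mirror of N(): total number of recorded outcomes."""
    return sum(counts.values())

counts = Counter("the cat sat on the mat".split())
print(bins_used(counts))       # 5 distinct words
print(total_outcomes(counts))  # 6 tokens
```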

count(self, sample)

Return the count of a given sample. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Counts are non-negative integers.
Parameters:
sample - the sample whose count should be returned.
           (type=any)
Returns:
The count of a given sample.
           (type=int)

freq(self, sample)

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].
Parameters:
sample - the sample whose frequency should be returned.
           (type=any)
Returns:
The frequency of a given sample.
           (type=float)
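A minimal sketch of the frequency computation, again using a `collections.Counter` as a stand-in for a FreqDist. The `freq` helper here is hypothetical, and its behavior when nothing has been recorded (returning 0.0) is an assumption, since the description above does not cover that case.

```python
from collections import Counter

def freq(counts, sample):
    """Relative frequency: count(sample) / N, a value in [0, 1]."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: behavior with no recorded outcomes is not
        # documented; this sketch returns 0.0 instead of dividing by zero.
        return 0.0
    return counts[sample] / total  # true division in Python 3

counts = Counter("the cat sat on the mat".split())
print(freq(counts, "the"))  # 2 / 6
print(freq(counts, "dog"))  # 0.0
```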

inc(self, sample, count=1)

Increment this FreqDist's count for the given sample.
Parameters:
sample - The sample whose count should be incremented.
           (type=any)
count - The amount to increment the sample's count by.
           (type=int)
Returns:
None
Raises:
NotImplementedError - If sample is not a supported sample type.
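The effect of the optional count argument can be sketched with a hypothetical `inc` helper over a `collections.Counter`. This sketch does not reproduce the NotImplementedError check described above; a Counter instead raises TypeError for unhashable samples such as lists.

```python
from collections import Counter

def inc(counts, sample, count=1):
    """Mirror of FreqDist.inc(): add `count` to the sample's tally."""
    counts[sample] += count

counts = Counter()
inc(counts, "the")
inc(counts, "cat", 3)  # one call can record several outcomes
print(counts["the"], counts["cat"])  # 1 3
print(counts["dog"])                 # 0
```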

max(self)

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.
Returns:
The sample with the maximum number of outcomes in this frequency distribution.
           (type=any or None)
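A sketch of the documented max() behavior, using a hypothetical `max_sample` helper over a `collections.Counter`: the most frequent sample is returned, ties are resolved arbitrarily (here, by iteration order), and None is returned when no outcomes have been recorded.

```python
from collections import Counter

def max_sample(counts):
    """Return the sample with the highest count, or None if nothing was recorded."""
    if sum(counts.values()) == 0:
        return None  # no outcomes recorded
    # max() keeps the first maximal sample it meets, so ties are
    # resolved by iteration order; treat the choice as arbitrary.
    return max(counts, key=counts.get)

counts = Counter("the cat sat on the mat".split())
print(max_sample(counts))     # the
print(max_sample(Counter()))  # None
```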

N(self)

Returns:
The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
           (type=int)

Nr(self, r, bins=None)

Parameters:
r - A sample count.
           (type=int)
bins - The number of possible sample outcomes. Because samples that were never recorded cannot be counted directly, bins is used to calculate Nr(0): Nr(0) is computed as bins - self.B(). If bins is not specified, it defaults to self.B(), so Nr(0) will be 0.
           (type=int)
Returns:
The number of samples with count r.
           (type=int)
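Because samples that never occurred cannot be enumerated, Nr(0) has to be derived from the bins argument. The hypothetical `nr` helper below sketches that rule over a `collections.Counter`; it mirrors the parameter descriptions above and is not the library's code.

```python
from collections import Counter

def nr(counts, r, bins=None):
    """Number of samples whose count is exactly r."""
    b = sum(1 for c in counts.values() if c > 0)  # same as B()
    if bins is None:
        bins = b  # default makes Nr(0) equal to 0
    if r == 0:
        # Unseen samples are not stored, so infer them from bins.
        return bins - b
    return sum(1 for c in counts.values() if c == r)

counts = Counter("the cat sat on the mat".split())
print(nr(counts, 1))           # 4 words occur once
print(nr(counts, 2))           # 1 word ("the") occurs twice
print(nr(counts, 0, bins=10))  # 10 - 5 = 5 unseen bins
```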

samples(self)

Returns:
A list of all samples that have been recorded as outcomes by this frequency distribution. Use count() to determine the count for each sample.
           (type=list)

sorted_samples(self)

Return the samples sorted in decreasing order of frequency. Samples with the same count are returned in arbitrary order, and samples with a count of zero are omitted. This method is O(N^2) in the worst case, where N is the number of samples, but completes in less time on average.
Returns:
The samples, in sorted order.
           (type=sequence of any)
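A sketch of the ordering rules described for sorted_samples(), using a hypothetical helper over a `collections.Counter`. Note that this uses Python's built-in `sorted()`, which is O(N log N); it illustrates the result, not the algorithm behind the O(N^2) bound stated above.

```python
from collections import Counter

def sorted_samples(counts):
    """Samples in decreasing order of count; zero counts are dropped."""
    positive = [s for s, c in counts.items() if c > 0]
    # Ties keep iteration order here; callers should treat it as arbitrary.
    return sorted(positive, key=lambda s: counts[s], reverse=True)

counts = Counter("the cat sat on the mat".split())
counts["dog"] = 0  # an explicit zero count is omitted from the result
print(sorted_samples(counts))  # 'the' first, then the four words seen once
```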

Generated by Epydoc 2.1 on Tue Sep 5 09:37:21 2006 http://epydoc.sf.net