Module nltk_lite.corpora.ppattach

Read lines from the Prepositional Phrase Attachment Corpus.

The PP Attachment Corpus contains several files in which each line has the format:

  sentence_id verb noun1 preposition noun2 attachment

E.g.:

  42960 gives authority to administration V
  46742 gives inventors of microchip N
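
A minimal sketch of splitting one such line into its six fields, assuming
the fields are whitespace-separated exactly as in the examples above. The
names PPAttachment and parse_line are hypothetical and are not part of
nltk_lite.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the format line above.
PPAttachment = namedtuple(
    "PPAttachment",
    ["sentence_id", "verb", "noun1", "preposition", "noun2", "attachment"],
)

def parse_line(line):
    """Split one corpus line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(fields), line))
    return PPAttachment(*fields)

record = parse_line("42960 gives authority to administration V")
print(record.verb, record.preposition, record.attachment)
```

All fields, including sentence_id, are kept as strings here; converting
the identifier to an integer is left to the caller.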

The final field records whether the PP attaches to the verb phrase (V) or
to the noun phrase (N), i.e.:

  (VP gives (NP authority) (PP to administration))
  (VP gives (NP inventors (PP of microchip)))
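
The mapping from an attachment label to a bracketing can be sketched as
follows, assuming whitespace-separated fields and only the two labels V
and N. The function name bracket is hypothetical and is not part of
nltk_lite.

```python
def bracket(line):
    """Render one corpus line as a bracketed VP, per its attachment label."""
    sentence_id, verb, noun1, prep, noun2, attachment = line.split()
    pp = "(PP %s %s)" % (prep, noun2)
    if attachment == "V":
        # The PP is a sister of the object NP, attached to the verb phrase.
        return "(VP %s (NP %s) %s)" % (verb, noun1, pp)
    if attachment == "N":
        # The PP is nested inside the object NP.
        return "(VP %s (NP %s %s))" % (verb, noun1, pp)
    raise ValueError("unknown attachment label: %r" % attachment)

print(bracket("42960 gives authority to administration V"))
print(bracket("46742 gives inventors of microchip N"))
```

For the two example lines this reproduces the two bracketings shown above.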

The corpus contains the following files:

training:   training set
devset:     development test set, used for algorithm development
test:       test set, used to report results
bitstrings: word classes derived from Mutual Information
            Clustering for the Wall Street Journal

Ratnaparkhi, Adwait (1994). A Maximum Entropy Model for Prepositional
Phrase Attachment.  Proceedings of the ARPA Human Language Technology
Conference.  [http://www.cis.upenn.edu/~adwait/papers/hlt94.ps]

The PP Attachment Corpus is distributed with NLTK with the permission
of the author.

Function Summary
  demo()
  dictionary(files)
  raw(files)

Variable Summary
dict item_name = {'test': 'test set', 'devset': 'development ...
list items = ['training', 'devset', 'test']

Variable Details

item_name

Type:
dict
Value:
{'devset': 'development test set',
 'test': 'test set',
 'training': 'training set'}

items

Type:
list
Value:
['training', 'devset', 'test']
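
A small sketch of how the two variables relate: each entry of items is a
key of item_name. The values below are local copies of those documented
above rather than an import from nltk_lite, and the helper describe is
hypothetical.

```python
# Local copies of the documented values (not imported from nltk_lite).
items = ['training', 'devset', 'test']
item_name = {
    'devset': 'development test set',
    'test': 'test set',
    'training': 'training set',
}

def describe(names):
    """Return 'name: description' strings, in the order given."""
    return ["%s: %s" % (name, item_name[name]) for name in names]

for line in describe(items):
    print(line)
```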

Generated by Epydoc 2.1 on Tue Sep 5 09:37:21 2006 http://epydoc.sf.net