Wednesday, September 18, 2013

Problem with the NLTK tagger and chunker on Linux

I installed NLTK using the commands found here: http://nltk.org/install.html

Everything installed successfully. I then started using the nltk package like this:

python
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> text
['And', 'now', 'for', 'something', 'completely', 'different']

But when I tried to tag these tokens, I got the errors shown below. The first one is only a stray leading space in my input; the LookupError after it is the real problem.

>>>  nltk.pos_tag(text)
  File "", line 1
    nltk.pos_tag(text)
    ^
IndentationError: unexpected indent
>>> nltk.pos_tag(text)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.6/site-packages/nltk/tag/__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 686, in _open
    return find(path).open()
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
  found.  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


To fix it, run this command, which downloads the tagger:
>>> nltk.download('maxent_treebank_pos_tagger')

It downloads the package to some directory on your machine. For me it downloaded here: /root/nltk_data...

All the NLTK system files are in this directory: /usr/lib/python2.6/site-packages/nltk
So I created a new directory called 'taggers' there and copied 'maxent_treebank_pos_tagger' into it: /usr/lib/python2.6/site-packages/nltk/taggers/
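
As a sketch of an alternative to copying files: NLTK looks for resources in the directories listed in nltk.data.path (the "Searched in" list from the error above), and it also reads the NLTK_DATA environment variable. The helper add_data_dir below is a hypothetical name, not part of NLTK. It takes the search list as an argument so it can be tried without NLTK installed.

```python
import os

def add_data_dir(directory, search_path):
    """Append `directory` to `search_path` if it exists and is not listed yet.

    `add_data_dir` is a hypothetical helper, not part of NLTK.
    Returns True when the directory was added, False otherwise.
    """
    directory = os.path.abspath(os.path.expanduser(directory))
    if not os.path.isdir(directory):
        return False          # nothing to add: the directory does not exist
    if directory in search_path:
        return False          # already searched
    search_path.append(directory)
    return True

# With NLTK installed, the real search list is nltk.data.path:
#   import nltk
#   add_data_dir("/root/nltk_data", nltk.data.path)
```

Note that /root/nltk_data already appears in the searched list above, so on a setup like this one the download alone may be enough.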

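Now tagging works. For example, on a new sentence:
>>> tokens = nltk.word_tokenize("At eight o'clock on Thursday morning Arthur didn't feel very good.")
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN')]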

Similarly, when you try to run the named-entity chunker:
>>> entities = nltk.chunk.ne_chunk(tagged)
you will see a similar LookupError telling you to download the chunker and a corpus. Follow the same procedure and run these two commands:

>>> nltk.download('maxent_ne_chunker')
>>> nltk.download('words')
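
The download steps in this post can be sketched as one helper that fetches only what is missing. This is a minimal sketch, not an official NLTK recipe. The name ensure_resources is hypothetical, and the chunker and corpus lookup paths are assumptions modelled on the tagger path in the error message. They may differ between NLTK versions, and newer releases may use a different default tagger resource altogether.

```python
def ensure_resources(resources, find, download):
    """Download each missing resource.

    `resources` maps a lookup path (as shown in a LookupError message)
    to the package name passed to the downloader. `find` must raise
    LookupError for a missing resource, as nltk.data.find does.
    Returns the list of package names that had to be downloaded.
    """
    fetched = []
    for path, package in resources.items():
        try:
            find(path)
        except LookupError:
            download(package)
            fetched.append(package)
    return fetched

# The tagger path follows the error message in this post; the other two
# paths are assumptions and may differ by NLTK version.
RESOURCES = {
    "taggers/maxent_treebank_pos_tagger": "maxent_treebank_pos_tagger",
    "chunkers/maxent_ne_chunker": "maxent_ne_chunker",
    "corpora/words": "words",
}

# With NLTK installed:
#   import nltk
#   ensure_resources(RESOURCES, nltk.data.find, nltk.download)
```

Passing find and download in as arguments keeps the helper independent of NLTK, so its logic can be checked with stand-in functions.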

This installs the chunker package. Now when you run the command again, you get the parse tree:
>>> entities = nltk.chunk.ne_chunk(tagged)
>>> entities
Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'), Tree('PERSON', [('Arthur', 'NNP')]), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'), ('good', 'JJ'), ('.', '.')])
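
To pull just the named entities out of a tree like the one above, a small loop over its children is enough. This is a sketch that assumes NLTK 3 style trees, where chunk subtrees have label() and leaves() methods and plain tokens are (word, tag) tuples. Older releases may expose the label as a .node attribute instead. The function name named_entities is hypothetical.

```python
def named_entities(tree):
    """Return (entity_type, text) pairs from an ne_chunk result.

    Assumes NLTK 3 style trees: chunk subtrees expose .label() and
    .leaves(), while plain tokens are (word, tag) tuples.
    """
    found = []
    for child in tree:
        if hasattr(child, "label"):            # a chunk such as PERSON
            words = [word for word, tag in child.leaves()]
            found.append((child.label(), " ".join(words)))
    return found

# With NLTK installed and the resources downloaded:
#   named_entities(nltk.chunk.ne_chunk(tagged))
```

For the tree shown above, the only chunk is the PERSON subtree containing 'Arthur'.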

Hope this helps!

