I installed NLTK by following the instructions here:
http://nltk.org/install.html
Everything installed successfully, so I started using the nltk package like this:
python
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> text
['And', 'now', 'for', 'something', 'completely', 'different']
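But when I try to tag these tokens, I get the errors shown below. The first is only an IndentationError caused by a stray leading space at the prompt; the second is the real problem.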
>>> nltk.pos_tag(text)
  File "<stdin>", line 1
    nltk.pos_tag(text)
    ^
IndentationError: unexpected indent
>>> nltk.pos_tag(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/nltk/tag/__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 686, in _open
    return find(path).open()
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
found. Please use the NLTK Downloader to obtain the resource:
>>> nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
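The "Searched in" list is the key to reading this error: the resource is looked up by a relative path under each of those directories in turn. Here is a minimal sketch of that idea using only the standard library. It is not NLTK's actual code; the function name find_resource and the temporary directories are made up for illustration.

```python
import os
import tempfile


def find_resource(relative_path, search_dirs):
    """Return the first existing path for relative_path under search_dirs.

    Hypothetical helper that mimics the lookup described by the error
    message; raises LookupError when no directory contains the resource.
    """
    for base in search_dirs:
        candidate = os.path.join(base, relative_path)
        if os.path.exists(candidate):
            return candidate
    raise LookupError(
        "Resource %r not found. Searched in: %s"
        % (relative_path, ", ".join(search_dirs))
    )


# Throwaway directories standing in for the real search locations.
empty_dir = tempfile.mkdtemp()
data_dir = tempfile.mkdtemp()

# Create an empty placeholder file at the path named in the error message.
resource = os.path.join("taggers", "maxent_treebank_pos_tagger", "english.pickle")
os.makedirs(os.path.join(data_dir, os.path.dirname(resource)))
open(os.path.join(data_dir, resource), "w").close()

# The first directory does not contain the file, the second one does.
found = find_resource(resource, [empty_dir, data_dir])
print(found)
```

In NLTK itself the list of directories is available as nltk.data.path, so the data has to end up under one of the entries in that list.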
To fix it, run this command, which downloads the tagger:
>>> nltk.download('maxent_treebank_pos_tagger')
This downloads the package to a directory on your machine. For me it downloaded here: /root/nltk_data... Note that '/root/nltk_data' is already one of the directories listed under "Searched in" above, so the download alone may be enough.
All of NLTK's own files are in this directory: /usr/lib/python2.6/site-packages/nltk
In my case, I also created a new directory called 'taggers' there and copied 'maxent_treebank_pos_tagger' into it: /usr/lib/python2.6/site-packages/nltk/taggers/
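Now tagging works. Here tokens is a new sentence tokenized the same way as before, and only the first six tagged tokens are shown:
>>> tokens = nltk.word_tokenize("At eight o'clock on Thursday morning Arthur didn't feel very good.")
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN')]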
Similarly, when you try named-entity chunking:
>>> entities = nltk.chunk.ne_chunk(tagged)
you will get a similar LookupError, this time asking for the chunker and the words corpus. Follow the same procedure and run these two commands:
>>> nltk.download('maxent_ne_chunker')
>>> nltk.download('words')
This installs the chunker and the words corpus. When you run the command again, you get the parse tree:
>>> entities = nltk.chunk.ne_chunk(tagged)
>>> entities
Tree('S',
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN'), Tree('PERSON', [('Arthur',
'NNP')]), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'),
('good', 'JJ'), ('.', '.')])
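To avoid hitting these LookupErrors one at a time, the check-then-download steps can be wrapped in a small helper. This is only a sketch: ensure_resources is a made-up name, and the resource paths for the chunker and the words corpus are my assumptions (only the tagger path appears in the error message above). The find and download functions are passed in, and stand-ins are used here so the example runs without NLTK or a network connection.

```python
def ensure_resources(required, find, download):
    """Download every package whose resource cannot be found.

    required: list of (resource_path, package_id) pairs
    find:     callable that raises LookupError when a resource is missing
    download: callable that takes a package id
    Returns the list of package ids that were downloaded.
    """
    fetched = []
    for resource_path, package_id in required:
        try:
            find(resource_path)
        except LookupError:
            download(package_id)
            fetched.append(package_id)
    return fetched


REQUIRED = [
    # Path taken from the LookupError shown earlier.
    ("taggers/maxent_treebank_pos_tagger/english.pickle", "maxent_treebank_pos_tagger"),
    # The next two paths are assumptions, not taken from an error message.
    ("chunkers/maxent_ne_chunker", "maxent_ne_chunker"),
    ("corpora/words", "words"),
]

# Stand-ins for the real lookup and download functions.
installed = set(["corpora/words"])  # pretend only the words corpus is present
downloaded = []


def fake_find(path):
    if path not in installed:
        raise LookupError(path)
    return path


def fake_download(package_id):
    downloaded.append(package_id)


fetched = ensure_resources(REQUIRED, fake_find, fake_download)
print(fetched)
```

With NLTK installed, the real call would be ensure_resources(REQUIRED, nltk.data.find, nltk.download), since nltk.data.find is the function that raises the LookupError in the traceback above.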
Hope this helps!