Get the latest distribution of NLTK from here. You can install NLTK on Ubuntu as follows:
$ sudo -sThe following shows a cumulative frequency plot of the words that occurred most often first:
# apt-get install python-numpy python-matplotlib prover9
# unzip nltk-2.0b3.zip
# cd nltk-2.0b3/
# sudo python setup.py install
Python 2.6...... (......)
>>> import nltk
There are many other things that can be done. This is a textual example of collocations in the "Inaugural Address". Collocations are words that appear together frequently.
>>> text7.collocations()These examples are taken from the book of NLTK. So, NLTK isn't really about making text accessible or some kind of engine that you can use for parsing / understanding text. It's a toolkit for language processing experimentation, so that using that knowledge you can roll your own stuff afterwards. A specific design goal is low-threshold access to the tools and functions of the toolkit. This should allow people without programming experience to use machine processing tasks for their own research. This hopefully puts it well within reach for anyone looking into natural language processing, filtering spam, building search engines, building translation engines and so forth.
Building collocations list
million *U*; New York; billion *U*; Wall Street; program trading; Mrs.
Yeargin; vice president; Stock Exchange; Big Board; Georgia Gulf;
chief executive; Dow Jones; S&P 500; says *T*-1; York Stock; last
year; Sea Containers; South Korea; American Express; San Francisco