Python demo: gensim

This code uses the Gensim and NLTK packages to build tiny distributional models from two datasets, the Brown corpus and the NLTK movie review data. It then showcases Gensim functions for computing nearest neighbors and similarity.

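# building on code from the NLTK cookbook
# (the corpora need a one-time download: nltk.download('brown') and nltk.download('movie_reviews'))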
from gensim.models import Word2Vec
from nltk.corpus import brown, movie_reviews

# Gensim 4.x parameter names; Gensim 3.x called these iter= and size=
b = Word2Vec(brown.sents(), epochs=10, min_count=10, vector_size=300, workers=4)
mr = Word2Vec(movie_reviews.sents(), epochs=10, min_count=10, vector_size=300, workers=4)

# most similar words to "money" in each of the two spaces
print(b.wv.most_similar('money', topn=5))
print(mr.wv.most_similar('money', topn=5))

# print the individual word vector (wv) for "money"
# in our Brown-trained space
print(b.wv["money"])
# same for the movie reviews-trained space
print(mr.wv["money"])

# get a single similarity rating, again using the Brown-trained space
print(b.wv.similarity("money", "bank"))
print(b.wv.similarity("river", "bank"))

# same in the movie reviews space
print(mr.wv.similarity("money", "bank"))
print(mr.wv.similarity("river", "bank"))
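
The similarity ratings above are cosine similarities between word vectors, and the nearest-neighbor lists rank words by that same score. The following is a minimal sketch of both computations in plain Python, not Gensim's actual implementation. The names cosine_similarity, nearest_neighbors, and toy_vectors are hypothetical, and the three-dimensional vectors are invented for illustration rather than taken from either trained space.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(word, vectors, topn=5):
    """Rank all other words by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# hypothetical toy "word vectors", made up for this example
toy_vectors = {
    "money": [0.9, 0.1, 0.0],
    "bank":  [0.7, 0.3, 0.1],
    "cash":  [0.8, 0.2, 0.0],
    "river": [0.0, 0.2, 0.9],
}

# a single similarity rating, analogous to wv.similarity
print(cosine_similarity(toy_vectors["money"], toy_vectors["bank"]))
# ranked neighbors, analogous to wv.most_similar
print(nearest_neighbors("money", toy_vectors, topn=2))
```

In this toy space "cash" and "bank" point in nearly the same direction as "money", so they rank first, while "river" is close to orthogonal and scores near zero. In the trained spaces the same logic applies over 300 dimensions and the full vocabulary.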