https://github.com/danielfrg/word2vec
Installation
I recommend the Anaconda Python distribution.
pip install word2vec
Wheel: Wheel packages for OS X and Windows are provided on PyPI on a best-effort basis. The code is quite easy to compile, so consider using --no-use-wheel on Linux and OS X (recent pip releases replace this flag with --no-binary).
Linux: There is no wheel support for Linux, so you have to compile the C code. The only requirement is gcc. You can override the compilation flags if needed: CFLAGS='-march=corei7' pip install word2vec
Windows: Very experimental support, based on this win32 port.
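After installing, it can be handy to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this example and is not part of word2vec.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found on the current import path."""
    # find_spec returns None when a top-level package cannot be located.
    return importlib.util.find_spec(package_name) is not None


# "word2vec" is the package name used by `pip install word2vec`.
print("word2vec importable:", is_installed("word2vec"))
```

If this prints False inside a notebook, the notebook kernel is probably running a different Python environment than the one pip installed into.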
%load_ext autoreload
%autoreload 2
In [2]:
import word2vec
In [3]:
word2vec.word2phrase('/Users/drodriguez/Downloads/text8', '/Users/drodriguez/Downloads/text8-phrases', verbose=True)
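To give a feel for what a phrase-detection pass does, here is a simplified sketch that joins frequent word pairs with an underscore, which is how tokens such as los_angeles end up in the vocabulary. The scoring loosely follows the idea used by the original word2phrase tool, but the function name `merge_phrases`, its default values, and the toy sentence are assumptions for illustration, not the library's implementation.

```python
from collections import Counter


def merge_phrases(tokens, min_count=1, threshold=1.0):
    """Join adjacent tokens with "_" when their bigram score exceeds `threshold`."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            a, b = tokens[i], tokens[i + 1]
            # The score is high when the pair occurs more often than its
            # individual word counts would suggest.
            score = (bigrams[(a, b)] - min_count) / (unigrams[a] * unigrams[b]) * total
            if score > threshold:
                merged.append(a + "_" + b)
                i += 2  # both tokens were consumed by the phrase
                continue
        merged.append(tokens[i])
        i += 1
    return merged


text = "i flew to los angeles and los angeles was sunny and i was happy".split()
merged = merge_phrases(text)
print(merged)
```

On this toy input only the pair "los angeles" occurs twice, so it is the only pair that gets merged.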
In [4]:
word2vec.word2vec('/Users/drodriguez/Downloads/text8-phrases', '/Users/drodriguez/Downloads/text8.bin', size=100, verbose=True)
In [5]:
word2vec.word2clusters('/Users/drodriguez/Downloads/text8', '/Users/drodriguez/Downloads/text8-clusters.txt', 100, verbose=True)
In [1]:
import word2vec
In [2]:
model = word2vec.load('/Users/drodriguez/Downloads/text8.bin')
In [3]:
model.vocab
Out[3]:
In [4]:
model.vectors.shape
Out[4]:
In [5]:
model.vectors
Out[5]:
In [6]:
model['dog'].shape
Out[6]:
In [7]:
model['dog'][:10]
Out[7]:
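The cells above treat the model as a vocabulary plus a matrix with one row per word, where indexing by a word returns that word's row. A minimal sketch of that layout follows; `TinyModel` and its toy vectors are hypothetical stand-ins, and it uses plain Python lists rather than NumPy arrays to stay dependency-free.

```python
class TinyModel:
    """Hypothetical stand-in: a vocabulary list plus one vector per word."""

    def __init__(self, vocab, vectors):
        if len(vocab) != len(vectors):
            raise ValueError("need exactly one vector per vocabulary word")
        self.vocab = list(vocab)
        self.vectors = [list(v) for v in vectors]
        # Map each word to its row index so lookups do not scan the vocabulary.
        self._index = {word: i for i, word in enumerate(self.vocab)}

    @property
    def shape(self):
        """(number of words, vector dimension), like a 2-D array's shape."""
        dim = len(self.vectors[0]) if self.vectors else 0
        return (len(self.vectors), dim)

    def __getitem__(self, word):
        return self.vectors[self._index[word]]


toy = TinyModel(["dog", "cat", "car"], [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
print(toy.shape)
print(toy["dog"])
```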
In [8]:
indexes, metrics = model.cosine('socks')
indexes, metrics
Out[8]:
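The call above returns two parallel sequences: row indexes into the vocabulary and a similarity score for each. As a rough sketch of the underlying idea, assuming similarity is plain cosine similarity and that the query word itself is excluded, the ranking could look like this. The function `most_similar` and the toy data are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, vocab, vectors, n=10):
    """Return (indexes, metrics) for the n words closest to `word`."""
    target = vectors[vocab.index(word)]
    scored = [
        (cosine_similarity(target, vec), i)
        for i, vec in enumerate(vectors)
        if vocab[i] != word  # do not report the query word itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    top = scored[:n]
    return [i for _, i in top], [score for score, _ in top]


vocab = ["socks", "shoes", "gloves", "planet"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.0, 1.0]]
indexes, metrics = most_similar("socks", vocab, vectors, n=2)
print([vocab[i] for i in indexes], metrics)
```

Returning indexes rather than words keeps the result compact and lets the caller look up words afterwards, as the next cell does with model.vocab[indexes].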
In [9]:
model.vocab[indexes]
Out[9]:
In [10]:
model.generate_response(indexes, metrics)
Out[10]:
In [11]:
model.generate_response(indexes, metrics).tolist()
Out[11]:
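Conceptually, building a response just pairs each index with its word and score. The sketch below shows that pairing with plain tuples; the helper name `pair_words_with_metrics` is hypothetical, and the container the library actually returns may differ.

```python
def pair_words_with_metrics(vocab, indexes, metrics):
    """Turn parallel index/score lists into a list of (word, score) tuples."""
    if len(indexes) != len(metrics):
        raise ValueError("indexes and metrics must have the same length")
    return [(vocab[i], score) for i, score in zip(indexes, metrics)]


vocab = ["socks", "shoes", "gloves", "planet"]
rows = pair_words_with_metrics(vocab, [1, 2], [0.99, 0.83])
print(rows)
```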
In [12]:
indexes, metrics = model.cosine('los_angeles')
model.generate_response(indexes, metrics).tolist()
Out[12]:
In [13]:
indexes, metrics = model.analogy(pos=['king', 'woman'], neg=['man'], n=10)
indexes, metrics
Out[13]:
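An analogy query such as king + woman - man can be read as vector arithmetic followed by a nearest-neighbour search. The sketch below illustrates that reading on hand-made 3-D vectors; it assumes unit-normalized vectors and excludes the query words from the results, and it is not necessarily the exact computation the library performs.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def analogy(pos, neg, vocab, vectors, n=10):
    """Rank words by similarity to (sum of pos vectors - sum of neg vectors)."""
    index = {word: i for i, word in enumerate(vocab)}
    unit = [normalize(v) for v in vectors]
    query = [0.0] * len(unit[0])
    for word in pos:
        query = [q + x for q, x in zip(query, unit[index[word]])]
    for word in neg:
        query = [q - x for q, x in zip(query, unit[index[word]])]
    query = normalize(query)

    exclude = set(pos) | set(neg)  # never return the query words themselves
    scored = [
        (sum(q * x for q, x in zip(query, vec)), i)
        for i, vec in enumerate(unit)
        if vocab[i] not in exclude
    ]
    scored.sort(reverse=True)
    top = scored[:n]
    return [i for _, i in top], [score for score, _ in top]


# Toy axes (made up): royalty, male, female.
vocab = ["king", "queen", "man", "woman", "apple"]
vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0.1, 0.1, 0.1]]
indexes, metrics = analogy(pos=["king", "woman"], neg=["man"], vocab=vocab, vectors=vectors)
print([vocab[i] for i in indexes])
```

With these toy vectors "queen" ranks first because it shares the royalty and female components of the query vector.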
In [14]:
model.generate_response(indexes, metrics).tolist()
Out[14]:
In [15]:
clusters = word2vec.load_clusters('/Users/drodriguez/Downloads/text8-clusters.txt')
In [16]:
clusters['dog']
Out[16]:
In [17]:
clusters.get_words_on_cluster(90).shape
Out[17]:
In [18]:
clusters.get_words_on_cluster(90)[:10]
Out[18]:
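The cluster lookups above go in both directions: from a word to its cluster id, and from a cluster id to its words. Here is a small sketch of both mappings, assuming the clusters file holds one "word cluster_id" pair per line. That file layout, the function name `parse_clusters`, and the sample ids are assumptions for illustration.

```python
from collections import defaultdict


def parse_clusters(lines):
    """Build word -> cluster id and cluster id -> words maps from "word id" lines."""
    word_to_cluster = {}
    cluster_to_words = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, cluster_id = parts[0], int(parts[1])
        word_to_cluster[word] = cluster_id
        cluster_to_words[cluster_id].append(word)
    return word_to_cluster, dict(cluster_to_words)


# Made-up sample; real cluster ids depend on the training run.
sample = ["dog 11", "cat 11", "paris 90", "berlin 90", ""]
word_to_cluster, cluster_to_words = parse_clusters(sample)
print(word_to_cluster["dog"])
print(cluster_to_words[90])
```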
In [19]:
model.clusters = clusters
In [20]:
indexes, metrics = model.analogy(pos=['paris', 'germany'], neg=['france'], n=10)
In [21]:
model.generate_response(indexes, metrics).tolist()
Out[21]: