[function] train_word2vec() - P3chys/textmining GitHub Wiki

2. Word Embeddings Functions

Function: train_word2vec()

Purpose

Trains a Word2Vec model on tokenized text stored in a DataFrame column, where each cell holds a list of tokens.

Syntax

train_word2vec(df, token_column='processed_text', vector_size=100, window=5,
               min_count=2, workers=4, sg=1, epochs=5)

Parameters

Parameter     Type              Default           Description
df            pandas.DataFrame  Required          DataFrame containing the tokenized text
token_column  str               'processed_text'  Name of the column whose cells are lists of tokens
vector_size   int               100               Dimensionality of the word vectors
window        int               5                 Maximum distance between a target word and a context word
min_count     int               2                 Words with a total frequency below this value are ignored
workers       int               4                 Number of worker threads used for training
sg            int               1                 Training algorithm: 1 = skip-gram, 0 = CBOW
epochs        int               5                 Number of passes (epochs) over the corpus
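To make `window`, `min_count`, and `sg` concrete, the toy sketch below (pure Python, no training) lists the training examples each setting would frame. It is a simplification: real Word2Vec training also randomly shrinks the effective window per position and down-samples very frequent words, both omitted here. The helper names are hypothetical and not part of this library.

```python
from collections import Counter


def build_vocab(sentences, min_count=2):
    """Return the set of tokens whose total corpus frequency is >= min_count."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


def _filtered(sentences, min_count):
    """Drop out-of-vocabulary tokens before any context windows are formed."""
    vocab = build_vocab(sentences, min_count)
    return [[t for t in sentence if t in vocab] for sentence in sentences]


def skipgram_pairs(sentences, window=5, min_count=2):
    """sg=1: each (center, context) pair means 'predict a context word from the center word'."""
    pairs = []
    for tokens in _filtered(sentences, min_count):
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(sentences, window=5, min_count=2):
    """sg=0: each (context_words, center) example means 'predict the center word from its context'."""
    examples = []
    for tokens in _filtered(sentences, min_count):
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            if context:
                examples.append((context, center))
    return examples


corpus = [['a', 'b', 'c'], ['a', 'b', 'd']]
# 'c' and 'd' occur once, so min_count=2 removes them before windows are built.
print(skipgram_pairs(corpus, window=1, min_count=2))
# [('a', 'b'), ('b', 'a'), ('a', 'b'), ('b', 'a')]
print(cbow_examples(corpus, window=1, min_count=2))
# [(['b'], 'a'), (['a'], 'b'), (['b'], 'a'), (['a'], 'b')]
```

Skip-gram produces one example per (center, context) pair, so it yields more updates per sentence than CBOW, which folds the whole window into a single example per position.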

Returns

  • Word2Vec: Trained Word2Vec model object
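The returned model maps each vocabulary word to a `vector_size`-dimensional vector. If the object is a gensim model (an assumption; this page only says "Word2Vec"), queries such as `model.wv.most_similar(...)` rank words by cosine similarity. The stdlib sketch below reproduces that ranking on hypothetical toy vectors so the idea can be checked without a trained model; `most_similar` here is an illustrative helper, not the gensim method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word` (hypothetical helper)."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-dimensional toy vectors; a trained model would supply its own.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(most_similar("cat", toy_vectors, topn=2))
```

Here `dog` ranks above `car` because its toy vector points in nearly the same direction as the vector for `cat`; vector length does not affect the score.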