[function] train_word2vec() - P3chys/textmining GitHub Wiki
2. Word Embeddings Functions
Function: train_word2vec()
Purpose
Trains a Word2Vec model on the token lists stored in one column of a DataFrame and returns the trained model.
Syntax
train_word2vec(df, token_column='processed_text', vector_size=100, window=5,
min_count=2, workers=4, sg=1, epochs=5)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| df | pandas.DataFrame | Required | DataFrame with tokenized text |
| token_column | str | 'processed_text' | Column containing token lists (one list of tokens per row) |
| vector_size | int | 100 | Dimension of word vectors |
| window | int | 5 | Context window size: maximum distance between a target word and a context word |
| min_count | int | 2 | Minimum word frequency; words that occur fewer times are ignored |
| workers | int | 4 | Number of training threads |
| sg | int | 1 | Training algorithm (1=skip-gram, 0=CBOW) |
| epochs | int | 5 | Number of training iterations (passes over the corpus) |
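
To make `min_count` and `window` more concrete, here is a minimal pure-Python sketch of what the two parameters control in skip-gram training (`sg=1`). It is an illustration only, not the code behind `train_word2vec()`: the helper names `filter_vocab` and `skipgram_pairs` and the toy corpus are made up for this example, and real implementations may sample a smaller effective window at each position instead of always using the full one.

```python
from collections import Counter


def filter_vocab(sentences, min_count=2):
    """Return the set of tokens that occur at least `min_count` times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n >= min_count}


def skipgram_pairs(sentence, vocab, window=5):
    """Yield (center, context) pairs at most `window` positions apart."""
    # Words below min_count are dropped before the window is applied.
    kept = [tok for tok in sentence if tok in vocab]
    for i, center in enumerate(kept):
        lo = max(0, i - window)
        hi = min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, kept[j]


# Toy corpus (hypothetical): one token list per document.
sentences = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["mice", "fear", "cats"],
]

# "dogs" and "fear" occur only once, so min_count=2 removes them.
vocab = filter_vocab(sentences, min_count=2)

# With window=1, each word is paired only with its direct neighbours.
pairs = list(skipgram_pairs(sentences[0], vocab, window=1))
print(sorted(vocab))
print(pairs)
```

A larger `window` yields more pairs per word and tends to capture broader topical similarity, while a higher `min_count` shrinks the vocabulary by discarding rare tokens.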
Returns
- Word2Vec: Trained Word2Vec model object
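
The page does not show the function body. The parameter names (`vector_size`, `epochs`, `sg`) match the `Word2Vec` class in gensim 4.x, so the sketch below assumes the function is a thin wrapper around that class. That is an assumption, not something the wiki confirms. The names `train_word2vec_sketch` and `demo`, and the toy data, are hypothetical.

```python
def train_word2vec_sketch(df, token_column="processed_text", vector_size=100,
                          window=5, min_count=2, workers=4, sg=1, epochs=5):
    """Hypothetical stand-in for train_word2vec(); assumes gensim 4.x."""
    # Imported lazily so this module still loads where gensim is missing.
    from gensim.models import Word2Vec

    # Each row is assumed to hold one document as a list of string tokens.
    sentences = df[token_column].tolist()

    # Passing `sentences` to the constructor builds the vocabulary and trains.
    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        window=window,
        min_count=min_count,
        workers=workers,
        sg=sg,
        epochs=epochs,
    )


def demo():
    """Train on a tiny made-up corpus and return the resulting model."""
    import pandas as pd

    df = pd.DataFrame({"processed_text": [
        ["cats", "chase", "mice"],
        ["dogs", "chase", "cats"],
        ["mice", "fear", "cats"],
    ]})
    # min_count=1 keeps every token of this tiny corpus in the vocabulary.
    return train_word2vec_sketch(df, vector_size=20, min_count=1, workers=1)
```

Under the same assumption, the returned model exposes its vectors through `model.wv`: `model.wv["cats"]` gives the vector for a word and `model.wv.most_similar("cats")` lists its nearest neighbours. A corpus this small will not produce meaningful similarities.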