Vocab - GateNLP/gate-lf-python-data GitHub Wiki
- The Vocab class represents the mapping from strings to integer indices and back.
- Main methods/attributes:
- constructor: construct from a counter object or any map that maps each string to an integer (the count).
- stoi: map from string to index
- itos: array returning the string for some index
- string2onehot(string): create a dense one-hot vector for the string
- onehot2string(vector): convert a dense one-hot vector to the corresponding string
- Additional symbols (e.g. padding) can be added through the add_symbols argument.
Usage:
map1 = {"word1": 3, "word2": 5}
vocab = Vocab(map1, add_symbols=["<PAD>", "<START>"])
vocab.stoi["word1"] # returns 3
vocab.stoi.get("word2") # returns 2
vocab.stoi.get("<PAD>") # returns 0
vocab.itos[2] # returns "word2"
vocab.string2onehot("word2") # returns [0.0, 0.0, 1.0, 0.0]
vocab.onehot2string([0.0, 0.0, 1.0, 0.0]) # returns "word2"
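The indices in the usage example can be reproduced with a small stand-alone sketch. This is not the gate-lf-python-data implementation: the class name SketchVocab is hypothetical, and the ordering rule (added symbols first, then strings by descending count, ties broken alphabetically) is an assumption inferred from the values shown above. The real Vocab class may take further options and order entries differently.

```python
from collections import Counter


class SketchVocab:
    """Illustrative stand-in for Vocab; not the library's actual code."""

    def __init__(self, counts, add_symbols=()):
        # Assumption: added symbols take the lowest indices, followed by the
        # strings sorted by descending count (ties broken alphabetically).
        by_count = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        self.itos = list(add_symbols) + [s for s, _ in by_count]
        self.stoi = {s: i for i, s in enumerate(self.itos)}

    def string2onehot(self, string):
        # Dense vector with a single 1.0 at the string's index.
        vec = [0.0] * len(self.itos)
        vec[self.stoi[string]] = 1.0
        return vec

    def onehot2string(self, vector):
        # The position of the largest entry identifies the string.
        return self.itos[max(range(len(vector)), key=vector.__getitem__)]


vocab = SketchVocab(Counter({"word1": 3, "word2": 5}),
                    add_symbols=["<PAD>", "<START>"])
print(vocab.itos)                    # ['<PAD>', '<START>', 'word2', 'word1']
print(vocab.string2onehot("word2"))  # [0.0, 0.0, 1.0, 0.0]
```

Under these assumptions "word2" gets index 2 because it has the higher count, and "word1" follows at index 3, which matches the values in the usage example.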