Vocab - GateNLP/gate-lf-python-data GitHub Wiki

  • The Vocab class represents the mapping from strings to integer indices and back.
  • Main methods/attributes:
    • constructor: construct from a counter object or any map that maps each string to an integer (the count).
    • stoi: map from string to index
    • itos: array that returns the string for a given index
    • string2onehot(string): create a dense one-hot vector for the string
    • onehot2string(vector): convert a dense one-hot vector to the corresponding string
  • Additional symbols (e.g. padding) can be added with the add_symbols argument.

Usage:

map1 = {"word1": 3, "word2": 5}
vocab = Vocab(map1, add_symbols=["<PAD>", "<START>"])
vocab.stoi["word1"] # returns 3 (the index, which here happens to equal the count)
vocab.stoi.get("word2") # returns 2
vocab.stoi.get("<PAD>") # returns 0
vocab.itos[2] # returns "word2"
vocab.string2onehot("word2") # returns [0.0, 0.0, 1.0, 0.0]
vocab.onehot2string([0.0, 0.0, 1.0, 0.0]) # returns "word2"
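
The index assignment in the usage above can be illustrated with a minimal sketch. SketchVocab is a hypothetical stand-in, not the library's implementation. It assumes that the added symbols take the lowest indices and that the remaining strings follow in order of descending count; this ordering is only inferred from the return values shown above. The alphabetical tie-break and the use of the largest entry in onehot2string are further assumptions.

```python
from collections import Counter


class SketchVocab:
    """Hypothetical stand-in for Vocab; not the library's implementation."""

    def __init__(self, counts, add_symbols=None):
        # Assumption: added symbols get the lowest indices, then the strings
        # follow by descending count (ties broken alphabetically).
        symbols = list(add_symbols or [])
        by_count = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        self.itos = symbols + [s for s, _ in by_count]
        self.stoi = {s: i for i, s in enumerate(self.itos)}

    def string2onehot(self, string):
        # Dense vector with 1.0 at the string's index and 0.0 elsewhere.
        vec = [0.0] * len(self.itos)
        vec[self.stoi[string]] = 1.0
        return vec

    def onehot2string(self, vector):
        # Assumption: decode by taking the position of the largest entry.
        return self.itos[max(range(len(vector)), key=vector.__getitem__)]


sketch = SketchVocab(Counter({"word1": 3, "word2": 5}),
                     add_symbols=["<PAD>", "<START>"])
print(sketch.itos)
print(sketch.string2onehot("word2"))
```

Under these assumptions the sketch gives the same indices as the usage example: "<PAD>" is 0, "<START>" is 1, "word2" (count 5) is 2 and "word1" (count 3) is 3.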