Word2Vec Explained

Negative Sampling

Word2Vec tries to come up with a measure of similarity in the semantic realm: it maps every word to a vector such that words with similar meanings have similar vectors.

To learn a word vector, we want the word to score as similar as possible to the contexts it actually appears in, and as dissimilar as possible to all other contexts. Done naively, this requires checking every possible context. Negative Sampling avoids this slow process by drawing a small number of random contexts (the negative samples) and performing the update only for those.
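Concretely, in the formulation of Mikolov et al. (2013), the full softmax over the vocabulary is replaced by k sampled logistic terms, where σ is the sigmoid, w_I is the input word, w_O is the observed context word, and P_n(w) is the noise distribution the negative samples are drawn from:

$$\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$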

To do this, we build a distribution over the words in the vocabulary that decides which words get sampled. We then draw our sample words from this distribution, taking care not to pick the very word whose vector we are computing:

	private int[] getNegativeSamples(Random rand, int word)
	{
		IntSet set = new IntOpenHashSet();
		int target;

		// keep drawing until we have sample_size distinct negative samples
		while (set.size() < sample_size)
		{
			// rand.nextInt(bound) keeps the index non-negative and in range;
			// rand.nextInt() % length could be negative and index out of bounds
			target = dist_table[rand.nextInt(dist_table.length)];
			if (target != word) set.add(target);
		}

		return set.toIntArray();
	}
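The dist_table above is assumed to be precomputed elsewhere. As a minimal sketch of how such a table could be built, following the original word2vec convention of raising unigram counts to the 3/4 power (the method name buildDistTable and the counts parameter are hypothetical, not taken from this code base):

	// Hypothetical sketch: each word occupies a number of table slots
	// proportional to count^0.75, as in the original word2vec.
	private int[] buildDistTable(int[] counts, int tableSize)
	{
		double total = 0;
		for (int count : counts) total += Math.pow(count, 0.75);

		int[] table = new int[tableSize];
		int word = 0;
		double cumulative = Math.pow(counts[0], 0.75) / total;

		for (int i = 0; i < tableSize; i++)
		{
			table[i] = word;

			// move to the next word once this slot passes its cumulative share
			if ((double)i / tableSize > cumulative && word < counts.length - 1)
				cumulative += Math.pow(counts[++word], 0.75) / total;
		}

		return table;
	}

Raising the counts to the 3/4 power flattens the unigram distribution, so frequent words are sampled somewhat less often, and rare words somewhat more often, than their raw frequencies would dictate.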

After selecting our negative samples, we train the bag-of-words model on the correct word and its context with a label of '1' (meaning that this word does belong to this context), and then train it on each negative sample with a label of '0':

	learnBagOfWords(1, word, syn1, neu1, neu1e, alpha);

	for (int sample : getNegativeSamples(rand, word))
		learnBagOfWords(0, sample, syn1, neu1, neu1e, alpha);
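learnBagOfWords itself is not shown on this page. A minimal sketch of what a single update could look like, assuming syn1 holds one output vector per word, neu1 is the hidden-layer activation, and neu1e accumulates the error later propagated back to the input vectors (these shapes are assumptions, not taken from the actual code):

	// Hypothetical sketch of one logistic update against a target word.
	private void learnBagOfWords(int label, int word, float[][] syn1, float[] neu1, float[] neu1e, float alpha)
	{
		float score = 0;

		for (int i = 0; i < neu1.length; i++)
			score += neu1[i] * syn1[word][i];

		// logistic-loss gradient scaled by the learning rate
		float gradient = (float)((label - 1.0 / (1.0 + Math.exp(-score))) * alpha);

		for (int i = 0; i < neu1.length; i++)
		{
			neu1e[i]      += gradient * syn1[word][i];  // error for the input vectors
			syn1[word][i] += gradient * neu1[i];        // update this word's output vector
		}
	}

Because the label enters the gradient as (label - σ(score)), the positive example pulls its output vector toward the hidden layer while each negative sample pushes its vector away, which is exactly the "similar to the true context, dissimilar to random contexts" behavior described above.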

The difference between bag-of-words (CBOW) and skip-gram is that the former predicts a word from its context, while the latter predicts the context from the word: given "the cat sat", CBOW predicts "cat" from {the, sat}, whereas skip-gram predicts {the, sat} from "cat". The desired option is checked here:

	if (cbow) bagOfWords(words, index, window, rand, neu1e, neu1);
	else      skipGram  (words, index, window, rand, neu1e);