sept lesson

Machine Learning Review / Big Picture:

  • data: examples with features x_1...x_n and labels y
  • error (loss) function: must be differentiable so we can take gradients
  • partial derivatives: tell us how to adjust each 'knob' (parameter)
  • gradient descent: adjusting each knob iteratively to reduce the error (see the sketch below)
  • learning rate: taking smaller steps for stability
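
a minimal sketch of the whole loop above, fitting a line y = wx + b with mean-squared error and hand-computed gradients (the data and learning rate are made up for illustration):

```python
import numpy as np

# toy data: features x, labels y (roughly y = 2x + 1 plus noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

w, b = 0.0, 0.0      # the two 'knobs' we can adjust
lr = 0.01            # learning rate: small steps for stability

for step in range(1000):
    y_hat = w * x + b              # current predictions
    error = y_hat - y
    loss = np.mean(error ** 2)     # differentiable error function (MSE)
    # partial derivatives of the loss w.r.t. each knob
    dw = np.mean(2 * error * x)
    db = np.mean(2 * error)
    # gradient descent: nudge each knob against its gradient
    w -= lr * dw
    b -= lr * db

print(f"w = {w:.2f}, b = {b:.2f}, loss = {loss:.4f}")
```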

cross-entropy

friendly intro
why negative log-likelihood
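
a quick sketch of why cross-entropy is just the mean negative log-likelihood of the correct labels (the probabilities below are made up):

```python
import numpy as np

# predicted class probabilities for 3 examples (rows sum to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])  # correct class index per example

# cross-entropy = mean negative log-probability of the true class
nll = -np.log(probs[np.arange(len(labels)), labels])
print(nll.mean())  # lower is better; 0 only if the model is certain and right
```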

back-propagation

example
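
a hand-worked sketch of back-propagation through a one-hidden-layer network (sigmoid hidden layer, squared error), applying the chain rule one layer at a time; the shapes and data here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # 4 examples, 3 features
y = rng.normal(size=(4, 1))            # toy regression targets
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # forward pass
    h = sigmoid(x @ W1 + b1)           # hidden activations
    y_hat = h @ W2 + b2                # linear output
    loss = np.mean((y_hat - y) ** 2)

    # backward pass: chain rule, output layer first
    d_yhat = 2 * (y_hat - y) / len(x)  # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                # propagate error back through W2
    d_z1 = d_h * h * (1 - h)           # through the sigmoid derivative
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # gradient descent update with the back-propagated gradients
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```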

why neural networks?

NLP applications

resources:

Stanford NLP course lectures on YouTube

preprocessing:

live example in colab: tokenization, stemming, etc.
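
for reference outside the colab, a minimal preprocessing sketch; it assumes NLTK (the colab may use a different library) and a made-up example sentence:

```python
import nltk
from nltk.stem import PorterStemmer

nltk.download("punkt", quiet=True)  # tokenizer data; newer NLTK may need "punkt_tab"

text = "The runners were running quickly through the cities."
tokens = nltk.word_tokenize(text)           # split into word tokens
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]   # crude rule-based suffix stripping

print(tokens)  # ['The', 'runners', 'were', 'running', ...]
print(stems)   # ['the', 'runner', 'were', 'run', ...]
```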

word representations: one-hot vs dense word vectors:

universal approximation property (UAP) > a neural net can learn a function to 'map' words to a 'meaning space'
[word2vec intuition](https://towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b)
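
a sketch contrasting the two representations with a toy vocabulary; the embedding matrix here is random, standing in for trained word2vec vectors:

```python
import numpy as np

vocab = ["cat", "dog", "king", "queen"]
idx = {w: i for i, w in enumerate(vocab)}

# one-hot: sparse, high-dimensional, every word equally distant from every other
one_hot = np.eye(len(vocab))
print(one_hot[idx["cat"]])          # [1. 0. 0. 0.]

# dense: low-dimensional embedding lookup; word2vec *learns* these values
# so that similar words end up close together in 'meaning space'
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(len(vocab), 8))   # 8-dim vectors (toy size)
print(embeddings[idx["cat"]])

# cosine similarity between vectors (only meaningful once they are trained)
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(embeddings[idx["king"]], embeddings[idx["queen"]]))
```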

language modeling, perplexity

n-gram models, sequence models (MEMM/CRF), RNN
[stanford LM slides](https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf)
[my project: LM classification](https://github.com/SNUDerek/lm_perplexity_bootstrapping)
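
a minimal bigram language model with add-one smoothing, computing perplexity on a held-out sentence (corpus and test sentence are toy assumptions):

```python
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size for add-one smoothing

def bigram_prob(w1, w2):
    # P(w2 | w1) with add-one (Laplace) smoothing
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

test = "the dog sat on the mat .".split()
log_prob = sum(math.log(bigram_prob(w1, w2)) for w1, w2 in zip(test, test[1:]))
N = len(test) - 1  # number of bigram predictions made

# perplexity = exp(-average log-probability): lower = the model is less 'surprised'
print(math.exp(-log_prob / N))
```

a model trained on text like the test sentence gives low perplexity; unrelated text gives high perplexity, which is the idea behind the LM-classification project linked above.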