Generating Image Descriptions - rugbyprof/5443-Data-Mining GitHub Wiki

Amazon Mechanical Turk

Amazon Mechanical Turk (MTurk) is a crowdsourcing Internet marketplace enabling individuals and businesses to coordinate the use of human intelligence to perform tasks that computers are currently unable to do. It is one of the sites of Amazon Web Services, and is owned by Amazon. https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk

LSTM (Long Short-Term Memory)

LSTMs are a kind of RNN. They keep the same recurrent, step-by-step structure, but they use a different function to compute the hidden state. An LSTM block is composed of four main components: a cell, an input gate, an output gate, and a forget gate. The memory of an LSTM is held in its cells. Internally, the block decides what to keep in (and what to erase from) memory, then combines the previous state, the current memory, and the input to produce the new state. Each of the three gates can be thought of as a "conventional" artificial neuron that computes an activation of a weighted sum. Intuitively, the gates act as regulators of the flow of values that go through the connections of the LSTM, hence the name "gate". There are connections between these gates and the cell; some of these connections are recurrent and some are not. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
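A minimal sketch of a single LSTM time step, assuming the common formulation with a forget gate and a tanh "candidate" (cell input) term. The scalar sizes (one input, one hidden unit), the names `sigmoid` and `lstm_step`, and the parameter layout are hypothetical choices for illustration, not taken from the linked tutorial. Each gate is a weighted sum passed through a sigmoid, matching the "conventional neuron" description above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (hypothetical sketch: input size 1, hidden size 1).

    params maps each gate name to a (w_x, w_h, b) triple: the weight on the
    current input, the weight on the previous hidden state, and a bias.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b  # weighted sum, like a plain neuron

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate memory content
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state
    return h, c

# Hypothetical parameter values, for illustration only.
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 2.0]:
    h, c = lstm_step(x, h, c, params)  # the same weights are reused at every step
print(h, c)
```

The forget gate multiplies the old cell state rather than overwriting it, which is what lets the cell carry information across many steps.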

Bigrams

A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n = 2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including computational linguistics, cryptography, and speech recognition. https://en.wikipedia.org/wiki/Bigram
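A small sketch of extracting bigrams and their frequency distribution with the standard library. The function names are hypothetical; the same code works on a list of words or on a string of letters, since both can be sliced.

```python
from collections import Counter

def bigrams(tokens):
    """Return the adjacent pairs in a token sequence (list of words or a string)."""
    return list(zip(tokens, tokens[1:]))

def bigram_frequencies(tokens):
    """Count how often each bigram occurs."""
    return Counter(bigrams(tokens))

words = "the cat sat on the mat the cat slept".split()
freq = bigram_frequencies(words)
print(freq.most_common(1))  # [(('the', 'cat'), 2)]
print(bigrams("abc"))       # [('a', 'b'), ('b', 'c')]
```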

METEOR score - Metric for Evaluation of Translation with Explicit Ordering

This is a metric for evaluating machine translation output. It is based on the harmonic mean of unigram precision and recall, with recall weighted more heavily than precision. It also has features not found in other metrics, such as stemming and synonym matching, in addition to standard exact word matching. https://en.wikipedia.org/wiki/METEOR
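A simplified sketch of the core of METEOR: the recall-weighted harmonic mean of unigram precision P and recall R, using the weighting F = 10PR / (R + 9P) from the original METEOR formulation. This assumes exact word matching only; stemming, synonym matching, and the fragmentation penalty of the full metric are deliberately left out, and the function name is hypothetical.

```python
from collections import Counter

def meteor_fmean(candidate, reference):
    """Simplified METEOR F-mean (exact unigram matches only).

    Omits stemming, synonym matching and the fragmentation penalty
    of the full metric.
    """
    cand = candidate.split()
    ref = reference.split()
    # Each unigram can be matched at most once (multiset intersection).
    matches = sum((Counter(cand) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    # Weighted harmonic mean: weight 0.9 on recall, 0.1 on precision.
    return 10 * precision * recall / (recall + 9 * precision)

print(meteor_fmean("the cat", "the cat sat"))  # about 0.69
```

In the example, precision is 1.0 and recall is 2/3; the score lands much closer to the recall value, which is the effect of weighting recall higher.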

BLEU score - Bilingual Evaluation Understudy

This is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another. Quality is taken to be the correspondence between a machine's output and that of a human: the closer a machine translation is to a professional human translation, the better it is. https://en.wikipedia.org/wiki/BLEU
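To make the idea concrete, here is a minimal sketch of BLEU for one candidate sentence against one reference, assuming uniform weights over 1- to 4-grams and no smoothing. Standard BLEU is usually computed over a whole corpus and may use several references per sentence; the function names here are hypothetical. The score is the geometric mean of clipped ("modified") n-gram precisions, multiplied by a brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """BLEU for one candidate vs. one reference, uniform weights, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Modified precision: clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # geometric mean is zero without smoothing
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

Because the score is a geometric mean, a single n-gram order with zero matches drives the whole score to zero, which is why very short sentences are often scored with a smoothed variant.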

arXiv preprints

arXiv (pronounced "archive") is a repository of electronic preprints, known as e-prints, of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance, which can be accessed online. https://en.wikipedia.org/wiki/ArXiv

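MSCOCO (Microsoft Common Objects in Context)

A large-scale image dataset of everyday scenes, annotated for object detection, segmentation, and captioning, with several human-written captions per image.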

Multimodal embedding

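A shared vector space into which inputs of different modalities (e.g. images and text) are mapped, so that related items, such as an image and a sentence describing it, lie close together. Such an embedding is learned by training a network on paired data from multiple modes, e.g. images together with their text labels or captions.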
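A minimal sketch of how a multimodal embedding is used and trained, assuming that image and sentence encoders (not shown) have already mapped their inputs to vectors of the same size. The 3-dimensional vectors below are made-up values, and the hinge ranking loss is one common training objective for such embeddings, not necessarily the one used by any particular paper; all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_captions(image_vec, caption_vecs):
    """Indices of captions sorted from most to least similar to the image."""
    return sorted(range(len(caption_vecs)),
                  key=lambda k: cosine(image_vec, caption_vecs[k]),
                  reverse=True)

def ranking_loss(image_vec, matching_caption, other_captions, margin=0.2):
    """Hinge loss: the matching caption should beat each mismatched one by `margin`."""
    pos = cosine(image_vec, matching_caption)
    return sum(max(0.0, margin - pos + cosine(image_vec, neg))
               for neg in other_captions)

# Made-up embeddings in a shared 3-d space, for illustration only.
image = [0.9, 0.1, 0.0]
captions = [[0.0, 1.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]]
print(rank_captions(image, captions))  # [1, 0, 2]
```

Training pushes the loss toward zero, which pulls matching image/caption pairs together and pushes mismatched ones apart; retrieval in either direction then reduces to a nearest-neighbour search in the shared space.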

Inter-modal relationships

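Relationships between entities from different modalities, for example between a region of an image and the word or phrase in a sentence that describes it.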

Parse Tree

A parse tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar.
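A small illustration of the data structure, assuming a toy grammar (S → NP VP, NP → Det N, VP → V NP) invented for this example. The tree is rooted at S, and it is ordered because the left-to-right order of children matters: reading the leaves in order recovers the original string. The nested-tuple representation and function names are hypothetical.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are plain strings.
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "a"), ("N", "cat"))))

def leaves(node):
    """Left-to-right yield of the tree: the tokens of the original string."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]

def bracketed(node):
    """Render the tree in bracket notation, e.g. (NP (Det the) (N dog))."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"

print(" ".join(leaves(tree)))  # the dog chased a cat
print(bracketed(tree))  # (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det a) (N cat))))
```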

Markov random field (MRF)

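A Markov random field, also called a Markov network or undirected graphical model, is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is a Markov random field if it satisfies the Markov properties: the pairwise Markov property (any two non-adjacent variables are conditionally independent given all other variables), the local Markov property (a variable is conditionally independent of all other variables given its neighbors), and the global Markov property (any two subsets of variables are conditionally independent given a separating subset). https://en.wikipedia.org/wiki/Markov_random_field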
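A brute-force sketch of a tiny MRF, assuming a pairwise parametrization in which the joint probability of binary variables is proportional to a product of potentials over the graph's edges, normalized by the partition function. The chain A - B - C and the `agree` potential are made up for illustration, and the function names are hypothetical. The check below shows the pairwise Markov property: A and C are not adjacent, so they are independent given B, even though they are dependent when B is unobserved.

```python
import itertools

def joint_distribution(n_vars, edges, potential):
    """Brute-force joint over binary variables: P(x) proportional to the product of edge potentials."""
    weights = {}
    for x in itertools.product((0, 1), repeat=n_vars):
        w = 1.0
        for i, j in edges:
            w *= potential(x[i], x[j])
        weights[x] = w
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}

def marginal(p, fixed):
    """Probability that X_k == v for every (k, v) in `fixed`."""
    return sum(prob for x, prob in p.items()
               if all(x[k] == v for k, v in fixed.items()))

def cond_independent(p, i, j, k, tol=1e-12):
    """Check whether X_i and X_j are independent given X_k (binary variables)."""
    for b in (0, 1):
        pk = marginal(p, {k: b})
        for a in (0, 1):
            for c in (0, 1):
                joint = marginal(p, {i: a, j: c, k: b}) / pk
                product = (marginal(p, {i: a, k: b}) / pk) * (marginal(p, {j: c, k: b}) / pk)
                if abs(joint - product) > tol:
                    return False
    return True

def agree(a, b):
    return 2.0 if a == b else 1.0  # neighbors prefer equal values

# Chain A - B - C (variables 0, 1, 2): A and C are not adjacent.
p = joint_distribution(3, [(0, 1), (1, 2)], agree)
print(cond_independent(p, 0, 2, 1))  # True
```

Enumerating all states is only feasible for a handful of variables; real MRFs rely on approximate inference, but the factorization and the independence structure are the same.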

RMSProp

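RMSProp (Root Mean Square Propagation) is a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. Concretely, a running average of the squared gradients is kept first, v ← γ·v + (1 − γ)·g², where g is the current gradient and γ is a forgetting factor; the weight is then updated as w ← w − η·g / √v, so each step is scaled by the root mean square of recent gradients. https://en.wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp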
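A minimal sketch of the update for a single weight. The small `eps` added to the denominator is a common practical safeguard against division by zero and is an assumption, not part of the formula above; the default values of `lr` and `gamma` are illustrative choices. The example minimizes f(w) = w², whose gradient is 2w.

```python
import math

def rmsprop_step(w, grad, v, lr=0.01, gamma=0.9, eps=1e-8):
    """One RMSProp update for a single weight.

    v is the running average of squared gradients for this weight.
    """
    v = gamma * v + (1 - gamma) * grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)  # eps (assumed) avoids division by zero
    return w, v

# Minimize f(w) = w**2, whose gradient is 2*w.
w, v = 5.0, 0.0
for _ in range(2000):
    w, v = rmsprop_step(w, 2 * w, v)
print(w)
```

Because the gradient is divided by its own running RMS, the step size stays roughly on the order of `lr` whether the raw gradients are large or small, which is what makes the learning rate effectively per-parameter.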