Cross Entropy
Entropy
- Physics or Statistical Thermodynamics
- https://en.wikipedia.org/wiki/Entropy
In statistical thermodynamics, entropy (usual symbol S) is a measure of the number of microscopic configurations Ω that a thermodynamic system can have when in a state as specified by certain macroscopic variables.
- Information Theory
- https://en.wikipedia.org/wiki/Entropy_(information_theory)
In information theory, systems are modeled by a transmitter, channel, and receiver. The transmitter produces messages that are sent through the channel. The channel modifies the message in some way. The receiver attempts to infer which message was sent. In this context, entropy (more specifically, Shannon entropy) is the expected value (average) of the information contained in each message. 'Messages' can be modeled by any flow of information.
Named after Boltzmann's H-theorem, Shannon defined the entropy H (Greek capital letter eta) of a discrete random variable X with possible values {x1, ..., xn} and probability mass function P(X) as:
H(X) = \mathrm{E}[I(X)] = \mathrm{E}[-\log_b P(X)]
Here E is the expected value operator, and I is the information content of X. I(X) is itself a random variable. Written out explicitly, the entropy is
H(X) = \sum_{i=1}^{n} P(x_i)\, I(x_i) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)
where b is the base of the logarithm used. Common values of b are 2, Euler's number e, and 10, and the unit of entropy is shannon for b = 2, nat for b = e, and hartley for b = 10.[6] When b = 2, the units of entropy are also commonly referred to as bits.
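As a quick illustration of the definition above, here is a minimal Python sketch (my own, not from the Wikipedia article); the function name `shannon_entropy` and the base-2 default are choices of the example.

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_i p_i * log_b(p_i) of a discrete distribution.

    `probs` is a sequence of probabilities that should sum to 1.
    Terms with p_i == 0 contribute nothing (p * log p -> 0 as p -> 0).
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```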
- Explanation of entropy using colored flashing lights
Entropy and Information Gain
http://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain
Kullback–Leibler divergence
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
In probability theory and information theory, the Kullback–Leibler divergence,[1][2] also called information divergence, information gain, relative entropy, KLIC, or KL divergence, is a measure (but not a metric) of the non-symmetric difference between two probability distributions P and Q. The Kullback–Leibler divergence was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions; Kullback himself preferred the name discrimination information.[3] It is discussed in Kullback's historic text, Information Theory and Statistics.[2]
Expressed in the language of Bayesian inference, the Kullback–Leibler divergence from Q to P, denoted D_KL(P‖Q), is a measure of the information gained when one revises one's beliefs from the prior probability distribution Q to the posterior probability distribution P. In other words, it is the amount of information lost when Q is used to approximate P.[4] In applications, P typically represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q typically represents a theory, model, description, or approximation of P.
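To make the directed, non-symmetric nature of D_KL(P‖Q) concrete, here is a small Python sketch (my own illustration, not from the article), using the standard discrete definition D_KL(P‖Q) = Σ_i P(i) log_b(P(i)/Q(i)):

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(P || Q) = sum_i P(i) * log_b(P(i) / Q(i)).

    Measures the information lost when Q is used to approximate P.
    Assumes q[i] > 0 wherever p[i] > 0 (absolute continuity).
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # "true" distribution
q = [0.9, 0.1]   # model / approximation of p

# The divergence is directed: D_KL(P||Q) generally differs from D_KL(Q||P).
print(kl_divergence(p, q))   # ~0.737
print(kl_divergence(q, p))   # ~0.531
```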
Cross Entropy
Wikipedia
https://en.wikipedia.org/wiki/Cross_entropy
In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution q, rather than the "true" distribution p.
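A minimal Python sketch (my own illustration, assuming the standard discrete definition H(p, q) = -Σ_i p_i log_b q_i) that ties the quantities above together: the cross entropy equals the entropy of p plus the extra cost D_KL(p‖q) incurred by coding for the "unnatural" distribution q.

```python
import math

def cross_entropy(p, q, base=2):
    """H(p, q) = -sum_i p_i * log_b(q_i): the average number of bits (for base 2)
    needed to encode events drawn from p using a code optimized for q."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # "true" distribution
q = [0.9, 0.1]   # "unnatural" distribution the code is optimized for

# Coding for q instead of p costs H(p) + D_KL(p || q) bits on average,
# so the cross entropy is never smaller than the entropy of p itself.
print(cross_entropy(p, p))   # 1.0   -> just the entropy H(p) of a fair coin
print(cross_entropy(p, q))   # ~1.737 -> the extra ~0.737 bits is D_KL(p || q)
```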
Explanations
Bernoulli process
Wikipedia
https://en.wikipedia.org/wiki/Bernoulli_process
A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that
- For each i, the value of Xi is either 0 or 1;
- For all values of i, the probability that Xi = 1 is the same number p. In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials.
Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.)
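As a small illustration (my own, not from the article), a Bernoulli process can be simulated by drawing independent 0/1 values with a fixed success probability p; because the trials are independent, past outcomes are only useful for estimating p, not for predicting any individual future trial.

```python
import random

def bernoulli_process(p, n, seed=None):
    """Generate n independent Bernoulli(p) trials as a list of 0s and 1s."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

trials = bernoulli_process(p=0.3, n=10_000, seed=42)

# With p known, past outcomes say nothing about the next trial (memorylessness);
# with p unknown, the running frequency of 1s is an estimate of p.
print(sum(trials) / len(trials))   # close to 0.3
```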