Cross Entropy - aragorn/home GitHub Wiki

Entropy

  • Physics or Statistical Thermodynamics
    • https://en.wikipedia.org/wiki/Entropy
      In statistical thermodynamics, entropy (usual symbol S) is a measure of the number of microscopic configurations Ω that a thermodynamic system can have when in a state as specified by certain macroscopic variables.
      entropy: $S = k_B \ln \Omega$
      change in entropy: $\Delta S = \int \frac{\delta Q_{\mathrm{rev}}}{T}$
  • Information Theory
    • https://en.wikipedia.org/wiki/Entropy_(information_theory)
      In information theory, systems are modeled by a transmitter, channel, and receiver. The transmitter produces messages that are sent through the channel. The channel modifies the message in some way. The receiver attempts to infer which message was sent. In this context, entropy (more specifically, Shannon entropy) is the expected value (average) of the information contained in each message. 'Messages' can be modeled by any flow of information.
      • Named after Boltzmann's Η-theorem, Shannon defined the entropy Η (Greek capital letter eta) of a discrete random variable X with possible values {x1, ..., xn} and probability mass function P(X) as:
        entropy of X: $\mathrm{H}(X) = \mathrm{E}[\mathrm{I}(X)] = \mathrm{E}[-\ln(P(X))]$
        Here E is the expected value operator, and I is the information content of X. I(X) is itself a random variable.
      • entropy of X: $\mathrm{H}(X) = \sum_{i=1}^{n} P(x_i)\,\mathrm{I}(x_i) = -\sum_{i=1}^{n} P(x_i)\,\log_b P(x_i)$
        where b is the base of the logarithm used. Common values of b are 2, Euler's number e, and 10, and the unit of entropy is shannon for b = 2, nat for b = e, and hartley for b = 10.[6] When b = 2, the units of entropy are also commonly referred to as bits. (A short code sketch computing this quantity in bits follows this list.)
  • Explanation of entropy using colored flashing lights
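
To make the discrete entropy formula above concrete, here is a minimal Python sketch that computes H(X) in bits (b = 2) from a probability mass function. The function name shannon_entropy and the example distributions are illustrative assumptions, not part of the source.

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_i P(x_i) * log_b P(x_i).

    `probs` is a discrete probability mass function given as a list of
    probabilities summing to 1; zero-probability outcomes contribute 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy per toss; a biased coin carries less,
# because its outcomes are more predictable.
print(shannon_entropy([0.5, 0.5]))  # 1.0
print(shannon_entropy([0.9, 0.1]))  # ~0.469
```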

Entropy and Information Gain

http://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain

Kullback–Leibler divergence

https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

In probability theory and information theory, the Kullback–Leibler divergence,[1][2] also called information divergence, information gain, relative entropy, KLIC, or KL divergence, is a measure (but not a metric) of the non-symmetric difference between two probability distributions P and Q. The Kullback–Leibler divergence was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions; Kullback himself preferred the name discrimination information.[3] It is discussed in Kullback's historic text, Information Theory and Statistics.[2]

Expressed in the language of Bayesian inference, the Kullback–Leibler divergence from Q to P, denoted D_KL(P‖Q), is a measure of the information gained when one revises one's beliefs from the prior probability distribution Q to the posterior probability distribution P. In other words, it is the amount of information lost when Q is used to approximate P.[4] In applications, P typically represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q typically represents a theory, model, description, or approximation of P.
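As a concrete sketch of this definition, the Python snippet below computes D_KL(P‖Q) = Σ_x p(x) log_b(p(x)/q(x)) for two discrete distributions over the same outcomes. The function name kl_divergence and the sample distributions are illustrative assumptions, not part of the source.

```python
import math

def kl_divergence(p, q, base=2):
    """Kullback-Leibler divergence D_KL(P||Q) = sum_x p(x) * log_b(p(x) / q(x)).

    Assumes p and q are discrete distributions over the same outcomes and that
    q(x) > 0 wherever p(x) > 0 (otherwise the divergence is infinite).
    """
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]  # "true" distribution
q = [0.9, 0.1]  # model / approximation of p
print(kl_divergence(p, q))                          # > 0: information lost using q for p
print(kl_divergence(p, p))                          # 0.0: no divergence from itself
print(kl_divergence(p, q) == kl_divergence(q, p))   # False: non-symmetric, hence not a metric
```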

Cross Entropy

Wikipedia

https://en.wikipedia.org/wiki/Cross_entropy

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution q, rather than the "true" distribution p.

๋™์ผํ•œ ์‚ฌ๊ฑด ์ง‘ํ•ฉ์— ๋Œ€ํ•œ ๋‘ ํ™•๋ฅ  ๋ถ„ํฌ p, q๊ฐ€ ์žˆ์„ ๋•Œ, "์‹ค์ œ" ๋ถ„ํฌ์ธ p ๊ฐ€ ์•„๋‹ˆ๋ผ "๊ฐ€์ƒ" ํ™•๋ฅ ๋ถ„ํฌ q ์— ์ตœ์ ํ™”๋œ ๋ถ€ํ˜ธํ™” ๋ฐฉ์‹์„ ์ ์šฉํ•˜์˜€์„ ๋•Œ, ์‚ฌ๊ฑด ์ง‘ํ•ฉ์˜ ์‚ฌ๊ฑด์„ ์‹๋ณ„ํ•˜๋Š”๋ฐ ํ•„์š”ํ•œ ํ‰๊ท  ๋น„ํŠธ์ˆ˜๋ฅผ ์ธก์ •ํ•œ๋‹ค.

cross entropy for discrete p and q: $H(p, q) = -\sum_{x} p(x)\,\log q(x)$
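
The sketch below computes the discrete cross entropy above and checks the standard identity H(p, q) = H(p) + D_KL(p‖q), which ties this section to the Kullback–Leibler divergence section. The helper names and the example distributions are illustrative assumptions, not part of the source.

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(p) = -sum_x p(x) * log_b p(x)."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

def kl_divergence(p, q, base=2):
    """D_KL(p||q) = sum_x p(x) * log_b(p(x) / q(x))."""
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

def cross_entropy(p, q, base=2):
    """Cross entropy H(p, q) = -sum_x p(x) * log_b q(x) for discrete p and q."""
    return -sum(px * math.log(qx, base) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]  # "true" distribution of events
q = [0.9, 0.1]  # "unnatural" distribution the coding scheme is optimized for
# Average bits needed to identify an event drawn from p when the code is built for q.
print(cross_entropy(p, q))   # ~1.737 bits, more than the 1.0 bit needed under p itself
print(entropy(p))            # 1.0
print(math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))  # True
```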

Explanations

Bernoulli process

Wikipedia

https://en.wikipedia.org/wiki/Bernoulli_process

A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that for each i, the value of Xi is either 0 or 1, and for all values of i, the probability p that Xi = 1 is the same.

Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.)
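
As a small illustration of this definition, the sketch below simulates independent Bernoulli trials and estimates p from the observed outcomes; with p known the past says nothing about the future, while with p unknown the empirical frequency is how past outcomes inform inferences about p. The function name and parameter values are illustrative assumptions, not part of the source.

```python
import random

def bernoulli_process(p, n, seed=0):
    """Simulate n independent Bernoulli trials, each 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

trials = bernoulli_process(p=0.3, n=100_000)
# Empirical frequency of 1s: an estimate of the unknown p, close to 0.3.
print(sum(trials) / len(trials))
```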