
Neuronet wiki

Here's a quick review of the math. Please allow the terse notation as the algebra gets gnarly.

Syntax

Operator precedence is as in Ruby:

  • Unary right binding operators
  • *, /
  • +, -
  • =

But I add spacing to create groups:

  • π‘Ž + 𝑏/𝑐 + 𝑑 = π‘Ž + (𝑏/𝑐) + 𝑑
  • π‘Ž+𝑏 / 𝑐+𝑑 = (π‘Ž+𝑏) / (𝑐+𝑑)

The above spacing rule reduces the number of symbols needed to show structure and makes the algebra less cluttered.

The product, *, may be implied:

  • π‘Ž*𝑏 = π‘Ž 𝑏 = π‘Žπ‘
  • (π‘Ž+𝑏)*(𝑐+𝑑) = π‘Ž+𝑏 𝑐+𝑑
  • π‘₯Β² = π‘₯π‘₯ = π‘₯*π‘₯

Definitions are set by β‰œ and consequent equivalences by =.

I may use Einstein notation, and once the indices have been shown, they may be dropped:

  • βˆ‘β‚™(𝑾ₙ*𝒂ₙ) β‰œ 𝑾ⁿ𝒂ₙ = 𝑾𝒂

Be aware of these rules.

Style

Referencing Wikipedia's Mathematical operators and symbols in Unicode and Unicode subscripts and superscripts:

  • Italic small: scalar variables π‘Ž..𝑧
  • Bold italic small: single-indexed variables, vectors 𝒂..𝒛
  • Bold italic capital: multi-indexed variables, matrices 𝑨..𝒁
  • Bold script capital: operators, like 𝓓π‘₯ 𝓐..𝓩
  • Double struck small: finite ordered lists 𝕒..𝕫
  • Bold Fraktur small: derived constant parameters 𝖆..π–Ÿ

Next level unary postfix operator

Consider a value in the collection 𝒂 at level h that depends on values in the collection 𝒂 at level i:

  • 𝒂ₕ β‰œ ⌈(𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ))

The index β‚• enumerates values of 𝒂 in level h, whereas α΅’ enumerates values of 𝒂 in level i. The levels are labeled alphabetically:

  • {...,β‚•,α΅’,β±Ό,β‚–,β‚—,β‚˜,β‚™,β‚’,β‚š,...}

I'll want to express the relation between levels without specifying the level, using the postfix prime (') for the next level down. Given the above, please allow:

  • 𝒂 = ⌈(𝒃 + 𝑾 𝒂')
  • 𝒂 = ⌈ 𝒃+𝑾(𝒂')
  • 𝒂 = ⌈ 𝒃+𝑾𝒂'

Binary competition

In The Math of Species Conflict - Numberphile, the following function is referred to as "binary competition":

  • 𝓑(π‘₯) β‰œ π‘₯ * (1 - π‘₯)

This form occurs in the derivative of the squash function, and so I'll use 𝓑 in its expression.
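
Here's a quick Ruby check (the name B is mine, just for illustration); note that 𝓑 peaks at ΒΌ when π‘₯ = Β½, a bound I'll use later:

# Binary competition: B(x) = x * (1 - x)
B = ->(x) { x * (1.0 - x) }
B[0.5]   #=> 0.25, the maximum
B[0.9]   #=> approximately 0.09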

Squash

# Please let:
βŒ‰(π‘₯) β‰œ Math.exp(π‘₯)
# Define the squash function:
⌈(π‘₯) β‰œ 1 / (1 + Math.exp(-π‘₯))
⌈(π‘₯) = 1 / (1 + βŒ‰(-π‘₯))
⌈π‘₯ = 1 / 1+βŒ‰-π‘₯
   = βŒ‰π‘₯ / βŒ‰π‘₯+1
⌈π‘₯ = βŒ‰π‘₯ / 1+βŒ‰π‘₯
⌈(π‘₯) = βŒ‰(π‘₯) / (1 + βŒ‰(π‘₯))     # Alternate definition of squash
# Equivalence 1-⌈π‘₯ = ⌈-π‘₯
1 - ⌈(π‘₯) = 1 - (βŒ‰(π‘₯) / (1 + βŒ‰(π‘₯)))
1-⌈π‘₯ = 1 - βŒ‰π‘₯ / 1+βŒ‰π‘₯
     = βŒ‰π‘₯+1-βŒ‰π‘₯ / 1+βŒ‰π‘₯
     = 1 / 1+βŒ‰π‘₯
1-⌈π‘₯ = ⌈-π‘₯
1 - ⌈(π‘₯) = ⌈(-π‘₯)
# Equivalence ⌈-π‘₯ = 1-⌈π‘₯
⌈(-π‘₯) = 1 - ⌈(π‘₯)
⌈-π‘₯ = 1-⌈π‘₯
# Equivalence ⌈π‘₯ = 1-⌈-π‘₯
⌈(π‘₯) = 1 - ⌈(-π‘₯)
⌈π‘₯ = 1-⌈-π‘₯
# Derivative:
𝓓π‘₯(⌈(π‘₯)) = 𝓓π‘₯(1 / (1 + βŒ‰(-π‘₯)))
𝓓π‘₯⌈π‘₯ = 𝓓π‘₯(1 / 1+βŒ‰-π‘₯)
     = 1/(1+βŒ‰-π‘₯)Β² -𝓓π‘₯βŒ‰-π‘₯
     = 1/(1+βŒ‰-π‘₯)Β² βŒ‰-π‘₯
     = βŒ‰-π‘₯/(1+βŒ‰-π‘₯)Β² 
     = βŒ‰-π‘₯/(1+βŒ‰-π‘₯) 1/(1+βŒ‰-π‘₯)
     = βŒ‰-π‘₯/(1+βŒ‰-π‘₯) ⌈π‘₯
     = 1/(βŒ‰π‘₯+1) ⌈π‘₯
     = 1/(1+βŒ‰π‘₯) ⌈π‘₯
     = ⌈-π‘₯ ⌈π‘₯
𝓓π‘₯⌈π‘₯ = 1-⌈π‘₯ ⌈π‘₯
𝓓π‘₯(⌈(π‘₯)) = (1 - ⌈(π‘₯)) * ⌈(π‘₯)
         = 𝓑(⌈(π‘₯))
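
Here's a minimal Ruby sketch of the above (the method names squash and dsquash are mine), checking 𝓓π‘₯⌈π‘₯ = 𝓑(⌈π‘₯) against a central finite difference:

def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def dsquash(x)
  a = squash(x)
  (1.0 - a) * a                    # B(squash(x))
end

x, h = 0.7, 1.0e-6
numeric = (squash(x + h) - squash(x - h)) / (2.0 * h)
(numeric - dsquash(x)).abs < 1.0e-9  #=> true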

Unsquash

# Please let:
⌊(π‘₯) β‰œ Math.log(π‘₯)
# Recall that Log and Exp are inverses:
⌊(βŒ‰(π‘₯)) = π‘₯
βŒŠβŒ‰π‘₯ = π‘₯
# Recall that Log(1)=0
⌊(1) = 0
# Define the unsquash function:
βŒ‹(π‘₯) β‰œ Math.log(π‘₯ / (1 - π‘₯))
βŒ‹(π‘₯) = ⌊(π‘₯ / (1 - π‘₯))
βŒ‹π‘₯ = ⌊ π‘₯/(1-π‘₯)
# Show that unsquash is the inverse of squash:
βŒ‹(⌈(π‘₯)) = βŒ‹(⌈(π‘₯))
βŒ‹βŒˆπ‘₯ = βŒ‹ ⌈π‘₯
    = ⌊ ⌈π‘₯/(1-⌈π‘₯)            # by definition of unsquash, it's the log of...
    = ⌊⌈π‘₯ - ⌊ 1-⌈π‘₯
    = ⌊ βŒ‰π‘₯/(βŒ‰π‘₯+1) - ⌊ 1-⌈π‘₯   # by alternate definition of squash.
    = βŒŠβŒ‰π‘₯ - ⌊ βŒ‰π‘₯+1 - ⌊ 1-⌈π‘₯
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - ⌊ 1-⌈π‘₯
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - ⌊ 1-βŒ‰π‘₯/(βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - ⌊ (βŒ‰π‘₯+1-βŒ‰π‘₯)/(βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - ⌊ 1/(βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - (⌊1 - ⌊ βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - (0 - ⌊ βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 - (-⌊ βŒ‰π‘₯+1)
    = π‘₯ - ⌊ βŒ‰π‘₯+1 + ⌊ βŒ‰π‘₯+1
βŒ‹βŒˆπ‘₯ = π‘₯
βŒ‹(⌈(π‘₯)) = π‘₯

Activation and value of a neuron

# The activation of the h-th Neuron (in level h connecting to level i):
𝒂ₕ β‰œ ⌈(𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ))
   = ⌈ 𝒃ₕ+𝑾ⁱ𝒂ᡒ
𝒂 = ⌈ 𝒃+𝑾𝒂'
βŒ‹π’‚ = 𝒃+𝑾𝒂'
βŒ‹π’‚β‚• = 𝒃ₕ+𝑾ⁱ𝒂ᡒ
βŒ‹(𝒂ₕ) = 𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ)
# The value of the h-th Neuron is the unsquashed activation:
𝒗ₕ = βŒ‹(𝒂ₕ)
   = 𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ)
𝒗 = 𝒃 + 𝑾 𝒂'
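
Here's a minimal Ruby sketch of these relations using plain arrays (the variable names are mine, not the Neuronet API):

def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

bias    = 0.1
weights = [0.5, -0.3, 0.8]         # W, the h-th neuron's connections to level i
inputs  = [0.6, 0.2, 0.9]          # a', the activations at level i

value      = bias + weights.zip(inputs).sum { |w, a| w * a }  # v = b + W a'
activation = squash(value)                                    # a = squash(v)
# Unsquashing the activation recovers the value:
# Math.log(activation / (1.0 - activation)) ~ value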

Mirroring

# The bias and weight of a neuron that roughly mirrors the value of another:
𝕧 β‰œ {-1, 0, 1}
𝕒 β‰œ ⌈(𝖇 + (π–œ * 𝕒)) = ⌈ 𝖇+π–œ*𝕒
𝕧 β‰œ βŒ‹(𝕒) = βŒ‹π•’
# Notice that:
𝕒 = ⌈(𝕧) = {⌈(-1), ⌈(0), ⌈(1)}
⌈(0) = ⌈0 = ½
# Find the bias and weight:
𝕧 = βŒ‹βŒˆ(𝖇 + (π–œ * 𝕒))
  = βŒ‹βŒˆπ–‡+π–œπ•’
  = βŒ‹βŒˆ 𝖇+π–œβŒˆπ•§
  = 𝖇+π–œβŒˆπ•§
𝕧 = 𝖇 + (π–œ * ⌈(𝕧))
# Set the value to zero:
0 = 𝖇 + π–œβŒˆ(0)
0 = 𝖇+π–œβŒˆ0
𝖇 = -π–œβŒˆ0
𝖇 = -Β½π–œ
π–œ = -2𝖇
# Set the value to one and substitute the bias:
1 = 𝖇 + π–œβŒˆ(1)
1 = 𝖇+π–œβŒˆ1
1 = -Β½π–œ+π–œβŒˆ1
1 = π–œ(⌈1 - Β½)
π–œ = 1 / (⌈1 - Β½)
𝖇 = Β½ / (Β½ - ⌈1)
# Verify this works when value is negative one:
-1 = 𝖇 + (π–œ * ⌈(-1))
-1 = 𝖇 + π–œβŒˆ-1
-1 = -Β½π–œ + π–œβŒˆ-1
-1 = -Β½π–œ + π–œ(1-⌈1)
-1 = -Β½π–œ + π–œ - π–œβŒˆ1
-1 = Β½π–œ - π–œβŒˆ1
1 = π–œβŒˆ1 - Β½π–œ
1 = π–œ(⌈1 - Β½)
π–œ = 1 / (⌈1 - Β½)
π–œ = 1 / (⌈(1) - Β½)           # OK

Propagation of errors level 1 (Perceptron)

# Value is the unsquashed activation:
𝒗ₕ β‰œ βŒ‹(𝒂ₕ)
𝒗 = βŒ‹π’‚
# Error in output value from errors in bias and weights:
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * 𝒂ᡒ)
𝒗+𝒆 = 𝒃+𝜺 + (𝑾+𝜺')𝒂'
𝒆 = 𝒃+𝜺 + (𝑾+𝜺')𝒂'- 𝒗
𝒆 = 𝒃 + 𝜺 + 𝑾𝒂' + 𝜺'𝒂' - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂' + (𝒃 + 𝑾𝒂') - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂' + (𝒗) - 𝒗
𝒆 = 𝜺 + 𝜺'𝒂'
𝒆ₕ = πœΊβ‚• + πœΊβ±π’‚α΅’
𝒆ₕ = πœΊβ‚• + βˆ‘α΅’(𝜺ᡒ * 𝒂ᡒ)
# Assume equipartition of errors:
βˆ€β‚“{ πœΊβ‚“ = πœ€ }
𝒆ₕ = πœΊβ‚• + βˆ‘α΅’(𝜺ᡒ * 𝒂ᡒ)
   = πœ€ + βˆ‘α΅’(πœ€ * 𝒂ᡒ)
   = πœ€ + πœ€βˆ‘π’‚α΅’
   = πœ€(1 + βˆ‘π’‚α΅’)
𝒆ₕ = πœ€ * (1 + βˆ‘α΅’(𝒂ᡒ))
# Equipartitioned error level one
# Solve for πœ€:
πœ€ = 𝒆ₕ / 1+βˆ‘π’‚α΅’
πœ€ = 𝒆ₕ / (1 + βˆ‘α΅’(𝒂ᡒ))
### Mju: ######
𝝁ₕ β‰œ 1 + βˆ‘α΅’(𝒂ᡒ)
###############
𝝁 = 1+πŸ­π’‚'
πœ€ = 𝒆ₕ / 𝝁ₕ
πœ€ = 𝒆/𝝁
𝒆 = πœ€π
# Perceptron error:
𝒆ₕ = πœ€ * 𝝁ₕ
###########
# As an estimate, set 𝒂~Β½ and let 𝑁 be the number of terms in βˆ‘α΅’:
πœ€ ~ 𝒆 / (1 + ½𝑁)
# Or very roughly:
πœ€ ~ 2𝒆/𝑁
# Activation error
𝒂ₕ + πœΉβ‚• β‰œ ⌈(𝒗ₕ + 𝒆ₕ)
𝒂+𝜹 = ⌈ 𝒗+𝒆
    ~ βŒˆπ’— + π’†π““π’—βŒˆπ’—
    ~ βŒˆπ’— + π’†π“‘βŒˆπ’—
    ~ βŒˆπ’— + 𝒆𝓑𝒂
𝒂ₕ + πœΉβ‚• ~ 𝒂ₕ + (𝒆ₕ * 𝓑(𝒂ₕ))
        ~ 𝒂ₕ + (𝒆ₕ * (1 - 𝒂ₕ) * 𝒂ₕ)
πœΉβ‚• ~ 𝒆ₕ * (1 - 𝒂ₕ) * 𝒂ₕ
   ~ 𝒆ₕ * 𝓑(𝒂ₕ)
𝜹 ~ 𝒆𝓑𝒂
  ~ 𝒆(1-𝒂)𝒂
# Recall that 𝒆=πœ€π:
𝜹 ~ πœ€π(1-𝒂)𝒂
  ~ πœ€ππ“‘π’‚
πœΉβ‚• ~ πœ€ * 𝝁ₕ * 𝓑(𝒂ₕ)
### Activation error ######
πœΉβ‚• ~ πœ€ * 𝝁ₕ * (1 - 𝒂ₕ) * 𝒂ₕ
###########################
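
Here's a minimal Ruby sketch of these perceptron-level estimates (the names are mine): the equipartitioned πœ€ from an output error 𝒆, and the resulting activation error 𝜹:

inputs = [0.6, 0.2, 0.9]                 # a', the activations feeding the neuron
a      = 0.7                             # the neuron's own activation
e      = 0.05                            # desired correction to the neuron's value

mu      = 1.0 + inputs.sum               # mu = 1 + sum of a'
epsilon = e / mu                         # equipartitioned error per parameter
delta   = epsilon * mu * (1.0 - a) * a   # activation error: epsilon*mu*B(a) = e*B(a)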

Vanishing small errors

# Assume πœ€Β²~0
πœ€Β² ~ 0
# Consider πœ€πœΉ
πœ€ * πœΉβ‚• = πœ€ * πœ€ * 𝝁ₕ * 𝓑(𝒂ₕ)
       = πœ€Β²ππ“‘π’‚
       ~ 0 * 𝝁𝓑𝒂
πœ€πœΉ ~ 0
πœ€ * πœΉβ‚• ~ 0

Propagation of errors level 2

# The error 𝒆ₕ in a perceptron was derived from:
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * 𝒂ᡒ)
# For the next level correction in a multilayer perceptron (MLP), add 𝜹:
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + 𝜹ᡒ))
𝒗+𝒆 = 𝒃+𝜺 + (𝑾+𝜺')(𝒂'+𝜹')
    = 𝒃 + πœ€ + 𝑾𝒂' + π‘ΎπœΉ' + 𝜺'𝒂' + 𝜺'𝜹'
    ~ 𝒃 + πœ€ + 𝑾𝒂' + π‘ΎπœΉ' + 𝜺'𝒂'         # πœ€πœΉ vanishes
    ~ 𝒃 + 𝑾𝒂' + π‘ΎπœΉ' + πœ€ + 𝜺'𝒂'
    ~ 𝒗 + π‘ΎπœΉ' + πœ€ + 𝜺'𝒂'
𝒆 ~ π‘ΎπœΉ' + πœ€ + 𝜺'𝒂'
𝒆 ~ π‘ΎπœΉ' + πœ€(1+πŸ­π’‚')
𝒆 ~ π‘ΎπœΉ' + πœ€π
# MLP error(𝜹)
𝒆 ~ πœ€π + π‘ΎπœΉ'       # Same as level one with an extra +π‘ΎπœΉ'
############
# Recall 𝜹 ~ 𝒆𝓑𝒂:
𝒂+𝜹 = ⌈ 𝒗+𝒆
    ~ 𝒂 + 𝒆𝓑𝒂
𝜹 ~ 𝒆𝓑𝒂
# Substitute out 𝜹':
𝒆 ~ πœ€π + π‘ΎπœΉ'
  ~ πœ€π + 𝑾 𝒆'𝓑𝒂'
# MLP error, recursive... Strictly speaking, recursive delegation
𝒆 ~ πœ€π + 𝑾 𝓑𝒂'𝒆'   # See Neuronet::NeuronStats#nju
################
# Substitute out 𝒆':
𝒆 ~ πœ€π + 𝑾 𝓑𝒂'𝒆'
  ~ πœ€π + 𝑾 𝓑𝒂'(πœ€π' + 𝑾'𝜹")
  ~ πœ€π + 𝑾 𝓑𝒂'πœ€π' + 𝑾 𝓑𝒂'𝑾'𝜹"
  ~ πœ€π + πœ€π‘Ύ 𝓑𝒂'𝝁' + 𝑾 𝓑𝒂'𝑾'𝜹"          # reorder
  ~ πœ€(𝝁 + 𝑾 𝓑𝒂'𝝁') + 𝑾 𝓑𝒂'𝑾'𝜹"
# Introduce 𝜧 :
πœ§β‚•β±πα΅’ β‰œ βˆ‘α΅’ 𝑾ₕᡒ𝓑𝒂ᡒ𝝁ᡒ
𝜧 𝝁' = 𝑾 𝓑𝒂'𝝁'
# Substitute in 𝜧 :
𝒆 ~ πœ€(𝝁 + 𝑾 𝓑𝒂'𝝁') + 𝑾 𝓑𝒂'𝑾'𝜹"
  ~ πœ€(𝝁 + 𝜧 𝝁') + 𝜧 𝑾'𝜹"
# Equipartitioned error level two
# For level two, 𝜹"=0 (it's the input layer!)
## MLP (3-layer) error
𝒆 ~ πœ€(𝝁 + 𝜧 𝝁')
𝒆ₕ ~ πœ€ * (𝝁ₕ + πœ§β‚•β±πα΅’)
#####################
# Solve for πœ€:
πœ€ ~ 𝒆 / (𝝁 + 𝜧 𝝁')
πœ€β‚• ~ 𝒆ₕ / (𝝁ₕ + πœ§β‚•β±πα΅’)
# Notice that:
0 < 𝒂 < 1
0 < 𝓑𝒂 = (1-𝒂)𝒂 ≀ ΒΌ
# So there's an upper bound for 𝒆:
𝒆 ~ πœ€(𝝁 + 𝜧 𝝁')
  ~ πœ€(𝝁 + 𝑾 𝓑𝒂'𝝁')
|𝒆| < |πœ€(𝝁 + ¼𝑾 𝝁')|
# Assume 𝒂 is somewhat random about Β½ in a level of large size 𝑁:
𝝁 = 1+πŸ­π’‚'  β‡’  π”ͺ ~ 1+½𝑁 ~ ½𝑁
|𝒆| <~ |πœ€(π”ͺ + ΒΌπ”ͺ βˆ‘π‘Ύ)|
# Consider the case when weights are random plus or minus one.
# Let this be like a random walk of 𝑁 steps.
# Then βˆ‘π‘Ύ ~ βˆšπ‘:
|𝒆| <~ |πœ€(π”ͺ + ΒΌπ”ͺ βˆšπ‘)|
|𝒆| <~ |πœ€| π”ͺ(1 + ΒΌ βˆšπ‘)
|𝒆| <~ |πœ€| ΒΌ π”ͺ βˆšπ‘            # 𝑁 is large
|𝒆| <~ |πœ€| ΒΌ ½𝑁 βˆšπ‘ 
|𝒆| <~ |πœ€| 𝑁 βˆšπ‘/8
# If you don't believe the random walk and are pessimistic,
# you might prefer using 𝑁²:
|𝒆| <~ |πœ€| 𝑁²/8

Explicit propagation of errors level 2

𝒗ₕ β‰œ 𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ)
# Output Layerβ‚•, Middle Layerα΅’, Input Layerβ±Ό
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + 𝜹ᡒ))
𝒗ᡒ + 𝒆ᡒ β‰œ (𝒃ᡒ + 𝜺ᡒ) + βˆ‘β±Ό((𝑾ᡒⱼ + 𝜺ⱼ) * (𝒂ⱼ + 𝜹ⱼ))
𝒂ᡒ + 𝜹ᡒ β‰œ ⌈(𝒗ᡒ + 𝒆ᡒ)
        = ⌈((𝒃ᡒ + 𝜺ᡒ) + βˆ‘β±Ό((𝑾ᡒⱼ + 𝜺ⱼ) * (𝒂ⱼ + 𝜹ⱼ)))
        = ⌈(𝒃ᡒ + 𝜺ᡒ + βˆ‘β±Ό(𝑾ᡒⱼ*𝒂ⱼ + 𝜺ⱼ*𝒂ⱼ + 𝑾ᡒⱼ*𝜹ⱼ + 𝜺ⱼ*𝜹ⱼ))
        = ⌈(𝒃ᡒ + 𝜺ᡒ + 𝑾ᡒʲ𝒂ⱼ + πœΊΚ²π’‚β±Ό + π‘Ύα΅’Κ²πœΉβ±Ό + 𝜺ʲ𝜹ⱼ)
        = ⌈(𝒃ᡒ + 𝜺ᡒ + 𝑾ᡒʲ𝒂ⱼ + πœΊΚ²π’‚β±Ό + π‘Ύα΅’Κ²πœΉβ±Ό)      # 𝜺𝜹  vanishes
        = ⌈(𝒃ᡒ + 𝑾ᡒʲ𝒂ⱼ + 𝜺ᡒ + πœΊΚ²π’‚β±Ό + π‘Ύα΅’Κ²πœΉβ±Ό)
        = ⌈(𝒃ᡒ + 𝑾ᡒʲ𝒂ⱼ + πœ€ + πœ€βˆ‘π’‚β±Ό + π‘Ύα΅’Κ²πœΉβ±Ό)       # All 𝜺 are the same πœ€
        = ⌈(𝒃ᡒ + 𝑾ᡒʲ𝒂ⱼ + πœ€(1 + βˆ‘π’‚β±Ό) + π‘Ύα΅’Κ²πœΉβ±Ό)
        = ⌈(𝒃ᡒ + 𝑾ᡒʲ𝒂ⱼ + πœ€πα΅’ + π‘Ύα΅’Κ²πœΉβ±Ό)            # 𝝁ᡒ=1+βˆ‘π’‚β±Ό as 𝝁=1+πŸ­π’‚'
        ~ 𝒂ᡒ + (πœ€πα΅’ + π‘Ύα΅’Κ²πœΉβ±Ό) 𝓑𝒂ᡒ                 # ⌈(𝒗+𝒆) ~ 𝒂 + 𝒆𝓑𝒂
        ~ 𝒂ᡒ + (πœ€πα΅’ + π‘Ύα΅’Κ²πœΉβ±Ό)(1-𝒂ᡒ)𝒂ᡒ
𝒂ᡒ + 𝜹ᡒ ~ 𝒂ᡒ + (πœ€πα΅’ + βˆ‘β±Ό(𝑾ᡒⱼ * 𝜹ⱼ)) * (1 - 𝒂ᡒ) * 𝒂ᡒ
# Solve for 𝜹ᡒ:
𝜹ᡒ ~ (πœ€πα΅’ + βˆ‘β±Ό(𝑾ᡒⱼ * 𝜹ⱼ)) * (1 - 𝒂ᡒ) * 𝒂ᡒ
𝜹ᡒ ~ (πœ€πα΅’+π‘Ύα΅’Κ²πœΉβ±Ό)(1-𝒂ᡒ)𝒂ᡒ
𝜹ᡒ ~ πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ + π‘Ύα΅’Κ²πœΉβ±Ό(1-𝒂ᡒ)𝒂ᡒ
# Consider the case where the j-th level is error free input:
𝜹ᡒ ~ πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ   # 𝜹ⱼ is zero
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + 𝜹ᡒ)) # now substitute out 𝜹ᡒ...
        ~ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ))
        ~ 𝒃ₕ + πœΊβ‚• + 𝑾ₕⁱ(𝒂ᡒ + πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ) + 𝜺ⁱ(𝒂ᡒ + πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ)
        ~ 𝒃ₕ + πœΊβ‚• + 𝑾ₕⁱ𝒂ᡒ + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ + πœΊβ±π’‚α΅’ + πœΊβ±πœ€πα΅’(1-𝒂ᡒ)𝒂ᡒ
        ~ 𝒃ₕ + πœΊβ‚• + 𝑾ₕⁱ𝒂ᡒ + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ + πœΊβ±π’‚α΅’          # πœΊβ±πœ€ vanishes
        ~ 𝒃ₕ + 𝑾ₕⁱ𝒂ᡒ + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ + πœΊβ‚• + πœΊβ±π’‚α΅’          # reordered terms
        ~ 𝒗ₕ + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ + πœΊβ‚• + πœΊβ±π’‚α΅’
        ~ 𝒗ₕ + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ + πœ€(1+βˆ‘π’‚α΅’)
        ~ 𝒗ₕ + πœ€(1+βˆ‘π’‚α΅’) + πœ€π‘Ύβ‚•β±πα΅’(1-𝒂ᡒ)𝒂ᡒ         # reordered
        ~ 𝒗ₕ + πœ€πβ‚• + πœ€πœ§β‚•β±πα΅’                      # 𝜧 = 𝑾𝓑𝒂'
𝒗ₕ + 𝒆ₕ ~ 𝒗ₕ + πœ€(𝝁ₕ + πœ§β‚•β±πα΅’)
𝒆ₕ ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’)
πœ€ ~ 𝒆ₕ / (𝝁ₕ + πœ§β‚•β±πα΅’)
πœ€ ~ 𝒆 / (𝝁 + 𝜧 𝝁')           # OK!

Explicit propagation of errors level 3

# Given:
𝒂ₕ β‰œ ⌈(𝒗ₕ)                                         # 𝒂 β‰œ βŒˆπ’—
𝒂ₕ + πœΉβ‚• β‰œ ⌈(𝒗ₕ + 𝒆ₕ)                               # 𝒂+𝜹 β‰œ ⌈ 𝒗+𝒆
𝒗ₕ β‰œ 𝒃ₕ + βˆ‘α΅’(𝑾ₕᡒ * 𝒂ᡒ)                            # 𝒗 β‰œ 𝒃+𝑾𝒂'
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + 𝜹ᡒ))  # 𝒗+𝒆 β‰œ 𝒃+𝜺+(𝑾+𝜺')(𝒂'+𝜹')
𝝁ₕ β‰œ 1 + βˆ‘α΅’(𝒂ᡒ)                                    # 𝝁 β‰œ 1+πŸ­π’‚'
πœ§β‚•β±πα΅’ β‰œ βˆ‘α΅’(𝑾ₕᡒ * (1 - 𝒂ᡒ) * 𝒂ᡒ * 𝝁ᡒ)             # 𝜧𝝁' β‰œ 𝑾𝓑𝒂'𝝁'
# Assume:
βˆ€β‚“{ πœΊβ‚“ = πœ€ }
πœ€Β² ~ 0
πœ€πœΉ ~ 0
# Recall:
𝓓π‘₯(⌈(π‘₯)) = ⌈(π‘₯) * (1 - ⌈(π‘₯))
         = 𝓑(⌈(π‘₯))
⌈(π‘₯ + πœ€) ~ ⌈(π‘₯) + πœ€ * 𝓓π‘₯(⌈(π‘₯))
         ~ ⌈(π‘₯) + πœ€ * ⌈(π‘₯) * (1 - ⌈(π‘₯))
         ~ ⌈(π‘₯) + πœ€ * 𝓑(⌈(π‘₯))
# Note that one may transpose indices for each level:
β‚•β¬Œα΅’β¬Œβ±Όβ¬Œβ‚–
# Solve for level 3 πœ€.
## 𝜹ᡒ:
𝒂ᡒ + 𝜹ᡒ β‰œ ⌈(𝒗ᡒ + 𝒆ᡒ)
        ~ βŒˆπ’—α΅’ + 𝒆ᡒ * π“‘βŒˆπ’—α΅’
        ~ 𝒂ᡒ + 𝒆ᡒ * π“‘βŒˆπ’—α΅’
𝜹ᡒ ~ 𝒆ᡒ * π“‘βŒˆπ’—α΅’
   ~ 𝒆ᡒ * 𝓑𝒂ᡒ
𝜹ᡒ ~ 𝒆ᡒ * (1-𝒂ᡒ) * 𝒂ᡒ
## Expand first level and solve for 𝒆ₕ:
𝒗ₕ + 𝒆ₕ β‰œ (𝒃ₕ + πœΊβ‚•) + βˆ‘α΅’((𝑾ₕᡒ + 𝜺ᡒ) * (𝒂ᡒ + 𝜹ᡒ))
        = 𝒃ₕ+πœ€ + (𝑾ₕⁱ+𝜺ⁱ)(𝒂ᡒ+𝜹ᡒ)
        = 𝒃ₕ+πœ€ + 𝑾ₕⁱ𝒂ᡒ + πœΊβ±π’‚α΅’ + π‘Ύβ‚•β±πœΉα΅’ + 𝜺ⁱ𝜹ᡒ
        ~ 𝒃ₕ+πœ€ + 𝑾ₕⁱ𝒂ᡒ + πœΊβ±π’‚α΅’ + π‘Ύβ‚•β±πœΉα΅’            # 𝜺𝜹 vanishes
        ~ 𝒃ₕ+𝑾ₕⁱ𝒂ᡒ + πœ€+πœΊβ±π’‚α΅’ + π‘Ύβ‚•β±πœΉα΅’
        ~ 𝒗ₕ + πœ€+πœΊβ±π’‚α΅’ + π‘Ύβ‚•β±πœΉα΅’
𝒆ₕ ~ πœ€+πœΊβ±π’‚α΅’ + π‘Ύβ‚•β±πœΉα΅’
   ~ πœ€(1+βˆ‘π’‚α΅’) + π‘Ύβ‚•β±πœΉα΅’
   ~ πœ€πβ‚• + π‘Ύβ‚•β±πœΉα΅’
## Substitute out 𝜹ᡒ:
𝒆ₕ ~ πœ€πβ‚• + π‘Ύβ‚•β±πœΉα΅’
   ~ πœ€πβ‚• + 𝑾ₕⁱ𝒆ᡒ𝓑𝒂ᡒ
   ~ πœ€πβ‚• + 𝑾ₕⁱ𝓑𝒂ᡒ𝒆ᡒ
## Substitute out 𝒆ᡒ:
𝒆ₕ ~ πœ€πβ‚• + 𝑾ₕⁱ𝓑𝒂ᡒ𝒆ᡒ
   ~ πœ€πβ‚• + 𝑾ₕⁱ𝓑𝒂ᡒ(πœ€πα΅’ + π‘Ύα΅’Κ²πœΉβ±Ό)                   # 𝒆 ~ πœ€π+π‘ΎπœΉ'
   ~ πœ€πβ‚• + π‘Ύβ‚•β±π“‘π’‚α΅’πœ€πα΅’ + π‘Ύβ‚•β±π“‘π’‚α΅’π‘Ύα΅’Κ²πœΉβ±Ό
   ~ πœ€πβ‚• + πœ€π‘Ύβ‚•β±π“‘π’‚α΅’πα΅’ + π‘Ύβ‚•β±π“‘π’‚α΅’π‘Ύα΅’Κ²πœΉβ±Ό               # factor out constant πœ€
   ~ πœ€πβ‚• + πœ€πœ§β‚•β±πα΅’ + πœ§β‚•β±π‘Ύα΅’Κ²πœΉβ±Ό                     # 𝜧 = 𝑾𝓑𝒂'
# Level 2 plus an additional term due to 𝜹ⱼ:
𝒆ₕ ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ§β‚•β±π‘Ύα΅’Κ²πœΉβ±Ό
# Recall that in level 2, 𝜹ⱼ was zero, but level three continues...
𝒆ₕ ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ§β‚•β±π‘Ύα΅’Κ²πœΉβ±Ό
   ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ§β‚•β±π‘Ύα΅’Κ²π“‘π’‚β±Όπ’†β±Ό                 # 𝜹 ~ 𝓑𝒂𝒆
   ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ§β‚•β±πœ§α΅’Κ²π’†β±Ό
   ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ§β‚•β±πœ§α΅’Κ²(πœ€πβ±Ό+π‘Ύβ±Όα΅πœΉβ‚–)           # 𝒆 ~ πœ€π+π‘ΎπœΉ'
   ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’) + πœ€πœ§β‚•β±πœ§α΅’Κ²πβ±Ό + πœ§β‚•β±πœ§α΅’Κ²π‘Ύβ±Όα΅πœΉβ‚–
   ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’ + πœ§β‚•β±πœ§α΅’Κ²πβ±Ό) + πœ§β‚•β±πœ§α΅’Κ²π‘Ύβ±Όα΅πœΉβ‚–
# For level three, πœΉβ‚– is zero:
𝒆ₕ ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’ + πœ§β‚•β±πœ§α΅’Κ²πβ±Ό)                    # 𝒆 ~ πœ€(𝝁+𝜧𝝁'+𝜧𝜧'𝝁")

General propagation of errors

# The above establishes a clear pattern:
𝒆ₕ ~ πœ€(𝝁ₕ + πœ§β‚•β±πα΅’ + πœ§β‚•β±πœ§α΅’Κ²πβ±Ό + πœ§β‚•β±πœ§α΅’Κ²πœ§β±Όα΅πβ‚– + ...)
𝒆 ~ πœ€(𝝁 + 𝜧 𝝁' + 𝜧 𝜧'𝝁" + 𝜧 𝜧'𝜧"𝝁"' + ...)
# Error bound estimate:
0 < 𝒂 < 1
0 < 𝓑𝒂 = (1-𝒂)𝒂 ≀ ΒΌ
|𝓑𝒂| ~ ΒΌ
|𝒂| ~ Β½
|𝝁| ~ 1+βˆ‘|𝒂'|
    ~ 1+βˆ‘Β½
    ~ 1+½𝑁 β‰œ π”ͺ
|βˆ‘π‘Ύ| ~ βˆšπ‘ # random walk
|𝜧| ~ |𝑾||𝓑𝒂|
    ~ ΒΌβˆšπ‘
|𝒆| ~ |πœ€|(|𝝁 + 𝜧 𝝁' + 𝜧 𝜧'𝝁" + 𝜧 𝜧'𝜧"𝝁"' + ...|)
    ~ |πœ€|(π”ͺ + ΒΌβˆšπ‘*π”ͺ' + ΒΌβˆšπ‘*ΒΌβˆšπ‘'*π”ͺ" + ...)
# See Neuronet::NetworkStats#expected_nju:
|𝝂| β‰œ |𝒆|/|πœ€|
|𝝂| = π”ͺ + ΒΌβˆšπ‘*π”ͺ' + ΒΌβˆšπ‘*ΒΌβˆšπ‘'*π”ͺ" + ...
# Consider very large 𝑁 on each level in an 𝑛+2 layer network:
|𝒆| ~ |πœ€|½𝑁(ΒΌβˆšπ‘)ⁿ
# For a 3-layer network (input, middle, and output layers), 𝑛=1:
|𝒆| ~ |πœ€|π”ͺ(1 + |𝜧|)
    ~ |πœ€|π‘βˆšπ‘ / 8 # 𝑁>>1, large 𝑁
|πœ€| ~ 8|𝒆| / π‘βˆšπ‘ # 𝑁>>1

Legacy

# In trying to find the recursion pattern,
# I came across several interesting expressions.
# I define them all here, including the ones actually used above:
𝓑𝒂 β‰œ 𝒂(1-𝒂)
𝒂 β‰œ βŒˆπ’—
𝒗 β‰œ 𝒃 + 𝑾 𝒂'
𝒂 = ⌈ 𝒃+𝑾𝒂'
𝒂+𝜹 β‰œ ⌈(𝒗+𝒆)
𝒗 = βŒ‹π’‚
𝒗+𝒆 β‰œ 𝒃+𝜺 + (𝑾+𝜺')(𝒂'+𝜹')
𝝁 β‰œ 1+πŸ­π’‚'
𝜧 𝝁' β‰œ 𝑾 𝓑𝒂'𝝁'
# Legacy:
𝝀 β‰œ 𝓑𝒂 𝝁
𝜿 β‰œ 𝜧 𝝁' = 𝑾 𝓑𝒂'𝝁' = 𝑾 𝝀'
𝜾 β‰œ 𝜧 𝜿' = 𝜧 𝜧'𝝁"