Neuronet wiki
Here's a quick review of the math. Please bear with the terse notation; the algebra gets gnarly.
Syntax
Operator precedence is as in ruby:
- Unary right binding operators
- *, /
- +, -
- =
But I add spacing to create groups:
π + π/π + π = π + (π/π) + π
π+π / π+π = (π+π) / (π+π)
The above spacing rule reduces the number of symbols needed to show structure and makes the algebra less cluttered.
The product, *, may be implied:
π*π = π π = ππ
(π+π)*(π+π) = π+π π+π
π₯Β² = π₯π₯ = π₯*π₯
Definitions are set by β and consequent equivalences by =.
I may use Einstein notation. And once indices are shown, they may be dropped:
ββ(πΎβ*πβ) β πΎβΏπβ = πΎπ
Be aware of these rules.
Style
Referencing Wikipedia's Mathematical operators and symbols in Unicode and Unicode subscripts and superscripts:
- Italic small, π..π§: scalar variables
- Bold italic small, π..π: single-indexed variables, vectors
- Bold italic capital, π¨..π: multi-indexed variables, matrices
- Bold script capital, π..π©: operators, like ππ₯
- Double struck small, π..π«: finite ordered lists
- Bold Fraktur small, π..π: derived constant parameters
Next level unary postfix operator
Consider a value in a collection of π in level h dependent on values in
collection of π in level i:
πβ β β(πβ + βα΅’(πΎβα΅’ * πα΅’))
The index β enumerates values of π in level h, whereas α΅’ enumerates
values of π in level i. The levels are labeled alphabetically:
{...,β,α΅’,β±Ό,β,β,β,β,β,β,...}
I'll want to express the relation between levels without specifying the level. Given the above, please allow:
π = β(π + πΎ π')
π = β π+πΎ(π')
π = β π+πΎπ'
Binary competition
In "The Math of Species Conflict" (Numberphile), the following function is referred to as "binary competition":
π(π₯) β π₯ * (1 - π₯)
This form occurs in the derivative of the squash function, and so I'll use π in its expression.
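As a sanity check, here's a minimal Ruby sketch of binary competition (the method name bc is mine, chosen just for illustration):
# Binary competition: π(π₯) = π₯(1-π₯), maximal (ΒΌ) at π₯ = Β½.
def bc(x)
  x * (1.0 - x)
end

bc(0.5)  #=> 0.25
bc(0.25) #=> 0.1875
bc(1.0)  #=> 0.0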
Squash
# Please let:
β(π₯) β Math.exp(π₯)
# Define the squash function:
β(π₯) β 1 / (1 + Math.exp(-π₯))
β(π₯) = 1 / (1 + β(-π₯))
βπ₯ = 1 / 1+β-π₯
= βπ₯ / βπ₯+1
βπ₯ = βπ₯ / 1+βπ₯
β(π₯) = β(π₯) / (1 + β(π₯)) # Alternate definition of squash
# Equivalence 1-βπ₯ = β-π₯
1 - β(π₯) = 1 - (β(π₯) / (1 + β(π₯)))
1-βπ₯ = 1 - βπ₯ / 1+βπ₯
= βπ₯+1-βπ₯ / 1+βπ₯
= 1 / 1+βπ₯
1-βπ₯ = β-π₯
1 - β(π₯) = β(-π₯)
# Equivalence β-π₯ = 1-βπ₯
β(-π₯) = 1 - β(π₯)
β-π₯ = 1-βπ₯
# Equivalence βπ₯ = 1-β-π₯
β(π₯) = 1 - β(-π₯)
βπ₯ = 1-β-π₯
# Derivative:
ππ₯(β(π₯)) = ππ₯(1 / (1 + β(-π₯)))
ππ₯βπ₯ = ππ₯(1 / 1+β-π₯)
= 1/(1+β-π₯)Β² -ππ₯β-π₯
= 1/(1+β-π₯)Β² β-π₯
= β-π₯/(1+β-π₯)Β²
= β-π₯/(1+β-π₯) 1/(1+β-π₯)
= β-π₯/(1+β-π₯) βπ₯
= 1/(βπ₯+1) βπ₯
= 1/(1+βπ₯) βπ₯
= β-π₯ βπ₯
ππ₯βπ₯ = 1-βπ₯ βπ₯
ππ₯(β(π₯)) = (1 - β(π₯)) * β(π₯)
= π(β(π₯))
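The derivative result can be checked numerically. Here's a small Ruby sketch, assuming the helper names squash and bc (mine, for illustration only):
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def bc(x)
  x * (1.0 - x)
end

# Compare a centered finite difference of squash against π(β(π₯)):
x = 0.7
h = 1.0e-6
numeric  = (squash(x + h) - squash(x - h)) / (2.0 * h)
analytic = bc(squash(x))
(numeric - analytic).abs < 1.0e-9 #=> true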
Unsquash
# Please let:
β(π₯) β Math.log(π₯)
# Recall that Log and Exp are inverses:
β(β(π₯)) = π₯
ββπ₯ = π₯
# Recall that Log(1)=0
β(1) = 0
# Define the unsquash function:
β(π₯) β Math.log(π₯ / (1 - π₯))
β(π₯) = β(π₯ / (1 - π₯))
βπ₯ = β π₯/(1-π₯)
# Show that unsquash is the inverse of squash:
β(β(π₯)) = β(β(π₯))
ββπ₯ = β βπ₯
= β βπ₯/(1-βπ₯) # by definition of unsquash, it's the log of...
= ββπ₯ - β 1-βπ₯
= β βπ₯/(βπ₯+1) - β 1-βπ₯ # by alternate definition of squash.
= ββπ₯ - β βπ₯+1 - β 1-βπ₯
= π₯ - β βπ₯+1 - β 1-βπ₯
= π₯ - β βπ₯+1 - β 1-βπ₯/(βπ₯+1)
= π₯ - β βπ₯+1 - β (βπ₯+1-βπ₯)/(βπ₯+1)
= π₯ - β βπ₯+1 - β 1/(βπ₯+1)
= π₯ - β βπ₯+1 - (β1 - β βπ₯+1)
= π₯ - β βπ₯+1 - (0 - β βπ₯+1)
= π₯ - β βπ₯+1 - (-β βπ₯+1)
= π₯ - β βπ₯+1 + β βπ₯+1
ββπ₯ = π₯
β(β(π₯)) = π₯
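And a quick numerical check of the inverse relation in Ruby (again, the method names are mine):
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def unsquash(x)
  Math.log(x / (1.0 - x))
end

(unsquash(squash(1.5)) - 1.5).abs < 1.0e-12   #=> true
(squash(unsquash(0.25)) - 0.25).abs < 1.0e-12 #=> true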
Activation and value of a neuron
# The activation of the h-th Neuron (in level h, connecting to level i):
πβ β β(πβ + βα΅’(πΎβα΅’ * πα΅’))
= β πβ+πΎβ±πα΅’
π = β π+πΎπ'
βπ = π+πΎπ'
βπβ = πβ+πΎβ±πα΅’
β(πβ) = πβ + βα΅’(πΎβα΅’ * πα΅’)
# The value of the h-th Neuron is the unsquashed activation:
πβ = β(πβ)
= πβ + βα΅’(πΎβα΅’ * πα΅’)
π = π + πΎ π'
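Putting the definitions together, here's a standalone Ruby sketch of one neuron's value and activation (a toy illustration under the above definitions, not the gem's Neuron class):
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

def unsquash(x)
  Math.log(x / (1.0 - x))
end

bias    = 0.1              # πβ
weights = [0.5, -0.3, 0.2] # πΎβα΅’
inputs  = [0.9, 0.4, 0.7]  # πα΅’, activations in level i

value      = bias + weights.zip(inputs).sum { |w, a| w * a } # πβ
activation = squash(value)                                   # πβ
(unsquash(activation) - value).abs < 1.0e-12 #=> true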
Mirroring
# The bias and weight of a neuron that roughly mirrors the value of another:
π§ β {-1, 0, 1}
π β β(π + (π * π)) = β π+π*π
π§ β β(π) = βπ
# Notice that:
π = β(π§) = {β(-1), β(0), β(1)}
β(0) = β0 = Β½
# Find the bias and weight:
π§ = ββ(π + (π * π))
= ββπ+ππ
= ββ π+πβπ§
= π+πβπ§
π§ = π + (π * β(π§))
# Set the value to zero:
0 = π + πβ(0)
0 = π+πβ0
π = -πβ0
π = -Β½π
π = -2π
# Set the value to one and substitute the bias:
1 = π + πβ(1)
1 = π+πβ1
1 = -Β½π+πβ1
1 = π(β1 - Β½)
π = 1 / (β1 - Β½)
π = Β½ / (Β½ - β1)
# Verify this works when value is negative one:
-1 = π + (π * β(-1))
-1 = π + πβ-1
-1 = -Β½π + πβ-1
-1 = -Β½π + π(1-β1)
-1 = -Β½π + π - πβ1
-1 = Β½π - πβ1
1 = πβ1 - Β½π
1 = π(β1 - Β½)
π = 1 / (β1 - Β½)
π = 1 / (β(1) - Β½) # OK
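A Ruby sketch of the mirroring constants, with a check over the three target values (the variable names weight and bias are mine; whether the gem exposes these constants is not asserted here):
def squash(x)
  1.0 / (1.0 + Math.exp(-x))
end

weight = 1.0 / (squash(1.0) - 0.5) # π = 1 / (β1 - Β½), roughly 4.33
bias   = -0.5 * weight             # π = -Β½π

[-1.0, 0.0, 1.0].map do |v|
  bias + weight * squash(v)        # value of the mirroring neuron
end
#=> roughly [-1.0, 0.0, 1.0]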
Propagation of errors level 1 (Perceptron)
# Value is the unsquashed activation:
πβ β β(πβ)
π = βπ
# Error in output value from errors in bias and weights:
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * πα΅’)
π+π = π+πΊ + (πΎ+πΊ')π'
π = π+πΊ + (πΎ+πΊ')π'- π
π = π + πΊ + πΎπ' + πΊ'π' - π
π = πΊ + πΊ'π' + (π + πΎπ') - π
π = πΊ + πΊ'π' + (π) - π
π = πΊ + πΊ'π'
πβ = πΊβ + πΊβ±πα΅’
πβ = πΊβ + βα΅’(πΊα΅’ * πα΅’)
# Assume equipartition of errors:
ββ{ πΊβ = π }
πβ = πΊβ + βα΅’(πΊα΅’ * πα΅’)
= π + βα΅’(π * πα΅’)
= π + πβπα΅’
= π(1 + βπα΅’)
πβ = π * (1 + βα΅’(πα΅’))
# Equipartitioned error level one
# Solve for π:
π = πβ / 1+βπα΅’
π = πβ / (1 + βα΅’(πα΅’))
### Mju: ######
πβ β 1 + βα΅’(πα΅’)
###############
π = 1+ππ'
π = πβ / πβ
π = π/π
π = ππ
# Perceptron error:
πβ = π * πβ
###########
# As an estimate, set π~Β½ and let π be the number of terms in βα΅’ (the size of level i):
π ~ π / (1 + Β½π)
# Or very roughly:
π ~ 2π/π
# Activation error
πβ + πΉβ β β(πβ + πβ)
π+πΉ = β π+π
~ βπ + πππβπ
~ βπ + ππβπ
~ βπ + πππ
πβ + πΉβ ~ πβ + (πβ * π(πβ))
~ πβ + (πβ * (1 - πβ) * πβ)
πΉβ ~ πβ * (1 - πβ) * πβ
~ πβ * π(πβ)
πΉ ~ πππ
~ π(1-π)π
# Recall that π=ππ:
πΉ ~ ππ(1-π)π
~ ππππ
πΉβ ~ π * πβ * π(πβ)
### Activation error ######
πΉβ ~ π * πβ * (1 - πβ) * πβ
###########################
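A numeric illustration of the level-one estimates in Ruby: given an output-value error, compute π, the equipartitioned error π, and the activation error πΉ (toy numbers; all names are mine, for illustration only):
def bc(x)
  x * (1.0 - x)
end

inputs = [0.8, 0.3, 0.6, 0.5] # πα΅’, activations of level i
error  = 0.1                  # πβ, the error in the output value

mju     = 1.0 + inputs.sum    # πβ = 1 + βπα΅’ = 3.2
epsilon = error / mju         # π = πβ/πβ = 0.03125
activation = 0.7              # πβ, the neuron's current activation
delta = epsilon * mju * bc(activation) # πΉβ ~ π * πβ * π(πβ)
#=> roughly 0.021, i.e. error * bc(activation)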
Vanishingly small errors
# Assume πΒ²~0
πΒ² ~ 0
# Consider ππΉ
π * πΉβ = π * π * πβ * π(πβ)
= πΒ²πππ
~ 0 * πππ
ππΉ ~ 0
π * πΉβ ~ 0
Propagation of errors level 2
# The error πβ in a perceptron was derived from:
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * πα΅’)
# For the next-level correction in a multilayer perceptron (MLP), add πΉ:
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + πΉα΅’))
π+π = π+πΊ + (πΎ+πΊ')(π'+πΉ')
= π + π + πΎπ' + πΎπΉ' + πΊ'π' + πΊ'πΉ'
~ π + π + πΎπ' + πΎπΉ' + πΊ'π' # ππΉ vanishes
~ π + πΎπ' + πΎπΉ' + π + πΊ'π'
~ π + πΎπΉ' + π + πΊ'π'
π ~ πΎπΉ' + π + πΊ'π'
π ~ πΎπΉ' + π(1+ππ')
π ~ πΎπΉ' + ππ
# MLP error(πΉ)
π ~ ππ + πΎπΉ' # Same as level one with an extra +πΎπΉ'
############
# Recall πΉ ~ πππ:
π+πΉ = β π+π
~ π + πππ
πΉ ~ πππ
# Substitute out πΉ':
π ~ ππ + πΎπΉ'
~ ππ + πΎ π'ππ'
# MLP error, recursive... Strictly speaking, recursive delegation
π ~ ππ + πΎ ππ'π' # See Neuronet::NeuronStats#nju
################
# Substitute out π':
π ~ ππ + πΎ ππ'π'
~ ππ + πΎ ππ'(ππ' + πΎ'πΉ")
~ ππ + πΎ ππ'ππ' + πΎ ππ'πΎ'πΉ"
~ ππ + ππΎ ππ'π' + πΎ ππ'πΎ'πΉ" # reorder
~ π(π + πΎ ππ'π') + πΎ ππ'πΎ'πΉ"
# Introduce π§ :
π§ββ±πα΅’ β βα΅’ πΎβα΅’ππα΅’πα΅’
π§ π' = πΎ ππ'π'
# Substitute in π§ :
π ~ π(π + πΎ ππ'π') + πΎ ππ'πΎ'πΉ"
~ π(π + π§ π') + π§ πΎ'πΉ"
# Equipartitioned error level two
# For level two, πΉ"=0 (it's the input layer!)
## MLP (3-layer) error
π ~ π(π + π§ π')
πβ ~ π * (πβ + π§ββ±πα΅’)
#####################
# Solve for π:
π ~ π / (π + π§ π')
πβ ~ πβ / (πβ + π§ββ±πα΅’)
# Notice that:
0 < π < 1
0 < ππ=(1-π)π < (0.25 = ΒΌ)
# So there's an upper bound for π:
π ~ π(π + π§ π')
~ π(π + πΎ ππ'π')
|π| < |π(π + ΒΌπΎ π')|
# Assume π is somewhat random about 0.5 = Β½ in a level of large size π:
π = 1+ππ' β πͺ ~ 1+Β½π ~ Β½π
|π| <~ |π(πͺ + ΒΌπͺ βπΎ)|
# Consider the case when weights are random plus or minus one.
# Let this be like a random walk of π steps.
# Then βπΎ ~ βπ:
|π| <~ |π(πͺ + ΒΌπͺ βπ)|
|π| <~ |π| πͺ(1 + ΒΌ βπ)
|π| <~ |π| ΒΌ πͺ βπ # π is large
|π| <~ |π| ΒΌ Β½π βπ
|π| <~ |π| π βπ/8
# If you don't believe the random walk and are pessimistic,
# you might prefer using πΒ²:
|π| <~ |π| πΒ²/8
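The recursive form above is what the wiki attributes to Neuronet::NeuronStats#nju. Here's only a standalone Ruby sketch of that recursion over a toy neuron structure (the Struct and method names are mine; I'm not claiming this is the gem's implementation):
def bc(x)
  x * (1.0 - x)
end

# Toy neuron for illustration: an activation and connections [[weight, lower_neuron], ...].
# Input neurons have no connections.
Neuron = Struct.new(:activation, :connections)

# πβ = 1 + βα΅’ πα΅’
def mju(neuron)
  1.0 + neuron.connections.sum { |_w, n| n.activation }
end

# π ~ π * nju, with nju = π + π§ π' + π§ π§'π" + ...
# Recursively: nju = π + βα΅’ πΎα΅’ π(πα΅’) nju(α΅’), terminating at input neurons.
def nju(neuron)
  return 0.0 if neuron.connections.empty? # inputs contribute no further terms
  mju(neuron) + neuron.connections.sum { |w, n| w * bc(n.activation) * nju(n) }
end

# Usage on a tiny 3-layer chain:
i1  = Neuron.new(0.8, [])
i2  = Neuron.new(0.4, [])
mid = Neuron.new(0.6, [[0.5, i1], [-0.3, i2]])
out = Neuron.new(0.7, [[1.2, mid]])
epsilon = 0.1 / nju(out) # equipartitioned error π for an output error of 0.1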
Explicit propagation of errors level 2
πβ β πβ + βα΅’(πΎβα΅’ * πα΅’)
# Output Layerβ, Middle Layerα΅’, Input Layerβ±Ό
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + πΉα΅’))
πα΅’ + πα΅’ β (πα΅’ + πΊα΅’) + ββ±Ό((πΎα΅’β±Ό + πΊβ±Ό) * (πβ±Ό + πΉβ±Ό))
πα΅’ + πΉα΅’ β β(πα΅’ + πα΅’)
= β((πα΅’ + πΊα΅’) + ββ±Ό((πΎα΅’β±Ό + πΊβ±Ό) * (πβ±Ό + πΉβ±Ό)))
= β(πα΅’ + πΊα΅’ + ββ±Ό(πΎα΅’β±Ό*πβ±Ό + πΊβ±Ό*πβ±Ό + πΎα΅’β±Ό*πΉβ±Ό + πΊβ±Ό*πΉβ±Ό))
= β(πα΅’ + πΊα΅’ + πΎα΅’Κ²πβ±Ό + πΊΚ²πβ±Ό + πΎα΅’Κ²πΉβ±Ό + πΊΚ²πΉβ±Ό)
= β(πα΅’ + πΊα΅’ + πΎα΅’Κ²πβ±Ό + πΊΚ²πβ±Ό + πΎα΅’Κ²πΉβ±Ό) # πΊπΉ vanishes
= β(πα΅’ + πΎα΅’Κ²πβ±Ό + πΊα΅’ + πΊΚ²πβ±Ό + πΎα΅’Κ²πΉβ±Ό)
= β(πα΅’ + πΎα΅’Κ²πβ±Ό + π + πβπβ±Ό + πΎα΅’Κ²πΉβ±Ό) # All πΊ are the same π
= β(πα΅’ + πΎα΅’Κ²πβ±Ό + π(1 + βπβ±Ό) + πΎα΅’Κ²πΉβ±Ό)
= β(πα΅’ + πΎα΅’Κ²πβ±Ό + ππα΅’ + πΎα΅’Κ²πΉβ±Ό) # πα΅’=1+βπβ±Ό as π=1+ππ'
~ πα΅’ + (ππα΅’ + πΎα΅’Κ²πΉβ±Ό) ππα΅’ # β(π+π) ~ π + πππ
~ πα΅’ + (ππα΅’ + πΎα΅’Κ²πΉβ±Ό)(1-πα΅’)πα΅’
πα΅’ + πΉα΅’ ~ πα΅’ + (ππα΅’ + ββ±Ό(πΎα΅’β±Ό * πΉβ±Ό)) * (1 - πα΅’) * πα΅’
# Solve for πΉα΅’:
πΉα΅’ ~ (ππα΅’ + ββ±Ό(πΎα΅’β±Ό * πΉβ±Ό)) * (1 - πα΅’) * πα΅’
πΉα΅’ ~ (ππα΅’+πΎα΅’Κ²πΉβ±Ό)(1-πα΅’)πα΅’
πΉα΅’ ~ ππα΅’(1-πα΅’)πα΅’ + πΎα΅’Κ²πΉβ±Ό(1-πα΅’)πα΅’
# Consider the case where the j-th level is error-free input:
πΉα΅’ ~ ππα΅’(1-πα΅’)πα΅’ # πΉβ±Ό is zero
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + πΉα΅’)) # now substitute out πΉα΅’...
~ (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + ππα΅’(1-πα΅’)πα΅’))
~ πβ + πΊβ + πΎββ±(πα΅’ + ππα΅’(1-πα΅’)πα΅’) + πΊβ±(πα΅’ + ππα΅’(1-πα΅’)πα΅’)
~ πβ + πΊβ + πΎββ±πα΅’ + ππΎββ±πα΅’(1-πα΅’)πα΅’ + πΊβ±πα΅’ + πΊβ±ππα΅’(1-πα΅’)πα΅’
~ πβ + πΊβ + πΎββ±πα΅’ + ππΎββ±πα΅’(1-πα΅’)πα΅’ + πΊβ±πα΅’ # πΊβ±π vanishes
~ πβ + πΎββ±πα΅’ + ππΎββ±πα΅’(1-πα΅’)πα΅’ + πΊβ + πΊβ±πα΅’ # reordered terms
~ πβ + ππΎββ±πα΅’(1-πα΅’)πα΅’ + πΊβ + πΊβ±πα΅’
~ πβ + ππΎββ±πα΅’(1-πα΅’)πα΅’ + π(1+βπα΅’)
~ πβ + π(1+βπα΅’) + ππΎββ±πα΅’(1-πα΅’)πα΅’ # reordered
~ πβ + ππβ + ππ§ββ±πα΅’ # π§ = πΎππ'
πβ + πβ ~ πβ + π(πβ + π§ββ±πα΅’)
πβ ~ π(πβ + π§ββ±πα΅’)
π ~ πβ / (πβ + π§ββ±πα΅’)
π ~ π / (π + π§ π') # OK!
Explicit propagation of errors level 3
# Given:
πβ β β(πβ) # π β βπ
πβ + πΉβ β β(πβ + πβ) # π+πΉ β β π+π
πβ β πβ + βα΅’(πΎβα΅’ * πα΅’) # π β π+πΎπ'
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + πΉα΅’)) # π+π β π+πΊ+(πΎ+πΊ')(π'+πΉ')
πβ β 1 + βα΅’(πα΅’) # π β 1+ππ'
π§ββ±πα΅’ β βα΅’(πΎβα΅’ * (1 - πα΅’) * πα΅’ * πα΅’) # π§π' β πΎππ'π'
# Assume:
ββ{ πΊβ = π }
πΒ² ~ 0
ππΉ ~ 0
# Recall:
ππ₯(β(π₯)) = β(π₯) * (1 - β(π₯))
= π(β(π₯))
β(π₯ + π) ~ β(π₯) + π * ππ₯(β(π₯))
~ β(π₯) + π * β(π₯) * (1 - β(π₯))
~ β(π₯) + π * π(β(π₯))
# Note that one may transpose indices for each level:
ββ¬α΅’β¬β±Όβ¬β
# Solve for level 3 π.
## πΉα΅’:
πα΅’ + πΉα΅’ β β(πα΅’ + πα΅’)
~ βπα΅’ + πα΅’ * πβπα΅’
~ πα΅’ + πα΅’ * πβπα΅’
πΉα΅’ ~ πα΅’ * πβπα΅’
~ πα΅’ * ππα΅’
πΉα΅’ ~ πα΅’ * (1-πα΅’) * πα΅’
## Expand first level and solve for πβ:
πβ + πβ β (πβ + πΊβ) + βα΅’((πΎβα΅’ + πΊα΅’) * (πα΅’ + πΉα΅’))
= πβ+π + (πΎββ±+πΊβ±)(πα΅’+πΉα΅’)
= πβ+π + πΎββ±πα΅’ + πΊβ±πα΅’ + πΎββ±πΉα΅’ + πΊβ±πΉα΅’
~ πβ+π + πΎββ±πα΅’ + πΊβ±πα΅’ + πΎββ±πΉα΅’ # πΊπΉ vanishes
~ πβ+πΎββ±πα΅’ + π+πΊβ±πα΅’ + πΎββ±πΉα΅’
~ πβ + π+πΊβ±πα΅’ + πΎββ±πΉα΅’
πβ ~ π+πΊβ±πα΅’ + πΎββ±πΉα΅’
~ π(1+βπα΅’) + πΎββ±πΉα΅’
~ ππβ + πΎββ±πΉα΅’
## Substitute out πΉα΅’:
πβ ~ ππβ + πΎββ±πΉα΅’
~ ππβ + πΎββ±πα΅’ππα΅’
~ ππβ + πΎββ±ππα΅’πα΅’
## Substitute out πα΅’:
πβ ~ ππβ + πΎββ±ππα΅’πα΅’
~ ππβ + πΎββ±ππα΅’(ππα΅’ + πΎα΅’Κ²πΉβ±Ό) # π ~ ππ+πΎπΉ'
~ ππβ + πΎββ±ππα΅’ππα΅’ + πΎββ±ππα΅’πΎα΅’Κ²πΉβ±Ό
~ ππβ + ππΎββ±ππα΅’πα΅’ + πΎββ±ππα΅’πΎα΅’Κ²πΉβ±Ό # factor out constant π
~ ππβ + ππ§ββ±πα΅’ + π§ββ±πΎα΅’Κ²πΉβ±Ό # π§ = πΎππ'
# Level 2 plus an additional term due to πΉβ±Ό:
πβ ~ π(πβ + π§ββ±πα΅’) + π§ββ±πΎα΅’Κ²πΉβ±Ό
# Recall that in level 2, πΉβ±Ό was zero, but level three continues...
πβ ~ π(πβ + π§ββ±πα΅’) + π§ββ±πΎα΅’Κ²πΉβ±Ό
~ π(πβ + π§ββ±πα΅’) + π§ββ±πΎα΅’Κ²ππβ±Όπβ±Ό # πΉ ~ πππ
~ π(πβ + π§ββ±πα΅’) + π§ββ±π§α΅’Κ²πβ±Ό
~ π(πβ + π§ββ±πα΅’) + π§ββ±π§α΅’Κ²(ππβ±Ό+πΎβ±Όα΅πΉβ) # π ~ ππ+πΎπΉ'
~ π(πβ + π§ββ±πα΅’) + ππ§ββ±π§α΅’Κ²πβ±Ό + π§ββ±π§α΅’Κ²πΎβ±Όα΅πΉβ
~ π(πβ + π§ββ±πα΅’ + π§ββ±π§α΅’Κ²πβ±Ό) + π§ββ±π§α΅’Κ²πΎβ±Όα΅πΉβ
# For level three, πΉβ is zero:
πβ ~ π(πβ + π§ββ±πα΅’ + π§ββ±π§α΅’Κ²πβ±Ό) # π ~ π(π+π§π'+π§π§'π")
General propagation of errors
# The above establishes a clear pattern:
πβ ~ π(πβ + π§ββ±πα΅’ + π§ββ±π§α΅’Κ²πβ±Ό + π§ββ±π§α΅’Κ²π§β±Όα΅πβ + ...)
π ~ π(π + π§ π' + π§ π§'π" + π§ π§'π§"π"' + ...)
# Error bound estimate:
0 < π < 1
0 < ππ=(1-π)π < 0.25 = ΒΌ
|ππ| ~ ΒΌ
|π| ~ Β½
|π| ~ 1+β|π'|
~ 1+βΒ½
~ 1+Β½π β πͺ
|βπΎ| ~ βπ # random walk
|π§| ~ |πΎ||ππ|
~ ΒΌβπ
|π| ~ |π|(|π + π§ π' + π§ π§'π" + π§ π§'π§"π"' + ...|)
~ |π|(πͺ + ΒΌβπ*πͺ' + ΒΌβπ*ΒΌβπ'*πͺ" + ...)
# See Neuronet::NetworkStats#expected_nju:
|π| β |π|/|π|
|π| = πͺ + ΒΌβπ*πͺ' + ΒΌβπ*ΒΌβπ'*πͺ" + ...
# Consider very large π on each level in an π+2 layer network:
|π| ~ |π|Β½π(ΒΌβπ)βΏ
# For a 3-layer network (input, middle, and output layers), π=1:
|π| ~ |π|πͺ(1 + |π§|)
~ |π|πβπ / 8 # π>>1, large π
|π| ~ 8|π| / πβπ # π>>1
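The wiki points to Neuronet::NetworkStats#expected_nju for this estimate. Below is only a rough standalone Ruby sketch of the layer-size formula |π| = πͺ + ΒΌβπ*πͺ' + ΒΌβπ*ΒΌβπ'*πͺ" + ..., taking the sizes of the layers below the output as input (an assumed interface, not the gem's):
# Estimate |π| from layer sizes, ordered from the layer feeding the output
# down to the input layer, using πͺ ~ Β½π and a ΒΌβπ factor per deeper layer.
def expected_nju_estimate(sizes)
  factor = 1.0
  sizes.sum do |n|
    term = factor * 0.5 * n
    factor *= 0.25 * Math.sqrt(n)
    term
  end
end

# A 3-layer network with middle and input layers of 10 neurons each:
nju = expected_nju_estimate([10, 10]) #=> roughly 8.95
# Then |π| ~ |π| / nju.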
Legacy
# In trying to find the recursion pattern,
# I came across several interesting expressions.
# I define them all here, including the ones actually used above:
ππ β π(1-π)
π β βπ
π β π + πΎ π'
π = β π+πΎπ'
π+πΉ β β(π+π)
π = βπ
π+π β π+πΊ + (πΎ+πΊ')(π'+πΉ')
π β 1+ππ'
π§ π' β πΎ ππ'π'
# Legacy:
π β ππ π
πΏ β π§ π' = πΎ ππ'π' = πΎ π'
πΎ β π§ πΏ' = π§ π§'π"