04 Feed Forward Neural Networks and Backpropagation - PAI-yoonsung/lstm-paper GitHub Wiki
In feed-forward neural networks (FFNNs), sets of neurons are organised in layers, where each neuron computes a weighted sum of its inputs.
Input neurons take signals from the environment, and output neurons present signals to the environment.
Neurons that are not directly connected to the environment, but which are connected to other neurons, are called hidden neurons.
Feed-forward neural networks are loop-free and fully connected.
This means that each neuron provides an input to each neuron in the following layer, and that none of the weights give an input to a neuron in a previous layer.
The simplest type of neural feed-forward networks are single-layer perceptron networks.
Single-layer neural networks consist of a set of input neurons, defined as the input layer, and a set of output neurons, defined as the output layer.
The outputs of the input-layer neurons are directly connected to the neurons of the output layer.
The weights are applied to the connections between the input and output layer.
In the single-layer perceptron network, every single perceptron calculates the sum of the products of the weights and the inputs.
The perceptron fires β1β if the value is above the threshold value;
otherwise, the perceptron takes the deactivated value, which is usually β-1β.
The threshold value is typically zero.
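As a minimal sketch of this rule (the weights, inputs and function name below are illustrative, not taken from the text):

```python
def perceptron(inputs, weights, threshold=0.0):
    # Weighted sum of the inputs, as computed by a single perceptron.
    s = sum(w * x for w, x in zip(weights, inputs))
    # Fire "1" above the threshold, otherwise take the deactivated value "-1".
    return 1 if s > threshold else -1

# Example: 1.0 * 0.8 + (-0.5) * 0.4 = 0.6 > 0, so the perceptron fires.
print(perceptron([1.0, -0.5], [0.8, 0.4]))  # -> 1
```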
Sets of neurons organised in several layers can form multilayer, forward-connected networks.
The input and output layers are connected via at least one hidden layer, built from sets of hidden neurons.
The multilayer feed-forward neural network sketched in Figure 4, with one input layer and three further layers (two hidden and one output), is classified as a 3-layer feed-forward neural network.
For most problems, feed-forward neural networks with more than two layers offer no advantage.
Multilayer feed-forward networks using sigmoid threshold functions are able to express non-linear decision surfaces.
Any function can be closely approximated by these networks, given enough hidden units.
Figure 4: A multilayer feed-forward neural network with one input layer, two hidden layers, and an output layer. Using neurons with sigmoid threshold functions, these neural networks are able to express non-linear decision surfaces.
The most common neural network learning technique is the error backpropagation algorithm.
It uses gradient descent to learn the weights in multilayer networks.
It works in small iterative steps, starting backwards from the output layer towards the input layer.
A requirement is that the activation function of the neuron is differentiable.
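The commonly used sigmoid satisfies this requirement: its derivative can be expressed in terms of the function value itself, a standard identity sketched here for illustration:

```python
import math

def sigmoid(x):
    # Sigmoid threshold function: maps any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Derivative sigma'(x) = sigma(x) * (1 - sigma(x)); this closed form is
    # what makes sigmoid units convenient for error backpropagation.
    s = sigmoid(x)
    return s * (1.0 - s)
```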
Usually, the weights of a feed-forward neural network are initialised to small, normalised random numbers using bias values.
Then, error backpropagation applies all training samples to the neural network and computes the input and output of each unit for all (hidden and) output layers.
The set of neural network units is given by the formula above: U is the disjoint union of I, H and O, the sets of input, hidden and output units, respectively.
We denote input units by i, hidden units by h and output units by o.
For convenience, we define the set of non-input units U′ = H ⊔ O.
For a non-input unit u ∈ U′, the input to u is denoted by x_u, its state by s_u, its bias by b_u and its output by y_u.
Given units u, v β U, the weight that connects u with v is denoted by Wuv.
To model the external input that the neural network receives, we use the external input vector x = ⟨x_1, . . . , x_n⟩.
For each component of the external input vector we find a corresponding input unit that models it, so the output of the i-th input unit should be equal to the i-th component of the input to the network (i.e., x_i), and consequently |I| = n.
For a non-input unit u ∈ U′, the output of u, denoted y_u, is defined with the sigmoid activation function as in Equation (1): y_u = σ(s_u). Here s_u denotes the state of u, defined as in Equation (2): s_u = z_u + b_u, where b_u is the bias of u and z_u is the weighted input to u, which is expressed as in Equation (3): z_u = Σ_{v ∈ Pre(u)} X[v,u],
where X[v,u] is the information that v passes as input to u, and Pre(u) is the set of units v that precede u; that is, the input units and hidden units that feed their outputs y_v (see Equation (1)), multiplied by the corresponding weight W[v,u], to the unit u.
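Equations (1)-(3) for a single non-input unit could be sketched as follows (the function and variable names are mine, chosen to mirror the notation above):

```python
import math

def sigmoid(x):
    # Sigmoid activation function used in Equation (1).
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(pre_outputs, weights, bias):
    # z_u: weighted input, summing W[v,u] * y_v over the units v in Pre(u).
    z = sum(w * y for w, y in zip(weights, pre_outputs))
    # s_u: state of the unit, the weighted input plus the bias b_u.
    s = z + bias
    # y_u: output of the unit through the sigmoid activation.
    return sigmoid(s)
```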
Starting from the input layer, the inputs are propagated forwards through the network until the output units are reached at the output layer.
Then, the output units produce an observable output (the network output) y.
More precisely, for o ∈ O, its output y_o corresponds to the o-th component of y.
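The forward propagation described here can be sketched layer by layer (a minimal illustration; representing each layer as a (weights, biases) pair is an assumption of this sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, layers):
    # Propagate the external input vector x forwards through the network.
    # Each layer is a pair (W, b): W[i][j] connects unit i of the previous
    # layer to unit j of this layer, and b[j] is the bias of unit j.
    y = x
    for W, b in layers:
        y = [sigmoid(sum(W[i][j] * y[i] for i in range(len(y))) + b[j])
             for j in range(len(b))]
    return y  # the observable network output
```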
Next, the backpropagation learning algorithm propagates the error backwards, and the weights and biases are updated such that we reduce the error with respect to the present training sample.
Starting from the output layer, the algorithm compares the network output yo with the corresponding desired target output do.
It calculates the error e_o for each output neuron using some error function to be minimised.
The error e_o is computed with the first formula in the figure above; the second formula is used to compute the overall error of the network.
To update the weight W[u,v], we use the first formula in the figure above, where η denotes the learning rate. We then apply the chain rule (ŷ and ŝ in the figure) to differentiate the error with respect to the activation, and to compute the partial derivatives with respect to the state and the weights.
The partial derivative of the error with respect to the activation of an output unit is given by the first equation above; the partial derivative of the activation with respect to the state of an output unit is given by the second.
For an output unit o, the error signal is given by Equation (4); for the output units we obtain Equation (5). The weight between a hidden unit h and an output unit o can then be updated as follows.
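Assuming sigmoid output units and a squared-error function, the standard forms of the output-unit error signal and the hidden-to-output weight update can be sketched as follows (a reconstruction of the usual textbook rule, since Equations (4) and (5) themselves appear as figures in the text):

```python
def output_delta(y_o, d_o):
    # Error signal of output unit o for sigmoid units with squared error:
    # delta_o = (d_o - y_o) * y_o * (1 - y_o).
    return (d_o - y_o) * y_o * (1.0 - y_o)

def update_weight(w_ho, y_h, delta_o, eta):
    # Gradient-descent step for the weight between hidden unit h and
    # output unit o, with learning rate eta.
    return w_ho + eta * delta_o * y_h
```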
dictionary
organised: arranged into a structure; iterative: proceeding in repeated steps; differentiable: having a derivative; precede: to come before; propagate: to pass along