03 Perceptron and Delta Learning Rule - PAI-yoonsung/lstm-paper GitHub Wiki

Artificial neural networks consist of a densely interconnected group of simple, neuron-like threshold switching units.

Each unit takes a number of real-valued inputs and produces a single real-valued output.

Based on the connectivity between the threshold units and on the parameters of the individual elements, these networks can model complex global behaviour.

3.1 The Perceptron

The most basic type of artificial neuron is called a perceptron.

Perceptrons consist of a number of external input links, a threshold, and a single external output link.

Additionally, perceptrons have an internal input, b, called the bias.

The perceptron takes a vector of real-valued input values, each of which is weighted by a multiplier.

In a preceding training phase, the perceptron learns these weights from training data.

It sums all weighted input values and ‘fires’ if the resulting value is above a predefined threshold.

The output of the perceptron is always Boolean, and the perceptron is considered to have fired if the output is ‘1’.

The deactivated value of the perceptron is ‘−1’, and the threshold value is, in most cases, ‘0’.

As we only have one unit for the perceptron, we omit the subindexes that refer to the unit.

Given the input vector $x = \langle x_1, \ldots, x_n \rangle$ and trained weights $W_1, \ldots, W_n$, the perceptron outputs $y$, which is computed by the formula

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} W_i x_i + b > 0, \\ -1 & \text{otherwise.} \end{cases}$$

We call $z = \sum_{i=1}^{n} W_i x_i$ the weighted input and $s = z + b$ the state of the perceptron.

For the perceptron to fire, its state $s$ must exceed the value of the threshold.
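To make the formula concrete, here is a minimal sketch of the perceptron's forward pass, assuming the default threshold of 0. The function name `perceptron_output` and the example weights and bias are hypothetical, chosen only for illustration; it computes the weighted input $z$, the state $s = z + b$, and then the Boolean output.

```python
def perceptron_output(x, w, b):
    """Return 1 if the perceptron fires, -1 otherwise (threshold 0)."""
    # Weighted input z = sum_i W_i * x_i
    z = sum(w_i * x_i for w_i, x_i in zip(w, x))
    # State s = z + b
    s = z + b
    # Fire (output 1) only if the state exceeds the threshold 0
    return 1 if s > 0 else -1


# Hypothetical weights and bias
w = [0.6, -0.4]
b = 0.1
print(perceptron_output([1.0, 0.5], w, b))  # z = 0.4, s = 0.5, so it fires
print(perceptron_output([0.0, 1.0], w, b))  # z = -0.4, s = -0.3, so it does not
```

The first input yields output 1 and the second yields −1.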

Single perceptron units can already represent a number of useful functions, for example the Boolean functions AND, OR, NAND and NOR.
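As an illustration, each of these four gates can be realised by a single perceptron with suitable weights. The weights and biases below are hypothetical, hand-picked values (many other choices work), and the sketch assumes inputs of 0/1 with output 1 meaning "true" (the perceptron fires) and −1 meaning "false".

```python
from itertools import product


def perceptron_output(x, w, b):
    # Fire (1) if the state sum_i W_i x_i + b exceeds the threshold 0
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1 if s > 0 else -1


# Hypothetical hand-picked weights and biases, one perceptron per gate
GATES = {
    "AND": ([1.0, 1.0], -1.5),
    "OR": ([1.0, 1.0], -0.5),
    "NAND": ([-1.0, -1.0], 1.5),
    "NOR": ([-1.0, -1.0], 0.5),
}


def truth_table(name):
    """Map every 0/1 input pair to the perceptron's output for the given gate."""
    w, b = GATES[name]
    return {x: perceptron_output(x, w, b) for x in product([0, 1], repeat=2)}


for name in GATES:
    print(name, truth_table(name))
```

Note that AND and NAND (and likewise OR and NOR) differ only in the sign of the weights and bias, which flips the side of the line on which the perceptron fires.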

Other functions, such as XOR, can only be represented by networks of neurons.

Single perceptrons are limited to learning functions that are linearly separable.

In general, a problem is linear, and its classes are linearly separable in an n-dimensional space, if the decision surface is an (n − 1)-dimensional hyperplane.
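For the perceptron defined above, the decision surface is the set of inputs where the state $s$ is exactly 0 (the default threshold). As a worked restatement, assuming not all $W_i$ are zero (so that the set really is an $(n-1)$-dimensional hyperplane), it reduces to a straight line when $n = 2$:

```latex
\begin{aligned}
&\text{Decision surface:} && \Big\{\, x \in \mathbb{R}^n \;:\; \sum_{i=1}^{n} W_i x_i + b = 0 \,\Big\} \\
&\text{Case } n = 2,\ W_2 \neq 0: && W_1 x_1 + W_2 x_2 + b = 0 \iff x_2 = -\frac{W_1}{W_2}\, x_1 - \frac{b}{W_2}
\end{aligned}
```

For example, the hypothetical choice $W_1 = W_2 = 1$, $b = -0.5$ gives the line $x_2 = -x_1 + 0.5$, which separates the input $(0, 0)$ from the other three corners of the unit square.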

The general structure of a perceptron is shown in Figure 1.

Figure 1: The general structure of the most basic type of artificial neuron, called a perceptron. Single perceptrons are limited to learning linearly separable functions.

3.2 Linear Separability

To understand linear separability, it is helpful to visualise the possible inputs of a perceptron on the axes of a two-dimensional graph.

Figure 2: Representations of the Boolean functions OR and XOR. The figures show that the OR function is linearly separable, whereas the XOR function is not.

Figure 2 shows representations of the Boolean functions OR and XOR.

The OR function is linearly separable, whereas the XOR function is not.

In the figure, a plus marks an input for which the perceptron fires, and a minus marks an input for which it does not.

If the pluses and minuses can be completely separated by a single line, the problem is linearly separable in two dimensions.

The weights of the trained perceptron should represent that line.
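The OR/XOR contrast can also be checked numerically. The sketch below tries every line $w_1 x_1 + w_2 x_2 + b = 0$ from a coarse, hypothetical grid of weights and biases and counts how many put all pluses on the firing side and all minuses on the other. This is an illustration, not a proof. Many lines separate OR, but none separate XOR. For XOR this holds for every possible line, as Figure 2 shows, so an empty result does not depend on the grid.

```python
from itertools import product


def separates(w1, w2, b, table):
    """True if every + lies where w1*x1 + w2*x2 + b > 0 and every - does not."""
    for (x1, x2), target in table.items():
        s = w1 * x1 + w2 * x2 + b
        if (s > 0) != (target == 1):
            return False
    return True


OR_TABLE = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

# Coarse grid of candidate values: -2.0, -1.75, ..., 2.0 (an arbitrary choice)
GRID = [k / 4 for k in range(-8, 9)]


def count_separating_lines(table):
    return sum(separates(w1, w2, b, table) for w1, w2, b in product(GRID, repeat=3))


print("OR: ", count_separating_lines(OR_TABLE))
print("XOR:", count_separating_lines(XOR_TABLE))
```

The count for OR is positive (it includes, for instance, $w_1 = w_2 = 1$, $b = -0.5$), while the count for XOR is 0.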

3.3 The Delta Learning Rule

Perceptron training is learning by imitation, which is called ‘supervised learning’.

During the training phase, the perceptron produces an output and compares it with the expected output value provided by the training data.

In cases of misclassification, it then modifies the weights accordingly.

Minsky and Papert [55] show that, provided the training examples are linearly separable, the perceptron converges in finite time to reproducing the correct behaviour.

[55] Marvin L. Minsky and Seymour A. Papert. Perceptrons: An introduction to computational geometry. Expanded. MIT Press, Cambridge, 1988.

Convergence is not assured if the training data is not linearly separable.

A variety of training algorithms for perceptrons exist, of which the most common are the perceptron learning rule and the delta learning rule.

Both start with random weights, and both guarantee convergence to an acceptable hypothesis under the conditions discussed below.

Using the perceptron learning rule, the perceptron learns from a set of samples. A sample is a pair $\langle x, d \rangle$, where $x$ is the input and $d$ is its label.

For the sample $\langle x, d \rangle$ with input $x = \langle x_1, \ldots, x_n \rangle$, the old weight vector $W = \langle W_1, \ldots, W_n \rangle$ is updated to the new vector $W'$ using the rule

$$W'_i = W_i + \Delta W_i, \quad \text{with} \quad \Delta W_i = \eta\,(d - y)\,x_i,$$

where $y$ is the output calculated from the input $x$ and the weights $W$, and $\eta$ is the learning rate. In words: new weight = old weight + learning rate × (label − prediction) × input.

The learning rate is a constant that controls the degree to which the weights are changed, that is, how large each update to the existing weights is.
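A minimal sketch of the perceptron learning rule, under a few assumptions not stated in the text: the bias $b$ is trained like a weight on a constant input of 1, training stops after the first epoch with no misclassifications, and the initial weights are drawn uniformly from $[-0.5, 0.5]$. The names `train_perceptron`, `max_epochs` and `seed` are hypothetical. It converges on AND, which is linearly separable, but not on XOR, which is not.

```python
import random


def predict(x, w, b):
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1 if s > 0 else -1


def train_perceptron(samples, eta=0.1, max_epochs=1000, seed=0):
    """Perceptron learning rule: W_i <- W_i + eta * (d - y) * x_i.

    samples: list of (x, d) pairs with labels d in {-1, 1}.
    Returns (w, b, converged).
    """
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # random initial weights W0
    b = rng.uniform(-0.5, 0.5)  # assumption: bias is learned like a weight
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            y = predict(x, w, b)
            if y != d:  # (d - y) is 0 for correct outputs, so only errors update
                errors += 1
                w = [w_i + eta * (d - y) * x_i for w_i, x_i in zip(w, x)]
                b += eta * (d - y)
        if errors == 0:  # a full pass without misclassification
            return w, b, True
    return w, b, False


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print("AND converged:", train_perceptron(AND)[2])
print("XOR converged:", train_perceptron(XOR)[2])
```

For XOR, the rule keeps misclassifying at least one sample in every epoch, so it only stops because of the `max_epochs` budget.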

As stated before, the initial weight vector $W_0$ has random values.

The algorithm will only converge towards an optimum if the training data is linearly separable and the learning rate is sufficiently small.

The perceptron rule fails if the training examples are not linearly separable.

The delta learning rule was specifically designed to handle both linearly separable and linearly non-separable training examples.

Like the perceptron rule, it calculates the error between the calculated output and the output given by the training samples, and modifies the weights accordingly.

The weights are modified using the gradient descent optimisation algorithm, which changes them in the direction of steepest descent along the error surface, towards the global minimum error.
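As a worked derivation, this sketch makes two assumptions the text does not state. It uses the common squared-error measure over the training samples $k$. It also takes the unthresholded state $s$, rather than the Boolean output $y$, as the unit's output during training. Under these assumptions, following the negative gradient gives an update of the same shape as the perceptron rule, with $y$ replaced by $s$:

```latex
\begin{aligned}
E(W, b) &= \frac{1}{2} \sum_{k} \bigl(d_k - s_k\bigr)^2, \qquad s_k = \sum_{i=1}^{n} W_i x_{k,i} + b \\
\frac{\partial E}{\partial W_i} &= -\sum_{k} \bigl(d_k - s_k\bigr)\, x_{k,i} \\
\Delta W_i &= -\eta\, \frac{\partial E}{\partial W_i} = \eta \sum_{k} \bigl(d_k - s_k\bigr)\, x_{k,i},
\qquad \Delta b = \eta \sum_{k} \bigl(d_k - s_k\bigr)
\end{aligned}
```

Because $E$ is a quadratic function of the weights, its error surface has a single global minimum, which gradient descent approaches for a small enough $\eta$ even when the data is not linearly separable.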

The delta learning rule is the basis of the error backpropagation algorithm, which we will discuss later.
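The delta rule can be sketched as batch gradient descent on a single linear unit. This assumes the squared-error measure $E = \frac{1}{2}\sum_k (d_k - s_k)^2$ computed on the unthresholded state $s$. The function name `train_delta_rule`, the learning rate, the epoch count and the random initialisation are all hypothetical choices. On AND, the weights settle at the least-squares solution, whose thresholded output classifies every sample correctly. On XOR, the rule still converges, to the minimum-error weights, even though no line can classify all four samples.

```python
import random


def train_delta_rule(samples, eta=0.1, epochs=1000, seed=0):
    """Delta rule as batch gradient descent on E = 1/2 * sum_k (d_k - s_k)^2.

    samples: list of (x, d) pairs with targets d in {-1, 1}.
    Returns (w, b, history), where history[t] is the error before update t.
    """
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # random initial weights
    b = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(epochs):
        step_w = [0.0] * n  # accumulates sum_k (d_k - s_k) * x_{k,i} = -dE/dW_i
        step_b = 0.0  # accumulates sum_k (d_k - s_k) = -dE/db
        error = 0.0
        for x, d in samples:
            s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # unthresholded state
            err = d - s
            error += 0.5 * err ** 2
            for i in range(n):
                step_w[i] += err * x[i]
            step_b += err
        history.append(error)
        # Move against the gradient, i.e. downhill on the error surface
        w = [w_i + eta * g for w_i, g in zip(w, step_w)]
        b += eta * step_b
    return w, b, history


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b, hist = train_delta_rule(data)
    print(name, [round(v, 3) for v in w], round(b, 3), round(hist[-1], 3))
```

With these settings, the AND weights should end close to $W = (1, 1)$, $b = -1.5$. The XOR weights should end close to zero, with a remaining error of about 2.0, which is the smallest squared error any linear unit can reach on XOR.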

3.4 The Sigmoid Threshold Unit

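The sigmoid threshold unit is another type of artificial neuron. It is very similar to the perceptron, but uses a sigmoid function to calculate its output. The output $y$ is computed by the formula

$$y = \frac{1}{1 + e^{-l \cdot s}}, \qquad s = \sum_{i=1}^{n} W_i x_i + b,$$

where $b$ is the bias and $l$ is a positive constant that determines the steepness of the sigmoid function.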
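A minimal sketch of the sigmoid threshold unit's forward pass, following the formula above. The names `sigmoid` and `sigmoid_unit` and the example weights are hypothetical. The logistic function is written in a numerically stable form so that very negative states do not overflow `math.exp`. Unlike the perceptron, the unit returns a value strictly between 0 and 1 rather than a hard 1 or −1.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + e^(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # for t < 0, use e^t / (1 + e^t) to avoid overflow
    return e / (1.0 + e)


def sigmoid_unit(x, w, b, l=1.0):
    """y = 1 / (1 + e^(-l * s)) with state s = sum_i W_i x_i + b."""
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(l * s)


# Hypothetical weights and bias giving a state of s = 0.5
y = sigmoid_unit([1.0, 0.5], [0.6, -0.4], 0.1)
print(y)
```

Here the state is 0.5, so the output is $1/(1 + e^{-0.5}) \approx 0.62$, a graded value where a perceptron with the same weights would simply output 1.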

The major difference from the perceptron is that the output of the sigmoid threshold unit can now take more than two values: the output is “squashed” by a continuous function that ranges between 0 and 1.

Accordingly, the function $1 / (1 + e^{-l \cdot s})$ is called the ‘squashing’ function, because it maps a very large input domain onto a small range of outputs.

For a low total input value, the output of the sigmoid function is close to zero, whereas it is close to one for a high total input value.

The threshold (bias) value shifts where the slope of the sigmoid function lies, while its steepness is set by $l$.
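A small numerical check of how the squashing function behaves. It assumes a unit with a single input of weight 1, so that its state is $s = z + b$. The helper name `unit` and the sample values are hypothetical. Large negative and positive inputs are pushed towards 0 and 1. The slope at the midpoint grows with $l$ (analytically it is $l/4$). The bias moves the midpoint, where the output is 0.5, to $z = -b$.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function 1 / (1 + e^(-t))
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def unit(z, b, l):
    """Output of a sigmoid unit as a function of its weighted input z."""
    return sigmoid(l * (z + b))


# Squashing: inputs far from 0 are pushed towards 0 or 1
for z in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(z, round(unit(z, b=0.0, l=1.0), 5))

# Steepness: finite-difference slope at the midpoint, analytically l / 4
h = 1e-6
for l in (0.5, 1.0, 4.0):
    slope = (unit(h, 0.0, l) - unit(-h, 0.0, l)) / (2 * h)
    print(f"l={l}: slope at midpoint ~ {slope:.4f}")

# Bias: the output is 0.5 exactly where z = -b
print(unit(-2.0, b=2.0, l=1.0))
```

The printed slopes should be approximately 0.125, 0.25 and 1.0, and the last line prints 0.5.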

The advantage of neural networks using sigmoid units is that they are capable of representing non-linear functions.

Cascaded linear units, like the perceptron, are limited to representing linear functions. A sigmoid threshold unit is sketched in Figure 3.

Figure 3: The sigmoid threshold unit is capable of representing non-linear functions. Its output is a continuous function of its input, which ranges between 0 and 1.
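To illustrate the non-linear case, the sketch below wires three sigmoid units into a tiny two-layer network that computes XOR, the function a single perceptron cannot learn. The weights, biases and steepness $l = 10$ are hypothetical, hand-picked values rather than trained ones, and the name `xor_net` is illustrative. One hidden unit acts roughly like OR and the other like AND. The output unit fires for "OR but not AND", and its output is read as 1 when it exceeds 0.5.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function 1 / (1 + e^(-t))
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sigmoid_unit(x, w, b, l=10.0):
    s = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(l * s)


def xor_net(x1, x2):
    # Hidden layer (hand-picked weights): roughly OR and AND of the inputs
    h_or = sigmoid_unit([x1, x2], [1.0, 1.0], -0.5)
    h_and = sigmoid_unit([x1, x2], [1.0, 1.0], -1.5)
    # Output unit: "OR but not AND", which is XOR
    return sigmoid_unit([h_or, h_and], [1.0, -1.0], -0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        y = xor_net(x1, x2)
        print(x1, x2, round(y, 3), int(y > 0.5))
```

The outputs stay continuous (close to, but not exactly, 0 or 1). Thresholding them at 0.5 gives the XOR truth table 0, 1, 1, 0.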

Glossary

squashed: compressed into a small output range; cascaded: chained one after another (waterfall-like); represent: express