04 Feed Forward Neural Networks and Backpropagation - PAI-yoonsung/lstm-paper GitHub Wiki

In feed-forward neural networks (FFNNs), sets of neurons are organised in layers, where each neuron computes a weighted sum of its inputs.

Input neurons take signals from the environment, and output neurons present signals to the environment.

μž…λ ₯ λ‰΄λŸ°λ“€μ€ ν™˜κ²½μœΌλ‘œλΆ€ν„° μ‹ ν˜Έλ₯Ό λ°›κ³ , 좜λ ₯ λ‰΄λŸ°λ“€μ€ μ‹ ν˜Έλ“€μ„ ν™˜κ²½μ„ ν–₯ν•΄ ν‘œν˜„ν•©λ‹ˆλ‹€.

Neurons that are not directly connected to the environment, but which are connected to other neurons, are called hidden neurons.

λ‰΄λŸ°λ“€μ€ ν™˜κ²½κ³Ό 직접 μ—°κ²°λ˜μ§„ μ•Šμ§€λ§Œ, νžˆλ“  λ‰΄λŸ°μ΄λΌκ³  λΆˆλ¦¬μš°λŠ” λ‹€λ₯Έ λ‰΄λŸ°λ“€κ³Ό μ—°κ²°λ©λ‹ˆλ‹€.

Feed-forward neural networks are loop-free and fully connected.

This means that each neuron provides an input to each neuron in the following layer, and that no connection feeds back to a neuron in a previous layer.

μ΄λŠ” 각 λ‰΄λŸ°μ΄ λ‹€λ₯Έ λ ˆμ΄μ–΄μ˜ λ‰΄λŸ°λ“€μ—κ²Œ μž…λ ₯값을 μ œκ³΅ν•œλ‹€λŠ” 것을 μ˜λ―Έν•˜κ³ , λͺ¨λ“  κ°€μ€‘μΉ˜λ“€μ€ 이전 λ ˆμ΄μ–΄μ˜ λ‰΄λŸ°μ— μž…λ ₯을 μ£Όμ§€ μ•ŠλŠ”λ‹€λŠ” 것을 λœ»ν•©λ‹ˆλ‹€.

The simplest type of feed-forward neural network is the single-layer perceptron network.

Single-layer neural networks consist of a set of input neurons, defined as the input layer, and a set of output neurons, defined as the output layer.

The outputs of the input-layer neurons are directly connected to the neurons of the output layer.

μž…λ ₯ λ ˆμ΄μ–΄ λ‰΄λŸ°λ“€μ˜ 좜λ ₯은 μ§μ ‘μ μœΌλ‘œ 좜λ ₯ λ ˆμ΄μ–΄μ˜ λ‰΄λŸ°λ“€μ„ ν–₯ν•΄ μ—°κ²°λ©λ‹ˆλ‹€.

The weights are applied to the connections between the input and output layer.

κ°€μ€‘μΉ˜λ“€μ€ μž…λ ₯ λ ˆμ΄μ–΄μ™€ 좜λ ₯ λ ˆμ΄μ–΄ μ‚¬μ΄μ˜ 연결듀에 μ μš©λ©λ‹ˆλ‹€.

In the single-layer perceptron network, every single perceptron calculates the sum of the products of the weights and the inputs.

단일 λ ˆμ΄μ–΄ νΌμ…‰νŠΈλ‘  λ„€νŠΈμ›Œν¬μ—μ„œλŠ”, 각각의 단일 νΌμ…‰νŠΈλ‘ μ΄ κ°€μ€‘μΉ˜μ™€ μž…λ ₯κ°’ μ‚¬μ΄μ˜ κ³±(product) μ—°μ‚°μ˜ 합을 κ³„μ‚°ν•©λ‹ˆλ‹€.

The perceptron fires β€˜1’ if the value is above the threshold value;

νΌμ…‰νŠΈλ‘ μ˜ 값이 μž„κ³„κ°’μ„ λ„˜μ–΄κ°ˆ κ²½μš°μ—λŠ” 1 둜 ν™œμ„±ν™”λ©λ‹ˆλ‹€.

otherwise, the perceptron takes the deactivated value, which is usually β€˜-1’.

κ·Έ μ™Έμ—λŠ”, νΌμ…‰νŠΈλ‘ μ€ λΉ„ν™œμ„±ν™” 값을 κ°–κ²Œ 되고, μ΄λ•ŒλŠ” 주둜 -1 을 κ°–κ²Œ λ©λ‹ˆλ‹€.

The threshold value is typically zero.
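As a sketch, the firing rule above can be written in a few lines of Python (the function name and argument layout are illustrative, not from the original text):

```python
def perceptron_fire(inputs, weights, threshold=0.0):
    """Fire '1' if the weighted sum of the inputs is above the threshold, else '-1'."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else -1
```

For example, with inputs [1.0, 1.0] and weights [0.5, 0.5] the weighted sum is 1.0, which is above the threshold of zero, so the perceptron fires 1.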

μž„κ³„κ°’μ€ 일반적으둜 0 을 μ‚¬μš©ν•©λ‹ˆλ‹€.

Sets of neurons organised in several layers can form multilayer, forward-connected networks.

The input and output layers are connected via at least one hidden layer, built from set(s) of hidden neurons.

The multilayer feed-forward neural network sketched in Figure 4, with one input layer and three further layers (two hidden and one output), is classified as a 3-layer feed-forward neural network.

For most problems, feed-forward neural networks with more than two layers offer no advantage.

λŒ€λΆ€λΆ„μ˜ λ¬Έμ œμ—μ„œ, λ ˆμ΄μ–΄λ₯Ό 2개 λ„˜κ²Œ κ°–λŠ” μˆœμ „νŒŒ 신경망은 λ”±νžˆ 이점을 μ£Όμ§€ λͺ»ν•©λ‹ˆλ‹€.

Multilayer feed-forward networks using sigmoid threshold functions are able to express non-linear decision surfaces.

μ‹œκ·Έλͺ¨μ΄λ“œ μž„κ³„ ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜λŠ” 닀쀑 λ ˆμ΄μ–΄ μˆœμ „νŒŒ 망은 λΉ„μ„ ν˜• κ²°μ • ν‘œλ©΄μ„ ν‘œν˜„ν•˜λŠ” 것이 κ°€λŠ₯ν•©λ‹ˆλ‹€.

Any function can be closely approximated by these networks, given enough hidden units.
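A minimal sketch of such a multilayer forward pass, assuming sigmoid units and representing each layer as a (weights, biases) pair (the function names and data layout are illustrative):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, biases):
    # Each unit applies the sigmoid to its bias plus the weighted sum of its inputs.
    return [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

def network_forward(x, layers):
    # Propagate the input forwards through each layer in turn.
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x
```

With zero weights and biases every unit outputs sigmoid(0) = 0.5, which is a quick sanity check for the implementation.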

Figure 4: A multilayer feed-forward neural network with one input layer, two hidden layers, and an output layer. Using neurons with sigmoid threshold functions, these neural networks are able to express non-linear decision surfaces.

The most common neural network learning technique is the error backpropagation algorithm.

κ°€μž₯ ν”ν•œ 신경망 ν•™μŠ΅ κΈ°μˆ μ€ μ—λŸ¬ μ—­μ „νŒŒ μ•Œκ³ λ¦¬μ¦˜μž…λ‹ˆλ‹€.

It uses gradient descent to learn the weights in multilayer networks.

ν•΄λ‹Ή κΈ°μˆ μ€ λ©€ν‹° λ ˆμ΄μ–΄ μ‹ κ²½λ§μ˜ κ°€μ€‘μΉ˜λ₯Ό ν•™μŠ΅ν•˜κΈ° μœ„ν•˜μ—¬ κ·Έλ ˆλ””μ–ΈνŠΈ λ””μ„ΌνŠΈ 방식을 μ‚¬μš©ν•©λ‹ˆλ‹€.

It works in small iterative steps, starting backwards from the output layer towards the input layer.

ν•΄λ‹Ή κΈ°μˆ μ€ 좜λ ₯ λ ˆμ΄μ–΄λΆ€ν„° μ‹œμž‘ν•˜μ—¬ μž…λ ₯ λ ˆμ΄μ–΄λ₯Ό ν–₯ν•΄ λ‚˜μ•„κ°€λŠ” 적은 반볡 νšŸμˆ˜λ™μ•ˆ μž‘λ™ν•©λ‹ˆλ‹€.

A requirement is that the activation function of the neuron is differentiable.

Usually, the weights of a feed-forward neural network, together with the bias values, are initialised to small, normalised random numbers.
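This initialisation step might be sketched as follows; the uniform distribution and the scale of 0.1 are illustrative choices, not prescribed by the text:

```python
import random

def init_layer(n_in, n_out, scale=0.1):
    # Small random weights and biases, drawn uniformly from [-scale, scale].
    weights = [[random.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [random.uniform(-scale, scale) for _ in range(n_out)]
    return weights, biases
```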

Then, error backpropagation applies all training samples to the neural network and computes the input and output of each unit for all (hidden and) output layers.

κ·Έ λ‹€μŒ, μ—λŸ¬ μ—­μ „νŒŒκ°€ λͺ¨λ“  ν›ˆλ ¨ μƒ˜ν”Œλ“€μ— 적용되고, λͺ¨λ“  좜λ ₯ λ ˆμ΄μ–΄(νžˆλ“ , 좜λ ₯)의 각 μœ λ‹›μ˜ μž…λ ₯κ°’κ³Ό 좜λ ₯값을 κ³„μ‚°ν•©λ‹ˆλ‹€.

신경망 μœ λ‹›λ“€μ˜ 집합은 μœ„μ˜ 곡식과 κ°™κ³ , U λŠ” μ„œλ‘œμ†Œ ν•©μ§‘ν•© I, H, O λŠ” μž…λ ₯, νžˆλ“ , 좜λ ₯ μœ λ‹›λ“€μ˜ μ§‘ν•©μž…λ‹ˆλ‹€.

We denote input units by i, hidden units by h and output units by o.

λ˜ν•œ, μ—¬κΈ°μ„œλŠ” μž…λ ₯ μœ λ‹›μ„ i, νžˆλ“  μœ λ‹›μ€ h, 좜λ ₯ μœ λ‹›μ€ o 라고 μΉ­ν•  κ²ƒμž…λ‹ˆλ‹€.

For convenience, we define the set of non-input units U = H βˆͺ O.

For a non-input unit u ∈ U, the input to u is denoted by x_u, its state by s_u, its bias by b_u and its output by y_u.

Given units u, v ∈ U, the weight that connects u with v is denoted by W_uv.

To model the external input that the neural network receives, we use the external input vector x = &lt;x_1, . . . , x_n&gt;.

For each component of the external input vector we find a corresponding input unit that models it, so the output of the i-th input unit should be equal to the i-th component of the network input (i.e., x_i), and consequently |I| = n.

For a non-input unit u ∈ U, the output of u, denoted y_u, is defined using the sigmoid activation function,

y_u = 1 / (1 + e^(-s_u)), (1)

where s_u denotes the state of u, defined as

s_u = b_u + z_u, (2)

with b_u the bias of u and z_u the weighted input of u, which can be expressed as

z_u = Ξ£_{v ∈ Pre(u)} X[v,u], (3)

where X[v,u] is the information that v passes as input to u, and Pre(u) is the set of units v that precede u; that is, the input units and hidden units that feed their outputs y_v (see Equation (1)), multiplied by the corresponding weight W[v,u], to the unit u.
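Equations (1)–(3) for a single non-input unit can be sketched as follows, where `preds` lists the pairs (W[v,u], y_v) over Pre(u) (the function name and data layout are illustrative):

```python
import math

def unit_output(b_u, preds):
    # preds: list of (W[v,u], y_v) pairs for all v in Pre(u).
    z_u = sum(w_vu * y_v for w_vu, y_v in preds)  # Eq. (3): weighted input
    s_u = b_u + z_u                               # Eq. (2): state of u
    return 1.0 / (1.0 + math.exp(-s_u))           # Eq. (1): sigmoid output y_u
```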

Starting from the input layer, the inputs are propagated forwards through the network until the output units are reached at the output layer.

μž…λ ₯ λ ˆμ΄μ–΄λΆ€ν„° μ‹œμž‘ν•˜μ—¬, μž…λ ₯듀은 λ„€νŠΈμ›Œν¬λ₯Ό 톡해 좜λ ₯ μœ λ‹›λ“€μ΄ 좜λ ₯ λ ˆμ΄μ–΄μ— 도달할 λ•Œ κΉŒμ§€ μ•žμœΌλ‘œ μ „λ‹¬ν•˜κ²Œ λ©λ‹ˆλ‹€.

Then, the output units produce an observable output (the network output) y.

More precisely, for o ∈ O, its output y_o corresponds to the o-th component of y.

Next, the backpropagation learning algorithm propagates the error backwards, and the weights and biases are updated such that we reduce the error with respect to the present training sample.

λ‹€μŒμœΌλ‘œ, μ—­μ „νŒŒ ν•™μŠ΅ μ•Œκ³ λ¦¬μ¦˜μ€ μ—λŸ¬λ₯Ό ν›„λ°©μœΌλ‘œ μ „λ‹¬ν•˜κ³ , κ°€μ€‘μΉ˜μ™€ 편ν–₯을 ν˜„μž¬ ν›ˆλ ¨ μƒ˜ν”Œμ—μ„œ μ—λŸ¬λ₯Ό 쀄이도둝 μ—…λ°μ΄νŠΈν•©λ‹ˆλ‹€.

Starting from the output layer, the algorithm compares the network output y_o with the corresponding desired target output d_o.

ν•΄λ‹Ή μ•Œκ³ λ¦¬μ¦˜μ€ 좜λ ₯ λ ˆμ΄μ–΄λΆ€ν„° μ‹œμž‘ν•˜μ—¬, λ„€νŠΈμ›Œν¬μ˜ 좜λ ₯인 y_o 와 μƒν˜Έλ˜λŠ” λͺ©ν‘œ 좜λ ₯인 d_o 을 λΉ„κ΅ν•˜κ²Œ λ©λ‹ˆλ‹€.

It calculates the error e_o for each output neuron using some error function to be minimised.

ν•΄λ‹Ή μ•Œκ³ λ¦¬μ¦˜μ€ 각 좜λ ₯ λ‰΄λŸ°μ˜ μ—λŸ¬ e_o λ₯Ό μ΅œμ†Œν™”ν•˜κΈ° μœ„ν•œ λͺ‡λͺ‡ μ—λŸ¬ ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•˜μ—¬ κ³„μ‚°ν•©λ‹ˆλ‹€.

μ—λŸ¬ e_o 은 μœ„ 그림의 첫 번째 κ³΅μ‹μœΌλ‘œ κ³„μ‚°λ˜κ³ , λ„€νŠΈμ›Œν¬μ˜ μ „λ°˜μ μΈ μ—λŸ¬λ₯Ό κ³„μ‚°ν•˜κΈ° μœ„ν•˜μ—¬ 두 번째 곡식을 μ‚¬μš©ν•©λ‹ˆλ‹€.

To update the weight W[u,v], gradient descent is applied with learning rate Ξ·. Using the chain rule, the derivative of the error with respect to a weight is obtained from the derivative of the error with respect to the activation (βˆ‚E/βˆ‚y), the derivative of the activation with respect to the state (βˆ‚y/βˆ‚s), and the derivative of the state with respect to the weight.

좜λ ₯ μœ λ‹›μ˜ ν™œμ„±ν™”μ™€ κ΄€λ ¨λœ μ—λŸ¬μ˜ νŒŒμƒ ν•¨μˆ˜λŠ” μœ„μ—μ„œ 첫 번째 식과 κ°™μŠ΅λ‹ˆλ‹€. 좜λ ₯ μœ λ‹›μ— λŒ€ν•œ μƒνƒœμ™€ κ΄€λ ¨λœ ν™œμ„±ν™”μ˜ νŒŒμƒ ν•¨μˆ˜λŠ” 두 번째 식과 κ°™μŠ΅λ‹ˆλ‹€.

좜λ ₯ μœ λ‹›μ„ o 에 λŒ€ν•˜μ—¬, μ—λŸ¬ μ‹ ν˜ΈλŠ” (4) 식과 κ°™μŠ΅λ‹ˆλ‹€ 좜λ ₯ μœ λ‹›λ“€μ—λŒ€ν•΄μ„œ, μš°λ¦¬λŠ” (5) 식을 κ°–μŠ΅λ‹ˆλ‹€. λ˜ν•œ νžˆλ“  μœ λ‹› h 와 좜λ ₯ μœ λ‹› o μ‚¬μ΄μ˜ κ°€μ€‘μΉ˜λ₯Ό λ‹€μŒκ³Ό 같이 μ—…λ°μ΄νŠΈν•  수 μžˆμŠ΅λ‹ˆλ‹€.
