
An Introduction to Forward Error Correction

Introduction

Communicating information is crucial in our modern lives; even this wiki article, created to share knowledge, is an example. In a digital communication system, we convert information into bits and transmit them to a receiver. However, every communication system inevitably encounters noise, which corrupts the transmitted bits and can lead to dire consequences. Consequently, we need error-correcting mechanisms within our communication systems, and forward error correction is a widely used one. This article explains how forward error correction codes are constructed and why they matter by introducing the basic architecture of a communication system, how we model noise, the basics of information theory, and forward error correction codes themselves.

Communication System Overview

Figure 1: A high-level depiction of a communication system.

Figure 1 depicts a simple model of a communication system. We begin with a user, for example a computer, that produces information as bits. The transmitter sends these bits over a physical medium (the channel), and the receiver processes what arrives to recover the message.

Figure 2: Noise causing bit flips in our message

Bits sent as electromagnetic waves or laser pulses experience noise in the channel, in the form of electrical interference or blockage, which can cause the bit flips depicted in Figure 2. In the binary case, this means we interpret a one as a zero or a zero as a one.

Although noise in nature is random, theorists have developed various channel models for analytical purposes. These channels are defined not just by their structure but also by their channel capacity. I will introduce two commonly used models: the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel.

A crash course in Information Theory

To understand the properties of a channel, we first need to understand what channel capacity means, which in turn requires a couple of fundamental information theory concepts. I present the discrete case; it extends easily to the continuous domain by replacing sums with integrals.

Entropy

Assume we have a discrete random variable $X$ which represents possible binary messages in our system that are distributed according to $p$. Then, the entropy $H(X)$ can be defined as

$$H(X) = -\sum_{x \in X} p(x)\log_2 p(x)$$

Entropy, akin to the term in thermodynamics, is a measure of the uncertainty of a random variable. To illustrate, imagine a coin that is biased to land heads with probability $p$ and tails with probability $1-p$. This is equivalent to sending a one-bit message that is a one with probability $p$ and a zero with probability $1-p$. The entropy for this example is

$$H(X) = -p\log_2 p - (1-p)\log_2 (1-p)$$

This expression can then be graphed as a function of $p$ using MATLAB (see the Appendix for the plotting code).

Figure 3: Entropy vs p for a binary random variable

The most illustrative case is $p=0.5$, where the entropy equals 1 bit. If the coin is equally likely to come up heads or tails, we are maximally uncertain about the outcome of a flip: there is no way to predict the next toss, because both outcomes are equally likely. Extending this to bits, if a zero or a one is equally likely for our one-bit message, we are uncertain about one full bit of the message. From this standpoint, we can view the bit as a fundamental unit of information. The entropy, or "uncertainty", decreases as we move away from $p=0.5$, because the bias gives us information that makes the outcome more "certain".
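As a quick numerical check of this intuition, consider a heavily biased coin, say $p = 0.1$ (a value chosen purely for illustration):

$$H(X) = -0.1\log_2 0.1 - 0.9\log_2 0.9 \approx 0.332 + 0.137 = 0.469 \text{ bits}$$

The biased coin is far more predictable, so each flip carries less than half a bit of information.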

Mutual Information

Assume we have two discrete random variables $X$ and $Y$ which represent possible binary messages in our system that are distributed according to $p_X$ and $p_Y$ respectively. Then, the mutual information $I(X;Y)$ can be defined as

$$\begin{eqnarray} I(X;Y) &=& H(X) - H(X|Y) \nonumber \\ &=& H(Y) - H(Y|X) \nonumber \end{eqnarray}$$

Figure 4: Venn diagram showing the relationship between entropies and mutual information

The mutual information is a measure of how much information two random variables share. Another way to think about it is as how much information we gain about one random variable by knowing the other. This is reflected in the calculation: $H(X|Y)$ is the remaining uncertainty of $X$ given $Y$, and subtracting it from the total uncertainty $H(X)$ leaves the "certainty" about $X$ gained from knowing $Y$. This relationship is summarized in Figure 4, which shows how the entropies and the mutual information relate to one another.

To illustrate, imagine you roll a six-sided die, where $X$ is the value rolled and $Y$ indicates whether the roll was even. These two random variables have nonzero mutual information because $Y$ tells us something about $X$. Calculating it, $H(X) = \log_2 6$ because each face is equally likely, and $H(X|Y) = \log_2 3$ because once we know the parity, only three faces remain, each with probability one-third instead of one-sixth. Thus, $I(X;Y) = \log_2 6 - \log_2 3 = 1$ bit. By contrast, if $X$ and $Y$ were the values of two independent dice, the mutual information would be zero, because knowing one roll tells us nothing about the other.
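The same 1 bit falls out of a few lines of MATLAB. This is a minimal sketch, assuming the equivalent definition of mutual information as the expected log-ratio of the joint distribution to the product of the marginals (a standard identity not derived in this article); the variable names are illustrative.

% Mutual information of the die example: X = face of a fair die,
% Y = 2 if the face is even, Y = 1 if it is odd.
pXY = zeros(6,2);               % joint pmf: rows are faces, columns are [odd, even]
pXY(1:2:5,1) = 1/6;             % odd faces land in the "odd" column
pXY(2:2:6,2) = 1/6;             % even faces land in the "even" column
pX = sum(pXY,2);                % marginal distribution of X
pY = sum(pXY,1);                % marginal distribution of Y
I = 0;
for x = 1:6
    for y = 1:2
        if pXY(x,y) > 0
            I = I + pXY(x,y)*log2(pXY(x,y)/(pX(x)*pY(y)));
        end
    end
end
disp(I)                         % prints 1, matching log2(6) - log2(3)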

Channel Capacity

Assume we have a channel input $X$ and a channel output $Y$ that has experienced noise. Then the channel can be characterized by its channel capacity

$$C = \max_{p_X(x)} I(X;Y)$$

By maximizing the information shared between input and output, the channel capacity defines the maximum rate (in bits per channel use) at which we can send information reliably over the channel. This is important because the capacity sets a ceiling on the rate at which any code over that channel can operate, giving us an upper bound on performance that we strive to reach.

Basic Models of Noise: Channels

Now, let's look at two widely used channel models for simulating the performance of a system: the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel. Knowing their structure and capacity gives context to the performance of the codes discussed in the next section and makes our abstract discussion of channels concrete.

The binary symmetric channel

Figure 5: Diagram of a BSC channel [2]

The binary symmetric channel is symmetric and has a crossover probability $p$, which is the probability of a bit flip. If we send a zero, it becomes a one with probability $p$, and a one becomes a zero with probability $p$.

Let's calculate the capacity to characterize this channel. To find the capacity, we maximize $H(Y)$ and minimize $H(Y|X)$ so as to maximize $I(X;Y)$. From our discussion of entropy, we know that the entropy of a binary variable is at most 1, achieved with a uniform distribution, so we set $H(Y)=1$.

Next, assuming the uniform input that achieves $H(Y)=1$,

$$\begin{eqnarray} H(Y|X) &=& \sum_{x\in X} p(x)H(Y|X=x) \nonumber \\ &=& p(0)H(Y|X=0) + p(1)H(Y|X=1) \nonumber \\ &=& .5(-p(Y=1|X=0)\log_2 p(Y=1|X=0) - p(Y=0|X=0)\log_2 p(Y=0|X=0)) + .5(-p(Y=0|X=1)\log_2 p(Y=0|X=1) - p(Y=1|X=1)\log_2 p(Y=1|X=1)) \nonumber \\ &=& .5(-p\log_2 p - (1-p)\log_2 (1-p)) + .5(-p\log_2 p - (1-p)\log_2 (1-p)) \nonumber \\ &=& -p\log_2 p - (1-p)\log_2 (1-p) \end{eqnarray}$$

$$\begin{eqnarray} I(X;Y) &=& H(Y)-H(Y|X) \nonumber \\ &\leq& 1+(1-p)\log_2(1-p)+p\log_2(p) \nonumber \end{eqnarray}$$

This capacity equation can be graphed with MATLAB (see the Appendix).

Figure 6: Channel capacity of a BSC channel

It makes sense that the capacity is zero at $p=0.5$: when a bit flip is as likely as not, the output tells us nothing about the input. When there is no probability of error, the capacity is one, because we can send the message bits directly, which is rate 1 (one bit per channel use). It may seem perplexing that the capacity rises again as the error probability increases past $p=0.5$, but this occurs because, if $p=1$, all we have to do is flip every received bit to recover the original message, which is again rate 1, as reflected in Figure 6.
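For a concrete sense of scale, take $p = 0.1$ (again a purely illustrative value):

$$C = 1 + 0.9\log_2 0.9 + 0.1\log_2 0.1 \approx 1 - 0.469 = 0.531 \text{ bits per channel use}$$

Even a 10% bit-flip rate cuts the reliable rate nearly in half, which is part of why good error-correcting codes are so valuable.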

The additive white Gaussian noise (AWGN) channel

Figure 7: Diagram of an AWGN channel [2]

When examining the AWGN channel, we move away from binary input; now $X\in \mathbb{R}$. The noise we add is $Z$, distributed $\mathcal{N}(0,\sigma^2)$, so $Y = X + Z$.

The capacity derivation for the AWGN channel requires a constraint on the input power and is fairly involved, so I will simply state the result. With SNR defined as the ratio of signal power to noise power, the capacity of an AWGN channel is

$$C = \frac{1}{2} \log_2 (1+\text{SNR})$$

The capacity of an AWGN channel is beautifully simple, despite how complicated the Gaussian distribution is to work with, and it is very useful because it depends only on the SNR, which is usually a specified quantity. This capacity equation can be graphed with MATLAB (see the Appendix).

Figure 8: Channel capacity of an AWGN channel

As we increase the signal power, we can distinguish our message from the noise more clearly, which is why the capacity grows. However, we showed that the maximum capacity of a binary-input system is 1, because we are limited to sending one bit per channel use. Therefore, to reach the higher capacities depicted in Figure 8, we must use modulation schemes that carry more bits per symbol. For example, to reach a capacity of 2, we need a modulation scheme with four constellation points, in other words, two bits per symbol.
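Inverting the capacity formula gives a feel for the signal quality this requires. To support $C = 2$ bits per channel use, for example (a value chosen purely for illustration),

$$\text{SNR} = 2^{2C} - 1 = 2^{4} - 1 = 15 \approx 11.8\ \text{dB}$$

so an SNR of roughly 11.8 dB is needed before the channel can support 2 bits per channel use reliably.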

Forward Error Correction (FEC)

Over noisy channels, the channel capacity tells us that there is a theoretical limit on the rate at which we can send data with arbitrarily low error [3]. To approach channel capacity we need forward error correction, also called channel coding, because noise prevents us from simply sending the raw bits. Forward error correction protects our information from noise by adding bits that increase the redundancy of the message and lower the chance of losing it to noise.

Figure 9: Overall communication system with the addition of forward error correction [4]

We need forward error correction because communication systems inevitably face noise, and that addition is reflected in Figure 9. The encoding step adds redundancy to the message, and the decoding step uses that redundancy to recover the original message after it has been altered by noise. The encoder maps each message to a longer codeword, and the set of codewords is called a code, which is why these schemes are referred to as FEC codes.

Repetition Codes [1]

We begin with a simple example of a forward error correction code, a three-bit repetition code. The three-bit repetition code sends 000 when the input is a 0 and 111 when the input is a 1. After the message travels through the channel, it will experience noise in the form of bit flips, and we must decode our message. Our decoding is decided based on Table 1.

Received Output   Decoded Bit
000               0
001               0
010               0
100               0
101               1
110               1
011               1
111               1

Table 1: Three-bit repetition code

Our system is now more robust to errors because it takes two bit flips, rather than one, to cause a decoding error. Now we can analyze the repetition code's performance. Presume we have $p=0.3$ in a BSC. We examine the code over a BSC rather than an AWGN channel because we are looking at binary input and output; however, most codes are tested over an AWGN channel because it better mimics nature and the channel input is often real-valued. If we send a single uncoded bit, the probability of error is 0.3, but with three repeated bits we only err when two or three of them flip, so the probability of error is $3(0.3)^2(0.7) + (0.3)^3 = 0.216$. We have reduced the error probability from 0.3 to about 0.22 by applying forward error correction and adding redundant information through repetition. This demonstrates the power of forward error correction in reducing the error probability; however, the code is not optimal, because even though the error drops, we are not approaching channel capacity. The rate of the three-bit repetition code has dropped from 1 to $\frac{1}{3}$, which demonstrates the tradeoff in code design between rate and probability of error.
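To make this concrete, here is a minimal MATLAB sketch of a single transmission through the three-bit repetition code; it assumes the same Communications Toolbox function bsc used in the Appendix, with p = 0.3 as above, and the variable names are illustrative.

% One transmission of the three-bit repetition code over a BSC with p = 0.3
p = 0.3;
msg = randi([0 1]);                       % a single random message bit
codeword = repmat(msg, 1, 3);             % encode: repeat the bit three times
received = bsc(codeword, p);              % channel: each bit flips with probability p
decoded = sum(received) >= 2;             % decode: majority vote over the received bits
theoreticalError = 3*p^2*(1-p) + p^3;     % = 0.216, matching the calculation above

Repeating this experiment many times and comparing decoded against msg reproduces the 0.216 figure; the simulation in the Appendix does exactly this for a range of N.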

We can reduce this error further by increasing the number of repetitions, $N$. The simulation code in the Appendix plots the theoretical and experimental error rates of various $N$-bit repetition codes.

Figure 10: Performance of repetition codes based on N and p =.3

As $N$ grows, we need many more trials to observe the increasingly rare errors, which becomes computationally expensive. Therefore, in Figure 10 there are fewer green dots (the experimental points) than points on the blue theoretical curve. The dots match the blue curve well, confirming our theoretical results. The oscillating behavior comes from the fact that for even $N$ we must account for ties, which increases the number of received words decoded in error, so we see spikes at even $N$ relative to odd $N$. Ultimately, Figure 10 shows that the repetition code's error approaches zero as $N$ increases, which is desirable; however, the rate moves further from capacity, so this code is not sufficient for practical applications, though it is a good way to understand how forward error correction improves reliability by adding redundancy.

Hamming Codes

Another simple example of a forward error correction code is the Hamming code. Hamming codes, developed by R. W. Hamming, introduce redundant parity bits that are used to determine whether a bit error occurred during transmission. With even parity, a parity bit is chosen so that the group of bits it covers (parity bit included) contains an even number of 1's; equivalently, the parity bit is the XOR of the data bits it covers. With odd parity, the bit is chosen so that the group contains an odd number of 1's. Upon receiving the transmitted sequence, if a recomputed parity check does not match, we are confident an error has occurred, either in the data bits or in the parity bits.

For example, suppose we wish to transmit a 4-bit sequence using the Hamming (7,4) code, which carries 4 data bits with 3 redundant bits. Beyond (7,4), there is a whole family of Hamming codes for other block sizes, such as (15,11) and (31,26). The redundant bits are placed at the positions whose indices are powers of 2, counted from left to right. Suppose the data to be transmitted is 1011. Then our codeword is laid out as follows, where R1, R2, and R3 represent the redundant bits.

Position   1   2   3   4   5   6   7
Bits      R1  R2   1  R3   0   1   1

The R1 bit covers every position whose index has a 1 in the least significant bit of its binary representation, i.e., positions 1, 3, 5, 7, and so on. Among the covered data positions, only positions 3 and 7 are set to 1, an even number of 1's, so R1 is 0.
Position   1   2   3   4   5   6   7
Bits       0  R2   1  R3   0   1   1

The R2 bit covers every position whose index has a 1 in the second least significant bit, i.e., positions 2, 3, 6, 7, 10, and so on. Positions 3, 6, and 7 are set to 1, an odd number of 1's, so R2 is 1.
Position   1   2   3   4   5   6   7
Bits       0   1   1  R3   0   1   1

The R3 bit covers every position whose index has a 1 in the third least significant bit, i.e., positions 4, 5, 6, 7, 12, and so on. Positions 6 and 7 are set to 1, an even number of 1's, so R3 is 0. Thus, our final encoded bit string is 0110011.

Position   1   2   3   4   5   6   7
Bits       0   1   1   0   0   1   1
Now suppose we send the encoded bit string 0110011 and receive 0111011, which has a bit flip in the 4th position. To decode, we recompute each parity check over the received word; the results are the check bits C1, C2, and C3. Checking positions 1, 3, 5, and 7, we count two 1's, an even number, so C1 is 0. Likewise, checking positions 2, 3, 6, and 7, we count four 1's, also even, so C2 is 0. However, checking positions 4, 5, 6, and 7, we count an odd number of 1's, which signals that an error has occurred, so C3 is 1.

Check bits   C3   C2   C1
Bits          1    0    0

Reading the check bits C3 C2 C1 as a binary number gives 100, which is 4. Hence, we know the 4th position suffered a bit flip. Flipping the 4th bit recovers the transmitted string 0110011, and removing the redundant bits yields the original message, 1011.
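The whole procedure fits in a short MATLAB script. Below is a minimal sketch that hard-codes the (7,4) parity groups described above; the data value, the simulated error position, and the variable names are illustrative choices rather than part of any standard routine.

% Hamming (7,4): encode, flip one bit, then decode,
% following the positional convention above (parity bits at positions 1, 2, 4)
data = [1 0 1 1];                         % message bits, as in the worked example

cw = zeros(1,7);
cw([3 5 6 7]) = data;                     % data bits occupy positions 3, 5, 6, 7
cw(1) = mod(sum(cw([3 5 7])), 2);         % R1 covers positions 1, 3, 5, 7
cw(2) = mod(sum(cw([3 6 7])), 2);         % R2 covers positions 2, 3, 6, 7
cw(4) = mod(sum(cw([5 6 7])), 2);         % R3 covers positions 4, 5, 6, 7

received = cw;
received(4) = 1 - received(4);            % simulate a single bit flip at position 4

c1 = mod(sum(received([1 3 5 7])), 2);    % recompute each parity check (the syndrome)
c2 = mod(sum(received([2 3 6 7])), 2);
c3 = mod(sum(received([4 5 6 7])), 2);
errorPos = 4*c3 + 2*c2 + c1;              % read the check bits C3 C2 C1 as a binary number

if errorPos > 0
    received(errorPos) = 1 - received(errorPos);   % correct the single-bit error
end
decoded = received([3 5 6 7]);            % recovers 1 0 1 1

The fact that the syndrome directly indexes the position of a single error is exactly why the parity bits are placed at the power-of-two positions.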

Compared to repetition codes, Hamming codes can save significant bandwidth, since only 3 redundant bits are used to protect 4 data bits. However, a Hamming (7,4) block can correct at most one bit flip; with more flips it can no longer recover the original message, whereas a repetition code can correct more flips by increasing N. Thus, it is imperative to choose an encoding scheme that fits the noise level of the operating channel. To this day, Hamming (7,4) still has numerous practical applications, such as satellite transmission.

The vast ecosystem of codes

Although simplistic repetition codes perform poorly and are impractical, simple codes such as Hamming (7,4) have yet to be rendered obsolete. Each encoding scheme aims to approach the theoretical performance bound for a given noise level, and many codes are still waiting to be discovered. A list of codes to explore, along with their application spaces, includes

  • Reed Solomon Codes (used in CDs, DVDs)
  • Polar Codes (used in the 5G standard)
  • LDPC Codes (satellites and WiFi)
  • Turbo Codes (3G/4G standard and deep space)
  • Convolutional Codes and Viterbi Decoding (satellites and mobile devices)

Although I will not present the technical details of more codes due to their complexity, I will summarize the modern progress of coding. Codes have been found that approach the Shannon limit, which is defined by the capacity of the channel; in other words, codes exist that operate near channel capacity with an error rate close to zero. In Figure 11, we see codes compared to the capacity of an AWGN channel, indicated by the dark line labeled Shannon limit. LDPC codes have proven to be capacity-approaching: in the simulations of [5], they come within roughly 0.1 dB of the limit, which is very small. It is important to note that codes are not one-size-fits-all; there are always tradeoffs among complexity, rate, and bit error rate, which is what creates the need for code designers.

Figure 11: LDPC performance under an AWGN channel [5]

Conclusion

Communication systems are vital to our everyday lives; we rely on wireless communication for nearly every operation, or simply to connect with loved ones. This article presented the basic architecture of communication systems and the problems that noise creates. Through a discussion of information theory and an analysis of channels, we motivated the need for forward error correction to approach capacity. Forward error correction makes modern communication possible, and we will always need more channel coding theorists to push the boundary. Equipped with the framework this article has laid out, I encourage you to explore more codes, delve further into the world of channel coding, and help make the world more connected by increasing the rate at which information can travel.

References

[1] D. MacKay, “Repetition codes,” www.inference.org.uk, May 10, 1997. https://www.inference.org.uk/mackay/itprnn/1997/l1/node6.html (accessed Feb. 10, 2024).

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. Hoboken, NJ: Wiley-Interscience, 2006.

[3] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[4] C. Liu, “What is FEC, and How Do I Use It?,” www.signalintegrityjournal.com, Jun. 01, 2019. https://www.signalintegrityjournal.com/articles/1284-what-is-fec-and-how-do-i-use-it

[5] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001, doi: https://doi.org/10.1109/18.910578.

Appendix

%Code for plotting entropy of a coin flip
figure 
p = linspace(0,1,1001);
H = -p.*log2(p) - (1-p).*log2(1-p);
plot(p, H);
title("Entropy, H vs probability of heads, p");
xlabel('probability of heads, p')
ylabel('Entropy, H')

%Code for plotting BSC capacity
figure
p = linspace(0,1,1001);
C = 1 + (1-p).*log2(1-p) + p.*log2(p);
plot(p, C);
title("Capacity of BSC vs Probability of error, p");
xlabel('Probability of error, p')
ylabel('Capacity, C')

%Code for plotting AWGN channel capacity
figure 
SNR = linspace(1, 10.^(2.5), 1000);
C = .5*(log2(1+SNR));
plot(10*log10(SNR), C)
title("Capacity of AWGN channel vs SNR");
xlabel('SNR (dB)')
ylabel('Capacity, C')

%Simulation of repetition code performance
N = [1,3,5,6,7,8,9,10,20,30,40,50,60]; %Number of Repetitions
p = .3; %BSC crossover probability, matching the p = .3 example in the text and Figure 10
numTrials = 100000; %Number of Trials
errorRate = zeros(1,size(N,2)); %Store experimental error rate
theoreticalErrorRate = zeros(1,size(N,2)); %Store theoretical error rate
for n = 1:numTrials
    message = randi([0 1], 1, 1);                 %Random message bit
    for i = 1:size(N,2)
        if message == 1                           %Encode by repeating the bit N(i) times
            encoded_message = ones(1,N(i));
        else
            encoded_message = zeros(1,N(i));
        end
        output = bsc(encoded_message, p);         %Pass the codeword through the BSC
        [numerrs,pcterrs] = biterr(encoded_message,output); %Fraction of flipped bits
        if pcterrs >= .5                          %Majority vote fails when at least half the bits flip
           errorRate(i) = errorRate(i) + 1; 
        end
    end
end
errorRate = errorRate./numTrials;

for i = 1:size(N,2)
    theoreticalErrorRate(i) = calculateErrorRate(p, N(i));
end

figure
rate = 1./N;  %Convert number of repetitions to code rate (1/N)
scatter(rate, errorRate, [], "green", "filled"); %Experimental block error rates
set(gca,'yscale','log')
hold on
plot(rate, theoreticalErrorRate, Color="blue");  %Theoretical block error rates
title("Bit error rate vs Rate (1/N)");
xlabel('Rate (1/N)')
ylabel('Bit error Rate')

function P_error = calculateErrorRate(p, N)
    P_error = 0;
    % Start summing from the smallest number of errors needed to fail (more than half)
    start = ceil(N/2);
    % Calculate for more than half of the bits being in error
    for k = start:N
        P_error = P_error + nchoosek(N, k) * p^k * (1-p)^(N-k);
    end
end