102. AWS EC2 02 - qyjohn/AWS_Tutorials GitHub Wiki

This section covers the concept of instance-store volumes and EBS volumes. We introduce disk I/O observation tools such as dd and iostat. We also cover how to set up a RAID0 disk array using multiple instance-store volumes.

(1) Instance Store Volumes

If we look at the specifications of the different instance types, we notice that some instance types have "SSD Storage" while others are "EBS Only". Here "SSD Storage" refers to the instance-store volumes, which are also called ephemeral volumes. An EC2 instance might have more than one instance-store volume, depending on the instance type. The storage is located on disks that are physically attached to the host computer. Instance store volumes are ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content.

It is important to remember that if you stop / start your EC2 instance, the content on your instance-store volumes will be lost.

In the EC2 Console, launch an EC2 instance with an Ubuntu 16.04 AMI, and m3.xlarge as the instance type. In "Step 2: Choose an Instance Type" you will notice that the m3.xlarge instance type has 2 x 40 GB (SSD) instance store volumes. In "Step 4: Add Storage" you will notice that the two instance store volumes are mapped to /dev/sdb and /dev/sdc respectively. You can certainly change these mappings, but we recommend that you don't. It is important to note that instance store volumes are available to your EC2 instance only if you specify them when you launch the EC2 instance. You can't launch an EC2 instance and then add instance store volumes later.

When the EC2 instance is up and running, SSH into the instance and look at the available block devices. "xvda" refers to your root EBS volume /dev/xvda, which is 8 GB. The root EBS volume has one partition, /dev/xvda1, which is mounted at mount point / in the operating system. "xvdb" refers to the first instance store volume /dev/xvdb, which is mounted at mount point /mnt. "xvdc" refers to the second instance store volume /dev/xvdc, which is not mounted.

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk 
└─xvda1 202:1    0    8G  0 part /
xvdb    202:16   0 37.5G  0 disk /mnt
xvdc    202:32   0 37.5G  0 disk 

You might ask: "Aren't the instance store volumes supposed to be 40 GB each? Why do I only see 37.5G in the output?" Let's add the "-b" option to the lsblk command, which shows the size of the volumes in bytes.

$ lsblk -b
NAME    MAJ:MIN RM        SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  8589934592  0 disk 
└─xvda1 202:1    0  8581692416  0 part /
xvdb    202:16   0 40256929792  0 disk /mnt
xvdc    202:32   0 40256929792  0 disk 

At this point you should see that 40256929792 / (1024 x 1024 x 1024) = 37.49, which is why you see 37.5G in the previous output: lsblk reports sizes in binary units (GiB), while the 40 GB figure in the instance specifications uses decimal units. Similarly, 8589934592 / (1024 x 1024 x 1024) = 8, which is why you see 8G for the root EBS volume.
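
The conversion can be checked directly in the shell. Bash integer arithmetic truncates fractions, so awk is used here for the fractional results:

```shell
# Convert the byte counts reported by `lsblk -b` into GiB (1 GiB = 1024^3 bytes).
awk 'BEGIN { printf "%.2f GiB\n", 40256929792 / (1024 * 1024 * 1024) }'   # prints 37.49 GiB
awk 'BEGIN { printf "%.2f GiB\n",  8589934592 / (1024 * 1024 * 1024) }'   # prints 8.00 GiB
```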

In the Linux operating system, all devices have a (major, minor) number pair. The major number identifies the driver to be used for the device (e.g. hard disks, input/output devices, etc.), while the minor number identifies the specific device instance handled by that driver. How these numbers are assigned can be quite complicated, but you can take a look at your /proc/devices to get a general idea. As you can see, major number 202 refers to "xvd" in /proc/devices.

$ more /proc/devices 
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  5 ttyprintk
  7 vcs
 10 misc
 13 input
 21 sg
 29 fb
 89 i2c
 99 ppdev
108 ppp
128 ptm
136 pts
180 usb
189 usb_device
249 bsg
250 watchdog
251 rtc
252 dimmctl
253 ndctl
254 tpm

Block devices:
  1 ramdisk
  2 fd
259 blkext
  7 loop
  8 sd
  9 md
 11 sr
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
202 xvd
252 device-mapper
253 virtblk
254 mdp
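
You can look up the (major, minor) pair of any specific device node with stat. A small sketch, assuming a Linux system with GNU coreutils; /dev/null is used here because it exists on every system, and its major number ties back to the "mem" driver in the character-device list above:

```shell
# GNU stat prints the device numbers in hexadecimal: %t is the major, %T the minor.
stat -c 'major=%t minor=%T' /dev/null
# Major number 1 corresponds to the "mem" driver in /proc/devices.
grep -w mem /proc/devices
```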

(2) EBS Volumes

Instance-store volumes provide temporary storage for applications; the data is lost after a stop / start operation, or when the EC2 instance is terminated. If you want your data to persist across stop / start operations, or after the EC2 instance is terminated, you will need to use the Elastic Block Store (EBS). EBS provides persistent block storage volumes for use with EC2 instances. Each EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability.

EBS volumes persist independently from the running life of an EC2 instance. After a volume is attached to an instance, you can use it like any other physical hard drive.

Please read the following AWS documentation to understand the basic concepts related to EBS, as well as how to use EBS volumes:

As an exercise, you should create a couple of EBS volumes, attach them to (and detach from) an EC2 instance, and observe the behavior using the "lsblk" command. Also, you should look into your /var/log/syslog and /var/log/kern.log to see what information was written to these log files.

(3) Observing Disk I/O

In Linux, people usually observe disk I/O activities using iostat, which is included in the sysstat package.

$ sudo apt-get install sysstat
$ iostat 
Linux 4.4.0-64-generic (ip-172-31-12-86) 	03/03/2017 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.04    0.00    0.05    0.07    0.09   99.74

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.61        41.52        26.73     242196     155912
xvdb              0.08         0.95         0.01       5569         40
xvdc              0.06         0.75         0.00       4380          0

The CPU status line has the following columns:

  • %user: The percentage of CPU utilization that occurred while executing at the user level (this is the application usage).
  • %nice: The percentage of CPU utilization that occurred while executing at the user level with nice priority.
  • %system: The percentage of CPU utilization that occurred while executing at the system level (kernel).
  • %iowait: The percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
  • %steal: The percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
  • %idle: The percentage of time that the CPU or CPUs were idle and the systems did not have an outstanding disk I/O request.

The xvda line shows the disk I/O activities for the root EBS volume. On average there are 1.61 transactions per second (tps). Since the operating system booted, 242196 KB have been read from /dev/xvda and 155912 KB have been written to /dev/xvda, averaging 41.52 KB/s for reads and 26.73 KB/s for writes.

So, how is the KB/s data calculated?

$ uptime
 02:04:26 up  1:37,  2 users,  load average: 0.00, 0.00, 0.00

As you can see, the operating system has been up and running for 1 hour and 37 minutes, which is approximately 5820 seconds. For reads, 242196 / 5820 = 41.61 KB/s; for writes, 155912 / 5820 = 26.79 KB/s. These numbers are pretty close to the iostat output above.
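
The same arithmetic can be reproduced in the shell, using the cumulative byte counts from the iostat output above (the small difference from iostat's own averages comes from the uptime being only approximate):

```shell
# Average throughput since boot = cumulative kB transferred / uptime in seconds.
awk 'BEGIN { printf "reads:  %.2f kB/s\n", 242196 / 5820 }'   # prints 41.61
awk 'BEGIN { printf "writes: %.2f kB/s\n", 155912 / 5820 }'   # prints 26.79
```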

You can use iostat to observe the disk I/O activities in real time. In the following command output, the iostat program uses a 1 second monitoring interval. When you do not have any disk I/O activity on the system, you see a lot of 0's in the output.

$ iostat 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb              0.00         0.00         0.00          0          0
xvdc              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb              0.00         0.00         0.00          0          0
xvdc              0.00         0.00         0.00          0          0

Open a new SSH connection to the EC2 instance and issue the following command to copy all the content of /dev/xvda to /dev/null (the null device), with a block size of 1 MB. We recommend that you read the relevant Wikipedia articles to understand how dd, /dev/null and /dev/zero work.

$ sudo dd if=/dev/xvda of=/dev/null bs=1M
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 136.916 s, 62.7 MB/s

Now go back to the SSH window running iostat, and you will see something similar to the following. In terms of CPU utilization, you have 24.43% iowait and 75.57% idle. Remembering that we are using an m3.xlarge instance type with 4 vCPUs, note that 1/4 = 25%. One of your 4 vCPUs is waiting for disk I/O all the time, because the dd process runs in a single thread. Since the other 3 vCPUs are doing nothing, you have 3/4 = 75% idle. In terms of disk I/O, you have 454 transactions per second (tps), with a read throughput of 58112 KB/s. Note that 58112 / 454 = 128, so the size of each transaction (one I/O request to the disk) is 128 KB. With this test, you know that the 8 GB root EBS volume on an m3.xlarge instance can read at approximately 60 MB/s at peak performance.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   24.43    0.00   75.57

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn 
xvda            454.00     58112.00         0.00      58112          0
xvdb              0.00         0.00         0.00          0          0
xvdc              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   24.62    0.00   75.38

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda            468.00     59904.00         0.00      59904          0
xvdb              0.00         0.00         0.00          0          0
xvdc              0.00         0.00         0.00          0          0
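
The per-request size mentioned above follows directly from the iostat columns, and can be verified in the shell:

```shell
# Average I/O request size = read throughput (kB/s) / transactions per second (tps).
awk 'BEGIN { printf "%d kB per request\n", 58112 / 454 }'   # prints 128 kB per request
```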

(4) Disk I/O Characteristics of Instance Store Volumes

Now that we have a rough idea of how to use the dd program to drive disk I/O, and how to use iostat to observe it, let's do some experiments to see how your instance-store volumes perform. We will run this test on /dev/xvdb, your first instance-store volume.

$ sudo dd if=/dev/xvdb of=/dev/null bs=1M
38392+0 records in
38392+0 records out
40256929792 bytes (40 GB, 37 GiB) copied, 111.025 s, 363 MB/s

$ sudo dd if=/dev/xvdb of=/dev/null bs=1M
38392+0 records in
38392+0 records out
40256929792 bytes (40 GB, 37 GiB) copied, 114.293 s, 352 MB/s

$ sudo dd if=/dev/xvdb of=/dev/null bs=1M
38392+0 records in
38392+0 records out
40256929792 bytes (40 GB, 37 GiB) copied, 111.946 s, 360 MB/s

As shown in the command outputs, the test results are quite consistent. On average we achieve about 350 MB/s read throughput, although from time to time iostat reports over 800 MB/s at peak.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.65   17.31    0.27   80.77

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb           6323.00    809344.00         0.00     809344          0
xvdc              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.65   17.31    0.27   80.77

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb           6257.00    800896.00         0.00     800896          0
xvdc              0.00         0.00         0.00          0          0

Now we take a look at how fast we can write to the instance-store volume:

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=20000 oflag=direct 
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 181.982 s, 115 MB/s

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=20000 oflag=direct 
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 137.525 s, 152 MB/s

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=20000 oflag=direct 
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 139.974 s, 150 MB/s

It is interesting that the first test only achieved 115 MB/s, while the second and third tests achieved about 150 MB/s. To understand the discrepancy, we need to understand the Ext4 file system's lazy initialization (ext4lazyinit) process. In short, when creating an Ext4 file system, the regions holding the inode tables must be cleaned (overwritten with nulls, or "zeroed"). This is a time-consuming process, especially on a big disk. With ext4lazyinit, the initialization does not happen immediately when you create the Ext4 file system, but rather occurs gradually in the background. However, if you attempt to write to a region that is not yet initialized, your write operation must wait for the initialization to complete. As such, you are likely to observe a performance drop during disk I/O benchmarks if the Ext4 file system is not yet fully initialized. In the example shown above, the low performance observed in the first test was caused by uninitialized inode tables. In the second and third tests, the initialization had already completed, so these tests achieved better performance.
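
If you want to rule this effect out of a benchmark, mkfs.ext4 can be told to initialize everything up front via its -E extended options. A sketch only, not meant to be run here, since it formats a device; /dev/xvdb is the instance-store volume from the examples above:

```shell
# Disable ext4 lazy initialization: zero the inode tables and journal at
# mkfs time instead of in the background (mkfs itself becomes much slower).
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/xvdb
```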

Again, on average we achieve about 150 MB/s in write throughput; the instantaneous figures reported by iostat fluctuate around this value, as in the samples below.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.25   24.18    0.00   75.57

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb            669.00         0.00     82944.00          0      82944
xvdc              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.00   24.00    0.00   75.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
xvdb            702.00         4.00     87040.00          4      87040
xvdc              0.00         0.00         0.00          0          0

(5) Disk Array with RAID0

When you have two instance-store volumes of the same size, you can combine them into a RAID0 disk array that is twice as big. The benefit of RAID0 is that I/O is striped across the two disks in parallel, which significantly improves disk I/O performance. The trade-off is that RAID0 offers no redundancy: if either disk fails, the whole array is lost.

$ sudo mdadm --create --verbose /dev/md0 --level=0 --name=MY_RAID --raid-devices=2 /dev/xvdb /dev/xvdc
mdadm: chunk size defaults to 512K
mdadm: /dev/xvdb appears to contain an ext2fs file system
       size=39313408K  mtime=Fri Mar  3 00:27:12 2017
mdadm: /dev/xvdc appears to contain an ext2fs file system
       size=39313408K  mtime=Thu Jan  1 00:00:00 1970
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

$ sudo mkfs.ext4 /dev/md0
mke2fs 1.42.13 (17-May-2015)
Creating filesystem with 19640320 4k blocks and 4915200 inodes
Filesystem UUID: 29960103-7409-4784-85eb-3ddb08c6cb80
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

$ sudo mount /dev/md0 /mnt

As shown in the following test results, we can achieve approximately 850 MB/s read throughput with RAID0.

$ sudo dd if=/dev/md0 of=/dev/null bs=1M
76720+0 records in
76720+0 records out
80446750720 bytes (80 GB, 75 GiB) copied, 97.654 s, 824 MB/s

$ sudo dd if=/dev/md0 of=/dev/null bs=1M
76720+0 records in
76720+0 records out
80446750720 bytes (80 GB, 75 GiB) copied, 94.2401 s, 854 MB/s

$ sudo dd if=/dev/md0 of=/dev/null bs=1M
76720+0 records in
76720+0 records out
80446750720 bytes (80 GB, 75 GiB) copied, 94.6586 s, 850 MB/s

As shown in the following test results, we can achieve approximately 260 MB/s write throughput with RAID0.

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=40000 oflag=direct
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB, 39 GiB) copied, 164.849 s, 254 MB/s

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=40000 oflag=direct
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB, 39 GiB) copied, 160.171 s, 262 MB/s

$ sudo dd if=/dev/zero of=/mnt/testfile bs=1M count=40000 oflag=direct
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB, 39 GiB) copied, 159.563 s, 263 MB/s

Now you have created a RAID0 device. Try to write something onto your RAID0 device. Reboot (not stop / start) your EC2 instance, then try to mount the RAID0 device to /mnt again. What do you see? Why would this happen? And how do you resolve the issue?

Extended reading:

Create an EC2 instance with the i2.8xlarge instance type. Experiment with the various RAID configurations (RAID0, RAID1, RAID5, RAID10, etc). Compare the disk I/O performance and storage capacity of these different options. During this process, take notes on the minimum requirements (the number of disks), as well as how many disks can be lost before you lose data, for each configuration.

Remember to terminate the EC2 instance when you are done with the experiments.

(6) Swap

When you run an application, it accesses data in physical memory, which is usually referred to as RAM (random access memory). If your application needs data on the hard drive, that data must first be loaded from the hard drive into memory. From time to time, your applications might require more memory than is physically available on the system. This can be handled by swapping. In short, modern operating systems use a technique called paging to move data between memory and disk. When the system needs more memory resources and RAM is full, inactive pages in memory are moved to the swap space, making room for the active pages. With this approach, applications can allocate more memory than the amount of physical memory available on the system, which makes it easier for developers to write applications. However, since disk access is far slower than memory access, there is a performance penalty in doing this.

While swap space can help machines with a small amount of RAM, it should not be considered a replacement for more RAM. Swap space can be a dedicated swap partition (recommended), a swap file, or a combination of the two. In the old days, the best practice was: "swap should equal 2x physical RAM for up to 2 GB of physical RAM, plus an additional 1x physical RAM for any amount above 2 GB, but never less than 32 MB".
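
That old sizing rule can be written down as a small helper function. A sketch only, working in MB, with the 2 GB threshold expressed as 2048 MB; the function name is made up for illustration:

```shell
# Recommended swap (MB) per the old rule:
#   RAM <= 2048 MB -> 2 x RAM, but never less than 32 MB
#   RAM >  2048 MB -> 2 x 2048 + (RAM - 2048) = RAM + 2048
recommended_swap_mb() {
  local ram_mb=$1
  if [ "$ram_mb" -le 2048 ]; then
    local swap=$(( ram_mb * 2 ))
    if [ "$swap" -lt 32 ]; then swap=32; fi
    echo "$swap"
  else
    echo $(( ram_mb + 2048 ))
  fi
}

recommended_swap_mb 1024   # 2048 MB (2 GB) of swap for 1 GB of RAM
recommended_swap_mb 8192   # 10240 MB (10 GB) of swap for 8 GB of RAM
```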

On EC2 instances, swap is not enabled by default. The main reason is that if you need more memory, you can simply move to a bigger instance type. However, some people still want to enable swap on their EC2 instances.

On Ubuntu, you can check the amount of swap using the "free" command. For example, the following command shows that I do not have any swap:

$ free
              total        used        free      shared  buff/cache   available
Mem:       15664208      130480    15211008        8812      322720    15390900
Swap:             0           0           0

I am running an EC2 instance with the r3.large instance type, with 1 x 30 GB SSD instance-store volume. I want to use the space on the instance-store volume as swap. This can be achieved by creating the swap space with mkswap, then enabling it with swapon.

$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk 
└─xvda1 202:1    0   8G  0 part /
xvdb    202:16   0  30G  0 disk 

$ sudo mkswap /dev/xvdb
Setting up swapspace version 1, size = 30 GiB (32212250624 bytes)
no label, UUID=76d9a197-5d4b-4040-a5e1-91ac8f4a7aa3

$ sudo swapon /dev/xvdb
$ sudo swapon
NAME      TYPE      SIZE USED PRIO
/dev/xvdb partition  30G   0B   -1

$ free
              total        used        free      shared  buff/cache   available
Mem:       15664208       82964    15251672        8812      329572    15438216
Swap:      31457276           0    31457276

If for any reason you need to disable the swap space, you can do so with swapoff. As you can see, swapon and swapoff are quite easy to remember.

$ sudo swapoff /dev/xvdb
$ free
              total        used        free      shared  buff/cache   available
Mem:       15664208       74452    15260104        8812      329652    15446692
Swap:             0           0           0

Apart from the free command, there are other ways to observe memory and swap consumption on your operating system. We recommend that you try the utilities described in the following article. Think about how you can take advantage of these utilities for things other than monitoring memory and swap usage.

(7) Summary and Homework

In this tutorial, we introduced instance-store volumes and EBS volumes. We learned how to observe disk I/O activities using dd and iostat. We also learned how to set up a RAID0 disk array using multiple instance-store volumes.

As your homework, you will need to do the following:

(1) Launch multiple EC2 instances with different instance types to understand the performance differences of their instance-store volumes. You should at least test all the instance types in the C3, R3, and I2 product families. The final product will be a table listing, for each EC2 instance type, the single-disk read and write throughput observed (averaged over at least 3 tests), and the read and write throughput observed for a RAID0 disk array built from all instance-store volumes on the instance (averaged over at least 3 tests).

(2) Launch an EC2 instance with multiple instance-store volumes, use user-data to create a RAID0 disk array, format the disk array with the Ext4 file system, and mount the disk array to /data.

(3) Create a big EBS volume (like 1 TB) and attach it to an EC2 instance. Change the instance type, as well as the "EBS Optimized" option, then run the dd and iostat commands to observe its performance. Create a table with the mappings between EC2 instance type and observed throughput.

(4) Launch an EC2 instance with an instance-store volume, and use the instance-store volume as swap. Do something (whatever it is) so that you can observe swap usage using the utilities we described here. Write down your observations on when swapping occurs and when swapping disappears. You might want to read this article on swappiness for more information.

Please remember to terminate all EC2 instances when you are done with your exercise.

If you still have time, you should google Ext4 file system lazy initialization, as well as the difference between sequential I/O and random I/O. Also, use IOzone to carry out some benchmarks and compare the results with dd.