System Architecture

This page details the system architecture, the relevant physics/optics background, and several parts of the code.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/system-architecture.png

A-simple-ray-tracer is built on top of three core modules, each sketched briefly below:

  • Hittable (the base class of scene objects)
  • Material
  • Camera
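
Conceptually, Hittable describes anything a ray can hit, Material describes how a surface scatters the rays that hit it, and Camera is responsible for casting the rays. As a rough illustration of how these roles fit together (the names HitRecord and Color and the exact signatures below are assumptions for explanation, not the declarations from the repository), the interfaces could look roughly like this:

    #include <memory>  // std::shared_ptr

    // Illustrative sketch only; the exact declarations live in the repository.
    class Material;

    struct HitRecord {
        Point3 point;                        // where the ray hits the object
        Vector3 normal;                      // surface normal at the hit point
        double t;                            // ray parameter of the hit
        std::shared_ptr<Material> material;  // how the hit surface scatters light
    };

    class Hittable {  // base class of every object in the scene
      public:
        // Report whether `ray` hits the object within [t_min, t_max] and,
        // if so, fill in `record`.
        virtual bool hit(const Ray &ray, double t_min, double t_max,
                         HitRecord &record) const = 0;
        virtual ~Hittable() = default;
    };

    class Material {  // decides how a ray bounces off a surface
      public:
        // Produce a scattered ray and a color attenuation for an incoming ray.
        virtual bool scatter(const Ray &incoming, const HitRecord &record,
                             Color &attenuation, Ray &scattered) const = 0;
        virtual ~Material() = default;
    };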

How is the image rendered? Let's go through each module of the ray tracer in a top-down manner.

Camera

What is ray tracing?

In the real world, we use a camera to take a photo; in the virtual world, we use a virtual camera to take a photo virtually. (From here on, we say "render an image" instead, which is shorter and more precise, since the word "render" carries the flavor of computation.)

To know why we need a virtual camera to render an image and how a virtual camera works, we first need to grasp the basics of ray tracing.

Ray tracing is a rendering technique for generating an image by tracing the path of light and simulating the effects of its encounters with virtual objects (e.g., reflection, refraction).
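
The central data structure of this technique is the ray itself: an origin plus a direction, so that every point along the ray can be written as origin + t * direction. A minimal sketch of such a class (the repository's Ray class may differ in details):

    // Minimal sketch of a ray: P(t) = origin + t * direction.
    // The actual Ray class in the repository may differ in details.
    class Ray {
      public:
        Ray(const Point3 &origin, const Vector3 &direction)
            : m_origin(origin), m_direction(direction) {}

        // The point reached after traveling t units along the direction.
        Point3 at(double t) const { return m_origin + t * m_direction; }

        const Point3 &origin() const { return m_origin; }
        const Vector3 &direction() const { return m_direction; }

      private:
        Point3 m_origin;
        Vector3 m_direction;
    };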

Let's first see the mechanism of a real camera to appreciate the technique of ray tracing.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/a-real-camera.png

A real camera consists of:

  • An aperture, which controls how many light rays can pass into the camera.
  • A lens, which bends the incoming light rays so that they focus on the image sensors.
  • Image sensors, which record the incoming light rays to form the image.

Implementing a virtual camera is essentially a simulation of the process a real camera goes through when it takes a photo: "light → objects → camera → image". However, a BIG issue needs to be addressed before we jump into writing the code: the light source emits an infinite number of rays, and it is impossible to follow all of them.

In fact, we don't have to follow all of the light rays, since most of them will NEVER reach the camera (as the dotted lines below show):

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/a-real-camera-with-light-rays.png

For the rays that never reach the camera, the effort of tracing them is wasted, so we want to trace only the rays that are guaranteed to reach the camera. The neat trick for doing so is to look at the process "light → objects → camera → image" backwards.

Tracing rays backwards means that the process starts from the camera and follows rays out through the lens and the aperture into the scene (outside the camera). That is exactly what ray tracing does. The two rays, one traced from the light source and one from the camera, are identical except for their direction.

A more detailed explanation can be found here.

In summary, we implement a Camera class as a virtual camera, which applies ray tracing to simulate a real camera, but in reverse.

Ray tracing in (virtual) Camera

Before digging deeper, we should establish the mapping of components between a real camera and a virtual camera.

The component onto which the scene is projected and recorded:

  • A real camera uses image sensors to record the incoming light rays of the scene.
  • A virtual camera uses a view window to record the outgoing light rays from the virtual camera to the scene.

Consequently, the two cameras place their recording components at different positions:

  • A real camera places the image sensors behind the aperture.
  • A virtual camera places the view window in front of the aperture (outside the camera).

The view window is placed outside the camera because the aperture effect still needs to be applied and, in the virtual-camera scenario, the light rays are cast outward from the camera.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/comparison-bt-real-n-virtual-camera.png

Now we know that the Camera class is the core of a ray tracer, since it's responsible for casting the light rays out into the scene. The following is a geometric illustration of what the Camera class does to render an image:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/step-by-step-of-rendering.png

The steps involved are as follows (a conceptual sketch of the whole loop follows the list):

  1. Determine the position of the virtual camera, the looking direction, and the size of the view window.
  2. Erect a plane between the objects and the virtual camera based on the parameters from Step 1. This is the view window through which the light rays will pass.
  3. Construct light rays from the camera to the pixels on the view window.
  4. Determine which object each light ray intersects and then compute the color for that intersection point (reflection and refraction are involved here).
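
Putting the four steps together, the rendering loop conceptually looks like the sketch below. getRay() and generateRayColor() are the repository's functions quoted later on this page; the surrounding loop, the HittableList type, and writePixel() are illustrative assumptions, not the repository's exact code.

    // Conceptual sketch of the rendering loop (illustrative, not the exact
    // repository code).
    void render(const Camera &camera, const HittableList &world,
                uint32_t image_width, uint32_t image_height,
                uint32_t max_depth) {
        for (uint32_t j = 0; j < image_height; ++j) {     // every row
            for (uint32_t i = 0; i < image_width; ++i) {  // every pixel in a row
                // Step 3: construct a ray from the camera through the point
                // (u, v) on the view window that corresponds to pixel (i, j).
                const auto u = static_cast<double>(i) / (image_width - 1);
                const auto v = static_cast<double>(j) / (image_height - 1);
                const Ray &ray = camera.getRay(u, v);

                // Step 4: find what the ray hits and compute the color of the
                // intersection (reflection/refraction happen inside this call).
                const auto color = ray.generateRayColor(world, max_depth);

                writePixel(i, j, color);  // hypothetical output helper
            }
        }
    }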

Antialiasing

Antialiasing is an important feature that should be implemented. Without antialiasing, you'll see jaggies along the edges of objects. When you take a photo with a real camera, you usually see no jaggies because the edge pixels are a blend of foreground and background colors. Thus, to remove jaggies from the image generated by the ray tracer, we average a bunch of colors (samples) inside each pixel.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/samples-in-pixel.png

  • The constructor of the Camera class determines the number of samples inside each pixel.
    Camera::Camera(const Point3 &origin, const Point3 &look_at,
               /* some parameters */
               const uint32_t samples_per_pixel, const uint32_t max_depth)
    : m_origin(origin), m_lens_radius(aperture / 2),
      m_samples_per_pixel(samples_per_pixel), m_max_depth(max_depth) {
      // ...
    }
    
    • In the case of the picture above, there are four samples per pixel.
  • The Camera class then averages the colors of these light rays; the result is the final color of the pixel (see the averaging sketch after this list).
    for (uint32_t s = 0; s < m_samples_per_pixel; ++s) {
        // jitter the sample position inside pixel (i, j)
        auto u = (i + getRandomDouble01()) / (image_width - 1);
        auto v = (j + getRandomDouble01()) / (image_height - 1);
        const Ray &ray = getRay(u, v);
        // accumulate the color contributed by this sample
        pixel_colors[i] += ray.generateRayColor(world, m_max_depth);
    }
    
    
    • getRandomDouble01() is used to generate random sample points inside the pixel.
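
After the sampling loop, the accumulated color is divided by the number of samples, and that average becomes the final color of the pixel. A minimal sketch of this averaging step (exactly where the repository performs the division may differ):

    // Average the accumulated sample colors into the final pixel color
    // (illustrative; the repository may do this when writing the image out).
    const auto scale = 1.0 / m_samples_per_pixel;
    pixel_colors[i] *= scale;  // each color channel is divided by the sample count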

The effect of antialiasing:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/antialias-before-after.png

Defocus blur

Before talking about defocus blur, let's review some physics by understanding the effects of:

  • The curvature of the lens
  • The size of the aperture

A converging lens bends the light rays so that they converge to a single point, and the curvature of the lens determines where that point is.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/different-curvature-lens-illustration.png

This introduces some new terms: focus distance and focus plane.

  • Focus distance is the distance between the light point and the plane (usually the image plane) where the focus point is.
  • Focus plane is the plane in the scene whose points appear in perfect focus on the image plane.

When a light source is not on the focus plane, its rays converge behind or in front of the focus point, and that causes the blurriness:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/blurriness-illustration.png

As you can see, the light rays converge behind the focus point and consequently affect an area instead of a single point (pixel) on the image plane. The blurriness stems from this behavior.

Besides the curvature of the lens, which controls the focus distance, the size of the aperture also affects the blurriness. The smaller the aperture is, the smaller the affected area is:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/effect-of-aperture-size.png

So the effect of defocus blur (depth of field, in photography) comes from the combination of:

  • The curvature of the lens
  • The size of the aperture

Here are some pictures that show the effect of tuning the curvature of the lens and the size of the aperture.

Focus distance at the bigger sphere with a bigger aperture:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/blur-10-5.png

Focus distance at the bigger sphere with a smaller aperture:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/blur-10-1.png

As you can see, the picture with a smaller aperture shows less blurriness.

Focus distance at the smaller sphere with a bigger aperture:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/blur-6-5.png


To achieve the same defocus blur (depth of field) effect in the ray tracer, we simulate the functionality of the aperture and the lens.

Aperture

The aperture is the key factor in generating the blurriness. By increasing the size of the aperture, we take in more light rays from other places, and they blur the pixel.

To control the size of the aperture, we add a new member to the Camera class: m_lens_radius. When a Camera object casts a ray, this radius is used to generate an arbitrary point around the origin of the camera. In this way, we can simulate the effect of taking in more light rays when the radius (aperture) is bigger, as the following picture illustrates:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/effect-of-aperture-in-real-virtual-camera.png

  • The constructor of the Camera class determines the size of the aperture
    Camera::Camera(const Point3 &origin, const Point3 &look_at,
               /* some parameters */
               const double aperture, const double focus_distance,
               /* other parameters */ )
    : m_origin(origin), m_lens_radius(aperture / 2),
      m_samples_per_pixel(samples_per_pixel), m_max_depth(max_depth) {
      // ...
    }
    
  • The aperture takes effect when we try to get a light ray cast by the camera
    Ray Camera::getRay(double u, double v) const {
        // pick a random point on the lens (a disk of radius m_lens_radius)
        const Vector3 &radius_vector =
            m_lens_radius * Vector3::getRandomVectorInUnitDisk();
        // apply the offset on the coordinate system of the camera
        const Vector3 &offset =
            m_x_axis * radius_vector.x() + m_y_axis * radius_vector.y();
    
        // cast the ray from the offset origin toward the point (u, v) on
        // the view window
        const Point3 &new_origin = m_origin + offset;
        return Ray(new_origin, m_lower_left_corner + u * m_horizontal +
                                   v * m_vertical - new_origin);
    }
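
Vector3::getRandomVectorInUnitDisk() supplies a random point inside the unit disk, which models a random position on the circular aperture/lens. A common way to implement such a helper is rejection sampling; the sketch below assumes that approach (and a Vector3(x, y, z) constructor and a lengthSquared() helper), so it is an illustration rather than the repository's exact code:

    // Rejection sampling a point in the unit disk (a sketch; the constructor
    // and lengthSquared() names are assumptions).
    Vector3 Vector3::getRandomVectorInUnitDisk() {
        while (true) {
            // random point in the square [-1, 1] x [-1, 1] on the z = 0 plane
            const Vector3 candidate(2.0 * getRandomDouble01() - 1.0,
                                    2.0 * getRandomDouble01() - 1.0, 0.0);
            // accept it only if it lies strictly inside the unit disk
            if (candidate.lengthSquared() < 1.0) {
                return candidate;
            }
        }
    }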
    

Curvature of the lens

The curvature of the lens determines the distance between the center of the virtual camera and the plane where everything is in perfect focus (analogous to the focus distance). Instead of simulating a real lens, in computer graphics we take advantage of the thin lens approximation and of the way we render images: casting rays outward from a single point.

The thin lens approximation helps us simulate a real lens without complicated calculations. It establishes a few rules, the most important of which is that any ray that passes through the center of the lens does not change its direction. Since the virtual camera casts rays outward from a single point, we can first illustrate image formation with a real camera whose aperture shrinks into a tiny hole:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/image-formation-in-pinhole-camera.png

Here, the camera focuses on the plane at distance fd, and the image is formed on the plane at distance f. We then want to transform this into the same structure as the virtual camera, which can be achieved by applying similar triangles:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/image-formation-in-pinhole-camera-with-similar-triangles.png

Now you can see that we don't have to care about what happens inside the camera (on the left side of the lens). By similar triangles, the image formed on the plane at distance f behind the lens and the image formed on a virtual image plane at distance fd in front of it differ only by an inversion and a uniform scale factor of fd / f, so placing the virtual image plane outside the camera (on the right side of the lens) gives us the same structure as the virtual camera.

Next, we have to achieve the effect of the curvature of the lens with the help of the aperture. If we want everything on the plane at distance fd to be in perfect focus, we need to place the virtual image plane exactly there.

When the virtual image plane is at the focus plane (the plane where everything is in perfect focus), the light rays generated from the arbitrary points around the origin of the camera have less chance of hitting a different object. That means the color of a different object won't affect the pixels that belong to the focused object.

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/image-plane-at-focus-plane.png

As you can see, light rays from different points still hit adjacent parts of the focused object, and that is what being in focus means.

If we DO NOT place the image plane at the focus plane:

https://github.com/Naetw/A-simple-ray-tracer/blob/main/images/image-plane-not-at-focus-plane.png

The green color at the leg will be blended with the orange color at the head, and that causes the blurriness.

In summary, the distance at which we place the virtual image plane plays the role of the curvature of the lens: it determines which objects (those at that specific distance) will be in perfect focus.
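
In code, this typically shows up in the Camera constructor: the view window's basis vectors and lower-left corner are scaled by focus_distance so that the window itself lies on the focus plane. The fragment below sketches that idea using the members seen in getRay() above; view_width, view_height, and back_axis are illustrative names, and the repository's constructor may differ in details.

    // Sketch: scale the view window by focus_distance so that it lies on the
    // focus plane (an assumption, not the exact repository code).
    // view_width / view_height are the window's size at unit distance from the
    // camera, and back_axis is the unit vector pointing from the scene back
    // toward the camera; these are illustrative names.
    m_horizontal = focus_distance * view_width * m_x_axis;
    m_vertical = focus_distance * view_height * m_y_axis;
    m_lower_left_corner = m_origin - m_horizontal / 2 - m_vertical / 2 -
                          focus_distance * back_axis;

With the view window on the focus plane, every ray from getRay() passes through the same point of the window no matter which point on the lens it starts from, so objects at the focus distance stay sharp while everything else gets blurred.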
