Computer Vision: Edge Detection and Convolution - 180D-FW-2024/Knowledge-Base-Wiki GitHub Wiki
An Introduction to Image Processing and Edge Detection
The goal of this wiki article is to introduce convolution, a fundamental operation in computer vision and image processing. While its uses are vast, it is the underlying operation that allows us to extract key features from images, and it forms the basis of this article's main topic: edge detection.
Convolution
As Wikipedia summarizes it, “convolution is a mathematical operation on two functions (f and g) that produces a third function (f*g),” and the formula for the resulting function is
Figure 1: Convolution Eq.
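For reference, the continuous 1D form of this equation (matching the Wikipedia definition quoted above) is:

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```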
Seems confusing, right? The formula can be intimidating, but the process can be summed up as a series of steps. To keep notation clear, say we have two functions f and g; to compute their convolution, notated f*g, we would:
1. Flip g
2. Drag g across f
3. At each position, calculate the area of overlap (the integral of the product of f and the flipped g)
4. Continue dragging g all the way across f
If you are still confused, don’t worry! When performing convolution on an image, the process is much simpler. In this scenario, we have an image and a kernel. Each square of the kernel holds a weight used to calculate the final value of a pixel, and the kernel is dragged across every pixel of the image to produce a new image. The center of the kernel visits every pixel of the image; for the pixels at the border, the kernel squares that fall outside the image are given a value of 0 (this is known as zero padding).
So, the next logical question is how these weights are determined and what their purpose is. Different kernels result in different images. A kernel with a positive center value and negative surrounding values allows an image to be sharpened. Similarly, they can be used to blur images, remove noise, and invert the colors of an image. The possibilities and applications are endless.
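As a concrete sketch of the sliding-kernel process described above, here is a minimal NumPy implementation of 2D convolution with zero padding, applied with a common sharpening kernel (the specific image values are illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Convolve a 2D grayscale image with a kernel, using zero padding
    so the output has the same shape as the input."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Flip the kernel (true convolution rather than cross-correlation).
    k = np.flipud(np.fliplr(kernel))
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

# A sharpening kernel: positive center value, negative surround.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

img = np.array([[10, 10, 10, 10],
                [10, 50, 50, 10],
                [10, 50, 50, 10],
                [10, 10, 10, 10]], dtype=float)
sharpened = convolve2d(img, sharpen)
```

Swapping in a different kernel (a blur, an inverter, etc.) is just a matter of changing the weights.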
Central differencing
One of the most common methods for edge detection is through the use of gradients. However, before diving into the specifics, let's first define an edge. An edge in image processing is a boundary that separates two objects in an image and is characterized by a rapid change in pixel intensity. To simplify things, let's use a black-and-white image.
Figure 2: Base image
From this image, we want to produce an output that outlines the objects, like this:
Figure 3: Image with Prewitt filter applied
As its definition suggests, an edge is a rapid shift in intensity, so by taking the derivative, we can find local maxima/minima that signify edges. Since an image is a discrete structure, we use the central difference (f(n+1) - f(n-1))/2, which can be represented as the kernel [-0.5 0 0.5], to approximate the derivative of an image. Because the image is 2D, two gradients must be found: a gradient along the x-axis (which finds vertical edges) and a gradient along the y-axis (which finds horizontal edges). Their matrices and their application can be seen below.
Figure 4: 2D gradient matrix
Figure 5: Base image with gradient vector applied
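The central-difference idea can be sketched directly in NumPy; the [-0.5 0 0.5] kernel reduces to a shifted subtraction (the step-edge image is illustrative):

```python
import numpy as np

def central_gradients(image):
    """Central-difference gradients: gx responds to horizontal intensity
    changes (vertical edges), gy to vertical changes (horizontal edges).
    A one-pixel border is left at zero."""
    image = image.astype(float)
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    # (f(n+1) - f(n-1)) / 2 along each axis.
    gx[:, 1:-1] = (image[:, 2:] - image[:, :-2]) / 2
    gy[1:-1, :] = (image[2:, :] - image[:-2, :]) / 2
    return gx, gy

# A vertical step edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 100
gx, gy = central_gradients(img)
# gx peaks on the columns flanking the step; gy stays zero.
```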
Further alterations can then be made to this basic matrix to further emphasize edges. There are two common variations: Prewitt and Sobel. In short, both filters place greater importance on localization (incorporating the surrounding pixels and how they affect the result), with Sobel weighting the pixels closest to the point of interest more heavily and Prewitt weighting them equally. Their respective matrices and applications can be seen below.
Figure 6: Filter Kernels
- (a) Prewitt filter kernel
- (b) Sobel filter kernel
Figure 7: Image with Sobel filter applied
Figure 8: Image with Prewitt filter applied
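A minimal sketch of the Prewitt and Sobel kernels, and of combining the two gradients into an edge magnitude (valid region only, no padding; the helper names are ours):

```python
import numpy as np

# Prewitt weights all rows/columns equally; Sobel doubles the center
# row/column, emphasizing pixels nearest the point of interest.
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
# The y-axis kernels are the transposes.
prewitt_y, sobel_y = prewitt_x.T, sobel_x.T

def edge_magnitude(image, kx, ky):
    """Slide both kernels over the image and combine the two gradient
    responses into a single magnitude, sqrt(gx^2 + gy^2)."""
    h, w = image.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)
```

On a vertical step edge, the Sobel magnitude spikes on the columns flanking the step and stays zero in flat regions.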
While it may seem perfect, central differencing is extremely prone to noise, which is part of what the Prewitt and Sobel filters attempt to solve. The best way to explain this is that noise creates high-frequency zones (zones with rapid changes in intensity) within an image, essentially false edges. As a result, when the kernel slides over points adjacent to the noise, it produces pixel values different than what was expected. Ultimately, noise disrupts the edge detection process and is itself amplified by the differencing operation.
| Filter | Central Differencing | Prewitt | Sobel |
|---|---|---|---|
| Pro | Simple, fast | Better on uniform lighting | Better localization, best at handling noise |
| Con | Most sensitive to noise | Less precise than Sobel in high-gradient areas | Over-smoothing, less effective on uniform lighting |
Bilateral filtering
The solution to the issue of noise is known as a bilateral filter. The bilateral filter aims to maintain the edges in a picture while also blurring out the noise that plagued the central differencing filter. Before introducing the filter, we must first introduce the Gaussian distribution and its application as a kernel. The distribution takes the form of a bell curve, where values at the extremities are less likely to occur while values near the center are more likely to occur. The drop-off and the width of the curve are parameterized by the value 𝝈. As for its use as a kernel, we can use the distribution to blur images: the pixel of interest (the center) is weighted heavily while the surrounding pixels’ influence steadily wanes with distance. This blur, known as a Gaussian blur, places heavy importance on localization, allowing surrounding pixels to help determine the final value of a pixel in noisy areas. Therefore, when our kernel lands on a noisy pixel, the surrounding values are able to weigh into the pixel's calculation and reduce the influence of the noise.
Figure 9: Gaussian filter as a kernel (white represents 1, black 0, which is the weight multiplied to the intensity to decide the final value of the pixel)
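A kernel like the one in Figure 9 can be generated as follows (a sketch; the radius and σ values are illustrative):

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=2):
    """A 2D Gaussian blur kernel: heaviest weight at the center,
    falling off with distance, normalized to sum to 1."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return k / k.sum()
```

Normalizing the kernel to sum to 1 keeps the overall brightness of the blurred image unchanged.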
However, with just the blur alone, we would also blur the edges. Therefore, we can add an extra factor that looks at the intensity of the surrounding pixels. Once again, we use a Gaussian distribution, however, this time in the intensity domain. We can finally come to the bilateral filter formula:
Figure 10: Bilateral filter Eq
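Written out (in the standard notation, where S is the kernel window and the terms match those defined next), the filter computes:

```latex
BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right)\, I_q,
\qquad
W_p = \sum_{q \in S} G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right)
```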
Let's build the intuition for the filter. There are 4 terms in this equation:
$\frac{1}{W_p}$: This is a normalization factor that essentially serves to maintain the intensity values in the image. Without it, the output image would no longer resemble the initial image.
$G_{\sigma s}$: This value is just the Gaussian blur filter discussed earlier, where p is the pixel of interest and q ranges over the pixels within the kernel. It serves as a spatial weight.
$G_{\sigma r}$: This is the value of the Gaussian distribution within the intensity domain. The Gaussian distribution formula is given in Figure 11 where x = (Ip-Iq). It serves as an intensity (range) weight.
Figure 11: Gaussian Distribution formula
$I_p$: the intensity value of the pixel of interest. It must first be noted that μ is the mean; however, since we are centered on pixel $I_p$ and calculating values relative to it, we can effectively set μ to zero. Now let's build the intuition behind this formula.
- Case 1: No edge
In the case where the intensities of pixel p (our point of interest) and pixel q (a pixel in the kernel) are similar, $I_p-I_q$ ≈ 0. This results in $e^0$, so the distribution returns $f(I_p-I_q) ≈ 1$. Because of this, the spatial Gaussian weight and the value of the pixel can take over.
- Case 2: An edge occurs
In this case, the intensity difference $I_p-I_q$ is far from 0; for the sake of argument, and to keep things simple, say it approaches ∞. As a result, we get $e^{-\infty}$ and the distribution returns $f(I_p-I_q)$ ≈ 0, preventing the Gaussian blur from having an effect on the pixel. The edge is maintained.
- Case 3: Noise
In this case, the noise creates a false edge by producing a rapid intensity change. However, since this rapid change is not consistent within the kernel (an isolated spike rather than a coherent boundary), the weighted summation over the pixels within the kernel still lets the spatial weight $G_{\sigma s}$ dominate, allowing the Gaussian blur to take effect and smooth the noise away.
Figure 12: Image with Gaussian filter and bilateral filter applied
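Putting the three cases together, a minimal (unoptimized) bilateral filter sketch in NumPy might look like this; the parameter names sigma_s and sigma_r mirror $G_{\sigma s}$ and $G_{\sigma r}$:

```python
import numpy as np

def bilateral_filter(image, sigma_s=1.0, sigma_r=25.0, radius=2):
    """A minimal bilateral filter sketch for grayscale images.
    sigma_s controls the spatial Gaussian, sigma_r the intensity
    (range) Gaussian; radius sets the kernel half-width."""
    image = image.astype(float)
    h, w = image.shape
    out = np.zeros_like(image)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # G_sigma_s
    padded = np.pad(image, radius, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # G_sigma_r: weight drops as |I_p - I_q| grows, so pixels on
            # the far side of an edge contribute almost nothing.
            rng = np.exp(-(patch - image[i, j])**2 / (2 * sigma_r**2))
            weights = spatial * rng
            # 1/W_p normalization keeps intensities in range.
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out
```

Note the per-pixel kernel: `rng` must be recomputed at every position, which is exactly the non-linearity (and the cost) discussed next.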
Once again, this filter may seem perfect; it solves our biggest issues: it detects edges and filters out noise. However, the bilateral filter’s biggest drawback is its non-linear nature. Unlike the central differencing filter, where a single kernel could be applied to all pixels of the image, the bilateral filter essentially computes a new kernel per pixel, making it computationally expensive. In real-time applications especially, this can cause issues with heat, speed, and responsiveness.
Advanced Edge Detection Methods
Canny Edge Detection
One of the most popular and reliable ways to find edges in an image is the Canny Edge Detection. Developed by John F. Canny in 1986, it remains a standard method in computer vision due to its high accuracy and relatively strong noise immunity. Canny Edge Detection typically consists of several steps:
Step 1. Convert to Grayscale (if needed)
- Most edge detection methods, including Canny, operate on a single intensity channel rather than color channels. If your image is already grayscale, you can skip this step.
Step 2. Noise Reduction with Gaussian Blur
- The first critical step is to apply a Gaussian filter (blur) to smooth out high-frequency noise. By doing so, you can reduce the risk of tiny fluctuations being mistaken for edges. $I_{blur}(x,y)$ = $G_σ * I(x,y)$ where $(G_σ)$ is a Gaussian kernel parameterized by σ.
Step 3. Compute Gradient Magnitude and Direction
- Once the image is smoothed, compute horizontal $(G_x)$ and vertical $(G_y)$ intensity gradients. From these, calculate the overall gradient magnitude and gradient direction: Magnitude = $\sqrt{G_x^2 + G_y^2}$, Direction = $\tan^{-1}({G_y}/{G_x})$. Areas where the gradient is large are likely to be edges.
Step 4. Non-maximum Suppression
- To ensure edges appear thin (one pixel wide), each pixel’s gradient magnitude is compared with its neighbors along the gradient direction. If the pixel’s gradient is not a local maximum, it is suppressed and set to zero. This step sharpens the edge outlines.
Step 5. Double Thresholding & Hysteresis
- The final step uses two thresholds: a high threshold and a low threshold. Strong edge pixels exceed the high threshold and are accepted as edges. Weak edge pixels fall between the low and high thresholds. Through a connectivity analysis (known as hysteresis), these weak edges are classified as true edges only if they are connected to a strong edge.
Figure 13: Progression through the five steps of Canny Edge Detection
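In practice one would call an optimized routine such as OpenCV's cv2.Canny, but steps 2-5 above can be sketched compactly in NumPy (a simplified illustration assuming grayscale input; it is not OpenCV-exact, and the thresholds apply to our gradient magnitude):

```python
import numpy as np

def canny_lite(image, low, high, sigma=1.0):
    """A compact sketch of the Canny pipeline on a grayscale image."""
    img = image.astype(float)
    # Step 2: Gaussian blur via a separable 1D kernel.
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    pad = np.pad(img, r, mode="edge")
    blur = np.apply_along_axis(lambda m: np.convolve(m, g, "valid"), 0, pad)
    blur = np.apply_along_axis(lambda m: np.convolve(m, g, "valid"), 1, blur)
    # Step 3: gradients (central differences), magnitude, and direction.
    gx = np.zeros_like(blur); gy = np.zeros_like(blur)
    gx[:, 1:-1] = (blur[:, 2:] - blur[:, :-2]) / 2
    gy[1:-1, :] = (blur[2:, :] - blur[:-2, :]) / 2
    mag = np.hypot(gx, gy)
    ang = (np.rad2deg(np.arctan2(gy, gx)) + 180) % 180
    # Step 4: non-maximum suppression along the gradient direction.
    offs = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}
    sector = (np.round(ang / 45) % 4 * 45).astype(int)
    nms = np.zeros_like(mag)
    for i in range(1, mag.shape[0] - 1):
        for j in range(1, mag.shape[1] - 1):
            di, dj = offs[sector[i, j]]
            if mag[i, j] >= mag[i + di, j + dj] and mag[i, j] >= mag[i - di, j - dj]:
                nms[i, j] = mag[i, j]
    # Step 5: double threshold, then hysteresis (keep weak pixels only
    # if connected to a strong pixel via 8-neighborhood growth).
    strong = nms >= high
    weak = (nms >= low) & ~strong
    edges = strong.copy()
    while True:
        grown = np.zeros_like(edges)
        grown[1:-1, 1:-1] = (edges[:-2, :-2] | edges[:-2, 1:-1] | edges[:-2, 2:]
                             | edges[1:-1, :-2] | edges[1:-1, 2:]
                             | edges[2:, :-2] | edges[2:, 1:-1] | edges[2:, 2:])
        new = edges | (weak & grown)
        if new.sum() == edges.sum():
            return edges
        edges = new
```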
Canny Edge Detection offers several advantages, including high accuracy, strong noise resistance, and effective edge connectivity. By applying Gaussian smoothing before gradient calculation, it reduces random noise, resulting in well-defined and clean edges. The hysteresis step further ensures that valid edges remain continuous, preventing them from being fragmented by minor intensity variations. Despite these benefits, this method can be relatively slower than simpler filters (like Sobel) because of its multi-step process, especially in real-time applications. However, with hardware acceleration (like GPU) or the use of optimized libraries (like OpenCV), it is often possible to achieve near-real-time performance for moderately sized images.
Laplacian of Gaussian (LoG)
Laplacian of Gaussian (LoG) is another powerful edge detection method that combines Gaussian smoothing with the Laplacian operator (a second derivative), sometimes referred to as the Marr-Hildreth operator. The process can be summarized in three steps:
Step 1. Gaussian Blur to Reduce Noise
- Before detecting edges, a Gaussian filter is applied to smooth out high-frequency noise. This is crucial because the Laplacian operator, being a second derivative, is very sensitive to rapid intensity changes, including noise.
Step 2. Apply the Laplacian
- The Laplacian($∇^2$) highlights areas of rapid intensity change by computing the second derivative of the image. In practice, you can convolve the image with a LoG kernel, which effectively merges both the Gaussian blur and Laplacian steps into a single operation.
Step 3. Zero-crossing Detection
- After applying LoG, edges are typically located where the response crosses zero. These zero-crossings usually indicate boundaries between objects or regions of different intensity.
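The three LoG steps can be sketched as follows: sample a single LoG kernel (merging the blur and Laplacian into one operation), convolve, then mark zero-crossings (a simplified illustration; the threshold is ours, added to suppress near-zero numerical crossings):

```python
import numpy as np

def log_kernel(sigma=1.0, radius=4):
    """Sample the Laplacian-of-Gaussian on a grid (up to a positive
    scale factor), shifted to sum to zero so flat regions respond 0."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()

def zero_crossings(response, thresh=1e-3):
    """Mark pixels where the response changes sign between horizontal
    or vertical neighbors by more than `thresh`."""
    r = response
    sign = r > 0
    cross = np.zeros(r.shape, dtype=bool)
    cross[:, :-1] |= (sign[:, :-1] != sign[:, 1:]) & (np.abs(r[:, :-1] - r[:, 1:]) > thresh)
    cross[:-1, :] |= (sign[:-1, :] != sign[1:, :]) & (np.abs(r[:-1, :] - r[1:, :]) > thresh)
    return cross

def log_edges(image, sigma=1.0, thresh=1.0):
    """Steps 1+2 (single LoG convolution, edge-padded), then step 3."""
    k = log_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(image.astype(float), r, mode="edge")
    resp = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            resp[i, j] = np.sum(padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * k)
    return zero_crossings(resp, thresh)
```

On a step edge, the response is positive on one side, negative on the other, and the zero-crossing lands on the boundary itself, which is what gives LoG its thin, well-localized edges.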
Laplacian of Gaussian (LoG) effectively combines smoothing and edge detection by using a Gaussian blur to reduce noise while the Laplacian pinpoints areas of rapid intensity transition, resulting in distinctive and thin boundaries through zero-crossing edges. Despite these advantages, LoG can be sensitive to noise because second-derivative methods may amplify certain noise patterns. Also, parameter selection, particularly the Gaussian sigma (σ), is important because a large σ can miss finer details but is more robust to noise, whereas a small σ detects finer edges but is more sensitive to noise. Overall, LoG is powerful for applications requiring precise edge localization, although it demands careful parameter tuning and can be more computationally expensive than basic gradient-based filters.
Conclusion
Edge detection has become an important capability in the field of computer vision. Being able to delineate the boundary between two objects within an image has vast uses beyond object detection alone.
Medical Application:
- Tumor boundary detection
- Organ segmentation
- Blood vessel tracking
Industrial Application:
- Quality control inspection
- Part measurement
- Defect detection
Autonomous Vehicle Application:
- Lane detection
- Obstacle identification
- Traffic sign recognition
Security Application:
- License plate recognition/detection
- Motion detection
Sources:
https://en.wikipedia.org/wiki/Convolution