Machine Learning for Computer Vision - 180D-FW-2023/Knowledge-Base-Wiki GitHub Wiki

Introduction

Computer vision is what allows our computers to see. At the core of computer vision is machine learning, a subset of artificial intelligence that turns that vision from a set of pictures into information that can be comprehended, in some cases at a level superior to that of humans. Driving this new era are machine learning methodologies that allow the computer to process and learn from large sets of visual data, giving machines the power to recognize objects and context.

Background

While knowledge of these topics is becoming more commonplace, many people still do not understand what the terms “machine learning” or “computer vision” even mean. This includes people in electrical and computer engineering, the degrees most closely associated with this work.

Turn to your left and state what objects you see. Computers, unlike us, do not have the natural ability to be presented with an image and instantly identify everything in it. This is where computer vision comes into play: it is an attempt to take an image of something, for example an orange and a dog, and to know that there is an orange and a dog in that image. It allows a computer to pick out faces in a crowd of people, enables augmented reality, and even helps cars drive without a human driver.

Yet another thing we take for granted is our brains. We form complex thoughts and make decisions. A computer, however, can only follow commands. It typically does not learn from its successes or mistakes; instead, it will keep doing the same thing until someone tells it to do something differently. Machine learning is an attempt to let a computer imitate the way humans learn, using data and algorithms to gradually improve its accuracy.

Machine Learning's Role in Computer Vision

Types of Machine Learning in Computer Vision

Computer vision has come to be referred to as a subset of machine learning, but there is a distinction. Nearly everything done in computer vision can, in theory, be done with the older image-processing approaches that predate machine learning. The difference now is that with machine learning the same tasks are done more quickly and more reliably.

There are many ways to categorize computer vision algorithms, with different ideologies on why or how they should be categorized. For this article, I will use the more traditional descriptions of supervised, unsupervised, and semi-supervised learning. Reinforcement learning is also popular in machine learning generally, but it is currently less used in computer vision.

Supervised Learning

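With supervised learning, we provide a labeled dataset, in which every example is paired with its correct answer. The dataset is divided into a training set, which the model learns from, and a held-out validation or test set, whose labels act as an answer key for checking the model.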

The process follows these steps:

1. Collect data, i.e. pictures of what we want our model to be able to “see” or distinguish, for example pictures of animals. We must annotate each image with its correct answer: dog, horse, cat, etc.
2. Train an AI/ML model on the training set to associate features with specific items. Features are pieces of information about the content of an image, such as color or shape.
3. Test the model on a new, never-before-seen data set to see how accurately it can identify our targets.
4. If the model is accurate, we are done. Otherwise, our options are to train further on the current data set, train on additional labeled images, or modify how the model is trained.
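
The steps above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: each "image" has already been reduced to two hypothetical features (how orange it is and how round it is), the data is synthetic, and the model is a simple nearest-centroid classifier chosen for brevity, not a method this article prescribes.

```python
import random

# Step 1 (assumed toy data): each "image" is a pair of hypothetical
# features (orange-ness, roundness) annotated with its correct label.
def make_dataset(seed=0, n_per_class=50):
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(0.8, 0.1), rng.gauss(0.9, 0.1)), "orange"))
        data.append(((rng.gauss(0.3, 0.1), rng.gauss(0.4, 0.1)), "dog"))
    rng.shuffle(data)
    return data

# Step 2: "training" here means learning one mean feature vector per label.
def train_nearest_centroid(train):
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, features):
    x, y = features
    return min(centroids,
               key=lambda lab: (centroids[lab][0] - x) ** 2
                             + (centroids[lab][1] - y) ** 2)

# Step 3: measure accuracy on examples the model never saw in training.
def accuracy(centroids, dataset):
    correct = sum(predict(centroids, f) == label for f, label in dataset)
    return correct / len(dataset)

data = make_dataset()
split = int(0.8 * len(data))            # 80% training, 20% held out
train_set, test_set = data[:split], data[split:]
model = train_nearest_centroid(train_set)
test_accuracy = accuracy(model, test_set)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

If the held-out accuracy were too low, step 4 would correspond to calling `train_nearest_centroid` on more labeled data or swapping in a different model.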

Unsupervised Learning

Unlike supervised learning, unsupervised learning does not use labeled datasets; the model must determine relationships on its own.

The process for unsupervised learning is the same as for supervised learning, except that labels are missing from the provided data and the model is forced to categorize on its own.
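
As one possible illustration of a model categorizing on its own, here is a minimal k-means clustering sketch. K-means is my choice of example, not something this article singles out. The feature vectors are made up, and starting the centroids from the first `k` points is a simplification I am assuming so the run is deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    # Simplification (assumption): start from the first k points
    # instead of a random draw, so the result is reproducible.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Hypothetical unlabeled feature vectors forming two loose groups.
points = [(0.1, 0.2), (5.0, 5.1), (0.0, 0.1), (5.2, 4.9),
          (0.2, 0.0), (4.9, 5.0), (0.15, 0.1), (5.1, 5.2)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

No labels are supplied anywhere: the grouping comes only from how close the feature vectors are to one another, and it is up to us to decide afterwards what each group means.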

Semi-Supervised Learning

Semi-supervised learning is the middle ground between supervised and unsupervised learning: some of our images are labeled and some are not.
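
One common way to use both kinds of data is self-training, sometimes called pseudo-labeling: fit a model on the few labeled examples, let it label the unlabeled ones, then refit on everything. The sketch below assumes that approach, a nearest-centroid model, and made-up two-feature data. It is an illustration, not the only way semi-supervised learning is done.

```python
def centroid(vectors):
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))

def fit(labeled):
    """Nearest-centroid model: one mean feature vector per label."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(vs) for label, vs in by_label.items()}

def predict(model, features):
    return min(model,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(features, model[lab])))

# Hypothetical data: two labeled examples and five unlabeled ones.
labeled = [((0.9, 0.8), "orange"), ((0.2, 0.3), "dog")]
unlabeled = [(0.8, 0.9), (0.85, 0.75), (0.3, 0.2), (0.25, 0.35), (0.1, 0.3)]

# 1) Fit on the labeled examples only.
model = fit(labeled)
# 2) Let the model assign pseudo-labels to the unlabeled examples.
pseudo_labeled = [(f, predict(model, f)) for f in unlabeled]
# 3) Refit on labeled + pseudo-labeled data.
model = fit(labeled + pseudo_labeled)

print([label for _, label in pseudo_labeled])
```

In practice one would usually keep only the pseudo-labels the model is confident about. That filtering is omitted here to keep the sketch short.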

Paradigms for Computer Vision

While machine learning is used in many of the paradigms of computer vision, the two big ones that I will touch on are support vector machines and neural networks.

Support Vector Machines

SVMs work to create a binary division of our data, a dichotomy of true and false, by finding a hyperplane that maximizes the margin between the two categories. The margin is defined by the support vectors, which are the data points closest to the hyperplane. One advantage of SVMs is that they can handle nonlinear classification tasks by using a kernel, which implicitly maps the input into a higher-dimensional space where the two categories can be divided. Some common kernels are linear, polynomial, radial basis function (RBF), and sigmoid.
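
To make the hyperplane and margin idea concrete, here is a minimal sketch of a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The training method, the learning rate, the regularization strength, and the six data points are all assumptions chosen for illustration. Production SVM libraries typically use more sophisticated solvers, and kernels are not covered here.

```python
def decision(w, b, x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=5000):
    """Subgradient descent on reg/2 * |w|^2 + mean hinge loss.

    labels must be +1 or -1. Returns (weights, bias).
    """
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * decision(w, b, x) < 1:   # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data.
points = [(2, 2), (3, 3), (2, 3), (0, 0), (1, 0), (0, 1)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [1 if decision(w, b, x) >= 0 else -1 for x in points]

# Geometric distance of every training point to the learned hyperplane;
# the points with the smallest distance play the role of support vectors.
norm = sum(wi * wi for wi in w) ** 0.5
distances = [abs(decision(w, b, x)) / norm for x in points]
print(predictions)
```

The hinge loss only penalizes points whose score falls inside the margin, which is why the points closest to the hyperplane end up determining where it sits.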

Neural Networks

A neural network is a structure loosely modeled after the human brain, built from interconnected nodes called neurons. At a high level, it consists of an input layer, where we pass the features to the network; an output layer, where the predictions or outcomes based on the inputs are finalized; and hidden layers in between the input and output layers. The connections between neurons carry weights, and the neurons use activation functions such as ReLU to introduce non-linearity to the network. During training, the error is backpropagated to adjust the weights, all in an attempt to reduce the difference between our predicted and actual outputs.
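
The pieces described above (layers, weights, ReLU, and backpropagation) fit into a short pure-Python sketch. Everything specific here is an assumption made for illustration: the network has one hidden layer of four neurons, a linear output, a mean-squared-error loss, and is trained on a tiny XOR-style dataset. Real computer vision networks are far larger and are built with dedicated libraries.

```python
import random

def relu(z):
    return z if z > 0 else 0.0

def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W1, b1, W2, 0.0

def forward(params, x):
    W1, b1, W2, b2 = params
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    hidden = [relu(z) for z in pre]                 # hidden layer activations
    out = sum(w * h for w, h in zip(W2, hidden)) + b2
    return pre, hidden, out

def loss(params, data):
    """Mean squared difference between predicted and actual outputs."""
    return sum((forward(params, x)[2] - y) ** 2 for x, y in data) / len(data)

def train_step(params, data, lr):
    W1, b1, W2, b2 = params
    n_hidden, n_in = len(W1), len(W1[0])
    gW1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1, gW2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
    for x, y in data:
        pre, hidden, out = forward(params, x)
        d_out = 2 * (out - y) / len(data)           # d(loss)/d(output)
        gb2 += d_out
        for j in range(n_hidden):
            gW2[j] += d_out * hidden[j]
            # Backpropagate through ReLU: gradient flows only if the unit fired.
            d_pre = d_out * W2[j] if pre[j] > 0 else 0.0
            gb1[j] += d_pre
            for i in range(n_in):
                gW1[j][i] += d_pre * x[i]
    return (
        [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)],
        [b - lr * g for b, g in zip(b1, gb1)],
        [w - lr * g for w, g in zip(W2, gW2)],
        b2 - lr * gb2,
    )

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
params = init_params(n_in=2, n_hidden=4)
loss_before = loss(params, data)
for _ in range(2000):
    params = train_step(params, data, lr=0.05)
loss_after = loss(params, data)
print(loss_before, loss_after)
```

Each call to `train_step` is one round of forward pass, error measurement, backpropagation, and weight adjustment. Comparing `loss_before` with `loss_after` shows whether training reduced the error.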

Deep learning is a subset of this that uses many hidden layers to create a more hierarchical representation of data. Convolutional neural networks, or CNNs, are used in image recognition and classification; the convolutional aspect makes them better suited to images. A good example of a deep convolutional model used in computer vision is ResNet.
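
To show what the convolutional aspect means, the sketch below slides a small kernel over a tiny grayscale image and records a weighted sum at each position. The image and kernel values are made up. Like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation. In a real CNN the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide kernel over image with no padding and a stride of 1."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the image patch under the kernel.
            total = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is largest exactly where the edge is, no matter which row it appears in. Reusing the same small set of weights across the whole image is what makes convolution a good fit for visual data.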

Challenges and Limitations

Even though machine learning, and deep learning specifically, has pushed computer vision forward by leaps and bounds, it is not without its faults.

The first is finding or developing high-quality, diverse datasets that the models can be trained on. It is time-consuming and costly to label the vast amounts of data needed to train accurate models, but without that data the models can be inaccurate or biased.

A complex neural network acts like a black box, which makes it challenging to understand how a model arrives at a specific decision. This lack of understanding can create distrust or low confidence in the model's outputs, especially where the decision-making process is crucial.

Finally, the outputs of models are based on the inputs they receive. If a would-be attacker can figure out which features a model uses to decide, they can manipulate those features to either circumvent or deceive the model.
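
As a hedged illustration of this last point, consider a hypothetical linear classifier whose weights the attacker has learned. Nudging every input feature by a small amount in the direction that lowers the score is enough to flip the decision. This is the same idea as the fast gradient sign method, since for a linear model the gradient of the score with respect to the input is just the weight vector. The weights, input, and step size below are invented for the example.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1

def sign(v):
    return (v > 0) - (v < 0)

def adversarial_example(w, x, label, eps):
    # Move each feature by at most eps in the direction that pushes
    # the score away from the true label.
    return tuple(xi - eps * label * sign(wi) for xi, wi in zip(x, w))

# Hypothetical model and input.
w, b = (1.0, -2.0, 0.5), 0.1
x = (0.5, 0.1, 0.2)
eps = 0.2

x_adv = adversarial_example(w, x, label=classify(w, b, x), eps=eps)
print(classify(w, b, x), classify(w, b, x_adv))
```

No single feature changes by more than `eps`, yet the predicted class flips. Deep networks are not linear, but similar small, targeted changes to an image can mislead them too.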

Application and Conclusion

Machine learning has become such an integral part of computer vision that as machine learning evolves, computer vision evolves with it. The models that machine learning allows us to produce let us efficiently perform image classification, object detection and recognition, facial recognition, image segmentation, video analysis, medical image analysis, and augmented reality at accuracy levels never seen before. While machine learning has brought computer vision to unprecedented levels, there are still critical challenges that need addressing. These include decision transparency, data quality, and security, all of which matter for the ethical and effective use of this technology in its various environments. The evolution and application of these technologies hold momentous potential, but continued effort is necessary to navigate these challenges and leverage the full capabilities of machine learning in the realm of computer vision.