My recent articles form a series on neural networks, going from the simple perceptron to more complex architectures and covering how to deal with common problems in deep learning. If you are interested, feel free to check the series here:
Neural Networks
One exciting area neural networks have made significant strides in is computer vision. Think AI for self-driving cars and face recognition!
However, the regular fully connected neural network that most people know about is not suitable for many real-life image recognition tasks. It works on the famous MNIST dataset, but that dataset's images are tiny: just 28×28 pixels.
High-definition (HD) images have 1280×720 pixels. That's roughly 1,000,000 pixels, which would mean roughly 1,000,000 neurons in the input layer. Not to mention the hundreds of millions of weights required to connect them to even a modest hidden layer, rendering regular neural networks unsuitable due to this dimensional complexity.
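To make the scale concrete, here is a back-of-the-envelope calculation for a single fully connected hidden layer on an HD image. The hidden layer size of 1,000 neurons is an assumption chosen purely for illustration:

```python
# Rough parameter count for one fully connected hidden layer
# on an HD image (illustrative numbers, not a real model).
width, height = 1280, 720
inputs = width * height      # one input neuron per pixel
hidden = 1000                # assumed hidden layer size
weights = inputs * hidden    # weights in that single layer alone

print(f"{inputs:,} inputs -> {weights:,} weights")
# 921,600 inputs -> 921,600,000 weights
```

Nearly a billion weights in one layer, before we even add more layers or color channels.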
So, what do we do?
Convolutional Neural Networks!
Convolutional neural networks (CNNs) are the gold standard for the majority of computer vision tasks today. Instead of fully connected layers, they have partially connected layers that share their weights, reducing the complexity of the model.
For instance, each neuron in a fully connected layer would require 10,000 weights to process an image of 100×100 pixels. A convolutional neuron with a 5×5 kernel, by contrast, needs only 25 shared weights to process the same image.
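The comparison is simple enough to spell out in a couple of lines; the 5×5 kernel size is an assumption for illustration, since kernel sizes vary by architecture:

```python
# Weights needed to process a 100x100 image:
# one fully connected neuron vs. one convolutional filter
# with an (assumed) 5x5 kernel of shared weights.
fc_weights = 100 * 100       # one weight per pixel
conv_weights = 5 * 5         # one small kernel, reused across the image

print(fc_weights, conv_weights)   # 10000 25
print(fc_weights // conv_weights) # 400x fewer weights
```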
In this article, we are going to dive into the fundamental building block behind CNNs, convolution.
Like many things in machine learning, CNNs are inspired by nature. Computer scientists looked at how the visual cortex in the brain works and applied a similar concept to neural networks.
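As a small preview of the operation we will explore, here is a minimal sketch of a 2D convolution (strictly speaking, cross-correlation, which is what most deep learning libraries actually compute): slide a small kernel over the image and take a weighted sum at each position. The averaging kernel here is an arbitrary choice for illustration:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0   # simple averaging (blur) kernel
print(conv2d(image, kernel))
```

Each output value is the average of a 3×3 neighborhood, so a 4×4 image shrinks to a 2×2 output; the same 9 kernel weights are reused at every position, which is exactly the weight sharing described above.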