Artificial intelligence (AI) is a term you must have heard, even if you are not from the IT world. AI is when machines and computer systems simulate human intelligence processes. Right now, AI is taking over the world: at least 90% of tech giants invest in it. According to the Data and AI Leadership Executive Survey, the number of AI-friendly companies participating in the survey has doubled in one year. Another survey states that half of the interviewed companies use AI.
Some more specific applications of AI include expert systems, natural language processing, speech recognition, and machine (computer) vision. The last of these, computer vision, has already been integrated into road traffic, bank payments, and social networks. Over the last few decades, AI vision has learned to solve many tasks with accuracy approaching that of humans.
“As many others have noticed and pointed out, the neocortex has a highly uniform architecture too across all of its input modalities. Perhaps nature has stumbled by a very similar powerful architecture and replicated it in a similar fashion, varying only some of the details. This consolidation in architecture will in turn focus and concentrate software, hardware, and infrastructure, further speeding up progress across AI. […] Anyway, exciting times.” – Andrej Karpathy, headhunted by Elon Musk to develop computer vision for Tesla, in a tweet about AI vision.
Many companies have started using computer vision in artificial intelligence tasks. Karpathy is working on AI-driven cars. NASA uses AI vision to track astronauts, and the police use it to track criminals. AI vision has become a solid part of our daily routine. Do you notice where computer vision works for you every day? We bet you use it daily. At least, you do so if you are an Amazon, Apple, or Google client.
Considering that computer vision has already become a part of our lives, it’s time to learn how AI vision works and decide whether to rely on it. Five years ago, we thought of AI as a “child.” Has it grown enough to be relied on? We recommend that you decide for yourself after you find out:
- What inspired people to develop AI vision
- Whether AI vision is similar to human vision
- How AI vision works
- Where we meet AI vision
- What computer vision isn’t capable of
The Idea of Computer Vision
Once, people decided to teach computers to act as a brain. The idea belonged mainly to psychologist Frank Rosenblatt. Many call him the father of AI. In the late 1950s, Rosenblatt made a computer simulate a neural network with the help of biology and math. To learn something, the neurons in the human brain build connections. This principle laid the foundation of artificial intelligence.
Marvin Minsky, co-founder of MIT’s AI laboratory, made the next step. He expected his student to teach the computer to describe everything it “saw” over the course of one summer. The summer project failed: the computer still wasn’t able to recognize images accurately. It did, however, recognize the edges of the objects in pictures.
AI vision was first applied to printed texts of any font (optical character recognition) and even hand-written texts (intelligent character recognition). This was already possible in the 1970s. Since that breakthrough, a lot has been done in business, entertainment, transportation, healthcare, and everyday life.
The 1970s were crucial for computer vision as many of its technological basics appeared in that decade. In the 1980s, computers could already accomplish complicated tasks. Thanks to David Marr and others, AI could see curves and edges, and notice similar visual patterns. Later, the computer was able to recognize not only lines but also shade, focus, and texture. This happened thanks to the Convolutional Neural Network that boosted image processing.
In 2001, AI was already able to recognize faces. Since the AlexNet project in 2012, AI vision has been making fewer mistakes, and now it’s much more accurate. Of course, it’s still a difficult task for AI to recognize a cat in a downward pose. Anyway, it can learn how to do that. Huge efforts were made by the ImageNet team, which attracted more than 50,000 people worldwide to tag images manually. It helped AI learn some patterns and be able to continue studying on its own.
Is computer vision similar to the vision of living things?
The idea of CNN (convolutional neural network) is based on the neuron principle. CNN consists of layers that recognize image patterns gradually from simple to complex ones, from lines to whole faces. Artificial layers are similar to the layers of neurons in a brain. Artificial neurons are called perceptrons, and CNN is a network using these perceptrons.
In human vision, some neurons get activated specifically when exposed to vertical lines, others when exposed to horizontal or diagonal ones. That’s what Hubel and Wiesel described in 1962. Assigning specific tasks to separate artificial neurons is what a CNN does too.
Perceptrons evaluate information differently or, speaking mathematically, artificial neurons weight inputs differently, deciding which of them are important. Our brain filters information in a similar way. We can’t remember all the faces we see during the day. We save only valuable information. What about neuronal layers?
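Before turning to layers, the weighting idea can be made concrete with a few lines of Python. This is only a minimal sketch: the function name `perceptron`, the weights, and the bias are invented for illustration and are not taken from any real network.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0


# Hypothetical example: three input signals; the second one matters most.
signals = [1.0, 1.0, 0.0]
weights = [0.2, 0.9, -0.5]  # assumed values, not learned from data
bias = -1.0

fired = perceptron(signals, weights, bias)  # 0.2 + 0.9 - 1.0 > 0, so the neuron fires
```

A larger weight makes its input count for more in the final decision, which is the sense in which an artificial neuron decides what is important.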
The cerebral cortex keeps neurons in six horizontal layers. These layers differ by neuron type and their connections. However, neural signaling doesn’t actually go through all the cortex layers in a hierarchic manner. Signals don’t necessarily move from the first layer to the last one.
In the cortex, the way information travels between neurons doesn’t depend on the layers’ topology. In a CNN, it does. A CNN uses the principle of neuronal layers in a different way: information is gradually passed from layer to layer.
All of this came from the “neocognitron” proposed by Kunihiko Fukushima in 1980. He introduced two basic types of CNN layers: convolutional layers and downsampling layers. These layers contain units similar to different neurons, which can process visual information of different complexity. Fukushima, inspired by these cells, proposed a cascading model in which neurons pass information in a hierarchical way: from layer to layer.
Investigating human vision did lead to the appearance of artificial intelligence vision. Now, computer systems recognize complex worlds even in motion. Moreover, they learn by themselves how to do it more effectively.
AI and Computer Vision: How Do They Relate?
Computer vision became possible due to several achievements. Math, biology, programming, and engineering are often combined to develop an AI product. Computer vision can be called AI vision as it is based on AI technologies. Machine vision is partially related to computer vision, and their technologies are often combined. Still, computer vision is more common for many tasks like monitoring products on production lines or reading QR codes. So, how does it work?
Pixels: AI sees colors and lines
To be precise, AI recognizes patterns. It processes millions of images to be able to draw conclusions about them. This is where deep learning comes in, making a system learn.
Images are made of pixels. Pixels have their codes, and every image is stored as data consisting of these codes. All colors are mixed from red, green, and blue (as in the RGB model, for example). It means that every particular color has three values. While we see dogs, the computer sees numbers. For example, AI understands an orange pixel as the set of numbers (255, 165, 0). As a result, the computer sees a grid of such numbers instead of the image.
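Here is a small Python sketch of what such a grid looks like. The 2×2 image and the helper `to_grayscale` are made up for illustration; averaging the three channels is just one simple, assumed way to reduce a color pixel to a single brightness value.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) tuple with values from 0 to 255.
ORANGE = (255, 165, 0)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

image = [
    [ORANGE, WHITE],
    [BLACK, ORANGE],
]


def to_grayscale(pixel):
    """Average the three color channels into one brightness value (assumed method)."""
    r, g, b = pixel
    return (r + g + b) // 3


# The same image as a grid of single numbers instead of color triples.
gray = [[to_grayscale(pixel) for pixel in row] for row in image]
# gray is [[140, 255], [0, 140]]
```

A single-number grid like `gray` is the kind of input that is easiest to follow when thinking about how later layers process an image.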
If a computer deals with a 1920×1080 pixel image, then it has to read 2,073,600 pixels. To recognize a dog in this picture, the computer has to see some patterns throughout all the pixels in the image. We do a similar thing: first, we notice the objects’ features that are simple and familiar to us. This is why we can distinguish a dog from a car from their silhouettes alone.
Computers try to distinguish familiar patterns too: they look for lines or shapes that are associated with something from the computer’s database. The more matches the database contains, the higher the chance that the computer will categorize the image correctly.
Technology: Brain-inspired CNN
Convolution is a mathematical operation that, combined with the principles of a neural network, gives us the Convolutional Neural Network. A CNN has layers as the cortex does. These layers gradually filter image features from simple to complex ones:
- Input layer;
- Convolutional layer;
- Pooling layer;
- Dense layer.
The core of a CNN is the convolutional layer. Think of the image as a grid of numbers again. On this layer, a computer extracts features from an image by sliding a small convolution matrix (CM) over the grid: at each position, the CM is multiplied element by element with the cells it covers, and the products are summed. After the CM has been applied at every position, we get a transformed grid. The computer understands its values as features like edges or lines, and their patterns can be familiar to the AI database.
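The sliding multiplication can be sketched in plain Python. This is an illustrative sketch, not the code of any particular library: `convolve2d`, the sample image, and the `vertical_edge` matrix are all assumptions chosen for the example. Like most CNN implementations, it applies the matrix without flipping it.

```python
def convolve2d(grid, kernel):
    """Slide the kernel over the grid and sum the element-wise products.

    No padding is used, so the output grid is smaller than the input.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(grid) - k_rows + 1
    out_cols = len(grid[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += grid[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Grayscale 4x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]

# Assumed convolution matrix that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = convolve2d(image, vertical_edge)
# features is [[0, 765, 765], [0, 765, 765]]
```

The zeros mark the flat dark area, and the large values mark the positions where the matrix overlaps the edge between dark and bright pixels.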
Convolution is run many times to make predictions about the patterns and check their accuracy. The neural network keeps iterating until the accuracy is as high as possible. This applies to all the layers.
If we get 10 feature matrices as an output from the convolution layer, these 10 matrices are passed to the next layer as an input. Pooling and dense layers also work with an image over many iterations, but their functions are different.
The pooling layer reduces the dimensions of feature matrices, thus summarizing the main information. The input image can contain many deviations from the simple object’s patterns: shades, rotations, or crops. They complicate the recognition of the object. At a pooling layer, these variations that interfere with image processing are down-sampled or reduced.
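One common way to do this is max pooling, which keeps only the largest value in each small window. The text does not say which pooling method is meant, so treat the choice of max pooling, the function `max_pool`, and the sample numbers as assumptions made for this sketch.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Invented 4x4 feature matrix, as it might come out of a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool(feature_map)
# pooled is [[6, 2], [2, 9]]: a 4x4 matrix summarized as 2x2
```

The strongest response in each region survives, so a feature that shifts by a pixel or two still produces the same pooled value.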
Finally, the dense layer has to classify an image using the output of previous layers. It has to deal with all the extracted image features from the previous layers and name the objects from that image. The dense layer is a fully connected layer, called so because of its highly interconnected artificial neurons. Other layers lack this power.
Convolutional layers contain neurons connected only with the previous level. It’s not enough for an object’s prediction. The dense layer copes with this task by using many interconnected neurons at the same time. Basing its prediction on the extracted features from the previous layers, the dense layer is where artificial intelligence vision reaches its high accuracy.
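The classification step of a fully connected layer can be sketched as follows. Everything here is hypothetical: the feature values, the weights, and the two labels are invented, and the softmax function is a commonly used way (assumed here, not named in the text) to turn raw scores into probabilities.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: every output neuron sees every input feature."""
    return [
        sum(f * w for f, w in zip(features, neuron_weights)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features produced by the earlier layers.
features = [0.9, 0.1, 0.4]

# Assumed weights: one row of weights per class.
labels = ["dog", "car"]
weights = [
    [2.0, -1.0, 0.5],
    [-1.5, 2.0, 0.2],
]
biases = [0.0, 0.0]

probabilities = softmax(dense(features, weights, biases))
prediction = labels[probabilities.index(max(probabilities))]  # "dog"
```

In a trained network the weights are learned from data; here they are hand-picked so that the first class gets the higher score.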
At the programming level, image processing doesn’t look like simple image filtering within the hierarchy of layers. In different cases, AI deals with different numbers of layers and different iterations of image processing, and does it in a different amount of time.
Considering that AI has to process billions of images to understand the complex modern world, we imagine people sitting and trying to fill in its database, treating AI as their student. Now, AI is trying to study on its own. AI is a smart “child” that needs only material to start.
AI teaches itself: Deep learning
To be able to recognize objects in images fast, AI needs lots of material. The first face recognition systems were possible due to the manual processing of photos. People marked features on face photos, and AI had only to compare new faces with its ready database. AI didn’t work automatically, and the error rate was too high. To accomplish such difficult tasks of computer vision, machine learning is used.
Now, AI uses deep learning technologies to learn on its own. AI mostly doesn’t need people after it has been fed with some database. People don’t explain every single rule to the AI. In classical machine learning, they apply statistical learning algorithms (logistic regression, decision trees, linear regression, and support vector machines) so that AI starts remembering new patterns on its own. Deep learning goes a step further: it captures features automatically, and people don’t have to do it manually.
To train, AI still needs material introduced by people in the first stages. To recognize a dog, developers have to show many dogs to a computer to prepare it. Later, AI will continue teaching itself while processing new images. It also means that AI no longer just looks for corresponding images in its database: it also knows how to classify brand-new images if something similar has already been uploaded or seen.
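To show what “learning from examples instead of rules” means, here is a minimal sketch of logistic regression, one of the algorithms named above, trained with gradient descent. The training data, the single made-up feature, and the function names are assumptions for illustration; real systems use many features and a tested library.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, learning_rate=0.5, epochs=1000):
    """Fit one weight and one bias with plain gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(weight * x + bias) - y  # prediction minus truth
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Invented data: one feature per example; label 1 means "dog", 0 means "not a dog".
samples = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

weight, bias = train(samples, labels)


def predict(x):
    """Estimated probability that a new example belongs to class 1."""
    return sigmoid(weight * x + bias)
```

Nobody tells the program where the boundary between the two classes lies. It is found by repeatedly nudging `weight` and `bias` to reduce the error on the labeled examples, after which `predict` can be called on values the program has never seen.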
Many AI tech giants share their work with social network giants like Meta and Google or leave it open-source. It enables gathering big data, sharing it, and giving AI more possibilities to study.
Thanks to the early computer vision technologies that worked with big data manually, many modern AI vision technologies accomplish specific tasks. Today, AI vision is being developed by thousands of teams worldwide.
For example, the YOLO algorithm enables real-time object detection and tracking. Its task is not just to detect an object in the shot but to associate it with the information from the previous shots. The You Only Look Once principle means that the neural network processes an image only once to detect all the objects. Then it tracks them. It’s possible due to the deep layers and deep learning.
Now, computer vision is almost a self-sufficient technology that makes some predictions better than people do. In a study funded by Google, deep learning algorithms detected breast cancer with higher accuracy than radiologists did. The AI system showed a reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. A good argument for trusting AI, isn’t it?
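For readers unfamiliar with the terms, the sketch below shows how false positive and false negative rates are computed. The ten cases and both sets of answers are invented for illustration and have nothing to do with the data of the study; `error_rates` is a hypothetical helper.

```python
def error_rates(predictions, truths):
    """Return (false_positive_rate, false_negative_rate) for 0/1 labels."""
    false_pos = sum(1 for p, t in zip(predictions, truths) if p == 1 and t == 0)
    false_neg = sum(1 for p, t in zip(predictions, truths) if p == 0 and t == 1)
    negatives = truths.count(0)
    positives = truths.count(1)
    return false_pos / negatives, false_neg / positives


# Invented ground truth: 1 = disease present, 0 = healthy.
truths = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Invented answers from a human reader and from a model.
reader_calls = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
model_calls = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

reader_fp, reader_fn = error_rates(reader_calls, truths)  # 2 of 6 and 2 of 4
model_fp, model_fn = error_rates(model_calls, truths)     # 1 of 6 and 1 of 4
```

A false positive is a healthy case flagged as disease, and a false negative is a missed case. A “reduction” in these rates means the second pair of numbers is lower than the first.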
From Stores to Tractors: Computer Vision Applications
What can computer vision tell us about an image? We know that it can detect objects and even track them in real time. What else? Using Google Street View images, a vision AI that recognized cars on American roads predicted incomes and even voting patterns in different city areas. For example, citizens are likely to vote for Democrats if there are more sedans than pickups in their city.
Another thing AI can do for people is to count animals in national parks. AI software called Wildbook automatically identifies species by their appearance. This AI vision can recognize unique coat patterns or other features like ear outlines or flukes. Wildbook has a database of 20 species. Now it cooperates with the Microsoft AI for Earth programme to solve different environmental problems. We don’t deal with giraffes or jaguars often, and such stories don’t touch us as much as the AI that we meet daily.
Snapchat and Amazon
Did you know that you can focus on any product with a Snapchat camera, and AI will show you this product on Amazon? If you visit a physical Amazon store, computer vision will watch you and tell its developers how you behave. AI can extract analytics from the whole shopping journey: from recommending a parking lot to gathering emotional data to making predictions about the products that are interesting for a customer.
Behind the scenes, AI also helps at the manufacturing stage. Using machine vision, product lines are monitored for defective goods or packaging. By the way, reading barcodes is what Optical Character Recognition (OCR), a type of machine vision, does when you buy something.
It’s likely that a big part of retail will implement AI vision soon. Different teams are already working on new technologies to detect and track the products so that these technologies become cheaper. Thus, more stores will be able to apply them.
Amazon has delegated so much work to AI that the company established AWS Panorama, a separate project that sells computer vision services to different businesses. For example, it helped an airport cope with queues. AWS also helps a gas exploration company monitor workers’ social distancing and detect oil leaks. Playing a Fender guitar? AWS helps Fender monitor how long it takes to produce a guitar and which manufacturing spots can be optimized.
And these are only Amazon’s examples of AI vision. Now, imagine how many tasks are solved by AI vision every day, taking into account that every tech giant works with AI.
John Deere tractors
John Deere combines have been taking care of fields for almost 200 years. The company is gradually implementing AI technologies with the speed of a tech giant. In 2020, John Deere developers released a concept of the semi-autonomous tractor, which could find optimal routes between crops, analyze harvest quality, spray herbicides accurately, and remove weeds on its own. All of these features were made with computer vision.
To analyze crops and spray herbicides, we don’t necessarily need a tractor. Drones can do that too. Using drones brings us closer to precision agriculture and helps solve the problem of food losses. Nearly 15% of food is lost annually during harvesting, and drones can decrease this number.
Computer vision can help humanity cope with hunger. In agriculture, vision AI offers solutions on how to minimize the harvest losses. Thus, a predicted 10 billion population may face fewer supply risks. Also, we’ll need fewer herbicides if AI sprays more accurately than people do. It may solve the ecological problem of excess herbicides.
Apple’s face recognition
This is the thing we use not daily but hourly. Starting from iOS 10, new iPhone models are unlocked by Face ID, which is based on face detection algorithms. iPhone cameras track a face in real time and allow authorization if the face belongs to the phone’s owner. In iOS, face recognition is used not only to unlock the screen but also to recognize people in photos. In this case, photos are sent to a cloud server to detect faces with deep learning technology.
This is what Facebook did too, until 2021, when it shut down face recognition due to weak legal regulation and social concerns. The feature wasn’t limited to face recognition: an automatic alt text system also generated image descriptions for blind people. This system used face recognition to tell whether a person or friends were in the image. People continue to discuss this issue because it is where AI benefits society. What about fun?
Have you tried swapping your face with your friend’s face in any app? Or have you already seen what you would look like in your old age? Then you’ve tried realistic face manipulation. This AI vision technology is used not only to amuse users but also to make deepfakes. This is where computer vision becomes dangerous, as deepfakes can be used to manipulate society.
It has already been done: Russians watched a deepfake video of the Ukrainian president in which he said that he couldn’t cope with the war and was ready to surrender Ukraine, which was a lie.
What truly good things has facial recognition already done? Besides criminals detected on public cameras, vision AI can find missing children. New Delhi police traced almost 3,000 of 45,000 missing children in just four days thanks to facial recognition applied to the TrackChild database. It is one more example of how computer vision benefits our society.
Today, there is plenty of work for computer vision. Examples of its use could fill a list of hundreds of items. A few more are:
- Sports broadcasting: tracking the ball or puck; predicting players’ performance.
- Healthcare: tumor detection, remote monitoring of a patient, medical imaging.
- Self-driving cars: Tesla and Google’s Waymo are not the only ones. There are many other semi-autonomous cars on roads already.
- Translating: open your Google Translate app and try to use visual real-time translation.
- Photo archives: The New York Times cooperates with Google and uses its Vision API technology to digitize millions of photos from its archives.
- Farming and wildlife: detecting and tracking animals in national parks or farms; detecting infection symptoms.
Speaking about healthcare, CNNs and deep learning help doctors detect Covid. Using chest X-ray images, Covid-Net, an application from the DarwinAI team, predicts the disease with more than 92% accuracy. Due to its open-source database, the software has a lot of material to learn from.
Not bad for a “teen” who helps humanity solve problems in retail, agriculture, social networking, and healthcare. It might be that AI has reached the possibilities of a grown-up’s intelligence. AI vision has made its way into every sphere of life. Still, there is something that AI is “too young” or not ready to cope with.
What Computer Vision Is Not Capable Of
The main limitation is not about AI not knowing something: It is a good deep learning “student”. The problem is that hardware often limits AI vision potential.
Machine learning demands high-performance processors: the CPU and GPU have to render high-quality images or videos. CPU capabilities are often not enough for computationally intensive tasks, while a GPU helps to accelerate AI vision computation. Thus, the GPU frees up the CPU for tasks other than computer vision.
Besides efficient computers, computer vision needs edge devices. They get connected to cameras to gather data in real time, thus saving the time needed to process data in clouds. Edge devices process data locally, and as a result, real-time data has no latency issues. Processing data locally also helps businesses save money.
Getting an edge device is not a problem, but it is added to the “consumer basket” for computer vision, and the price gets higher. It’s hard to estimate how much a perfect computer for AI vision would cost. The sky’s the limit. On a common laptop, only simple AI vision tasks can be run.
Researchers at AI21 Labs calculated how much it would cost to run complex deep learning tasks, such as Google’s NoisyStudent, in a cloud like Amazon’s AWS. Considering that NoisyStudent works on a CNN and includes 480 million parameters, the price would reach $10K – $200K (and that estimate is only for 340 million parameters).
If machine vision and computer vision are combined, there must be a camera with high resolution. If the goal is to track an object, then a machine needs a camera capable of recording high-definition streams. Add this to the price too.
Besides the hardware, another limitation is the lack of high-quality data. To learn to recognize objects, AI has to be trained on labeled data with high-resolution images. When it deals with a bunch of low-quality X-rays, it’s hard for AI vision to predict disease. Also, there is often not enough data. Covid-Net succeeded because it was constantly filled with new scans during the pandemic. Other projects may fail because of privacy issues that limit data accumulation.
Here, AI vision deals with another problem: ethics and legal regulation. Several US states have already banned facial recognition systems in police body cameras. Considering that AI can find a criminal or a missing child, it seems to be a problem of weak legal regulation, which still remains pretty unclear.
Racial and gender biases have reached AI vision too. In most cases, AI is trained on a dataset containing few images of women and people with darker skin. The problem is that it indeed leads to inaccurate identification, so it’s not just an ethical issue.
On its way, AI vision will face many moral problems and will be challenged by society’s trust. Ethics, hardware, and poor-quality data challenge AI. However, the main issue is that AI still needs a human. It still needs manually labeled data.
However, it’s a matter of time before AI solves problems more autonomously. Computer vision is no longer a technological “child.” It seems to be a grown-up, and we can already be proud of it. This is the time to remember its main achievements.
To Conclude: Computer Vision We Deserve
The most important points to consider when talking about computer vision are the following:
- The principles of human neuronal networks inspired scientists to develop computer vision technologies that are similar to neuronal layers architecture.
- In the 1980s, computer vision started solving complex tasks to detect and track objects in images.
- CNN, based on living things’ principles, and deep learning are the main modern computer vision technologies.
- Today, computer vision is used in healthcare, retail, traffic, sports, agriculture, social sciences, and smartphones. There are many other attractive spheres where AI will be applied in a few years.
- We must agree that computer vision includes unethical and risky applications like any other digital technology. Still, AI vision has simplified human lives not only at work but in the daily routine too.
Whether to rely on AI or trust it with your life (while driving an autonomous car, for example) is your personal choice. However, what you should accept, no matter what you think about all the high-tech stuff, is that AI has already been watching you since you opened your browser or unlocked your phone. Moreover, it keeps surrounding you at every step of your daily routine. So the best thing to do is to be aware and knowledgeable about how computer vision is being developed and in what ways you can take advantage of it personally or business-wise.