Humans are highly visual creatures. We gather the vast majority of information about our environments, our daily tasks and each other through visual perception. In fact, sight is so critical to our experience that nearly half the human brain is directly or indirectly involved in visual processing.
As a result, much of the world we’ve created for ourselves — the work we do and the obstacles we face — depends on some form of visual input. As we increasingly look to Artificial Intelligence (AI) to help solve a range of real-world challenges, this means computer vision will, by necessity, have a significant role to play.
In the simplest terms, Computer Vision (CV) refers to the use of machine learning to analyze, understand and respond to digital images or videos.
Applying deep learning algorithms to input from cameras allows visual information to be converted into data that can be processed and evaluated for patterns. By analyzing a selection of images, neural networks can be trained to recognize, classify and react to what they “see.”
But computers don’t learn to identify objects the same way humans do. Our broad intelligence means we’re able to use a wide range of attributes, as well as other senses, to identify, characterize and make inferences about many observations at once. We easily adapt our understandings to accommodate variations in size, angle, shape or color.
While computer vision can be trained to solve almost any visual problem, each model is extremely narrow in its capabilities. It’s only able to recognize what it’s been taught to recognize.
One of the most complex examples of computer vision comes from self-driving cars. Although the capabilities of autonomous vehicles may seem broad, they rely on hundreds or thousands of CV models, non-CV models and higher-level machine learning models working together. The system as a whole is capable of identifying, locating and responding to street signs, traffic lights, pedestrians and other vehicles on the road — but each function requires a specialized algorithm, fine-tuned to that task.
This narrow functionality makes computer vision extremely effective at addressing individual challenges. It also means CV models need to be carefully developed, trained and implemented with a specialized function in mind.
As techniques become more advanced and technologies from Internet Protocol (IP) cameras to edge gateways become less cost-prohibitive, computer vision has shifted from the realm of possibility to the realm of accessibility for modern organizations. Grand View Research reports the value of the global computer vision market is expected to reach $19 billion by 2027, up from just over $11 billion in 2020.
Computer vision has enormous potential to address a variety of challenges from operations and logistics to employee safety, particularly in environments where human capabilities are costly, physically limited or high risk.
Computer vision can reliably execute the same task over and over without instances of human error caused by distraction or fatigue. It also enables observation of remote or dangerous locations with incredible accuracy 24 hours a day. Models can even be trained to process information at high speeds or in visual spectrums such as UV or infrared that would otherwise be invisible to the human eye.
So, what does the current landscape look like for computer vision? Which industries are best positioned to capitalize on this technology today and in the next few years?
Energy: From nuclear plants to wind farms, the need for safety, efficiency and regulatory compliance has resulted in a broad range of use cases across the energy industry. Forward-thinking organizations are already leveraging computer vision to monitor equipment for signs of wear, as well as to safely and effectively inspect the condition of linear assets such as power lines or pipelines. Correlating multiple vision models or other sensors to detect cracks, leaks or warning lights can help to pinpoint anomalies and provide early maintenance warnings.
In addition to the benefits of reliable, round-the-clock equipment monitoring, computer vision offers highly effective means of regulating “danger zones.” It can be used to identify authorized personnel badges in restricted areas or even provide alerts when an individual has crossed a designated safety threshold.
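To make the safety-threshold idea concrete, here is a minimal sketch. It assumes an upstream detection model (not shown) reports where each person is standing in image coordinates, and that the restricted zone has been drawn as a polygon on the camera view. The names `point_in_polygon` and `people_in_zone`, along with the sample coordinates, are illustrative assumptions rather than part of any particular product.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def people_in_zone(foot_points: List[Point], zone: List[Point]) -> List[int]:
    """Return indices of detected people whose foot point is inside the zone."""
    return [i for i, p in enumerate(foot_points) if point_in_polygon(p, zone)]


# Hypothetical restricted zone (pixel coordinates) and two detected people.
ZONE = [(100, 100), (400, 100), (400, 300), (100, 300)]
detections = [(250, 200), (50, 50)]
alerts = people_in_zone(detections, ZONE)  # indices of people to alert on
```

In practice the foot point would likely be derived from the bottom edge of each person's bounding box, and an alert would be raised only after the person stays inside the zone for several consecutive frames, to avoid reacting to a single noisy detection.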
Manufacturing: Production lines have played host to some of the earliest applications of computer vision, with models trained to regulate machinery, count products or evaluate product quality. Beyond simply detecting defects, cross-referencing data from other cameras and sensors can help to pinpoint the source of production issues, accelerating repairs and preventing costly downtime. Machine vision models are even being used to associate packaging with product descriptions to prevent mislabeling or shipping errors.
As with the energy industry, computer vision also provides opportunities to improve employee safety in manufacturing environments. Image classification models are being leveraged to identify whether workers are wearing required equipment. If a system detects an employee without a safety vest or helmet, visual alerts or notifications can be sent to the appropriate parties to prompt corrective action.
Transportation: Beyond autonomous vehicles, the transportation and logistics industries can also benefit from more discrete applications of machine vision. From pallet counting and sorting to damage alerts and warehouse surveillance, this technology has the potential to streamline operations and reduce supply chain disruptions.
To promote safety and efficiency among transportation fleets, some organizations have applied visual models to verify proper docking, loading, fueling or tire pressure. CV-enabled aerial vehicles (drones) are being leveraged by modern railroad companies to conduct inspections along thousands of miles of railway. This solution reduces costly and sometimes dangerous fieldwork, allowing human inspectors to shift their focus from finding problems to fixing them.
Retail: Particularly within large retail environments, computer vision provides a variety of solutions ranging from inventory management to customer service. While positioning cameras to monitor every shelf throughout a store may not be realistic, or indeed cost effective unless executed at enterprise scale, CV models can be useful for monitoring critical stock in warehouses or specific displays.
Computer vision can also be used to monitor the store environment, sending alerts if a refrigerator door is left open or when checkout lines are getting too long. As self-checkout grows in popularity, computer vision is increasingly being applied to shrinkage and loss prevention by correlating visual data with Point-of-Sale (POS) records to ensure that items being scanned by customers match the appropriate product description.
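The self-checkout comparison can be sketched in a few lines. This example assumes a vision model (not shown) returns a product identifier and a confidence score for the item in front of the camera; the `VisionPrediction` class, the `check_scan` function and the 0.8 confidence cutoff are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisionPrediction:
    sku: str           # product the (hypothetical) vision model believes it saw
    confidence: float  # model confidence between 0.0 and 1.0


def check_scan(scanned_sku: str,
               prediction: Optional[VisionPrediction],
               min_confidence: float = 0.8) -> str:
    """Compare a POS scan with the camera's view of the same item.

    Returns "ok", "mismatch" or "unverified".
    """
    # A missing or low-confidence prediction is not treated as evidence of theft.
    if prediction is None or prediction.confidence < min_confidence:
        return "unverified"
    if prediction.sku == scanned_sku:
        return "ok"
    return "mismatch"


# Example: the customer scanned bananas, but the camera saw something else.
result = check_scan("BANANA-001", VisionPrediction("STEAK-042", 0.93))
```

Keeping "unverified" separate from "mismatch" is a deliberate design choice in this sketch: only a confident disagreement prompts an attendant, so customers are not flagged simply because the camera had a poor view.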
Healthcare: Medical diagnostics are often among the highest profile applications for computer vision, with researchers and tech giants alike exploring the benefits of AI for risk assessment or early detection of disease. But while this area represents enormous potential for good, the potential human harm caused by a misdiagnosis has far greater implications than most other CV use cases. This means many additional layers of precautions are required for these models, including more rigorous training, narrower margins for error and more active human involvement in the decision-making process.
But beyond diagnostics, there are many other applications within the healthcare field where computer vision has demonstrated value with far lower risk. Vision models can be used to track handwashing among medical staff, providing reminders if it appears this step has been missed. CV systems can be used to track inventory in pharmacies or supply closets to prevent stock from running low. Optical Character Recognition (OCR) is also increasing in popularity as a way to automate document processing, reducing administrative burdens and ultimately lowering the cost of care.
The measurable benefits of lower costs, increased efficiency and reduced downtime mean there’s significant Return on Investment (ROI) to be captured through computer vision. And thanks to the growing democratization of AI, intelligent edge and cloud solutions, these benefits are increasingly within reach.
But while the hardware, and even many off-the-shelf CV architectures, are readily available, there are a few key points to consider before kicking off your pilot project.
First and foremost, you’ll need to make sure you have the knowledge and resources to build, implement and operationalize a highly accurate CV system at scale. Computer vision is a unique field that requires a specialized skill set. If your organization already employs a team of data scientists, this is a great starting point, but unless your team has successfully implemented a CV model in the past, it’s well worth the time to engage with an experienced consultant who can guide your project from ideation to execution.
Once you’ve assembled your team, you’ll need to analyze the problem you’re trying to solve to evaluate any ethical considerations, as well as the relative risk and corresponding accuracy that will be required. While high accuracy is always the goal, anomalies, environmental changes and other unknown variables will make some degree of uncertainty inevitable. For this reason, some challenges are better suited for computer vision than others, particularly those that enable your organization to augment rather than replace the human decision-making process.
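One simple way to tie required accuracy to risk is to let the model act on its own only when its confidence clears a bar that rises with the stakes, and to send everything else to a person. The sketch below assumes the model reports a confidence score between 0 and 1; the risk tiers, the threshold values and the `route_prediction` function are illustrative assumptions, not recommended settings.

```python
# Hypothetical confidence a prediction must reach before it is accepted
# without human review, by risk tier of the use case.
REVIEW_THRESHOLDS = {"low": 0.70, "medium": 0.90, "high": 0.99}


def route_prediction(confidence: float, risk: str) -> str:
    """Decide whether a prediction is accepted automatically or reviewed."""
    if risk not in REVIEW_THRESHOLDS:
        raise ValueError(f"unknown risk tier: {risk!r}")
    if confidence >= REVIEW_THRESHOLDS[risk]:
        return "auto_accept"
    return "human_review"


# The same 92% confidence is enough for pallet counting (low risk)
# but not for a high-risk decision.
low_risk_route = route_prediction(0.92, "low")
high_risk_route = route_prediction(0.92, "high")
```

The appropriate thresholds depend on the cost of a wrong answer in each use case and would normally be set from validation data rather than chosen by hand.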
As you prepare to develop and train a CV model, it will be important to consider not only the quantity of relevant images available but also the quality — including lighting, angle, size, color of the backdrop and more. Due to the relative rigidity of computer vision, it can be difficult to know what types of outliers will lead to misidentification or misclassification. As a result, training will need to include a variety of positive and negative examples to improve results. An experienced CV consultant or decision scientist will be able to help direct these efforts.
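Before training, it can help to check whether every label is represented under every capture condition you expect in production. The following is a minimal sketch that assumes each training image has been tagged with a label and a condition (for example, lighting); the function name `find_coverage_gaps`, the tags and the minimum count are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_coverage_gaps(samples: Iterable[Tuple[str, str]],
                       labels: List[str],
                       conditions: List[str],
                       min_count: int = 2) -> List[Tuple[str, str]]:
    """Return (label, condition) pairs with fewer than min_count images."""
    counts = Counter(samples)
    return [(label, cond)
            for label in labels
            for cond in conditions
            if counts[(label, cond)] < min_count]


# Hypothetical image tags: (label, lighting condition) for each training image.
samples = [
    ("helmet", "daylight"), ("helmet", "daylight"), ("helmet", "low_light"),
    ("no_helmet", "daylight"), ("no_helmet", "daylight"),
    ("no_helmet", "low_light"), ("no_helmet", "low_light"),
]
gaps = find_coverage_gaps(samples,
                          labels=["helmet", "no_helmet"],
                          conditions=["daylight", "low_light"])
```

Here the audit would report that positive examples in low light are under-represented, which points to where additional images should be collected.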
No matter the use case, any form of AI must be put into production in a way that's verifiable and supported by human decision-making. This means that in addition to training your computer vision model, you’ll need to determine the most effective way to introduce the resulting intelligence to users.
Unfortunately, when humans are consistently presented with highly accurate intelligence, we quickly become reliant upon it and may fail to notice otherwise obvious mistakes. To prevent this, CV systems should employ tactics that continue to rely upon and engage active responsibility from human workers — especially in healthcare or military environments where the risk of inaccuracy is highly consequential.
Artificial intelligence is only as accurate as the data used to train it. Sadly, but inevitably, human biases are reflected in human data, often in ways that can be difficult to perceive. To overcome these challenges, developers must understand and adhere to the practices of responsible AI, ensuring CV models are consciously structured and rigorously tested under a range of conditions — particularly when human imagery is involved. In addition to relying on experienced consultants in this field, engaging a diverse range of decision-makers can help to broaden perspectives, uncover unexpected challenges and safeguard against the perpetuation of bias in your model.
AI, including computer vision, represents an entire business lifecycle. As a result, you’ll need to make sure you’re not only investing in the skills needed to develop an effective model, but also the infrastructure, pipeline and operational expertise to implement that model in a way that creates real, long-term business value. This is where MLOps comes into play.
Artificial intelligence is self-disrupting by nature, meaning the circumstances for which a model is trained are bound to change from the moment of implementation. MLOps goes beyond traditional DevOps to ensure that once a model is deployed, it’s continually tested and retested, and that it can be retrained dynamically on new data so that it continues to provide high-quality intelligence even as the surrounding business environment changes.
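As a rough illustration of the "tested and retested" part of this loop, the sketch below tracks accuracy over a sliding window of predictions that humans have since verified, and flags the model for retraining when accuracy falls below a target. The `AccuracyMonitor` class, the window size and the accuracy target are assumptions made for this example; a production MLOps pipeline would involve considerably more than this.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent verified predictions."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.min_accuracy = min_accuracy

    def record(self, predicted: str, actual: str) -> None:
        """Store whether a prediction matched the human-verified label."""
        self.results.append(predicted == actual)

    def accuracy(self) -> float:
        """Accuracy over the current window (1.0 if nothing recorded yet)."""
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def needs_retraining(self) -> bool:
        """Flag retraining only once the window is full and accuracy is low."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Example with a tiny window: two of the last four predictions were wrong.
monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [("crack", "crack"), ("ok", "crack"),
                          ("ok", "ok"), ("crack", "ok")]:
    monitor.record(predicted, actual)
flagged = monitor.needs_retraining()
```

Waiting for a full window before flagging is a design choice in this sketch: it avoids triggering retraining on the first one or two unlucky predictions after deployment.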
By embracing computer vision, modern organizations have a unique opportunity to drive higher quality, reduce the cost of goods and services, and position themselves at the forefront of disruption. As custom vision, new methodologies and algorithms continue to advance, computer vision will become easier and more cost-effective to deploy.
Already, the market has seen an influx of new products geared toward localized AI and CV as manufacturers begin to recognize the demand for these solutions. While some of today’s edge devices have the ability to run smaller AI models independent of the cloud, in the next few years, increased efficiencies will make it possible to run higher accuracy models on lower power devices. This will enable companies to adopt computer vision more rapidly and at lower cost.
Those that begin exploring, investing in and piloting CV solutions today will be well-positioned to capitalize on these benefits while driving industry transformation in a way that improves employee safety, increases engagement and boosts customer satisfaction.