28 Feb Seeing beyond: the Computer Vision revolution
Computer vision sector
Governments and industry are investing more and more in Computer Vision solutions, which they integrate into their production and operational systems to optimize each phase of the process. The market figures make this interest evident: the computer vision market was valued at 15 billion dollars in 2022 and is expected to reach 82.1 billion dollars by 2032, with annual growth of 18.7% between 2023 and 2032. Although industrial applications are in high demand, artificial intelligence is also widely used in other fields, from automotive to medical, and especially in social and security applications, where the aim is to support people in their everyday life.
Artificial intelligence, computer vision and machine learning
Within the broad field of artificial intelligence (AI), computer vision refers to a computer’s ability to analyze images and videos and extract meaningful information from them. The algorithms and models developed in this field allow computers to reproduce the functions and processes of the human visual system. Although algorithms of this kind have existed since the 1960s, the progress made in Machine Learning over the last 10 years, together with remarkable advances in data storage, computing power and high-quality, low-cost input devices, has led to significant improvements in the ability of software to analyze this kind of content.
How computer vision works
In computer vision, processing involves visual content such as images, videos, icons and any other graphic representation made of pixels. Although it might look like a simple system for recognizing objects, people and animals in a single image or in a sequence of images (a video), computer vision is mainly about extracting useful information at higher and higher levels of abstraction and understanding, so that it can be processed further. Specifically, it is the ability to extract meaningful data by reconstructing a context around the image.
To work accurately, Computer Vision systems need to be trained on a large quantity of appropriately labelled images, which together make up the dataset. Computer Vision models can analyze an image in more or less depth, depending on the networks used, on the characteristics of the image and on the kind of task considered. These software applications process images by analyzing their content with mathematical algorithms.
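As a minimal sketch of what training on a labelled dataset means, the Python example below reduces each tiny grayscale image to a single feature (its mean brightness) and learns one average value per label. The images, the labels and the helper names `mean_brightness`, `train` and `predict` are illustrative assumptions; real systems typically learn from very large collections of images with neural networks rather than from one hand-picked feature.

```python
from collections import defaultdict

# Hypothetical labelled dataset: each item is (image, label), where an image
# is a small grid of grayscale pixel values from 0 (black) to 255 (white).
DATASET = [
    ([[10, 20], [30, 20]], "night"),
    ([[40, 30], [20, 30]], "night"),
    ([[200, 220], [210, 230]], "day"),
    ([[240, 230], [220, 250]], "day"),
]

def mean_brightness(image):
    """Reduce an image to one feature: the average of its pixel values."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def train(dataset):
    """Learn one centroid (average feature value) per label."""
    features_by_label = defaultdict(list)
    for image, label in dataset:
        features_by_label[label].append(mean_brightness(image))
    return {label: sum(v) / len(v) for label, v in features_by_label.items()}

def predict(model, image):
    """Return the label whose centroid is closest to the image's feature."""
    feature = mean_brightness(image)
    return min(model, key=lambda label: abs(model[label] - feature))

model = train(DATASET)
print(predict(model, [[15, 25], [35, 25]]))      # a dark image
print(predict(model, [[210, 205], [225, 240]]))  # a bright image
```

The labels are what make the training possible: without them, the program would have no way of knowing which brightness values belong to which category.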
The processing phases
The whole process, which is rather complex, begins with the acquisition of the image and its preprocessing to improve quality, and ends with the interpretation of the results and the consequent action. The two main intermediate phases of the process are:
- feature extraction, where an algorithm analyzes the pixels of an image to identify the specific features (color values, shape and texture) of the objects and faces within it;
- classification, during which the features extracted from the frame are compared with known models. If the similarity between the analyzed image or frame and one of the known models exceeds a given threshold, the software reports the match and “cuts” the image into regions or groups with similar properties.
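The extraction and classification phases above can be sketched in plain Python. This is a simplified illustration, not how production systems work: the only feature is the average color of the image, the "known models" are hypothetical reference colors, and the similarity score and the 0.9 threshold are assumptions chosen for the example.

```python
import math

# Hypothetical known models: one reference feature vector (average RGB) each.
KNOWN_MODELS = {
    "grass": (40.0, 180.0, 60.0),
    "sky": (90.0, 160.0, 230.0),
}
MATCH_THRESHOLD = 0.9  # assumed value; in practice tuned per application

def extract_features(image):
    """Extraction phase: average the R, G and B values over all pixels."""
    pixels = [pixel for row in image for pixel in row]
    count = len(pixels)
    return tuple(sum(p[c] for p in pixels) / count for c in range(3))

def similarity(a, b):
    """Map the Euclidean distance in RGB space to a score between 0 and 1."""
    max_distance = math.sqrt(3 * 255 ** 2)
    return 1.0 - math.dist(a, b) / max_distance

def classify(image):
    """Classification phase: best-matching model, or None below the threshold."""
    features = extract_features(image)
    best_label, best_score = None, 0.0
    for label, reference in KNOWN_MODELS.items():
        score = similarity(features, reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > MATCH_THRESHOLD else None

# Two tiny test images, each pixel given as an (R, G, B) tuple.
green_patch = [[(38, 175, 62), (42, 185, 58)],
               [(40, 182, 60), (40, 178, 60)]]
red_patch = [[(250, 10, 10), (245, 5, 15)]]

print(classify(green_patch))  # close to the "grass" reference
print(classify(red_patch))    # far from every reference, so no match
```

Returning `None` when no score exceeds the threshold mirrors the idea that a match is reported only when the image is similar enough to a known model.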
Possible tasks
Depending on the application you want to develop, you can choose one or more of the available tasks. Among these, the most widely used are:
- Image Classification, that is, the analysis of the image content and the assignment of a label;
- Object Detection, that is, the identification of one or more entities within an image; and
- Semantic Segmentation, that is, the division of the image into sections.
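To make the difference between these three tasks concrete, the toy Python sketch below runs each of them on the same small grayscale grid. It uses a fixed brightness threshold instead of a trained model, so the function names, the threshold of 128 and the labels are assumptions for illustration only. What matters is the shape of each output: one label for the whole image, one bounding box, one class per pixel.

```python
# Toy grayscale image: 0 is dark background, 255 is a bright object.
IMAGE = [
    [0,   0,   0, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 255, 255, 0, 0],
    [0,   0,   0, 0, 0],
]
THRESHOLD = 128  # assumed cut-off between background and object

def segment(image):
    """Semantic segmentation: assign a class (0 or 1) to every pixel."""
    return [[1 if p >= THRESHOLD else 0 for p in row] for row in image]

def detect(image):
    """Object detection: bounding box (top, left, bottom, right), or None."""
    mask = segment(image)
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

def classify_image(image):
    """Image classification: a single label for the whole image."""
    return "contains object" if detect(image) is not None else "empty"

print(classify_image(IMAGE))  # one label
print(detect(IMAGE))          # one box around the bright square
for row in segment(IMAGE):    # one class per pixel
    print(row)
```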
As these models evolve and improve, new tasks such as Pose Estimation, Face Recognition, Action Recognition and Emotion Recognition are being implemented in software applications and integrated into different “smart” technological solutions.
Through the analysis and interpretation of images and videos, computer vision thus offers increasingly advanced solutions for sectors ranging from industry to social services and health care, with a significant impact on quality of life and on the efficiency of business processes.
Vincenzo Montedoro