Chapter 2: Indoor Positioning Technologies
Computer vision is already a crucial component in several technologies. It supports self-driving cars and drivers through features like lane detection and blind spot detection. In retail, computer vision is used to count customers, find out what interests them, and recommend similar products. In industrial settings, computer vision can be used for product quality assurance at scale. Computer vision can also be used to track assets effectively, by identifying unique objects or people in the frame of a photo or video.
Several algorithms can be used to implement computer vision. The first determines what a picture contains. The picture is divided into many smaller squares, and each square is assigned probabilities that reflect how likely it is to show a certain object. Those probabilities come from a machine learning model that has been “trained” on many similar images. The per-square probabilities are then multiplied together to estimate what the whole picture represents. This algorithm is the building block of a more complex process called Object Detection, which scans over the picture and uses the trained model to detect the various objects it contains.
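The multiplication step can be illustrated with a minimal Python sketch. It assumes the trained model has already produced a probability for each label in each square; the labels ("pallet", "forklift"), the numbers, and the function name are hypothetical. Renormalizing the products so they sum to 1 is an added assumption that makes the result easier to read, not something the description above requires.

```python
import math

def combine_square_probabilities(square_probs):
    """Combine per-square label probabilities into whole-picture probabilities.

    square_probs: list of dicts, one per square, mapping label -> probability.
    The probabilities for each label are multiplied across squares (done in
    log space to avoid underflow), then renormalized to sum to 1.
    """
    labels = list(square_probs[0])
    # Clamp at a tiny value so a zero probability does not break log().
    log_scores = {
        label: sum(math.log(max(p[label], 1e-12)) for p in square_probs)
        for label in labels
    }
    top = max(log_scores.values())
    products = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(products.values())
    return {label: value / total for label, value in products.items()}

# Hypothetical model output for three squares of one picture.
squares = [
    {"pallet": 0.7, "forklift": 0.3},
    {"pallet": 0.6, "forklift": 0.4},
    {"pallet": 0.8, "forklift": 0.2},
]
picture_probs = combine_square_probabilities(squares)
best_label = max(picture_probs, key=picture_probs.get)  # "pallet"
```

Here the raw products are 0.336 for "pallet" and 0.024 for "forklift", so after renormalizing the picture is classified as a pallet with a probability of about 0.93.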
The second algorithm, Object Tracking, builds on Object Detection. It examines a video or a series of pictures, using Object Detection to identify the objects in each frame. A tracker then traces those objects’ movement from frame to frame.
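As a rough illustration of the tracking step, the sketch below links detections in consecutive frames by nearest centroid. It assumes a detector has already reduced each frame to a list of (x, y) object centers. The function name, the `max_distance` threshold, and the sample coordinates are hypothetical, and the greedy matching shown is only one simple way to build a tracker.

```python
import math

def track_objects(frames, max_distance=50.0):
    """Link detections across frames by nearest centroid (greedy matching).

    frames: list of frames, each a list of (x, y) detection centers.
    Returns a dict mapping track id -> list of (frame_index, (x, y)).
    """
    tracks = {}
    next_id = 0
    for frame_index, detections in enumerate(frames):
        unmatched = list(detections)
        # Extend every track that was seen in the previous frame.
        for history in tracks.values():
            last_frame, last_pos = history[-1]
            if last_frame != frame_index - 1 or not unmatched:
                continue
            nearest = min(unmatched, key=lambda c: math.dist(c, last_pos))
            if math.dist(nearest, last_pos) <= max_distance:
                history.append((frame_index, nearest))
                unmatched.remove(nearest)
        # Any detection left over starts a new track.
        for center in unmatched:
            tracks[next_id] = [(frame_index, center)]
            next_id += 1
    return tracks

# Hypothetical detections: two objects, one of which disappears in frame 2
# while a new object appears elsewhere.
frames = [
    [(10, 10), (200, 200)],
    [(14, 12), (205, 198)],
    [(19, 15), (300, 50)],
]
tracks = track_objects(frames)
```

With this input, track 0 follows the first object through all three frames, track 1 ends after frame 1, and the far-away detection in frame 2 starts a new track 2.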
The third algorithm, Semantic Segmentation, examines the image and assigns each pixel to the object, or class of object, that it belongs to.
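The per-pixel assignment can be sketched as follows, assuming a model has already produced one score map per label, with a score for every pixel. The labels, scores, and function name are hypothetical; the sketch only shows the final step of giving each pixel its highest-scoring label.

```python
def segment(score_maps):
    """Assign each pixel the label with the highest score at that pixel.

    score_maps: dict mapping label -> 2D list of per-pixel scores,
    where all maps have the same height and width.
    Returns a 2D list of labels with the same shape.
    """
    labels = list(score_maps)
    first_map = score_maps[labels[0]]
    height, width = len(first_map), len(first_map[0])
    return [
        [max(labels, key=lambda label: score_maps[label][y][x])
         for x in range(width)]
        for y in range(height)
    ]

# Hypothetical scores for a tiny 2x2 image.
scores = {
    "background": [[0.9, 0.2], [0.8, 0.1]],
    "box":        [[0.1, 0.8], [0.2, 0.9]],
}
label_map = segment(scores)
# label_map == [["background", "box"], ["background", "box"]]
```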
In terms of hardware, computer vision can be less expensive and easier to install than other solutions. In some cases, cameras already installed as part of a security system can be repurposed for asset tracking. In other cases, employees can use phones to take pictures of assets’ locations. Where a dedicated camera system has to be installed, the hardware and installation costs are comparable to those of most other solutions.
Computer vision can be very computationally expensive compared to other asset tracking solutions. Of the three algorithms mentioned above, the first, Object Detection, takes the least computational resources and can fit the needs of asset tracking. Even so, it requires a lot of resources: to identify the objects in an image, the machine has to examine many subsets of that image. Marking assets with something a computer can identify easily, such as a QR code or barcode, would reduce the amount of resources required.
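To get a feel for how many subsets are involved, the sketch below counts the positions a simple sliding-window scan would examine. The sliding-window approach, the window sizes, and the stride are assumptions chosen for illustration; they are one way of scanning an image, not a description of any particular detector.

```python
def count_windows(image_w, image_h, window, stride):
    """Count the square window positions in a sliding-window scan."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down

# Hypothetical scan of a 200x200 image at three window sizes, 8 px stride.
window_sizes = (32, 64, 128)
per_size = {w: count_windows(200, 200, w, stride=8) for w in window_sizes}
total_windows = sum(per_size.values())
# per_size == {32: 484, 64: 324, 128: 100}; total_windows == 908
```

Even for this small image, the model would have to be evaluated 908 times, once per window, which is why an easily recognized marker can save so much work.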
A computer vision asset tracking solution would have very high accuracy, limited only by the software implementation. Other solutions like Bluetooth or WiFi rely on signal strength, which gives low accuracy. In addition, an image is much more useful to a human than a signal strength value, and the same image can be shared across multiple interlocking IoT systems.
The main disadvantage of computer vision asset tracking is the huge data throughput. The integer signal strength value that a Bluetooth or WiFi solution would use takes only a few bits, typically no more than a byte. A grayscale image uses 8 bits per pixel, so a 200 px by 200 px image takes 320,000 bits (320 kilobits, or 40 kilobytes), which is roughly four orders of magnitude larger. This makes such a solution hard to scale, because constantly uploading and downloading images requires a large amount of bandwidth. Although edge computing may ease the problem, it also requires significant investment in software, as well as the physical and digital infrastructure to support the system.
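The size comparison can be checked with a few lines of Python. The 8-bit signal strength value is an assumption made for the comparison; the image dimensions and bit depth are the ones used above.

```python
import math

def grayscale_image_bits(width_px, height_px, bits_per_pixel=8):
    """Size of an uncompressed grayscale image in bits."""
    return width_px * height_px * bits_per_pixel

image_bits = grayscale_image_bits(200, 200)   # 320000 bits
image_kilobytes = image_bits / 8 / 1000       # 40.0 kB

rssi_bits = 8  # assumption: one signal strength reading stored in one byte
ratio = image_bits / rssi_bits                # 40000.0
orders_of_magnitude = math.log10(ratio)       # between 4 and 5
```

Under these assumptions a single uncompressed image carries 40,000 times as much data as one signal strength reading, and that gap is multiplied by every camera and every frame sent.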
Computer vision can provide asset tracking solutions with high accuracy and reliability. Such solutions would have advantages over other asset tracking technologies like Bluetooth and WiFi. Unfortunately, the sheer amount of data being transferred limits the use cases within which computer vision would be effective or practical.