Computer vision could be a reliable asset tracking technology with advantages over Bluetooth and WiFi, but only for certain use cases.
Computer vision is already a crucial component of several technologies. It is vital to self-driving cars and to driver-assistance features such as lane detection and blind-spot detection. In retail, computer vision is used to count customers, find out what interests them, and recommend similar products. In industrial settings, it can be used for quality assurance (QA).
Computer vision can also be used to track assets effectively, and there are several ways it could work. The most common would mirror the way a person keeps track of things: they scan the room visually to find the asset, estimate how far it is from them and from other identifiable landmarks, locate themselves and those landmarks on a map or floor plan, and from that estimate where the asset is. This could work with an existing security camera system, or by having employees take a photo each time they leave an asset somewhere.
The other way mirrors how a person works out where they are on a map or floor plan. The person identifies landmarks on the map that they can see, then estimates their own position from where they judge themselves to be in relation to those landmarks.
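The geometry behind both approaches can be sketched in a few lines of Python. This is a minimal sketch that assumes the vision step has already produced distance estimates in metres (plus, for the first approach, the camera's heading and the asset's bearing) and that the floor plan uses a flat 2-D coordinate system. The names `locate_asset` and `locate_observer` are hypothetical, and the second function uses linearised least-squares trilateration, one common way to turn landmark distances into a position, not necessarily the method any particular product uses.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) on the floor plan, in metres


def locate_asset(camera_xy: Point, heading_deg: float,
                 bearing_deg: float, distance_m: float) -> Point:
    """First approach: project a seen asset onto the floor plan.

    heading_deg is the camera's facing direction from the plan's x-axis;
    bearing_deg is the asset's angle relative to that heading.
    """
    angle = math.radians(heading_deg + bearing_deg)
    return (camera_xy[0] + distance_m * math.cos(angle),
            camera_xy[1] + distance_m * math.sin(angle))


def locate_observer(landmarks: List[Point], distances: List[float]) -> Point:
    """Second approach: estimate the observer's position from landmarks.

    Subtracting the first landmark's circle equation from each of the others
    yields linear equations a*x + b*y = c, solved here by least squares.
    Needs at least three landmarks that do not all lie on one line.
    """
    (x0, y0), d0 = landmarks[0], distances[0]
    # Accumulate the 2x2 normal equations (A^T A) p = A^T c.
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), di in zip(landmarks[1:], distances[1:]):
        a = 2 * (xi - x0)
        b = 2 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear landmarks")
    # Cramer's rule on the 2x2 system.
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return (x, y)
```

For example, `locate_observer([(0, 0), (10, 0), (0, 10)], [5, 65 ** 0.5, 45 ** 0.5])` recovers a position of approximately (3, 4). Using more than three landmarks lets the least-squares step average out some of the error in each distance estimate.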
There are several algorithms that can be used to implement computer vision. The first, Object Detection, builds on a simpler task: image classification, which determines what a picture contains. In one simple formulation, the picture is divided into many smaller squares, each of which is assigned probabilities corresponding to the likelihood that the picture shows a certain object. Those probabilities come from a machine learning model that has been "trained" by feeding it a large number of labeled images. The per-square probabilities are then combined, in the simplest case by multiplying them together, to calculate the probabilities of what the whole picture represents. Object Detection then scans over the picture and applies this classifier to detect and locate the various objects it contains.
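The toy Python below illustrates the two ideas in this paragraph under stated assumptions. Combining per-square probabilities by multiplication is only valid if the squares are treated as independent and every class is equally likely beforehand (modern convolutional networks instead learn how to combine regions). The detector is a naive sliding-window scan. The `classify` callback stands in for a trained model and is hypothetical, as are the class names used in the example.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def combine_patch_probabilities(patch_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-square class probabilities into whole-image probabilities.

    Multiplies each class's probability across all squares (as a sum of logs,
    to avoid numeric underflow), then renormalises so the results sum to 1.
    Valid only if squares are independent and all classes equally likely a priori.
    """
    classes = patch_probs[0].keys()
    log_scores = {c: sum(math.log(p[c]) for p in patch_probs) for c in classes}
    top = max(log_scores.values())  # subtract the max before exp for stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def sliding_windows(width: int, height: int, size: int, stride: int) -> Iterator[Box]:
    """Yield every size x size window, stepping `stride` pixels at a time."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, size, size)


def detect(width: int, height: int, classify: Callable[[Box], Tuple[str, float]],
           size: int = 64, stride: int = 16, threshold: float = 0.9):
    """Naive detector: classify each window and keep the confident ones.

    `classify` is a hypothetical stand-in for a trained model; it takes a
    window and returns (label, probability).
    """
    hits = []
    for box in sliding_windows(width, height, size, stride):
        label, prob = classify(box)
        if prob >= threshold:
            hits.append((box, label, prob))
    return hits
```

For example, two squares that give "pallet" probabilities of 0.8 and 0.6 (and "forklift" 0.2 and 0.4) combine to about 0.86 for "pallet", because 0.48 / (0.48 + 0.08) = 6/7.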
The second algorithm, Object Tracking, expands on Object Detection. It runs Object Detection on each frame of a video or series of pictures to identify objects, and a tracker then follows those objects' movement from frame to frame.
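As a minimal sketch of the tracker step, the Python below assumes a detector already returns bounding boxes for each frame and links them across frames by greedy intersection-over-union (IoU) matching. This is one simple tracking strategy chosen for illustration; production trackers typically add motion models and tolerate missed detections. `IouTracker` and its `min_iou` threshold are hypothetical names.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy tracker: each detection joins the best-overlapping existing track."""

    def __init__(self, min_iou: float = 0.3):
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}
        self._ids = count(1)

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        unmatched = dict(self.tracks)
        updated: Dict[int, Box] = {}
        for det in detections:
            best_id, best_score = None, self.min_iou
            for tid, box in unmatched.items():
                score = iou(det, box)
                if score >= best_score:
                    best_id, best_score = tid, score
            if best_id is None:
                best_id = next(self._ids)  # no good match: treat as a new object
            else:
                del unmatched[best_id]  # each track is matched at most once
            updated[best_id] = det
        self.tracks = updated  # tracks with no match in this frame are dropped
        return updated
```

Calling `update` once per frame returns a mapping from a stable track ID to that object's current box, which is what an asset tracking system would log.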
The third algorithm, Semantic Segmentation, examines the image and labels every pixel with the class of object it belongs to.
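The final step of semantic segmentation can be sketched as a per-pixel choice. Assuming a trained model (not shown) has already produced a score for every class at every pixel, each pixel simply takes the highest-scoring class. The `label_pixels` helper and the class names in the example are hypothetical. Because this labels pixels by class, two pallets side by side would both be marked "pallet" rather than as separate objects.

```python
from typing import List, Sequence


def label_pixels(scores: Sequence[Sequence[Sequence[float]]],
                 classes: Sequence[str]) -> List[List[str]]:
    """Assign every pixel the class with the highest score.

    scores[k][row][col] is the model's score for classes[k] at that pixel.
    """
    height, width = len(scores[0]), len(scores[0][0])
    return [[classes[max(range(len(classes)), key=lambda k: scores[k][r][c])]
             for c in range(width)]
            for r in range(height)]
```

For a one-row, two-pixel image with "floor" scores `[[0.9, 0.2]]` and "pallet" scores `[[0.1, 0.8]]`, the result is `[["floor", "pallet"]]`.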
In terms of hardware, computer vision can be less expensive and easier to install than other solutions. In some cases, cameras already installed as part of a security system can be repurposed for asset tracking. In other cases, employees can use phones to photograph assets where they leave them. Where a dedicated camera system must be installed, the hardware and installation costs are comparable to those of other solutions.
Computer vision can be very computationally expensive compared to other asset tracking solutions. Of the three algorithms mentioned above, the first, Object Detection, requires the fewest computational resources and can meet the needs of asset tracking. Even so, it is demanding: to locate an object in an image, the machine must examine many sub-regions of that image. Tagging assets with something a computer can identify easily, such as a QR code or barcode, would reduce the resources required.
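To give a sense of scale, the short Python below counts how many sub-images a naive sliding-window detector would have to classify. The 640 × 480 frame, the 8-pixel stride, and the window sizes are illustrative assumptions, and modern single-pass detectors share computation across windows rather than classifying each one separately, so this is an upper-bound intuition rather than a benchmark.

```python
from typing import Iterable


def window_count(width: int, height: int, size: int, stride: int) -> int:
    """Number of size x size windows that fit in the image at the given stride."""
    if size > width or size > height:
        return 0
    return ((width - size) // stride + 1) * ((height - size) // stride + 1)


def multiscale_count(width: int, height: int, sizes: Iterable[int], stride: int) -> int:
    """Total windows when the scan is repeated at several window sizes."""
    return sum(window_count(width, height, s, stride) for s in sizes)


print(window_count(640, 480, 64, 8))                   # 3869
print(multiscale_count(640, 480, (64, 128, 256), 8))   # 8215
```

Under these assumptions a single frame already calls for 8,215 separate classifications. A QR code or barcode reader, by contrast, searches for a fixed high-contrast pattern, which is far cheaper.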
A computer vision asset tracking solution could achieve very high accuracy, limited mainly by the software implementation. Bluetooth and WiFi solutions, by contrast, estimate location from signal strength, which is an imprecise measure. In addition, an image is far more useful than a signal strength value: a person can inspect it directly, and the same image can serve several interlocking systems at once.
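Why signal strength is imprecise can be illustrated with the widely used log-distance path-loss model, which converts a received signal strength indicator (RSSI) into a distance. The reference power of -59 dBm at 1 m, the free-space path-loss exponent of 2, and the ±4 dB of noise below are illustrative assumptions, not measurements from any particular system.

```python
import math


def rssi_to_distance(rssi_dbm: float, ref_dbm: float = -59.0,
                     path_loss_exp: float = 2.0) -> float:
    """Log-distance path-loss model: estimated distance in metres from RSSI.

    ref_dbm is the assumed RSSI at 1 m; path_loss_exp is 2 in free space and
    usually higher indoors. Both defaults are illustrative assumptions.
    """
    return 10 ** ((ref_dbm - rssi_dbm) / (10 * path_loss_exp))


# RSSI that a tag exactly 5 m away would report under the same model.
true_rssi = -59.0 - 20 * math.log10(5.0)
for noise_db in (-4, 0, 4):
    # Prints roughly 7.9 m, 5.0 m and 3.2 m respectively.
    print(noise_db, round(rssi_to_distance(true_rssi + noise_db), 1))
```

Under these assumptions, a fluctuation of just ±4 dB moves the estimate for a tag 5 m away anywhere between about 3.2 m and 7.9 m.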
The main disadvantage of computer vision asset tracking is its high data throughput. The signal strength value that a Bluetooth or WiFi solution reports is a single integer of at most 32 bits. A grayscale image uses 8 bits per pixel, so a 200 px by 200 px image takes 320,000 bits (320 kb, or 40 kB), four orders of magnitude more than a 32-bit integer. This makes such a solution hard to scale, because uploading and downloading images constantly requires a great deal of bandwidth. There are ways around this, such as processing the images on site with an edge image processor, so that only the results need to be sent onward, or connecting the cameras directly to the processor.
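The arithmetic in this paragraph, extended to a whole site, can be checked with a few lines of Python. The 32-bit reading size, the 50 cameras, and the one-frame-per-second rate are assumptions chosen for illustration. The images are uncompressed, as in the text; real camera streams are usually compressed, which narrows the gap but does not close it.

```python
def image_bits(width_px: int, height_px: int, bits_per_pixel: int = 8) -> int:
    """Uncompressed size of one grayscale image, in bits."""
    return width_px * height_px * bits_per_pixel


RSSI_BITS = 32  # assumption: each signal strength reading is sent as a 32-bit integer

frame = image_bits(200, 200)        # 320,000 bits = 320 kb = 40 kB
ratio = frame // RSSI_BITS          # 10,000, i.e. four orders of magnitude

# Hypothetical deployment: 50 sources, each sending one update per second.
sources, per_second = 50, 1
camera_mbps = frame * sources * per_second / 1e6     # 16.0 Mbit/s of images
tag_kbps = RSSI_BITS * sources * per_second / 1e3    # 1.6 kbit/s of readings
print(frame, ratio, camera_mbps, tag_kbps)
```

Even at one frame per second, the cameras need about 16 Mbit/s, while the same number of signal strength readings needs about 1.6 kbit/s.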
Computer vision could be an asset tracking solution with high accuracy and reliability, with advantages over other asset tracking technologies such as Bluetooth and WiFi. Unfortunately, the sheer amount of data being transferred limits the use cases in which computer vision would be effective.