Questions your company should be asking before implementing machine learning.
Machine learning (ML) is all the rage right now. You hear about Elon Musk and Mark Zuckerberg debating the future of artificial intelligence and machine learning, but you wonder: how is machine learning actually going to help my business? In this article, we briefly explain what ML is and then dive into the ML-related questions your company should be asking.
Machine learning is revolutionary because it gives computers the ability to solve problems without being explicitly programmed. In a conventional computer algorithm, a programmer will specify the rules that explicitly determine what their software will do.
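To make the conventional approach concrete, here is a minimal sketch of a hand-written rule for flagging spam. The phrase list and the function name `is_spam` are hypothetical and chosen only for illustration; the point is that every rule has to be written and maintained by a programmer.

```python
# Hypothetical hand-picked phrases; a programmer maintains this list manually.
BLOCKED_PHRASES = ("free prize", "wire transfer", "act now")


def is_spam(email_text):
    """Flag an email if it contains any phrase from the fixed list."""
    text = email_text.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


print(is_spam("Act now to claim your FREE PRIZE"))
print(is_spam("Lunch at noon?"))
```

A filter like this only catches what its author anticipated, so any new spam wording requires a code change.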
ML algorithms work differently. At a high level, they make decisions/predictions by ingesting large quantities of historical data and using that knowledge to guide their results. Some examples of ML currently being used in businesses include:
Here is the typical setup for doing machine learning (at a very high level):
Let’s use email spam filters as an example. One ML model an email provider might use to detect spam is the naive Bayes classifier (other applicable models exist as well). The provider trains this model by feeding in millions of emails that are marked as spam and millions that are marked as legitimate.
With the model sufficiently trained, they can use it to classify incoming emails as spam or not spam with high accuracy. For instance, if you receive an email containing the phrase “Nigerian Prince”, the ML model would recognize that the phrase occurred frequently in previous spam emails and would likely mark the incoming message as spam as well.
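The spam example can be sketched in a few lines of Python. This is a toy word-count naive Bayes classifier with add-one smoothing, trained on four made-up emails instead of millions. The training data, the label names "spam" and "ham", and the function names are all assumptions for illustration, not how any particular email provider does it.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(labeled_emails):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    email_counts = Counter()
    for text, label in labeled_emails:
        email_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, email_counts


def classify(text, word_counts, email_counts):
    """Return the label with the higher log-probability for the text."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_emails = sum(email_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Prior: how common this label was in the training data.
        score = math.log(email_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data; a real filter would learn from far more emails.
TRAINING_EMAILS = [
    ("nigerian prince needs your bank details", "spam"),
    ("claim your free prize from a nigerian prince", "spam"),
    ("meeting moved to friday afternoon", "ham"),
    ("please review the attached quarterly report", "ham"),
]

word_counts, email_counts = train(TRAINING_EMAILS)
prediction = classify("A Nigerian prince wants to meet", word_counts, email_counts)
print(prediction)
```

Unlike a hand-written rule, nothing here mentions "Nigerian prince" in the logic itself; the association comes entirely from the labeled examples.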
The mathematical nature of ML can be very daunting. So the question I hope to address is whether or not your business can benefit from machine learning at all. The answer depends heavily on your situation: the problem you’re trying to solve and the data you are able to collect. To begin with, here are some preliminary questions your company should ask before you get started:
You might not need a solution as sophisticated as machine learning. Just knowing basic statistics about the problem you are trying to solve might be enough.
An engineer at a data center could use machine learning to reduce their energy usage — perhaps, by finding complex relationships between IT load, water pumps, room temperature and other factors — or they could just look at how much energy each component is using and cut back on servers using too much energy.
A retail store could use an ML model like k-means clustering to find patterns in consumer purchases (e.g. “what time do people age 20-30 go shopping?”), or they could just open a spreadsheet of the store’s transactions and manually work out what they want to know.
Basic statistics, in lieu of machine learning, might give you sufficient insight while saving you time. At the very least, it’s a good starting point.
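As a rough illustration of the spreadsheet-style approach, the retail question above can be answered with a simple count and no model at all. The transaction records and the function name `busiest_hour` below are hypothetical.

```python
from collections import Counter

# Hypothetical transaction records: (customer_age, hour_of_day_of_purchase).
TRANSACTIONS = [
    (24, 18), (27, 19), (22, 18), (45, 10), (31, 12),
    (29, 18), (63, 9), (21, 20), (26, 18), (38, 17),
]


def busiest_hour(transactions, min_age, max_age):
    """Return (hour, count) for the most common purchase hour in an age range."""
    hours = Counter(
        hour for age, hour in transactions if min_age <= age <= max_age
    )
    # Assumes at least one transaction falls in the age range.
    return hours.most_common(1)[0]


hour, count = busiest_hour(TRANSACTIONS, 20, 30)
print(f"Shoppers aged 20-30 buy most often at {hour}:00 ({count} purchases)")
```

If a tally like this already answers the business question, a clustering model may add complexity without adding insight.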
Suppose, for example, that your company is trying to perform predictive maintenance on factory equipment. In other words, you want to estimate how long a particular machine will last before it breaks. In this hypothetical scenario, you would need sensors attached to machines collecting information such as:
Generally speaking, a machine learning algorithm without relevant data is like a detective without useful clues. The old adage holds true: garbage in, garbage out.
You have to train an ML model with a large amount of data before you can use it. To work with sufficient accuracy, most models need at least thousands of data points (and preferably more). It is possible to get pre-trained models, but a pre-trained model may not exist for the specific type of problem you’re trying to solve.
If you still think ML is applicable, it’s worth consulting with someone knowledgeable about the different ML models. Surprisingly, the difficult part is not building these machine learning models.
TensorFlow and R are examples of open-source tools that provide pre-built ML models, and MATLAB is a commercial one. The difficult part is retrieving your data from your SQL database (or whatever storage option you use) and reformatting it for your ML program.
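A minimal sketch of that retrieval-and-reformatting step, using Python's built-in `sqlite3` module with an in-memory database standing in for a production system. The `purchases` table, its columns, and the `load_training_data` helper are all hypothetical; real pipelines usually involve several sources and far messier data.

```python
import sqlite3

# In-memory database standing in for a production SQL store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_age INTEGER, amount REAL, returned INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(24, 19.99, 0), (31, 250.00, 1), (None, 42.50, 0), (58, 12.00, 0)],
)


def load_training_data(connection):
    """Pull rows from SQL and reshape them into feature and label lists."""
    rows = connection.execute(
        "SELECT customer_age, amount, returned FROM purchases "
        "WHERE customer_age IS NOT NULL "  # drop rows with a missing age
        "ORDER BY rowid"
    ).fetchall()
    # Most ML libraries expect numeric feature rows plus a separate label list.
    features = [[float(age), float(amount)] for age, amount, _ in rows]
    labels = [int(returned) for _, _, returned in rows]
    return features, labels


features, labels = load_training_data(conn)
print(features)
print(labels)
conn.close()
```

Even in this tiny case, decisions such as how to handle the missing age have to be made before any model sees the data.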
To illustrate the difficulty of this process, take this quote from the Google Cloud Next 2017 presentation on machine learning:
“We’re getting a lot of free attention in this room and other rooms around machine learning because it’s new science, it’s unicorns and glitter, it’s all magic at this point. No data, no quality data, no machine data, no coalesced data out of 19 different databases into a single data store … no machine learning. I have no solution for anyone in this room if you say ‘but a lot of my transactional data is in my Oracle financial system, but my online system is in my e-commerce system which is hosted somewhere else, but don’t worry, all my logging data which I want to combine into learnings as well sits on my Apache servers which is at my hoster … let’s do some machine learning’. And I’ll say, ‘come back to me when you have big data’.”
Again, the solution to this problem is to consult with someone familiar with both machine learning and database technology.
In summary, when thinking of implementing machine learning in your business, always start simple with traditional statistics. From there, you can consider whether it is worth consulting with someone familiar with the variety of ML models out there. They can help you put together a complete ML solution, from data retrieval to data storage to actually training the ML model, and deliver powerful functionality to your product or company.
Alternatively, you could look into AutoML tools, which aim to automate much of this process for you.