I was graciously invited to speak on two panels at the IoT World conference held last week in Santa Clara, CA. Since the panels were not recorded, here are my thoughts and jots from before and during the Wednesday 5/15/2019 panel, titled Wrangling IoT Data for Machine Learning. (Actually, I'm going into even more detail than I had time for at the panel.) The conference organizers originally approached me about speaking on behalf of my former employer on topics that, honestly, I had been given just a few weeks to investigate and on which I could report only failures even now. Still, I managed to convince them that I was fluent in other, more generic things unrelated to the job I knew I was about to quit.
(Note: My thoughts and jots for the Thursday 5/16 panel are coming later.)
The first question we were tasked with answering in this panel related to the business calculations that must be made before taking on a project in Machine Learning; also, how one might calculate return on investment, and what use cases make sense or not.
Hello [Company], Tell Us About Yourself
Before deciding whether to build, buy, or partner (the three ways to take on any technical project), an analysis of your staff's competencies needs to be top of mind. If you don't already have staff competent in data science, IoT, or whatever other skills you need to finish the project, then in order to hire well, you need to ensure your corporate culture, rewards, mission, vision, virtues, and especially the task at hand are going to appeal to potential recruits. You could have devoted employees who care about the outcome, want to see it through, and work together to build a well-architected solution with good continuity. With the solution's architecture well understood by the team as they build it, their "institutional memory" allows them to add features quickly, or at least know where they would best fit. Or, you could hire folks who stay only for the short term, with each new developer spending lots of time wrapping their head around the code and then refactoring it to fit the way they think, which takes time away from actually writing any useful new business logic. The end result may be brittle and not well-suited for reuse. Certainly it is healthy to add people with differing viewpoints to the team, but a small team should not turn over completely, or else the project's momentum will die. (Trust me, I've lived this.)
If you're not ready to augment your staff or address these hiring concerns, it's OK. An IoT project is complex to develop because, at this time, there is no easy "in-a-box" solution; many pieces still have to be integrated, such as sensor chips, boards, firmware, communication, maybe a gateway, a data analytics and aggregation engine, and the cloud. Fortunately, there are plenty of valuable and trustworthy solutions providers to choose from, and you can meet a lot of them on the IoT World vendor floor. By buying a product that complements your company's skill set, you can deliver a more well-rounded product. A good service provider will also have a variety of partners of its own; because it has a robust knowledge of the landscape, you are more likely to find something that truly suits your needs. Now, if you are starting off with zero expertise in IoT or machine learning, there are vendors who will sell you complete turn-key solutions, but these are not likely to be cheap because each domain involved in IoT requires distinct expertise, and integrating these domains is currently fraught with tedium (though there are groups looking to abstract away some of the distinctions and make this easier).
Finally, if you are clever, you may find that your solution, or some part of it, is in fact a value add to a solutions provider, which gives you even more intimate access to their intellectual property, revenue streams, or ecosystem of partners. In this case, you are truly becoming a partner and establishing your position in the channel ecosystem, not just being another client.
It's All About the Benjamins
Particular to the data, there is a cost involved in aggregating, storing, and analyzing it. Where is it being put -- the cloud right away? Physical storage on a gateway? If so, what kind of storage are you buying, and what is its data retention policy? If the devices are doing a common task, how do you aggregate their data for analysis, especially if you are trying to train a machine learning model without the cloud? And if you are using the cloud and choosing to batch upload the data, what is your upload schedule? Uploads had better not happen at peak times, or at least should not slow down a system that is also trying to run analysis.
One big piece of food for thought is: does your data retention policy conflict with training your machine learning algorithm? This is important from a business perspective because your data may not be around long enough, for various reasons, to make a useful model. Or, on the flip side, your model may be learning from so much information that it might pick up contradictory signals from changing underlying conditions, such as a bull market turning into a bear market. (However, this case can be rectified in several ways, such as feeding in additional uncorrelated attributes for each example, or picking a different model better suited to accounting for time series data.)
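A rough way to sanity-check the retention question is to count how many readings survive the retention window and compare that with what the model needs. The sketch below is only back-of-the-envelope arithmetic; the function names, the 30-day retention, the 5-minute reporting interval, and the 10,000-example requirement are all illustrative assumptions, not figures from any particular project.

```python
MINUTES_PER_DAY = 24 * 60

def samples_retained(retention_days, report_interval_minutes, device_count=1):
    """Readings still on hand once the retention window is full."""
    readings_per_day = MINUTES_PER_DAY // report_interval_minutes
    return retention_days * readings_per_day * device_count

def retention_supports_training(retention_days, report_interval_minutes,
                                samples_needed, device_count=1):
    """True if the retained history is at least as large as the training need."""
    retained = samples_retained(retention_days, report_interval_minutes, device_count)
    return retained >= samples_needed

# Hypothetical numbers: 30-day retention, one reading every 5 minutes,
# and a model assumed to need 10,000 examples.
one_device = samples_retained(30, 5)                        # 30 * 288 readings
enough_alone = retention_supports_training(30, 5, 10_000)   # one device only
enough_pooled = retention_supports_training(30, 5, 10_000, device_count=2)
```

With these made-up numbers, a single device falls short, while pooling two devices doing the same task clears the bar. The same arithmetic can be run the other way to ask how long the retention policy would have to be.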
Another monetary consideration, and probably far from the last, is your existing infrastructure. Are sensors even deployed where you need them? There could be a substantial cost to going into secure or dangerous areas. For instance, in the oil & gas industry, there are specially designated hazardous locations called Class I, Division 1, where anything that produces a spark could set off an explosion, causing major damage and loss of life. Personnel and equipment must be thoroughly vetted so as to avoid potentially deadly situations. Or, better yet, is there a way to monitor the infrastructure from afar, thus avoiding the need to access such sensitive areas? Remote video or sound monitoring may remove the need for intrusive monitoring, but the remote system has to be reliable enough that depending on it is a better bet than the risk you would assume by going into such sensitive areas in the first place.
Figuring the Return On Investment
Briefly, I want to touch on some points to keep in mind when considering the ROI on an IoT project. Hopefully these will mostly already be obvious to you. They break down into three categories: tangible impacts, intangible impacts, and monetization. We should not fail to consider a project just because we cannot figure out how to quantitatively measure its impact.
First, the tangible impacts: a successful IoT project (particularly in an industrial realm) will reduce downtime by employing predictive maintenance analysis or by warning before issues get out of hand. This increases productivity, reduces RMAs/defects in products, and could reduce job site accidents as well. Impacts like these show up directly in operational efficiency, which makes them a lot easier to measure.
The things that may be harder to account for include the safety mindset that might be brought about by a well-implemented IoT tool that users find helpful or essential to doing their job, rather than obtrusive or a threat to their job because it tells on them when they mess up. One baseline could be comparing safety incidents year over year, but this number cannot be taken at face value; it must be compared against productivity figures, and even then it might never account for other side effects of having a better safety mindset, such as improved job satisfaction, which could lead to a better home life for users of the IoT tool.
Finally, one unexpected way the product could pay off could be monetization. By making part of it generic and selling it as a service, you might build a user base who themselves are freed up to focus on their skill sets. Maybe you have built up a data warehouse that others might find useful, made some digital twin models of items others use, or are performing some kind of transformation on recorded data in order to derive insight. In any event, this gives your product legs; in case the main premise of it fails or does not pay off, then at least some of the work is still valuable.
Where AI Makes Sense
I have gotten into discussions about this with people who think AI and machine learning are the answer to everything. To me, machine learning is more than just filling out a business rule table, such as "at 6:30 I always turn down the thermostat to 73, so always make sure it's 73 by then". In short, machine learning is most fun and applicable to a problem when the target state changes. For instance, you're a bank trying to decide whether or not to give someone credit, but the underlying credit market changes over the course of a few years, thus affecting the risk of taking on new business. Problems like these really get the best bang for their buck out of machine learning models because the model can be updated constantly on new data. One way to find out when to trigger model training (if you're using a supervised approach, such as decision trees or neural networks) is to use an unsupervised approach such as K-means clustering: look for larger groups of outliers among the inputs to your model, and then check whether your original model is still performing well or has failed to generalize to changes in underlying conditions.
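The K-means idea can be sketched in a few dozen lines. This is a minimal illustration, not a production drift detector: it uses a plain Lloyd's-algorithm K-means written with the standard library only, and the helper names (`fit_radius`, `outlier_fraction`, `needs_retraining`), the 95% radius, the 20% trigger, and the toy 2-D data are all assumptions made up for the example.

```python
import math
import random

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

def fit_radius(points, centroids, quantile=0.95):
    """Nearest-centroid distance at the given quantile of the training points."""
    distances = sorted(nearest(p, centroids)[1] for p in points)
    return distances[min(len(distances) - 1, int(quantile * len(distances)))]

def outlier_fraction(batch, centroids, radius):
    """Share of a batch lying farther than `radius` from every centroid."""
    return sum(1 for p in batch if nearest(p, centroids)[1] > radius) / len(batch)

def needs_retraining(batch, centroids, radius, max_outlier_fraction=0.2):
    """Flag a batch whose outlier share exceeds the (assumed) trigger level."""
    return outlier_fraction(batch, centroids, radius) > max_outlier_fraction

# Toy training data: two tight groups of 2-D feature vectors.
offsets = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
training = list(offsets) + [(10 + x, 10 + y) for x, y in offsets]
centroids = kmeans(training, k=2)
radius = fit_radius(training, centroids)

familiar = [(0.2, 0.1), (9.8, 10.3), (0.5, -0.4), (10.1, 9.7)]
drifted = [(5.0, 20.0), (6.0, 19.0), (0.1, 0.2), (5.5, 21.0)]
```

Here `needs_retraining(familiar, centroids, radius)` stays false, while the `drifted` batch, where three of four inputs sit far from both clusters, trips the trigger. A flag like this is only a prompt to go check the supervised model's accuracy; it does not by itself prove the model has degraded.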
Other types of interesting problems for AI & ML are those involving image or audio data. Researchers tried for decades to solve these with classical mathematical approaches, but even basic neural networks showed dramatic improvements in accuracy over them. Neural networks are often simply better at learning which features really matter to the outcome. They will build up the appropriate filter, whether it represents some intrinsic property of a sound wave or some portion of a picture.
The most creative uses of AI and ML will enable previously impossible interactions. Think about something crazy like training a speech recognition engine on custom words for your specific application and embedding it into some tiny device, or possibly using a smartphone camera to take pictures of something to learn its size.
Run Machine Learning Where Again? - Cloud, Edge, Gateway
The apps I build for clients usually revolve around these three characteristics:
- Clients are typically highly price sensitive
- Latency is a non-issue
- Sensors send data every ~5 minutes unless conditions deteriorate
With this in mind, I am looking to reduce the bill of materials cost as much as possible, and so I make the edge as dumb as it can get. The analytics goes into the cloud. And even if you're a believer in data being processed on the edge, you're probably not going to get away without cloud somewhere in your project anyway. A robust cloud provider will offer solutions for not just data aggregation/analysis, but also:
- Device firmware updates over-the-air
- Data visualization tools
- Digital twins
- Test data generation, manually or by other means
- GPU clusters that run alongside your edge to do retraining
However, with advances in transfer learning, and with cheaper hardware coming out like Intel Movidius, NVIDIA Jetson, and Google Coral, edge training will become more of a reality.
I am most familiar with Google's product offerings, where Firebase allows for running models locally with no cloud connection. Google's cloud can serve an old model until training is finished. If you wish to run your models on the edge, you will need to get clever about exactly when to deploy the new model: either in "blue/green" fashion (a flash cut to the new model all at once) or using "canary" deployments (where a small percentage of inputs are classified with the new model for starters).
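The difference between the two rollout styles can be shown with a small routing function. This is a sketch under assumptions, not how Firebase or any other platform does it: the `bucket` and `pick_model` names, the `device-N` identifiers, and the 10% canary slice are all hypothetical.

```python
import hashlib

def bucket(key):
    """Map a key to a stable, roughly uniform value in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32

def pick_model(request_key, canary_fraction):
    """Route a stable slice of traffic to the new model, the rest to the old one."""
    return "new" if bucket(request_key) < canary_fraction else "old"

device_ids = [f"device-{n}" for n in range(1000)]

# Blue/green: the fraction jumps from 0 to 1 in a single step.
blue_green_before = {pick_model(d, 0.0) for d in device_ids}
blue_green_after = {pick_model(d, 1.0) for d in device_ids}

# Canary: only a small slice is served by the new model at first.
canary_share = sum(pick_model(d, 0.10) == "new" for d in device_ids) / len(device_ids)
```

Hashing the device ID keeps each device on the same model for the whole rollout, which makes before/after comparisons cleaner. Hashing a per-input ID instead would split individual inputs between models, which is closer to the "percentage of inputs" wording above.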
Furthermore, given that we are unlikely to get rid of the cloud in IoT projects anytime soon, a big opportunity is to make tools whose user experience is the same from the cloud to the edge device in order to improve continuity and reduce frustration.
Picking an AI/ML Platform
The third question in the panel related to picking a machine learning service provider. My general thoughts on this revolve around considering the providers who have built products useful to specific industries. On the vendor floor, there were small companies with solutions catering to manufacturing, supply chain, chemicals, utilities, transportation, oil & gas, and more. Larger companies have consulting arms to build projects for multiple different industries. In either case, whoever you choose can hopefully bring domain-specific knowledge about your industry to solve your machine learning problem, and can save time by already having common digital twins in a repository or common KPIs for your assets or employees. The hope here is that with a vendor targeting a specific industry, they will have already accumulated domain knowledge so they won't need so much "getting up to speed" about the general problem your company faces, but can jump right into solving higher-order creative problems.
However, if these services are built on top of a cloud provider that decides to crawl into the space of the specialized provider you choose to work with, that cloud provider could make them obsolete. For instance, if Google decides to get into a particular business where there are already players, it may well offer a similar service for free. As such, pick a service provider positioned for growth, with staying power due to a niche or protected IP. Or, better, pick multiple technologies or providers of different sizes to protect against one going extinct. For instance, maybe different types of wireless radios might be useful in your application. Imagine if you'd put all your eggs in the WiMAX basket in the early 2010s; you wouldn't have much of a solution now. As such, it is helpful to find tools and technologies that are at least interoperable with partners, even if the use case is specific.
Other Considerations In Passing
Besides what was addressed above in the panel, there were some remarks prepared in case we had additional time (but it seems we ran out).
Tune In To the Frequency of Retraining
Models over time will likely need to adapt to changing inputs. A good machine learning model should be able to generalize to novel input -- that is, make correct predictions on data that hasn't been seen before. However, there are a few signs that it's time to retrain or enhance the model.
- More misses or false positives. In data science parlance, a confusion matrix is the breakdown of how many items of a given class were labeled as each class. The diagonal of this matrix holds the correct predictions (i.e. class 1 -> 1, 2 -> 2, and so on). Thus, if the numbers off the diagonal start getting high, this is a bad sign for the model's accuracy.
- Changing underlying conditions. As described earlier, one could imagine this as a bull market turning into a bear market.
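To make the confusion matrix from the first bullet concrete, here is a minimal sketch that builds one and measures how much weight sits off the diagonal. The labels are invented, classes are assumed to be integers starting at 0, and `off_diagonal_rate` is just an illustrative name for one minus accuracy.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts items of true class a that were labeled as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def off_diagonal_rate(matrix):
    """Fraction of all items that landed off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

# Made-up labels for a three-class problem.
actual    = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion_matrix(actual, predicted, n_classes=3)
# matrix == [[2, 1, 0],
#            [0, 3, 0],
#            [1, 0, 3]]
error_rate = off_diagonal_rate(matrix)  # 2 of 10 items are off the diagonal
```

Tracking this rate on fresh labeled data over time, rather than looking at a single snapshot, is what tells you whether the off-diagonal numbers are actually "getting high".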
However, there could be multiple paths to monitor the need for retraining, or even mitigate it.
- Consider a push/pull relationship between supervised and unsupervised models, as described above. If outliers are becoming more common in unsupervised models, consider making sure your supervised models are cognizant of these examples by running more training. Perhaps new classes of objects need to be introduced into your supervised models.
- Maybe the wrong model is at play. There could be a fundamental problem where, for example, a linear regression is in play where a logistic regression should really be used.
- Perhaps the business KPIs actually need to be re-evaluated. Are the outcomes produced by the data in the right ballpark for what we even want to know about, or are we coming up with the wrong business metric altogether?
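As a small illustration of the regression bullet above: when the thing being predicted is a yes/no outcome, a linear model's raw output can fall outside the 0-to-1 range, while a logistic model passes the same score through a sigmoid so it can be read as a probability. The weight and bias below are arbitrary values chosen for the example, not fitted parameters.

```python
import math

def linear_score(x, weight, bias):
    """Raw linear regression output; unbounded."""
    return weight * x + bias

def logistic_probability(x, weight, bias):
    """Same linear score squashed into (0, 1) by the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, weight, bias)))

weight, bias = 0.5, -1.0                          # arbitrary, for illustration only
raw = linear_score(10.0, weight, bias)            # 0.5 * 10 - 1 = 4.0, not a valid probability
prob = logistic_probability(10.0, weight, bias)   # strictly between 0 and 1
```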
Before chasing real-time analysis with your model, consider whether such a task is attainable, or even required. Factors that could drive the decision include:
- Is it mission-critical?
- How many objects need to be analyzed in real-time? Too many objects will increase demand on the processor.
- Is analysis cheap enough at the edge to conduct with modern silicon?
I’ve usually advocated against using deep learning when there are simpler mathematical models requiring less compute, even if it takes more feature engineering up front. However, it’s probably not long until we have silicon so cheap that we can run and even train such advanced models with relative ease. And the good news is the more powerful the analysis engine (e.g. operating on 2D video rather than 1D sensor data), the more analyses we can draw from the same data, requiring fewer hardware updates and instead relying on simpler updates to firmware and software.
One particular question to the panel involved how humans educate machines. Currently, we rely on annotations on data to make it obvious where we should be drawing from. This can be something as simple as putting a piece of data into an appropriate database column. However, unstructured data like server logs is becoming ever more important for deriving insights.
On the flip side, when do machines begin to educate each other, and educate humans as well? The most obvious play here is decision support. If humans can be educated by an AI tool in an unobtrusive way to, say, be safer on the job, then that is one way we can make an impact on ourselves through machines. Another good use is gaining insight into decisions for regulatory purposes. As certain institutions are audited to ensure no discrimination is occurring and no advantages are being given to certain parties, a machine learning model needs to be auditable and to educate its human interpreters about its behavior. And education doesn't have to be hard; even kids in middle school are fully capable of playing with Google's machine learning tools to build amazing products, unbounded by years of skepticism formed by bad engineering experiences.
However, the more troubling problem is when machines train other machines. While this could be a good thing in some applications, like hardening security, right now you can see generative adversarial networks (GANs) being used to create deep fakes. It is now possible to spoof someone's voice, or even generate fake videos of events that never happened, all to push an agenda or confuse people in trial courts.
Obviously, this is a lot more than can be said in 40 minutes, and frankly more than I even intended to write. However, it is a complex field right now and all good food for thought, and hopefully by airing out some of these thoughts, it will help simplify and demystify the use of AI in IoT so we can converge on a more universal standard and set of best practices.