Thursday, October 3, 2019

Touring with Turo - The Pros and Cons


Driving to more remote yet scenic places as a side trip to a big city can reveal places and ways of life unknown to folks who don’t get out of the downtown bubble.  Not everything a city dweller enjoys is in the city, so part of the experience of a culture is to get to know where folks go to take a break.  For instance, when visiting Washington, DC, why not hit Virginia Beach too?  Maybe next time you’re in Boston, think of taking a couple hours to go to Cape Cod.  Maybe if you’re in New York City, take a visit to Long Island (though better plan that one in advance).  But if you find yourself in Detroit, it might be better to go to Windsor just across the Canadian border and stay put. :-P  As such, for once, I will regale you with experiences not from software I'm writing, but from software I'm using as a consumer.

Background - What brought me here?


I was in quite a conundrum during a recent trip, visiting a city where several famous Ivy League schools were all about to start at once, and the vast majority of the students would be moving in.  I arrived in the city late Tuesday, had a conference Wednesday, and then had Thursday and Friday free.  One of these days was to be allocated for taking a tour of somewhere I could easily get to by car.  As such, I had to pick if I wanted Thursday or Friday to use the car rental.

The benefits I saw to picking Thursday included:
  • Possibly higher availability of rental cars — that is, closer to the weekend could mean higher demand
  • No rush to get back to the airport for the flight home Friday night

However, Friday saw the following benefits:
  • I could leave my luggage in the car.  Now, since my luggage simply consisted of a backpack, it wasn't too crucial, other than wearing it all day would get me all hot and sweaty, so it would be nice to lock it in the car while walking around.
  • Turn the car in at the airport right before my flight, without having to make a special trip to the rental car place which is probably at the airport anyway
Then there’s the whole modern-day dilemma of whether to rent from a traditional rental car company or to go with a car-sharing service like Turo.  I thought I’d try something different because a lot of the standard rental car places seemed to be in areas not exactly in the direction I wanted to travel, and the ones with the least taxes would force me to have to drive through the city (and all its traffic) in order to go the desired direction.  Plus, why not try something completely novel?

By Thursday, I’d pretty much decided to rent the car that day rather than wait until Friday.  I checked Turo and availability was already vanishing rapidly.  I saw something else that was available later in the afternoon, which gave me time to visit a museum I had prioritized before picking up the car.  Well, on my way to get the car, the cab driver (yes, a real live cabbie was just chilling out in front!) started chatting and said the part of town where I was picking up the Turo rental wasn’t somewhere I’d want to be waiting on a ride for very long.  Uh-oh…

Fine Points of Comparison


Having gone through the afternoon and night with the Turo rental, here are things to consider if you want to give Turo a try:
  • Good for knowing exactly what vehicle you will get.  Most rental car companies only give you a vague idea as to the size of the vehicle, such as “mid-sized” (which most folks would consider compact), or even “compact” (which might as well be the Tata Nano, the $2,500 car produced in India).  Of course, they might specify a vehicle that fits into their categories, but those specific vehicles never seem to be available by the time you get there.
  • Flexible scheduling.  Book the car to be available exactly when you need it for how long you need it.  Want to return it at 11:00 PM or later?  Maybe the owner will facilitate that for you.
  • Book a car in cities or areas you’re familiar with, otherwise you might get freaked out if you’re returning late at night and it’s in a neighborhood that people claim is sketchy.  Of course, this is perhaps what I get for picking the cheapest car by a long shot in the city that day, so your mileage may vary depending on your price sensitivity.
  • The prices didn’t really seem that much better than a rental car company.  However, I was feeling this way after mostly having seen Porsches made available for $150 or $200/day, and I don’t really know how much a regular rental car company would ask for a Porsche.  That’s expensive even relative to what I normally get from a typical rental car company, but even for the cheaper vehicles, it seemed like Expedia was showing conventional deals for maybe $10-$30 more per day than what Turo was offering.
On the other hand, it’s worth considering these points about a rental car company before you steer all the way into the Turo camp:
  • Better guarantees about the quality of vehicle you’re getting.  It’s not up to one individual to maintain the car or take care of check engine lights; if a traditional rental has a problem, the company should back it with a guarantee or replacement from their ample-sized lot.  Better yet, such problems should be unlikely in the first place if the staff rigorously inspects and repairs vehicles in a timely fashion in between rentals at company service bays.
  • Often, rental car companies have nice grounds.  Even if they might be in a sketchy area outside, I’ve never felt threatened or unsafe on their property because they are well-lit, often gated, and with ample room to park your vehicle as you return it (which may be problematic if you’re looking to return the vehicle late at night in a densely-populated area).
  • However, you would need to make sure the facility is open 24 hours if you’re planning to return it late.  Otherwise you might be charged an extra day or have to go out of your way to drop it off properly before you can go on.

The Bottom Line


Given my own personal preference, a rental car company would be the way I’d go if renting a car in an unfamiliar city and not planning to return it until late.

Thursday, September 5, 2019

Unit Test Database Mocking in Golang is Killing Me!


For some reason, writing a particular set of unit tests in Golang proved to be an extreme hassle.  In this case, I was trying to add a column to a table and then write a unit test to verify this new functionality.  No matter how much I looked around for where to correctly specify how to build this table with the new column, the Cassandra database kept complaining the column didn’t exist.  Imagine how frustrating it is to specify a database schema that seems to be ignored by your unit tests.

Background


Our system consists of a Cassandra database plus a Go backend server.  The Go unit tests require Cassandra to be running locally in a Docker container, but do not seem to actually use the existing databases, opting instead to make their own, which are out of reach of anything I can see from TablePlus.  The Go code itself reaches these objects through some level of indirection: a manager, a harness, and a couple of modules dealing in the actual queries.

The Fix


In the test harness, I was running several commands following the format

CREATE TABLE IF NOT EXISTS table (
    column1 boolean,
    column2 int
);

Well, turns out that despite my best efforts to define how the table schema was supposed to be set up, the test engine was not actually running the setup every time.  In fact, it was relying on some old cached copy of the tables living somewhere.  Upon changing the command to

DROP TABLE table;
CREATE TABLE table ( … )

I was able to see the tests perform correctly on my local machine.  Oddly, when I checked for errors upon creation, it would still complain of timeouts on occasion, but overall the functionality appeared correct; the unit test was at least passing.  Interestingly, the QA environment in which we automatically run these tests via a CircleCI pipeline did not need the whole DROP TABLE fix; it worked properly upon simply specifying the correct column configuration in the test harness.

One caveat was that I had to split out the DROP and CREATE statements into separate calls to Session.Query().  When trying to separate them with a semicolon, Go complained it could not execute the statement.
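If the harness keeps its schema as one script, the split into separate calls can be automated. Here is a minimal sketch of that idea, written in Python rather than Go for brevity. The names `split_statements`, `run_script`, and `execute` are hypothetical; `execute` stands in for whatever sends a single statement to Cassandra (one Session.Query() call in the harness above). The split is deliberately naive and assumes no semicolons appear inside string literals.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a semicolon-separated script into individual statements.

    Naive on purpose: assumes no semicolons inside string literals.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_script(script: str, execute: Callable[[str], None]) -> int:
    """Send each statement through `execute` in its own call; return the count."""
    statements = split_statements(script)
    for statement in statements:
        execute(statement)
    return len(statements)


# Example: collect the statements in a list instead of sending them to a database.
# The column list here is only a placeholder schema.
sent: List[str] = []
run_script(
    "DROP TABLE table; CREATE TABLE table ( column1 boolean, column2 int );",
    sent.append,
)
```

After this runs, `sent` holds two entries, one per statement, which is the shape the driver was willing to accept.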

Ultimately, this provided hours of frustration for me in trying to figure out why the unit test would not work locally.  I was afraid the same situation would apply to QA, but fortunately it didn’t.

Thursday, August 1, 2019

John Osborne, Retro Pinball Designer, in Lodi

Back in May, I got to hang out in California for a couple weeks to attend a couple conferences: Google I/O and IoT World.  I did a couple panels at IoT World, and one of them is already summarized on this previous blog post; the other one is coming later.  However, as a bonus during my time in California, I also got to attend the Golden State Pinball Festival in Lodi.  Here, they had John Osborne, a pinball designer who worked at Gottlieb from 1972-1984.  The following are expansions of notes I took during his Ask Me Anything presentation.

A Bit of Background


John Osborne started his professional journey by studying electromechanical (EM) engineering at Fresno State.  This, of course, elicited cheers from the local crowd.  After college, he started working at Gottlieb in 1972 at the age of 21.  Besides Gottlieb, he was also interested in working at Chicago Coin, which he said was sketchy, because the hiring manager's last name was always changing; as such, he never applied.  He was also interested in working at Williams, but rather than talking to corporate recruiting, he was told to talk to a distributor.  That also seemed weird, so he didn't bother.  There were some stories about being flown from Fresno to Chicago to meet the team and do interviews and such, but those are better told in person!  Anyway, the first thing John did after design was to work on the definitive manual to describe all you would ever need to know about EM games.  However, it never got published due to still being unfinished when solid-state machines came out.  It doesn't seem like John kept any drafts or notes of this manual, sadly.

The Process of Design


Concepts would originate from both customers & engineers, but names & themes would usually not come from Gottlieb. The game Blue Note was something John came up with all by himself, but themes like poker & pool would always sell.

The first stage of design at Gottlieb involved the hand sample, where you assemble the game yourself.  This includes drilling holes on the playfield and other important spots, and running wire by hand.  I can testify to this being a large hassle from having done Wylie 1-Flip, which even so had far fewer components and thus less wire to run.  Nevertheless, after the hand sample comes the engineering sample: this involves drafting a formal layout (with schematics & cables) to make tooling, and using a nail board to run cables in a more organized layout.  At this point, you basically have the real game, except for screen printing and artwork.  Lots of games would then be played on this machine, and many metrics and percentages would be calculated, including score, how much of the game's objectives were completed, how far in any sequences you got, etc. Wayne Neyens, the head of engineering at Gottlieb, wanted people to test games who weren't too skilled at pinball; the average player was ideal for simulating what would actually happen in the field.  However, people testing the games would get yelled at for sitting while playing.

Given 3-ball vs. 5-ball play modes (why anyone would set an EM to 3-ball play is beyond me), the replay scores should be comparable given the amount of play, so logic might raise the necessary scores when moving into 5-ball play.  However, I'm not sure any EM schematic I've seen actually employs this logic.  In any event, to calculate the scoring for replays, the Gottlieb testers would employ a tally sheet that lists all possible scores for the game, rounded to the nearest 1,000.  Tallying each rounded-off score effectively builds a histogram of scores achieved on the game.  The designers would then set the recommended first replay value to be the median of the tallied scores.  The second replay would be set to 14,000 points above that, then the third replay is another 8,000 points above the second replay value.  Before the first arrow was placed (onto the median value), 50 games needed to be played.  This tends to result in 30% replays in games in the field.  These tallies did not include specials, which were rare (2-3% of games).
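For the curious, the tally-sheet procedure can be sketched in a few lines of Python. This is only my reading of the process as John described it, not Gottlieb's actual worksheet: the function names are made up, halves are assumed to round up, and the "low" median is used so the first arrow always lands on a value that appears on the sheet.

```python
from collections import Counter
from statistics import median_low
from typing import Iterable, Tuple

ROUND_TO = 1_000            # tally sheet granularity
SECOND_REPLAY_GAP = 14_000  # second replay sits this far above the first
THIRD_REPLAY_GAP = 8_000    # third replay sits this far above the second
MIN_GAMES = 50              # games required before placing the first arrow


def round_score(score: int) -> int:
    """Round to the nearest 1,000 (assumption: halves round up)."""
    return (score + ROUND_TO // 2) // ROUND_TO * ROUND_TO


def tally_scores(scores: Iterable[int]) -> Counter:
    """Build the tally sheet: rounded score -> number of games."""
    return Counter(round_score(s) for s in scores)


def replay_levels(scores: Iterable[int]) -> Tuple[int, int, int]:
    """Return the (first, second, third) recommended replay scores."""
    rounded = [round_score(s) for s in scores]
    if len(rounded) < MIN_GAMES:
        raise ValueError(f"need at least {MIN_GAMES} games, got {len(rounded)}")
    first = median_low(rounded)  # arrow goes on the median tallied score
    second = first + SECOND_REPLAY_GAP
    third = second + THIRD_REPLAY_GAP
    return first, second, third
```

Feeding it 51 test games whose rounded scores run evenly from 40,000 to 90,000 puts the first replay at 65,000, the second at 79,000, and the third at 87,000.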

Two game samples would be played with real money to make sure every last mechanic would work in the game prior to real production.  Portale or Lanielle, being the two better distributors of Gottlieb games, would typically get the sample games, thus the engineering samples might have wound up in the wild and into someone's private collection nowadays.

The most interesting thing to me that John Osborne said was that good games like Spirit of 76 or Card Whiz, which became popular, would keep the shop busy and the engineering team at leisure with little to do. On the other hand, whenever the design team was cranking out dogs, it kept engineering busy trying to satisfy unhappy distributors & a bored machine shop waiting for the next big hit to yield many orders.

The Oddball Add-a-Ball Games, and Italian games in general


New York and Wisconsin were big add-a-ball markets due to laws and the stigma against gambling.  Even "shoot again" features didn't satisfy these laws.  Some add-a-ball games have an extra ball penalty upon tilt. Add-a-balls generally award 2.5 balls per game.

Italian add-a-ball games offer no replays at all.  The legislature had a unique way to envision how to protect the currency, and for pinball machines, it required manufacturers not to step up the ball count unit because buying 5 balls and getting 6 would devalue the Italian lira.  Another oddity about the Italian games is the "Light box advance unit" (LBAU) featuring a "card" rather than an apron that says "Buttons" rather than "Flipper buttons".  These games (such as Team One, exported to Italy as Kicker) increment a "Wow" feature that lights up lights and then takes each Wow off when you lose the ball rather than adjusting the ball count unit.  Yet even the LBAU didn't work in some Italian cities; they wanted a novelty feature, and this entailed setting the "Wow" feature to score a ton of points and only reset like 1/2 the sequence.

Transformers in Italian EM games run at 230 volts and 50 Hertz, yet feature 6 primary taps (including 170V, 190V, and 210V for people who live far from the power distribution center). Incidentally, solenoids & flippers run hotter at 50Hz.

John might be the only representative of Gottlieb at this point :-P

This is a neat device that was fashioned for a hockey game.  As opposed to the action of foosball, this mechanism would allow the player figure on the field to spin more naturally and control a puck by rotating left or right.

A bunch of memorabilia, including a rare Q*bert drink coaster


Some EM Tips & Tricks


As John is one of the few EM designers still around, attendees were eager to hear about some maintenance tips and tricks that have been lost to time.

All 1, 2, and 4-player light boxes are the same, except for Centigrade 37.  As such, as Gottlieb designers built an EM game, all that was necessary was for it to be compatible with the standard light boxes.

White lube goes onto any mechanical parts relating to discs.  Black lube should be applied to gears. Use just a dot of light oil between plastic & metal parts, like the metal/plastic interaction in a score reel or even a shooter rod.  The step spindle on a decagon unit would get some white lube. Parts catalogs from the 70s would mention recommended procedures.

The V relay was a neat innovation, since this relay subtracts only if you press the replay button, not if you're trying to coin another player into your game. The price of 1 game for 25 cents, 3 for 50 cents was a cool mechanical innovation.

One interesting glitch that was expensive to operators was a Chicago Coin video game where you'd pull a shooter to start the video game. You could cause the lights on the game to flicker by messing with this shooter and/or other buttons, and the electromechanical noise through the lines would actually add credits to the game.  Gottlieb Totem had some kind of a weird trick to add either 68 or 86 credits when inserting quarters and performing some sort of interaction that might be described elsewhere on the Internet.

Developing for Solid State Machines


The development machine used to write all the game firmware was the Rockwell PPS/4.  If I recall, the language of choice was Fortran, and all the engineers on staff learned how to program, even if their background was originally electromechanical engineering with relays and solenoids.  (It's easy to think of solenoids as Boolean logic anyway; with solid state, now they have access to larger data structures and traditional math.)  However, John's account of life at Gottlieb was that after their sale to Columbia Pictures in 1977, it always felt like the company was on its last legs and about to close.  As such, there weren't too many engineers at Gottlieb that had to learn Fortran!

One device they used to test game code was called a "Romulator." It spoofed a PROM, allowing its user to enter machine code and plug it into your game.  However, its battery life was terrible.  Once you had the game code the way you wanted it, you had to run as fast as you could to the one PROM burner in the building, which was 3 offices away.  And if someone stopped you in the hall to chat... well, there goes your game!

Relating to Haunted House


According to IPDB, Haunted House was the last game John designed at Gottlieb.  As tends to be par for the course (for me anyway) when asking Gottlieb engineers about their games, he was disappointed with the outcome, complaining that the "design committee" had really taken his game and made it unrecognizable.  (On the other hand, John Trudeau lamented the build quality of the Gold Wings game.)

If John had his way to modify the game, he would hide the trap door, and show the ball action as it happens down by the flippers rather than hiding it.  The game program got way too complicated; he'd rather see a simpler rule set, but advised me that folks tend to value items when retained in their original state rather than being "hacked" or modified in some fashion.  I had mentioned two things to him: one was to use Hall effect sensors to track the ball and only activate the correct set of flippers with one flipper button (rather than having to remember to press an alternate set of flipper buttons when it reaches a particular level), and also to add multiball to the game (and apparently this hack already exists).

Thursday, May 30, 2019

My IoT World 2019 Panels: Recap

I was graciously invited to give two panel discussions at the IoT World conference that happened last week in Santa Clara, CA.  Since the panels are not recorded, here are my thoughts and jots from before and during the Wednesday 5/15/2019 panel, entitled Wrangling IoT Data for Machine Learning.  (Actually, I'm going into even more detail than I had time for at the panel.)  Although the conference organizers originally approached me about speaking on behalf of my former employer on topics that, honestly, I had been given just a few weeks to investigate and on which I could only report failures even now, I managed to convince them that I was fluent in other, more generic things unrelated to the job I knew I was about to quit.

(Note: My thoughts and jots for the Thursday 5/16 panel are coming later.)

Business Calculations


The first question we were tasked with answering in this panel related to the business calculations that must be made before taking on a project in Machine Learning; also, how one might calculate return on investment, and what use cases make sense or not.

Hello [Company], Tell Us About Yourself


Before deciding whether to build, buy, or partner (the three ways in which one takes on any technical project), analyzing your staff's competencies needs to be top of mind.  If you don't already have staff competent in data science, IoT, or the other skills you need to finish the project, then in order to be good at hiring, you need to ensure your corporate culture, rewards, mission, vision, virtues, and especially the task at hand are going to appeal to potential recruits.  You could have devoted employees who care about the outcome, want to see it through, and work together to build a well-architected solution with good continuity.  With the solution's architecture well-understood by the team as they build it, their "institutional memory" allows them to add features quickly, or at least know where they would best fit.  Or, you could hire folks who only stay for the short term, with different developers spending lots of time wrapping their heads around the code and then refactoring it to fit the way they think, which takes away time from actually writing any useful new business logic.  The end result may be brittle and not well-suited for reuse.  Certainly it is healthy to add people to the team with differing viewpoints, but a small team should not turn over completely or else it will kill the project's momentum.  (Trust me, I've lived this.)

If you're not ready to augment your staff or address these hiring concerns, it's OK.  An IoT project is complex to develop because at this time, there is not an easy "in-a-box" solution; many pieces still need to be integrated, such as sensor chips, boards, firmware, communication, maybe a gateway, a data analytics and aggregation engine, and the cloud.  In fact, there are plenty of valuable and trustworthy solutions providers you can choose from, and you can meet a lot of them on the IoT World vendor floor.  By buying a product that complements your company's skill set, you can deliver a more well-rounded product.  And a good service provider will have a variety of partners they work with for themselves: with their robust knowledge of the landscape, you are more likely to find something that truly suits your needs.  Now, if you are starting off with zero expertise in IoT or machine learning, there are vendors who will sell you complete turn-key solutions, but it is not likely to be cheap because each domain involved with IoT requires distinct expertise, and currently integration of these domains is fraught with tedium (though there are groups looking to abstract away some of the distinctions and make this easier).

Finally, if you are clever, you may find a way in which your solution or some part of it may in fact be a value add to a solutions provider, thus giving you even more intimate access to their own intellectual property, revenue streams, or ecosystem of partners.  In this case, you are truly becoming a partner, establishing your position on the channel ecosystem, and not just being another client.

It's All About the Benjamins


Particular to the data, there is a cost involved to aggregate, store, and analyze it.  Where is it being put -- the cloud right away?  Physical storage on a gateway?  If so, what kind of storage are you buying, and what is the data retention policy for it?  If the devices are doing a common task, how do you aggregate it for analysis, especially if you are trying to train a machine learning model without the cloud?  And if you are using the cloud, what is your upload schedule if you are choosing to batch upload the data?  It had better not be at peak times, or at least not impact the system trying to run analysis too.

One big piece of food for thought is: does your data retention policy conflict with training your machine learning algorithm?  This is important from a business perspective because your data may not be around long enough, for various reasons, to make a useful model.  Or, on the flip side, your model may be learning from so much information that it might pick up contradictory signals from changing underlying conditions, such as a bull market turning into a bear market.  (However, this case can be rectified in several ways, such as feeding in additional uncorrelated attributes for each example, or picking a different model better suited to accounting for time series data.)


Perhaps far from the last monetary consideration is to examine your existing infrastructure.  Are sensors even deployed where you need them?  There could be a substantial cost of going into secure or dangerous areas.  For instance, in the oil & gas industry, there are specially designated safety zones called Class I, Division 1 where anything that could cause a spark would blow up a facility, causing major damage and loss of life.  Personnel and equipment must be thoroughly vetted so as to avoid potentially deadly situations.  Or, better yet, is there a way to monitor the infrastructure remotely, thus avoiding the need for access to such sensitive areas?  Using remote video or sound monitoring may remove the need for intrusive monitoring, but the remote system put in place needs to be at least as reliable as the risk you assume by going into such sensitive areas in the first place.

Figuring the Return On Investment


Briefly, I want to touch on some points to keep in mind when considering the ROI on an IoT project.  Hopefully these will mostly already be obvious to you.  They break down into three categories: tangible impacts, intangible impacts, and monetization.  We should not fail to consider a project just because we cannot figure out how to quantitatively measure its impact.

First, the tangible impacts: a successful IoT project (particularly in an industrial realm) will reduce downtime by employing predictive maintenance analysis or by warning before issues get out of hand.  This increases productivity, reduces RMAs/defects in products, and could reduce job site accidents as well.  In this case, it is a lot easier to measure operational efficiency.
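For the tangible side, the arithmetic is the textbook one: ROI is total benefit minus total cost, divided by total cost. Below is a rough sketch; the function names and the split into upfront versus running costs are my own simplification, not something presented at the panel, and any inputs you feed it are yours to estimate.

```python
def simple_roi(annual_benefit: float, upfront_cost: float,
               annual_running_cost: float, years: int) -> float:
    """Return ROI as a fraction: (total benefit - total cost) / total cost."""
    total_benefit = annual_benefit * years
    total_cost = upfront_cost + annual_running_cost * years
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost


def payback_years(annual_benefit: float, upfront_cost: float,
                  annual_running_cost: float) -> float:
    """Return the years needed for net yearly benefit to cover the upfront cost."""
    net_per_year = annual_benefit - annual_running_cost
    if net_per_year <= 0:
        return float("inf")  # the project never pays for itself
    return upfront_cost / net_per_year
```

With purely illustrative inputs, say 100,000 a year in avoided downtime against 150,000 upfront and 20,000 a year to run, `payback_years` comes out to 1.875 years. The intangible impacts discussed next do not fit into a formula like this, which is exactly why they are easy to overlook.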

The things that may be harder to account for include the safety mindset that might be brought about by a well-implemented IoT tool that users find helpful or essential to doing their job, rather than obtrusive or threatening their job by telling on them when they mess up.  One baseline could be comparing safety accidents year over year, but this number cannot be taken at face value; it must be compared to other numbers of productivity, and even then it might never account for other side effects of having a better safety mindset, such as improved job satisfaction, which could lead to a better home life for users of the IoT tool.

Finally, one unexpected way the product could pay off could be monetization.  By making part of it generic and selling it as a service, you might build a user base who themselves are freed up to focus on their skill sets.  Maybe you have built up a data warehouse that others might find useful, made some digital twin models of items others use, or are performing some kind of transformation on recorded data in order to derive insight.  In any event, this gives your product legs; in case the main premise of it fails or does not pay off, then at least some of the work is still valuable.

Where AI Makes Sense


I have gotten into discussions about this with people who think AI and machine learning are the answer to everything.  To me, machine learning is more than just filling out a business rule table, such as "at 6:30 I always turn down the thermostat to 73, so always make sure it's 73 by then".  In short, machine learning is most fun and applicable to a problem when the target state changes.  For instance, you're a bank trying to decide whether or not to give someone credit, but the underlying credit market changes over the course of a few years, thus affecting the risk of taking on new business.  Problems like these really get the best bang for their buck out of machine learning models because the model can be updated constantly on new data.  One way to find out when to trigger model training (if you're using a supervised approach, such as decision trees or neural networks) is to use an unsupervised approach such as K-means clustering to look for larger groups of outliers among the inputs to your model, and then check whether your original model is still performing well or has failed to generalize to potential changes in underlying conditions.
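The outlier check can be sketched without any ML library. The snippet below assumes K-means has already been fit elsewhere and that its centroids are handed in; it only measures how many recent inputs fall far from every centroid. The function names, the `radius` cutoff, and the 10% default are illustrative assumptions you would tune for your own data.

```python
import math
from typing import Sequence

Point = Sequence[float]


def nearest_centroid_distance(point: Point, centroids: Sequence[Point]) -> float:
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def outlier_fraction(points: Sequence[Point], centroids: Sequence[Point],
                     radius: float) -> float:
    """Fraction of points lying farther than `radius` from every centroid."""
    if not points:
        return 0.0
    far = sum(1 for p in points
              if nearest_centroid_distance(p, centroids) > radius)
    return far / len(points)


def should_retrain(points: Sequence[Point], centroids: Sequence[Point],
                   radius: float, max_outlier_fraction: float = 0.1) -> bool:
    """Flag the supervised model for review once outliers become common."""
    return outlier_fraction(points, centroids, radius) > max_outlier_fraction
```

A `True` here is a prompt to go check the supervised model's recent accuracy, not proof that it has degraded.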

Other types of interesting problems for AI & ML are those involving image or audio data, for which researchers have tried for decades using classical mathematical approaches but for which basic neural networks showed dramatic improvements in accuracy over the classical approaches.  Neural networks are simply better at learning which features really matter to the outcome.  They will build up the appropriate filter, whether it represents some intrinsic property of a sound wave or some portion of a picture.

The most creative uses of AI and ML will enable previously impossible interactions.  Think about something crazy like training a speech recognition engine on custom words for your specific application and embedding it into some tiny device, or possibly using a smartphone camera to take pictures of something to learn its size.

Run Machine Learning Where Again? - Cloud, Edge, Gateway


The apps I build for clients usually revolve around these three characteristics:

  • Clients are typically highly price sensitive
  • Latency is a non-issue
  • Sensors send data every ~5 minutes unless conditions deteriorate
With this in mind, I am looking to reduce the bill of materials cost as much as possible, and so I make the edge as dumb as it can get.  The analytics goes into the cloud.  And even if you're a believer in data being processed on the edge, you're probably not going to get away without cloud somewhere in your project anyway.  A robust cloud provider will offer solutions for not just data aggregation/analysis, but also:
  • Device firmware updates over-the-air
  • Data visualization tools
  • Digital twins
Plus, advanced machine learning training is only taking place on the cloud through advanced clusters of GPUs or TPUs, due to the scale of data and number of epochs required to train a useful image or NLP model to a reasonable degree of accuracy.  Thus, you might as well put data into the cloud anyway (even if not streaming, then in batch) unless you plan to:
  • Generate test data manually or using other means
  • Run a GPU cluster along with your edge to do retraining
However, with advances in transfer learning, and with cheaper hardware coming out like Intel Movidius, NVIDIA Jetson, and Google Coral, edge training will become more of a reality.

Friction-Free Cloud


I am most familiar with Google's product offerings, where Firebase allows for running models locally with no cloud connection.  Their cloud can serve an old model until training is finished.  If you wish to run your models on the edge, you will need to get clever about exactly when to deploy the new model: either in "blue/green" fashion (a flash cut to the new model all at once) or using "canary" deployments (where a small percentage of inputs are classified with the new model for starters).
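To make the two rollout styles concrete, here is a small sketch of the routing decision. It is not tied to Firebase or any other product; `use_new_model`, `classify`, and the hash-bucket scheme are assumptions chosen so that a given input ID always lands on the same model. The models are plain callables standing in for whatever inference API you use.

```python
import hashlib
from typing import Any, Callable


def use_new_model(input_id: str, canary_fraction: float) -> bool:
    """Decide deterministically whether this input goes to the new model."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(input_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < canary_fraction * 10_000


def classify(input_id: str, features: Any,
             old_model: Callable[[Any], Any],
             new_model: Callable[[Any], Any],
             canary_fraction: float) -> Any:
    """Route one input to the old or new model and return its prediction."""
    model = new_model if use_new_model(input_id, canary_fraction) else old_model
    return model(features)
```

A canary rollout raises `canary_fraction` gradually (0.05, then 0.25, and so on) while you watch the new model's results; a blue/green cut is the same code with the fraction flipped from 0.0 to 1.0 in a single step.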

Furthermore, given that we are unlikely to get rid of the cloud in IoT projects anytime soon, a big opportunity is to make tools whose user experience is the same from the cloud to the edge device in order to improve continuity and reduce frustration.

Picking an AI/ML Platform


The third question in the panel related to picking a machine learning service provider.  My general thoughts on this revolve around considering the providers who have built products useful to specific industries.  On the vendor floor, there were small companies with solutions catering to manufacturing, supply chain, chemicals, utilities, transportation, oil & gas, and more.  Larger companies have consulting arms to build projects for multiple different industries.  In either case, whoever you choose can hopefully bring domain-specific knowledge about your industry to solve your machine learning problem, and can save time by already having common digital twins in a repository or common KPIs for your assets or employees.  The hope here is that with a vendor targeting a specific industry, they will have already accumulated domain knowledge so they won't need so much "getting up to speed" about the general problem your company faces, but can jump right into solving higher-order creative problems.

However, if these services are built on top of a cloud provider that decides to crawl into the space of the specialized provider you choose to work with, it could obviate them.  For instance, if Google decides to get into a particular business where there are already players, they will offer a similar service but for free.  As such, pick one service provider positioned for growth, with staying power due to a niche or protected IP.  Or, actually pick multiple technologies or providers of different sizes to protect against one going extinct.  For instance, maybe different types of wireless radios might be useful in your application.  But imagine if you'd put all your eggs in WiMAX in the early 2010s; you wouldn't have much of a solution now.  As such, it is helpful to find tools and technologies that are at least interoperable with partners, even if the use case is specific.

Other Considerations In Passing


Besides what was addressed above in the panel, there were some remarks prepared in case we had additional time (but it seems we ran out).

Tune In To the Frequency of Retraining


Models over time will likely need to adapt to changing inputs.  A good machine learning model should be able to generalize to novel input -- that is, make correct predictions on data that hasn't been seen before.  However, there are a few signs that it might be time to retrain or enhance the model.
  • More misses or false positives.  In data science parlance, a confusion matrix is the breakdown of how many items of a given class were labeled into which class.  The diagonal of this matrix holds the correct predictions (i.e. class 1 -> 1, 2 -> 2, and so on).  Thus, if numbers outside the diagonal start getting high, this is a bad sign for the model's accuracy.
  • Changing underlying conditions.  As described earlier, one could imagine this as a bull market turning into a bear market.
However, there could be multiple paths to monitor the need for retraining, or even mitigate it.
  • Consider a push/pull relationship between supervised and unsupervised models, as described above.  If outliers are becoming more common in unsupervised models, consider making sure your supervised models are cognizant of these examples by running more training.  Perhaps new classes of objects need to be introduced into your supervised models.
  • Maybe the wrong model is at play.  There could be a fundamental problem where, for example, a linear regression is in play where a logistic regression should really be used.
  • Perhaps the business KPIs actually need to be re-evaluated.  Are the outcomes produced by the data in the right ballpark for what we even want to know about, or are we coming up with the wrong business metric altogether?
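
As a rough illustration of the confusion matrix check described above, the following sketch counts how many predictions fall off the diagonal and flags the model once that share passes a threshold.  The function names and the 10% threshold are assumptions picked for the example, not a recommendation.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are the true class, columns are the predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def off_diagonal_rate(matrix):
    """Fraction of all predictions that landed outside the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 0.0 if total == 0 else (total - correct) / total

def needs_retraining(matrix, max_error_rate=0.10):
    # The threshold is an arbitrary placeholder; tune it to your own KPIs.
    return off_diagonal_rate(matrix) > max_error_rate

actual    = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(actual, predicted, ["cat", "dog"])
print(matrix)                    # [[1, 1], [1, 2]]
print(needs_retraining(matrix))  # True, since 2 of 5 predictions are wrong
```

Running this over a sliding window of recent, labeled predictions would give a simple trend line for when the misses start creeping up.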

In the quest for real-time analysis with your model, first consider whether such a task is attainable, or even required.  Factors that could drive the decision include:
  • Is it mission-critical?
  • How many objects need to be analyzed in real-time?  Too many objects will increase demand on the processor.
  • Is analysis cheap enough at the edge to conduct with modern silicon?
I’ve usually advocated against using deep learning when there are simpler mathematical models requiring less compute, even if it takes more feature engineering up front.  However, it’s probably not long until we have silicon so cheap that we can run and even train such advanced models with relative ease.  And the good news is the more powerful the analysis engine (i.e. operating on 2D video rather than 1D sensor data), the more analyses we can draw from the same data, requiring fewer hardware updates and instead relying on simpler updates to firmware and software.

One particular question to the panel involved how humans educate machines.  Currently, we rely on annotations on data to make it obvious where we should be drawing from.  This can be something as simple as putting a piece of data into an appropriate database column.  However, unstructured data like server logs is becoming ever more important for deriving insights.

But maybe on the flip side of this is when do machines begin to educate each other, and educate humans as well?  The most obvious play on this regards decision support.  If humans can become educated by an AI tool in an unobtrusive way to, say, be safer on the job, then this is one such way we can make an impact on ourselves through machines.  Another good way is to gain insight into decisions being made for regulatory purposes.  As certain institutions are audited to ensure there is no discrimination or advantages being given to certain parties, a machine learning model needs to be auditable and educate its human interpreters into its behavior.  And education doesn't have to be hard; even kids in middle school are fully capable of playing with Google's machine learning tools to build amazing products, unbounded by years of skepticism formed by bad engineering experiences.

However, the more dubious problem is when machines train other machines.  While this could be a good thing in some applications, like hardening security, right now you can see generative adversarial networks (GANs) being used to create deep fakes.  Now it is possible to spoof someone's voice, or even generate fake videos of events that never happened, all to push an agenda or confuse people in trial courts.

Epilogue


Obviously, this is a lot more than can be said in 40 minutes, and frankly more than I even intended to write.  However, it is a complex field right now and all good food for thought, and hopefully by airing out some of these thoughts, it will help simplify and demystify the use of AI in IoT so we can converge on a more universal standard and set of best practices.

Thursday, February 21, 2019

The Fastest Path to Object Detection on Tensorflow Lite

Ever thought it would be cool to make an Android app that fuses Augmented Reality and Artificial Intelligence to draw 3D objects on-screen that interact with particular recognized physical objects viewed on-camera?  Well, here's something to help you get started with just that!

Making conference talks can be a chicken-and-egg problem.  Do you hope the projects you've already worked on are interesting enough to draw an audience, or do you go out on a limb, pitch a wild idea, and hope you can develop it between the close of the call for papers and the conference?  Well, in this case, the work I did for DevFests in Chicago and Dallas yields a template for talks formulated by either approach.

The most impressive part is that you can recreate for yourself the foundation I've laid out on GitHub by cloning the Tensorflow Git project, adding Sceneform, and editing (mostly removing) code.  However, it wasn't such a walk in the park to produce.  Here are the steps, edited down as best I can from the stream of consciousness note-taking that this blog post is derived from.  It has been distilled even further in slides on SlideShare, but this might give you some insights into the paths I took that didn't work -- but that might work in the future.

  • Upgrade Android Studio (I have version 3.3).
  • Upgrade Gradle (4.10.1).
  • Install the latest Android API platform (SDK version 28), tools (28.0.3), and NDK (19).
  • Download Bazel just as Google tells you to.  However, you don't need MSYS2 if you already have other things like Git Shell -- or maybe I already have MinGW somewhere, or who knows.

Nota Bene: ANYTHING LESS THAN THE SPECIFIED VERSIONS will cause a multitude of problems which you will spend a while trying to chase down.  Future versions may enable more compatibility with different versions of external dependencies.

Clone the Tensorflow Github repo.

A Fork In the Road


Make sure you look for the correct Tensorflow Android example buried within the Tensorflow repo!  The first one is located at path/to/git/repo/tensorflow/tensorflow/examples/android .  While valid, it's not the best one for this demo.  Instead, note the subtle difference -- addition of lite -- in the correct path, path/to/git/repo/tensorflow/tensorflow/lite/examples/android .  

You should be able to build this code in Android Studio using Gradle with little to no modifications.  It should be able to download assets and model files appropriately so that the app will work as expected (except for the object tracking library -- we'll talk about that later).  If it doesn't, here are some things you can try to get around it:

  • Try the Bazel build (as explained below) in order to download the dependencies.
  • Build the other repo at path/to/git/repo/tensorflow/tensorflow/examples/android and then copy the downloaded dependencies into the places where they would be placed.

However, by poking around the directory structure, you will notice several BUILD files (not build.gradle) that are important to the Bazel build.  It is tempting (but incorrect) to build the one in the tensorflow/lite/examples/android folder itself; also don't bother copying this directory out into its own new folder.  You can in fact build it this way, if you remove the stem of directories mentioned in the BUILD file so you're left with //app/src/main at the beginning of the callout of each dependency.  By doing this, you will still be able to download the necessary machine learning models, but you will be disappointed that it will never build the object detection library.  For it to work all the way, you must run the Bazel build from the higher-up path/to/git/repo/tensorflow folder and make reference to the build target all the way down in tensorflow/lite/examples/android .

For your reference, the full Bazel build command looks like this, from (the correct higher-up path) path/to/git/repo/tensorflow :
bazel build //tensorflow/lite/examples/android:tflite_demo

Now, if you didn't move your Android code into its own folder, don't run that Bazel build command yet.  There's still a lot more work you need to do.

Otherwise, if you build with Gradle, or if you did in fact change the paths in the BUILD file and copied the code from deep within the Tensorflow repo somewhere closer to the root, you'll probably see a Toast message about object detection not being enabled when you build the app; this is because we didn't build the required library.  We'll do this later with Bazel.

Now, let's try implementing the augmented reality part.

But First, a Short Diatribe On Other Models & tflite_convert


There's a neat Python utility called tflite_convert (that is evidently also a Windows binary, but somehow always broke due to failing to load dependencies or other such nonsense unbecoming of something supposedly dubbed an EXE) that will convert regular Tensorflow models into TFLite format.  As part of this, it's a good first step to import the model into Tensorboard to make sure it's being read in correctly and to understand some of its parameters.  Models from the Tensorflow Model Zoo imported into Tensorboard correctly, but I didn't end up converting them to TFLite, probably due to difficulties, as explained in the next paragraph.  However, models from TFLite Models wouldn't even read in Tensorboard at all.  Now these might not be subject to conversion, but it seems unfortunate that Tensorboard is incompatible with them.

Specifically, tflite_convert changes .pb files or models in a SavedModel dir into .tflite format models.  The first problem with tflite_convert on Windows is finding exactly where Pip installs the EXE file.  Once you've located it, the EXE file has a bug due to referencing a different Python import structure than what things are now.  Building from source has the same trouble; TF 1.12 from Pip doesn't have the same import structure that tflite_convert.py expects.  The easiest thing to do is just download the Docker image (on a recent Sandy Bridge or better system -- which means that even my newest desktop with an RX580 installed can't handle it) and use tflite_convert in there.
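
For reference, here is a small sketch that assembles a tflite_convert command line for either kind of input.  The flag names are the ones I recall from the TF 1.x command-line converter, so treat them as an assumption and confirm with tflite_convert --help on your version; the file names and tensor names in the usage lines are placeholders, not values from a real model.

```python
def tflite_convert_args(output_file, saved_model_dir=None, graph_def_file=None,
                        input_arrays=(), output_arrays=()):
    """Build the argument list for the tflite_convert command-line tool.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    if (saved_model_dir is None) == (graph_def_file is None):
        raise ValueError("pass exactly one of saved_model_dir or graph_def_file")
    args = ["tflite_convert", "--output_file=%s" % output_file]
    if saved_model_dir is not None:
        args.append("--saved_model_dir=%s" % saved_model_dir)
    else:
        # A frozen .pb graph needs its input and output tensor names spelled out.
        if not input_arrays or not output_arrays:
            raise ValueError("a frozen graph needs input_arrays and output_arrays")
        args.append("--graph_def_file=%s" % graph_def_file)
        args.append("--input_arrays=%s" % ",".join(input_arrays))
        args.append("--output_arrays=%s" % ",".join(output_arrays))
    return args

# Placeholder names; look up the real tensor names in Tensorboard first.
print(" ".join(tflite_convert_args("model.tflite", graph_def_file="model.pb",
                                   input_arrays=["input"],
                                   output_arrays=["output"])))
```

Inside the Docker container, the resulting list could be handed to subprocess.run(args, check=True) to perform the actual conversion.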

Looking Into Augmented Reality


Find the Intro to Sceneform codelab.  Go through the steps.  I got about halfway through it before taking a pause in order to switch out quite a lot of code.  The code I switched mostly revolved around swapping the original CameraActivity for an ArFragment and piping the camera input from the ArFragment into the Tensorflow Lite model as well.  More on the specifics can be seen in the recording of my presentation in Chicago (and in full clarity since I painstakingly recorded these code snippets in sync with how they were shown on the projector).

To build Sceneform with Bazel, first I must say it's probably not possible at this time.  But if you want to try (at least on Windows), make sure you have the WORKSPACE file from Github or else a lot of definitions for external repos (@this_is_an_external_repo) will be missing, and you'll see error messages such as:

error loading package 'tensorflow/tools/pip_package': Unable to load package for '@local_config_syslibs//:build_defs.bzl': The repository could not be resolved

After adding the Sceneform dependency into Bazel, I also faced problems loading its dependencies.  There were weird issues connecting to the repository of AAR & JAR files over HTTPS (even though the Tensorflow Lite assets downloaded fine).  As such, I was greeted with all the things Bazel told me that Sceneform depended on...one at a time, since Bazel would not tell me all the dependencies of all the libraries at once.  I was stuck downloading about 26 files one at a time, as I would continuously download libraries that depended on about 3 others themselves.  Or not... so I wrote a script to automate all this.

The following script does what it was written to do, but alas it did not get me to my goal: once you do all this, Bazel claims it's missing a dependency that you literally can't find on the Internet anywhere.  This leads me to believe it's currently impossible to build Sceneform with Bazel at all.  Nevertheless, here it is, if you have something more mainstream you're looking to build:

import ast
import re
import urllib.request

allUrls = []  # every artifact URL to download, in discovery order
allDeps = []  # generated aar_import/java_import rules, deduplicated

def depCrawl(item):
    # Record this library's first URL, then recurse into its dependencies
    if item['urls'][0] not in allUrls:
        allUrls.append(item['urls'][0])
    depStr = ""
    for dep in item['deps']:
        depCrawl(aar[dep])
    # Emit a Bazel import rule that points at the locally downloaded file
    depStr += "\n%s_import(" % item['type']
    depStr += "\n  name = '%s'," % item['name']
    filepath = ":%s" % (item['urls'][0].split("/"))[-1]
    if item['type'] == "java":
        depStr += "\n  jars = ['%s']," % filepath
    else:
        depStr += "\n  aar = '%s'," % filepath
    if len(item['deps']) > 0:
        depStr += "\n  deps = ['%s']," % "','".join(item['deps'])
    depStr += "\n)\n"
    if depStr not in allDeps:
        allDeps.append(depStr)

# Read the local copy of gmaven.bzl (Windows-style path)
with open('git\\tensorflow\\tensorflow\\lite\\examples\\ai\\gmaven.bzl') as x:
    f = x.read()

# Each library is declared in an aar_import_external(...) or java_import_external(...) call
m = re.findall(r'import_external\(.*?\)', f, flags=re.DOTALL)
print(len(m))
print(m[0])

aar = {}  # library name -> parsed declaration

for item in m:
    aarName = re.search(r"name = '(.*?)'", item)
    name = aarName.group(1)
    aarUrls = re.search(r'(aar|jar)_urls = (\[.*?\])', item)
    kind = "java" if aarUrls.group(1) == "jar" else "aar"
    urls = ast.literal_eval(aarUrls.group(2))
    aarDeps = re.search(r'deps = (\[.*?\])', item, flags=re.DOTALL)
    deps = ast.literal_eval(aarDeps.group(1))
    # Strip the leading "@" and the trailing 5-character "//aar" or "//jar" suffix
    deps = [dep[1:-5] for dep in deps]
    dictItem = {"urls": urls, "deps": deps, "type": kind, "name": name, "depStr": aarDeps.group(1)}
    aar[name] = dictItem
    if len(urls) > 1:
        print("%s has >1 URL" % name)

# Pick the library whose dependency tree you want to crawl
#depCrawl(aar['com_android_support_support_v4_28_0_0'])
depCrawl(aar['com_google_ar_sceneform_ux_sceneform_ux_1_4_0'])
print(len(allUrls))
print(len(allDeps))
print("".join(allDeps))

# Download every artifact next to gmaven.bzl
for url in allUrls:
    print("Downloading %s" % url)
    urllib.request.urlretrieve(url, 'git\\tensorflow\\tensorflow\\lite\\examples\\ai\\%s' % url.split("/")[-1])


The important part of this script is toward the bottom, where it runs the depCrawl() function.  In here, you provide an argument consisting of the library you're trying to load.  Then the script seeks everything listed as a dependency for that library in your local copy of the gmaven.bzl file (itself downloaded from the Internet), prints the Bazel import rules for them, and then downloads each file to a local directory (note the paths are formatted for Windows on here).

Giving Up On Bazel For the End-To-End Build


Nevertheless, for the reasons just described above, forget about building the whole app from end to end with Bazel for the moment.  Let's just build the object tracking library and move on.  For this, we'll queue up the original command as expected:

bazel build //tensorflow/lite/examples/android:tflite_demo

But before running it, we need to go into the WORKSPACE file in /tensorflow and add the paths to our SDK and NDK -- but without references to the specific SDK version or build tools version, because when I included those, the build seemed to get messed up.

Now:

  • Install the Java 10 JDK and set your JAVA_HOME environment variable accordingly.
  • Find a copy of visualcppbuildtools_full.exe, and install the following:
    • Windows 10 SDK 10.0.10240
    • .NET Framework SDK
  • Look at the Windows Kits\ directory and move files from older versions of the SDK into the latest version
  • Make sure your Windows username doesn't contain spaces (might also affect Linux & Mac users)
  • Run the Bazel build from an Administrator command prompt instance
  • Pray hard!
Confused by any of this?  Read my rationale below.


Eventually the Bazel script will look for javac, the Java compiler.  For this, I started out installing Java 8, as it was not immediately clear which Java version Bazel was expecting to use, and according to Android documentation, it supports "JDK 7 and some JDK 8 syntax."  Upon setting up my JAVA_HOME and adding Java bin/ to my PATH, it got a little bit further but soon complained about an "unrecognized VM option 'compactstrings'".  Some research showed similar errors are caused by the wrong version of the JDK being installed, so I set off to install JDK 10.  However, according to Oracle, JDK 10 is deprecated, so it redirected me to JDK 11.  Then, I had another issue where some particular class file "has wrong version 55.0, should be 53.0".  Once again, this is due to a JDK incompatibility.  I tried a little bit harder to seek JDK 10, and eventually found it but had to log in to Oracle to download it (bugmenot is a perfect application to avoid divulging personal information to Oracle).
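
If those class file version numbers look cryptic, they can be decoded: every .class file starts with the magic number 0xCAFEBABE followed by a minor and a major version, and the major version is the Java release plus 44 (so 52 is Java 8 and 55 is Java 11).  The helper below is a minimal sketch with a name of my own choosing; it is not something Bazel or the JDK ships.

```python
import struct

def class_file_java_version(header: bytes) -> int:
    """Return the Java release that produced a .class file, given its first 8 bytes."""
    if len(header) < 8:
        raise ValueError("need at least the first 8 bytes of the class file")
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major - 44

print(class_file_java_version(bytes.fromhex("cafebabe00000037")))  # major 55 -> 11
print(class_file_java_version(bytes.fromhex("cafebabe00000035")))  # major 53 -> 9
```

To inspect a real file, read its first 8 bytes with open(path, "rb").read(8) and pass them in.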

After installing JDK 10, I then came across an error that Bazel can't find cl.exe, relating to the Microsoft Visual C++ 2015 compiler & toolchain required to build the C++ code on Windows.  Downloading the recommended vc_redist_x64.exe file didn't help me, since the installer claims the program is already installed (I must have installed Visual C++ a long time ago), yet the required binaries are still nowhere to be found in the expected locations.  I ended up finding an alternate source, a file called "visualcppbuildtools_full.exe".  Unfortunately, this installs several GB of stuff onto your computer.  I first selected just the .NET Framework SDK to hasten the process, save hard disk space, and avoid installing unnecessary cruft, but then it couldn't find particular system libraries, so I had to select Windows 10 SDK 10.0.10240 and install that as well.

Trying again with the build, now it can't find Windows.h.  What?!?  I should have just installed the libraries & include files with this SDK!  Well, it turns out they did install correctly, but according to the outputs of SET INCLUDE from the Bazel script, it was looking in the wrong directory: C:\Program Files (x86)\Windows Kits\10\Include\10.0.15063.0 rather than C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0.  To make my life easier, I just copied all the directories from 10240 into 15063, renaming the original directories in 15063 first.  I later had to do the same thing with the Lib directory, in addition to Include.

Upon setting this up, I made it to probably just about the completion of the build:

bazel-out/x64_windows-opt/bin/external/bazel_tools/tools/android/resource_extractor.exe bazel-out/x64_windows-opt/bin/tensorflow/lite/examples/android/tflite_demo_deploy.jar bazel-out/x64_windows-opt/bin/tensorflow/lite/examples/android/_dx/tflite_demo/extracted_tflite_demo_deploy.jar
Execution platform: @bazel_tools//platforms:host_platform
C:/Program Files/Python35/python.exe: can't open file 'C:\users\my': [Errno 2] No such file or directory

Aww, crash and burn!  It can't deal with the space in my username.  Instead, make an alternate user account if you don't already have one.  Now one thing you may notice is that the new account doesn't get permission to access files from the original user account, even if you define it as an Administrator.  Using Windows "cmd" as Administrator will finally allow you success with your Bazel build.
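
A quick pre-flight check for two of these pitfalls might look like the sketch below.  The function name and the particular checks are my own assumptions, covering only JAVA_HOME and a space in the home directory path; it says nothing about the Visual C++ or Windows SDK issues.

```python
import os

def preflight_problems(env, home_dir):
    """Return a list of human-readable problems found before starting the build."""
    problems = []
    if not env.get("JAVA_HOME", ""):
        problems.append("JAVA_HOME is not set")
    # A space here is what produced the "can't open file 'C:\users\my'" error above.
    if " " in home_dir:
        problems.append("home directory contains a space: %s" % home_dir)
    return problems

if __name__ == "__main__":
    for problem in preflight_problems(os.environ, os.path.expanduser("~")):
        print("WARNING:", problem)
```

Passing the environment and home directory in as arguments, rather than reading them inside the function, keeps the check easy to try against made-up values.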


Look closely; this is the image of success.

Now, you're not out of the woods yet.

Tying It All Together


Now, you need to actually incorporate the object detection library in your Android code.

  • Find the libtensorflow_demo.so file built by Bazel.  It's probably been stashed somewhere in your user home directory, no matter your operating system.
  • Copy this file into your Android project.  Remember where you normally stash Java files?  Well this will go into a similar spot called src/main/jniLibs/<CPU architecture> , where <CPU architecture> is most likely going to be armeabi-v7a (unless you're not reading this in 2019).
  • To support this change, you'll also need to add a configuration to your build.gradle file so that it will only build the app for ARMv7; otherwise if you have an ARMv8 (or otherwise different) device, it won't load the shared library and you won't get the benefit of object tracking.  This is described in the YouTube presentation linked above.
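
The first two steps above can be sketched in a few lines of Python.  Both function names are hypothetical, and the search root and module directory in the commented usage are placeholders you would replace with your own paths.

```python
import shutil
from pathlib import Path

def find_native_lib(search_root, name="libtensorflow_demo.so"):
    """Return the first file called `name` under search_root, or None if absent."""
    return next(Path(search_root).rglob(name), None)

def install_native_lib(so_file, android_module_dir, abi="armeabi-v7a"):
    """Copy the library into <module>/src/main/jniLibs/<abi>/ and return the new path."""
    dest_dir = Path(android_module_dir) / "src" / "main" / "jniLibs" / abi
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(str(so_file), str(dest_dir)))

# Example usage (placeholder paths; searching a whole home directory can be slow):
# lib = find_native_lib(Path.home())
# if lib is not None:
#     print(install_native_lib(lib, "path/to/your/android/app"))
```

This only handles the file copy; the build.gradle change that restricts the app to ARMv7 still has to be made by hand.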
The final thing to do to get this all working is to add in the rest of the Sceneform stuff.  At this point, if you've followed the coding instructions in the YouTube video linked above that mentions what to change, then all you should need to do is build the Sceneform-compatible *.sfb model.

But hold tight!  Did you see where the Codelab had you install Sceneform 1.4.0 through Gradle, but the Sceneform plugin offered through Android Studio is now at least 1.6.0?  Well if you proceed in building the model, you won't notice any difficulties until the first time your app successfully performs an object detection and tries to draw the model...only to realize the SFB file generated by the newer plugin isn't compatible with the Sceneform 1.4.0 library which you included in your app.  The worst part is that if you try to upgrade Sceneform to 1.6.0 in Gradle, your Sceneform plugin in Android Studio will refuse to work properly ever again.

Your two solutions to this problem:
  • Rectify the Sceneform versions (plugin & library) prior to building anything, or at least before making your first SFB file
  • Just use Gradle to build your SFB file, as shown in the YouTube video
Turns out you don't need the Sceneform plugin in Android Studio at all, and after a while it'll probably seem like a noob move, especially if you have a lot of assets to convert for your project or you're frequently changing things.  You'll want it to be automated as part of your build stage.

The big payoff is now you should be able to perform a Gradle build that builds and installs an Android app that:
  • Doesn't pop a Toast message about missing the object tracking library
  • Performs object detection on the default classes included with the basic MobileNet model
  • Draws Andy the Android onto the detected object

Any questions?

This is a lot of stuff to go through!  And I wonder how much of it will change (hopefully be made easier) before too long.  Meanwhile, have fun fusing AI & AR into the app of your dreams and let me know what you build!

As for me, I'm detecting old computers: (but not drawing anything onto them at this point)

* Not responsible for the labels on the detected objects ;) Obviously the model isn't trained to detect vintage computers!

And for the sake of demonstrating the whole thing, here's object detection and augmented reality object placement onto a scene at the hotel right before I presented this to a group: