Thursday, January 11, 2018

Tensorflow, from Scratch to the Cloud

Having used plenty of pre-built machine learning models through Tensorflow and the GCP APIs, and having gone through the pains of setting up Tensorflow on plain vanilla Amazon EC2 AMIs (not even the pre-configured ones with all the goodies installed already) and getting it to run classifications on the GPU and through Tensorflow Serving, I thought it was high time I tried coding my own machine learning model in Tensorflow.

Of course, the thing most folks aspire to do with Tensorflow when starting out is to build a neural network.  I wanted to base my first neural network on the MNIST examples just to get my feet wet, but use a dataset different from MNIST.  There are many datasets on Kaggle to choose from that could fit the bill, but I decided to use one of my own from a while back.  It consists of skin tones found in pictures, cropped carefully and aggregated into a dozen BMP files.  Don’t question where I got these skin tone pixels from, but rest assured that a wide variety of skin colors were covered, captured by mostly nice cameras under ideal lighting conditions.  To be honest, it’s a little bit less interesting than MNIST, because instead of 10 distinct classes, there are only two: skin and not-skin.

Performance from All Angles

Previous research shows that converting pixels from the RGB space into the HSV space leads to better neural network performance.  Luckily, this is easy to do using the Python Imaging Library.  Meanwhile, the performance of neural net computation (i.e. the speed of the training phase) has improved dramatically since I made a neural net in Weka to train on skin tones just seven years ago.
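As a quick sketch of that conversion, here is a pixel mapped from RGB into a quantized HSV triple; this uses the standard library's colorsys (PIL's Image.convert('HSV') does the same job on whole images), and the 0-99 quantization mirrors the scheme I use for the training data:

```python
import colorsys

def rgb_to_quantized_hsv(r, g, b, buckets=100):
    """Convert 0-255 RGB values to an HSV triple quantized into 0-99 bins."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return tuple(min(int(x * buckets), buckets - 1) for x in (h, s, v))

print(rgb_to_quantized_hsv(255, 0, 0))  # (0, 99, 99) -- pure red
```

With three channels at 100 levels each, that gives the 100 x 100 x 100 = 1 million possible values mentioned below.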

Back in late 2010, it was fancy if you had a system with, say, 200 CUDA cores; my Lenovo W510 laptop shipped with 48.  Now, I have a roughly 18-month-old Aorus X3 laptop that boasts 1,280 CUDA cores.  Note that there are orders of magnitude more nodes in the Tensorflow neural net, and that the Tensorflow net is also considering all 3 values in the HSV space (quantized 0-99, thus 1 million possible values), and not just the best 2 out of 3 as in the previous research (quantized 0-255, thus 65,536 possible values).  The performance comparison is as follows:

Model Type               Time
Tensorflow (GPU, 256)    90.9
Weka (CPU, 5)            17
Weka (CPU, 256)          750

Building the Network

Ultimately, the trickiest part of building the network was formatting the data in a way that would be meaningful to train on, and lining up the data types of the training data and variables so that the training functions stayed happy.

In all, I ran into four pitfalls: one operational, one in code, one with clashing resources, and one born of stupidity or blindness:

  1. Can’t pass in a tensor of (1000000, ?) into a variable expecting (1000000, ?).  This one eluded me for the longest time because I was positive the correct tensor was being fed into the training function.  I finally saw the light when I decided to split up the training data so that training is not run on the entire dataset at once; I had it train on 2,000 examples at a time rather than the whole million.  Then the error message changed, and it immediately became apparent that the error was caused by the computation that counts how many HSV values have [0, 1, 2, …] appearances in the dataset, not by the neural network calculation itself.  Frustratingly, all that time was wasted chasing an issue ultimately caused by code that merely calculates metrics I had already examined earlier and had since moved on from.
  2. Can’t allocate 0B of memory to the cross_entropy or optimizer or whatever it was.  My Tensorflow build is compiled to run only on the GPU, I’m pretty sure, and doesn’t handle CPU computations.  I had a separate terminal window open with the Python <shell> to quickly run experiments before putting them into the real training code.  However, having this open seemed to tie up resources needed by the program, so closing the extra terminal eliminated the memory allocation error.
  3. Uninitialized variable <something> for the computations of precision & accuracy.  For this, I had to run init_local_variables right before <either defining them in the graph or running them>.  It was not good enough to run this right after the line “with tf.Session() as sess”.
  4. Weird classifications.  I was expecting a binary classification to behave like a logistic regression and simply output “0” if it’s not skin and “1” if it’s skin.  Unfortunately, the neural network tended to return results more like a linear regression, giving me nonsensical values like 1.335 and -40.8987 for the single class.  I ended up changing the output layer of the neural network to reflect what I had originally, which called for two classes of “skin” and “not-skin”.  With this change, the calculated values for any given valid HSV value could still be totally out of line with the “0” and “1” I would expect (and still not a cumulative probability adding up to 1), but by taking np.argmax() of the output layer, I can turn them into the outcome I was expecting.  And it actually works very well, with precision and recall exceeding 97 or 98% without a whole lot of effort.
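The np.argmax() trick in pitfall 4 can be sketched like so (the logit values here are made up to mirror the nonsensical outputs I was seeing):

```python
import numpy as np

# Made-up raw output-layer values: not probabilities, nowhere near 0 or 1,
# and not summing to anything meaningful per row
logits = np.array([[  1.335, -40.8987],
                   [ -3.2,     7.9   ]])

# Taking the argmax per row still recovers a clean 0/1 class index
predictions = np.argmax(logits, axis=1)
print(predictions)  # [0 1]
```

Whichever class's raw score is larger wins, no matter how far the scores are from a proper probability.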

All This Effort Just To Find a Better Way

Now, after having written my model with low-level Tensorflow routines, and trained it to the point where I thought it would return really good results, it was time to try to put it on the cloud.  For this, the output from Saver (a “checkpoint”) basically needs to be converted into a SavedModel.  One can do this by writing a whole lot of code defining the exact behavior of the neural network all over again, but it seems much more efficient (given the end goal of exporting the model to the cloud) to use an Estimator object to define the neural network in the first place.

And this doesn’t even take into account what kind of goodness lies before me if I simply decide to use Keras…

However, I wanted to see how feasible it was to write a converter from checkpoint to SavedModel using the standard Tensorflow APIs.  Scouring for examples and tutorials, and diving in to find out what exactly all this code was doing to see how I could shorten it into the most concise possible example, I realized there were new APIs that these examples weren't leveraging.

Basically, the two main principles involve:

  • defining a signature_def that describes your graph's inputs and outputs, and
  • adding the meta-graph and variables to a SavedModelBuilder with the proper tags and signature map.

There are several signature_defs you can choose from:

  • predict_signature_def
  • classification_signature_def
  • regression_signature_def

It makes the most sense to me to use "predict" when you're looking to perform inference on a pre-trained example using your SavedModel, to use "regression" when you want to solve an equation based on inputs and outputs you provide directly to the function (and not through a SavedModel), and to use "classification" for learning and inferring the class something belongs to, once again by feeding examples directly into this function and not through the SavedModel.  Now, I haven't quite used the latter two functions yet, so these are purely assumptions; I'll update this if I find them to be false later.

As for the meta-graph and variables, all you need to do to make a prediction-flavored SavedModel work on the cloud is to populate these three positional arguments with the correct terms (not to mention your Tensorflow session variable):

 tags: just set to tf.saved_model.tag_constants.SERVING.

signature_def_map: This is an object consisting of mappings between one or more Tensorflow tag constants representing various entry points and the signature defined above.  If you assigned the output of predict_signature_def(...) to prediction_signature, then you would probably want to use this:
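A minimal sketch of that map (assuming prediction_signature holds the output of predict_signature_def(...); the string key shown is what tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY resolves to):

```python
prediction_signature = ...  # hypothetical: the result of predict_signature_def(...)

# "serving_default" is the value of
# tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
signature_def_map = {
    "serving_default": prediction_signature,
}
```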


Finally, define in your SavedModelBuilder an operation to reinitialize all graphs.  It is not a problem to run this one if you are moving around a lot.


From here on out, you just need to actually execute the exact Python scripts to train the model and build it into the correct format.  For the specific code I used to build this skin tone detector and convert it to a format usable for Google Cloud ML Engine, see my GitHub repo at

Making Inferences With Your Model

Assuming you have gone through the steps of uploading the contents of your entire SavedModel directory to a bucket in Google Cloud, and you have gone through the steps of initializing a Cloud ML Engine "Model" pointing to this bucket, you now need to create a script that can call Cloud ML Engine to make predictions.  The Python code to do this is also very simple.  The main gist is that, unlike for other GCP API accesses from Python, you must configure the path to your service account using an environment variable rather than with the Python libraries.
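Concretely, that just means pointing the environment variable at your service-account key before building the client (the path below is a placeholder):

```python
import os

# Unlike other GCP API setups in Python, the credentials here are picked up
# from this environment variable rather than passed through the libraries.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
```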

Once you have done this, there is some very simple code from Google that demonstrates making an inference with your cloud model.  However, one thing they don't make clear is how to make multiple inferences at once.  If you do this incorrectly, you may run into an error such as "Online prediction is not a matrix."  If this happens to you, note how the instances variable you pass into predict_json() must be an array of dictionaries:

instances = [{"hsv": [5., 49., 79.]},
             {"hsv": [4., 59., 63.]},
             {"hsv": [21., 67., 2.]},
             {"hsv": [99., 99., 99.]},
             {"hsv": [5., 35., 83.]},
             {"hsv": [5., 29., 71.]}]

One dictionary key must be defined for each expected input into the graph.  Recall that you defined the expected inputs in the inputs field of the call to predict_signature_def() earlier, and the keys required will be the same as the keys you defined in this field.  One bit of good news about the Cloud ML Engine is that it seems to only count one usage even if you specify multiple input values to the API (i.e. multiple elements in the instances array) in a single request.

Again, here is the pointer to my GitHub repo where you can see how all this fits together:

Thursday, December 28, 2017

A Cold Wind From the USSR - Part 1

In my retrocomputing adventures, I have sought things not purely based on style alone, but based on combinations of processor, wide 3rd-party adoption and support, and "interesting" factor.  Having picked up at least one system from each common type of processor (8088, x86, Zilog Z80, 6502, 68xx, 68xxx) and the schemes popular in the USA (IBM & compatibles, Apple, Commodore, Amiga, Atari, Tandy), I thought it was time to try for something from overseas.  Plenty of computers were made for the UK market, such as the Amstrad, Acorn, and BBC Micro, and Japan had many interesting varieties of computers offered by NEC alone, not to mention their other manufacturers such as Sharp and Fujitsu.

However, really not much is known (in English) about computers from behind the Iron Curtain.

A Brief History of Why This Is a Thing

One thing that is for sure: in the 1970s, the Soviet Union, in an effort to keep up with rapidly-evolving Western technology, decided to put an end to all the custom hardware implementations (and poor reliability that comes along with small-run manufacturing) and simply pirate designs from the West.  By studying patents for, say, the IBM/360 mainframe, the Soviets (and even domestic competitors to IBM) could begin to design similar computers.  Furthermore, President Nixon's detente saw relaxed export restrictions on computer hardware in 1974.  For what couldn't be imported or smuggled into the Eastern bloc physically (Zilog Z80-based systems were prevalent, such as a ZX Spectrum clone, as well as the "Pravetz" line of famously unreliable 6502 Apple clones), they would copy and manufacture their own designs of systems, especially involving DEC's PDP-11 systems and Intel's 8080 chips, and they would devote time to copying, rewriting, or reverse-engineering the popular software that went along with them as well.

Amidst the copying, which ran rampant in the Eastern bloc in both hardware and software, one interesting innovation was the advent of DEC's various PDP-11 architectures being shrunken down into a single chip, such as the K1801BM1, for which the Soviets were designing "microcomputers" by 1981.  At that very moment in history, IBM was taking the western world by storm with the introduction of their PC 5150, and totally caught DEC off-guard.  DEC scrambled to answer IBM, but the production of the DEC Professional line of microcomputers was too little, too late.  As such, most people remember PDP-11 systems as mini-fridge (or larger)-sized minicomputers, rather than desktop micros.  However, many Soviet microcomputers were based on these PDP-11 clones, including the "Elektronika 60", the computer on which Tetris was originally programmed.

Fast-forward to mid-2017, as I am having lunch at Google I/O, and a thought exploded in my head.  Instead of going with common Western European computers or Japanese computers which are popular among hard-core gaming enthusiasts and hardware collectors, why not go for something out of the Eastern bloc?  We never hear anything about those machines.

The Three-Month Sales Cycle

Shortly thereafter, I was paging through the For Sale section of the vcfed forums and found this interesting post offering three Soviet DVK-3 PDP-11 clone computers -- well, looks like I could turn my thought into a reality!  Now, at the time, I didn't know the first thing about Russian computers, not even that their power system runs at 240V rather than American 120V.  I discussed various issues with the seller for the next 3 months or so:

  • Does it work? - Among the three computers, one fails memory tests, two don't even come up.  Oh, wait; one works, one fails memory tests, and one just prints random dots on the screen.
  • How much does it cost? - It is not easy to search for such things unless you know Russian and/or can type it into Google, and then you are relegated to looking at online forums because their primary auction website is closed.  Some common low-end computers and peripherals are offered on eBay fairly consistently, though.
  • What will it take to ship it here? - About $1,050 was the original estimate from CDEK, the shipping company.  As such, this is a serious piece of computer, not just the little home-user hobbyist kit you can find on eBay.  Luckily, CDEK ended up charging less than that, but the savings were spent on other additions & protections noted later.
  • What kind of power does it take? - Well, 240V of course, but also at 50Hz, not 60Hz.  He offered me some solutions for a step-up transformer, and of course, the local makerspace has a fancy transformer and frequency converter I ended up using to run tests and find out exactly what I needed.
  • What kind of media will it come with? - Floppy drives & hard drive.  The floppy disks did not end up making it through customs.
  • Does the hard drive work? - The hard drive would not boot the OS, and the HDD controller seems to cause a memory error for that matter.
  • How do I interface with the hard drive? - The HDD is basically an ST-412 clone, 10MB MFM.
  • Does it have documentation? - There are various booklets that come with it.  The document called МАТЕМАТИЧЕСКОЕ ОБЕСЛЕЧЕНИЕ ЭВМ, which Google Translate said was "Mathematical Destruction of Computers," was not included, but I got a .DOC file of it which is evidently a PDP-11 assembly instruction reference.  I ended up receiving a "Passport" (technical reference) for the MC1201.03 mainboard and two programming references, one seemingly centered around DEC's RT-11 OS, errors, tests, and assembly language, and the other possibly discussing ФОДОС (FODOS) which seems to be a bit more custom.
  • What do I do if it needs to be fixed? - Donate it to a museum.  The rest of the world makes DIP-socket chips with pin spacings of 0.1" (2.54mm), but Soviets only made chips to that standard for export only.  Most of their serious-grade stuff was made at a "metric inch" of just 2.5mm, meaning that for all but the smallest chips, replacing any failing chip with an equivalent American version would be difficult. 
  • How can it be packed for such a long trip and not break tragically? - The seller built a large crate for me that exactly enclosed the original box and Styrofoam packaging that came with the system to begin with.
  • Which shipper will be the gentlest on it? - Well, I hope UPS won't bust it when they take it from CDEK.
  • Can we get shipping insurance? - No.  And also, CDEK wouldn't provide a crate for international use, thus why the seller had to build one.
And then, once I told him to take my money and go forward with shipping, besides the hope that he wouldn't simply cut and run with such a large amount of money (particularly the shipping cost alone), there were issues with getting such a thing shipped internationally and through customs.  The seller also went on a two-week vacation, during which I hoped he would neither disappear nor perish, but luckily for me, he came back and sent the package right away.  It was weird, because they needed a number of documents from him, which required things like his birthdate.  It's not like he's trying to get a passport; he's just trying to send out a computer!

And Now, the Fun Part...

Fortunately for me, on September 5, a large Russian crate was placed on the corner of the retaining wall on my front yard.  (I was hoping to catch the UPS guy to tell him to deliver it around back; what ended up happening led to a 90-minute struggle to lug this large crate up the rest of the hill, over the front stairs, and around the side of the house.)  Once I got it into the garage, I unpacked it and made sure it was in one piece -- indeed it was! -- but was immediately overwhelmed with some sort of stench and got a runny nose quickly after opening the box.  There was a lot left to do before I would feel comfortable powering it on for the first time, so I put it all back in the crate and went back to work.

I must say the Russians have done a great job of communicating and sharing information about this system.  There are plenty of grainy photocopies of old books and schematics to be found online, and numerous forum posts that translate into broken English.  There is apparently even a community of people making new hardware for these systems, just as people make new hardware for old Commodores, Apples, and so on.  And, as one Imgur comment says, "In Moscow, these are just computers."  Not retro, but still things people use every day, so one would hope they'd be well-documented!

A large crate appeared atop my front steps.  The hydraulic lift cart is waiting patiently for me to get some bricks to elevate the 150-lb. wooden crate to a level where I can easily push it on.

The computer, with its original packing material

Stay tuned for Part 2, where I talk about what all it took to get this system checked out and powered on for the first time.

Thursday, December 7, 2017

Journey to a Fully Custom Pinball Machine - Part 2

From walking the show floor at Texas Pinball Fest 2016, I couldn't help but get the vibe that something novel and big would be in store for TPF 2017 -- something beyond the big but also typical/expected releases of commercial games such as The Big Lebowski and Ghostbusters (more on those later), but in fact the ushering in of a new era: totally home-brew and open-source pinball.  As the re-themed games became more impressive from 2015 to 2016, and with easy access to learning about hardware and fabrication techniques to develop new things and restore/renew/improve on old things, plus a rejuvenated fascination with pinball in general, it was not surprising to me in the least that we would see someone totally knock it out of the park like Scott Danesi did at TPF 2017 with Total Nuclear Annihilation.

However, just in case Scott wasn't there with his amazing game (for which I placed one of the pre-orders slated to ship sometime in 2018), I wanted to produce some work as well in order to show what could be done in this realm by just two people working hard together over a short period of time.  Unfortunately, while this article recounting my activities around the Wylie 1-Flip custom pinball machine is long overdue and probably should have been published way back in May, something big transpired that really made me put it off for a long time.  The basis for the electronics in Wylie 1-Flip was the Intel Edison development kit, since it was a convenient mix of an x86-based chip running Linux combined with an interface supporting Arduino sketches, without having to wait as long as a Raspberry Pi does to boot.  However, as you may know, Intel decided in 2017 to discontinue much of its hobbyist IoT line, leaving me lamenting the significant time invested into learning a dead platform and lots of memory-hogging tabs open in Chrome for my research.  (Well, I'm not really lamenting the time; after all, I did study Latin, a famously dead language, and continue to tinker with retro-computers that haven't been manufactured nor supported in decades.  However, using a discontinued platform doesn't exactly usher the art of pinball into the cutting edge.)

Where We Left Off

In case you missed Part 1 of this series, there was another goal besides making an awesome custom game to go along with the trend I predicted for TPF 2017: it was also to impress my coworkers and continue producing mind-blowing projects to show off alongside their top creative talent at various internal and external events.  You got a slight peek at the CAD design process of the game, and the frustration around installing the various mechanisms that go on top and below the playfield, but then also learned at a high level the enhancements and innovations that went into it.  Here is where I start describing the innovations at a lower level.

So I can finally close those Chrome tabs...

Even though the Intel Edison is no longer a thing, I still want to describe for you the stumbling blocks in working with the Edison platform that cost me so much time and trouble.  Granted, there's always a learning curve with anything, but here I was biting off a whole lot at once by trying to basically hand-route all the electronics for the game and write controlling logic for it, using a platform whose hardware capabilities I hadn't explored too deeply before, in the two weeks or so I had left between finishing the cabinet and actually taking the game to shows.  Yes, it was pretty insane, given that a "Makers Gonna Make" event was to be held on 3/2, followed quickly by TPF 2017 starting on 3/24.  However, Stacy decided to take a buyout package from her employer at the time and took a couple months off work, and believe it or not, she spent a great deal of her time off dealing with artwork and 3D modeling the various parts for this machine.

As the Edison supported a couple different modes of development (one involving the Arduino IDE and another in standard C++ with gnu/gcc through MRAA), I had to choose which one would suit me best.  At first, the Arduino approach looked simple because it was a familiar programming style and way less verbose than the C++ constructs of MRAA.  My first approach was to utilize interrupts to watch for changes in state on any of the sensors, but if I recall correctly, it was really only feasible to set up a whole bunch of rising-edge and falling-edge interrupts using gnu C++.  I experimented for a long time just trying to get a simple pin read to work, because it is confusing how the pin numbers are laid out between the GPIO numbering scheme on the board, the Arduino IDE's view of the 20 standard I/O pins, and what the GPIO "files" are named on the file system.

// Arduino | Edison | MRAA
//       0 | 26     | 130
//       1 | 35     | 131
//       2 | 13     | 128
//       3 | 20     | 12
//       4 | 25     | 129
//       5 | 14     | 13
//       6 | 0      | 182
//       7 | 33     | 48
//       8 | 47     | 49
//       9 | ???    | ???
//      10 | 51     | 41
//      11 | 38     | 43
//      12 | 50     | 42
//      13 | 37     | 40
//      14 | 31     | 44
//      15 | 45     | 45
//      16 | 32     | 46
//      17 | 46     | 47
//      18 | 36     | 14
//      19 | 15     | 165
Sheer quackery.

The next big annoyance was that the event loop didn't even work properly when there were rising- or falling-edge interrupts triggered.  The basic premise here is simple; when an interrupt is triggered, raise a flag.  Then, when the event loop runs a condition to check if the flag has been set, run the desired action (e.g. score points, flash an animation, increment the ball counter...) and clear the flag.  By using rising- and falling-edge interrupts, I can monitor for the side of the button press I really care about -- the actuation, rather than the release.  However, by using such interrupts on the Edison, it would for some reason only pick up on the very first pin being monitored -- the left lane rollover switch.  At the time, I was only trying to wire up the three rollover lanes on top, and coded it up to read from these switches in this manner, but I obviously didn't proceed like that with the rest of the switches because functions for each rising edge on each specific I/O pin are not named explicitly in the rest of the code.  Instead, I resorted to pin change interrupts, monitoring all the I/O pins for any change whatsoever.  At least this way, it'll tell me which pin changed as a function argument which can get passed directly into an array, saving me from explicitly naming each pin.  The downside was that I had to get serious about my debouncing code, since interrupts were being triggered on the actuation and the release of the switch, and if you know anything about switches, it's possible there were 2 or 3 such toggle cycles registered by the I/O pin before the ball moved away from the area.

I figured that there's no point in using pin change interrupts; I might as well just read all the switches at once during the event loop, setting all the flags at once before each gets analyzed one at a time (acting accordingly for whichever is pressed).  It's not quite as pretty as using interrupts, but:

  • My early understanding of the disassembled code for Gottlieb's Gold Wings (1986) indicates they only use interrupts for countdown & event timers, and that they read pin statuses at some point in the event loop like this anyway
  • MRAA interrupt frequency is only about 100Hz anyway due to the complexity of what's involved in checking for interrupts on the Edison, so if my event loop runs faster than 100 times per second, I'm able to react faster than the interrupts anyway
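The poll-everything-each-loop approach, with the debouncing it requires, can be sketched roughly as follows (in Python for brevity, though the real game logic is Arduino C++; the 5 ms debounce window is just an assumed value):

```python
DEBOUNCE_MS = 5  # assumed window; tune to the real switch bounce behavior

class Switch:
    """Tracks the last accepted (debounced) state of one playfield switch."""
    def __init__(self):
        self.stable = False        # debounced open/closed state
        self.last_change_ms = 0.0  # when we last accepted a state change

def poll(switches, read_pin, now_ms, on_press):
    """Read every switch each pass through the event loop.  A change is only
    accepted once the debounce window has elapsed, and the action fires on
    the actuation edge only, not the release."""
    for pin, sw in switches.items():
        raw = read_pin(pin)
        if raw != sw.stable and now_ms - sw.last_change_ms >= DEBOUNCE_MS:
            sw.stable = raw
            sw.last_change_ms = now_ms
            if raw:  # rising edge: ball just hit the switch
                on_press(pin)
```

The on_press callback is where the flag gets raised for the event loop to score points, flash an animation, and so on.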

In the table above, you might have noticed those Edison pin numbers, and especially the MRAA pin numbers, get pretty high.  This is because there are a whole bunch of other GPIO pins available on the system to be configured.  I spent a great deal of time, energy, and effort trying to figure out how to tap into all these extra pins, but was ultimately disappointed that all these extra GPIO pins were only there to feed into various multiplexers to change the purpose of the 20 standard Arduino I/O pins.  Because the processor inside the Edison wasn't engineered with exactly the same types of I/O registers as, say, the ATmega328, functionality such as serial UART, PWM, SPI, and even pull-up or pull-down resistors in front of the I/O pins has to be configured externally.  The ATmega chips handle all this internally, but the Intel processor had to externalize it into a ton of extra GPIO pins I thought I could hack to read from more sensors -- but alas, not without compromising functionality I need in order to keep the rest of the system behaving as expected.  To see what all the extra GPIO pins control and where the table above is codified, read this code, this article, and this thorough writeup.

In short, given that:
  • It's unfeasible to access GPIO pins outside of the Arduino realm for your own uses
  • The gnu C++ coding style requires a whole lot more variables to be created, casting to be performed, and just longer lines of code to be written than the Arduino C++ style
  • Despite the documentation here and even from Intel's own site, attempts at making an input pin also utilize an internal pull-up resistor through MRAA code (and possibly the initial line states if I recall, for that matter, meaning solenoids might randomly fire upon starting the system) never seemed to work, leading me to have to solder on my own bank of resistors to the board by hand and possibly compromise electrical reliability of the system
  • Evidently I was trying to do something with timer interrupts or just pure waiting around for some amount of time that didn't work in gnu C++ either, whereas in Arduino I could use a very simple delay() function
I ended up porting my pinball code back to Arduino C++ after doing all this work in gnu C++.

Then came the next pitfall: Edison Arduino C++ code can't send serial data, despite the best advice from here and here.  As I was using BriteBlox LED displays as my DMD of choice (also not a great idea for quality purposes, as they tend to flake out at times, probably due to voltage fluctuations in the presence of unstable power, which is largely but not 100% helped by attaching a huge capacitor between power & ground), they must be driven by serial signals in order to show anything meaningful.  I already had lots of experience writing Arduino serial routines to deal with BriteBlox as that's their native environment, but the Arduino implementation of Serial.write() on Edison just wasn't having anything to do with me.  This means I had to go back to gnu C++ once again (just for the graphics & serial routines), write a routine in there to parse the .BMP graphics files I utilized for DMD artwork, and then promptly send this over serial.  I ended up finding a way from within Arduino C++ to execute binaries with arguments, so each time I needed something put on the DMD (whether it's graphics or just a simple score change), I'd use something akin to this, explained here:

String hi = "/home/root/dmd score ";
hi.concat(score);    // append the current score value (hypothetical variable)
hi.concat(" &");     // trailing "&" so the DMD binary runs in the background
system(hi.c_str());  // hand the assembled command off to the shell
Updating the score on Wylie 1-Flip.

Where do we go from here?

The next endeavor would likely have been to launch the Wylie 1-Flip game software upon powering up the Edison.  (Right now, you have to reflash the Arduino side of the processor with the program in order for it to start.)  However, considering that:
  • Intel Edison is discontinued
  • There are still electrical gremlins in the system causing random switches to appear toggled when nothing in the game is happening, meaning the pop bumper constantly goes off, the score & flipper changes at will, and the ball in play counter moves up on its own until your game is terminated
I'm keen on switching this project to the Android Things framework and hope that it'll bring about a less buggy, more electrically isolated hardware platform where I can write all my code in one place without so many confusing or deceiving constructs.

Nevertheless, here's what I have so far:

Unfortunately, based on the few times I've gotten to play the game thus far, it doesn't really seem all that fun anyway.  There are still some issues with the ball getting stuck and the shooter lane not working well that really hamper it (not to mention by far the most annoying electrical issues mentioned earlier), but maybe once I solve those issues, it would actually be something I would play.  As you can see, the legs are built in a special way so that the machine can really be expertly nudged, because while play-testing it in Visual Pinball, the game was much more fun if you pushed on the cabinet.

That's already a lot of hand-cut wiring, and there's probably still a ways to go! (At least judging by how the leg plates hadn't been put on yet, so there was probably still a lot being worked on)

I don't anticipate you'll be seeing a Part 3 of this series anytime soon -- maybe after TPF 2018 in March at the earliest, if I manage to switch successfully to Android Things and happen to solve problems in a noteworthy fashion.

Epilogue - And what of Ghostbusters or The Big Lebowski?

As for those two pins mentioned at the top of this article, neither has fared well.  Dutch Pinball has been facing many difficulties shipping TBL to those who pre-ordered it, despite the passage of many years since the initial hype.  Meanwhile, the value of Ghostbusters and many other games designed by John Trudeau has taken a hit (if only temporarily) since he was arrested for possessing child pornography outside Chicago in August, just as Hurricane Harvey was rolling into the Texas coast.  If anyone needs to just drop their Ghostbusters LE quickly, you know how to get a hold of me... ;) Sorry, I managed to find a Pro edition for cheap, and it's holding me over just fine.

Thursday, September 7, 2017

A "Baby Tornado" to aid in Python server development


Since my last post, I've been highly focused on Tensorflow projects at home and at work.  In the process of running Tensorflow behind an API, I've needed to make code changes to the "secret sauce" (business logic) that stands in front of Tensorflow and actually provides it with its data.  This could be a pipeline of multiple Tensorflow models chained together, image manipulation, handling the data the model outputs, or any number of other things.  Unfortunately, constantly restarting the whole server (including reinitializing Tensorflow for 20 or 30 seconds) is slow and wastes a bunch of time, especially when all you did was make a typo or use the wrong variable name.

Besides the Tensorflow work, I've been involved in many blog-worthy pursuits since my last post but simply haven't had time to write about them.  (In fact, I meant to write this last week, but forgot.)  Anyway, at the end of June, right before my previous post, I began running biweekly meetups called "Tensorflow Tuesday Office Hours."  Here, interested people get together at various locations around town to talk about Tensorflow and get their questions addressed, whether about installation, scaling it up, the underlying math, or picking a model.  In the process of helping people install it, I decided it'd be worthwhile to try the mainline Tensorflow image that includes the Jupyter notebook, rather than the "devel" version that offers command-line access only but includes more of the Tensorflow GitHub repo in its image.  It had been many years since I'd used Jupyter, and I had forgotten its benefits; for far too long, I fought the Python shell to enter long functions and make tweaks to specific lines in them.  With Jupyter, of course, you just click on what you want to tweak, then rerun that code snippet.  (2013 called and congratulated me on this rediscovery. :-P)

It didn't take me long to realize I could use a Jupyter notebook to run a Python server and change the route functions that the server calls when a request is made to a particular endpoint.  This would allow me to make small tweaks to the business logic (for the sake of testing accuracy or performance, or simply fixing typos) without having to wait on Tensorslow [sic] to restart.


The original application I was going to test this with was written using a Flask server.  Flask is a popular choice for quick proofs of concept written in Python, but it has many downsides that make it unsuitable for production.  And, as much as I tried to change the underlying route function that Flask would call, the Flask process would simply take over the entire Jupyter notebook: no other code snippets could be run once a Flask server was started.  Maybe further research would uncover why, or how to get around it, but since the app was being ported to Tornado anyway, I put the Flask research to bed and attempted this with Tornado.  To make a long story short, I got it working, and I can now change the functions that Tornado runs whenever it serves a route.

Where does the code live?

Check out my Jupyter notebook on GitHub here:

In this notebook, simply run In [1], In [2], and In [3].  Each time you want to change what a particular API endpoint and request type does, just edit the code in In [2] and rerun it.  Call your endpoint again and observe the change!

As far as Tensorflow is concerned, you could initialize it in the notebook in stage 1, load the model in stage 3, and then not have to worry about those steps ever again -- just change your business logic in stage 2.  Enjoy!
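The trick that makes this work is a level of indirection: the route handler looks up its business-logic function at request time rather than holding onto a reference captured when the server started, so rerunning In [2] swaps the behavior live.  Here's a minimal, Tornado-free sketch of that pattern (all names are made up for illustration):

```python
# Route table: endpoint path -> business-logic function.  A Tornado
# RequestHandler would call routes[path](payload) inside get()/post(),
# looking the function up at request time, not at class-definition time.
routes = {}

def register(path):
    """Decorator that (re)binds a function to an endpoint path."""
    def wrapper(fn):
        routes[path] = fn
        return fn
    return wrapper

@register("/classify")
def classify(payload):
    return {"label": "cat", "confidence": 0.9}   # stand-in business logic

def handle_request(path, payload):
    # Late-bound lookup: whatever is in the table *now* gets called.
    return routes[path](payload)

print(handle_request("/classify", {}))

@register("/classify")        # simulate editing and rerunning In [2]
def classify(payload):
    return {"label": "dog", "confidence": 0.7}

print(handle_request("/classify", {}))           # new behavior, no restart
```

In the real notebook, the body of handle_request is what a Tornado RequestHandler method would do; redefining the registered function in a later cell changes what the running server serves.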

Thursday, June 22, 2017

My Tensorflow Project Isn't Saving the World

Among all the hype around the latest and greatest technologies, there is so much publicity devoted toward how they are being used in grand schemes to cure cancer, reduce energy waste, conserve water, solve poverty, and so forth.  While all these things are wonderful to humanity, there has to be someone left in the background who helps all the do-gooders unwind when it's time to take a break!

The TL/DR Version: Get To the Point!

Use clever arguments when launching your Docker container so you don't have to shut it down and restart it later when you want to mount external directories from the host filesystem or expose the port for the Tensorboard server.  There is also nvidia-docker available if you want to use your CUDA cores.

sudo nvidia-docker run -it -p 6006:6006 -v ~/Pictures/video-game-training/:/video-game-training tensorflow/tensorflow:latest-devel-gpu bash

Use the --output_user_root option in your Bazel builds so the build output goes into that external host directory you mounted earlier.  This way, when you have to shut down your Docker instance, your Bazel build will still be there (though you will have to recreate some symlinks in the Bazel project directory).

bazel --output_user_root=/video-game-training/bazel-build build tensorflow/examples/image_retraining:retrain

Don't forget to store your image category directories within a "training image root" directory at the same level as the bazel-build directory, or else Bazel might try to train on its own model files.

Also, don't forget that if you export the trained model somewhere outside /tmp and then iterate on the model, you must pass the path to the correct model to the classification step.  Otherwise, you might classify with the wrong model, which could lead to confusion and frustration.

Use my fork of the Imker repo (maybe someday I'll make a pull request to put it in the mainstream code) if you want to download only a portion of the images in a particular category from any Wiki site such as Wikimedia Commons.  This could be built upon so you can segregate training and test data.

Just Use the Devel Docker Image; CUDA Optional

Setting aside my original plans for Tensorflow, it struck me one night to build a classifier that could recognize different game cartridges for the Nintendo Entertainment System (NES).  I had a lot of pre-work to do, since it had been a long time since my system had been updated with the latest supporting packages.  However, it all ended up being for naught: I found the "virtualenv" approach to installing Tensorflow so fraught with tedium that I went with the simple Docker approach instead.  This is the Tensorflow installation approach I've been recommending since November, and it still seems worth sticking to.

I have a pretty old nVidia graphics card (a GeForce GTX 650 Ti) in my (mostly even older) desktop running Linux (and Windows at times, mostly during tax season).  It still supports nVidia Compute Capability 3.0, which is just barely enough for the machine learning I want to do, playing with the blockchain, and so forth.  To make Tensorflow performant inside Docker, a special add-on called nvidia-docker grants access to your CUDA cores from inside the container, so I can still get blazing-fast performance from my own hardware without needing to install everything in my primary environment (which is evidently too jacked up to support a Tensorflow installation).  Docker is great for providing a uniform, trouble-free experience when running apps anyway, because it provides an isolated environment not subject to your system's specific configuration.  However, the version of Docker originally on my system was so old that the required libraries for nvidia-docker were not present; luckily, the upgrade path was simple thanks to their clear instructions.

In fact, thanks in part to my earlier pre-work and to the many good Internet guides already on this topic, getting Tensorflow working on my desktop in this manner went smoothly, save for some early trial and error and, of course, the usual long waits for compilations to finish.  As I've often said, just use Docker.

Once you have Docker and nvidia-docker installed, here is the best way to run the Tensorflow image.  Note that if you don't have the image already, Docker will automatically download it:

sudo nvidia-docker run -it -p 6006:6006 -v ~/Pictures/video-game-training/:/video-game-training tensorflow/tensorflow:latest-devel-gpu bash

Let's break this down:

  • There's a way to avoid running docker with sudo, but it hides any semblance of auditability or traceability for when users go beyond their expected behaviors and start to get mischievous.
  • nvidia-docker is the binary that supports Docker instances accessing CUDA cores.
  • run tells Docker to launch the specified image in its own isolated environment, with its own filesystem and process tree.
  • -it (or -i -t) tells Docker, first, to run the container in interactive mode, leaving stdin (standard input) open even if nothing is attached; second, to allocate a pseudo-TTY so the user can actually send input to the container.
  • -p 6006:6006 exposes the Tensorboard port inside the container to the host.  When you start the server, you can access it through localhost:6006 on a browser on your host machine.  Tensorboard is a great way to visualize what is going on inside your training algorithm from the model construction and details standpoint, plus illustrate simple representations of how the data exists in the classification space (as simple as you can make it in as few dimensions as we humans can easily perceive).
  • The -v option mounts a directory (not an entire filesystem; there's a different way to do that) from your native filesystem into the Docker container as it runs.  In this case, I wanted to expose the video-game-training directory from my user account's Pictures folder as /video-game-training inside the Docker instance so that the algorithm would have access to all my training data.
  • tensorflow/tensorflow:latest-devel-gpu is the Docker image name -- here, the GPU-enabled "devel" image, which includes the Tensorflow source tree needed for the Bazel builds below.
  • bash is the command to run on the Docker image once it starts.  You can run any executable you want, but it is easiest to run a terminal instance.

First Crack At Building a Classifier: Aligning Pictures And Commands

For object classifiers, good training data comes from as many images of the subject material as you can get.  To support this, I took videos of various NES game cartridges while moving the camera around so as to film each one from various angles.  Depending on the lighting, the sun or lights would also reflect back into the camera and cause slight imperfections on the label.  I labored for quite a while in the hot Texas sun taking videos of these games with different backgrounds behind the cartridges so that the classifier would learn to focus on what is important.

Once my environment was all set up and ready to go, I ran this Tensorflow example pretty much verbatim.  It took approximately 24 minutes to run the first step, which sets up the Bazel build for the training task.  However, as my Docker instance did not have any training data loaded into it, I had to exit it in order to add the volume mount described above.  Unfortunately, upon logging back into my Docker container, all this pre-work had been wiped out, since it had all been built in a temporary .cache directory under root's home.  And, to add insult to injury, running that Bazel setup command the second time took more than twice as long -- clocking in at just short of 50 minutes!

Lesson Learned

One easy way to avoid losing your entire Bazel build when Docker refreshes the filesystem from scratch is to point Bazel's --output_user_root option at the host directory you mounted inside Docker.  In my case, this meant specifying the following setting for my build:

bazel --output_user_root=/video-game-training/bazel-build build tensorflow/examples/image_retraining:retrain

Continuing With Trying To Break Bazel And My Docker Instance

Now, this meant I had to put my training examples one level deeper in this directory; otherwise, the next step might try to train on whatever output is in the Bazel build directory itself.  After running the Bazel build, I exited my Docker instance to see what would happen.  When I reopened it, I found that the symlinks in the /tensorflow folder had been changed to point to /root/.cache/bazel, which did not exist (and never had, because I made the build in another folder).  It took a bit of manual tedium to point the symlinks back to the right place, but upon doing so, the bazel-bin "retrain" command specified in the Google example to actually perform training worked without a hitch.  With everything in place, this command took less than 15 minutes to perform 4,000 training steps on my approximately 800 pictures of each of the MegaMan and MegaMan 2 cartridges.  The exact syntax looks like this:

bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir /video-game-training/pictures

This step produces two files in the /tmp/ directory: output_graph.pb and output_labels.txt (/tmp/retrain_logs/ is also important if you want to look at TensorBoard at any point).  I moved these files into a model/ directory inside the directory exposed to Docker from my host system.
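For context, the classification step later on just runs this graph and pairs its softmax scores with the lines of output_labels.txt.  A tiny sketch of that final pairing step (the label order and score values here are assumptions, borrowed from the trial results near the end of this post):

```python
def top_labels(scores, labels):
    # Rank (label, score) pairs the way label_image prints them:
    # highest confidence first, formatted as a percentage.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, f"{score:.1%}") for label, score in ranked]

labels = ["megaman1", "megaman2", "not-games"]   # lines of output_labels.txt (order assumed)
scores = [0.040, 0.568, 0.392]                   # softmax output for a Mega Man 2 photo

for label, pct in top_labels(scores, labels):
    print(label, pct)
```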

For classification, I used the same strategy, passing the --output_user_root option to the bazel build for the "label_image" step (ignoring the conjoined bazel-bin step for the time being, thus stopping short of actually classifying an image).  This Bazel build took about 20 minutes:

bazel --output_user_root=/video-game-training/bazel-build build tensorflow/examples/label_image:label_image

Once this step was complete, I exited and re-entered Docker once again, and my symlinks had been similarly screwed up.  Upon restoring them (like last time), I found a picture of the MegaMan 2 cartridge from out on the Internet, and ran it through the classifier in this manner:

bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/video-game-training/model/output_graph.pb \
--labels=/video-game-training/model/output_labels.txt \
--image=/video-game-training/megaman2-ex-01.jpg \
--output_layer=final_result
And voila, a reproducible classification each time, without having to leave my Docker instance open, simply by reconstructing those symlinks!  (That part could easily be scripted in a batch file, in fact.)
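A sketch of what such a script might look like in Python: re-point a stale bazel-* convenience symlink at the build tree that survived in the mounted volume.  The link names and target paths below are assumptions; check ls -l from a session where the links were still valid and adjust accordingly.

```python
import os

def repoint(link_path, target):
    # Replace a stale symlink (e.g. /tensorflow/bazel-bin pointing into
    # the wiped-out /root/.cache/bazel) with one aimed at the surviving
    # build under the mounted --output_user_root directory.
    if os.path.islink(link_path) or os.path.exists(link_path):
        os.unlink(link_path)
    os.symlink(target, link_path)

# Example (paths hypothetical -- inspect your own bazel-* links):
# repoint("/tensorflow/bazel-bin",
#         "/video-game-training/bazel-build/<hash>/execroot/.../bin")
```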

Note: Without that last line in the classification command, you will probably stumble into an error saying "Running model failed: Not found: FeedInputs: unable to find feed output input".   As it turns out, Google's example command is a little bit deficient, but fortunately some forum posts succinctly clarified the issue and offered the solution.

Because the Whole World Isn't Video Game Artwork

My training data consisted only of pictures of the label up-close, mostly ignoring the rest of the cartridge.  However, my first classification picture was in fact of the entire cartridge.  I was astounded at the results: even considering this difference, the algorithm was 96% certain that my picture of the MegaMan 2 cartridge was in fact MegaMan 2; the remaining 4% was its (very weak) confidence that it was the original MegaMan cartridge.  Now, having spent most of my professional career up until now as a tester, I immediately wanted to see how it would perform on junk input.  I fed it an old picture of one of my pinball machines (Gold Wings, which has no relation whatsoever to MegaMan), and the algorithm was 86% confident that what I had just shown it was in fact MegaMan, and only 14% confident that it was MegaMan 2.  This amused me: in the algorithm's limited worldview, having only ever been trained on examples of MegaMan and MegaMan 2, it was in no position to say with any authority that anything was in fact neither!

Wikimedia Commons appealed to me as a good place to get quality public-domain photos to use as "negative" training examples (though I suppose I could have used private images with rights held by their authors; since the data is buried deep within a machine learning model, you would never be the wiser!).  The only downside is that the site offers only 200 photos at a time for a given category, and it would be a huge waste to sit there, expand each one, and manually click Save.  Fortunately, Wikimedia Commons supports API calls that let you download all the media in a given category.  Better yet, there is already a Java program called Imker that wraps those API calls in a CLI and a GUI.
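For the curious, the underlying call is MediaWiki's list=categorymembers query.  Here's a small sketch of how a downloader might build one page of that request (illustrative Python, not Imker's actual Java code):

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def category_files_url(category, limit=500, cmcontinue=None):
    # One page of file titles in a category; follow the 'continue'
    # token from each JSON response to walk the whole category.
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "file",          # files only, no subcategories
        "cmlimit": limit,
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return API + "?" + urlencode(params)

print(category_files_url("PD-user"))
```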

The only problem with Imker is that its current UI only lets you download every single file in a given category, not a randomly-selected fraction of them.  Nevertheless, Imker is open source, so I forked the Git repo and began hacking at the Java code so that I could download just 10,000 of the 272,812 images currently in the "PD-user" category on Wikimedia Commons.  After sorting out a lingering issue, and waiting a few hours (thanks in large part to my crude rate limiter), I had 10,000 images from A-Z -- not to mention A-Z in other languages -- consisting of roughly 75% JPEGs, 18% PNGs, 5% SVGs, 1% GIFs (some animated), and a few TIFFs thrown in for good measure.  Not only that, but the images consist of maps, diagrams of all sorts of things in many different languages, road signs, cars, street scenes, landmarks, molecular diagrams, and all sorts of other random stuff only a small percentage of the population could possibly care about. :-P
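The change to Imker boiled down to sampling before downloading.  Here's the idea transplanted into Python (Imker itself is Java, and every name here is made up), crude fixed-sleep rate limiter included:

```python
import random
import time

def sample_for_download(filenames, k=10000, delay=0.0, seed=42):
    # Take a reproducible random subset of the category's files instead
    # of all of them, yielding each name after a fixed polite pause.
    rng = random.Random(seed)
    for name in rng.sample(filenames, min(k, len(filenames))):
        time.sleep(delay)     # crude rate limiter: constant sleep per file
        yield name

# Example: pick 3 of 10 files, no delay.
files = [f"File:Example_{i}.jpg" for i in range(10)]
print(list(sample_for_download(files, k=3)))
```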

The beautiful part about using the pre-trained, robust Inception model is that you don't have to worry about scaling your input data to a particular size.  I was able to use these images exactly as they came, and I only had trouble with two images that apparently contained bad data and failed to download properly (had Imker not stopped due to some exceptions regarding unhealthy API responses, this might have been avoided).  It even dealt with all these file formats adeptly.

Important Note: One thing that stumped me -- my model was only showing "megaman1" and "megaman2" even after I had trained with "not-games" -- was that I was passing an old copy of the model in my classification argument.  Make sure you set the correct path to your model!

In any event, the Tensorflow model retrained to distinguish between Mega Man 1, Mega Man 2, and "Not a game" performed successfully in my two trials thus far.

                   Trained on MM1 or MM2          Trained on MM1, MM2, or Not a game
Confidence in:     Mega Man 2   Pinball machine   Mega Man 2   Pinball machine
Mega Man 1             3.9%          86%              4.0%          9.3%
Mega Man 2            96.1%          14%             56.8%          1.6%
Not a Game              N/A           N/A            39.2%         89.1%