Thursday, January 18, 2018

A Cold Wind From the USSR - Part 2, Keyboard Emulator for a PDP-11

To read Part 1 of this series, click here.

The Russian Elektronika DVK-3 PDP-11 clone is a fascinating computer to me, and I want to share it with the world.  Part of that endeavor involves allowing people to actually interact with it and play games on it remotely, not just watch videos of me using it.  To make that happen, I need a way for remote users to provide keyboard input to the system.  I could build a ridiculous robotic contraption to hit the keys on behalf of remote users, but that would likely introduce a great deal of latency into the system and would not be good for those critically-timed Tetris block rotations.  The best bet is to spoof the keyboard by basically building a whole new one, except instead of keys, an Internet connection and a microcontroller generate the scan codes.

The Elektronika MS7004 keyboard is based on the DEC LK201 keyboard interface, standard for PDP-11 minicomputers.  It utilizes the RS423 serial communication standard at 4800 baud.  I (painstakingly) (well, really Stacy did it after I gave up in an epic fit of rage) fabricated a 5-pin DIN connection "tapping" system that allowed me to insert a breadboard in between the keyboard and computer so that I could measure the signals generated by the keyboard, and eventually drive signals with a different device.  First, the DVK-3 provides +12V and -9V on two pins of the DIN connector.  Another two pins are used for communication from the computer to the keyboard, and vice versa.  These pins do not operate at TTL levels; instead, RS423 defines a "High" on these pins as anywhere between +4 and +6V, and a "Low" as between -4 and -6V.  Finally, the fifth pin in the DIN connection is Ground.  It seems silly to me that they would not use the outer ring of the DIN connection as Ground and just go with a 4-pin DIN connection.  However, I was not in the room, nor even born yet, when they designed it originally, nor have I even been to that side of the planet to this day, so there is no way I could have suggested it to them.

Nevertheless, these odd non-TTL voltages mean that I can't just drive the scan codes straight from a microcontroller.  I'll need to put some other devices in front of the computer's DIN input so that it will recognize the scan codes correctly.


Approach #1 - Two Voltage Dividers & NPN Transistor


My first thought on creating the correct RS423 voltages for the keyboard was to bring +12V down to +6V and -9V up to -6V with voltage dividers.  To divide a voltage in half, you simply need two resistors of the same value in series, with one end of the series hooked up to +12V and the other to GND; you then tap your +6V from the point where the two resistors connect.  To cut a voltage by one-third (leaving two-thirds of it), resistor 1 (from -9V) must have half the resistance of resistor 2 (to ground).
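To sanity-check those ratios, the unloaded divider equation is easy to play with.  This is just a back-of-the-envelope sketch; the resistor values below are examples, not the ones from my final circuit:

def divider_output(v_in, r1, r2):
    # Unloaded two-resistor divider: r1 runs from the source (v_in) to the tap,
    # r2 runs from the tap to ground, so V_out = V_in * R2 / (R1 + R2).
    return v_in * r2 / (r1 + r2)

# Halving +12V with two equal resistors:
print(divider_output(12.0, 10000, 10000))   # 6.0

# Cutting -9V by one-third (R1 is half of R2, leaving two-thirds of the input):
print(divider_output(-9.0, 5000, 10000))    # -6.0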

LTSPICE circuit simulation showing the original input voltage reduced to two-thirds of its value.

My original idea was to use an Arduino to drive the scan codes and to power the Arduino from the divided -6V output.  I would wire -6V to the Arduino's ground, and the computer's ground to the Arduino's VIN.  From there, I could use an NPN transistor with its emitter hooked up to the same -6V to expose the scan code pin to either -6V or the divided +6V depending on its activation.  The circuit worked fine in theory, but I ran into some practical hurdles.  The original resistors I picked for the voltage divider allowed for just a small amount of current at those voltages, and with the demands of the Arduino, my voltage dividers were soon reading nowhere near their intended values.  I ended up with just over a volt showing up across the +5V and GND pins on the Arduino, not even enough to power it on.

My next thought was to lower the resistance values on the voltage dividers.  I thought about Ohm's Law and how much power would be dissipated at the higher current, grabbed my 1/2-watt resistors, and put the 1/4-watt ones away.  This time, I went with an order of magnitude less resistance, but the Arduino still only showed a little over 2 volts.  Using the unregulated power pins didn't seem to help the situation either.  You might be wondering "Why not just go with really low-value, high-wattage resistors?"  Well, the electronics store was closed, plus there's no reason I should ever use anything more than a 1/2-watt resistor unless someone is paying me to do something specific.

Finally, I decided to hook up the Arduino not to the divided -6V and GND from the computer, but to the divided +6V and the divided -6V, so that hopefully the regulator would have a better shot at regulating down to +5V.  However, this still caused the Arduino to read just 3.7V or so across the 5V pin, and it smashed my high-side divider (which was supposed to cut +12 to +6) down to just 0.8V.  Clearly, even if I could get the Arduino working, it wouldn't generate the needed voltage for the scan code!  And besides all this, the NPN transistor was somehow transferring its +0.7V base bias to somewhere else in the circuit, as I could see values differ by 0.7V when it was installed versus not.

(Mind you, I'm testing all this using a proper bench power supply, not from the computer itself at this point, so there is no risk of damaging the Soviet goods.)

Approach #2 - Two 7805 Voltage Regulators plus NPN BJT or N-channel MOSFET


The 7805 voltage regulator is a TO-220-packaged device that takes an input voltage and a ground reference and can supply a fair amount of current at +5V above ground.  No more messing around with specific currents coming from a voltage divider; I wanted the voltage regulator to handle everything for me.  The idea here was to use one regulator to bring +12 down to +5, and another to bring -9 (treated as the ground input into that 7805) up to -4V.  I was still trying to drive the logic from the low side (-4V) so I could use an NPN transistor (or even a MOSFET, to properly switch on voltage at the gate rather than current at the base).  (Also note that my bench can't supply -9V; it only supplies -12V, so really my regulated line is sitting at -7V for testing.)

Polluting the Groundwater


With all this wired up, the voltage regulators would show a spread of about 12V between their output pins.  However, when I once again inserted the Arduino on the regulated low side between -7V and GND, the circuit wasn't able to provide enough juice to it.  So I had the Arduino regulate itself from both regulated lines at +5/-7, and once I wired it up, what would normally show a 12V spread was now showing something more like a 9V spread.  At least I managed to get the circuit to power up.  However, the switching side of my NPN transistor (and even of a 2N7000 N-channel MOSFET) would only swing between +5V and roughly -0.5V relative to computer ground, still not enough to show the computer a proper Low signal for the keyboard scan codes.

Something must be polluting the groundwater, as the voltage between the low-side voltage regulator and the computer ground seems to vary slightly, and is much higher than it should be.  I can imagine, since the 7805 consists of various transistors and diodes, that it must not like providing an output voltage to something that is actually using it as a ground rather than as the high-voltage source.  I might have to go back to a voltage divider once again...

Approach #3 - One 7805, One PNP, and Something for the Negative Side


I thought to myself that it must be necessary to regulate the logic from the high side rather than the low side, since the 7805 probably doesn't like it when its output is used as a ground rather than as the high-side input.  I decided to use the 7805 on the +12V side to both regulate the High RS423 signal and provide +5V to the Arduino.  This means that when the Arduino outputs +5V on its serial line, the serial line can be driven up to the same value.  However, when it outputs 0V, the serial line will need to be driven down to -5V, which could be provided by the low-side 7805 or by a plain voltage divider once again.  I needed to find a way to make the Arduino pin behave as if it would output either +5V or high Z, since with high Z, I could use a pull-down resistor to bring it to -5V.

As it turns out, for a PNP transistor, the emitter must sit at a higher voltage than the collector, so the emitter is hooked up to the voltage source rather than to ground as it would be with an NPN transistor.  The base therefore operates at or below the emitter voltage, and once you pull the base at least 0.7V below the emitter, current starts flowing through the base and the output of interest rises to the 5V provided by the regulator.  However, when the Arduino pushes 5V to the PNP's base, no current flows through the transistor since the base sits at the same voltage as the emitter, so the output of interest drops to whatever the negative-side voltage regulator or divider is providing, nominally -5V.

Here, it is shown that as the base (in blue) swings from 5V to 0V, the output of interest at the PNP's collector (in green) will swing from -5V to 5V.  The red line indicates the current through the base.

And if I need the RS423 signal polarity reversed, I can invert the Arduino's serial line with another transistor or an inverter IC.  Luckily, the signal did not need to be inverted; the output of the PNP transistor is normally low and pulses high, which is the same type of signal produced by the MS7004 keyboard.


Ironing Out Last-Minute Details


In practice, the -9V line was showing -8V with the LM7805 voltage regulator sitting on it, and since the regulator treats 0V as its input and -8V as its ground, the regulated output was only sitting at -3V relative to computer ground.  This would not be enough to drive the RS423 signal.  Rather than grabbing a variable voltage regulator, I decided to simply build a plain voltage divider and connect its output to the collector of the PNP transistor.  At first, I used a 4700-ohm resistor and a 6800-ohm resistor for the voltage divider, but the high-side input seemed to be getting squished.  I found that a 330-ohm and a 470-ohm resistor did the trick, but then realized I had originally taken the measurements from the wrong spot.  With the original resistors and the oscilloscope probing the correct location, I found that both of these resistor configurations are suitable for the job.  (As for which one keeps -9V more true to -9V... I'm not sure, but I'd wager the higher-value resistors would do the job better.)
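Running the same unloaded-divider arithmetic over both resistor pairs (assuming the smaller resistor sits on the -9V side in each case, and ignoring the loading from the PNP's collector) shows why either pair lands close to the -5V target:

def divider_output(v_in, r1, r2):
    # r1 from the -9V line to the tap, r2 from the tap to computer ground.
    return v_in * r2 / (r1 + r2)

print(divider_output(-9.0, 4700, 6800))  # roughly -5.3V
print(divider_output(-9.0, 330, 470))    # also roughly -5.3V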

Before wiring up my Arduino, I needed it to send the correct values to the system in order for any key presses to be registered.  There are two things to consider here.  First, the keyboard introduces itself to the computer (and indicates the absence of errors) by sending the bytes 0x01 0x00 0x00 0x00 all at once following power-up.  Second, the ODT (Octal Debugging Technique) console of the PDP-11 only accepts a few keys from the keyboard, such as the numbers 0-7, the letter B (followed by 3 letters indicating a device driver location to load), and the letter R (indicating a CPU register number).  To avoid pulling my hair out debugging something that's not a problem, I referred to a graphic of scan codes for the LK201 PDP-11 keyboard interface and programmed the Arduino to spit out the number 4 (0xD0) once every second following the 4-byte handshake explained earlier, which itself is sent one second after power-up.
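The real emulator firmware lives on the Arduino, but the byte sequence and timing are simple enough to sketch out.  Purely as an illustration of the behavior described above (and assuming a hypothetical PC-side serial adapter on /dev/ttyUSB0 feeding the level-shifting circuit, with 8-N-1 framing), the equivalent logic in Python with pyserial would look roughly like this:

import time
import serial  # pyserial

# The MS7004/LK201 interface runs at 4800 baud.
port = serial.Serial("/dev/ttyUSB0", baudrate=4800)

time.sleep(1.0)                              # wait one second after power-up
port.write(bytes([0x01, 0x00, 0x00, 0x00]))  # "keyboard present, no errors" handshake

while True:
    time.sleep(1.0)
    port.write(bytes([0xD0]))                # LK201 scan code for the "4" key, once per second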


The schematic of my final keyboard emulator design.  Note the red stylized "LT" letters are a trademark of Linear Technology, just in case you are not familiar.  The LM7805 is a common part.

It still took a few power cycles of the computer (and double-checking that the circuit was actually putting out RS423-compatible voltages) before I finally saw the desired output, but eventually a line of "4"s grew across the screen, one per second, corresponding to the serial pulse waveform being monitored on the oscilloscope.


The March of the Fours begins...

The Arduino and circuit involved to make keyboard emulation happen

The waveform of the number 4 as an LK201 scan code.  I just noticed that the low side only goes down to about -2V according to the oscilloscope, rather than the -4 to -6V expected.  Oh well, it worked regardless.

Thursday, January 11, 2018

Tensorflow, from Scratch to the Cloud

Having used plenty of pre-built machine learning models via Tensorflow and the GCP APIs, and having gone through the pains of setting up Tensorflow on plain vanilla Amazon EC2 AMIs (not even the pre-configured ones with all the goodies installed already) and getting it to run classifications on the GPU and on Tensorflow Serving, I thought it was high time I tried coding my own machine learning model in Tensorflow.

Of course, the thing most folks aspire to do with Tensorflow when starting out is to build a neural network.  I wanted to base my first neural network on the MNIST examples just to get my feet wet, but use a dataset other than MNIST.  There are many datasets on Kaggle to choose from that could fit the bill, but I decided to use one of my own from a while back.  It consists of skin tones found in pictures, cropped carefully and aggregated into a dozen BMP files.  Don't question where I got these skin tone pixels from, but rest assured that a wide variety of skin colors were covered, captured by mostly nice cameras under ideal lighting conditions.  To be honest, it's a little bit less interesting than MNIST, because instead of 10 distinct classes, there are only two: skin and not-skin.

Performance from All Angles


Previous research shows that converting the pixels from the RGB space into the HSV space leads to better neural network performance.  Luckily, this is easy to do using the Python Imaging Library.  Separately, the performance of neural net computation (i.e. the speed of the training phase) has improved dramatically over the years.  Just seven years ago, I made a neural net in Weka to train on skin tones.
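For what it's worth, the RGB-to-HSV conversion (and the 0-99 quantization I mention below) only takes a few lines with PIL.  A minimal sketch, where skin.bmp stands in for one of my aggregated bitmap files:

from PIL import Image

# Convert one of the aggregated skin-tone bitmaps from RGB to HSV.
# PIL's HSV mode stores each channel as an integer from 0 to 255.
im = Image.open("skin.bmp").convert("HSV")

# Quantize each channel down to the range 0-99, for at most 100^3 = 1,000,000 buckets.
pixels = [tuple(channel * 100 // 256 for channel in px) for px in im.getdata()]

print(pixels[:5])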

Back in late 2010, it was fancy if you had a system with, say, 200 CUDA cores.  Back then, my Lenovo W510 laptop shipped with 48 CUDA cores.  Now, I have a roughly 18-month-old Aorus X3 laptop that boasts 1280 CUDA cores.  Note that the Tensorflow neural net has orders of magnitude more nodes than that old Weka net, and that it also considers all 3 values in the HSV space (quantized 0-99, thus 1 million possible values), not just the best 2 out of 3 as in the previous research (quantized 0-255, thus 65,536 possible values).  The performance comparison came out as follows:

Model Type             Time
Tensorflow (GPU 256)   90.9
Weka (CPU 5)           17
Weka (CPU 256)         750


Building the Network


Ultimately, the trickiest part of building the network was formatting the data in a way that would be meaningful to train on, and setting up the data types of the training data and variables so that everything matched up and kept the training functions happy.

Along the way, I ran into four pitfalls: one operational, one in code, one with clashing resources, and one out of stupidity or blindness:

  1. Can’t pass in a tensor of (1000000, ?) into a variable expecting (1000000, ?).  This one eluded me for the longest time because I was positive the correct tensor was being fed into the training function.  I finally saw the light when I decided to split up the training data so that training is not run on the entire dataset at once.  In doing so, I decided to have it train on 2,000 examples at a time, rather than the whole million.  Then the error message changed, and it immediately became apparent that the error was being caused by the computation that counts how many HSV values have [0, 1, 2, …] appearances in the dataset, not by the neural network itself.  How frustrating to waste so much time digging into an issue ultimately caused by code that was merely calculating metrics I had already looked at earlier and moved on from.
  2. Can’t allocate 0B of memory to the cross_entropy or optimizer or whatever it was.  My Tensorflow is only compiled to run on the GPU, I’m pretty sure, and doesn’t handle CPU computations.  I had a separate terminal window open with the Python <shell> to quickly run some experiments before I put them in the real training code.  However, having this open seemed to tie up resources needed by the program, so closing the extra terminal eliminated my memory allocation error.
  3. Uninitialized variable <something> for the computations of precision & accuracy.  For this, I had to run init_local_variables right before <either defining them in the graph or running them>.  It was not good enough to run this right after the line “with tf.Session() as sess” (there is a short sketch of this pattern after this list).
  4. Weird classifications.  I was expecting a binary classification to behave like a logistic regression and simply output “0” if it’s not skin and “1” if it’s skin.  Unfortunately, the neural network tended to return results more like a linear regression, and was giving me nonsensical values like 1.335 and -40.8987 for the single class.  I ended up changing the output layer of the neural network to reflect what I had originally, which called for two classes of “skin” and “not-skin”.  When I made this change, it was possible that the calculated values when evaluating any given valid HSV value could still be totally out of line with the “0” and “1” I would expect (and also not a cumulative probability adding up to 1), but at least by taking np.argmax() of the output layer, I can turn it into the outcome I was expecting.  And, it actually works very well, with precision and recall exceeding 97 or 98% with not a whole lot of effort.
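For that third pitfall, the fix boiled down to running TensorFlow's local-variable initializer (tf.local_variables_initializer(), which is where tf.metrics keeps its internal counters) after the metric ops exist but before evaluating them.  A minimal TF 1.x sketch of the pattern, with placeholder tensors standing in for my real labels and predictions:

import numpy as np
import tensorflow as tf

labels = tf.placeholder(tf.int64, shape=[None])
predictions = tf.placeholder(tf.int64, shape=[None])

# tf.metrics ops create *local* variables to hold their running counts.
precision, precision_update = tf.metrics.precision(labels, predictions)
accuracy, accuracy_update = tf.metrics.accuracy(labels, predictions)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # This is the part that bit me: the local variables need their own
    # initializer, run once the metric ops have been defined.
    sess.run(tf.local_variables_initializer())

    feed = {labels: np.array([1, 0, 1, 1]), predictions: np.array([1, 0, 0, 1])}
    sess.run([precision_update, accuracy_update], feed_dict=feed)
    print(sess.run([precision, accuracy]))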

All This Effort Just To Find a Better Way


Now, after having written my model with low-level Tensorflow routines, and trained it to the point where I thought it would return really good results, it was time to try to put it on the cloud.  For this, the output from Saver (a “checkpoint”) basically needs to be converted into a SavedModel.  One can do this by writing a whole lot of code re-defining the exact behavior of the neural network, but it seems to be much more efficient (with the end goal being to export the model to the cloud) to use an Estimator object to define the neural network in the first place.

And this doesn’t even take into account what kind of goodness lies before me if I simply decide to use Keras…

However, I wanted to see how feasible it was to write a converter from checkpoint to SavedModel using the standard Tensorflow APIs.  Scouring for examples and tutorials, and diving in to find out what exactly all this code was doing to see how I could shorten it into the most concise possible example, I realized there were new APIs that these examples weren't leveraging.

Basically, the two main principles involve:

tf.saved_model.signature_def_utils.predict_signature_def(
    inputs,
    outputs
)

And:

builder.add_meta_graph_and_variables(...)

There are several signature_defs you can choose from:

predict_signature_def(...)
regression_signature_def(...)
classification_signature_def(...)

It makes the most sense to me to use "predict" when you're looking to perform inference on a pre-trained example using your SavedModel, "regression" when you want to solve an equation based on inputs and outputs you provide directly to the function (and not through a SavedModel), and "classification" for learning and inferring the class to which something belongs, once again by feeding examples directly into the function rather than through the SavedModel.  Now, I haven't quite used the latter two functions yet, so these are purely assumptions; I'll update this if I find them to be false later.

As for the meta-graph and variables, all you need to do to make a prediction-flavored SavedModel work on the cloud is to populate these three arguments with the correct terms (not to mention your Tensorflow session variable):

tags: just set this to [tf.saved_model.tag_constants.SERVING].

signature_def_map: This is an object consisting of mappings between one or more signature keys (Tensorflow signature constants representing various entry points) and the signatures defined above.  If you assigned the output of predict_signature_def(...) to prediction_signature, then you would probably want to use this:

{
 tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
 prediction_signature
},


Finally, define in your SavedModelBuilder an operation to reinitialize all graphs.  It is not a problem to run this one if you are moving around a lot.

main_op=tf.saved_model.main_op.main_op())
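Putting those pieces together, the checkpoint-to-SavedModel conversion ends up looking roughly like the sketch below (TF 1.x APIs).  The checkpoint path, export directory, and tensor names ("hsv_input:0" and "output_layer:0") are placeholders for whatever your own graph actually uses:

import tensorflow as tf

checkpoint_path = "training/skin-model"   # prefix written by tf.train.Saver
export_dir = "export/1"                   # destination for the SavedModel

builder = tf.saved_model.builder.SavedModelBuilder(export_dir)

with tf.Session(graph=tf.Graph()) as sess:
    # Restore the graph structure and trained weights from the checkpoint.
    saver = tf.train.import_meta_graph(checkpoint_path + ".meta")
    saver.restore(sess, checkpoint_path)
    graph = tf.get_default_graph()

    # Map friendly names to the graph's input and output tensors.
    prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={"hsv": graph.get_tensor_by_name("hsv_input:0")},
        outputs={"scores": graph.get_tensor_by_name("output_layer:0")})

    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                prediction_signature
        },
        main_op=tf.saved_model.main_op.main_op())

builder.save()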

From here on out, you just need to actually run the Python scripts that train the model and export it into the correct format.  For the specific code I used to build this skin tone detector and convert it to a format usable by Google Cloud ML Engine, see my GitHub repo at https://github.com/mrcity/mlworkshop/tree/master/google-apis/cloud-ml.

Making Inferences With Your Model


Assuming you have uploaded the contents of your entire SavedModel directory to a bucket in Google Cloud, and have initialized a Cloud ML Engine "Model" pointing to this bucket, you now need to create a script that can call Cloud ML Engine to make predictions.  The Python code to do this is also very simple.  The main gist is that, unlike other GCP API accesses from Python, you must configure the path to your service account using an environment variable rather than with the Python libraries.

Once you have done this, there is some very simple code from Google that demonstrates making an inference with your cloud model.  However, one thing they don't make clear is how to make multiple inferences at once.  If you do this incorrectly, you may run into an error such as Online prediction is not a matrix.  If this happens to you, study how the instances variable you pass into predict_json() must be an array of dictionaries:

instances = [{"hsv": [5., 49., 79.]},
     {"hsv": [4., 59., 63.]},
     {"hsv": [21., 67., 2.]},
     {"hsv": [99., 99., 99.]},
     {"hsv": [5., 35., 83.]},
     {"hsv": [5., 29., 71.]}]

One dictionary key must be defined for each expected input into the graph.  Recall that you defined the expected inputs in the inputs field of the call to predict_signature_def() earlier, and the keys required will be the same as the keys you defined in this field.  One bit of good news about the Cloud ML Engine is that it seems to only count one usage even if you specify multiple input values to the API (i.e. multiple elements in the instances array) in a single request.
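For completeness, the calling side looks roughly like this, adapted from Google's sample code.  The project and model names are placeholders for your own, and the service-account path goes into the environment variable mentioned above:

import os
import googleapiclient.discovery

# Point the Google client libraries at your service account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

def predict_json(project, model, instances, version=None):
    # Send a list of instance dictionaries to a deployed Cloud ML Engine model.
    service = googleapiclient.discovery.build("ml", "v1")
    name = "projects/{}/models/{}".format(project, model)
    if version is not None:
        name += "/versions/{}".format(version)
    response = service.projects().predict(
        name=name, body={"instances": instances}).execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]

instances = [{"hsv": [5., 49., 79.]}, {"hsv": [99., 99., 99.]}]
print(predict_json("my-project", "skin_model", instances))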

Again, here is the pointer to my GitHub repo where you can see how all this fits together: https://github.com/mrcity/mlworkshop/tree/master/google-apis/cloud-ml