Thursday, May 22, 2014

Getting your significant other's attention in a large house

The physics of sound in our house lend themselves well to Stacy being able to shout up to me from downstairs, but not so well to me shouting back downstairs.  It makes communication inefficient, and oftentimes she's not exactly watching for messages on GChat either.  I don't like going halfway downstairs when I'm trying to hold very skinny wires in a special arrangement while poking around for defects in LEDgoes, much less just to get a Yes/No answer to a simple question.

Recently, we picked up a Yamaha HTR-4065 stereo receiver for our living room.  Stacy was not happy with the sound quality of the original receiver, and hearing how the Yamaha can fill up the room while not vibrating every tiny appendage on your body with obnoxious bass, it was definitely money well spent.  Hopefully it will add some more years to our natural hearing capability before we have to get hearing aids or whatever magical, mystical implant they develop by the time we're that old.  However, something else about this receiver in particular piqued my interest, because it can solve the problem I described in the preceding paragraph.

You'd think the communication problem would be simple to solve; just install an intercom, right?  Sure, our house was built in the '90s, but the folks who built it weren't that wealthy.  Thus, it's not wired for an intercom.  We are both licensed ham radio operators, but neither of us listens for messages from the other unless we have previously agreed to, and usually the 2m or 70cm simplex frequency we use doesn't get great range -- after all, we are two great big bags of water that excel at attenuating RF signals.  There are other aspects of our life, though, that elevate the Yamaha receiver into an excellent, effective means of communication: you can change its input source over WiFi.

Most of the time, the Yamaha is blasting music or TV when she's home (which also explains why I have such a hard time getting messages downstairs); thus it's always on.  All of our entertainment devices are routed through that receiver -- our turntable, "cable" (fiber?) box, DVR, Google TV, and PlayStations are switched at the receiver so now there's only one HDMI cable going into the TV.  We also hoard single-board computers such as the Raspberry Pi, Beaglebone Black, Udoo, pcDuino, and even some exotic builds of the Panda board that were never released to the public.  Most of the time, these boards sit idle, to be honest; now one of them is about to be put to work for me.

How will this be done?

Our devices are all Internet-connected, making them perfect targets for IoT/M2M protocols such as MQTT or AT&T's M2X platform.  Gone are the days when you need to use socket programming and open up ports on your router to have external devices connect to them; now I can run an application on my Android smartphone without bothering to turn its WiFi radio on, and through the magic of these M2M protocols, I can use it to control these devices in my house (or out of my house, or someone else's devices, or what have you).

The mobile app simply takes a recording from the microphone.  This recording is stuffed into a JSON object by storing the binary data as two letters representing each byte's hexadecimal value (done efficiently thanks to this StackOverflow post).  Hence, we can send the message in plain text and not have to worry about any special UTF-8 or Base64 encoding involved with sending binary values that would break the JSON object's structure.  (Not only that, but I wasn't able to figure out what string encoding they were using, and trying to guess & check in Android Java was becoming painful).  The string representing the audio recording gets submitted as the payload to the 2lemetry API.  The specific API call I use


ultimately publishes the message to their MQTT service, so that anyone subscribed to the prescribed {domain}/{stuff}/{thing} on their MQTT endpoint will receive the message.  Their service doesn't have a nice blob-storage capability (like MySQL has) that could straight-up store a binary file (but it does support JSON nicely), hence the need for a bit of additional legwork.  AT&T's M2X service should support blob storage by the end of 2014, at which time it might be worth re-evaluating the services.
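To make the hex-in-JSON idea concrete, here is a minimal sketch in Python (a stand-in for the Android Java the app actually uses).  The key name "audio" and both function names are made up for illustration; the real payload layout isn't shown in this post.

```python
import binascii
import json


def encode_audio(audio_bytes, key="audio"):
    """Wrap raw audio bytes in a JSON string, two hex characters per byte."""
    # "audio" is a hypothetical key name, not one taken from the real app.
    hex_string = binascii.hexlify(audio_bytes).decode("ascii")
    return json.dumps({key: hex_string})


def decode_audio(json_text, key="audio"):
    """Recover the original bytes from a payload produced by encode_audio."""
    hex_string = json.loads(json_text)[key]
    return binascii.unhexlify(hex_string)


# 0x22 is the '"' character, which would break the JSON if sent raw.
sample = bytes([0x00, 0x1F, 0xFF, 0x22])
payload = encode_audio(sample)
print(payload)  # {"audio": "001fff22"}
restored = decode_audio(payload)
```

The trade-off is size: every byte becomes two characters, so the payload is twice as large as the recording, whereas Base64 would cost roughly a third extra.  For a few seconds of audio that is probably tolerable.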

An MQTT client, written in Python, is left to run perpetually on the Udoo single-board computer.  It subscribes to 2lemetry's MQTT server (or broker, in proper terms) to listen for new messages on the desired channel & pathway.  Once the mobile app finishes uploading the voice data onto the broker, the Udoo client will receive an asynchronous message and commence downloading the JSON-encoded data from the broker.  The JSON data (particularly the value containing the audio data) will be converted from "hex string" format back to a list of byte-sized numbers (0 to 255), and the series of numbers will be saved into an audio file.
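A rough sketch of what such a client could look like with paho-mqtt follows.  This is not the actual client from this project: the host, topic, JSON key ("audio"), output filename, and function names are all placeholders, and hashing the password to an MD5 hex digest is my reading of the authentication note further down, not a confirmed detail of 2lemetry's service.

```python
import hashlib
import json


def md5_hex(password):
    """MD5 hex digest of the password (assumed to be the form the broker expects)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def save_audio(payload, out_path, key="audio"):
    """Decode a JSON payload whose `key` holds hex-encoded audio; write it to disk."""
    text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    audio = bytes.fromhex(json.loads(text)[key])
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


def run_client(host, topic, username, password, out_path="message.mp4", port=1883):
    """Connect, subscribe to `topic`, and save every incoming message. Blocks forever."""
    import paho.mqtt.client as mqtt  # third-party: pip install paho-mqtt

    def on_connect(client, userdata, flags, rc):
        # Subscribing here means the subscription is renewed after a reconnect.
        client.subscribe(topic)

    def on_message(client, userdata, msg):
        size = save_audio(msg.payload, out_path)
        print("saved %d bytes from %s" % (size, msg.topic))

    # paho-mqtt 2.x wants an explicit callback API version; 1.x has no such argument.
    if hasattr(mqtt, "CallbackAPIVersion"):
        client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)
    else:
        client = mqtt.Client()
    client.username_pw_set(username, md5_hex(password))
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(host, port)
    client.loop_forever()
```

Calling something like run_client("broker.example.com", "domain/stuff/thing", "user", "secret") would start listening; nothing connects until that call is made.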

The Yamaha HTR-4065 can be controlled via their AV Controller mobile app, but several eager developers have already taken the time to discover the ins & outs of the Yamaha stereo's API.  A great list of many (if not all) of the possible commands is here:  After the Udoo has converted my JSON-encoded message back into a binary file, it will be responsible for switching the HDMI input, timing the switch precisely so it knows when the message can start playing, then playing the message, and finally switching the receiver back to the original HDMI input when the message is done playing.  This is achieved using a couple of POST requests, described in the previous link, to the stereo's API.  Sadly, it also means that the video will go black (or switch to the Udoo's desktop) temporarily while the message is playing.
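The switch-play-restore sequence might be sketched like this.  Big caveat: the endpoint path and the XML bodies below follow the format that community write-ups commonly describe for Yamaha network receivers, and I am assuming the HTR-4065 accepts the same thing; the function names and the input name "HDMI4" are placeholders.  Check everything against the command list before trusting it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed control endpoint; verify against your own receiver.
CTRL_PATH = "/YamahaRemoteControl/ctrl"


def input_command(value, cmd):
    """Build the XML body that reads (cmd="GET") or sets (cmd="PUT") the input."""
    return ('<YAMAHA_AV cmd="%s"><Main_Zone><Input><Input_Sel>%s'
            '</Input_Sel></Input></Main_Zone></YAMAHA_AV>' % (cmd, value))


def parse_input(response_xml):
    """Pull the current input name out of a GET response, or None if absent."""
    node = ET.fromstring(response_xml).find("Main_Zone/Input/Input_Sel")
    return node.text if node is not None else None


def post(host, body):
    """POST one XML command to the receiver and return the response text."""
    req = urllib.request.Request("http://%s%s" % (host, CTRL_PATH),
                                 data=body.encode("utf-8"),
                                 headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


def play_on_input(host, message_input, play):
    """Remember the current input, switch, run play(), then switch back."""
    original = parse_input(post(host, input_command("GetParam", "GET")))
    post(host, input_command(message_input, "PUT"))
    try:
        play()
    finally:
        # Restore the input even if playback fails partway through.
        if original:
            post(host, input_command(original, "PUT"))
```

A call such as play_on_input("192.168.1.50", "HDMI4", lambda: None) would exercise the switching without playing anything; a short delay before play() may be needed so the receiver has finished switching.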

Do we expect this to be effective?

Stacy doesn't tend to look at the TV screen very much; she runs shows just to have background noise.  Nevertheless, the sudden change in the environment should, from a psychological perspective, wake up sensory neurons and put them on full alert, especially if what's currently playing involves loud music or an intense action scene: to suddenly hear silence will be startling.  Even if the scene is quiet, it'll be shocking to hear my voice coming through the speakers -- how did I just work my way into the show?  Since the receiver is totally remote-controllable, I can even power it on if it's off, and the message will still get through!  The only way for it not to work is if the power goes out, in which case my shouting will probably be more effective than usual.

This seems like the equivalent of a modern-day Rube Goldberg device!

Damn straight, and that's the way I like it.  It could have been worse, though: I might have had to use TCP sockets, or perhaps develop a robot to attach to the front of the stereo receiver in order to select the right inputs, plus use a camera with optical character recognition to determine what HDMI input to switch back to after the message has finished playing.  Even though it may seem to fulfill a niche market now, having to design a robot to press buttons would be the epitome of over-engineering and over-thinking.  My implementation should easily apply to several different applications, devices, and "user stories" as-is or with little modification.

Other things worth noting

  • 2lemetry now requires that the "password" of a user looking to authenticate themselves on the MQTT broker be hashed with MD5 before being sent to the server.  This threw me off for a couple days.
  • In Python, to convert the "hex string" back into a list of numeric values between 0 and 255, I used an interesting array operation to convert each pair of letters to integers, then each integer into a byte:

    intArr = [int(hexArr[i:i+2], 16) for i in range(0, len(hexArr), 2)]
    audioByteArray = bytearray(intArr)

    It evidently wasn't good enough to write the integer array straight to the file: some of the data was getting misinterpreted and corrupted, so I needed to use the approach above.  There is a competing approach I could potentially use straight into the output file writer:

    from binascii import unhexlify
    output.write(unhexlify(''.join(format(i[2:], '>02s') for i in b)))

    However, its use of "format" leads me to believe it could be a bit less efficient.  I might do a benchmark to see which method runs faster.
  • I managed to debug the Android side of this application by downloading a hex editor to see if the file it saved locally matched what the MQTT broker had stored as the last message it had received in my particular topic pertaining to audio messages.
  • The length of the audio file doesn't need to be calculated.  I looked into MP4 headers for a little bit before realizing I can just spin up an instance of VLC in its own thread (with some particular arguments so that VLC only plays the audio file once and then quits), then write thread.join() so my Python client will resume regular operation when VLC is done.  The interesting question is what happens if a message comes in while VLC is playing audio already; will the MQTT client hold the message, or discard it?
  • Before using the MQTT client in Python, make sure you've run

    pip install paho-mqtt

    to install the required Python library.
  • The Android app is mostly a mashup of two previous applications I wrote: Street Beats and SpeechPipe.  This is why I could get it done in two weeks with so many other distractions, even amidst all the new learnings.  The features I did for this application will be rolled into SpeechPipe.
  • The Udoo allows several choices of Linux distributions to be installed on the device by means of SD card.  I chose Linaro Ubuntu 12.04 but had terrible luck getting any sort of audio to come out of it, even with VLC.  VLC was even seg-faulting on me!  I then tried the XBMC media player, but didn't feel like going through the trouble of writing a plugin (even though it shouldn't take long).  Luckily, I already had an SD card in the Udoo formatted with some flavor of Ubuntu, and that one seemed to work fine.  The MQTT client I wrote in Python had no trouble running on Linux.
  • The audio quality from whatever compression Android is using sounds terrible.  I need to keep playing with codecs & compression rates.  Nevertheless, a few seconds of audio is only taking up 8KB of space right now!
  • Sorry I'm not showing very much of my code in this post.  I need to clean it up and get it production-ready before I am comfortable open-sourcing it.
  • Every time we switch the configuration of our entertainment devices, it takes me about a week to figure out the new setup regarding the remote controls.  Now, to watch regular TV, I have to:
    • turn on the receiver with the Yamaha remote,
    • turn on the TV with the Samsung or FiOS remote,
    • switch to regular TV mode with the Google TV remote, and
    • wake up the FiOS receiver with the FiOS remote.
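Going back to the VLC bullet above: here is one way the "play once, then resume" step could look, plus a small queue that sidesteps the question of what happens when a second message arrives mid-playback.  It assumes the cvlc command (VLC without the GUI) is on the PATH and relies on VLC's --play-and-exit option; the function names are made up for this sketch.

```python
import queue
import subprocess
import threading


def vlc_command(path):
    """Command line that plays `path` once and then quits VLC."""
    return ["cvlc", "--play-and-exit", path]


def play_blocking(path, runner=subprocess.call):
    """Run VLC in its own thread and wait for it, as described in the bullet."""
    worker = threading.Thread(target=runner, args=(vlc_command(path),))
    worker.start()
    worker.join()  # returns only after VLC has exited


def playback_worker(pending, runner=subprocess.call):
    """Play queued files one after another; a None entry stops the worker."""
    while True:
        path = pending.get()
        if path is None:
            break
        runner(vlc_command(path))
```

With this arrangement, the MQTT message callback would only need to save the file and call pending.put(path), while playback_worker runs in a background thread started once at launch.  Messages that arrive during playback then wait their turn in the queue regardless of how the MQTT library treats a busy callback.  My understanding is that paho delays later messages while a callback is blocked, but I haven't verified that.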
