Thursday, September 27, 2018

Angular Noob: An Observable On An Observable Observing a Promise

With the reusability and extensibility of modern Web components, I do not look back on the days of jQuery with much fondness.  However, I haven't paid much attention to Angular since Angular 1.  Since its syntax didn't really appeal to me back then, I opted to learn Polymer instead.  Well now, given a new opportunity, I am diving into a much more modern Angular and TypeScript.  Unfortunately, I am finding that when you're diving into a well-established code base, a lot of the articles people write on Angular are about as dense as the later chapters of a book on quantum mechanics.  It's English alright, but the jargon is applied thickly.  (And this is coming from someone who has even impressed some of Google's Tensorflow engineers with their machine learning skillz.)

The problem at hand is fairly straightforward.  We want to notify something in the UI upon the outcome of a RESTful request we make to an external resource so that it can display useful information to the user.  We call http.get(), which returns an Observable of type Response (Observable<Response>).  Upon the outcome of the Observable (basically the one event this particular instance fires), we will run either onResponse() or onError().

To describe this in code, imagine the following:

Main App TypeScript File:

ngOnInit() {
  this.dataService.loadFromAPIOrOtherSite();
  // handle routing, and whatever else you can imagine happening here
}


Data Service TypeScript file:


loadFromAPIOrOtherSite() {
  this.dataLoader.loadData().subscribe(
    user => this.onResponse(user),
    error => this.onError(error)
  );
}

Data Loader Service TypeScript file:

loadData() {
  return this.http.get(url)
    .map(response => this.transposeData(response.json()))
    .catch(error => Observable.throw(error));
}

The way this works is that once the page loads, the data will be fetched.  The obvious problem here is that the main page never gets informed of the status of the data fetch; as such, the user is not notified when the server fails to respond properly.  Now, theoretically, you could tell the data service about the UI elements you want to manipulate, but I think it makes more sense for the page to deal with its own presentation issues rather than delegating that to a service.

It becomes apparent that what I need to do is make the loadFromAPIOrOtherSite() function return an Observable itself.  It already consumes one, since the loadData() function returns an Observable that resolves into either the successful answer or an error message.  Unfortunately, a lot of the pedagogy on this topic tells you to use the chaining or aggregation functions found in the RxJS library, such as map(), which is overkill for a single GET request.  I don't have a whole array of things to process, nor do I care to append the output of one Observable directly to another Observable.  And, even if there were an array of things to process, it's unclear to me how I could allow the side processes to complete while still returning the request and its status to the main page controller.  I also don't want either of the data services manipulating the DOM directly in order to show the user an error message -- I want the main page controller to handle this.

After enough searching around on Stack Overflow, I finally came across this answer that shows how to nest Observables in a plain fashion, without anything fancy.  It nests one Observable inside another by wrapping the inner subscribe() call inside Observable.create() and forwarding the results to the outer observer.


Applying This To the Code


There's a little bit of extra logic in here to deal with what happens when the loadFromAPIOrOtherSite() call finishes before or after ngAfterViewInit().  On one hand, you might try to manipulate DOM elements that aren't rendered yet, leading to an undefined mess.  On the other hand, the view might finish rendering before the data load has finished, in which case you need to wait on the data before touching the UI.

Main App TypeScript File:

// You'll want this to deal with timing of the completion of your Observable

import { AfterViewInit } from '@angular/core';

ngOnInit() {
  this.dataService.loadFromAPIOrOtherSite().subscribe(
    data => {
      // happy path
      this.done = true;
      this.doSomethingOnUI();
    },
    error => {
      // unhappy path
      this.done = true;
      this.doSomethingOnUI();
    }
  );
}

ngAfterViewInit() {
  this.elem = document.querySelector('#elem');
  this.doSomethingOnUI();
}

doSomethingOnUI() {
  if (this.elem && this.done) {
    // do something with this.elem
  }
}

Data Service TypeScript file:


import { Observable } from 'rxjs/Observable';
import { Observer } from 'rxjs/Observer';

loadFromAPIOrOtherSite() {
  return Observable.create((observer: Observer<any>) => {
    this.dataLoader.loadData().subscribe(
      data => {
        observer.next(this.onResponse(data));
        observer.complete();
      },
      error => {
        observer.next(this.onError(error));
        observer.complete();
      }
    );
  });
}

Now, it's helpful when this.onResponse() and this.onError() return something (even something as simple as a string or integer), because observer.next() propagates that return value as an "observation" to whatever subscribed to loadFromAPIOrOtherSite().  And, thanks to observer.complete(), it will be the last thing that subscription ever receives.
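To make that concrete, here's a rough sketch of what those handlers might return.  The bodies here are made up; the only important part is that each one returns a simple value:

onResponse(data) {
  // ...do whatever bookkeeping you need with the data...
  return 'loaded';        // delivered to the subscriber via observer.next()
}

onError(error) {
  // ...log it, stash it, etc...
  return 'load failed';   // also delivered on the "next" channel, not as an error
}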

Nesting This Even Further: Moar Nesting!


It's possible that the previous example doesn't go as far as you need.  What if you want to do something else, like check for incomplete data inside this.onResponse() and augment it with additional data, or show an error to the user if it can't be augmented in the necessary way?  And what if, on top of that, this extra data-collection function returns a Promise rather than an Observable?  Let's build upon the previous idea and make even more wrappers.

Note that the Data Service TypeScript file now has a subscription to onResponse() as well, not just loadData():

loadFromAPIOrOtherSite() {
  return Observable.create((observer: Observer<any>) => {
    this.dataLoader.loadData().subscribe(
      data => {
        this.onResponse(data).subscribe(
          augmentedData => {
            observer.next(augmentedData);
            observer.complete();
          }
       // etc...

We must also modify onResponse() to return an Observable itself, and not just a basic literal or some JSON object.  You'll notice this follows a similar pattern to before, along with handling a lot of possible unhappy paths:

onResponse(data) {
  // used to just "return 42;" or something simple like that
  return Observable.create((observer: Observer<any>) => {
    if (!isTotallyUnsuitable(data)) {
      let moarData = Observable.fromPromise(this.promiseService.promiseReturner());
      moarData.subscribe((extraData) => {
        if (cantAugment(extraData)) {
          observer.next("Failure to augment the data");
          observer.complete();
          return;
        }
        // augment the data here (happy path)
        observer.next(extraData);
        observer.complete();
      });
    } else {
      observer.next("Failure to get good data at all");
      observer.complete();
    }
  });
}

Epilogue


Now, if you know how to do such complex Observable nesting with map(), concatMap(), or forkJoin(), you're welcome to let the world know in the comments below!  And be sure to upvote the Stack Overflow post below if you liked this article!
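My own rough guess at an operator-based equivalent, untested and using the same RxJS 5 operator-patching style as the imports above, would be something like this (corrections welcome):

// Untested sketch; assumes 'rxjs/add/operator/mergeMap', 'rxjs/add/operator/catch',
// and 'rxjs/add/observable/of' have been imported
loadFromAPIOrOtherSite() {
  return this.dataLoader.loadData()
    .mergeMap(data => this.onResponse(data))               // onResponse() returns an Observable
    .catch(error => Observable.of(this.onError(error)));   // keep failures on the "next" channel
}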

Sources:



  • https://stackoverflow.com/questions/49630371/return-a-observable-from-a-subscription-with-rxjs/49631711#49631711
  • https://alligator.io/rxjs/simple-error-handling/

Thursday, August 30, 2018

Talking about Digital Fight Club

Earlier this month, I attended the third installment of the Digital Fight Club, put on by Digital Dallas.  Digital Fight Club is now going to be held at various events across the country!  I'd love to make it out to every one of them.  If you're in one of the cities where it will be held, you are in for a treat and should not miss it.  Read more at http://www.digitalfightclub.co/.


You Talk About Digital Fight Club


The format involves two thought leaders sparring in a short debate on a particular topic centered around emerging technology, design, and organizational behavior.  Each round of debate, consisting of both debaters' opening arguments and rebuttals, lasts under five minutes.  Then, another five minutes or less goes to at least two of the five judges, each asking a question addressed to one of the debaters, with the other debater free to rebut the answer as well.  Finally, the judges and audience decide a winner for each round.

Despite the brevity, it is a power-packed punch of information, opinion, and emotion from the debaters, and some have really hit home on topics where you might not have seen two sides to the issue.  I have finally nursed my sore fingers back to health from live-tweeting the event.


Why Me? Why Today? Why Now?


I've been going to their events since the "Digital Dumbo / Digital Dallas" launch party back in 2013, and they always coordinate an engaging time and draw interesting folks, mostly from the creative and product side but also from the tech side.  (Although I didn't run into a whole lot of other engineers at Digital Fight Club this time; maybe they are all hanging around Plano and Frisco these days rather than on Lower Greenville.)  In fact, one of their events in 2014 landed me a job.  As my company and Digital Dallas were still partners in 2016, I scored free tickets to Digital Fight Club and attended with some of my favorite innovation-hungry coworkers.  It was a blast, and I definitely couldn't wait until the next one.

By the time the next Digital Fight Club was announced, with none other than Mark Cuban as one of the judges, one of my cousins had already booked a family getaway down in Galveston starting the same day and running through the weekend.  And, unfortunately, since my employer decided to scale back substantially on sponsoring outside events, I could not get tickets through work.  Considering my family trip, I did not want to double-book myself and deprive someone else of a ticket.

Unfortunately, Hurricane Harvey blew into town that weekend.  Apparently the resorts were already starting to close and evacuate that Wednesday, so I ended up missing out on both things I wanted to do -- no trip to Galveston, and no Digital Fight Club either, which I was extremely remiss about.  (Note to self: just double-book yourself anyway in case something else falls through like this.)

Given this, I knew that come hell or high water, I was not missing Digital Fight Club 3.


Let the Fights Begin


There were five fights this year, in the realms of:

  • Retail: Physical vs. Digital
  • Voice Marketing and Control
  • Design: Speed, Technology, and Process
  • Blockchain: Security & Trust vs. Promise
  • Smart Cameras / Smart Images
You can read about the participants here at the official Digital Fight Club 2018 Dallas page.  What I would like to relate to you are the viewpoints of the debaters and then how I felt about these views.

Retail

The fight in retail seemed to stem from the behavior of millennials versus the new generation (which is back on track for having to suffer through an uncreative name such as Generation Z).  It's still unclear to me what a millennial is; some say 1980-2000, but I would probably peg it as more like 1985-1995.  People younger than this have not grown up in the same world millennials did; they don't remember a world pre-9/11 or pre-broadband Internet, and have pretty much had computers and social media around constantly.  And people older than this at least got jobs right out of college, and could generally afford to start life right away like all the previous generations in post-war America.

But to get back to the arguments, Elie presented the viewpoint of people looking for "social, memorable, material transactions" with physical retail.  People remember the ambiance and experience of buying something and getting to interact with it, and Elie claimed that Gen Z prefers to get physical and interactive with their shopping experiences, noting that it has been hard for e-commerce to pull this off.  On the other hand, Daniel claimed that the United States is over-retailed; over 25 sq. ft. of retail shopping space exists per person, as opposed to just 4 sq. ft. in Great Britain.  (Unless they actually reported their number in meters instead of feet, in which case that would actually be 43 sq. ft. :-P)  Claiming the millennial viewpoint, he believed the existing Fortune 500 brick-and-mortar stores would die off and yield retail space to online retailers looking to establish a brick-and-mortar presence.

Stacy wrote on Twitter that millennials care about experiences.  We don't like products so much, as that's just materialistic crap that fills your house and you can't take it to the grave.  In fact, we would rather go ax-throwing, indoor skydiving, or travel to a foreign continent than fill up our houses with junk that collects dust.  And we can easily share experiences digitally, just as how I live-tweeted Digital Fight Club.  As such, this leaves the vast majority of what we buy to groceries and household items, which are usually sold in sad stores with ugly fluorescent lighting, long lines, difficult parking, and staffed by people just trying to get by or still in school.  That's not the type of "memorable transaction" we care to relive, so why not just order groceries and toilet paper from Amazon?  Who needs to get social about that stuff anyway?  Trash bags are mundane, rarely change, and quite frankly reordering them is why the Amazon Dash button exists.  On the opposite side of mundane, lots of cool stuff can be found on Etsy, giving products a much wider exposure than anything in any boutique could get.

As you can imagine, Stacy sided with Daniel, but I sided with Elie.  I'm not a realistic case when it comes to shopping; since Stacy handles all the mundane stuff (and even things I know I need but that I don't want to put energy into getting for myself), this means I am only ever shopping for things that really strike my fancy, and that will always give me a memorable, emotionally positive experience.  Consider all the suit shopping I've done at department stores, and all the interesting conversations I've had with the owners and other customers at Tanner Electronics.  Or the summer job I once got "working for my dealer" -- a computer store in Colleyville that I frequently bought parts from.  Plus, there's something really gratifying about having an item in my hand as quickly as possible, even though many times I don't even use it for a while.  However, even though I like the personal touch, a lot of the deals I find originate through private forums or other such electronic means anyway.

As it turns out, Elie came out the winner among the judges and audience voting.

Voice

With more and more buzz about digital assistants, there has yet to be a killer app defined just for voice.  To Michelle's point, it is just another "customer touch point" or method of interaction with a system.  I can almost as easily type something into the Google Assistant chat box, and ultimately the system is converting speech into text in order to comprehend the action anyway.  Michelle went on to make sharper criticisms, such as AIs exhibiting "racist" tendencies by only doing a good job at understanding native speakers of American or British English, and not understanding those with foreign accents.  And this is just the start; imagine that over time, as AIs learn actions from those they understand best, they may become biased toward performing those actions, and then do the wrong thing for other cultures once the speech training improves.  Finally, she hit on the privacy aspect of advertisers now wanting to listen in on your voice interactions, or possibly the conversation in the room while the agent is not actively interacting with you.  Being marketed to when you're trying to relax or right before sleep is annoying, not to mention that seeing things the next day in your email or on your screen that you talked about in passing is downright creepy.

Unfortunately, Chris didn't have many points that stuck out to me, but he did offer that voice is innate to humans all the way back to our days in our mother's womb.  He believed that 30% of web surfing sessions will be screenless in the next 2 years.  I'm maybe an oddball once again in this scenario; I love a keyboard (even a modern MacBook keyboard) rather than tapping out text on my phone, and while I do love speech-to-text on the phone, I don't often use it, just because others might hear my thoughts and take them out of context.  I don't need them knowing my business or what I'm doing with my device.  And honestly, having just gone back and forth over the design of a "concierge" for something at work, where voice is a ridiculous way to sift through dozens of results, I was not bullish on voice that night anyway.

That being said, I dictated a paper for one of my masters' classes sometime in 2010-2011 using speech-to-text on my first Android phone.  It worked great, but as Stacy was taking me to Walgreens, where you can guarantee running into old people (especially at one in the northern Chicago suburbs), a lot of them seemed very confused as to what I was doing.  Usually their adult children were with them to explain it, though they were probably still impressed themselves that I was in fact dictating into my phone with such ease.

Nevertheless, I go on the Web to seek information, which is easiest to consume quickly when read.  No one recites GitHub readmes aloud before a group and expects anyone to remember everything.  I cringe at the thought of the day I'll be too blind to use a computer monitor, and will have to fumble through some other way of sensing information, likely in a format presented not nearly as densely.  It's something I've discussed with accessibility engineers at Google I/O (and they had impressive answers for my concerns), but I nevertheless hope it is still far, far off for me.  I voted for Michelle, and she handily won that round.

Running a Design Organization

Now I will say that one of the debaters in this round was James Helms, who was a judge of my product LEDgoes / BriteBlox back when it was on the Expose UX web series in 2014 (back when I had hair).  During the taping, I didn't think it came across so well, so it was hard to bring myself to watch the episode.  Nevertheless, I went to the Expose UX launch party, and met some absolutely wonderful, inspiring, enthusiastic, and life-changing people there, and we hung out talking in the lobby until security kicked us out.  (Then one of them became my coworker for a whole year!)  Now, I would need corroboration on this recollection, but it's possible I was one of just a couple people with a product on the show who actually bothered to show up to the party.  It was so nerve-racking sitting through the four products at the debut and waiting to see if I was going to be next!  Fortunately I wasn't, but soon after, I watched the episode for myself.  And I must say, the editors did a fantastic job putting the episode together.  I really liked the way it turned out, even though I really wish the part about it being 1,000% funded on Kickstarter had at least been shared with the judges, if not made the final cut.

Ok, enough backstory on that.  Jeriad (a last-minute fill-in for someone) and James took on the fundamental principles of operating a design shop.  As an engineer, and not such an expert in the daily life of a designer, I didn't have a lot of context on these arguments.  What I can summarize for you is that James made a rather inarticulate point about how designing fast is sexy, but you need to take it slowly.  He picked it up a touch by getting into figuring out what people actually need rather than what they say they want (the old adage of the car versus the faster horse).  James' opponent, Jeriad, had a much better stage presence that night, with the viewpoint that the C-suite gets excited by things that happen fast.  Designers are at the forefront of disruption, so don't let your workflow be the same -- particularly, don't let it stop you from seeing something innovative.  And, of course, fail fast -- you know quickly when something sucks, so focus on service design.

Because of the strength of Jeriad's delivery, I voted for him.  Jeriad won, but still lots of people sided with James; the 57%-43% outcome was the closest margin of any of the debates.

Blockchain

Now we go from the closest outcome right into the most lopsided one, and back into familiar territory for me.  Most people surely can form opinions about voice and retail quickly, given their lifelong experiences and feelings on convenience versus the creepy factor.  Most of the people in the audience had probably lived through a design sprint and had certain statements from the previous round resonate with them in different ways.  As for me, I've been dabbling in blockchain since 2013, started developing on Chain Core & Colored Coins in 2015, tried to get into Ethereum in 2016, and lately have been exploring Hyperledger projects.  I've given three distinct presentations on blockchain at company-wide internal events in the past five months, so now I'm definitely back in my comfort zone.

Stacy & I spent a very long time talking to Mark, the first debater in this round, at the after-party, and he shared a lot of interesting stories with me, the most memorable ones relating to Launch DFW's early days.  Mark recalled to me that he had an opening argument prepared, but from what I saw, at the last second he decided to try to explain blockchain instead, and that alone ate up the roughly 90 seconds allotted for his opening, leaving him about half a sentence for his main point.  While it was extremely impressive that he could describe all of blockchain in the space of an elevator pitch, I imagine it probably still went over the heads of most people, since he had to speak it all very quickly.

Nevertheless, Mark eventually made a point that resonated with me: enterprises are successful at (just like with most things) making the blockchain more bureaucratic and expensive.  When you look at the architectural complexity of Hyperledger Fabric compared to Bitcoin, which runs the same computations on a bunch of distributed nodes, or at Hyperledger Sawtooth requiring specific instructions only implemented on the latest Intel chips, which cloud service providers either don't offer or in fact no longer offer, you go "yep, that's an enterprise's work."  However, these systems do have massive advantages, not just for enterprises, which are historically risk-averse, but for cheapskates like me: there is no mining, thus you can freely exchange information without worrying about the price of some underlying unit of currency that has to be exchanged as commission for posting the data onto the ledger.  And I would say this is important.  No one wants to be exposed to the circus that has played out in Ethereum-land, such as "The DAO" hack resulting in a hard fork, and millions more dollars lost through other misanthropic deeds or simply brainless bugs.

This was a good point, but Jaime, Mark's opponent, hit on something even more important.  Even with a public blockchain, someone could easily misrepresent something of value encoded on the blockchain that is more than simply a unit of the underlying currency (say, a smart contract representing movie tickets in a theater or units of commodities to be delivered through a commodity exchange), whether through intentional fraud, or just by a bug or fat-fingering something.  At least in a public blockchain, there might be more third parties trying to audit the data.  But when enterprises are playing in these "walled gardens"[1] where they are doing who knows what with your data and your money, and humans (who make mistakes) are in charge of writing the "chaincode", then what's at stake?  Jaime's point was essentially that enterprise blockchains are nothing more than centralized distributed databases that allow enterprises to cheat.  And, to me, it makes more sense to just run a common instance of Cassandra DB or Kafka message queuing service if that's all you're looking for.

Ultimately, Jaime took 90% share of the votes, including mine.

[1] Walled gardens being "private, permissioned" networks.  I think of "private" and "permissioned" as two different things on two different dimensions, not as two related things next to each other on a continuum, as many people often recite "public -> permissionless -> private".

Smart Images

With this one as well, my bias toward one of the debaters at the outset only led me to disappointment.  As Skip runs Spacee and one of the AR/VR meetup groups around town, I'd run into him a few times in the community.  However, he took on a viewpoint that not many folks agreed with.  He may have been trying to be sardonic by flippantly dismissing legitimate privacy concerns, making comments such as "Robots, kill me last."  His points started by encouraging users to give up a little bit of privacy for the sake of convenience, to, say, order from a place before you even think of it.

Thierry, his opponent, pointed out that KFC in China is already ordering for people based on facial recognition.  (Are we this habitual?  Maybe at KFC, sure, but when given this many choices in life, why be predictable?  How would this help me try new things?)  Thierry had a mic-drop kind of moment with the audience when he proclaimed that every technology has promptly invited some kind of abuse.  He asked where your business will be when the rules are broken.  Skip, on the other hand, actually did drop the mic after making his plea to the robots that I mentioned earlier, and finished by proclaiming the daunting notion that "you think you control things at home but you don't."  Why not let an AI watch over you sleeping while you're sick?  Would this be beneficial?  Helpful to a doctor?  Or would it just go to some ad marketer to get you to buy medicine?

Finally, Thierry encouraged the audience to continue innovating and "pushing tech to the edge" but to do so in a transparent way, particularly by being up-front with your data collection practices.  (Well, you kind of have to these days, in light of the GDPR.)  He went on to claim that people have grown so accustomed to change that we are devolving, rather than evolving, as we just let whatever happen around us without taking action.  Well, people do take action, but we have so little bandwidth these days compared to the energy it takes to fight all these battles, and there's not enough time in the world to acquire enough unbiased insight into what is truly going on.  We are collectively leaving each other in the dust, possibly to cannibalize society, while we empower each other to learn and do more than ever thought possible.  Thierry's fear of Big Brother tendencies won me over, along with most of the rest of the audience.

Epilogue

All told, with as much energy and excitement as there was, and with such efficiently-run fights, the meat of the content was all over in about 45 minutes.  I think there are plenty of people who would agree that adding a minute to the discussions, or adding another round altogether, would be great.  Stacy and I were next to last to leave the afterparty, at about 11:30 PM, way after the outdoor bar had closed.

This is a really fun, innovative way to consume content.  It's not necessarily for learning about something so much as thinking about higher-level concepts, but it definitely busts the boring old "give a talk or presentation about something" paradigm that usually exists when sharing information.

Thursday, June 28, 2018

Validating Pre-Made Tensorflow Estimators Mid-Stream


In Francois Chollet’s book Deep Learning with Python, he stresses the importance of holding out a separate validation set of data while training a machine learning model, so that you can periodically check (say, after every epoch) that accuracy is improving on something other than strictly the training data (i.e. on this validation set).

Machine learning models are subject to learn relationships that have nothing to do with the problem at hand.  For instance, a model tasked with trying to determine which way a military tank is facing might end up making assumptions based on whether it is day or night.  This is often a result of trying to eke out the model’s maximum performance, say by optimizing for the smallest value of a loss function.  However, what ends up happening is that the model overfits on the training data, which means it loses its generalization — its ability to predict the correct outcome of new samples or examples that we as humans would intend for it to classify.

One way to compensate for this is to withhold a separate set of data, known as a validation set, so that we monitor not just the loss we are trying to optimize, but also the performance of the model on a separate set of data it hasn’t seen.  While the loss function may be decreasing, suggesting the model is getting more accurate, you might in fact see that the performance on the validation dataset stops improving, or even reverses course and gets worse.  By evaluating a validation dataset throughout training, we can figure out when to terminate training for the best results.
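As a toy illustration of carving out a validation set before training (the array names, sizes, and 80/20 split below are just placeholders I made up):

import numpy as np

# Stand-ins for your real dataset
all_features = np.random.rand(1000, 10)
all_labels = np.random.randint(0, 3, size=1000)

# Shuffle, then hold out the last 20% as a validation set
rng = np.random.RandomState(42)
indices = rng.permutation(len(all_features))
split = int(0.8 * len(all_features))
train_x, train_y = all_features[indices[:split]], all_labels[indices[:split]]
val_x, val_y = all_features[indices[split:]], all_labels[indices[split:]]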

Pre-made (Canned) Tensorflow Estimators


As each day goes by, there are more and more benefits to using canned estimators.  For instance:
  • They manage for you which parts are best run distributed vs. a single machine
  • They can stand on the shoulders of Tensorflow Hub modules, allowing you to focus on adding just one or more instances of a simple type of layer to much more complicated models pre-filled with highly trained and optimized parameters
  • You focus on the configuration of the model as a whole, not configuring so many details of every specific layer

However, pre-made Estimators do not offer many functions other than train, evaluate, predict, and exporting as a SavedModel.  And extending a pre-made Estimator can be tricky: neither the train function nor the Estimator object itself offers a direct, apparent way to run evaluation of a validation dataset mid-stream during training.

But Chollet said we need to validate frequently!


As it turns out, Tensorflow allows you to run side functions (invoked by callbacks) during train, evaluate, and predict.  These can serve any purpose, such as hooking up to the Twitter API to post memes to Twitter at certain intervals during the underlying operation (at either specified intervals of time or steps), or as we are most interested in doing, running an evaluation alongside training.

The feature is specified in these three functions by the hooks parameter.  Hooks are instances of the SessionRunHook class that allow you to define custom code that runs at the start and end of each Session, as well as before and after each step.  (There are built-in functions that will help you count steps or time so you’re not executing your desired action at every step.)

Let’s take a look at a complete SessionRunHook example for a pre-made Estimator.

class ValidationHook(tf.train.SessionRunHook):
    def __init__(self, parent_estimator, input_fn,
                 every_n_secs=None, every_n_steps=None):
        print("ValidationHook was initialized")
        self._parent_estimator = parent_estimator
        self._input_fn = input_fn
        self._iter_count = 0
        self._timer = tf.train.SecondOrStepTimer(every_n_secs, every_n_steps)
        self._should_trigger = False

    def begin(self):
        self._timer.reset()
        self._iter_count = 0

    def before_run(self, run_context):
        self._should_trigger = self._timer.should_trigger_for_step(self._iter_count)

    def after_run(self, run_context, run_values):
        if self._should_trigger:
            print("Hook is running")
            validation_eval_accuracy = self._parent_estimator.evaluate(input_fn=self._input_fn)
            print("Hook is done running. Training set accuracy: {accuracy}".format(**validation_eval_accuracy))
            self._timer.update_last_triggered_step(self._iter_count)
        self._iter_count += 1

Our purpose above is to run evaluate on the Estimator during training after a predefined number of seconds (every_n_secs) or steps (every_n_steps) elapses.  To this end, we can actually pass in the Estimator itself as an instantiation argument to our ValidationHook object, rather than trying to create a new Estimator in this scope.  You will see in begin() that some variables get initialized just at session runtime.  The before_run() function defers to a Timer initialized in the constructor that dictates whether or not to run our desired operation (the validation evaluation) in the after_run() function.  Without asking the timer if it’s time, the evaluation in after_run() would run after each step, and that would waste a substantial amount of time.

Now, actually running the Estimator with intermediate validation steps is a simple matter of defining the correct parameters in train().

settings = tf.estimator.RunConfig(keep_checkpoint_max=2, save_checkpoints_steps=STEPS_PER_EPOCH, save_checkpoints_secs=None)

estimator = tf.estimator.DNNClassifier(
    config=settings,
    hidden_units=desired_layer_sizes,
    feature_columns=[my_feature_column],
    n_classes=class_count,
    optimizer=tf.train.RMSPropOptimizer(learning_rate=desired_learning_rate)
)

estimator.train(
    input_fn=train_input_fn,
    hooks=[ValidationHook(estimator, validation_input_fn, None, STEPS_PER_EPOCH)],
    steps=STEPS_PER_EPOCH * 40
)
print("Done")

What’s happening here is that I’m writing a RunConfig seeking to reduce space consumed on the hard drive by keeping a low number of recent checkpoints on hand.  Then, I configure it to save checkpoints every STEPS_PER_EPOCH steps and make sure to explicitly disable the time-based checkpointing (save_checkpoints_secs=None), since defining both parameters is disallowed and the default is to save a checkpoint every 600 seconds.  Finally, pass the config you defined to the Estimator's constructor and the hook you built to train().

NOTA BENE: If you do not specify the RunConfig to save a checkpoint at your desired interval, the weights will not be updated when you run evaluate() inside the hook, and thus your validation performance will not appear to change until another checkpoint is written.
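In case you're wondering where STEPS_PER_EPOCH comes from: it isn't defined in the snippets above, but the usual relationship is just the training set size divided by the batch size.  The numbers below are purely illustrative:

num_training_examples = 60000   # however many examples are in your training set
batch_size = 128                # whatever batch size your train_input_fn uses
STEPS_PER_EPOCH = num_training_examples // batch_size   # one epoch = one full pass over the data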

Visualizing Validation Performance with TensorBoard


TensorBoard can easily graph the loss over time as the model gets trained.  This is output by default into the events file that TensorBoard monitors.  However, there are a couple things you might wonder about:
  • How can I show other metrics, such as accuracy, or precision and recall, over time?
  • If I’m showing loss every 100 steps, and TensorBoard is picking this up to graph it, how do I get it to only convey my desired performance metrics at the times when they’re actually calculated?
As it turns out, the validation accuracy should already be available for you in TensorBoard.  Under your main model's output directory (defined by model_dir), there will be another directory called eval where the validation accuracy metric consumed by TensorBoard will be placed.  You can overlay the validation accuracy with the graph of loss, and/or with any other such tf.metrics collected in other log directories.
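For example, assuming your Estimator was built with model_dir='/tmp/my_model' (a path I'm making up here), pointing TensorBoard at that parent directory picks up both the training events and the eval subdirectory:

tensorboard --logdir=/tmp/my_model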

But if you want additional metrics, especially the ones defined in tf.metrics such as precision and recall at top k, there is a function called add_metrics() that actually comes from the tf.contrib.estimator module (rather than tf.estimator).  This allows you to define a metrics function returning a dictionary of your calculated metric results.  The good news is this function returns a brand new instance of Estimator, so you don’t have to worry about your original Estimator in training trying to constantly run these evaluation functions.  But the best part is even though this is now a separate Estimator object, all the parameters of the original Estimator are conveyed to this new one as they are updated by the checkpoint writing process.

To add additional metrics to TensorBoard, add a function to your code that complies with metric_fn as such:

# This is the function that meets the specs of metric_fn
def evaluation_metrics(labels, predictions):
    probabilities = predictions['probabilities']
    return {'auc': tf.metrics.auc(labels, probabilities)}

# And note the modifications to ValidationHook below:

class ValidationHook(tf.train.SessionRunHook):
    def __init__(self, parent_estimator, input_fn,
                 every_n_secs=None, every_n_steps=None):
        print("ValidationHook was initialized")
        self._estimator = tf.contrib.estimator.add_metrics(
            parent_estimator,
            evaluation_metrics
        )
        self._input_fn = input_fn
        ...

    def after_run(self, run_context, run_values):
        if self._should_trigger:
            validation_eval_accuracy = self._estimator.evaluate(input_fn=self._input_fn)
            print("Hook is done running. Training set accuracy: {accuracy}".format(**validation_eval_accuracy))
            ...


NOTA BENE: While it is convenient that you only need to add a couple lines in your subclass of SessionRunHook, don't forget to add the model_dir to the parameters you use to initialize the parent Estimator object, or else TensorBoard might not be able to pick up on any of your metrics at all for both training and evaluation.

What about tf.estimator.train_and_evaluate() ?


This is a function provided in the estimator module itself, and is not exposed in pre-made Estimators.  However, it does take your Estimator as an argument.  And so it is, with very few lines of code, that you can run interleaved training and evaluation.  After defining your Estimator (omitting the hooks parameter this time), don't bother writing any hooks at all.  Just do this:

estimator = tf.contrib.estimator.add_metrics(estimator, evaluation_metrics)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=5000)
eval_spec = tf.estimator.EvalSpec(input_fn=validation_input_fn, start_delay_secs=60, throttle_secs=60)


tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

evaluation_metrics is the same function as defined earlier.  start_delay_secs defines the delay from the very instant your model begins training.  If you put even 1 here as the value, training probably won't have even made it beyond the first step before an evaluation is performed.  And throttle_secs defines the minimum delay between evaluations.  If you put 1 here, chances are you will perform just a single step of training in between evaluations.  The default values are 120 and 600, respectively.

Unfortunately, the implementation to date of train_and_evaluate() seems incapable of counting steps rather than time, so it will take some empirical measurement to find out exactly how much time an entire epoch takes to run if you care to line up evaluations with epochs.  However, this approach should be very scalable onto distributed systems, for those looking for very fast training.

One other thing that seemed a bit odd to me is that, in the course of running all these experiments, I would oftentimes delete the log directory and reuse the same name.  Most times, with the hooks method, TensorBoard would pick up on the new run and work just fine.  However, using train_and_evaluate(), TensorBoard only seems to pick up on the new run about half the time.

Sources




And, of course, the Tensorflow documentation