Skin Color Detection Using Neural Networks

Another cool project I did back in college was to create a classifier that could look at a digital photograph or video and discern where there was human skin. Once you know where the skin is, you can analyze the shape of it, or treat it like a lamina and find its "center of mass" and how it changes over time in a video. However, before you can start thinking about scanning for people and interacting with their motion (or whatever your application may be), you must first learn how to collect and interpret the raw data.

Skin in an image is found by analyzing the colors. Digital images exist in a "color space," which is a mapping of each distinct color to a particular point on a graph whose axes relate to very basic properties of colors. The most common color space is RGB: each color is defined by the amounts of red, green, and blue it contains. (Well actually, there are some colors that can't be represented in the RGB space, but most cameras can't detect those colors properly so we'll ignore them.) Cameras employ red, green, and blue filters in order to generate a color image. The RGB color space can be mathematically transformed into other color spaces such as [Hue, Saturation, Value]. Hue is the pure "color wheel" color of the pixel, saturation is how close the pixel is to that pure color as opposed to gray, and value is how bright it is (from black up to full brightness). The hue of any pixel in a black & white photo doesn't really matter since all the saturations are 0; the only feature distinguishing the pixels in a B&W photo is their brightness value. Unfortunately I don't think anyone makes a camera that shoots natively in the HSV space. Many other color spaces exist besides RGB and HSV, so feel free to look them up if you're interested.
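To make the RGB-to-HSV idea concrete, here is a minimal sketch using Python's standard colorsys module. The helper name rgb8_to_hsv and the sample pixels are my own illustration, not code from the original project.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB (0-255 per channel) to HSV, each component in [0, 1].

    Hypothetical helper for illustration; colorsys expects floats in [0, 1],
    so the 8-bit values are scaled first.
    """
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


# A gray pixel: saturation is 0, so the hue carries no information.
gray = rgb8_to_hsv(128, 128, 128)

# A pure red pixel: hue 0, fully saturated, full brightness.
red = rgb8_to_hsv(255, 0, 0)
```

The gray pixel comes back with a saturation of exactly 0, which is the "hue doesn't matter in a B&W photo" point in code form.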

As with any machine learning algorithm, it won't do anything until you feed it some training data. I'll leave it to your imagination where I derived most of my skin tone photos from, but afterward I had to crop down to the salient parts of the photos so I was only left with relevant, positive examples of skin tones. I decided to use photos where people were in good light and not in shadows in order to reduce false positives (i.e. the algorithm finds a skin tone when it isn't really skin). Although it's impossible to eliminate false positives 100%, it does help to use a good camera. The webcam built into my laptop doesn't do a great job capturing variances in saturation, so it picked up a lot of false positives. (What does that say to you about skin tones? I'll tell you later if you can't guess the answer now!)

Analyzing Skin Tone Data The Basic Way

The most basic approach to determining where skin tones exist in a picture is first to collect the training data as I described above. Next, you transform the pixel data into your favorite color space, and then make a histogram showing what values in this color space showed up in your training data. A histogram is simply a graph illustrating how many things have a particular value. All the possible choices are illustrated along an axis of the graph, and it can be 1-D, 2-D, 3-D, or as big as you like. Thus, you can use whichever dimensions of the color space you want.

Say your image is in the RGB space; you can make a histogram indicating how many skin tone pixels had what percentage of Red in them. However, lots of non-skin-tones have similar values of Red, so you could add more data and increase the accuracy by indicating what percentages of Red & Green represented skin tones in your set. You could even go all out and specify all 3 dimensions (Red, Green, and Blue) to make the ultimate* classifier.

This is where you need to consider the cost of memory and the speed of your look-up table, though. Each value of R, G, and B is typically stored in 8 bits, thus you can get numbers ranging from 0 to 255. Each dimension you add increases the size of your table 256-fold, since for every possible value of Red, there are 256 possible values of Green, and so on. To make a histogram mapping where skin tones exist in the full RGB space, you would need 256 x 256 x 256 = 16,777,216 bins, or 16MB even at just one byte per bin, and comparing each picture to 16MB worth of data won't work very well in real time! Not to mention, this approach wastes a ton of memory since skin tones make up a very small portion of the RGB space.

To drastically reduce memory while still maintaining accuracy, I performed experiments to decide which combination of two colors yielded the best results when trying to find the skin tones in real-world data (not the training data anymore, but more G-rated photos I can put into a presentation. :-P) Not only that, but I transformed the training data & samples into the HSV space and performed the same experiments in that space too. The RGB->HSV transform isn't linear, so the two spaces definitely yield different results from each other.
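Here is a rough sketch of the two-dimensional histogram idea and the bin-count arithmetic behind it. The function name build_histogram_2d and the three training pixels are made up for illustration; they are not my actual code or data.

```python
def build_histogram_2d(pixels, dim_a, dim_b, levels=256):
    """Count how often each (a, b) pair occurs among the training pixels.

    pixels: iterable of 3-tuples of ints in 0..levels-1, e.g. (R, G, B).
    dim_a, dim_b: which two of the three components to keep (0, 1, or 2).
    """
    hist = [[0] * levels for _ in range(levels)]
    for px in pixels:
        hist[px[dim_a]][px[dim_b]] += 1
    return hist


# Number of bins for 8-bit channels.
bins_3d = 256 ** 3  # 16,777,216 bins: 16MB at one byte per bin
bins_2d = 256 ** 2  # 65,536 bins: 64KB at one byte per bin

# Hypothetical training pixels, invented purely for this example.
training = [(200, 150, 120), (200, 150, 130), (210, 160, 140)]

# Keep only Red (index 0) and Green (index 1).
red_green = build_histogram_2d(training, 0, 1)
```

Dropping one dimension shrinks the table by a factor of 256, which is the whole reason for hunting for the best pair of dimensions instead of using all three.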

*When you throw in ridiculous quantities of data to make the "ultimate" dataset, you may actually be guilty of overfitting. This means you made the classifier fit your training data too well, and it might miss some items you'd want to catch in real-world data. You should find the right amount of training data to put into your classifier in order to get the best results on real-world data.

Picking two dimensions that don't distinguish interesting pixels from non-interesting pixels can lead to a lot of false positives. In my study, using only Red & Blue data to determine skin tones gave the worst performance of the combinations I tried.

The algorithm to compare your image to the skin tones works like this:

For each pixel in your image, find the RGB values.
Transform the RGB values into whatever color space you selected.
Read the appropriate element of your skin color data histogram, for instance $SkinColors[$Red][$Green][$Blue].
If that element contains a number greater than an arbitrary threshold you find to work well, then it is a skin tone. If not, then you might set that pixel to black or another color to essentially discard it.
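The steps above can be sketched roughly as follows, using a two-dimensional table rather than the three-index $SkinColors example. Everything here (the skin_mask name, the tiny image, the count of 5, and the threshold of 2) is an assumption for illustration; pick your own threshold by experiment.

```python
def skin_mask(image, hist, dim_a, dim_b, threshold, transform=None):
    """Return a copy of the image with non-skin pixels set to black.

    image: list of rows, each row a list of (c0, c1, c2) int tuples.
    hist: 2-D table of training counts indexed by [component a][component b].
    transform: optional function mapping an RGB tuple into the color space
               the histogram was built in (None means the pixels are used as-is).
    """
    out = []
    for row in image:
        out_row = []
        for px in row:
            q = transform(px) if transform else px
            if hist[q[dim_a]][q[dim_b]] > threshold:
                out_row.append(px)          # seen often enough: keep as skin
            else:
                out_row.append((0, 0, 0))   # discard by painting it black
        out.append(out_row)
    return out


# Made-up histogram: pretend (Red=200, Green=150) appeared 5 times in training.
hist = [[0] * 256 for _ in range(256)]
hist[200][150] = 5

# A hypothetical 1x2 image: one skin-like pixel and one dark bluish pixel.
image = [[(200, 150, 120), (10, 20, 30)]]
masked = skin_mask(image, hist, 0, 1, threshold=2)
```

In this toy run the first pixel survives and the second is blacked out, since its bin was never seen in the pretend training data.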

Getting More Sophisticated

This is all well & good, but the accuracy is affected by your training data. Too little and you won't include all the types of skin color (http://gizmodo.com/5431190/hp-face+tracking-webcams-dont-recognize-black-people); too much and you're bound to introduce false positives. To balance this risk (and coincidentally save ourselves a ton of memory at the same time), we can ditch our matrix/histogram approach to skin tone detection and instead go with a neural network. Neural networks are designed to be very tiny models of the human brain. The counterpart to the "neuron" is the "perceptron," which takes various inputs and assigns a different weight to each input depending on how strongly correlated it is to the desired output. If the weighted sum of the inputs is >0, the perceptron fires; if not, it stays silent. By connecting perceptrons in layers, you can make any type of logic operator, including Exclusive Or (one or the other statement in a pair is true, but not both), which a single perceptron can't compute by itself. By introducing such mathematical functions to approximate what was found in the histogram, you get a smoother-looking curve which is more likely to include all skin tones, even the odd ones here & there your training data may have missed.
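As a toy illustration of the perceptron idea (not the network from my project), here is a sketch of a single perceptron and an Exclusive Or built from three of them. The weights are hand-picked rather than learned, and the bias term is an assumption I've added so the ">0" firing rule can tell the cases apart.

```python
def perceptron(inputs, weights, bias):
    """Fire (return 1) when the weighted sum of the inputs plus a bias is > 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def xor(a, b):
    """Exclusive Or from three perceptrons: (a OR b) AND NOT (a AND b)."""
    either = perceptron((a, b), (1, 1), -0.5)         # fires if a OR b
    both = perceptron((a, b), (1, 1), -1.5)           # fires if a AND b
    return perceptron((either, both), (1, -1), -0.5)  # either, but not both


# Truth table for the four possible input pairs.
table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
# table == {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

A real skin tone network would learn its weights from the training data instead of having them typed in by hand, but the firing rule is the same.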

A 5-layer neural network used in the determination of what's a skin tone or not. The two green dots are inputs (such as Saturation & Value), and the yellow dots are the two perceptrons that get excited either when it's a skin tone or when it's not a skin tone.

In my research, the neural network-based skin tone detection approach seemed to work better than the regular histogram approach. You can read the whole report on my website if you're interested, and see all the somewhat strange-looking pictures of faces without the eyes, etc. Well, at any rate, at least you know the algorithms do what they should!

Key Takeaway

The most interesting thing I learned from this wasn't anything to do with the math or choice of algorithm, but what the histograms showed me. Human skin tones are surprisingly close to gray. In the HSV scale, the positive training data was all very close to the bare minimum saturation required to detect color. Even though it appears so vivid to us, we are pretty much all gray blobs walking around this planet, whether we appear to each other as black, white, yellow, or anything else; it's a very subtle amount of melanin providing the "brightness" that creeps in and makes the difference.

Relationship of hue to saturation among the skin tone training data. Hue makes a bigger difference.

Also, this project made a nice fusion of the learnings from my digital image class & my machine learning class; I took both in Fall 2010. I picked this project for my machine learning class, and did something wildly different for my digital image class. Expect a posting on that here eventually, but here's a hint: it's already on YouTube!

GOSHtastic - Game shows, Options, Software, & Hardware!