Facial Expression Recognition with TensorFlow
Introduction:
What's Deep Learning? If you have a basic understanding of Neural Networks, it's easy to explain: a Deep Learning network is basically a multi-layer Neural Network. Trained with the back-propagation algorithm, it is able to extract features without human direction. Some experts in the field believe that Deep Learning will replace most other Machine Learning algorithms in the future.
Although Deep Learning is not part of my program, I am very interested in the subject. As Deep Learning techniques are getting more and more popular these days, adding them to my skill set is definitely a plus. Moreover, I am treating it as another adventure in Data Science and Artificial Intelligence.
In the study of Artificial Intelligence, we want machines to be able to communicate with and serve humans. If I can build a system that reads human facial expressions, it could be very useful in several areas, e.g. Sales, Marketing, Human Behavioral Analysis, Artificial Intelligence (build an AI to make people happy?)...
So, what I'm going to do is build a Facial Expression Recognition model with a Convolutional Neural Network. The entire project can be found in my GitHub repository: https://github.com/Jian-Qiao/Facial-Expression-Recognition
Data:
At the start of this project, I found a data set from a Kaggle challenge (the Facial Expression Recognition Challenge). The data set contains 35,887 faces retrieved from Google and labeled by human labelers. However, the data set is very messy: there are tons of mislabeled images, blurred faces, and even anime faces in it. So I chose to scrape my own data set and clean it myself.
In brief, my scraping code does the following:
1. Ask me for a search criterion.
2. Open a Chrome web browser automatically.
3. Go to https://www.google.com/imghp
4. Enter the search criterion from step 1 and click the "Search" button.
5. Keep scrolling down to the end of the page (Google only loads a small batch of results unless you keep scrolling down and asking for more).
6. Click the "Show More Results" button when the page cannot be scrolled down any further.
7. Repeat steps 5 and 6 until the page cannot be scrolled down and there is no "Show More Results" button.
8. Scrape the links of all images on the page.
9. Loop through the links and download each image into a folder created and named after the criterion from step 1.
The detailed code is in the GitHub repository; a sketch of it is below.
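A minimal sketch, assuming Selenium with ChromeDriver and the requests library; Google changes its page layout often, so the element lookups below are illustrative assumptions rather than the exact selectors:

```python
import os
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_images(query, out_dir="Pictures"):
    driver = webdriver.Chrome()                    # step 2: open Chrome
    driver.get("https://www.google.com/imghp")     # step 3: Google Images
    box = driver.find_element(By.NAME, "q")
    box.send_keys(query)                           # step 4: enter the criterion
    box.submit()

    last_height = 0
    while True:                                    # steps 5-7: scroll to the end
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            try:                                   # step 6: "Show More Results"
                driver.find_element(By.CSS_SELECTOR, "input[type='button']").click()
            except Exception:
                break                              # nothing more to load
        last_height = height

    # step 8: collect the image links (the selector is an assumption)
    urls = [img.get_attribute("src")
            for img in driver.find_elements(By.TAG_NAME, "img")]
    driver.quit()

    # step 9: download into a folder named after the criterion
    folder = os.path.join(out_dir, query)
    os.makedirs(folder, exist_ok=True)
    for i, url in enumerate(u for u in urls if u and u.startswith("http")):
        try:
            with open(os.path.join(folder, "%d.jpg" % i), "wb") as f:
                f.write(requests.get(url, timeout=10).content)
        except requests.RequestException:
            pass                                   # skip dead links

if __name__ == "__main__":
    scrape_images(input("Search criterion: "))     # step 1: ask for the criterion
```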
I ran the code seven times, once for each of "Angry Human Face", "Happy Human Face", "Disgusted Human Face", "Fearful Human Face", "Neutral Human Face", "Sad Human Face", and "Surprised Human Face".
Then I manually went through each picture and deleted the false ones.
In the end, I was left with: 433 "Angry Human Face", 510 "Happy Human Face", 425 "Disgusted Human Face", 339 "Fearful Human Face", 369 "Neutral Human Face", 436 "Sad Human Face", and 469 "Surprised Human Face" pictures.
Although this is not as big as the Kaggle data set, I decided to build the model with what I had.
Data Processing:
I took several steps to get the data ready:
Load Pictures and Extract Faces:
As I was working on an AWS EC2 server, I used scp to copy all the data onto my remote machine.
Since the pictures are saved in their corresponding folders, the first part of the code walks through every folder inside the "Pictures" folder and records the directory and name of each file; it also keeps track of each picture's label.
The face detector I used in the next section only accepts JPEG/JPG files, so I had to keep only those two formats and drop the others.
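A minimal sketch of this first part, walking the "Pictures" folder and keeping only JPEG/JPG files (the variable names here are my own):

```python
import os

paths, labels = [], []
for root, _, files in os.walk("Pictures"):
    for name in files:
        # keep only JPEG/JPG, since the face detector accepts nothing else
        if name.lower().endswith((".jpg", ".jpeg")):
            paths.append(os.path.join(root, name))
            labels.append(os.path.basename(root))  # folder name = label
```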
In a Convolutional Neural Network, the algorithm tries to find the common features shared within each group. To reduce noise as much as possible, in the second part I used a face-detection tool to extract only the facial region of each picture. Such a tool can be found in the Python dlib package, and it works pretty well in my opinion. (I'm going to study it and figure out exactly how it works.)
This code finds each face's position in each picture and appends the cropped face data to a big array; it also keeps the face's label in a second array.
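A minimal sketch of the extraction step, using dlib's frontal face detector (the variable names are illustrative):

```python
import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()

faces, face_labels = [], []
for path, label in zip(paths, labels):
    img = np.array(Image.open(path).convert("RGB"))
    for rect in detector(img, 1):                  # 1 = upsample the image once
        top, left = max(rect.top(), 0), max(rect.left(), 0)
        face = img[top:rect.bottom(), left:rect.right()]
        if face.size:                              # skip degenerate detections
            faces.append(face)                     # crop of just the face
            face_labels.append(label)
```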
(P.S. I dumped all my data into a Pickle file to keep a backup at every stage)
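For example, a backup at this stage can be as simple as:

```python
import pickle

# snapshot the arrays so a crash doesn't cost a re-run of the whole stage
with open("faces_backup.pkl", "wb") as f:
    pickle.dump({"faces": faces, "labels": face_labels}, f)
```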
Greyscale:
Since color doesn't matter for facial expressions, I converted all faces to greyscale. Also, I don't want my system to be racist 🙂
Re-scale:
The last step is to rescale the faces. I chose 100×100 (even though the Kaggle data set uses 48×48), because I want my algorithm to have a chance to capture "micro-expressions".
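A minimal sketch of the greyscale and rescale steps together, assuming `faces` holds the cropped faces from the previous step (the [0, 1] normalization is my own assumption):

```python
import numpy as np
from PIL import Image

processed = []
for face in faces:
    img = Image.fromarray(face).convert("L")       # "L" = 8-bit greyscale
    img = img.resize((100, 100))                   # 100x100, vs Kaggle's 48x48
    processed.append(np.asarray(img) / 255.0)      # scale pixels to [0, 1]

X = np.stack(processed)                            # shape: (n_faces, 100, 100)
```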
Convolutional Neural Network:
The difference between a Convolutional Neural Network and a plain Artificial Neural Network is that a CNN adds two extra kinds of modules: 'Convolutional Layers' and 'Pooling Layers'. The idea of a Convolutional Neural Network is to slide a small filter over every pixel region of a picture and record the filter's response at each location as a feature. These features are then fed into a pooling layer, which downsamples them. The convolution-pooling step can be repeated multiple times before the final result is sent to a traditional Artificial Neural Network.
Working with Convolutional Neural Networks is very complicated and requires a lot of prior knowledge, so I cannot cover it thoroughly here... I also won't paste my full code, because it is far too long to explain and requires other Python scripts to run.
So, here is the basic structure of my CNN:
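In tf.keras it looks roughly like this; the filter counts and layer sizes below are illustrative assumptions, not my exact architecture:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 100, 1)),        # 100x100 greyscale faces
    tf.keras.layers.Conv2D(32, 5, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),                   # first conv-pool block
    tf.keras.layers.Conv2D(64, 5, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),                   # second conv-pool block
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),     # fully connected part
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(7, activation="softmax"),    # 7 expression classes
])
```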
I am also using RMSProp in my CNN. RMSProp adapts each parameter's learning rate using a moving average of recent squared gradients, and combined with momentum and decay it accelerates the training process.
I also split my data into batches: training on a small batch at a time not only accelerates the training process but also lets a slow computer handle the job.
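Continuing the sketch above, compiling with RMSProp and training in batches might look like this (the learning rate, batch size, and epoch count are assumptions):

```python
import numpy as np

# integer-encode the labels collected during face extraction
label_to_idx = {name: i for i, name in enumerate(sorted(set(face_labels)))}
y = np.array([label_to_idx[l] for l in face_labels])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# X[..., None] adds the single greyscale channel; small batches keep
# memory usage low enough for a modest machine
model.fit(X[..., None], y, batch_size=32, epochs=50, validation_split=0.2)
```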
Result:
After a couple of weeks of model tuning (almost a week per run on my AWS Free Tier instance), the error rate is around 52%, which is not bad with only 2,503 pictures. (The winner of the Kaggle Facial Expression Challenge reached 34% with 35,887 faces.)
I understand that for a Deep Learning algorithm, the amount of data matters enormously. I recently read a great article about how ImageNet was built and how that amount of data helped build a fantastic object-detection algorithm.
The question now is: what to do in the future?
Next Steps:
I have downloaded a facial expression data set from the MMI Facial Expression Database. Although the data set is not very large (only 493 pictures) and is unlabeled, I wrote an interface to label the faces myself:
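The interface itself was more elaborate, but a minimal sketch of such a hand-labeling loop, assuming the cropped faces are in `faces`, could be:

```python
import matplotlib.pyplot as plt

LABELS = ["Angry", "Disgusted", "Fearful", "Happy",
          "Neutral", "Sad", "Surprised"]

def label_faces(faces):
    new_labels = []
    for face in faces:
        plt.imshow(face, cmap="gray")
        plt.axis("off")
        plt.pause(0.1)                     # draw the face without blocking
        idx = int(input("0=Angry ... 6=Surprised: "))
        new_labels.append(LABELS[idx])
        plt.clf()                          # clear the figure for the next face
    return new_labels
```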
Then I'll add the labeled data to my training set and see how much improvement these extra faces bring.
Also, I have just gotten my hands on one of these:
It's a cool little gadget called the Raspberry Pi 3: basically a Linux-based mini computer that can take various add-on modules.
I'm going to:
- Add a camera module to it
- Build an interface that dynamically performs face extraction and expression recognition on every frame (a rough sketch follows this list).
- Save the data and the prediction into a data file from time to time.
- Have someone (most likely me) check each face-prediction match.
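A rough sketch of what the per-frame loop might look like with OpenCV, dlib, and the trained model (I haven't built this yet, so every detail here is an assumption):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
cap = cv2.VideoCapture(0)                          # the camera module

while True:
    ok, frame = cap.read()
    if not ok:
        break
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for rect in detector(grey):                    # face extraction per frame
        top, left = max(rect.top(), 0), max(rect.left(), 0)
        face = grey[top:rect.bottom(), left:rect.right()]
        if face.size == 0:
            continue
        face = cv2.resize(face, (100, 100)) / 255.0
        pred = model.predict(face[None, ..., None], verbose=0)
        # ...save the face and prediction to a data file for later review...
cap.release()
```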
Closing:
From my understanding:
Data Science is more than simply cleaning data, implementing algorithms, and getting a result. It's about using the proper tools to diagnose a problem and find the way to solve it. That's why it's called a science.
I may come back and update this post from time to time. If you have any questions or suggestions, please e-mail me at jian.qiao@outlook.com.