Facial Emotion Detection
I found the idea of facial recognition in itself controversial, but facial emotion detection takes that even further. It takes only a slight stretch of the imagination to see how a captured, categorized emotion, associated with buying behavior, could take an already over-surveilled populace into a brave new world indeed.
Commercial cloud services, 'Emotion as a Service' (EaaS for short), already offer APIs and back-end AI services to accomplish this in today's e-commerce chain. If micro-targeting creeped you out, try adding emotion-augmented micro-targeting for grins.
How AI Does This
To do this from scratch, you could take a dataset of face imagery, have humans label each image manually, and, once you had enough, use it to train a neural network to model and predict one of the seven standard emotions (anger, disgust, fear, happiness, sadness, surprise, neutral) from an unlabeled, unseen test image dataset.
How This Works
Different approaches are being taken, but most efforts appear to use the FER2013 dataset as a starting point: roughly 36,000 48×48 grayscale face images, each labeled with one of the seven standard emotions above. Right now there is no easy way to use unsupervised approaches to auto-label face data by emotion, so the fastest way to get started is to construct and train a chosen model on an existing, labeled face dataset.
Deep Learning Approach
The type of neural network, the choice of activation functions, layer widths, and so on have been intense subjects of discussion on Kaggle and other data science communities. We have not made an exhaustive study, but we found commonalities among the highest-scoring competitors in the Kaggle Facial Expression Recognition Challenge. We selected a couple of those for evaluation and were able to run them successfully on Colab. All use Keras, a deep convolutional neural network, and the FER2013 dataset. Thanks to Colab's free GPU, training these models didn't take long.
Test Data Preparation
An unlabeled test set undergoes CRNO: Convert, Reshape, Normalize, One-hot encode. CRNO is shorthand for several pre-processing steps commonly combined when preparing images for deep-learning models (the one-hot step applies to the labels, where they exist). The resulting images are then submitted to a trained model that has seen many different facial expressions and scored reasonably well in accuracy and validation passes.
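As a sketch, the four CRNO steps might look like the following. The array shapes follow FER2013's 48×48 grayscale faces; the function name is mine, not from a specific kernel:

```python
import numpy as np

NUM_CLASSES = 7  # the seven FER2013 emotion categories


def crno(images, labels=None, size=48):
    """Convert, Reshape, Normalize, One-hot encode.

    images: array-like of 48x48 grayscale faces (uint8, 0-255).
    labels: optional integer class ids; one-hot encoded when given
            (a truly unlabeled test set has no labels to encode).
    """
    x = np.asarray(images, dtype='float32')   # Convert to float32
    x = x.reshape(-1, size, size, 1)          # Reshape to (N, 48, 48, 1)
    x /= 255.0                                # Normalize to [0, 1]
    if labels is None:
        return x
    # One-hot encode: row i of the identity matrix is the encoding of class i
    y = np.eye(NUM_CLASSES, dtype='float32')[np.asarray(labels)]
    return x, y
```

Keras users would typically swap the `np.eye` trick for `keras.utils.to_categorical`, but the effect is the same.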
Architecture of the Deep Learning Model
The Convolutional Neural Network (CNN) architecture is a Keras Sequential model (from Kaggler LX Yuan):
model = Sequential()

#module 1
model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), input_shape=(width, height, 1), data_format='channels_last'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

#module 2
model.add(Conv2D(2*num_features, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(2*num_features, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

#module 3
model.add(Conv2D(num_features, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(num_features, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

#flatten
model.add(Flatten())

#dense 1
model.add(Dense(2*2*2*num_features))
model.add(BatchNormalization())
model.add(Activation('relu'))

#dense 2
model.add(Dense(2*2*num_features))
model.add(BatchNormalization())
model.add(Activation('relu'))

#dense 3
model.add(Dense(2*num_features))
model.add(BatchNormalization())
model.add(Activation('relu'))

#output layer
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7), metrics=['accuracy'])
model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 46, 46, 256)       2560
batch_normalization_1 (Batch (None, 46, 46, 256)       1024
activation_1 (Activation)    (None, 46, 46, 256)       0
conv2d_2 (Conv2D)            (None, 46, 46, 256)       590080
batch_normalization_2 (Batch (None, 46, 46, 256)       1024
activation_2 (Activation)    (None, 46, 46, 256)       0
max_pooling2d_1 (MaxPooling2 (None, 23, 23, 256)       0
conv2d_3 (Conv2D)            (None, 23, 23, 128)       295040
batch_normalization_3 (Batch (None, 23, 23, 128)       512
activation_3 (Activation)    (None, 23, 23, 128)       0
conv2d_4 (Conv2D)            (None, 23, 23, 128)       147584
batch_normalization_4 (Batch (None, 23, 23, 128)       512
activation_4 (Activation)    (None, 23, 23, 128)       0
max_pooling2d_2 (MaxPooling2 (None, 11, 11, 128)       0
conv2d_5 (Conv2D)            (None, 11, 11, 64)        73792
batch_normalization_5 (Batch (None, 11, 11, 64)        256
activation_5 (Activation)    (None, 11, 11, 64)        0
conv2d_6 (Conv2D)            (None, 11, 11, 64)        36928
batch_normalization_6 (Batch (None, 11, 11, 64)        256
activation_6 (Activation)    (None, 11, 11, 64)        0
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 64)          0
flatten_1 (Flatten)          (None, 1600)              0
dense_1 (Dense)              (None, 512)               819712
batch_normalization_7 (Batch (None, 512)               2048
activation_7 (Activation)    (None, 512)               0
dense_2 (Dense)              (None, 256)               131328
batch_normalization_8 (Batch (None, 256)               1024
activation_8 (Activation)    (None, 256)               0
dense_3 (Dense)              (None, 128)               32896
batch_normalization_9 (Batch (None, 128)               512
activation_9 (Activation)    (None, 128)               0
dense_4 (Dense)              (None, 7)                 903
=================================================================
Total params: 2,137,991
Trainable params: 2,134,407
Non-trainable params: 3,584
This is not bad for a CNN with batch normalization: in the competition, it scored 0.66.
That falls right into the 'squishy' zone of predictive analytics and is highly subjective. However, Kaggle contestants have reached 0.71 accuracy, which could be considered usable, and scores will most likely improve as models undergo optimization.
Prediction of Unlabeled Faces
Currently, I have been able to run some Trump faces through Priya Dwivedi's emotion CNN for detection. Thankfully, her model was saved at its highest accuracy, so I could load it into a Colab notebook very easily without having to re-train it. At 0.56 accuracy, it got some of Trump's facial expressions right. I had to convert the faces to grayscale, then resize and reshape them before the model would accept them. The featured image of this post shows the images I submitted.
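The grayscale/resize/reshape step I describe can be sketched roughly as follows, in pure NumPy. The nearest-neighbour resize stands in for whatever PIL or OpenCV call you prefer, and the grayscale weights are the usual ITU-R 601 luma coefficients; the function name is my own:

```python
import numpy as np


def to_model_input(rgb, size=48):
    """Grayscale, resize, and reshape an RGB face for a FER-style model.

    rgb: (H, W, 3) array with values 0-255.
    Returns a (1, size, size, 1) float array scaled to [0, 1].
    """
    # Grayscale via standard luma weights (stand-in for PIL's convert('L'))
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Crude nearest-neighbour resize to size x size
    h, w = gray.shape
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    small = gray[np.ix_(rows, cols)]
    # Reshape to the (batch, height, width, channels) layout the model expects
    return (small / 255.0).reshape(1, size, size, 1)
```

In practice you would use a proper interpolating resize, but the shape bookkeeping is the part that trips people up.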
My notebook (a fork of Priya's base) with the Trump emotions test is on GitHub here. I will say that you need to work out a good way to submit images for prediction when working with this on Colab. There is a well-known mechanism for mounting your Google Drive within a notebook instance (yes, it's temporary). I did that and also unzipped the image archive into a folder whose path is defined in the code.
If you wish to run my code in Colab, click the CO button below. The 16 Trump images, zipped, are in the 'Data' section of this site. You will have to upload that archive to your Google Drive, mount the drive in Colab, and run !unzip trum_16_faces2.zip in a cell before using them. You will also need Priya's emotion-detection CNN model, which was used to find targets for the new images. It is also in the Data section of this site and will need to be unzipped in a cell, then loaded as follows:
model = load_model(model_path+"model_v6_23.hdf5")
Since POTUS is everywhere and has a reasonably small set of facial expressions representing an even smaller set of emotions, he makes a good subject for predicting emotion from expression. A Kaggler (Muhammed Buyukkinaci) was kind enough to collect these images, though not to label them, so I picked 16, cleaned them up, and passed them through Priya's model.
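Decoding a prediction is then just an argmax over the model's softmax output. The label order below is the FER2013 convention (0 = angry through 6 = neutral); a given saved model may have been trained with a different ordering, so treat it as an assumption to verify:

```python
import numpy as np

# FER2013 class order -- verify against the ordering the saved model was trained with
EMOTIONS = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']


def decode_emotion(probs):
    """Map one softmax vector (length 7) to its emotion label and confidence."""
    probs = np.asarray(probs)
    idx = int(np.argmax(probs))
    return EMOTIONS[idx], float(probs[idx])
```

With the saved model loaded as above, `model.predict(x)` yields one such vector per submitted face.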
One can see that we are sitting at around 50% accuracy, which explains why some of the predicted labels are off.
Extending The Facial Emotion Recognition Experiment
Perhaps a deeper neural network would bump the accuracy up to a more usable level, as Kaggler LX Yuan demonstrated. Although his network is denser than the one I used to predict emotions from Trump's face, it trains for fewer epochs and is built from three banks of cascading feature densities. I ran this training and saved the model, but when I submitted the same facial data as before, it failed due to pre-formatting issues. I hope to figure this out soon.
Facial emotion detection using deep learning is amusing to experiment with. Some approaches work to a degree and can produce repeatable results. Many commercial and governmental applications would benefit from doing this right, and a handful of companies offer it as a service: Neurodata Lab, Affectiva, Microsoft's Face API, and Amazon Rekognition (yes, more cloud EaaS, Emotion as a Service).
The practice has attained some notoriety from an ethical standpoint due to questions about the accuracy of labels based on the 'emojification' of facial expressions. Of course, this is another AI technology that could easily be abused, so we have to take good care.
Periscopic, a socially conscious data visualization firm, created a feather plot of the emotions of inaugural speeches from Reagan to Trump which is quite illustrative of the changing sentiment of the orators through this period. Emotions were derived using the Microsoft Emotion API.
They also used the MS Emotion API to create the interactive Trump Emoto-Coaster infographic which is very entertaining.
You can interactively slide the emotion scale and it will cue up the video segment that shows the emotion in question.
If we take this down its logical path, we are going to see much more use by corporate recruiters during remote interviews. We are already starting to see audience response to marketing campaigns measured with emotion recognition. Governments and law enforcement will make use of real-time facial recognition with emotion understanding in attempts to prevent terrorist acts or mass shootings.
Read this paper to see how emotion detection is coming under theoretical challenge right now. For more head-scratching on the subject, this engaging article in the Washington Post peels back some of the layers of doubt surrounding it.