Introduction
In this project, I developed a Facial Expression Recognition (FER) system that classifies the seven emotion categories grounded in Ekman's theory of universal emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral) by training a custom deep learning model on a carefully curated, high-quality dataset.
The project blends classic and modern FER insights, aiming to make emotion recognition more accurate and accessible even with limited hardware.
The Challenge
Most state-of-the-art FER models are trained on constrained datasets or require high-end resources to achieve competitive accuracy.
Recognition accuracy also degrades under real-world conditions such as varied lighting, face coverings, or subtle expressions.
The Solution
A deep learning-based FER system trained on a hybrid dataset combining real, synthetic, and web-sourced images.
The architecture is a fine-tuned ResNet-18, combined with custom training strategies such as cross-domain data curation. This approach offers a balanced trade-off between performance and resource efficiency.
Process
The approach followed four essential phases:
01
Problem Framing
Defined the classification goal (7 Ekman emotions), established hardware constraints, and set performance targets.
02
Dataset Development
Curated and manually reviewed 23,450 high-quality images from multiple datasets and sources, ensuring accurate, consistent emotion labels.
03
Model Training
Implemented a ResNet-18 architecture with data preprocessing, cosine learning rate scheduling, and reproducible training routines (see the seeding sketch after this list).
04
Evaluation & Benchmarking
Achieved 90% accuracy and 90% F1-score. Compared results with state-of-the-art models from recent FER literature.
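As a concrete illustration of the reproducible training routines from step 03, a minimal seeding helper in PyTorch could look like the following. The seed value and the decision to force deterministic cuDNN kernels are assumptions, not details from this write-up:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every RNG involved in training so runs are repeatable."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy (used by many data loaders)
    torch.manual_seed(seed)           # CPU tensors
    torch.cuda.manual_seed_all(seed)  # all CUDA devices
    # Trade some speed for determinism in cuDNN convolution kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```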
State of the Art
A review of recent research highlights key trade-offs in FER model design:
Power = Precision
RRN + TST tops the charts (100% on CK+) — but demands serious computing power.
Smart & Slim
MobileNetV2 + ResNet-18 delivers solid results (up to 86%) — perfect for limited hardware.
Transformers Scale
ViT-based models generalize well but require more VRAM and longer training times.
SVMs Fall Behind
Lightweight and simple, but classical SVMs struggle to keep up with deep learning models.
Theoretical Framework
The project is rooted in:
- Ekman's theory of universal emotions (seven core emotions)
- Convolutional Neural Networks (CNNs) for spatial feature extraction
- Transfer Learning and Fine-tuning (see the sketch after this list)
- Cross-domain learning through dataset hybridization and augmentation
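To make the transfer-learning point concrete, here is a minimal sketch of adapting an ImageNet-pretrained ResNet-18 to the seven emotion classes with torchvision. Whether any backbone layers were frozen during fine-tuning is not stated, so this version keeps everything trainable:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # anger, disgust, fear, happiness, sadness, surprise, neutral

# Start from ImageNet weights, then swap the 1000-way classifier head
# for a 7-way emotion head; all backbone layers remain trainable.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```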
Development and Methodology
Dataset Development:
- 23,450 manually reviewed images across seven emotion classes.
- Sources: FER-2013, AffectNet, Oulu-CASIA, MMI, web scraping, Meta & Gemini-generated faces.
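Since these sources ship in different formats, one plausible way to hybridize them is to export each source to a folder-per-class layout and concatenate the per-source datasets into a single loader. The paths and batch size below are placeholders, not details from the project:

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Placeholder paths: each curated source exported as one folder per emotion.
SOURCE_DIRS = ["data/fer2013", "data/affectnet", "data/web_scraped"]

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # ResNet-18's usual input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),  # matching the backbone
])

# Merge every source into one hybrid training set.
hybrid_dataset = ConcatDataset(
    [datasets.ImageFolder(root, transform=preprocess) for root in SOURCE_DIRS]
)
train_loader = DataLoader(hybrid_dataset, batch_size=64, shuffle=True)
```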
Proposed Architecture:
- ResNet-18 pretrained on ImageNet.
- Custom training loop with cosine LR scheduling and optional augmentations.
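A sketch of such a training loop, reusing the model and train_loader from the sketches above. The optimizer choice, learning rate, and epoch count are assumptions; the write-up only specifies the cosine schedule:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 50  # hypothetical
LR = 1e-3    # hypothetical

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # the fine-tuned ResNet-18 from the earlier sketch

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)  # optimizer assumed
# Cosine annealing decays the LR from its initial value toward 0 over EPOCHS.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:  # hybrid loader from the data sketch
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the cosine schedule once per epoch
```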
Resources Used:
- Intel Core i7 (10th Gen)
- NVIDIA GTX 1650 Ti (4GB VRAM)
- Python, PyTorch, OpenCV, Albumentations
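Albumentations appears in the stack above; the exact transforms behind the "optional augmentations" are not listed, but a plausible pipeline for face images might look like this:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Hypothetical augmentation pipeline; the actual transforms are unspecified.
train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),            # faces are roughly symmetric
    A.Rotate(limit=15, p=0.5),          # small head-tilt variation
    A.RandomBrightnessContrast(p=0.3),  # simulate lighting changes
    A.Normalize(mean=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

# Albumentations operates on NumPy arrays, e.g. images loaded with OpenCV:
#   image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
#   tensor = train_aug(image=image)["image"]
```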
Results
90%
Overall accuracy across 3,521 validation samples.
0.90 F1
Highest-scoring classes: Happy and Surprise.
+4%↑
Outperformed average ResNet-18 FER models (typically ~86%) on FER2013-like datasets.
According to Singh & Prabha (2024), FER models using MobileNetV2 and ResNet-18 typically achieve between 82.1% and 86% accuracy on datasets like FER2013. This project's model, based on a refined ResNet-18 architecture, achieved 90%, exceeding the upper end of that range by roughly four percentage points.
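For reference, accuracy and F1 on a held-out split can be computed as below. Here val_loader is assumed to mirror the training loader without shuffling, and macro averaging (which weights all seven classes equally) is an assumption, since the write-up does not state the averaging mode:

```python
import torch
from sklearn.metrics import accuracy_score, f1_score

model.eval()
preds, targets = [], []
with torch.no_grad():
    for images, labels in val_loader:  # assumed validation DataLoader
        logits = model(images.to(device))
        preds.extend(logits.argmax(dim=1).cpu().tolist())
        targets.extend(labels.tolist())

print(f"accuracy: {accuracy_score(targets, preds):.2%}")
print(f"macro F1: {f1_score(targets, preds, average='macro'):.2f}")
```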
Next Steps
- Add emotion intensity and arousal-valence labeling.
- Extend training to temporal data (video) using LSTMs or 3D CNNs.
- Deploy the model in a lightweight desktop app or mobile tool for psychology, education, or HCI applications.
- Evaluate performance on datasets like VFEM or AffWild2 for deeper mental health analysis integration.