Emotion Detection Using Deep Learning

Role

Data Scientist

Year

2025

Client

Western Institute of Technology and Higher Education

Introduction

In this project, I developed a Facial Expression Recognition (FER) system capable of classifying Ekman's seven universal emotions — anger, disgust, fear, happiness, sadness, surprise, and neutrality — by training a custom deep learning model on a high-quality, carefully curated dataset.

The project blends classic and modern FER insights, aiming to make emotion recognition more accurate and accessible even with limited hardware.

The Challenge

Most state-of-the-art FER models are trained on constrained datasets or require high-end resources to achieve competitive accuracy.

Expression recognition also degrades under real-world conditions such as varied lighting, face coverings, and subtle expressions.

The Solution

A deep learning-based FER system trained on a hybrid dataset combining real, synthetic, and web-sourced images.

The model architecture is based on a fine-tuned ResNet-18 with custom training strategies such as cross-domain data curation. This approach offers a balanced tradeoff between performance and resource efficiency.
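
As a rough illustration of this setup, the sketch below shows how an ImageNet-pretrained ResNet-18 can be adapted to the seven emotion classes in PyTorch. The class list and the weights argument are illustrative assumptions, not the project's exact configuration.

    import torch.nn as nn
    from torchvision import models

    # Illustrative sketch: ImageNet-pretrained ResNet-18 with its final
    # fully connected layer swapped for a seven-way emotion head.
    EMOTIONS = ["anger", "disgust", "fear", "happiness",
                "sadness", "surprise", "neutral"]

    def build_fer_model(num_classes: int = len(EMOTIONS)) -> nn.Module:
        model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model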

Project Detail

Process

The approach followed four essential phases:

01

Problem Framing

Defined the classification goal (7 Ekman emotions), established hardware constraints, and set performance targets.

02

Dataset Development

Curated and manually reviewed 23,450 high-quality images from multiple datasets and sources, ensuring emotion accuracy and consistency.

03

Model Training

Implemented a ResNet-18 architecture with data preprocessing, cosine learning rate scheduling, and reproducible training routines.

04

Evaluation & Benchmarking

Achieved 90% accuracy and 90% F1-score. Compared results with state-of-the-art models from recent FER literature.
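
For context, an evaluation pass along these lines can be sketched as follows. The use of scikit-learn and of macro-averaged F1 is an assumption made for illustration; the write-up itself only reports the final accuracy and F1 figures.

    import torch
    from sklearn.metrics import accuracy_score, f1_score

    @torch.no_grad()
    def evaluate(model, loader, device="cuda"):
        # Collect predictions over the validation loader, then report accuracy
        # and macro-averaged F1 (the averaging choice is an assumption).
        model.eval()
        y_true, y_pred = [], []
        for images, labels in loader:
            logits = model(images.to(device))
            y_pred.extend(logits.argmax(dim=1).cpu().tolist())
            y_true.extend(labels.tolist())
        return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")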

State of the Art

A review of recent research highlights key trade-offs in FER model design:

Power = Precision

RRN + TST tops the charts (100% on CK+) — but demands serious computing power.

Smart & Slim

MobileNetV2 + ResNet-18 delivers solid results (up to 86%) — perfect for limited hardware.

Transformers Scale

ViT-based models generalize well but require more VRAM and longer training times.

SVMs Fall Behind

Lightweight and simple, but classical SVMs struggle to keep up with deep learning models.

Theoretical Framework

The project is rooted in:

  • Ekman's theory of universal emotions (seven core emotions)
  • Convolutional Neural Networks (CNNs) for spatial feature extraction
  • Transfer Learning and Fine-tuning
  • Cross-domain learning through dataset hybridization and augmentation

Development and Methodology

Dataset Development:

  • 23,450 manually reviewed images across seven emotion classes.
  • Sources: FER-2013, AffectNet, Oulu-CASIA, MMI, web scraping, Meta & Gemini-generated faces.
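
The write-up does not detail the preprocessing applied to these images before training. Purely as an illustration, a typical Albumentations pipeline for face crops of this kind might look like the following; the crop size, flip probability, and brightness/contrast settings are all assumptions.

    import albumentations as A
    from albumentations.pytorch import ToTensorV2

    # Hypothetical training-time augmentation for the curated face images;
    # every size, range, and probability here is an illustrative assumption.
    train_transform = A.Compose([
        A.Resize(224, 224),
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ])

Albumentations operates on NumPy arrays, so a sample would be transformed as train_transform(image=img)["image"] before being batched for PyTorch.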

Proposed Architecture:

  • ResNet-18 pretrained on ImageNet.
  • Custom training loop with cosine LR scheduling and optional augmentations.
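
A condensed sketch of such a training loop appears below, assuming cross-entropy loss, an Adam optimizer, and PyTorch's CosineAnnealingLR scheduler; the optimizer choice, learning rate, and epoch count are illustrative assumptions rather than the project's actual hyperparameters.

    import torch
    from torch import nn
    from torch.optim.lr_scheduler import CosineAnnealingLR

    def train(model, train_loader, epochs=30, lr=1e-3, device="cuda"):
        # Illustrative loop: cross-entropy loss with a cosine-annealed
        # learning rate stepped once per epoch; hyperparameters are assumptions.
        model.to(device)
        model.train()
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
        for epoch in range(epochs):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
            scheduler.step()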

Resources Used:

  • Intel Core i7 (10th Gen)
  • NVIDIA GTX 1650 Ti (4GB VRAM)
  • Python, PyTorch, OpenCV, Albumentations

Results

90%

Overall accuracy across 3,521 validation samples.

0.90 F1

Highest-scoring classes: Happy and Surprise.

+4%↑

Outperformed average ResNet-18 FER models (typically ~86%) on FER2013-like datasets.

According to Singh & Prabha (2024), FER models using MobileNetV2 and ResNet-18 typically achieve between 82.1% and 86% accuracy on datasets like FER2013. This project’s model, based on a refined ResNet-18 architecture, achieved 90%, exceeding the upper end of this range by roughly four percentage points.

Next Steps

Next Project

Face Generation Using Autoencoders
