
Vision Transformer (ViT) for Facial Expression Recognition Model Card

Model Overview

Model Name: trpakov/vit-face-expression
Task: Facial Expression/Emotion Recognition
Dataset: FER-2013
Model Architecture: Vision Transformer (ViT)
Finetuned from model: vit-base-patch16-224-in21k

Model Description

The vit-face-expression model is a Vision Transformer fine-tuned for facial emotion recognition on the FER-2013 dataset. Each facial image in the dataset is labeled with one of seven emotions:

Angry
Disgust
Fear
Happy
Sad
Surprise
Neutral

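The model splits each 224x224 input image into 16x16 patches and applies self-attention across them, which lets it relate facial regions (for example the eyes and the mouth) regardless of how far apart they are in the image.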
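A minimal inference sketch, assuming the checkpoint is available on the Hugging Face Hub under the model name given above and loads with the `transformers` image-classification pipeline. The helper names `classify_face` and `top_prediction` are hypothetical, and the label strings returned depend on the checkpoint's configuration, so they may differ in case or order from the list above.

```python
def classify_face(image_path, model_name="trpakov/vit-face-expression"):
    """Return a list of {"label", "score"} dicts for one face image."""
    # Imported lazily so top_prediction() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_name)
    return classifier(image_path)


def top_prediction(results):
    """Pick the highest-scoring entry from the pipeline output."""
    return max(results, key=lambda entry: entry["score"])


# Example usage ("face.jpg" is a placeholder path, not a file shipped with the model):
#   results = classify_face("face.jpg")
#   best = top_prediction(results)
#   print(best["label"], round(best["score"], 3))
```

The pipeline applies the checkpoint's own resizing and normalization, so no manual preprocessing is needed at inference time under this assumption.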

Data Preprocessing

Before feeding images into the model, the following preprocessing steps are applied:

Resizing: Images are resized to 224x224 pixels.
Normalization: Pixel values are normalized using the ViT feature extractor's predefined mean and standard deviation.
Data Augmentation: The following transformations are applied to improve generalization:
- Random horizontal flips (p=0.5)
- Random rotation (±10 degrees)
- Color jitter (brightness and contrast adjustments)
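The steps above could be sketched with `torchvision` as follows. Several details are assumptions rather than values stated in this card: the mean and standard deviation of 0.5 per channel (the usual default of the ViT image processor), the color jitter strengths of 0.2, and the grayscale-to-3-channel conversion, which is included because FER-2013 images are grayscale while ViT expects three channels. The function names are hypothetical.

```python
IMAGE_SIZE = 224
# Assumed values: the usual ViT image processor defaults (0.5 per channel).
IMAGE_MEAN = [0.5, 0.5, 0.5]
IMAGE_STD = [0.5, 0.5, 0.5]


def normalize_value(pixel, mean=0.5, std=0.5):
    """Normalize one pixel value that is already scaled to [0, 1]."""
    return (pixel - mean) / std


def build_train_transform():
    """Compose resizing, augmentation, and normalization for training images."""
    from torchvision import transforms  # lazy import; requires torchvision

    return transforms.Compose([
        # Assumed step: replicate the grayscale channel so the tensor has 3 channels.
        transforms.Grayscale(num_output_channels=3),
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=10),  # angle sampled from [-10, +10]
        # Jitter strengths are assumed; the card only names brightness and contrast.
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),  # converts a PIL image to a float tensor in [0, 1]
        transforms.Normalize(mean=IMAGE_MEAN, std=IMAGE_STD),
    ])
```

With these assumed statistics, normalization maps pixel values from [0, 1] to [-1, 1]. For validation and test images, the random augmentation steps would normally be left out.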

Training Details

Optimizer: AdamW
Learning Rate: 5e-5
Batch Size: 8
Loss Function: Cross-Entropy Loss
Epochs: 3
Imbalanced Data Handling: Each class was capped at a maximum of 5000 samples to reduce class imbalance.
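A minimal sketch of the settings above, assuming a PyTorch training setup. The names `TRAINING_CONFIG`, `cap_per_class`, and `build_optimizer_and_loss` are hypothetical, and the random subsampling used to enforce the cap is an assumption, since the card does not say how the 5000 samples were chosen.

```python
import random
from collections import defaultdict

# Hyperparameters as listed in this card.
TRAINING_CONFIG = {
    "optimizer": "AdamW",
    "learning_rate": 5e-5,
    "batch_size": 8,
    "epochs": 3,
    "max_per_class": 5000,
}


def cap_per_class(samples, max_per_class=5000, seed=0):
    """Keep at most max_per_class (item, label) pairs for each label."""
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    capped = []
    for group in by_label.values():
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)  # assumed: random subsampling
        capped.extend(group)
    return capped


def build_optimizer_and_loss(model, learning_rate=5e-5):
    """Create the AdamW optimizer and cross-entropy loss named above."""
    import torch  # lazy import; requires PyTorch

    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    loss_fn = torch.nn.CrossEntropyLoss()
    return optimizer, loss_fn
```

Capping only shrinks classes that exceed the limit. Classes with fewer samples than the cap stay at their original size, so some imbalance can remain.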

Evaluation Metrics

Metric	Value
Validation Accuracy	0.7113
Test Accuracy	0.7116

Confusion Matrix Insights

The model exhibits the following patterns:

Neutral expressions are frequently misclassified.
Disgust and Fear have lower recall than the other classes, meaning the model misses many true instances of these emotions.
Happy is often confused with Neutral and Fear, likely due to overlapping facial features.
Angry and Sad also show some misclassification overlap, which might be addressed with better feature extraction.
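The patterns above are read from a confusion matrix. As an illustration of how accuracy and per-class recall are derived from one, here is a small pure-Python sketch. The function names are hypothetical, and any labels passed in would come from your own evaluation run; no results from this model are reproduced here.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def per_class_recall(matrix, labels):
    """Recall for a class = correct predictions / all true samples of that class."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


def accuracy(matrix):
    """Overall accuracy = diagonal sum / total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

A class with low recall has a small diagonal entry relative to its row total, and the off-diagonal entries in that row show which classes it is confused with.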

Limitations

Data Bias: The model's performance is affected by class imbalance in the FER-2013 dataset.
Generalization Issues: The model may not generalize well to diverse facial expressions due to dataset limitations.
Misclassification Trends: Higher misclassification rates for minority classes such as 'Disgust' and 'Fear' suggest potential areas for improvement.
