Vision Transformer (ViT) for Facial Expression Recognition Model Card
Model Overview
- Model Name: trpakov/vit-face-expression
- Task: Facial Expression/Emotion Recognition
- Dataset: FER2013
- Model Architecture: Vision Transformer (ViT)
- Finetuned from model: vit-base-patch16-224-in21k
Model Description
The vit-face-expression model is a Vision Transformer fine-tuned for facial emotion recognition using the FER2013 dataset. The dataset consists of facial images categorized into seven emotions:
- Angry
- Disgust
- Fear
- Happy
- Sad
- Surprise
- Neutral
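The model produces one logit per emotion. The sketch below turns those logits into ranked label probabilities. It assumes the label indices follow the order listed above, which is only an assumption; the real mapping should be read from the checkpoint's `config.id2label`. In practice, `transformers.pipeline("image-classification", model="trpakov/vit-face-expression")` performs this step itself and returns label/score pairs.

```python
import math

# Assumption: label indices follow the order listed in this card.
# Check the checkpoint's config.id2label for the real mapping.
LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_emotions(logits, k=3):
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Hypothetical logits for a single image, not real model output.
    example_logits = [0.1, -1.2, 0.3, 2.5, -0.4, 0.0, 1.1]
    for label, p in top_emotions(example_logits):
        print(f"{label}: {p:.3f}")
```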
This model uses the transformer architecture: each 224x224 input is split into 16x16-pixel patches, and self-attention across those patches lets the model capture spatial dependencies between facial features.
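The base checkpoint's name, vit-base-patch16-224-in21k, encodes its input geometry: 224x224 images cut into non-overlapping 16x16 patches. The sketch below shows that patch arithmetic and the splitting step on a plain list-of-rows image. It assumes one extra [CLS] token, as in the standard ViT classification setup, and the function names are illustrative rather than part of any library.

```python
def vit_token_count(image_size=224, patch_size=16, cls_token=True):
    """Number of tokens the transformer sees: one per patch, plus an optional [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size          # 224 // 16 = 14 patches per side
    return per_side * per_side + (1 if cls_token else 0)

def patchify(image, patch_size):
    """Split a square single-channel image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order (left to right, then top to bottom).
    """
    size = len(image)
    if size % patch_size != 0 or any(len(row) != size for row in image):
        raise ValueError("expected a square image whose side is divisible by patch_size")
    patches = []
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches

if __name__ == "__main__":
    print(vit_token_count())  # 14 * 14 patches + 1 [CLS] token
    tiny = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(patchify(tiny, 2)[0])
```

For a 224x224 input this gives 196 patch tokens plus the [CLS] token, whose final hidden state is what the classification head reads.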
Data Preprocessing
Before feeding images into the model, the following preprocessing steps are applied:
- Resizing: Images are resized to 224x224 pixels.
- Normalization: Pixel values are normalized using the ViT feature extractor's predefined mean and standard deviation.
- Data Augmentation: The following transformations are applied to improve generalization:
  - Random horizontal flips (p=0.5)
  - Random rotation (±10 degrees)
  - Color jitter (brightness and contrast adjustments)
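The normalization step and the per-image augmentation draws can be sketched without an image library. This sketch assumes a mean and standard deviation of 0.5 for every channel, matching the base ViT checkpoint's processor config (confirm against the actual image processor). It also uses a jitter strength of 0.2, which is hypothetical because the card gives no value. Only the flip is applied here; rotation and color jitter would normally come from an image library.

```python
import random

# Assumption: mean/std of 0.5 for every channel, as in the base ViT checkpoint's
# processor config; read the actual values from the image processor to be sure.
MEAN = 0.5
STD = 0.5
# Hypothetical jitter strength: the card does not state the brightness/contrast range.
JITTER = 0.2

def normalize(image):
    """Scale 0-255 pixel values to [0, 1], then standardize with (x - MEAN) / STD."""
    return [[(v / 255.0 - MEAN) / STD for v in row] for row in image]

def sample_augmentation(rng):
    """Draw one image's augmentation parameters, following the list above."""
    return {
        "hflip": rng.random() < 0.5,                    # random horizontal flip, p=0.5
        "angle": rng.uniform(-10.0, 10.0),              # rotation in degrees, within +/-10
        "brightness": rng.uniform(1 - JITTER, 1 + JITTER),
        "contrast": rng.uniform(1 - JITTER, 1 + JITTER),
    }

def horizontal_flip(image):
    """Mirror an image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]

if __name__ == "__main__":
    rng = random.Random(0)                              # seeded for reproducibility
    params = sample_augmentation(rng)
    image = [[0, 128, 255], [255, 128, 0]]
    if params["hflip"]:
        image = horizontal_flip(image)
    print(params)
    print(normalize(image))  # with mean = std = 0.5, pixels land in [-1, 1]
```

In a PyTorch pipeline, `torchvision.transforms` provides `RandomHorizontalFlip(p=0.5)`, `RandomRotation(degrees=10)` and `ColorJitter(brightness=..., contrast=...)` for the same steps.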
Training Details
- Optimizer: AdamW
- Learning Rate: 5e-5
- Batch Size: 8
- Loss Function: Cross-Entropy Loss
- Epochs: 3
- Imbalanced Data Handling: Each class was capped at a maximum of 5000 samples to reduce class imbalance; classes with fewer samples keep their original size.
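The per-class cap, the loss and the optimizer update can be sketched without a deep learning framework. This sketch is illustrative only, and `cap_per_class`, `cross_entropy` and `adamw_step` are hypothetical helpers. The cap keeps the first samples it sees; shuffle beforehand for a random subset, since the card does not say how samples were chosen. The AdamW betas, epsilon and weight decay are PyTorch's `torch.optim.AdamW` defaults rather than values stated in the card. Only the learning rate of 5e-5 comes from the training details.

```python
import math
from collections import Counter

def cap_per_class(samples, max_per_class=5000):
    """Keep at most max_per_class (item, label) pairs per label, preserving order."""
    kept, counts = [], Counter()
    for item, label in samples:
        if counts[label] < max_per_class:
            kept.append((item, label))
            counts[label] += 1
    return kept

def cross_entropy(logits, target):
    """Negative log-probability of the target class, via a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]

def adamw_step(param, grad, state, lr=5e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    Weight decay is decoupled: it shrinks the parameter directly instead of
    being added to the gradient, which is what distinguishes AdamW from Adam + L2.
    """
    beta1, beta2 = betas
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])     # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])     # bias-corrected second moment
    param *= 1 - lr * weight_decay                      # decoupled weight decay
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

if __name__ == "__main__":
    data = [("img0", 0), ("img1", 0), ("img2", 1), ("img3", 0)]
    print(cap_per_class(data, max_per_class=2))
    print(cross_entropy([0.0] * 7, target=3))        # uniform logits over 7 classes: log(7)
    print(adamw_step(1.0, 0.5, {}))
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to 1, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.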
Evaluation Metrics
| Metric | Value |
|---|---|
| Validation Accuracy | 0.7113 |
| Test Accuracy | 0.7116 |
Confusion Matrix Insights
The model exhibits the following patterns:
- Neutral expressions are frequently misclassified as other emotions.
- Disgust and Fear have lower recall, meaning a larger share of true Disgust and Fear images are assigned to other classes.
- Happy is often confused with Neutral and Fear, likely due to overlapping facial features.
- Angry and Sad also show some misclassification overlap, which might be addressed with better feature extraction.
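These patterns are read off a confusion matrix whose rows are true labels and whose columns are predictions. The sketch below builds one and derives per-class recall (each diagonal entry divided by its row total) and overall accuracy (the trace divided by the number of samples). The helper names and toy labels are hypothetical and do not reproduce the card's results.

```python
def confusion_matrix(y_true, y_pred, num_classes=7):
    """counts[i][j] = number of samples with true class i predicted as class j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

def per_class_recall(counts):
    """Fraction of each true class predicted correctly; None for classes with no samples."""
    recalls = []
    for i, row in enumerate(counts):
        total = sum(row)
        recalls.append(row[i] / total if total else None)
    return recalls

def accuracy(counts):
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(map(sum, counts))
    return correct / total if total else 0.0

if __name__ == "__main__":
    # Toy labels for illustration only (0 = first class, 1 = second class, ...).
    y_true = [0, 0, 1, 1, 1, 2]
    y_pred = [0, 1, 1, 1, 0, 2]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    print(per_class_recall(cm))
    print(accuracy(cm))
```

A class with low recall shows up as a row whose mass sits mostly off the diagonal; the columns it spills into identify which emotions it is confused with.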
Limitations
- Data Bias: The model's performance is influenced by the class imbalance in the FER2013 dataset.
- Generalization Issues: FER2013 consists of low-resolution (48x48) grayscale face images, so the model may not generalize well to facial expressions captured under different conditions.
- Misclassification Trends: Higher misclassification rates for minority classes such as 'Disgust' and 'Fear' suggest potential areas for improvement.