
Vision Transformer (ViT) for Facial Expression Recognition Model Card

Model Overview

  • Model Name: trpakov/vit-face-expression
  • Task: Facial Expression/Emotion Recognition
  • Dataset: FER-2013
  • Model Architecture: Vision Transformer (ViT)
  • Finetuned from model: vit-base-patch16-224-in21k

Model Description

The vit-face-expression model is a Vision Transformer fine-tuned for facial emotion recognition on the FER-2013 dataset, which consists of facial images labeled with seven emotions:

  • Angry
  • Disgust
  • Fear
  • Happy
  • Sad
  • Surprise
  • Neutral

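The model relies on the ViT architecture: each 224x224 input is split into 16x16-pixel patches (a 14x14 grid of 196 patches), and self-attention lets every patch attend to every other. This allows it to capture spatial dependencies between distant facial regions, such as the eyes and the mouth.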
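The card itself does not include usage code. The following is a minimal inference sketch, assuming the checkpoint loads with the `transformers` image-classification pipeline and that `transformers`, a backend such as PyTorch, and Pillow are installed (the weights are downloaded from the Hub on first use). The helper names `classify_expression` and `top_prediction` are hypothetical, and the returned label strings come from the checkpoint's own config. Because FER-2013 contains cropped faces, a tightly cropped face image is likely to work better than a full scene (an assumption, not something the card states).

```python
from functools import lru_cache
from typing import Dict, List

MODEL_ID = "trpakov/vit-face-expression"


@lru_cache(maxsize=1)
def _load_classifier():
    """Build the pipeline once; the weights are downloaded from the Hub on first use."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline
    return pipeline("image-classification", model=MODEL_ID)


def classify_expression(image_path: str, top_k: int = 7) -> List[Dict]:
    """Return up to top_k {"label": ..., "score": ...} dicts for one face image."""
    return _load_classifier()(image_path, top_k=top_k)


def top_prediction(results: List[Dict]) -> Dict:
    """Pick the highest-scoring entry from an image-classification result list."""
    return max(results, key=lambda r: r["score"])


# Example (hypothetical file name):
# print(top_prediction(classify_expression("face.jpg"))["label"])
```

Caching the pipeline with `lru_cache` avoids reloading the model on every call.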

Data Preprocessing

Before feeding images into the model, the following preprocessing steps are applied:

  • Resizing: Images are resized to 224x224 pixels.
  • Normalization: Pixel values are normalized using the ViT feature extractor's predefined mean and standard deviation.
  • Data Augmentation: The following transformations are applied to improve generalization:
    • Random horizontal flips (p=0.5)
    • Random rotation (±10 degrees)
    • Color jitter (brightness and contrast adjustments)
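The steps above can be sketched in dependency-light NumPy as follows. This is an illustration under stated assumptions, not the training code used for this model: resizing uses nearest-neighbour sampling for simplicity, the per-channel mean and standard deviation are assumed to be 0.5 (read the real values from the image processor you load), the colour-jitter factor range of 0.8 to 1.2 is a hypothetical choice because the card gives none, and rotated pixels that fall outside the image take the nearest edge value.

```python
import numpy as np

SIZE = 224
# Assumption: per-channel mean/std of 0.5, as in the image processor config of
# vit-base-patch16-224-in21k; read the real values from the processor you load.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def to_rgb(img):
    """FER-2013 images are single-channel; ViT expects 3 channels."""
    return np.stack([img] * 3, axis=-1) if img.ndim == 2 else img


def resize_nearest(img, size=SIZE):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def augment(img, rng):
    """Training-time augmentation on a float (H, W, 3) image in [0, 1]."""
    # Random horizontal flip with p = 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Random rotation in [-10, 10] degrees about the centre (nearest neighbour;
    # pixels mapped from outside the image take the nearest edge value).
    theta = np.deg2rad(rng.uniform(-10.0, 10.0))
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    src_x = cos_t * (xx - cx) + sin_t * (yy - cy) + cx
    src_y = -sin_t * (xx - cx) + cos_t * (yy - cy) + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    img = img[src_y, src_x]
    # Colour jitter: brightness, then contrast around the image mean.
    # The 0.8-1.2 factor range is a hypothetical choice; the card gives none.
    img = img * rng.uniform(0.8, 1.2)
    mean = img.mean()
    img = (img - mean) * rng.uniform(0.8, 1.2) + mean
    return np.clip(img, 0.0, 1.0)


def preprocess(gray_uint8, rng=None):
    """Return a (3, 224, 224) float32 array; augment only when rng is given."""
    img = to_rgb(gray_uint8).astype(np.float32) / 255.0
    img = resize_nearest(img)
    if rng is not None:
        img = augment(img, rng)
    img = (img - MEAN) / STD
    return img.astype(np.float32).transpose(2, 0, 1)  # channels-first
```

Augmentation is applied only when a random generator is passed, so the same function serves training and evaluation. In a PyTorch training loop the same steps are more commonly expressed with `torchvision.transforms` (`RandomHorizontalFlip`, `RandomRotation`, `ColorJitter`, `Normalize`).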

Training Details

  • Optimizer: AdamW
  • Learning Rate: 5e-5
  • Batch Size: 8
  • Loss Function: Cross-Entropy Loss
  • Epochs: 3
  • Imbalanced Data Handling: Each class was capped at a maximum of 5000 samples to reduce class imbalance.
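The per-class cap can be sketched as below. The card does not say how the retained samples were chosen, so uniform random sampling with a fixed seed is an assumption, and `cap_per_class` is a hypothetical helper name. Note that classes below the cap are kept whole rather than upsampled, so capping reduces imbalance but does not remove it.

```python
import random
from collections import defaultdict


def cap_per_class(samples, max_per_class=5000, seed=0):
    """Keep at most max_per_class (item, label) pairs per label.

    Assumption: surplus samples are dropped by uniform random sampling.
    Classes with fewer samples than the cap are kept unchanged.
    """
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    capped = []
    for group in by_label.values():
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)
        capped.extend(group)
    rng.shuffle(capped)  # avoid long runs of a single class
    return capped
```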

Evaluation Metrics

  • Validation Accuracy: 0.7113
  • Test Accuracy: 0.7116

Confusion Matrix Insights

The model exhibits the following patterns:

  • Neutral expressions are frequently misclassified as other emotions.
  • Disgust and Fear have lower recall, meaning a larger share of true Disgust and Fear images are assigned to other classes.
  • Happy is often confused with Neutral and Fear, likely due to overlapping facial features.
  • Angry and Sad also show some misclassification overlap, which might be addressed with better feature extraction.
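The card reports accuracy and describes confusion-matrix patterns but does not publish the matrix itself. A minimal NumPy sketch for computing these quantities from integer label arrays follows; the class order mirrors the list in the Model Description and is an assumption (the checkpoint's own `id2label` mapping may order classes differently), and the test-set predictions must be supplied by the user.

```python
import numpy as np

# Assumed class order (Model Description list); check the checkpoint's id2label.
LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def confusion_matrix(y_true, y_pred, num_classes=len(LABELS)):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy(cm):
    """Fraction of all samples on the diagonal."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Correct predictions for a class divided by all true samples of that class.

    Classes with no true samples get NaN.
    """
    support = cm.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(support > 0, np.diag(cm) / support, np.nan)


def top_confusions(cm, k=3):
    """Return the k largest off-diagonal (true, predicted, count) entries."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    flat = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, off.shape)
    return [(LABELS[r], LABELS[c], int(off[r, c])) for r, c in zip(rows, cols)]
```

Low values in `per_class_recall` correspond to the weak classes described above, and `top_confusions` lists the most frequent true/predicted pairs.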

Limitations

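  • Data Bias: The model's performance is influenced by the class imbalance in the FER-2013 dataset. The 5000-sample cap limits only the largest classes, so smaller classes such as Disgust remain underrepresented.
  • Generalization Issues: FER-2013 consists of low-resolution (48x48) grayscale face images, so the model may not generalize well to images that differ substantially in resolution, color, pose, or lighting.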
  • Misclassification Trends: Higher misclassification rates for minority classes such as 'Disgust' and 'Fear' suggest potential areas for improvement.