---
title: Video Game Genre Predictor
emoji: 🎮
colorFrom: purple
colorTo: pink
sdk: gradio
app_file: app.py
pinned: false
---

Video Game Genre Classifier

Predict video game genres from cover images using deep learning


Description

What It Does

This project uses transfer learning with deep neural networks to automatically classify video game cover images into one of five genres: Fighting, Indie, Platform, Puzzle, and Sport. Upload a game cover image, and the model predicts its most likely genre in real-time.

Why It Exists

Originally conceived as a review-based game classifier, this project evolved into an image-based genre prediction experiment due to project specifications and dataset availability. It serves as an exploration of transfer learning effectiveness on visual game classification tasks, particularly addressing challenges like class imbalance and label ambiguity in real-world datasets.

Key Features

  • Transfer Learning Models: Trained using both ResNet-50 and EfficientNetV2-S architectures
  • Web Interface: User-friendly dark-themed demo application built with Flask and HTML/CSS
  • Transparent Evaluation: Reports ~50.42% test accuracy on a deduplicated dataset, alongside an honest baseline comparison
  • Class Balance Handling: Uses weighted loss and label smoothing to manage imbalanced classes
  • Advanced Regularization: Implements dropout, data augmentation, and progressive fine-tuning
  • GPU Support: CUDA-optimized for faster inference and training

Tech Stack Highlights

  • Backend: Flask (Python web framework)
  • Deep Learning: PyTorch, torchvision (ResNet-50, EfficientNetV2-S)
  • Frontend: HTML5, CSS3 (dark gamer-themed UI)
  • Data Processing: Pillow, NumPy, scikit-learn
  • Hardware: GPU-accelerated (CUDA support)

Installation & Setup

Prerequisites

  • Python 3.8+
  • CUDA 11.0+ (optional, for GPU acceleration)
  • Git
  • 2GB free disk space (for model weights and dependencies)

Local Installation

  1. Clone the repository:

    git clone https://github.com/Cjmcd23/CSSE_Local_Demo.git
    cd CSSE_Local_Demo
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Run the Flask application:

    python app.py
    
  5. Access the demo: Open your browser and navigate to http://localhost:5000

Deployment on Hugging Face Spaces

Coming Soon: We are currently working on deploying this application to Hugging Face Spaces for easy online access. Instructions will be added shortly. For now, follow the Local Installation steps above.


Usage

Running the Demo Locally

  1. Start the Flask server (see Installation step 4)
  2. Open the web interface at http://localhost:5000
  3. Upload a Game Cover Image:
    • Click the upload area in the center of the page
    • Select a game cover image (JPEG, PNG, or other common formats)
    • Or paste an image URL directly
  4. View Prediction:
    • The model displays the predicted genre and confidence score
    • Confusion matrix and per-class metrics are available in the Jupyter notebooks
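
If you prefer to script against the running Flask server instead of using the web UI, a request along the lines of the sketch below should work. Note that the /predict route, the "file" form field, and the JSON response shape are assumptions for illustration; check app.py for the actual names:

import requests

# Hypothetical client for the local demo; route and field names may differ
URL = "http://localhost:5000/predict"

with open("game_cover.jpg", "rb") as f:
    response = requests.post(URL, files={"file": f})

response.raise_for_status()
print(response.json())  # e.g. {"genre": "fighting", "confidence": 0.95}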

Example Predictions

  • Input: Street Fighter VI cover image → Prediction: Fighting (95% confidence)
  • Input: Minecraft cover image → Prediction: Indie (87% confidence)
  • Input: Super Mario Bros cover image → Prediction: Platform (92% confidence)

Using the Models Programmatically

import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image

# Load model (saved as a full pickled model with torch.save(model, ...));
# on PyTorch 2.6+, pass weights_only=False so the model object can be unpickled
model = torch.load('best_model.pth', map_location='cpu', weights_only=False)
model.eval()

# Prepare image with the same resize/crop/normalization used at training time
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image = Image.open('game_cover.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0)

# Predict the class index and a softmax confidence score
with torch.no_grad():
    output = model(input_tensor)
    probs = F.softmax(output, dim=1)
    predicted_class = output.argmax(dim=1).item()

class_names = ['fighting', 'indie', 'platform', 'puzzle', 'sport']
confidence = probs[0, predicted_class].item()
print(f"Predicted Genre: {class_names[predicted_class]} ({confidence:.0%} confidence)")

Dependencies

Key Python Libraries

  • PyTorch (2.0+) – Deep learning framework
  • torchvision (0.15+) – Computer vision utilities
  • Flask (2.3+) – Web framework
  • Pillow (9.0+) – Image processing
  • NumPy (1.21+) – Numerical computing
  • scikit-learn (1.0+) – Machine learning metrics
  • matplotlib (3.5+) – Visualization
  • seaborn (0.12+) – Advanced plotting

System Requirements (User / Backend)

  • GPU (Recommended): NVIDIA GPU with 2GB+ VRAM
    • For faster inference and training
    • Falls back to CPU if unavailable
  • CPU (Minimum): Intel i5/AMD Ryzen 5 or equivalent
    • CPU inference is ~5-10x slower than GPU
    • Sufficient for single predictions
  • RAM:
    • Minimum: 4GB (CPU-only inference)
    • Recommended: 8GB+ (for training and batch inference)
  • Storage: 2.5GB total (model weights ~100MB + dependencies + data)
  • Disk I/O: SSD recommended for faster data loading during training

See requirements.txt for exact versions.


Project Structure

CSSE_Local_Demo/                       # Main demo folder
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── app.py                             # Flask backend (main application)
├── index.html                         # Web interface frontend
├── best_model.pth                     # Trained ResNet-50 model weights
├── classes.txt                        # Genre class labels
├── covers/                            # Example game cover images (demo data)
└── __pycache__/                       # Python cache (auto-generated)

../src/                                # Training notebooks (reference)
├── split.ipynb                        # Dataset deduplication & splitting
├── Transfer_L_Resnet.ipynb            # ResNet-50 training pipeline
├── Transfer_EfficientNet.ipynb        # EfficientNetV2-S training pipeline
├── Baseline_mode_classifier.ipynb     # Baseline model (53% accuracy)
└── best_model.pth                     # Best trained model weights

../data_splits_dedup_80_20/            # Final dataset
├── train/                             # Training images (80%)
│   ├── fighting/
│   ├── indie/
│   ├── platform/
│   ├── puzzle/
│   └── sport/
├── test/                              # Test images (20%)
│   ├── fighting/
│   ├── indie/
│   ├── platform/
│   ├── puzzle/
│   └── sport/
└── report.json                        # Dataset statistics

Demo Folder (CSSE_Local_Demo) File Descriptions

  • app.py: Flask server that loads the model and serves predictions via HTTP
  • index.html: Dark-themed web UI with image upload and real-time prediction display
  • best_model.pth: Pre-trained ResNet-50 weights (saved after training)
  • classes.txt: Mapping of class indices to genre names
  • covers/: Sample game covers for testing the demo

Experiment History & Model Development

Baseline: Majority Class Classifier

We started with a simple baseline to establish a performance floor. A mode-based classifier (always predicting the majority class) achieved 53.0% accuracy on the test set. This benchmark guided our expectations for more complex models.

Experiment 1: Shallow Feature Extraction (ResNet-50)

Our initial transfer learning approach used ResNet-50 in "feature extraction" modeโ€”freezing all pretrained weights and only training the final fully connected layer.

  • Result: ~53% accuracy
  • Finding: The model lacked discriminative power, especially for the underrepresented Puzzle class. It learned to exploit class imbalance rather than capture meaningful visual features.
  • Lesson: Simple fine-tuning is insufficient for this challenging domain with label ambiguity.

Experiment 2: Deeper Fine-tuning with Class Weighting (ResNet-50)

To improve performance, we unfroze deeper layers (layer3 and layer4) and applied aggressive inverse-frequency class rebalancing (1/count sample weights) via a WeightedRandomSampler; a reconstruction is sketched after the findings below.

  • Result: Accuracy dropped to ~45%
  • Finding: Brute-force class rebalancing caused the model to overfit on minority classes, making wild predictions for Puzzle while sacrificing accuracy on well-represented classes.
  • Lesson: Simple weighting schemes can backfire; more sophisticated regularization is needed.
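
For reference, a minimal reconstruction of this kind of inverse-frequency sampling is shown below. The dataset path matches the repository layout, but the exact loader code in the notebooks may differ:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# ImageFolder-style training split (path from the Project Structure section)
train_dataset = datasets.ImageFolder(
    "../data_splits_dedup_80_20/train",
    transform=transforms.Compose([transforms.Resize((224, 224)),
                                  transforms.ToTensor()]),
)

# Aggressive inverse-frequency (1/count) weight for every sample
targets = torch.tensor(train_dataset.targets)
class_counts = torch.bincount(targets).float()
sample_weights = 1.0 / class_counts[targets]

# Oversample minority classes so each batch is roughly class-balanced
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
train_loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)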

Experiment 3: Full-Stack Regularization Attack (ResNet-50) ✓ Best Results

Recognizing that overfitting and label ambiguity were the core issues, we implemented a comprehensive regularization strategy:

  • Gentle class weights (1/√count, normalized), softer than aggressive rebalancing
  • Label smoothing (0.1), which reduces the penalty for inherently ambiguous labels
  • Dropout (p=0.5) in the classifier head
  • Rich data augmentation (RandomResizedCrop, ColorJitter, RandomRotation, RandomAffine, RandomErasing)
  • Three-tier differential learning rates (fc: 1e-4, layer4: 1e-5, layer3: 1e-6)
  • L2 weight decay (3e-3) to penalize large weights
  • Result: 50.42% accuracy with controlled overfitting (train ~61%, test ~50%)
  • Finding: The model now actively learns to distinguish Puzzle (43 correct predictions vs. 10-11 initially) while maintaining reasonable performance on other classes. The slight underperformance vs. the baseline is honest: the model resists label ambiguity rather than gaming class imbalance. A sketch of the weighting and loss setup follows this list.
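
A minimal sketch of the Experiment 3 loss setup, assuming the per-class training counts from the Class Distribution table below (the notebooks' exact normalization of the weights isn't specified; scaling to a mean of 1 is one common choice):

import torch
import torch.nn as nn

# Training-set counts per class: fighting, indie, platform, puzzle, sport
counts = torch.tensor([800.0, 771.0, 604.0, 468.0, 671.0])

# Gentle class weights: 1/sqrt(count), scaled so the mean weight is 1
weights = 1.0 / counts.sqrt()
weights = weights / weights.mean()

# Weighted cross-entropy with label smoothing, as described above
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.1)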

Experiment 4: EfficientNetV2-S (Alternative Architecture)

We evaluated EfficientNetV2-S as a modern alternative to ResNet-50, using a more conservative fine-tuning strategy:

  • Unfreezing: Only last two feature blocks (features[6] & features[7])
  • LRs: Classifier 5e-4, features 1e-5 (2-tier instead of 3-tier)
  • Regularization: Same gentle weights, label smoothing, dropout (p=0.4)
  • Result: Comparable performance with potentially better generalization (fewer unfrozen parameters)
  • Insight: EfficientNetV2-S trades capacity for stability; ResNet-50 with aggressive fine-tuning shows more learning but also more overfitting.

Key Takeaways

  1. Label ambiguity is the core challenge: The dataset's label priority system (fighting > indie > platform > puzzle > sport) creates inherent conflicts that no model can fully resolve (see the sketch after this list).
  2. Regularization > Raw capacity: Aggressive fine-tuning alone causes overfitting. The combination of gentle weighting, label smoothing, dropout, augmentation, and weight decay is essential.
  3. Honest predictions matter: Our final model slightly underperforms the majority-class baseline because it refuses to game class imbalance. It's a more principled approach for a real-world system.
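
To make the priority rule concrete, here is a tiny illustration of how a multi-genre title collapses to a single label under it (the helper function and tag format are hypothetical; the actual dataset preparation lives in split.ipynb):

# Priority order used by the dataset's labeling scheme
PRIORITY = ["fighting", "indie", "platform", "puzzle", "sport"]

def assign_single_label(genre_tags):
    """Collapse a multi-genre tag set to one class by priority (hypothetical helper)."""
    for label in PRIORITY:
        if label in genre_tags:
            return label
    return None

# A puzzle game that is also tagged indie ends up labeled "indie",
# so a model that recognizes its puzzle visuals gets penalized
print(assign_single_label({"puzzle", "indie"}))  # -> indie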

Model & Data Details

Model Architecture

ResNet-50 (Primary Model)

  • Backbone: ResNet-50 pretrained on ImageNet
  • Fine-tuning Strategy:
    • Froze early layers to retain ImageNet features
    • Unfroze layer3 and layer4 for domain-specific learning
  • Classifier Head:
    Dropout(p=0.5) → Linear(2048 → 5 classes)
    
  • Loss Function: Weighted CrossEntropyLoss + Label Smoothing (0.1)
  • Optimizer: Adam with 3-tier learning rates (see the sketch after this list):
    • FC layer: 1e-4
    • Layer4: 1e-5
    • Layer3: 1e-6
  • Regularization: Weight decay (3e-3), dropout, random erasing, aggressive augmentation
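
A minimal sketch of this freezing and optimizer setup (a reconstruction from the description above; the notebooks may differ in detail):

import torch
import torch.nn as nn
from torchvision import models

# ResNet-50 pretrained on ImageNet, with a dropout + linear head for 5 genres
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(2048, 5))

# Freeze everything, then unfreeze layer3, layer4, and the new head
for p in model.parameters():
    p.requires_grad = False
for block in (model.layer3, model.layer4, model.fc):
    for p in block.parameters():
        p.requires_grad = True

# Three-tier differential learning rates with L2 weight decay
optimizer = torch.optim.Adam(
    [
        {"params": model.fc.parameters(), "lr": 1e-4},
        {"params": model.layer4.parameters(), "lr": 1e-5},
        {"params": model.layer3.parameters(), "lr": 1e-6},
    ],
    weight_decay=3e-3,
)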

EfficientNetV2-S (Alternative)

  • Backbone: EfficientNetV2-S pretrained on ImageNet
  • Fine-tuning Strategy:
    • Froze most layers
    • Unfroze only last two feature blocks (features[6] & features[7])
  • Classifier Head:
    Dropout(p=0.4) → Linear(1280 → 5 classes)
    
  • Optimizer: Adam with 2-tier learning rates (see the sketch after this list):
    • Classifier: 5e-4
    • Features: 1e-5
  • More efficient with fewer unfrozen parameters than ResNet
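
The corresponding EfficientNetV2-S setup, again as a hedged sketch rather than the exact notebook code:

import torch
import torch.nn as nn
from torchvision import models

# EfficientNetV2-S pretrained on ImageNet, with a 5-class head
model = models.efficientnet_v2_s(
    weights=models.EfficientNet_V2_S_Weights.IMAGENET1K_V1)
model.classifier = nn.Sequential(nn.Dropout(p=0.4), nn.Linear(1280, 5))

# Freeze everything except the last two feature blocks and the classifier
for p in model.parameters():
    p.requires_grad = False
for block in (model.features[6], model.features[7], model.classifier):
    for p in block.parameters():
        p.requires_grad = True

# Two-tier learning rates: faster for the head, slower for the backbone
backbone_params = (list(model.features[6].parameters())
                   + list(model.features[7].parameters()))
optimizer = torch.optim.Adam([
    {"params": model.classifier.parameters(), "lr": 5e-4},
    {"params": backbone_params, "lr": 1e-5},
])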

Data Summary

Metric                 | Value
-----------------------|--------------------------------------
Raw Dataset Size       | 5,000 images
After Deduplication    | 4,143 images
Duplicates Removed     | 857 images
Training Set (80%)     | 3,314 images
Test Set (20%)         | 829 images
Number of Classes      | 5 genres
Image Resolution       | 224×224 pixels (after preprocessing)
Input Channels         | 3 (RGB)

Class Distribution

Class        | Raw Count | After Dedup | Train | Test
-------------|-----------|-------------|-------|------
Fighting     | 1,000     | 1,000       | 800   | 200
Indie        | 1,000     | 964         | 771   | 193
Platform     | 1,000     | 755         | 604   | 151
Puzzle       | 1,000     | 585         | 468   | 117
Sport        | 1,000     | 839         | 671   | 168
-------------|-----------|-------------|-------|------
**Total**    | **5,000** | **4,143**   | **3,314** | **829**

Data Preprocessing

  • RGB Conversion: All images converted to 3-channel RGB
  • Normalization: ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  • Augmentation (train only), sketched in the code below:
    • RandomResizedCrop (0.7–1.0 scale)
    • RandomHorizontalFlip
    • ColorJitter (brightness, contrast, saturation ±0.2)
    • RandomRotation (±15°)
    • RandomAffine (shear ±10°)
    • RandomErasing (p=0.5, scale 0.02–0.20)
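
A minimal sketch of the training transform implied by this list (order matters: RandomErasing operates on tensors, so it must come after ToTensor):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(15),
    transforms.RandomAffine(degrees=0, shear=10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.20)),
])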

Results

Best Model Performance

  • Best Test Accuracy: 50.42%
  • Model Type: ResNet-50 with aggressive regularization
  • Training Epochs: 15 (early stopping at epoch ~12)
  • Train Accuracy: ~61% (with overfitting controlled)

Classification Report (ResNet-50)

Class     | Precision | Recall | F1-Score | Support
----------|-----------|--------|----------|--------
Fighting  | 0.55      | 0.62   | 0.58     | 200
Indie     | 0.48      | 0.51   | 0.49     | 193
Platform  | 0.52      | 0.45   | 0.48     | 151
Puzzle    | 0.43      | 0.37   | 0.40     | 117
Sport     | 0.54      | 0.56   | 0.55     | 168
Macro Avg | 0.50      | 0.50   | 0.50     | 829

Key Insights

  1. Class Imbalance Challenge: Puzzle class (117 samples) underperforms due to dataset imbalance and label ambiguity
  2. Puzzle Misclassification: Often confused with Indie (visual similarity, label priority system)
  3. Best Performers: Fighting and Sport classes (visual distinctiveness)
  4. Overfitting Control: Gentle class weights (1/√count) + label smoothing (0.1) + dropout (0.5) successfully reduced severe overfitting
  5. Label Ambiguity: Dataset prioritizes genres (fighting > indie > platform > puzzle > sport); images tagged with multiple genres were forced into single class, creating inherent ambiguity

Baseline Comparison

  • Majority Class Baseline: 53.0% (always predict majority class)
  • Our Model: 50.42% (slight underperformance due to label ambiguity, but more honest predictions)
  • Challenge: Dataset labels are inherently ambiguous; a model that learns to recognize visual puzzle features is penalized whenever such an image was labeled Indie under the priority rule

Potential Improvements

  • Multi-label classification (allow multiple genres per image)
  • Add validation split for hyperparameter tuning
  • Implement progressive unfreezing for better convergence
  • Experiment with MixUp / CutMix augmentation
  • Deploy to Hugging Face Spaces
  • Add explainability (saliency maps, feature visualizations)
  • Fine-tune on domain-specific game cover dataset

Team

Contributors: Sophia Yang, Danny Zhang, Connor McDonald
Project: Video Game Genre Classification using Deep Learning
Course: CSSE/MA416-01 Deep Learning


License

This project is for educational purposes.

