---
license: apache-2.0
tags:
  - video-classification
  - action-recognition
  - ucf101
  - pytorch
  - computer-vision
datasets:
  - UCF-101
metrics:
  - accuracy
  - f1
model-index:
  - name: mc3-18-ucf101
    results:
      - task:
          type: video-classification
          name: Action Recognition
        dataset:
          name: UCF-101
          type: ucf101
          split: test
        metrics:
          - type: accuracy
            value: 87.05
            name: Top-1 Accuracy
          - type: f1
            value: 85.69
            name: F1 Score
language:
  - en
---

[![🐙 GitHub](https://img.shields.io/badge/GitHub-Repository-181717?logo=github&logoColor=white&style=for-the-badge)](https://github.com/dronefreak/human-action-classification)
[![📄 Paper: MC3](https://img.shields.io/badge/Paper-MC3-2EA44F?logo=arxiv&logoColor=white&style=for-the-badge)](https://arxiv.org/abs/1711.11248)
[![💽 Dataset: UCF-101](https://img.shields.io/badge/Dataset-UCF--101-34aa44?logo=database&logoColor=white&style=for-the-badge)](https://www.crcv.ucf.edu/data/UCF101.php)
[![⚖️ License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green?logo=open-source-initiative&logoColor=white&style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)

![Demo](demo.gif)

# MC3-18 for UCF-101 Action Recognition

## Model Summary

This model is an MC3-18 (Mixed 3D Convolutions) network fine-tuned on the UCF-101 dataset for human action recognition. The architecture uses 3D convolutions in its early layers and 2D convolutions in its later layers, giving an efficient spatiotemporal representation at a low parameter count.

- **Architecture:** MC3-18 (3D CNN with mixed convolutions)
- **Pretraining:** Kinetics-400
- **Parameter Count:** ~11.7M
- **Input Format:** 16-frame clips, 112×112 spatial resolution
- **Number of Classes:** 101

---

## Intended Use

**Primary use case:** Action classification in short, trimmed videos similar in distribution to UCF-101.

**Users:** Researchers, practitioners, and engineers working on video-understanding pipelines.

**Tasks:**

- Action recognition
- Clip-level human activity tagging
- Baseline modeling for low-compute video applications

This model is not suitable for long-horizon temporal reasoning or untrimmed video detection without adaptation.

---

## Performance

### Quantitative Results (UCF-101 Split 1, Test Set)

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 87.05% |
| F1 Score  | 0.857  |
| Precision | 0.868  |

### Comparison to Published Baseline

- **Original MC3-18 (Kinetics-400 → UCF-101):** 85.0%
- **This model:** **87.05%** (+2.05 percentage points)

---

## How to Use

### Inference Example (PyTorch)

```python
import torch
from huggingface_hub import hf_hub_download
from torchvision.models.video import mc3_18
from torchvision.transforms import Compose, Resize, CenterCrop, Normalize, ToTensor

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="dronefreak/mc3-18-ucf101",
    filename="mc318-ufc101-split-1.pth",
)

# Assumption: the file holds either a pickled nn.Module or a plain state_dict
# for torchvision's mc3_18 with a 101-class head. Only unpickle files you trust.
checkpoint = torch.load(model_path, map_location="cpu", weights_only=False)
if isinstance(checkpoint, torch.nn.Module):
    model = checkpoint
else:
    model = mc3_18(weights=None, num_classes=101)
    model.load_state_dict(checkpoint)
model.eval()

# Per-frame preprocessing (each frame is a PIL image)
transform = Compose([
    Resize((128, 171)),
    CenterCrop(112),
    ToTensor(),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989]),
])

# `frames` must be a list of 16 PIL images taken from your video (not defined here).
# Stack to T×C×H×W, reorder to C×T×H×W, then add a batch dimension.
clip = torch.stack([transform(frame) for frame in frames])
video_tensor = clip.permute(1, 0, 2, 3).unsqueeze(0)

# Inference
with torch.no_grad():
    output = model(video_tensor)       # logits, shape (1, 101)
    prediction = output.argmax(dim=1)  # predicted class index
```

## Training

- **Dataset:** UCF-101 Split 1 (9,537 train / 3,783 test videos)
- **Epochs:** 200
- **Batch Size:** 32
- **Optimizer:** SGD (lr=0.001, momentum=0.9, weight_decay=1e-4)
- **Augmentation:** ColorJitter, RandomHorizontalFlip, RandomCrop

## Limitations

- Fine-tuned only on UCF-101, so predictions are limited to its 101 action classes
- Requires 16-frame clips (not suitable for real-time single-frame inference)
- Performs best on action types similar to those in UCF-101

## Citation

```bibtex
@misc{mc3_18_ucf101,
  author = {Saumya Saksena},
  title = {MC3-18 for UCF-101 Action Recognition},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/dronefreak/mc3-18-ucf101}}
}
```

## License

Apache-2.0
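
## Appendix: Sampling a 16-Frame Clip

The model expects exactly 16 frames per clip, but this card does not say how those frames were chosen from a longer video. The sketch below shows one common option, uniform sampling, as an assumption rather than the procedure used in training. The helper name `sample_frame_indices` is hypothetical and is not part of the released code.

```python
def sample_frame_indices(num_video_frames: int, clip_len: int = 16) -> list[int]:
    """Return `clip_len` frame indices spread evenly across a video.

    Hypothetical helper for illustration; uniform sampling is an assumption,
    not the documented training procedure. Videos shorter than `clip_len`
    yield repeated indices so the clip length stays fixed.
    """
    if num_video_frames <= 0:
        raise ValueError("video must contain at least one frame")
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")

    # Split the video into clip_len equal-width segments and take the frame
    # at the centre of each segment, clamped to the last valid index.
    step = num_video_frames / clip_len
    return [
        min(num_video_frames - 1, int(step * (i + 0.5)))
        for i in range(clip_len)
    ]
```

Given a decoded list of frames `all_frames`, `[all_frames[i] for i in sample_frame_indices(len(all_frames))]` produces the 16 frames to preprocess and stack into a C×T×H×W tensor.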