---
title: Zephyr 7B CyberSecurity Trainer
emoji: π
colorFrom: red
colorTo: yellow
sdk: docker
app_file: train.py
pinned: false
license: mit
---
# Zephyr 7B CyberSecurity Fine-tuning

Fine-tuning Zephyr 7B on a curated collection of cybersecurity datasets.

## Overview
This project fine-tunes the Zephyr 7B model on 18 cybersecurity-focused datasets from the thelordofweb CyberSecurity collection, creating a specialized model for cybersecurity tasks.
## Datasets Included
- AlicanKiraz0/All-CVE-Records-Training-Dataset
- AlicanKiraz0/Cybersecurity-Dataset-v1
- Bouquets/Cybersecurity-LLM-CVE
- CyberNative/CyberSecurityEval
- Mohabahmed03/Alpaca_Dataset_CyberSecurity_Smaller
- CyberNative/github_cybersecurity_READMEs
- AlicanKiraz0/Cybersecurity-Dataset-Heimdall-v1.1
- jcordon5/cybersecurity-rules
- Bouquets/DeepSeek-V3-Distill-Cybersecurity-en
- Seerene/cybersecurity_dataset
- ahmedds10/finetuning_alpaca_Cybersecurity
- Tiamz/cybersecurity-instruction-dataset
- OhWayTee/Cybersecurity-News_3
- Trendyol/All-CVE-Chat-MultiTurn-1999-2025-Dataset
- Vanessasml/cyber-reports-news-analysis-llama2-3k
- Vanessasml/cybersecurity_32k_instruction_input_output
- Vanessasml/enisa_cyber_news_dataset
- Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset
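
The training script combines these sources into a single training set. A minimal sketch using the `datasets` library is below; because each repository has its own schema, the per-dataset normalization into a shared `text` column is assumed and not shown.

```python
from datasets import load_dataset, concatenate_datasets

# Two example IDs shown; the full list of 18 repositories appears above.
DATASET_IDS = [
    "AlicanKiraz0/All-CVE-Records-Training-Dataset",
    "CyberNative/github_cybersecurity_READMEs",
    # ... remaining repositories from the list above
]

# Assumption: each dataset has already been mapped to a shared "text"
# column; concatenate_datasets requires matching schemas.
parts = [load_dataset(repo_id, split="train") for repo_id in DATASET_IDS]
combined = concatenate_datasets(parts).shuffle(seed=42)
```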
## Training Configuration
- Base Model: HuggingFaceH4/zephyr-7b-beta
- Method: QLoRA (4-bit quantization)
- LoRA Config: r=16, alpha=32
- Epochs: 3
- Batch Size: 4 (per device)
- Gradient Accumulation: 4 steps
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
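
A minimal sketch of this configuration with `transformers`, `peft`, and `bitsandbytes` follows. The dropout value and `target_modules` are illustrative assumptions not stated in the list above, and dataset preparation plus the training loop are elided.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "HuggingFaceH4/zephyr-7b-beta"

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter with r=16, alpha=32 as listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed; not stated in the config list
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="zephyr-7b-cybersecurity-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
)
```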
## Running on Hugging Face Spaces
This training script is designed to run on Hugging Face Spaces with GPU support.
### Requirements
- Hugging Face Space with GPU (A100 recommended)
- A Hugging Face access token with write permission (needed to push the fine-tuned model)
### Setup

1. Create a new Space with GPU support
2. Upload all files from this directory
3. Set your `HF_TOKEN` as a Space secret (see the sketch after this list)
4. Run the training script
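
Space secrets are exposed to the container as environment variables. Here is a minimal sketch of how `train.py` can read the secret and authenticate before pushing to the Hub (the variable name `HF_TOKEN` matches the secret set in step 3):

```python
import os

from huggingface_hub import login

# HF_TOKEN is the Space secret from step 3, exposed as an env var.
hf_token = os.environ.get("HF_TOKEN")
if hf_token is None:
    raise RuntimeError("Set HF_TOKEN as a Space secret before training.")
login(token=hf_token)  # authenticates later push_to_hub calls
```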
## Output

The fine-tuned model will be saved to `Jcalemcg/zephyr-7b-cybersecurity-finetuned`.
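
Once published, the model can be loaded like any Hub checkpoint. A hedged sketch, assuming the repository contains a full (merged) model rather than a bare PEFT adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "Jcalemcg/zephyr-7b-cybersecurity-finetuned"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
```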
## License

This project follows the licensing of the base Zephyr 7B model and of the included datasets.