---
title: Zephyr 7B CyberSecurity Trainer
emoji: π
colorFrom: red
colorTo: yellow
sdk: docker
app_file: train.py
pinned: false
license: mit
---
# Zephyr 7B CyberSecurity Fine-tuning

Fine-tuning Zephyr 7B on a curated collection of cybersecurity datasets.

## Overview
This project fine-tunes the Zephyr 7B model on 18 cybersecurity-focused datasets from the thelordofweb CyberSecurity collection, creating a specialized model for cybersecurity tasks.
## Datasets Included
- AlicanKiraz0/All-CVE-Records-Training-Dataset
- AlicanKiraz0/Cybersecurity-Dataset-v1
- Bouquets/Cybersecurity-LLM-CVE
- CyberNative/CyberSecurityEval
- Mohabahmed03/Alpaca_Dataset_CyberSecurity_Smaller
- CyberNative/github_cybersecurity_READMEs
- AlicanKiraz0/Cybersecurity-Dataset-Heimdall-v1.1
- jcordon5/cybersecurity-rules
- Bouquets/DeepSeek-V3-Distill-Cybersecurity-en
- Seerene/cybersecurity_dataset
- ahmedds10/finetuning_alpaca_Cybersecurity
- Tiamz/cybersecurity-instruction-dataset
- OhWayTee/Cybersecurity-News_3
- Trendyol/All-CVE-Chat-MultiTurn-1999-2025-Dataset
- Vanessasml/cyber-reports-news-analysis-llama2-3k
- Vanessasml/cybersecurity_32k_instruction_input_output
- Vanessasml/enisa_cyber_news_dataset
- Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset
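
The training script combines these sources into a single training set. A minimal sketch using the `datasets` library is below; because each repository has its own schema, the per-dataset normalization into a shared `text` column is assumed and not shown.

```python
from datasets import load_dataset, concatenate_datasets

# Two example IDs shown; the full list of 18 repositories appears above.
DATASET_IDS = [
    "AlicanKiraz0/All-CVE-Records-Training-Dataset",
    "CyberNative/github_cybersecurity_READMEs",
    # ... remaining repositories from the list above
]

# Assumption: each dataset has already been mapped to a shared "text"
# column; concatenate_datasets requires matching schemas.
parts = [load_dataset(repo_id, split="train") for repo_id in DATASET_IDS]
combined = concatenate_datasets(parts).shuffle(seed=42)
```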
## Training Configuration
- Base Model: HuggingFaceH4/zephyr-7b-beta
- Method: QLoRA (4-bit quantization)
- LoRA Config: r=16, alpha=32
- Epochs: 3
- Batch Size: 4 (per device)
- Gradient Accumulation: 4 steps
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
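
A minimal sketch of this configuration with `transformers`, `peft`, and `bitsandbytes` follows. The dropout value and `target_modules` are illustrative assumptions not stated in the list above, and dataset preparation plus the training loop are elided.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "HuggingFaceH4/zephyr-7b-beta"

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter with r=16, alpha=32 as listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed; not stated in the config list
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="zephyr-7b-cybersecurity-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
)
```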
## Running on Hugging Face Spaces
This training script is designed to run on Hugging Face Spaces with GPU support.
### Requirements
- Hugging Face Space with GPU (A100 recommended)
- A Hugging Face access token with write permission (needed to push the fine-tuned model)
### Setup

1. Create a new Space with GPU support
2. Upload all files from this directory
3. Set your `HF_TOKEN` as a Space secret (see the sketch after this list)
4. Run the training script
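
Space secrets are exposed to the container as environment variables. Here is a minimal sketch of how `train.py` can read the secret and authenticate before pushing to the Hub (the variable name `HF_TOKEN` matches the secret set in step 3):

```python
import os

from huggingface_hub import login

# HF_TOKEN is the Space secret from step 3, exposed as an env var.
hf_token = os.environ.get("HF_TOKEN")
if hf_token is None:
    raise RuntimeError("Set HF_TOKEN as a Space secret before training.")
login(token=hf_token)  # authenticates later push_to_hub calls
```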
## Output

The fine-tuned model will be saved to `Jcalemcg/zephyr-7b-cybersecurity-finetuned`.
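
Once published, the model can be loaded like any Hub checkpoint. A hedged sketch, assuming the repository contains a full (merged) model rather than a bare PEFT adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "Jcalemcg/zephyr-7b-cybersecurity-finetuned"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
```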
## License

This project follows the licensing of the base Zephyr 7B model and of the included datasets.