Jcalemcg's picture
Upload README.md with huggingface_hub
29e4b66 verified
metadata
title: Zephyr 7B CyberSecurity Trainer
emoji: πŸ”
colorFrom: red
colorTo: yellow
sdk: docker
app_file: train.py
pinned: false
license: mit

Zephyr 7B CyberSecurity Fine-tuning

Fine-tuning Zephyr 7B on a curated collection of cybersecurity datasets.

Overview

This project fine-tunes the Zephyr 7B model on 18 cybersecurity-focused datasets from the thelordofweb CyberSecurity collection, creating a specialized model for cybersecurity tasks.

Datasets Included

  • AlicanKiraz0/All-CVE-Records-Training-Dataset
  • AlicanKiraz0/Cybersecurity-Dataset-v1
  • Bouquets/Cybersecurity-LLM-CVE
  • CyberNative/CyberSecurityEval
  • Mohabahmed03/Alpaca_Dataset_CyberSecurity_Smaller
  • CyberNative/github_cybersecurity_READMEs
  • AlicanKiraz0/Cybersecurity-Dataset-Heimdall-v1.1
  • jcordon5/cybersecurity-rules
  • Bouquets/DeepSeek-V3-Distill-Cybersecurity-en
  • Seerene/cybersecurity_dataset
  • ahmedds10/finetuning_alpaca_Cybersecurity
  • Tiamz/cybersecurity-instruction-dataset
  • OhWayTee/Cybersecurity-News_3
  • Trendyol/All-CVE-Chat-MultiTurn-1999-2025-Dataset
  • Vanessasml/cyber-reports-news-analysis-llama2-3k
  • Vanessasml/cybersecurity_32k_instruction_input_output
  • Vanessasml/enisa_cyber_news_dataset
  • Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset

Training Configuration

  • Base Model: HuggingFaceH4/zephyr-7b-beta
  • Method: QLoRA (4-bit quantization)
  • LoRA Config: r=16, alpha=32
  • Epochs: 3
  • Batch Size: 4 (per device)
  • Gradient Accumulation: 4 steps
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit

Running on Hugging Face Spaces

This training script is designed to run on Hugging Face Spaces with GPU support.

Requirements

  • Hugging Face Space with GPU (A100 recommended)
  • Write access token

Setup

  1. Create a new Space with GPU support
  2. Upload all files from this directory
  3. Set your HF_TOKEN as a Space secret
  4. Run the training script

Output

The fine-tuned model will be saved to: Jcalemcg/zephyr-7b-cybersecurity-finetuned

License

Follows the licensing of the base Zephyr 7B model and included datasets.