AnySecret Assistant - Multi-Model Collection
A specialized AI assistant collection for AnySecret configuration management, available in multiple sizes and formats optimized for different use cases and deployment scenarios.
Available Models
| Model | Base Model | Parameters | Format | Best For | Memory |
|---|---|---|---|---|---|
| 3B | Llama-3.2-3B-Instruct | 3B | PyTorch/GGUF | Fast responses, edge deployment | 4-6GB |
| 7B | CodeLlama-7B-Instruct | 7B | PyTorch/GGUF | Balanced performance, code focus | 8-12GB |
| 13B | CodeLlama-13B-Instruct | 13B | PyTorch/GGUF | Highest quality, complex queries | 16-24GB |
Model Variants
PyTorch Models (LoRA Adapters)
- anysecret-io/anysecret-assistant/3B/ - Llama-3.2-3B base
- anysecret-io/anysecret-assistant/7B/ - CodeLlama-7B base
- anysecret-io/anysecret-assistant/13B/ - CodeLlama-13B base
GGUF Models (Quantized)
- anysecret-io/anysecret-assistant/3B-GGUF/ - Q4_K_M, Q8_0 formats
- anysecret-io/anysecret-assistant/7B-GGUF/ - Q4_K_M, Q8_0 formats
- anysecret-io/anysecret-assistant/13B-GGUF/ - Q4_K_M, Q8_0 formats
Model Description
These models are fine-tuned specifically to assist with AnySecret configuration management across AWS, GCP, Azure, and Kubernetes environments. Each model can help with CLI commands, configuration setup, CI/CD integration, and Python SDK usage.
- Developed by: anysecret-io
- Model type: Causal Language Model (LoRA Adapters + GGUF)
- Language(s): English
- License: MIT
- Specialized for: Multi-cloud secrets and configuration management
Quick Start
Option 1: Using Ollama (Recommended for GGUF)
# 7B model (balanced performance)
ollama pull anysecret-io/anysecret-assistant/7B-GGUF
ollama run anysecret-io/anysecret-assistant/7B-GGUF
# 13B model (best quality)
ollama pull anysecret-io/anysecret-assistant/13B-GGUF
ollama run anysecret-io/anysecret-assistant/13B-GGUF
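Once a model is running, Ollama also serves a local REST API. The following is a minimal sketch of a single-turn chat request from Python. It assumes Ollama's default port (11434) and reuses the model name from the commands above; `build_chat_payload` and `ask_ollama` are illustrative helpers, not part of AnySecret.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default Ollama port
MODEL_NAME = "anysecret-io/anysecret-assistant/7B-GGUF"  # name as pulled above


def build_chat_payload(question, model=MODEL_NAME):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }


def ask_ollama(question, url=OLLAMA_URL):
    """Send the question to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Example (requires a running Ollama server):
# print(ask_ollama("How do I configure AnySecret for AWS?"))
```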
Option 2: Using Transformers (PyTorch)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Choose your model size (3B/7B/13B)
model_size = "7B"  # or "3B", "13B"
base_models = {
    "3B": "meta-llama/Llama-3.2-3B-Instruct",
    "7B": "codellama/CodeLlama-7b-Instruct-hf",
    "13B": "codellama/CodeLlama-13b-Instruct-hf",
}
base_model_name = base_models[model_size]
# The adapters live in per-size subfolders of a single Hub repository
adapter_repo = "anysecret-io/anysecret-assistant"

# Load the base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_repo, subfolder=model_size)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Generate response
def ask_anysecret(question):
    prompt = f"### Instruction:\n{question}\n\n### Response:\n"
    # Move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:\n")[-1].strip()

# Example usage
print(ask_anysecret("How do I configure AnySecret for AWS?"))
Option 3: Using llama.cpp (GGUF)
# Download GGUF model
wget https://huggingface.co/anysecret-io/anysecret-assistant/resolve/main/7B-GGUF/anysecret-7b-q4_k_m.gguf
# Run with llama.cpp
./llama-server -m anysecret-7b-q4_k_m.gguf --port 8080
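With the server running, requests can go to its OpenAI-compatible chat endpoint. This is a minimal sketch that assumes the server is reachable on localhost at the port given above; the helper names and the `temperature` and `max_tokens` defaults are illustrative choices, not values prescribed by this model card.

```python
import json
import urllib.request

# Assumption: llama-server was started locally with --port 8080 as shown above.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(question, temperature=0.1, max_tokens=256):
    """Build a POST request carrying a single-turn chat completion body."""
    body = {
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def ask_server(question):
    """Send the question to the running server and return the reply text."""
    with urllib.request.urlopen(build_request(question)) as response:
        return extract_reply(json.loads(response.read()))


# Example (requires a running llama-server):
# print(ask_server("How do I configure AnySecret for AWS?"))
```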
Use Cases
Direct Use
All models are designed to provide expert assistance with:
- AnySecret CLI - Commands, usage patterns, troubleshooting
- Multi-cloud Configuration - AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
- Kubernetes Integration - Secrets, ConfigMaps, operators
- CI/CD Pipelines - GitHub Actions, Jenkins, GitLab CI
- Python SDK - Implementation guidance, best practices
- Security Patterns - Secret rotation, access controls, compliance
Example Queries
"How do I set up AnySecret with AWS Secrets Manager?"
"Show me how to use anysecret in a GitHub Actions workflow"
"How do I rotate secrets across multiple cloud providers?"
"What's the difference between storing secrets vs parameters?"
"How do I configure AnySecret for a Kubernetes deployment?"
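The example queries can be wrapped in the instruction/response prompt format used in the Transformers quick start. A minimal sketch follows; `build_prompt` and `extract_response` are hypothetical helper names, and the template is copied from the quick-start code rather than from a documented training format.

```python
# Template and marker as used in the Transformers quick-start example.
PROMPT_TEMPLATE = "### Instruction:\n{question}\n\n### Response:\n"
RESPONSE_MARKER = "### Response:\n"

EXAMPLE_QUERIES = [
    "How do I set up AnySecret with AWS Secrets Manager?",
    "Show me how to use anysecret in a GitHub Actions workflow",
    "How do I rotate secrets across multiple cloud providers?",
    "What's the difference between storing secrets vs parameters?",
    "How do I configure AnySecret for a Kubernetes deployment?",
]


def build_prompt(question):
    """Wrap a user question in the instruction/response template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_response(generated_text):
    """Return only the text that follows the last response marker."""
    return generated_text.split(RESPONSE_MARKER)[-1].strip()


prompts = [build_prompt(query) for query in EXAMPLE_QUERIES]
print(prompts[0])
```

Each prompt can then be passed to whichever backend is in use, with `extract_response` stripping the echoed prompt from the raw model output.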
Training Details
Training Data
Models were trained on 150+ curated examples across 7 categories:
- CLI Commands (25 examples) - Command usage and patterns
- AWS Configuration (25 examples) - Secrets Manager integration
- GCP Configuration (25 examples) - Secret Manager setup
- Azure Configuration (25 examples) - Key Vault integration
- Kubernetes (25 examples) - Secrets and ConfigMaps
- CI/CD Integration (15 examples) - Pipeline workflows
- Python Integration (10 examples) - SDK usage patterns
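As a quick check on the category breakdown, the listed counts can be tallied. The sketch below only restates the numbers above and computes each category's share of the dataset.

```python
# Example counts per category, as listed above.
CATEGORY_COUNTS = {
    "CLI Commands": 25,
    "AWS Configuration": 25,
    "GCP Configuration": 25,
    "Azure Configuration": 25,
    "Kubernetes": 25,
    "CI/CD Integration": 15,
    "Python Integration": 10,
}

total = sum(CATEGORY_COUNTS.values())

# Print each category with its count and share of the total.
for name, count in CATEGORY_COUNTS.items():
    print(f"{name:<20} {count:>3}  {count / total:.1%}")
print(f"{'Total':<20} {total:>3}")
```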
Training Configuration
Hyperparameters
- LoRA Rank: 16
- LoRA Alpha: 32
- Learning Rate: 2e-4
- Batch Size: 1 (with gradient accumulation)
- Epochs: 2-3
- Precision: fp16 mixed precision with 4-bit quantization
Target Modules
- Llama-3.2 (3B): q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- CodeLlama (7B/13B): q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
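The hyperparameters above map onto a PEFT `LoraConfig` roughly as sketched here. This is an illustration, not the project's actual training script: `task_type="CAUSAL_LM"` is an assumption based on the stated model type, and settings not listed above (such as dropout) are left at library defaults. The `peft` import is deferred so the arithmetic runs without it installed.

```python
# LoRA settings taken from the hyperparameter and target-module lists above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated explicitly in this card
}

# With standard LoRA, adapter updates are scaled by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(f"LoRA scaling factor: {scaling}")


def build_lora_config():
    """Create a peft.LoraConfig from the settings above (requires peft)."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```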
Model Selection Guide
Choose 3B if you need:
- Fast inference (< 1 second)
- Low memory usage (4-6GB)
- Edge deployment
- Basic AnySecret queries
Choose 7B if you need:
- Balanced performance/speed
- Better code understanding
- Moderate memory (8-12GB)
- Complex configuration queries
Choose 13B if you need:
- Highest quality responses
- Complex multi-step guidance
- Advanced troubleshooting
- Production deployments
Deployment Options
Local Development
- GGUF + Ollama: Easiest setup, good performance
- PyTorch + GPU: Best quality, requires CUDA
Production Deployment
- Docker + llama.cpp: Scalable, CPU/GPU support
- Kubernetes: Auto-scaling, load balancing
- Cloud APIs: Serverless, pay-per-use
Memory Requirements
| Model | GGUF Q4_K_M | GGUF Q8_0 | PyTorch FP16 |
|---|---|---|---|
| 3B | 2.3GB | 3.2GB | 6GB |
| 7B | 4.1GB | 7.2GB | 14GB |
| 13B | 7.8GB | 13.8GB | 26GB |
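The table can be turned into a small lookup that picks the largest model, and then the highest-precision format, that fits a given memory budget. This is a sketch under stated assumptions: `best_fit` is a hypothetical helper, and the 1 GB headroom for context and runtime overhead is an illustrative guess rather than a measured value.

```python
# Weight sizes in GB, copied from the memory requirements table.
MEMORY_GB = {
    "3B": {"Q4_K_M": 2.3, "Q8_0": 3.2, "FP16": 6.0},
    "7B": {"Q4_K_M": 4.1, "Q8_0": 7.2, "FP16": 14.0},
    "13B": {"Q4_K_M": 7.8, "Q8_0": 13.8, "FP16": 26.0},
}
MODEL_ORDER = ["3B", "7B", "13B"]  # smallest to largest


def best_fit(available_gb, headroom_gb=1.0):
    """Return (model, format, size_gb) for the best option that fits, or None.

    Prefers a larger model first, then the larger (higher precision) format.
    headroom_gb is an assumed allowance for context and runtime overhead.
    """
    budget = available_gb - headroom_gb
    candidates = [
        (model, fmt, size)
        for model, formats in MEMORY_GB.items()
        for fmt, size in formats.items()
        if size <= budget
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (MODEL_ORDER.index(c[0]), c[2]))


for memory in (4, 8, 24):
    print(memory, "GB ->", best_fit(memory))
```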
Model Sources
- Repository: https://github.com/anysecret-io/anysecret-lib
- Documentation: https://docs.anysecret.io
- Training Code: https://github.com/anysecret-io/anysecret-llm
- Website: https://anysecret.io
Framework Versions
- PEFT: 0.17.1+
- Transformers: 4.35.0+
- PyTorch: 2.0.0+
- llama.cpp: Latest
- Ollama: 0.1.0+
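The Python-side minimums can be checked against the local environment before loading the adapters. A minimal sketch using only the standard library follows; it compares the leading numeric components of each version and ignores local or pre-release suffixes, which is a simplification. The distribution names `peft`, `transformers`, and `torch` are the usual PyPI names.

```python
import re
from importlib import metadata

# Minimum versions for the Python packages listed above.
MINIMUM_VERSIONS = {"peft": "0.17.1", "transformers": "4.35.0", "torch": "2.0.0"}


def version_tuple(version):
    """Turn '2.1.0+cu118' into (2, 1, 0), dropping any local suffix."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def check_versions(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[package] = (installed, ok)
    return report


for package, (installed, ok) in check_versions().items():
    status = "ok" if ok else "missing or too old"
    print(f"{package}: {installed} ({status})")
```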
Performance Benchmarks
| Model | Tokens/sec | Quality Score | Memory (GGUF Q4) |
|---|---|---|---|
| 3B | ~45 | 7.2/10 | 2.3GB |
| 7B | ~25 | 8.5/10 | 4.1GB |
| 13B | ~15 | 9.1/10 | 7.8GB |
Benchmarks run on RTX 3090 with GGUF Q4_K_M quantization
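For a rough sense of latency, the throughput figures translate into generation time for a response of a given length. The sketch below assumes the approximate tokens/sec values from the table, uses 256 new tokens as an illustrative default (the limit in the Transformers quick start), and ignores prompt processing time.

```python
# Approximate decode throughput (tokens/sec) from the benchmark table.
TOKENS_PER_SECOND = {"3B": 45, "7B": 25, "13B": 15}


def estimated_seconds(model, new_tokens=256):
    """Estimate generation time, ignoring prompt processing and load time."""
    return new_tokens / TOKENS_PER_SECOND[model]


for model in TOKENS_PER_SECOND:
    print(f"{model}: ~{estimated_seconds(model):.1f} s for 256 new tokens")
```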
License
MIT License - See individual model folders for specific license details.
For support, visit our GitHub Issues or Documentation.