Instructions to use JosephusCheung/Guanaco with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use JosephusCheung/Guanaco with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="JosephusCheung/Guanaco")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("JosephusCheung/Guanaco")
model = AutoModelForCausalLM.from_pretrained("JosephusCheung/Guanaco")

Inference
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use JosephusCheung/Guanaco with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "JosephusCheung/Guanaco"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JosephusCheung/Guanaco",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/JosephusCheung/Guanaco

SGLang

How to use JosephusCheung/Guanaco with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "JosephusCheung/Guanaco" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JosephusCheung/Guanaco",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "JosephusCheung/Guanaco" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "JosephusCheung/Guanaco",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use JosephusCheung/Guanaco with Docker Model Runner:
```
docker model run hf.co/JosephusCheung/Guanaco
```

JosephusCheung commited on May 29, 2023

Commit

bed6f3b

1 Parent(s): e044a62

Update README.md

Browse files

Files changed (1) hide show

README.md +17 -1

README.md CHANGED Viewed

@@ -87,4 +87,20 @@ New Assistant Answer
 It is important to remember that Guanaco is a 7B-parameter model, and **any knowledge-based content should be considered potentially inaccurate**. We strongly recommend **providing verifiable sources in System Prompt, such as Wikipedia, for knowledge-based answers**. In the absence of sources, it is crucial to inform users of this limitation to prevent the dissemination of false information and to maintain transparency.
-Due to the differences in the format between this project and [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), please refer to *Guanaco-lora: LoRA for training Multilingual Instruction-following LM based on LLaMA* (https://github.com/KohakuBlueleaf/guanaco-lora) for further training and inference our models.

 It is important to remember that Guanaco is a 7B-parameter model, and **any knowledge-based content should be considered potentially inaccurate**. We strongly recommend **providing verifiable sources in System Prompt, such as Wikipedia, for knowledge-based answers**. In the absence of sources, it is crucial to inform users of this limitation to prevent the dissemination of false information and to maintain transparency.
+Due to the differences in the format between this project and [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), please refer to *Guanaco-lora: LoRA for training Multilingual Instruction-following LM based on LLaMA* (https://github.com/KohakuBlueleaf/guanaco-lora) for further training and inference our models.
+## Recent News
+We've noticed a recent entrant in the field, the QLoRa method, which we find concerning due to its attempt to piggyback on the reputation of Guanaco. We strongly disapprove of such practices. QLoRa, as far as we can tell, lacks mathematical robustness and its performance significantly trails behind that of GPTQ and advancements such as PEFT fine-tuning, which have been successful in improving upon it.
+Guanaco has been diligent, consistently releasing multilingual datasets since March 2023, along with publishing weights that are not only an enhanced version of GPTQ but also support multimodal VQA and have been optimized for 4-bit. Despite the substantial financial investment of tens of thousands of dollars in distilling data from OpenAI's GPT models, we still consider these efforts to be incremental.
+We, however, aim to move beyond the incremental:
+1. We strive to no longer rely on distillation data from OpenAI: We've found that relying on GPT-generated data impedes significant breakthroughs. Furthermore, this approach has proven to be disastrous when dealing with the imbalances in multilingual tasks.
+2. We're focusing on the enhancement of quantization structure and partial native 4-bit fine-tuning: We are deeply appreciative of the GPTQ-Llama project for paving the way in state-of-the-art LLM quantization. Its unique qualities, especially at the 7B size, are facilitating significant progress in multilingual and multimodal tasks.
+3. We plan to utilize visual data to adjust our language models: We believe this will fundamentally address the issues of language imbalance, translation inaccuracies, and the lack of graphical logic in LLM.
+While our work is still in the early stages, we're determined to break new ground in these areas. Our critique of QLoRa's practices does not stem from animosity but rather from the fundamental belief that innovation should be rooted in originality, integrity, and substantial progress.