Commit bed6f3b · Parent(s): e044a62
Update README.md
README.md CHANGED
@@ -87,4 +87,20 @@ New Assistant Answer
 
 It is important to remember that Guanaco is a 7B-parameter model, and **any knowledge-based content should be considered potentially inaccurate**. We strongly recommend **providing verifiable sources in the System Prompt, such as Wikipedia, for knowledge-based answers**. In the absence of sources, it is crucial to inform users of this limitation to prevent the dissemination of false information and to maintain transparency.
 
-Due to the differences in format between this project and [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), please refer to *Guanaco-lora: LoRA for training Multilingual Instruction-following LM based on LLaMA* (https://github.com/KohakuBlueleaf/guanaco-lora) for further training of and inference with our models.
+Due to the differences in format between this project and [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), please refer to *Guanaco-lora: LoRA for training Multilingual Instruction-following LM based on LLaMA* (https://github.com/KohakuBlueleaf/guanaco-lora) for further training of and inference with our models.
+
+## Recent News
+
+We've noticed a recent entrant in the field, the QLoRA method, which we find concerning due to its attempt to piggyback on the reputation of Guanaco. We strongly disapprove of such practices. QLoRA, as far as we can tell, lacks mathematical robustness, and its performance significantly trails that of GPTQ and of advancements such as PEFT fine-tuning, which have been successful in improving upon it.
+
+Guanaco has been diligent, consistently releasing multilingual datasets since March 2023, along with publishing weights that are not only an enhanced version of GPTQ but also support multimodal VQA and have been optimized for 4-bit. Despite a substantial financial investment of tens of thousands of dollars in distilling data from OpenAI's GPT models, we still consider these efforts to be incremental.
+
+We, however, aim to move beyond the incremental:
+
+1. We strive to no longer rely on distillation data from OpenAI: We've found that relying on GPT-generated data impedes significant breakthroughs. Furthermore, this approach has proven disastrous when dealing with the imbalances in multilingual tasks.
+
+2. We're focusing on enhancing the quantization structure and on partial native 4-bit fine-tuning: We are deeply appreciative of the GPTQ-Llama project for paving the way in state-of-the-art LLM quantization. Its unique qualities, especially at the 7B size, are facilitating significant progress in multilingual and multimodal tasks.
+
+3. We plan to utilize visual data to adjust our language models: We believe this will fundamentally address the issues of language imbalance, translation inaccuracies, and the lack of graphical logic in LLMs.
+
+While our work is still in the early stages, we're determined to break new ground in these areas. Our critique of QLoRA's practices does not stem from animosity but rather from the fundamental belief that innovation should be rooted in originality, integrity, and substantial progress.