This license is NonCommercial. Can we not have Apache 2.0, like Mistral?
As a SWE I know the common licenses (Apache, MIT, GPL, etc.) and know how I can and can't incorporate them, but I don't know why this model uses a CC license, and unfortunately I cannot use it for anything. Reading through it, it's actually pretty harsh, and I would have guessed this finetune would be released in the same spirit as Mistral, i.e. Apache.
Any chance for Apache?
We opted for the NC license in order to comply with the license from one of the source datasets (UltraChat https://huggingface.co/datasets/stingning/ultrachat). If the dataset owners are happy to change the license to a permissive one, then it would likely be fine to update the license of this model as well since UltraFeedback is MIT licensed.
https://huggingface.co/datasets/stingning/ultrachat/discussions/3#65278c58bc018e940257de29
@lewtun UltraChat is now MIT licensed.
Best regards,
Lino
A license change for commercial use would be huge! I'd love to use the model!
(I am not a lawyer and this is not legal advice).
The MIT license is great.
However, if models are copyright-worthy at all (which is an open question, and I think the answer is and should be "no"), it's likely that this model is a derivative of Mistral, while it's not necessarily the case that it's a derivative of training data (which would be a problem anyways, since that'd include Mistral's training data).
If the user is bound to the MIT license, they are likely bound to the Apache 2.0 license (from Mistral) too anyways, so I think the Apache 2.0 license would actually make more sense, for this repo, than the MIT license (even if some of the data is MIT-licensed).
Curious: since UltraChat is synthetic data generated by OpenAI's ChatGPT, wouldn't it violate 2(c)(iii) of their terms of use if this model were used in production for commercial purposes?
(iii) use output from the Services to develop models that compete with OpenAI
(not a lawyer, just a naive dev asking. Love the model btw)
@alexweberk I'm not sure if it's even relevant unless you are a client of OpenAI. ChatGPT's TOS aren't a law. What right would they have to restrict this model, regardless of what they say in the TOS? Using an argument from copyright would require a copyright-maximalist take that would not benefit what OpenAI needs to do to train models.
Thanks @Aspie96 ! I would love to think that way, and you raise great points. Still curious what the implications are if you were to use this model for business use cases (I hope there are none).
I hope so too.
As I said, I am not a lawyer, and also there are multiple jurisdictions in the world. Regardless, I don't see how a mere user of this model, commercial or not, would be bound by OpenAI's TOS.
@lewtun thank you for the model! A quick question: since UltraFeedback uses commercial models (including Bard) and the Llama 2 series (which have their own licenses), I am still a bit confused about how the model (or the dataset, for that matter) can be issued under MIT. Thanks in advance!
Reference: https://github.com/OpenBMB/UltraFeedback#dataset-construction