I think the Falcon-180B web app is having a stroke
I just tried the web app that was made openly available at https://huggingface.co/blog/falcon-180b#demo
Sadly, it seems to just be outputting nonsense. Here is a very simple example of what it produces based on the example prompts:
I was trying to use it to extract some information from a rather large JSON document, which worked perfectly fine with the other, smaller Falcon LLMs. With this web app, however, the output is downright bad and does not even make sense. Maybe something broke?
For an industry project, it would be of great interest to test this model, but right now I have no easy way to do so.
Have you tried this web app lately?
I just noticed that the web app is hosted in a separate Space here:
https://huggingface.co/spaces/tiiuae/falcon-180b-demo/
There also seem to be multiple people reporting the same issue (see here).
So perhaps we could move the discussion there instead.
It might be an idea to keep this Discussion open until the problem has been resolved, so that others do not start the same thread here.
