How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sayan01/Llama-Flan-XL2base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sayan01/Llama-Flan-XL2base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
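The same request can also be sent from Python. The following is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000 and returns the usual OpenAI-style completions response with a `choices` list. The helper names `build_completion_request` and `complete` are hypothetical and used only for illustration. It relies on the standard library alone.

```python
import json
import urllib.request

# Assumed to match the server started with `vllm serve` above.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "Sayan01/Llama-Flan-XL2base"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=BASE_URL):
    """Build (but do not send) a POST request mirroring the curl call above.

    Hypothetical helper, not part of vLLM.
    """
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires the vLLM server to be running locally.
    print(complete("Once upon a time,"))
```

Building the request separately from sending it makes the payload easy to inspect or adjust before any network call is made.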
Use Docker
docker model run hf.co/Sayan01/Llama-Flan-XL2base

This is a small, 230M-parameter Llama model distilled from the original Llama model. It was distilled on OpenOrca's FLAN dataset, using 160,000 randomly sampled examples from that dataset. It is free to download. It is a work in progress, so please use it at your own risk.

