Congrats!
Great result, congrats!
Although I can't help but feel you used my methods here... (lol, the joke had to be made, I'm sorry)
Thanks for sharing!
I do wonder though: it seems like yours (whilst performing well overall, let there be no doubt about that) sees its steepest increase in performance on the GSM8K benchmark.
And as somebody rightfully pointed out on my model page: the Intel neural chat data includes GSM8K, which is also part of the leaderboard test.
As you know, I'm really new to all of this, so I'm actually not quite sure how big of a difference this would make and how much it would influence:
- Benchmarking results
(and more importantly:)
- How that would translate to actual model performance versus expected performance based on the benchmarking results.
Could you chime in on that?
Would it make a substantial difference either in the benchmark results or in how those results relate to actual model performance in a real scenario?
Isn't it data from the training split of GSM8K? I don't think that the neural chat data is contaminated (but I might be wrong). If it's really test data, it makes the dataset absolutely useless :(
I don't completely rely on the Open LLM Leaderboard, and I use another benchmark suite (with https://github.com/mlabonne/llm-autoeval) for this purpose. It doesn't include GSM8K.
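For anyone who wants to check the train-versus-test question directly, here is a minimal sketch of an n-gram overlap check, not a definitive contamination test. The function names (`normalize`, `ngrams`, `find_contaminated`), the 8-token window, and the toy strings are all assumptions made for illustration; none of them are real GSM8K items or part of any library. In practice you would feed in the question texts of the benchmark's test split and the prompts of the fine-tuning dataset.

```python
import re


def normalize(text):
    """Lowercase and collapse everything that is not a letter or digit into spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ngrams(text, n=8):
    """Return the set of word n-grams of the normalized text."""
    tokens = normalize(text).split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_contaminated(train_samples, test_questions, n=8):
    """Return (train_index, [test_indices]) for every training sample
    that shares at least one n-gram with a test question."""
    # Index every test n-gram so each training sample is scanned only once.
    test_index = {}
    for idx, question in enumerate(test_questions):
        for gram in ngrams(question, n):
            test_index.setdefault(gram, set()).add(idx)

    hits = []
    for t_idx, sample in enumerate(train_samples):
        matched = set()
        for gram in ngrams(sample, n):
            matched |= test_index.get(gram, set())
        if matched:
            hits.append((t_idx, sorted(matched)))
    return hits


# Hypothetical toy data, invented for this example only.
train_samples = [
    "Question: A farmer has 12 apples and gives away 5 of them to a neighbor. How many are left?",
    "Write a poem about autumn leaves falling in the park at dusk.",
]
test_questions = [
    "A farmer has 12 apples and gives away 5 of them to a neighbor. How many are left?",
    "A train travels 60 miles per hour for 3 hours. How far does it go in total?",
]

hits = find_contaminated(train_samples, test_questions)
print(hits)
```

A hit only says that a training sample shares a long word sequence with a test question. It does not catch paraphrased leaks, and a smaller window will flag more false positives, so the window size is a judgment call.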