Add pipeline tag, library name, and improve model card
Hi! I'm Niels from the Hugging Face community science team. I've opened this PR to improve the model card for Qwen-Audio-AHA:
- Added `pipeline_tag: audio-text-to-text` to ensure the model is correctly categorized on the Hub.
- Added `library_name: peft` to identify the framework used for this LoRA adapter.
- Linked the model to the associated paper and GitHub repository for better visibility.
- Updated the sample usage with a more complete snippet from your GitHub README, showing how to perform inference.
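The first two items in the list above are keys in the YAML front matter of `README.md`. As a rough illustration of how such a header can be checked before opening a PR, the sketch below parses a copy of the updated front matter and confirms that both keys are present. `CARD` and `parse_front_matter` are hypothetical names introduced here, and the parser only understands the flat `key: value` and `- item` subset used in this card, not general YAML.

```python
# Hypothetical copy of the updated front matter, for illustration only.
CARD = """---
base_model: Qwen/Qwen2.5-Omni-7B
datasets:
- ASU-GSL/AHA
library_name: peft
license: apache-2.0
pipeline_tag: audio-text-to-text
tags:
- lora
- qwen2.5-omni
- multimodal
- audio
---

# Qwen-Audio-AHA (LoRA Adapter)
"""


def parse_front_matter(text: str) -> dict:
    """Parse the flat `key: value` / `- item` subset of YAML used in the card header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the header
            break
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            value = value.strip()
            if value:
                meta[key.strip()] = value
                current_key = None
            else:
                current_key = key.strip()
                meta[current_key] = []
    return meta


meta = parse_front_matter(CARD)
missing = [key for key in ("pipeline_tag", "library_name") if key not in meta]
print("missing keys:", missing)
```

For real cards, a full YAML parser is the safer choice; this hand-rolled version only keeps the example free of third-party dependencies.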
README.md
CHANGED
@@ -1,34 +1,43 @@
 ---
-license: apache-2.0
 base_model: Qwen/Qwen2.5-Omni-7B
+datasets:
+- ASU-GSL/AHA
+library_name: peft
+license: apache-2.0
+pipeline_tag: audio-text-to-text
 tags:
 - lora
 - qwen2.5-omni
 - multimodal
 - audio
-datasets:
-- ASU-GSL/AHA
 ---
 
-#
+# Qwen-Audio-AHA (LoRA Adapter)
+
+This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.
 
 ## Model Description
-
+AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.
 
-
+- **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
+- **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
+- **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)
 
 ## Intended Use
-- **Primary Task:** Audio reasoning.
-- **Languages Supported:** All languages supported by Qwen2.5-Omni-7B.
+- **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
+- **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.
 
-##
-
+## Sample Usage
+
+You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.
 
 ```python
+import torch
+import librosa
 from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
 from peft import PeftModel
-import torch
 
+device = "cuda" if torch.cuda.is_available() else "cpu"
 model_id = "Qwen/Qwen2.5-Omni-7B"
 adapter_id = "ASU-GSL/Qwen-Audio-AHA"
 
@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
 
 # Load LoRA adapter
 model = PeftModel.from_pretrained(model, adapter_id)
-```
 
+# Load Audio
+# Replace "example.wav" with the path to your audio file
+audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
+prompt = "<|audio|>
+Describe the temporal order of events in this audio."
+inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
+
+# Generate
+generate_ids = model.generate(**inputs, max_new_tokens=256)
+print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
 ```
+
+## Citation
+```bibtex
 @article{chen2025aha,
 title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
 author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
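In the added sample usage, the `prompt` string literal occupies two lines of the file (new lines 56 and 57). Python does not allow a line break inside a single-quoted string, so the snippet would fail to parse as written. A minimal sketch of an equivalent prompt, assuming the intent was a newline between the audio placeholder and the instruction, uses an explicit `\n` escape. `build_prompt` and `AUDIO_PLACEHOLDER` are hypothetical names introduced here, and the `<|audio|>` token is copied from the diff without checking it against the base model's processor.

```python
# Placeholder token copied verbatim from the sample usage; not verified
# against the Qwen2.5-Omni processor's special tokens.
AUDIO_PLACEHOLDER = "<|audio|>"


def build_prompt(instruction: str, placeholder: str = AUDIO_PLACEHOLDER) -> str:
    """Join the audio placeholder and the instruction with a single newline."""
    return f"{placeholder}\n{instruction.strip()}"


prompt = build_prompt("Describe the temporal order of events in this audio.")
print(repr(prompt))
```

The resulting `prompt` can be passed to the `processor(...)` call in the sample usage in place of the two-line literal.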