Committed by nielsr (HF Staff)
Commit 0a9ef4e · verified · 1 Parent(s): 64b8c9e

Add pipeline tag, library name, and improve model card

Hi! I'm Niels from the Hugging Face community science team. I've opened this PR to improve the model card for Qwen-Audio-AHA:
- Added `pipeline_tag: audio-text-to-text` to ensure the model is correctly categorized on the Hub.
- Added `library_name: peft` to identify the framework used for this LoRA adapter.
- Linked the model to the associated paper and GitHub repository for better visibility.
- Updated the sample usage with a more complete snippet from your GitHub README, showing how to perform inference.
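Hub tooling reads keys such as `pipeline_tag` and `library_name` from the YAML block at the top of `README.md`. The following is a rough, stdlib-only sketch of checking that a card carries the two keys this PR adds. The helper names `read_front_matter` and `missing_keys` are hypothetical, and the parser only understands flat `key: value` pairs and `- item` lists, not general YAML.

```python
# Minimal sketch (hypothetical helpers): parse the flat YAML front matter of a
# model-card README and report which of the PR's metadata keys are missing.
# Only `key: value` lines and `- item` list entries are handled, not full YAML.

REQUIRED_KEYS = ("pipeline_tag", "library_name")  # keys named in this PR


def read_front_matter(readme_text: str) -> dict:
    """Return the metadata between the leading '---' fences, or {} if absent."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    current_list_key = None  # key whose `- item` entries we are collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing fence ends the front matter
            break
        if stripped.startswith("- ") and current_list_key is not None:
            meta[current_list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a `key: value` line; skip it
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list_key = None
        else:
            meta[key] = []  # a bare `key:` starts a list
            current_list_key = key
    return meta


def missing_keys(meta: dict) -> list:
    """List the required keys absent from the parsed metadata, in order."""
    return [k for k in REQUIRED_KEYS if k not in meta]


if __name__ == "__main__":
    # Front matter as it appears on the new side of the diff.
    readme = """---
base_model: Qwen/Qwen2.5-Omni-7B
datasets:
- ASU-GSL/AHA
library_name: peft
license: apache-2.0
pipeline_tag: audio-text-to-text
tags:
- lora
- qwen2.5-omni
- multimodal
- audio
---
# Qwen-Audio-AHA (LoRA Adapter)
"""
    meta = read_front_matter(readme)
    print(meta["pipeline_tag"], meta["library_name"], meta["datasets"])
    print(missing_keys(meta))
```

For real cards, a proper YAML parser or the `huggingface_hub` model-card utilities would be the safer choice.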

Files changed (1)
  1. README.md +33 -12
README.md CHANGED
@@ -1,34 +1,43 @@
  ---
- license: apache-2.0
  base_model: Qwen/Qwen2.5-Omni-7B
  tags:
  - lora
  - qwen2.5-omni
  - multimodal
  - audio
- datasets:
- - ASU-GSL/AHA
  ---

- # Qwen2.5-Omni LoRA Adapter

  ## Model Description
- This is a LoRA adapter for **Qwen2.5-Omni-7B** **Thinker**, fine-tuned to reduce audio hallucination.

- Qwen2.5-Omni is a foundational multimodal model capable of seamless audio-to-audio and audio-to-text interactions. This adapter enhances the model's audio reasoning capability by reducing model hallucination.

  ## Intended Use
- - **Primary Task:** Audio reasoning.
- - **Languages Supported:** All languages supported by Qwen2.5-Omni-7B.

- ## How to Load
- You can load this model using the `peft` and `transformers` libraries:

  ```python
  from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
  from peft import PeftModel
- import torch

  model_id = "Qwen/Qwen2.5-Omni-7B"
  adapter_id = "ASU-GSL/Qwen-Audio-AHA"

@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(

  # Load LoRA adapter
  model = PeftModel.from_pretrained(model, adapter_id)
- ```

  ```
  @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
 
  ---
  base_model: Qwen/Qwen2.5-Omni-7B
+ datasets:
+ - ASU-GSL/AHA
+ library_name: peft
+ license: apache-2.0
+ pipeline_tag: audio-text-to-text
  tags:
  - lora
  - qwen2.5-omni
  - multimodal
  - audio
  ---

+ # Qwen-Audio-AHA (LoRA Adapter)
+
+ This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

  ## Model Description
+ AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

+ - **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
+ - **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
+ - **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

  ## Intended Use
+ - **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
+ - **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

+ ## Sample Usage
+
+ You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

  ```python
+ import torch
+ import librosa
  from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
  from peft import PeftModel

+ device = "cuda" if torch.cuda.is_available() else "cpu"
  model_id = "Qwen/Qwen2.5-Omni-7B"
  adapter_id = "ASU-GSL/Qwen-Audio-AHA"

@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(

  # Load LoRA adapter
  model = PeftModel.from_pretrained(model, adapter_id)

+ # Load Audio
+ # Replace "example.wav" with the path to your audio file
+ audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
+ prompt = "<|audio|>\nDescribe the temporal order of events in this audio."
+ inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
+
+ # Generate
+ generate_ids = model.generate(**inputs, max_new_tokens=256)
+ print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
  ```
+
+ ## Citation
+ ```bibtex
  @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},