Video-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
text-to-audio
multimodal
video-captioning
audio-visual
ugc
Instructions to use openinterx/UGC-VideoCaptioner with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openinterx/UGC-VideoCaptioner with Transformers:
# Load model directly
from transformers import AutoProcessor, AutoModelForTextToWaveform

processor = AutoProcessor.from_pretrained("openinterx/UGC-VideoCaptioner")
model = AutoModelForTextToWaveform.from_pretrained("openinterx/UGC-VideoCaptioner")

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -296,10 +296,10 @@ python eval_caption.py
 If you find this repository helpful, feel free to cite our paper:
 
 ```bibtex
-@article{
+@article{wu2025ugc,
 title={UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks},
-author={Wu,
+author={Wu, Peiran and Liu, Yunze and Zhu, Zhengdong and Zhou, Enmin and Shen, Shawn},
 journal={arXiv preprint arXiv:2507.11336},
-year={
+year={2025}
 }
 ```