MLLM
PRO
Anran-MLLM
3 followers · 3 following
AI & ML interests
None yet
Recent Activity
Reacted to their post, 2 days ago
Introducing PerceptionDLM: the first multimodal diffusion LLM for parallel region perception! Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process.

Highlights
• Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• New benchmark: ParaDLC-Bench, which jointly evaluates caption quality AND inference efficiency
• Code, models & benchmark all open-sourced

Models
https://huggingface.co/MSALab/PerceptionDLM-Base
https://huggingface.co/MSALab/PerceptionDLM

Benchmark
https://huggingface.co/datasets/MSALab/ParaDLC-Bench

Paper: https://huggingface.co/papers/2606.19534
Code: https://github.com/MSALab-PKU/PerceptionDLM

Diffusion LLMs aren't just for text: they unlock efficient, parallel visual perception.

#multimodal #diffusion #VLM #perception
Reacted to the same post with 🔥, 2 days ago
Upvoted a paper, 4 days ago
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models
Organizations
None yet
Anran-MLLM's activity
Liked 2 models, 5 days ago
MSALab/PerceptionDLM • Image-Text-to-Text • 9B • Updated 5 days ago • 18 • 6
MSALab/PerceptionDLM-Base • Image-Text-to-Text • 9B • Updated 5 days ago • 21 • 4
Liked a model, 7 months ago
black-forest-labs/FLUX.1-Kontext-dev • Image-to-Image • Updated Jan 1 • 128k • 2.67k
Liked a dataset, 7 months ago
BleachNick/UltraEdit_Region_Based_100k • Viewer • Updated Jul 22, 2024 • 108k • 271 • 11