TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

Paper: arXiv:2603.08096
Project Page: cwru-aism.github.io/triangulang
Code: github.com/bryceag11/triangulang
Training Data & Caches: huggingface.co/datasets/bag100/triangulang-scannetpp-cache

Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang
Case Western Reserve University

Overview

TrianguLang is a feed-forward, pose-free method for language-guided 3D localization from multi-view images. Given unposed images and a text query, it produces per-view segmentation masks and camera-relative 3D locations at ~18 FPS for 5 classes.

Checkpoints

Checkpoint	Description
`checkpoints/ma_v10_config_245s_100ep/best.pt`	Single-object (v10), ScanNet++ in-domain (62.4 mIoU)
`checkpoints/mo_v11_text_spatial_245s_8v_100ep/best.pt`	Multi-object (text + spatial)
`checkpoints/gasa_generalist/best.pt`	Generalist for zero-shot open-vocab benchmarks (uCO3D / LERF-OVS / 3D-OVS / Mip-NeRF360)
`checkpoints/gasa_E_box_camframe_230s_100ep_bs8/best.pt`	Strongest ScanNet++ (74.3 mIoU), camera-frame

Each checkpoint directory also contains last.pt (for resuming training) and config.json.
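
The directory layout described above (best.pt, last.pt, and config.json side by side) can be inspected with a few lines of standard-library Python. This is a minimal sketch, not the repository's loading code: `describe_checkpoint` is a hypothetical helper, the keys inside config.json are not documented here, so the file is read as an opaque dictionary, and the weights themselves are left untouched because the loader API is defined by the code repository.

```python
import json
from pathlib import Path


def describe_checkpoint(ckpt_dir):
    """Hypothetical helper: locate the files each checkpoint directory is said to hold."""
    ckpt_dir = Path(ckpt_dir)
    config_path = ckpt_dir / "config.json"
    # config.json is parsed as plain JSON; its schema is an unknown here.
    config = json.loads(config_path.read_text()) if config_path.exists() else None
    return {
        "best": ckpt_dir / "best.pt",   # weights used for evaluation
        "last": ckpt_dir / "last.pt",   # weights for resuming training
        "config": config,
    }


info = describe_checkpoint("checkpoints/gasa_generalist")
for name in ("best", "last"):
    status = "found" if info[name].exists() else "missing"
    print(f"{name}.pt: {info[name]} ({status})")
print("config.json:", "loaded" if info["config"] is not None else "missing")
```

Run from the repository root after downloading the checkpoints, this reports which of the three files are present before any model code is invoked.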

Architecture

Frozen: SAM3 (841M) + DA3-NESTED-GIANT-LARGE (1.69B) = ~2.5B params
Trainable: GASA Decoder (~13.5M params)
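
As a quick check of the parameter budget above, the following sketch sums the listed counts and computes the share that is trainable. The numbers are copied from the two lines above; the variable names are illustrative only.

```python
# Parameter counts in millions, as listed in the Architecture section.
FROZEN_PARAMS_M = {"SAM3": 841.0, "DA3-NESTED-GIANT-LARGE": 1690.0}
TRAINABLE_PARAMS_M = {"GASA Decoder": 13.5}

frozen_total = sum(FROZEN_PARAMS_M.values())
trainable_total = sum(TRAINABLE_PARAMS_M.values())
trainable_fraction = trainable_total / (frozen_total + trainable_total)

print(f"frozen:    {frozen_total / 1000:.2f}B params")
print(f"trainable: {trainable_total:.1f}M params")
print(f"trainable share of all params: {trainable_fraction:.2%}")
```

The frozen backbones sum to about 2.53B parameters, so the trainable decoder accounts for roughly half a percent of the full model.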

Results

Single-Object (text-only)

Benchmark	Setting	mIoU	mAcc / Loc. Acc.
ScanNet++	In-domain	62.4%	77.4% mAcc
uCO3D	In-domain	94.6%	98.3% mAcc
uCO3D	Cross-domain (ScanNet++ → uCO3D)	75.7%	79.6% mAcc
LERF-OVS	Zero-shot (no LERF training)	59.2%	89.1% Loc. Acc.
NVOS	Zero-shot	93.5%	—
SPIn-NeRF	Zero-shot	91.4%	—

Multi-Object (text-only, ScanNet++)

Setting	mIoU	mAcc
Text-only (multi-object)	65.2%	79.1%

LERF-OVS Per-Scene (zero-shot)

Method	Ramen	Teatime	Kitchen	Figurines	Overall mIoU	Overall Loc. Acc.
LERF	28.2	45.0	37.9	38.6	37.4	73.6
LangSplat	51.2	65.1	44.5	44.7	51.4	84.3
LangSplat-V2	51.8	72.2	59.1	56.4	59.9	84.1
TrianguLang	51.1	58.9	62.4	62.1	59.2	89.1

Note: Per-scene methods (LERF, LangSplat) require calibrated poses and 10-45 min of per-scene optimization. TrianguLang runs feed-forward in ~58 ms.
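
One way to read the Overall mIoU column is to compare it with the unweighted mean of the four per-scene values. The sketch below does that with the numbers copied from the table. For the three baselines the two agree to rounding; for TrianguLang the unweighted mean comes out near 58.6 rather than the reported 59.2, which suggests the reported overall figure uses a different aggregation (for example, averaging over queries rather than scenes). That interpretation is an assumption; the table does not state how the overall value is computed.

```python
# Per-scene mIoU (Ramen, Teatime, Kitchen, Figurines) and reported overall mIoU,
# copied from the LERF-OVS table.
TABLE = {
    "LERF": ((28.2, 45.0, 37.9, 38.6), 37.4),
    "LangSplat": ((51.2, 65.1, 44.5, 44.7), 51.4),
    "LangSplat-V2": ((51.8, 72.2, 59.1, 56.4), 59.9),
    "TrianguLang": ((51.1, 58.9, 62.4, 62.1), 59.2),
}


def scene_mean(per_scene):
    """Unweighted mean over scenes (every scene counts equally)."""
    return sum(per_scene) / len(per_scene)


# Gap between the unweighted scene mean and the reported overall value.
gaps = {
    method: scene_mean(per_scene) - reported
    for method, (per_scene, reported) in TABLE.items()
}

for method, gap in gaps.items():
    per_scene, reported = TABLE[method]
    print(f"{method:>12}: scene mean {scene_mean(per_scene):.2f}, "
          f"reported {reported:.1f}, gap {gap:+.2f}")
```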

Citation

@article{grant2026triangulang,
  title={TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization},
  author={Grant, Bryce and Rothenberg, Aryeh and Banerjee, Atri and Wang, Peng},
  journal={arXiv preprint arXiv:2603.08096},
  year={2026}
}

Paper: arXiv:2603.08096 (published Mar 9)