Visual Intelligence, Pretrained Vision-and-Language Model, Embodied AI, Collaborative Agents, Vision Task(Object Detection, Segmentation)
Generate images from text using a lightweight Stable Diffusion XL
Answer questions about images with text prompts