Tina: Text-to-Model Generative AI (CIFAR-100, CNN)

Tina is a text-conditioned neural network diffusion model that generates personalized image classifiers from natural language prompts. Given a text description of the desired classification task (e.g., a list of class names), Tina directly outputs the full parameters of a lightweight CNN — no gradient-based training required at inference time.

This checkpoint is the Tina model trained on CIFAR-100, capable of generating 10-class personalized CNN classifiers (~5K parameters) from text prompts.

Model Description

| Property | Value |
|---|---|
| Architecture | Diffusion Transformer (DiT), GPT-2-style backbone |
| Text Encoder | CLIP ViT-B/32 (frozen) |
| Hidden Size | 2048 |
| Transformer Layers | 12 encoder layers + 12 decoder layers |
| Attention Heads | 16 |
| Diffusion Steps | 1000 (DDPM sampling) |
| Prediction Type | Signal prediction (x₀) |
| Generated Model | 2-layer CNN, ~5K parameters |
| Max Classification Classes | 10 |
| Training p-Models | 1000 personalized models |
| Training Dataset | CIFAR-100 (100 classes, 32×32 images) |
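The card does not spell out the generated CNN's layer sizes. As an illustration only, one hypothetical two-conv-layer configuration that lands near the stated ~5K-parameter budget can be counted as follows (all channel widths here are assumptions, not the actual Tina-generated architecture):

```python
# Hypothetical 2-layer CNN config chosen only to illustrate the ~5K budget;
# the real layer sizes of the Tina-generated CNN are not given in this card.
def conv_params(c_in, c_out, k):
    """Weights (c_out * c_in * k * k) plus biases (c_out)."""
    return c_out * c_in * k * k + c_out

def linear_params(d_in, d_out):
    """Weights (d_out * d_in) plus biases (d_out)."""
    return d_out * d_in + d_out

conv1 = conv_params(3, 16, 3)    # 3-channel 32x32 input -> 16 channels
conv2 = conv_params(16, 32, 3)   # 16 -> 32 channels
head  = linear_params(32, 10)    # global-average-pool -> 10-way classifier

total = conv1 + conv2 + head
print(total)  # 5418, i.e. roughly the ~5K quoted above
```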

How It Works

Tina treats model generation as a conditional diffusion process. Just as text-to-image diffusion models denoise random pixels into coherent images, Tina denoises random vectors into functional neural network parameters.

  1. Training: Tina is trained on (task description, personalized model) pairs. Each personalized model is a CNN fine-tuned on a specific 10-class subset of CIFAR-100.
  2. Inference: Given a text prompt listing the desired classes (e.g., ["apple", "bear", "bicycle", "bus", "castle", "clock", "cloud", "forest", "mountain", "train"]), Tina generates a complete CNN classifier in a single sampling run of 1000 denoising steps, with no gradient-based training.
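The inference step above can be sketched as a standard DDPM sampler with x₀ (signal) prediction. Everything here is schematic: `denoiser` stands in for Tina's DiT backbone, and the parameter-vector length, step count, and beta schedule are placeholder assumptions, not values taken from this checkpoint.

```python
import numpy as np

def ddpm_sample_x0(denoiser, text_emb, dim, steps=1000, seed=0):
    """Schematic DDPM sampler for an x0-predicting model.

    denoiser(x_t, t, text_emb) is assumed to return a prediction of the
    clean parameter vector x0 (the 'signal prediction' mode in the table above).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # linear schedule, as in vanilla DDPM
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x_t = rng.standard_normal(dim)  # start from pure noise
    for t in reversed(range(steps)):
        x0_hat = denoiser(x_t, t, text_emb)
        if t == 0:
            return x0_hat  # final step: take the predicted clean parameters
        # Mean and variance of the DDPM posterior q(x_{t-1} | x_t, x0_hat).
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
        x_t = (coef_x0 * x0_hat + coef_xt * x_t
               + np.sqrt(var) * rng.standard_normal(dim))
    return x_t

# Dummy denoiser that always "predicts" a fixed flat parameter vector,
# just to show the loop's shape; Tina's DiT replaces this in practice.
target = np.full(5000, 0.1)
flat_params = ddpm_sample_x0(lambda x, t, c: target, text_emb=None,
                             dim=5000, steps=50)
print(flat_params.shape)  # (5000,)
```

The sampled vector would then be reshaped into the generated CNN's weight tensors.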

Thanks to the vision-language alignment of CLIP, Tina also supports:

  • Image prompts: Zero-shot and few-shot image-prompted generation
  • Natural language descriptions: Using class descriptions instead of class names
  • Unseen classes: Generalization to classes not seen during training
  • Variable class counts: Any number of classes up to 10 via classification sequence padding
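The "classification sequence padding" mentioned above can be illustrated as follows: prompts with fewer than 10 classes are padded to a fixed length so the generated head is always 10-way, and the padded slots are ignored at prediction time. The padding token and mask scheme below are assumptions for illustration, not Tina's exact implementation.

```python
MAX_CLASSES = 10
PAD = "<pad>"  # assumed placeholder token; the actual padding scheme is unspecified

def pad_class_prompt(class_names):
    """Pad a class list to the fixed 10-slot prompt; return it with a validity mask."""
    if len(class_names) > MAX_CLASSES:
        raise ValueError("this checkpoint supports at most 10 classes")
    padded = list(class_names) + [PAD] * (MAX_CLASSES - len(class_names))
    mask = [name != PAD for name in padded]
    return padded, mask

padded, mask = pad_class_prompt(["apple", "bear", "bicycle"])
print(len(padded))  # 10 slots in total
print(sum(mask))    # 3 of them hold real class names
```

At inference, logits for masked-out slots would simply be discarded (e.g., set to negative infinity before the argmax), so the same 10-way head serves any class count up to 10.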

Intended Use

  • On-demand personalized classification: Quickly generate a lightweight classifier tailored to a user's specific needs without any training data or GPU-intensive fine-tuning.
  • Edge AI deployment: The generated CNN (~5K params) is extremely lightweight, suitable for resource-constrained devices.
  • Research on text-to-model generation: Exploring the paradigm of generating functional AI models from natural language.

Performance

Main Results on CIFAR-100 (10-class personalization)

| Method | In-Distribution Accuracy (%) | Out-of-Distribution Accuracy (%) |
|---|---|---|
| Generic Model | 28.72 | 29.88 |
| Classifier Selection | 64.83 | 64.15 |
| TAPER | 67.71 | 66.85 |
| Tina (this model) | 68.35 | 67.14 |

Inference Efficiency

| Method | Time per model (CNN) |
|---|---|
| Pretrain + fine-tune | 94.35s |
| TAPER | 18.10s |
| Tina | 4.88s |

Limitations

  • This checkpoint generates CNN classifiers only (2-layer, ~5K parameters) for CIFAR-100 class subsets.
  • Input images are expected to be 32×32 resolution.
  • A single Tina cannot generate models across different architectures or modalities simultaneously.
  • Performance on entirely out-of-domain classes (beyond CIFAR-100 semantic scope) may degrade.
