Thai ID Nano OCR: Thai OCR Reader (SimpleCRNN (MVP))

This is the MVP model. For production, swap in the ppocrv5 variant (same interface, better accuracy). The architecture_variant field in config.json identifies the variant programmatically.
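A minimal sketch of that programmatic check, assuming config.json is a flat JSON object with a top-level architecture_variant key whose value is crnn for this model and ppocrv5 for the upgrade. The helper name detect_variant and its fallback default are illustrative and not part of the released files.

```python
import json

def detect_variant(config_path, default="crnn"):
    """Return the architecture variant named in config.json (hypothetical helper)."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the key sits at the top level of the JSON object.
    return config.get("architecture_variant", default)
```

A caller could branch on the returned string to decide which model class to instantiate, for example passing the config path obtained in Quick Start.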

CTC-based text recognition model for the Thai-script fields of the Thai National ID card, designed for on-device inference at 30 fps on mobile.

Metric        Value
Architecture  SimpleCRNN (MVP)
Variant       crnn
ExactMatch    95.3%
CharAccuracy  99.1%
Parameters    3,059,535
Vocab size    79
Best epoch    116

Quick Start

from huggingface_hub import hf_hub_download

model_path = hf_hub_download("chayuto/thai-id-ocr-crnn-thai-reader", "model.pt")
vocab_path = hf_hub_download("chayuto/thai-id-ocr-crnn-thai-reader", "vocab.txt")
config = hf_hub_download("chayuto/thai-id-ocr-crnn-thai-reader", "config.json")

Architecture

SimpleCRNN: CNN (4-layer) + BiLSTM (2-layer) + CTC decoder.

Input: [B, 3, 48, 320]  (RGB, normalized to [-1, 1])
  → CNN: 32→64→128→256 channels, BatchNorm+ReLU, MaxPool(2,2)×3
  → AdaptiveAvgPool2d((1, None))  → T=40 time steps
  → BiLSTM: hidden=256, layers=2, dropout=0.1
  → Linear(512 → 79)
  → CTC decode (blank=0, collapse repeats)
Output: Unicode string
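The shapes above can be sanity-checked with plain arithmetic. The sketch below recomputes the number of time steps and the learnable parameter count from the listed layer sizes, assuming 3x3 convolutions with bias and PyTorch's LSTM layout (two bias vectors per gate block), as in the class definition under Loading the Model. The helper names are illustrative.

```python
def conv_bn_params(c_in, c_out, k=3):
    conv = c_in * c_out * k * k + c_out  # kernel weights + bias
    bn = 2 * c_out                       # BatchNorm scale + shift (learnable only)
    return conv + bn

def bilstm_params(input_size, hidden, layers):
    total = 0
    for layer in range(layers):
        # Layers after the first receive the concatenated forward/backward outputs.
        in_size = input_size if layer == 0 else 2 * hidden
        # 4 gates, input + recurrent weights, plus two bias vectors per direction.
        per_direction = 4 * hidden * (in_size + hidden) + 2 * 4 * hidden
        total += 2 * per_direction
    return total

def time_steps(width, num_pools=3):
    # Each MaxPool(2,2) halves the width; the adaptive pool only collapses height.
    return width // (2 ** num_pools)

channels = [3, 32, 64, 128, 256]
cnn = sum(conv_bn_params(a, b) for a, b in zip(channels, channels[1:]))
rnn = bilstm_params(input_size=256, hidden=256, layers=2)
fc = 512 * 79 + 79  # Linear(512 -> 79) weights + bias
total_params = cnn + rnn + fc
```

With these sizes, time_steps(320) gives 40 and total_params comes to 3,059,535, which agrees with the Parameters figure reported in the metrics.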

Field Details

  • Zone: text_thai_zone (names, addresses, dates, religion)
  • Charset: 42 consonants + 16 vowels + 7 marks + Arabic digits + punctuation (78 chars + CTC blank)
  • Quantization: FP16 is recommended so that tone marks are preserved
  • v2: Fixed 78-char vocab, 95.3% ExactMatch, 99.1% CharAcc

Input Preprocessing

import cv2
import numpy as np

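def preprocess(img_path, height=48, max_width=320):
    # cv2.imread returns BGR channel order, or None if the file cannot be read.
    # The input spec lists RGB; if the model was trained on RGB crops, convert
    # with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before normalizing.
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    h, w = img.shape[:2]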
    ratio = height / h
    new_w = min(int(w * ratio), max_width)
    img = cv2.resize(img, (new_w, height))
    # Pad to max_width with white
    if new_w < max_width:
        pad = np.full((height, max_width - new_w, 3), 255, dtype=np.uint8)
        img = np.concatenate([img, pad], axis=1)
    # Normalize to [-1, 1]
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5
    return np.transpose(img, (2, 0, 1))  # CHW
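To make the resize-and-pad arithmetic concrete, the sketch below reproduces only the width calculation from preprocess, without touching any image data. The helper name resize_plan is hypothetical.

```python
def resize_plan(h, w, height=48, max_width=320):
    """Return (resized_width, right_padding) for an h x w crop."""
    ratio = height / h
    new_w = min(int(w * ratio), max_width)  # same cap as in preprocess
    pad = max_width - new_w                 # white columns appended on the right
    return new_w, pad

narrow = resize_plan(96, 400)   # scaled by 0.5, then padded
wide = resize_plan(48, 1000)    # already 48 px tall, width capped
```

Here narrow is (200, 120) and wide is (320, 0). Note that a crop wider than the cap is squeezed horizontally to 320 pixels rather than cropped, so very long fields lose aspect ratio.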

CTC Decoding

def ctc_decode(indices, vocab_chars, blank_idx=0):
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)
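As a worked example of the collapse rule, the sketch below performs greedy decoding (argmax per time step, then the same blank and repeat handling as ctc_decode) on a toy score matrix. The three-character vocabulary and the one_hot helper are made up for illustration; the real vocabulary comes from vocab.txt.

```python
def greedy_ctc_decode(scores, vocab_chars, blank_idx=0):
    """scores: T rows of per-class scores; returns the collapsed string."""
    chars, prev = [], -1
    for row in scores:
        idx = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        if idx != blank_idx and idx != prev:
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])  # index 0 is the blank
        prev = idx
    return "".join(chars)

def one_hot(i, n=4):
    return [1.0 if j == i else 0.0 for j in range(n)]

vocab = ["ก", "า", "ร"]            # toy vocabulary (assumption)
path = [1, 1, 0, 1, 2, 2, 0, 3]    # best class at each time step
result = greedy_ctc_decode([one_hot(i) for i in path], vocab)
```

The result is "กการ": the two leading 1s collapse into one character, while the blank between them and the next 1 keeps the genuine double letter. With the model's (T, B, C) output, the per-step indices for one sample would typically come from an argmax over the last dimension, which can then be passed to ctc_decode directly.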

Loading the Model

import torch
import torch.nn as nn

class SimpleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=48):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True, batch_first=True, dropout=0.1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        features = self.cnn(x).squeeze(2).permute(0, 2, 1)
        rnn_out, _ = self.rnn(features)
        return self.fc(rnn_out).permute(1, 0, 2)  # (T, B, C) for CTC

model = SimpleCRNN(num_classes=79)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

Pipeline Context

This model is one of three Reader experts in the Thai ID Nano OCR pipeline:

Camera Frame → YOLO26n Finder (5-class, single pass)
  → num_id_zone, num_dob_zone    → Numeric Reader
  → text_eng_zone                → English Reader
  → text_thai_zone               → Thai Reader
  → Validator (Mod11 checksum, date logic)

Total pipeline: <15 MB, 30fps on mobile.
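The Validator step mentions a Mod11 checksum. The pipeline's own implementation is not shown here, so the sketch below uses the commonly documented Thai national ID scheme as an assumption: the first 12 digits are weighted 13 down to 2, and the 13th digit must equal (11 - sum mod 11) mod 10. The function name is hypothetical, and the date logic is not covered.

```python
def thai_id_checksum_ok(id_number):
    """Check the 13th digit of a Thai national ID against the Mod11 rule."""
    cleaned = id_number.replace(" ", "").replace("-", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    digits = [int(c) for c in cleaned]
    # Weights 13, 12, ..., 2 applied to the first 12 digits.
    weighted = sum(d * w for d, w in zip(digits[:12], range(13, 1, -1)))
    check = (11 - weighted % 11) % 10
    return check == digits[12]
```

For the synthetic number 1234567890121 the weighted sum is 352, which is divisible by 11, so the expected check digit is (11 - 0) mod 10 = 1 and the function returns True. A check like this lets the pipeline reject a Numeric Reader output containing a single misread digit.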

Files

File         Description
model.pt     PyTorch state_dict (~12 MB)
vocab.txt    Character vocabulary, one per line (<space> = space). CTC blank is implicit at index 0.
config.json  Architecture params, training metadata, charset
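A minimal sketch of reading vocab.txt into the vocab_chars list expected by ctc_decode, assuming the file is UTF-8, holds one token per line, and uses the literal token <space> for the space character as described above. The helper name load_vocab is illustrative.

```python
def load_vocab(vocab_path):
    """Read vocab.txt into a list; position i maps to class index i + 1."""
    with open(vocab_path, encoding="utf-8") as f:
        tokens = [line.rstrip("\n") for line in f]
    # Skip empty lines such as a trailing newline at the end of the file.
    tokens = [t for t in tokens if t != ""]
    return [" " if t == "<space>" else t for t in tokens]
```

Because the CTC blank is implicit at index 0, the list should be one entry shorter than num_classes. If the file matches the description, len(load_vocab(vocab_path)) + 1 would equal 79.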

License

MIT
