---
language:
- en
- ga
license: apache-2.0
library_name: transformers
pipeline_tag: token-classification
tags:
- pii
- de-identification
- token-classification
- ireland
- irish
- gaelic
- diffusion-style
- denoising
- ppsn
- eircode
- onnx
- int8
- dynamic-quantization
- cpu
base_model:
- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
datasets:
- temsa/OpenMed-Irish-CorePII-TrainMix-v1
- temsa/OpenMed-Irish-PPSN-Eircode-Spec-v1
- joelniklaus/mapa
- gretelai/synthetic_pii_finance_multilingual
model-index:
- name: IrishCore-DiffMask-135M-v1-rc3
  results:
  - task:
      type: token-classification
      name: Irish core PII masking
    dataset:
      type: custom
      name: irish_core_pii_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9664
  - task:
      type: token-classification
      name: Multilingual PPSN masking
    dataset:
      type: custom
      name: multilingual_ppsn_v1_all
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9591
  - task:
      type: token-classification
      name: Hardening exact suite
    dataset:
      type: custom
      name: irish_dllm_hardening_exact_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 1.0000
  - task:
      type: token-classification
      name: UAT replay exact suite
    dataset:
      type: custom
      name: diffmask_gap_uat_exact_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9032
---
# IrishCore-DiffMask-135M-v1-rc3

`IrishCore-DiffMask-135M-v1-rc3` is a raw-only Irish PII masking model derived from `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`: it masks raw text directly, with no external scanner or validator in the release path.

It is a small, scanner-free span extractor tuned for:

- `PPSN`
- `ACCOUNT_NUMBER`
- `BANK_ROUTING_NUMBER`
- `CREDIT_DEBIT_CARD`
- `PASSPORT_NUMBER`
- `POSTCODE`
- `PHONE_NUMBER`
- `EMAIL`
- `FIRST_NAME`
- `LAST_NAME`
- `SWIFT_BIC`

The main target is English plus Irish Gaelic text in citizen-support, public-sector, and HSE-style flows. The repo ships both the full `transformers` checkpoint and a dynamic q8 ONNX artifact for CPU deployment.
## What "DiffMask" Means Here

This release is not a generative diffusion language model. It is a compact discriminative token-span model trained with a diffusion-style denoising schedule.

The short version:

- **Base OpenMed**: plain BIO token classification
- **DiffMask**: token-span extraction with token-presence and boundary heads
- **DiffMask training**: repeated masked denoising over the same sentence
- **DiffMask inference**: one forward pass, no iterative refinement, no text generation

Concretely:

- The encoder starts from the DistilBERT-family weights inside `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`.
- The model adds three task heads over the encoder hidden states:
  - a per-label token-presence head
  - a typed start-boundary head
  - a typed end-boundary head
- During training, each input sentence is corrupted multiple times by replacing a random fraction of visible tokens with `[MASK]`.
- The corruption level follows a short noise schedule from heavy masking to light masking.
- The same gold spans are learned at every noise level, and the losses are averaged across the denoising passes.
- At inference time there is no diffusion loop and no rewrite step: the model runs once and a score-only span decoder reconstructs spans from token scores plus typed boundaries.

So the only "DLLM" aspect here is the training recipe, repeated masked denoising over text. Nothing is generated or iteratively refined at inference time.
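
The corruption step described above can be sketched in a few lines. This is a minimal illustration, not the release training code: the schedule values `(0.6, 0.4, 0.2)`, the special-token set, and the function name `noised_views` are assumptions chosen for the example, since the card only states that the schedule runs from heavy to light masking.

```python
import random

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]", "[PAD]"}  # assumed never to be masked


def noised_views(tokens, schedule=(0.6, 0.4, 0.2), seed=0):
    """Return one corrupted copy of `tokens` per noise level in `schedule`.

    The schedule goes from heavy to light masking; the values are illustrative.
    """
    rng = random.Random(seed)
    visible = [i for i, tok in enumerate(tokens) if tok not in SPECIAL]
    views = []
    for rate in schedule:
        k = round(rate * len(visible))        # number of tokens to hide
        masked = set(rng.sample(visible, k))  # random subset of visible tokens
        views.append([MASK if i in masked else tok for i, tok in enumerate(tokens)])
    return views


tokens = ["[CLS]", "my", "ppsn", "is", "1234567", "##tw", "and",
          "my", "phone", "is", "087", "[SEP]"]
for view in noised_views(tokens):
    print(" ".join(view))
```

Every view keeps the same length and the same gold spans as the clean sentence, which is what lets the losses from all views be averaged against one set of targets.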
## What It Is Not

This model is **not** a full discrete diffusion language model in the LLaDA sense.

A true DLLM would usually have:

- timestep or noise conditioning inside the model
- iterative denoising at inference time
- multi-step sequence refinement at runtime
- text generation or full-sequence reconstruction as a first-class objective

This release does **not** do that.

Instead, it uses the diffusion idea only as a **training-time robustness trick**:

- corrupt the sentence with `[MASK]` at several noise levels
- train on the same target spans each time
- average those losses

At runtime, it behaves like a normal fast discriminative extractor.
## Architecture

- Encoder: DistilBERT-size encoder from the OpenMed mLiteClinical 135M base
- Heads:
  - token presence per released label
  - typed start boundary per released label
  - typed end boundary per released label
- Decoder:
  - score-only span decoding from offsets, token continuity, label-specific thresholds, and typed boundaries
  - no regex candidate extractor
  - no checksum validator
  - no scanner layer

The release behavior is fully defined by the weights plus the bundled decoder in `common.py`.
## Training And Inference Flow

Training:

1. tokenize a sentence with gold BIO spans
2. convert spans into:
   - token-presence targets
   - typed start targets
   - typed end targets
3. create several noised copies of the same tokenized sentence by masking random visible tokens
4. run the same encoder+heads on each noised copy
5. average the losses across those denoising passes
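
Steps 2 and 5 of the training list can be illustrated as follows. This is a sketch under stated assumptions rather than the release code: the target layout (one multi-hot row per token), the inclusive token indices, and the helper names `span_targets` and `average_loss` are hypothetical, and the per-view loss values are made-up numbers.

```python
def span_targets(num_tokens, spans, labels):
    """Build multi-hot presence/start/end targets from token-level gold spans.

    `spans` holds (label, first_token, last_token) with inclusive indices.
    """
    index = {label: j for j, label in enumerate(labels)}
    presence = [[0] * len(labels) for _ in range(num_tokens)]
    start = [[0] * len(labels) for _ in range(num_tokens)]
    end = [[0] * len(labels) for _ in range(num_tokens)]
    for label, first, last in spans:
        j = index[label]
        for t in range(first, last + 1):
            presence[t][j] = 1  # every token inside the span carries the label
        start[first][j] = 1     # typed start boundary
        end[last][j] = 1        # typed end boundary
    return presence, start, end


def average_loss(per_view_losses):
    """Average the loss over the noised views of one sentence."""
    return sum(per_view_losses) / len(per_view_losses)


labels = ["PPSN", "PHONE_NUMBER"]
# token positions: my ppsn is 1234567 ##tw call 087 123 4567
spans = [("PPSN", 3, 4), ("PHONE_NUMBER", 6, 8)]
presence, start, end = span_targets(9, spans, labels)
print(presence[3], start[3], end[4])
print(average_loss([0.9, 0.6, 0.3]))  # illustrative per-view losses
```

Because the targets depend only on the gold spans, they are computed once and reused for every noised copy of the sentence.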

Inference:

1. tokenize the raw text once
2. run a single forward pass
3. predict:
   - which labels are present on each token
   - where each labeled span starts
   - where each labeled span ends
4. decode spans with label-aware thresholds and boundary rules
5. replace the detected spans with placeholders such as `[PII:PPSN]`

There is no multi-step refinement loop in deployment.
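
Steps 4 and 5 of the inference list can be sketched as a threshold-and-merge decoder followed by placeholder substitution. This is a simplified stand-in, not the bundled `decode_token_presence_segments` from `common.py`: the function names, the score layout, the single threshold per label, and the example scores are all assumptions, and the end-boundary head is ignored for brevity.

```python
def decode_spans(offsets, presence, start, thresholds):
    """Merge consecutive above-threshold tokens of a label into character spans.

    offsets:  list of (char_start, char_end), one per token
    presence: dict label -> per-token presence scores in [0, 1]
    start:    dict label -> per-token start-boundary scores in [0, 1]
    A confident start score inside a run splits it into two spans.
    """
    spans = []
    n = len(offsets)
    for label, scores in presence.items():
        thr = thresholds[label]
        run_start = None
        for i in range(n + 1):  # the extra step closes a run at the end
            inside = i < n and scores[i] >= thr
            new_start = inside and run_start is not None and start[label][i] >= thr
            if run_start is not None and (not inside or new_start):
                spans.append((offsets[run_start][0], offsets[i - 1][1], label))
                run_start = None
            if inside and run_start is None:
                run_start = i
    return sorted(spans)


def mask_text(text, spans):
    """Replace each (char_start, char_end, label) span with a placeholder."""
    for begin, stop, label in sorted(spans, reverse=True):  # right to left
        text = text[:begin] + f"[PII:{label}]" + text[stop:]
    return text


text = "My PPSN is 1234567TW, call 087 123 4567."
offsets = [(0, 2), (3, 7), (8, 10), (11, 20), (20, 21),
           (22, 26), (27, 30), (31, 34), (35, 39), (39, 40)]
presence = {
    "PPSN": [0.01, 0.02, 0.01, 0.97, 0.03, 0.01, 0.05, 0.04, 0.02, 0.00],
    "PHONE_NUMBER": [0.00, 0.00, 0.00, 0.08, 0.01, 0.02, 0.93, 0.91, 0.95, 0.01],
}
start = {
    "PPSN": [0.0, 0.0, 0.0, 0.95, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "PHONE_NUMBER": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.90, 0.10, 0.05, 0.0],
}
thresholds = {"PPSN": 0.5, "PHONE_NUMBER": 0.5}
spans = decode_spans(offsets, presence, start, thresholds)
print(mask_text(text, spans))
```

Substituting from right to left keeps the earlier character offsets valid while the text changes length. For release behavior, use the bundled scripts or the decoder in `common.py`.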
## How It Differs From The Original OpenMed Model

The original `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1` is a standard `DistilBertForTokenClassification` model:

- one encoder
- one token-classification head
- BIO labels such as `B-email`, `I-email`, `B-phone_number`
- generic token aggregation to recover spans

DiffMask changes two things:

1. **Different supervision**
   - base OpenMed learns only BIO token labels
   - DiffMask learns token presence plus typed span boundaries
2. **Different training recipe**
   - base OpenMed is trained as a standard token classifier
   - DiffMask is trained on multiple masked-noised views of the same sentence

That makes DiffMask better suited to structured Irish identifiers and mixed PII masking, while still keeping a small encoder and a fast CPU path.
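
For contrast, the "generic token aggregation" used with BIO labels can be sketched as below. This is a generic illustration of BIO decoding, not code from either model; the function name `bio_to_spans` is hypothetical and the tag sequence is invented for the example.

```python
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (label, first_token, last_token) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        closes = tag == "O" or tag.startswith("B-") or tag[2:] != label
        if start is not None and closes:
            spans.append((label, start, i - 1))
            start = None
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return spans


tags = ["O", "B-email", "I-email", "O", "B-phone_number", "I-phone_number"]
print(bio_to_spans(tags))
```

A BIO sequence assigns exactly one tag per token, so span edges are implied by tag transitions. The presence-plus-boundary formulation instead scores each label and each boundary independently, which is what the DiffMask decoder consumes.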
## How It Differs From `rc5` And `rc8`

| Model | Core idea | External scanner/validator | Runtime shape |
|---|---|---|---|
| `rc5` | token classifier + repair logic | yes | heavier, decoder-assisted |
| `rc8` | raw-only token-span model | no | one pass + span decoder |
| `DiffMask` | raw-only token-span model + denoising training | no | one pass + span decoder |

Here `rc5` and `rc8` refer to the earlier `OpenMed-mLiteClinical-IrishCorePII-135M-v2` release candidates, not to DiffMask candidates with the same suffix. DiffMask is closest to `rc8` operationally, but it uses a stronger training recipe.
## Why This Exists

The older `rc5` release still depended on a repair-oriented decoder stack. The public `rc8` release removed that external logic, but it regressed on several structured Irish identifiers. This release keeps the raw-only deployment shape while re-hardening the model on Irish numeric and mixed-PII cases.

`rc3` is the next candidate after `rc2`. It keeps the stronger `focusv3` checkpoint selected during local iteration, then applies a small decoder-profile retune for the published config:

- lower `EMAIL` token extend threshold to keep contiguous mailbox fragments together
- lower `PASSPORT_NUMBER` q8 threshold slightly to recover a mixed-message passport miss after dynamic quantization

The weights remain raw-only and scanner-free. The `rc3` change is the checkpoint plus a retuned release-time decoder profile in `config.json`.
## References

Direct implementation references:

- Devlin et al., *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*
  https://arxiv.org/abs/1810.04805
- Sanh et al., *DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter*
  https://arxiv.org/abs/1910.01108
- Zhu and Li, *Boundary Smoothing for Named Entity Recognition*
  https://aclanthology.org/2022.acl-long.490/
- Fu et al., *SpanNER: Named Entity Re-/Recognition as Span Prediction*
  https://aclanthology.org/2021.acl-long.558/

Conceptual diffusion-style training references:

- Nie et al., *LLaDA 2.0: Scaling Up Diffusion Language Models to 100B*
  https://arxiv.org/abs/2512.15745
- Gong et al., *Scaling Diffusion Language Models via Adaptation from Autoregressive Models*
  https://arxiv.org/abs/2410.17891

These diffusion papers were used as conceptual inspiration for the masked noising schedule. This release does **not** implement a generative text diffusion runtime.
## Included Artifacts

- Full `transformers` checkpoint in the repo root
- Dynamic q8 ONNX export in `onnx/model_quantized.onnx`
- Unquantized ONNX export in `onnx/model.onnx`
- `inference_mask.py` for the full checkpoint
- `inference_mask_onnx.py` for the ONNX q8 path
- `common.py`, `model.py`, and `multitask_model.py` implementing the release decoder
- benchmark files in `eval/`

Artifact sizes:

- Full checkpoint: `514 MB` (`model.safetensors`)
- Dynamic q8 ONNX: `393 MB` (`onnx/model_quantized.onnx`)
## How To Use It

Full checkpoint:

```bash
uv run python inference_mask.py \
  --model temsa/IrishCore-DiffMask-135M-v1-rc3 \
  --min-score 0.5 \
  --text "My PPSN is 1234567TW, my Eircode is D02 X285, and my phone is 087 123 4567." \
  --json
```

Dynamic q8 ONNX:

```bash
uv run python inference_mask_onnx.py \
  --model temsa/IrishCore-DiffMask-135M-v1-rc3 \
  --min-score 0.5 \
  --text "Please provide your passport NN5123456 and call me on 0851234567." \
  --json
```

Both scripts emit explicit placeholders like `[PII:PPSN]` in `masked_text`.
## Q8 Comparison

Deployment-relevant comparison on CPU:

| Model | Core F1 | Edge F1 | Finance F1 | Finance-boundary F1 | User PPSN F1 | GA weak PPSN F1 | Multilingual PPSN F1 | Hardening F1 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| `rc5` ONNX q8 | 0.9669 | 0.9744 | 0.9362 | 0.8750 | 1.0000 | 1.0000 | 0.9333 | - |
| `rc8` ONNX q8 | 0.9737 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9176 | 0.7059 |
| `IrishCore-DiffMask-135M-v1-rc3` ONNX q8 | 0.9664 | 1.0000 | 1.0000 | 1.0000 | 0.8571 | 1.0000 | 0.9591 | 1.0000 |

UAT replay exact suite used for the recent hardening pass:

| Model | UAT replay exact F1 | Precision | Recall |
|---|---:|---:|---:|
| `IrishCore-DiffMask-135M-v1-rc1` ONNX q8 | 0.4545 | 1.0000 | 0.2941 |
| `IrishCore-DiffMask-135M-v1-rc2` ONNX q8 | 0.8276 | 1.0000 | 0.7059 |
| `rc8` ONNX q8 | 0.3636 | 0.3750 | 0.3529 |
| `IrishCore-DiffMask-135M-v1-rc3` ONNX q8 | 0.9032 | 1.0000 | 0.8235 |
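
The F1 column in the UAT replay table follows from the precision and recall columns through the usual harmonic mean, F1 = 2PR / (P + R). The short check below recomputes it from the published, rounded values; the row names are abbreviated and the helper `f1` is written for this example only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the UAT replay table
rows = {
    "DiffMask rc1 q8": (1.0000, 0.2941, 0.4545),
    "DiffMask rc2 q8": (1.0000, 0.7059, 0.8276),
    "rc8 q8": (0.3750, 0.3529, 0.3636),
    "DiffMask rc3 q8": (1.0000, 0.8235, 0.9032),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f1(p, r):.4f}, reported {reported:.4f}")
```

The recall values are consistent with a suite of 17 gold spans (for example 14/17 ≈ 0.8235 for `rc3`), which would match the three known misses listed under Limits, although the card does not state the suite size.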

CPU throughput references:

| Suite | `rc5` q8 | `rc8` q8 | `IrishCore-DiffMask-135M-v1-rc3` q8 |
|---|---:|---:|---:|
| Irish core short-text path | 33.6193 ex/s | 257.3756 ex/s | 29.9676 ex/s |
| Multilingual PPSN short-text path | 35.5561 ex/s | 230.5181 ex/s | 54.2219 ex/s |
| Runtime profile source | 23.8338 ex/s | 179.4708 ex/s | 46.1519 ex/s |

Notes:

- The `rc5` speed references come from its published q8 end-to-end inference stack, which includes its older repair decoder.
- The `rc8` and `IrishCore-DiffMask-135M-v1-rc3` numbers use the same raw-only token-span ONNX path.
- A weight-only q4 ONNX experiment was also tried during development, but it was slower than q8 on this CPU and is not shipped.
- The `user_raw_regression_cases_v1` suite is a legacy PPSN-only regression set. In `rc3`, the single counted false positive is `0871234567`, which is now intentionally masked as `PHONE_NUMBER` rather than misread as `PPSN`.
## Additional Training Data Used For This RC

Published training sources:

- `temsa/OpenMed-Irish-CorePII-TrainMix-v1`
- `temsa/OpenMed-Irish-PPSN-Eircode-Spec-v1`
- `joelniklaus/mapa`
- `gretelai/synthetic_pii_finance_multilingual`

Additional local synthetic hardening and replay sets used during checkpoint selection:

- `irish_core_diffmask_v5_mix`
- `dllm_uat_replay_v1`
- `dllm_gap_patch_v4`
- `dllm_uat_patch_v3`
- `irish_core_diffmask_focus_v3`

`rc3` is based on the locally selected `focusv3` checkpoint and then retuned with a narrower decoder profile for the public config.
## Limits

- This is still a compact model. The hardest remaining errors are multilingual PPSN near-miss cases rather than Irish core numeric formats.
- The release path is intentionally scanner-free. If you need deterministic validation of individual identifier types, add that in your application layer.
- If you rely on release behavior, use the bundled inference scripts or import `decode_token_presence_segments` from `common.py`.
- Known remaining misses on the current UAT replay suite are the second phone number in the long Client Identity Services sentence (`071 967 2616`), `R93 EC57` inside the longer allocation-centre block, and `EPStamp4@enterprise.gov.ie`.
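
As an example of the application-layer validation mentioned in the second point, a PPSN check-character test might look like the sketch below. It is not part of this release. It assumes the commonly documented modulus-23 scheme (weights 8 to 2 on the seven digits, weight 9 on an optional second letter, remainder 0 mapped to `W`) and an assumed format of seven digits, a check letter, and an optional `A`-`I` or `W` suffix; verify both against official guidance before relying on them.

```python
import re

# Assumed format: 7 digits, a check letter, and an optional second letter.
PPSN_RE = re.compile(r"^(\d{7})([A-W])([A-IW]?)$")
CHECK_LETTERS = "WABCDEFGHIJKLMNOPQRSTUV"  # remainder 0 -> W, 1 -> A, ... 22 -> V


def ppsn_checksum_ok(value):
    """Return True if `value` has a PPSN shape and a matching check letter."""
    match = PPSN_RE.match(value.strip().upper())
    if not match:
        return False
    digits, check, suffix = match.groups()
    # Weights 8, 7, 6, 5, 4, 3, 2 on the seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if suffix and suffix != "W":
        # The second letter counts as A=1, B=2, ... with weight 9; W counts as 0.
        total += (ord(suffix) - ord("A") + 1) * 9
    return check == CHECK_LETTERS[total % 23]


for candidate in ["1234567TW", "1234567FA", "1234567A", "0871234567"]:
    print(candidate, ppsn_checksum_ok(candidate))
```

A validator like this can only reject malformed candidates after the model has proposed a span. It does not find identifiers on its own, so it complements the scanner-free release path rather than replacing it.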
## License And Attribution

- Release license: Apache-2.0
- Base model: `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`
- The derivative release remains subject to the attribution terms of the upstream datasets listed above.
- See `NOTICE`, `training_sources.json`, and `eval/benchmark_summary.json` for provenance and benchmark details.
<!-- portfolio-comparison:start -->
## Portfolio Comparison

Updated: `2026-03-16`.

Use this section for the fastest public comparison across the `temsa` PII masking portfolio.

- The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
- The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
- Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
- DiffMask rows use the reconciled `clean_single_pass` harness that matches the deployed runtime.
- GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
- The same content is shipped as `PORTFOLIO_COMPARISON.md` inside each public model repo.
### Irish Core PII: Comparable Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
|---|---|---:|---:|---:|---:|
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 299.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 270.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc4) | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc3) | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc2) | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8) | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7) | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6) | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5) | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4) | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3) | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2) | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1) | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc6`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc6) | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc5`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc5) | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc4) | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc3) | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc2) | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc1`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc1) | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |
### Irish Core PII: Other Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
|---|---|---:|---:|---:|---|
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1) | Hybrid classifier prototype | 0.9487 | — | — | Predates the public q8 artifact. |

Finance-boundary q8 F1 is `1.0000` for `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6`, `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7`, `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8`, and all public `IrishCore-DiffMask` releases from `rc1` to `rc6`. `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5` ships `0.8750` on that public q8 suite.
### PPSN-Only: Comparable Public Artifacts

| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
|---|---|---:|---:|---:|---:|---:|
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1) | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16) | fp16 CPU/GPU artifact | — | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8) | dynamic int8 CPU artifact | — | 0.9040 | — | — | 132.1 |
### PPSN-Only: Historical Public Checkpoints

| Repo | Main Published Metrics | Notes |
|---|---|---|
| [`temsa/OpenMed-PPSN-mLiteClinical-v1`](https://huggingface.co/temsa/OpenMed-PPSN-mLiteClinical-v1) | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer `temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`. |
| [`temsa/OpenMed-PPSN-v6-raw-rc2`](https://huggingface.co/temsa/OpenMed-PPSN-v6-raw-rc2) | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| [`temsa/OpenMed-PPSN-v5_1`](https://huggingface.co/temsa/OpenMed-PPSN-v5_1) | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| [`temsa/OpenMed-PPSN-v5`](https://huggingface.co/temsa/OpenMed-PPSN-v5) | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| [`temsa/OpenMed-PPSN-v4`](https://huggingface.co/temsa/OpenMed-PPSN-v4) | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |

If you need the strongest current raw-only Irish core model, start with `IrishCore-GlobalPointer-135M-v1-rc4`. If you need the fastest CPU-first raw-only line, compare it against `IrishCore-DiffMask-135M-v1-rc6`. If you need a PPSN-only artifact, compare the canonical `fp32`, `fp16`, and `q8` variants of `OpenMed-mLiteClinical-IrishPPSN-135M-v1` directly in the table above.
<!-- portfolio-comparison:end -->