AdaptationBERT-distil
A faster, distilled variant of AdaptationBERT for binary classification of climate adaptation and resilience texts in the ESG/environmental domain.
Built on top of ESGBERT/EnvironmentalBERT-base (a DistilRoBERTa backbone), AdaptationBERT-distil is fine-tuned on the ClimateLouie/AdaptationBERT-Climate dataset of 2,000 annotated samples to detect whether a given text is related to climate adaptation and resilience.
Key advantage: 82M parameters and 6 transformer layers (vs. ~125M / 12 layers in the original), delivering **2× faster inference** with comparable classification performance.
Model Details
Model Description
AdaptationBERT-distil is a domain-specific language model designed for the automatic classification of environmental texts. It identifies whether a text passage discusses climate adaptation topics such as resilience planning, adaptive capacity, vulnerability reduction, or climate risk management.
This model is functionally equivalent to AdaptationBERT but uses a lighter backbone optimised for speed and lower resource consumption.
- Model type: DistilRoBERTa-based binary text classifier (RobertaForSequenceClassification)
- Language(s): English
- License: Apache 2.0
- Fine-tuned from: ESGBERT/EnvironmentalBERT-base
- Original full-size model: ClimateLouie/AdaptationBERT
- Training dataset: ClimateLouie/AdaptationBERT-Climate
Architecture
| Parameter | AdaptationBERT-distil | AdaptationBERT (original) |
|---|---|---|
| Backbone | DistilRoBERTa | RoBERTa |
| Hidden size | 768 | 768 |
| Layers | 6 | 12 |
| Attention heads | 12 | 12 |
| Intermediate size | 3,072 | 3,072 |
| Vocabulary size | 50,265 | 50,265 |
| Max sequence length | 512 tokens | 512 tokens |
| Parameters | ~82M | ~125M |
| Model format | SafeTensors | SafeTensors |
Labels
| Label | Description |
|---|---|
| 0 | Non-adaptation-related |
| 1 | Adaptation-related |
Uses
Direct Use
AdaptationBERT-distil is designed for classifying English text passages as related or unrelated to climate adaptation. It is best suited for applications where inference speed and resource efficiency matter. Typical use cases include:
- Screening corporate sustainability reports for adaptation-related disclosures
- Analyzing ESG filings and environmental policy documents
- Large-scale text mining of climate adaptation mentions across document corpora
- Real-time or near-real-time classification pipelines where latency is a constraint
- Supporting research on climate resilience discourse
Recommended Pipeline
It is highly recommended to use a two-stage classification pipeline:
- First, classify whether a text is "environmental" using the EnvironmentalBERT-environmental model.
- Then, apply AdaptationBERT-distil only to texts classified as environmental to determine if they are adaptation-related.
This two-stage approach improves precision by filtering out non-environmental texts before adaptation classification.
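The two-stage approach can be expressed as a small composition helper. A minimal sketch: the helper name `two_stage_classify` and the `"environmental"` label string are illustrative (check the upstream model's actual label names before relying on them), while the model IDs in the commented usage come from this card.

```python
from typing import Callable, Dict, List

def two_stage_classify(
    texts: List[str],
    env_filter: Callable[[str], Dict],
    adaptation_clf: Callable[[str], Dict],
    env_label: str = "environmental",
) -> List[Dict]:
    """Run the adaptation classifier only on texts the filter marks as environmental."""
    results = []
    for text in texts:
        env = env_filter(text)
        if env["label"] == env_label:
            results.append(adaptation_clf(text))
        else:
            # Texts filtered out upstream default to the negative class.
            results.append({"label": "non-adaptation-related", "score": env["score"]})
    return results

# With Hugging Face pipelines (commented out: requires downloading both models):
# from transformers import pipeline
# env = pipeline("text-classification", model="ESGBERT/EnvironmentalBERT-environmental")
# adapt = pipeline("text-classification", model="ClimateLouie/AdaptationBERT-distil")
# results = two_stage_classify(corpus, lambda t: env(t)[0], lambda t: adapt(t)[0])
```

Because non-environmental texts never reach the second stage, this composition both improves precision and reduces total inference cost.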
Out-of-Scope Use
- Texts in languages other than English
- Non-environmental domains (e.g., finance-only, legal, medical) without the upstream environmental filter
- Safety-critical or automated decision systems where misclassification could cause harm
- As a sole basis for regulatory compliance decisions
How to Get Started with the Model
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ClimateLouie/AdaptationBERT-distil",
    tokenizer="ClimateLouie/AdaptationBERT-distil",
)

text = "The city implemented a flood resilience plan to protect coastal infrastructure from rising sea levels."
result = classifier(text)
print(result)
# [{'label': 'adaptation-related', 'score': 0.98}]
```
Or load the model and tokenizer directly:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ClimateLouie/AdaptationBERT-distil")
model = AutoModelForSequenceClassification.from_pretrained("ClimateLouie/AdaptationBERT-distil")

text = "Communities are developing drought-resistant farming techniques to adapt to changing rainfall patterns."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = torch.argmax(predictions, dim=-1).item()

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print(f"Prediction: {label_map[predicted_label]} (confidence: {predictions[0][predicted_label]:.4f})")
```
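For the large-scale text mining use case mentioned above, texts are usually scored in batches rather than one at a time. A minimal batching helper (the `batched` name is illustrative; the commented pipeline call assumes the `classifier` from the quick-start example):

```python
from typing import Iterable, Iterator, List

def batched(texts: Iterable[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield successive fixed-size batches of texts for efficient inference."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch

# Usage with the pipeline above (commented out: requires the model download):
# results = []
# for chunk in batched(corpus, 32):
#     results.extend(classifier(chunk, truncation=True, max_length=512))
```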
For detailed tutorials, see the guides by Tobias Schimanski on Medium.
Training Details
Training Data
The model was fine-tuned on the ClimateLouie/AdaptationBERT-Climate dataset, a curated collection of approximately 2,000 text samples annotated for climate adaptation relevance. The dataset contains examples from ESG reports, sustainability disclosures, and environmental policy texts, with binary labels indicating whether each sample discusses climate adaptation and resilience.
Training Procedure
Base Model
Training starts from ESGBERT/EnvironmentalBERT-base, which is itself a DistilRoBERTa model further pre-trained on environmental text corpora (annual reports, sustainability reports, and corporate/general news). This provides a domain-specific foundation that is both environmentally literate and inference-efficient.
Note: The original AdaptationBERT uses ESGBERT/EnvRoBERTa-base (full RoBERTa) as its backbone. The switch to EnvironmentalBERT-base (DistilRoBERTa) halves the number of transformer layers from 12 to 6, reducing parameters from ~125M to ~82M while retaining domain-specific pre-training.
Training Hyperparameters
- Training regime: fp16 (mixed precision on GPU)
- Problem type: Single-label classification
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 5 (with early stopping, patience=2)
- Weight decay: 0.01
- Warmup ratio: 0.1
- Optimizer: AdamW
- Train/val split: 80/20 (stratified)
- Framework: PyTorch + Hugging Face Transformers
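Assuming training used the Hugging Face Trainer (the card lists PyTorch + Transformers), the hyperparameters above map onto a TrainingArguments configuration roughly as sketched below. The actual training script was not released, so treat this as an illustrative reconstruction, not the exact configuration used; `output_dir` and the dataset variables in the comment are placeholders.

```python
# Illustrative mapping of the listed hyperparameters onto Hugging Face
# TrainingArguments keyword arguments (Transformers 4.40.x naming).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "fp16": True,                       # mixed-precision training on GPU
    "evaluation_strategy": "epoch",     # evaluate each epoch for early stopping
    "save_strategy": "epoch",
    "load_best_model_at_end": True,     # required by EarlyStoppingCallback
    "metric_for_best_model": "eval_loss",
}

# With transformers installed (commented to keep this sketch self-contained):
# from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
# args = TrainingArguments(output_dir="adaptationbert-distil", **training_kwargs)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=val_ds,
#                   callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
# trainer.train()
```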
Bias, Risks, and Limitations
- Training data size: The model was fine-tuned on only ~2,000 samples, which may limit its ability to generalise across all types of adaptation-related text.
- Language limitation: The model only supports English text. Climate adaptation texts in other languages will not be classified correctly.
- Domain specificity: Performance is optimised for ESG/environmental domain text. Texts from other domains discussing adaptation in non-climate contexts (e.g., biological adaptation, software adaptation) may produce false positives.
- Temporal bias: The training data reflects adaptation terminology and framing as of the time of dataset creation. Emerging adaptation concepts or evolving terminology may not be captured.
- Geographic bias: The training corpus may over-represent adaptation discourse from certain regions or regulatory frameworks, potentially underperforming on texts from underrepresented geographies.
- Distillation trade-off: As a smaller model, AdaptationBERT-distil may exhibit marginally lower accuracy on edge cases compared to the full-size AdaptationBERT. Users processing ambiguous or novel texts should consider validating against the original model.
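The validation suggested in the last point can be automated: run both the distilled and the full-size model over a sample and surface the texts where they disagree for human review. A minimal sketch; the helper name `flag_disagreements` is illustrative, and the classifiers are passed in as callables so either pipelines or raw models can be used.

```python
from typing import Callable, Dict, List, Tuple

def flag_disagreements(
    texts: List[str],
    distil_clf: Callable[[str], Dict],
    full_clf: Callable[[str], Dict],
) -> List[Tuple[str, Dict, Dict]]:
    """Return the texts where the two classifiers assign different labels,
    together with both predictions, for manual inspection."""
    flagged = []
    for text in texts:
        d, f = distil_clf(text), full_clf(text)
        if d["label"] != f["label"]:
            flagged.append((text, d, f))
    return flagged

# Usage (commented out: requires downloading both models):
# from transformers import pipeline
# distil = pipeline("text-classification", model="ClimateLouie/AdaptationBERT-distil")
# full = pipeline("text-classification", model="ClimateLouie/AdaptationBERT")
# review_queue = flag_disagreements(sample, lambda t: distil(t)[0], lambda t: full(t)[0])
```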
Recommendations
- Always use the recommended two-stage pipeline (environmental filter + adaptation classification) for best results.
- Validate model outputs on your specific corpus before using in production.
- Do not use model predictions as the sole input for policy or regulatory decisions.
- Consider supplementing with human review, especially for high-stakes applications.
- If maximum accuracy is more important than speed, use the full-size AdaptationBERT instead.
Technical Specifications
Model Architecture and Objective
DistilRoBERTa (a distilled variant of RoBERTa) with a sequence classification head. The model uses 6 transformer layers with 12 attention heads each, a hidden size of 768, and GELU activation. Classification is performed via a linear layer on top of the first-token representation (`<s>`, RoBERTa's equivalent of the [CLS] token).
Software
- Transformers: 4.40.2+
- Model format: SafeTensors
- Tokenizer: RoBERTa BPE tokenizer (50,265 tokens)
Citation
If you use this model in your research, please cite:
BibTeX:
```
@misc{adaptationbert_distil,
  title={AdaptationBERT-distil: A Distilled Language Model for Climate Adaptation Text Classification},
  author={Louie Woodall, inspired by Tobias Schimanski},
  year={2025},
  url={https://huggingface.co/ClimateLouie/AdaptationBERT-distil}
}
```
If referencing the original full-size model:
```
@misc{adaptationbert,
  title={AdaptationBERT: A Fine-tuned Language Model for Climate Adaptation Text Classification},
  author={Louie Woodall, inspired by Tobias Schimanski},
  year={2024},
  url={https://huggingface.co/ClimateLouie/AdaptationBERT}
}
```
More Information
This model is part of the ESGBERT family of models for ESG and environmental text analysis. Related models include:
- ClimateLouie/AdaptationBERT: full-size adaptation classifier (EnvRoBERTa backbone, ~125M params)
- ESGBERT/EnvironmentalBERT-base: base environmental language model (DistilRoBERTa backbone)
- ESGBERT/EnvRoBERTa-base: base environmental language model (full RoBERTa backbone)
- ESGBERT/EnvironmentalBERT-environmental: environmental text classifier (recommended upstream filter)
- extreme-weather-impacts/environmentalBERT-extremeweather: extreme weather event classifier (same DistilRoBERTa backbone)