---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- vit
- fine-tuned
- transformers
datasets:
- your-dataset-name
model-index:
- name: ViT-Large-Patch16-224 Fine-tuned Model
  results:
  - task:
      name: Image Classification
      type: image-classification
    metrics:
    - name: Validation Loss
      type: loss
      value: 0.3268
---

# Vision Transformer (ViT) Fine-Tuned Model

This repository contains a fine-tuned version of **[google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224)**, optimized for a custom image classification task.

---

## Model Overview

- **Base model**: `google/vit-large-patch16-224`
- **Architecture**: Vision Transformer (ViT)
- **Patch size**: 16×16
- **Image resolution**: 224×224
- **Frameworks**: PyTorch, Hugging Face Transformers
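
The patch size and image resolution listed above fix how many tokens the encoder processes per image. The following is a minimal sketch of that arithmetic, assuming 3-channel RGB input and the single prepended `[CLS]` token of the standard ViT architecture; the variable names are illustrative and do not come from the model code. Each flattened patch is then linearly projected to the model's hidden size before entering the encoder.

```python
# Token-count arithmetic for a ViT with 16x16 patches on 224x224 images.
image_size = 224     # pixels per side (from the overview above)
patch_size = 16      # pixels per patch side (from the overview above)
num_channels = 3     # assumption: RGB input

patches_per_side = image_size // patch_size                  # 14
num_patches = patches_per_side ** 2                          # 196
seq_len = num_patches + 1                                    # 197, including the [CLS] token
values_per_patch = patch_size * patch_size * num_channels    # 768 raw pixel values per patch

print(patches_per_side, num_patches, seq_len, values_per_patch)
```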

---

## Performance

| Metric | Value |
|--------|-------|
| **Final Validation Loss** | **0.3268** |
| **Lowest Validation Loss** | **0.2548** (Epoch 18) |

Training and validation loss trends indicate good convergence, with slight overfitting after ~30 epochs: the final validation loss (0.3268) is higher than the minimum of 0.2548 reached at epoch 18.

---

## Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| **Learning rate** | `2e-5` |
| **Train batch size** | `20` |
| **Eval batch size** | `8` |
| **Optimizer** | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
| **LR scheduler** | Linear |
| **Epochs** | `40` |
| **Seed** | `42` |
| **Framework versions** | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
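
The table lists a linear scheduler but not its warmup setting. As a rough illustration of what a linear schedule does to the learning rate, the sketch below assumes zero warmup steps and a decay to zero over 960 optimizer steps (the final step logged in the training results). `linear_lr` is a hypothetical helper written for this example, not a Transformers function, and the actual run may have used different warmup settings.

```python
def linear_lr(step: int, base_lr: float = 2e-5,
              total_steps: int = 960, warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (skipped when warmup_steps == 0).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, a quarter, half, and end of training.
for step in (0, 240, 480, 960):
    print(step, f"{linear_lr(step):.2e}")
```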

---

## Training Results

| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1 | 24 | 0.5601 |
| 5 | 120 | 0.3421 |
| 10 | 240 | 0.2901 |
| 14 | 336 | 0.2737 |
| 18 | 432 | **0.2548** |
| 40 | 960 | 0.3268 |

---

## Intended Uses

- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks.

---

## Limitations

- Trained on a **custom dataset**; it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees are made regarding fairness, bias, or ethical implications without dataset analysis.

---

## How to Use

You can use this model in two main ways:

### **1. Using the High-Level `pipeline` API**
```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an image-classification pipeline
pipe = pipeline("image-classification", model="rakib730/output-models")

# Classify an image from a URL
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```

### **2. Using the Processor and Model Directly**
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load processor and model
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Load an image and make sure it has three RGB channels
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess (resize and normalize) into PyTorch tensors
inputs = processor(images=image, return_tensors="pt")

# Inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_id = logits.argmax(-1).item()

print("Predicted class:", model.config.id2label[predicted_class_id])
```
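
The direct-inference example above reports only the single most likely class. To see how confident the model is, the logits can be converted to probabilities with a softmax and ranked. The sketch below does this in plain Python so it runs without downloading the model. `top_k`, the example logits, and the label names are all hypothetical and chosen for illustration only.

```python
import math


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class indices sorted from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map, for illustration only.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "class_a", 1: "class_b", 2: "class_c"}

for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, the same helper could be called as `top_k(logits[0].tolist(), model.config.id2label)`, assuming `logits` has shape `(1, num_labels)` as in the example above.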