	---
	language: en
	license: apache-2.0
	tags:
	- vision
	- image-classification
	- vit
	- fine-tuned
	- transformers
	datasets:
	- your-dataset-name
	model-index:
	- name: ViT-Large-Patch16-224 Fine-tuned Model
	  results:
	  - task:
	      name: Image Classification
	      type: image-classification
	    metrics:
	    - name: Validation Loss
	      type: loss
	      value: 0.3268
	---

	# Vision Transformer (ViT) Fine-Tuned Model


	This repository contains a fine-tuned version of [google/vit-large-patch16-224](https://huggingface.co/google/vit-large-patch16-224), optimized for a custom image classification task.

	---

	## 📌 Model Overview

	- Base model: `google/vit-large-patch16-224`
	- Architecture: Vision Transformer (ViT)
	- Patch size: 16×16
	- Image resolution: 224×224
	- Frameworks: PyTorch, Hugging Face Transformers

	---

	## 📊 Performance

	| Metric | Value |
	|--------|-------|
	| Final Validation Loss | 0.3268 |
	| Lowest Validation Loss | 0.2548 (Epoch 18) |

	Training and validation loss trends indicate good convergence, with slight overfitting after about 30 epochs: validation loss reached its minimum of 0.2548 at epoch 18 and finished at 0.3268 at epoch 40.

	---

	## 🔧 Training Configuration

	| Hyperparameter | Value |
	|----------------|-------|
	| Learning rate | `2e-5` |
	| Train batch size | `20` |
	| Eval batch size | `8` |
	| Optimizer | AdamW (`betas=(0.9, 0.999)`, `eps=1e-8`) |
	| LR scheduler | Linear |
	| Epochs | `40` |
	| Seed | `42` |
	| Framework versions | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2 |
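
The hyperparameters above map fairly directly onto Hugging Face `TrainingArguments`. The sketch below is a reconstruction under that assumption, not the actual training script. `TRAINING_CONFIG`, `build_training_args`, and the `output_dir` value are hypothetical names, and the batch sizes are assumed to be per-device values.

```python
# Hyperparameters copied from the table above; the keys follow
# transformers.TrainingArguments field names. AdamW is the Trainer default
# optimizer, so only its betas and epsilon are set here.
TRAINING_CONFIG = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 20,  # assumed per-device
    "per_device_eval_batch_size": 8,    # assumed per-device
    "num_train_epochs": 40,
    "lr_scheduler_type": "linear",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "seed": 42,
}


def build_training_args(output_dir="vit-finetune-output"):
    """Build TrainingArguments from TRAINING_CONFIG (hypothetical helper)."""
    # Imported lazily so the config can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_CONFIG)
```

Calling `build_training_args()` and passing the result to a `Trainer` would be one way to reproduce a comparable setup. Settings not listed in the table (warmup, weight decay, evaluation cadence) are left at library defaults here.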

	---

	## 📂 Training Results

	| Epoch | Step | Validation Loss |
	|-------|------|-----------------|
	| 1 | 24 | 0.5601 |
	| 5 | 120 | 0.3421 |
	| 10 | 240 | 0.2901 |
	| 14 | 336 | 0.2737 |
	| 18 | 432 | 0.2548 |
	| 40 | 960 | 0.3268 |
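
The logged steps allow a few sanity checks. The sketch below derives the steps per epoch, the training-set size this implies, the best logged checkpoint, and the learning rate at that checkpoint. It assumes a single device, no gradient accumulation, no dropped last batch, and linear decay to zero with no warmup. None of these are stated in this card, and `linear_lr` is a hypothetical helper.

```python
# (epoch, step, validation loss) rows copied from the table above.
LOGGED = [
    (1, 24, 0.5601),
    (5, 120, 0.3421),
    (10, 240, 0.2901),
    (14, 336, 0.2737),
    (18, 432, 0.2548),
    (40, 960, 0.3268),
]

TRAIN_BATCH_SIZE = 20
BASE_LR = 2e-5
TOTAL_STEPS = 960

# Steps per epoch follow from any logged row (step / epoch).
steps_per_epoch = LOGGED[0][1] // LOGGED[0][0]

# ceil(n / batch_size) == steps_per_epoch bounds the training-set size n.
min_train_size = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1
max_train_size = steps_per_epoch * TRAIN_BATCH_SIZE

# Logged row with the lowest validation loss.
best_epoch, best_step, best_loss = min(LOGGED, key=lambda row: row[2])


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, (min_train_size, max_train_size))
print(best_epoch, best_loss, linear_lr(best_step))
```

Under these assumptions the table implies 24 steps per epoch and a training set of 461 to 480 images. The best logged checkpoint (epoch 18) would have been trained at roughly 55% of the initial learning rate.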

	---

	## 🛠 Intended Uses

	- Image classification on datasets with characteristics similar to the training dataset.
	- Fine-tuning for domain-specific classification tasks.
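
For the fine-tuning use case, one common approach is to reload the checkpoint with a fresh classification head sized for the new label set. The sketch below assumes that approach. `build_label_maps`, `load_for_finetuning`, and any class names passed to them are hypothetical and only illustrate the idea.

```python
def build_label_maps(class_names):
    """Return (id2label, label2id) dicts for a list of class names."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(class_names, checkpoint="rakib730/output-models"):
    """Load the checkpoint with a new classification head (downloads weights)."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        # Re-initialise the head when its size differs from the checkpoint's.
        ignore_mismatched_sizes=True,
    )
```

For example, `load_for_finetuning(["healthy", "diseased"])` would return a model whose head has two outputs. The re-initialised head starts from random weights, so the model needs further training before its predictions are meaningful.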

	---

	## ⚠ Limitations

	- Trained on a custom dataset, so it may not generalize well to unrelated domains without additional fine-tuning.
	- No guarantees on fairness, bias, or ethical implications without dataset analysis.

	---

	## 🚀 How to Use

	You can use this model in two main ways:

	### 1️⃣ Using the High-Level `pipeline` API
	```python
	from transformers import pipeline

	pipe = pipeline("image-classification", model="rakib730/output-models")

	# Classify an image from a URL
	result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
	print(result)
	```

	### 2️⃣ Using the Processor and Model Directly
	```python
	from transformers import AutoImageProcessor, AutoModelForImageClassification
	from PIL import Image
	import requests
	import torch

	# Load processor and model
	processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
	model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

	# Load an image
	url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
	image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

	# Preprocess
	inputs = processor(images=image, return_tensors="pt")

	# Inference (no gradients needed)
	with torch.no_grad():
	    outputs = model(**inputs)
	logits = outputs.logits
	predicted_class_id = logits.argmax(-1).item()

	print("Predicted class:", model.config.id2label[predicted_class_id])
	```