Audio Classification
Transformers
Safetensors
English
audio-spectrogram-transformer
music
speech
ast
Instructions to use Vyvo-Research/AST-Music-Classifier-1K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Vyvo-Research/AST-Music-Classifier-1K with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("audio-classification", model="Vyvo-Research/AST-Music-Classifier-1K")

# Load model directly
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

extractor = AutoFeatureExtractor.from_pretrained("Vyvo-Research/AST-Music-Classifier-1K")
model = AutoModelForAudioClassification.from_pretrained("Vyvo-Research/AST-Music-Classifier-1K")
```

- Notebooks
- Google Colab
- Kaggle
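
The result tables below report a music score and a speech score per clip that sum to 1. A minimal sketch of how such scores and the predicted label can be derived from the model's two output logits, assuming softmax normalization (which, to our understanding, the Transformers `audio-classification` pipeline applies by default). The `ID2LABEL` mapping here is hypothetical; the real names and index order live in `model.config.id2label`.

```python
import math

# Hypothetical label mapping; check model.config.id2label for the real one.
ID2LABEL = {0: "MUSIC", 1: "SPEECH"}


def scores_from_logits(logits: list[float]) -> dict[str, float]:
    """Softmax over the class logits, keyed by label name."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: e / total for i, e in enumerate(exps)}


def predict(logits: list[float]) -> tuple[str, dict[str, float]]:
    """Return the arg-max label together with the per-label scores."""
    scores = scores_from_logits(logits)
    return max(scores, key=scores.get), scores


label, scores = predict([2.0, -1.0])
print(label, {k: round(v, 3) for k, v in scores.items()})
```

With only two classes, the predicted label is simply whichever score exceeds 0.5.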
Upload README.md with huggingface_hub
README.md
CHANGED

Fine-tuned Audio Spectrogram Transformer (AST) for music vs speech classification.

- **Base Model:** MIT/ast-finetuned-audioset-10-10-0.4593
- **Task:** Binary Audio Classification (Music vs Speech)
- **Training Dataset:** AIGenLab/speech-music-1k (1000 samples)
- **Overall Accuracy:** 90.0% (27/30)

---

| Category | Accuracy | Correct | Total |
|----------|----------|---------|-------|
| Pure Music | 100.0% | 10 | 10 |
| Pure Speech | 70.0% | 7 | 10 |
| Speech + Music | 100.0% | 10 | 10 |
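
The overall accuracy is a micro-average over all 30 evaluation clips. The arithmetic, using only the counts from the table above (the dictionary layout is just for illustration):

```python
# Per-category (correct, total) counts, copied from the summary table.
results = {
    "Pure Music": (10, 10),
    "Pure Speech": (7, 10),
    "Speech + Music": (10, 10),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
overall = correct / total  # micro-average over every clip

# Mean of the per-category accuracies; it equals the micro-average here
# only because every category has the same number of clips.
macro = sum(c / t for c, t in results.values()) / len(results)

for name, (c, t) in results.items():
    print(f"{name}: {c}/{t} = {100 * c / t:.1f}%")
print(f"Overall: {correct}/{total} = {100 * overall:.1f}%")
```

If the categories were unbalanced, the two averages would diverge and the choice between them would matter.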

### Pure Music

| File | Music Score | Speech Score | Prediction | Result |
|------|-------------|--------------|------------|--------|
| music_1.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_10.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_2.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_3.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_4.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_5.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_6.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_7.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_8.wav | 1.000 | 0.000 | MUSIC | ✅ |
| music_9.wav | 1.000 | 0.000 | MUSIC | ✅ |

### Pure Speech

| File | Music Score | Speech Score | Prediction | Result |
|------|-------------|--------------|------------|--------|
| speech_1.wav | 0.000 | 1.000 | SPEECH | ✅ |
| speech_10.wav | 0.002 | 0.998 | SPEECH | ✅ |
| speech_2.wav | 0.000 | 1.000 | SPEECH | ✅ |
| speech_3.wav | 0.714 | 0.286 | MUSIC | ❌ |
| speech_4.wav | 0.906 | 0.094 | MUSIC | ❌ |
| speech_5.wav | 0.350 | 0.650 | SPEECH | ✅ |
| speech_6.wav | 0.895 | 0.105 | MUSIC | ❌ |
| speech_7.wav | 0.068 | 0.932 | SPEECH | ✅ |
| speech_8.wav | 0.097 | 0.903 | SPEECH | ✅ |
| speech_9.wav | 0.083 | 0.917 | SPEECH | ✅ |
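
The Prediction column is consistent with picking whichever of the two scores is larger. A short sketch that re-derives the Pure Speech accuracy from the per-file scores above, assuming that arg-max rule and a ground-truth label of SPEECH for every clip in this table:

```python
# (file, music_score, speech_score) rows copied from the Pure Speech table.
rows = [
    ("speech_1.wav", 0.000, 1.000),
    ("speech_10.wav", 0.002, 0.998),
    ("speech_2.wav", 0.000, 1.000),
    ("speech_3.wav", 0.714, 0.286),
    ("speech_4.wav", 0.906, 0.094),
    ("speech_5.wav", 0.350, 0.650),
    ("speech_6.wav", 0.895, 0.105),
    ("speech_7.wav", 0.068, 0.932),
    ("speech_8.wav", 0.097, 0.903),
    ("speech_9.wav", 0.083, 0.917),
]
EXPECTED = "SPEECH"  # assumed ground truth for every clip in this table


def predicted_label(music: float, speech: float) -> str:
    """Arg-max over the two scores (ties resolve to SPEECH, an arbitrary choice)."""
    return "MUSIC" if music > speech else "SPEECH"


errors = [name for name, m, s in rows if predicted_label(m, s) != EXPECTED]
accuracy = (len(rows) - len(errors)) / len(rows)

print(f"accuracy: {accuracy:.1%}")
print("misclassified:", errors)
```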

### Speech + Music

| File | Music Score | Speech Score | Prediction | Result |
|------|-------------|--------------|------------|--------|
| speech_and_music_1.wav | 0.995 | 0.005 | MUSIC | ✅ |
| speech_and_music_10.wav | 0.987 | 0.013 | MUSIC | ✅ |
| speech_and_music_2.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_3wav.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_4.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_5.wav | 0.998 | 0.002 | MUSIC | ✅ |
| speech_and_music_6.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_7.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_8.wav | 1.000 | 0.000 | MUSIC | ✅ |
| speech_and_music_9.wav | 1.000 | 0.000 | MUSIC | ✅ |