HunEmBERT8
Metadata

```yaml
license: apache-2.0
language:
  - hu
metrics:
  - accuracy
model-index:
  - name: huBERTPlain
    results:
      - task:
          type: text-classification
        metrics:
          - type: f1
            value: 0.77
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  Institution Email: text
  Please specify your academic use case: text
extra_gated_prompt: >-
  Our models are intended for academic projects and academic research only. If
  you are not affiliated with an academic institution, please reach out to us at
  huggingface [at] poltextlab [dot] com for further inquiry. If we cannot
  clearly determine your academic affiliation and use case based on your form
  data, your request may be rejected. Please allow us a few business days to
  manually review subscriptions.
```

Model description

Cased fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from parlament.hu.

Intended uses & limitations

The model can be used like any other cased BERT model. It has been tested on recognizing emotions at the sentence level in parliamentary pre-agenda speeches, with the following label mapping:

  • 'Label_0': Neutral
  • 'Label_1': Fear
  • 'Label_2': Sadness
  • 'Label_3': Anger
  • 'Label_4': Disgust
  • 'Label_5': Success
  • 'Label_6': Joy
  • 'Label_7': Trust
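The label-to-emotion mapping above can be expressed as a small lookup table. This is an illustrative sketch only (the dictionary and helper function below are not part of the released model; the names follow the list above):

```python
# Mapping from the model's output indices to emotion names,
# following the Label_0 .. Label_7 list above.
ID2EMOTION = {
    0: "Neutral",
    1: "Fear",
    2: "Sadness",
    3: "Anger",
    4: "Disgust",
    5: "Success",
    6: "Joy",
    7: "Trust",
}

def label_to_emotion(label: str) -> str:
    """Convert a raw label string such as 'Label_3' to its emotion name."""
    return ID2EMOTION[int(label.split("_")[1])]

print(label_to_emotion("Label_3"))  # Anger
```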

Training

Fine-tuned version of the original huBERT model (SZTAKI-HLT/hubert-base-cc), trained on the HunEmPoli corpus.

| Category | Count | Ratio  | Sentiment | Count | Ratio  |
|----------|-------|--------|-----------|-------|--------|
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           |       |        |

Eval results

| Class        | Precision | Recall | F-Score |
|--------------|-----------|--------|---------|
| Fear         | 0.625     | 0.625  | 0.625   |
| Sadness      | 0.8535    | 0.6291 | 0.7243  |
| Anger        | 0.7857    | 0.3437 | 0.4782  |
| Disgust      | 0.7154    | 0.8790 | 0.7888  |
| Success      | 0.8579    | 0.8683 | 0.8631  |
| Joy          | 0.549     | 0.6363 | 0.5894  |
| Trust        | 0.4705    | 0.5581 | 0.5106  |
| Macro AVG    | 0.7134    | 0.6281 | 0.6497  |
| Weighted AVG | 0.791     | 0.7791 | 0.7743  |

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classification model
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT8")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT8")
```
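The snippet above only loads the model. To turn its raw logits into one of the eight emotion labels, apply a softmax and take the argmax. The sketch below uses plain Python with dummy logits standing in for the model's real output (the `EMOTIONS` list and dummy values are illustrative assumptions, not part of the released model):

```python
import math

# Label order follows Label_0 .. Label_7 as listed above.
EMOTIONS = ["Neutral", "Fear", "Sadness", "Anger",
            "Disgust", "Success", "Joy", "Trust"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits; in practice these would come from model(**inputs).logits[0]
logits = [0.1, -1.2, 0.3, 2.4, 0.0, -0.5, 0.2, -0.3]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(EMOTIONS[pred])  # highest logit -> Anger
```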

BibTeX entry and citation info

If you use the model, please cite the following paper:


@ARTICLE{10149341,
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access}, 
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, 
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}