# academia-mar11
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that generates easily interpretable topics from large document collections.
## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```
You can use the model as follows:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("Thang203/academia-mar11")
topic_model.get_topic_info()
```
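Once loaded, you can also inspect an individual topic or assign topics to new documents with BERTopic's `get_topic` and `transform` methods. A minimal sketch (the example documents below are illustrative, not from the training data):

```python
# Top keywords and c-TF-IDF weights for topic 0
topic_model.get_topic(0)

# Assign topics to new, unseen documents
new_docs = [
    "Large language models show emergent reasoning abilities.",
    "Reinforcement learning policies for robotic planning.",
]
topics, probs = topic_model.transform(new_docs)
```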
## Topic overview
- Number of topics: 20
- Number of training documents: 2353
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | models - language - language models - large - llms | 10 | -1_models_language_language models_large |
| 0 | language - models - reasoning - language models - llms | 678 | 0_language_models_reasoning_language models |
| 1 | language - models - llms - knowledge - data | 347 | 1_language_models_llms_knowledge |
| 2 | visual - image - images - models - model | 275 | 2_visual_image_images_models |
| 3 | code - software - code generation - generation - llms | 177 | 3_code_software_code generation_generation |
| 4 | bias - text - models - detection - language | 146 | 4_bias_text_models_detection |
| 5 | planning - agents - language - game - robot | 134 | 5_planning_agents_language_game |
| 6 | dialogue - personality - dialog - conversations - systems | 88 | 6_dialogue_personality_dialog_conversations |
| 7 | chatgpt - students - education - educational - learning | 80 | 7_chatgpt_students_education_educational |
| 8 | ai - chatgpt - artificial intelligence - artificial - technology | 64 | 8_ai_chatgpt_artificial intelligence_artificial |
| 9 | learning - reinforcement learning - reinforcement - rl - policy | 63 | 9_learning_reinforcement learning_reinforcement_rl |
| 10 | training - quantization - memory - transformer - models | 54 | 10_training_quantization_memory_transformer |
| 11 | adversarial - attacks - attack - security - models | 52 | 11_adversarial_attacks_attack_security |
| 12 | legal - patent - law - claim - models | 48 | 12_legal_patent_law_claim |
| 13 | privacy - private - models - data - language | 31 | 13_privacy_private_models_data |
| 14 | financial - stock - sentiment - data - market | 25 | 14_financial_stock_sentiment_data |
| 15 | problems - math - word problems - mathematical - word | 24 | 15_problems_math_word problems_mathematical |
| 16 | materials - molecular - molecule - discovery - chemical | 22 | 16_materials_molecular_molecule_discovery |
| 17 | recommendation - recommender - recommendations - item - user | 19 | 17_recommendation_recommender_recommendations_item |
| 18 | surprisal - reading - models - word - language | 16 | 18_surprisal_reading_models_word |
## Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: 20
- seed_topic_list: None
- top_n_words: 10
- verbose: True
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
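For reference, the hyperparameters listed above map onto the `BERTopic` constructor roughly as follows. This is a sketch for training a comparable model on your own corpus, not the exact original training script:

```python
from bertopic import BERTopic

topic_model = BERTopic(
    language="english",
    top_n_words=10,
    n_gram_range=(1, 1),
    min_topic_size=10,
    nr_topics=20,
    low_memory=False,
    calculate_probabilities=False,
    seed_topic_list=None,
    zeroshot_topic_list=None,
    zeroshot_min_similarity=0.7,
    verbose=True,
)

# docs = [...]  # your corpus of documents
# topics, probs = topic_model.fit_transform(docs)
```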
## Framework versions
- Numpy: 1.25.2
- HDBSCAN: 0.8.33
- UMAP: 0.5.5
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.6.1
- Transformers: 4.38.2
- Numba: 0.58.1
- Plotly: 5.15.0
- Python: 3.10.12