# academia-mar11
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that generates easily interpretable topics from large document collections.
## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```
You can use the model as follows:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("Thang203/academia-mar11")
topic_model.get_topic_info()
```
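Once loaded, you can also inspect an individual topic or assign topics to new documents with BERTopic's `get_topic` and `transform` methods. A minimal sketch (the example documents below are illustrative, not from the training data):

```python
# Top keywords and c-TF-IDF weights for topic 0
topic_model.get_topic(0)

# Assign topics to new, unseen documents
new_docs = [
    "Large language models show emergent reasoning abilities.",
    "Reinforcement learning policies for robotic planning.",
]
topics, probs = topic_model.transform(new_docs)
```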
## Topic overview
- Number of topics: 20
- Number of training documents: 2353
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | models - language - language models - large - llms | 10 | -1_models_language_language models_large |
| 0 | language - models - reasoning - language models - llms | 678 | 0_language_models_reasoning_language models |
| 1 | language - models - llms - knowledge - data | 347 | 1_language_models_llms_knowledge |
| 2 | visual - image - images - models - model | 275 | 2_visual_image_images_models |
| 3 | code - software - code generation - generation - llms | 177 | 3_code_software_code generation_generation |
| 4 | bias - text - models - detection - language | 146 | 4_bias_text_models_detection |
| 5 | planning - agents - language - game - robot | 134 | 5_planning_agents_language_game |
| 6 | dialogue - personality - dialog - conversations - systems | 88 | 6_dialogue_personality_dialog_conversations |
| 7 | chatgpt - students - education - educational - learning | 80 | 7_chatgpt_students_education_educational |
| 8 | ai - chatgpt - artificial intelligence - artificial - technology | 64 | 8_ai_chatgpt_artificial intelligence_artificial |
| 9 | learning - reinforcement learning - reinforcement - rl - policy | 63 | 9_learning_reinforcement learning_reinforcement_rl |
| 10 | training - quantization - memory - transformer - models | 54 | 10_training_quantization_memory_transformer |
| 11 | adversarial - attacks - attack - security - models | 52 | 11_adversarial_attacks_attack_security |
| 12 | legal - patent - law - claim - models | 48 | 12_legal_patent_law_claim |
| 13 | privacy - private - models - data - language | 31 | 13_privacy_private_models_data |
| 14 | financial - stock - sentiment - data - market | 25 | 14_financial_stock_sentiment_data |
| 15 | problems - math - word problems - mathematical - word | 24 | 15_problems_math_word problems_mathematical |
| 16 | materials - molecular - molecule - discovery - chemical | 22 | 16_materials_molecular_molecule_discovery |
| 17 | recommendation - recommender - recommendations - item - user | 19 | 17_recommendation_recommender_recommendations_item |
| 18 | surprisal - reading - models - word - language | 16 | 18_surprisal_reading_models_word |
## Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: 20
- seed_topic_list: None
- top_n_words: 10
- verbose: True
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
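For reference, the hyperparameters listed above map onto the `BERTopic` constructor roughly as follows. This is a sketch for training a comparable model on your own corpus, not the exact original training script:

```python
from bertopic import BERTopic

topic_model = BERTopic(
    language="english",
    top_n_words=10,
    n_gram_range=(1, 1),
    min_topic_size=10,
    nr_topics=20,
    low_memory=False,
    calculate_probabilities=False,
    seed_topic_list=None,
    zeroshot_topic_list=None,
    zeroshot_min_similarity=0.7,
    verbose=True,
)

# docs = [...]  # your corpus of documents
# topics, probs = topic_model.fit_transform(docs)
```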
## Framework versions
- Numpy: 1.25.2
- HDBSCAN: 0.8.33
- UMAP: 0.5.5
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.6.1
- Transformers: 4.38.2
- Numba: 0.58.1
- Plotly: 5.15.0
- Python: 3.10.12