academia-mar11

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

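from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Thang203/academia-mar11")

# Return a DataFrame with one row per topic (ID, size, label, keywords)
topic_model.get_topic_info()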
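
Beyond inspecting the topics, the loaded model can assign topics to new documents. The following is a minimal sketch, not part of the original card. The helper names `pair_docs_with_topics` and `assign_topics` are hypothetical, and the sketch assumes the embedding model reference was saved with the model; if it was not, an embedding model would have to be passed to `BERTopic.load` through its `embedding_model` argument.

```python
def pair_docs_with_topics(docs, topics):
    """Pair each document with its predicted topic ID (-1 means outlier)."""
    return list(zip(docs, (int(t) for t in topics)))


def assign_topics(docs, model_id="Thang203/academia-mar11"):
    """Load the model from the Hub and predict one topic ID per document."""
    # Imported lazily so the pure helper above works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns the predicted topic IDs and, if available, probabilities.
    topics, _probs = topic_model.transform(docs)
    return pair_docs_with_topics(docs, topics)
```

Calling `assign_topics(["Large language models struggle with multi-step arithmetic."])` would return a list of `(document, topic_id)` pairs; the topic ID can then be looked up with `topic_model.get_topic(topic_id)` or in the table below.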

Topic overview

  • Number of topics: 20 (topic IDs -1 through 18; -1 is BERTopic's outlier topic)
  • Number of training documents: 2353
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | models - language - language models - large - llms | 10 | -1_models_language_language models_large |
| 0 | language - models - reasoning - language models - llms | 678 | 0_language_models_reasoning_language models |
| 1 | language - models - llms - knowledge - data | 347 | 1_language_models_llms_knowledge |
| 2 | visual - image - images - models - model | 275 | 2_visual_image_images_models |
| 3 | code - software - code generation - generation - llms | 177 | 3_code_software_code generation_generation |
| 4 | bias - text - models - detection - language | 146 | 4_bias_text_models_detection |
| 5 | planning - agents - language - game - robot | 134 | 5_planning_agents_language_game |
| 6 | dialogue - personality - dialog - conversations - systems | 88 | 6_dialogue_personality_dialog_conversations |
| 7 | chatgpt - students - education - educational - learning | 80 | 7_chatgpt_students_education_educational |
| 8 | ai - chatgpt - artificial intelligence - artificial - technology | 64 | 8_ai_chatgpt_artificial intelligence_artificial |
| 9 | learning - reinforcement learning - reinforcement - rl - policy | 63 | 9_learning_reinforcement learning_reinforcement_rl |
| 10 | training - quantization - memory - transformer - models | 54 | 10_training_quantization_memory_transformer |
| 11 | adversarial - attacks - attack - security - models | 52 | 11_adversarial_attacks_attack_security |
| 12 | legal - patent - law - claim - models | 48 | 12_legal_patent_law_claim |
| 13 | privacy - private - models - data - language | 31 | 13_privacy_private_models_data |
| 14 | financial - stock - sentiment - data - market | 25 | 14_financial_stock_sentiment_data |
| 15 | problems - math - word problems - mathematical - word | 24 | 15_problems_math_word problems_mathematical |
| 16 | materials - molecular - molecule - discovery - chemical | 22 | 16_materials_molecular_molecule_discovery |
| 17 | recommendation - recommender - recommendations - item - user | 19 | 17_recommendation_recommender_recommendations_item |
| 18 | surprisal - reading - models - word - language | 16 | 18_surprisal_reading_models_word |
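
Two patterns in the topic overview can be checked with a few lines of code: each label is the topic ID followed by the first four keywords joined with underscores, and the topic frequencies add up to the 2353 training documents. The sketch below uses only the numbers listed above; the names `FREQUENCIES`, `make_label`, and `share` are hypothetical helpers introduced for illustration.

```python
# Topic ID -> number of training documents, copied from the topic overview.
FREQUENCIES = {
    -1: 10, 0: 678, 1: 347, 2: 275, 3: 177, 4: 146, 5: 134, 6: 88,
    7: 80, 8: 64, 9: 63, 10: 54, 11: 52, 12: 48, 13: 31, 14: 25,
    15: 24, 16: 22, 17: 19, 18: 16,
}


def make_label(topic_id, keywords, n_words=4):
    """Rebuild a label as '<id>_<kw1>_<kw2>_<kw3>_<kw4>'."""
    return "_".join([str(topic_id)] + list(keywords[:n_words]))


def share(topic_id):
    """Fraction of all training documents assigned to one topic."""
    return FREQUENCIES[topic_id] / sum(FREQUENCIES.values())


# The frequencies account for every training document.
assert sum(FREQUENCIES.values()) == 2353

print(make_label(3, ["code", "software", "code generation", "generation", "llms"]))
print(f"Topic 0 covers {share(0):.1%} of the documents")
```

Topic 0 alone holds 678 of the 2353 documents, so the distribution is heavily skewed toward the general language-model topics.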

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 20
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
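
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they could be passed when training a comparable model; it is an illustration, not the author's training script. It assumes a BERTopic release that accepts the `zeroshot_*` arguments, and it leaves the embedding, UMAP, HDBSCAN, and vectorizer sub-models at their defaults because the card does not record them, so it will not necessarily reproduce the published topics. `HYPERPARAMS` and `build_model` are hypothetical names.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 20,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an untrained BERTopic model with the listed hyperparameters."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

A model built this way would be trained with `build_model().fit_transform(docs)`, where `docs` is a list of strings.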

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
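
If loading the model fails or gives unexpected results, comparing the local environment with the versions above can help. The following sketch uses only the standard library. The distribution names (for example `umap-learn` for UMAP) are the usual PyPI names and are an assumption, since the card lists display names only; `PINNED`, `installed_version`, and `version_report` are hypothetical helpers.

```python
from importlib import metadata

# Versions from the "Framework versions" list, keyed by assumed PyPI name.
PINNED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(pinned=PINNED, lookup=installed_version):
    """Return {name: (pinned, installed)} for each missing or mismatched package."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report
```

An empty result from `version_report()` means every listed package matches; a mismatch does not necessarily break loading, so the report is a starting point for debugging rather than a hard requirement.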