Collections
Discover the best community collections!
Collections including paper arxiv:2403.03432
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
  Paper • 2304.08247 • Published • 2

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers
  Paper • 2308.06093 • Published • 2
- ConstitutionalExperts: Training a Mixture of Principle-based Prompts
  Paper • 2403.04894 • Published • 2
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 189
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 56
- ResLoRA: Identity Residual Mapping in Low-Rank Adaption
  Paper • 2402.18039 • Published • 11

- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
  Paper • 2403.07816 • Published • 44
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 28
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- Adaptive sequential Monte Carlo by means of mixture of experts
  Paper • 1108.2836 • Published • 2
- Convergence Rates for Mixture-of-Experts
  Paper • 1110.2058 • Published • 2
- Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
  Paper • 2310.12008 • Published • 2
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
  Paper • 2308.11793 • Published • 2

- A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings
  Paper • 2504.15610 • Published • 1
- Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
  Paper • 2502.13533 • Published • 13
- LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient Fine-Tuning of Large Language Models
  Paper • 2403.08822 • Published
- LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
  Paper • 2407.18242 • Published