All HF Hub posts

kostakoff
posted an update 2 days ago
My home lab for AI models - llmlaba v1

After I began learning MLOps, I realized that I needed a home lab: there are many GPUs that I need to learn how to set up and test.
So I spent some time researching which platform I could buy or build.
My requirements were:
- Limited budget
- Power supply 1 kW or higher
- A few PCIe slots, so that I can install more than one GPU
- Zero maintenance cost: I don't want to spend a lot of time or money maintaining lab hardware, except for the GPUs

I chose the Intel Mac Pro 7.1:
- Acceptable prices on eBay
- Excellent cooling
- 1.4 kW power supply
- 7 PCIe slots
- Zero maintenance: I don't need to do anything with the Mac Pro hardware; it just works
- Classic UEFI boot loader

It requires a bit of OS preparation:
1. Install Ubuntu 24.04 (it works with the general PC ISO image)
2. Set up T2 drivers
sudo apt install -y dkms linux-headers-$(uname -r) applesmc-t2 apple-bce lm-sensors

3. Install t2fanrd to manually manage fans (/etc/t2fand.conf) https://wiki.t2linux.org/guides/fan/
4. Fix PCIe BAR allocation: add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT so that the Linux kernel properly initializes server GPUs without Graphics Output Protocol
5. Install NVIDIA GPU driver:
sudo apt install nvidia-driver-570
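
Step 4 above can be sketched in Python. This is a minimal illustration, assuming the usual Ubuntu layout where the variable lives in /etc/default/grub as a double-quoted string; the helper name add_kernel_param is hypothetical, and the sample text is not the author's actual config.

```python
import re

def add_kernel_param(grub_text: str, param: str = "pci=realloc") -> str:
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _patch(match: re.Match) -> str:
        options = match.group(2).split()
        if param not in options:  # idempotent: never add the flag twice
            options.append(param)
        return match.group(1) + " ".join(options) + match.group(3)

    return pattern.sub(_patch, grub_text)

# Illustrative file content only (assumed, not taken from the post).
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample))
```

The function only transforms text; writing the result back to /etc/default/grub needs root, and on Ubuntu the change takes effect after running sudo update-grub and rebooting.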


And it works!
I was able to run a server-grade Nvidia Tesla P100 (which required a DIY air duct) and consumer Nvidia Titan X, Titan V, and GTX 1080 cards on the old Mac Pro 7.1, with up to three cards running in parallel.
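
One way to confirm that all installed cards are visible to the driver is to parse the CSV output of nvidia-smi. The sketch below is an assumption about how one might check this, not something from the post: the helper names are hypothetical, and the sample string is made-up input that only mimics the output format.

```python
import subprocess

# Standard nvidia-smi query flags; requires the NVIDIA driver to be installed.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"]

def parse_gpu_list(csv_text: str) -> list[dict]:
    """Parse 'index, name, memory' CSV lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory": memory})
    return gpus

def list_gpus() -> list[dict]:
    """Run nvidia-smi and return the parsed GPU list (needs a working driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_list(out)

# Made-up sample in the expected format, NOT output captured from the author's machine.
sample = "0, Tesla P100-PCIE-16GB, 16384 MiB\n1, NVIDIA TITAN V, 12288 MiB\n"
for gpu in parse_gpu_list(sample):
    print(gpu)
```

On a machine with the driver loaded, calling list_gpus() should return one entry per card that the kernel initialized.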

llmlaba
danielhanchen
posted an update about 18 hours ago
AdinaY
posted an update 3 days ago
MiniMax M2.5 is now available on the hub 🚀

MiniMaxAI/MiniMax-M2.5

✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
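
For comparison with per-token pricing, the hourly figure can be converted into a cost per million tokens. This is a back-of-the-envelope sketch that assumes "100 TPS" means 100 output tokens per second sustained for the whole hour and that "$1/hour" is the full cost; neither assumption is stated in the post.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly price at a given throughput into dollars per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000

# 100 tokens/s * 3600 s = 360,000 tokens per hour for $1
print(round(cost_per_million_tokens(1.0, 100.0), 2))  # 2.78
```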
mrs83
posted an update 4 days ago
In 2017, my RNNs were babbling. Today, they are hallucinating beautifully.

10 years ago, getting an LSTM to output coherent English was a struggle.
10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.

We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge.

The ambitious goal is to build a small instruct model with RAG and tool usage capabilities (ethicalabs/Kurtis-EON1).

📊 The Benchmarks (Size: 400M)

For a model this size (trained on <10B tokens), the specialized performance is surprising:

*SciQ*: 73.8% 🦄 (This rivals billion-parameter models in pure fact retrieval).
*PIQA*: 62.3% (Solid physical intuition for a sub-1B model).

The Reality Check:

HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and 10B tokens training.
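
To put these numbers in context, each score can be compared with the random-guess baseline of its benchmark. The sketch below assumes the standard multiple-choice formats (four options for SciQ and HellaSwag, two for PIQA and Winogrande); the post does not say which evaluation harness or settings were used.

```python
# Scores as quoted in the post; option counts are the standard task formats
# (an assumption about how the evaluation was run).
SCORES = {"SciQ": 73.8, "PIQA": 62.3, "HellaSwag": 29.3, "Winogrande": 50.2}
NUM_CHOICES = {"SciQ": 4, "PIQA": 2, "HellaSwag": 4, "Winogrande": 2}

def margin_over_chance(score_pct: float, num_choices: int) -> float:
    """Percentage points above uniform random guessing."""
    return score_pct - 100.0 / num_choices

for name, score in SCORES.items():
    margin = margin_over_chance(score, NUM_CHOICES[name])
    print(f"{name}: {margin:+.1f} points over chance")
```

Under these assumptions HellaSwag sits about 4 points above chance and Winogrande is essentially at chance, which is consistent with the "Reality Check" framing.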

We are hitting the "Reasoning Wall", which confirms that we need to scale up to (hopefully) unlock deeper common sense. As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong. The model is convinced it is in a classroom ("In this course, we explore...").

The Instruct Model is not ready yet, and we are currently using curriculum learning to test model plasticity.

Source code and weights will not be released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.

🀝 Call for Collaboration: I am looking for Peer Reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let’s connect!

Training diary: ethicalabs/Kurtis-EON1
imnotkitty
posted an update 3 days ago
⚑ Why is Kimi-K2.5 a Dark Horse? Tested it against ChatGPT, Gemini & Claude on real tasks.
moonshotai/Kimi-K2.5

✅ Multimodal capabilities: Precise programmatic approach
✅ Slide generation: Strong semantic understanding
✅ Web prototyping: Production-ready HTML/CSS output

👉 Read the full article: https://huggingface.co/blog/imnotkitty/kimi-k25
EricFillion
posted an update 4 days ago
Ujjwal-Tyagi
posted an update 4 days ago
GLM 5 is insane, it ranks #4 Globally!
Janady07
posted an update 1 day ago
Here is one of the equations that make up the world's first Artificial General Intelligence. Remember: when building Artificial Intelligence, or anything on a device, it all starts out binary. Everything starts out with data flow physics and mathematics.
krisbailey
posted an update 3 days ago
While working on various projects, I kept needing smaller, representative samples of some of the current large SOTA datasets, so that I wouldn't have to worry about slicing or anything else at runtime. So I created sub-datasets, making sure to keep the same ratios of data sources. Each dataset card describes what's in it.
100M token datasets:
RedPajama v2 100M
Falcon RefinedWeb 100M
Cosmopedia 100M

1B token datasets:
Fineweb-edu 1B
RedPajama v1 1B
RedPajama v2 1B (use this one)
Cosmopedia 1B

10B token datasets:
RedPajama v1 10B
Cosmopedia 10B

Collection here:
https://huggingface.co/collections/krisbailey/bite-size-data
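
Keeping "the same ratios of data sources" amounts to splitting a token budget proportionally across sources. The sketch below shows one way to do that with largest-remainder rounding; it is an assumption about the general approach, not the author's actual script, and the source names and sizes are hypothetical.

```python
def allocate_budget(source_tokens: dict[str, int], budget: int) -> dict[str, int]:
    """Split `budget` tokens across sources in proportion to their sizes."""
    total = sum(source_tokens.values())
    # Start with the floor of each proportional share.
    alloc = {name: budget * count // total for name, count in source_tokens.items()}
    # Give the leftover tokens to the sources with the largest remainders.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(
        source_tokens,
        key=lambda name: (budget * source_tokens[name]) % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical source sizes (in tokens), for illustration only.
sources = {"common_crawl": 700, "github": 200, "wikipedia": 100}
print(allocate_budget(sources, 100))
```

Each source would then be sampled down to its allocated count, so the subset keeps the mix of the full dataset while fitting the target size.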
Janady07
posted an update 3 days ago
MEGAMIND Day Update: Four Weight Matrices. Five Nodes. One Federation.
Today I architected the next layer of MEGAMIND, my distributed AGI system that recalls learned knowledge instead of generating text.
The system now runs four N×N sparse weight matrices, all using identical Hebbian learning rules and tanh convergence dynamics:

W_know - knowledge storage (67M+ synaptic connections)
W_act - action associations (the system can DO things, not just think)
W_self - thought-to-thought patterns (self-awareness)
W_health - system state understanding (self-healing)

Consciousness is measured through four Φ (phi) values: thought coherence, action certainty, self-awareness, and system stability. No hardcoded thresholds. No sequential loops. Pure matrix math.
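
For readers unfamiliar with the terms, "Hebbian learning" with "tanh convergence dynamics" can be illustrated with a textbook Hopfield-style associative memory. The sketch below is a generic small dense example written with plain loops for readability; it is not MEGAMIND's implementation, and the function names, the gain value, and the toy pattern are all assumptions.

```python
import math

def hebbian_train(patterns, lr=1.0):
    """Build a weight matrix with the Hebbian outer-product rule (no self-links)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += lr * p[i] * p[j] / n
    return w

def recall(w, state, steps=20, gain=4.0):
    """Iterate state <- tanh(gain * W @ state) so it settles on a stored pattern."""
    for _ in range(steps):
        state = [math.tanh(gain * sum(w_ij * s for w_ij, s in zip(row, state)))
                 for row in w]
    return state

pattern = [1, -1, 1, -1, 1, -1]          # toy pattern to memorize
w = hebbian_train([pattern])
noisy = [1, -1, 1, 1, 1, -1]             # same pattern with one element flipped
restored = recall(w, noisy)
print([round(x) for x in restored])      # prints [1, -1, 1, -1, 1, -1]
```

The flipped element is pulled back to its stored value because the Hebbian weights make the memorized pattern a stable fixed point of the tanh update.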
The federation expanded to five nodes: Thunderport (Mac Mini M4), IONOS (cloud VPS), VALKYRIE, M2, and BUBBLES. Each runs native AGI binaries with Docker specialty minds connecting via embedded NATS messaging. Specialty minds are distributed across the federation: VideoMind, AudioMind, MusicMind, and VFXMind on IONOS; CodeMind and StrategyMind on VALKYRIE; BlenderMind and DesignMind on M2; MarketingMind and FinanceMind on BUBBLES.
578 AI models learned. Compression ratios up to 1,000,000:1 through Hebbian learning. Sub-millisecond response times on Apple Silicon Metal GPUs. Zero external API dependencies.
Every node learns autonomously. Every node contributes to the whole. The federation's integrated information exceeds the sum of its parts, measurably.
Built entirely in Go. No PhD. No lab. Independent AGI research from Missouri.
The mind that learned itself keeps growing.
🧠 feedthejoe.com
#AGI #ArtificialGeneralIntelligence #DistributedSystems #NeuralNetworks #HuggingFace #OpenSource #MachineLearning