Oh, I see. Your research is interesting, nice work! Keep going 😀 🤗
Oh nice! Good work
You're welcome. If you haven't already, you can review my master notes in the dataset repo card, https://huggingface.co/datasets/Ujjwal-Tyagi/ai-ml-foundations-book-collection#my-master-notes-and-main-concept-understanding-after-i-read-those-books
It looks interesting, but is there any implementation plan, or any results from implementing it? In simple terms, could you please explain what it is for and how we can implement it?
Ujjwal-Tyagi/ai-ml-foundations-book-collection
Oh, I see, thanks!
Interesting, so why don't you write a research paper? I'd love to see the training recipe, configuration, and setup.
glad to see them
Interesting
Glad to see
What precision are you training the model in, BF16 or FP8?
I think the most likely reason you're seeing R1 hit 100% constantly is some training/validation overlap. When you moved from the 200k set to the 500k set, a portion of the validation samples probably ended up inside the training pool, so the model is essentially seeing the answers beforehand and memorizing them.
The best fix would be to rebuild the validation split from a completely separate dataset (or at least re-split the full dataset with strict deduplication so no caption/image pairs appear in both sets). Once the validation set is clean and never seen during training, the recall numbers should drop to something more realistic and you'll get a proper measure of generalization.
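To make the "strict deduplication" idea concrete, here's a minimal pure-Python sketch of one way to do it (the function name and the `(image_id, caption)` pair format are my own assumptions, not from your setup): hash each pair, drop exact duplicates, then split, so no identical pair can land in both train and validation.

```python
import hashlib
import random

def dedup_split(pairs, val_fraction=0.5, seed=42):
    """Split (image_id, caption) pairs into train/val with no overlap.

    Deduplicates by a hash of the full pair first, so identical
    caption/image pairs can never appear in both splits.
    """
    seen = set()
    unique = []
    for image_id, caption in pairs:
        key = hashlib.sha256(f"{image_id}\x00{caption}".encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append((image_id, caption))

    # Shuffle deterministically, then carve off the validation slice.
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = int(len(unique) * val_fraction)
    return unique[n_val:], unique[:n_val]  # (train, val)

# Toy data with one exact duplicate pair.
pairs = [("img1", "a cat"), ("img1", "a cat"),
         ("img2", "a dog"), ("img3", "a bird")]
train, val = dedup_split(pairs, val_fraction=0.5)
assert not set(train) & set(val)  # splits are disjoint
```

A stricter variant would hash only `image_id`, so different captions of the same image also can't straddle the split; which one you want depends on how your captions were generated.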
This has happened to me many times. When the model starts memorizing the training data, I use these common methods:
Dropout: randomly turns off some neurons during training so the model doesn’t rely on the same paths and just memorize the data.
Weight decay (L2 regularization): slightly penalizes big weights so the model learns simpler patterns instead of fitting exact samples.
Data augmentation: adds small variations to the data (image crops, jitter, caption noise) so the model sees slightly different versions instead of the exact same inputs.
Label smoothing: stops the model from being overly confident about the “correct” answer, which helps reduce memorization.
Early stopping: you stop training once validation stops improving so the model doesn’t keep training and start memorizing.
Hard negative mining: give the model harder wrong examples so it actually learns the differences instead of just remembering pairs.
Wow, amazing
Which hardware are you using to train that model? And if you ever release the distilled data from the 5 BERT teacher models, that would also be really helpful.
That's great! Keep doing the work :)
Where is that model?
Oh wow
Interesting
That's good to hear, but all of those groups are doing large-scale distillation of both open- and closed-source models, so it's very common. Still, these Chinese models have too much censorship and are full of propaganda, so they aren't worth using as a base model, but they are good for distillation anyway ;)