I put together my own dataset, and with 30 episodes in the dataset and about 6,000 fine-tuning steps, the model finally started doing something reasonable 🎉 https://www.youtube.com/shorts/C3_fH8jhzo8
The only funny part is that no matter what text prompt I give it, it still performs the same action from the dataset (the prompt in the training dataset was “pick up the cube and place it in the bin.”).
Any tips on choosing the right max-steps value for fine-tuning? And what are your go-to best practices? I heard in NVIDIA’s livestreams that there’s a risk of overtraining, which might make the model ignore text prompts, so I’m curious how you avoid that.
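
One way to put a number on the prompt-ignoring behavior described above is to feed the same observations to the model with different prompts and measure how much the predicted actions change. The snippet below is only a minimal sketch of that idea: `policy` is a hypothetical callable `(observation, prompt) -> action vector`, not the real model API, and the two toy policies are stand-ins so the example runs on its own.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: (observation, prompt) -> action vector.
# This is an assumption for illustration, not the real model's API.
Policy = Callable[[float, str], Sequence[float]]


def prompt_sensitivity(
    policy: Policy,
    observations: Sequence[float],
    train_prompt: str,
    other_prompts: Sequence[str],
) -> float:
    """Mean Euclidean distance between the action predicted for the
    training prompt and the actions predicted for alternative prompts,
    evaluated on the same observations. A value near 0 suggests the
    policy is ignoring the text prompt."""
    total, count = 0.0, 0
    for obs in observations:
        base = policy(obs, train_prompt)
        for prompt in other_prompts:
            total += math.dist(base, policy(obs, prompt))
            count += 1
    return total / count if count else 0.0


# Toy stand-ins (hypothetical): one ignores the prompt, one reacts to it.
def prompt_blind_policy(obs: float, prompt: str) -> list[float]:
    return [obs, obs * 2.0]


def prompt_aware_policy(obs: float, prompt: str) -> list[float]:
    offset = 1.0 if "bin" in prompt else 0.0
    return [obs + offset, obs * 2.0]


TRAIN_PROMPT = "pick up the cube and place it in the bin."
OTHER_PROMPTS = ["push the cube to the left", "do nothing"]
OBSERVATIONS = [0.0, 0.5, 1.0]

blind_score = prompt_sensitivity(
    prompt_blind_policy, OBSERVATIONS, TRAIN_PROMPT, OTHER_PROMPTS
)
aware_score = prompt_sensitivity(
    prompt_aware_policy, OBSERVATIONS, TRAIN_PROMPT, OTHER_PROMPTS
)
print(f"prompt-blind: {blind_score:.3f}, prompt-aware: {aware_score:.3f}")
```

Running a check like this on checkpoints saved at different step counts might show roughly where the score collapses toward zero, which could be one signal for picking max-steps. Note that with a single prompt in the training data, a low score may reflect the dataset as much as the step count.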