This took me waaay too long to work out today, and I was thinking it could make a nice little interview-style coding question (which I’d probably fail).
Suppose you have 10,000 rows of data and need to continually retrain a model, training on at most 1,000 rows at a time and retraining every 500 rows. Can you tell me how many “batches” of data this will create, and the start and end index of each batch?
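Under one reading of the question, the count is plain arithmetic. This reading is an assumption: only full 1,000-row windows count, indices are 0-based, and the end index is exclusive (Python slice style). With N rows, window W and step S:

```latex
N = 10\,000, \quad W = 1\,000, \quad S = 500
n_{\text{batches}} = \left\lfloor \frac{N - W}{S} \right\rfloor + 1 = \left\lfloor \frac{9\,000}{500} \right\rfloor + 1 = 19
\text{start}_k = kS, \quad \text{end}_k = kS + W, \qquad k = 0, 1, \dots, 18
```

That gives [0, 1000), [500, 1500), ..., [9000, 10000). If instead the first model is trained as soon as 500 rows exist and the window grows until it hits 1,000, you get one extra batch, [0, 500), for 20 in total.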
After doing some crazy loops in Python for a while, I decided to go back to basics and do it Jeremy Howard style in Excel (well, gsheets; I’m not a savage). Here’s the gsheet.
And here is my Python solution:
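The original embedded script did not survive, so here is a minimal sketch of the same idea. It is not the author’s code. The function name `get_batches` and its parameters are hypothetical, and it assumes only full windows are used, with `end` exclusive so that `data[start:end]` has exactly `window_size` rows.

```python
def get_batches(n_rows, window_size, step):
    """Return (start, end) index pairs for fixed-size training windows.

    Hypothetical helper: only full windows are kept, and `end` is
    exclusive, so data[start:end] always has window_size rows.
    """
    batches = []
    start = 0
    # Slide the window forward by `step` until it would run past the data.
    while start + window_size <= n_rows:
        batches.append((start, start + window_size))
        start += step
    return batches


batches = get_batches(n_rows=10_000, window_size=1_000, step=500)
print(len(batches))  # 19
print(batches[:3])   # [(0, 1000), (500, 1500), (1000, 2000)]
print(batches[-1])   # (9000, 10000)
```

The same result fits in a list comprehension, `[(s, s + window_size) for s in range(0, n_rows - window_size + 1, step)]`. Either way, any trailing rows that don’t fill a whole window (e.g. with 10,250 rows) are simply left out.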
…I’m pretty sure someone will come along with a super Pythonic one-liner that shows maybe I am an idiot after all.
OK, now back to work.
Update: Actually, I think what I really want is a version where you can define a minimum and maximum size for your training data and then roll that window over your data.
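A minimal sketch of that rolling version follows. It is not the author’s script, and `get_rolling_batches`, `min_size`, `max_size` and `step` are hypothetical names. It assumes the first model is trained once `min_size` rows are available, each retrain happens `step` rows later, and the window grows until it reaches `max_size`, after which the start index slides forward.

```python
def get_rolling_batches(n_rows, min_size, max_size, step):
    """Return (start, end) pairs for an expanding-then-sliding window.

    Hypothetical helper: the window grows from min_size rows up to
    max_size rows, then slides so at most max_size rows are used.
    `end` is exclusive, so data[start:end] is the training slice.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    batches = []
    end = min_size  # first retrain once min_size rows exist
    while end <= n_rows:
        # Cap the window at max_size rows by moving the start forward.
        start = max(0, end - max_size)
        batches.append((start, end))
        end += step
    return batches


batches = get_rolling_batches(n_rows=10_000, min_size=500, max_size=1_000, step=500)
print(len(batches))  # 20
print(batches[:3])   # [(0, 500), (0, 1000), (500, 1500)]
print(batches[-1])   # (9000, 10000)
```

Setting `min_size` equal to `max_size` recovers the fixed-window behaviour (19 batches for this data), so one function covers both cases.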