A little brainteaser (or I’m an idiot)

This took me waaay too long to work out today, and I was thinking it could make a nice little interview coding question (which I’d probably fail).

Suppose you have 10,000 rows of data and need to continually train and retrain a model, training on at most 1,000 rows at a time and retraining every 500 rows. Can you tell me how many “batches” of data this will create, and the start and end index of each batch?

So that’s:

n = 10000
train_max = 1000
train_every = 500

And we want a dictionary like this:

  1: {"start": 1, "end": 1000},
  2: {"start": 501, "end": 1500},
  ?: {"start": ?, "end": ?},

After doing some crazy loops in Python for a while, I decided to go back to basics and do it Jeremy Howard style in Excel (well, gsheets; I’m not a savage): gsheet.

And here is my Python solution:
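Something along these lines does the job. Treat it as a minimal sketch of one reading of the puzzle rather than anything definitive: it assumes rows are 1-indexed with inclusive ends, that the first model is trained once train_max rows are available, and that every later batch is the most recent train_max rows, taken every train_every rows. The function name make_batches is made up for illustration.

```python
def make_batches(n, train_max, train_every):
    """Return {batch_number: {"start": ..., "end": ...}}, 1-indexed and inclusive."""
    batches = {}
    batch = 1
    end = train_max  # assumption: the first batch ends once train_max rows exist
    while end <= n:
        # each batch is the most recent train_max rows up to and including `end`
        batches[batch] = {"start": end - train_max + 1, "end": end}
        end += train_every  # retrain every train_every rows
        batch += 1
    return batches


batches = make_batches(n=10000, train_max=1000, train_every=500)
print(len(batches))  # 19
print(batches[1])    # {'start': 1, 'end': 1000}
print(batches[2])    # {'start': 501, 'end': 1500}
print(batches[19])   # {'start': 9001, 'end': 10000}
```

Under those assumptions the count is (n - train_max) / train_every + 1 = (10000 - 1000) / 500 + 1 = 19 batches.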

…I’m pretty sure someone will come along with a super Pythonic one-liner that shows maybe I am an idiot after all.

OK, now back to work.
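For what it’s worth, a dict comprehension gets pretty close to that one-liner. This is only a sketch, assuming 1-indexed inclusive rows and a first batch that ends at row train_max; the variable name batches is just for illustration.

```python
n, train_max, train_every = 10000, 1000, 500

# walk the batch end points (train_max, train_max + train_every, ...) and number them from 1
batches = {
    i: {"start": end - train_max + 1, "end": end}
    for i, end in enumerate(range(train_max, n + 1, train_every), start=1)
}

print(len(batches))  # 19
```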

Update: Actually, I think what I want is something more like the below, where you can define a minimum and maximum size for your training data and then roll that window over your data.
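A rough sketch of that idea, with the usual caveats: rolling_batches and train_min are hypothetical names, train_min = 500 is just an example value, and rows are assumed to be 1-indexed with inclusive ends. The window starts as soon as train_min rows exist, grows until it reaches train_max rows, and then rolls forward every train_every rows.

```python
def rolling_batches(n, train_min, train_max, train_every):
    """Windows of at least train_min and at most train_max rows, 1-indexed and inclusive."""
    batches = {}
    # the first window ends at train_min, then the end moves on by train_every each time
    for i, end in enumerate(range(train_min, n + 1, train_every), start=1):
        start = max(1, end - train_max + 1)  # never look back further than train_max rows
        batches[i] = {"start": start, "end": end}
    return batches


batches = rolling_batches(n=10000, train_min=500, train_max=1000, train_every=500)
print(len(batches))  # 20
print(batches[1])    # {'start': 1, 'end': 500}
print(batches[2])    # {'start': 1, 'end': 1000}
print(batches[3])    # {'start': 501, 'end': 1500}
```

With these example values the only difference from the fixed-size version is one extra, smaller batch at the start.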