The Setup: This is a little one I was surprised to see. Recently I needed to do some pretty basic feature engineering on a pandas DataFrame before training some models. Basically, I needed to take differences of each column, apply some smoothing, and then add a number of lagged columns for each … Continue reading Numpy Feature Engineering – 2x Speed Up Over Pandas!
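For a rough idea of the kind of feature engineering described in this excerpt, here is a minimal pandas sketch. It is not the code from the post: the function name `add_features`, the choice of a rolling mean for the smoothing step, and the parameters `smooth_n` and `n_lags` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_features(df, smooth_n=3, n_lags=2):
    """Difference, smooth, and lag every column (hypothetical helper)."""
    # Step 1: first difference of each column.
    diffed = df.diff()
    # Step 2: smoothing, assumed here to be a simple rolling mean.
    smoothed = diffed.rolling(smooth_n).mean()
    # Step 3: one shifted copy of every column per lag.
    lagged = [
        smoothed.shift(lag).add_suffix(f"_lag{lag}")
        for lag in range(1, n_lags + 1)
    ]
    # Drop the leading rows that are NaN because of diff/rolling/shift.
    return pd.concat([smoothed] + lagged, axis=1).dropna()


df = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) ** 2})
features = add_features(df)
print(features.head())
```

Each input column ends up with `1 + n_lags` output columns, which is why this sort of pipeline gets expensive quickly on wide frames.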
Parallelize a wide df in Pandas
I was going to make a pretty picture. Sometimes you end up with a very wide pandas DataFrame and want to run the same kind of operation (data processing, building a model, etc.) on separate subsets of its columns. For example, if we had a wide df with different time series KPIs … Continue reading Parallelize a wide df in Pandas
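One way to picture the idea in this excerpt is to split the wide df into column groups, run the same function on each group concurrently, and stitch the results back together. This is only a sketch under assumptions: `process_subset`, `process_wide`, the `kpi_*` column names, and the use of `ThreadPoolExecutor` are all made up for illustration, and the post itself may well use a different tool (a process pool is often the better fit for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def process_subset(sub_df):
    """Hypothetical per-subset work: z-score each column."""
    return (sub_df - sub_df.mean()) / sub_df.std()


def process_wide(df, column_groups, max_workers=4):
    """Apply process_subset to each column group and re-join the results."""
    subsets = [df[cols] for cols in column_groups]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so columns come back in order.
        results = list(pool.map(process_subset, subsets))
    return pd.concat(results, axis=1)


rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.normal(size=(100, 6)),
    columns=[f"kpi_{i}" for i in range(6)],
)
groups = [["kpi_0", "kpi_1"], ["kpi_2", "kpi_3"], ["kpi_4", "kpi_5"]]
out = process_wide(wide, groups)
print(out.shape)
```

Grouping by columns only works cleanly when the per-group operation does not need to see columns from other groups.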
( 0 – 0 ) / 0 != 0
Arrrgghh - I just wasted the best part of an afternoon chasing this one down. If I can knock out a quick post on it, then at least I'll feel I've gotten something out of it. Here's the story: somewhere in an admittedly crazy ETL-type pipeline I was using pandas pct_change() as a … Continue reading ( 0 – 0 ) / 0 != 0
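The title's gotcha is easy to reproduce with a toy series. The values below are made up for illustration, and `fill_method=None` is passed only to keep the behaviour explicit across pandas versions; none of this is taken from the pipeline in the post.

```python
import numpy as np
import pandas as pd

# Toy series: two zeros in a row, then some real movement.
s = pd.Series([0.0, 0.0, 5.0, 10.0])

# pct_change computes (current - previous) / previous.
pct = s.pct_change(fill_method=None)
print(pct)

# Row 1 is (0 - 0) / 0, which floating point evaluates to NaN, not 0.
print(np.isnan(pct.iloc[1]))
# Row 2 is (5 - 0) / 0, which evaluates to inf.
print(np.isinf(pct.iloc[2]))
```

So a flat run of zeros shows up as missing values rather than "no change", and anything downstream that treats NaN specially (filters, aggregations, model inputs) will quietly behave differently.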