Numpy Feature Engineering – 2x Speed Up Over Pandas!

The Setup: This is a little one I was surprised to see. Recently I needed to do some pretty basic feature engineering on a pandas DataFrame prior to training some models. Basically I needed to take differences of each column, apply some smoothing, and then add a number of lagged columns for each …
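To make the three steps concrete, here is a rough sketch of what "difference, smooth, lag" could look like in both pandas and NumPy. The excerpt is cut off, so the details are my assumptions: a trailing moving average for the smoothing, the function names `make_features_pandas` and `make_features_numpy`, and the `smooth_n` and `n_lags` parameters are all made up for illustration and are not taken from the post. It only shows that the two versions produce the same features; it does not try to reproduce any timing result.

```python
import numpy as np
import pandas as pd


def make_features_pandas(df, smooth_n=3, n_lags=2):
    """Difference, smooth, and lag every column using pandas (illustrative)."""
    # First differences, then a trailing moving average of those differences.
    out = df.diff().rolling(smooth_n).mean()
    # One shifted copy of the smoothed columns per lag.
    lagged = [out.shift(lag).add_suffix(f"_lag{lag}") for lag in range(1, n_lags + 1)]
    # Rows that lack a full window or a full set of lags are dropped.
    return pd.concat([out] + lagged, axis=1).dropna()


def make_features_numpy(arr, smooth_n=3, n_lags=2):
    """Same steps on a 2-D array of shape (rows, columns) using NumPy."""
    diffs = np.diff(arr, axis=0)  # first differences, one row shorter
    # Trailing moving average via a cumulative sum with a leading row of zeros.
    csum = np.cumsum(np.vstack([np.zeros((1, diffs.shape[1])), diffs]), axis=0)
    smooth = (csum[smooth_n:] - csum[:-smooth_n]) / smooth_n
    # Put the current values next to lag 1..n_lags, keeping only complete rows.
    n_rows = smooth.shape[0] - n_lags
    blocks = [smooth[n_lags - lag : n_lags - lag + n_rows] for lag in range(n_lags + 1)]
    return np.hstack(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(50, 3)), columns=list("abc"))
    # Both versions should agree up to floating point error.
    print(np.allclose(make_features_pandas(df).to_numpy(), make_features_numpy(df.to_numpy())))
```

The NumPy version works on the raw array, so column names are lost and would need to be reattached if you want a DataFrame back.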

Time series clustering with tslearn

I've recently been playing around with some time series clustering tasks and came across the tslearn library. I was interested in seeing how easy it would be to get some of the clustering functionality that is already built into tslearn up and running. It turns out it was quite easy and straightforward, perfect blog post …

KubeFlow Custom Jupyter Image (+ github for notebook source control)

I've been playing around with KubeFlow a bit lately and found that a lot of the tutorials and examples of Jupyter notebooks on KubeFlow do a lot of the pip install and other setup and config stuff in the notebook itself, which feels icky. But, in reality, if you were working …

Multi-Variate, Multi-Step, LSTM for Anomaly Detection

This post will walk through a synthetic example illustrating one way to use a multi-variate, multi-step LSTM for anomaly detection. Imagine you have a matrix of k time series coming at you at regular intervals and you look at the last n observations for each metric.

[Figure: A matrix of 5 metrics from period t to t-n]

One approach …
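As a small illustration of the "last n observations for each of k metrics" idea, the sketch below slices a (T, k) array into overlapping windows of shape (n, k). This is only the data-shaping step, not the LSTM itself, and the helper name `last_n_windows` is hypothetical rather than something from the post.

```python
import numpy as np


def last_n_windows(data, n):
    """Slice data of shape (T, k) into overlapping windows of shape (n, k).

    Window i holds rows i .. i+n-1, so the final window contains the
    n most recent observations of every metric.
    """
    T = data.shape[0]
    if n > T:
        raise ValueError("n must not exceed the number of observations")
    return np.stack([data[end - n:end] for end in range(n, T + 1)])


if __name__ == "__main__":
    # 10 time steps of k = 5 synthetic metrics.
    data = np.random.default_rng(1).normal(size=(10, 5))
    windows = last_n_windows(data, n=4)
    print(windows.shape)  # (number of windows, n, k)
```

Each window is one (n, k) matrix of the kind described above, which is the usual input shape for a sequence model once the windows are stacked along the first axis.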