The Setup: This is a little one I was surprised to see. Recently I needed to do some pretty basic feature engineering on a pandas dataframe prior to training some models. Basically I needed to take differences of each column, apply some smoothing, and then add a number of lagged columns for each … Continue reading Numpy Feature Engineering – 2x Speed Up Over Pandas!
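For a rough idea of what those three steps (difference, smooth, lag) can look like in plain numpy, here is a minimal sketch. It is not the code from the post: the function name `make_features`, the parameters `smooth_n` and `n_lags`, and the choice of a trailing moving average for the smoothing are all assumptions made for illustration.

```python
import numpy as np


def make_features(values, smooth_n=3, n_lags=2):
    """Difference each column, smooth with a trailing mean, then append lagged copies.

    Hypothetical helper: the smoothing method and parameter names are assumptions.
    """
    values = np.asarray(values, dtype=float)
    # First differences of each column (this drops the first row).
    diffs = np.diff(values, axis=0)
    # Trailing moving average over smooth_n rows, per column, via cumulative sums.
    csum = np.cumsum(np.vstack([np.zeros((1, diffs.shape[1])), diffs]), axis=0)
    smoothed = (csum[smooth_n:] - csum[:-smooth_n]) / smooth_n
    # Stack current rows next to lag 1..n_lags copies, keeping only rows
    # for which every lag is available.
    n_rows = smoothed.shape[0] - n_lags
    blocks = [smoothed[n_lags - lag : n_lags - lag + n_rows] for lag in range(n_lags + 1)]
    return np.hstack(blocks)


# Synthetic input: 10 rows, 2 columns.
t = np.arange(10)
demo = np.column_stack([t ** 2, 2 * t])
features = make_features(demo, smooth_n=3, n_lags=2)
print(features.shape)  # (5, 6)
```

Each output row holds the current smoothed differences followed by the lag 1 and lag 2 values, so the column count is the original column count times `n_lags + 1`.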
Category: machine-learning
Market basket analysis in Python
An actual market basket I found in my Google photos. tl;dr: if you find yourself doing some association rule mining using mlxtend but finding it a bit slow, then check out PyFIM. Here is a colab I made to get you started. I have recently been looking to do some market basket analysis ("Association … Continue reading Market basket analysis in Python
I helped build a thing!
don't mind if i do. Here is a thing i helped build in work that i'm fairly happy with: https://www.linkedin.com/posts/andrewm4894_netdata-introducing-our-first-netdata-cloud-activity-6712008465574887424-SlIr Now, onto the next thing!
Time series clustering with tslearn
I've recently been playing around with some time series clustering tasks and came across the tslearn library. I was interested in seeing how easy it would be to get up and running with some of the clustering functionality that is already built into tslearn. Turns out it was quite easy and straightforward, perfect blog post … Continue reading Time series clustering with tslearn
Terraform is Magic + r/MachineLearning Links
Terraform is magic, i may be a little late to the game on this one and i'm sure it has its fair share of haters (i've seen some have a love-hate relationship with it, maybe i'm still in my honeymoon period). But from my point of view as a Data Scientist/ML Engineer playing around … Continue reading Terraform is Magic + r/MachineLearning Links
A little brainteaser (or i’m an idiot)
This took me waaay too long to work out today and i was thinking it could make a nice little interview coding type question (which i'd probably fail). Suppose you have 10,000 rows of data and need to continually train and retrain a model training on at most 1,000 rows at a time and retraining … Continue reading A little brainteaser (or i’m an idiot)
Papers i’m reading #1
I've recently set myself the goal of reading one academic paper a week relating to the ML/AI things i'm working on in my current role. To try to help keep me honest and diligent in this regard, I've decided to get into the habit of jotting down some quick notes on each paper and every now … Continue reading Papers i’m reading #1
KubeFlow Custom Jupyter Image (+ github for notebook source control)
I've been playing around a bit with KubeFlow lately and found that a lot of the tutorials and examples of Jupyter notebooks on KubeFlow do a lot of the pip install and other sorts of setup and config stuff in the notebook itself, which feels icky. But, in reality, if you were working … Continue reading KubeFlow Custom Jupyter Image (+ github for notebook source control)
Multi-Variate, Multi-Step, LSTM for Anomaly Detection
This post will walk through a synthetic example illustrating one way to use a multi-variate, multi-step LSTM for anomaly detection. Imagine you have a matrix of k time series coming at you at regular intervals and you look at the last n observations for each metric. (Figure: a matrix of 5 metrics from period t to t-n.) One approach … Continue reading Multi-Variate, Multi-Step, LSTM for Anomaly Detection
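To make the "last n observations for each of k metrics" idea concrete, here is a minimal numpy sketch of slicing a metrics matrix into windows. It only illustrates the data shape described in the excerpt, not the post's LSTM approach. The helper name `make_windows`, the random synthetic data, and the values 100 periods and n = 10 are assumptions.

```python
import numpy as np


def make_windows(data, n):
    """Stack every run of n consecutive rows into an array of shape (windows, n, k).

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    if n < 1 or n > data.shape[0]:
        raise ValueError("n must be between 1 and the number of rows")
    return np.stack([data[i : i + n] for i in range(data.shape[0] - n + 1)])


# Synthetic data (an assumption): 100 periods of k = 5 metrics.
rng = np.random.default_rng(0)
metrics = rng.normal(size=(100, 5))

windows = make_windows(metrics, n=10)
latest = windows[-1]  # the last n observations for each metric
print(windows.shape, latest.shape)  # (91, 10, 5) (10, 5)
```

The (windows, n, k) layout is the usual samples, timesteps, features shape that sequence models tend to expect, which is why the windows are stacked along a new leading axis.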