Always use a meme to kick off a tutorial. Here is an anomaly detection tutorial that I created for my boss and the open-source community where I work. It's part of some work I have been doing on adding anomaly detection functionality to our open-source monitoring project. Like most ML projects the … Continue reading Anomaly Detection Tutorial
Category: python
Numpy Feature Engineering – 2x Speed Up Over Pandas!
The Setup: This is a little one that surprised me. Recently I needed to do some pretty basic feature engineering on a pandas DataFrame before training some models. Basically, I needed to take differences of each column, apply some smoothing, and then add a number of lagged columns for each … Continue reading Numpy Feature Engineering – 2x Speed Up Over Pandas!
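To make the three steps concrete (differencing, smoothing, lagging), here is a minimal sketch that does them once with pandas and once with plain NumPy on the underlying array. The trailing rolling mean used for smoothing, the window of 3 and the 2 lags are assumptions for illustration, not the post's actual settings, and `features_pandas` / `features_numpy` are hypothetical names.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def features_pandas(df: pd.DataFrame, window: int = 3, n_lags: int = 2) -> pd.DataFrame:
    # Difference each column, smooth with a trailing rolling mean, then append lagged copies.
    smoothed = df.diff().rolling(window).mean()
    lagged = [smoothed.shift(k).add_suffix(f"_lag{k}") for k in range(1, n_lags + 1)]
    return pd.concat([smoothed, *lagged], axis=1)


def features_numpy(arr: np.ndarray, window: int = 3, n_lags: int = 2) -> np.ndarray:
    n_rows, n_cols = arr.shape
    # First differences: n_rows - 1 rows, no NaN padding yet.
    diffs = np.diff(arr, axis=0)
    # Trailing rolling mean per column; the first `window` rows have no full window,
    # so they stay NaN, matching pandas' diff().rolling(window).mean().
    smoothed = np.full((n_rows, n_cols), np.nan)
    smoothed[window:] = sliding_window_view(diffs, window, axis=0).mean(axis=-1)
    # Lagged copies: shift rows down by k and pad the top with NaN, like DataFrame.shift(k).
    blocks = [smoothed]
    for k in range(1, n_lags + 1):
        lagged = np.full_like(smoothed, np.nan)
        lagged[k:] = smoothed[:-k]
        blocks.append(lagged)
    return np.hstack(blocks)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 5)), columns=list("abcde"))
pandas_result = features_pandas(df)
numpy_result = features_numpy(df.to_numpy())
```

Both versions produce the same values in the same column order; the NumPy one simply skips index alignment and column-name bookkeeping. This sketch makes no timing claims of its own, and `features_numpy` assumes the frame has more than `window` rows.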
Market basket analysis in Python
An actual market basket I found in my Google Photos. tl;dr: if you find yourself doing association rule mining with mlxtend but find it a bit slow, check out PyFIM; here is a Colab notebook I made to get you started. I have recently been looking to do some market basket analysis ("Association … Continue reading Market basket analysis in Python
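For anyone new to the terminology, here is a small pure-Python sketch of what association rule mining computes for pairs of items: support, confidence and lift. It is a brute-force illustration only (no Apriori pruning, no rules with more than one item on either side) and is not the mlxtend or PyFIM API; `pair_rules` is a hypothetical name and the basket data is made up.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return (lhs, rhs, support, confidence, lift) for single-item -> single-item rules."""
    n = len(baskets)
    # How many baskets contain each item, and each unordered pair of items.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to P(rhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "crisps"],
    ["bread", "butter"],
    ["milk", "crisps"],
]
rules = pair_rules(baskets, min_support=0.4)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Counting every pair like this gets expensive quickly as baskets and item counts grow, which is exactly the work that libraries such as mlxtend and PyFIM organise more cleverly.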
Time series clustering with tslearn
I've recently been playing around with some time series clustering tasks and came across the tslearn library. I was interested to see how easy it would be to get some of the clustering functionality built into tslearn up and running. It turned out to be quite easy and straightforward, perfect blog post … Continue reading Time series clustering with tslearn
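tslearn's clustering can use dynamic time warping (DTW) instead of plain Euclidean distance, for example `TimeSeriesKMeans(metric="dtw")`. As a rough illustration of that distance, and not tslearn's own (much faster) implementation, here is a plain NumPy DTW using a squared-difference cost with a square root at the end, plus a naive nearest-centroid assignment step. `dtw_distance` and `assign_to_centroids` are hypothetical names, and a real k-means would also update the centroids, which this sketch skips.

```python
import numpy as np


def dtw_distance(x, y):
    """DTW distance between two 1-D series (squared pointwise cost, sqrt of the total)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # cost[i, j] = cheapest cumulative cost of aligning x[:i] with y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of: diagonal match, step in x only, step in y only.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(np.sqrt(cost[n, m]))


def assign_to_centroids(series, centroids):
    """Label each series with the index of its nearest centroid under DTW."""
    return [int(np.argmin([dtw_distance(s, c) for c in centroids])) for s in series]


# Series of different lengths are fine: DTW warps the time axis to align them.
series = [[0, 1, 2, 3], [0, 0, 1, 2, 3], [3, 2, 1, 0], [3, 3, 2, 1, 0]]
centroids = [[0, 1, 2, 3], [3, 2, 1, 0]]
labels = assign_to_centroids(series, centroids)
```

The stretched series `[0, 0, 1, 2, 3]` sits at distance 0 from `[0, 1, 2, 3]` under DTW even though the two cannot be compared point by point with Euclidean distance, which is why DTW is a popular choice for clustering shapes rather than exact timings.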
Premature Optimization
I've been doing some work that required running the same statistical test from SciPy many times on a fairly wide pandas DataFrame. I spent a bit too much time googling around for the most efficient ways to do this, and even more time rewriting things in various ways before realizing I … Continue reading Premature Optimization
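The post doesn't say which SciPy test was involved, so purely for illustration assume a two-sample t-test (`scipy.stats.ttest_ind`) comparing two groups of rows in every column. The sketch contrasts a straightforward per-column loop with a single vectorised call using the `axis` argument; the random data and the `group` split are made up.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 50)), columns=[f"col_{i}" for i in range(50)])
group = np.arange(len(df)) < 100  # hypothetical split: first 100 rows vs the rest

# Straightforward version: loop over the columns, one test per column.
loop_pvalues = pd.Series(
    {col: stats.ttest_ind(df.loc[group, col], df.loc[~group, col]).pvalue for col in df.columns}
)

# Vectorised version: ttest_ind works along an axis of 2-D input,
# so a single call tests every column at once.
vec_pvalues = pd.Series(
    stats.ttest_ind(df[group].to_numpy(), df[~group].to_numpy(), axis=0).pvalue,
    index=df.columns,
)
```

Both give the same p-values. For a frame of this size the loop is probably fast enough, which fits the spirit of the post's title.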
Ireland Covid19 Data
I was looking around a bit and could not really find any datasets behind the daily updates from the Irish government that get posted here. In particular, I thought the breakout tables of numbers by different dimensions might be useful to anyone looking to analyse the data. So here is a Python … Continue reading Ireland Covid19 Data
A little brainteaser (or i’m an idiot)
This took me waaay too long to work out today, and I thought it could make a nice little interview-style coding question (which I'd probably fail). Suppose you have 10,000 rows of data and need to continually train and retrain a model, training on at most 1,000 rows at a time and retraining … Continue reading A little brainteaser (or i’m an idiot)
Github Webhook -> Cloud Function -> BigQuery
I have recently needed to track various activities on specific GitHub repos I'm working on. However, the GitHub REST API can sometimes be a bit limited (for example, as best I could see, if you want the most recent list of people who began watching your repo you need to make … Continue reading Github Webhook -> Cloud Function -> BigQuery
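A webhook-to-BigQuery function has two small pieces of logic worth getting right. It has to check GitHub's `X-Hub-Signature-256` header, which is `sha256=` followed by an HMAC-SHA256 of the raw request body keyed with the webhook secret. It also has to flatten the JSON payload into a table row. Below is a minimal, standard-library-only sketch of both. The Cloud Function handler wiring and the actual insert (for example `google.cloud.bigquery.Client().insert_rows_json(table_id, rows)`) are deliberately left out so it runs anywhere. The row schema and function names are assumptions, not the post's code.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def signature_is_valid(secret: bytes, body: bytes, signature_header) -> bool:
    """Compare GitHub's 'sha256=<hexdigest>' header with our own HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = (signature_header or "").encode("utf-8")
    # compare_digest avoids leaking timing information about where the values differ.
    return hmac.compare_digest(expected.encode("utf-8"), received)


def payload_to_row(event_type: str, body: bytes) -> dict:
    """Flatten a webhook payload into a hypothetical BigQuery row schema."""
    payload = json.loads(body)
    return {
        "event_type": event_type,  # value of the X-GitHub-Event header, e.g. "watch"
        "action": payload.get("action"),
        "repo": payload.get("repository", {}).get("full_name"),
        "sender": payload.get("sender", {}).get("login"),
        "received_at": datetime.now(timezone.utc).isoformat(),
        "raw_payload": body.decode("utf-8"),  # keep the full JSON for later analysis
    }


# Usage with a made-up payload and secret.
secret = b"my-webhook-secret"
body = json.dumps(
    {"action": "started", "repository": {"full_name": "octo/repo"}, "sender": {"login": "someone"}}
).encode("utf-8")
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
if signature_is_valid(secret, body, header):
    row = payload_to_row("watch", body)
```

The signature check has to run on the raw bytes GitHub sent, before any JSON parsing or re-serialisation, otherwise the HMAC will not match.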
My First PyPI Package
I've been threatening to do this for a long time and recently got around to it, so as usual I'm going to try to milk it for a blog post (note: I'm not talking about getting into a box like in the picture below; it's something much less impressive). Confession: I don't know matplotlib … Continue reading My First PyPI Package
KubeFlow Custom Jupyter Image (+ github for notebook source control)
I've been playing around with KubeFlow a bit lately and found that a lot of the tutorials and example Jupyter notebooks on KubeFlow do much of the pip installing and other setup and config in the notebook itself, which feels icky. But, in reality, if you were working … Continue reading KubeFlow Custom Jupyter Image (+ github for notebook source control)