I've been doing some work that necessitated using the same statistical test from scipy lots of times on a fairly wide pandas dataframe with lots of columns. I spent a bit too much time googling around for the most efficient ways to do this, and even more time re-writing things various ways before realizing i … Continue reading Premature Optimization
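As a rough illustration of the setup described above, here is a minimal sketch of running one scipy test per column of a wide dataframe. The excerpt does not say which test was used, so `ks_2samp` is an assumption, as are the two synthetic dataframes `df_a` and `df_b` and the column names.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: two wide dataframes sharing the same 50 columns.
rng = np.random.default_rng(0)
cols = [f"col_{i}" for i in range(50)]
df_a = pd.DataFrame(rng.normal(size=(200, 50)), columns=cols)
df_b = pd.DataFrame(rng.normal(size=(200, 50)), columns=cols)

# Straightforward version: one two-sample KS test per column
# (the choice of test is an assumption), keeping only the p-value.
p_values = pd.Series(
    {c: stats.ks_2samp(df_a[c], df_b[c]).pvalue for c in cols},
    name="p_value",
)

print(p_values.sort_values().head())
```

A plain loop like this is usually a sensible baseline to time before trying anything cleverer.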
Terraform is Magic + r/MachineLearning Links
Terraform is magic. I may be a little late to the game on this one, and i'm sure it has its fair share of haters (i've seen some have a love-hate relationship with it; maybe i'm still in my honeymoon period). But from my point of view as a Data Scientist/ML Engineer playing around … Continue reading Terraform is Magic + r/MachineLearning Links
Ireland Covid19 Data
I was looking around a bit and could not really find any datasets behind the daily updates from the Irish government that get posted here. In particular i was thinking the breakout tables of numbers by different dimensions might be of use for anyone looking to analyse the data. So here is a Python … Continue reading Ireland Covid19 Data
A little brainteaser (or i’m an idiot)
This took me waaay too long to work out today and i was thinking it could make a nice little interview coding type question (which i'd probably fail). Suppose you have 10,000 rows of data and need to continually train and retrain a model, training on at most 1,000 rows at a time and retraining … Continue reading A little brainteaser (or i’m an idiot)
Papers i’m reading #2
Continuation from this post. An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS (Cyber Physical Systems). Link My Summary: Really interesting paper - PGMs, HMMs and all that good stuff. Quite complicated though and no clear route to implementation. Also I would wonder how well it scales beyond tens of time series. … Continue reading Papers i’m reading #2
Github Webhook -> Cloud Function -> BigQuery
I have recently needed to watch and track various activities on specific GitHub repos i'm working on, however the REST API from GitHub can sometimes be a bit limited (for example, as best i could see, if you want to get the most recent list of people who began watching your repo you need to make … Continue reading Github Webhook -> Cloud Function -> BigQuery
Papers i’m reading #1
I've recently set myself the goal of reading one academic paper a week relating to the ML/AI things i'm working on in my current role. To try to help keep me honest and diligent in this regard, I've decided to get into the habit of jotting down some quick notes on each paper and every now … Continue reading Papers i’m reading #1
My First PyPI Package
I've been threatening myself to do this for a long time and recently got around to it, so as usual i'm going to try to milk it for a blog post (Note: i'm not talking about getting into a box like the below picture, it's something much less impressive). Confession - I don't know matplotlib … Continue reading My First PyPI Package
KubeFlow Custom Jupyter Image (+ github for notebook source control)
I've been playing around a bit with KubeFlow lately and found that a lot of the tutorials and examples of Jupyter notebooks on KubeFlow do a lot of the pip install and other sorts of setup and config stuff in the notebook itself, which feels icky. But, in reality, if you were working … Continue reading KubeFlow Custom Jupyter Image (+ github for notebook source control)
Multi-Variate, Multi-Step, LSTM for Anomaly Detection
This post will walk through a synthetic example illustrating one way to use a multi-variate, multi-step LSTM for anomaly detection. Imagine you have a matrix of k time series coming at you at regular intervals and you look at the last n observations for each metric. (Figure: a matrix of 5 metrics from period t to t-n.) One approach … Continue reading Multi-Variate, Multi-Step, LSTM for Anomaly Detection
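To make the "last n observations of k metrics" idea concrete, here is a minimal sketch of slicing a stream of k series into overlapping n-by-k matrices. It only covers the windowing step, not the LSTM itself; the sizes, the random data and the helper name `make_windows` are all assumptions for illustration.

```python
import numpy as np

# Hypothetical sizes: k metrics, windows of n observations, T time steps.
k, n, T = 5, 10, 100
rng = np.random.default_rng(42)
data = rng.normal(size=(T, k))  # rows are time steps, columns are metrics


def make_windows(values, n_lags):
    """Stack every run of n_lags consecutive rows into shape (windows, n_lags, k)."""
    n_windows = values.shape[0] - n_lags + 1
    return np.stack([values[i : i + n_lags] for i in range(n_windows)])


windows = make_windows(data, n)
latest = windows[-1]  # the most recent n observations of all k metrics

print(windows.shape)
print(latest.shape)
```

Each slice of `windows` is one n-by-k matrix of the kind described above, which is the usual (samples, timesteps, features) layout that sequence models expect.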