All Posts

  • A little brainteaser (or i'm an idiot)

    February 27, 2020

    This took me waaay too long to work out today and i was thinking it could make a nice little interview coding type question (which i’d probably fail). Suppose you have 10,000 rows of data and need to continually train and retrain a model, training on at most 1,000 rows at a time and retraining… Read more

  • Papers i'm reading #2

    February 20, 2020

    Continuation from this post. An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS (Cyber Physical Systems). Link My Summary: Really interesting paper – PGMs, HMMs and all that good stuff. Quite complicated though and no clear route to implementation. Also I would wonder how well it scales beyond tens of time series.… Read more

  • Github Webhook -> Cloud Function -> BigQuery

    January 28, 2020

    I have recently needed to watch and track various activities on specific GitHub repos i’m working on; however, the REST API from GitHub can sometimes be a bit limited (for example, best i could see, if you want to get the most recent list of people who began watching your repo you need to make… Read more

  • Papers i'm reading #1

    December 30, 2019

    I’ve recently set myself the goal of reading one academic paper a week relating to the ML/AI things i’m working on in my current role. To try to help keep me honest and diligent in this regard, I’ve decided to get into the habit of jotting down some quick notes on each paper and every now… Read more

  • My First PyPI Package

    November 26, 2019

    I’ve been threatening to myself to do this for a long time and recently got around to it, so as usual i’m going to try to milk it for a blog post (Note: i’m not talking about getting into a box like the below picture, it’s something much less impressive). Confession – I don’t know matplotlib… Read more

  • Multi-Variate, Multi-Step, LSTM for Anomaly Detection

    September 9, 2019

    This post will walk through a synthetic example illustrating one way to use a multi-variate, multi-step LSTM for anomaly detection. Imagine you have a matrix of k time series data coming at you at regular intervals and you look at the last n observations for each metric. One approach to doing anomaly detection in such a setting is to… Read more

  • Custom Python Packages in AWS Lambda

    September 4, 2019

    I’m pretty sure i’ll be looking this up again at some stage so that passed one of my main thresholds for a blog post. I’ve recently been porting some data and model development pipelines over to AWS Lambda and was mildly horrified to see how clunky the whole process for adding custom python packages to… Read more

  • Clustering Cell Tower Usage Data

    July 20, 2019

    Below is a little mini project i did a while back looking at cell tower usage data. Main takeaway really was a nice example of how subjective clustering can be, especially the more features and variables you feed into your distance metric.

  • Parallelize a wide df in Pandas

    July 4, 2019

    Sometimes you end up with a very wide pandas dataframe and you are interested in doing the same types of operations (data processing, building a model etc.) but focused on subsets of the columns. For example, if we had a wide df with different time series KPIs represented as columns then we might want to… Read more

  • ( 0 – 0 ) / 0 != 0

    June 11, 2019

    Arrrgghh – I just wasted the best part of an afternoon chasing this one down. If i can knock out a quick post on it then at least i’ll feel i’ve gotten something out of it. Here’s the story – somewhere in an admittedly crazy ETL type pipeline i was using pandas pct_change() as a… Read more
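The gotcha in the title can be reproduced in a few lines. This is only a minimal sketch with made-up values, not the pipeline from the post: `values` is a hypothetical series, and the zero-to-zero guard at the end is one possible workaround that i'm assuming for illustration, not necessarily the fix the post settles on.

```python
import math

import pandas as pd

# Hypothetical metric that sits at zero for a while before picking up.
values = pd.Series([0, 0, 5, 10])

# pct_change() computes (current - previous) / previous for each row.
changes = values.pct_change()

# Row 1 is (0 - 0) / 0, which pandas reports as NaN rather than 0.
# Row 2 is (5 - 0) / 0, which comes out as inf.
print(changes.tolist())

# One possible guard (an assumption for illustration): treat a move
# from zero to zero as "no change" and leave everything else alone.
previous = values.shift(1)
safe_changes = changes.mask((values == 0) & (previous == 0), 0.0)
print(safe_changes.tolist())
```

Whether a zero-to-zero step should count as 0% change or as missing data depends on what the downstream code expects, so the guard is a design choice rather than a universal fix.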

  • Java Hello World cli using args4j

    May 1, 2019

    A simple example of how to use args4j to add command line args to a simple “Hello World” type java application. For example, the command line execution below would print “Hello arg4j!” instead of the default “Hello World!” that you get if you don’t pass any args.
    $ java -jar helloWorldParamaterized --msg='Hello arg4j!'

  • Parallel Jupyter Notebooks

    April 27, 2019

    I have become master of the notebooks, they bend at my will and exist to serve my data science needs! Ok i might be getting a bit carried away, but i recently discovered papermill and have been finding it very useful in conjunction with Python multiprocessing to speed up a lot of data science experimental… Read more

  • Java for Machine Learning

    April 25, 2019

    I’ve recently been learning java for machine learning related work (a long story to do with mainframes, and you know, why not). Decided to stick down in here some resources i found very useful as i found i still needed to do a bit of Googling to find ML related Java tutorials and moocs i… Read more

  • Previous blog posts

    April 25, 2019

    List of blog posts i’ve done before this site: Multi-Variate, Multi-Step, LSTM for Anomaly Detection; A Docker Data Science Recipe; Celebrity Word Vectors; Playing around with Apache Airflow & BigQuery; Content Lifecycle Clustering; Interview with BigData-MadeSimple.com; One potential pitfall with referral source tracking and AMP in Google Analytics.

  • My Site…

    April 25, 2019

    Was doing some blogs on Medium but don’t trust them to not all end up behind some sort of paywall, so decided to shell out a couple quid a month for my own domain.
