I get asked this a lot by students, so I decided to make a little list here that I can add to and point people towards.
https://github.com/awesomedata/awesome-public-datasets
https://www.kaggle.com/datasets
https://datasetsearch.research.google.com/
https://cloud.google.com/bigquery/public-data
https://cloud.google.com/public-datasets
https://registry.opendata.aws/
https://data.world/data
Continuation of this post. An unsupervised spatiotemporal graphical modeling approach to anomaly detection in distributed CPS (Cyber-Physical Systems). My summary: Really interesting paper, with PGMs (probabilistic graphical models), HMMs (hidden Markov models) and all that good stuff. It's quite complicated, though, with no clear route to implementation. I also wonder how well it scales beyond tens of time series. … Continue reading Papers i’m reading #2
Below is a mini project I did a while back looking at cell tower usage data. The main takeaway was a nice example of how subjective clustering can be, especially as you feed more features and variables into your distance metric.
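As a toy illustration of that point (not the original project's code or data), here is a sketch assuming plain Euclidean distance: which tower counts as tower A's nearest neighbor flips depending on which features go into the metric. The tower names, feature names and numbers are all hypothetical.

```python
import math

# Hypothetical toy data: per-tower usage features, NOT real cell tower data.
towers = {
    "A": {"calls": 10.0, "data_gb": 1.0},
    "B": {"calls": 11.0, "data_gb": 9.0},
    "C": {"calls": 15.0, "data_gb": 1.0},
}


def nearest(target, features):
    """Return the tower closest to `target` by Euclidean distance over `features`."""
    point = [towers[target][f] for f in features]
    others = [name for name in towers if name != target]
    return min(
        others,
        key=lambda name: math.dist(point, [towers[name][f] for f in features]),
    )


print(nearest("A", ["calls"]))             # B (distance 1 vs 5)
print(nearest("A", ["calls", "data_gb"]))  # C (distance 5 vs sqrt(65))
```

The same kind of flip can happen without adding any features, just by rescaling one of them (for example standardizing each column), which is part of why the resulting clusters can feel so subjective.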
Arrrgghh, I just wasted the best part of an afternoon chasing this one down. If I can knock out a quick post on it, then at least I'll feel I've gotten something out of it. Here's the story: somewhere in an admittedly crazy ETL-type pipeline I was using pandas pct_change() as a … Continue reading ( 0 – 0 ) / 0 != 0
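For context, here is a minimal sketch of the behavior the title refers to, assuming pandas is installed. `pct_change()` computes `current / previous - 1`, so a value that stays at 0 gives 0/0, which pandas returns as NaN rather than 0, while a jump up from 0 gives inf. The `safe_pct_change` helper is hypothetical: one possible workaround, assuming you want an unchanged zero to count as a 0% change, and not necessarily what the full post ends up doing.

```python
import pandas as pd

# A series that sits at 0, stays at 0, jumps to 5, then stays at 5.
s = pd.Series([0.0, 0.0, 5.0, 5.0])

# pct_change() is current / previous - 1, i.e. (current - previous) / previous.
raw = s.pct_change()
print(raw.tolist())  # [nan, nan, inf, 0.0]


def safe_pct_change(series: pd.Series) -> pd.Series:
    """Hypothetical helper: like pct_change(), but 0 -> 0 is reported as 0.0."""
    prev = series.shift(1)
    out = series.pct_change()
    # Only override the 0/0 case; 0 -> nonzero stays inf and the first row stays NaN.
    both_zero = (series == 0) & (prev == 0)
    return out.mask(both_zero, 0.0)


print(safe_pct_change(s).tolist())  # [nan, 0.0, inf, 0.0]
```

Whether 0 to 0 should really be "0% change" or stay undefined is a modeling choice, so it is worth making that decision explicit rather than letting NaNs flow silently through the rest of a pipeline.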
A simple example of how to use args4j to add command-line arguments to a simple "Hello World"-style Java application. For example, the command below prints "Hello arg4j!" instead of the default "Hello World!" that you get if you don't pass any arguments.
$ java -jar helloWorldParamaterized --msg='Hello arg4j!'
https://gist.github.com/andrewm4894/0e9d741126d7c8add261724868d4a844
I've recently been learning Java for machine-learning-related work (a long story to do with mainframes, and, you know, why not). I decided to note down here some resources I found very useful, as I still needed to do a fair bit of Googling to find ML-related Java tutorials and MOOCs i … Continue reading Java for Machine Learning
A list of blog posts I wrote before this site:
Multi-Variate, Multi-Step, LSTM for Anomaly Detection
A Docker Data Science Recipe
Celebrity Word Vectors
Playing around with Apache Airflow & BigQuery
Content Lifecycle Clustering
Interview with BigData-MadeSimple.com
One potential pitfall with referral source tracking and AMP in Google Analytics
I was writing some blog posts on Medium, but I don't trust them not to end up behind some sort of paywall, so I decided to shell out a couple of quid a month for my own domain.