I have been working a bit lately on some text classification stuff using Hugging Face. It's great and all, but their docs can actually be a bit overwhelming. So here is a minimal text classification example, using Hugging Face and either PyTorch or TensorFlow (you decide). Will try to update and maintain the Colab here: … Continue reading Hugging Face Text Classification Quickstart
Tag: python
Airflow “Trigger Dags” Python Script
You have some DAG that runs multiple times a day, but you need to do a manual backfill of the last 30 days. It's 2022 and this is still surprisingly painful with Airflow. The "new" REST API helps and means all the building blocks are there but, as I found out today, there can often still … Continue reading Airflow “Trigger Dags” Python Script
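To make the "building blocks" idea a bit more concrete, here is a rough sketch of preparing one trigger request per day for a 30-day backfill. It is not the script from the post. It assumes the Airflow 2.x stable REST API, where a DAG run is created with a POST to `/api/v1/dags/{dag_id}/dagRuns`, and it assumes the `logical_date` field name (older 2.x releases used `execution_date`). The base URL, the DAG id, and the `backfill_requests` helper are all made up for illustration, and it builds one run per day even though the DAG in question runs several times a day. Nothing is sent: you would still need to POST each body with whatever authentication your deployment uses.

```python
import json
from datetime import datetime, timedelta, timezone


def backfill_requests(base_url, dag_id, end_date, days=30):
    """Build one (url, json_body) pair per day for the `days` days before end_date.

    Hypothetical helper: it only prepares the requests, it does not send them.
    """
    # Assumed endpoint of the Airflow 2.x stable REST API for creating a DAG run.
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    requests_to_send = []
    for offset in range(days, 0, -1):  # oldest day first
        logical_date = end_date - timedelta(days=offset)
        body = {
            # Made-up run id scheme so the manual runs are easy to spot in the UI.
            "dag_run_id": f"manual_backfill__{logical_date:%Y-%m-%d}",
            # Assumed field name; older 2.x versions expect "execution_date".
            "logical_date": logical_date.isoformat(),
        }
        requests_to_send.append((url, json.dumps(body)))
    return requests_to_send


pairs = backfill_requests(
    base_url="http://localhost:8080",  # hypothetical webserver address
    dag_id="my_dag",  # hypothetical DAG id
    end_date=datetime(2022, 6, 1, tzinfo=timezone.utc),
)
print(len(pairs), "requests prepared")
```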
Time series anomaly detection using PCA
Here is a little recipe for using good old PCA to do some fast and efficient time series anomaly detection.
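One common way to do this (an assumption here, not necessarily the exact recipe in the post) is to turn the series into rows of lagged values, fit PCA on a "normal" stretch, and use the reconstruction error as the anomaly score. The sketch below uses plain NumPy with PCA via SVD. The synthetic sine wave, the injected spike, and the helper names are all invented for illustration.

```python
import numpy as np


def lagged_matrix(series, n_lags):
    """Stack each point with its previous n_lags values as one row."""
    series = np.asarray(series, dtype=float)
    return np.column_stack(
        [series[n_lags - k : len(series) - k] for k in range(n_lags + 1)]
    )


def fit_pca(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_scores(X, mean, components):
    """Reconstruction error of each row after projecting onto the components."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sqrt(((centered - reconstructed) ** 2).sum(axis=1))


# Synthetic data: a noisy sine wave with one injected spike.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(t.size)
series[450] += 3.0

n_lags = 10
X = lagged_matrix(series, n_lags)
mean, components = fit_pca(X[:300], n_components=2)  # fit on the clean first part
scores = anomaly_scores(X, mean, components)

worst_row = int(scores.argmax())
worst_index = worst_row + n_lags  # position in the series of that row's newest point
print("highest score near index", worst_index)
```

The spike shows up in every row whose lag window contains it, so expect a short run of high scores around the anomaly rather than a single point.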
streamlit multi-page app minimal example
Too obvious? Maybe. Probably. Recently I had a need to assess Streamlit for some internal DS/ML/Data apps I wanted to build in my job. By "I had a need" I mean I heard it was the new cool thing so I wanted to play with it and feel better about myself. Anyway, as part of … Continue reading streamlit multi-page app minimal example
Some asyncio fun/pain
Taken from this great Talk Python Training course - get the lifetime bundle if you can! You have a list of API endpoints you want to pull data from and collect results into some results list or dataframe for further processing. You could just loop over that list and make a load of requests.get() calls … Continue reading Some asyncio fun/pain
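As a rough illustration of the alternative to looping over blocking calls, here is a minimal sketch using only the standard library. The `fetch` coroutine is a stand-in that sleeps instead of making a real HTTP request, and the endpoint URLs are made up. In real code you would swap in an async HTTP client of your choice.

```python
import asyncio
import time


async def fetch(endpoint: str) -> dict:
    """Stand-in for a real HTTP call; sleeps to mimic network latency."""
    await asyncio.sleep(0.1)
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints: list[str]) -> list[dict]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch(e) for e in endpoints))


# Hypothetical endpoints, for illustration only.
endpoints = [f"https://example.com/api/item/{i}" for i in range(10)]

start = time.perf_counter()
results = asyncio.run(fetch_all(endpoints))
elapsed = time.perf_counter() - start
print(f"fetched {len(results)} results in {elapsed:.2f}s")
```

Because the ten simulated calls wait concurrently, the total time should be close to one call's latency rather than ten times it.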
Anomaly Detection using the Matrix Profile
I like an excuse to play with fancy things, so when I first learned about the Matrix Profile for time series analysis, particularly around anomaly detection, I was intrigued. When I learned there was a nice Python package (STUMPY) I could just pip install I was outright excited, as one thing I like more than … Continue reading Anomaly Detection using the Matrix Profile
Anomaly Detection Tutorial
Always use a meme to kick off a tutorial. Here is an anomaly detection tutorial that I created for my boss and the open source community where I work. It's part of some work I have been doing around adding some anomaly detection functionality into our open source monitoring project. Like most ML projects the … Continue reading Anomaly Detection Tutorial
Numpy Feature Engineering – 2x Speed Up Over Pandas!
The Setup: This is a little one I was surprised to see. Recently I had a need to do some pretty basic feature engineering on a pandas DataFrame prior to training some models. Basically I needed to take differences of each column, apply some smoothing, and then add a number of lagged columns for each … Continue reading Numpy Feature Engineering – 2x Speed Up Over Pandas!
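The three steps mentioned above (differences, smoothing, lagged columns) can be sketched in plain NumPy roughly as follows. This is an illustrative guess at the shape of the pipeline, not the code from the post: the trailing moving average as the smoother, the `make_features` name, and the `smooth_n` and `n_lags` parameters are all assumptions.

```python
import numpy as np


def make_features(values, smooth_n=3, n_lags=2):
    """Difference each column, smooth it, then append lagged copies.

    values: 2D array-like of shape (rows, columns).
    Returns an array of shape (rows - smooth_n - n_lags, columns * (n_lags + 1)).
    """
    values = np.asarray(values, dtype=float)

    # 1. First differences of each column (drops the first row).
    diffs = np.diff(values, axis=0)

    # 2. Trailing moving average over smooth_n rows, via a cumulative sum.
    zeros = np.zeros((1, diffs.shape[1]))
    csum = np.cumsum(np.vstack([zeros, diffs]), axis=0)
    smoothed = (csum[smooth_n:] - csum[:-smooth_n]) / smooth_n

    # 3. Put the current row and its n_lags previous rows side by side.
    n_rows = smoothed.shape[0] - n_lags
    return np.hstack(
        [smoothed[n_lags - k : n_lags - k + n_rows] for k in range(n_lags + 1)]
    )


# Toy input: one linear column and one quadratic column.
values = np.column_stack([np.arange(10.0), np.arange(10.0) ** 2])
features = make_features(values, smooth_n=3, n_lags=2)
print(features.shape)
```

Each output row holds the current smoothed differences first, followed by the lag-1 values, then the lag-2 values, and so on.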
Market basket analysis in Python
An actual market basket I found in my Google photos. tl;dr: if you find yourself doing some association rule mining using mlxtend but finding it a bit slow then check out PyFIM - here is a Colab I made to get you started. I have recently been looking to do some market basket analysis ("Association … Continue reading Market basket analysis in Python
Time series clustering with tslearn
I've recently been playing around with some time series clustering tasks and came across the tslearn library. I was interested in seeing how easy it would be to get up and running with some of the clustering functionality that is already built into tslearn. It turns out it was quite easy and straightforward, perfect blog post … Continue reading Time series clustering with tslearn