15 Jan 2018

A primer for online learning

I have encountered a number of scenarios in my role as a data scientist which I think would be suitable for online machine learning algorithms. This post contains some basic examples to give some intuition for where an online learning algorithm may be suitable.

31 Oct 2017

A random forest of trees

I implemented a decision tree classifier in a previous post. Here, I extend the model to create a random forest model as an ensemble of trees.

30 Sep 2017

A home-grown tree

I recently had a project at work that wasn't a typical classification problem, but for which I found a nice solution involving a recursive partition tree. This got me thinking that I have never taken the time to implement a classification tree model from scratch.

31 Jul 2017

Machine learning pipelines with Scikit-Learn

I've been aware of Scikit-Learn's Pipeline class for a long time, and for some reason have never really got around to having a play with it. Turns out you can do some pretty powerful things, and I will certainly be using it a lot more in future.

29 Jun 2017

Bundling Python Packages for PySpark Applications

I've been using Spark for a fair few months now via the Python API. As with any rapidly developing technology with few experts around to learn from, I've found getting up and running to require a fair amount of effort. After a little bit of searching around, I found how to bundle Python packages up to distribute around the cluster when submitting an application, and thought some clear instructions could be of use...

24 Mar 2017

pycaret: a Python framework for classification and regression training

As a constructive way to improve my Python skills and understanding of supervised machine learning, I wrote a Python framework for classification and regression training, inspired by Max Kuhn's R caret package.

15 Aug 2016

My Predictive Modeling Workflow

I've seen quite a few blog posts in which people describe their workflow for predictive modelling, from data preparation through to model evaluation. While I am adamant that there is no one-size-fits-all approach, I thought I would share the template that I find serves as a good starting point.

12 Jul 2016

Class imbalance in classification models

Class imbalance can have a massively negative impact on classification models. I investigated several means to remedy the problem, using the Adult data set as an example.

06 Jun 2016

Quick, easy, secure file sharing

I wanted a way to share reasonably large files that is secure, free, and no hassle. I've found a solution using transfer.sh and gpg that is good up to 10GB!

20 May 2016

Model Ensembles: Regression

After gaining a lot of interest in the subject, I have started developing an R package to create model ensembles. Here, I focus on ensembles for regression modelling.

