15 Jan 2018

A primer for online learning

In my role as a data scientist, I have encountered a number of scenarios that I think would be suitable for online machine learning algorithms. This post contains some basic examples to give some intuition for where an online learning algorithm may be suitable.

Read more

31 Oct 2017

A random forest of trees

I implemented a decision tree classifier in a previous post. Here, I extend the model to create a random forest model as an ensemble of trees.

Read more

30 Sep 2017

A home grown tree

I recently had a project at work that wasn't a typical classification problem, but for which I found a nice solution involving a recursive partition tree. This got me thinking that I have never taken the time to implement a classification tree model from scratch.

Read more

31 Jul 2017

Machine learning pipelines with Scikit-Learn

I've been aware of Scikit-Learn's Pipeline class for a long time, and for some reason have never really got around to having a play with it. Turns out you can do some pretty powerful things, and I will certainly be using it a lot more in future.

Read more

29 Jun 2017

Bundling Python Packages for PySpark Applications

I've been using Spark for a fair few months now via the Python API. As with any rapidly developing technology with few experts around to learn from, I've found getting up and running to require a fair amount of effort. After a little bit of searching around, I found how to bundle Python packages up to distribute around the cluster when submitting an application, and thought some clear instructions could be of use...

Read more

24 Mar 2017

pycaret - a Python framework for classification and regression training

As a constructive way to improve my Python skills and understanding of supervised machine learning, I wrote a Python framework for classification and regression training, inspired by Max Kuhn's R caret package.

Read more

15 Aug 2016

My Predictive Modeling Workflow

I've seen quite a few blog posts in which people describe their workflow for predictive modelling, from data preparation through to model evaluation. While I am adamant that there is no one-size-fits-all approach, I thought I would share the template that I find serves as a good starting point.

Read more

12 Jul 2016

Class imbalance in classification models

Class imbalance can have a massively negative impact on classification models. I investigated several means to remedy the problem, using the Adult data set as an example.

Read more

06 Jun 2016

Quick, easy, secure file sharing

I wanted a way to share reasonably large files that is secure, free, and hassle-free. I've found a solution using transfer.sh and gpg that is good up to 10 GB!

Read more

20 May 2016

Model Ensembles: Regression

Having become very interested in the subject, I have started developing an R package to create model ensembles. Here, I focus on ensembles for regression modelling.

Read more