13 Jan 2016

Packages, Linear Models, and S4

I have been taking Andrew Ng's Machine Learning course on Coursera, and I am interested in implementing machine learning algorithms in R, as well as getting some practice with the S4 system. So here it is: a very simple implementation of linear regression using an S4 class.
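As a rough illustration of the idea (not the code from the post itself), here is a minimal sketch of linear regression wrapped in an S4 class. The class name `LinearModel`, the helper `fit_linear_model`, and the generic `predict_lm` are all hypothetical names chosen for this example, and the fit uses the normal equations rather than gradient descent.

```r
library(methods)

# Hypothetical S4 class that stores the fitted coefficients.
setClass("LinearModel", representation(coefficients = "numeric"))

# Fit by ordinary least squares using the normal equations.
fit_linear_model <- function(X, y) {
  X1 <- cbind(Intercept = 1, X)                   # prepend an intercept column
  beta <- solve(crossprod(X1), crossprod(X1, y))  # solves (X'X) beta = X'y
  new("LinearModel", coefficients = drop(beta))
}

# Hypothetical generic for prediction, with a method for the class above.
setGeneric("predict_lm", function(object, newdata) standardGeneric("predict_lm"))

setMethod("predict_lm", "LinearModel", function(object, newdata) {
  drop(cbind(1, newdata) %*% object@coefficients)
})

# Usage on noise-free data generated from y = 2 + 3x.
x <- 1:10
y <- 2 + 3 * x
fit <- fit_linear_model(x, y)
fit@coefficients
predict_lm(fit, c(11, 12))
```

Defining a separate generic keeps the sketch independent of the existing `predict` function in the stats package; a fuller implementation might hook into that instead.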

08 Dec 2015

k-NN Classification

I've been meaning for a while to start writing my own statistical learning algorithms. My motivation isn't to reinvent the wheel, but to gain a proper understanding of how various techniques work. Here is the easiest of them all, the k-nearest-neighbor classifier, where the inputs are numerical.
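For readers who want a feel for the technique before opening the post, here is a minimal sketch of a k-nearest-neighbor classifier for numerical inputs. It is not the implementation from the post: the function name `knn_classify` and its arguments are assumptions made for this example, it uses Euclidean distance, and ties in the vote are broken by whichever label comes first in the vote table.

```r
# Classify each row of new_x by majority vote among its k nearest
# training rows (Euclidean distance). new_x must be a matrix with
# one row per point to classify.
knn_classify <- function(train_x, train_y, new_x, k = 3) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  apply(new_x, 1, function(p) {
    # t(train_x) has one column per training row, so subtracting p
    # recycles it down each column.
    d <- sqrt(colSums((t(train_x) - p)^2))
    nearest <- order(d)[seq_len(k)]      # indices of the k closest rows
    votes <- table(train_y[nearest])     # count labels among neighbors
    names(votes)[which.max(votes)]       # most common label
  })
}

# Usage: two well-separated groups of points.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("a", "a", "a", "b", "b", "b")
new_x <- rbind(c(0.5, 0.5), c(5.5, 5.5))
predicted <- knn_classify(train_x, train_y, new_x, k = 3)
predicted
```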

7 Dec 2015

Simple clustering

On the train the other day, I set myself a little challenge: write a compact k-nearest neighbors clustering algorithm. I wrote it in R, so it definitely wouldn't stand up to a C++ equivalent (there is a lot of looping involved). However, it is a nice demonstration of how functional programming with the [*]apply family can keep things neat and concise.

1 Dec 2015

Coursera Data Manipulation at Scale: Systems and Algorithms

What a fantastic course! I really enjoyed this one, and for anyone interested in a well-rounded introduction to 'Big' data science, I wholeheartedly recommend it.

22 Nov 2015

I've dug up some experimenting I did a while ago, in the hope that it may be an enlightening illustration for others. I'd been using the R caret package for a while, and taking for granted the simple and easy resampling available when training models. This is a nice, simple example of how resampling using the bootstrap can be performed.
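To make the idea concrete, here is a minimal sketch of bootstrap resampling written in base R, without caret. It is an illustration under stated assumptions rather than the code from the post: the function name `bootstrap_rmse`, the choice of the built-in `mtcars` data, the model `mpg ~ wt`, and the number of resamples are all picked for this example. Each resample draws rows with replacement, fits on those rows, and scores the model on the rows that were left out.

```r
set.seed(1)  # make the resamples reproducible

# Estimate out-of-bag RMSE for a simple linear model over several
# bootstrap resamples of the rows of `data`.
bootstrap_rmse <- function(data, n_resamples = 25) {
  n <- nrow(data)
  sapply(seq_len(n_resamples), function(i) {
    in_bag <- sample(n, replace = TRUE)          # rows drawn with replacement
    out_of_bag <- setdiff(seq_len(n), in_bag)    # rows never drawn
    fit <- lm(mpg ~ wt, data = data[in_bag, ])
    pred <- predict(fit, newdata = data[out_of_bag, ])
    sqrt(mean((data$mpg[out_of_bag] - pred)^2))  # RMSE on held-out rows
  })
}

rmse <- bootstrap_rmse(mtcars)
mean(rmse)  # average performance across resamples
sd(rmse)    # spread across resamples
```

Averaging over resamples gives a steadier performance estimate than a single train/test split, at the cost of fitting the model many times.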

1 Nov 2015

Coursera Mathematical Biostatistics 2

Tough, but interesting. Really interesting.

10 Oct 2015

The Magic of Rcpp

A quick and easy R package, with some initial mucking around integrating compiled C++ with R.

01 Oct 2015

Churning with Caret: Linear Models

Predictive modeling on the churn data set with linear models. This is the first in a series of blog posts where I investigate the data set. I will also be writing about non-linear and tree-based models, and then I plan to get carried away considering topics such as model ensembling and feature selection.

30 Sep 2015