1 March 2015
Coursera Machine Learning
This course had a lot to live up to: it was Coursera's flagship course, and essentially paved the way for what the platform is today. Having completed it, I can fully appreciate how Coursera did so well. Andrew Ng's course is, to put it bluntly, fantastic. I learned an insane amount, both from getting practice prototyping learning algorithms in Octave, and from better understanding the contexts in which those algorithms should be applied.
The programming assignments are all in GNU Octave or Matlab, and are probably the one thing that may put people off if they haven't had previous experience. My advice is: see it through! I had no experience at all with Octave or Matlab, and the first few programming assignments left me frustrated as I took hours doing in Octave what would have taken me about 20 minutes in R. By week four or five I was proficient enough to enjoy the exercises rather than fight against them. If you see this course through the Octave adjustment phase, I promise you won't regret it!
As with most courses, week one delivers some appetite whetting, but the main focus is to introduce linear regression. Throughout the course I found that Andrew hits the required level of detail spot on: he doesn't bombard you with linear algebra, calculus, and pages of derivations, but delivers the required concepts in an accessible manner that anyone with a reasonable mathematical background should be able to digest (hint: if you don't have experience working with vectors, you may benefit from getting a little practice before getting stuck in with this course).
To me, linear regression was mainly revision; however, there were a few occasions where moments of realisation hit me, and concepts I had previously taken for granted sunk in just a little bit more. What I mean to say is: despite having seen similar material before, I still really enjoyed this week!
Well, time for Octave part 1! Andrew introduces multivariate linear regression, and sets you to work implementing it in an Octave exercise. His assignments are really well devised: he gives you template exercises, and lets you fill in the key sections to test your understanding.
All the concepts in the assignment were familiar to me, as were the implementation techniques (if you use R, like me, you will be familiar with the advantages of writing vectorised code). Octave, however, was not! I think the nominally three-hour assignment took me a little more than that... probably four or so. However, much of that was getting to grips with Octave (and making sure my column and row vectors were the right way around!). I enjoyed the challenge though, and felt like I had well earned my victory by the time I completed the assignment.
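For anyone wondering what "vectorised" means in practice here: the whole gradient descent update for every parameter can be written as a couple of matrix operations, with no loop over features. This is my own minimal sketch in Python/NumPy rather than Octave (the function name and toy data are made up, not the course's template code):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression, fully vectorised:
    one matrix multiply computes the errors, another updates all parameters."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y                # predictions minus targets, shape (m,)
        theta -= alpha / m * (X.T @ error)   # update every parameter at once
    return theta

# Toy data: y = 1 + 2*x, with a column of ones for the intercept
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta = gradient_descent(X, y)
```

The column of ones is the same trick the course uses for the intercept term, and the `(m,)` versus `(m,1)` shapes are exactly where my row/column vector confusion kept biting.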
Andrew also introduces the concept of regularisation (the technique he teaches is commonly called ridge regression). When I need regularisation for linear models I typically apply the elastic net method, which combines ridge regression with the lasso method (which can also perform feature selection!). The excellent discussion of regularisation allowed many elastic net concepts I had previously taken for granted to sink in. For that alone, this course had already proved its worth!
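The ridge penalty has a pleasingly simple closed form: you just add the penalty term into the normal equations, leaving the intercept unpenalised (as the course does). A rough Python/NumPy sketch of the idea, with my own made-up names and data:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: solve (X'X + lam*L) theta = X'y,
    where L is the identity with the bias entry zeroed so the
    intercept (assumed to be the first column of X) is not penalised."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Larger lambda shrinks the slope coefficient towards zero
x = np.linspace(0, 5, 20)
X = np.column_stack([np.ones_like(x), x])
y = 2 * x + 1
theta_mild = ridge_fit(X, y, lam=0.01)
theta_strong = ridge_fit(X, y, lam=100.0)
```

The lasso replaces the squared penalty with an absolute-value one (which is what lets it zero coefficients out entirely), and the elastic net blends the two.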
Time to move on from linear regression to classification. Once again, I had already implemented a logistic regression classifier in R, but when I coded it up I felt a little like I was following a step-by-step guide rather than gaining a decent understanding. This week changed that: even though Andrew doesn't go to town with a complex derivation, he delivers a coherent explanation which gives you enough to understand and implement the technique without bogging you down in excess detail.
I found the Octave exercise a little challenging, but once again this was down to my lack of experience with Octave. And once again, I felt a sense of genuine satisfaction and accomplishment once I finished it. Further, this course taught me how multi-class (one-vs-all) classification works; I admit I only really understood binary classification until I finished this week!
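The one-vs-all trick turns out to be wonderfully simple: train one binary classifier per class (each treating "its" class as the positive examples and everything else as negative), then predict whichever classifier is most confident. A toy Python/NumPy sketch of my understanding (names and cluster data are my own, not the course's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # clip avoids overflow

def train_logistic(X, y, alpha=0.5, iters=3000):
    """Binary logistic regression fitted by batch gradient descent (y in {0,1})."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha / len(y) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

def one_vs_all(X, y, num_classes):
    """Train one binary classifier per class."""
    return np.array([train_logistic(X, (y == k).astype(float))
                     for k in range(num_classes)])

def predict(all_theta, X):
    """Assign each example to the class whose classifier is most confident."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

# Three tiny, well-separated clusters (bias column first)
pts = np.array([[0, 0], [0.3, 0], [0, 0.3],      # class 0
                [2, 0], [2.3, 0], [2, 0.3],      # class 1
                [0, 2], [0, 2.3], [0.3, 2]])     # class 2
X = np.column_stack([np.ones(len(pts)), pts])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
all_theta = one_vs_all(X, y, 3)
```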
Weeks 4 and 5
These were my favorite two weeks of the course. Neural network time, baby! As you can probably guess from the fact that two weeks are dedicated to this, there is a fair amount to go through. One thing I really liked throughout was Andrew's honesty. Optimising the network parameters requires a technique called backpropagation, which is not a simple or intuitive concept. Andrew walks you through it at a good pace, and wraps up by admitting that it was a concept that took him a while to grasp. What I love about his teaching style is that he is never trying to show off; he just wants to share his knowledge in the most effective way possible.
The Octave assignments this week are (in my opinion) as hard as they get. If you completed these, you will be fine for the rest of the course. As the concepts of neural networks require a bit of thought, it is great training both for algorithm writing skills and Octave practice. I can't really describe how cool it is having the opportunity to hand-write your own neural network classifier. I actually extended these ideas and wrote my own R package, where the user can define the number of layers and units per layer. I got a classification accuracy of 96.8% on the Kaggle digits competition with it! Everything I know about neural networks I learnt here. Loved this topic. I suppose my next challenge is to extend these ideas to regression...
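To give a flavour of what the assignment has you build: a forward pass through the layers, then backpropagation of the output error to compute the gradients. This is my own illustrative Python/NumPy sketch (XOR stands in for the course's digit data, and all the names are mine), not the course code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def add_bias(A):
    return np.hstack([np.ones((A.shape[0], 1)), A])

# XOR is not linearly separable, so it genuinely needs the hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(3, 4))   # (2 inputs + bias) -> 4 hidden units
W2 = rng.normal(scale=1.0, size=(5, 1))   # (4 hidden + bias) -> 1 output

alpha = 1.0
for _ in range(10000):
    # Forward propagation
    A1 = add_bias(X)
    A2 = add_bias(sigmoid(A1 @ W1))
    A3 = sigmoid(A2 @ W2)
    # Backpropagation: output delta, then hidden delta (bias column dropped)
    d3 = A3 - y
    d2 = (d3 @ W2.T)[:, 1:] * A2[:, 1:] * (1 - A2[:, 1:])
    W2 -= alpha / len(X) * (A2.T @ d3)
    W1 -= alpha / len(X) * (A1.T @ d2)

loss = -np.mean(y * np.log(A3) + (1 - y) * np.log(1 - A3))
preds = (A3 > 0.5).astype(int)
```

The bias-column bookkeeping in `d2` is, in my experience, exactly where the assignment's dimension errors come from!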
This week was great. Andrew focusses on advice for applying learning algorithms, touching on the bias/variance trade-off, data preprocessing, and the suitability of the learning algorithms covered so far for different applications. Once again, I found that the bias/variance trade-off was something I 'knew' but didn't fully understand, so it was great to get proper insight into it.
As I do most of my 'applied' machine learning with the (excellent) R caret package, I already have a framework in place for training, testing and cross-validation; nonetheless, the concept of learning curves was new to me, and I still benefited a lot from Andrew's advice and insight this week.
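Learning curves turn out to be very simple to sketch: fit the model on the first m training examples for increasing m, and record both the training error and the error on a fixed validation set. My own rough Python/NumPy illustration (made-up names, linear regression for simplicity):

```python
import numpy as np

def half_mse(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2) / 2)

def learning_curve(X_tr, y_tr, X_val, y_val):
    """Fit on the first m examples for m = 1..len(train); record both errors."""
    train_err, val_err = [], []
    for m in range(1, len(y_tr) + 1):
        theta, *_ = np.linalg.lstsq(X_tr[:m], y_tr[:m], rcond=None)
        train_err.append(half_mse(theta, X_tr[:m], y_tr[:m]))
        val_err.append(half_mse(theta, X_val, y_val))
    return train_err, val_err

rng = np.random.default_rng(1)
x_tr = np.linspace(0, 10, 20)
x_val = np.linspace(0.25, 9.75, 20)
make = lambda x: np.column_stack([np.ones_like(x), x])
y_tr = 3 * x_tr + 2 + rng.normal(0, 0.1, 20)
y_val = 3 * x_val + 2 + rng.normal(0, 0.1, 20)
train_err, val_err = learning_curve(make(x_tr), y_tr, make(x_val), y_val)
```

Training error creeps up as m grows while validation error falls; two curves converging at a high value signal high bias, and a persistent gap between them signals high variance.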
I was really looking forward to this week, and didn't quite get what I expected. Not necessarily a bad thing, though! I was hoping for a chance to code an SVM from scratch, but the lesson at the end of the day is that they are hard. If you want to use one, chances are you are better off relying on a package that someone far smarter and better at coding and maths than you has already written.
Andrew focusses on the concept of the support vector machine for classification, and then moves on to discuss the radial (or Gaussian) kernel at length. The Octave assignment gives you a chance to apply an SVM implementation from a library. I got a lot from the discussion of how the SVM works, but at the same time felt a little disappointed that I didn't get to implement my own. However, I guess that this exercise is held back for good reason: it's technically challenging, and without dedicating serious effort and time, chances are any implementation I could write wouldn't even start to compare to the various off-the-shelf libraries.
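The Gaussian kernel itself, at least, is easy to write down: it is just a similarity measure between two points that equals 1 when they coincide and decays towards 0 as they move apart, with sigma controlling how quickly. A quick Python/NumPy sketch (my naming, not the course's):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """RBF similarity: exp(-||x1 - x2||^2 / (2 sigma^2)).
    Equals 1 for identical points and decays towards 0 with distance."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2)))

k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
k_near = gaussian_kernel([1.0, 2.0], [1.5, 2.0])
k_far = gaussian_kernel([1.0, 2.0], [8.0, 9.0])
```

The SVM then works with these similarities to landmark points instead of the raw features, which is what lets a linear method learn wiggly decision boundaries.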
The focus of this week moves on from supervised learning. Andrew introduces k-means clustering for unsupervised learning, and principal component analysis (PCA) for dimensionality reduction.
Despite having written my own k-means clustering algorithm months ago, I found the discussion of k-means useful: it filled in some gaps in my knowledge and covered applications I hadn't considered.
The discussion of PCA was really useful too: once again I had encountered the topic several times before, yet Andrew's discussion hit home a few points I had not considered. I generally apply PCA for visualisation rather than pre-processing, mainly because I rarely deal with data sets large enough to warrant compression. The Kaggle digits competition is a case where compression could be useful, but I actually used a different technique for eliminating redundant predictive inputs when I tackled that challenge.
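One point that stuck with me: the projection and reconstruction are each just a matrix multiply once you have the components. Here is my own Python/NumPy sketch of PCA via the SVD (the course derives it through the covariance matrix; the SVD of the centred data gives the same components):

```python
import numpy as np

def pca(X, k):
    """Top-k principal components via SVD of the mean-centred data.
    Returns the projected data (scores), the components, and the mean."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T, Vt[:k], mu

# Points lying exactly on a line in 2-D compress to 1-D losslessly
X = np.array([[0, 0], [1, 2], [2, 4], [3, 6]], dtype=float)
Z, components, mu = pca(X, 1)
X_recon = Z @ components + mu   # project back up to the original space
```

For visualisation you simply keep the first two or three columns of `Z`; for compression you keep enough components to retain, say, 99% of the variance.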
This week was really interesting, introducing two topics I had never implemented before. My only criticism is that I would have liked an entire week for each topic!
Firstly, Andrew introduces anomaly detection. My understanding is that the approach is similar to Naive Bayes, modelling probabilities as the product of multiple Gaussian distributions. The discussion evolves to the multivariate Gaussian distribution; that is, allowing the model to accommodate correlations between predictive inputs. Overall, it was a great introduction with a really useful Octave assignment to hit the points home.
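As I understand it, the basic (independent-features) version is only a few lines: fit a Gaussian to each feature on normal examples, multiply the per-feature densities together, and flag anything whose density falls below a threshold epsilon. A toy Python/NumPy sketch of that understanding (my own names and data):

```python
import numpy as np

def fit_gaussians(X):
    """Per-feature mean and variance: the independent-Gaussian model."""
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    """p(x) as the product of univariate Gaussian densities over the features."""
    p = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)

# Fit on 'normal' examples; anything with density below epsilon is an anomaly
normal = np.array([[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]])
mu, var = fit_gaussians(normal)
p_normal = density(normal, mu, var)
p_outlier = density(np.array([[1.0, 1.0]]), mu, var)
```

The multivariate version replaces the product of univariate densities with a single multivariate Gaussian whose full covariance matrix captures correlations between the features.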
The section on recommender systems was fascinating. Andrew introduces collaborative filtering in an accessible manner, and the assignment focuses on a recommender system for movie ratings. I enjoyed this so much that I think I will tackle a MOOC focussing entirely on recommender systems next. Like I said: my only criticism of this week was that I would have loved to delve deeper into both topics!
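The part of collaborative filtering that clicked for me is that the movie features and the user preferences are learned simultaneously, by gradient descent on the error over the observed ratings only. My own toy Python/NumPy sketch of one such gradient step (names and random data are illustrative, not the course code):

```python
import numpy as np

def cofi_step(Y, R, F, Theta, alpha=0.01, lam=0.0):
    """One gradient-descent step of collaborative filtering.
    Y: ratings (movies x users); R: 1 where a rating exists;
    F: movie feature vectors; Theta: user preference vectors."""
    E = (F @ Theta.T - Y) * R                       # errors on observed ratings only
    F_new = F - alpha * (E @ Theta + lam * F)       # update movie features...
    Theta_new = Theta - alpha * (E.T @ F + lam * Theta)  # ...and user preferences
    return F_new, Theta_new

rng = np.random.default_rng(0)
Y = rng.integers(1, 6, size=(6, 4)).astype(float)    # 6 movies, 4 users, ratings 1-5
R = (rng.uniform(size=Y.shape) < 0.8).astype(float)  # ~80% of ratings observed
F = rng.normal(scale=0.1, size=(6, 2))
Theta = rng.normal(scale=0.1, size=(4, 2))

err_start = float(np.sum(((F @ Theta.T - Y) * R) ** 2))
for _ in range(2000):
    F, Theta = cofi_step(Y, R, F, Theta)
err_end = float(np.sum(((F @ Theta.T - Y) * R) ** 2))
```

Predictions for the unobserved entries of `Y` then come straight from `F @ Theta.T`.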
I got the feeling this week that the course was starting to wrap up. There are no Octave assignments (by this time I really enjoyed them, so this was a little sad), but for good reason, as Andrew delivers an introduction to concepts relating to 'big data' machine learning, including stochastic gradient descent and MapReduce. These topics require specialist architecture to apply (such as HPC clusters or cloud computing), so it is somewhat understandable that there is no programming assignment! There is only so much you can learn from the general concepts without actually having a go at implementing them, but nonetheless Andrew gives enough of an introduction that you are aware of the concepts, can talk about them intelligently, and can recognise where they may be appropriate to use.
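That said, the core idea of stochastic gradient descent is easy to show even without big-data infrastructure: update the parameters after every single example, instead of summing the gradient over the whole data set first. My own toy Python/NumPy sketch for linear regression (names and data made up):

```python
import numpy as np

def sgd_linear(X, y, alpha=0.01, epochs=50, seed=0):
    """Stochastic gradient descent for linear regression: the parameters
    are updated after every single example, so one pass over the data
    only ever needs one example in memory at a time."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):        # shuffle each pass
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

# Noiseless toy data: y = 1 + 2*x, bias column first
x = np.linspace(0, 3, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta = sgd_linear(X, y)
```

The appeal for huge data sets is exactly that per-example update: you can stream the data from disk, and MapReduce-style setups parallelise the gradient computation across machines instead.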
The 11th and final week. Andrew steps through the pipeline of designing a photo OCR application. As well as a brief description of the process, the concept of ceiling analysis is introduced. This was hugely educational for me; anyone who designs applications or products that rely on machine learning techniques would benefit hugely from even a brief introduction to this concept. Andrew's cautionary tale of researchers dedicating 18 months to background subtraction, which had only a marginal effect on the performance of an image recognition system, is a lesson everyone should take something from.
The final summary and thank-you video had me feeling almost a little emotional. It was here that it really sunk in just how much I had learnt, and how good the course was. Even though I was learning remotely through a series of recorded lectures, I realised just how engaging Andrew had made the course, and how good a learning experience it was. In these 11 weeks Andrew shared a huge amount of knowledge, and set up anyone who followed the course through with a solid base of understanding and experience to build upon.
I learnt a huge amount from this course, and my main regrets are that it took me almost a year to get around to doing it, and that it is now over! I really feel it has enhanced my appetite for investigating, understanding, and applying learning algorithms. There are a number of techniques that weren't covered, like convolutional neural networks (and other 'deep learning' techniques), trees, rules, boosting, and model ensembles, which I feel motivated to go and explore by myself. I think it would be a great exercise to try to implement my own CART tree, for instance.
I am also really keen to investigate some of the other topics in more detail. My expertise lies mainly in supervised learning; however I found recommender systems to be fascinating, and an area that I would love to revisit in more detail. Well, it seems there are a couple of MOOCs for that... so I guess that is my next stop!
TL;DR: Really, really fantastic course. If you are interested in machine learning beyond pulling libraries off a shelf, I cannot recommend this course highly enough.