Coursera Mathematical Biostatistics 1

1 Sep 2015

I signed up for this course after completing the Statistical Inference module of the Johns Hopkins Data Science Specialisation on Coursera. I had really enjoyed that course, and it exposed some gaps in my knowledge. I therefore decided to embark on Brian Caffo's Mathematical Biostatistics Boot Camps.

A quick comment about the instructor: he is like Marmite, you will either love him or hate him (pardon the analogy, non-British readers). The forums are full of people either singing his praises or declaring the course to be rubbish. I think you need to appreciate what this course is: Brian is teaching statistics, NOT just how to plug numbers into R to get statistical results. Now correct me if I am wrong, but statistics is a branch of mathematics, so if you actually want to understand statistics, you can't avoid the mathematics. I feel that the critics of this course fall into the camp of people who think statistics is easy because it's just plugging numbers into statistical software. If that is what you are after, you will not enjoy this course.

If you fall into the other camp, you will love this course. Brian Caffo has a unique teaching style, a very dry sense of humour, and he radiates a geeky love and enthusiasm for his subject that makes the course very compelling to follow. I won't lie, I found this course challenging, but then again Brian does mention that by the end of Biostats Boot Camp 2 you will be working at the level of introductory Master's courses in biostatistics.

Overview

The course breaks down into four modules, each spanning a couple of weeks, with practice questions (and worked answers) and graded homework questions. Brian uses the R statistical programming language throughout for plotting, and for calculations where necessary, so it will help if you have a little prior experience. My strategy was to always do the practice questions before attempting the quizzes; the worked solutions proved invaluable. As a rough measure of difficulty, with the graded assignments I typically got around 70% on my first attempt, 80-90% on the second attempt after pondering the questions I got wrong, and 90-100% on the third attempt by making educated guesses.

Module 1

The first module essentially serves as a warm-up, and gets everybody onto the same page. For me, this meant brushing up on a lot of very rusty basics. I actually found the homework for this module the hardest of the course. In fact, somebody's theory on the course forums was that this may be deliberate, to weed out people unlikely to see the rest of the course through. I am not sure the instructors would do this deliberately, but the course definitely began to gather momentum after the first module.

Module 2

This module looks at conditional probability, Bayes' rule, likelihood, distributions and asymptotics. I had done a little on asymptotics, Bayes' rule and distributions many years ago as an undergraduate. Likelihood was a topic I was really glad to revisit and spend some time on, as it is such a key concept (for example, it crops up all the time if you are interested in the theory behind machine learning algorithms).

Module 3

Module 3 is a bit R-specific in places, namely for plotting. If you don't like using R, you will probably not like this module... but it's good to learn new things! The main focus of the module is confidence intervals, yet again such a key concept in statistics that I was happy to revisit it. The introduction to bootstrapping was really nice as well: Brian keeps the discussion alive with examples, rather than throwing you in at the deep end with the maths. I think I'm quite happy to be in the shallow end for the time being, having flicked through some more advanced statistics books... Nonetheless, I feel that I understand the principle well enough to apply resampling in machine learning.
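For readers who have not met bootstrapping before, here is a minimal sketch of the idea as I understand it: resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the percentiles of those recomputed values. This is my own illustration, not code from the course (which uses R); I have written it in Python using only the standard library, and the function name `bootstrap_ci` and the sample data are made up for the example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 percentiles of the estimates.
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Made-up measurements, purely for illustration.
sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 4.4, 6.2]
lower, upper = bootstrap_ci(sample)
print(f"Approximate 95% bootstrap interval for the median: ({lower}, {upper})")
```

The percentile method used here is the simplest variant; other constructions exist, and with a sample this small the interval should be taken as a rough guide only.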

Module 4

This was the only module that I felt was a little rushed. Perhaps it's because only one week was allocated for it, and perhaps it's because I was coming to the end of a long course. I enjoyed following the lectures, but I didn't really feel that the lessons hit home as well as they could have. I suppose the thing I found most interesting was the Bayesian analysis, but I think a little more time could have been spent on the topic.

Other thoughts

I really liked this course. It has done me a world of good to brush up on statistics, and I think the instructor was fantastic. I wouldn't claim to be an expert on statistics, but having worked through this course I feel I have a much broader knowledge and a much more solid base level of understanding (even if I would have to refer back to my notes to apply something in practice!). I am really looking forward to Mathematical Biostatistics Boot Camp 2, and seeing what that course has to offer.

TL;DR: if you have a reasonable mathematical background and would like a course to brush up and expand your knowledge of statistics, I really recommend this. If you just like plugging numbers into software to churn out statistics, and don't really care how it does it, you should probably look elsewhere.