1 Nov 2015

Coursera Mathematical Biostatistics 2

I decided to take this course after completing Brian Caffo's Mathematical Biostatistics Bootcamp 1. Assuming most people take a series in order, anyone else thinking of taking this course will have completed Bootcamp 1 too, and presumably enjoyed it and learnt a lot from it (otherwise why would you be thinking of taking this one?). So I won't dwell too much on my opinion of the instructor (fantastic, in a word).

### Overview

This course focusses mainly on categorical data analysis, apart from the first module, which covers hypothesis testing. Everything after that first module was new to me, so I really learnt a lot. Many of the concepts have proven really valuable and crop up time and time again in machine learning, so you will not regret investing the time required to complete it!

As with Biostatistics Bootcamp 1, this is a seven-week course covering four modules, and you need a solid mathematical background or you will struggle. The course is assessed by one quiz per module, each preceded by practice questions with worked solutions.

### Module 1

The course jumps straight in with hypothesis testing, confidence intervals, p-values and power, which was the only part of the course that was revision for me. Still, it provided a good warm-up (and demonstrated just how much I had forgotten on the subject!) and set the pace for the remainder of the course.

### Module 2

The course moves on in module two to binomial tests, relative risks and odds ratios. This sets up the remainder of the course nicely, since it focusses heavily on categorical data analysis. There is a brief discussion of the delta method, perhaps too brief, as its practical usage wasn't too clear to me after the lecture.
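For a concrete feel of where the delta method earns its keep, here is a minimal sketch in Python (not the R used in the course, and not taken from the course materials). It computes the relative risk and odds ratio for a 2x2 table, then uses the standard delta-method approximation for the standard error of the log odds ratio to build a 95% confidence interval. The function name and the counts are made up purely for illustration.

```python
import math


def two_by_two_summary(a, b, c, d, z=1.96):
    """Summarise a 2x2 table laid out as:

                  event   no event
    exposed         a        b
    unexposed       c        d
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)

    # Delta method: for g(p) = log(p / (1 - p)), Var(g(p_hat)) is roughly
    # g'(p)^2 * Var(p_hat) = 1 / (n * p * (1 - p)), estimated by 1/a + 1/b.
    # The two rows are independent, so their variances add.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Build the interval on the log scale, then transform back.
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return relative_risk, odds_ratio, se_log_or, ci


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed had the event.
rr, odds_ratio, se, (lower, upper) = two_by_two_summary(20, 80, 10, 90)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, SE(log OR) = {se:.3f}")
print(f"approximate 95% CI for the OR: ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log scale because the log odds ratio is much closer to normally distributed than the odds ratio itself.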

### Module 3

Here's where the really good stuff kicks in: Fisher's exact test, chi-squared tests, testing independence, equality of several proportions, and goodness-of-fit testing. This material was really well discussed, with really neat examples, some fairly amusing as well! Details for practical implementation in R are also given.
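To give a flavour of these tests, here is a rough sketch of both on a 2x2 table, written from scratch in Python rather than the R shown in the course. The function names are my own, the chi-squared version applies no continuity correction, and the two-sided Fisher p-value uses one common convention (summing the probabilities of all tables no more likely than the observed one). Treat it as an illustration, not the course's implementation.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shortcut formula for a 2x2 table, equivalent to sum((O - E)^2 / E).
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum over every table at most as likely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical counts, chosen only for illustration.
stat, p_chi = chi_squared_2x2(20, 80, 10, 90)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
print(f"chi-squared = {stat:.3f}, p = {p_chi:.4f}")
print(f"Fisher exact two-sided p = {p_fisher:.4f}")
```

Fisher's test is the one to reach for when the cell counts are small, as in the second table, where the chi-squared approximation is not trustworthy.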

### Module 4

Simpson's paradox is covered, and then the course moves on to case-control data and matched outcome tables. Finally, the module covers nonparametric tests. This last lecture is one where I think more worked examples would have been really beneficial. I finished the final lecture of the course feeling like I had received an information overload, and don't think I managed to digest all the information presented. Perhaps Brian should have rounded up to eight weeks, and taken his time a bit more for the last module.
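Simpson's paradox is easiest to appreciate with numbers in front of you. The sketch below is in Python and is not from the course. Its counts are patterned on the widely cited kidney-stone treatment example and should be treated as illustrative; the variable and function names are my own. Treatment A has the higher success rate within each stone size, yet B looks better once the groups are pooled.

```python
# (successes, patients) per treatment within each stratum; illustrative counts.
groups = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}


def success_rate(successes, patients):
    return successes / patients


def pooled_rate(treatment):
    """Success rate for a treatment after collapsing over stone size."""
    successes = sum(groups[g][treatment][0] for g in groups)
    patients = sum(groups[g][treatment][1] for g in groups)
    return success_rate(successes, patients)


for name, arms in groups.items():
    rate_a = success_rate(*arms["A"])
    rate_b = success_rate(*arms["B"])
    print(f"{name}: A = {rate_a:.1%}, B = {rate_b:.1%}")

print(f"pooled: A = {pooled_rate('A'):.1%}, B = {pooled_rate('B'):.1%}")
```

The reversal happens because A was used mostly on the harder large-stone cases, so stone size confounds the pooled comparison.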

### Other Thoughts

Overall, another fantastic course from Brian Caffo. I really enjoyed his teaching style, and think I took a lot from this course. I wouldn't claim by any stretch of the imagination to be an expert in statistics, but I think it's really crucial to get exposed to this stuff, and know what tools and techniques are available. Brian mentions that this stuff is entry-level Master's of Biostatistics material. If I ever had the time, I would love to complete a Master's in some form of statistics, and I think that is the best compliment I can give. This stuff is interesting, challenging, and rewarding to learn and understand.

*TL;DR: a great intermediate statistics course, which will familiarise you with many topics key to categorical data analysis and predictive statistical modelling, as well as other areas. If you want to gain a decent base level of understanding, I recommend this course. But be prepared for a challenge!*