24 August 2015

Hadley Wickham's Advanced R

I suppose this post is a discussion of my first pass through Advanced R; I will certainly be revisiting several sections in greater detail. I worked through the book on my lunch breaks over the course of a couple of months, and thoroughly enjoyed it. I found that it follows on at an appropriate level from the Johns Hopkins Data Science Specialization on Coursera, and the material you learn will definitely start to move you up through the ranks of competency as an R programmer.

Overview

Hadley nicely breaks the book down into several sections: foundations, functional programming, metaprogramming and performant code. At my level, the foundations section served really well to fill in the gaps in my knowledge: things I was used to doing but had picked up without fully appreciating what was going on, using programming idioms by force of habit rather than from real understanding.

One key part that I have definitely shared with others who are fairly competent at R programming is the style section. Programmers used only to R are notorious for their abuse of '.' notation, which grinds the gears of anyone who is used to object oriented programming, where a dot usually means member access. For example, anyone who names a variable input.data will probably annoy and confuse people. Hadley's style guide is good, and when you work with others on projects it is key to agree on a style of coding before you find yourselves up in arms with one another.
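To make the complaint about dots a little more concrete, here is a minimal sketch of why they can mislead even within R itself: S3 methods are named generic.class, so a name like input.data reads as if it were the method of a generic called input for a class called data. The generic describe, the class "data" and the variable input_data below are all made up for illustration and are not taken from the book.

```r
# A hypothetical S3 generic with two methods, named using the generic.class convention.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.data <- function(x, ...) "an object of class 'data'"

# An underscore-separated name cannot be mistaken for an S3 method.
input_data <- structure(list(values = 1:3), class = "data")

describe(input_data)  # dispatches to describe.data
describe(1:3)         # falls back to describe.default
```

With that convention in mind, an ordinary variable called input.data looks like it ought to be a method, which is exactly the sort of ambiguity an agreed style guide avoids.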

The O-O field guide was perhaps disappointingly short, and I'm going to have to rely on other resources to get to grips with the S4 system. However, the author does make it clear that this is a large topic in itself, and that only a quick summary will be provided.

Moving on to functional programming, I think this is the most important section of the book. To get competent at R programming, you need to be competent at functional programming. Hadley's introduction to the *apply family and functionals is outstanding. He moves on to function operators after a few chapters. This was a new concept to me. I found it fascinating, and it is definitely a topic I will revisit so that I can comfortably incorporate the techniques in my day to day coding.
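As a rough illustration of the two ideas (my own toy example, not one from the book): a functional takes a function as an argument and applies it for you, while a function operator takes a function and returns a modified function. The helper names not, is_even and is_odd are hypothetical.

```r
# Functionals: sapply() and vapply() take a function and apply it to each element.
squares <- sapply(1:5, function(x) x^2)
means <- vapply(list(a = 1:4, b = c(2, 4, 6)), mean, numeric(1))

# A function operator: takes a function and returns a new function.
# Here the new function returns the logical negation of the original's result.
not <- function(f) {
  force(f)  # evaluate f now so the returned closure captures it reliably
  function(...) !f(...)
}

is_even <- function(x) x %% 2 == 0
is_odd <- not(is_even)

odd_numbers <- Filter(is_odd, 1:10)  # keeps 1, 3, 5, 7, 9
```

Base R already ships a Negate() function that does the same job as not; writing it by hand just shows how little code a function operator needs.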

The section on metaprogramming definitely moves up a level of difficulty. If you have used R interactively, chances are you have taken advantage of metaprogramming, perhaps without even knowing it. I enjoyed working through the chapters and the exercises; however, coming from a background of programming in Fortran, the concept of metaprogramming was a bit alien. This was perhaps the only section where I was not entirely convinced of where and how I should be using these techniques, despite the excellent description of what is going on. This will change soon, as I plan to revisit this section and get more familiar with the concepts so that I can take full advantage of them in my code.
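For anyone wondering what this looks like in practice, here is a small sketch using only base R. It shows the two basic moves as I understand them: capturing the code a caller typed instead of its value, and evaluating a captured expression inside a data frame. The names label_of, measurements and condition are my own inventions for the example.

```r
# Capture the expression passed as an argument, rather than its value.
# The argument is never evaluated, so height and weight need not exist.
label_of <- function(x) deparse(substitute(x))
lab <- label_of(height + weight)  # "height + weight"

# Quote an expression now, evaluate it later with a data frame as the environment.
measurements <- data.frame(a = 1:5, b = c(10, 20, 30, 40, 50))
condition <- quote(a > 3)
rows <- eval(condition, measurements)  # 'a' is looked up inside the data frame
filtered <- measurements[rows, ]

# subset() relies on the same idea, which is why bare column names work in it.
same_as_subset <- subset(measurements, a > 3)
```

This is the kind of thing you use without noticing when you call subset() or get automatic axis labels from plot() at the console.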

The section on performant code is fantastic, with an outstanding introduction to Rcpp. Dirk Eddelbuettel's package allows seamless integration of C++ into R, allowing insane speedups without having to resort to (potentially ridiculous) cloud-based parallelisation of R. I learnt a bit of C++ a few years ago, and this has certainly motivated me to brush up to the standard where I can begin to incorporate my own high performance functions into my R code. I'll probably blog about that soon.

Hadley also goes through some other neat tricks and practices, such as stripping down functions to speed them up, and also discusses the process you should go through when trying to eliminate bottlenecks and optimise code. This stuff is, I suppose, fairly standard across programming languages, but the author is an excellent communicator and makes his points clear, concise, and interesting to read.
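The general process, as I read it, is to measure first, change one thing, and check that the answer has not changed. Below is a minimal sketch of that loop using base R's system.time(); the two functions are hypothetical examples of mine, and the timings will vary by machine, so I have not quoted any.

```r
# Two hypothetical implementations of the same task: the squares of 1..n.
slow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)  # grows the vector on every iteration, forcing repeated copies
  }
  out
}

fast_squares <- function(n) seq_len(n)^2  # vectorised: one allocation, no loop

n <- 10000

# Step 1: measure both versions rather than guessing where the time goes.
slow_time <- system.time(slow <- slow_squares(n))
fast_time <- system.time(fast <- fast_squares(n))

# Step 2: confirm the faster version still returns the same result.
stopifnot(isTRUE(all.equal(slow, fast)))

# Step 3: compare elapsed times (values depend on the machine).
print(rbind(slow = slow_time, fast = fast_time)[, "elapsed"])
```

For anything more serious than a toy like this, a proper profiler or benchmarking package gives far more reliable numbers than a single call to system.time().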

Throughout the book, Hadley provides many short exercises to test your knowledge, and I found it really beneficial to work through them. Most of the questions were great; however, on a couple of occasions I didn't really follow what Hadley was getting at and asking me to do. I think some worked solutions would have been really helpful... although I suppose that would have discouraged people from attempting them by themselves.

Other Thoughts

I think overall this book is fantastic. Hadley Wickham has become something of a celebrity in the R community, and this book is proof that it is for good reason. He really knows his stuff, and is an excellent author and teacher. I also quite liked that Hadley focuses more on teaching R programming than on showing off his own (many) packages.

TL;DR: If you are serious about becoming competent at R programming, read this book, and keep a copy on your desk.