30 Sep 2015

Hadley Wickham's R Packages

I tracked down this book after attending the 2015 London EARL conference. I was aware of the basic ideas behind making my own R packages, but had never found a situation that warranted working through the entire process. However, at the conference the advantages of working with packages were made abundantly clear: they are easy to share, documented and version controlled (should you be working sensibly!). Even if you don't fancy making a package to upload to CRAN, I have heard arguments that you should make a package to accompany a data analysis anyway. It keeps functions neat and the workflow logical, and more importantly makes collaborating with others, or with future you, much easier. Further, packages play an essential role in reproducible coding.

I completed the Johns Hopkins Data Science course on Data Products before reading this book, and I would suggest that you do too. Hadley is a great teacher and communicator, but having a worked example to put everything in context will be very beneficial before you get stuck in!

Also, don't worry about buying the book if you are saving your pennies. It's available online here (provided by the author, not some dodgy Russian site!).

Overview

Hadley structures the book very logically. In part 1, we get introduced to the why and what of R packages. It's interesting and compelling, and the author keeps the text accessible to anyone who is interested, rather than making it so cryptic that you would need a PhD in computer science to get anything out of it.

Part 2 is essentially the bread and butter of this book, looking at package components. I won't list the contents back to you (as you will find the book far more interesting than me). I think the only part I glossed over was the section on vignettes. I'm quite happy using Roxygen to comment up my functions until the day comes when I have a fantastic package to release on CRAN; I'll get stuck in with vignettes then.
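For anyone who has not seen Roxygen comments before, they look roughly like the sketch below. The function rescale01 is a made-up example for illustration, not one taken from the book; the @param, @return, @export and @examples tags are standard roxygen2 tags.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @return A numeric vector the same length as \code{x}.
#' @export
#' @examples
#' rescale01(c(0, 5, 10))
rescale01 <- function(x) {
  # Hypothetical helper: shift by the minimum, then divide by the spread
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```

Running the package's documentation step then turns these comments into the help page for the function.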

The book is pure gold for its discussion of the S4 system. Up until I read this book I would always get annoying side effects with S4 classes. They do not like sitting in the Global Environment, and I found that I would always get error messages popping up (which didn't do anything other than annoy me). Your S4 class definitions, generics and methods need to sit in packages! I will later blog about a couple of examples I have been working on... just to prove how easy it is!
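As a minimal sketch of the three pieces mentioned above (a class definition, a generic and a method), something like the following would sit in a file under a package's R/ directory. The Account class and the deposit generic are hypothetical names chosen for illustration; only setClass, setGeneric, setMethod, validObject and new come from the methods package.

```r
library(methods)

# Hypothetical class for illustration: an account with an owner and a balance
setClass(
  "Account",
  slots = c(owner = "character", balance = "numeric"),
  validity = function(object) {
    if (length(object@balance) != 1 || object@balance < 0) {
      "balance must be a single non-negative number"
    } else {
      TRUE
    }
  }
)

# The generic declares the verb; methods supply the behaviour per class
setGeneric("deposit", function(account, amount) standardGeneric("deposit"))

setMethod("deposit", "Account", function(account, amount) {
  account@balance <- account@balance + amount
  validObject(account)  # re-check the class invariant after modifying a slot
  account
})

acc <- new("Account", owner = "Ann", balance = 10)
acc <- deposit(acc, 5)
```

In a real package you would also export the class and generic through the NAMESPACE so that users can call them.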

Another fantastic section is the one on compiled code. C++ code is soooooo easy to incorporate into R packages. Have a read of Hadley's section on Rcpp in his other book, become amazed that it is so easy to include compiled C++ code in your package, and then praise Dirk Eddelbuettel.

The section on testing is one that is VITAL to read, even if you are in the mood to gloss over things. A developer friend of mine always seemed somewhat amazed that I never used tests in my academic days... in fact I didn't use many sensible things such as git either. I put it down to having too much maths to do! But I am repentant, and grateful to this book for demonstrating just how easy it is to write tests for your R package, and to protect yourself from breaking something and not realising until much later down the line.
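To give a flavour of how little is involved, here is a small sketch using the testthat package. The clamp function is a hypothetical example and the file locations in the comments are the usual package conventions; test_that, expect_equal and expect_error are real testthat functions.

```r
library(testthat)

# Hypothetical function; in a package it would live in R/clamp.R
clamp <- function(x, lower, upper) {
  if (lower > upper) stop("lower must not exceed upper")
  pmin(pmax(x, lower), upper)
}

# In a package this block would live in tests/testthat/test-clamp.R
test_that("clamp keeps values inside the bounds", {
  expect_equal(clamp(c(-1, 0.5, 2), 0, 1), c(0, 0.5, 1))
  expect_error(clamp(1, lower = 2, upper = 1), "lower must not exceed upper")
})
```

If a later change to clamp breaks either expectation, the test fails straight away rather than much later down the line.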

The final section on Best Practices is a really helpful one. I'm going to admit that I'm a bit snooty, and far prefer to do my version control via git from the command line rather than using RStudio's interface, but that is my personal preference.

I glossed over the section on automated checking and package release a bit, primarily because I don't have anything at present that I deem worthy of submitting to CRAN. Hopefully this will change soon, and it should be easy to do, thanks to having read this book!

Other Thoughts

Overall, this book is really worth reading. The level of Hadley-worship in the community struck me as a little weird when I first started learning R... but he proves time and time again that he deserves that level of respect. I certainly find anything with his name on it worth investigating, and chances are you will learn a lot from it.

Since reading this book and making a few example packages, I feel pretty confident in the process and have several on my GitHub. Should the day come that I want to formally prepare one for submission to CRAN, I know that I will be able to turn to this book for any advice and best practice that I need.

TL;DR: you should be able to make R packages if you use R. This book will tell you how, although I suggest looking at a worked tutorial first to put everything into context.