26 Feb 2015

Coursera Managing Big Data with MySQL

I have dabbled with SQL on a number of occasions, for example in Bill Howe's Big Data Systems and Algorithms course and Charles Severance's Python series on Coursera. However, these courses mainly focused on the principles of relational algebra or the practical usage of databases, rather than providing practice with 'real world' databases.

This is exactly what this course provides, and I can really appreciate the amount of effort that went into it. The concepts are taught with a MySQL database, and the assignments focus on extracting information from a Teradata system. I'm really of the opinion that anyone can learn SQL, regardless of their programming background: all you need is practice. And you certainly get a lot of that with this course!

Week 1

Week 1 consists of a very gentle introduction, useful if you have never seen databases before (indeed, the specialisation emphasises the transition from Excel to MySQL). I found it to be a useful refresher. Having had some previous experience with SQL, I worked through it fairly quickly without any issues.

Week 2

Week 2 is where things start to get interesting. The course works around two datasets: one MySQL database filled with data from a company peddling a product that measures your pet dog's intelligence (Dognition), and one Teradata database containing one year's transaction data, as well as supplementary information, from a large US department store chain (Dillards). The instructors provide Jupyter (IPython) notebooks for the MySQL exercises, which are accompanied by detailed step-by-step instructions. The Teradata exercises offer less support, essentially testing your understanding and leading on to the graded quizzes.

The exercises in week 2 are trivial if you have ever used SQL before (simple selection and aggregation operations), but necessary to build up to the more complex material in the coming weeks.
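For anyone who has not seen these operations, here is a minimal sketch of what "selection" and "aggregation" mean in practice. It is not taken from the course: the `transactions` table and its rows are made up for illustration, and Python's built-in `sqlite3` module stands in for the MySQL and Teradata systems the course actually uses.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only
# (not the course's Dillards schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store INTEGER, sku INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 100, 10.0), (1, 101, 20.0), (2, 100, 5.0), (2, 102, 15.0), (2, 103, 40.0)],
)

# Selection: keep only the rows that satisfy the WHERE condition.
big_sales = conn.execute(
    "SELECT store, sku, amount FROM transactions "
    "WHERE amount >= 15 ORDER BY amount"
).fetchall()

# Aggregation: collapse the rows into one summary row per store.
store_totals = conn.execute(
    "SELECT store, COUNT(*) AS n_rows, SUM(amount) AS total "
    "FROM transactions GROUP BY store ORDER BY store"
).fetchall()

print(big_sales)
print(store_totals)
```

The SQL itself is deliberately plain, so the same two statements should carry over to MySQL with little or no change.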

Week 3

The major topic of week 3 is joins, a crucial concept when working with relational databases. The exercises started to get a little more challenging this week, and I finally felt like I was having to work. I can't emphasise enough how useful it was to finally get a significant amount of practice writing queries that required joins. I definitely felt like the course was starting to help resolve my lack of experience working with relational databases.
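To give a flavour of why joins take practice, here is a small sketch contrasting an inner join with a left join. The `dogs` and `tests` tables are hypothetical (loosely inspired by the Dognition theme, but not the course's real schema), and SQLite via Python's `sqlite3` module is used purely as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: dog 3 has no rows in `tests`.
conn.executescript("""
CREATE TABLE dogs  (dog_id INTEGER PRIMARY KEY, breed TEXT);
CREATE TABLE tests (test_id INTEGER PRIMARY KEY, dog_id INTEGER, test_name TEXT);
INSERT INTO dogs  VALUES (1, 'Beagle'), (2, 'Poodle'), (3, 'Collie');
INSERT INTO tests VALUES (10, 1, 'memory'), (11, 1, 'cunning'), (12, 2, 'memory');
""")

# INNER JOIN: only dogs with at least one matching test appear.
inner_rows = conn.execute("""
SELECT d.breed, COUNT(t.test_id) AS n_tests
FROM dogs d
JOIN tests t ON t.dog_id = d.dog_id
GROUP BY d.dog_id, d.breed
ORDER BY d.dog_id
""").fetchall()

# LEFT JOIN: every dog appears; unmatched dogs get NULLs, so COUNT gives 0.
left_rows = conn.execute("""
SELECT d.breed, COUNT(t.test_id) AS n_tests
FROM dogs d
LEFT JOIN tests t ON t.dog_id = d.dog_id
GROUP BY d.dog_id, d.breed
ORDER BY d.dog_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The only difference between the two queries is the join type, yet the Collie silently disappears from the inner join result. That kind of quiet row loss is what makes hands-on practice with joins so valuable.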

Week 4

Week 4 continues from joins to the next perceived level of difficulty: CASE statements and subqueries. I found the concepts themselves rather straightforward, but I will reiterate my opinion that even with simple concepts, you need practice to become competent at writing queries. I did have a few frustrations at times when I felt that the MySQL practice questions were not as clear as they could be, but then again I guess that is a common problem when people ask you to extract data for them. Overall, I found the MySQL exercises really useful, and it was helpful to have example solutions to check my work against.
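As a rough illustration of the two ideas, the sketch below uses a subquery to compute per-store totals and a CASE expression to label them, followed by a scalar subquery in a WHERE clause. The `sales` table, the thresholds, and the band names are all assumptions made up for this example, and SQLite (through Python's `sqlite3`) again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, invented for illustration only.
conn.executescript("""
CREATE TABLE sales (store INTEGER, amount REAL);
INSERT INTO sales VALUES (1, 10.0), (1, 15.0), (2, 50.0), (3, 5.0), (3, 5.0);
""")

# The subquery in FROM builds per-store totals; CASE assigns each a label.
labelled = conn.execute("""
SELECT store,
       total,
       CASE
           WHEN total >= 40 THEN 'high'
           WHEN total >= 20 THEN 'medium'
           ELSE 'low'
       END AS band
FROM (SELECT store, SUM(amount) AS total
      FROM sales
      GROUP BY store) AS per_store
ORDER BY store
""").fetchall()

# A scalar subquery in WHERE: rows above the overall average amount.
above_average = conn.execute("""
SELECT store, amount
FROM sales
WHERE amount > (SELECT AVG(amount) FROM sales)
""").fetchall()

print(labelled)
print(above_average)
```

Neither idea is hard on its own; the difficulty in the exercises comes from combining them with joins and grouping in a single query.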

Week 5

Well, week 5 was crunch time. The final set of Teradata exercises was pretty challenging, requiring fairly complex queries that drew on the material taught over the previous weeks. It certainly took a few hours to complete, which I guess is why the entire week was allocated to the final exercise.

I did find it a little annoying that the Teradata practice questions didn't have any solutions, as it was sometimes difficult to tell whether you were on the right track. I don't think example queries would be needed, just the answer (i.e. 'your query should return X rows', or 'the highest average sales should be Y').

Nonetheless, it felt fairly rewarding by the time I had finally worked through the exercises, and I definitely feel like my competency in writing SQL queries has increased significantly.

Other Thoughts

One unfortunate source of inconvenience was the frequent Teradata outages. To date, this is the only course I have been unable to finish by the official 'finish' date, purely because there were outages on a number of occasions during the time I had set aside for learning. On the other hand, I cannot even begin to comprehend how much work went into setting up this course, especially given that over 10,000 learners were accessing the database. I think the outages were forgivable, but I would hope that in future the course organisers manage to increase the reliability of the system.

TL;DR: This MOOC will give you an insane amount of practice writing SQL queries. It may not have been the most fun I have ever had, but the experience I gained from working through the exercises was invaluable.