Wednesday, April 9, 2014

Pursuing Data Science in the Cloud (is it really just about R?)

I'm currently enrolled in two MOOCs for learning data science: The Analytics Edge, offered by MITx, and the Johns Hopkins Data Science Specialization.

Let me reveal my biases. First, I vastly prefer The Analytics Edge, the MITx offering, though I don't dislike the Johns Hopkins Data Science Specialization. The Analytics Edge is very intellectually engaging, while the Johns Hopkins specialization focuses on building a professional portfolio that can help you advance your career. In particular, the Johns Hopkins specialization adds an element of social networking through Git and GitHub that is entirely absent from The Analytics Edge.

You need both, so I'm taking both.

Second, both offerings use the statistical programming language R as their centerpiece. I have used R extensively in the past, and in the few years since I last worked with it, the surrounding tools have improved considerably. In particular, there is now RStudio, a much friendlier environment for editing and interactive debugging, along with courses like the two I mentioned. You can now see a real practice-based community developing around the language, smoothing the rough edges it was born with as an academic brainchild.

Now for the kicker: data science cannot just be about programming or understanding analytical paradigms for solving problems. I'm taking these two courses to gauge the current zeitgeist. But I think the real value in data science comes from understanding the problem in its natural context and working forward from there. In other words, the critical step is figuring out what the problem actually is in the first place. That step is absent from both courses, and I think there is an opportunity there.