The difficulties of moving from Python to R

This post is in response to: Python, Machine Learning, and Language Wars, by Sebastian Raschka

As someone who’s switched from Ruby to Python (because the latter is far easier to teach, IMO) and who has also put significant time into learning R just to use ggplot2, I was really surprised at the lack of relevant Google results for “switching from python to r” – or similarly phrased queries. In fact, that particular query brings up more results about going from R to Python, e.g. “Python Displacing R as The Programming Language For Data”. The use of R is so ubiquitous in academia (and in the wild, ggplot2 tends to wow nearly on the same level as D3) that I had just assumed a fair number of Python/Ruby developers had tried jumping into R. Apparently not…minimaxir’s guides are the only comprehensive how-to-do-R-as-written-by-an-outsider guides I’ve seen on the web.

By far, the most common shift seems to be Raschka’s – going from R to Python:

Well, I guess it’s no big secret that I was an R person once. I even wrote a book about it… So, how can I summarize my feelings about R? I am not exactly sure where this quote comes from – I picked it up from someone somewhere some time ago – but it is great for explaining the difference between R and Python: “R is a programming language developed by statisticians for statisticians; Python was developed by a computer scientist, and it can be used by programmers to apply statistical techniques.” Part of the message is that both R and Python are similarly capable for “data science” tasks; however, the Python syntax simply feels more natural to me – it’s a personal taste.

That said, one of the things I’ve appreciated about R is how it “just works”…I usually install R through Homebrew, but installing RStudio via point and click is also straightforward. I can see why that’s a huge appeal for both beginners and people who want to do computation but not necessarily become developers. Hell, I’ve been struggling for what feels like months to do even the most rudimentary GIS work in Python 3. But in just a couple of weeks of learning R – and leveraging however it manages to package GDAL and all its other geospatial dependencies with rgdal – I’ve been able to create some decent geospatial visualizations (and queries).

I’m actually enjoying plotting with Matplotlib and seaborn, but it’s hard to beat the elegance of ggplot2 – it’s worth learning R just to be able to read and better understand Wickham’s ggplot2 book and its explanation of the “Grammar of Graphics”. And no other language has anything quite like ggmap.

Also, I used to hate how <- was used for assignment. Now, that’s one of the things I miss most about using R. I’ve grown up with single-equals-sign assignment in every other language I’ve learned, but after having to teach some programming, I’ve found that the difference between == and = is a common and often hugely stumping error for beginners. Not only that, they have trouble remembering how assignment works at all, even for basic variables…I’ve come to realize that I’ve programmed so long that I immediately recognize the pattern, but that can’t possibly be the case for novices, who, if they’ve taken general math classes, have never seen the equals sign used that way. The <- operator makes a lot more sense…though I would never have thought that if I hadn’t read Hadley Wickham’s style guide.
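To make the beginner's confusion concrete, here is a small Python sketch (an illustration added for this point, not something from the original discussion) showing "=" as assignment, "==" as comparison, and the parser rejecting "=" where a condition is expected. It assumes Python 3; the variable names are arbitrary.

```python
# "=" binds a name to a value; it is not a statement of equality.
x = 5
x = x + 1  # valid assignment, even though "x equals x plus one" is false as math
# (In R this line is usually written x <- x + 1, read as "x gets x + 1".)

# "==" asks whether two values are equal and produces a bool.
is_six = (x == 6)

# Using "=" inside a condition is rejected by the parser.
try:
    compile("if x = 6:\n    pass\n", "<beginner>", "exec")
    parses = True
except SyntaxError:
    parses = False

print(x, is_six, parses)  # 6 True False
```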

Speaking of Wickham’s style guide, one thing I wish I had done at the very early stages of learning R is read Wickham’s Advanced R book – which is free online (and contains the style guide). Like everything Wickham writes, it’s a great read for any programmer, and it is not at all an “advanced” book if you are coming from another language: it goes over the fundamentals of how the language is designed. For example, one major pain point for me was not realizing that R does not have scalars – things that appear to be scalars are actually vectors of length one. Wickham’s book covers this in its Data structures chapter.
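The scalar point is easiest to see by contrast with Python. The sketch below (added for illustration; the R behavior appears only in comments and is not executed) shows that Python treats 5 and [5] as different kinds of objects, whereas in R, length(5) returns 1 because 5 is already a numeric vector of length one.

```python
# In Python, a bare number is a true scalar: asking for its length fails.
x = 5
try:
    len(x)
    has_length = True
except TypeError:
    has_length = False

# A one-element list is a separate object from the number it holds.
v = [5]

print(has_length)  # False
print(len(v))      # 1
print(x == v)      # False: Python does not equate 5 with [5]

# For contrast (not executed here): in R, length(5) returns 1 and
# identical(5, c(5)) is TRUE, because 5 is already a numeric vector of
# length one. The same design is why c(1, 2, 3) + 1 works elementwise
# in R: the length-one vector 1 is recycled across the longer vector.
```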

Another vital and easy-to-read chapter is Wickham’s explanation of R’s non-standard evaluation, which illuminated for me both why a programmer of Wickham’s caliber enjoys building in R and why I would find it infuriating to teach R, rather than Python, to beginners.

(Here’s another negative take on non-standard evaluation, by an R-using statistician)

FWIW, Wickham has posted a repo attempting to chart and analyze various trends and metrics about R and Python usage. I won’t be that methodical, but on Reddit, r/Python seems to be by far the biggest programming subreddit. At the time of writing, it has 122,690 subscribers. By comparison, r/ruby and r/javascript have 31,200 and 82,825 subscribers, respectively, and the R-focused subreddit, r/rstats, has 8,500.

The Python community is so active on Reddit that it has its own learners subreddit – r/learnpython – with 54,300 subscribers.

From anecdotal observations, I don’t think Python shows much sign of diminishing popularity on Hacker News, either. That’s not just because Python-specific posts keep making the front page, but also because of the general increase in interest in artificial intelligence, coinciding with Google’s recent release of TensorFlow, which Google has even quickly ported to Python 3.x.

(This is a partial repost of my comment in a HN discussion.)

Dan Nguyen's Blog | Thoughts, Data and Computational Journalism