How to get started with open-source

In response to a reddit/learnprogramming thread on working on other people’s projects:

The best way to get started in open source software is to add a feature to a project that you actually need. Besides the karma boost from contributing to open-source, you get two very personal benefits:

1. By taking the time to integrate your change into the code base, you end up learning a lot about how a “real” project is done. Oftentimes I’ll learn smarter ways to organize my code and/or the existence of a helpful library. More often than not, I’ll see that a programmer much better than me wrote some ugly code that just works…and I’ll realize that I need to stop nitpicking my own code to death.
2. Even if your contribution is rejected or ignored, you still get to use it to do whatever you needed to do in the first place.

Some examples from my GitHub history:

I recently contributed a change to the unitedstates/congress-legislators project, a data project that scrapes and organizes information about every member of the U.S. Congress. The change was a script that calls Twitter’s API to fetch a user’s unique ID from their screen name (screen names can be changed, so the ID is the more stable identifier): https://github.com/unitedstates/congress-legislators/pull/303

I’ve written lots of Twitter-fetching scripts before, so the API calls weren’t hard (especially with Tweepy). However, integrating the script into their framework of data scrapers forced me to look through some of their implementation details, such as how to arrange the scripts and what interfaces to expose. I also got to notice some dev environment tooling that I never use, such as Travis CI, and I learned about the rtyaml library, which is one of @unitedstates’s many helpful data projects. I use the Congress data for lots of projects, and I needed the Twitter IDs to analyze social media data more accurately…so even if they had rejected my pull request, I would still have benefited from writing the code.
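To give a sense of how small the core of such a change can be, here is a minimal sketch of a screen-name-to-ID lookup with Tweepy. It is not the code from the pull request: the function names `fetch_twitter_ids` and `make_api` are hypothetical, the credentials are placeholders you would supply yourself, and it assumes the Tweepy `API.get_user(screen_name=...)` call, which depends on Twitter API access that may no longer be available on every account tier.

```python
from typing import Dict, Iterable


def fetch_twitter_ids(api, screen_names: Iterable[str]) -> Dict[str, int]:
    """Map each screen name to the account's numeric user ID.

    `api` is assumed to behave like a tweepy.API instance: it must offer
    get_user(screen_name=...) and return an object with an `id` attribute.
    """
    ids = {}
    for name in screen_names:
        user = api.get_user(screen_name=name)
        ids[name] = user.id
    return ids


def make_api(consumer_key, consumer_secret, access_token, access_secret):
    """Build an authenticated Tweepy client (hypothetical helper).

    Requires `pip install tweepy`; imported lazily so the lookup logic
    above can be exercised without the library or network access.
    """
    import tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)
```

Passing the client in as an argument, rather than building it inside the lookup function, keeps the lookup easy to test with a stand-in object. The real work in the pull request was fitting something like this into the project's existing scraper layout.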

Another example is phashion, a Ruby library for detecting duplicate images. I needed one of its classes to make one of its attributes publicly accessible…the algorithm and its details are far above my head, but I do know how to modify a class definition.

The best way to get started is to just start with even the most trivial things, such as fixing typos in documentation; my first GitHub contributions were minor grammar changes. Then, move up to adding documentation, which can greatly impact people who use the project. And then finally, add new code features. If you’re like me and don’t work on a team or use version control on a day-to-day basis, even doing simple, non-technical pull requests helps you get confident about the open-source process.

Dan Nguyen's Blog | Thoughts, Data and Computational Journalism