Coding & Collaboration on GitHub

Previously on Tech Connect we wrote about the Git version control system, walking you through “cloning” a project onto to your computer, making some small changes, and committing them to the project’s history. But that post concluded on a sad note: all we could do was work by ourselves, fiddling with Git on our own computer and gaining nothing from the software’s ability to manage multiple contributors. Well, here we will return to Git to specifically cover GitHub, one of the most popular code-sharing websites around.

Git vs. GitHub

Git is open source version control software. You don’t need to rely on any third-party service to use it and you can benefit from many of its features even if you’re working on your own.

GitHub, on the other hand, is a company that hosts Git repositories on their website. If you allow your code to be publicly viewable, then you can host your repository for free. If you want to have a private repository, then you have to pay for a subscription.

GitHub layers some unique features on top of Git. There’s an Issues queue where bug reports and feature requests can be tracked and assigned to contributors. Every project has a Graphs section where interesting information, such as number of lines added and deleted over time, is charted (see the graphs for jQuery, for instance). You can create gists which are mini-repositories, great for sharing or storing snippets of useful code. There’s even a Wiki feature where a project can publish editable documentation and examples. All of these nice features build upon, but ultimately have little to do with, Git.

Collaboration

GitHub is so successful because of how well it facilitates collaboration. Hosted version control repositories are nothing new; SourceForge has been doing this since 1999, almost a decade prior to GitHub’s founding in 2008. But something about GitHub has struck a chord and it’s taken off like wildfire. Depending on how you count, it’s the most popular collection of open source code, over SourceForge and Google Code.[1] The New York Times profiled co-founder Tom Preston-Werner. It’s inspired spin-offs, like Pixelapse which has been called “GitHub for Photoshop” and Docracy which TechCrunch called “GitHub for legal documents.” In fact, just like the phrase “It’s Facebook for {{insert obscure user group}}” became a common descriptor for up-and-coming social networks, “It’s GitHub for {{insert non-code document}}” has become commonplace. There are many inventive projects which use GitHub as more than just a collection of code (more on this later).

Perhaps GitHub’s popularity is due to Git’s own popularity, though similar sites host Git repositories too.[2] Perhaps the GitHub website simply implements better features than its competitors. Whatever the reason, it’s certain that GitHub does a marvelous job of allowing multiple people to manage and work on a project.

Fork It, Bop It, Pull It

Let’s focus two nice features of GitHub—Forking and the Pull Request [3]—to see exactly why GitHub is so great for collaboration.

If you recall our prior post on Git, we cloned a public repository from GitHub and made some minor changes. Then, when reviewing the results of git log, we could see that our changes were present in the project’s history. That’s great, but how would we go about getting our changes back into the original project?

For the actual step-by-step process, see the LibCodeYear GitHub Project’s instructions. There are basically only two changes from our previous process, one at the very beginning and one at the end.

GItHub's Fork Button

First, start by forking the repository you want to work on. To do so, set up a GitHub account, sign in, visit the repository, and click the Fork button in the upper right. After a pretty sweet animation of a book being scanned, a new project (identical to the original in both name and files) will appear on your GitHub account. You can then clone this forked repository onto your local computer by running git clone on the command line and supplying the URL listed on GitHub.

Now you can do your editing. This part is the same as using Git without GitHub. As you change files and commit changes to the repository, the history of your cloned version and the one on your GitHub account diverge. By running git push you “push” your local changes up to GitHub’s remote server. Git will prompt you for your GitHub password, which can get annoying after a while so you may want to set up an SSH key on GitHub so that you don’t need to type it in each time. Once you’ve pushed, if you visit the repository on GitHub and click the “commits” tab right above the file browser, you can see that your local changes have been published to GitHub. However, they’re still not in the original repository, which is underneath someone else’s account. How do you add your changes to the original account?

GitHub's Pull Request Button

In your forked repository on GitHub, something is different: there’s a Pull Request button in the same upper right area where the Fork one is. Click that button to initiate a pull request. After you click it, you can choose which branches on your GitHub repository to push to the original GitHub repository, as well as write a note explaining your changes. When you submit the request, a message is sent to the project’s owners. Part of the beauty of GitHub is in how pull requests are implemented. When you send one, an issue is automatically opened in the receiving project’s Issues queue. Any GitHub account can comment on public pull requests, connecting them to open issues (e.g. “this fixes bug #43”) or calling upon other contributors to review the request. Then, when the request is approved, its changes are merged into the original repository.

diagram of forking & pulling on GitHub

“Pull Request” might seem like a strange term. “Push” is the name of the command that takes commits from your local computer and adds them to some remote server, such as your GitHub account. So shouldn’t it be called a “push request” since you’re essentially pushing from your GitHub account to another one? Think of it this way: you are requesting that your changes be pulled (e.g. the git pull command) into the original project. Honestly, “push request” might be just as descriptive, but for whatever reason GitHub went with “pull request.”

GitHub Applications

While hopefully we’ve convinced you that the command line is a fine way to do things, GitHub also offers Mac and Windows applications. These apps are well-designed and turn the entire process of creating and publishing a Git repository into a point-and-click affair. For instance, here is the fork-edit-pull request workflow from earlier except done entirely through a GitHub app:

  • Visit the original repository’s page, click Fork
  • On your repository’s page, select “Clone in Mac” or “Clone in Windows” depending on which OS you’re using. The repository will be cloned onto your computer
  • Make your changes and then, when you’re ready to commit, open up the GitHub app, selecting the repository from the list of your local ones
  • Type in a commit message and press Commit
    writing a commit message in GitHub for Windows
  • To sync changes with GitHub, click Sync
  • Return to the repository on GitHub, where you can click the Pull Request button and continue from there

GitHub without the command line, amazing! You can even work with local Git repositories, using the app to do commits and view previous changes, without ever pushing to GitHub. This is particularly useful on Windows, where installing Git can have a few more hurdles. Since the GitHub for Windows app comes bundled with Git, a simple installation and login can get you up-and-running. The apps also make the process of pushing a local repository to GitHub incredibly easy, whereas there are a few steps otherwise. The apps’ visual display of “diffs” (differences in a file between versions, with added and deleted lines highlighted) and handy shortcuts to revert to particular commits can appeal even to those of us that love the command line.

viewing a diff in GitHub for Windows

More than Code

In my previous post on Git, I noted that version control has applications far beyond coding. GitHub hosts a number of inventive projects that demonstrate this.

  • The Code4Lib community hosts an Antiharassment Policy on GitHub. Those in support can simply fork the repository and add their name to a text file, while the policy’s entire revision history is present online as well
  • The city of Philadelphia experimented with using GitHub for procurements with successful results
  • ProfHacker just wrapped up a series on GitHub, ending by discussing what it would mean to “fork the academy” and combine scholarly publishing with forking and pull requests
  • The Jekyll static-site generator makes it possible to generate a blog on GitHub
  • The Homebrew package manager for Mac makes extensive use of Git to manage the various formulae for its software packages. For instance, if you want to roll back to a previous version of an installed package, you run brew versions $PACKAGE where $PACKAGE is the name of the package. That command prints a list of Git commits associated with older versions of the package, so you can enter the Homebrew repository and run a Git command like git checkout 0476235 /usr/local/Library/Formula/gettext.rb to get the installation formula for version 0.17 of the gettext package.

These wonderful examples aside, GitHub is not a magic panacea for coding, collaboration, or any of the problems facing libraries. GitHub can be an impediment to those who are intimidated or simply not sold on the value of learning what’s traditionally been a software development tool. On the Code4Lib listserv, it was noted that the small number of signatories on the Antiharassment Policy might actually be due to its being hosted on GitHub. I struggle to sell people on my campus of the value of Google Docs with its collaborative editing features. So, as much as I’d like the Strategic Plan the college is producing to be on GitHub where everyone could submit pull requests and comment on commits, it’s not necessarily the best platform. It is important, however, not to think of it as limited purely to versioning code written by professional developers. GitHub has uses for amateurs and non-coders alike.

Footnotes

[1]^ GitHub Has Passed SourceForge, (June 2, 2011), ReadWrite.

[2]^ Previously-mentioned SourceForge also supports Git, as does Bitbucket.

[3]^ I think this would make an excellent band name, by the way.