Open Source Data Assignment
October 8, 2018
due Tuesday, September 25
Pick one of the below (just one! not all!)
- Contribute to Corpora (see instructions below).
- Create your own open source dataset in a new repository on GitHub.
- Make a contribution to another Open Source repository focused on data.
Link to your pull request (or repo) below:
- emma rae -- Add list of fictional colors
- jinzhong yu -- Add list of country calling codes
- alice -- adding and updating fictional religions
- Wenhe Li -- add description && add Persona file into game
- Luna -- Puerto Rican Debt
- Jiwon -- Skeleton & Point Cloud Data for Kinect V2
- Camille -- add list of medical Latin combining terms
- Vince Shao -- AGI Designers List
- Lin Zhang -- Feelings about coding
- Amitabh Shrivastava -- Added to carbon allotropes
- Guillermo Montecinos -- Add FIFA World Cup history
- James -- The Most Useless Dataset
Instructions for Corpora Contribution
Note that this assignment can be completed in any number of ways. The instructions below are for submitting a pull request using git and the command line. You are also welcome to use the GitHub web interface or any other tool that you want to experiment with.
In addition to the instructions below, you might also find this egghead course useful background: How to Contribute to an Open Source Project on GitHub
Install Git
- Download and install git.
- Open your shell (see shell workflow video). Configure your git username and email using the following commands:
git config --global user.name "Your Name"
git config --global user.email "email@example.com"
You can find more details at GitHub help as well as this video walkthrough.
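To confirm the setup above took effect, you can ask git to print back what it has stored. This is a minimal sketch; the "(not set)" fallback text is only an illustration, not something git prints itself.

```shell
# Confirm git is installed and on your PATH.
git --version

# Read back the identity git will attach to your commits.
# `git config --get` exits non-zero when a value is missing,
# so fall back to a placeholder string (an assumption for this sketch).
name="$(git config --global --get user.name || echo "(not set)")"
email="$(git config --global --get user.email || echo "(not set)")"

echo "user.name:  $name"
echo "user.email: $email"
```

If either line shows "(not set)", re-run the matching `git config --global` command.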
Fork Repo
You have two options here. You can fork the Corpora repo directly or the "Open Source Studio" fork of Corpora. The latter option allows you to experiment with making a pull request that flows through our class before heading "upstream" to Corpora itself. Instructions for how to fork are on the GitHub guide.

Clone the Repo
- Now you'll want to clone your fork of the repo. Open your shell, navigate to a directory where you'd like to store the files locally, and type:
git clone https://github.com/yourgithubname/corpora.git
This video has more details about cloning a repo.
cd (change directory) into the repo:
cd corpora
Write your contribution
Now you can open the "corpora" project in any text editor, add a new file, edit files, etc.
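If your contribution is a JSON file, it can help to check that it parses before you commit. The sketch below assumes python3 is installed and uses its built-in json.tool module; the file name and contents are placeholders written to a temporary directory, not real Corpora data.

```shell
# Write a placeholder JSON file to a throwaway directory (illustration only).
sandbox="$(mktemp -d)"
cat > "$sandbox/file.json" <<'EOF'
{
  "description": "Placeholder list for illustration only",
  "items": ["first", "second"]
}
EOF

# json.tool exits non-zero if the file is not valid JSON.
if python3 -m json.tool "$sandbox/file.json" > /dev/null; then
  result="valid JSON"
else
  result="invalid JSON"
fi
echo "$result"
```

For your own work, point the same command at the file you edited, for example `python3 -m json.tool path/to/files/file.json`.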
Commit your changes
Now it's time to "add" and "commit" the work you've done. This video tutorial has some more details about git add and git commit.
git add to stage your changes.
You can choose the files you want to add or just use . to add all of your changes.
For a specific file:
git add path/to/files/file.json
For a specific directory:
git add path/to/files/
For all files:
git add .
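Before committing, it is worth checking what is actually staged. The following sketch runs in a throwaway repository so it cannot touch your real work; the directory and file names are hypothetical.

```shell
set -e
# Create a scratch repository with two new files.
sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
mkdir -p data
echo '{"a": 1}' > data/staged.json
echo '{"b": 2}' > data/unstaged.json

# Stage only one of the two files.
git add data/staged.json

# Short status: staged files are marked "A", untracked files "??".
git status --short

# List only the files that will go into the next commit.
git diff --cached --name-only
```

Running `git status` in your clone of corpora gives the same kind of summary for your own changes.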
git commit to commit your changes.
For small / trivial fixes, you can use the -m argument to add a message.
git commit -m "message about this commit"
It's good practice, however, to run git commit on its own, which launches a text editor where you can write a more detailed message.
git commit
Here are details for how to associate text editors with git. For Visual Studio Code the command is:
git config --global core.editor "code --wait"
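If you would rather stay on the command line but still want a subject line plus a longer description, one option is to pass -m more than once; git joins the values as separate paragraphs. This is a sketch in a throwaway repository, and the file name and message text are placeholders.

```shell
set -e
# Scratch repository with a local-only identity so the commit succeeds.
sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git config user.name "Your Name"
git config user.email "email@example.com"

echo '{"example": []}' > example.json
git add example.json

# The first -m is the subject line, the second becomes the body paragraph.
git commit -q -m "Add example list" -m "Longer explanation of what changed and why."

# Print the full message of the most recent commit.
git log -1 --format=%B
```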
Push your code to GitHub.
Now that you've finished your work, you can push the code to your fork of Corpora. One way to do this is:
git push origin master
While the above is adequate, I sometimes prefer to push to a new branch on GitHub with a name related to my changes.
git push origin master:name-of-my-branch
This takes the local master branch and pushes it to name-of-my-branch on GitHub.
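An alternative to the refspec form above is to create a local branch with the same name before committing, then push that branch. This is not what the instructions above do, just a common variation. The sketch uses a temporary bare repository as a stand-in for your GitHub fork, so nothing is sent over the network; the file name and contents are placeholders.

```shell
set -e
# A throwaway bare repo plays the role of your fork on GitHub.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/corpora"
cd "$sandbox/corpora"
git config user.name "Your Name"
git config user.email "email@example.com"

# Create and switch to a branch named after the change.
git checkout -q -b name-of-my-branch

# Make and commit a placeholder change.
echo '{"items": ["placeholder"]}' > file.json
git add file.json
git commit -q -m "message about this commit"

# Push the local branch to a branch of the same name on the remote.
git push -q origin name-of-my-branch

# List the branches that now exist on the remote.
git ls-remote --heads origin
```

With a real fork, only the `git checkout -b`, `git add`, `git commit`, and `git push` steps are needed, since `origin` already points at GitHub after cloning.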
Make a pull request
- Open your fork on GitHub in your browser.
- You should see a new button "Compare and Pull Request" referencing your branch name.
- Select "Compare and Pull Request"
- Write comments about your changes.
- Select "Create Pull Request"