Git Primer

From Cheaha
Revision as of 23:25, 17 May 2012 by Jpr@uab.edu (talk | contribs) (Expand conclusion of intro to set the tone for Git as a tool for documenting process)



Git is a tool to help you keep track of your content. Your data. The information that drives your world.

Git is very popular with people who curate large collections of instructions for machines and humans. In fact, Git is used to keep track of all of the instructions that make this machine work like a website.

Because this is the dominant community of users for Git, you'll see a lot of the documentation on the web focused on that context.

People will assume you too are using it for managing instructions for computers. This isn't so bad. However, if you don't know why they do what they do, you're going to have a hard time understanding the steps you are carrying out.

A tool is only magical when you don't understand what it's doing for you. Imagine using a lawn mower to dust the dirt off the floors in your house. That's not what it's for, and if you do that, don't be surprised if you break stuff or cut off your toes.

Instructions for using Git aren't magic. They are explicit steps with predictable outcomes. They are completely deterministic. Like most powerful tools, there is more than one way to use Git. Understanding Git's flexibility is the key to mastering your craft.

This is a getting started guide for Git to help clarify what it does and how to use it to manage your content.

You're likely here because you are documenting process. If you keep your mind open to that task, you'll find Git is your friend in no time.

Starting Notes

Git can be confusing. It is a flexible tool. It is designed to record a history of changes in a directory. It is also designed to enable you to share this history of changes with collaborators through a process known as merging. Combined, these functions in Git implement a data structure known as a directed acyclic graph. That sounds scary, but it doesn't have to be.

Data structures are devices we have contrived to help us organize and manage collections of data, i.e. information. You're likely already familiar with the most common data structures like lists (ordered collections of data) or arrays (ordered collections of data with a numeric index). You may also be familiar with two-dimensional arrays (collections of data with a row and column index), e.g. spreadsheets and matrices. You're likely also familiar with queues (collections of data with a front and a rear, where data is added to the rear and removed from the front), e.g. grocery store checkout lines.
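If it helps to see these familiar structures written down, here is a minimal sketch in Python. The variable names and sample values are made up for illustration; nothing here is specific to Git.

```python
from collections import deque

# A list: an ordered collection; each position doubles as a numeric index.
steps = ["edit", "save", "record"]
first_step = steps[0]

# A two-dimensional array: rows and columns, like a small spreadsheet.
grid = [
    [1, 2, 3],
    [4, 5, 6],
]
cell = grid[1][2]  # second row, third column (Python counts from zero)

# A queue: items join at the rear and leave from the front,
# like a grocery store checkout line.
checkout_line = deque()
checkout_line.append("Ann")       # Ann joins the rear
checkout_line.append("Bob")       # Bob joins behind Ann
served = checkout_line.popleft()  # whoever is at the front is served first
```

Each structure is just a convention for how data goes in and how you get it back out. A graph is one more such convention.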

There is a whole branch of mathematical theory around graphs and computer science theory implementing those graphs as data structures. While it's fun to know this stuff, we only need to know a little of it to become effective users of Git.

A directed acyclic graph, or DAG for short, describes a pattern that is common to collaboration. As Wikipedia puts it, "DAGs may be used to model processes in which information flows in a consistent direction through a network of processors." While not very personable, this is the basis of collaboration. Collaborators are the "network of processors" and the data that we are sharing is the "information flow".
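To make "directed" and "acyclic" concrete, here is a small sketch that models information flowing between collaborators as a graph. The node names (draft, alice_edit, and so on) are hypothetical, and the example leans on the graphlib module from Python's standard library (Python 3.9 or newer). It is only an illustration of the idea, not a picture of how Git stores anything.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists the nodes that feed information into it.
# Information only ever flows one way: from a draft, through two
# independent edits, into a combined result.
flows_into = {
    "draft": [],
    "alice_edit": ["draft"],
    "bob_edit": ["draft"],
    "combined": ["alice_edit", "bob_edit"],
}


def is_acyclic(graph):
    """Return True if no path through the graph loops back on itself."""
    try:
        # static_order() raises CycleError when it finds a loop.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# One valid order in which the information can flow.
order = list(TopologicalSorter(flows_into).static_order())
```

In any valid order, "draft" comes first and "combined" comes last, because every node has to wait for the nodes that feed it. A graph where "a" feeds "b" and "b" feeds "a" has no such order, which is exactly what "acyclic" rules out.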

So Git is really a tool that helps you manage a process central to collaboration: tracking the actions of many independent actors working on (i.e. modifying) a shared collection of data.

Git is most commonly used by developers to develop code in parallel and merge the results of those independent efforts. We'll take this perspective in this primer because that's the use we're putting it to.

Git has two sides: the data set you are tracking and the history of modifications to that data set. Git doesn't have too much to say about the data you are tracking. It does assume that all the data you are managing in your collection exists in a single directory tree of your file system. It assumes that you are responsible for defining the relationships between your file and directory objects. That is, it doesn't dictate any structure for your data. It will happily merge two wholly unrelated data collections if that's what you tell it to do.

The second side of Git is the history of modifications. This is the part of Git that builds, maintains, and lets you review the history of changes that have occurred to the directory tree. Every recorded file change, every merge with work from another copy of the data set will build up the history for this copy of the data set. It is this history that forms the directed acyclic graph.
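As a rough mental model of that history, the sketch below treats each recorded change as an object that points back at the change or changes it was built on. A merge is simply a change with two parents. The Commit class, the ancestors function, and the sample names are all hypothetical and chosen for illustration; this is a simplified picture, not Git's actual storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A recorded change that remembers what it was built on."""
    name: str
    parents: tuple = ()


def ancestors(commit):
    """Return the names of every change reachable by following parent links."""
    seen = []
    stack = [commit]
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue  # already visited through another path
        seen.append(current.name)
        stack.extend(current.parents)
    return seen


# Two people start from the same first change and work independently.
first = Commit("first")
mine = Commit("my-change", (first,))
theirs = Commit("their-change", (first,))

# Merging records a new change with both lines of work as parents.
merge = Commit("merge", (mine, theirs))

history = ancestors(merge)
```

Starting from the merge, following parent links reaches all four changes, and "first" is counted only once even though two paths lead to it. The links always point backward in time, so the history can never loop, which is why it forms a directed acyclic graph.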