Git Primer

Revision as of 13:41, 6 June 2011 by Jpr@uab.edu (talk | contribs) (→‎Starting Notes: fix typo in link to wikipedia math graph page)



This is a getting started guide for git to help clarify what it does and how to use it to develop code for our platform.

Starting Notes

Git can be confusing. It is a flexible tool. It is designed to record a history of changes in a directory. It is also designed to enable you to share this history of changes with collaborators through a process known as merging. Combined, these functions in git implement a data structure known as a directed acyclic graph. Scary, but it doesn't have to be.

Data structures are devices we have contrived to help us organize and manage collections of data, i.e., information. You're likely already familiar with the most common data structures like lists (ordered collections of data) or arrays (ordered collections of data with a numeric index). You may also be familiar with two-dimensional arrays (collections of data with a row and column index), e.g., spreadsheets and matrices. You're likely also familiar with queues (collections of data with a front and a rear, where data is added to the rear and removed from the front); think grocery store checkout lines.
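To make these familiar structures concrete, here is a minimal Python sketch. The variable names and sample values are made up for illustration and have nothing to do with git itself.

```python
from collections import deque

# A list: an ordered collection whose items are reachable by numeric index.
groceries = ["milk", "eggs", "bread"]
second_item = groceries[1]

# A two-dimensional array: rows and columns, like a small spreadsheet.
grid = [
    [1, 2, 3],
    [4, 5, 6],
]
cell = grid[1][2]  # second row, third column

# A queue: items join at the rear and leave from the front,
# like shoppers in a checkout line.
checkout_line = deque()
checkout_line.append("alice")
checkout_line.append("bob")
first_served = checkout_line.popleft()
```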

There is a whole branch of mathematical theory around graphs and computer science theory implementing those graphs as data structures. While it's fun to know this stuff, we only need to know a little of it to become effective users of git.

A directed acyclic graph, or DAG for short, describes a pattern that is common to collaboration. As Wikipedia puts it, DAGs "may be used to model processes in which information flows in a consistent direction through a network of processors." While not very personable, this is the basis of collaboration. Collaborators are the "network of processors" and the data that we are sharing is the "information flow".
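As a rough illustration of what "directed" and "acyclic" mean, the sketch below represents a graph as a Python dictionary and checks it for cycles using Kahn's algorithm. The collaborator names and the `is_acyclic` helper are hypothetical, chosen only for this example; this is not something git exposes.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    # Collect every node, including ones that only appear as targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count incoming edges for each node.
    incoming = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            incoming[target] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    ready = deque(node for node in nodes if incoming[node] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for target in edges.get(node, []):
            incoming[target] -= 1
            if incoming[target] == 0:
                ready.append(target)

    # If a cycle exists, its nodes are never removed.
    return visited == len(nodes)

# An edge "a -> b" means information flows from a to b.
flow = {"alice": ["carol"], "bob": ["carol"], "carol": ["dave"]}
looped = {"alice": ["bob"], "bob": ["alice"]}
```

Here `is_acyclic(flow)` is true because information only ever moves forward, while `is_acyclic(looped)` is false because the two nodes feed each other.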

So git is really a tool that helps you manage a process central to collaboration: tracking the actions of many independent actors working on (i.e., modifying) a shared collection of data.

Git is most commonly used by developers to develop code in parallel and merge the results of those independent efforts. We'll take this perspective in this primer because that's the use we're putting it to.

Git has two sides: the data set you are tracking and the history of modifications to that data set. Git doesn't have much to say about the data you are tracking. It does assume that all the data you are managing in your collection exists in a single directory tree of your file system. It also assumes that you are responsible for defining the relationships between your file and directory objects. That is, it doesn't dictate any structure for your data. It will happily merge two wholly unrelated data collections if that's what you tell it to do.

The second side of Git is the history of modifications. This is the part of git that builds, maintains, and lets you review the history of changes that have occurred to the directory tree. Every recorded file change and every merge with work from another copy of the data set builds up the history for this copy of the data set. It is this history that forms the directed acyclic graph.
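One way to picture this history is as a set of recorded changes, each pointing back at the change or changes it was built on. The Python sketch below is a simplified model under that assumption, not git's actual storage format; the single-letter names and the `history` helper are invented for illustration.

```python
# Each recorded change lists its parent(s); a merge has more than one parent.
parents = {
    "A": [],          # the first recorded change
    "B": ["A"],       # your change, built on A
    "C": ["A"],       # a collaborator's change, also built on A
    "M": ["B", "C"],  # a merge that joins the two lines of work
}

def history(commit, parents):
    """Return every change reachable from `commit` by following parent links."""
    seen = []
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        stack.extend(parents[current])
    return seen
```

Calling `history("M", parents)` reaches all four changes, while `history("B", parents)` reaches only `B` and `A`. Because links only ever point backward to earlier changes, the structure never loops, which is what makes it a directed acyclic graph.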