Repository distribution
Git is a distributed version control system, which means there is no notion of a single
central repository, but rather many distributed ones. For GeoServer these are:
- The canonical repository located on GitHub that serves as the official authoritative
copy of the source code for the project
- Developers’ forked repositories on GitHub. These repositories
generally contain everything in the canonical repository, as well as any feature or
topic branches a developer is working on and wishes to back up or share.
- Developers’ local repositories on their own systems. This is where development work is actually done.
Even though there are numerous copies of the repository, they can all interoperate because
they share a common history. This is the magic of git!
In order to interoperate with other repositories hosted on GitHub,
a local repository must contain remote references to them.
A local repository typically contains the following remote references:
- A remote called origin that points to the developer’s forked GitHub repository.
- A remote called upstream that points to the canonical GitHub repository.
- Optionally, some remotes that point to other developers’ forked repositories on GitHub.
To set up a local repository in this manner:
Clone your fork of the canonical repository (where “bob” is replaced with your GitHub account name):
% git clone git@github.com:bob/geoserver.git geoserver
% cd geoserver
Create the upstream remote pointing to the canonical repository:
% git remote add upstream git@github.com:geoserver/geoserver.git
Or, if your account does not have push access to the canonical repository, use the read-only HTTPS URL:
% git remote add upstream https://github.com/geoserver/geoserver.git
Optionally, create remotes pointing to other developers’ forks. These remotes are typically
read-only:
% git remote add aaime https://github.com/aaime/geoserver.git
% git remote add jdeolive https://github.com/jdeolive/geoserver.git
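The resulting remote layout can be checked without touching the network. The sketch below assumes a scratch directory from `mktemp -d` and uses `git init` as a stand-in for cloning bob’s fork; `git remote add` only records URLs, so nothing is contacted. The read-only remote uses an HTTPS URL, since GitHub no longer serves the unauthenticated git:// protocol.

```shell
#!/bin/sh
# Sketch: reproduce the remote layout in a scratch repository.
# "git init" stands in for "git clone"; remote URLs are recorded, never contacted.
set -eu
work=$(mktemp -d)
git init -q "$work/geoserver"
cd "$work/geoserver"

# origin: your own fork ("bob" is a placeholder account name)
git remote add origin git@github.com:bob/geoserver.git
# upstream: the canonical repository
git remote add upstream git@github.com:geoserver/geoserver.git
# optional read-only remote for another developer's fork
git remote add aaime https://github.com/aaime/geoserver.git

# list every remote with its fetch and push URLs
git remote -v
```

In a real clone, `git remote -v` should show origin pointing at your fork and upstream at geoserver/geoserver.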
Repository structure
A git repository contains a number of branches. These branches fall into three categories:
- Primary branches that correspond to major versions of the software
- Release branches that are used to manage releases of the primary branches
- Feature or topic branches that developers do development on
Primary branches
Primary branches are present in all repositories and correspond to the main release streams of the
project. These branches consist of:
- The master branch that is the current unstable development version of the project
- The current stable branch that is the current stable development version of the project
- The branches for previous stable versions
For example, at present these branches are:
- master - The 2.3.x release stream, where unstable development, such as major new features, takes place
- 2.2.x - The 2.2.x release stream, where stable development, such as bug fixing and stable features, takes place
- 2.1.x - The 2.1.x release stream, which is at end-of-life and has no active development
Release branches
Release branches are used to manage releases of stable branches. For each stable primary branch there is a
corresponding release branch. At present this includes:
- rel_2.2.x - The stable release branch
- rel_2.1.x - The previous stable release branch
Release branches are only used during a versioned release of the software. At any given time a release branch
corresponds to the exact state of the last release from that branch. During release these branches are tagged.
Release branches are also present in all repositories.
Feature branches
Feature branches are what developers use for day-to-day development. This can include small-scale bug fixes or
major new features. Feature branches serve as a staging area for work that allows a developer to freely commit to
them without affecting the primary branches. For this reason feature branches generally only live
in a developer’s local repository, and possibly their remote forked repository. Feature branches are never pushed
up into the canonical repository.
When a developer feels a particular feature is complete enough, the feature branch is merged into a primary branch,
usually master. If the work is suitable for the current stable branch, the changeset can be ported back to the
stable branch as well. This is explained in greater detail in the Development workflow section.
Git client configuration
When a repository is shared across different platforms it is necessary to have a
strategy in place for dealing with file line endings. In general git is pretty good about
dealing with this without explicit configuration, but to be safe developers should set the
core.autocrlf setting to “input”:
% git config --global core.autocrlf input
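The value “input” tells git to convert CRLF line endings to LF when files are committed, while
leaving line endings untouched when files are checked out. Text files are therefore stored in the
repository with LF line endings regardless of the platform they were committed from.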
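The effect of “input” can be observed in a scratch repository. This sketch sets the option locally (not with --global) so your own configuration is untouched, and disables core.safecrlf for this demo only; the file name is hypothetical.

```shell
#!/bin/sh
# Sketch: observe core.autocrlf=input on a scratch repository.
# Settings are repository-local, so global configuration is left alone.
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config core.autocrlf input
git config core.safecrlf false    # silence the conversion check for this demo only

# a text file written with Windows (CRLF) line endings
printf 'line one\r\nline two\r\n' > notes.txt
git add notes.txt

cr=$(printf '\r')
# the working copy is left untouched and still contains CR characters ...
grep -q "$cr" notes.txt && echo "working copy: CRLF"
# ... while the blob staged for commit has been normalized to LF
git cat-file -p :notes.txt | grep -q "$cr" || echo "staged blob: LF only"
```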
Note
It is also a good idea, especially for Windows users, to set the core.safecrlf
option to “true”:
% git config --global core.safecrlf true
With this setting git refuses to stage a file if its line-ending conversion would not be reversible, which prevents commits that would silently change a file’s line endings.
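Combined with core.autocrlf=input, this means a file with CRLF line endings is rejected outright rather than silently converted. The sketch below shows that behaviour in a scratch repository, again with repository-local settings and a hypothetical file name. Converting such a file to LF first (for example in an editor, or with a tool such as dos2unix) should allow it to be staged.

```shell
#!/bin/sh
# Sketch: with core.autocrlf=input plus core.safecrlf=true, staging a CRLF file
# is refused, because the CRLF -> LF conversion could not be undone on checkout.
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config core.autocrlf input     # repository-local, mirroring the recommended settings
git config core.safecrlf true

printf 'line one\r\nline two\r\n' > windows.txt
if git add windows.txt 2> "$work/add.err"; then
  echo "staged"
else
  echo "refused: $(cat "$work/add.err")"
fi
```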
Development workflow
This section contains examples of workflows a developer will typically use on a daily basis.
To follow these examples it is crucial to understand the phases that a changeset goes through in the git
workflow. The lifecycle of a single changeset is:
- The change is made in a developer’s local repository.
- The change is staged for commit.
- The staged change is committed.
- The committed change is pushed up to a remote repository.
There are many variations on this general workflow.
For instance, it is common to make many local commits and then push them all up in batch to a remote repository.
Also, for brevity multiple local commits may be squashed into a single final commit.
Updating from canonical
Developers should generally work against a recent version of the official source code. The following example
shows how to pull down the latest changes for the master branch from the canonical repository:
% git checkout master
% git pull upstream master
Similarly for the stable branch:
% git checkout 2.2.x
% git pull upstream 2.2.x
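To try the update step without GitHub, the sketch below assumes a local bare repository named canonical.git as a stand-in for the canonical repository, and a hypothetical scratch clone called "seed" that plays another developer publishing work. Cloning with -o upstream names the remote as in this guide.

```shell
#!/bin/sh
# Sketch: "canonical.git" stands in for github.com/geoserver/geoserver;
# "seed" and "mine" are hypothetical scratch repositories.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
cd "$work"

git init -q --bare canonical.git
git -C canonical.git symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/README
git -C seed add README
git -C seed commit -q -m "initial import"
git -C seed push -q "$work/canonical.git" master

# our local repository, with the canonical repository registered as "upstream"
git clone -q -o upstream "$work/canonical.git" mine

# meanwhile new work lands on canonical master
echo update >> seed/README
git -C seed commit -q -a -m "new upstream work"
git -C seed push -q "$work/canonical.git" master

# the update itself: switch to master and pull from upstream
cd mine
git checkout -q master
git pull -q upstream master
```

Because the local master has no commits of its own here, the pull is a simple fast-forward.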
Making local changes
As mentioned above, git has a two-phase workflow in which changes are first staged and then committed
locally. For example, to change, stage and commit a single file:
% git checkout master
# do some work on file x
% git add x
% git commit -m "commit message" x
Again there are many variations, but generally the staging process involves using git add to stage files
that have been added or modified, and git rm to stage files that have been deleted. git mv is used to move
files and stage the changes in one step.
At any time you can run git status to check which files have been changed in the working area
and what has been staged for commit. It also shows the current branch, which is useful when
switching frequently between branches.
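The three staging commands can be seen side by side with git status --short, which prints a status code per path (M modified, D deleted, R renamed). This is a sketch in a scratch repository; the file names x, y and z are hypothetical.

```shell
#!/bin/sh
# Sketch: stage a modification, a deletion and a move, then inspect the index.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master

# a starting commit with three files
echo one > x; echo two > y; echo three > z
git add x y z
git commit -q -m "initial import"

echo changed >> x   # modify x ...
git add x           # ... and stage the modification
git rm -q y         # delete y and stage the deletion
git mv z z2         # rename z and stage the move in one step

# staged entries: "M  x", "D  y" and "R  z -> z2"
git status --short
```

A subsequent git commit would record all three staged changes in a single commit.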
Pushing changes to canonical
Once a developer has made some local commits they generally will want to push them up to a remote repository.
For the primary branches these commits should always be pushed up to the canonical repository. If they are for
some reason not suitable to be pushed to the canonical repository then the work should not be done on a primary
branch, but on a feature branch.
For example, to push a local bug fix up to the canonical master branch:
% git checkout master
# make a change
% git add/rm/mv ...
% git commit -m "making change x"
% git pull upstream master
% git push upstream master
The example shows the practice of first pulling from canonical before pushing to it. Developers should always do
this. In fact, if there are commits in canonical that have not been pulled down, by default git will not allow
you to push the change until you have pulled those commits.
Note
A merge commit occurs when two branches are merged and the merge is not a “fast-forward” merge.
This happens when the target branch has gained new commits since the local commits were created, so the two histories have diverged.
Fast-forward merges are worth reading about.
An easy way to avoid merge commits is to do a “rebase” when pulling down changes:
% git pull --rebase upstream master
The rebase makes local changes appear in git history after the changes that are pulled down.
This allows the subsequent push to be a fast-forward update of the canonical branch. This is not a required practice since merge commits are fairly harmless,
but they should be avoided where possible since they clutter up the commit history and make the git log harder to read.
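The following sketch reproduces the diverged case locally. It assumes a bare repository named canonical.git standing in for the canonical repository and two hypothetical developers, alice and bob: alice pushes first, bob has an unpushed local commit, and bob’s pull --rebase replays his commit on top of alice’s so the published history stays linear.

```shell
#!/bin/sh
# Sketch: a local bare repository stands in for the canonical GitHub repository;
# "alice" and "bob" are hypothetical developers.
set -eu
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
work=$(mktemp -d)
cd "$work"

git init -q --bare canonical.git
git -C canonical.git symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/README
git -C seed add README
git -C seed commit -q -m "initial import"
git -C seed push -q "$work/canonical.git" master

for dev in alice bob; do
  git clone -q -o upstream "$work/canonical.git" "$dev"
done

# alice pushes a fix first
echo fix > alice/a.txt
git -C alice add a.txt
git -C alice commit -q -m "alice: fix"
git -C alice push -q upstream master

# bob committed locally in the meantime, so his master has diverged
echo change > bob/b.txt
git -C bob add b.txt
git -C bob commit -q -m "bob: making change x"

# replay bob's commit on top of alice's, then push (a fast-forward for canonical)
git -C bob pull -q --rebase upstream master
git -C bob push -q upstream master

git -C canonical.git log --oneline --graph master
```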
Working with feature branches
As mentioned before, it is always a good idea to work on a feature branch rather than directly on a primary branch.
A classic problem every developer who has used a version control system has run into is when they have
worked on a feature locally and made a ton of changes, but then need to switch context to work on some other feature or
bug fix. The developer tries to make the fix in the midst of the other changes
and ends up committing a file that should not have been changed.
Feature branches are the remedy for this problem.
To create a new feature branch off the master branch:
% git checkout -b my_feature master
% # make some changes
% git add/rm, etc...
% git commit -m "first part of my_feature"
Rinse, wash, repeat. The nice thing about using a feature branch is that it is easy to switch context
to work on something else. Just git checkout whatever other branch you need to work on,
and then return to the feature branch when ready.
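A minimal sketch of that context switch in a scratch repository: work is committed on my_feature, an urgent fix is made on master without any trace of the feature work, and development then resumes on the feature branch. The file names and commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: create a feature branch, commit on it, and switch context away and back.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -q -m "initial import"

# start the feature off master and commit part of it
git checkout -q -b my_feature master
echo "work in progress" > feature.txt
git add feature.txt
git commit -q -m "first part of my_feature"

# an urgent fix comes in: master has no trace of feature.txt
git checkout -q master
echo fix >> README
git commit -q -a -m "urgent fix on master"

# return to the feature branch and carry on
git checkout -q my_feature
git status --short --branch   # the first line names the current branch
```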
Note
When a branch is checked out, all the files in the working area are modified to reflect
the current state of the branch. When using development tools which cache the state of the
project (such as Eclipse) it may be necessary to refresh their state to match the file system.
If the branch is very different it may even be necessary to perform a rebuild so that
build artifacts match the modified source code.
Merging feature branches
Once a developer is done with a feature branch, it must be merged into one of the primary branches and pushed up
to the canonical repository. The way to do this is with the git merge command:
% git checkout master
% git merge my_feature
It’s as easy as that. After the feature branch has been merged into the primary branch push it up as described before:
% git pull --rebase upstream master
% git push upstream master
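The full merge-and-publish sequence can be rehearsed locally. This sketch assumes a bare repository named canonical.git in place of the canonical repository; because master has not moved since my_feature was created, the merge is a fast-forward and canonical master ends up at the feature branch’s tip.

```shell
#!/bin/sh
# Sketch: merge a finished feature branch into master and publish it.
# A local bare repository stands in for the canonical GitHub repository.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
cd "$work"
git init -q --bare canonical.git
git -C canonical.git symbolic-ref HEAD refs/heads/master
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/README
git -C seed add README
git -C seed commit -q -m "initial import"
git -C seed push -q "$work/canonical.git" master
git clone -q -o upstream "$work/canonical.git" mine
cd mine

# finish the feature on its own branch
git checkout -q -b my_feature master
echo feature > feature.txt
git add feature.txt
git commit -q -m "implement my_feature"

# merge it into master, refresh from upstream, then publish
git checkout -q master
git merge -q my_feature              # master has not moved, so this is a fast-forward
git pull -q --rebase upstream master
git push -q upstream master
```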
Porting changes between primary branches
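Often a single change (such as a bug fix) has to be committed to multiple branches. Primary branches
should not be merged into one another with the git merge command, since that would carry every change
from one release stream into the other. Instead we use git cherry-pick to copy individual commits.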
As an example consider making a change to master:
% git checkout master
% # make the change
% git add/rm/etc...
% git commit -m "fixing bug GEOS-XYZ"
% git pull --rebase upstream master
% git push upstream master
We want to backport the bug fix to the stable branch as well. To do so we have to note the commit
id of the change we just made on master. The git log command will provide this. Let’s assume the commit
id is “123”. Backporting to the stable branch then becomes:
% git checkout 2.2.x
% git cherry-pick 123
% git pull --rebase upstream 2.2.x
% git push upstream 2.2.x
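The sketch below rehearses a backport in a scratch repository without remotes, so the pull and push steps are omitted. It assumes the stable branch 2.2.x forks from an early commit, after which master receives an unrelated feature commit that must not reach the stable branch. The commit id is captured with git log instead of being copied by hand; file names are hypothetical.

```shell
#!/bin/sh
# Sketch: backport one commit from master to a stable branch with cherry-pick.
# No remotes here, so the pull/push steps from the text are left out.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -q -m "initial import"
git branch 2.2.x                 # the stable branch forks from here

# master moves on with work that must NOT reach the stable branch
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "major new feature"

# the bug fix, committed on master
echo fix > fix.txt
git add fix.txt
git commit -q -m "fixing bug GEOS-XYZ"
fix=$(git log -1 --format=%H)    # plays the role of commit "123"

# copy just that commit onto the stable branch
git checkout -q 2.2.x
git cherry-pick "$fix"
```

Only the fix arrives on 2.2.x; the feature commit stays on master, which is exactly what a git merge of master would not guarantee.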
Cleaning up feature branches
Consider the following situation. A developer has been working on a feature branch and has gone back
and forth to and from it making commits here and there. The result is that the feature branch has accumulated
a number of commits on it. But all the commits are related, and what we want is really just one commit.
This is easy with git and you have two options:
- Do an interactive rebase on the feature branch
- Do a merge with squash
Interactive rebase
Rebasing allows us to rewrite the commits on a branch, deleting commits we don’t want, or combining commits that should
really be a single commit. You can read more about interactive rebasing in the git documentation.
Warning
Much care should be taken with rebasing. You should never rebase commits that are public (that is, commits that have
been copied outside your local repository). Rebasing public commits rewrites branch history, so other repositories that
already contain the original commits can no longer merge cleanly with yours.
The following example shows an interactive rebase on a feature branch:
% git checkout my_feature
% git log
The git log shows the current commit on the branch is commit “123”.
We make some changes and commit the result:
% git commit -m "fixing bug x" # results in commit 456
We realize we forgot to stage a change before committing, so we add the file and commit:
% git commit -m "oops, forgot to commit that file" # results in commit 678
Then we notice a small mistake, so we fix and commit again:
% git commit -m "darn, made a typo" # results in commit 910
At this point we have three commits when what we really want is one. So we rebase,
specifying the revision immediately prior to the first commit:
% git rebase -i 123
This invokes an editor that allows indicating which commits should be combined.
Git then squashes the commits into an equivalent single commit.
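For illustration, the editing step can be scripted. Normally git rebase -i opens an editor listing one “pick” line per commit; in this sketch the GIT_SEQUENCE_EDITOR environment variable points at a sed command that rewrites every line after the first to “fixup”, which folds those commits into the first and keeps its message. It assumes GNU sed (sed -i without a suffix argument) and a scratch repository; $base plays the role of commit “123”.

```shell
#!/bin/sh
# Sketch: squash three feature-branch commits into one with a scripted rebase -i.
# Assumes GNU sed; file names and messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -q -m "initial import"

git checkout -q -b my_feature
base=$(git rev-parse HEAD)       # the commit immediately prior to the feature work

echo fix > x;   git add x; git commit -q -m "fixing bug x"
echo file > y;  git add y; git commit -q -m "oops, forgot to commit that file"
echo fixed > x; git add x; git commit -q -m "darn, made a typo"

# Rewrite the todo list non-interactively: lines 2..end become "fixup".
GIT_EDITOR=true GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" \
  git rebase -q -i "$base"

git log --oneline "$base"..my_feature   # one commit: "fixing bug x"
```

Using “squash” instead of “fixup” would also combine the commits, but would concatenate their messages for editing.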
After this we can merge the cleaned-up feature branch into master as usual:
% git checkout master
% git merge my_feature
Again, be sure to read up on this feature before attempting to use it. And again, never rebase a public commit.
Merge with squash
The git merge command takes an option --squash that performs the merge against the working area
but does not commit the result to the target branch. This squashes all the commits from the feature
branch into a single changeset that is staged and ready to be committed:
% git checkout master
% git merge --squash my_feature
% git commit -m "implemented feature x"
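A quick way to confirm what --squash does is sketched below in a scratch repository with hypothetical file names: after the squashed commit, master has the same tree as my_feature but gains only one new commit, and the feature branch’s commits are not part of master’s history.

```shell
#!/bin/sh
# Sketch: squash-merge a three-commit feature branch into master.
set -eu
export GIT_AUTHOR_NAME=bob GIT_AUTHOR_EMAIL=bob@example.com
export GIT_COMMITTER_NAME=bob GIT_COMMITTER_EMAIL=bob@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -q -m "initial import"

git checkout -q -b my_feature
echo one > a.txt; git add a.txt; git commit -q -m "feature: step 1"
echo two > b.txt; git add b.txt; git commit -q -m "feature: step 2"
echo three >> a.txt; git commit -q -a -m "feature: step 3"

git checkout -q master
git merge -q --squash my_feature   # stages the combined changes; HEAD does not move
git commit -q -m "implemented feature x"
git log --oneline master           # "initial import" plus one squashed commit
```

Because the squashed commit does not record my_feature as a parent, git does not consider the branch merged; it is usually deleted afterwards with git branch -D.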