Preparing a private git repository for public release

A git project that is being open-sourced may need some manipulation before it is ready for release, such as changing the directory structure and cleaning out irrelevant files. When the content to release sits in a single sub-directory (and that directory holds nothing else), there are existing methods for splitting a git repository. This post covers techniques for finer-grained manipulation.

Removing history of deleted files

A git project frequently reaches a point where it is beneficial to split it into two repositories. You may have developed a library and an application and want to release them separately, or perhaps some of the documentation has grown into a book, to be maintained by a different set of individuals. How does one create a repository with a subset of files from a big repository, optionally changing the directory structure, while keeping history for the split part and discarding history for files not carried over?

In one approach, the whole repository is cloned, and then its history is pruned. There is a script to automate this. A downside is that files are “git mv”’d within the repository, so “git log --follow” is needed to get accurate history.
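To illustrate why the rename matters, here is a minimal sketch in a throwaway repository. The file names, identity values, and commit messages are made up for illustration; only “git mv” and “git log --follow” come from the text above.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical identity, so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init -q repo
cd repo
echo 'one' > old_name.txt
git add old_name.txt
git commit -q -m "Add file"

git mv old_name.txt new_name.txt
git commit -q -m "Rename file"

echo 'two' >> new_name.txt
git commit -q -am "Edit file"

# Without --follow, history stops at the commit that introduced the new name.
plain=$(git log --oneline -- new_name.txt | wc -l | tr -d ' ')
# With --follow, git continues across the rename to the original file.
followed=$(git log --oneline --follow -- new_name.txt | wc -l | tr -d ' ')
echo "without --follow: $plain commits, with --follow: $followed commits"
```

In this sketch the plain log should list two commits and the --follow log three, because the commit that created the file under its old name is only reachable through rename detection.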

I’ve had good experience with extracting a patch from the original repository and applying it to a new one; I used this to separate libwireless from a bigger repository with a lot of experimental code. The rest of the post surveys the command-line parameters from the StackExchange post, and lists parameters for modifying the directory structure of files (for example, moving files from directoryA in the original repo to directoryB in the new repo).

Generating the patch

These are the meanings of the “git log” parameters, from the git-log man page:

  • --pretty=email: format as an email message, to be read by “git am”. Using “git log” with the email format rather than “git format-patch” allows easy creation of a single patch file with multiple commits.
  • --patch-with-stat: generate a patch and include statistics on changed files. Not clear why stats are needed for this purpose, but they don’t appear to hurt.
  • --reverse: output the commits in reverse order, that is, oldest first. This is the order in which “git am” has to apply them.
  • --full-index: show full object names on the “index” line, not only the first few characters.
  • --binary: output binary diffs that can be applied, used to support binary files.

To change the directory structure:

  • --relative=<path>: patch filenames should be relative to the given directory. Will also exclude changes outside the directory.

The combination of --full-history and --simplify-merges would tell git to keep more history than the default, but the default setting seems to keep a good amount of history structure.

For completeness (copied from SE, with added --relative):

git log --pretty=email --patch-with-stat --reverse --full-index --binary --relative=directoryA -- directoryA/files_or_folders_to_export > patch
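To see what this command produces, here is a minimal sketch in a throwaway repository. The layout (directoryA/lib.c, other/junk.txt), the identity values, and the commit messages are assumptions made for illustration; the “git log” options are the ones listed above.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical identity, so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init -q src
cd src
mkdir directoryA other
echo 'int a;' > directoryA/lib.c
echo 'scratch' > other/junk.txt
git add .
git commit -q -m "Add library and scratch file"

echo 'int b;' >> directoryA/lib.c
git commit -q -am "Extend library"

echo 'more' >> other/junk.txt
git commit -q -am "Unrelated change"

# Export only the history of directoryA, with paths relative to it.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    --relative=directoryA -- directoryA > "$work/patch"

# Each exported commit starts with an mbox-style "From <40-hex-id> ..." line
# (assuming the default SHA-1 object format).
commits_in_patch=$(grep -c '^From [0-9a-f]\{40\} ' "$work/patch")
echo "commits in patch: $commits_in_patch"

# The diff headers show the paths that "git am" will see.
grep '^diff --git' "$work/patch"
```

Of the three commits, only the two that touch directoryA should end up in the patch, and the diff headers should name lib.c without the directoryA/ prefix.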

Applying the patch

From the git-am and git-apply man pages:

  • --committer-date-is-author-date: usually, the time of the “git am” run is used as the commit date. This allows preservation of the original date.
  • --directory=<path>: prepends the given directory to the filenames in the patch, which places the files under that subdirectory.
  • -p<n>: removes n leading path components (separated by slashes) from the filenames. This can be used as an alternative to --relative when constructing the path. Paths in a git patch always carry one leading component (a/ or b/), so to remove one level of directory nesting, use n=2.
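As a sketch of the -p<n> alternative, the patch below is generated without --relative, so its paths still contain directoryA, and the prefix is stripped at apply time instead. Repository names, file names, and identity values are assumptions made for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical identity, so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init -q src
cd src
mkdir directoryA
echo 'int a;' > directoryA/lib.c
git add .
git commit -q -m "Add library"

# No --relative here: the patch names a/directoryA/lib.c and b/directoryA/lib.c.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- directoryA > "$work/patch"
cd ..

git init -q dest
cd dest
# -p2 strips "a/" (or "b/") plus "directoryA/"; --directory then adds "directoryB/".
git am -q -p2 --directory=directoryB < "$work/patch"

tracked=$(git ls-files)
echo "$tracked"
```

The file should land at directoryB/lib.c, the same place it would reach with --relative=directoryA at generation time and the default -p1 at apply time.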

So, in the destination repo:

git am --committer-date-is-author-date --directory=directoryB < patch
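The whole round trip can be tried out in scratch repositories before touching real ones. The following is a minimal sketch; the repository names (src, dest), file contents, dates, and identity values are assumptions made for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical identity, so the sketch does not depend on global git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

git init -q src
cd src
mkdir directoryA
echo 'int a;' > directoryA/lib.c
git add .
# Fixed author dates make the date preservation easy to check afterwards.
GIT_AUTHOR_DATE="2015-01-01 12:00:00 +0000" git commit -q -m "Add library"
echo 'int b;' >> directoryA/lib.c
GIT_AUTHOR_DATE="2015-02-01 12:00:00 +0000" git commit -q -am "Extend library"

git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    --relative=directoryA -- directoryA > "$work/patch"
cd ..

# Apply onto a brand-new, empty repository.
git init -q dest
cd dest
git am -q --committer-date-is-author-date --directory=directoryB < "$work/patch"

applied=$(git rev-list --count HEAD)
# Author date and committer date of the newest commit, for comparison.
dates=$(git log -1 --format='%ad|%cd' --date=short)
git --no-pager log --format='%h %ad %s' --date=short --stat
```

Both commits should arrive in dest with lib.c under directoryB, and the committer date of each should match its original author date rather than the time of the “git am” run.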

Optional: rename files/directories in the git diff patch

To rename files manually, replace occurrences of “--- a/dirX/filenameY” and “+++ b/dirX/filenameY” with your desired filename (keeping the leading “---” and “+++” and the a/ and b/ prefixes):

sed 's/^--- a\/old_directory/--- a\/new_directory/g' -i.orig patch
sed 's/^+++ b\/old_directory/+++ b\/new_directory/g' -i patch
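The substitution can be checked on a small sample before running it on a real patch. The sketch below uses a made-up patch excerpt and a variant of the command that does both substitutions in one pass; placing the options before the script and giving -i a suffix is an assumption chosen so that it should work with both GNU and BSD sed. Like the commands above, it only touches the “---” and “+++” lines.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical excerpt of a patch file; only the header lines matter here.
cat > patch <<'EOF'
diff --git a/old_directory/lib.c b/old_directory/lib.c
--- a/old_directory/lib.c
+++ b/old_directory/lib.c
@@ -1 +1,2 @@
 int a;
+int b;
EOF

# Both substitutions in one pass; -i.orig keeps the untouched file as patch.orig.
# Using | as the delimiter avoids escaping the slashes in the paths.
sed -i.orig \
    -e 's|^--- a/old_directory|--- a/new_directory|' \
    -e 's|^+++ b/old_directory|+++ b/new_directory|' \
    patch

# Show the rewritten file header lines.
grep -E '^(---|\+\+\+) ' patch
```

Comparing patch against patch.orig with diff is a quick way to confirm that only the intended header lines changed.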

Happy splitting!
