Posted Wednesday night, April 22nd, 2009
License: GPL
This is a continuation from before. I am digressing a little in this post. One of the things I want to get out of this exercise is to learn more about Ontologies and Ontology editors. On the principle that you can never learn something unless you build something with it (aka bone knowledge), this post gathers my thoughts to get started on creating an Ontology for package building. Perhaps this has been done before, and better, but I’ll probably learn more trying to create my own.
Also, I am playing around with code, an odd melange of my
package building porcelain, and gitpkg, and other
ideas bruited on IRC, and I don’t want to blog about
something that would be embarrassing in the long run if some of the
concepts I have milling around turn out to not meet the challenge
of first contact with reality.
I want to create an ontology related to packaging software. It should be general enough to cater to the needs of any packaging effort in a distribution-agnostic and version-control-agnostic manner. It should enable us to talk about packaging schemes and mechanisms, compare different methods, and perhaps work towards a common interchange mechanism good enough for people to share the efforts spent in packaging software.
The ontology should be able to describe common practices in packaging, concepts of upstream sources, versioning, commits, package versions, and other meta-data related to packages.
I am doing this ontology primarily for myself, but I hope this might be useful for other folks involved in packaging software.
So, here follows a set of concepts related to packaging software; people who like pretty pictures can click on the thumbnail on the right:
Posted Saturday night, April 18th, 2009
License: GPL
This is a continuation from before.
Before I go plunging into writing code for a generic
vcs-pkg implementation, I wanted to take a close look
at my current, working, non-generic implementation: making sure
that the generic implementation can support at least this one
concrete work-flow will keep me grounded.
One of the features of my home grown porcelain for building
package has been that I use a fixed layout for all the packages I
maintain. There is a top level directory for all working trees.
Each package gets a sub-directory under this working area. And in
each package sub-directory, are the upstream versions, the checked
out VCS working directory, and anything else package related. With
this layout, knowing the package name is enough to locate the
working directory. This enables me to, for example, hack away at a
package in Emacs, and when done, go to any open terminal window,
and say stage_release kernel-package or
tag_releases ucf without needing to know what the
current directory is (usually, the package's working directory is
several levels deep:
/usr/local/git/debian/make-dfsg/make-dfsg-3.91, for
instance).
However, this is less palatable for a generic tool – imposing a
directory structure layout is pretty heavy. And I guess I can
always create a function called cdwd, or something, to
take away the tedium of typing out long cd
commands.
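To make the fixed-layout idea concrete, here is a minimal sketch of how a helper could turn a package name into a working directory. It assumes that the checked out tree is the sub-directory of the package directory that contains a .git entry; the function name find_working_dir and that detection rule are illustrative assumptions, not a description of my actual scripts.

```python
from pathlib import Path


def find_working_dir(package, root):
    """Return the checked out working tree for `package`, or None.

    Assumed layout (hypothetical): <root>/<package>/<something>/.git
    """
    package_dir = Path(root) / package
    if not package_dir.is_dir():
        return None
    # Upstream versions and other files live next to the working tree,
    # so pick the first sub-directory that looks like a git checkout.
    for child in sorted(package_dir.iterdir()):
        if child.is_dir() and (child / ".git").exists():
            return child
    return None
```

A shell function such as cdwd could then simply cd into whatever path a helper like this prints.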
Anyway, looking at my code, here is the information that the scripts seem to need in order to do their work.
rpm based sources, look for the
spec file
debian/rules
spec or
debian/rules in the current directory, and parse
either the spec file or
debian/changelog.
tla tree-root
bzr info
git rev-parse --show-cdup
hg root
debian directory, and
changelog and rules files exist
Then, look for the spec file or
debian/rules in the base directory
spec or changelog files.
pristine-tar is in use, given two trees
(branches, commits, etc.), namely:
The tree can be generated
tar archive.
So, if I do away with the whole working area layout convention, this can be reduced to just requiring the user to:
dpkg-buildpackage imposes this too).
pristine-tar or have the upstream
tar archive in the parent directory of the working
directory
Hmm. One user specified directory, where the results are dumped.
I can live with that. However, gitpkg has a different
concept: it works purely on the git objects. You feed it up to three
tree objects, the first being the tree with sources to build; the
second and third trees are looked at only if the upstream tar
archive cannot be located, in which case they are passed to pristine-tar to
re-construct the upstream tar archive. The package name and version are
constructed after
the source-tar archive is extracted to the staging area. I like the
minimality of this.
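The checks listed earlier in this post (find the root of the working tree, then decide whether it holds Debian or rpm packaging) can be sketched roughly as follows. Instead of calling tla tree-root, bzr info, git rev-parse --show-cdup or hg root, this sketch walks up the directory tree looking for each system's control directory, which is a simplification I am substituting for illustration. The function names are hypothetical.

```python
from pathlib import Path

# Control directories found at the top of a working tree.
VCS_MARKERS = {".git": "git", ".hg": "hg", ".bzr": "bzr", "{arch}": "tla"}


def find_vcs_root(start):
    """Walk upwards from `start`; return (root, vcs_name) or None."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        for marker, vcs in VCS_MARKERS.items():
            if (directory / marker).exists():
                return directory, vcs
    return None


def packaging_style(base):
    """Return "deb", "rpm" or None for the tree rooted at `base`."""
    base = Path(base)
    debian = base / "debian"
    if (debian / "rules").is_file() and (debian / "changelog").is_file():
        return "deb"
    if any(base.glob("*.spec")):
        return "rpm"
    return None
```

Parsing the package name and version out of debian/changelog or the spec file would be the next step, and is left out here.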
This is continued here.
Posted Thursday afternoon, April 16th, 2009
License: GPL
I have been involved in vcs-pkg.org since around
the time it started, a couple of years ago. The discussion has been
interesting, and I learned a lot about the benefits and
disadvantages of serializing patches (and collecting integration
deltas in the feature branches and the specific ordering of
the feature branches) and maintaining integration branches (where
the integration deltas are collected purely in the integration
branch, but might tend to get lost in the history, and a fresh
integration branch having to re-invent the integration deltas
afresh).
However, one of the things we have been lax about is getting
down to brass tacks and getting around to being able to create
generic packaging tools (though for the folks on the serializing
patches side of the debate we have the excellent quilt
and the topgit packages).
I have recently mostly automated my git based work-flow, and
have built fancy porcelain around my git repository setup. During
IRC discussion, the gitpkg script came up. This seems
almost usable, apart from not having any built-in
pristine-tar support, and also not supporting
git submodules, which makes it a less useful
alternative than my current porcelain.
But it seems to me that we are pretty close to being able to create a distribution, layout, and patch handler agnostic script that builds distribution packages directly from version control, as long as we take care not to bind people into distribution or tool specific straitjackets. To these ends, I wanted to see which tasks we want a package building script to perform. Here is what I came up with.
The first and third steps above are pretty straightforward, and fairly uncontroversial.
The upstream sources may be handled by one of these three alternatives:
The command to run may be supplied by the user in a
configuration file or option, and may default based on the native
distribution, to dpkg-buildpackage or
rpm. There are a number of already mature mechanisms
to take a source directory and upstream tar archive and produce
packages from that point, and the wheel need not be
re-invented.
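As a rough illustration of the "user supplied, with a distribution default" idea, here is a small sketch. The function name build_command and the "deb"/"rpm" keys are made up for this example; for the rpm default I have assumed rpmbuild -ba (which still needs the spec file appended as an argument), since that is the build front end shipped with rpm.

```python
import shlex

# Assumed defaults; a real tool would let the configuration override these.
DEFAULT_BUILD_COMMANDS = {
    "deb": ["dpkg-buildpackage"],
    "rpm": ["rpmbuild", "-ba"],
}


def build_command(package_format, configured=None):
    """Return the argv list for the build step.

    `configured` is the command line from a configuration file or
    option, if the user supplied one; it always wins over the default.
    """
    if configured:
        return shlex.split(configured)
    if package_format not in DEFAULT_BUILD_COMMANDS:
        raise ValueError(f"no default build command for {package_format!r}")
    return list(DEFAULT_BUILD_COMMANDS[package_format])
```

So build_command("deb") falls back to dpkg-buildpackage, while build_command("deb", "dpkg-buildpackage -us -uc") honours the user's choice.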
So the hardest part of the task is to present, in the staging area, for further processing, a directory tree of the source package, ready for the distribution specific build commands. This part of the solution is likely to be VCS specific.
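For the git flavour, one plausible way to populate the staging area is to export a tree with git archive and unpack it. The sketch below is only that, a sketch: the names export_command and stage_tree are hypothetical, and it ignores submodules entirely (git archive does not descend into them), which is exactly the kind of gap a real tool would have to fill.

```python
import io
import subprocess
import tarfile


def export_command(treeish, package, version):
    """Build the git archive command that exports `treeish`."""
    # The prefix makes the tree unpack into <package>-<version>/.
    prefix = f"{package}-{version}/"
    return ["git", "archive", "--format=tar", f"--prefix={prefix}", treeish]


def stage_tree(treeish, package, version, staging_dir):
    """Export `treeish` from the current repository into `staging_dir`."""
    result = subprocess.run(
        export_command(treeish, package, version),
        check=True,
        capture_output=True,
    )
    with tarfile.open(fileobj=io.BytesIO(result.stdout)) as archive:
        archive.extractall(staging_dir)
```

The distribution specific build command would then be run inside the unpacked directory.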
This post is getting long, so I’ll defer presenting my evolving
implementation of a generic vcs-pkg tool,
git flavour, to the next blog post.
This is continued here.
Posted Wednesday night, April 15th, 2009
License: GPL
There are a lot of little git scripts and tools being written by a lot of people. Including a lot of tools written by people I have a lot of respect for. And yet, they are mostly useless for me. Take git-pkg. Can’t use it. Does not work with git submodules. Then there is our nice, new, shiny, incredibly bodacious “3.0 (git)” source format. Again, useless: does not cater to submodules.
I like submodules. They are nice. They allow for projects to take upstream sources, add Debian packaging instructions, and put them into git. They allow you to stitch together disparate projects, with different authors, and different release schedules and goals, into a coherent, integrated, software project.
Yes, I use git submodules for my Debian packaging. I think it is
conceptually and practically the correct solution. Why submodules?
Well, one of the first things I discovered was that most of the
packaging for my packages was very similar – but not identical.
Unfortunately, with the previous incarnation of my packages, which had a
monolithic rules file in each ./debian/ directory, it
was easy for the rules files in packages to get out of sync – and
there was no easy way to merge changes in the common portions in
any sane automated fashion. The ./debian/ directories
for all my packages are now kept separate from the package that they are instrumental in
packaging. So, since I make the ./debian/ directories
branches of the same project, it is far easier to package a new
package, or to roll out a new feature when policy changes – the
same commit can be applied across all the branches, and thus all my
source packages, easily. With a separate debian-dir
project, I can separate the management of the packaging rules from
the package code itself.
Also, I have abstracted out the really common bits across all my
packages into a ./debian.common directory, which is
yet another project, and included in as a submodule in all the
packages – so there is a central place to change the common bits,
without having to duplicate my efforts 30-odd times.
Now people are complaining since they have no idea how to clone
my package repositories, since apparently no one actually pays
attention to a file called .gitmodules, and even when
they do, they, and the tools they use, have no clue what to do with
it. I am tired of sending emails with one-off cluebats, and I am
building my own porcelain around something I hope to present as a
generic vcs-pkg implementation soon. The first step is
a wrapper around git-clone, that understands git
submodules.
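For those wondering what a tool is supposed to do with .gitmodules: the file uses git's configuration syntax, with one [submodule "name"] section per submodule, each carrying a path and a url. Here is a minimal sketch of reading it. This is not the code of my wrapper, the function name parse_gitmodules is made up, and it handles only the simple one-key-per-line form. After a plain clone, running git submodule update --init does the actual fetching.

```python
import re

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def parse_gitmodules(text):
    """Map each submodule name to its settings (path, url, ...)."""
    modules = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        match = SECTION.match(line)
        if match:
            current = modules.setdefault(match.group("name"), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules
```

A clone wrapper could walk the resulting paths and repeat the process inside each submodule, since submodules may themselves contain submodules.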
So, here is
the browsable code (there is a link in there to the downloadable
sources too). Complete with a built in man page. Takes the same
arguments as git-clone, but with fewer options. Have
fun.
I have been meaning to write this up for a long time now, since
I vaguely made a promise to do so last Debconf. I
have also been wondering about the inefficiencies in my work-flow,
but I kept postponing my analysis since there were still large gaps
in my packaging automation since I moved off Arch as my SCM of
choice. However, recently I have taken a sabbatical from Debian, so
I’ve had time to complete bits and pieces of my package building
framework, enough so that I could no longer justify putting off the
analysis. I tried writing it up, but the result confused even me;
so I instead recorded every shell command during a recent series of
packaging tasks, and converted that into a nice, detailed, activity
diagram that you see over here. This is as efficient a work-flow as
I have been able to come up with.
Along with a git commit hook script, that parses the commit log and adds pending tags to bugs closed in the commit, the figure above represents my complete work-flow – down to the details of every cd command I executed. I think there are too many steps still.
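To give a flavour of what such a hook has to do, here is a small sketch of the parsing half. It assumes the commit log uses the same "Closes: #NNN" convention as debian/changelog, and it only builds the "tags NNN + pending" lines one would send to the bug tracking system's control address; the function names are made up, and my actual hook may well differ.

```python
import re

# Same shape as the pattern Debian uses for changelog entries.
CLOSES = re.compile(
    r"closes:\s*(?:bug)?#?\s?\d+(?:,\s*(?:bug)?#?\s?\d+)*",
    re.IGNORECASE,
)


def closed_bugs(message):
    """Return the bug numbers closed by a commit message, in order."""
    bugs = []
    for match in CLOSES.finditer(message):
        for number in re.findall(r"\d+", match.group(0)):
            if int(number) not in bugs:
                bugs.append(int(number))
    return bugs


def pending_commands(message):
    """Build one control command per bug closed in the commit."""
    return [f"tags {bug} + pending" for bug in closed_bugs(message)]
```

Mailing those lines off, and deciding which commits count, is the part the hook itself would have to handle.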
Feedback and commentary would be appreciated, as well as any suggestions to improve efficiency.
“Are you rebasing or merging?” seems to be the 64 thousand dollar question over in vcs-pkg discussions. Various people have offered their preferences, and indeed, several case studies of work flows have been presented. What is lacking is an analysis of the work flows: an exploration of which methodology has advantages, and whether there are scenarios in which the other work flow would have been better.
Oh, what are all these work flows about, you ask? Most of the issues with packaging software for distributions have a few things in common: there is a mainline or upstream source of development. There are zero or more independent lines of development or ongoing bug fixes that are to be managed. And then there is the tree from which the distribution package is to be built. All this talk about packaging software work flows is about how to best manage asynchronous development upstream and in the independent lines of development, and how to create a coherent, debuggable, integrated tree from which to build the distribution's package.
The rebasing question goes to the heart of how to handle the independent lines of development using git, since these lines of development are based off the main line of development, and must be periodically synchronized. Here is a first look at a couple of important factors that will have bearing on that question, and on packaging software for a distribution using Git in general. This is heavily geared towards git (nothing else does rebases so easily, I think), but some of the concepts should be generic. I am not considering the stacked set of quilt patches source controlled with Git in this article (I don’t understand that model well enough to do an analysis).
As a teaser, there is a third answer: neither. You can just add an independent line of development, and just let it sit: don’t rebase, and don’t merge; and in some circumstances that is a winning strategy.
I have been using Arch to package my Debian packages since 2003, which means that Arch has had a good long run as my SCM of choice. I had been using CVS for a few years before I moved to Arch, and the migration took me about six months, since it involved a whole new philosophy of packaging; I am hoping that migrating to git will not involve such a major paradigm shift, and thus be less disruptive and time consuming. What follows is a narrative of my efforts to get educated about Git.
This article is meant to be an annotated, selective, organized set of links to information about Git. How does it differ from the myriad of other link collections about Git proliferating on the web? Well, the value add is in the annotations and the organization: while not quite a narrative of my exploration, this is an idealized version of what I think my discovery process should have been, to be most effective. Staging the information is important; Google finds one lots of information that is incomprehensible to someone just coming to Git. This selection of links is actually selective; I have included only pointers to resources that fed me information at the level that I could handle at that stage, and I have eliminated links to information that was not new at that point. I have tried to select the best of breed (in terms of information and clarity) for each kind of information source I have come across so far.
There is a caveat: while I am still a beginner, I am better able to judge now what is confusing to a beginner than I shall be when I have become more familiar with the system; but I am also still enough of a novice not to trust my judgement on what really is best practice. I can fix the latter as I gain experience, but then I’ll need to be careful not to overload on complexity too early in the learning curve.
On the down side, this selection is subjective, and probably shall be even in the long term: I include what appealed to me, and will probably miss loads of pointers to information that I have not yet come across. However, I hope this will make it easier for other people to reach the same goal: use git for their version control needs.
Have fun.