Posted Sunday night, November 14th, 2010
Dear Lazyweb: How do you refresh or recreate your kvm virt periodically?
Dear Lazyweb, how do all y'all using virts recreate the build machine
setup periodically? I have tried and failed to get the
qemu-make-debian-root script to work for me. Going through and redoing
it from netinst ISO is an option – but then I need debconf preseeding
files, and I was wondering if there are some out there. And then there
is the whole "Oh, by the way, upgrade from Squeeze to Sid, please"
step. The less sexy alternative is going to the master copy and
running a cron job to safe-upgrade each week, and re-creating any
copy-on-write children. Would probably work, but I am betting there
are less hackish solutions out there.
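For what it is worth, the "less hackish" bar is low; the master-copy approach is a couple of crontab lines. Host names, image paths, and the qcow2 backing-file layout below are illustrative assumptions, and the linebreaks are for readability only:

```
# Sunday 03:00: safe-upgrade the master guest, then shut it down cleanly
0 3 * * 0  ssh build-master 'aptitude -y safe-upgrade && poweroff'
# Sunday 04:00: throw away and recreate the copy-on-write child image,
# backed by the freshly upgraded master image
0 4 * * 0  rm -f /var/lib/libvirt/images/build-sid.qcow2 &&
           qemu-img create -f qcow2 -b /var/lib/libvirt/images/master.qcow2
           /var/lib/libvirt/images/build-sid.qcow2
```

Still a cron job against a master copy, but at least the children are cheap to throw away.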
First, some background. It has been a year since I interviewed for the job I currently hold. And nearly 10 months since I was last really active in Debian (apart from Debconf 10). Partly it was trying to perform well at the new job, partly it was getting permission to work on Debian from my employer. Now that I think I have a handle on the job, and the process for getting permission is coming to a positive end, I am looking towards getting my Debian processes and infrastructure back up to snuff.
Before the interregnum, I used to have a UML machine setup to do builds. It was generated from scratch weekly using cron, and ran SELinux strict mode, and I used to have an automated ssh based script to build packages, and dump them on my box to test them. I had local git porcelain to do all this and tag releases, in a nice, effortless work flow.
Now, the glory days of UML are long gone, and all the cool kids are
using KVM. I have set up a kvm box, using a netinst ISO (as the
majority of the HOWTOs say). I used madduck's old
/etc/network/interfaces setup to do networking using a public
bridge (mostly because of how cool his solution was; virsh can talk
natively to a bridge for us now), and I have NFS, SELinux, ssh,
and my remote build infrastructure all done, so I am ready to hop back
into the fray once the lawyers actually ink the agreements. All I
have to do is decide on how to refresh my build machines periodically.
And I guess I should set up virsh, instead of having a shell alias
for kvm. Just haven't gotten around to that.
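For reference, the public-bridge half of that setup boils down to a small /etc/network/interfaces stanza along these lines (interface names are assumptions; the bridge_* options come from the bridge-utils package):

```
# /etc/network/interfaces -- put guests straight onto the LAN via a bridge
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp   off
    bridge_fd    0
```

The guests then attach their tap interfaces to br0, which is what lets virsh talk natively to the bridge.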
Posted late Wednesday morning, August 25th, 2010
Refreshing GNUPG keys sensibly
It has come up on the Planet recently, as well as the gnupg users mailing list: users need to refresh keys that they use to get updated information on revocations and key expiration. And there are plenty of examples of simple additions to ones crontab to set up a key refresh.
Of course, with me, things are rarely that simple. Firstly, I have my GNUPGHOME set to a non-standard location; and, secondly, I like having my Gnus tell me about signatures on mails to the Debian mailing lists, so I periodically sync debian-keyring.gpg into my GNUPGHOME. I add this as an additional keyring in my gpg.conf file, so that in normal operations Gnus has ready access to the keys; but I do not care to refresh all the keys in debian-keyring. I also prefer to trust and update keys in my own keyring, so the commands grow a little complex.
Also, I want to get keys for any signatures folks have kindly added to my key and uploaded to the key server (not everyone uses caff), so just --refresh-keys does not serve. Linebreaks added for readability.
# refresh my keys
# Note how I have to dance around keyring specification
45 4 * * 4 (/usr/bin/gpg2 --homedir ~/.sec --refresh-keys
      $(/usr/bin/gpg2 --options /dev/null --homedir ~/.sec
        --no-default-keyring --keyring pubring.gpg --with-colons
        --fixed-list-mode --list-keys | egrep '^pub' |
        cut -f5 -d: | sort -u) >/dev/null 2>&1)

# Get keys for new sigs on my keys (get my key by default, in case
# there are no unknown user IDs [do not want to re-get all keys])
44 4 * * 5 (/usr/bin/gpg2 --homedir ~/.sec --recv-keys 0xC5779A1C
      $(/usr/bin/gpg2 --options /dev/null --homedir ~/.sec
        --no-default-keyring --keyring pubring.gpg --with-colons
        --fixed-list-mode --list-sigs 0xC5779A1C | egrep '^sig:' |
        grep 'User ID not found' | cut -f5 -d: | sort -u) >/dev/null 2>&1)
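Incidentally, the key-ID extraction inside those $(...) substitutions is plain text processing on gpg's --with-colons output, so it can be tried out standalone on canned data (the records below are fabricated for illustration):

```shell
# Two fabricated 'pub' records in --with-colons format, roughly what
# 'gpg2 --with-colons --fixed-list-mode --list-keys' emits; field 5
# holds the long key ID.
sample='pub:u:4096:1:0123456789ABCDEF:2009-01-01::::Some One <so@example.org>:
sub:u:4096:1:1111111111111111:2009-01-01::::::
pub:u:1024:17:89ABCDEF01234567:2005-06-07::::Other Body <ob@example.org>:'

# The same pipeline the crontab uses to harvest key IDs:
keyids=$(printf '%s\n' "$sample" | egrep '^pub' | cut -f5 -d: | sort -u)
printf '%s\n' "$keyids"
```

Everything up to the --refresh-keys call is deterministic text munging, which makes it easy to sanity-check before putting it in the crontab.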
Posted early Sunday morning, March 28th, 2010
Customer obsession: Early days at a new Job
I have been at Amazon.com for a very short while (I have only gotten one paycheck from them so far), but long enough for first impressions to have settled. Dress is casual, parking is limited. Cafeteria food is merely OK, and is not free.
There is a very flat structure at Amazon. The front line work is done by one-or-two pizza teams – sized by the number of large pizzas that can feed the team. Individual experiences with the company largely depend on which team you happen to end up with. I think I lucked out here. I get to work on interesting and challenging problems, at scales I had not experienced before.
There is an ownership culture. Everyone – including developers – gets to own what they produce. You are responsible for your product – down to carrying pagers in rotation with others on your team, so that there is someone on call in case your product has a bug. RC (or customer impacting) bugs result in a conference call being convened within 10-15 minutes, and all kinds of people and departments being folded in until the issue is resolved.
Unlike others, I find the operations burden refreshing (I come from working as a federal government contractor). On-call pages are often opportunities to learn things, and I like investigating the burning issue du jour. I also like the fact that I get to be my own support staff for the most part, though I have not yet installed Debian anywhere here.
While it seems corny, customer obsession is a concept that pervades the company. I find it refreshing. The mantra that "it's all about the customer experience" is actually true and enforced. Whenever a tie needs to be broken on how something should work, the answer to that question is usually sufficient to break it. At most other places, management was responsible for, and worried about, budgets for the department – this does not seem to be the case for lower to middle management here. We don't get infinite resources, but work is planned based on user experience, customer needs, and technical requirements, not following the drum beat of bean counters. The focus is on the job to be done, not the hours punched in.
I can choose to work from home if I wish, modulo meetings (which one could dial in to, at a pinch). But then, I have a 5 mile, 12 minute commute. I have, to my surprise, started coming in to work at 7:30 in the morning (I rarely used to get out of bed before 9:30), and I plan on getting a bike and seeing if I can ride it to work this summer.
All in all, I like it here.
Posted late Monday night, May 5th, 2009
Debian list spam reporting the Gnus way
So, recently our email overlords graciously provided means for us minions to help them in their toils and clean up the spammish clutter in the mailing lists by reporting the spam. And they provided us with a dead simple means of reporting such spam to them. Now, us folks who knoweth that there is but one editor, the true editor, and its, err, proponent is RMS, use Gnus to follow the Debian mailing lists, either directly, or through gmane. There are plenty of examples out there showing how to automate reporting spam to gmane, so I won't bore y'all with the details. Here I only show how one serves our list overlords, and smites the spam at the same time.
Some background, from the Gnus info page. I'll try to keep it brief. There is far more functionality present if you read the documentation, but you can see that for yourself.
The Spam package provides Gnus with a centralized mechanism for detecting and filtering spam. It filters new mail, and processes messages according to whether they are spam or ham. There are two "contact points" between the Spam package and the rest of Gnus: checking new mail for spam, and leaving a group.
Checking new mail for spam is done in one of two ways: while splitting incoming mail, or when you enter a group. Identifying spam messages is only half of the Spam package's job. The second half comes into play whenever you exit a group buffer. At this point, the Spam package does several things: it can add the contents of the ham or spam message to the dictionary of the filtering software, and it can report mail to various places using different protocols.
All this is very plugin and modular. The advantage is, that you can use various plugin front ends to identify spam and ham, or mark messages as you go through a group, and when you exit the group, spam is reported, ham and spam messages are copied to special destinations for future training of your filter. Since you inspect the marks put into the group buffer as you read the messages, there is a human involved in the processing, but as much as possible can be automated away. Do read the info page on the Spam package in Gnus, it is edifying.
Anyway, here is a snippet from my Gnus configuration which can help
automate the tedium of reporting spam. This is perhaps more in keeping
with how Gnus does things than having to press a special key for
every spam message, which does nothing to help train your filter.
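Such a snippet is just spam.el configuration; a minimal sketch of the shape it takes follows. The resend address and the group pattern here are my assumptions – check the listmasters' instructions before copying anything:

```
;; Report spam on Debian lists by resending it to the listmasters'
;; reporting address when leaving a group (illustrative values only).
(setq spam-report-resend-to "report-listspam@lists.debian.org")
(add-to-list 'gnus-parameters
             '("^nnml:.*debian.*"
               (spam-process '((spam spam-use-resend)))))
```

With that in place, articles you mark as spam in a matching group get reported automatically on group exit, as the Spam package intends.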
Posted Sunday night, May 3rd, 2009
Reflections on streetcars
Recently, I have made fairly major changes to kernel-package,
and there were some reports that I had managed to mess up cross compilation. And, not having a cross-compilation tool chain handy, I had to depend on the kindness of strangers to address that issue. And, given that I am much less personable than Ms Vivien Leigh, this is not something I particularly look forward to repeating.
At the outset, building a cross compiling tool chain seems a daunting task. This is not an activity one does frequently, and so one may be pardoned for being nonplussed by it. However, I have done this before, the most recent effort being creating one to compile rockbox binaries, so I had some idea where to start. Of course, since it is usually years between attempts to create cross-compiling tool chains, I generally forget how it is all done, and have to go hunting for details. Thank God for Google.
Well, I am not the only one in the same pickle, apparently, for there are gobs of articles and HOWTOs out there, including some pretty comprehensive (and intimidating) general tool sets designed to create cross compilers in the most generic fashion possible. Using them was not really an option, since I would forget how to drive them in a few months, and have a miniature version of the current problem again. Also, you know, I don't feel comfortable using scripts that are too complex for me to understand – I mean, without understanding, how can there be trust?
Also, this time around, I could not decide whether to cross compile
for arm-elf, as I did the last time, or for a newfangled
target. The ability to quickly change the target of the cross compiler
build mechanism would be nice. Manually building the tool chain makes
a wrong decision here expensive, and I hate that. I am also
getting fed up with having to root around on the internet every time I
want to build a cross compiler. I came across a script by Uwe
Hermann, which started me down the path of creating a script, with a
help option, to store the instructions, without trying to be too
general and thus getting overly complex. However, Uwe's script hard
coded too many things, like version numbers and upstream source
locations, and I knew I would rapidly find updating the script
irritating. Using Debian source packages would fix both of these problems.
I also wanted to use Debian sources as far as I could, to ensure that my cross compiler was as compatible as I could make it, though I did want to use newlib (I don't know why, except that I can, and the docs sound cool). And of course the script should have a help option and do proper command line parsing, so that editing the script would be unnecessary.
Anyway, all this effort culminated in the following script: build cross toolchain,
which is surprisingly compact. So I am now all set to try and cross compile a kernel the next time a kernel-package bug comes around. I thought that I would share this with the lazyweb, while I was at it.
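The point about making the target cheap to change can be sketched in a few lines of shell: keep the triplet in a single variable and derive every configure invocation from it. The function name, the version-less configure lines, and the prefix layout below are placeholders of mine, not what the actual script does:

```shell
# print_toolchain_steps TARGET [PREFIX] -- emit the classic build ordering
# for a cross tool chain: binutils, stage-1 gcc, newlib, full gcc.
print_toolchain_steps () {
    target="${1:-arm-elf}"
    prefix="${2:-$HOME/cross/$target}"
    printf 'binutils:   ./configure --target=%s --prefix=%s\n' "$target" "$prefix"
    printf 'gcc-stage1: ./configure --target=%s --prefix=%s --without-headers --enable-languages=c\n' "$target" "$prefix"
    printf 'newlib:     ./configure --target=%s --prefix=%s\n' "$target" "$prefix"
    printf 'gcc-stage2: ./configure --target=%s --prefix=%s --with-newlib --enable-languages=c,c++\n' "$target" "$prefix"
}

# Switching targets is now a one-word change:
print_toolchain_steps arm-elf
```

Rebuilding for a different triplet is then a matter of re-running with a different first argument, which is exactly what makes a wrong target choice cheap instead of expensive.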
The next thing, of course, is to get my script to create a qemu
base image every week so I can move from user mode Linux to the much niftier kvm,
which is what all the cool kids use. And then I can even create an ARM virtual machine to test my kernels with, something that user mode Linux can't easily do.
Posted late Wednesday evening, April 22nd, 2009
Ontologies: Towards a generic, distribution agnostic tool for building packages from a VCS
This is a continuation from before.
I am digressing a little in this post. One of the things I want to get out of this exercise is to learn more about ontologies and ontology editors, and, on the principle that you can never learn something unless you build something with it (aka bone knowledge), this post gathers my thoughts as a start on creating an ontology for package building. Perhaps this has been done before, and better, but I'll probably learn more trying to create my own.
Also, I am playing around with code, an odd melange of my package
building porcelain, gitpkg, and other ideas bruited about,
and I don't want to blog about something that would be embarrassing in
the long run if some of the concepts I have milling around turn out
not to meet the challenge of first contact with reality.
I want to create an ontology related to packaging software. It should be general enough to cater to the needs of any packaging effort, in a distribution agnostic and version control agnostic manner. It should enable us to talk about packaging schemes and mechanisms, compare different methods, and perhaps work towards a common interchange mechanism good enough for people to share the efforts spent in packaging software.
The ontology should be able to describe common practices in packaging, concepts of upstream sources, versioning, commits, package versions, and other meta-data related to packages.
I am doing this ontology primarily for myself, but I hope this might be useful for other folks involved in packaging software.
So, here follow a set of concepts related to packaging software, people who like pretty pictures can click on the thumbnail on the right:
- software is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system.
- software is what we are trying to package
- software has names
- software may exist as
- source code
- executable code
- packaged code
- source code is any collection of statements or declarations written in some human-readable computer programming language.
- source code is usually held in one or more text files (blobs).
- A large collection of source code files may be organized into a directory tree, in which case it may also be known as a source tree.
- The source code may be converted into an executable format by a compiler, or executed on the fly from the human readable form with the aid of an interpreter.
- executable format is the form software must be in in order to be run. Running means to cause a computer "to perform indicated tasks according to encoded instructions."
- software source code has one or more lines of development. Some common lines of development for the software to be packaged:
- upstream line of development
- feature branch is a line of development related to a new feature under development. Often the goal is to merge the feature branches into the upstream line of development
- usually, all feature branches are merged into the integration branch, and the package is created from the integration branch.
- integration branch is the line of development of software that is to be packaged
- some software lines of development have releases
- releases have release dates
- some releases have release versions
- source code may be stored in a version control repository, and maintain history.
- Trees are a collection of blobs and other trees (directories and sub-directories). A tree object describes the state of a directory hierarchy at a particular given time.
- Blobs are simply chunks of binary data - they are the contents of files.
- a tree can be converted into an archive and back
- In git, directories are represented by tree object. They refer to blobs that have the contents of files (file name, access mode, etc is all stored in the tree), and to other trees for sub-directories.
- Commits (or "changesets") mark points in the history of a line of development, and references to parent commits.
- A commit refers to a tree that represents the state of the files at the time of the commit.
- HEAD is the most recent commit in a line of development or branch.
- A working directory is a directory that corresponds, but might not be identical, to a commit in the version control repository
- Commits from the version control system can be checked out into the working directory
- uncommitted changes are changes in the working directory that make it different from the corresponding commit. Some call the working directory to be in a "dirty" state.
- uncommitted changes may be checked in to the version control system, creating a new commit
- The working directory may contain an ignore file
- ignore file contains the names of files in the working directory that should be "ignored" by the version control system.
- In git, a commit may also contain references to parent commits.
- If there is more than one parent commit, then the commit is a merge
- If there are no parent commits, it is an initial commit
- references, or heads, or branches, are movable references to a commit. On a fresh commit, the head or branch reference is moved to the new commit.
- lines of development are usually stored as a branch in the version control repository.
- A new branch may be created by branching from an existing branch
- a patch is a file that contains difference listings between two trees.
- A patch file can be used to transform (patch) one tree into the other.
- A quilt series is a method of representing an integration branch as a collection of a series of patches. These patches can be applied in sequence to the upstream branch to produce the integration branch.
- A tag is a named reference to a specific commit, and is not normally moved to point to a different commit.
- A package is an archive format of software created to be installed by a package management system or a self-sufficient installer, derived by transforming a tree associated with an integration branch.
- packages have package names
- package names are related to upstream software names
- packages have package versions
- package versions may have
- an upstream version component
- a distribution or packaging specific component
- package versions are related to upstream software versions
- helper packages provide libraries and other support facilities to help compile an integration branch ultimately yielding a package
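To make this concrete, here is how a few of the concepts above might be written down as a first cut in Turtle; the pkg: namespace and all the class and property names are invented for illustration, not an existing vocabulary:

```
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix pkg:  <http://example.org/pkg#> .

pkg:Software          a rdfs:Class .
pkg:SourceCode        a rdfs:Class ; rdfs:subClassOf pkg:Software .
pkg:LineOfDevelopment a rdfs:Class .
pkg:IntegrationBranch a rdfs:Class ; rdfs:subClassOf pkg:LineOfDevelopment .
pkg:Package           a rdfs:Class .

pkg:packages    a rdf:Property ;
                rdfs:domain pkg:Package ; rdfs:range pkg:Software .
pkg:derivedFrom a rdf:Property ;
                rdfs:domain pkg:Package ; rdfs:range pkg:IntegrationBranch .
pkg:hasVersion  a rdf:Property ; rdfs:domain pkg:Package .
```

An ontology editor would flesh this out with cardinalities and the version/release relations, but even a fragment this small is enough to start comparing packaging schemes in common terms.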
Posted late Saturday evening, April 18th, 2009
Looking at porcelain: Towards a generic, distribution agnostic tool for building packages from a VCS
This is a continuation from before.
Before I go plunging into writing code for a generic
implementation, I wanted to take a close look at my current, working,
non-generic implementation: making sure that the generic
implementation can support at least this one concrete work-flow will
keep me grounded.
One of the features of my home grown porcelain for building packages
has been that I use a fixed layout for all the packages I
maintain. There is a top level directory for all working trees. Each
package gets a sub-directory under this working area. And in each
package sub-directory are the upstream versions, the checked out VCS
working directory, and anything else package related. With this
layout, knowing the package name is enough to locate the working
directory. This enables me to, for example, hack away at a package in
Emacs, and when done, go to any open terminal window, and say
stage_release kernel-package or tag_releases ucf without needing
to know what the current directory is (usually, the package's working
directory is several levels deep –
/usr/local/git/debian/make-dfsg/make-dfsg-3.91, for instance).
However, this is less palatable for a generic tool – imposing a
directory structure layout is pretty heavy. And I guess I can always
create a function called cdwd, or something, to take away the tedium
of typing out long paths.
Anyway, looking at my code, here is the information that the scripts seem to need in order to do their work.
- Staging area. This is where software to be built is exported (and
  this area is visible from my build virtual machine).
  - User specified (configuration)
- Working area. This is the location where all my packaging work
  happens. Each package I work on has a sub-directory in here, and the
  working directories for each package live in the package
  sub-directory. Note: should not be needed.
  - User specified.
- Working directory. This is the checked out tree from the VCS, and
  this is the place where we get the source tree from which the
  package can be built.
  - Since we know the location of the working area, if the package name
    is known, we can just look in the package's sub-directory in the
    working area.
  - For Debian sources, locate debian/rules; for rpm based sources,
    look for the spec file.
  - If the package name is not known, look for debian/rules in the
    current directory, and parse the changelog.
  - If in a VCS directory, look for the base of the tree
    (git rev-parse --show-cdup does it for git; you have to climb the
    tree for subversion). Then look for debian/rules in the base
    directory.
- Package name
  - User specified, on the command line
  - If in the working directory of the package, can be parsed from
    debian/changelog
- Upstream tar archive
  - Usually located in the parent directory of the working directory (the package specific sub-directory of the working area)
  - If pristine-tar is in use, the archive can be generated given two
    trees (branches, commits, etc.), namely:
    - a tree for upstream (default: the branch upstream)
    - a tree for the delta (default: the branch pristine-tar)
  - Given an upstream tree (default: the branch upstream), a tar
    archive can be generated, but it is likely not to be bit-for-bit
    identical to the original
So, if I do away with the whole working area layout convention, this can be reduced to just requiring the user to:
- Specify the staging area
- Call the script in the working directory (dpkg-buildpackage imposes this too)
- Either use pristine-tar, or have the upstream tar archive in the parent directory of the working directory
Hmm. One user specified directory, where the results are dumped. I can
live with that. However, gitpkg has a different concept: it works
purely on the git objects. You feed it up to three tree objects, the
first being the tree with sources to build, and the second and third
trees being looked at only if the upstream tar archive can not be
located; those trees are passed to pristine-tar to re-construct the
upstream tar archive. The package name and version are constructed
after the source tar archive is extracted to the staging area. I like
the minimality of this.
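For the Debian case, the "package name and version can be parsed from the working directory" steps above boil down to reading the first line of debian/changelog. dpkg-parsechangelog is the proper tool for this; the sed one-liners below are only a sketch of the shape of the data:

```shell
# The first line of a debian/changelog looks like:
#     ucf (3.0018) unstable; urgency=low
line='ucf (3.0018) unstable; urgency=low'

pkg=$(printf '%s\n' "$line" | sed -e 's/^\([^ ]*\) .*/\1/')       # up to the first space
ver=$(printf '%s\n' "$line" | sed -e 's/^[^(]*(\([^)]*\)).*/\1/') # inside the parens

printf 'package=%s version=%s\n' "$pkg" "$ver"
```

From those two values the expected upstream tar archive name (and hence what to ask pristine-tar for) follows mechanically.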
This is continued here.
Posted Thursday afternoon, April 16th, 2009
Towards a generic, distribution agnostic tool for building packages from a VCS
I have been involved in vcs-pkg.org since around the time it
started, a couple of years ago. The discussion has been interesting,
and I have learned a lot about the benefits and disadvantages of
serializing patches (where the integration deltas are collected in the
feature branches and in the specific ordering of the feature branches)
versus maintaining integration branches (where the integration deltas
are collected purely in the integration branch, but might tend to get
lost in the history, and a fresh integration branch has to re-invent
the integration deltas afresh).
However, one of the things we have been lax about is getting down to
brass tacks and getting around to being able to create generic
packaging tools (though for the folks on the serializing patches side
of the debate we have the excellent quilt and the like).
I have recently mostly automated my git based work-flow, and have
built fancy porcelain around my git repository setup. During IRC
discussions, the gitpkg script came up. This seems almost usable,
apart from not having any built-in pristine-tar support, and also not
handling git submodules, which makes it a less useful alternative
than my current porcelain.
But it seems to me that we are pretty close to being able to create a distribution, layout, and patch handler agnostic script that builds distribution packages directly from version control, as long as we take care not to bind people into distributions or tool specific straitjackets. To these ends, I wanted to see what are the tasks that we want a package building script to perform. Here is what I came up with.
- Provide a copy of one or more upstream source tar-balls in the staging area where the package will be built. This staging area may or may not be the working directory checked out from the underlying VCS; my experience has been that most tools of the ilk have a temporary staging directory of some kind.
- Provide a directory tree of the sources from which the package is to be built in the staging area
- Run one or more commands or shell scripts in the staging area to create the package. This series of commands might be very complex, creating and running virtual machines, chroot jails, satisfying build dependencies, using copy-on-write mechanisms, running unit tests and lintian/piuparts checks on the results. But the package building script may just punt these steps to a user specified hook.
The first and third steps above are pretty straight forward, and fairly uncontroversial.
The upstream sources may be handled by one of these three alternatives:
- compressed tar archives of the upstream sources are available, and may be copied.
- There is a pristine-tar VCS branch, which, in conjunction with the upstream branch, may be used to reproduce the upstream tar archive
- Export and create an archive from the upstream branch, which may not have the same checksum as the original branch
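Sketched as shell, the three alternatives above form a natural fallback chain. The function name is mine, and the branch names upstream and pristine-tar are assumptions matching pristine-tar's defaults:

```shell
# fetch_upstream_tarball TARBALL STAGING -- try the three alternatives in order.
# Run from the working directory; an existing tar archive lives in ../
fetch_upstream_tarball () {
    tarball="$1"; staging="$2"
    if [ -f "../$tarball" ]; then
        # 1. A compressed tar archive of the upstream sources is available.
        cp "../$tarball" "$staging/"
    elif git rev-parse -q --verify pristine-tar >/dev/null 2>&1; then
        # 2. Regenerate a bit-for-bit identical archive from the
        #    pristine-tar and upstream branches.
        pristine-tar checkout "$staging/$tarball"
    else
        # 3. Export the upstream branch; the checksum will likely differ
        #    from the original upstream archive.
        git archive --format=tar --prefix="${tarball%%.tar*}/" upstream \
            | gzip -9 > "$staging/$tarball"
    fi
}
```

Only the first branch is distribution neutral with no tool dependencies; the other two assume a git repository, which is where the VCS-specific part of the problem starts.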
The command to run may be supplied by the user in a configuration file
or option, and may default, based on the native distribution, to,
e.g., rpm. There are a number of already mature
mechanisms to take a source directory and upstream tar archive and
produce packages from that point, and the wheel need not be
reinvented.
So the hardest part of the task is to present, in the staging area, for further processing, a directory tree of the source package, ready for the distribution specific build commands. This part of the solution is likely to be VCS specific.
This post is getting long, so I'll defer presenting my evolving
implementation of a generic tool, of the git flavour, to the
next blog post.
This is continued here.
Posted late Wednesday evening, April 15th, 2009
The glaring hole in most git tools, or the submodule Cinderella story
There are a lot of little git scripts and tools being written by a lot of people. Including a lot of tools written by people I have a lot of respect for. And yet, they are mostly useless for me. Take git-pkg. Can't use it. Does not work with git submodules. Then there is our nice, new, shiny, incredibly bodacious "3.0 (git)" source format. Again, useless: does not cater to submodules.
I like submodules. They are nice. They allow for projects to take upstream sources, add Debian packaging instructions, and put them into git. They allow you to stitch together disparate projects, with different authors, and different release schedules and goals, into a coherent, integrated, software project.
Yes, I use git submodules for my Debian packaging. I think it is
conceptually and practically the correct solution. Why submodules?
Well, one of the first things I discovered was that most of the
packaging for my packages was very similar – but not identical.
Unfortunately, with the previous incarnation of my packages, with a
monolithic rules file in each ./debian/ directory, it was easy for
the rules files in packages to get out of sync – and there was no
easy way to merge changes in the common portions in any sane, automated
fashion. So I now keep the ./debian/ directories for all my packages
together with the packages that they are instrumental in packaging.
Since I make the directories branches of the same project, it is far
easier to package a new package, or to roll out a new feature when
policy changes – the same commit can be applied across all the
branches, and thus all my source packages, easily. With a separate
debian-dir project, I can separate the management of the packaging
rules from the package code itself.
Also, I have abstracted out the really common bits across all my
packages into a ./debian.common directory, which is yet another
project, and included as a submodule in all the packages – so
there is a central place to change the common bits, without having to
duplicate my efforts 30-odd times.
Now people are complaining since they have no idea how to clone my
package repositories, since apparently no one actually pays attention
to a file called .gitmodules, and even when they do, they, and the
tools they use, have no clue what to do with it. I am tired of
sending emails with one-off cluebats, and I am building my own
porcelain around something I hope to present as a generic
implementation soon. The first step is a wrapper around git-clone
that understands git submodules.
The browsable code is here (there is a link in there to the
downloadable sources too), complete with a built-in man page. It takes
the same arguments as git-clone, but with fewer options. Have fun.
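For anyone hitting that cloning wall with current git, stock git-clone and git-submodule can do the recursive dance themselves (the URL below is illustrative):

```
git clone git://git.example.org/ucf.git
cd ucf
git submodule update --init --recursive   # fetch everything .gitmodules lists
```

Or, in one step, git clone --recursive; either way, .gitmodules is consulted and the submodules are checked out at the recorded commits.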
Posted Monday night, April 13th, 2009
Yet another kernel hook script
With tonight's upload of
kernel-package, the recent flurry of
activity on this package (8 uploads in 6 days) is drawing to a
close. I think most of the functionality I started to put into place
is now in place, and all reported regressions and bugs in the new
12.XX version have been fixed. The only known deficiency is in the
support of Xen dom0 images, and for that I am waiting for kernel
2.6.30, where Linus has reportedly incorporated Xen
patches. In the meanwhile,
kernel-package seems to be working well,
and I am turning my attention to other things.
But, before I go, here is another example kernel postinst hook script (which, BTW, looks way better with syntax highlighting CSS on my blog than it does in a rss feed or an aggregator site).[[!syntax language=Bash linenumbers=1 bars=1 text="""
#! /bin/sh
set -e

if [ -n "$INITRD" ] && [ "$INITRD" = 'No' ]; then
    exit 0
fi

version="$1"
vmlinuz_location="$2"

if [ -n "$DEB_MAINT_PARAMS" ]; then
    eval set -- "$DEB_MAINT_PARAMS"
    if [ -z "$1" ] || [ "$1" != "configure" ]; then
        exit 0;
    fi
fi

# passing the kernel version is required
[ -z "$version" ] && exit 1

if [ -n "$vmlinuz_location" ]; then
    # Where is the image located? We'll place the initrd there.
    boot=$(dirname "$vmlinuz_location")
    bootarg="-b $boot"
fi

# Update the initramfs
update-initramfs -c -t -k "$version" $bootarg

exit 0
"""]]