
Tales from the Gryphon

Blog postings about Packaging software

Thursday 23 April 2009
Link: Ontologies: Towards a generic, distribution agnostic tool for building packages from a VCS

Posted in the wee hours of Wednesday night, April 23rd, 2009

License: GPL

Ontologies: Towards a generic, distribution agnostic tool for building packages from a VCS

This is a continuation from before.

I am digressing a little in this post. One of the things I want to get out of this exercise is to learn more about ontologies and ontology editors, and on the principle that you never really learn something until you build something with it (aka bone knowledge), this post gathers my thoughts on getting started with an ontology for package building. Perhaps this has been done before, and better, but I'll probably learn more by trying to create my own. Also, I am playing around with code, an odd melange of my package building porcelain, ~gitpkg~, and other ideas bruited about on =IRC=, and I don't want to blog about something that would be embarrassing in the long run if some of the concepts I have milling around turn out not to meet the challenge of first contact with reality.

I want to create an ontology related to packaging software. It should be general enough to cater to the needs of any packaging effort, in a distribution agnostic and version control agnostic manner. It should enable us to talk about packaging schemes and mechanisms, compare different methods, and perhaps work towards a common interchange mechanism good enough for people to share the effort spent in packaging software. The ontology should be able to describe common practices in packaging, the concepts of upstream sources, versioning, commits, package versions, and other meta-data related to packages.

[vcs-pkg concept diagram]

I am doing this ontology primarily for myself, but I hope it might be useful for other folks involved in packaging software. So, here follows a set of concepts related to packaging software; people who like pretty pictures can click on the thumbnail on the right:

- *software* is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system.
- *software* is what we are trying to _package_
- *software* has /names/
- *software* may exist as
  + /source code/
  + /executable code/
  + /packaged code/
- *source code* is any collection of statements or declarations written in some human-readable computer /programming language/.
- *source code* is usually held in one or more text files (/blobs/).
- A large collection of *source code* /files/ may be organized into a directory tree, in which case it may also be known as a source /tree/.
- The *source code* may be _converted_ into an /executable format/ by a compiler, or executed on the fly from the human readable form with the aid of an interpreter.
- *executable format* is the form /software/ must be in in order to be run. Running means to cause a computer "to perform indicated tasks according to encoded instructions."
- software *source code* has one or more /lines of development/. Some common specific /lines of development/ for the software to be packaged are:
  + the *upstream* /line of development/
  + a *feature branch* is a /line of development/ related to a new feature under development. Often the goal is to _merge_ the feature branches into the /upstream/ /line of development/
  + usually, all *feature branches* are _merged_ into the /integration branch/, and the /package/ is created from the /integration branch/
  + the *integration branch* is the /line of development/ of the software that is to be packaged
- some software *lines of development* have /releases/
- *releases* have /release dates/
- some *releases* have /release versions/
- *source code* may be _stored_ in a version control /repository/, which maintains its history.
- *Trees* are collections of /blobs/ and other /trees/ (directories and sub-directories). A tree object describes the state of a directory hierarchy at a particular point in time.
- *Blobs* are simply chunks of binary data -- they are the contents of files.
- a *tree* can be _converted_ into an /archive/ and back
- In git, *directories* are _represented_ by /tree/ objects. They refer to blobs that hold the contents of files (the file name, access mode, etc. are all stored in the tree), and to other trees for sub-directories.
- *Commits* (or "changesets") mark points in the history of a /line of development/, and hold references to /parent commits/.
- A *commit* refers to a tree that represents the state of the files at the time of the commit.
- *HEAD* is the most recent commit in a /line of development/ or /branch/.
- A *working directory* is a directory that corresponds, but might not be identical, to a /commit/ in the version control /repository/
- *Commits* from the version control system can be _checked out_ into the /working directory/
- *uncommitted changes* are changes in the working directory that make it differ from the corresponding /commit/. Some describe such a working directory as being in a "dirty" state.
- *uncommitted changes* may be _checked in_ to the version control system, creating a new /commit/
- The *working directory* may contain an /ignore file/
- the *ignore file* contains the names of files in the /working directory/ that should be "ignored" by the version control system.
- In git, a *commit* may also contain references to /parent commits/.
  + If there is more than one /parent commit/, then the /commit/ is a /merge/
  + If there are no /parent commits/, it is an /initial commit/
- references, or heads, or *branches*, are movable references to a /commit/. On a fresh /commit/, the head or /branch/ reference is moved to the new /commit/.
- *lines of development* are usually _stored_ as a /branch/ in the version control /repository/.
- A new *branch* may be created by _branching_ from an existing /branch/
- a *patch* is a file that contains difference listings between two /trees/.
- A *patch* file can be used to transform (_patch_) one /tree/ into another (/tree/).
- A *quilt series* is a method of representing an /integration branch/ as a series of /patches/. These patches can be applied in sequence to the /upstream/ branch to produce the /integration branch/.
- A *tag* is a named reference to a specific /commit/, and is not normally moved to point to a different /commit/.
- A *package* is an /archive/ format of /software/, created to be installed by a package management system or a self-sufficient installer, derived by _transforming_ a /tree/ associated with an /integration branch/.
- *packages* have /package names/
- *package names* are related to the /upstream/ /software names/
- *packages* have /package versions/
- *package versions* may have
  + an /upstream version/ component
  + a distribution or packaging specific component
- *package versions* are related to the upstream /software versions/
- *helper packages* provide libraries and other support facilities to help _compile_ an /integration branch/, ultimately yielding a /package/

Manoj

Sunday 19 April 2009
Link: Looking at porcelain: Towards a generic, distribution agnostic tool for building packages from a VCS

Posted in the wee hours of Saturday night, April 19th, 2009

License: GPL

Looking at porcelain: Towards a generic, distribution agnostic tool for building packages from a VCS

This is a continuation from before.

Before I go plunging into writing code for a generic =vcs-pkg= implementation, I wanted to take a close look at my current, working, non-generic implementation: making sure that the generic implementation can support at least this one concrete work-flow will keep me grounded.

One of the features of my home grown porcelain for building packages has been that I use a fixed layout for all the packages I maintain. There is a top level directory for all working trees. Each package gets a sub-directory under this working area. And in each package sub-directory are the upstream versions, the checked out VCS working directory, and anything else package related. With this layout, knowing the package name is enough to locate the working directory. This enables me to, for example, hack away at a package in Emacs, and when done, go to any open terminal window and say =stage_release kernel-package= or =tag_releases ucf= without needing to know what the current directory is (usually, the package's working directory is several levels deep -- =/usr/local/git/debian/make-dfsg/make-dfsg-3.91=, for instance). However, this is less palatable for a generic tool -- imposing a directory structure layout is pretty heavy. And I guess I can always create a function called ~cdwd~, or something, to take away the tedium of typing out long ~cd~ commands.

Anyway, looking at my code, here is the information that the scripts seem to need in order to do their work.

- *Staging area*. This is where the software to be built is exported (and this area is visible from my build virtual machine).
  + User specified (configuration)
- *Working Area*. This is the location where all my packaging work happens. Each package I work on has a sub-directory in here, and the working directories for each package live in the package sub-directory. /Note: should not be needed./
  + User specified.
- *Working directory*. This is the checked out tree from the VCS, and this is the place where we get the source tree from which the package can be built.
  + Since we know the location of the working area, if the package name is known we can just look in the package's sub-directory in the working area.
    * For =rpm= based sources, look for the ~spec~ file
    * For Debian sources, locate ~debian/rules~
  + If the package name is not known, look for the ~spec~ file or ~debian/rules~ in the current directory, and parse either the ~spec~ file or ~debian/changelog~.
  + If in a VCS directory, look for the base of the tree:
    - ~tla tree-root~
    - ~bzr info~
    - ~git rev-parse --show-cdup~
    - ~hg root~
    - You have to climb the tree for Subversion
  + If you are in a ~debian~ directory, and ~changelog~ and ~rules~ files exist, then look for the ~spec~ file or ~debian/rules~ in the base directory
- *package name*
  + User specified, on the command line
  + If in the working directory of the package, it can be parsed from the ~spec~ or ~changelog~ files.
- *upstream tar archive*
  + Usually located in the parent directory of the working directory (the package specific sub-directory of the working area)
  + If ~pristine-tar~ is in use, the tar archive can be regenerated given two trees (branches, commits, etc.), namely:
    * a tree for the upstream (/default: the branch ~upstream~/)
    * a tree for the delta (/default: the branch ~pristine-tar~/)
  + Given just an upstream tree (/default: the branch ~upstream~/), a tar archive can be generated, but it is likely not to be bit-for-bit identical to the original ~tar~ archive.

So, if I do away with the whole working area layout convention, this can be reduced to just requiring the user to:

- Specify the *staging area*
- Call the script in the working directory (=dpkg-buildpackage= imposes this too)
- Either use ~pristine-tar~ or have the upstream ~tar~ archive in the parent directory of the working directory

Hmm. One user specified directory, where the results are dumped. I can live with that.

However, ~gitpkg~ has a different concept: it works purely on the git objects. You feed it up to three tree objects, the first being the tree with the sources to build, and the second and third trees being looked at only if the upstream tar archive cannot be located; those trees are passed to pristine-tar to re-construct the upstream tar archive. The package name and version are constructed _after_ the source tar archive is extracted to the staging area. I like the minimality of this.

This is continued here.
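The tree-root detection above is simple enough to sketch in shell. Here is a minimal, illustrative version; the function name and the fallback order are arbitrary choices for the example, and ~bzr root~ is used instead of ~bzr info~ since it prints just the path:

#+BEGIN_SRC sh
#!/bin/sh
# Rough sketch: locate the base of the checked out tree for a few VCSes.
# The function name and the order of the checks are arbitrary.
tree_root () {
    if cdup=$(git rev-parse --show-cdup 2>/dev/null); then
        # git prints the relative path up to the top of the tree ("" if already there)
        (cd "./$cdup" && pwd)
    elif root=$(hg root 2>/dev/null); then
        printf '%s\n' "$root"
    elif root=$(bzr root 2>/dev/null); then
        # bzr root prints the root of the working tree
        printf '%s\n' "$root"
    elif root=$(tla tree-root 2>/dev/null); then
        printf '%s\n' "$root"
    else
        return 1    # Subversion: you have to climb the tree by hand
    fi
}

# Usage, from anywhere inside a working directory:
#   cd "$(tree_root)" && ls debian/rules *.spec 2>/dev/null
#+END_SRC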

Manoj

Thursday 16 April 2009
Link: Towards a generic, distribution agnostic tool for building packages from a VCS

Posted late Thursday evening, April 16th, 2009

License: GPL

Towards a generic, distribution agnostic tool for building packages from a VCS

I have been involved in =vcs-pkg.org= since around the time it started, a couple of years ago. The discussion has been interesting, and I have learned a lot about the benefits and disadvantages of serializing patches (where the integration deltas are collected in the feature branches *and* in the specific ordering of the feature branches) and of maintaining integration branches (where the integration deltas are collected purely in the integration branch, but might tend to get lost in the history, and a fresh integration branch has to re-invent the integration deltas afresh). However, one of the things we have been lax about is getting down to brass tacks and actually creating generic packaging tools (though for the folks on the serializing patches side of the debate we have the excellent =quilt= and =topgit= packages).

I have recently mostly automated my git based work-flow, and have built fancy porcelain around my git repository setup. During IRC discussion, the =gitpkg= script came up. This seems almost usable, apart from not having any built-in =pristine-tar= support, and also not supporting =git submodules=, which makes it a less useful alternative than my current porcelain. But it seems to me that we are pretty close to being able to create a distribution, layout, and patch handler agnostic script that builds distribution packages directly from version control, as long as we take care not to bind people into distribution or tool specific straitjackets.

To these ends, I wanted to see what tasks we want a package building script to perform. Here is what I came up with.

1. Provide a copy of one or more upstream source tar-balls in the staging area where the package will be built. This staging area may or may not be the working directory checked out from the underlying VCS; my experience has been that most tools of this ilk have a temporary staging directory of some kind.
2. Provide, in the staging area, a directory tree of the sources from which the package is to be built.
3. Run one or more commands or shell scripts in the staging area to create the package. This series of commands might be very complex: creating and running virtual machines or chroot jails, satisfying build dependencies, using copy-on-write mechanisms, running unit tests and lintian/piuparts checks on the results. But the package building script may just punt these to a user specified hook.

The first and third steps above are pretty straightforward, and fairly uncontroversial. The upstream sources may be handled by one of these three alternatives:

1. Compressed tar archives of the upstream sources are available, and may simply be copied.
2. There is a pristine-tar VCS branch which, in conjunction with the upstream branch, may be used to reproduce the upstream tar archive.
3. Export and create an archive from the upstream branch, which may not have the same checksum as the original tar archive.

The command to run may be supplied by the user in a configuration file or option, and may default, based on the native distribution, to =dpkg-buildpackage= or =rpm=. There are a number of already mature mechanisms to take a source directory and an upstream tar archive and produce packages from that point, and the wheel need not be re-invented. So the hardest part of the task is to present in the staging area, for further processing, a directory tree of the source package, ready for the distribution specific build commands. This part of the solution is likely to be VCS specific.

This post is getting long, so I'll defer presenting my evolving implementation of a generic =vcs-pkg= tool, ~git~ flavour, to the next blog post. This is continued here.
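As a concrete illustration of the three alternatives and the surrounding steps, here is a minimal sketch for a git-backed Debian package. The branch names, staging path, package name and version, and the default build command are all assumptions for the example, not a finished tool:

#+BEGIN_SRC sh
#!/bin/sh
# Illustrative sketch of the three steps, git flavour.  Branch names
# ("master", "upstream", "pristine-tar"), the staging path, the package
# name/version, and the build command are assumptions for the example.
set -e

staging=${STAGING:-/tmp/build-area}
package=example
version=1.0
tarball=${package}_${version}.orig.tar.gz

mkdir -p "$staging"

# Step 1: provide the upstream tarball in the staging area.
if [ -f "../$tarball" ]; then
    # (a) a compressed upstream tar archive is available: copy it
    cp "../$tarball" "$staging/"
elif git rev-parse -q --verify pristine-tar >/dev/null; then
    # (b) regenerate it from the pristine-tar and upstream branches
    pristine-tar checkout "$staging/$tarball"
else
    # (c) export the upstream branch; the checksum may differ from the original
    git archive --format=tar --prefix="${package}-${version}/" upstream |
        gzip -9 > "$staging/$tarball"
fi

# Step 2: provide the source tree to build from (here: the integration branch).
# Note: git archive does not descend into submodules.
srcdir=$staging/${package}-${version}
mkdir -p "$srcdir"
git archive --format=tar master | tar -xf - -C "$srcdir"

# Step 3: run the distribution specific build command, or a user supplied hook.
(cd "$srcdir" && dpkg-buildpackage -us -uc)
#+END_SRC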

Manoj

Thursday 16 April 2009
Link: The glaring hole in most git tools, or the submodule Cinderella story

Posted in the wee hours of Wednesday night, April 16th, 2009

License: GPL

The glaring hole in most git tools, or the submodule Cinderella story

There are a lot of little git scripts and tools being written by a lot of people, including a lot of tools written by people I have a lot of respect for. And yet, they are mostly useless for me. Take git-pkg. Can't use it. Does not work with git submodules. Then there is our nice, new, shiny, incredibly bodacious "3.0 (git)" source format. Again, useless: does not cater to submodules.

I like submodules. They are nice. They allow projects to take upstream sources, add Debian packaging instructions, and put them into git. They allow you to stitch together disparate projects, with different authors and different release schedules and goals, into a coherent, integrated software project.

Yes, I use git submodules for my Debian packaging. I think it is conceptually and practically the correct solution. Why submodules? Well, one of the first things I discovered was that most of the packaging for my packages was very similar -- but not identical. Unfortunately, in the previous incarnation of my packages, with a monolithic rules file in each ~./debian/~ directory, it was easy for the rules files in packages to get out of sync -- and there was no easy way to merge changes in the common portions in any sane, automated fashion. The ~./debian/~ directories for all my packages are maintained separately from the package that they are instrumental in packaging. So, since I make the ~./debian/~ directories branches of the same project, it is far easier to package a new package, or to roll out a new feature when policy changes -- the same commit can be applied across all the branches, and thus all my source packages, easily. With a separate =debian-dir= project, I can separate the management of the packaging rules from the package code itself. Also, I have abstracted out the really common bits across all my packages into a ~./debian.common~ directory, which is yet another project, included as a submodule in all the packages -- so there is a central place to change the common bits, without having to duplicate my efforts 30-odd times.

Now people are complaining since they have no idea how to clone my package repositories, since apparently no one actually pays attention to a file called ~.gitmodules~, and even when they do, they, and the tools they use, have no clue what to do with it. I am tired of sending emails with one-off cluebats, so I am building my own porcelain around something I hope to present as a generic =vcs-pkg= implementation soon. The first step is a wrapper around =git-clone= that understands git submodules. So, here is the browsable code (there is a link in there to the downloadable sources too), complete with a built-in man page. It takes the same arguments as =git-clone=, but with fewer options. Have fun.
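For anyone cloning by hand, the essence of such a wrapper is small. A minimal sketch, with a placeholder URL (this is not the actual script linked above):

#+BEGIN_SRC sh
#!/bin/sh
# Minimal sketch of a submodule-aware clone.  This is not the script
# linked above; the repository URL below is only a placeholder.
set -e

url=git://git.example.org/debian/ucf.git   # hypothetical repository
dir=$(basename "$url" .git)

git clone "$url" "$dir"
cd "$dir"
# Fetch and check out every submodule recorded in .gitmodules,
# including nested ones (such as a shared debian common project).
git submodule update --init --recursive
#+END_SRC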

Manoj

Wednesday 25 February 2009
Link: A day in the life of a Debian hacker

Posted terribly early Wednesday morning, February 25th, 2009

License: GPL

A day in the life of a Debian hacker

[Packaging activity diagram]

I have been meaning to write this up for a long time now, since I vaguely made a promise to do so last Debconf. I have also been wondering about the inefficiencies in my work-flow, but I kept postponing my analysis, since there were still large gaps in my packaging automation after I moved off Arch as my SCM of choice. However, recently I have taken a sabbatical from Debian, so I've had time to complete bits and pieces of my package building framework, enough that I could no longer justify putting off the analysis. I tried writing it up, but the result confused even me; so instead I recorded every shell command during a recent series of packaging tasks, and converted that into the nice, detailed activity diagram that you see over here. This is as efficient a work-flow as I have been able to come up with (details here).

Along with a git commit hook script that parses the commit log and adds pending tags to bugs closed in the commit, the figure above represents my complete work-flow -- down to the details of every /cd/ command I executed. I think there are still too many steps. Feedback and commentary would be appreciated, as well as any suggestions to improve efficiency.
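The commit hook mentioned above is conceptually simple. Here is a rough sketch of the idea; this is not the actual hook, it assumes the =bts= tool from devscripts, and it handles only the simplest "Closes: #NNNNNN" form of the changelog syntax:

#+BEGIN_SRC sh
#!/bin/sh
# Rough sketch of a post-commit hook that tags closed bugs as pending.
# Assumes the "bts" utility from devscripts; only the simple
# "Closes: #NNNNNN" form is recognized.
git log -1 --pretty=format:%B |
    grep -Eio 'closes:[[:space:]]*(bug)?#?[[:space:]]*[0-9]+' |
    grep -Eo '[0-9]+' |
    while read -r bug; do
        bts tags "$bug" + pending
    done
#+END_SRC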

Manoj

Friday 04 April 2008
Link: Schemes for packaging using Git: An analysis

Posted Friday evening, April 4th, 2008

License: GPL

Schemes for packaging using Git: An analysis

"Are you rebasing or merging?" seems to be the 64 thousand dollar question over in vcs-pkg discussions. Various people have offered their preferences, and indeed, several case studies of work flows have been presented, what is lacking is an analysis of the work-flow; an exploration of which methodology has advantages, and whether there are scenarios in which the other work flow would have been better.

Oh, what are all these work flows about, you ask? Most of the issues with packaging software for distributions have a few things in common: there is a mainline or upstream source of development; there are zero or more independent lines of development, or ongoing bug fixes, that are to be managed; and then there is the tree from which the distribution package is to be built. All this talk about packaging software work flows is about how best to manage asynchronous development upstream and in the independent lines of development, and how to create a coherent, debuggable, integrated tree from which to build the distribution's package.

The rebasing question goes to the heart of how to handle the independent lines of development using git, since these lines of development are based off the main line of development and must be periodically synchronized. Here is a first look at a couple of important factors that have a bearing on that question, and on packaging software for a distribution using Git in general. This is heavily geared towards git (nothing else does rebases so easily, I think), but some of the concepts should be generic. I am not considering the stacked set of quilt patches source-controlled with Git in this article (I don't understand that model well enough to do an analysis).
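To make the two options concrete, here is what each looks like at the git command level, with hypothetical branch names:

#+BEGIN_SRC sh
# Hypothetical branches: "upstream" is the main line, "fix/bigendian"
# an independent line of development based off it.

# Rebasing: replay the feature commits on top of the current upstream.
# The branch stays a clean patch series, but its history is rewritten
# every time it is synchronized.
git checkout fix/bigendian
git rebase upstream

# Merging: record a merge commit.  History is preserved, and the
# integration delta lives in the merge commit itself.
git checkout fix/bigendian
git merge upstream
#+END_SRC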

As a teaser, there is a third answer: neither. You can add an independent line of development and just let it sit: don't rebase, and don't merge; and in some circumstances that is a winning strategy.

Manoj

Wednesday 02 April 2008
Link: Migrating to Git

Posted in the wee hours of Tuesday night, April 2nd, 2008

License: GPL

Migrating to Git

I have been using Arch to package my Debian packages since 2003, which means that Arch has had a good long run as my SCM of choice. I had been using CVS for a few years before I moved to Arch, and that migration took me about six months, since it involved a whole new philosophy of packaging; I am hoping that migrating to git will not involve such a major paradigm shift, and will thus be less disruptive and time consuming. What follows is a narrative of my efforts to get educated about Git.

This article is meant to be an annotated, selective, organized set of links to information about Git. How does it differ from the myriad other link collections about Git proliferating on the web? Well, the value added is in the annotations and the organization: while not quite a narrative of my exploration, this is an idealized version of what I think my discovery process should have been, to be most effective. Staging the information is important; Google finds one lots of information that is incomprehensible to someone just coming to Git. This selection of links is actually selective; I have included only pointers to resources that fed me information at the level I could handle at that stage, and I have eliminated links to information that was not new at that point. I have tried to select the best of breed (in terms of information and clarity) for each kind of information source I have come across so far.

There is a caveat: while I am still a beginner, and thus better able to judge what is confusing to a beginner than I shall be once I have become more familiar with the system, I am also still enough of a novice not to trust my judgement on what really is best practice. I can fix the latter as I gain experience, but then I'll need to be careful not to overload on complexity too early in the learning curve.

On the down side, this selection is subjective, and probably will remain so even in the long term: I include what appealed to me, and will probably miss loads of pointers to information that I have not yet come across. However, I hope this will make it easier for other people to reach the same goal: using git for their version control needs.

Have fun.

Manoj

