Peter Eisentraut's Blog: May 2009

Saturday, May 23, 2009

PGCon 2009: Fourth Day - The File System Strikes Back

The leading topics among this year's PGCon talks were evidently questions of database version control, testing, and deployment methodology, as I mentioned yesterday, as well as "queueing". Note, though, that two separate uses of the word queueing are floating around: one relates to message brokers, the other to queueing theory for predicting database performance. Anyway, if you are queueing, you are doing something right this year. In general, I am amazed every year how many participants appear to have the same set of issues, and then a completely different shared set of new issues the next year. So we are either all completely off track or mostly on the same right track.

In the lightning talks I announced the availability of postgresqlfs, a small project of mine that I theorized about many years ago and which I finally managed to implement within basically two days. If you liked PL/sh, this is the new deal for you. In fact, postgresqlfs allows you to execute your PL/sh functions directly from the shell, which is what you probably should have done in the first place. ;-)

Friday, May 22, 2009

PGCon 2009: Third Day - A New Hope

So this is going to be the new world order. Check your database into a version control system. No, not your .sql files, your database! Check it out, do changes, check it in. Wait, before you check it in, run your test suite. No, not your application test suite, your database test suite! Kind of how you develop your other code, right? Right? Together, Post Facto and pgTAP, and the spirit they represent, might be the most sensible things since the invention of the file system. And there appear to be one or two or three more talks on the program about comparing and consolidating and de-messing your databases, so that appears to be a theme this year. It's about time we organize this. Thanks guys.
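To make the idea of a database test suite a bit more concrete, here is a minimal sketch of what a pgTAP test script can look like. It assumes pgTAP is already installed in the database. The users table and its columns are hypothetical and are only created here so that the script is self-contained; in a real project the schema would already be there and the script would contain just the assertions.

```sql
-- Minimal pgTAP sketch. Assumes pgTAP is installed in this database.
-- The "users" table is a hypothetical stand-in for your real schema.
BEGIN;

CREATE TABLE users (
    id    integer PRIMARY KEY,
    email text NOT NULL
);

-- Announce how many tests are going to run.
SELECT plan(4);

-- Schema-level assertions.
SELECT has_table('users');
SELECT has_column('users', 'email');
SELECT has_pk('users');

-- Data-level assertion: compare a query result against an expected value.
SELECT is(
    (SELECT count(*) FROM users WHERE email IS NULL),
    0::bigint,
    'no user is missing an email address'
);

-- Report whether the number of tests run matches the plan.
SELECT * FROM finish();

-- Leave the database untouched.
ROLLBACK;
```

Wrapping everything in a transaction that is rolled back means the test run leaves no traces, so the same script can be run repeatedly against a development database.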

Thursday, May 21, 2009

PGCon 2009: Second Day

The developer meeting turned out to be very useful, I thought. We decided to divide the PostgreSQL community into two groups for the next release cycle: one group works on hot standby, one group on synchronous replication. Everyone, please pick a camp and help out. These features are arguably the top adoption issues for PostgreSQL now, and we don't have enough people working on them.

My body clock is still out of whack. I wake up at 5 in the morning. Seems to be a common problem among Europeans, I gather. A good time to hack. Slony-I versions 1.2.16 and 2.0.2 are now uploaded to Debian. There you go.

Wednesday, May 20, 2009

PGCon 2009: First Day

So we're back in Ottawa once more. The trip has been getting smoother over the years. Back in the residence tower overlooking the city. Met some friends and colleagues at the Royal Oak. Noted that they sell Coors under Canadian beers. Baseball on TV. Mmh ... baseball ...

Before everyone asks: I'm not running the Ottawa Marathon or anything else this year. I just ran a half-marathon in Helsinki last week, and that is enough for this month.

The developer meeting is next. I am ready for six and a half hours of Git and anti-Git bashing. ;-)

Monday, May 18, 2009

Regression test code coverage reports

I have been collecting monthly PostgreSQL regression test code coverage reports at <http://developer.postgresql.org/~petere/coverage/>. So if you are wondering what this thing is but haven't had the courage to try it out yourself, there is your chance. (Hmm, buildfarm integration could be nice, someday.)

We have had a line coverage rate of about 66% steadily for a few months now (well, it's feature freeze). The lcov tool labels that as "green" (=good). The new version of lcov, which I have in use as of the April report, also reports function coverage, where we have about 73%, which lcov labels as "red" (=bad).

For the next release cycle, I have two goals in this area: First, expand the test coverage reporting to the entire source tree, not only the backend. And second, improve the test coverage of various neglected areas. There is reduced coverage, for example, in the areas of non-btree indexes, vacuuming, recovery, GEQO; and once we analyze other parts of the source tree, we will probably find gaping holes there.

Wednesday, May 13, 2009

The Big Shots

As the occasional thinker about open-source development practices, communities, and issues, I have been wondering for a while: What are the largest open-source projects? What projects have the most code, the most users, and the most issues to deal with, and how do they cope?

The Debian archive should provide some insights into the first one or two questions, as it contains a very large portion of all available and relevant open-source software and exposes them in a fairly standard form. In the old days one might have gotten out grep-dctrl to create some puzzling statistics, but nowadays this information is actually available in an SQL database: the Ultimate Debian Database (UDD). (And it's in PostgreSQL. And it comes with a postgresql_autodoc-generated schema documentation. Excellent.)

So here is a first question. Well, the zeroth question would have been, which source packages have the largest unpacked orig tarball, but that information doesn't seem to be available, either via UDD or via apt. So the first question anyway is, which source packages produce the largest installation size across all their binary packages:

udd=> SELECT source, sum(installed_size)/1024 AS mib
udd->   FROM packages
udd->   WHERE distribution = 'debian' AND release = 'sid' AND component = 'main'
udd->     AND architecture IN ('all', 'i386') AND section <> 'debug'
udd->   GROUP BY source, version
udd->   ORDER BY mib DESC
udd->   LIMIT 30;
     source       | mib
------------------+------
openoffice.org    | 1797
kde-l10n          |  648
gcj-4.4           |  544
vtk               |  465
linux-2.6         |  404
openclipart       |  353
vegastrike-data   |  311
ghc6              |  308
gclcvs            |  303
wesnoth           |  300
fpc               |  269
axiom             |  256
webkit            |  255
gcc-snapshot      |  255
lazarus           |  241
kdebase-workspace |  226
plt-scheme        |  221
torcs-data-tracks |  219
scilab            |  213
openscenegraph    |  211
eclipse           |  210
sagemath          |  201
insighttoolkit    |  198
acl2              |  195
kdebindings       |  181
atlas             |  165
gcl               |  163
trilinos          |  153
paraview          |  153
asterisk          |  144
(30 rows)

This produces a few well-known packages, but also a number of obscure ones. If you look closer, many of them appear to be themed around scientific, numerical, visualization, Scheme, Lisp, that sort of thing. Hmm.

Here is another idea. Take a package's installation footprint and multiply it by its popularity contest installation count. That gives you some kind of maintenance effort score, which is high either because the package is large or because it has a lot of users, or both.

udd=> SELECT rank() OVER (ORDER BY score DESC), source,
udd->        sum(installed_size::numeric * insts) AS score
udd->   FROM packages JOIN popcon USING (package)
udd->   WHERE distribution = 'debian' AND release = 'sid' AND component = 'main'
udd->     AND architecture IN ('all', 'i386')
udd->   GROUP BY source, version
udd->   ORDER BY score DESC
udd->   LIMIT 30;
rank |           source            |    score
-----+-----------------------------+-------------
   1 | openoffice.org              | 12638492332
   2 | mysql-dfsg-5.0              |  3411344560
   3 | eglibc                      |  3371485240
   4 | perl                        |  3019183024
   5 | evolution                   |  2669948000
   6 | samba                       |  2308923872
   7 | mesa                        |  1853902860
   8 | texlive-base                |  1684245516
   9 | gcj-4.3                     |  1610495484
  10 | foomatic-db-engine          |  1608178104
  11 | foomatic-db                 |  1423947704
  12 | inkscape                    |  1413910080
  13 | qt4-x11                     |  1258220636
  14 | gcc-4.3                     |  1248741312
  15 | kdelibs                     |  1021058256
  16 | gnome-applets               |   998434136
  17 | xulrunner                   |   958232688
  18 | coreutils                   |   954766896
  19 | openssl                     |   877067672
  20 | ncurses                     |   827679424
  21 | python2.5                   |   815826384
  22 | aptitude                    |   808161380
  23 | gimp                        |   786015124
  24 | gnome-utils                 |   781756328
  25 | nautilus                    |   774319690
  26 | openoffice.org-dictionaries |   761075576
  27 | eclipse                     |   756072380
  28 | dpkg                        |   736626200
  29 | openclipart                 |   731244240
  30 | wine                        |   707967500
(30 rows)

(Yeah, that is a window function. They run this thing on PostgreSQL 8.4 beta 1.)

I noticed linux-2.6 is suspiciously absent because of a low popcon score (?!?).
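One way to look into that, sketched under the assumption that the tables and columns are the ones used in the queries above, is to list the binary packages built from linux-2.6 next to their popcon installation counts. No output is shown here because it depends on the state of the archive at the time you run it.

```sql
-- Which binary packages does linux-2.6 build, and how often is each installed?
-- Uses only the tables and columns that appear in the queries above.
SELECT package, installed_size, insts,
       installed_size::numeric * insts AS score
FROM packages JOIN popcon USING (package)
WHERE source = 'linux-2.6'
  AND distribution = 'debian'
  AND release = 'sid'
  AND component = 'main'
  AND architecture IN ('all', 'i386')
ORDER BY score DESC;
```

The per-package rows show whether the low total comes from small installation counts, from small package sizes, or from packages that have no popcon entry to join against.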

I don't want to dump the entire database into this blog post, but if you try this yourself you can look at about the first 200 to 300 places to find reasonably large and complex projects before it gets a bit more obscure. A few highlights:

  51 | gnupg                       |   455660464
  59 | php5                        |   386417572
  60 | mutt                        |   381148176
  83 | icu                         |   258602756
  84 | xorg-server                 |   255186332
 101 | exim4                       |   224857700
 107 | openssh                     |   215792828
 113 | tar                         |   201520400
 114 | postgresql-8.3              |   196844584
 115 | libx11                      |   195856564
 116 | ruby1.8                     |   194681656
 272 | emacs22                     |    62047476

This is obviously still biased in a lot of ways, but it does show the major projects.

The UDD is also an interesting use case that shows how you can deploy a PostgreSQL database as a semi-public service with direct access. A great tool, and a great tool to build other great tools on top of.

European Election

Last week I received my postal voting documents for the European election. As a postal voter, you have some time to read through the ballot a little more carefully. That way you occasionally run into old acquaintances. For example, in position 6 on the CDU list I see

Sabine Verheyen, housewife, Aachen

That is the same Ms. Verheyen, I strongly suspect, who a few years ago, when I still lived in Aachen, lost the mayoral election to

Jürgen Linden, lord mayor, Aachen.

A tip from the ordinary electorate: With all due respect to housewives, do put something else on the ballot. It looks better. If need be, do what number 2 on the DIE LINKE list did: "employee". See, it can be done.

The ballot I have here contains a total of 31 nominations. Freedom of association is a great thing, but people could also coordinate once in a while. Perhaps

10. Volksabstimmung, and
26. Für Volksentscheide

could join forces next time?

And perhaps

18. 50plus
22. Die Grauen
30. Rentnerinnen und Rentner Partei
31. Rentner-Partei-Deutschland

could also find some common ground next time?

The real kicker, though, is the occupations of the candidates on the list of the Piratenpartei Deutschland: graduate business mathematician, graduate physicist, self-employed IT entrepreneur, computer scientist, managing director, student, student, programmer, web developer, consultant. Got it. ;-)

By the way, the Wahl-O-Mat is back this time. Last time it was apparently banned by a court because it had not included all parties. This time almost all 31 are in it. Great ...
