Monday, July 21, 2008
New Job at Sun
I would like to take this opportunity as well to thank my former colleagues at credativ GmbH for their support of the PostgreSQL project and my own career. I wish them much success in their continued development.
So, I expect that I will have more time to contribute to PostgreSQL development from now on, and both Sun and I have a sizeable backlog of projects and ideas that we would like to realize. Time to get started!
Thursday, June 19, 2008
Schema design and ID fields
CREATE TABLE a (id int PRIMARY KEY, value text);
CREATE TABLE b (id int PRIMARY KEY, data text, a_id int REFERENCES a (id));
CREATE TABLE c (id int PRIMARY KEY, a_id int REFERENCES a (id), moredata text);
They all have id columns, but they are all different sets of IDs. Moreover, the same set of IDs is sometimes knows as "id", sometimes as "a_id". Some time ago I come to the conclusion that this naming scheme is bad. It makes sense locally within the table, because a.a_id would be redundant. But I think it has advantages to be redundant here anyway.
The first advantage is that you can use this join syntax:
a JOIN b USING (a_id) JOIN c USING (a_id)
If you build denormalized intermediate tables (materialized views) of this type
CREATE TABLE a_b AS SELECT * FROM a, b, c WHERE a.id = b.a_id AND a.id = c.a_idit will fail because the join will have multiple columns named a_id. Unless using the USING join syntax, it won't know they are actually the same. So you would have to write out the column list instead of the asterisk in the select list.
Also, once your SQL queries become more complicated, say using table aliases and subselects, it is easy to lose track of what foo.id means in a particular context.
For all of these reasons, the general rule arises that a column name should be globally unique within a database schema. This should actually be checkable automatically if you allow duplicates only for columns connected by foreign key constraints.
There is another idea you can play with while developing your database schema. You start by creating (conceptually) one big table containing all the columns that are needed to store your data, which obviously requires globally unique column names. Then you use the decomposition algorithm to normalize this table down to the individual tables that you actually implement. Some might find this preferrable to converting ER diagrams.
Friday, June 13, 2008
Debian Release Goals
First of all, I think the concept of release goals is a fantastic idea. It may be one of the most important conceptual moves in Debian of late. It allows developers to implement distribution-wide changes without having to negotiate with every package maintainer individually and without having to go through the vicious no-change-without-policy/no-policy-without-established-practice maze. It moves the focus back on operating system development instead of package hoarding.
In the future, I think we should come up with some "sexier" release goal descriptions. If you look over the current list, many of these tasks are implementation-level, without any obvious benefit to users: double-build support? no unmet recommends? Or they have incomprehensible descriptions. The pretty much only release goal that has a real user benefit, namely the piuparts cleanness, ensuring clean distribution upgrades, is both the most cryptic and actually the most neglected one in terms of development attention. Don't get me wrong, all these goals are very valid. For better or worse, the press has started to pick up these release goals to put in their various Debian release candidate timeline pre-announcement articles, and next time around we should have something cool in there. If armel support or KDE 4 integration had been release goals, the marketing effect would be better.
The next thing is tracking bugs. If you have some time on your hands, how do you find a release goal bug that you can help with? I had worked up this wiki page to help myself, but it is still too weird. The fact that bts.turmzimmer.net exists shows that the BTS is lacking features or is too hard to use or both. I'm thinking, we should regularly copy the whole BTS into an SQL database and make queries from there. Think aggregation, grouping, full-text search! I'm not familiar with the BTS internals or the LDAP gateway, but I know my SQL. So if anyone knows the other end of this deal, I'd love to chat about setting something like this up.
So then, what happens to the release goals after the release? Has this been thought through?
Obviously, some release goals become technically self-enforcing. Python 2.5 and GCC 4.3 are now the default, and anything that doesn't work with these will have permanent trouble in the future.
Release goals that implement a distribution-wide feature, such as the LSB-based init system enhancements, will really need to become part of policy after the release. Otherwise they would revert to being normal bugs without any NMU powers and the other special attention given to release goals. So over time, the support for the former release goals across packages would again decline. To keep it up, you'd either have to repeat the release goal indefinitely, which would be silly, especially when there are no problems right now, or you'd have to make it a part of policy to enforce it in the future. Essentially, you'd have to consider most release goals trial policy changes, which become permanent if successful.
Near-examples of the above are the eternal release goals of IPv6 support and LFS support. We really have these under control quite well now -- although I still wouldn't dare to call Debian or any general-purpose Linux distribution fully IPv6-ready -- but nowadays these problems are almost all upstream code problems that Debian can't really fix very well on the packaging side. So under the current approach these release goals will exist forever, or worse, will be dropped from release goal status.
And then of course, some release goals fail to be reached. These should reapply for the next release, if interest persists. The dash goal comes to mind.
By the way, we had 13 release goals. 3 are completely done, 5 are pretty much done (less than 10 bugs), 1 more looks like it could get pretty much done before the release, and 4 will most likely fail to be accomplished.
Saturday, May 31, 2008
VirtualBox on lenny
In the meantime, I notice that the Debian lenny installer beta 1 does not boot in VirtualBox OSE as included in Debian. It freezes right after the "OK, booting the kernel ..." message. I had to run "install noapic nolapic acpi=off" from the boot prompt. (Perhaps you don't need all of these.) Does anyone know the reason for this? It'd be great if this could be fixed; otherwise using VirtualBox will become very annoying in the future, in my estimate.
Friday, May 30, 2008
KDE 4.1 experience
Here is a random assortment of problems I encountered:
- You can't configure size and appearance of the panel. This is a long-standing problem, but I thought it was fixed.
- In general, there don't appear to be all that many configuration options available to replicate various behaviors that I have become accustomed to.
- Rendering problems in the system tray; icons appear on top of each other.
- Various kicker widgets don't appear to be available for KDE 4 (yet?).
- Most applications don't copy their old settings, but it appears that some do?
- KMail doesn't interact well with encrypted IMAP accounts. I didn't try much beyond that, because I have no accounts that are not affected by that.
- Clicking on an RSS link in Konqueror doesn't link to Akregator anymore. Probably, various MIME type associations are missing.
- I couldn't get the environment to remember to open terminal windows maximized. Very annoying.
- I couldn't add any application icons to the panel. I'm not sure where on is supposed to store one's favorites now.
The good news is that the packaging appears to be really robust (I found one packaging bug with a file conflict) and the upgrade and downgrade works well (you have to use aptitude, as described in Ana's post; apt craps out completely).
So when KDE 4 is actually usable, Debian will be ready. But not for lenny; that would be suicide in my opinion.
Monday, May 26, 2008
Mission Accomplished
Saturday, May 24, 2008
PGCon Day Four
PGCon 2008 is over, and I couldn't help feeling somewhat sad.
The second day of talks also featured many good presentations. My own also went quite well. We concluded the day with a meeting of the core team members who were present in Ottawa, and there will probably be a few actions coming out of that. Don't expect anything too dramatic, though.
Thanks to everyone for this great conference, and I'll see all of you next year if not sooner.
Friday, May 23, 2008
PGCon Day Three
I welcome back my faithful readership to their regularly scheduled program. Thursday was the first day of talks at PGCon, as evidenced by the rise of blog posts on planetpostgresql under a "day 1" label. Welcome newbies! :-)
The talks were really good. I won't go into the details here; the slides should be available on the conference web site. For those not here I will summarize the sentiments of the day as Everyone is hiring. If to you, "vacuum" means cleaning the disk rather than cleaning the house, there appear to be literally dozens of jobs just waiting to be grabbed.
In the evening we had the EnterpriseDB dinner. I think I caught a cold in the draft. Must save my voice for my presentation.
Thursday, May 22, 2008
PGCon Day Two
Wednesday morning, I went geocaching again, attempting to complete the second UO cache, but ended up at a pile of rubble again. Hmm. Maybe this game works differently here in Canada. I hope I get to drop my travel bug before I leave though.
The main event of the day, at least for the major developers, was the developer meeting, which I incidentally managed to locate with my GPS device. I was skeptical about this meeting beforehand, but I have to say, it was extremely useful and enjoyable. The wiki page linked to above contains the meeting minutes. So I guess I'm on the hook now to kill off some PostgreSQL mailing lists, and perhaps we'll have a prototype cmake-based build system for PostgreSQL sometime.
Wednesday, May 21, 2008
PGCon Day One
I went geocaching this morning, because I have a Travel Bug to drop in Canada. But I only found a pile of rubble. Ran into Bruce just getting out of his cab from the airport on the way back.
I presented my tutorial on porting Oracle applications to PostgreSQL in the afternoon. This went quite well, and I received encouraging feedback afterwards. The slides of the presentation should appear on the PGCon web site at the linked URL. For those who were confused by the somewhat cynical tone of the presentation: I have been traumatized a bit by the issue. By and large many applications are quite easy and straightforward to port. I certainly do encourage these efforts.
One thing that came up after the presentation that I have not considered in great depth is the issue of performance of the ported result. In the discussion, a few possibilities were mentioned:
- The Oracle application is so carefully tuned with optimizer hints, it will never perform on PostgreSQL. We probably can't/won't port it.
- Half the time of a porting project will be required to tune the PostgreSQL port, because Oracle optimizes bad queries much better.
- Some things perform better in PostgreSQL, some worse. It probably averages out.
- With the money you save with PostgreSQL, you can afford better hardware.
More insight on this issue would be welcome.
I think I'll go on a second geocaching attempt now and hit the Royal Oak pub later with the rest of the group.
Note for those coming from across the pond: The Champions League final is televised on TSN and RDS beginning at 14:00 tomorrow. You get both of these channels in the university residence. But there's the developer meeting ...
Tuesday, May 20, 2008
PGCon Day Zero
Tuesday, April 29, 2008
Lenny release goals wiki page
Thursday, April 24, 2008
The wiki is great!
Well, I asked for a wiki almost three years ago, but it is finally here in official capacity. Thanks! In the few weeks of its existence, we have already seen great progress in documenting many development issues that were previously undocumented or hidden in strange places. I have started to keep my personal notes and todo items, which were previously spread around various files and pieces of papers, under User:Petere.
I have been observing the wikification in the Debian project for a few years now. I expect that in the near future, all of the contents relevant to developers will have migrated from the main web site to the wiki.
A Sporty PGCon!
Monday, April 14, 2008
News on the PostgreSQL RPM packages for SUSE
Wednesday, March 19, 2008
PostgreSQL RPM packages for SUSE
Friday, March 7, 2008
Notification about available package upgrades
0 */8 * * * apt-get -qq update && apt-get -dqq dist-upgrade && apt-get -qq --simulate dist-upgrade | grep ^Inst
If your root mail goes to some place you read (probably a good idea), you will get a list of packages it wants to upgrade. When running stable, this will also effectively send you alerts about security updates.
Tuesday, March 4, 2008
Readding implicit casts in PostgreSQL 8.3
I have used a shell script to regenerate the removed casts including the required cast functions. The result is pg83-implicit-casts.sql. I have tested this against some of the recent "bug reports" I found on the mailing lists, so it should work for some people at least. If someone wants to use this, I suggest you only add the casts you really need, not the whole file.
(Note: This blog entry was recovered after a server crash and does not include any of the original comments. Those comments contained additional important insights about the restoration of the casts, which have unfortunately been lost now. The gist was, only restore the casts you need, not all of them.)
Wednesday, February 27, 2008
Announcing git.postgresql.org
Friday, February 22, 2008
On patch review
We currently have a well-established and arguably successful process for maintaining existing features. We have a public source code repository, test infrastructure, and many people willing and able to contribute. Committers can fix small issues immediately and larger issues with small overhead. Others can work on small issues with the small overhead of sending a patch that will soon get committed, and they can work on large issues, well, that is the debate. If more than one person wants to contribute, they obviously can. We have many hands and eyes working on everything, and that works well.
We don't have any of that for new feature patches, the staging area of development. The source code repository is a set of possibly related multiple 10k patches spread around the mailing list archives. The only way for others to get involved in small or large ways is by chance learning about the feature proposal, fetching the patch, which will usually no longer apply cleanly, alter it and send an equally huge patch back. If more than one person does that, it's impossible, unless you want to get involved in interdiff madness. Meanwhile the original developer has to sit still waiting for review, or more likely, continues development, which invalidates all the review work.
Now we also know that reviewing the core logic of patches is in fact hard and needs skilled people of which we would always like to have more. But much of our CVS-based development works with many little hands, and so should feature patch review.
Here is how I think patch review could work better. Someone writing a feature patch should use a distributed version control system that ties into our currently-CVS trunk. He publishes that repository either via web or via email. Now the little hands can get involved immediately. They can pull local copies and review the commit history. They can send small bug fixes, formatting, wording improvements. They can update the patch for new developments in the mainline or investigate collisions with other features in development. Someone else can start writing documentation. The original developer can merge those changes back and continue based on them. Meanwhile, a useful revision history is always available for more people to get involved. Additionally, by a simple link published in a wiki, say, you have obsoleted all the needs for patch trackers or the like. Other people, perhaps more skilled reviewers can offer early comments and easily review development progress.
That is how I would like to work. I have started to use the Git repository for my development work now because I believe that could make it happen. But you can replace your own technology if you think it can achieve similar effects. (In fact, Git is in my mind so ridiculously better than our current anonymous CVS setup and replaces CVSup as well, there is no reason -- other than the need to learn some new tools -- for anyone who has no CVS write access to even bother with CVS anymore.)
Incidentally, quite a few people have shown interest during the mailing list discussion in trying out source code repositories mirrored off CVS to something of their liking. I hope that way we can find more optimal development methods in the near future.
Thursday, February 21, 2008
Debian PostgreSQL Packaging Project
Obviously, Debian already contains a large number of packages related to PostgreSQL. The idea behind this project is to get the involved maintainers together, concentrate resources, exchange ideas, and allow more people to get involved in small ways. See our web site about participating.
Friday, February 8, 2008
Thursday, January 31, 2008
CDBS usage statistics
Sources-sid-main-2005-04 9.76 10.02
Sources-sid-main-2005-05 10.20 10.62
Sources-sid-main-2005-06 10.36 10.76
Sources-sid-main-2005-07 10.76 11.16
Sources-sid-main-2005-08 11.05 11.36
Sources-sid-main-2005-09 11.48 12.32
Sources-sid-main-2005-10 12.08 13.12
Sources-sid-main-2005-11 12.66 13.68
Sources-sid-main-2005-12 13.13 14.13
Sources-sid-main-2006-01 13.29 14.37
Sources-sid-main-2006-02 13.59 15.00
Sources-sid-main-2006-03 13.97 15.34
Sources-sid-main-2006-04 14.38 15.73
Sources-sid-main-2006-05 14.53 16.17
Sources-sid-main-2006-06 14.71 16.30
Sources-sid-main-2006-07 15.17 16.54
Sources-sid-main-2006-08 15.84 17.04
Sources-sid-main-2006-09 16.23 17.07
Sources-sid-main-2006-10 16.60 17.24
Sources-sid-main-2006-11 16.97 17.56
Sources-sid-main-2006-12 17.46 17.90
Sources-sid-main-2007-01 17.66 17.97
Sources-sid-main-2007-02 17.87 18.12
Sources-sid-main-2007-03 18.06 18.38
Sources-sid-main-2007-04 18.27 18.68
Sources-sid-main-2007-05 18.86 19.39
Sources-sid-main-2007-06 19.64 20.66
Sources-sid-main-2007-07 19.79 20.93
Sources-sid-main-2007-08 20.13 21.15
Sources-sid-main-2007-09 20.32 21.33
Sources-sid-main-2007-10 20.81 21.95
Sources-sid-main-2007-11 21.16 22.02
Sources-sid-main-2007-12 21.48 22.17
Sources-sid-main-2008-01 22.12 22.67
World domination in about 2020. :)
Tuesday, January 8, 2008
A Sporty PgCon?
Monday, January 7, 2008
Problems with newer kernels
Saturday, January 5, 2008
Configuration files with Git
Now I have converted this to Git, which appears to address these problems. The repository information is entirely contained in the .git directory in the top level of the working tree. I haven't transferred the tree to a different host yet, so that will be next week's project.
Now I'll go ahead and commit the .gitconfig file into the GIT repository. :)