Peter Eisentraut's Blog: July 2009

Wednesday, July 29, 2009

How to find all tables without primary key

Getting a list of all tables that don't have a primary key is occasionally useful when you need to update a legacy schema or repair someone else's database "design". There are a few recipes out there that peek into the PostgreSQL system catalogs, but here is a way that uses the information schema:

SELECT table_catalog, table_schema, table_name
    FROM information_schema.tables
    WHERE (table_catalog, table_schema, table_name) NOT IN
          (SELECT table_catalog, table_schema, table_name
               FROM information_schema.table_constraints
               WHERE constraint_type = 'PRIMARY KEY')
      AND table_schema NOT IN ('information_schema', 'pg_catalog');

The last line is useful because many predefined tables don't have primary keys.

Since you're using the standardized information schema, this query should be portable to other SQL database systems. The identical query works on MySQL, for example.

Monday, July 27, 2009

The UserVoice Votes Have Materialized

A few weeks ago, I started a PostgreSQL feedback forum at UserVoice. The votes have stabilized now, so it's time to take a first look.These are the five most requested features:

Hot standby
Materialized views
Synchronous replication
Parallel queries
Better administration and monitoring tools

1, 3, and 5 should not be surprises; we already saw them among Josh's development priorities. Parallel queries are also mentioned occasionally. But it's interesting to see materialized views so high.

I remember many years ago, materialized views were a somewhat hot topic, but things have cooled down significantly now. In a broader sense, materialized views are queries caches, and I did hear something about that at PGCon. In the meantime, Jonathan Gardner's article Materialized Views in PostgreSQL appears to remain the most frequently cited treatment on the subject. (The original link is down, unfortunately.) The PGCon 2008 tutorial Materialized Views that Really Work by Dan Chak is also worth looking at.

So, who's up for implementing built-in materialized views/query caching?

Also, if you haven't done so, submit your suggestions and votes at the UserVoice forum.

Saturday, July 18, 2009

How can I get the compiler to check my SQL queries?

Over at Coding Horror, I found this piece of code:

var posts = 
  (from p in DB.Posts
  where 
  p.PostTypeId == PostTypeId.Question &&
  p.DeletionDate == null &&
  p.Score >= minscore
  orderby p.LastActivityDate descending
  select p).
  Take(maxposts);

This is C# code with a piece of infrastructure called LINQ to SQL. And I learn:

The big hook here is that this is code the compiler actually understands. You get code completion, compiler errors if you rename a database field or mistype the syntax, and so forth. Perhaps best of all, you get an honest to goodness post object as output!

Is there any open-source stack, possibly involving PostgreSQL, that can do that? And I don't mean ECPG, which can check the SQL syntax, but does not verify the existence and types of tables, columns, etc. at compile time.

Any ideas?

Thursday, July 16, 2009

Adding Color to the Console: Pygments vs. Source-highlight

The other day I write about code syntax highlighting with less and source-highlight, when someone pointed out the Pygments package, that can do the same thing. This called for

the great source-highlight vs. pygments face-off

Installation

Source-highlight is in C++ and needs the Boost regex library. About 3.5 MB together. Pygments is in Python and has no dependencies beyond that. About 2 MB installed. No problem on either side.

In Debian, the packages are source-highlight and python-pygments. Note that the Pygments command-line tool is called pygmentize.

Source-highlight is licensed under the GPL, Pygments under a BSD license.

Getting Started

pygmentize file.c

writes a colored version of file.c to the standard output. Nice.

source-highlight file.c

writes a colored version of file.c to file.c.html. As I had written before, the correct invocation for this purpose is

source-highlight -fesc -oSTDOUT file.c

That makes pygmentize slightly easier to use, I suppose.

Supported Languages

Source-highlight supports 30 languages, Pygments supports 136.

Source-highlight can produce output for DocBook, ANSI console, (X)HTML, Javadoc, LaTeX, Texinfo. Pygments can produce output for HTML, LaTeX, RTF, SVG, and several image formats.

Source-highlight supports styles, but only ships a few. Pygments supports styles and filters, and ships over a dozen styles.

So more options with Pygments here.

Also note that Pygments is a Python library that can be used, say, in web applications for syntax highlighting. This is used in Review Board, for example. Source-highlight is just a command-line tool, but it could of course also be invoked by other packages. Horde uses this, for instance.

Speed

To process all the C files in the PostgreSQL source tree (709271 lines), writing the ANSI console colored version of each file.c to file.c.txt:

source-highlight: 25 seconds
pygmentize: 5 minutes

So for interactive use with less, source-highlight is probably still the better option.

Robustness

pygmentize gave me a few errors of this sort during the processing of the PostgreSQL source tree:

*** Error while highlighting:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 83: ordinal not in range(128)
  (file "/usr/lib/python2.5/site-packages/Pygments-1.0-py2.5.egg/pygments/formatters/terminal.py", line 93, in format)

*** Error while highlighting:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 196-198: invalid data
  (file "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode)

That probably shouldn't happen.

source-highlight gave no spurious errors in my limited testing.

Miscellaneous

Source-highlight can highlight its own configuration files, which are in a custom language, and Pygments' configuration files, which are in Python. Conversely, Pygments can of course highlight its own configuration files, but doesn't know what to do with those of Source-highlight.

Summary

I will stick with Source-highlight for interactive use, but Pygments is a great alternative when you need more formatting options or want to integrate the package as a library.

Wednesday, July 15, 2009

Solid-State Drive Benchmarks With Latency Measurements

After my previous round of SSD benchmarks, I was asked to repeat the tests with a newer bonnie version to get some latency measurements. Here are the results:

Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hdd             64G           13939   2 10973   2           56604   5  42.0   1
Latency                       20356ms    3310ms             60419us     593ms
hdd             64G           13957   2 10933   2           56628   5  42.7   1
Latency                        4956ms    2243ms             66937us     523ms
hdd             64G           13938   2 10958   2           56613   5  42.7   1
Latency                        5259ms    2577ms             94416us     609ms
hdd             64G           13906   2 10976   2           56602   5  42.5   2
Latency                       20689ms    2510ms             56957us     593ms
hdd             64G           13914   2 10990   2           56635   5  42.2   2
Latency                        5266ms    2357ms             94285us     518ms
ssd             48G           106763  19 63722  12           254350  21  1825  72
Latency                       15396ms     421ms             53517us    7830us
ssd             48G           110696  19 63812  10           254238  21  1971  69
Latency                       12050ms     456ms             10615us    1334us
ssd             48G           107193  19 64260  11           254179  21  1823  79
Latency                       14950ms     789ms             10542us    8046us
ssd             48G           108723  19 63380  11           254226  21  1801  74
Latency                       14118ms    1720ms             40106us    1477us
ssd             48G           108848  19 63363   9           254035  21  1960  75
Latency                       15320ms    1717ms             10895us    1387us
hdraid          64G           280086  58 115494  21           450230  40 552.0  49
Latency                        1685ms    1325ms               619ms   33518us
hdraid          64G           283360  55 115566  20           467900  45 589.1  42
Latency                        1738ms    1109ms               298ms   33042us
hdraid          64G           285003  56 115642  20           460394  43 571.2  37
Latency                        1890ms    1272ms               759ms   46262us
hdraid          64G           286838  56 115059  20           453657  43 570.1  43
Latency                        1808ms    1065ms               513ms   65779us
hdraid          64G           291125  57 116871  21           445977  43 589.4  40
Latency                        1081ms    1057ms               577ms   97685us

I have posted an HTML version of this table at http://wiki.postgresql.org/wiki/User:Petere/SSD_benchmarks, for easier viewing.

Legend: hdd = single hard disk, ssd = Intel X25-E, hdraid = 8 hard disk RAID 10 array

The random seek numbers aren't quite the same as in the last test. I have at times noticed deviations of up to 50% over many measurements.

So the SSD beats everything else in latency, but not by much in the sequential tests. It's difficult to say, however, what exactly these numbers might mean in practice.

A side observation is the 75% or so CPU utilization during the random seek test on the SSD. With multiple disks, you might actually end up exhausting the CPUs before the disks!?!

Saturday, July 11, 2009

Adding Color to the Console: Code Syntax Highlighting with less and source-highlight

Syntax highlighting is a standard feature of every nontrivial editor, but if you just want to look at a file or the output of a command, then you will often only stumble upon niche solutions such as colordiff, or you end up using vim as your file viewer. But one very flexible program that apparently doesn't get much attention is source-highlight, available from the GNU project or as a Debian package. Besides being able to syntax-highlight all the usual programming and markup languages, it can also produce various output formats such as HTML, LaTeX, and of course plain text.

The standalone usage is a bit bizarre, because it tries to produce HTML by default and write the output to a file named infile.html. To produce colored plain text on standard output, use something like

source-highlight -fesc -oSTDOUT FILENAME

To get the most out of this, integrate it with the less pager program, so all the files you look at are automatically colored. Create a file .lessfilter in your home directory with the following contents:

#!/bin/sh

source-highlight -fesc -oSTDOUT "$1" 2>/dev/null

and make this file executable. Be sure to redirect the error output away as shown, so that you don't get annoying messages when source-highlight complains in case it doesn't know the format of the file. Then put the following in your shell startup file (.bashrc or .zshrc):

export LESS="-R"
eval "$(lesspipe)"

The first line makes sure that less will show colors, and the second line sets things up so that the .lessfilter file is invoked.

That's it. It recognizes the file type by the file name extension, so try anything like

less somefile.c
less somefile.py
less somefile.patch

to see the colors. It won't work when less is at the end of a pipe; I have some ideas on how to work around that, which I will post here later.

Wednesday, July 8, 2009

I Just Have One Question

Will the Google Chrome OS include Mono in the default install?

Schema Search Paths Considered Pain in the Butt

OK, I've had it. Who ever thought that a search path was a good idea anyway? I'm talking about the schema search path in PostgreSQL in particular, but any other ones are just as bad. If you review the last 48 hours in the PostgreSQL commit log, you can see what kind of nonsense this creates, and what amount of resources it drains (with apologies to my fellow developers who had to clean up my mistakes).

The Schema Search Path

What is the schema search path? One way to describe it is that it allows you to specify the objects in which schemas can be addressed without schema qualification, and in which order those objects are resolved. So if you write

CREATE TABLE schema1.foo ( ... );
CREATE TABLE schema2.bar ( ... );
SET search_path TO schema1, schema2;

then you can address these tables as just

SELECT ... FROM foo, bar ...

instead of

SELECT ... FROM schema1.foo, schema2.bar ...

Another way to describe it is that it allows you to set hidden traps that mysteriously make your perfectly good SQL behave in completely unpredictable ways. So a perfectly harmless query like

SELECT * FROM pg_class WHERE relname ~ 'stuff$';

can be changed to do pretty much anything, if the user creates his own pg_class and puts it first into the search path, before the system schema pg_catalog.

The way to deal with this, to the extent that it is dealt with at all, is to either set the schema search path everywhere before accessing any object (for some value of everywhere, such as every script, every session, every function) or to explicitly schema-qualify every object reference. The problem with the former is that it negates the point of allowing the user to set the search path in the first place. What you'd need to do instead is to prepend your known search path to what was set already, which is a pain to program in SQL, which is why probably hardly anyone does it that way. The problem with the latter is that it makes schemas pointless, because if you have to qualify everything, you might as well make the schema part of the name ("pg_catalog_pg_class") and forget about the extra indirection.

Qualify Everything!

The way we usually deal with this in PostgreSQL core code, for example in the psql describe code, is to add pg_catalog qualifications everywhere; see this last commit for instance. In my small example above, this would become

SELECT * FROM pg_catalog.pg_class WHERE relname ~ 'stuff$';

But this, and the above commit and all of psql is still wrong, because the actually correct way to write this is

SELECT * FROM pg_catalog.pg_class WHERE relname OPERATOR(pg_catalog.~) 'stuff$';

because PostgreSQL of course allows user-defined operators, and those are subject to search path rules like anything else. Proof:

CREATE OPERATOR = ( PROCEDURE = charne, LEFTARG = "char", RIGHTARG = "char" );
SET search_path = public, pg_catalog;
\diS

This command is supposed to show all indexes, but now it will show everything that is not an index. Try this yourself; it's a great way to mess with your colleagues' heads. :-) Anyway, surely no one wants to schema-qualify all operators, which is why attempting to qualify everything is probably a hopeless proposition.

Unix Shells

Let's look for some analogies to this dreaded technology. Unix shells have had search paths for decades. I don't have a whole lot of insight into their history, or what they where originally intended for, but the way I see it now, from the point of view of an operating system development contributor myself, they are just as much a pain in the butt in their full generality. Because users can set hidden traps that can make any old harmless command do anything, various programming and security pains persist. The usual answers are again: set the path explicitly in every shell script, or add an explicit path to every command. The latter is obviously rarely done, and would by the way possibly fall into the same operator trap that I mentioned above. Who knew that [ (as in if [ "$foo" = bar ]) might actually be a separate executable:

$ ls -l '/usr/bin/['
-rwxr-xr-x 1 root root 38608 2009-06-05 03:17 /usr/bin/[

Many authors of shell scripts probably know that setting an explicit path at the top of the script is probably a good idea. Or is it? On this Debian system, I have 242 shell scripts in my /usr/bin/, 181 of which don't appear to set a path. 14 out of 28 scripts in /etc/init.d/ don't set a path. And out of the 14 that do, there are 12 different variants of the path that they actually choose to set. Which makes any actually sensible use of the path impossible. One such use might be to put a locally modified version of a common program into /usr/local/bin/. Now some scripts will use it, some will not.

Linux distributions have been trying to move away from relying on this path business anyway. The Filesystem Hierarchy Standard only allows for a few directories for executable programs. Everything else is supposed to be organized by the package manager, and in practice perhaps by the occasional symlink. Solaris has a much greater mess of bin directories, such as /usr/gnu/bin, /usr/xpg4/bin, but even there I sense that they don't want to keep going into that direction.

Whither PostgreSQL?

In PostgreSQL, you might get the worst of all worlds:

Explicitly qualifying everthing is completely unwieldy, and perhaps impossible if you can't modify the SQL directly.

Programmatically modifying the search path in a correct way is pretty complicated.

There is no standard for what goes into what schema.

There is talk of adding more "standard schemas", potentially breaking all those applications who managed to get one of the above points right.

Using the search path to override a system object by your own implementation doesn't work, at least in half of the applications that choose to do the supposedly right thing and alter the search path to their liking.

Security definer functions (analogous to setuid scripts) don't have a safe default for the search path. You need to make sure you set a safe path in every one of them yourself (see also CVE-2007-2138). One wonders how many of those out there are broken in this regard. At least setuid shell scripts are usually not possible.

What might be a better solution? More modern programming languages such as Java or Python with their "import" mechanisms, where you explicitly choose what you want in what namespace, appear to have less problems of the sort I described. Of course there is still a CLASSPATH or a PYTHONPATH in the background, but that's actually just a workaround because the underlying operating system relies on search paths and doesn't have an import mechanism itself.

I think the straightforward solution at the moment is to ignore the whole schema issue as much as you can and don't let random people issue random SQL statements. Probably not completely safe, but at least it doesn't tangle you up in a giant mess.

If someone wants to defend the current system or has proposals for fixing it, please leave comments.

(picture by david.nikonvscanon CC-BY)

Tuesday, July 7, 2009

StackOverflow Overflows My Stack

StackOverflow is a pretty neat idea for a web site. It's one of those forums where you ask a question and others answer, and this one is particularly for questions about programming. They really put a lot of thought into it to make this a useful resource for programmers. I recommend watching the presentation that creator Joel Spolsky gave at Google about this, just to learn about the sort of thoughts that went into creating a modern, useful web site.

Except the whole thing is pretty much completely useless to me. Here is why. In order to contribute to StackOverflow, you answer questions that others have posted, and you can vote on answers. There is a slightly sophisticated system behind this that will ultimately establish one or two authoritative answers on a particular question, and then everyone is happy. So the first approach to contributing may be finding unanswered questions. That's easy, it's linked right from the home page: http://stackoverflow.com/unanswered. Now in order to find questions that matter to you, or that you might have some knowledge about, all the questions are tagged. Tags might be javascript, sql, vb.net, and so on. So someone like me might go to http://stackoverflow.com/questions/tagged/postgresql. But this will include all questions ever asked about PostgreSQL. To actually hang out and contribute on occasion, I would like to have all unanswered questions for a tag or set of tags, right? No problem, go to http://stackoverflow.com/unanswered/tagged/postgresql; this link has the tool tip "show unanswered questions tagged 'postgresql'". Except this doesn't work; it shows all unanswered questions, ignoring the tag.

I was confused about this for a while, and couldn't really work with the site because of it. But the design of the site is so clear that it was almost impossible that I misunderstood the intents of the various pages, and so I decided that I was at the right URL and the results were wrong. Must be a bug.

At the time, the feedback link on the StackOverflow site led to http://stackoverflow.uservoice.com/. And sure enough, if you search for "unanswered tagged", you get about 11 essentially duplicate results, one of which is claimed to be "completed", and nine of which are "declined" with comments such as "make sure you are on the right tab", "you are on the wrong tab, look again" and "I am on the unanswered questions tab. It is still broken."

Sometime within the last two weeks they must have switched their feedback forum to themselves, at http://meta.stackoverflow.com/. And sure enough, the same complaints have reappeared: http://meta.stackoverflow.com/search?q=unanswered+tagged.

What's so puzzling about this to me is that on a site that had put so much thought into usability, exactly the only one feature that I wanted to use doesn't work, in spite of obviously having been designed in. And I suppose if you can show unanswered questions and tagged questions, you should be able to show unanswered tagged questions without too much trouble. Which just shows that many people must have many different workflows, and there will be many solutions to satisfy them all. See you later, StackOverflow.

(picture by nicolasnova CC-BY)

Monday, July 6, 2009

Solid-State Drive Benchmarks and the Write Cache

Like many people, we are interested in deploying solid-state drives (SSDs) for our database systems. Jignesh posted some performance test results a while ago, but as I had commented there, this test ran with the write cache on, which concerned me.

The Write Cache

Interlude: The disk write cache is the feature that causes you to lose your data when the server machine crashes or loses power. Just like the kernel may lie to the user-land application about writing stuff to disk unless the user-land application calls fsync(), the disk may lie to the kernel about writing stuff to the metal unless the write cache is turned off. (There is, as far as I know, no easy way to explicitly flush the cache. So write cache off is kind of like open_sync, if you know what that means.) As PostgreSQL pundits know, PostgreSQL does fsyncs at the right places unless you explicitly turn this off, and ignore all the warning signs on the way there. By contrast, however, the write cache is on by default on consumer grade ATA disks, including SATA disks and, as it turns out, also including "enterprise" SSD SATA devices.

To query the state of the write cache on a Linux system, use something like hdparm -W /dev/sda. To turn it off, use hdparm -W0 /dev/sda, to turn it back on, hdparm -W1 /dev/sda. If this command fails, you probably have a higher-grade RAID controller that does its own cache management (and doesn't tell you about it), or you might not have a disk at all. ;-) Note to self: None of this appears to be described in the PostgreSQL documentation.

It has been mentioned to me, however, that SSDs require the write cache for write wear leveling, and turning it off may significantly reduce the life time of the device. I haven't seen anything authoritative on this, but it sounds unattractive. Anyone know?

The Tests

Anyway, we have now gotten our hands on an SSD ourselves and gave this a try. It's an Intel X25-E from the local electronics shop, because the standard, big-name vendor can't deliver it. The X25-E appears to be the most common "enterprise" SSD today.

I started with the sequential read and write tests that Greg Smith has described. (Presumably, an SSD is much better at being better at random access than at sequential access, so this is a good worst-case baseline.) And then some bonnie++ numbers for random seeks, which is where the SSDs should excel. So to the numbers ...

Desktop machine with a single hard disk with LVM and LUKS over it:

Write 16 GB file, write caching on: 46.3 MB/s
Write 16 GB file, write caching off: 27.5 MB/s
Read 16 GB file: 59.8 MB/s (same with write cache on and off)

Hard disk that they put into the server that we put the SSD in:

Write 16 GB file, write caching on: 49.3 MB/s
Write 16 GB file, write caching off: 14.8 MB/s
Read 16 GB file: 54.8 MB/s (same with write cache on and off)
Random seeks: 210.2/s

This is pretty standard stuff. (Yes, the file size is at least twice the RAM size.)

SSD Intel X25-E:

Write 16 GB file, write caching on: 220 MB/s
Write 16 GB file, write caching off: 114 MB/s
Read 16 GB file: 260 MB/s (same with write cache on and off)
Random seeks: 441.4/s

So I take it that sequential speed isn't a problem for SSDs. I also repeated this test with the disk half full to see if the performance would then suffer because of the write wear leveling, but I didn't see any difference in these numbers.

A 10-disk RAID 10 of the kind that we currently use:

Write 64 GB: 274 MB/s
Read 64 GB: 498 MB/s
Random seeks: 765.1/s

(This device didn't expose the write cache configuration, as explained above.)

So a good disk array still beats a single SSD. In a few weeks, we are expecting an SSD RAID setup (yes, RAID from big-name vendor, SSDs from shop down the street), and I plan revisit this test then.

Check the approximate prices of these configurations:

plain-old hard disk: < 100 €
X25-E 64 GB: 816.90 € retail, 2-5 weeks delivery
RAID 10: 5-10k €

For production database use, you probably want at least four X25-E's in a RAID 10, to have some space and reliability. At that point you are approaching the price of the big disk array, but probably pass it in performance (to be tested later, see above). Depending on whether you more deperately need space or speed, SSDs can be cost-reasonable.

There are of course other factors to consider when comparing storage solutions, including space and energy consumption, ease of management, availability of the hardware, and reliability of the devices. It looks like it's still a tie there overall.

Next up are some pgbench tests. Thanks Greg for all the performance testing instructions.

(picture by XaYaNa CC-BY)

Friday, July 3, 2009

Do you have a good idea for PostgreSQL?

Do you have a feature request or a good idea for how to improve PostgreSQL? Subscribing to a mailing list is too cumbersome? Or you did subscribe to a mailing list and presented your idea there, and now it's rotting in the archives or on the todo list? Let's try something different.

I have set up a feedback forum over at UserVoice. There you can enter your ideas, comment and vote on other suggestions, and see what the top requests are, without the overhead of tracking a mailing list every day. Let's consider this an experiment. It is not going to replace the existing project communication channels, and you shouldn't send bug reports or engage in coding discussions there. But the more people raise their "user voice" there and provide useful suggestions and comments, the more useful it might become.

To try it, go to: http://postgresql.uservoice.com/

The floodgates are open for development on PostgreSQL 8.5, so now is the time to make yourself heard.

(Disclaimer: UserVoice is a commercial company. I am not associated with them. This is just an attempt to find better interfaces for user feedback.)

(picture by NoNo^Q8 CC-BY)

Thursday, July 2, 2009

Where have all the translations gone?

If you have downloaded PostgreSQL 8.4.0 and are wondering where so many of the translations have gone: The translation team has decided not to ship translations anymore that are not translated at least about 80%. (See the commit message for the list of victims.) This is so that incidental users of stale translations are not presented with a confusing and distracting mix of translated and untranslated messages all the time. So right now we are only shipping a full or almost full set of translations into German, Spanish, French, Japanese, Portuguese, and Turkish.

To get the translations into other languages back into the release, go to http://babel.postgresql.org/ and start submitting updates. Updates may be included as early as release 8.4.1 in a few months.

I hope in particular that we might get the Chinese, Italian, and Russian translations back into shape.

By the way, if you want to start (or continue) translating, I suggest that you approximately follow this priority order: libpq, psql, pgscripts, pg_dump, initdb, postgres. This or a similar order will make the translations useful to the most users with the least amount of work.

Wednesday, July 1, 2009

10+5 Things (Not Yet) Rendered Obsolete By My Mobile Phone

I don't have an iPhone, but I figured when reading about the 15 Apps Rendered Obsolete By The New iPhone 3GS that while my mobile phones have over time obsoleted several gadgets, tools, and other non-phone items, there are still a lot of things they don't do for me, but conceivably should. At the moment, I have a Nokia E71, which is a pretty good phone. But here are a few things it could do better:

Calendar. I still carry a separate small pocket calendar for my personal and business appointments. Mostly because I haven't bothered to figure out how to synchronize the phone calendar with the calendar at the office. And then, I don't think it's easily possible to sync my work appointments with the calendar system at work but back up my personal appointments to MyNokia Backup (or is it Ovi now), without sending the somewhat confidential work appointments to Nokia and the somewhat confidential private appointments to the office. I think you can exclude certain events from being synced at all, but not sync certain events only to certain servers. And as a minor point, the phone calendar doesn't know the local holidays. The calendar issue is obviously a core competency of the mobile phone, so it might just be matter of time before it's solved. Best guess: End of the year, when I need to decide whether to buy another calendar book.
Notes. The phone has a Notes application (and an Active Notes application, whatever). But there's the synchronization issue again; how do I get the notes from the phone into my action and project lists (which I don't keep on the phone). I do use the phone to take notes on the train and during meetings, say, but then I transfer them manually to a more trusted container.
GPS receiver. I keep a separate GPS receiver from Garmin for geocaching. The E71 has a GPS receiver and a maps application, and I do use it for navigation in the car, and I have recently discovered the SportsTracker application. But the simple interface of entering coordinates and having an arrow point there that the Garmin provides does not appear to be available on the phone. I do know that some people go geocaching with only their phones, so this must be possible, but I haven't had time to figure it out yet. But this gadget might become the next one to go.
Flashlight. This one ties in with a recent geocaching experience. Many people have tried to use their mobile phone as a light source in a bind and noticed that it doesn't work so well. Perhaps they could put in a few LEDs and a battery that is two millimeters thicker. That would help a lot.
Laser pointer. If they can do flashlight, they can surely do laser pointer. Useful for presentations, when you don't want to carry around an extra gadget.
Bluetooth mouse. Also for presentations, how about just using the phone as a next-slide clicky thing. This is likely already possible with some custom software on both the phone and the laptop, but it could be easier.
Camera. I'm not an expert photographer by any means, and I haven't done any great analysis of the picture quality of the built-in camera, but it just feels better to take a separate digital camera when you actual go out with the purpose of taking pictures.
USB drive. Yeah, you can do this with the cable and then taking care to copy things into the right directory on the phone's memory card or something. Could be easier and clearer, though. And separate from the phone's internal file system. Putting a full-size USB plug into the phone is probably going to be difficult, but for larger models like the Nokia E90 it could be possible.
Debian. Nothing like being able to hack up your own operating system and applications as you choose. It's sad that the Openmoko/FreeRunner initiative has had setbacks. And while Android is an open platform, there is something to be said for having the same operating system on all devices.
Desktop computer. Yeah right. Most of the "productivity" applications on the phone still suck and are at best stopgaps until you get home to the big screen. More on that in a later post.

OK, those weren't 15 things, as with the iPhone. For completeness, how about 5 things that the phone does appear to have replaced permanently:

Address book. I used to have one, but all the current addresses of relevance are in the phone, backed up to somewhere on the Internet. In fact, I could probably find most people I know either via public mailing list archives or something like Facebook anyway.
Land line phone. The stationary phones are gone for good. There is now only this one phone, which is the home phone, the work phone (also no more desk phones at the office), and the VoIP phone. I found this review of Google Voice an interesting contrast in this context. I only have one phone number anyway. (The VoIP line has a different number, but there you want the separate number to make sure you are using the cost-saving path to call.)
Car satnav system. For the few times I have needed it, the satellite navigation system in the phone has done fine. It's not quite the same as having one built into the car, in terms of ease of use, GPS reception, and integration with the other audio equipment, but it works. And you can use the same system to also navigate on walks, as opposed to drives, which is a surprisingly useful feature.
MP3 player. I have never really listened much to MP3s anyway, and now on the phone I have podcasts, video podcasts, internet and broadcast radio, as well as MP3s, which is much better altogether.
iPhone. :-) Don't have one, don't want one. I recommend the Hacking the iPhone session from 25C3 (video page) for further enlightenment.