Wednesday, June 30, 2010

An Update On Views

A long time ago, in a text book far away, someone described the following (relational) database design approach: Put all your data in tables, normalized and so on, as you'd expect.  An top of each table, create a view.  The database clients only access the views.  As the application evolves, tables might change, more views will be added, but the interface that the existing views present to the outside stays the same.  Sounds nice; everything stays backward compatible.

In order for this to work, views need to be updatable.  Updatable views have been standardized in SQL as far back as 1992, and have been implemented (partially, possibly) in some of the popular/successful SQL implementations.  Curiously, however, this feature has evaded PostgreSQL altogether so far.  Conventional wisdom has held it that views can be made updatable by hand using the rewrite rules system, and a proper implementation of automatically updatable views would only need to create the rules automatically.  Debate and coding around this idea has been taking place for many years, mostly inspired by the heroic efforts of Bernd Helmle.  But I think the conclusion of those in the know by now is that the rewrite rules system is inappropriate for this task (and in fact others have argued that the rewrite rules system is inappropriate for any task and should be ripped out; perhaps harsh but not far from the truth in my opinion).  So the updatable views effort is at square one (or zero) at the moment.

In the meantime, updatable views have been conspicuously absent from any of the recent top-needed features list for PostgreSQL.  Which basically means no one really cares (anymore).  Why is that?

My observation is that the typical use of views has changed quite a bit of the years (decades), and has decreased altogether.  The idea of using views as transparent interfaces on top of tables never really took off to begin with.  Using views to encapsulate complex queries is still reasonable, but not terribly common.  Part of the reason is that many queries are generated, either by ORMs or by custom code.  The other reason is that server-side functions provide a much more powerful abstraction mechanism.  They can't do everything, in particular inlining of queries doesn't alway work, but they can do a lot more with parameters and general procedural programming.  Views are also used for access control, either for columns or for rows.  Column security can now be had with explicit column privileges in PostgreSQL, and one could argue that using views for this was a workaround all along.  And using views for row access control doesn't actually really work, as it turns out.  Proper row security might be an upcoming feature, but it will need to work differently.  And anyway, superfinely granular access control is rather rare in SQL databases overall, I guess.  And in all of these cases, the views don't need to be updatable.

Views are occasionally used as a thin wrapper around a table-returning function, so the function can be access like a table.  But this only works when reading.  Another use, somewhat related actually, is defining views for monitoring the database system or the database application.  Because those views are often used interactively, it is
important that they are convenient to access, and so hugely complex queries combined with custom functions or possibly query generation are inappropriate.

On the other hand, materialized views are all the rage now, appearing at the top of the requested PostgreSQL feature list, and having been fixtures in the feature lists of other SQL implementations for some time.  While traditional views are a tool for organizing code and interfaces, materialized views are purely a performance optimization tool (which, by the way, is the reason materialized views are not
specified in the SQL standard).  Depending on how you implement materialized views, they might even be usable implicitly, if the query matches the materialized query.  Then, materialized views wouldn't really be views at all anymore, they're just query caches.

So, what do you think?  Are traditional views an obsolescent, marginal feature?  Is the matter of updatable views obsolete?

21 comments:

  1. Hi Peter,

    As far as I'm concerned, I would say it would be great to be able to update views. If it ever appears in Postgresql, do you think there would be limitations on the structure of the view so it is "updatable"? (Maximum number of joins? Unions?)

    ReplyDelete
  2. Hi Peter,

    I think one of the main reasons abundant usage of views is not common is the fact that there are a lot of DBMSs that don't support them, or support them partial with only a very small "common base".
    As a software developer I know I would use them occasionally especially for backwards compatibility during schema redesigns or as a sort of "manual" interface to a complex table structure. The fact that they are not update-able forces me to take another approach.

    That said, I don't think view-heavy designs are never going to be an option for a large number of projects that want to support different SQL servers. For in those cases "basic" SQL designs are the easiest to support.

    ReplyDelete
  3. So what you are saying is that we dont need updateable views because we dont have them in PG. In any case even if we did have updtable views we really dont understand how to use/apply them in applications or in security or their static read only sisters anyway.

    Materialized views apparrently do not have to be updateable ....... duh!

    Materialized views depend in both implementations on the existance of updateable view mechanisms.

    This article is so deeply flawed you should pull it.

    ReplyDelete
  4. Have to at least partially agree with the last Anonymous, if people had an easy way to create updateable views, they'd be more used. Also some may disparage the rule system and want to rip it out completely, that would probably be an unpopular idea until there is a better way to create updateable views.

    Yes, you can replace view functionality with client code, you can do the same with stored procedures, data integrity checks, security, table joins and filtering.

    ReplyDelete
  5. I think you're close on the whole ORM thing... I think many people are using these and completely ignoring good db design. I have an opinion of using the right tool for the job. some ORM's don't actually seem to be a bad tool, because they are flexible. These ORM's are increasingly being used to ignore good database design period. I find these ORM's don't take concurrent transactions into mind as much as they should (event the better ones).

    However, I would use update-able views if they existed. I suspect they just happen to be a feature easy (enough) to workaround not having, and thus status quo is 'good enough'.

    As you've said many existing implementations do either not have them or they are severely limited (to 1 table I believe) I personally don't think I would find them all that... useful unless I could update more than 1 table with them.

    So we're in a case of what what we have is 'good enough' and we don't know what we're missing.

    ReplyDelete
  6. Yeah, I've been wondering for years why updatable view does not belong to the top of the TODO list. That's been one of my biggest annoyances with Postgres (which otherwise would be more or less perfect), and the only reason I haven't dumped MS Access.

    Glad to see the self delusion about rule is being cleared though.

    ReplyDelete
  7. Peter,

    Not sure what rock you are living under or what rock I'm living under, but they are obviously not the same rock.

    For enterprise class applications -- you know financial applications and things I care about, I really can't imagine living without views. You've got all those people wanting to query their own data - for simplifying of reporting. Functions just don't do it -- because they require inputs. Not very friendly when you've got to feed it to people used to using query designers.

    For web apps, like twitter, yah I can see how you can do with views. In my opinion though for one of the main audiences PostgreSQL is trying to target - business applications, front end web reporting applications, you really can't do that easily without views.

    ReplyDelete
  8. I maintain one application where i could have used updatable views, at least a few times.

    The last bunch of changes to the database layout - getting rid of a messed up table which was never well designed to begin with, and was made worse with every update - was made just a few days ago, and while theoretically a view might have been possible (though complicated, the stuff was a real mess), i didn't even seriously think about it.
    Because cleaning up the database and leaving the code in its less than clean state simply is not an option in the long run. The goal of the cleanup is cleaning up, not having one good interface and one bad interface together.

    And that is why i do not care much about views, though they are handy sometimes, especially if one needs direct data access of some kind.

    But updatable views? Using them to give write access to someone would bypass too much application level features - especially the caching module.

    ReplyDelete
  9. -- but they can do a lot more with parameters and general procedural programming.

    The reason is pitifully simple: 99.44% of database "developers" are either explicitly coders who appropriate DB "design" to themselves, or database "developers" who've never overcome their procedural coding background. I'd wager that those 99.44% have never read Date or Gray & Reuter, either.

    Such folks, as a coder colleague expressed it, "prefer to handle transactions and concurrency in the application code". All in all, as Pascal and Date have opined for years, unless and until we database folk pressure the RDBMS vendors to do it right (and SQL is not Codd's language, rather Chamberlin's), we'll get half-baked crap.

    As to the use of views, they are the subject of Codd's Rule 6, and using them as the interface, rather than base tables, would lead to a much simpler, and less coding in the, application. Any attempt to have less coding is anathema to coders, so they resist. They succeed. That does not make it right.

    The ORM's can target views; they exist in the catalog just as do base tables. Exactly how PG's catalog works, I'm not certain.

    With the advent of multi-core/processor/SSD machines, there really is no defensible reason to do otherwise. FWIW, Best Buy now *stocks in stores* the Intel X-25 parts. Designing a database to SSD is not foolish.

    ReplyDelete
  10. Isn't it the case that updateable views require the SELECT to only be against one table? If that is the case then it is no better than most ORMS which consider a table == object.

    The whole point about complex views and set returning functions is that they provide a spatial view of the data that is not simply evident from any particular table or composite of data from several tables.

    From my no doubt naive point of view materialized views are simply optimised versions of set returning functions.

    ReplyDelete
  11. agree with Anonymous @ http://petereisentraut.blogspot.com/2010/07/update-on-views.html?showComment=1277987064848#c6042600397682211501, james@vegas, Steven, Regina and Robert Young.

    rules may not be perfect but they're *very useful*. twice so as the triggers in PostgreSQL suck (row-level triggers are inherent suckers, descending from the declarative set-level comfort to the row-level mess, and statement-level triggers have no access to the update in progress [see the `inserted` and `deleted` pseudo-tables in SQL Server, *that* should also a top bullet in the TODO, otherwise statement-level triggers could just as well go: i've never had any use for that "feature" in PostgreSQL]).

    for example, rules were quite vital in porting a web-based helpdesk application from SQL Server to PostgreSQL, where we didn't just copy the db schema: we looked at the data and the operations performed on it by the app, and developed a new, relational, and very different schema for PostgreSQL, along with a set of views that mirrored the old schema, each with an insert view, and migrated the data through those.

    selecting from functions is ok (as in, it's just a different syntax for the same thing, right?), but symmetry is *very* important. how does one `INSERT` into (generally update) the result of a function call? in Oracle, you can update the result of a `SELECT`, if it's still a relation and trivially maps to base tables. but it's the same business as automatically updatable views: the requirements are so stringent they're only infrequently met, within that circumstance seldomly needed, and thus rarely used.

    ORM popularity is just that. MySQL is more popular than PostgreSQL. NoSQL databases seem to be more popular than either one.

    NB: it's outside of SQL, but would be enormously useful: the two relational DIVIDE operator described by Date. i wonder how frequently is that feature requested. ;) scratch the DIVIDE, i'd actually prefer the general feature of user-defined relational operators!

    ReplyDelete
  12. @RoryCL:
    Isn't it the case that updateable views require the SELECT to only be against one table?

    No:
    http://www.exforsys.com/tutorials/oracle-10g/oracle-10g-updating-views.html

    While not fully implementing Codd's requirement, it's further along than most. The issue boils down to: how does the engine keep track of base rows in the expanded row under update? The answer is rowid (by whatever name the vendor uses), but the vendors haven't generally wanted to make the effort. If you want to only update columns in one of the joined tables during the update, as would likely be the case in the canonical order/orderline example, that should always be supported.

    ReplyDelete
  13. I think updatable views would be great -- perhaps it's just not the kind of feature likely to make a top-10 list.

    ReplyDelete
  14. This comment has been removed by the author.

    ReplyDelete
  15. Hi Peter,
    I must say that updatable views are a feature I would greatly appreciate.

    I use Java with ORM, and also have entities that map to views.

    In order go get a good object model to work with an OO language like Java if you use 1:1 table-entity mapping you imply a bad DB design.

    I use views to build a layer on top of a generically designed tables in the DB. This way I get a nice OO model mapping to views that the developers easily understand.

    The only draw back is that when you write to an entity mapped to a view you must provide a logic to update the generic (read complex) tables with the ORM. If we had updateable views we wouldn't worry about this and may even remove entity-table mappings all together and have only entity-view mappings.

    Cheers

    ReplyDelete
  16. I use views for reporting and it has paid off time and again when the database design was enhanced.

    ReplyDelete
  17. Would help me a lot if there where updatable views.

    I got some projects with crap db that i would really like to cleanup. But whats putting it off is that there is a lot of client code that would need to be updated.

    Br Björn Blomqvist

    ReplyDelete
  18. "Using views for row access control doesn't actually really work"... Do you have other arguments for this? The link you provided looks like a bug in Postgres, which is an implementation issue, while using views for row access control is something logical. I don't see how a physical issue can invalidate a logical solution.
    Moreover, this problem could be very easily fixed by caching RAISE and NOTICE output until entire WHERE clause has been evaluated, and discard it when evaluates to false. This actually seems absolutely logic in any way you look at, not only regarding security. For example you would not wand an exception to be raise for a row that is not part of the result. So just a stupid bug, with a simple solution.

    ReplyDelete
    Replies
    1. Yes, it was perhaps not clear that my post was focused on PostgreSQL.

      As for your simple solution, please submit a patch. :)

      Delete
    2. It is pretty clear that the article is about Postgres, but it wasn't clear if you consider views as row access control a bad solution, or it simply doesn't work in a particular version of Postgres.
      "Don't use this feature (and stay with basic features) because there is a bug/performance issue/whatever in version X" is the kind of attitude I'm used to see in MySQL world. As I'm investigating Postgres as an alternative between Oracle and MySQL, it would be disappointing to see it again (nothing personal here).
      The point in using a RDBMS is to make applications more robust and to speed up development time with simpler and more reliable solutions. Views are a very powerful abstractions and can drastically reduce development time and/or volume of code if used properly.
      Regarding fixing, it shouldn't be a big issue for someone familiar with the source code. Getting familiar with source code however, could take order of magnitude much more time than fixing a bug. I would rather afford a paid license than spending a lot of time.

      Delete
    3. It doesn't work, but it's being fixed.

      Delete