Database assisted denormalization – Oracle edition

time to read 7 min | 1297 words

I decided to take a chance (installing Oracle is a big leap :-) ) and see how things match in Oracle.

I decided to run the following query:

SELECT deptno, 
       dname, 
       loc, 
       (SELECT COUNT(*) 
        FROM   emp 
        WHERE  emp.deptno = dept.deptno) AS empcount 
FROM   dept 
WHERE  deptno = 20 

Please note that I run in on a database that had (total) maybe a 100 records, so the results may be skewed.

Like in the SQL Server case, we need to create an index on the FK column. I did so, after which I got:

Then I dropped that index and create a simple view:

CREATE VIEW depswithempcount 
AS 
  SELECT deptno, 
         dname, 
         loc, 
         (SELECT COUNT(*) 
          FROM   emp 
          WHERE  emp.deptno = dept.deptno) AS empcount 
  FROM   dept 

Querying on top of that gives me the same query plan as before. Trying to create a materialized view out of this fails, because of the subquery expression, I’ll have to express the view in terms of joins, instead. Like this:

SELECT dept.deptno, 
       dname, 
       loc, 
       COUNT(*) empcount 
FROM   dept 
       LEFT JOIN emp 
         ON dept.deptno = emp.deptno 
WHERE  dept.deptno = 20 
GROUP  BY dept.deptno, 
          dname, 
          loc 

Interestingly enough, this is a different query plan than the subquery, with SQL Server, those two query exhibit identical query plans.

Now, to turn that into an materialized view.

CREATE materialized VIEW deptwithempcount 
AS SELECT dept.deptno, 
          dname, 
          loc, 
          COUNT(*) empcount 
   FROM   dept 
          left join emp 
            ON dept.deptno = emp.deptno 
   GROUP  BY dept.deptno, 
             dname, 
             loc 

And querying on this gives us very interesting results:

select * from deptwithempcount 
where deptno = 20

Unlike SQL Server, we can see that Oracle is reading everything from the view. But let us try one more thing, before we conclude this with a victory.

update emp 
set deptno = 10
where deptno = 20;

select * from deptwithempcount 
where deptno = 20

But now, when we re-run the materialized view query, we see the results as they were at the creation of the view.

There appears to be a set of options to control that, but the one that I want (RERESH FAST), which update the view as soon as data changes will not work with this query, since it consider it too complex. I didn’t investigate too deeply, but it seems that this is another dead end.

Tweet Share Share 11 comments

Tags:

Databases

Comments

27 Aug 2010
15:12 PM

Justin

Probably one of the biggest deficiencies in the various RDBMS offerings out there is their inability to materialize complex views.

This is a hard problem to solve but could be improved and fits very well into the relational model.

If you went with a eventually consistent model with a background view generation like some NOSQL products, the the complex view update could be deferred, but you could get stale results.

You should try a trigger based solution with MSSQL or Oracle at least as a comparison.

Not sure what the aversion to triggers is, they are ideal for creating complex user defined "indexes" such as what you are doing with the Post Count.

Materialize views are essentially just declarative system managed triggers that update another table.

27 Aug 2010
17:58 PM

Bruno Martinez

Ayende,

Put both tables in the same table cluster, so that finding the employees is not another disk seek. This gets close to document data bases.

27 Aug 2010
18:06 PM

tobi

"This is a hard problem to solve" It is not. It can be solved by using stacked indexed views (search on google to find a pdf from MS; alas not implemented).

I was appalled as I learned about oracles deferred materialized views. I believe there is no synchronous option which makes them worthless for many applications. What an unnecessary crippling.

27 Aug 2010
18:13 PM

tobi

On the other hand If you want a deferred update system then the automatic view matching in the query optimizer must come in very handy.

28 Aug 2010
07:47 AM

Sam

I'm with Justin - relational databases have an existing, proven, built-in mechanism to implement what you're trying to accomplish; they're called triggers.

On one occasion I remember fiddling around with indexed views for a week before giving up and implementing a trigger solution. Quickest, easiest, most sensible and most performant change I ever made to that system.

28 Aug 2010
09:28 AM

Frans Bouma

@Tobi: materialized/indexed views are by definition used in functionality which consumes large buckets of data and processes them for a lengthy period of time for readonly purposes. This alone makes it already not necessary to have the views be 'up to date' at all time, as that's only possible with a live query.

many people think data has to be 'non-stale' at any given moment, but they forget that only some data has to be 'non-stale'. All the rest can be consumed from for example materialized/indexed views. The post count is a good example for data which is totally acceptable to be 'stale': who's going to check whether user X has indeed 103 posts? no-one.

It therefore comes down to:

decide what data elements can be stale
make sure the non-stale data is fast and easy to retrieve and manipulate.

And with 'stale' I don't mean data which is a year old, but for example materialized views (or your own tables with duplicated denormalized data, same thing, if your db doesn't support materialized views) which are refreshed every 10 minutes. For reports about monthly sales etc., that's perfectly fine.

This also leads to the point where one could decide to use two distinct databases instead of 1: one for the non-stale data and one for the stale data, which is for example updated every night.

It's true that this means one has to think about what the data in a system really means at time T and time T+t. But IMHO that's a good thing, it makes one realize that a database is more than just a bucket to store bits in, and actually a machine which can make things really easy.

28 Aug 2010
13:08 PM

gunteman

"many people think data has to be 'non-stale' at any given moment, but they forget that only some data has to be 'non-stale'. All the rest can be consumed from for example materialized/indexed views. The post count is a good example for data which is totally acceptable to be 'stale': who's going to check whether user X has indeed 103 posts? no-one."

Word!

28 Aug 2010
14:49 PM

tobi

@Frans Bouma: Materialized views are not only used for aggregation. You can also save tons of join with them. Example: A forum thread. At least the following tables contribute one row for each post: ForumPosts, Users, UserAvatarImages. Then the resultset has to be sorted by datetime. All of this can be materialized so that the forum page needs exactly one range seek. Very cheap, can do that 2k times per second per core.

28 Aug 2010
21:07 PM

Imran

Have you tried using count(1) instead of count(*) and seeing if it makes a difference to your execution time? Also if it is ok for the count to be a little stale you could store it as a normal numerical field that gets updated by a batch process (database job in sql server) thats runs every so often?

29 Aug 2010
18:35 PM

Simon Mac

@ Justin

+1 for materialized / schema bound views, being "sytem shortcuts" for triggers.

Ultimately the DB needs to infer if ANY object insert / update / delete would alter the output of ANY of the materialized views (Which I guess are little more than tables anyway).

I suspect this is the reason they are fairly limited in complexity, otherwise they could cripple modification performance, but in a way that is completely non-obvious.

Whilst triggers too maybe less than obvious, at least they are tied to the table you find the bottleneck on.

Would be interesting to see the query plan for the insert / updates on tables that form part of the materialized views.

30 Aug 2010
13:00 PM

tobi

The query plan does multiple updates in sequence. For aggregation views it actually aggregates the changed rows for each different view.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB