Most Helpful Customer Reviews
51 of 53 people found the following review helpful:
5.0 out of 5 stars
If you have ever wanted to understand what Oracle is doing..., November 21, 2005
This is the book for you.
This book is, well, in a word, amazing. If you have ever been baffled or bemused by "why the heck did the optimizer do that?", or, as Jonathan wrote on page 299:
"I am reluctant to call something a bug unless I can work out what Oracle is doing and can prove that it's doing something irrational. Too many people say, 'It's a bug' when they really mean 'I don't know why this happened.'"
You will absolutely love this book. In it you will discover the hows and whys of the optimizer: why statistics matter and how they matter, what's up with histograms, when and where we need them, and what effect they have.
Sprinkled throughout the book are random insights like this one:
"There are many ways to implement Oracle systems badly, and as a general rule, anything that hides useful information from the optimizer is a bad idea. One of the simple, and highly popular, strategies for doing this is to stick all of your reference data into a single table with a type column. The results can be catastrophic as far as the optimizer is concerned."
And then it goes on to say why. That is what I really, really like: it goes on to say why. I hate it when statements are made and no reasoning is given. You will find none of that in this book.
Jonathan did one thing in this book that I'll definitely be stealing myself. One neat thing is that every chapter ends with a list of script names and descriptions. In the text, he references these script names as well. That way, when you download the code you have a straight reference to the sample you should be running. I've used the (extremely poor) naming convention of demo001.sql, demo002.sql and so on. Next book they'll all have names and I'll be referencing them exactly like he did. Very nice.
The attention to detail and the simplicity of presentation stand out (I don't care what level of Oracle user you are, you will be able to read this book and get it). If you are advanced (ok, I'll put myself into that category), you'll learn things you did not know before. If you are a beginner, you'll know lots more than some advanced people after reading it. The surprising thing? It isn't that hard. Well, it wasn't to me anyway; maybe the math background I have helped. You do not need 10 years of experience with Oracle to get this stuff, and if you have 10 years of experience with Oracle you will get new knowledge you never had.
I'm on my second scan of it, re-reading things that I didn't fully absorb. What I'll be doing lots in the future is referring to it. I got the gist of everything, and I know where to go when I need to explain why. Or maybe I'll just post the link to the book.
And remember, this is I of III, with two more to come.
12 of 13 people found the following review helpful:
5.0 out of 5 stars
The Real Cost of Oracle, December 28, 2005
The beauty of reading a book by a publisher not sanctioned by Oracle and by an author who doesn't work for Oracle is that they can openly mention bugs. And there are oh-so-many! This book is a superb introduction to the Cost Based Optimizer, and is not afraid to discuss its many shortcomings. In so doing it also explains how to patch up those shortcomings by giving the CBO more information, either by creating a histogram here and there, or by using the DBMS_STATS package to insert your own statistics in those specific cases where you need to.
Another interesting thing is how this book illustrates, though accidentally, the challenges of proprietary software systems. Much of this book and the author's time is spent reverse engineering the CBO, Oracle's bread-and-butter optimizing engine. Source code and details about its inner workings are not published or available, and of course that's intentional. But what's clear page after page in this book is that DBAs and system tuners, going about their day-to-day tasks, really need inside information about what the optimizer is doing, and so this book goes on a long journey to illuminate much of what the CBO is doing, or in some cases to provide very educated guesses and some speculation. In contrast, as we know and hear about often, the Open Source alternative provides free access to source code, though not necessarily to the goods themselves. What this means in a very real way is that a book like this would not need to be written for an alternative open source application, because the internal code would be a proverbial open book. That said, it remains difficult to imagine how a company like Oracle might pursue a more open strategy, given that their bread and butter really is the secrets hidden inside their Cost Based Optimizing engine. At any rate, let's get back to Jonathan's book.
Reading this book was like reading a scientist's notebook. I found it:
o of inestimable value, but sometimes difficult to sift through
o very anecdotal in nature, constantly debugging and demonstrating that the CBO is much more faulty and error-prone than you might imagine
o not easy to use as a lookup: it may be hard to say "I have a query of type X, and it is behaving funny; how do I look up information on this?"
o so good in its discussion of the evolution of the product that I'll quote it:
"A common evolutionary path in the optimizer code seems to be the following: hidden by undocumented parameter and disabled in first release; silently enabled but not costed in second release; enabled and costed in third release."
o well supplied with excellent chapter summaries, which were particularly good for sifting and boiling down the previous pages into a few conclusions
o probably of particular value to Oracle's own CBO development teams
Some chapter highlights
-------------------
CH2 - Tablescans
explains how to gather system stats, how to use dbms_stats to set ind. stats manually, bind variables can make the CBO blind, bind variable peeking may not help, partition exchange may break global stats for table, use CPU costing when possible
CH3 - Selectivity
big problem with IN lists in 8i, fixed in 9i/10g, but still prob. with NOT IN, uses very good example of astrological signs overlapping birth months, and associated CBO cardinality problems, reminds us that the optimizer isn't actually intelligent per se, but merely a piece of software
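The star sign and birth month problem comes down to the optimizer multiplying selectivities as if the two predicates were independent. The following is a minimal Python sketch of that arithmetic, not Oracle's actual code; the table size of 1,200 rows and the function name `estimated_rows` are assumptions made for illustration.

```python
from fractions import Fraction


def estimated_rows(total_rows, *selectivities):
    """Estimate cardinality by multiplying selectivities,
    i.e. by assuming the predicates are statistically independent."""
    combined = Fraction(1)
    for sel in selectivities:
        combined *= sel
    return total_rows * combined


# Hypothetical table: 1,200 people, 12 birth months, 12 star signs.
total = 1200
month_sel = Fraction(1, 12)   # "born in December"
sign_sel = Fraction(1, 12)    # "star sign is Aries"

estimate = estimated_rows(total, month_sel, sign_sel)
print(f"independence estimate: {float(estimate):.2f} rows")
```

Under these assumed numbers the independence estimate is a little over 8 rows, yet nobody born in December is an Aries, so the true answer is 0. For a sign that does overlap December the same estimate would be far too low instead.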
CH4 - BTree Access
cost based on depth, #leaf blocks, and clustering factor, try to use CPU costing (system statistics)
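As a rough illustration of how depth, leaf blocks, and clustering factor combine, here is a Python sketch of the commonly cited approximation for the I/O cost of a B-tree index range scan followed by table access. It is a simplification under stated assumptions, not the optimizer's full calculation: the function name and the sample statistics are hypothetical, and CPU costing and other adjustments are ignored.

```python
import math
from fractions import Fraction


def index_access_cost(blevel, leaf_blocks, clustering_factor,
                      index_selectivity, table_selectivity):
    """Approximate I/O cost of an index range scan plus table visits.

    cost = blevel
         + ceil(leaf_blocks * index_selectivity)        # leaf blocks walked
         + ceil(clustering_factor * table_selectivity)  # table blocks visited
    """
    return (blevel
            + math.ceil(leaf_blocks * index_selectivity)
            + math.ceil(clustering_factor * table_selectivity))


# Hypothetical statistics: a 2-level index with 1,000 leaf blocks,
# a clustering factor of 20,000, and a predicate matching 1% of rows.
sel = Fraction(1, 100)
print(index_access_cost(2, 1000, 20000, sel, sel))
```

With these made-up numbers the clustering factor term (200) dwarfs the other two (2 and 10), which is why that statistic matters so much for index costing.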
CH5 - Clustering Factor
mainly a measure of the degree of random distribution of your data, very important for costing indx scans, use dbms_stats to correct when necessary, just giving CBO better information, freelists (procID problem) + freelist groups discussion with RAC
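The clustering factor is usually described as the result of walking the index in key order and counting how often the table block changes from one entry to the next. The Python sketch below follows that description on toy data; the `(key, table_block)` representation and the function name are assumptions for illustration, not Oracle's internal structures.

```python
def clustering_factor(index_entries):
    """Count table-block changes while walking index entries in key order.

    index_entries: iterable of (key, table_block) pairs.
    """
    factor = 0
    previous_block = None
    # Sort by key, then block, as a stand-in for index order.
    for _key, block in sorted(index_entries):
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor


# 12 rows in 3 blocks, stored in key order: few block changes.
well_clustered = [(k, k // 4) for k in range(12)]
# The same 12 rows spread round-robin across 3 blocks: a change on every row.
scattered = [(k, k % 3) for k in range(12)]

print(clustering_factor(well_clustered), clustering_factor(scattered))
```

In the well-clustered case the factor equals the number of table blocks (3); in the scattered case it equals the number of rows (12), the two extremes usually quoted for this statistic.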
CH6 - Selectivity Issues
there is a big problem with string selectivity, Oracle uses only first seven characters, will be even more trouble for urls all starting with "http://", and multibyte character sets, trouble when you have db-independent apps which use strings for dates, use histograms when you have problems, can use the tuning advisor for "offline optimization", Oracle uses transitive closure to transform queries to more easily optimized versions, moves predicates around, sometimes goes astray
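To see why a seven-character limit hurts URLs in particular, here is a deliberately simplified Python sketch. It models only the loss of information from looking at a fixed-length prefix; it does not reproduce how Oracle actually encodes strings for its statistics, and the function name and sample URLs are made up.

```python
def distinct_after_truncation(values, prefix_len=7):
    """Number of distinct values left when only a prefix is considered."""
    return len({value[:prefix_len] for value in values})


# Hypothetical column contents: three different URLs.
urls = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.net/c",
]

print(len(set(urls)), distinct_after_truncation(urls))
```

Since "http://" is exactly seven characters long, every value in this toy column collapses to the same prefix, so any statistic built on the prefix alone cannot tell the rows apart.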
CH7 - Histograms
height balanced > 255 buckets (outside Oracle called equi-depth), otherwise frequency histograms, don't use cursor sharing as it forces bind variables, blinds CBO, bind var peeking is only first call, Oracle doesn't use histograms much, expensive to create, use sparingly, dist queries don't pull hist from remote site, don't work well with joins, no impact if you're using bind vars, if using dbms_stats to hack certain stats be careful of rare codepaths
CH8 - Bitmap Indexes
don't stop at just one, avoid updates like the plague as can cause deadlocking, opt assumes 80% data tightly packed, 20% widely scattered
CH9 - Query Transformation
partly rule based, peeling the onion w views to understand complex queries, natural language queries often not the most efficient, therefore this transformation process has huge potential upside for Oracle in overall optimization of app code behind the scenes by db engine, always remember Oracle may rewrite your query, sometimes want to block with hints, tell CBO about uniqueness, not NULL if you know this
CH10 - Join Cardinality
makes sensible guess at best first table, continues from there, don't hide useful information from the CBO, histograms may help with some difficult queries
CH11 - Nested Loops
fairly straightforward costing based on cardinality of each returned set multiplied together
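The usual way this costing is written down is: the cost of producing the outer row source, plus one probe of the inner row source for every outer row. The Python sketch below shows that arithmetic under stated assumptions; the function name and sample figures are hypothetical, and real plans add refinements this ignores.

```python
def nested_loop_cost(outer_cost, outer_cardinality, inner_cost_per_probe):
    """Commonly cited nested loop join cost:
    cost(outer) + cardinality(outer) * cost(one probe of inner)."""
    return outer_cost + outer_cardinality * inner_cost_per_probe


# Hypothetical plan: the outer table costs 10 to scan and returns 100 rows;
# each index probe into the inner table costs 3.
print(nested_loop_cost(outer_cost=10, outer_cardinality=100,
                       inner_cost_per_probe=3))
```

Because the inner cost is multiplied by the outer cardinality, a bad cardinality estimate on the outer table scales the whole join cost, which is one reason accurate statistics matter so much here.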
CH12 - Hash Joins
Oracle executes as optimal (all in memory), onepass (doesn't quite fit so dumped to disk for one pass) and multipass (least attractive sort to disk), avoid scripts writing scripts in prod, best option is to use workarea_size_policy=AUTO, set pga_aggregate_target & use CPU costing
CH13 - Sorting + Merge Joins
also uses optimal, onepass, & multipass algorithms, need more than 4x dataset size for in memory sort, 8x on 64bit system, increasing sort_area_size will incr. CPU util so on CPU bottlenecked machines sorting to disk (onepass) may improve performance, must always use ORDER BY to guarantee sorted output, Oracle may not need to sort behind the scenes, Oracle very good at avoiding sorts, again try to use workarea_size_policy=AUTO
CH14 - 10053 Trace
reviews various ways to enable, detailed rundown of trace with comments inline, and highlights; even mentions that VOL 2 + 3 of the book are coming!
Appendix A
be careful when switching from analyze to dbms_stats, in 10g some new hist will appear w/default dbms_stats options, 10g creates job to gather stats
Conclusion
----------
I found this book to be full of gems of information that you won't find anywhere else. If you're at the more technical end of the spectrum, this is a one-of-a-kind Oracle book and a must-have for your collection. Keep in mind something Jonathan mentions in appendix A: "New features that improve 99% of all known queries may cripple your database because you fall into the remaining 1% of special cases". If these cases are your concern, then this book will surely prove to be one-of-a-kind for you!
8 of 8 people found the following review helpful:
5.0 out of 5 stars
No Competition, December 2, 2005
There is little point in writing about how good this book is, since there is no other book devoted exclusively to SQL optimization. Dan Tow's book comes close, but he is focused more on a method of join graph analysis that he developed than on the details of how the optimizer arrives at a certain access path. The lack of competition on the market is really surprising, given that SQL optimization is the only part of an RDBMS that is justifiably complex and will remain complex for the foreseeable future.
Compared to SQL optimization, all the other issues that a DBA deals with today look ridiculous. There is no reason why, for example, export and import should be more complex than copying an image file from your camera. Likewise, managing extents and segments is totally automated these days. The whole manageability trend just proves the simple idea that an RDBMS is nothing more than a query execution engine.
Now, unlike in any other RDBMS implementation area, the flow of poorly executed SQL never seems to cease. SQL optimization is well known to be a difficult problem. Statistics information is incomplete, robust cost metrics are elusive, and the search space is explosive. The optimization goals are often conflicting. The very first idea that every SQL performance analyst discovers is: "The optimization is only as good as its cost estimates". Those issues are fundamental rather than specific to any SQL DBMS vendor, of course. Given the scope and complexity of the problem, one citation comes to mind: "There is no emperor's way to SQL optimization".