- Paperback: 528 pages
- Publisher: Wiley; 1 edition (October 1, 2004)
- Language: English
- ISBN-10: 8126505540
- ISBN-13: 978-0764567575
- ASIN: 0764567578
- Product Dimensions: 7.3 x 1 x 9.2 inches
- Shipping Weight: 1.7 pounds
- Average Customer Review: 37 customer reviews
- Amazon Best Sellers Rank: #486,113 in Books
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data 1st Edition
From the Back Cover
The single most authoritative guide on the most difficult phase of building a data warehouse
The extract, transform, and load (ETL) phase of the data warehouse development life cycle is far and away the most difficult, time-consuming, and labor-intensive phase of building a data warehouse. Done right, companies can maximize their use of data storage; if not, they can end up wasting millions of dollars storing obsolete and rarely used data. Bestselling author Ralph Kimball, along with Joe Caserta, shows you how a properly designed ETL system extracts the data from the source systems, enforces data quality and consistency standards, conforms the data so that separate sources can be used together, and finally delivers the data in a presentation-ready format.
Serving as a road map for planning, designing, building, and running the back-room of a data warehouse, this book provides complete coverage of proven, timesaving ETL techniques. Beginning with a quick overview of ETL fundamentals, it then looks at ETL data structures, both relational and dimensional. The authors show how to build useful dimensional structures, providing practical examples of techniques. Along the way you'll learn how to:
- Plan and design your ETL system
- Choose the appropriate architecture from the many possible options
- Build the development/test/production suite of ETL processes
- Build a comprehensive data cleaning subsystem
- Tune the overall ETL process for optimum performance
About the Author
RALPH KIMBALL, PhD, founder of the Kimball Group, has been a leading visionary in the data warehousing industry since 1982 and is one of today's best-known speakers and educators. He is the author of several bestselling titles published on data warehousing, including The Data Warehouse Toolkit (Wiley).
JOE CASERTA is the founder of Caserta Concepts, LLC, a data warehousing consulting firm. He writes frequently for print and online magazines, and is an active contributor to DWList, the major online community for data warehousing professionals.
Top customer reviews
Given that this book was written more than 10 years ago, many things have changed and an update is desperately needed.
The book does not present or attempt to develop the "Kimball Method" of SCD. As it turns out, that's far less important than all the other things required simply to get through a day's ETL cycle, in which much may go wrong. Numerous tips are worked into the text, and the small things add up greatly over the course of the book. For example, it's a mistake ever to allow NULL to be the prime indicator of an active Type 2 record, specifically because a simple SQL query that controls a date range with BETWEEN will fail if the active indicator is NULL. It's fine points like this that make me rate this book very highly.
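The NULL pitfall described above can be illustrated with a minimal sketch. It assumes a hypothetical customer_dim table whose row_effective_date and row_end_date columns are names invented here for illustration, not taken from the book, and it uses Python's built-in sqlite3 module rather than the PL/SQL mentioned in this review. The far-future end date of 9999-12-31 is one common convention, shown as an assumption rather than as the authors' prescribed fix.

```python
import sqlite3

# In-memory database with a hypothetical Type 2 customer dimension.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_key       INTEGER PRIMARY KEY,
        customer_id        TEXT,
        city               TEXT,
        row_effective_date TEXT,
        row_end_date       TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "Boston", "2003-01-01", "2004-05-31"),  # expired version
        (2, "C1", "Denver", "2004-06-01", None),          # active version, NULL end date
    ],
)

# Point-in-time lookup: which version of customer C1 was valid on a given date?
lookup = """
    SELECT city
    FROM customer_dim
    WHERE customer_id = 'C1'
      AND ? BETWEEN row_effective_date AND row_end_date
"""

# Comparing against NULL yields unknown, so the active row is silently dropped.
with_null = conn.execute(lookup, ("2004-09-15",)).fetchall()

# Assumed convention: mark the active row with a far-future end date instead.
conn.execute(
    "UPDATE customer_dim SET row_end_date = '9999-12-31' WHERE row_end_date IS NULL"
)
with_sentinel = conn.execute(lookup, ("2004-09-15",)).fetchall()

print(with_null)
print(with_sentinel)
```

With the NULL end date the lookup returns no rows at all, even though the Denver row is the active one. Once the end date holds a real value, the same BETWEEN query finds it.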
This is not a trivial book that you read once and put away. It seems that this sort of preparation should be more in evidence in many shops. Let's face it: IT, and especially ETL, is a high-end engineering discipline, even more so with an EDW.
As far as criticisms are concerned, the SQL examples are good but are PL/SQL and take time to decode for those developing in MSSQL environments. The book was written in 2002, when Oracle was the main player and there were not so many Teradata shops. That may make the book too dated for some readers. However, the book covers most of the subject matter without implementation specifics, enough so that the key concepts would work regardless of the technology chosen.
It's not very likely that a course like this one would even be taught at a college level and for that reason the book is a key investment. Further, this book could be used for two full semesters, based on this reviewer's experience in formal course work. It's not likely that on-the-job training would supplant the need for the knowledge contained in this book or one like it.
The authors (Kimball and Caserta) do a good job of pointing out other source books on topics that the reader will probably want to look at in depth.
There is also a pretty good section explaining how to manage your ETL project, the different roles of people who should be involved and a pretty good project plan / checklist to use as you are getting started.
My only complaint is that I did not read this prior to starting my own project and am instead having to correct items as I try to implement these best practices.