- Paperback: 528 pages
- Publisher: Wiley; 1 edition (October 1, 2004)
- Language: English
- ISBN-10: 0764567578
- ISBN-13: 978-0764567575
- Product Dimensions: 7.3 x 1 x 9.2 inches
- Shipping Weight: 1.7 pounds
- Average Customer Review: 37 customer reviews
- Amazon Best Sellers Rank: #162,433 in Books
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data 1st Edition
From the Back Cover
The single most authoritative guide on the most difficult phase of building a data warehouse
The extract, transform, and load (ETL) phase of the data warehouse development life cycle is far and away the most difficult, time-consuming, and labor-intensive phase of building a data warehouse. Done right, companies can maximize their use of data storage; if not, they can end up wasting millions of dollars storing obsolete and rarely used data. Bestselling author Ralph Kimball, along with Joe Caserta, shows you how a properly designed ETL system extracts the data from the source systems, enforces data quality and consistency standards, conforms the data so that separate sources can be used together, and finally delivers the data in a presentation-ready format.
Serving as a road map for planning, designing, building, and running the back-room of a data warehouse, this book provides complete coverage of proven, timesaving ETL techniques. Beginning with a quick overview of ETL fundamentals, it then looks at ETL data structures, both relational and dimensional. The authors show how to build useful dimensional structures, providing practical examples of techniques. Along the way you'll learn how to:
- Plan and design your ETL system
- Choose the appropriate architecture from the many possible options
- Build the development/test/production suite of ETL processes
- Build a comprehensive data cleaning subsystem
- Tune the overall ETL process for optimum performance
About the Author
RALPH KIMBALL, PhD, founder of the Kimball Group, has been a leading visionary in the data warehousing industry since 1982 and is one of today's best-known speakers and educators. He is the author of several bestselling titles published on data warehousing, including The Data Warehouse Toolkit (Wiley).
JOE CASERTA is the founder of Caserta Concepts, LLC, a data warehousing consulting firm. He writes frequently for print and online magazines, and is an active contributor to DWList, the major online community for data warehousing professionals.
Top customer reviews
Given that this book was written more than 10 years ago, many things have changed and an update is desperately needed.
The book does not present or attempt to develop the "Kimball Method" of SCD. As it turns out, that's far less important than all the other things required simply to do a day's ETL cycle, in which much may go wrong. Numerous tips are worked into this text, and the small things add up greatly over the course of the book. For example, it's a mistake ever to allow NULL to be the prime indicator for an active Type 2 record, specifically because a simple SQL call needing a date range controlled by BETWEEN will fail to match the active row if that indicator is NULL. It's the fine points like this that rate this book very highly.
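The NULL pitfall described above can be shown with a minimal sketch. It uses Python's built-in sqlite3 module rather than the PL/SQL found in the book, and the table name, column names (customer_dim, effective_date, end_date), sample rows, and the 9999-12-31 sentinel date are all assumptions made for illustration, not taken from the book.

```python
import sqlite3


def build_dim(active_end_date):
    """Create a tiny Type 2 dimension; the active row ends at active_end_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_dim ("
        " customer_key INTEGER PRIMARY KEY,"
        " customer_id TEXT,"
        " city TEXT,"
        " effective_date TEXT,"  # ISO dates stored as text sort correctly
        " end_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?, ?, ?, ?)",
        [
            (1, "C100", "Boston", "2003-01-01", "2003-12-31"),    # expired row
            (2, "C100", "Denver", "2004-01-01", active_end_date),  # active row
        ],
    )
    return conn


def rows_valid_on(conn, as_of):
    """Return the dimension rows whose date range covers as_of."""
    return conn.execute(
        "SELECT customer_key, city FROM customer_dim"
        " WHERE ? BETWEEN effective_date AND end_date"
        " ORDER BY customer_key",
        (as_of,),
    ).fetchall()


# Active row marked with NULL: the BETWEEN test evaluates to NULL, not true,
# so the active row is silently dropped from the result.
rows_with_null = rows_valid_on(build_dim(None), "2004-06-15")

# Active row marked with a far-future sentinel date: BETWEEN matches it.
rows_with_sentinel = rows_valid_on(build_dim("9999-12-31"), "2004-06-15")

print(rows_with_null)      # []
print(rows_with_sentinel)  # [(2, 'Denver')]
```

The query does not raise an error in the NULL case. It simply returns no active row, which is why the mistake is easy to miss.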
This is not a trivial, read-once book. This sort of preparation should be more in evidence in many shops. Let's face it: IT, and especially ETL, is a high-end engineering discipline, more so with EDW.
As far as criticisms are concerned, the SQL examples are good but are PL/SQL and require time to decode for those developing in MSSQL environments. The book was written in 2002 when Oracle was the main player and there were not so many Teradata shops. That may make the book too dated for some readers. However, the book covers most of the subject matter without tying it to a specific implementation, enough so that the key concepts would work regardless of the technology chosen.
It's not very likely that a course like this one would even be taught at a college level and for that reason the book is a key investment. Further, this book could be used for two full semesters, based on this reviewer's experience in formal course work. It's not likely that on-the-job training would supplant the need for the knowledge contained in this book or one like it.
Without a doubt, ETL has been the biggest stumbling block to deploying and maintaining well-architected data warehouses that stand the test of time. Ralph draws on his years of experience and engagement with thousands of projects and crystallizes the "Best Practices" into an effective application architecture for all ETL systems, regardless of what tools projects use for implementation.
In this thorough examination of the Extract, Transform and Load (ETL) process, Ralph identifies 38 critical functions that all ETL systems need to implement for success in the long haul. He thoughtfully lays out simple and practical approaches for how each of these functions can be implemented by projects with any size of budget.
The paradoxical nature of ETL (seeming trivial yet replete with endlessly complex details that constantly change) has been the proverbial straw that has broken the bank for many DW projects. Continual customer pressure to grow, improve performance, and quickly deal with changing business conditions has left developers and architects grasping for more powerful and flexible approaches to ETL that meet project timelines, yet evolve and improve with age. Armed with this enlightening roadmap, many DW professionals will be far better equipped to design and build systems that meet the challenges of today and tomorrow.