
Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence Paperback – December 21, 2007

ISBN-13: 978-0132360296 ISBN-10: 0132360292 Edition: 1st


Editorial Reviews

About the Author

Bill Inmon--the "father of data warehousing"--has written 50 books and published in nine languages on subjects such as data warehousing, database design, and architecture.

For current events, seminars, conference speaking schedules, and a lot of other information related to data warehousing, unstructured data, and textual ETL, take a look at Bill Inmon’s Web site at www.inmoncif.com.

Anthony aka “Tony” Nesavich received his master's degree in computer information technology from Regis University in Denver, Colorado. He worked with Bill Inmon at Inmon Data Systems (IDS), where he was instrumental in the development of the IDS Foundation software. Many of Tony’s contributions to IDS are discussed in this book. Tony lives in Denver, Colorado, with his wife Melissa and his faithful dog, Lola.


Excerpt. © Reprinted by permission. All rights reserved.



There have been two environments that have grown up side by side—the structured environment and the unstructured environment. The structured environment is typified by transactions, databases, records, keys, and attributes. The unstructured environment is typified by email, spreadsheets, medical records, documents, and reports.

It is amazing that at the same time that these worlds have grown up side by side, they have grown separately. It is as if these worlds exist in alternate universes.

The world of analytics and business intelligence has grown up around structured information. With business intelligence, we have displays of information, summaries, pivots, and an entire world of analytical processing. With business intelligence, we can make sense of the numbers, facts, and figures that hide out in the systems that run our corporations.

For analyses of text—unstructured information—there is nowhere near the amount of sophistication that exists in the structured environment. In the unstructured world, a few search engines can find documents and that is about it.

Does that mean that there is no important or useful information in the unstructured environment? The answer is—of course not. There is a wealth of important and useful information in the unstructured environment, but it is not as easily recoverable as information in the structured environment. The information in the unstructured environment is much more difficult to get a handle on.

There are many reasons why textual data is more difficult to handle than structured, transaction-oriented data. The primary reason is the lack of repeatability of textual data and the lack of predictability about the contents of the data. Textual data is hard to handle because it is hard to find, and it is hard to find because it does not entail repetition to any great degree.

This book is about doing textual analytics and the technologies that can be used to do textual analytics.

Two major architectural and technological approaches to doing textual analytics are used. One approach is to look at and gather the textual data in the unstructured environment and then analyze and manipulate it there. The unstructured environment seems like a natural place to do textual analytics because, after all, the text resides in the unstructured environment.

The other architectural approach is to look at and gather the textual data in the unstructured environment and then bring the textual data to the structured environment to do the textual analytics there.

It might seem strange or even unnatural to take the approach of accessing and gathering textual data in the unstructured environment and then bringing the textual data to the structured environment for analytical processing; however, there are good reasons for doing exactly that.

Some of those reasons follow:

  • The analytical environment has already been created in the structured environment. If we bring unstructured data to that environment, we can leverage existing investments. We already have trained end users, trained support staff, and licenses in place. So, why not bring the unstructured text to the structured environment where analytical tools are already in place?
  • Proprietary software. When we bring in technology to do analytical processing in the unstructured environment, that technology is proprietary. Do we actually want more proprietary software in our world? Isn’t it a much more rational approach to use open software that has thousands of users and uses around the world, rather than bring in proprietary software that might or might not meet the long-term goals of the organization?
  • By bringing unstructured text to the structured environment, we can create links between the unstructured data and our structured data, making possible analysis that otherwise would not have been possible. In doing so, we can build an integrated data warehouse that takes into account both structured and unstructured data.
  • If we don’t bring unstructured data to the structured environment, we are going to have to re-create the analytical infrastructure in the unstructured environment. Is that something that is advisable to do? We already have an analytical infrastructure. Why not use it?

For these reasons, this book is about what is required to go to the unstructured environment, find and integrate the textual data there, and then bring the unstructured textual data to the structured environment and organize it in a meaningful manner. After the textual data is in the structured analytical environment, a new world of analyses opens up.

One of the recurring themes of this book is the need for integration of text before it is useful. In most environments and in most circumstances, text is nonhomogeneous. People might talk in English, but for all practical purposes, they speak in dialects. Analyses can be done effectively only when a common tongue is established. Stated differently, if all you do is gather text and throw it into a database, you end up with the Tower of Babel. The Tower of Babel led nowhere, certainly not up to God.

One of the requirements of textual analytical processing is accessing and analyzing text in a colloquial vocabulary and a common vocabulary. The textual analyst needs both abilities.
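As an illustration only, the idea of keeping both a colloquial and a common vocabulary can be sketched in a few lines of Python. This is not taken from the book: the `COMMON_TERMS` mapping and the function names are hypothetical, and a real mapping would come from the organization's own glossary.

```python
import re

# Hypothetical dialect-to-common-term mapping (an assumption for this sketch).
COMMON_TERMS = {
    "client": "customer",
    "cust": "customer",
    "acct": "account",
    "invoice": "bill",
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def to_common_vocabulary(text):
    """Return (colloquial, common) pairs so both forms stay searchable."""
    return [(word, COMMON_TERMS.get(word, word)) for word in tokenize(text)]


pairs = to_common_vocabulary("Cust called about the acct invoice")
for colloquial, common in pairs:
    print(f"{colloquial} -> {common}")
```

Storing the pair rather than overwriting the original word is one way to give the analyst both abilities: a search on "customer" finds the note, and so does a search on "cust".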

The classical approach to text and text processing is to use semantics and natural language processing. This book describes a different approach. Without fail, the approach taken in this book is that text, made up of words, is just another form of data. The approach that looks at words as just another unit of data frees the analyst from the trap of context. It is true that words taken out of context can have twisted meanings on some occasions. It is also true that freeing words from context opens up the door to entirely new and novel kinds of processing that simply are not possible when having to stop and consider the context of text at every turn.

There is a tradeoff. Paying attention to context when dealing with text entails a certain set of opportunities and precision. However, freeing text from context opens up entirely new and exciting vistas.

This book assumes that words are treated as just another unit of data and does not take context into consideration in 99.99 percent of the cases.
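A minimal sketch of what treating words as just another unit of data might look like, assuming a relational store (here Python's built-in `sqlite3`). The table names, columns, and sample text are hypothetical and not from the book. Each word becomes a row, with no regard to context, and can then be joined to structured data like any other record.

```python
import re
import sqlite3


def load_words(conn, doc_id, customer_id, text):
    """Store each word of a document as one row, ignoring context."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    conn.executemany(
        "INSERT INTO doc_word (doc_id, customer_id, position, word) "
        "VALUES (?, ?, ?, ?)",
        [(doc_id, customer_id, pos, w) for pos, w in enumerate(words)],
    )


conn = sqlite3.connect(":memory:")
# Structured side: a hypothetical customer table.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT)")
# Unstructured side, brought into the structured environment word by word.
conn.execute(
    "CREATE TABLE doc_word (doc_id INTEGER, customer_id INTEGER, "
    "position INTEGER, word TEXT)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "West"), (2, "East")])
load_words(conn, 100, 1, "Refund requested, the refund is late")
load_words(conn, 101, 2, "Please send a refund")

# Link text to structured data: mentions of "refund" by customer region.
rows = conn.execute(
    """
    SELECT c.region, COUNT(*) AS mentions
    FROM doc_word AS w
    JOIN customer AS c ON c.customer_id = w.customer_id
    WHERE w.word = 'refund'
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(rows)
```

Once the words sit in ordinary tables, existing SQL and business intelligence tools can query them alongside the structured records.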

This book is for a wide audience. It is for students of computer science, general managers, database designers, data modelers, database administrators, researchers, and end users—in short, it is for anyone facing the challenge of taking a body of text and trying to make sense of it. In addition, this book answers the questions, “How do we bridge the gap between structured and unstructured systems?” and “How do we create an integrated data warehouse that incorporates both structured and unstructured data?”

The discipline of textual analytics is in its infancy; it is entirely predictable that more discussion and more advances will be made in the future about this subject. This book represents merely the first step in what is likely to be a massive field of endeavor in years to come.

We hope that you find the book full of useful information. We hope the book at least sets you down the right path to enjoying the fruits of textual analytics.

Bill Inmon, Jan 11, 2007

Tony Nesavich, Jan 11, 2007



Product Details

  • Paperback: 264 pages
  • Publisher: Prentice Hall; 1 edition (December 21, 2007)
  • Language: English
  • ISBN-10: 0132360292
  • ISBN-13: 978-0132360296
  • Product Dimensions: 7 x 0.6 x 9.1 inches
  • Shipping Weight: 1.2 pounds
  • Average Customer Review: 2.5 out of 5 stars (4 customer reviews)
  • Amazon Best Sellers Rank: #1,284,195 in Books


Customer Reviews


Most Helpful Customer Reviews

12 of 13 people found the following review helpful By John M. Ford on December 1, 2010
Format: Paperback Verified Purchase
William Inmon and Anthony Nesavich introduce the concepts of text analysis. They build on readers' familiarity with analysis of structured data from spreadsheets and databases. They describe how to transform text data into numbers and categories that can be analyzed with these traditional methods.

The book's chapters are of two types. The first fourteen chapters review the nature of business intelligence, discuss the challenges of analyzing structured and unstructured data, and lay out a general process for organizing and categorizing text data. Some sections are particularly good. Chapter 1, for example, suggests where useful unstructured data can be found in a typical organization. This is helpful guidance for a new analyst. Chapter 12 develops the framework of "A Methodology For Textual Analytics" that covers some of the key issues. This chapter contains most of the book's new information about text analysis.

The last five chapters present business intelligence case studies which used text analysis. The settings include conducting medical research, monitoring toxic chemicals, managing contract documents, creating a common corporate vocabulary, and imposing consistency on insurance claims.

The book has two weaknesses. First, it comes too slowly to its core material about text analysis. Early chapters review well-established business and data management practices to excess. There is far too much agonizing over whether to integrate unstructured data with structured data or analyze it separately. The second and more serious weakness is the abstract and almost cursory description of text analysis techniques. This is certainly not a technical tutorial. It is also not an adequate high-level description of the challenges and variations in such projects.
3 of 3 people found the following review helpful By aussiejim on September 30, 2012
Format: Kindle Edition Verified Purchase
Like many of his earlier books, Bill tends to generalise a little here, and although it all seems to make sense, some of the thoughts are becoming outdated.
In this book Bill suggests bringing unstructured data INTO the warehouse, but the current trend is to dump unstructured data into NoSQL databases with little or no modelling and apply statistical analysis to this BIG data. Someone might then wish to test what they are inferring by taking structured data out of the warehouse to compare, run what-ifs, etc.

Thus it seems to be going the other way: traditional data warehouses are taking a back seat to emerging big data technologies like Hadoop, MapReduce, HBase, NoSQL, yada, yada, yada.
People wouldn't dream of dumping all that data into a traditional warehouse, IMHO.
Format: Paperback Verified Purchase
Seems like the author is much more focused on traditional databases than on really understanding and explaining how unstructured data can be harnessed. The book repeats the same things in each chapter, the case studies are very shallow, and there are no practical exercises whatsoever. All in all: skip it. You will not learn much from this book.
1 of 16 people found the following review helpful By J. Wood on January 26, 2009
Format: Paperback Verified Purchase
I need this book for my Information Quality course. It seems to be a good book so far. Easy to read.