38 of 39 people found the following review helpful
I was lucky enough to read Aiden & Michel's original study, "Quantitative Analysis of Culture Using Millions of Digitized Books," when it appeared in Science on 14 January 2011. It was an astonishing piece of scholarship, one of the rare papers that divides an entire branch of human learning into "before" and "after." I felt the hair on the back of my neck rise as I read it. In essence, they mined through the Google Books database to answer concrete questions about linguistics, culture, politics, even topics such as the nature of fame and the pace of propagation of new technologies. It was a tour de force.
The title of this book, "Big Data as a Lens on Human Culture," suggests that it will be a general text on Big Data, but it is not. It covers only this body of work by these two researchers and their assistants.
The book repeats the contents of that 2011 article, explaining the results for the general public, adding some discussion of the origins of the work and the researchers' thoughts about the future. In the process, they expand the original piece, which was about six pages long excluding notes, to about 220 pages. Some of the new material is fun; I got a kick out of the story about a romance novel that had been alphabetized and the information that could still be gleaned from it. Other parts seem like padding; who cares about the history of lexical concordances?
It's a shame that Aiden & Michel wrote this book themselves; the same material coming from a third party would not have seemed so self-congratulatory and, sometimes, smug. Stylistically, the book has some flaws, including an odd 'cutesy' tone and repeated reliance on lousy puns for humor (see for example the discussion of the plague sent by God to punish King David in the Old Testament, and the authors' rather forced questioning of whether our decisions will similarly "come back to plague us").
Readers who make it to the end of the book may find the last couple of chapters a little disturbing. Aiden & Michel seem to lament that anything goes unrecorded, that anything is forgotten. But forgetting is healthy and can be vital to society. Though they include a few paragraphs on possible abuses of big data, this is clearly an afterthought and I suspect these guys read about NSA databases of e-mail and text messages and thought "if only we could read those too!"
The book has ample footnotes for those who want more detail, and many excellent graphs. I wish they had provided a footnote for the software package they used to generate these; the design is quite nice while remaining unusually clear. Edward Tufte would approve.
In summary, read the original paper if you have access to it. If not, give the book a try. The original can be found in Science 14 January 2011: Vol. 331 no. 6014 pp. 176-182.
12 of 13 people found the following review helpful
Uncharted: Big Data as a Lens on Human Culture is a fun look at a pretty amazing research project. Starting as graduate students, authors Erez Aiden and Jean-Baptiste Michel wanted to use big data to answer interesting questions. What started out as a simple research question ended up jump-starting the authors' careers and an entirely new way to look at big data.
They came up with an idea to make a tool that could query Google's digitized library in order to determine word frequencies. Using the tool they invented, called the Google Ngram Viewer, they have been able to answer interesting questions that relate to word frequencies, explore how language changes over time, assess the adoption of new technology, assess fame, and conjecture as to how the answers to the questions they pose reflect on the prevailing culture.
Although the idea is simple in concept, it wasn't so simple in execution. They had to wiggle their way into the Googleverse to get permission to use the database, write a lot of code, and iron out certain legal/copyright problems. But once all this was done, the magic began.
I won't go into detail about their findings, but suffice it to say, they not only created the Ngram Viewer but used it intelligently to come to some very interesting (and often humorous) conclusions. Their analogy of the Ngram Viewer as a modern equivalent of Galileo's telescope is an apt one. Without the telescope, Galileo couldn't have made some of his most important astronomical observations. Without the Ngram Viewer, it would be all but impossible to look at things like the transformation of irregular verbs over time, or to get a good idea of when writers really started to refer to The United States in the singular (the results are surprising).
However, like Galileo's telescope, the Ngram Viewer is still a somewhat primitive tool. First, it is limited by the number of books that are available in Google's digital database. (Google is trying to digitize every book in existence, but it still has a ways to go.) Second, the authors limited themselves to books only--they did not look at other printed media, such as digitized periodicals or newspapers. Third, the database is limited to the printed word and does not include usage in other media. (A fourth limitation that the authors did not mention is that Google is only trying to digitize one edition of each book. Therefore the database doesn't account for a book's popularity or circulation. Obscure scholarly books would therefore be weighted as equal with popular novels, which introduces a certain bias as well. Moreover, there are books that are quite popular that people buy and read voraciously but that have little social impact beyond a short period of time (e.g. Fifty Shades of Grey), extraordinarily popular books that lots of people may have but never read (Godel, Escher, Bach), and books that come out in modern editions but are written in an archaic style (e.g. The Bible).)
So there are limitations. Still, if archeologists can garner incredible insights about the past by looking at the contents of ancient waste dumps, even with limitations there is a wealth of information that the Ngram Viewer can tap into that is there for the taking.
Aiden and Michel write with a great amount of scholarship, humility, and humor. The book was easy and quick to read but insightful as well. They do spend a fair amount of time on the trials and tribulations of how they developed the Ngram Viewer. This history of the Ngram Viewer takes up a fair amount of copy and is interesting to read. However, the insights that the authors are able to obtain by actually using the Ngram Viewer are far more interesting, and I would have been more than happy to read about more of them. If there is one major downside to the book, it is that the authors got me turned on to the Ngram Viewer, which is majorly addictive and can consume a lot of your time. (Once you start, it's hard to stop. Trust me, this is highly addictive and more of a time sink than Twitter or Facebook.)
All in all, an insightful, engaging, interesting, and entertaining read. Highly recommended.
9 of 10 people found the following review helpful
This book began as a scientific paper, and I think perhaps it should have stayed in that shorter form. The authors try to spin their ngram research to book length and it stretches somewhat thin. The beginning of the book is promising - the depiction of irregular verbs becoming regularized over time is interesting to anyone who looks forward to the word of the year, and I confess to being fascinated at what can be revealed by alphabetizing the words in a novel. Past that point, however, vignettes and anecdotes became more disjointed, and I'm not a fan of the authors' style of humor. Frankly, I enjoyed pondering the ngram graphs in the back of the book (the occurrences of "turnip" vs "tomato" over time, "slavery is" vs "slavery was", "werewolf" vs "zombie") more than the majority of the text.
12 of 15 people found the following review helpful
on January 30, 2014
Two young research scientists from Harvard University, Erez Aiden and Jean-Baptiste Michel, teamed up with Google in 2010 to create the Ngram Viewer. It sifts through millions of digitized books and charts the frequency with which words have been used. On the day that the Ngram Viewer debuted, more than one million queries were run through it. Some consider it to be at the center of a major revolution.
In an interview with Studio 360's Kurt Andersen, Aiden and Michel said how pleased they are that the new technology can open up academic research to the "independently curious."
"It's good that a tool that's at the leading edge of science can generate so much enthusiasm in the general public." Michel cautions, however, "it's inevitable that a tool like that will generate a large number of discussions that are actually irrelevant or that are flat-out wrong . . . it's still important that bona fide experts are the ones interpreting the research."
In their new book Uncharted: Big Data as a Lens on Human Culture, however, they are nowhere near so humble about the so-called "big data revolution," nor are they convinced about the value of "bona fide experts."
"At its core, this big data revolution is about how humans create and preserve a historical record of their activities. Its consequences will transform how we look at ourselves. It will enable the creation of new scopes that make it possible for our society to more effectively probe its own nature. Big data is going to change the humanities, transform the social sciences, and renegotiate the relationship between the world of commerce and the ivory tower." 
Well, if for whatever reason this is going to be a contest between capital and academia, or academics versus the "independently curious," then let's hear first from the so-called "ivory tower." The following passage is from Simon Schama's introduction to his The Embarrassment of Riches: An Interpretation of Dutch Culture in the Golden Age:
". . . there is nothing especially daring about a working definition of culture drawn from social anthropology. I follow the kind of characterization offered by Mary Douglas of cultural bias as "an array of beliefs locked together into relational patterns." In the same essay, however, she cautions that for those beliefs to be considered the matrix of a culture, they should be treated as part of the [social] action and not separated from it." I have tried to follow this rather Durkheimian command in what is, essentially, a descriptive enterprise that emphasizes social process rather than social structure, habits rather than intuitions. Acting upon one another, beliefs and customs together form what Emile Durkheim called "a determinate system that has its own life: . . . the collective or common conscience . . . it is by definition diffuse in every reach of society, Nevertheless it has specific conditions that make it a distinct reality."
Now, let's hear from the big data revolutionaries:
"Consider the following question: Which would help you more if your quest was to learn about contemporary human society--unfettered access to a leading university's department of sociology, packed with experts on how societies function, or unfettered access to Facebook, a company whose goal is to help mediate human social relationships online?"
"On the one hand, the members of the sociology faculty benefit from brilliant insights culled from many lifetimes dedicated to learning and study.
"On the other hand, Facebook is part of the day-to-day social lives of a billion people. It knows where they live and work, where they play and with whom, what they like, when they get sick, and what they talk about with their friends. So the answer to our question may very well be Facebook. And if it isn't--yet--then what about a world twenty years down the line, when Facebook or some other site like it stores ten thousand times as much information, about every single person on the planet?" 
Aside from the vague and uninformed illogicality that pervades Uncharted, I am particularly struck by the air of self-congratulatory triumph that permeates the entire book, suggesting that big data has already won--hands down.
Why are so many enthralled by this stuff? All I can say is, "In the land of the blind, the one-eyed man is king."
 from Studio 360, Public Radio International, broadcast August 9, 2013.
 Aiden, Erez; Michel, Jean-Baptiste (2013-12-26). Uncharted: Big Data as a Lens on Human Culture (Kindle Locations 133-137). Penguin Group US. Kindle Edition.
 Simon Schama. The Embarrassment of Riches: An Interpretation of Dutch Culture in the Golden Age. New York: Random House, 1987, p. 9.
 Aiden, Erez; Michel, Jean-Baptiste (2013-12-26). Uncharted: Big Data as a Lens on Human Culture (Kindle Locations 185-189). Penguin Group US. Kindle Edition.
5 of 6 people found the following review helpful
This is essentially an interactive book, because as you read, you can go to the Ngram site and try out your own queries (or try the link in my comment below). This book is a lot of fun to read and also quite interesting, though it could be better (see below).
The Google Books Ngram Viewer has access to the words and phrases used in a significant chunk of Google's digitized book archive. By using it, you can ask questions about things such as the changing use of language, the rise and fall in popularity of ideas or celebrities, and so forth. For example, I found that early uses of the phrase "rock and roll" were in accounts of traveling in rickety wagons, and in sailor songs. It is a great way of viewing social and cultural history through the lens of big data.
This book consists of some of the interesting ideas the researchers uncovered, alternating with the story of how the Ngram Viewer came about and the issues they dealt with in doing so, such as privacy, authors' copyrights, access to the archive of information, and so forth. Along the way they draw on a number of interesting episodes of history, e.g. Helen Keller's open letter to the German people in the 1930s. Or, they analyze the corpus to show that there are over a million words used in the English language, but the Oxford English Dictionary has only about half a million of them. It is instructive to see how they wring information out of this data.
Also, they discuss some of the foibles of the data. For example, one of the most mentioned people is an academic whom no one has heard of -- this is because published books are skewed towards academic content. The book closes with a brief discussion of how access to big data changes the questions we can ask and what is knowable. In the appendix they show charts of additional comparisons, but at that point, it is really more interesting to go to the site and input your own queries.
For an example of the kind of results it produces, and how easy it is to use, look at the link I've put in the comment section below, where mentions of different book review publications over time are charted, along with mentions of Amazon.
A glaring omission is the lack of any information about how to do advanced searches, such as constraining words by part of speech, using wildcards, what "smoothing" means, etc. However, googling for "ngram advanced" will lead you to the online documentation. Additionally, I wouldn't have minded some more technical information about how things were implemented. Also, at points, the book gets a bit diary-like, and could use some tightening up.
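For readers wondering about "smoothing": as I understand the online documentation, a smoothing of 1 means each plotted year is the average of that year's value and one year on either side, and a smoothing of 0 means raw data. Here is a minimal Python sketch of that idea. The function name `smooth` and the sample numbers are my own for illustration, and the handling of the first and last years (averaging over whatever neighbors exist) is an assumption, not a description of how the site is actually implemented.

```python
def smooth(values, smoothing):
    """Centered moving average over yearly values.

    `smoothing` is the number of neighboring years on each side to include.
    Assumption: at the ends of the series the window is simply truncated.
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - smoothing)                 # first index in the window
        hi = min(len(values), i + smoothing + 1)   # one past the last index
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Hypothetical yearly frequencies with a spike in the middle year.
raw = [0, 3, 6, 3, 0]
smoothed = smooth(raw, 1)
```

With a smoothing of 1 the spike is flattened, which is why heavily smoothed charts show trends more clearly but hide single-year jumps.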
However, setting tools like this loose for anyone to use is a game-changer, and thus for those interested, it is a five-star book. Try it yourself!
2 of 2 people found the following review helpful
When we think of big data -- and in my field we often do -- we think of Retail Link or Google Analytics or Nielsen's research. All of these examples have the size and speed of data input that makes big data big, but these, and most other real world examples, are about current behavior of people, often in the context of marketing and purchasing behavior. This book deals with a very different set of data and a very different set of questions.
Google has created searchable versions of millions of books from the past, and the authors got access to them and developed a tool that allowed them to use this enormous dataset to gain insights into history. What's especially wonderful about this is that you can use this tool, too: [...]
The book tells the story of the scientists' journey to creating this tool, which is quite a good story. It also gives examples of the kinds of conclusions one can draw from the information the tool harnesses:
* Exactly when have changes in language taken place?
* What effect does political suppression of art and science have on the availability of information?
* How quickly do cultures learn about new technology, and is that changing?
These are fascinating questions, and the evidence that they can now be answered more accurately than ever before is impressive. However, the most compelling thing about the book is that these are merely examples of what this new big data source could do for our understanding of history and culture. The book ends with a collection of examples of graphs generated by the Ngram viewer, and they should certainly inspire social scientists, linguists, teachers, and others concerned with history, culture, and language to imagine many more applications.
The book is written in a playful style which might make it more accessible to readers, but which also might obscure the importance of the work for some readers. In the interests of full disclosure, I should admit that my degrees are in Linguistics, I teach literature, and I work in technology, so I may be the perfect audience for this book. However, I think many readers will find the story engaging, the examples eye-opening, and the prospect of using the Ngram viewer exciting.
4 of 5 people found the following review helpful
on October 28, 2013
Aiden & Michel focus primarily on the pre-database 'science', then on the story and algorithm development surrounding the Google Books project database. Google Books is the largest dataset of 'words' on the planet. What might this dataset reveal? The authors had a role in Google Books and the Ngram algorithm toolset, and they offer entertaining observations on big data's potential. Uncharted includes an introductory history of the early 20th century science of words and language. Their history includes obscure but significant curiosities such as Zipf's Law and the Haney index, among others.
Big data is described as a new kind of science. How it differs is not easy to see until you look very closely. First, classical science observes nature, asks questions and designs experiments to reveal repetition. Second, the scientific method begins with a hypothesis, then establishes theory, then proof ... unfortunately, societal trends defy predictive methods. The notion of a 'word', expression or concept, and its application over time, doesn't conform to scientific, quantitative, normative notions. So the 'science' of big data behavior, independent of fully understanding underlying causalities and correlations, will be hotly debated. The Google Books library can tease notional theory from the current and exponentially growing body of data; humankind is digitizing 5 zettabytes of data per year at this moment.
The authors use their experience from Ngram development to provide simple 2D graphical images. The basic picture is of the frequency of a notion's occurrence per billion words, or sets of words, over time.
So, as an example ... and it's just about Halloween, I can test for myself what I learned in "Uncharted" ... I can ask, "Is the world talking more about 'vampires' or 'zombies'?" I just ran an Ngram query and can empirically state that vampires have been and continue to be the more discussed of the two societal threats. Further, vampires have increased to about a 4:1 lead over zombies in the database as of 2008. I can also see that "zombies" first appeared in the 1940s; "vampires" seem to have been around for ages, or at least since before 1800. "Vampire" occurrences are rising exponentially; zombies arithmetically. We will have to wait to see if "World War Z" will affect the published world's threat assessment of zombies.
The authors expand the notion to real world considerations. Several specific 'trend sets' were very interesting to this reviewer and they've consumed some post-read thinking time. There is very much something here in the Uncharted big data world. It can be seen that after more than 200 years we seem to have converged on "chicken" vs "egg", and that's pretty big in itself.
The authors are interesting young gents who inject astute, off-the-wall commentary and deliver some very clever notions but ... the book reads as if novice writers are behind the word processor. If the authors' purpose was the novice style, they succeeded. Some might find the lilting tech style off-putting. The writing style aside, it's rare that I can read a book out of curiosity that delivers practical, ongoing self-entertainment skills. The fertilizer-for-the-brain effect of the appendix charts is worth the price of the book. You will need to decide for yourself whether you appreciate the unusual writing style.
6 of 8 people found the following review helpful
Google's Ngram Viewer is a nifty tool to track the frequencies of words and phrases over time from the enormous cache of data gathered from Google's digitization of the world's books. For those interested in this massive effort, Jeff John Roberts' "The Battle for the Books" describes the legal battle behind that effort.
Aiden and Michel in "Uncharted" turn their commendable scientific talents to charting trends in human history extracted from their analysis of these digitized books. With two hundred (200) pages of text, twenty-five (25) pages of charts and about fifty (50) pages of notes to wade into, the casual reader soon realizes that the obvious absence of an aggressive editor "trimming the fat" from this book makes for a frustrating read.
This sloppy, chatty book leaves the reader adrift, or, as the case may be, "uncharted," to determine what the authors are really writing about, or whether their research and conclusions are important and to be trusted. The notes and charts are more instructive and helpful than the text.
On the issues of large data collection and its implications, I found "Big Data" by Kenneth Cukier and Viktor Mayer-Schonberger more informative and, of course, easier to understand as a lay person.
4 of 5 people found the following review helpful
Enjoyable, interesting, fun, educational read. Great new tech story (Google Books) being applied to one of man's most enduring achievements (written language). But I hesitate to give it the four stars, because the subtitle seems to imply so much more than just the story of the Google Books Ngram Viewer (as interesting as that is).
If you are interested in searching the Google Books corpus, this is a must-have background and history book. But if you're looking for a more general understanding of big data, there are broader, better books available. Hope this helps.
1 of 1 people found the following review helpful
Everyone has heard of Big Data; huge amounts of information, usually involving computers or the Internet. Is there a cultural or historical equivalent of Big Data?
Yes, and it comes from Google's intention to digitize all the world's books (or, at least, a significant portion of them). The authors created an algorithm that would search all those books for certain words. On a chart, it will show, for instance, how many times, per million words, the name "Abraham Lincoln" was used, or "World War II." It can also be used to compare the historical use of pairs of words, like Satan/Santa, evolution/DNA, men/women, war/peace, tea/coffee or old school/new school. It can be found at books.google.com/ngrams ("Possibly the greatest time-waster in the history of the Internet." - Mother Jones magazine). Google needed convincing that this was a good idea, that it would not open them up to millions of copyright infringement lawsuits.
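The "per million words" idea above can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' or Google's implementation: the function name `per_million`, the tiny `toy` corpus, and the simple lowercase, whitespace-based word splitting are all assumptions made for the example.

```python
def per_million(books_by_year, phrase):
    """Return {year: occurrences of `phrase` per million words that year}.

    `books_by_year` maps a year to a list of book texts (hypothetical data).
    Words are split on whitespace and lowercased, a deliberate simplification.
    """
    target = phrase.lower().split()
    n = len(target)
    result = {}
    for year, texts in sorted(books_by_year.items()):
        total_words = 0
        hits = 0
        for text in texts:
            words = text.lower().split()
            total_words += len(words)
            # Count every position where the phrase appears word for word.
            hits += sum(
                1 for i in range(len(words) - n + 1) if words[i:i + n] == target
            )
        result[year] = 1_000_000 * hits / total_words if total_words else 0.0
    return result


# Made-up two-year "library", ten words per year.
toy = {
    1860: ["abraham lincoln was elected president", "the war began soon after"],
    1950: ["the korean war began", "few books mention abraham lincoln today"],
}
korean_war = per_million(toy, "Korean War")
```

Dividing by the total number of words published in each year is what lets usage be compared across decades, since far more books were printed in some years than in others.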
Using this algorithm, it is possible to look at things like historical attempts at censorship. It can range from Nazi attempts to remove Jewish artists like Marc Chagall from the German cultural landscape, to the 1950's Hollywood Blacklist. A person can also look at how long a certain word or phrase stays in the cultural memory. For instance, "Korean war" has a huge jump in usage in the 1960's, then an equally huge drop in usage soon after, down to its present level of almost nothing.
The book also looks at the evolution of the English language. If we have pairs of words like drive/drove, what happened to thrive/throve? Also, what happened to words like burnt, learnt and dwelt? It all has to do with irregular verbs, which change over time.
This is a fascinating book, but it will take some effort on the part of the reader. It's very well done, and it gives the reader the chance to do their own historical research.