- Paperback: 896 pages
- Publisher: Addison-Wesley Professional; 1 edition (September 26, 2002)
- Language: English
- ISBN-10: 0201700522
- ISBN-13: 978-0201700527
- Product Dimensions: 7.4 x 1.9 x 9 inches
- Shipping Weight: 3 pounds
- Average Customer Review: 8 customer reviews
- Amazon Best Sellers Rank: #1,356,792 in Books
Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard 1st Edition
From the Back Cover
"Rich has a clear, colloquial style that allows him to make even complex Unicode matters understandable. People dealing with Unicode will find this book a valuable resource."
--Dr. Mark Davis, President, The Unicode Consortium
As the software marketplace becomes more global in scope, programmers are recognizing the importance of the Unicode standard for engineering robust software that works across multiple regions, countries, languages, alphabets, and scripts. Unicode Demystified offers an in-depth introduction to the encoding standard and provides the tools and techniques necessary to create today's globally interoperable software systems.
An ideal complement to the specifics found in The Unicode Standard, Version 3.0 (Addison-Wesley, 2000), this practical guidebook brings the "big picture" of Unicode into practical focus for the day-to-day programmer and the internationalization specialist alike. Beginning with a structural overview of the standard and a discussion of its heritage and motivations, the book then shifts focus to the various writing systems represented by Unicode--along with the challenges associated with each. From there, the book looks at Unicode in action and presents strategies for implementing various aspects of the standard.
Topics covered include:
- The basics of Unicode--what it is and what it isn't
- The history and development of character encoding
- The architecture and salient features of Unicode, including character properties, normalization forms, and storage and serialization formats
- The character repertoire: scripts of Europe, the Middle East, Africa, Asia, and more, plus numbers, punctuation, symbols, and special characters
- Implementation techniques: conversions, searching and sorting, rendering, and editing
- Using Unicode with the Internet, programming languages, and operating systems
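Normalization forms are easiest to grasp with a concrete string. The following is a minimal sketch, not taken from the book, that assumes Java 6 or later for `java.text.Normalizer` (an API that postdates this 2002 edition, so the book's own examples may use different classes). The class name `NormalizationDemo` is hypothetical. It shows that a precomposed "é" and an "e" followed by a combining acute accent are different code point sequences but canonically equivalent, and that NFKC additionally folds a compatibility character such as the "ﬁ" ligature.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE, one precomposed code point
    static final String PRECOMPOSED = "\u00E9";
    // U+0065 "e" followed by U+0301 COMBINING ACUTE ACCENT
    static final String DECOMPOSED = "e\u0301";

    /** True if a and b are canonically equivalent, i.e. their NFC forms are identical. */
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED.equals(DECOMPOSED));              // false: different code points
        System.out.println(canonicallyEqual(PRECOMPOSED, DECOMPOSED));   // true
        System.out.println(Normalizer.normalize(PRECOMPOSED, Normalizer.Form.NFD).length()); // 2
        // U+FB01 LATIN SMALL LIGATURE FI: unchanged by NFC, folded to "fi" by NFKC
        System.out.println(Normalizer.normalize("\uFB01", Normalizer.Form.NFC).equals("\uFB01")); // true
        System.out.println(Normalizer.normalize("\uFB01", Normalizer.Form.NFKC));                 // fi
    }
}
```

Comparing normalized forms rather than raw strings is what lets a search or equality check treat the two spellings of "é" as the same text.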
With this book as a guide, programmers now have the tools necessary to understand, create, and deploy dynamic software systems across today's increasingly global marketplace.
About the Author
Richard Gillam is a senior development engineer at Trilogy, a leading developer of large-enterprise e-commerce solutions. He is a former member of IBM's Globalization Center of Competency, where he was one of the original designers of the open-source International Components for Unicode and was responsible for several of the international frameworks in the Java Class Libraries. Rich is a former columnist for C++ Report, a regular presenter at the International Unicode Conferences, and a Specialist Member of the Unicode Consortium.
Top customer reviews
Part I of this book starts with the history of character encoding standards, from Morse code to today. It then presents a thorough review of the Unicode architecture and associated standards. The information presented was mostly excellent, although I found the section describing SCSU a little bit too sketchy (and the actual code in part III not entirely satisfactory to fill in the gaps).
Part II gives an overview of the various writing systems and character ranges represented in Unicode. Even for a nontechnical audience, this part would be fascinating with all the typographical and historical trivia it presents.
Part III discusses various algorithms applicable to text processing in a Unicode context. I must admit that I found this part a bit of a letdown. Many of the algorithms are only sketched out because discussing them in detail would be beyond the scope of the book. Quite possibly, the pages dedicated to these algorithms would have been better spent presenting examples of code using the various existing APIs for handling Unicode (Java, ICU, Perl, Windows, MacOS X).
This does not take away from the fact that this is a great book that any programmer interested in Unicode should own.
Gillam is a bit dated now, but still much easier to get started with than the Unicode Standard. I read all three books, but as a programmer, I found only Gillam helpful.
(1) Unicode in essence: an architectural overview of the Unicode standard (six chapters) where you also get bits of terminology and history.
(2) Unicode in depth: A guided tour of the character repertoire (six chapters) where you get a lot about writing systems that can be represented in Unicode, and less about the Unicode characters.
(3) Unicode in action: implementing and using the Unicode standard (five chapters) where you get information aimed at computer programmers who wish to implement parts of the standard or write applications dealing with multilingual text.
Though this book is very long (~800 pages), it is still shorter and a lot clearer than the Unicode standard itself (over 1,000 pages).
Code examples are in Java, but they are not meant to be complete solutions, so there is no accompanying website or CD.
Professional programmers are the target audience of this book. The reader is faced with many topics in linguistics, history, and data structures. Readers with a computer science background will probably appreciate how classic algorithms are adapted, and how data structures are used, for character sets with far more than 256 characters.
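To make the point about data structures for large character sets concrete, here is a minimal sketch of a two-stage lookup table, a common technique for storing a per-code-point property over the full range U+0000..U+10FFFF. It is an illustration under stated assumptions, not the book's code: the class name `TwoStageTable`, the block size of 256, and the "in the Hebrew block" flag used in the demo are all hypothetical choices. Identical 256-entry blocks are stored only once, and an index array maps each block number to the offset of its values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-stage table mapping each code point to a one-byte property value. */
public class TwoStageTable {
    static final int SHIFT = 8;                  // block size = 2^8 = 256 code points
    static final int BLOCK_SIZE = 1 << SHIFT;
    static final int MASK = BLOCK_SIZE - 1;

    final int[] index;  // stage 1: block number -> offset of that block's values in data
    final byte[] data;  // stage 2: unique blocks, concatenated

    /** Builds the table from a flat array holding one value per code point. */
    TwoStageTable(byte[] flat) {
        if (flat.length % BLOCK_SIZE != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + BLOCK_SIZE);
        }
        index = new int[flat.length >> SHIFT];
        Map<ByteBuffer, Integer> offsets = new HashMap<>(); // block contents -> offset in data
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int b = 0; b < index.length; b++) {
            byte[] block = Arrays.copyOfRange(flat, b << SHIFT, (b + 1) << SHIFT);
            ByteBuffer key = ByteBuffer.wrap(block); // equals/hashCode compare contents
            Integer offset = offsets.get(key);
            if (offset == null) {                    // first time we see this block: append it
                offset = out.size();
                out.write(block, 0, block.length);
                offsets.put(key, offset);
            }
            index[b] = offset;
        }
        data = out.toByteArray();
    }

    /** Looks up a code point with two array accesses. */
    byte get(int codePoint) {
        return data[index[codePoint >> SHIFT] + (codePoint & MASK)];
    }

    public static void main(String[] args) {
        byte[] flat = new byte[0x110000];            // one entry per code point, U+0000..U+10FFFF
        for (int cp = 0x0590; cp <= 0x05FF; cp++) {
            flat[cp] = 1;                            // hypothetical flag: "in the Hebrew block"
        }
        TwoStageTable table = new TwoStageTable(flat);
        System.out.println(table.get(0x05D0));       // 1 (HEBREW LETTER ALEF)
        System.out.println(table.get(0x0041));       // 0 (LATIN CAPITAL LETTER A)
        System.out.println(table.data.length);       // 512: only two distinct blocks
    }
}
```

With only one non-trivial block, the data array shrinks from 1,114,112 bytes to 512, plus an index of 4,352 ints. Production libraries such as ICU use more elaborate variants of the same idea.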
The author states that the book is about "representing written language in a computer", which may mislead some readers: the book is about the Unicode standard. Obviously, there are other ways to represent written language than the methods described in the book. As chapter 2 teaches, there are always more (sometimes better) ways to represent your data.
Part 2 of the book does not cover every writing system in the world. A better book for that would be "The World's Writing Systems".
Part 3 is probably the most interesting and useful part for programmers (though the first part is, in my opinion, important for those who want to UNDERSTAND Unicode).
You can learn about a lot of things and skip many too (depending on your interest and need). I believe that most readers will skip most of the topics.
This is not a book to be read lightly, but it is a hell of a lot easier and more fun to read than the Unicode standard itself. Once you have read this book and gotten what you want from it, you will probably return to the Unicode standard only to check for updates, and hopefully not for clarifications.
I work in Natural Language Processing, and as a Hebrew speaker I also handle a lot of Hebrew text (almost always Hebrew mixed with other languages, e.g. documents that contain Hebrew with some English). This book helps you understand the difficulties and the current implementations, and gives you solid ground from which to start thinking about how to make things better. The current infrastructure for Hebrew is poor or imperfect, and in most cases the better solutions are proprietary. There always seem to be problems representing 'plain' text in more than one language without falling into a soup of different ways to do it. Unicode is one way to do it (arguably not the best, but it is alive and growing). I hope this book helps more people understand what they are up against, clears the fog, and leads to better implementations.
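Mixed Hebrew and English text like this is stored in logical order and has to be split into directional runs before it can be displayed. As a small, hedged illustration (not from the book), the standard `java.text.Bidi` class can report those runs; the class name `BidiRuns` and the sample string are made up for the example. Odd embedding levels are right-to-left, even levels left-to-right.

```java
import java.text.Bidi;

public class BidiRuns {
    /** Describes each directional run as "[start,limit) level n". */
    static String describeRuns(String text) {
        // Paragraph direction comes from the first strong character,
        // falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (sb.length() > 0) sb.append(", ");
            sb.append('[').append(bidi.getRunStart(i))
              .append(',').append(bidi.getRunLimit(i))
              .append(") level ").append(bidi.getRunLevel(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // English text with an embedded Hebrew word (alef, bet, gimel), in logical order
        String mixed = "abc \u05D0\u05D1\u05D2 def";
        System.out.println(describeRuns(mixed));
    }
}
```

For the sample string this should print `[0,4) level 0, [4,7) level 1, [7,11) level 0`: the spaces on either side of the Hebrew word take the paragraph's left-to-right direction, so only the three Hebrew letters form a right-to-left run.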
Gillam provides a lot of useful detail, history, and explanation of the structure of the character set, and shows how to use it. The book is a companion to the print and online resources of the Unicode standard itself, and provides the glue between many of the pieces, along with how-tos and basic data structures.
For example, the Unicode encodings UTF-8/16/32 (and the BOM) are explained very well, bidirectional text is discussed with a lot of insight, and the family of Indic scripts and their special features is presented with examples of how to encode Indic text.
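As a quick illustration of the encoding forms and the BOM, the following sketch (not from the book; the class name `UtfDemo` and its helpers are hypothetical) prints the bytes Java's standard charsets produce for one BMP character and one supplementary character. It assumes the JDK provides the "UTF-32BE" charset, which OpenJDK does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UtfDemo {
    /** Formats bytes as space-separated uppercase hex, e.g. "E2 82 AC". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes s in the given charset and returns the bytes as hex. */
    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                                // U+20AC EURO SIGN (BMP)
        String clef = new String(Character.toChars(0x1D11E));  // U+1D11E MUSICAL SYMBOL G CLEF
        Charset utf32be = Charset.forName("UTF-32BE");

        for (String s : new String[] {euro, clef}) {
            System.out.printf("U+%04X%n", s.codePointAt(0));
            System.out.println("  UTF-8:    " + encodeHex(s, StandardCharsets.UTF_8));
            System.out.println("  UTF-16BE: " + encodeHex(s, StandardCharsets.UTF_16BE));
            System.out.println("  UTF-32BE: " + encodeHex(s, utf32be));
        }

        // Java's "UTF-16" encoder prepends a big-endian BOM (FE FF).
        System.out.println("UTF-16 with BOM: " + encodeHex(euro, StandardCharsets.UTF_16));

        // When decoding "UTF-16", a leading BOM selects the byte order; FF FE means little-endian.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, (byte) 0xAC, (byte) 0x20};
        System.out.println(new String(littleEndian, StandardCharsets.UTF_16).equals(euro)); // true
    }
}
```

U+20AC encodes as `E2 82 AC` (UTF-8), `20 AC` (UTF-16BE), and `00 00 20 AC` (UTF-32BE); U+1D11E needs a surrogate pair in UTF-16 and encodes as `F0 9D 84 9E`, `D8 34 DD 1E`, and `00 01 D1 1E`. Java's "UTF-16" encoder writes `FE FF 20 AC` for the euro sign, while its UTF-8 encoder writes no BOM.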
This is almost three books in one. The first part provides a very good introduction to Unicode in general. The middle is really useful for all sorts of people, from linguists to content authors who want to understand the scripts encompassed by Unicode. And the last part is extremely helpful for programmers who want to understand how to implement many text processing techniques using Unicode.
Throughout, Rich's style is easy and enjoyable to read, and yet quickly gets to a wealth of useful information.
Great job! Highly recommended.