Unicode Explained: Internationalize Documents, Programs, and Web Sites 1st Edition
Jukka K. Korpela (Author)
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. There are hundreds of different encoding systems for mapping characters to numbers, but Unicode promises a single mapping. Unicode enables a single software product or website to be targeted across multiple platforms, languages and countries without re-engineering. It's no wonder that industry giants like Apple, Hewlett-Packard, IBM and Microsoft have all adopted Unicode.
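To make the idea of mapping characters to numbers concrete, here is a minimal Python sketch. It is not taken from the book; the sample character and the choice of encodings (Latin-1, Windows-1251, UTF-8, UTF-16) are assumptions picked purely for illustration.

```python
# Illustrative sketch: one character, one Unicode number, several byte encodings.
char = "é"  # LATIN SMALL LETTER E WITH ACUTE (sample character, chosen arbitrarily)

# Unicode assigns each character a single number, its code point.
code_point = ord(char)
print(f"U+{code_point:04X}")  # U+00E9

# How that number is stored as bytes depends on the encoding in use.
print(char.encode("latin-1"))    # b'\xe9'
print(char.encode("utf-8"))      # b'\xc3\xa9'
print(char.encode("utf-16-be"))  # b'\x00\xe9'

# Under older single-byte encodings, the same byte means different characters.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")  # 'é'
as_cp1251 = raw.decode("cp1251")   # 'й' (Cyrillic small letter short i)
print(as_latin1, as_cp1251)
```

The last two lines show the kind of ambiguity a single mapping is meant to remove: without knowing which legacy encoding was used, the byte 0xE9 cannot be interpreted reliably.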
Containing everything you need to understand Unicode, this comprehensive reference from O'Reilly takes you on a detailed tour of the complex world of characters. For starters, it explains how to identify and classify characters, whether they're common, uncommon, or exotic. It then shows you how to type them, utilize their properties, and process character data in a robust manner.
The book is broken up into three distinct parts. The first few chapters provide a tutorial presentation of Unicode and character data. They give you a firm grasp of the terminology you need to refer to the various components, including character sets, fonts and encodings, glyphs and character repertoires.
The middle section offers more detailed information about using Unicode and other character codes. It explains the principles and methods of defining character codes, describes some of the widely used codes, and presents code conversion techniques. It also discusses properties of characters, collation and sorting, line breaking rules and Unicode encodings. The final four chapters cover more advanced material, such as programming to support Unicode.
You simply can't afford to be without the nuggets of valuable information detailed in Unicode Explained.
Editorial Reviews
About the Author
Jukka Korpela is a consultant who specializes in character codes, localization, orthography, usability, and accessibility. After graduating from Helsinki University of Technology, he taught these subjects in the university's Computer Science department and worked on localization and accessibility issues at TIEKE before becoming a full-time author and consultant. His previous books on CSS and XHTML were published in Finland by Docendo press.
Product details
- Publisher : O'Reilly Media; 1st edition (July 11, 2006)
- Language : English
- Paperback : 680 pages
- ISBN-10 : 059610121X
- ISBN-13 : 978-0596101213
- Item Weight : 2.13 pounds
- Dimensions : 7 x 1.26 x 9.19 inches
- Best Sellers Rank: #1,391,289 in Books
- #29 in Unicode Encoding Standard
- #91 in XML Programming (Books)
- #587 in User Experience & Website Usability
About the author

I'm an IT generalist and specialist with strong interest in humanities--and humans. This explains why I have focused on character codes, localization, and accessibility.
After an M.Sc. degree in mathematics, I worked in IT services, user support, software development, and teaching at Helsinki University of Technology for over 25 years and also started writing books. Later, I have worked as an author and consultant, in addition to employment periods at the Finnish Information Society Development Center (IT standardization) and at the Blue1 airline company (multilingual online sales and web site). My books in Finnish deal with programming languages (Fortran, Pascal, C), Unix, web publishing, HTML, CSS, data security, office document design and style, and web typography.
My motto is “Docendo disco, scribendo cogito”—by teaching I learn, by writing I think. I love to get deeper insight and wider understanding by formulating things for others, in a manner that is both understandable and informative, correct, and useful—a real challenge.
I’m an advocate of standardization, but with realism learned the hard way. I always try to promote relevant standards and consensus-based recommendations, but I also try to point out their shortcomings and problems in acceptance and in implementations.
You might find me on various discussion forums on the Internet, but I’m partly an old-timer who loved the good old Usenet, where I frequented many web-related groups. I’m adapting to the new modes of communication, which are partly much more effective (like moderated e-mail lists and StackOverflow).
Customer reviews
Top reviews from the United States
I realized that the whole subject is a lot more complicated than I initially thought, and the number of questions that needed answers before I could move forward with what I was doing increased significantly. I was finding material on the web, a little bit here and a little bit there, and one day I had had enough, because progress was slow.
I stumbled across this book one day via a Google search, which returned passages from it in Google Book Search results. I found a very good answer to one of my questions, along with answers to some other questions that had been lying around unanswered. I checked the index of the book to see what subjects it covers and realized that it covers pretty much all of them. So I went ahead to Amazon and bought it right there and then.
I am glad to this day that I found it, and I can recommend it to anybody who has little or no knowledge of Unicode and struggles to get a grip on all those standards for data encoding. They make it hard to keep the data within XML and text files intact across platforms and to prevent your XML-based application or tool from breaking because of illegal data in your content.
Its side notes are also interesting, explaining things like Arabic right-to-left writing with its contextual characters in four different forms, or how they mused over having one common Chinese Han character shared by Japanese, Korean, and Vietnamese versus including a separate version of each character in each language's own range.
A focus of the book is on the problems that Unicode helps to address and this is important because Unicode is a tool rather than a full solution.
As a software developer inexperienced with support for multiple human languages, I appreciate this insight, which helps me avoid the known problems.
¶ I had another Unicode book on my desk for a long time. Hardbound, thick, impressive. Never found a way to derive useful information from it however. This book is different.
¶ I had high expectations for this book because the author, Jukka Korpela, is one of those erudite and patient people who work hard to raise the signal to noise ratio in Internet newsgroups and other forums. I certainly have quite a few posts from "Yucca" in my working archive of Web tips.
¶ Working with Web pages and applications, one can run into practical problems with text display. For Americans especially, who often use default software configurations, some of the problems of displaying content in other languages can seem intractable. They are not, of course, but a bit of help from workers in the rest of the world can be a real lift. After all, they deal with these issues in a practical way more often.
¶ I had a nasty run-in (also known as "learning experience") with browser display issues when my "CSS Cheatsheet" rose in popularity in Google and other search engines. I decided to create a page quoting comments from linking sites in their native languages. Everything was fine until I got to Russian. I felt as if I were up against a conspiracy of browsers, tools, operating systems and even particular custom configurations!
¶ If you are like me and your focus is practical, I recommend:
- The first two chapters in Part 1: Characters as Data; Writing Characters
- All the advanced topics in Part 3: these 5 chapters cover character issues involved with programming and developing in the Internet environment.
¶ Overall, this book is well-organized and quite readable, with lots of relevant illustrations. Important material is repeated and summarized for greater clarity. The author also used lots of examples from Windows programs that are familiar to many of us. This is a real plus.
The only thing disappointing about this book is that all of his examples and screen shots are for and from Windows. A reader could come away with the feeling that Mac OS X and Linux don't have as much support for Unicode as Windows which, of course, is not the case at all. The least he could have done is to mention and give screenshots of Linux's "Character Map" app and Mac OS X's built-in "Character Palette", both of which are pretty much just like the Windows "Character Map" app.
I'm surprised O'Reilly allowed a book about such a platform-neutral subject to be so Windows-centric. Hopefully they can hire someone to add Linux and Mac OS X examples into the second edition.
Top reviews from other countries
I had a tough technical problem to solve and didn't even have the correct vocabulary to describe it.
Of the three books I bought to help, this is the one I turned to most frequently.
The historical element is interesting, but the technical sections really aid in understanding the various flavours of Unicode and the benefits that can be had from a successful implementation.
This is now part of my reference library for technical issues, and I'm frequently asked to contribute to character-based discussions, thanks to my newfound understanding and the assistance I can now offer.
Well worth the time and effort to read it.






