The Unicode Standard, Version 3.0
Expanded implementation guidelines by experts in global software design: normalization, sorting and searching, case mapping, compression, language tagging, boundaries (characters, words, lines, and sentences), rendering of non-spacing marks, transcoding to other character sets, handling unknown characters, surrogate pairs, numbers, editing and selection, keyboard input, and more.
- ISBN-10: 0201616335
- ISBN-13: 978-0201616330
- Publisher: Addison-Wesley Professional
- Publication date: February 16, 2000
- Language: English
- Dimensions: 8.76 x 1.88 x 11.18 inches
- Print length: 1072 pages
Editorial Reviews
From the Back Cover
Unicode
- Characters for all the languages of the world
- The standard for the new millennium
- Required for XML and the Internet
- The basis for modern software standards and products
- The official way to implement ISO/IEC 10646
- The key to global interoperability
The Unicode Standard, Version 3.0
The authoritative, technical guide to the creation of software for worldwide use.
Detailed specifications for Unicode:
- Structure, conformance, encoding forms, character properties, semantics, equivalence, combining characters, logical ordering, conversion, allocation, big/little endian usage, Korean syllable formation, control characters, case mappings, numeric values, mathematical properties, writing directions (Arabic, Japanese, English, and so on), character shaping (Arabic, Devanagari, Tamil, and so on)
Expanded implementation guidelines by experts in global software design:
- Normalization, sorting and searching, case mapping, compression, language tagging, boundaries (characters, words, lines, and sentences), rendering of non-spacing marks, transcoding to other character sets, handling unknown characters, surrogate pairs, numbers, editing and selection, keyboard input, and more
Comprehensive charts, references, glossary, and indexes:
- Codes, names, appearances, aliases, cross-references, equivalences, radical-stroke ideographic index, Shift-JIS index, and more
CD-ROM
The comprehensive Unicode Character Database for:
- Character codes, names, properties, decompositions, upper-, lower-, and title cases, normalizations, shaping
International, national, and vendor character mappings for:
- Western European, Japanese, Chinese, Korean, Greek, Russian, and others
- Windows, Macintosh, Unix, and Linux
Unicode Technical Reports that extend the standard for:
- Sorting, displaying, normalizing, linebreaking, compression, serialization, regular expressions, CR/LF, XML, case mappings, and more
About the Author
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote the use of the Unicode Standard. The membership of the Consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The Unicode Consortium actively cooperates with many of the leading standards development organizations, including ISO/IEC JTC1, W3C, IETF, and ECMA.
Excerpt. © Reprinted by permission. All rights reserved.
Version 3.0 expands on material from Versions 2.0 and 2.1 and supersedes all other previous versions. The previous versions of the Unicode Standard are:
- The Unicode Standard, Version 1.0, Volume 1 (1991)
- The Unicode Standard, Version 1.0, Volume 2 (1992)
- The Unicode Standard, Version 1.1, Unicode Technical Report #4 (1993)
- The Unicode Standard, Version 2.0 (1996)
- The Unicode Standard, Version 2.1, Unicode Technical Report #8 (1998)
Major additions to Version 3.0 include:
- conformance rules for transformation formats
- new scripts including Ethiopic, Khmer, Mongolian, Myanmar, and Sinhala
- restructured and enhanced character block descriptions
- clarified bidirectional algorithm
- updated implementation guidelines
- a Shift-JIS index
The Unicode Standard maintains consistency with the international standard ISO/IEC 10646. Version 3.0 of the Unicode Standard corresponds to ISO/IEC 10646-1:2000.
0.1 About the Unicode Standard
This book defines Version 3.0 of the Unicode Standard. The general principles and architecture of the Unicode Standard, requirements for conformance, and guidelines for implementers precede the actual coding information. Useful ancillary information is given in the appendices. The accompanying CD-ROM contains tables of use to implementers and all technical reports published to date.
Concepts, Architecture, Conformance, and Guidelines
The first five chapters of Version 3.0 introduce the Unicode Standard and provide the information an engineer needs to produce a conforming implementation. Basic text processing, working with combining marks, encoding forms, and doing bidirectional text layout are all described. A special chapter on implementation guidelines answers many common questions that arise when implementing Unicode.
- Chapter 1 introduces the standard's basic concepts, design basis, and coverage, and discusses basic text handling requirements.
- Chapter 2 sets forth the fundamental principles underlying the Unicode Standard and covers specific topics such as text processes, overall character properties, and the use of combining marks.
- Chapter 3 constitutes the formal statement of conformance. This chapter also presents the normative algorithms for three processes: the canonical ordering of combining marks, the encoding of Korean Hangul syllables by conjoining jamo, and the formatting of bidirectional text.
- Chapter 4 describes character properties in detail, both normative (required) and informative. Tables giving additional character property information appear on the CD-ROM.
- Chapter 5 discusses implementation issues, including compression, strategies for dealing with unknown and unsupported characters, and transcoding to other standards.
Character Block Descriptions
Chapters 6 through 13 contain the character block descriptions that give basic information about each script or collection and may discuss specific characters or pertinent layout information.
- Chapter 6 describes the general punctuation characters.
- Chapter 7 presents the European Alphabetic scripts, including Latin, Greek, Cyrillic, Armenian, Georgian, Runic, Ogham, and associated combining marks.
- Chapter 8 presents the Middle Eastern, right-to-left scripts: Hebrew, Arabic, Syriac, and Thaana.
- Chapter 9 covers the South and Southeast Asian scripts, including Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Tibetan, Thai, Lao, Khmer, and Myanmar.
- Chapter 10 presents the East Asian scripts, including Han, Hiragana, Katakana, Hangul, Bopomofo, and Yi.
- Chapter 11 presents other scripts, including Ethiopic, Cherokee, Canadian Aboriginal Syllabics, and Mongolian.
- Chapter 12 presents symbols, including currency, letterlike and technical symbols, and mathematical operators.
- Chapter 13 describes special characters such as the Private Use Area, surrogates, and specials.
Charts and Index
The next two chapters document the Unicode Standard's character code assignments, their names and important descriptive information, and Han indices that aid in locating specific ideographs encoded in Unicode.
- Chapter 14 gives the code charts and the Character Names List. The code charts contain the normative character encoding assignments, and the names list contains normative information as well as useful cross references and informational notes.
- Chapter 15 provides a radical-stroke index to East Asian ideographs, as well as a Shift-JIS index.
Appendices and Tables
The appendices contain detailed background information on important topics: character encoding systems, submission of proposals, and the history of Unicode and its relationship to ISO/IEC 10646.
- Appendix A describes the history of Han Unification in the Unicode Standard.
- Appendix B gives instructions on how to submit characters for consideration as additions to the Unicode Standard.
- Appendix C details the relationship between the Unicode Standard and ISO/IEC 10646.
- Appendix D lists the changes to the Unicode Standard since Version 2.0.
The appendices are followed by a glossary of terms, a bibliography, and two indices: an index to Unicode characters and an index to the text of Chapters 1 through 15.
The Unicode Character Database and Technical Reports
The Unicode Character Database is the name for a collection of files that contain character code values, character names, and character property data. It is described more fully in the file UnicodeCharacterDatabase.html. Version 3.0.0 of the database is provided on the accompanying CD-ROM. Updates and revisions will be made available online. See http://www.unicode.org/unicode/standard/versions/ for information on the latest available version.
The following Unicode Technical Reports are formally part of this standard:
- UTR #11: East Asian Width, Version 5.0
- UTR #13: Unicode Newline Guidelines, Version 5.0
- UTR #14: Line Breaking Properties, Version 6.0
- UTR #15: Unicode Normalization Forms, Version 18.0
The latest available version of these reports is provided on the CD-ROM. Updates and revisions will be made available online. For information on the latest available version, see http://www.unicode.org/unicode/standard/versions/.
On the CD-ROM
The CD-ROM contains the Unicode Character Database, which gives character codes, character names, character properties, and decompositions for decomposable or compatibility characters. In addition to the Unicode Character Database and Unicode Technical Reports that are part of this standard, the CD-ROM also contains additional technical reports (covering topics such as compression, collation, and transformation formats), as well as property-based mapping tables (for example, tables for case) and transcoding tables for international, national, and industry character sets (including the Han cross-reference table). For the complete contents of the CD-ROM, see its READ ME file. Please consult the Unicode Consortium's online resources (see Section 0.3, Resources) to obtain the most up-to-date versions of the materials on the CD-ROM.
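As a rough illustration of the kind of data the Unicode Character Database carries (names, properties, decompositions, case mappings), the sketch below queries Python's standard unicodedata module. This is an assumption-laden stand-in: that module ships a much later database version than the 3.0.0 files on the CD-ROM, and the helper name describe is hypothetical.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Collect a few character database fields for one character (hypothetical helper)."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "combining": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        "upper": ch.upper(),
    }

# LATIN SMALL LETTER E WITH ACUTE has a canonical decomposition.
info = describe("\u00e9")
print(info)

# Normalization forms (the subject of UTR #15): decompose, then recompose.
decomposed = unicodedata.normalize("NFD", "\u00e9")
recomposed = unicodedata.normalize("NFC", decomposed)
print([f"U+{ord(c):04X}" for c in decomposed])
```

The decomposition field is the same space-separated list of hexadecimal code values that the database files use, so it can be compared directly with the output of the NFD step.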
0.2 Notational Conventions
Throughout this book, certain typographic conventions are used. In running text, an individual Unicode value is expressed as U+nnnn, where nnnn is a four-digit number in hexadecimal notation, using the digits 0-9 and the letters A-F (for 10 through 15, respectively).
- U+0416 is the Unicode value for the character named CYRILLIC CAPITAL LETTER ZHE.
In tables, the U+ may be omitted for brevity. A range of Unicode values is expressed as U+xxxx→U+yyyy, or U+xxxx–U+yyyy, or xxxx..yyyy, where xxxx and yyyy are the first and last Unicode values in the range, and the arrow, long dash, or two dots indicate a contiguous range inclusive of the endpoints.
- The range U+0900→U+097F contains 128 character values.
All Unicode characters have unique names, which are identical to those of the English-language edition of International Standard ISO/IEC 10646. Unicode character names contain only uppercase Latin letters A through Z, digits, space, and hyphen-minus; this convention makes it easy to generate computer-language identifiers automatically from the names. Unified East Asian ideographs are named CJK UNIFIED IDEOGRAPH-X, where X is replaced with the hexadecimal Unicode value (for example, CJK UNIFIED IDEOGRAPH-4E00). The names of Hangul syllables are generated algorithmically; for details, see Hangul Syllable Names in Section 3.11, Conjoining Jamo Behavior.
In running text, a formal Unicode name is shown in small capitals (for example, GREEK SMALL LETTER MU), and alternative names (aliases) appear in italics (for example, umlaut). Italics are also used to refer to a text element that is not explicitly encoded (for example, pasekh alef) or to set off a foreign word (for example, the Welsh word ynghyd). Phonemic transcriptions are shown between slashes, as in Khmer /khnyom/. The symbols used in the character names list are described at the beginning of Chapter 14, Code Charts.
In the text of this book, the word "Unicode" when used alone as a noun refers to the Unicode Standard.
In this book, unambiguous dates of the current common era, such as 1999, are unlabeled. In cases of ambiguity, CE is used. Dates before the common era are labeled with BCE.
Extended BNF
The Unicode Standard and technical reports use an extended BNF format for describing syntax. As different conventions are used for BNF, Table 0-1, Extended BNF, lists the notation used here.
A sequence of characters is sometimes listed in text with angle brackets.
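Stepping back to the U+nnnn, range, and naming conventions described earlier in this section, the following minimal Python sketch checks the two worked examples and shows one way names might be turned into identifiers. The helper names (u_plus, range_size, to_identifier) are hypothetical, and the character names come from Python's unicodedata module, which tracks a later version of the Unicode Character Database than 3.0.0.

```python
import unicodedata

def u_plus(ch: str) -> str:
    """Format a character's code value in U+nnnn notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

def range_size(first: int, last: int) -> int:
    """Number of code values in a contiguous range, inclusive of both endpoints."""
    return last - first + 1

def to_identifier(name: str) -> str:
    """Turn a character name into an identifier by replacing space and hyphen-minus."""
    return name.replace(" ", "_").replace("-", "_")

zhe = "\u0416"
print(u_plus(zhe), unicodedata.name(zhe))

# The range U+0900..U+097F, counted inclusively.
print(range_size(0x0900, 0x097F))

# Names use only A-Z, digits, space, and hyphen-minus, so two replacements suffice.
print(to_identifier(unicodedata.name("\u4e00")))
```

Underscore replacement is only one possible mapping; it is an illustrative choice, not something the standard prescribes.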
Table 0-1. Extended BNF
Symbols        Meaning
x := ...       production rule
x y            the sequence consisting of x then y
x*             zero or more occurrences of x
x?             zero or one occurrence of x
x+             one or more occurrences of x
x | y          either x or y
( x )          for grouping
x || y         equivalent to (x | y | (x y))
{ x }          equivalent to (x)?
"abc"          string literals ( "_" is sometimes used to denote space for clarity)
'abc'          string literals (alternative form)
\u1234         Unicode characters within string literals or character classes
\v00101234     Unicode scalar values within string literals or character classes
U+HHHH         Unicode character literal: equivalent to '\uHHHH'
U-HHHHHHHH     Unicode character literal: equivalent to '\vHHHHHHHH'
charClass      character class (syntax below)
Character Classes. A character class is constructed from one or two base sets. It is either a single base set, the negation of a base set, or the (set) difference between two base sets. The base sets themselves are bounded by brackets, and contain lists of characters, ranges of characters, general categories, or negations of general categories. The syntax follows:
charClass := baseSet | '¬' baseSet | baseSet '-' baseSet
baseSet := '[' item (','? item)* ']'
item := char | char '-' char | '{' '¬'? category '}'
General categories, such as {Uppercase Letter} for uppercase letters, are defined in Chapter 4, Character Properties. Main categories such as {Mark} are the equivalent of a list of multiple subcategories: {Non-Spacing Mark}{Spacing Combining Mark}{Enclosing Mark}. Examples are found in Table 0-2, Character Class Examples.
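The three charClass forms above (a base set, its negation, and the difference of two base sets) can be sketched as composable predicates. This is an illustrative model rather than a parser for the notation: every function name below is hypothetical, and general categories are looked up through Python's unicodedata module, whose data is newer than Version 3.0.

```python
import unicodedata
from typing import Callable

Pred = Callable[[str], bool]

def char(c: str) -> Pred:
    """item := char"""
    return lambda ch: ch == c

def char_range(first: str, last: str) -> Pred:
    """item := char '-' char (inclusive of both endpoints)"""
    return lambda ch: first <= ch <= last

def category(prefix: str, negate: bool = False) -> Pred:
    """item := '{' '¬'? category '}'; a main category such as "M" covers Mn, Mc, Me."""
    return lambda ch: unicodedata.category(ch).startswith(prefix) != negate

def base_set(*items: Pred) -> Pred:
    """baseSet: matches if any listed item matches."""
    return lambda ch: any(item(ch) for item in items)

def negation(base: Pred) -> Pred:
    """charClass := '¬' baseSet"""
    return lambda ch: not base(ch)

def difference(left: Pred, right: Pred) -> Pred:
    """charClass := baseSet '-' baseSet"""
    return lambda ch: left(ch) and not right(ch)

# English lowercase letters except for c.
lower_except_c = difference(base_set(char_range("a", "z")), base_set(char("c")))
# Hexadecimal digits.
hex_digit = base_set(char_range("0", "9"), char_range("A", "F"), char_range("a", "f"))
# All assigned characters, written with a negated category.
assigned = base_set(category("Cn", negate=True))
# Assigned characters in the Arabic block.
assigned_arabic = difference(base_set(char_range("\u0600", "\u06FF")), base_set(category("Cn")))

print(lower_except_c("b"), lower_except_c("c"), hex_digit("F"), hex_digit("g"))
```

Predicates were chosen over materialized sets so that classes such as "all assigned characters" never have to enumerate the whole code space.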
Table 0-2. Character Class Examples
Syntax                          Matches
[a-z]                           English lowercase letters
[a-z]-[c]                       English lowercase letters except for c
¬[c]                            all characters but c
[0-9]                           European decimal digits
[\u0030-\u0039]                 (same as above, using Unicode escapes)
[0-9, A-F, a-f]                 hexadecimal digits
[{Letter},{Non-Spacing Mark}]   all letters and non-spacing marks
[{L},{Mn}]                      (same as above, using abbreviated notation)
[{¬Cn}]                         all assigned Unicode characters
[\u0600-\u06FF]-[{Cn}]          all assigned Arabic characters
Operators
Operators used in this standard are listed in Table 0-3, Operators.
Table 0-3. Operators
~    allow break here (see Section 5.15, Locating Text Element Boundaries)
x    do not allow a break here
→    is transformed to, or behaves like
/    integer division (rounded down)
%    modulo operation; equivalent to the integer remainder for positive numbers
0.3 Resources
The Unicode Consortium provides a number of online resources for obtaining information and data about the Unicode Standard, as well as updates and corrigenda. They are listed below.
Unicode Web Site
- www.unicode.org
Unicode Anonymous FTP Site
- ftp://ftp.unicode.org
Unicode Public Mailing List
- info@unicode.org
- Postal address: P.O. Box 391476, Mountain View, CA 94039-1476 USA
Please check the Web site for up-to-date contact information, including telephone, fax, and courier delivery address.
Product details
- Publisher : Addison-Wesley Professional (February 16, 2000)
- Language : English
- Hardcover : 1072 pages
- ISBN-10 : 0201616335
- ISBN-13 : 978-0201616330
- Item Weight : 5.41 pounds
- Dimensions : 8.76 x 1.88 x 11.18 inches
- Best Sellers Rank: #6,682,968 in Books
- #39 in Unicode Encoding Standard
- #2,909 in Software Design & Engineering
- #8,493 in Computer Programming Languages
Customer reviews
Top reviews from the United States
At 1040 large (8.5 x 11) pages, it is the ultimate guide to Unicode, with information on scripts and glyphs I had no idea even existed.
However, if you are just getting started with Unicode, I would recommend you get Unicode: A Primer, written by Tony Graham, from M&T Books. If you understand or feel you are starting to understand Unicode, then The Unicode Standard, Version 3.0 is the best comprehensive reference on the subject out today.
Central to the book, taking up the larger part of it, are the tables of the characters themselves, printed large with annotations and cross-references. If you enjoy the lure of strange symbols and curious writing systems then browsing these will occupy delightful hours.
For the Latin alphabet alone there are pages of accented letters and extended Latin alphabet characters used in particular languages or places or traditions: Pan-Turkic "oi", African clicks and other African sounds, obsolete letters from Old English and Old Norse, an "ou" digraph used only in Huron/Algonquin languages in Quebec, and many others, particularly those used for phonetic/phonemic transcriptions.
The Greek character set includes archaic letters and additional letters used in Coptic.
Character sets carried over from previous editions with additions and corrections are Cyrillic (with many national characters), Armenian, Georgian, Hebrew, Arabic (again many national and dialect characters), the most common Indic scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam), Tibetan, Thai, Lao, Hangul, Bopomofo, Japanese Katakana and Hiragana, capped by the enormous Han character set containing over 27,000 of the most commonly used ideographs in Chinese/Japanese/Korean writing. Then there are the symbols: mathematical/logical (including lots of arrows), technical, geometrical, and pictographic. You'll find astrological/zodiacal signs, chess pieces, I-Ching trigrams, Roman numerals not commonly known, and much more.
Scripts appearing for the first time in this release are Syriac, Ethiopic, Unified Canadian Aboriginal Syllabics, Cherokee, Runic, Ogham, Yi, Mongolian, Sinhala, Thaana, Khmer, and Myanmar, along with complete Braille patterns and keyboard character sets. And yes, there are public domain/shareware fonts available on the web that support these with their new Unicode values.
There are very good (and not always brief) descriptions of the various scripts and of the special symbol sets. Rounding out the book are some involved, turgid (necessarily so) technical articles on composition, character properties, implementation guidelines, and combining characters, providing rules to use the character properties tables on the CD that accompanies the book. After all, this is the complete official, definitive Unicode standard.
Of course this version, 3.0, is already out of date. But updates and corrections are easily available from the official Unicode website, where data for 3.1 Beta appears as I write this. My book bulges with interleaved additions and changes. And that's very good. Many standards have died or been superseded because the organizations behind them did not keep up with users' needs or the information was not easily accessible.
Caveats?
The notes on actual uses of the characters could be more extensive, particularly on Latin extended characters. More variants of some glyphs should be shown, as in previous editions, if only in the notations.
Some character names are clumsy or inaccurate (occasionally noted in the book), because they must remain compatible with ISO/IEC 10646 and with earlier versions of the Unicode Standard. For example, many character names begin with "LEFT" rather than "OPENING", or "RIGHT" rather than "CLOSING", even though the same character code is used for the mirrored version of the character in right-to-left scripts, where "LEFT" and "RIGHT" become incorrect. And sample this humorous quotation from page 298: "Despite its name, U+2118 SCRIPT CAPITAL P is neither script nor capital--it is uniquely the Weierstrass elliptic function derived from a calligraphic lowercase p."
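The naming point can be checked against current character data. The short Python sketch below assumes the standard unicodedata module (a later database than the one shipped with Version 3.0) and simply prints the formal name and the mirrored property for a few bracket characters, plus the name of the Weierstrass symbol.

```python
import unicodedata

# Opening brackets keep "LEFT" in their names even though they are mirrored
# when they appear in right-to-left text.
brackets = {ch: (unicodedata.name(ch), unicodedata.mirrored(ch)) for ch in "([{"}
for ch, (name, mirrored) in brackets.items():
    print(ch, name, mirrored)

# The Weierstrass symbol keeps its historical formal name.
weierstrass_name = unicodedata.name("\u2118")
print(weierstrass_name)
```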
This book is essential for software engineers, at least for the next ten years or so. All programmers should understand characters, and UNICODE is the best we have for now. Even if you don't need it in your personal library, you need it in your company or school library.
The standard is flawed, as all real standards are, but it is a functioning standard, and it should be sufficient for many purposes for the near future.
The book itself is fairly well laid out, contains an introduction to character handling problems and methods for most of the major languages in use in our present world as well as tables of basic images for all code points. Be aware that these are _only_ basic images. For most internationalization purposes, be prepared for more research. (And please share your results.)
**** Finally, UNICODE is _not_ a 16 bit code. ****
(This is well explained in the book.) It just turned out that there really are over 50,000 Han characters. (Mojikyo records more than 90,000.) UNICODE can be encoded in an eight-bit or 16-bit expanding method or a 32-bit non-expanding method. The expanding methods can be _cleanly_ parsed, frontwards, backwards, and from the middle, which is a significant improvement over previous methods.
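As a rough illustration of that point, the Python sketch below compares the three encodings and shows how an 8-bit stream can be resynchronized from the middle. It assumes the reviewer means what Python calls UTF-8, UTF-16, and UTF-32; the helper names encoded_sizes and utf8_char_start are hypothetical.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in the 8-bit, 16-bit, and 32-bit encodings (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }

def utf8_char_start(data: bytes, index: int) -> int:
    """Back up from an arbitrary byte offset to the first byte of its UTF-8 character."""
    # Continuation bytes always have the bit pattern 10xxxxxx.
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index

# Characters that take 1, 2, 3, and 4 bytes in UTF-8; the last one lies
# beyond U+FFFF and needs a surrogate pair in UTF-16.
sample = "A\u0416\u4e00\U00020000"
data = sample.encode("utf-8")

print(encoded_sizes(sample))
print(utf8_char_start(data, 5))  # offset 5 is the last byte of the third character
```

Because lead bytes and continuation bytes never share a bit pattern, the backward scan needs at most three steps and no knowledge of the preceding text.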
Some of the material in the book is available at the UNICODE consortium's site, but the book is easier to read anyway. One complaint I have about the included CD is that the music track gets in the way of reading the transform files on my iBook.
