What was your goal in writing Code? It's a pretty big departure from the books with which
you've built your reputation as a programmer and author, the hefty how-to manuals on
Microsoft Windows programming. What inspired you to write this book?
Charles Petzold: Code
was not written for programmers. It is a unique tour
through the digital technologies that make our computers work, starting with
Morse code and the telegraph, and I wrote it for people who aren't necessarily programmers.
I first conceived of Code
in 1987 while writing the PC
Tutor column for PC Magazine. I realized it might be possible to
demonstrate how computers work starting from very simple ideas, and for almost
a decade I let the book take form in my head before writing a word. Code
was the most difficult writing I've ever done. But I wanted to do something
more challenging than revising Programming Windows for the nth time. I also wanted to write a book that
might have a shelf life of more than two years, and which my family and
nonprogrammer friends might want to read.
Some of the best parts of Code
are the portions in which
you explain how certain common types of coding (such as UPC bar codes and ISO
grids on rolls of film) work. Can you decode some other everyday representation
of data for us--perhaps those two-dimensional, scannable tags that UPS has been
using for a while now, or ISBNs on books?
Although digital codes are used in many modern household
appliances, they aren't often in clear view. That's why Code
is subtitled "The Hidden
Language of Computer Hardware and Software." I
haven't explored those UPS tags, however. From what I understand, each one is
unique and they include some encrypted data that allows them to be locally
printed by the sender of the parcel. What's interesting is that it's very
obviously a binary code and you can actually determine the number of bits in
the grid. That's not quite obvious with bar codes. The ISBN bar codes on books
are very similar to the UPC bar codes, so perhaps the exercise of decoding them
is best left to a reader of Code.
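For readers who want a head start, one part of a UPC symbol can be decoded without looking at the bars at all: the final printed digit is a check digit computed from the other eleven. This is the standard UPC-A rule rather than an excerpt from the book, and the sample number below is just a convenient example.

    #include <stdio.h>

    /* UPC-A check digit: digits in the odd-numbered positions of the first
       eleven count three times, digits in even-numbered positions count once,
       and the twelfth digit brings the total up to a multiple of 10. */
    static int upc_check_digit(const char *first11)
    {
        int sum = 0;
        for (int i = 0; i < 11; i++) {
            int digit = first11[i] - '0';
            sum += (i % 2 == 0) ? 3 * digit : digit;  /* i == 0 is position 1 */
        }
        return (10 - sum % 10) % 10;
    }

    int main(void)
    {
        printf("check digit: %d\n", upc_check_digit("03600029145"));  /* prints 2 */
        return 0;
    }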
You talk a lot about ASCII code in your book. How is the newer
Unicode standard different? How do the bits in a Unicode character break down?
Unicode is something I feel very strongly about, and I discuss it
at length in Programming Windows
as well as in Code. Most digital representations of text use a variation of a 7-bit code called
ASCII--the American Standard Code for Information Interchange. ASCII is truly
an American standard and was never intended to represent the accented
letters used in many European languages, to say nothing of non-Roman alphabets
such as Hebrew, Greek, and Cyrillic, or the thousands of ideographs used in
Chinese, Japanese, and Korean. Over the years, many extensions to ASCII have
been developed to compensate for these deficiencies, unfortunately all of them incompatible with one another.
Unicode is a single unambiguous 16-bit text encoding system with
the potential of representing 65,536 characters, including all the characters
from all the world's written languages that are likely to be used in computer
communications. The universal adoption of Unicode is an important step in
internationalizing computer use, but that's a big job because ASCII is probably
the most entrenched computer standard of them all.
The individual bits of Unicode don't have any independent meaning.
Instead, different alphabets and collections of ideographs are assigned ranges
of codes. Codes 3840 through 4025, for example, represent characters used in Tibetan.
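To make that structure concrete (an illustration of the point above, not an excerpt from Code; the letter "A" and the Tibetan letter ka are simply convenient examples): the 128 ASCII codes occupy the first 128 Unicode values, while other scripts get their own contiguous ranges, such as the one just mentioned.

    #include <stdio.h>

    int main(void)
    {
        int ascii_a    = 'A';     /* 65: the same value in ASCII and in Unicode (U+0041) */
        int tibetan_ka = 0x0F40;  /* 3904: inside the 3840-4025 range mentioned above    */

        printf("'A' -> %d (U+%04X), fits in 7 bits: %s\n",
               ascii_a, ascii_a, ascii_a < 128 ? "yes" : "no");
        printf("ka  -> %d (U+%04X), fits in 7 bits: %s\n",
               tibetan_ka, tibetan_ka, tibetan_ka < 128 ? "yes" : "no");
        return 0;
    }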
Code has a lot to do with representations of data and information. What is your
characterization of the difference between data and information?
Claude Shannon--the inventor of information theory, and one of the
important historical figures who makes an appearance in Code--did not make
a distinction between information and data. To Shannon (in The Mathematical
Theory of Communication, 1949), information represents a differentiation
between two or more possibilities, and thus can be conveyed with one or more bits.
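To put a number on that idea (my arithmetic, not a passage from Shannon): distinguishing among n equally likely possibilities takes log2(n) bits, rounded up to a whole number.

    #include <math.h>
    #include <stdio.h>

    /* Bits needed to distinguish among n equally likely possibilities
       (link with -lm for log2). */
    static int bits_needed(int n)
    {
        return (n <= 1) ? 0 : (int)ceil(log2((double)n));
    }

    int main(void)
    {
        printf("2 possibilities -> %d bit\n",  bits_needed(2));    /* 1 */
        printf("26 letters      -> %d bits\n", bits_needed(26));   /* 5 */
        printf("128 ASCII codes -> %d bits\n", bits_needed(128));  /* 7 */
        return 0;
    }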
Today, however, particularly influenced by books such as Clifford Stoll's
Silicon Snake Oil (1995) and David Shenk's Data Smog (1997), we tend to say
that data is the raw stuff and information is the processed stuff. Information
makes sense of data. Information draws conclusions from data. Information has
utility. One of the problems of
the mass media and the Internet, such authors say, is that we get too much data
and not enough real information.
But I don't find this distinction particularly useful in exploring
the ways in which data (or information) is digitally encoded, which is what
Code is all about. It doesn't really matter whether the information (or
data) is useful or not. And it's really just a matter of perspective. One
person's data is another person's information--particularly if it's the
person's job to turn data into information for other people.
What about context? This is what XML namespaces are all about. If
I'm an admissions officer at a university, a value called "yield" is the
percentage of accepted students who decide to come to my university. If I'm an
atomic physicist, "yield" is the explosive power of an unregulated nuclear
reaction. The words are the same--how do different representation schemes deal
with context-dependent differences in meaning?
That ambiguity is a potential problem in XML. The more ambiguous
XML is, the less useful it will be.
But in general, bits never really tell you anything about
themselves. One of the most common bugs in computer programming is called a
signed/unsigned mismatch. Often in such cases a negative number is stored in
two's-complement format (discussed in chapter 13 of Code), but another
part of the program assumes that it's a positive number. There's even a bug
like that in chapter 7 of the fifth edition of Programming Windows.
Avoiding such bugs is a necessity of programming. Anything that reads data
needs to know exactly what format is being used to store the data.
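A minimal sketch of that kind of bug (my own example, not the one from Programming Windows): the same bits read two different ways.

    #include <stdio.h>

    int main(void)
    {
        int delta = -10;   /* stored in two's-complement form */

        /* Elsewhere the same bits are read on the assumption that the value
           is positive -- the classic signed/unsigned mismatch. */
        unsigned int misread = (unsigned int)delta;

        printf("as signed:   %d\n", delta);     /* -10 */
        printf("as unsigned: %u\n", misread);   /* 4294967286 with 32-bit ints */
        return 0;
    }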
The coding systems you describe--Morse, ASCII and so on--are great
as intermediaries between some kind of machine and a human language, such as
English or German. But the human languages still carry all the big ideas.
"Four" is a pretty much universal concept, and nearly everyone in the world
will recognize the Arabic numeral "4" as representative of that concept. But
other universal concepts include "love," "beauty," "shame," and "greed." I
mean, look at Michelangelo's "Pieta" and clearly it's about grief, but it's not
much good for communicating that feeling to someone who can't see the statue.
Is there any hope for universal tele-representation of big ideas?
Morse code and ASCII can represent the word "love" just as well as
written language. The bits in a waveform file can represent the word almost as
well as spoken language. And the bits in a movie file can represent the word
almost as well as face-to-face communication. That we get some meaning from the
word is a result of its formal definition and a lifetime of shared experience.
If a work of art imparts only emotions such as "grief," the work
must be said to have an extremely low signal-to-noise ratio. That's an awful
lot of marble to convey an emotion that could be conveyed just as effectively
using stick figures. What we appreciate in classical sculpture is more
accurately the geometrical form and proportions.
Have you read Bruce Chatwin's
Songlines? The book is about the Aboriginal people of Australia,
and it has a lot to say about the representation of information. There is a
particular passage that's relevant here. The idea of the passage is that there
are many Aboriginal groups scattered around the continent, and they have
mutually unintelligible spoken languages. However, all the groups use songs to
describe physical journeys. The songs work on several levels. The words
describe the landmarks and the terrain, but so do the rhythm and the melody.
Chatwin notes that a man from near Darwin, listening to a man from near
Townsville sing about his home, can extract information about topography from
the song even though he knows none of the words. The songs are systems of
encoding information. They're universally understood. Have you any thoughts on this?
This might surprise Igor Stravinsky, who said that music was
incapable of expressing anything except itself. In this particular case, I'd be
surprised if the rhythm and melody conveyed more information about topography
than might be managed with simple hand gestures. Anything more elaborate would
require encoding schemes that would inordinately interfere with the structural
rhythm and harmony, perhaps ultimately resembling those John Cage compositions
that were based on star maps.
Although the syntax of language seems to be ingrained in our
newborn brains, vocabularies are obviously not. In this sense, music is more
universal than language because the vocabularies of rhythm and harmony are
related to the biology of our bodies--the pulsing of our natural rhythms and
our sense of hearing. Harmony in particular can be analyzed as the relationship
of frequencies in relatively simple integral ratios. The fact that a perfect
fifth is 1.5 times the frequency of the tonic is culturally independent. Thus,
different levels of consonance and dissonance are available to convey
culturally independent antipodes. For example, a dissonant passage might convey
a craggy terrain and a consonant passage a level terrain.
However, if this is solely the type of information one is drawing
from a particular piece of music, it too must be said to have a low
signal-to-noise ratio. What is usually much more interesting in music is the
way in which the composer is using form and proportions to convey emotions (or
topography or whatever) rather than the emotions themselves.
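Those integral ratios are easy to make concrete (my numbers, using just-intonation ratios and an arbitrary tonic of 440 Hz):

    #include <stdio.h>

    int main(void)
    {
        double tonic = 440.0;   /* an arbitrary tonic (the A above middle C) */

        /* Simpler frequency ratios sound more consonant. */
        printf("octave (2:1): %.1f Hz\n", tonic * 2.0);        /* 880.0 */
        printf("fifth  (3:2): %.1f Hz\n", tonic * 3.0 / 2.0);  /* 660.0 */
        printf("fourth (4:3): %.1f Hz\n", tonic * 4.0 / 3.0);  /* 586.7 */
        return 0;
    }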
"Maybe" and "kind of" are important concepts to human beings, but
they're not well-suited to binary encoding or processing by logic gates. Is
there a place for ambiguity in computing? Or is ambiguity, like chaos in
natural systems, just order of a sort we don't yet understand?
There's a whole field called "fuzzy logic" that attempts to combat
the numerical rigidity that bits and gates seem to imply. Readers of
Code might be interested in Arturo Sangalli's
The Importance of Being
Fuzzy (1998) for a good basic introduction to fuzzy logic. It's
a topic that would have been discussed in Code
had we ("we" meaning
Microsoft Press and I) decided we wanted a 500-page book rather than a 400-page book.
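A rough sketch of the basic idea (mine, not drawn from Sangalli's book): instead of a proposition being strictly 0 or 1, it gets a degree of truth anywhere between 0.0 and 1.0.

    #include <stdio.h>

    /* Degree to which "the room is warm" is true: 0.0 at or below 15 C,
       1.0 at or above 25 C, and a gradual ramp in between. */
    static double warm(double celsius)
    {
        if (celsius <= 15.0) return 0.0;
        if (celsius >= 25.0) return 1.0;
        return (celsius - 15.0) / 10.0;
    }

    int main(void)
    {
        double temps[] = { 10.0, 18.0, 21.0, 30.0 };
        for (int i = 0; i < 4; i++)
            printf("%2.0f C -> warm to degree %.1f\n", temps[i], warm(temps[i]));
        return 0;
    }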
In Code, you have a lot to say about computer theory,
meaning that you talk about ways of encoding values so that machines can
interpret them, mechanisms for processing those values, and systems for sharing
those values with human operators. But you don't say much about networks. Does
computer theory change at all when you have lots of computers hooked together?
Is the network really the computer, as Sun Microsystems says?
For purposes of clarity, Code
concentrates on pre-networked
computers. Surely such computers have a considerable amount of utility.
Connecting computers makes possible distributed processing, which is dividing a
particular computer task among multiple machines. That certainly complicates
traditional computer theory somewhat. But for most people using the Internet,
distributed processing is just not very common. For the most part, the transfer
of information is barely more sophisticated than accessing a hard drive or a
CD-ROM. Unfortunately, the most interactive areas of the Internet are those
designed to turn the Web into one giant mail-order catalog.
You also talk a lot about bits, which are of course the means of
representing values in digital computers. What about as-yet-theoretical quantum
computers, which use quantum bits, or "qubits," to represent not just specific
values, but all possible values at once? If a computer can take all possible
values and perform calculations on them to yield all possible outcomes, what
does that say about information, or about what is real?
My main hope in writing Code
is that the reader comes away
with a really good feeling for what a bit is, and how bits are combined to
convey information. That's essential to understanding this digital era that
we've built for ourselves. Quantum computing would have been discussed in
Code had we decided to make it a longer book, but it's quite a difficult
topic. And of course, as you imply, quantum computing is probably nowhere close
to becoming an actual product!
I think qubits have major implications for parallel processing,
but attempting to extract metaphysical meaning regarding reality leads, I
think, to that type of anti-intellectual new-age pothead mysticism that passes
for science writing in some circles. Our notions of reality have been altered
much less by quantum theory than by the discoveries of Kepler, Copernicus,
Galileo, Newton, and Darwin.
Do you have any plans to write further about general programming
topics, perhaps about compiler theory? It seems like a logical next step, and
something your readers would enjoy.
Wherever my career goes at this point, it almost certainly won't
involve a book on compiler theory! I'm finding myself more interested in
writing books that would be found in the Science & Technology section of
the bookstore. These are the books I like to read and the subjects that interest me.