Amazon.com Interview: Larry Wall

The father of Perl talks about XML, Unicode, the Win32 port, and the philosophy behind the language.

Larry Wall, the father of Perl, stopped by Amazon.com's Seattle offices to field questions about XML support, Perl's Unicode implementation, the Win32 Perl port, the ActiveState visual Perl debugger, and the guiding philosophy behind the language.

Amazon.com's interviewers were Douglas Beaver, Jennifer Buckendorff, Alex Edelman, Tom Mace, Cristina Vaamonde, and Alex Yan. Wall was joined by Gina Blaber from O'Reilly Software.


Amazon.com: Thanks for taking the time to visit Amazon.com. It's no secret that we're a big Perl shop, and you've got a lot of devoted fans here to grill you. So what brings you to Seattle?

Larry Wall: I came to give a talk at the XML conference. The basic gist of what I'll be saying is that Perl has always been pretty much the language of choice for text processing, but what's been happening in the last 10 years is that the definition of text has been changing out from under Perl. It used to be that it was just ASCII, and then we got these various national character sets that were eight-bit, and Perl supported those rather easily. But with Unicode, and now with XML, we have definitions of wide characters that are pretty much universal and definitions of structured text that look like they will become pretty universal. So Perl really needs to support those. That's my project this year: making sure that happens.

Amazon.com: Do you see a clear implementation, or is this still at the exploratory stage?

Wall: It's actually pretty clear. As far as Unicode support goes, Perl will, at least for the initial cut, take Plan 9's Unicode implementation. Essentially, they just said, well, most of our interfaces to the real world are still going to be ASCII with a little bit of Unicode thrown in. So it makes a whole lot of sense if our internal representation is not wide characters but actually a thing called UTF-8, which is a flexible representation, so ASCII remains single eight-bit characters and then the higher-order characters become longer sequences of bytes. So Perl's initial implementation of Unicode will do the same thing. Which makes a lot of sense, actually, in a text-processing language that is dealing mostly with flexible text. Regular expressions really don't care what the boundaries are, if they come on even or odd boundaries. It may be that down the road a ways, there will be some architectures that are natively wide characters, where it would make sense to have a Perl that uses wide characters natively, internally, but I'll cross that bridge when I get to it.
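As a rough illustration of the encoding Wall describes, the sketch below shows how many bytes UTF-8 uses for a few sample characters. It uses Python's built-in codec as a stand-in; the helper name `utf8_length` and the sample characters are assumptions made for this example, not anything from Perl's implementation.

```python
# Illustrative only: Python's built-in UTF-8 codec stands in for Perl's internals.

def utf8_length(ch):
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))

# Sample characters chosen for this sketch: ASCII, Latin-1, and a currency sign.
samples = ["A", "\u00e9", "\u20ac"]

for ch in samples:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join("{:02x}".format(b) for b in encoded)
    print("U+{:04X} takes {} byte(s): {}".format(ord(ch), len(encoded), hex_bytes))
```

An ASCII character keeps its single-byte form, which is why existing ASCII data is already valid UTF-8, while characters further up the range take two or three bytes.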

Amazon.com: Is the syntax for regular expressions clear in a Unicode implementation?

Wall: Yeah, it should be, pretty much... I mean, the only thing that's strange about the implementation of regular expressions is that the character classes suddenly get a lot bigger, potentially. If you're just looking for eight-bit characters and you have a table inside that has 256 entries, you just do a direct look-up. You can't really afford to do that if there are potentially 65,000. But there are ways of having multiple-level tables to deal with that sort of thing. As far as representation goes, the representation of the Perl script itself will be UTF-8, so there are already ways of specifying in the text of your program if you've got a higher character. You know, how that looks to the programmer depends on whether you've got a UTF-8-aware editor or a Unicode-aware editor. If you've got an editor that spits out 16-bit Unicodes, then what we can do is use Perl's facility called "source-filtering" so we'll recognize that if it's a pure Unicode file, we'll just translate it to UTF-8 coming in. But it should all be pretty straightforward.
There are several goals to putting Unicode support into Perl. First of all, old scripts must not break. That is, they continue to work exactly the way they have if you don't change them. Also, old scripts must be very easily convertible to scripts that will deal with Unicode, hopefully just by a single declaration in a file that says "use UTF-8" or something like that. And then the same Perl code that you have right now will just magically start working with the UTF-8 code. There are a few little glitches in there having to do with how pack and unpack work. And if you actually want to match against specific Unicode characters, then you'd have to, of course, modify your program. But by and large, you should be able to make it very easy to make the transition.
And of course, as additional goals, you'd like to keep Perl running as fast as, if not faster than, it's already running, and you'd like it to still stay as intuitive as possible. That's basically the Unicode story.
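The multiple-level tables Wall mentions for large character classes could look something like the sketch below. It is one possible two-level layout written in Python, not Perl's actual data structure; the class name `TwoLevelCharClass` and the 256-entry block size are assumptions made for illustration.

```python
# A hypothetical two-level lookup table for a character class over the
# 16-bit range. Only blocks that contain members are allocated, so a class
# touching a few scripts stays far smaller than a flat 65,536-entry table.

BLOCK_BITS = 8
BLOCK_SIZE = 1 << BLOCK_BITS      # 256 code points per block
BLOCK_MASK = BLOCK_SIZE - 1
LIMIT = 0x10000                   # 16-bit range, as discussed in the interview


class TwoLevelCharClass:
    def __init__(self, code_points):
        # Top level: one slot per block; None means "no members in this block".
        self.blocks = [None] * (LIMIT >> BLOCK_BITS)
        for cp in code_points:
            if not 0 <= cp < LIMIT:
                raise ValueError("code point outside the 16-bit range")
            high, low = cp >> BLOCK_BITS, cp & BLOCK_MASK
            if self.blocks[high] is None:
                self.blocks[high] = bytearray(BLOCK_SIZE)
            self.blocks[high][low] = 1

    def __contains__(self, ch):
        cp = ord(ch)
        if cp >= LIMIT:
            return False
        block = self.blocks[cp >> BLOCK_BITS]
        # Second level: direct look-up inside the 256-entry block.
        return block is not None and block[cp & BLOCK_MASK] == 1


# Latin vowels plus Greek alpha and epsilon: two blocks are allocated.
vowels = TwoLevelCharClass(ord(c) for c in "aeiou\u03b1\u03b5")
print("a" in vowels, "\u03b1" in vowels, "b" in vowels)
```

Each membership test costs two index operations, which keeps the speed close to the single 256-entry table used for eight-bit characters.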

Amazon.com: Do you see Perl being part of the XML standard or simply being in friendly cohabitation?

Wall: Perl has always lived in the interstices. It's too fluid an entity to be stapled down to anything so solid as a standard. It's not that I don't think standards are valuable--I think they're very valuable--but Perl's abilities complement things that are legalistic. Perl sort of goes the other way and is a little bit libertarian. And so if you want to look at the things that Perl works best at, it's actually the places in the world that are most restricted in various ways, and then Perl gives you a way to get from here to there. You expect a glue language to glue things together, not just to glue glue to glue. You glue it to other interesting things that aren't like glue. People ask me if Perl will ever be standardized, and I say that people are free to do whatever they want--after I'm dead.

Amazon.com: No standard as long as you're with us?

Wall: Well, you know, it's unlikely. I'm not going to rule it out entirely. You know, I might live a long time! But I don't see a need for it yet. One reason we don't need a standard for it yet is that we've explicitly embraced the idea that the implementation is the standard. It's very important to us that there be only one implementation of Perl. One of the fallacies of language design is that the specification is good enough. You know, that happened to Ada. It's happening to Java. And no matter how good your specification is, you get divergence, which we're starting to see in the split in the Java world. In the Perl world we started to see a split between the Windows version and the Unix version last August after the first Perl conference. We jumped on that and got all the principals in the same room and knocked their heads together and said, "We're going to have one Perl here."

Amazon.com: It wasn't that painful? They were willing?

Wall: Yes, they were willing. And so version 5.005 of Perl will be a single source code, basically.

Amazon.com: Can you describe how you keep on top of the implementations and enforce the uniformity across all the platforms?

Wall: I'm not sure "enforce" is quite the right word there. There is a mailing list called perl5-porters, and I rely on them heavily to do the porting to their architectures. And they actually do much more than porting. They're the chief forum for discussion of changes to the language and how to reconcile semantic differences between various operating systems. It's interesting how the governance of the Perl community has evolved over time. It has actually turned out to be somewhat like the United States federal government. It used to be, way back in the Dark Ages, that I just ran the whole thing. I was Mr. Perl--judge, jury, and executioner. But these days the perl5-porters mailing list serves as the legislature. And we have an executive--which is not me, actually. It's whoever is currently the integration manager, essentially--the patch manager. We call that person the Patch Pumpkin Holder. That's the executive, and so that title moves from person to person, and that leaves me to be the Supreme Court. So I get to rule on what's constitutional or not. In fact, there are three or four appeals outstanding that I have not ruled on yet. After I'm done with this XML conference, I will be taking under consideration some of the things that people have put under my nose.

Amazon.com: Is the Win32 port particularly problematic?

Wall: Well, it's been problematic in that there was the source-code divergence, which we have now converged. For the most part, though, the semantics of what will or won't work cross-platform are pretty obvious. You're just not going to get "fork()" to work on Windows unless you happen to be in the POSIX universe--in which case nothing else works. But Windows itself has not really been a big problem that way. I mean, there are some cultural differences, sure. And one of the things I had to rule on lately was how we deal with line endings that appear not to be from our system. At the moment, with the current Perl that's out there, if you're on a Unix system and it sees a line ending with a carriage return and line feed, like on a Windows system, it blows up and says, "You didn't convert the script to Unix format." That's really kind of antisocial, because, you know, a lot more people are using remote NFS or Samba file-sharing setups, and the same file needs to run on both Windows and on Unix systems, not to mention on Macs. So I decreed that, essentially, any Perl ought to accept any line ending consistently. It really ought to do the right thing, whatever the right thing is. There are some arguments against doing that, but they're weak, so that's why I made that rule.
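The rule Wall describes, accepting any line ending consistently, can be sketched as follows. This is a Python illustration rather than Perl's implementation; the function name `split_lines_any_ending` is hypothetical, and the three endings handled (LF, CRLF, and the lone CR of classic Mac files) are the ones implied by the platforms named above.

```python
import re

# Match CRLF first so a Windows ending is not counted as two separate breaks.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines_any_ending(text):
    """Split text into lines, accepting Unix, Windows, and classic Mac endings."""
    lines = _LINE_ENDING.split(text)
    # A trailing line ending produces an empty final piece; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# The same script text survives a trip through any of the three conventions.
script = "print 'one';\r\nprint 'two';\rprint 'three';\n"
for line in split_lines_any_ending(script):
    print(line)
```

Listing `\r\n` before `\r` in the pattern is the design choice that matters here: with the alternatives in the other order, every Windows line ending would yield an extra empty line.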

Gina Blaber, O'Reilly Software: Also, every two weeks or so there's a Win32/Unix teleconference. Larry's in on it, and I'm in on it, and Dick Hardt in Vancouver, and a couple of people from the U.K. and Germany. So basically there are a number of people who are more from the Unix Perl world, and then Dick Hardt, who is from more the Win32 Perl world. We started doing these teleconferences back in September to make sure Win32 Perl really does get integrated back into core Unix Perl. It meant putting aside all the cultural and philosophical differences and just getting down to technical stuff.

Wall: I'd say that we could not have done this without the support from O'Reilly. We used to have the commercial software community, and then the freeware community came along, which was sort of an overreaction, in some ways, to the commercial community. There's always a kind of natural tension between the two. But at the current time, a lot of people are trying to find models where the commercial and freeware communities can cooperate. That's why I hired on with O'Reilly two years ago--seems like just yesterday--because Tim O'Reilly was interested in looking for those cooperative symbiotic models. Before that, the commercial people looked at the freeware people and the freeware people looked at the commercial people and each thought the other was a parasite.

Amazon.com: They were both right!

Wall: Yeah. But if it's mutual parasitism, that's symbiosis! So it's been encouraging to me to actually see this working out in practice, with the way that O'Reilly can bring their strengths to helping the freeware community get its act together. This recent Netscape announcement [of the publication of Navigator's source code] is sort of... One has mixed feelings about it, because we've been preaching this freeware, cooperative gospel for a couple of years now, and nobody has much been listening. Now Netscape comes out with one little announcement, and suddenly everybody wants to know, What's this freeware thing? But that's good, and it's healthy, and anything that fosters more discussion and that frees up more people to pursue different models, I think that's great. When I first started out, you really couldn't do that. You had to go to your company and say, "Can I distribute such-and-so?" and they would send it to their lawyers, and their lawyers would sit on it for six months, and then they'd come back and say, "No." And then you'd say to the company, "Well, are you going to sell it?" And they'd say, "No. We're just going to put it on a shelf, because we might sell it someday." Which is for the birds, you know? So the only choice I had back then was just not to ask. It's better to ask forgiveness than permission [Laughs]. It also helps to be the sort of person that doesn't make enemies, because then nobody wants to get you in trouble for it, especially if you're in the good graces of your immediate boss. As long as they know what you're doing, then you're probably fine. That's the way it was back then, the only way you could do the freeware.

Amazon.com: "Back then" was when?

Wall: Well, 10 to 15 years ago. Now, with the Netscape announcement and other publicity that freeware is getting, it's now possible for someone to come to their manager and actually make a case that the best model to release some particular piece of software under is a freeware model, that there are other ways to make money on freeware than just by selling the freeware.

Amazon.com: How has this worked with O'Reilly?

Wall: O'Reilly is sort of a one-of-a-kind publisher, and I guess maybe I'm sort of a one-of-a-kind guy, too. But I did an awful lot of really careful work in introducing people to the notion of commercial cooperation with the free model. Like I say, I'm not the sort of person who makes enemies, and part of that is realizing what things I might say that people would take out of context and want to fight over, when it's something I'm really not interested in fighting over. So I eased into it carefully, and in a sense I'm almost grateful that Netscape waited as long as they did to make their announcement, because now the world is readier for some of these concepts. In April we're going to be having an Open Source summit at O'Reilly's headquarters in Palo Alto [California]. Linus [Torvalds] and various other freeware authors will be there. You know, we really don't have much of a set plan for what we're going to do. We're just going to get together and see what happens. And that's not to say that every freeware project has to use the same model. Different kinds of programs need different kinds of models. Perl, being a language as well as a computer program, needs a certain kind of design oversight that something like Apache doesn't necessarily need. So Apache can get away with an oligarchy; Perl needs more of a monarchy.

Amazon.com: Says the monarch! [Laughter] How about the state of tools on the Win32 side? Windows programmers are used to having groovy visual IDEs and all of that. Are you involved at all with people developing such tools for Perl programming, in particular the visual debugger?

Wall: If I had thought you were interested in that, I would have brought it [the ActiveState visual Perl debugger] to show it off to you! Yeah, it's pretty slick. I think the Windows space in particular is a good place for people who want to build useful tools that, you know... Windows users are not used to dealing with freeware, and they'd almost rather buy something, something on a CD or something they can download with a credit-card number or whatever. And so it's almost doing them a favor to charge money for a tool like that. [Laughter] My involvement with the debugger has been only in a controlling role at this point. But the visual debugger that ActiveState has is actually based on the regular symbolic debugger, the line-oriented debugger, which is written in Perl. But what they've done is, instead of talking to the command line, they've just opened up a connection to some operating-system objects so you get these snazzy windows. You can move around and redefine what to do and have all the bells and whistles of a visual debugger. The fact that it's running a Perl script which is running a Perl script is sort of transparent.

Amazon.com: Do you see the Unix side of the world working on something like that?

Wall: I think that would be cool. I don't think I have time to do that. You want to do it?

Amazon.com: What about compilers? Are you involved at all in compiler development?

Wall: Not a whole lot. I've played with it. Malcolm Beattie has been doing that, and the compiler will be a part of version 5.005. I actually smile when I say "compiler," because Perl has always had a compiler--it just compiles down into an internal form and then interprets from that. What people mean by compiler, though, is a back end that will spit out C, which you can compile down to machine language.
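The two-step model Wall describes, compiling to an internal form and then interpreting it, can be seen by analogy in Python, which exposes both steps as built-in functions. This is only an analogy and says nothing about how Perl's internal form is structured; the one-line source string is made up for this example.

```python
# Analogy only: Python's compile() and exec() make the two steps visible.

source = "total = sum(range(5))"

# Step 1: compile the source text into an internal form (a code object).
code_object = compile(source, "<example>", "exec")

# Step 2: the interpreter runs the compiled form, not the original text.
namespace = {}
exec(code_object, namespace)

print(type(code_object).__name__, namespace["total"])
```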

Amazon.com: Do you think Perl is going to become a standard administration language for Microsoft Windows NT? Something that it's really never had?

Wall: I think it's well-positioned for that to happen, yeah. But Basic is pretty religious at Microsoft. We are officially not competing with them.

Amazon.com: What do you think about Perl as a beginner or occasional language?

Wall: One of the ways in which Perl is like natural language is that many different levels of competence are acceptable. We don't expect a 5-year-old to speak with the same diction as a 50-year-old. So why do we expect computer programmers to learn an entire language before they start doing anything? So Perl is specifically designed so that you can use whatever subset of it you know is going to be useful.

Amazon.com: Do you have any fantasies about where you want the language to go?

Wall: I'm kind of thinking that Perl might stick at version 5 like Unix did. In a sense, there have really only been two versions of Perl. Perl versions 1 through 4 were all sort of just an evolutionary path. At the end of Perl 4, I realized it was time to scrap the prototype and rewrite it. So Perl 5 is really pretty near a total rewrite. And at that point I realized it was both my first and last chance to do it right, so I put a lot of effort into defining an architecture that would be extensible and scalable and all the good buzzwords. Maybe even respectable. That was a big deal, and I don't really want to go through it again anytime soon. When Perl 2 was out there, it was just a text-processing language. It didn't handle binary data. And I said, you know, if I make Perl handle binary data, who knows where it's going to stop? Well, Perl version 3 handled all binary data, and who knows where it's going to stop? But the reason I decided to do that was, I realized both a specific thing and a general principle from it. The specific thing was that there are a lot of problems out there that are 95 percent text and 5 percent binary data. And if you make it merely possible to deal with binary data, not necessarily even easy, then you increase your potential problem domain by more than twice as much. And the general principle is sort of the secondary Perl slogan. The primary Perl slogan is: "There's more than one way to do it." The secondary Perl slogan is: "Easy things should be easy, and hard things should be possible." So text processing is what Perl really tries to make easy. But as a glue language it also has to make hard things possible. My project last year was hooking Perl up to Java. My project this year is hooking Perl up to XML. Doing the Unicode things. So we want everything to be possible and some things to be easy.
And I actually want to get past making XML and Unicode possible and make those easy, because as I think I mentioned earlier, I think the definition of what text is has been changing out from under Perl over the last 10 years, and I want to make sure it keeps up with the definition of what text means.
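To make the "95 percent text and 5 percent binary data" point concrete, here is a small sketch of a record that is mostly text with a short binary header. Python's `struct` module plays the role that pack and unpack play in Perl; the record layout (a two-byte type and a four-byte length, big-endian) and the function names are invented for this example.

```python
import struct

# Hypothetical layout: 2-byte record type, 4-byte body length, big-endian,
# followed by the body as UTF-8 text.
HEADER = struct.Struct(">HI")


def build_record(record_type, text):
    """Pack a small binary header in front of a text body."""
    body = text.encode("utf-8")
    return HEADER.pack(record_type, len(body)) + body


def parse_record(data):
    """Unpack the binary header, then treat the rest as ordinary text."""
    record_type, length = HEADER.unpack_from(data, 0)
    body = data[HEADER.size:HEADER.size + length]
    return record_type, body.decode("utf-8")


record = build_record(7, "mostly text, a little binary")
print(parse_record(record))
```

Only the six header bytes need binary handling; once they are unpacked, the rest of the work is the text processing the language already makes easy.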