From the Inside Flap
This book puts under one cover the details of an elementary function library, covering the underlying mathematics as well as providing implementation details, directed toward IA-64 architecture. Some of the material is difficult to find elsewhere, and some of it is scattered over a variety of conference proceedings and journals. The material should appeal to readers with interest in elementary functions, as well as readers interested in using IA-64 effectively. Part I discusses IA-64 architecture in detail, including motivation for the architecture. The description of IA-64 is illustrated with extended examples chosen from numerical calculation. Part II shows how to exploit IA-64 architecture in the domain of elementary functions. While the text emphasizes accurate computation, it also points to shortcuts in division and square root that may be of interest in graphics and other applications which heavily use short floating point types. Most of the mathematical arguments are relatively elementary and should be readable by anyone with an elementary calculus background.
This work is an outgrowth of the Precision Architecture Wide Word (PAWW) project at Hewlett-Packard Laboratories. The architecture drew from prior experiences with very long instruction set architectures, particularly those at Cydrome and Multiflow, as well as PA-RISC (Precision Architecture - Reduced Instruction Set Computer). By the time I joined the project in 1992, much of the architecture had already been solidified. My architectural contributions mainly dealt with floating point arithmetic, and I was also active in producing a prototype compiler for Wide Word, which allowed many of the architectural ideas to be tested. PAWW later developed into IA-64.
One of my colleagues, Clemens Roothaan, had produced a library of elementary function routines which exploited the software pipelining capabilities of the architecture. He was able to demonstrate routines which ran at speeds associated with vector processors, but which did not sacrifice numerical accuracy for performance. Over time, some of these algorithms were strengthened to run faster, or produce even higher precision. We refer to the software pipelined implementation as the vector library for the elementary functions.
My plan was to use the same algorithmic ideas to construct a very robust scalar elementary function library. My hope was that the fundamental algorithms could be implemented in the C language in such a manner that they would yield closed subroutines, but would also be amenable to in-lining, after which they could then be software pipelined by the compiler. Eventually, Roothaan's handcrafted functions could be matched by the compiler, which could also customize an elementary function to the particular settings where it was invoked. This notion led to the in-line assembly capability, which enables much finer control to be exercised over floating point computation than is normally present in a compiler.
Eventually, I undertook to document these algorithms, indicating clearly the methods we had used, as well as the error characteristics of our algorithms. A fascinating by-product developed almost immediately: the act of writing clarified some of the fundamental processes that we were employing. New, faster algorithms were suggested by the text, and they replaced some of our old techniques. This was especially true in the operations of division and square root, for which almost none of our 1992 algorithms survive in this text. Logarithm also was markedly improved, and, as a by-product, the precision of the power routine was improved. The trigonometric routines were enhanced with "accurate A " argument reduction, and an improved implementation of the A and A addition formulas which preserve additional precision.
This book describes a work in progress. Even now, new algorithms have come to my attention from colleagues at Intel, and, as the greater programming community comes to use IA-64, I expect new, innovative developments to blossom.
Peter Markstein
From the Back Cover
- Covers every major elementary function, including square root and division
- IA-64 architecture and Explicit Parallel Instruction Computing (EPIC), in depth
- By an active participant in HP's IA-64 development team
- For every professional building high-performance IA-64 server applications and operating systems
Optimizing code for the new IA-64 architecture.
"...a timely and valuable book. It will appeal to those interested separately or jointly in IA-64 and the elementary math functions."
William S. Worley, Jr., Distinguished Contributor, Hewlett-Packard Laboratories
In IA-64 and Elementary Functions: Speed and Precision, leading HP computer architect Dr. Peter Markstein introduces the IA-64 architecture and its breakthrough elementary math functions. This information, essential to the development of optimized IA-64 server applications and operating systems, was formerly available only in specialized journals, or not available at all.
Markstein first introduces the IA-64 architecture, the objectives that motivated its design, and the unique architectural features that can be exploited by developers of high-performance elementary function libraries, including software pipelining, instruction grouping, prefetching, predication, speculative execution, and explicit parallelism. He then introduces several techniques that lend themselves to software pipelining, which is exceptionally well supported by the IA-64 architecture and can lead to dramatic performance gains.
The book covers all major elementary functions, demonstrating how they can be implemented to deliver optimal IA-64 performance and accuracy. Among the functions covered: square root and division, which must be performed in software on the IA-64.
For professional computer scientists, system software developers, mathematicians, and anyone building high-performance IA-64 software, IA-64 and Elementary Functions: Speed and Precision will be absolutely indispensable.