A Tour of NTL: NTL Implementation and Portability

NTL is designed to be portable, fast, and relatively easy to use and extend.

To make NTL portable, no assembly code is used (well, almost none, see below). This is highly desirable, as architectures are constantly changing and evolving, and maintaining assembly code is quite costly. By avoiding assembly code, NTL should remain usable, with virtually no maintenance, for many years.

Minimal platform requirements

When the configuration flags NTL_CLEAN_INT and NTL_CLEAN_PTR are both on (this is not the default, see below), NTL makes two requirements of its platform, neither of which are guaranteed by the C++ language definition, but are essentially universal:

int and long quantities, respectively, are represented using a 2's complement representation whose width is equal to the width of unsigned int and unsigned long, respectively.
Double precision floating point conforms to the IEEE floating point standard.

NTl makes very conservative requirements of the C/C++ compiler:

it is assumed that the C compiler conforms to the original ANSI C standard,
it is assumed that the C++ compiler supports all of the language features described in the second edition of Stroustrup's book, minus exceptions, templates, and derived types.

The `NTL_CLEAN_INT` flag

The configuration flag NTL_CLEAN_INT is currently off by default.

When this flag is off, NTL makes another requirement of its platform; namely, that arithmetic operations on the type long do not overflow, but simply "wrap around" modulo the word size. This behavior is not guaranteed by the C++ standard, and yet it is essentially universally implemented. In fact, most compilers will go out of their way to ensure this behavior, since it is a very reasonable behavior, and since many programs implicitly rely on this behavior.

Making this "wrap around" assumption can lead to somewhat more efficient code on some platforms. It seems fairly unlikely that one would ever have to turn the NTL_CLEAN_INT flag on, but it seems a good idea to make this possible, and at the very least to identify and isolate the code that relies on this assumption.

The `NTL_CLEAN_PTR` flag

The configuration flag NTL_CLEAN_PTR is currently off by default.

When this flag is off, NTL makes another requirement of its platform; namely, that the address space is "flat", and in particular, that one can test if an object pointed to by a pointer p is located in a array of objects v[0..n-1] by testing if p >= v and p < v + n. The C++ standard does not guarantee that such a test will work; the only way to perform this test in a standard-conforming way is to iteratively test if p == v, p == v+1, etc.

This assumption of a "flat" address space is essentially universally valid, and making this assumption leads to some more efficient code. For this reason, the NTL_CLEAN_PTR is off by default, but one can always turn it on, and in fact, the overall performance penalty should be negligible for most applications.

Some floating point issues

Relying on floating point may seem prone to errors, but with the guarantees provided by the IEEE standard, one can prove the correctness of the NTL code that uses floating point.

Briefly, the IEEE floating point standard says that basic arithmetic operations on doubles should work as if the operation were performed with infinite precision, and then rounded to p bits, where p is the precision (typically, p = 53).

Throughout most of NTL, correctness follows from weaker assumptions, namely

basic arithmetic operations and conversion from integral types produce results with a relative error of 2^{-p + 1} (assuming no overflow),
multiplication by powers of 2 produce exact results (assuming no overflow),
basic arithmetic operations on integers represented as doubles and conversions from integral types to doubles produce exact results, provided the inputs and outputs are less than 2^p in absolute value,
if y/2 <= x <= 2y, then x-y is computed exactly.

Also, NTL allows the compiler to compute z = x/y as t = 1/y, z = t*x.

One big problem with the IEEE standard is that it allows intermediate quantities to be computed in a higher precision than the standard double precision. This "looseness" in the standard is a substantial impediment to creating portable software. Most platforms today implement the "strict" IEEE standard, with no excess precision. One notable exception -- the 800 pound gorilla, so to speak -- is the Intel x86.

NTL goes out of its way to ensure that its code is correct with both "strict" and "loose" IEEE floating point. This is achieved in a portable fashion throughout NTL, except for the quad_float module, where some desperate hacks, including assembly code, may be used to try to work around problems created by "loose" IEEE floating point [more details]. But note that even if the quad_float package does not work correctly because of these problems, the only other routines that are affected are the LLL_QP routines in the LLL module -- the rest of NTL should work fine.

Note that NTL does require that the IEEE floating point special quantities "infinity" and "not a number" are implemented correctly.

Implementing long integer arithmetic

There are three basic strategies for implementing long integer arithmetic.

The default strategy is implemented in the traditional long integer arithmetic package. This package is derived from the LIP package originally developed by A. K. Lenstra, although it has evolved quite a bit within NTL. This package uses no assembly code and is very portable.

The second strategy is to use the Gnu Multi-Precision Package (GMP) as a supplemental long integer arithmetic package. In this strategy, the representation of long integers is identical to that in he traditional long integer package. This representation is incompatible with the GMP representation, and on-the-fly conversions are done between the two representations (only when this is sensible). This strategy typically yields better performance, but requires that GMP is installed on your platform.

The third strategy is to use GMP as the primary long integer arithmetic package. In this strategy, the representation of long integers is in a form compatible with GMP. This strategy typically yields the best performance, but requires that GMP is installed on your platform, and also introduces some minor backward incompatibilities in the programming interface.

Go here for more details on the use of GMP with NTL.

Algorithms

NTL makes fairly consistent use of asymptotically fast algorithms.

Long integer multiplication is implemented using the classical algorithm, crossing over to Karatsuba for very big numbers. Long integer division is currently only implemented using the classical algorithm -- unless you use NTL with GMP (version 3 or later) as either a supplemental or primary long integer package, which employs an algorithm that is about twice as slow as multiplication for very large numbers.

Polynomial multiplication and division is carried out using a combination of the classical algorithm, Karatsuba, the FFT using small primes, and the FFT using the Schoenhagge-Strassen approach. The choice of algorithm depends on the coefficient domain.

Many algorithms employed throughout NTL are inventions of the author (Victor Shoup) and his colleagues Joachim von zur Gathen and Erich Kaltofen, as well as John Abbott and Paul Zimmermann.

Some of NTL's imperfections

NTL is not a "perfect" library. Here are some limitations of NTL that a "perfect" library would not have:

NTL is neither thread-safe nor re-entrant, and making it so would require a fundamental redesign.
NTL provides only a very crude form of error handling: print an error message and abort. For most NTL users, this is quite sufficient. The alternative would be to have NTL throw exceptions. Writing code that handles exceptions correctly is quite difficult. The easy part is throwing and catching exceptions. The hard part is writing code through which an exception can be safely and correctly thrown. Retrofitting NTL to throw exceptions at this late date would be quite difficult and error prone, and I do not think that there is much demand for it.
NTL does not release all of its resources. There are some routines which for efficiency reasons will allocate some memory and never give it back to the system, so as to avoid re-allocations on subsequent calls. The amount of memory "stolen" by NTL in this way is fairly reasonable, and I have heard no complaints yet about its effects.