When you need to keep a secret, it's all in the numbers. The prime numbers.
Number theory is the basis of many cryptosystems these days. The famous RSA public-key cryptosystem is based on computing a number that is the product of two large primes. And because that number is hard to factor, your secret is safe.
When I was a kid, I became interested in numbers. It started when I read The Lore of Large Numbers, by Phillip J. Davis, first published in 1961. There was also Mathematical Recreations and Essays, by H. S. M. Coxeter and W. W. Rouse Ball. This led to unusual things like memorizing pi to 50 places, so I could recite it at any time. And I am sure I impressed my parents to death by reciting it. I can still hear the cadence of the digits in my head. Pretty soon I became interested in computation.
I often used to ride my bike to the San Jose State Math Library to pore through its stacks of math periodicals. I was especially interested in the Journal of the Mathematics of Computation (known as J. Math. Comp. in the biz). You can access the back issues of this journal online.
Yeah, to say that I was a nerd might have been understating the issue! But my obsession led to an interest in computation and, in particular, such things as prime numbers, factoring, and continued fractions.
Also, remember that this obsession happened in the late 1960s, before I even touched my first computer.
You can think of prime numbers as the basis set for all integers greater than 1 because every such integer can be formed by the product of primes. Kids first get exposure to prime numbers in the fourth grade (or earlier) when they learn to reduce fractions.
You probably know that the prime number sequence starts with 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, ...
The prime number sequence thins out as you get to larger and larger numbers. This is obvious because with larger numbers, there are more and more factors that can possibly divide a number. The prime number theorem states that, as x gets larger, the number of primes less than x generally approaches x/ln(x). Note that ln(x) is the logarithm base e (2.71828...).
In any case, there are a lot of primes. There are, for example, 50,847,534 primes less than a billion. You can easily prove that there are an infinite number of them.
Meeting a Number Theorist
In the summer of 1972, I was in a program at UC Berkeley for promising high-school-age mathematicians, sponsored by the NSF and chaired by Professor Frantisek Wolf. Having studiously read J. Math. Comp. for years, one of my idols was Derrick H. Lehmer, one of the world's foremost number theorists. Professor Lehmer was at Berkeley at the time. One day Paul Gootherts and I worked up our courage and went up the his office in Evans Hall and introduced ourselves to him.
Imagine my surprise when I found that he and his wife Emma were packing up his office! He was retiring that very week. So I struck up a conversation with him. It seems that we were both looking at continued fractions and their use in producing approximations to square roots using irreducible rational numbers. His interest was in factoring theory. Apparently constructing the continued fraction for the square root of a number was useful in producing factors near to the square root of a number, which was in turn useful in a certain primality testing method he was researching.
I accompanied him to the computer lab in the basement of Evans Hall and watched as he loaded a huge deck of card into an IBM card duplicating machine. When it failed and destroyed some of his cards, he exclaimed, "oh, lord! it's eaten my cards!". Of course, no one was more passionate about the use of computers in mathematics than Lehmer.
His wife Emma was apparently from Russia. I noticed that she had dozens of Russian volumes on the shelves in their joint office. I also spoke with her because I had read her paper on repunit primes. Repunits are decimal numbers consisting only of a long chain of 1's. An obvious factor of 10^n - 1.
They were surprised to say the least to find a 16-year-old kid with such an interest in number theory.
My friend Paul Gootherts had implemented a multi-precise arithmetic package in Fortran that we had run many huge calculations on, using a reasonably-fast Control Data Corporation computer that he had timeshare access to. So there was plenty to show and talk about. We had brought our books of calculations. I think we had calculated the cube roots of the first fifty primes to a thousand places each.
When, for a given integer n, four numbers of the form 30n+11, 30n+13, 30n+17, and 30n+19 are all prime, they are collectively called a prime quadruplet. This is a very dense arrangement of prime numbers. Some ready examples are (11, 13, 17, 19), (101, 103, 107, 109), and (191, 193, 197, 199). I took a particular interest in this arrangement after looking at D. H. Lehmer's table of primes under 100,000. With my number-obsessed vision, they practically jumped out at me. Hell, they still do.
When I went to Caltech, I spoke with resident number theorist Marshall Hall about writing a program to find prime quads, which I did. It took advantage of the fact that the center number 30n+15 was a single number that you could subject to a barrage of tests. Divide it by a prime and compute its remainder. That remainder cannot be one of four values, otherwise one of the four numbers in the quad will be composite. It was an exceptionally fast method of determine whether a prime quadruplet seems interesting enough to try to factor all four numbers.
Because of my interest, Hall introduced me to the Markov Spectrum.
Factoring and Primality Testing
The dumbest way of factoring a number is simply to divide it by all the prime numbers less than or equal to its square root. But there is an easy way to speed it up: divide the number by another number that is the product of n primes. Then the remainder, which you design to be significantly smaller than the original number, may be divided by each of the n primes in turn, looking for a zero remainder. This process makes the search about n times faster. This is because the remainder is much smaller and thus the divisions into the remainder will take less work. You can organize this into a binary tree to speed it up even further.
These days, a number field sieve may be used to factor unusually large numbers. But, I suppose that you can choose a thousand-digit number and it simply cannot be factored on any known hardware.
In order for this technique to be useful, it is not enough that factoring is hard, but determining primality must be easy also. And it does appear to be easier than factoring, and these advances have occurred in the last ten years. In 2002, Agarwal, Kayal, and Saxena (AKS for short) found a polynomial-time primality test. It has since been improved by Bernstein and Lenstra. Unfortunately, even an improved AKS algorithm still takes about a year to determine primality of a 100-digit number.
But practically, a randomized algorithm such as the Rabin-Miller method for compositeness testing is, on the average, much faster. The Jacobi Sums algorithm is very practical, having been used to determine primality of exceptionally large numbers (up to 3395 digits). The Elliptic Curve Primality Proving (ECPP) algorithm has been used on a 15071-digit number, with a running time of 5.1 GHz-years. Of course, it can be distributed to multiple processors to speed it up. On ECPP, a 100-digit number takes about 2 seconds to prove primality on the average. Much faster than one year! The Rabin-Miller algorithm is feasible on numbers with a million digits, but remember that it can only produce a certificate of compositeness. But it is fast: a 300-digit number takes about 23 GHz-seconds to certify compositeness.
So, people have been combining the approaches, by doing a fast compositeness test in parallel with a primality test. In this way, the running time of primality testing (with an early-out for a composite number) is minimized.