On the continuous history of approximation

The Difference Engine (the Charles Babbage machine, not the steampunk novel) is a device for tabulating a polynomial: it produces the polynomial's value at each successive, evenly spaced input by repeatedly adding the differences between successive values, so no multiplication is needed.
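
The adding-up of differences can be sketched in C. This is only an illustration of the method of finite differences, not a model of Babbage's hardware: the function name `tabulate_by_differences`, the `MAX_DEGREE` limit, and the integer-only arithmetic are all assumptions made for the example. The caller supplies the first few values of the polynomial, and everything after that is addition.

```c
#include <stddef.h>

#define MAX_DEGREE 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: fills out[0..count-1] with successive values of a
 * polynomial of the given degree at evenly spaced inputs. seed[] holds the
 * first degree+1 values, worked out beforehand. Returns 0 on success, or -1
 * if the degree is outside the range this sketch supports. */
int tabulate_by_differences(const long *seed, int degree,
                            long *out, size_t count)
{
    long d[MAX_DEGREE + 1];

    if (degree < 0 || degree > MAX_DEGREE)
        return -1;

    /* Reduce the seed values to the starting value and its first, second,
     * ... differences: d[k] becomes the k-th difference at the start. */
    for (int i = 0; i <= degree; i++)
        d[i] = seed[i];
    for (int k = 1; k <= degree; k++)
        for (int i = degree; i >= k; i--)
            d[i] -= d[i - 1];

    for (size_t n = 0; n < count; n++) {
        out[n] = d[0];
        /* One turn of the crank: each column absorbs its neighbour.
         * The highest difference, d[degree], never changes. */
        for (int k = 0; k < degree; k++)
            d[k] += d[k + 1];
    }
    return 0;
}
```

For 2x^2 + 3x + 1, the seed values at x = 0, 1, 2 are 1, 6, 15, which reduce to a starting value of 1, a first difference of 5 and a constant second difference of 4. From there the loop yields 28, 45, 66 and so on without a single multiplication.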

This sounds like a fairly niche market, but in fact it’s quite useful, because a whole lot of other functions can be approximated by polynomials. The approach, which is based in calculus, generates a Taylor series (or a Maclaurin series, if the approximation is for input values near zero).

Now, it happens that this collection of other functions includes logarithms:

\(\ln(1+x) \approx x - x^2/2 + x^3/3 - x^4/4 + \ldots\)

and exponents:

\(e^x \approx 1 + x + x^2/2! + x^3/3! + x^4/4! + \ldots\)

and so, given a difference engine, you can make tables of logarithms and exponents.
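
As a rough sketch of what such a table might look like, the C below truncates the series for e^x after the x^4 term and then tabulates that polynomial by differences. The names `exp_poly4` and `tabulate_exp`, the choice of degree 4, and the use of floating point are assumptions for illustration; a real engine worked in fixed-point decimal.

```c
#include <stddef.h>

#define DEGREE 4  /* truncate the series after the x^4 term */

/* Degree-4 Maclaurin polynomial for e^x,
 * 1 + x + x^2/2! + x^3/3! + x^4/4!, written in nested form.
 * It is used only to seed the table. */
double exp_poly4(double x)
{
    return 1.0 + x * (1.0 + x * (1.0 / 2 + x * (1.0 / 6 + x * (1.0 / 24))));
}

/* Hypothetical helper: fills table[n] with the polynomial's value at
 * n * step for n = 0..count-1. Only the first DEGREE+1 values are computed
 * directly; the rest come from repeated addition of finite differences. */
void tabulate_exp(double step, double *table, size_t count)
{
    double d[DEGREE + 1];

    /* Seed values, then reduce them to the starting differences. */
    for (int i = 0; i <= DEGREE; i++)
        d[i] = exp_poly4(i * step);
    for (int k = 1; k <= DEGREE; k++)
        for (int i = DEGREE; i >= k; i--)
            d[i] -= d[i - 1];

    for (size_t n = 0; n < count; n++) {
        table[n] = d[0];
        /* Additions only from here on. */
        for (int k = 0; k < DEGREE; k++)
            d[k] += d[k + 1];
    }
}
```

With a step of 0.1, the entry for x = 0.5 comes out at about 1.6484, against a true value of e^0.5 of about 1.6487. The gap is the truncation error of the degree-4 polynomial, and it grows as x moves away from zero.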

In fact, your computer probably uses much the same polynomial approach to calculate those functions. Here’s how glibc calculates ln(x) for x close to 1:

  r = x - 1.0;
  r2 = r * r;
  r3 = r * r2;
  y = r3 * (B[1] + r * B[2] + r2 * B[3]
    + r3 * (B[4] + r * B[5] + r2 * B[6]
        + r3 * (B[7] + r * B[8] + r2 * B[9] + r3 * B[10])));
  // some more twiddling that adds terms in r and r*r, then return y

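In other words, it works out r so that it is calculating ln(1+r) instead of ln(x). Then it adds together r + a*r^2 + b*r^3 + c*r^4 + d*r^5 + ... + k*r^12. In effect, it does the Taylor series for ln(1+r)! (The stored coefficients are tuned slightly for accuracy, but they sit very close to the textbook values.)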
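
To see the grouping at work, here is a self-contained sketch with the same shape as the excerpt. It is not glibc's code: the table `C` holds the plain Taylor coefficients for ln(1+r) as a stand-in for glibc's `B`, whose values are not reproduced here, and the name `log_near_one` is made up for the example.

```c
/* Stand-in coefficients (an assumption, not glibc's table): the plain
 * Taylor terms for ln(1+r), where C[n] multiplies r^(n+2). */
static const double C[11] = {
    -1.0 / 2,  1.0 / 3,  -1.0 / 4,  1.0 / 5,  -1.0 / 6,  1.0 / 7,
    -1.0 / 8,  1.0 / 9,  -1.0 / 10, 1.0 / 11, -1.0 / 12
};

/* Approximates ln(x) for x close to 1 with a degree-12 polynomial in
 * r = x - 1. */
double log_near_one(double x)
{
    double r = x - 1.0;
    double r2 = r * r;
    double r3 = r * r2;

    /* Terms r^3 through r^12, grouped in threes: each nested r3 factor
     * raises the power of everything inside it by three. */
    double y = r3 * (C[1] + r * C[2] + r2 * C[3]
             + r3 * (C[4] + r * C[5] + r2 * C[6]
             + r3 * (C[7] + r * C[8] + r2 * C[9] + r3 * C[10])));

    /* Add the two leading terms, r and C[0] * r^2. */
    return r + C[0] * r2 + y;
}
```

The grouping means only r, r^2 and r^3 are ever computed as powers; the higher powers appear implicitly through the nesting. The approximation is only good for small r, which is why it is reserved for x near 1.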

Now, given these approximations, we can combine numbers into probabilities (using the sigmoid function, which is in terms of e^x) and find the errors on those probabilities (using the cross entropy, which is in terms of ln(x)). We can build a learning neural network!
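
A minimal sketch of those two building blocks, using only the truncated series from earlier, might look like this. All four function names are illustrative, and the term counts (20 and 200) are arbitrary choices. The ln series only converges for |u| < 1, and slowly near the ends of that range, so this is a demonstration rather than a robust implementation. The sigmoid also needs one division, done here with ordinary C division.

```c
/* Truncated Maclaurin series for e^x: 1 + x + x^2/2! + ... + x^terms/terms!. */
double exp_series(double x, int terms)
{
    double sum = 1.0, term = 1.0;
    for (int n = 1; n <= terms; n++) {
        term *= x / n;  /* term is now x^n / n! */
        sum += term;
    }
    return sum;
}

/* Truncated series for ln(1+u): u - u^2/2 + u^3/3 - ..., for |u| < 1. */
double log1p_series(double u, int terms)
{
    double sum = 0.0, power = 1.0;
    for (int n = 1; n <= terms; n++) {
        power *= u;  /* power is now u^n */
        sum += (n % 2 ? power : -power) / n;
    }
    return sum;
}

/* Sigmoid, 1 / (1 + e^-z), built on the exponential series. */
double sigmoid(double z)
{
    return 1.0 / (1.0 + exp_series(-z, 20));
}

/* Binary cross entropy for a label y (0 or 1) and a prediction p strictly
 * between 0 and 1: -(y ln p + (1 - y) ln(1 - p)).
 * ln p is computed as ln(1 + (p - 1)) so that the series applies. */
double cross_entropy(double y, double p)
{
    double ln_p = log1p_series(p - 1.0, 200);
    double ln_1mp = log1p_series(-p, 200);
    return -(y * ln_p + (1.0 - y) * ln_1mp);
}
```

For example, `sigmoid(0.0)` gives 0.5, and `cross_entropy(1.0, 0.5)` gives about 0.6931, which is ln 2.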

And, more than a century after it was designed, the Difference Engine could still, in principle, run the arithmetic behind it.