The Difference Engine – the Charles Babbage machine, not the steampunk novel – is a device for tabulating successive values of a polynomial by repeated addition: each new output is built up from the finite differences that each term contributes between one input value and the next.
This sounds like a fairly niche market, but in fact it’s quite useful, because a whole lot of other functions can be approximated by polynomials. The approach, which comes from calculus, generates a Taylor series (or a Maclaurin series, if the approximation is for input values near zero).
Now, it happens that this collection of other functions includes logarithms: \(\ln(1+x) \approx x - x^2/2 + x^3/3 - x^4/4 + \ldots\)
and exponents: \(e^x \approx 1 + x + x^2/2! + x^3/3! + x^4/4! + \ldots\)
and so, given a difference engine, you can make tables of logarithms and exponents.
In fact, your computer is probably using exactly this approach to calculate those functions. Here’s how glibc calculates ln(x) for x roughly equal to 1:

    r = x - 1.0;
    r2 = r * r;
    r3 = r * r2;
    y = r3 * (B[1] + r * B[2] + r2 * B[3]
            + r3 * (B[4] + r * B[5] + r2 * B[6]
            + r3 * (B[7] + r * B[8] + r2 * B[9] + r3 * B[10])));
    // some more twiddling that adds terms in r and r*r, then return y
In other words, it works out r so that it is calculating ln(1+r) instead of ln(x). Then it adds together r + a*r^2 + b*r^3 + c*r^4 + d*r^5 + ... + k*r^12 – it does the Taylor series for ln(1+r), truncated at the twelfth power.
Now, given these approximations, we can combine numbers into probabilities (using the sigmoid function, which is defined in terms of e^x) and find the errors on those probabilities (using cross entropy, which is defined in terms of ln(x)). We can build a learning neural network!
And, more than a century after the machine was designed, we could still do it all on the Difference Engine.