Renaming AI


The phrase “artificial intelligence” has long been a pet peeve of mine and of many of my colleagues.

To make matters worse, “AGI” is now being co-opted to mean “improved AI,” essentially ChatGPT 2.0 (take a look at Q*). So I look forward to the unveiling of Ultra Souped Up Amazing Most Honorable AGI in 2030…

In the meantime, however, I have yet to find a better description of the 2010s- and 2020s-era deep learning craze than “differentiable computing.” Now, why not just stick with “deep learning”? Because I think there’s a more abstract, powerful idea hiding inside “differentiable computing.” The notion of a computer that works in continuous function space instead of the discrete space that dominated 20th-century computing and algorithms is impressive enough to stand apart from biological comparison. If we built an airplane, why would we complain it wasn’t an eagle? “AI” probably made sense in the days of Rosenblatt’s perceptron, when we simply had no idea what “true” AI would be. We could afford a sense of optimism.

But the goalposts kept shifting past the 1950s. As smarter people than I have already said, when it works, it’s no longer “AI.” Chess is the model of true intellect…until our computers grow powerful enough that an undergraduate can program a world-class chess engine. Then it’s language, then it’s XYZ, and so forth. No single task or algorithm is likely to define “intelligence.” That way of thinking doesn’t even apply to humans: some geniuses are undeniably talented in one arena and totally crazy in others.

So being pedantic and rejecting “AI” makes sense in the absence of a possible definition. We need to know where we are if we have any chance of getting to human-level generality and competence.

First, let’s disqualify the second-best title I could come up with (perhaps you have others).

Computational statistics.

Certainly I believe this accurately encapsulates a large swath of ML/AI. Statistical models, traditionally speaking, are of a general form:

“y(x) = f(x) + eps”

Here “f(x)” is some underlying “true” or deterministic generating function and “eps” is some noise model. By making assumptions on the noise or the underlying model, we can analytically answer the questions statisticians care about most: does the estimator converge asymptotically? Is it mostly clustered around the true statistic? Is it consistent, unbiased, and so on? But deep learning is mostly indifferent to these questions at present, perhaps not even out of disinterest but out of analytical difficulty (though, shameless plug, my previous lab is working on similar questions). And “computational” at this point is too vague to be helpful. It captures the algorithmic spirit of 2020s “AI,” but when paired with a discipline whose most useful purpose is modeling, “computational statistics” reads like the slight oxymoron “algorithmic models.” To me, it sounds like an iterated procedure for estimating the parameters of an assumed probability distribution, something like Monte Carlo estimation. And some deep models do this as a subroutine, but it’s too specific an answer to capture the real spirit of why any of this stuff works.
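
To make the contrast concrete, here is a minimal sketch of the classical workflow that “computational statistics” evokes for me. It is my own toy example (the names simulate_and_fit and true_slope are illustrative, not from any particular source): simulate “y(x) = f(x) + eps” with a known linear f, fit a simple estimator, and check empirically that it clusters around the truth over repeated draws.

```python
import numpy as np

# Illustrative sketch of the classical "y(x) = f(x) + eps" setup:
# the true generating function is linear with slope 2.0, the noise is Gaussian,
# and we ask whether the least-squares estimator clusters around the truth.
rng = np.random.default_rng(0)
true_slope = 2.0

def simulate_and_fit(n=100):
    x = rng.uniform(-1, 1, n)
    eps = rng.normal(0, 0.5, n)           # noise model
    y = true_slope * x + eps              # y(x) = f(x) + eps
    # least-squares slope estimate (no intercept, for brevity)
    return (x @ y) / (x @ x)

estimates = np.array([simulate_and_fit() for _ in range(2000)])
print(f"mean estimate: {estimates.mean():.3f}  (truth: {true_slope})")
print(f"std of estimate: {estimates.std():.3f}")  # shrinks as n grows: consistency at work
```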

Of course, the real powerhouse of modern deep learning is backpropagation. Is it biologically plausible? No, and that bothers some of us who work on neuroAI. But it’s no particular impediment to learning, much as bolts don’t keep planes from flying just because they aren’t tendons or ligaments. Backprop seems unlikely to evaporate in the next decade of ML research, and it’s only one specific way to traverse a loss function. The world of continuous functions turns out to work for a lot of hard tasks. As a mathematician/statistician who hates working with discrete probability distributions, I love that a lot of difficult discrete problems wash away under continuous optimization. My thesis (unlinked because I hate it) grows out of the idea that minimizing the L_0 norm of a vector “x” (i.e., its number of nonzero elements) subject to a linear system “Ax=b” is an NP-hard problem that is well-approximated, up to some constant that only depends on “A”, by a simple convex programming problem. Amazing!
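
For the curious, here is a rough sketch of that kind of relaxation in practice. It is a standard basis-pursuit example of my own construction (the sizes and random seed are arbitrary), not anything lifted from the thesis: replace the L_0 objective with the L_1 norm, which turns the problem into a linear program, and for sufficiently sparse “x” the convex solution often matches the combinatorial one exactly.

```python
import numpy as np
from scipy.optimize import linprog

# Toy example: recover a sparse x from an underdetermined system Ax = b by
# solving the convex relaxation  min ||x||_1  s.t.  Ax = b  (basis pursuit),
# written as a linear program with x = u - v and u, v >= 0.
rng = np.random.default_rng(0)
n, m, k = 20, 60, 3                     # measurements, unknowns, sparsity
A = rng.standard_normal((n, m))
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

c = np.ones(2 * m)                      # objective: sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])               # A(u - v) = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * m))
x_hat = res.x[:m] - res.x[m:]

# Often True when x_true is sparse enough relative to the number of measurements.
print(np.allclose(x_hat, x_true, atol=1e-6))
```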

There are plenty of problems like this. And each is a matter of many different fields: computational, numerical, and mathematical. A logical name for a field describes the skills required for mastery of that field, or how it naturally evolved from its predecessors. “AI” didn’t emerge from neuroscience or psychology labs, and “AGI” is unlikely to emerge from such labs as long as they take the approach of trying to correlate artificial representations with biological ones (I see a LOT of these papers). And to call it pure CS is wrong, too, because deep learning takes so many ideas from other fields. Calculus, linear algebra, and algorithms live inside the phrase “differentiable computing.”

Will backprop get us to AGI? I don’t know. Maybe? Probably not. I don’t know what I don’t know.

Airplanes won’t get us to Mars, but the engineering challenge of mass-producing airplanes is likely a subproblem you have to solve in order to get to Mars. Similarly, stacking a bunch of weird, somewhat arcane matrix multiplications on top of each other may not be “the way” to replicate human consciousness. But the idea of tiny refinements to some loss function (whether global, local, or multiscale) spread out across a network of abstract, computational units? That seems groundbreaking to me. It’s not how we’ve typically thought of a computer.
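
If that sounds hand-wavy, here is a minimal sketch of exactly that idea, again a toy of my own making (sizes, learning rate, and target are arbitrary): a one-hidden-layer network nudged toward a simple target by repeated tiny gradient steps, with the chain rule written out by hand.

```python
import numpy as np

# Tiny refinements to a loss, spread across a network of computational units:
# a one-hidden-layer tanh network fit to sin(x) with hand-written backprop.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)

W1 = rng.standard_normal((1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    # forward pass: two differentiable layers
    h = np.tanh(x @ W1 + b1)
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # backward pass: chain rule, layer by layer
    d_yhat = 2 * (y_hat - y) / len(x)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_h
    db1 = d_h.sum(axis=0)

    # the "tiny refinement": a small step against the gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.4f}")
```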

So I’ll try to stop rolling my eyes when I see a researcher sell “XYZ task we’ve done for 25 years…but NOW with deep learning!” It’s not the instance of the principle that matters; it’s the higher-level paradigm. Differentiable computing already describes a lot more than deep learning, and I predict that whatever subsumes backprop will keep the principle of traversing smooth functions. Smooth functions are such a fundamental mathematical object that they’ll be hard to supplant without inventing entirely new branches of mathematics. Just, please, before that happens in another 50 years, can we stop saying AI? Please?