The Future of Computational Linguistics: On Beyond Alchemy
— Kenneth Church & Mark Liberman, 2021
1950s empiricism: information theory, AI as applied statistics
1970s rationalism: formal language theory and logic
1990s empiricism: stochastic grammars (probability & preference)
2010s empiricism: deep nets
The Past
- CL is an interdisciplinary field that has at times been closer to Linguistics, but is currently closer to Computer Science (Engineering), and especially Machine Learning.
- To understand the future of our field, we need to understand its past, which we will describe as tribes of researchers migrating through a changing conceptual and socio-economic landscape. In the history of AI, logic and probability have sometimes co-existed in harmony. The conceptual split implicit in Boole's conjunction "Theories of Logic and Probabilities" (in his seminal 1854 book on mathematical logic, "An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities") foreshadowed much of what was to come. A century later, Claude Shannon made two crucial contributions, one on each side of the divide. On the logic side, his 1937 Master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", made digital computers possible by translating expressions of Boolean logic into electro-mechanical terms. On the probability side, his 1948 paper "A Mathematical Theory of Communication" introduced information theory, based on the assumption that messages are sequences of symbols generated by a stochastic process.
- Logic played a larger role when rationalism was in fashion, and probability played a larger role when empiricism was in fashion; both logic and probability faded into the background as deep nets gave procedural life to an associationist (rather than statistical) flavor of empiricism.
- The pendulum paper predicted that the next generation would soon rebel against us. Instead of a return to Rationalism, though, the rebellion took an unexpected turn with the revival of Connectionism.
- In fact, a third concept, Connectionism, arose in Ronald Fisher's 1936 paper "The use of multiple measurements in taxonomic problems", which proposed a method for dividing multivariate measurements into categories by thresholding a linear combination of the measurement vectors (Linear Discriminant Analysis; see the sketch after this list). This is where the main story begins: generalizations of this idea, networks of matrix multiplications interconnected via point non-linearities, have risen to prominence under a variety of names: "perceptron", "neural nets", "parallel distributed processing", "connectionism", and most recently "deep learning". Fisher's contributions have been extremely influential and live on to this day, though the 1936 paper was published in a journal on Eugenics, a topic that has since fallen out of favor.
- It was an extension of Fisher's discriminant analysis that succeeded in the neural-net arena, starting with Frank Rosenblatt's 1958 paper "The perceptron: a probabilistic model for information storage and organization in the brain". The phrase "probabilistic model" in Rosenblatt's title seems to mean only that most of the values used in l
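As a concrete illustration of the "thresholding a linear combination" idea behind Fisher's Linear Discriminant Analysis (the sketch promised above), here is a minimal example. It is not from the paper: the two-class Gaussian data, the random seed, and the `predict` helper are hypothetical, chosen only to show the mechanics of projecting a measurement vector onto a learned direction and comparing against a threshold.

```python
# Minimal sketch of Fisher's linear discriminant (two classes, 2-D features).
# All data below is synthetic and illustrative, not from the 1936 paper.
import numpy as np

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))  # hypothetical class "a"
class_b = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(100, 2))  # hypothetical class "b"

# Fisher's discriminant direction: w is proportional to S_w^{-1} (mu_a - mu_b),
# where S_w is the pooled within-class covariance.
mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
s_w = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
w = np.linalg.solve(s_w, mu_a - mu_b)

# Decision rule: threshold the linear combination w . x at the midpoint of the
# projected class means.
threshold = 0.5 * (w @ mu_a + w @ mu_b)

def predict(x):
    """Assign 'a' if the linear combination exceeds the threshold, else 'b'."""
    return "a" if w @ x > threshold else "b"

print(predict(np.array([0.2, -0.1])))  # near the class-"a" mean -> 'a'
print(predict(np.array([2.8, 2.3])))   # near the class-"b" mean -> 'b'
```

The same two ingredients, a linear combination of the inputs followed by a decision non-linearity, are what the networks named above stack and interconnect; the perceptron adds a learning rule for choosing w from labeled examples.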