- When is the next ALPAC report due?
— Margaret King, 1984, University of Geneva, Switzerland
- Reproducible Research and the Common Task Method
— Mark Liberman, 2015, Simons Foundation Lectures
Machine translation has a somewhat chequered history — Margaret King
There were already proposals for automatic translation systems in the 1930s, but it was not until after the Second World War that real enthusiasm led to heavy funding and unrealistic expectations. [King 1984]
The start of intensive work on MT is usually taken to be a memorandum by Warren Weaver, in which he likens the problem of MT to the problem of code breaking: "… If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?" (Weaver 1949) [King 1984]
Weaver’s memorandum led to a great deal of activity and eventually to the first conference on the topic, organized by Bar-Hillel in 1952.
At this conference, optimism reigned. Afterwards, at Georgetown University, L.E. Dostert started up an MT project with the declared aim of building a pilot system to convince potential funding agencies of the feasibility and the practicability of MT. This led in 1954 to the famous Georgetown experiment, a pilot system translating from Russian to English, which was hailed as an unqualified success. During the next ten years, over 20 million dollars were invested in MT by various US government agencies, according to a widely cited figure from ALPAC. [King 1984]
Most of the systems developed were almost totally empirical: syntactic analysis was only done at a local, word-centred level, and both the so-called syntax and the dictionary compilation were very narrowly corpus based. The problem of MT was perceived as being an engineering problem requiring clever programming rather than linguistic insight. The growing perception that brute force was not enough came too late to save research in the US. In 1964, the US National Academy of Sciences set up an investigatory committee, the Automatic Language Processing Advisory Committee (ALPAC), with the task of investigating the results so far obtained and advising on funding. [King 1984]
The committee came to a strongly negative conclusion. The ALPAC report effectively killed MT research in the States, although some European projects survived. [King 1984]
In the years since the ALPAC report, two trends can be distinguished: a) systems which still aim at no significant human intervention, but accept pre- and post-editing, and b) interactive systems which aim primarily at being translators’ aids. [King 1984] In other words: human-aided MT vs. machine-aided human translation.
How was the winter overcome?
Partially because the development of commercial systems renewed faith in the feasibility of MT, … above all because of the growing and pressing need for translation, research in MT has begun to revive. [King 1984]
When we revived empiricism in the 1990s, we chose to reject the position of our teachers for pragmatic reasons. Data had become available like never before. We argued that it is better to do something simple than nothing at all. Let’s go pick some low hanging fruit. While trigrams cannot capture everything, they often work better than the alternatives. It is better to capture the agreement facts that we can capture easily, than to try for more and end up with less. [Church 2007 A Pendulum Swung Too Far] Pragmatic indeed. I later found that this matches the advice on modesty in [Crummy MT] strikingly well. (It is probably better to do something modest, than try to do too much and end up accomplishing too little.)
Wayne’s emphasis on evaluation: funding restarted in 1986, which led to three decades of prosperity. Mark Liberman attributes the 1986 funding restart to a particular DARPA program manager, Charles Wayne, who had a new idea to protect against ‘glamour and deceit’: evaluation.
1985: should DARPA restart HLT (human language technology)?
Charles Wayne, on loan to DARPA from the NSA, had an idea: he would design a speech recognition research program that
- Protects against “glamour and deceit” because
– There is a well-defined, objective eval metric,
– applied by a neutral agent (NIST)
– on shared datasets
- Ensures that “simple, clear, sure knowledge is gained”, because
– Participants must reveal their methods to the sponsor and to one another
– at the time that the eval results are presented
[Liberman’s Talk 2015]
There would be a well-defined objective evaluation, applied by a neutral agent on shared datasets (many of which were distributed by the Linguistic Data Consortium). Though “it’s like being in first grade again — you’re told exactly what to do, and then you’re tested over and over”, Wayne’s idea eventually succeeded because “it worked”. It enabled funding to start because the project was glamour-and-deceit-proof, and to continue because funders could measure progress over time. [Church 2017 A tribute to Charles Wayne]
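To make the shape of such an evaluation concrete, here is a minimal sketch of a common-task scoring loop. It is an illustration only, not the procedure NIST actually used: the choice of word error rate as the metric, the function names (`edit_distance`, `score_system`, `rank_systems`), and the toy test set are all assumptions made for this example. The point is simply that every participant is scored by the same code on the same data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for this sketch: a shared test set is a list of
# (input id, reference transcript) pairs, and a system maps an input id
# to its hypothesis transcript.
Example = Tuple[str, str]
System = Callable[[str], str]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def score_system(system: System, test_set: List[Example]) -> float:
    """Corpus-level word error rate: total edits / total reference words."""
    total_edits = 0
    total_words = 0
    for input_id, reference in test_set:
        ref_words = reference.split()
        hyp_words = system(input_id).split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("test set contains no reference words")
    return total_edits / total_words


def rank_systems(
    systems: Dict[str, System], test_set: List[Example]
) -> List[Tuple[str, float]]:
    """Score every system on the same data; lower error rate ranks first."""
    scores = [(name, score_system(fn, test_set)) for name, fn in systems.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data standing in for a shared, publicly distributed test set.
    shared_test_set = [("utt1", "the cat sat"), ("utt2", "on the mat")]
    systems = {
        "system_a": lambda i: {"utt1": "the cat sat", "utt2": "on the mat"}[i],
        "system_b": lambda i: {"utt1": "the cat", "utt2": "on a mat"}[i],
    }
    for name, wer in rank_systems(systems, shared_test_set):
        print(f"{name}: WER = {wer:.3f}")
```

Because the metric and the data are fixed in advance and applied by one party, results from different sites and different years are directly comparable, which is what lets funders measure progress over time.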
The recent revival in empiricism has been fueled by three developments. 1) Computers are much more powerful and more available. 2) Data have become much more available. Data-intensive methods are no longer restricted to those working in affluent industrial laboratories. 3) (Perhaps most importantly) due to various political and economic changes, there is a greater emphasis these days on deliverables and evaluation. This closes the loop: data collection efforts have been relatively successful in responding to these pressures by delivering massive quantities of data. [Church & Mercer 1993 Special Issue on CL using Large Corpora]
We are in a different world: computers used to be so expensive that the market was limited to potential buyers that could afford them (mostly large enterprises and government). But now the buyers of what we have to sell are increasingly consumers. We are in a new world order where government and enterprise markets have been eclipsed by consumer markets. [Church 2017 A tribute to Charles Wayne]
Science is different (but not that different). Sharing data and problems a) lowers costs and barriers to entry, b) creates intellectual communities, c) speeds up replication and extension and d) guards against “glamour and deceit”, as well as simple confusion. [Liberman’s Talk 2015]