Correct me if I’m wrong...
 
There is often something sweet, intimate even, about couples who finish each other’s sentences. But it can also be a source of irritation, especially when they get it wrong. A similar irritation (minus the sweetness) is often felt by users of speech-recognition software, which still manages to garble and twist even the most clearly spoken words. Perhaps the solution lies in a more intimate exchange between user and software.
Modern speech-recognition programs do not merely try to identify individual words as they are spoken; rather, they attempt to match whole chunks of speech with statistical models of phrases and sentences. The rationale is that by knowing statistical rules of thumb for the way in which words are usually put together (an abstract probabilistic approximation of grammar, if you will), it is possible to narrow the search when attempting to identify individual words. For example, a noun phrase will typically consist of a noun preceded by one or more modifiers, such as an article and possibly an adjective. So if part of a speech pattern sounds like “ball”, the odds of it actually being “ball” will increase if the utterances preceding it sound like “the” and “bouncy”.
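A rough sketch of that rescoring, assuming a toy bigram model (the probability of a word given only the word before it) and invented acoustic scores. The tables, the `FLOOR` value and the `rescore` function are hypothetical illustrations, not taken from any real recogniser.

```python
# Hypothetical bigram probabilities P(word | previous word); the numbers are invented.
BIGRAM = {
    ("bouncy", "ball"): 0.3,
    ("bouncy", "bowl"): 0.001,
    ("bouncy", "bawl"): 0.0001,
}
FLOOR = 1e-6  # crude stand-in for smoothing: score given to unseen word pairs

# Hypothetical acoustic likelihoods P(sound | word) for the final sound:
# on acoustics alone, "ball" is the weakest candidate.
ACOUSTIC = {"ball": 0.30, "bowl": 0.35, "bawl": 0.35}

def rescore(previous_word, candidates):
    """Weight each candidate's acoustic score by the language model, then normalise."""
    scores = {
        w: ACOUSTIC[w] * BIGRAM.get((previous_word, w), FLOOR) for w in candidates
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

posterior = rescore("bouncy", ["ball", "bowl", "bawl"])
for word, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")
```

On acoustics alone “bowl” and “bawl” edge out “ball”, but once weighted by the bigram model “ball” takes roughly 99.6% of the probability. A bigram model looks back only one word, so only “bouncy” counts here; a trigram model would also condition on “the”.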
Although this so-called continuous speech-recognition approach has indeed improved accuracy, it is by no means infallible. Moreover, when it gets things wrong, it often does so spectacularly. The problem is that, as a direct consequence of this technique, the misidentification of even a single word can take the program off on a completely different path as it tries to predict what the rest of the sentence is likely to be.
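The cascade can be reproduced with a deliberately simple, hypothetical greedy decoder that commits to one word at a time and lets a toy bigram model steer each later choice. Real recognisers typically keep several hypotheses alive (beam search), which softens this effect; the vocabulary, probabilities and `greedy_decode` function here are invented for illustration.

```python
# Hypothetical bigram probabilities P(word | previous word); the numbers are invented.
BIGRAM = {
    ("book", "a"): 0.5, ("book", "at"): 0.02,
    ("look", "at"): 0.5, ("look", "a"): 0.02,
    ("a", "table"): 0.2, ("a", "tables"): 0.001,
    ("at", "tables"): 0.05, ("at", "table"): 0.005,
}
FLOOR = 1e-6  # score for word pairs the toy model has never seen

def greedy_decode(slots):
    """Pick the best word for each slot, left to right, never revisiting a choice."""
    words = []
    for candidates in slots:
        prev = words[-1] if words else None
        def score(word):
            # First word: acoustics only; afterwards, weight by P(word | prev).
            lm = BIGRAM.get((prev, word), FLOOR) if prev is not None else 1.0
            return candidates[word] * lm
        words.append(max(candidates, key=score))
    return words

# The acoustic evidence for the last two slots is identical in both runs;
# only the first slot tips the other way.
later = [{"a": 0.5, "at": 0.5}, {"table": 0.5, "tables": 0.5}]
clear = [{"book": 0.52, "look": 0.48}] + later
noisy = [{"book": 0.48, "look": 0.52}] + later

print(greedy_decode(clear))  # ['book', 'a', 'table']
print(greedy_decode(noisy))  # ['look', 'at', 'tables']
```

A small shift in the first word’s acoustic score changes every word that follows, because each later prediction is conditioned on the earlier mistake.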
Though such errors are inevitable, there may be a way to let speech-recognition programs take the pain out of making corrections. Per Ola Kristensson and Keith Vertanen, at the University of Cambridge’s Computer Laboratory, have developed a method of allowing speech-recognition programs to share their thoughts, as it were, with the user, in order to speed up the correction process. Their solution, called Parakeet, is a touch-screen-based interface for phones and other mobile devices which displays not only the words, phrases or sentences that scored highest in the program’s statistical model but also any close contenders. This allows the user to select alternatives easily, with a quick tap of the finger. More subtly, if none of the predicted sentences is entirely correct, yet collectively they contain the words that were spoken, the user can simply slide his finger across the appropriate words to link them up.
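The “slide across the right words” idea can be sketched as follows. This is a hypothetical simplification: it assumes the recogniser’s alternatives arrive as an n-best list of hypotheses already aligned word for word (same length), and the functions `alternatives_by_position` and `slide_select` are invented names. Parakeet’s actual internal representation is not described in this article, and real alignments must also cope with inserted and dropped words.

```python
def alternatives_by_position(nbest):
    """Collect the distinct words each hypothesis proposes for every word slot.

    Assumes the hypotheses are already aligned word-for-word (same length).
    """
    if len({len(h) for h in nbest}) != 1:
        raise ValueError("this sketch needs equal-length, aligned hypotheses")
    columns = []
    for position in range(len(nbest[0])):
        column = []
        for hypothesis in nbest:  # best-scoring hypothesis first
            if hypothesis[position] not in column:
                column.append(hypothesis[position])
        columns.append(column)
    return columns

def slide_select(columns, picks):
    """Model a finger slide as choosing one word (by index) from each column."""
    if len(picks) != len(columns):
        raise ValueError("pick exactly one word per column")
    return " ".join(column[i] for column, i in zip(columns, picks))

# Hypothetical 3-best list for the spoken sentence "the bouncy ball rolled away";
# no single hypothesis is right, but together they contain every spoken word.
nbest = [s.split() for s in (
    "the bouncy bowl rolled away",
    "the bouncing ball rolled away",
    "a bouncy ball rowed away",
)]
columns = alternatives_by_position(nbest)
print(columns)
print(slide_select(columns, [0, 0, 1, 0, 0]))  # the bouncy ball rolled away
```

One gesture across five columns recovers a sentence that none of the three hypotheses contained on its own.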
In a sense, all Parakeet is doing is allowing the user to see which alternative words or sentences the program would have predicted. The difference is that existing programs require the user to correct each word individually, from a drop-down list of alternatives, or else to retype or reutter the words. What is frustrating about this, says Dr Kristensson, is that more often than not the correct strings of words were recognised, but rejected by the speech-recognition program on statistical grounds. Parakeet makes them all available to the user.
The prototype uses an open-source speech-recognition program called Pocket Sphinx, developed at Carnegie Mellon University, in Pittsburgh, but Dr Kristensson reckons it would be easy to apply the same approach to commercially available programs like Nuance’s Dragon. So far Dr Kristensson and Dr Vertanen have carried out only limited trials on a handful of people. Even so, these have achieved text-entry rates of around 22 words per minute, considerably higher than the 16 words per minute an average user can achieve with predictive texting. With the likes of Google, Nuance and Vlingo now offering mobile speech-recognition services for phones, and with in-car entertainment and communication systems such as Ford’s Sync platform under development, Parakeet may be flying into a growing market.
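To put the two quoted rates in proportion, here is a rough worked comparison. It uses only the 22 and 16 words-per-minute figures above (which come from small, limited trials) plus a hypothetical 110-word message, a length chosen purely to give round numbers.

```latex
\frac{22\ \text{wpm}}{16\ \text{wpm}} = 1.375
\quad\Rightarrow\quad \text{about } 37.5\%\ \text{more words per minute}

\frac{110\ \text{words}}{22\ \text{wpm}} = 5\ \text{min}
\qquad\text{versus}\qquad
\frac{110\ \text{words}}{16\ \text{wpm}} \approx 6.9\ \text{min}
```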