Microsoft Corp. researchers in Beijing are using data mined from the Web to enhance an online Chinese-English dictionary and language-practice service, a technique that could one day be used in similar tools for anyone learning any language.
微软公司(Microsoft Corp.)在北京的研究人员正在使用从网上搜集的数据,来完善一个在线英-汉辞典与语言练习服务。这种技术有朝一日可能会被用在类似的工具当中,供任何人学习任何语言使用。
Engkoo, at www.engkoo.com, takes its Chinese name, 英库, from two characters meaning 'English' and 'vault.' At its core is professionally produced translation data that Microsoft draws from sources such as existing dictionaries, used under license from their publishers. In Engkoo's database, that content is mixed with data that Microsoft finds through other means, including sweeping the Web for sites with parallel Chinese and English versions. Microsoft's machines align the two versions of a site, then their paragraphs, sentences and individual words, and assign a quality ranking to the resulting translation before filing it away. Engkoo is a finalist in this year's Asian Innovation Awards.
这个服务名为"英库"(字面意为"英语文库",网址:www.engkoo.com),其核心是专业机构生产的翻译数据,由微软从已发行辞书等渠道获得,其使用获得了相关出版机构的许可。在英库的数据库里,这些内容同微软经由其他渠道(比如在网上寻找中英双语网站)获得的数据混合在一起。微软的机器把中文版和英文版网站及其段落、句子和单词进行匹配,然后对由此形成的译文进行质量排行,并存档。英库入围了今年的"亚洲创新奖"(Asian Innovation Awards)。
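Microsoft does not describe how Engkoo aligns parallel pages, so the following is only an illustration of the general idea: a minimal Python sketch of a length-based sentence aligner, a simplified take on the classic Gale and Church approach rather than Microsoft's method. It assumes both pages are already split into sentences and that sentence lengths in characters are roughly proportional across the two languages. The `ratio` parameter, the penalties in `BEAD_PENALTY`, and the quality score `1 / (1 + cost)` are hypothetical values chosen for illustration, not figures from Engkoo.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical, untuned penalties for each bead type
# (number of source sentences, number of target sentences).
# 1-1 matches are cheapest; insertions and deletions are most expensive.
BEAD_PENALTY: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0,
    (2, 1): 1.5,
    (1, 2): 1.5,
    (1, 0): 3.0,
    (0, 1): 3.0,
}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Penalize target lengths that stray from ratio * source length."""
    expected = ratio * src_len
    return abs(tgt_len - expected) / math.sqrt(expected + tgt_len + 1.0)


def align(src: List[str], tgt: List[str], ratio: float
          ) -> List[Tuple[List[str], List[str], float]]:
    """Return aligned (source, target, quality) beads; quality is in (0, 1]."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # best[i][j]: cheapest total cost of aligning src[:i] with tgt[:j].
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: (previous i, previous j, cost of the bead ending here).
    back: List[List[Optional[Tuple[int, int, float]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0
    # Row-major order visits every cell after all of its predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                bead = penalty + length_cost(src_len, tgt_len, ratio)
                if best[i][j] + bead < best[ni][nj]:
                    best[ni][nj] = best[i][j] + bead
                    back[ni][nj] = (i, j, bead)
    # Trace the cheapest path back from the end, scoring each bead.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj, bead = back[i][j]
        beads.append((src[pi:i], tgt[pj:j], 1.0 / (1.0 + bead)))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, `align(["Hello.", "How are you today?"], ["你好。", "你今天好吗？"], ratio=0.4)` pairs each English sentence with its Chinese counterpart and gives each pair a quality score. Because a merge is cheaper than a deletion under these penalties, an unmatched extra sentence tends to be folded into a neighbor; real aligners commonly add lexical cues, such as dictionary matches, on top of length.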
When an Engkoo user types a word or sentence into the website's input bar, in either Chinese or English, the site draws on statistics from its vault of data to translate it. It also displays sample sentences that use similar words and, in many cases, links to where it found them.
用户往英库的输入框里键入一个词汇或句子时,不管是中文还是英文,网站都会从数据库里提取资料,给出这个词汇或句子的译文。它还显示使用类似词汇的例句,在很多情况下还提供指向出处的链接。
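The article does not say how Engkoo picks its sample sentences. As a hypothetical sketch, assume the vault stores aligned sentence pairs with a quality score and an optional source URL (the `SentencePair` fields and the `sample_sentences` function below are invented for illustration). A lookup could then rank pairs by how many words they share with an English query, break ties by quality, and fall back to substring matching for Chinese queries, since Chinese text has no spaces between words.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SentencePair:
    # Hypothetical record layout for one aligned pair in the vault.
    english: str
    chinese: str
    quality: float             # alignment quality score, higher is better
    source_url: Optional[str]  # page the pair was found on, if known


def english_words(text: str) -> Set[str]:
    """Lower-cased English word tokens (Chinese characters are ignored)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sample_sentences(query: str, vault: List[SentencePair], limit: int = 3
                     ) -> List[SentencePair]:
    """Return up to `limit` pairs matching the query, best matches first."""
    query_words = english_words(query)
    scored = []
    for pair in vault:
        if query_words:
            # English query: count words shared with the English side.
            overlap = len(query_words & english_words(pair.english))
        else:
            # No Latin-letter words: treat the query as Chinese, match by substring.
            text = query.strip()
            overlap = 1 if text and text in pair.chinese else 0
        if overlap:
            scored.append((overlap, pair.quality, pair))
    # Most shared words first; break ties with the higher-quality pair.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [pair for _, _, pair in scored[:limit]]
```

Returning the whole `SentencePair`, rather than just the sentence text, keeps `source_url` available, so a page can link back to where each example was found when that information exists.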
Sourcing translations from the Internet can help the database keep pace with evolving language, such as new colloquial or technical terms, the researchers say. Engkoo users can also report translations that look wrong. Human editors fix any serious errors and, where possible, improve the technology to prevent the problem from recurring. 'This is a system that gets smarter over time,' said Matt Scott, a development group head at Microsoft Research Asia. 'We want translation to reflect the Web.'
研究人员说,从网上获取译文有助于数据库同不断变化的语言保持同步,比如新的口头语或科技术语。英库用户还可以举报看起来不对的翻译。如果有严重错误,会由编辑予以纠正,可以的话还会改进技术,防止问题再次发生。微软亚洲研究院(Microsoft Research Asia)一个开发小组的负责人斯科特(Matt Scott)说,这是一个可以变得越来越智能的系统,我们希望译文反映网上的情况。
Statistical machine learning for translation is now widely researched and is also used by some other websites, such as Google Translate. But the researchers behind Engkoo are also tapping other technologies to expand their website's range of language-practice tools. For many English sample sentences on the site, users can listen to audio dictations, which a machine generates from collected recordings of native English speakers talking. The dictations are meant to imitate human inflection, though their rises and falls don't yet match those of a natural voice.
让统计机器学做翻译服务,目前受到了广泛的研究,也在谷歌翻译(Google Translate)等其他一些网站得到了运用。但英库的研究人员们也在利用其他技术来扩展网站语言练习工具的范围。网站上很多英语例句用户都可以听到语音朗读,而这些语音,则是由机器以所搜集到的语音文件(英语为母语的人士说话)为基础而生成的。语音朗读意在摹仿人声的声调变化,但实际上还赶不上自然语音的抑扬顿挫。
Microsoft's researchers are also working on a video dictation feature for Engkoo. The few videos already on the site were created in a similar way to the audio dictations: by a machine drawing on sample video of an English speaker talking. The goal is for users to be able to watch and learn from the lip movements of a native speaker dictating any sentence, even though each video is machine-generated.
微软的研究人员还在为英库开发一个视频朗读功能。网站上已经有了几个视频,其创建方式与语音朗读功能类似,即由一台机器汇集一位英语人士讲话的视频片断。其目标是让用户能够观看英语为母语的人士在阅读时的嘴唇运动,并从中学习,尽管每一个视频都是机器生成的。
Because tongue movements are also crucial to pronunciation but normally hidden from view, the researchers are gathering ultrasound data that could be used to generate a parallel set of videos on Engkoo. One option is to turn the black-and-white ultrasound footage into more appealing cartoon animation showing users how a native speaker's tongue moves while speaking a sentence, said Frank Soong, a principal researcher at Microsoft Research Asia.
舌头的运动对于发音也很关键,但常常看不到,所以研究人员正在搜集超声波数据,以便在英库上面生成一系列类似的视频。微软亚洲研究院主任研究员宋歌平(Frank Soong)说,一个办法是把黑白版的超声波录像转变为更加吸引人的动画,让用户看到母语为英语的人在说一句话时,其舌头究竟是怎样运动的。
Engkoo was launched last year and gets more than four million visitors a month, according to Microsoft. Microsoft researchers are also developing a mobile Engkoo application for phones running a version of Windows, and apps for other mobile operating systems are under consideration, Mr. Scott said.
微软方面说,英库于去年上线,每月访客数量超过400万。斯科特说,微软研究人员另外也在开发一款英库手机应用软件,提供给装载微软操作系统的手机使用,用于其他操作系统手机的应用程序也在考虑之中。
The China version of Microsoft's Bing search engine already links to Engkoo, and the researchers are talking with colleagues at Microsoft about other products into which Engkoo could be integrated, said Eric Chang, director of technology strategy at Microsoft Research Asia.
微软亚洲研究院技术战略总监张益肇(Eric Chang)说,微软Bing搜索引擎的中国版"必应"上已经有英库的链接,研究人员们也在同微软的同事们讨论,看其他还有哪些产品也可以把英库整合进去。
Engkoo is free online, but a Microsoft Research spokeswoman declined to comment on whether Engkoo's mobile apps will be free or could contain ads. People who use Engkoo may also use Bing services, which could eventually help drive advertising revenue for Microsoft.
英库在线版是免费的。但英库手机应用是不是免费,或者是不是有可能包含广告?对此,微软研究院一位发言人拒绝置评。使用英库的人可能也使用必应,而这最终有可能帮助微软提升广告收入。
Microsoft's researchers plan versions of Engkoo for other languages too, including Japanese and English. A version for English speakers learning Chinese is also a goal, but the company's research so far has focused on Chinese-to-English, Mr. Chang said.
微软的研究人员们还打算开发其他语言版本的英库,包括日语和英语。张益肇说,他们还有一个目标是推出一个帮助说英语的人学习汉语的版本,但目前公司的研究还是集中在中文和英文的转换上面。
Mr. Chang sees the mix of technologies being used by Engkoo as a step toward breaking down language barriers. Further technological advances may one day mean that, for instance, an English speaker in China could attend a university lecture in Mandarin and have no problem understanding the content.
张益肇认为,英库上面采用的一系列技术朝着打破语言樊篱的方向迈出了一步。随着技术的继续进步,到某一天,一个说英语的人或许就能够去中国的大学听普通话课,讲课的内容理解起来也没有障碍。
'You could be sitting there, and then your mobile phone's actually doing real-time translation,' Mr. Chang said. 'Technology can really help in terms of reducing barriers in the use of language. So Engkoo is definitely a way for us to get more feedback on how people can utilize technology to do that.'
张益肇说,你可以就坐在那里,然后你的手机实际上是在做同声传译;科技真的可以起到减少语言使用障碍的作用,所以英库无疑是我们获得更多反馈的一种方式,看人们究竟可以怎样利用科技来减少这种障碍。