An Introduction to GIZA++

GIZA++ is a freely available implementation of the IBM Models. We need it as an initial step to establish word alignments. Our word alignments are taken from the intersection of bidirectional runs of GIZA++ plus some additional alignment points from the union of the two runs.
　　Running GIZA++ is the most time-consuming step in the training process. It also requires a lot of memory (1-2 GB RAM is common for large parallel corpora).
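As a rough illustration of "intersection plus some points from the union", the sketch below starts from the intersection of the two directional alignments and then adds union points that touch an existing point and cover a word that is still unaligned. The function name and the exact growing rule are assumptions made for illustration; the heuristics Moses actually uses (such as grow-diag-final) are more involved.

```python
# Simplified symmetrization sketch: intersection plus neighbouring union points.
# This is NOT the exact Moses heuristic; it only illustrates the idea.

def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two sets of (source_index, target_index) alignment points."""
    intersection = src_to_tgt & tgt_to_src
    union = src_to_tgt | tgt_to_src
    alignment = set(intersection)
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    added = True
    while added:  # repeat until a full pass adds nothing new
        added = False
        for i, j in sorted(union - alignment):
            # The candidate must be adjacent to a point that is already accepted.
            touches = any((i + di, j + dj) in alignment for di, dj in neighbours)
            # At least one of its two words must still be unaligned.
            src_free = all(a != i for a, _ in alignment)
            tgt_free = all(b != j for _, b in alignment)
            if touches and (src_free or tgt_free):
                alignment.add((i, j))
                added = True
    return alignment


if __name__ == "__main__":
    forward = {(0, 0), (1, 1), (2, 1), (5, 0)}   # toy source-to-target run
    backward = {(0, 0), (1, 1), (2, 2)}          # toy target-to-source run
    print(sorted(symmetrize(forward, backward)))
```

In the toy data, the isolated point (5, 0) appears in only one direction and touches nothing in the intersection, so it is left out.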
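　　GIZA++ learns iteratively in both directions of the bilingual sentence pairs, and the vast majority of model training time is spent on the iterative training in these two directions. Moses therefore provides a first, simple remedy: training in parallel.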
　　Using the --parallel option will fork the training script and run the two directions of GIZA++ as independent processes. This is the best choice on a multi-processor machine.
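The idea of running the two directions as independent processes can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the Moses training script is implemented; `run_in_parallel` is a hypothetical helper, and the two commands are placeholders standing in for the two GIZA++ runs.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command as its own process, then wait for all of them."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    # wait() returns each process's exit code, in the order the commands were given.
    return [p.wait() for p in processes]


if __name__ == "__main__":
    # Placeholder commands; a real setup would launch the two GIZA++ directions.
    direction_1 = [sys.executable, "-c", "print('direction 1 done')"]
    direction_2 = [sys.executable, "-c", "print('direction 2 done')"]
    print(run_in_parallel([direction_1, direction_2]))
```

Because both processes are started before either is waited on, they can occupy two processors at the same time.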
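　　If you want to run the two GIZA++ processes in parallel on a single processor, you can use the following method. (I do not think it is worth using on a single processor: the result should be the same as with the normal method, and the time saved is not noticeable.)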
　　First you start training the usual way with the additional switches --last-step 2 --direction 1, which runs the data preparation and one direction of GIZA++ training. When the GIZA++ step has started, start a second training run with the switches --first-step 2 --direction 2. This runs the second GIZA++ run in parallel, and then continues the rest of the model training. (Beware of race conditions! The second GIZA++ run might finish earlier than the first one, so training step 3 might start too early!)
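One way to sidestep the race condition is to make neither GIZA++ run continue on its own, and to start step 3 only after both have finished. The sketch below is a variation on the procedure above, not the procedure itself. It assumes that the switches --first-step, --last-step and --direction can be combined as shown, and that `base_cmd` (a hypothetical name) holds the training script and its usual arguments.

```python
import subprocess


def phase_commands(base_cmd):
    """Split one training run into three phases (assumed switch combinations)."""
    base = list(base_cmd)
    prepare = base + ["--last-step", "1"]            # data preparation only
    directions = [
        base + ["--first-step", "2", "--last-step", "2", "--direction", str(d)]
        for d in (1, 2)                              # one GIZA++ run per direction
    ]
    finish = base + ["--first-step", "3"]            # rest of the model training
    return prepare, directions, finish


def train(base_cmd):
    prepare, directions, finish = phase_commands(base_cmd)
    subprocess.run(prepare, check=True)
    # Launch both directions, then block until BOTH are done.
    running = [subprocess.Popen(cmd) for cmd in directions]
    exit_codes = [p.wait() for p in running]
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"GIZA++ run failed, exit codes: {exit_codes}")
    # Only now is it safe to start step 3.
    subprocess.run(finish, check=True)
```

Waiting on both processes before launching the final phase removes the need to guess which direction finishes first.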
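　　The method provided by Moses itself makes good use of a dual-core processor, but not of machines with more cores. My machine, for example, has a 4-core CPU, and I observed that at any point during training only two cores were at 100% utilization while the other two sat essentially idle. Leaving those cores unused seems wasteful, but this problem already has a solution: MGIZA++.
　　MGIZA++ is a multi-threaded GIZA++ tool built as an extension of GIZA++. It is described as follows: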
　　Multi Thread GIZA++ is an extension to the GIZA++ word aligning tool. It can perform much faster training than the original GIZA++ if you have more than one CPU. In addition, it fixes some bugs in GIZA++, and the final alignment perplexity is generally lower than with the original GIZA++.

