What corpora are there for teaching Chinese as a foreign language? Which good corpora exist in China and abroad? (corpora)

I am in the UK, so you may need a VPN to open some of these sites.

Some well-known Corpora

• Bank of English (Cobuild): Titania, Collins Dictionary

• Leeds Collection of Internet Corpora

Some parallel corpora

Building your own corpora

• Free software available: AntConc

– Link: http://www.antlab.sci.waseda.ac.jp/antconc_index.html

– Other software can be found

• E.g. BootCaT for rough and ready web corpora

[bnc] Designing and Creating the BNC

• The minimum size of a corpus depends on two main factors:

– the kind of query that is anticipated from users

– the methodology they use to study the data

Coding and cleaning files

• AntConc will only read .txt files, so you need to clean up PDFs and HTML files to that format

• This can take time

• Maher, Waller and Kerans (2008) recommend converting from HTML for this reason

– For more accessible genres, HTML files are going to be more common
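The HTML-to-.txt step can be sketched in Python using only the standard library. This is a minimal illustration, not the procedure Maher, Waller and Kerans describe: the names `TextExtractor` and `html_to_text` and the list of block-level tags are assumptions made for this example, and real pages usually need extra handling for menus, footers and other non-linguistic content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

The result can then be saved with `open(path, "w", encoding="utf-8")` so that the file is plain text a concordancer can load.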

More tips on cleaning files

• Remove reference lists

• Remove non-linguistic content

• Remove extra spaces

• Problems of anomalous characters

– .txt tends to flounder with diacritics

• Hyphen issues

• From Maher, Waller and Kerans 2008
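Some of these clean-up steps can be automated. The sketch below is a hypothetical helper (`clean_text` is a name invented for this example), not code from Maher, Waller and Kerans. It normalises diacritics to composed Unicode characters, rejoins words hyphenated across a line break, and removes extra spaces. It assumes every end-of-line hyphen is a line-break hyphen, so a genuine compound split at that point (e.g. "open-" / "access") would be joined incorrectly and should be checked by hand. Removing reference lists and other non-linguistic content is not attempted here.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Apply basic whitespace, diacritic and hyphen clean-up to corpus text."""
    # Compose base letter + combining mark into one character (e.g. e + U+0301).
    text = unicodedata.normalize("NFC", text)
    # Rejoin words hyphenated across a line break: "transla-\ntion" -> "translation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line, then reduce three or more newlines to one blank line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Saving the cleaned text as UTF-8 is one way to reduce problems with diacritics in .txt files, provided the concordancer is set to read the same encoding.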

• The use of open-access corpora allows you to create quick corpora of specialised terms for specific jobs

• Major disadvantage is lack of control

• Major advantage is rapidity

Research fully to ensure a translation doesn’t already exist:

– IATE (including looking at other languages)

– Field-specific resources (e.g. Glossary of Tax Terms)

– Refined Google searches

– Create ‘sample’ translations and search for these in the TL.

• EU resources for translators:

– EU – DG Translation

• Interactive Terminology for Europe:

– IATE - The EU's multilingual term base

• French Law on the Internet - The Basics and Free Resources, by Emmanuel Barthe: http://www.nyulawglobal.org/globalex/french_law_free_resources.htm

Chinese Law Resources on the Internet

Features - A Guide to the Spanish Legal System

http://216.122.177.166/dpz/legloc/default.html

Bilingual law information system: e.g. Hong Kong Department of Justice
