MODL5007M: Introduction to Corpus linguistics for translators (15 credits)

1    Overview

Corpus linguistics is the empirical study of how language is used. Its evidence comes from corpora, i.e. large databanks of texts in natural language. This module explores basic methods in corpus linguistics and aims to equip you to develop and use monolingual and multilingual corpora for learning foreign languages and for translating. It most closely complements the core modules in Translation and in Translation Memories. Traditional bilingual dictionaries and their electronic versions provide basic information on translation equivalence, but there are typically more ways of translating a word in context than a dictionary offers. Translation memory tools, in contrast, are designed to provide examples of translations in context, but the database available to a translator is typically limited in size. A corpus can help you study how words are used in a foreign language and compare usage across two languages when translating.

2    Available corpora and corpus tools

On the Internet you can access some reference corpora, such as the British National Corpus, as well as general-purpose corpora for Arabic, Chinese, Czech, German, Italian, Japanese, Portuguese, Russian, Spanish and some other languages. A concordancer is a software application that produces lines showing keywords in their contexts. The course will also teach you to use concordancers for studying uses of words and testing translation equivalents.
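To make the idea of a concordancer concrete, here is a minimal sketch in Python of a keyword-in-context (KWIC) display. It is an illustration only, not one of the tools used on the module: the function name `kwic`, the `width` parameter and the sample text are all assumptions introduced for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line for every match of `keyword` in `text`.

    `kwic` and `width` are hypothetical names used for this sketch only.
    Matching is case-insensitive and restricted to whole words.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, used only to show the output format.
sample = (
    "A corpus can help you study uses of words. "
    "A parallel corpus aligns a text with its translation."
)

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Each output line shows one occurrence of the keyword with its left and right context. Real concordancers add sorting, filtering and linguistic annotation on top of this basic display.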

3    Objectives

On completion of this module, you should be able to:

•  describe and exemplify goals and methods of corpus linguistics

•  describe basic types of corpora

•  understand principles of corpus querying

•  know relevant statistical methods

•  design your own specialised corpora

•  compare word uses in the source and target languages using parallel and comparable corpora

•  use corpus data to build glossaries and task-specific dictionaries

4    Learning approaches

To achieve the module aims, you need a combination of conceptual knowledge and practical experience. Accordingly, you have weekly lectures (1 hour) combined with seminars (1 hour) or practical sessions (1 hour). Supervised practical sessions in ERIN will focus on basic IT skills for querying corpora and using concordancers. The lectures, which cover the basic topics of the module, alternate with seminars and practical sessions in which theory is tested against practice and explored further through exercises.

5    Syllabus

Week     Session           Topic

W1       Lecture                Theoretical foundations: Using corpora in research and practice

W1       Practical              Using online corpus interfaces

W2       Lecture                Quantitative study of corpora: frequency lists and collocations

W2       Seminar               Analysing and comparing frequencies

W3       Lecture                Methods for exploiting corpora: making queries

W3       Seminar               Making queries and recording your work

W4       Lecture                Quantitative study of corpora: collocations

W4       Seminar               Using collocations and word sketches

W5       Seminar               Linguistic annotation

W5       Seminar               Experiments with explicit annotation

W6       Lecture                Corpus-based dictionary development

W6       Practical              Development of dictionaries in XML

W7       Reading week

W8       Lecture                Building corpora from the Web

W8       Practical              Building your own corpus

W9       Lecture                Know your corpus: assessing corpus composition

W9       Seminar               Assessing composition of your corpus

W10     Lecture                Introduction to using Python

W10     Practical              Building your corpus in Python

W11     Seminar               More experiments with Python and XML

W11     Seminar               More experiments with Python and XML
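As a rough illustration of the quantitative topics in weeks 2 and 4 (frequency lists and collocations), the following Python sketch counts word frequencies and the words that occur near a chosen node word. It makes several simplifying assumptions: the tokenizer is deliberately crude, the function names and the sample sentence are invented for this example, and raw co-occurrence counts stand in for the statistical association measures that corpus tools normally apply.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"\w+", text.lower())

def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Neighbours on both sides, excluding the node occurrence itself.
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts

# Invented sample sentence, used only to exercise the functions.
sample = "strong tea and strong coffee, but powerful computers and strong tea again"
tokens = tokenize(sample)

print(frequency_list(tokens)[:3])
print(collocates(tokens, "strong").most_common(3))
```

On a real corpus the same counts would be computed over millions of tokens, and candidate collocations would be ranked by an association score rather than by raw frequency.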

6    Assessment

At the end of the course you must complete a case study (2000 words) reporting on a project that compares the uses of several lexical items in two languages, using data both from large corpora and from corpora you have collected yourself. The purpose of the case study is to demonstrate your ability to use corpus querying tools and to analyse the evidence they provide. As an outcome of the case study you will also create a bilingual dictionary in XML for the lexical items, with contexts of their use, to demonstrate how you can apply annotation methods.
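For orientation only, the sketch below shows one way a bilingual dictionary entry with a usage context could be built in XML using Python's standard library. The element and attribute names (`dictionary`, `entry`, `headword`, `translation`, `example`, `lang`) and the function `build_entry` are assumptions made for this illustration. They are not the format required for the assessment, which is described in the module materials.

```python
import xml.etree.ElementTree as ET

def build_entry(headword, source_lang, translation, target_lang, example):
    """Build one dictionary entry; all element names here are hypothetical."""
    entry = ET.Element("entry")
    hw = ET.SubElement(entry, "headword", {"lang": source_lang})
    hw.text = headword
    tr = ET.SubElement(entry, "translation", {"lang": target_lang})
    tr.text = translation
    # A context of use taken from a corpus (here an invented sentence).
    ex = ET.SubElement(entry, "example", {"lang": source_lang})
    ex.text = example
    return entry

dictionary = ET.Element("dictionary")
dictionary.append(
    build_entry(
        "corpus", "en", "Korpus", "de",
        "A corpus can help you study uses of words.",
    )
)

# Serialise the tree to a string; ElementTree escapes special characters.
xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

Building the tree through a library rather than by concatenating strings keeps the output well-formed, which matters once the examples contain characters such as `&` or `<`.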

Progress in the course will also be monitored through participation in the seminars.

For more information, including examples of expected submissions and the reading lists, please see the Minerva area and the Corpus module website: http://corpus.leeds.ac         
