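2012 Workshop on Corpora and Foreign Language Research (2012 语料库与外语研究研修班)

An Overview of Corpus Research Methods: Topic Selection, Design, and Methods
Put it all together
Li Wenzhong (李文中)
China Foreign Language Education Research Center (中国外语教育研究中心)
2012

"Corpora are not something a human can learn, and regular expressions are not something a woman can learn."

Corpus-driven is basically corpus-based.

Any corpus-based research is necessarily driven by corpus data.

Goals
Through corpus analysis and research:
- verify hypotheses and intuitions
- obtain new findings
- formulate new hypotheses
- construct new theories
- verify existing findings
- solve difficult problems

Innovation
- new data
- new methods
- new techniques
- new interpretations, theories, and perspectives

The corpus-based approach is a verification procedure.
The corpus-driven approach is a discovery procedure.

Rationale: Any perception is but inferencing.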
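
[Diagram: the world of reality and the world of text, separated by the unbridgeable "Einstein Gulf"]

[Diagram: the six sense faculties (eye, ear, nose, tongue, body, mind) and their six objects (form, sound, smell, taste, touch, mental objects); study, inquiry, reflection, discernment, practice; text]

Basic steps
1. Decide on a topic
2. Pose the questions
3. Determine the population and the sample
4. Choose the tools
5. Process the data
6. Describe the results: classify and summarize features (description)
7. Explain the results: observe, describe, explain (explanation)
8. Interpret the results: meaning, value, application (interpretation)

Identifying a problem
- Something, or some phenomenon, that is:
  - out of expectation
  - incongruent
  - in need of a solution
  - puzzling

Reading to be better informed
- What has been done as a contribution
- What has been left undone
- What has been done wrong

Never count someone else's money.

Formulating research questions
- Naming: What is ...?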
- Classificatory: How are they interrelated (patterned)?
- Explanatory: To what extent do they co-occur?
- Predictive: What will happen if ...?

Never ask a question to which you already know the answer; never ask a "how to" question.

Finding a method
- Population
- Sample
- Sampling

[Diagram: P (Population), S (Sample), R (Result), I (Interpretation); labels: sampling, validity, reliability, validity, generalizability]

IF P → S, S → R, R → I, THEN I → P

Descriptive research
- single text
- text vs. text
- people vs. text

Research questions
1. How many different word forms are used in the text? How many running words are used? What is their distribution?
2. To what extent can the level of difficulty of the text be computed on the basis of the graded wordlists?
3. How many different word classes are used? What is the number of each word class?

Method
To answer RQ 1, generate a wordlist of the given text and observe:
- the number of types
- the number of tokens
- the type/token ratio (TTR); if the text is very large, standardize the TTR
- the types and their frequencies
- the cumulative percentage

To answer RQ 2, compare the wordlist against a batch of graded wordlists, and observe:
- How many types on the Level 1, 2, and 3 lists are used in the text, and what is their percentage? What about their tokens?
- How many types that are not on any list are used in the text? Summarize their features.

To answer RQ 3, retrieve each word class from the POS-tagged text and sort the results by frequency in decreasing order:
- retrieve all the nouns, verbs, and adjectives
- sort the list

Instruments
- Use AntConc 3.0 to generate the wordlist.
- Use Range to compare and contrast the wordlist against a batch of graded wordlists.
- Use PowerGREP to retrieve the word classes from the POS-tagged text.

Explanatory research
- interrelationship (IR) between words
- IR between phraseologies
- IR between genres

Research on relationship
- shape
- direction
- strength
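
The RQ 1 measures listed under Descriptive research (types, tokens, TTR, standardized TTR, cumulative percentage) can be sketched in a few lines of Python. This is only an illustration, not what AntConc does internally: the tokenizer (lowercased alphabetic strings), the 1,000-token chunk size, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_stats(tokens):
    """Return (number of types, number of tokens, TTR, frequency table)."""
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_types, n_tokens, ttr, freq

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    Returns None when the text is shorter than one chunk.
    """
    ratios = [
        len(set(tokens[i:i + chunk_size])) / chunk_size
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    return sum(ratios) / len(ratios) if ratios else None

def cumulative_coverage(freq, n_tokens):
    """List (word, frequency, cumulative share of tokens), most frequent first."""
    rows, running = [], 0
    for word, count in freq.most_common():
        running += count
        rows.append((word, count, running / n_tokens))
    return rows
```

Averaging the TTR over equal-sized chunks keeps texts of different lengths comparable, since the raw TTR falls as a text grows longer.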

Research questions
- What are the words that are unique to the text in terms of its subject matter?
- To what extent are these words related to the subject or topic of the text?
- What patterns of relationships exist among the key words?

Method
- Compare and contrast the wordlist of the observed text or corpus against the wordlist of a (larger) reference text or corpus.
- Observe and group the words within a classification framework.

Instruments
- AntConc 3.0

Other applications
- Literary analysis
- Automatic summarization
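
The comparison of a study wordlist against a reference wordlist can be illustrated with a small keyness calculation. The slides do not say which statistic is meant; the sketch below assumes the log-likelihood (G2) measure, which is widely used for keyword extraction, and it reports only words that are relatively more frequent in the study text. Function names are hypothetical, and both token lists are assumed to be non-empty.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for one word.

    a: frequency in the study text, b: frequency in the reference text,
    c: total tokens in the study text, d: total tokens in the reference text.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference text
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study text."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

The ranked list is only a starting point: grouping the keywords within a classification framework, as the Method slide asks, remains an interpretive step.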

Research on word uses
Objectives:
- Observe the collocates of a word.
- Study its patterns of use.
- Study the meanings associated with its patterns of use.
- Study the semantic prosody of its meaning.

Research questions
1. What words collocate with the search word (SW)? What is the strength of the collocability?
2. What is the pattern of the SW, and what is its semantic preference?
3. What is the semantic prosody of the pattern?

Method
- Search for the word (key word, search word, or node word) as KWIC.
- Observe its collocates and their word classes.
- Observe the meaning that is associated with the pattern.
- Observe its semantic prosody.

Instruments
- AntConc 3.0
  - Concordance
  - Sort: Level 1, Level 2, Level 3
  - Frequency count
  - Collocates
  - Sort
  - Sort POS tags

Research on chunks
Objectives:
- To retrieve the multiword sequences.
- To examine the internal structure of such sequences.
- To obtain the sequences unique to a specific text.

Research questions
- What multiword sequences (in terms of n-grams) are found in the given text?
- How are these sequences structured in terms of lexico-grammatical patterns?
- How is the message conveyed associated with the overall structure of the sequences?

Method
- Segment the text and generate a batch of lists of multiword sequences of various lengths.
- Observe the structure of the retrieved n-grams and examine their regularities.
- Study their semantic and pragmatic features.

Instruments
- kfNgram
- WordSmith Tools v3.0
- PowerGREP 3.5

Research on parallel texts
Objectives:
- To observe how the source text was translated into the target text.
- To observe the probability of the translation units and corresponding units found in the text.
- To study the dynamics of translation in a given community.

Research questions
- What translation units can be found in the parallel texts?
- What corresponding units can be found in the texts?
- What are the distributional features of the corresponding units within the same genre?

Method
- Observe the parallel texts (or construct the corpora).
- Retrieve the units in focus.
- Examine the context of such units.
- Observe their patterns.
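
The retrieval step in the parallel-text method can be sketched as a minimal parallel concordance. The example assumes the texts are already aligned at sentence level and stored as (source, target) pairs; the function names and the three candidate renderings are hypothetical, and the sample sentences are made up purely for illustration.

```python
import re

def parallel_concordance(pairs, pattern, flags=re.IGNORECASE):
    """Return the aligned (source, target) pairs whose source side matches pattern."""
    rx = re.compile(pattern, flags)
    return [(src, tgt) for src, tgt in pairs if rx.search(src)]

def target_candidates(hits, candidates):
    """Count in how many matched target segments each candidate corresponding unit occurs."""
    counts = {c: 0 for c in candidates}
    for _, tgt in hits:
        for c in candidates:
            if c in tgt:
                counts[c] += 1
    return counts

# Made-up, sentence-aligned English-Chinese sample pairs.
SAMPLE_PAIRS = [
    ("Knowledge is power.", "知识就是力量。"),
    ("He came to power in 2008.", "他于2008年上台。"),
    ("The power went out.", "停电了。"),
    ("She reads every day.", "她每天读书。"),
]

hits = parallel_concordance(SAMPLE_PAIRS, r"\bpower\b")
counts = target_candidates(hits, ["力量", "上台", "电"])
```

Dividing each count by the number of hits gives a rough relative frequency for each corresponding unit, which is one simple reading of the "probability" named in the objectives.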

Instruments
- Parallel texts
- Parallel concordancer
- Corresponding units database

Contrastive studies
Objectives:
- to do genre analysis
- to determine the difficulty level of a text
- to observe how a text is biased against its readers in terms of event presentation and evaluation
- to compare a batch of wordlists with the graded wordlists

Research questions:
- What genre features can be identified for a particular genre text, in terms of lexis, patterns, and textual organization?
- How difficult is a text? Is it a proper input text for the students at their present competence?
- What contrastive features are displayed in the point of view, tone, and judgement of the text? How do these features affect the readers?
- How can the vocabulary size or lexical density of a text (or a batch of texts) be determined?
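
One way to approach the difficulty question above is to profile a text against graded wordlists, in the spirit of the comparison named in the objectives. The sketch below is not the algorithm of the Range program: it assumes the graded lists are given as plain sets of lowercased word forms, assigns each word to the first list that contains it, and uses hypothetical names throughout.

```python
from collections import Counter

def coverage_by_level(tokens, graded_lists):
    """Count types and tokens per graded list, plus an 'off-list' bucket.

    graded_lists maps a level name to a set of word forms; a word is
    assigned to the first level (in dictionary order) whose set contains it.
    """
    freq = Counter(tokens)
    n_tokens, n_types = len(tokens), len(freq)
    report = {level: {"types": 0, "tokens": 0} for level in graded_lists}
    report["off-list"] = {"types": 0, "tokens": 0}
    for word, count in freq.items():
        level = next(
            (lv for lv, words in graded_lists.items() if word in words),
            "off-list",
        )
        report[level]["types"] += 1
        report[level]["tokens"] += count
    # Add percentages of all types and all tokens in the text.
    for row in report.values():
        row["type_pct"] = 100 * row["types"] / n_types if n_types else 0.0
        row["token_pct"] = 100 * row["tokens"] / n_tokens if n_tokens else 0.0
    return report
```

A high share of tokens on the lower-level lists suggests an easier text, while the off-list types are the ones worth inspecting and summarizing by hand.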

Instruments:
- Keyword analysis; concordance analysis
- Wordlist comparison and contrast
- Critical discourse analysis

Other corpus-driven, corpus-based, or corpus-informed research
- Use of corpora in language teaching
- Contrastive Interlanguage Analysis
- Multiword sequences
- Parallel texts
- Genre and register analysis
- Discourse analysis
- Literary analysis
- Cognitive linguistic studies

Where the corpus is strong:
- Synchronic and diachronic
- Syntagmatic and paradigmatic
- Qualitative and quantitative
- Empirical and intuitive
- Probability and centrality

Where the corpus is weak:
- Mental processes
- Psychological reality
- Possibility
- Absence