Thursday, June 2, 2011

CATESOL Roundup, Part 8

Using Corpora for Language Instruction

CATESOL's Sunday workshops have been my favorite. Every time I attend a state conference, I make sure I don't miss its value-added Sunday fare. So, as usual, I stayed for one of the three-hour sessions this year.

It was not the first time I had heard Dr. Randi Reppen talk about corpus linguistics, but over the course of a few years, the reception of corpus-based analyses seemed to have grown quite a bit. The big three - Oxford, Cambridge, and Longman - have all published corpus-based dictionaries along with teacher-manual-type resources. Real Grammar, a completely corpus-based high-level grammar textbook, has been on the market. The University of Michigan Press and Cambridge have even come out with low-level textbooks that are corpus-based (e.g. Cambridge's Interchange series).

Dr. Reppen opened her presentation by defining a corpus as a large, principled collection of natural texts analyzed both quantitatively and qualitatively using both automatic and interactive computer techniques. In her talk, she listed four ways to use corpora in the classroom:
  1. inform the syllabus to serve our students' needs
  2. develop materials and activities
  3. create specialized corpora
  4. use online resources
In terms of making our decision as to what to choose to teach and present, Dr. Reppen shared these findings from the corpus-based text analysis field:
  • the 12 most common lexical verbs (as opposed to auxiliary verbs) don't get much use in the register of academic writing
  • simple aspects are more common than perfect aspects (Ever notice the popularity of the historical present tense?)  
  • in business classroom teaching, the most common content words (that is, words other than function words such as articles and prepositions) are mostly the same heavy-duty verbs as the 12 most common lexical verbs, whereas the most common words in business textbooks are more noun-heavy (e.g. "win" used as a noun)
  • suffixes are more productive than prefixes, and the six most productive suffixes are -tion, -ity, -er, -ness, -ism, and -ment
  • to demand something, Americans use "you must" in written discourse, but "you should" and "I'd like you to..." in spoken discourse
  • progressive tenses are for actions actively controlled by the subject while simple tenses emphasize a situation experienced by the subject
  • "everybody" is more common in conversation than "everyone," especially as an object pronoun
In terms of creating corpus-based activities and materials for students to do and study, Dr. Reppen mentioned the following:
  • give students a page from an authentic article to identify the most productive suffixes and create a matching exercise for them to do
  • create a word frequency list from readings (and skip the function words)
  • model language use and role-play actual dialogues at service counters (e.g. opening: "Hi." "Hi." closing: "Thank you." "Thanks.")
  • teach the function of fillers (e.g. "uh" and "umm" uttered before a decision-making point or used to hold the floor)
  • create a KWIC (key word in context) activity (e.g. gap "must" to show that it has a strong collocate with "be")
  • gap articles
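
The word-frequency-list idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a tool Dr. Reppen presented: the `FUNCTION_WORDS` stop list, the `frequency_list` helper, and the sample reading are all made up for the example, and a real stop list would be much longer.

```python
import re
from collections import Counter

# Small illustrative stop list (an assumption); a real one would be longer.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "and", "or", "but", "is", "are", "was", "were", "be", "it", "that", "this",
}

def frequency_list(text, top_n=10):
    """Return the top_n most frequent non-function words as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return Counter(content).most_common(top_n)

# Invented sample standing in for a class reading.
reading = (
    "The students read the article. The article was about corpora, "
    "and the students discussed corpora in class."
)
for word, count in frequency_list(reading, top_n=3):
    print(word, count)
```

Swapping in the text of an actual class reading for `reading` would give a quick list of candidate vocabulary to teach.
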
In terms of creating specialized corpora, Dr. Reppen suggested class readings, student papers, and college textbooks as sources. As an interesting side note, Dr. Reppen pointed out that content words (e.g. "matrimony") are often highlighted in college textbooks while polysemous academic words (e.g. "comprise") are hardly ever visible.

To find out how error treatment progresses over a few years across elementary grades, Dr. Reppen once collected and created a writing corpus, hand-coding it for three types of errors:
  1. noun morphology (Me and Rebbecca are friend.)
  2. verb morphology (Last night I stay up late.)
  3. S/V agreement
Since the textbooks were not designed to address the errors influenced by the students' first language or dialect, Daily Oral Language was used to tackle the issues.

When a learner corpus of errors reveals preposition issues such as confusion over "result in" vs. "result of," you can gap out the word after "result" and create a worksheet for the students to fill out.
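
A gap-fill worksheet like this could be generated automatically. The sketch below is one possible approach, assuming the sentences have already been pulled from the corpus; the `gap_after` helper and the two sample sentences are invented for illustration.

```python
import re

def gap_after(sentence, keyword, blank="_____"):
    """Replace the word that follows each occurrence of keyword with a blank."""
    pattern = re.compile(r"\b(" + re.escape(keyword) + r")\s+\w+", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1) + " " + blank, sentence)

# Invented sentences standing in for lines taken from a learner corpus.
sentences = [
    "Poor planning can result in delays.",
    "The delay was a result of poor planning.",
]
for number, sentence in enumerate(sentences, start=1):
    print(f"{number}. {gap_after(sentence, 'result')}")
```

Note that this simple pattern only matches the exact form "result"; inflected forms such as "results" or "resulted" would need their own pass.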

Corpus-driven activities are hands-on opportunities for our students to play with the language; they are the language equivalent of a science lab session. In my opinion, they can result in real learning because the students are required to use inductive reasoning while being drawn to authentic text. They need to notice the context, analyze the discourse, and generate new knowledge. Dr. Reppen cited a couple of examples that can lead students to a more enduring understanding of the language point at hand.

Passive Voice
  1. Collect four examples of passive sentences and four examples of active sentences, with each passive/active pair sharing the same verb.
  2. Prepare a checklist of the characteristics of these kinds of sentences.
  3. Have students examine the example sentences and check off only the characteristics of the passive.
Collocations
Have students do a KWIC search to find out the most frequent left collocate (adjective) of the noun "range." (They will find "wide" and "broad" are most common and thus most idiomatic.)
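
For anyone curious what a KWIC search does under the hood, here is a rough sketch. It is not how any particular concordancer works; the `kwic` and `left_collocates` helpers and the sample text are hypothetical, and the text was written so that "wide" and "broad" show up to the left of "range."

```python
import re
from collections import Counter

def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def left_collocates(tokens, keyword):
    """Count the word immediately to the left of each keyword hit."""
    return Counter(
        tokens[i - 1] for i, t in enumerate(tokens) if t == keyword and i > 0
    )

# Invented sample text; a real search would run over a full corpus.
text = (
    "The store offers a wide range of products. A broad range of topics "
    "was covered, and a wide range of views was heard."
)
tokens = re.findall(r"[a-z']+", text.lower())

# Print concordance lines with the keyword lined up in a column.
for left, key, right in kwic(tokens, "range"):
    print(f"{left:>20}  {key}  {right}")
print(left_collocates(tokens, "range").most_common())
```

Lining the keyword up in a column is what lets students spot the repeated neighbors at a glance.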

Semantic Prosody (i.e. flavor of the word)
  1. Have students use a special corpus of economics textbooks and do a KWIC search for the verb "cause."
  2. Ask them to analyze several of the concordance lines by paying special attention to the nouns that come after the verb. (They will see a pattern in the way the verb "cause" is used in the economics corpus: often used to convey a negative connotation.)
In creating corpus-based practice and testing activities, Dr. Reppen recommended two guidelines: first, go by register, asking ourselves what it is we are teaching (conversation, research, writing blogs, etc.); second, drill down on synonyms so that our students' word choices do not sound out of place.

In terms of available online resources, Dr. Reppen suggested asking these questions when evaluating a site:
  1. How do I want to use this site? (For me, or for my students?)
  2. Does the site match my goals?
  3. Does it do what it says it will do?
  4. Is the site stable? Does it crash often?
  5. Are the instructions clear and easy to follow?
  6. Does it charge a user fee? If so, is it worth it?
The three best free online corpora in Dr. Reppen's opinion are:
  1. COCA, which is part of corpus.byu.edu (a word of caution: the "spoken" register there does not mean natural, everyday conversation. Rather, it refers to TV scripts and the like.)
  2. Time Corpus, which is also part of corpus.byu.edu (Dr. Reppen once had her students click on "chart," type "hippie," and analyze the result chart.)
  3. Michigan Corpus Linguistics, which links to MICASE (Michigan Corpus of Academic Spoken English) and the new MICUSP (Michigan Corpus of Upper-level Student Papers, made up of papers that got good grades). With the former, you can listen with or without transcripts. There are also lesson plans.
Dr. Reppen's checklist for developing activities using online corpora:
  1. Know what you want to teach!
  2. Select the best corpus resource for your lesson.
  3. Explore the corpus completely for the point you want to teach.
  4. Prepare directions that are complete and easy to follow.
  5. Always have a backup plan.
A few more examples of online corpus activities:

Expressing Opinions
  1. list the language frames such as "I believe"
  2. have students use the chart function of COCA to see which context (register) "I believe" is used in (answer: "spoken")
Multi-Word Verbs vs Single-Word Verbs
  1. have students use COCA to find out the register difference between "look into" and "investigate," for example (answer: multi-word verbs are not as academic)
  2. have students rewrite the sentences in their academic papers, changing multi-word verbs to single-word verbs
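
A register comparison of this kind boils down to counting hits per register and normalizing by corpus size, since the registers are rarely the same length. The sketch below illustrates that idea only; it does not reproduce COCA's chart function, and the `per_million` helper and the two tiny register samples are invented for the example.

```python
import re

def per_million(phrase, text):
    """Occurrences of phrase per million words of text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    hits = len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", lowered))
    return hits / len(words) * 1_000_000 if words else 0.0

# Tiny invented samples standing in for register sub-corpora (assumption).
registers = {
    "spoken": "We should look into it. Let me look into that for you.",
    "academic": "This study aims to investigate the effect. We investigate two cases.",
}
for name, sample in registers.items():
    for phrase in ("look into", "investigate"):
        print(f"{name:9} {phrase:12} {per_million(phrase, sample):,.0f}")
```

With samples this small the numbers mean nothing in themselves; the point is that a normalized rate makes registers of different sizes comparable. Only the exact phrase is counted here, so "looked into" or "investigated" would be missed.
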
Formal vs. Informal
  1. spend half an hour training students on how to use a relevant online corpus
  2. have students produce an activity on using formal language vs. informal
There are many things learners interacting with corpora can do, according to Dr. Reppen:
  • learn vocabulary, which forms the bricks of a language
  • practice
  • explore extended collocations
  • compare against model texts
  • discover patterns of use
In terms of concordancing tools and additional reading, Dr. Reppen highly recommends the following:
  • MonoConc - an easy-to-use and inexpensive concordancing package
  • University Language: A Corpus-Based Study of Spoken and Written Registers, a comprehensive book written by her corpus linguist husband Douglas Biber
  • "A New Academic Word List," published by Averil Coxhead in TESOL Quarterly, Vol. 34, Issue 2 
Dr. Reppen called the last item a must-read, but she also cautioned that the AWL should be used as a kind of launching pad and can certainly be expanded. She said that Coxhead's list only covers words that appear in a set number of disciplines, so we shouldn't think that a word missing from the AWL is not an academic word to teach.

(To be continued.)
