Corpus linguistics is the study of language through real-life examples of its use. It is not a branch of linguistics but a methodology, or approach. Corpus, the Latin word for "body," refers to a body of naturally occurring texts, and the approach involves discovering patterns of language use by analyzing that corpus. Corpus linguistics is experiencing a comeback, as computer programs have revolutionized the approach.
Parental diaries recording a child's speech as the child first acquires language are a simple example of a corpus that can then be studied to identify language patterns. In the first half of the 20th century, foreign language teachers often used corpora of the target language to compile vocabulary lists for students. The eminent linguist Noam Chomsky did not consider corpora a valid tool, as he believed that linguistic competence, a speaker's internalized knowledge of the language, was more important to study than performance data. Early corpus linguistics rested largely on the assumption that a natural language contains a finite number of sentences and that those sentences can be collected and evaluated.
After falling out of favor in the 1960s and 1970s, corpus linguistics is experiencing a revival thanks to the computer. The software most commonly used by linguists is the concordance program, which retrieves every occurrence of a word or phrase together with its surrounding context. Searching a corpus of millions of words for patterns by hand would take a person far too long, and the results would be prone to error; a computer can search and retrieve the same information in seconds. It can also calculate frequencies, sort the data, and exploit corpora in ways that were impossible in the past.
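As a rough illustration of what a concordance program does, the Python sketch below builds a key-word-in-context (KWIC) listing and a word-frequency count for a tiny sample text. The function names, the three-word context window, and the simple regular-expression tokenizer are assumptions made for this example; dedicated concordance software handles far larger corpora and much richer search options.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    # Sort on the right-hand context so that recurring patterns line up together
    hits.sort(key=lambda hit: hit[2])
    return hits


def frequencies(tokens, n=5):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = ("The child said the dog ran. The dog barked at the cat, "
              "and the cat ran up the tree.")
    tokens = tokenize(sample)
    # Print each hit with its left context right-aligned, in the usual KWIC layout
    for left, keyword, right in concordance(tokens, "the"):
        print(f"{left:>20}  [{keyword}]  {right}")
    print(frequencies(tokens))
```

Sorting the lines by the words that follow the keyword groups similar contexts together, which is why concordancers typically let the user sort on the left or right context.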
Corpus-based analysis can investigate how register affects language; patterns of language use, such as differences in how men and women use tag questions; how widely particular patterns are used; and the factors that cause language use to vary. Teaching can benefit from corpus linguistics in syllabus design, materials development, and the choice of classroom activities. Students can benefit by seeing more clearly the different uses and meanings of common words, the differences between written and spoken language, and the phrases and collocations available to them. Because a corpus is the product of real-life social interactions, and many corpora are regularly updated, corpora provide naturalistic data that can be easily accessed, and findings from a representative corpus can be generalized.
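Comparing how often a pattern occurs in different subcorpora (for example, two registers, or speech by men and by women) usually means normalizing raw counts, commonly to a rate per million words, because subcorpora rarely contain the same number of words. The Python sketch below applies that idea to tag questions. The regular expression, the function names, and both text samples are hypothetical and invented for illustration; a pattern this simple would miss some tag questions and catch some false positives, and samples this small make the per-million figures meaningless beyond showing the arithmetic.

```python
import re

# Hypothetical, deliberately crude pattern for tag questions such as ", isn't it?"
# or ", don't you?": a comma, one auxiliary (optionally ending in n't), a pronoun,
# and a question mark. Real studies would check each hit by hand.
TAG_QUESTION = re.compile(
    r",\s+\w+(?:n't)?\s+(?:i|you|he|she|it|we|they)\?", re.IGNORECASE
)


def word_count(text):
    """Count word tokens (runs of letters and apostrophes)."""
    return len(re.findall(r"[A-Za-z']+", text))


def tag_questions_per_million(text):
    """Tag-question matches per million words, so samples of different sizes compare fairly."""
    total = word_count(text)
    if total == 0:
        return 0.0
    hits = len(TAG_QUESTION.findall(text))
    return hits / total * 1_000_000


if __name__ == "__main__":
    # Two invented toy samples standing in for two subcorpora
    spoken = "It's cold today, isn't it? You like tea, don't you? Sure."
    written = "The committee approved the budget. The report is attached, isn't it?"
    for name, text in [("spoken", spoken), ("written", written)]:
        print(f"{name}: {tag_questions_per_million(text):,.0f} per million words")
```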