Corpus of Contemporary American English

The freely searchable 450-million-word Corpus of Contemporary American English (COCA) is the largest corpus of American English currently available, and the only publicly available corpus of American English to contain a wide array of texts from a number of genres.

It was created by Mark Davies, Professor of Corpus Linguistics at Brigham Young University.[1]

Content

The corpus is composed of more than 450 million words from more than 160,000 texts, including 20 million words each year from 1990 to 2015. The most recent update was made in December 2015. The corpus is used by tens of thousands of people each month, which may make it the most widely used "structured" corpus currently available.

For each year, the corpus is evenly divided among five genres: spoken, fiction, popular magazines, newspapers, and academic journals. The texts come from a variety of sources.

Queries

References

  1. Kauhanen, Henri (2011-03-21). "The Corpus of Contemporary American English: Background and history". VARIENG. Retrieved 2011-10-13.

Bibliography

  • Davies, Mark (2010). "The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English". Literary and Linguistic Computing 25 (4): 447–65. doi:10.1093/llc/fqq018. 
  • Bennett, Gena R. (2010). Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Ann Arbor, Michigan: University of Michigan. p. 144. ISBN 978-0-472-03385-0. 
  • Davies, Mark (2010). "More than a peephole: Using large and diverse online corpora". International Journal of Corpus Linguistics 15 (3): 405–11. doi:10.1075/ijcl.15.3.13dav. 
  • Anderson, Wendy; Corbett, John (2009). Exploring English with Online Corpora. Palgrave Macmillan. p. 205. ISBN 978-0-230-55140-4.
  • Davies, Mark (2009). "The 385+ Million Word Corpus of Contemporary American English (1990–present)". International Journal of Corpus Linguistics (John Benjamins Publishing Company) 14 (2): 159–190. doi:10.1075/ijcl.14.2.02dav.
  • Lindquist, Hans (2009). Corpus Linguistics and the Description of English. Edinburgh University Press. ISBN 978-0-7486-2615-1. 
  • Davies, Mark (2005). "The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation". International Journal of Corpus Linguistics (John Benjamins Publishing Company) 10 (3): 307–334. doi:10.1075/ijcl.10.3.02dav.

This article is issued from Wikipedia, version of Monday, January 11, 2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.