Oxford English Corpus

The Oxford English Corpus is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University Press's language research programme. It is the largest corpus of its kind, containing nearly 2.1 billion words.^[1] It includes language from the UK, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore and South Africa.^[2] The text is collected mainly from web pages; some printed texts, such as academic journals, have been added to supplement particular subject areas.^[2] The sources are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and social media".^[2] This contrasts with similar databases that sample only a specific kind of writing. The corpus is generally available only to researchers at Oxford University Press, but other researchers who can demonstrate a strong need may apply for access.^[2]^[3]

The digital version of the Oxford English Corpus is formatted in XML and usually analysed with Sketch Engine software.^[4]

Each document in the Oxford English Corpus is accompanied by metadata recording:

title
author (if known; many websites make this difficult to determine reliably)
author gender (if known)
language type (e.g. British English, American English)
source website
year (+ date, if known)
date of collection
domain + subdomain
document statistics (number of tokens, sentences, etc.)^[4]
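
As an illustration only, the metadata fields listed above could be read from an XML document along the following lines. This is a minimal sketch: the element and attribute names (`meta`, `language_type`, `collected`, `sub`, and so on), the `DocMeta` class and the sample record are all hypothetical, since the corpus's actual XML schema is not described here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical record: tag names and values are invented for illustration
# and do not reflect the real schema of the Oxford English Corpus.
SAMPLE = """
<doc>
  <meta>
    <title>Example page</title>
    <author gender="unknown"></author>
    <language_type>British English</language_type>
    <source>example.org</source>
    <year>2010</year>
    <collected>2011-03-01</collected>
    <domain sub="cricket">sport</domain>
    <tokens>1234</tokens>
  </meta>
</doc>
"""


@dataclass
class DocMeta:
    title: Optional[str]
    author: Optional[str]          # None when the author is not known
    author_gender: Optional[str]
    language_type: Optional[str]
    source_website: Optional[str]
    year: int
    date_collected: Optional[str]
    domain: Optional[str]
    subdomain: Optional[str]
    tokens: int


def parse_meta(xml_text: str) -> DocMeta:
    """Read one document's metadata from the assumed <meta> element."""
    meta = ET.fromstring(xml_text.strip()).find("meta")
    if meta is None:
        raise ValueError("no <meta> element found")

    def text(tag: str) -> Optional[str]:
        # Return the stripped element text, or None if missing or empty.
        el = meta.find(tag)
        value = (el.text or "").strip() if el is not None else ""
        return value or None

    author_el = meta.find("author")
    domain_el = meta.find("domain")
    return DocMeta(
        title=text("title"),
        author=text("author"),
        author_gender=author_el.get("gender") if author_el is not None else None,
        language_type=text("language_type"),
        source_website=text("source"),
        year=int(text("year") or 0),
        date_collected=text("collected"),
        domain=text("domain"),
        subdomain=domain_el.get("sub") if domain_el is not None else None,
        tokens=int(text("tokens") or 0),
    )


record = parse_meta(SAMPLE)
print(record.language_type, record.domain, record.subdomain)
```

Fields that the list marks "if known", such as the author, are modelled as optional so that a missing value is kept distinct from an empty string.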

References

1. "The Oxford English Corpus". Sketch Engine. Lexical Computing CZ s.r.o. Retrieved 27 October 2016.
2. "The Oxford English Corpus". Oxford Dictionaries Online. Oxford University Press. Retrieved 8 November 2014.
3. "Compare COCA". Corpus of Contemporary American English. Retrieved 8 November 2014.
4. "The Oxford English Corpus". Retrieved 4 February 2014.

This article is issued from Wikipedia (version of 26 October 2016). The text is available under the Creative Commons Attribution/Share-Alike licence, but additional terms may apply to the media files.
