Analysing vocabulary using the Corpus of Contemporary American English (COCA)

Using the Text Inspector tool, you can access information from the Corpus of Contemporary American English (COCA). This corpus is one of the most up-to-date available and is an excellent complement to the British National Corpus (BNC).

This data shows how language is used in the real world, and it can help ESOL teachers, curriculum writers and textbook developers decide which words to include in their teaching materials and lesson plans.

When used by English language students, it can also help improve their language skills by providing information that many English learning materials don’t include.


What is the Corpus of Contemporary American English (COCA)?

The Corpus of Contemporary American English (COCA) is a 1.1 billion word corpus of American English and is one of the most widely used corpora in the world.

Created by Professor Mark Davies, it contains a well-balanced collection of texts from eight genres: spoken language, fiction, magazines, newspapers, academic texts, TV and movie subtitles, blogs and other web pages.

These texts are from the years 1990-2019, with the most recent update taking place in March 2020. This makes it one of the most up-to-date English corpora in the world.


Why use a corpus for English language learning?

Corpora allow you to understand how the English language is used by real speakers, not just how it is presented in textbooks. This is known as a ‘descriptive’ approach to language.

Understanding this authentic language use helps you to improve your understanding of vocabulary and grammar, sound more native in spoken and written communication and use your language skills to the best of your ability.

If you’re an ESOL teacher, using a corpus like COCA or the BNC will guide your language instruction and help you develop better-quality course materials.


Why use the COCA and the BNC together?

In Text Inspector, the COCA tool draws on the Corpus of Contemporary American English dataset and sits alongside the BNC tool.

Together, they form a well-rounded, comprehensive research and learning tool that gives great insight into modern English usage, regardless of which variety of English is being used.

This is because:

  • Each corpus offers a different selection of written and spoken texts from different genres and time periods.
  • English language use is constantly shaped by global usage. Using both corpora gives a better overall picture.

To learn more about how they compare, visit the English Corpora website.


Where do I find the COCA information in the analysis?

To find data from the Corpus of Contemporary American English in your analysis, simply look towards the left side of the page.

You’ll see the menu option ‘Lexis COCA’. Click it to open the COCA results.

Here you’ll see detailed information presented in both graphs and tables, with data on tokens, types, elements, lexical counts and more.

However, it’s important to remember that no computer analysis of language can ever be foolproof. Therefore, if you notice that a grammatical tag is inaccurate, please go to the TAGGER tool to correct it.

Once you’ve done that, click the ‘Update results’ button at the top of the page.
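
If you are unsure what the token and type counts mentioned above mean, the short Python sketch below may help. It is only an illustration and not how Text Inspector itself works: the function name tokens_and_types and the simple word-splitting rule are assumptions made for this example. A token is every running word in the text, while a type is each distinct word form.

```python
import re
from collections import Counter


def tokens_and_types(text):
    """Return (token_count, type_count, type_token_ratio) for a text.

    Hypothetical helper for illustration only; real tools use more
    careful tokenisation and tagging than this.
    """
    # Assumption: lowercase the text and treat runs of letters
    # (with optional internal apostrophes, e.g. "don't") as words.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)

    token_count = len(tokens)   # every running word
    type_count = len(counts)    # distinct word forms only
    ratio = type_count / token_count if token_count else 0.0
    return token_count, type_count, ratio


sample = "The cat sat on the mat. The dog sat too."
print(tokens_and_types(sample))  # (10, 7, 0.7)
```

In the sample sentence, "the" appears three times and "sat" twice, so the text has 10 tokens but only 7 types.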

Try it now! You can analyse short texts up to 250 words for free.

Get started here