Analysing vocabulary using the Corpus of Contemporary American English (COCA)

The COCA tool in Text Inspector uses the Corpus of Contemporary American English dataset and focuses on American English. It complements the BNC tool, which focuses on British English. You can find out more about COCA on this page:

As discussed on that same page:

“COCA and the BNC complement each other nicely, and they are [the] only large, well-balanced corpora of English that are publicly-available. The BNC has better coverage of informal, everyday conversation, while COCA is much larger and more recent, which has important implications for the quantity and quality of the data overall.
Unless one is inherently interested in only British or American English, there is really no reason to not take advantage of both corpora.”

There are also some differences between the two corpora:

“The BNC has a much wider range of spoken sub-genres, while COCA is composed of unscripted conversation on TV and radio shows […] Both corpora are very well balanced in terms of sub-genres for the written genres (e.g. Newspaper-Sports, or Academic-Medicine). In addition, because there is a diachronic aspect to COCA (coverage over time), in COCA the distribution of 20% in each of the five genres stays constant from year to year.”