Analysing Vocabulary Using the British National Corpus (BNC)

Using the Text Inspector tool, you can gain access to the British National Corpus.

This will enable you to better understand your chosen text in terms of real word usage in the British English-speaking world.

The knowledge can help improve your ESOL language teaching or learning, allow you to discover more about general use of the language and better inform your linguistic studies.


What is a Corpus?

A corpus (plural: corpora) is a collection of written or spoken texts stored on a computer. Corpora show exactly how a word or phrase is used in context by real language speakers across a variety of registers.

They are used for many purposes:


  • by lexicographers to create dictionaries, grammar reference materials, grammar practice materials and exam practice tests.
  • by teachers to help guide the development of tools for the teaching of vocabulary, idioms, phrasal verbs, and collocations (other words that usually occur alongside the chosen word).
  • by second language learners who want to understand the authentic use of a word, improve their overall language skills and expand their vocabulary.
  • by anyone interested in learning more about language use.


What is the British National Corpus (BNC)?

The British National Corpus (BNC) is a 100-million-word collection of samples of British English.

These samples come from a variety of both written and spoken sources including newspapers, fiction, letters, conversations and academic materials.

Written texts account for around 90% of the corpus and spoken texts account for 10%.

Through analysing the BNC, researchers have created a list of words ordered by how often they appear in the corpus. This list provides an insight into how often certain words are used in British English. Lists like this are commonly known as frequency word lists.
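To make the idea of a frequency word list concrete, here is a minimal sketch in Python. It counts the words in a tiny made-up corpus and orders them from most to least frequent. The function name `frequency_list`, the simple word pattern and the two example sentences are assumptions for illustration only; the real BNC list was built from a far larger corpus with more careful processing.

```python
from collections import Counter
import re

def frequency_list(texts):
    """Return (word, count) pairs ordered from most to least frequent."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and treat runs of letters
        # (with an optional internal apostrophe) as words.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# A toy corpus; a real corpus contains millions of words.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]
ranked = frequency_list(toy_corpus)

for position, (word, count) in enumerate(ranked, start=1):
    print(position, word, count)
```

A word's position in the sorted list is its frequency rank: the lower the number, the more common the word is in the corpus.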

When Text Inspector analyses your text, it will highlight the position of each word in the BNC frequency list and also provide overall statistics on how commonly used the vocabulary in your text is. 
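As a rough illustration of this kind of analysis, the sketch below looks up each word of a short text in a frequency list and reports what share of the words fall within a chosen rank cutoff. This is not Text Inspector's actual implementation. The rank values, the names `TOY_RANKS`, `rank_words` and `share_within`, and the cutoff of 2000 are all hypothetical.

```python
import re

# Hypothetical ranks for illustration only; these are NOT real BNC figures.
TOY_RANKS = {"the": 1, "of": 2, "cat": 1500, "sat": 2100, "mat": 6200}

def rank_words(text, ranks):
    """Pair each word in the text with its rank, or None if it is not listed."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(word, ranks.get(word)) for word in words]

def share_within(ranked_words, cutoff):
    """Return the fraction of words whose rank is at or below the cutoff."""
    if not ranked_words:
        return 0.0
    common = sum(
        1 for _, rank in ranked_words if rank is not None and rank <= cutoff
    )
    return common / len(ranked_words)

ranked_words = rank_words("The cat sat on the mat", TOY_RANKS)
print(ranked_words)
print(share_within(ranked_words, 2000))
```

Words missing from the list are given `None` rather than a guessed rank, so they can be reported separately instead of distorting the overall statistics.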


Why use a corpus?

Using a corpus is an excellent way to understand how a language is used across a variety of registers.

Whereas traditional grammar books and second language teaching materials tend to focus on how language should be used (known as ‘prescriptive grammar’), a corpus like the British National Corpus focuses on how it’s really used (known as ‘descriptive grammar’).

When it comes to conducting linguistic research, teaching English as a second language, or learning English, this can be an invaluable insight to have.

For example, many of us were taught that we cannot split an infinitive in English.

A split infinitive occurs when an adverb is placed between the word ‘to’ and the verb in an infinitive, as in the sentence “she used to secretly admire his English language skills”.

However, this rule doesn’t reflect how the language is really used. People have been splitting infinitives for centuries and will continue to do so. If we followed this prescriptive rule, we’d get the awkward and unnatural sentence: “She used secretly to admire his language skills.”

When we use a corpus, we can see patterns like this in real usage and use them to help us decide how to use language most effectively.


Why use a Corpus for English Language Learning?

When you understand how words are used by real speakers, you can vastly improve your vocabulary, grammar, and skills as a language learner. This will allow you to sound more native in your spoken and written communication.

If you’re teaching English as a second language, using a corpus like the BNC will allow you to develop better quality, more useful course materials.

In the case of the BNC, you can use the frequency word list to understand how commonly used the words in your text are. For example, you can adapt your text so that it focuses on more commonly known words, either to make it easier to understand or to ensure English language students are learning more commonly used words. Alternatively, for more academic writing or more advanced English language learners, you can modify a text to include less common words.

You can also track students’ writing over time to see how their use of more or less common words changes.


Why Text Inspector Doesn’t use Word Families

Text Inspector analyses your text using each word’s exact frequency rank in the British National Corpus, instead of using word families as some other tools do.

As the name suggests, a word family is a group of words that are related in form and meaning. An example would be the words ‘solve’, ‘solution’, ‘solvent’, ‘dissolve’ and ‘insoluble’.

We take this approach because we don’t believe that each word in a word family poses the same degree of difficulty.

This is an opinion shared by Schmitt and Zimmerman in their 2002 paper ‘Derivative Word Forms: What Do Learners Know?’:

“Some teachers and researchers may assume that when a learner knows one member of a word family, the other members are relatively easy to learn. Although knowing one member of a word family undoubtedly facilitates receptive mastery of the other members, the small amount of previous research has suggested that L2 learners often have problems producing the various derivative forms within a word family.”
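The difference between the two approaches can be sketched in a few lines of Python. A family-based lookup gives every member of a word family the rank of its most frequent member, while an exact lookup gives each word its own rank. The rank values and the names `WORD_RANKS`, `rank_by_family` and `rank_by_word` are hypothetical and chosen only to illustrate the contrast.

```python
# Hypothetical ranks for illustration only; these are NOT real BNC figures.
WORD_RANKS = {
    "solve": 1800,
    "solution": 900,
    "solvent": 9500,
    "dissolve": 4200,
    "insoluble": 15000,
}

# In this toy example, all five words belong to one word family.
FAMILY = set(WORD_RANKS)

# A family-based approach treats every member as being as common
# as the most frequent member of the family.
FAMILY_RANK = min(WORD_RANKS[word] for word in FAMILY)

def rank_by_family(word):
    """Return the shared family rank, or None for words outside the family."""
    return FAMILY_RANK if word in FAMILY else None

def rank_by_word(word):
    """Return the word's own rank, or None if it is not listed."""
    return WORD_RANKS.get(word)

for word in sorted(WORD_RANKS):
    print(word, rank_by_family(word), rank_by_word(word))
```

In this sketch, the family-based lookup makes ‘insoluble’ look as common as ‘solution’, whereas the exact lookup keeps the two apart.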


Why use Both the British National Corpus and the Corpus of Contemporary American English?


Text Inspector uses both the BNC and the COCA for text analysis. There are several reasons for this:

  1. Using both helps ensure that the user gains a better overall understanding of the global use of English, not only British English.
  2. Language is a living thing and many words traditionally considered to belong to American English are used by British English speakers, and vice versa.
  3. Each has its own advantages over the other. For example, the BNC includes more informal, everyday conversation whereas the COCA is much larger in size and was created more recently. This means they complement each other well.

[For an interesting comparison of both corpora, visit the English Corpora website.]


Where do I find the BNC information in the analysis?


After you analyse your text, you’ll be taken to a full summary of the analysis.

If you want to find the information relating to the British National Corpus, look to the left side of the page and click the tab that says ‘Lexis: BNC’.

You will be taken to a page with more detailed information. This includes both graphs and tables explaining tokens, types, elements, lexical counts and much more.
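Two of these terms are easy to illustrate. Tokens are the running words in a text, counting every repetition, and types are the distinct words. The short Python sketch below counts both for a sample sentence. The function name `count_tokens_and_types` and the simple letters-only word pattern are assumptions for illustration, and Text Inspector's own counting rules may differ.

```python
import re

def count_tokens_and_types(text):
    """Return (number of tokens, number of types) for the text."""
    # Tokens: every running word, including repeats.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Types: each distinct word counted once.
    types = set(tokens)
    return len(tokens), len(types)

n_tokens, n_types = count_tokens_and_types("The cat sat on the mat")
print(n_tokens, n_types)
```

Here ‘the’ appears twice, so the sentence has six tokens but only five types.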



Try it now! You can analyse short texts up to 250 words for free.

Get started here