The British Council, the Open University and Oxford University conducted research into the metrics provided by Text Inspector and how useful they are to language learners. Dr. Nathaniel Owen, one of the researchers, has summarised the findings in the blog post below.

Dr. Nathaniel Owen is the Senior Research and Validation Manager at Oxford University and a Doctor of Education.

Text Inspector is a social enterprise run by volunteers with the aim of furthering academic research in linguistics and language learning.

This blog post is developed from a report produced for the British Council by Dr. Nathaniel Owen, Dr. Prithvi Shrestha and Professor Stephen Bax. To see the full report, please visit:

What are Lexical Profiles?  

Lexical profiles are descriptions of specific levels of language users, expressed in terms of the kinds of language they use. To create lexical profiles, we need a framework of developing language proficiency, for example the levels of the Common European Framework of Reference for Languages (CEFR) (Council of Europe, 2001, 2018). Lexical profiles can be expressed in terms of key metrics, such as those produced by Text Inspector, at each of the levels (A1–C2).


We can create lexical profiles using large amounts of learner writing (or speaking). This blog provides an overview of efforts to develop lexical profiles of student writing across multiple levels of the CEFR for learner written English, using Text Inspector measures. As well as writing, Text Inspector can also be used to analyse the reading and listening texts that learners might use; however, this post focuses on writing.

So how do we develop the lexical profiles? We start with a large number of pieces of student writing. We separate them by level (e.g. A1, A2, B1 etc.). We then analyse the samples using Text Inspector. We can then perform statistical analysis to see which Text Inspector metrics are the most successful at identifying differences across the levels.  
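The steps above can be sketched in a few lines of code. This is an illustrative toy example, not Text Inspector's actual pipeline: the samples, the metric (average word length) and the grouping logic are all stand-ins.

```python
from statistics import mean

# Toy corpus of (CEFR level, writing sample) pairs -- stand-ins for real data.
samples = [
    ("A1", "I like my school. It is big."),
    ("B1", "Although the lesson was difficult, I enjoyed learning new vocabulary."),
    ("C1", "The proliferation of digital resources has transformed language pedagogy."),
]

def avg_word_length(text):
    """A simple stand-in for one Text Inspector metric."""
    words = [w.strip(".,;:!?") for w in text.split()]
    return mean(len(w) for w in words)

# Separate the samples by level and analyse each one.
by_level = {}
for level, text in samples:
    by_level.setdefault(level, []).append(avg_word_length(text))

# Compare the per-level averages to see whether the metric discriminates.
profile = {level: round(mean(values), 2) for level, values in sorted(by_level.items())}
print(profile)
```

Even on this toy data, the metric rises with level; the real study applies the same logic to thousands of samples and dozens of metrics, with formal statistical tests in place of eyeballing averages.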


The importance of lexis in learner writing 

Text Inspector provides many metrics of lexis, based on resources such as the English Vocabulary Profile (EVP), the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). Knowledge and use of lexis is a particularly important part of successful language learning. Studies show a clear link between vocabulary knowledge and writing proficiency.


Why develop lexical profiles of learner writing? 

We wanted to find out which Text Inspector metrics were most sensitive to changes in learner writing proficiency across CEFR bands. We can then use these to develop lexical profiles for learners at different levels, which can be useful for students and teachers. This is important for the following reasons:    

  • Students can see how they are progressing and whether there are specific areas of writing they need to develop in order to progress to the next CEFR band.
  • Test developers can use lexical profiles of student writing as part of a validation argument to show how their tests elicit language at different levels of proficiency.
  • Test developers and teachers can also use these metrics as part of assessor training, so that people responsible for marking student writing can use this information to inform their decisions about what scores to award to specific samples of writing.
  • Additionally, more sensitive metrics can be used to refine computational models in order to improve the performance of automated scoring engines.


What we looked at:

The British Council provided 6,407 samples of learner writing from the Aptis test, representing more than one million words of learner writing.  

The samples represent learners from 65 countries. We used Text Inspector to analyse these samples against its full range of metrics.

To see what metrics Text Inspector can use to analyse text, go here:  

These included things like the number of words, number of sentences, number of syllables, average sentence length and average word length. We also looked at vocabulary use by comparing vocabulary used by test-takers to data from the BNC, COCA, the EVP, the Academic Word List (AWL), and metadiscourse markers. We then compared results according to the CEFR level awarded to each of the samples, using statistical analysis. 
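As a rough illustration, surface metrics like these can be computed as follows. This is a simplified sketch using naive tokenisation, not Text Inspector's actual implementation:

```python
import re

def basic_metrics(text):
    """Rough surface metrics of the kind listed above (sketch only;
    not Text Inspector's actual tokenisation rules)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }

m = basic_metrics("The cat sat. The dog barked loudly!")
print(m)
```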



What we found…  

We found evidence that Text Inspector can detect changes in learner writing systematically as the CEFR level of learners increases, although not all Text Inspector metrics were sensitive to changes which occurred across CEFR levels. We identified twenty-six metrics which were most useful in distinguishing across CEFR boundaries, including:     

  • measures of text length (e.g. sentence, token and type count)
  • measures of sophistication (e.g. syllable count and number of words with more than two syllables)
  • measures of vocabulary use (fourteen of the twenty-six metrics represented vocabulary use; see the full report for further details)

Some findings are outlined below (for the full results, please see the report).  

The Use of Lexis

Two examples of lexical sophistication are provided. As we can see, higher proficiency learners are more likely to use words with more than two syllables than lower proficiency writers. Note, however, that the use of these words increases significantly between B1, B2 and C levels, whereas A1-B1 bands do not see much change. Likewise, the overall proportion of syllables per 100 words generally increases with each CEFR band, but the differences from B1 to C levels are larger than A1 to B1 levels.   

[Table: average % of words with more than two syllables, and average syllables per 100 words, by CEFR level]
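The two sophistication measures in the table can be approximated as follows. The vowel-group syllable counter here is a crude heuristic for illustration only; real tools typically rely on pronunciation dictionaries:

```python
import re

def count_syllables(word):
    """Crude heuristic: count runs of vowel letters as syllables."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def sophistication(text):
    """Approximate the two lexical sophistication measures discussed above."""
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "pct_words_over_2_syllables": 100 * sum(s > 2 for s in syllables) / len(words),
        "syllables_per_100_words": 100 * sum(syllables) / len(words),
    }

m = sophistication("The examination was extraordinarily complicated.")
print(m)
```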


Additionally, the study showed evidence of how lexis use changes across CEFR bands. Highlighted below is data from the English Vocabulary Profile (EVP), which shows that learners’ use of higher-level lexis increases with each CEFR band, although it also shows that these changes are sometimes small and irregular.

[Table: average token % of lexis at each EVP level, by CEFR band]



The table shows that the proportion of basic lexis (A1) decreases with CEFR band, but by a very small amount (from 72.45 to 69.52 percent). This is because all writing requires the use of grammatical function words (e.g. and, the, in, on, so etc.), regardless of CEFR level. Very infrequent lexis (C1 and C2) hardly occurs at all, even among very proficient writers. Even for C-level writers, the proportion of C1 and C2 lexis they use amounts to only 1.25 percent of their total output (1.02% + 0.23% from the bottom two boxes in the above table).

The biggest changes occur with A2 and B1 lexis. Use of A2 lexis is most useful for discriminating between A1 and A2 learners. Use of B1 lexis is most useful for discriminating between B2 and C level learners. However, these changes are small. Given the average number of words produced by learners in the research is around 250 words, the percentage differences across CEFR levels may amount to a total of only 2-3 words in most cases. Lexis data is therefore useful, but cannot be used in isolation to determine CEFR level of learner writing.   

Use of Metadiscourse

Text Inspector is unique in providing measures of metadiscourse use in writing samples. 

For more information about how Text Inspector analyses metadiscourse, please go here: 

We found evidence that the use of metadiscourse changes significantly across CEFR bands, and that further investigation of this area is justified. We found that overall use (tokens) of metadiscourse markers peaks at B1 level (21.49 percent) and then falls at B2 and C levels. However, we also found that the range of metadiscourse (types) increases with CEFR level, peaking at B2:  


[Table: metadiscourse token % and type % by CEFR level]


This means that higher ability writers use fewer metadiscourse markers overall, but use a greater variety in comparison with lower ability writers. This data suggests that the way in which learners use metadiscourse varies across CEFR level, which we plan to investigate further.
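The token/type distinction behind these figures can be illustrated with a small sketch. The marker list here is a tiny hypothetical sample, not Text Inspector's metadiscourse taxonomy, which is far more extensive and includes multi-word markers:

```python
import re

# Hypothetical single-word metadiscourse markers, for illustration only.
MARKERS = {"however", "therefore", "moreover", "furthermore", "firstly"}

def metadiscourse_rates(text):
    words = re.findall(r"[A-Za-z']+", text.lower())
    hits = [w for w in words if w in MARKERS]
    return {
        # tokens: every occurrence counts, so repetition raises this figure
        "token_pct": 100 * len(hits) / len(words),
        # types: each distinct marker counts once, so variety raises this figure
        "type_pct": 100 * len(set(hits)) / len(words),
    }

r = metadiscourse_rates(
    "However, the test is hard. However, practice helps. Therefore, study daily."
)
print(r)
```

Because "however" is repeated, the token rate is higher than the type rate: a writer who repeats the same few markers scores high on tokens but low on types, which is the pattern that separates the B1 peak from the B2 peak in the study.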



This blog post provides a brief overview of the findings of a large-scale research study. We have shown that Text Inspector can provide valuable information about the lexis that learners use in their writing as they progress, and the metrics are sensitive to changes in language proficiency. We hope that this is just the start of empirical investigations into lexical use across CEFR bands.

However, the research also revealed that some changes across CEFR bands are smaller than might be expected. A limitation of this type of analysis is that there is no information on context or appropriateness of language use. Other aspects of linguistic competence are very important in judging learner writing, such as their ideas (task completion), organisation, coherence, register and tone.

For more information about this research, and other research projects related to assessment, please visit the British Council website here:


Want to see how your texts do on these measures?

Click here 


Find out more here


What would you like to see next on our blog?  Send us your ideas at