Adjustment for longer texts

In October 2017 we upgraded the Text Inspector Scorecard to make it more accurate for longer texts.
Certain text measures are sensitive to text length, which means that with longer texts they can over- or under-estimate aspects of a text. For example, the well-known type-token ratio (TTR) is famously sensitive in this way, as described on the Wordsmith site:
“…this type/token ratio (TTR) varies very widely in accordance with the length of the text — or corpus of texts — which is being studied. A 1,000 word article might have a TTR of 40%; a shorter one might reach 70%; 4 million words will probably give a type/token ratio of about 2%, and so on. […] The conventional TTR is informative, of course, if you’re dealing with a corpus comprising lots of equal-sized text segments (e.g. the LOB and Brown corpora). But in the real world, especially if your research focus is the text as opposed to the language, you will probably be dealing with texts of different lengths and the conventional TTR will not help you much.”
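This length sensitivity is easy to demonstrate. The sketch below (not Text Inspector's own code) repeats the same small vocabulary at two different lengths: the vocabulary never changes, yet the ratio collapses as the text grows.

```python
def ttr(tokens):
    """Type-token ratio: unique word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

# The same ten-word vocabulary, repeated to make texts of different lengths.
vocab = ["the", "quick", "brown", "fox", "jumps", "over", "a", "lazy", "dog", "today"]
short_text = vocab * 10     # 100 tokens
long_text = vocab * 1000    # 10,000 tokens

print(ttr(short_text))  # 0.1
print(ttr(long_text))   # 0.001
```

Both texts use identical vocabulary, but the longer one scores a hundred times lower, which is exactly the distortion described in the quotation above.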
Some of the measures used in Text Inspector could be sensitive in the same way, for example:
Lexical Sophistication: English Vocabulary Profile
Lexical Sophistication: British National Corpus
Lexical Sophistication: Corpus of Contemporary American English
To avoid this possibility, we have introduced a new way of calculating these measures for longer texts (over 600 words).
Our software divides a longer text into smaller chunks and carries out its calculations on each chunk separately. It then averages the results across all the chunks, and it is that average which you will see on the Scorecard.
This removes the danger of text-length distortions in these metrics, and our research shows that it is a far more reliable and accurate way of assessing them when we benchmark your text against the CEFR scale.
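The chunk-and-average approach can be sketched as follows. This is an illustration only, not Text Inspector's actual implementation: the chunk size of 300 words and the use of TTR as the per-chunk measure are assumptions for the example.

```python
import re

def type_token_ratio(tokens):
    """Proportion of unique word types among all tokens."""
    return len(set(tokens)) / len(tokens)

def chunked_score(text, chunk_size=300, measure=type_token_ratio):
    """Split a text into fixed-size chunks, apply the measure to each
    chunk, and return the mean of the per-chunk scores.

    chunk_size=300 is an illustrative assumption; the article only
    states that the adjustment applies to texts over 600 words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    # Drop a short trailing fragment so every chunk is the same length;
    # if the text is shorter than one chunk, keep it whole.
    full_chunks = [c for c in chunks if len(c) == chunk_size] or chunks
    scores = [measure(c) for c in full_chunks]
    return sum(scores) / len(scores)
```

Because every chunk has the same length, each per-chunk score is directly comparable, so the average is not distorted by the overall length of the text.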
If you have any questions about this, please contact us.