Measure lexical diversity

Text Inspector is a professional online tool for measuring Lexical Diversity.


What is Lexical Diversity?


Lexical Diversity refers to:

“the range of different words used in a text, with a greater range indicating a higher diversity” (McCarthy and Jarvis 2010: 381).

So what does Lexical Diversity mean? Imagine a text which keeps repeating the same few words again and again – for example: ‘manager’, ‘thinks’ and ‘finishes’.

Compare this with a text which avoids that sort of repetition, and instead uses different vocabulary for the same ideas: ‘manager, boss, chief, head, leader’, ‘thinks, deliberates, ponders, reflects’, and ‘finishes, completes, finalises’.

The second text is likely to be more complex and more difficult. It is said to have more ‘lexical diversity’ than the first text, and this is why Lexical Diversity (LD) is thought to be an important measure of text difficulty. If a text has a higher LD index (often reported as D), it is likely to be more complex, more advanced and more difficult, other things being equal.

As Duran et al. have said:

“A key to understanding the general principle is that lexical diversity is about more than vocabulary range. Alternative terms, ‘flexibility’, ‘vocabulary richness’ (Read 2000), ‘verbal creativity’ (Fradis, Mihailescu, and Jipescu 1992), or ‘lexical range and balance’ (Crystal 1982), indicate that it has to do with how vocabulary is deployed as well as how large the vocabulary might be.” (Duran, Malvern, Richards, Chipere 2004:220)

What do the measures mean?

The Text Inspector tool allows you to measure Lexical Diversity instantly.

Research has shown significant differences in this measure (D) as children develop into adults. Duran et al. offer a useful scale as follows:


(Duran, Malvern, Richards, Chipere 2004:238)

For the purposes of Text Inspector, the key finding from this is that adult second language learner writing would typically have a D measure of around 40-70, while native speaker adult academic writing would typically have a measure of around 80-105, in very general terms.

How can we measure Lexical Diversity?

McCarthy and Jarvis note that:

“[a] reliable index of lexical diversity (LD) has remained stubbornly elusive for over 60 years” (McCarthy & Jarvis 2007: Abstract)

The main problem is that some measures of LD do not fully take account of differences in text length. In other words, if they are used on texts of different lengths they can give misleading results. (For example, the traditional type/token ratio (TTR) is susceptible to this problem, because longer texts tend to repeat words and so score lower. See here for a discussion.)
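To see why text length matters, here is a small illustrative sketch in Python. It is not part of Text Inspector; the function name and the toy text are assumptions made up for this example. It computes TTR as the number of different words (types) divided by the total number of words (tokens), and shows that simply repeating the same text lowers the score even though the vocabulary has not changed.

```python
def type_token_ratio(tokens):
    """Return the number of distinct words divided by the total number of words."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Toy text with 5 different words in 8 tokens (hypothetical example data).
sentence = "the manager thinks the manager finishes the report".split()

short_text = sentence        # 8 tokens
long_text = sentence * 10    # 80 tokens, but still only 5 different words

ttr_short = type_token_ratio(short_text)
ttr_long = type_token_ratio(long_text)

print("short text TTR:", ttr_short)
print("long text TTR:", ttr_long)
```

The longer text scores much lower although it uses exactly the same vocabulary, which is why TTR values from texts of different lengths should not be compared directly.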

MTLD and voc-d

However, researchers now tend to agree that two measures seem to be particularly reliable, namely MTLD and vocd-D. These are the two measures which Text Inspector calculates. McCarthy and Jarvis summarise a comprehensive analysis as follows:

“We conclude by advising researchers to consider using MTLD, vocd-D (or HD-D), and Maas in their studies, rather than any single index, noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation.” (McCarthy and Jarvis 2010: Abstract, p381)

They note, however, that vocd-D still retains an element of sensitivity to text length.

See also this interesting discussion on “Evaluating the Comparability of Two Measures of Lexical Diversity” by Fredrik deBoer. He also makes the point that: “VOCD-D is still affected by text length, and its developers caution that outside of an ideal range of perhaps 100-500 words, the figure is less reliable.” (np)


We are also grateful to Phil McCarthy for these comments he sent us (July 2017):

Hi … I’m Phil McCarthy … I’m the one who made MTLD.

First, this is a GREAT TOOL … awesome that you’ve put it out there so that people can use lexical diversity in assessments.

Second, I read here that some people wonder why we recommended that researchers use MTLD, vod-D (HD-D) and MAAS given that only MTLD seems to be completely independent of text length. I think there were at least three reasons for that. (Note, the below are my thoughts, and Scott Jarvis may think otherwise … indeed, see his book on the matter).

1/ We didn’t make MTLD just to have a competition with other researchers (something like “Hey, we’re the best!”). We did a lot of work to validate MTLD, but it would be arrogant of us to assume we had covered every angle. The other measures certainly seem to do a very good job, and the other measures only become “problematic” when text length is quite varied … leading to … if the text lengths are THAT varied then is there something else of importance that the researcher(s) should be considering? (Scott Jarvis has written extensively on this.)

2/ The three measures we recommend work in very different ways; most notably, MTLD assesses text sequentially whereas the other measures swallow the text whole. This means that if you randomize the text, MTLD would give you a completely different value whereas the others would give you the same value each time. So, the question becomes, “Is the sequence of the wording of the text a/the quintessential characteristic of the text?” Well, if for you it is, then MTLD is the only way to go because only MTLD assesses text in that way. So, bottom line, the researcher(s) have to make that call.

3/ I’d like to see a study where certain texts were identified as having marked variation depending on the LD measure used. For example, imagine a corpus of 1000 texts where Text X was 12th most diverse by Measure A, 136th most diverse by Measure B, and 389th most diverse by Measure C. I think it would be valuable to identify such texts and work out “what’s going on!” Such analysis would undoubtedly be revealing of textual characteristics and many other things. If we only go in armed with one measure then we miss the potential to see such outcomes, which may be critical to the understanding of the research.

Finally, when we say “voc-D OR HD-D”, we have to remember that voc-D is … really speaking … just an estimation (albeit a great one) of HD-D (see McCarthy and Jarvis 2007). voc-D has been so widely used that HD-D is unlikely to “replace” it, but … again … really speaking …if your research question/text types make you want to use voc-D then … hmmm … you probably really should be using HD-D. That said, again, because voc-D has so much history now, you’d have less problems getting published with voc-D than HD-D. So, on that score, if you’re looking for some research to do, then writing a paper called “Use HD-D, not voc-D” would possibly be useful.
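To make the relationship between voc-D and HD-D a little more concrete, here is a minimal sketch of HD-D in Python. It rests on assumptions that are not stated on this page: it follows our reading of McCarthy and Jarvis (2007), in which each word type contributes its probability of appearing at least once in a random sample of 42 tokens (taken from the hypergeometric distribution), divided by 42. The function name and the sample size default are illustrative, and this is not the code Text Inspector runs.

```python
from collections import Counter
from math import comb


def hdd(tokens, sample_size=42):
    """Sketch of HD-D: expected type-token ratio of a random sample of tokens.

    sample_size=42 is an assumption taken from the published description,
    not from Text Inspector.
    """
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    all_samples = comb(n, sample_size)
    total = 0.0
    for freq in Counter(tokens).values():
        # Probability that none of this type's occurrences land in the sample.
        p_absent = comb(n - freq, sample_size) / all_samples
        # The type contributes its chance of being present, scaled by 1/sample_size.
        total += (1.0 - p_absent) / sample_size
    return total


# Hypothetical example texts, each long enough for a 42-token sample.
repetitive = "the manager thinks the manager finishes".split() * 10
varied = ("the manager thinks the boss ponders the chief reflects "
          "and the leader completes the report").split() * 4

print("repetitive:", hdd(repetitive))
print("varied:", hdd(varied))
```

Because the calculation uses only word frequencies, shuffling the text leaves the result unchanged, which matches the point made above that these measures "swallow the text whole". There is also no random sampling involved, so the same text always gives the same value.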

How to measure LD in Text Inspector

Text Inspector is perhaps the best place on the web to measure Lexical Diversity in your text using these two measures. However, you should use them carefully – find out what they measure and what they do not measure.

Text Inspector measures LD by sampling different parts of your text randomly. This means that each time you run an analysis you will get a slightly different figure for the same text!

You are therefore advised to run the LD test several times on the same text, and take the average.
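If you are working with your own scripts, the same advice can be sketched in a few lines of Python. Everything here is hypothetical: `sampled_ttr` is only a stand-in for a sampling-based measure (it is not how Text Inspector calculates LD), and `average_over_runs` is a helper name invented for this example.

```python
import random
from statistics import mean


def sampled_ttr(tokens, sample_size, rng):
    """Stand-in for a sampling-based measure: TTR of one random subsample."""
    sample = rng.sample(tokens, sample_size)
    return len(set(sample)) / sample_size


def average_over_runs(measure, runs=5):
    """Call a zero-argument measure several times; return (mean, all values)."""
    values = [measure() for _ in range(runs)]
    return mean(values), values


rng = random.Random(0)  # fixed seed only so that the example is repeatable
tokens = ("the manager thinks the boss ponders the chief reflects "
          "and the leader finishes the report").split() * 5

average, values = average_over_runs(lambda: sampled_ttr(tokens, 35, rng), runs=5)
print("individual runs:", values)
print("average:", average)
```

The individual runs differ slightly from one another, while the average gives a steadier figure to report.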


Source, acknowledgements, and technical information

The Text Inspector LD tool is based on the Perl modules for measuring MTLD and voc-d developed by Aris Xanthos, which are copyright (c) 2011 Aris Xanthos and are released under the GPL license.

In the technical descriptors are the following notes, which should be borne in mind:


[In the module] [t]he computation of MTLD is performed two times, once in left-to-right text order and once in right-to-left text order. Each pass yields a weighted average (and variance), and the two averages are in turn averaged to get the value that is finally reported (the two variances are also averaged). This attribute indicates whether the reported average should itself be weighted according to the potentially different number of observations in the two passes (value ‘within_and_between’), or not (value ‘within_only’). The default value is ‘within_only’, to conform with McCarthy and Jarvis (2010), although the author of this implementation finds it more consistent to select ‘within_and_between’.
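The two-pass idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a port of the Perl module: it uses the usual description of MTLD from McCarthy and Jarvis (2010), with a TTR threshold of 0.72 (an assumption taken from that paper, not from this page), counts a partial factor for any leftover text, and takes a plain unweighted mean of the forward and backward passes. It does not compute the variances mentioned above, and the function names are hypothetical.

```python
def mtld_one_pass(tokens, threshold=0.72):
    """One pass of MTLD: number of tokens divided by the number of 'factors'.

    A factor is completed each time the running type-token ratio drops below
    the threshold; the running counts are then reset.
    """
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count < threshold:
            factors += 1.0
            types.clear()
            count = 0
    if count > 0:
        # Leftover text counts as a partial factor, in proportion to how far
        # its TTR has fallen from 1.0 towards the threshold.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0:
        raise ValueError("no repeated words: MTLD is undefined for this text")
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Unweighted mean of a left-to-right pass and a right-to-left pass."""
    forward = mtld_one_pass(tokens, threshold)
    backward = mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0


# Hypothetical example texts.
repetitive = "the manager thinks the manager finishes the manager thinks".split() * 5
varied = ("the manager thinks while the boss ponders and the chief reflects "
          "before the leader completes what the head finalises").split() * 2

print("repetitive:", mtld(repetitive))
print("varied:", mtld(varied))
```

Because each pass reads the words in order, shuffling the text will usually change the result, unlike measures that depend only on word frequencies.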

voc-d :

This module implements the ‘VOCD’ method for measuring the diversity of text units, cf. McKee, G., Malvern, D., & Richards, B. (2000). Measuring Vocabulary Diversity Using Dedicated Software, Literary and Linguistic Computing, 15(3): 323-337


In a nutshell, this method consists in taking a number of subsamples of 35, 36, …, 49, and 50 tokens at random from the data, then computing the average type-token ratio for each of these lengths, and finding the curve that best fits the type-token ratio curve just produced (among a family of curves generated by expressions that differ only by the value of a single parameter). The parameter value corresponding to the best-fitting curve is reported as the result of diversity measurement. The whole procedure can be repeated several times and averaged.”
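The procedure just described can be sketched in Python. Several details below are assumptions rather than facts from this page or from the Perl module: the curve family is taken to be TTR = (D/N) × (√(1 + 2N/D) − 1), as we understand it from McKee, Malvern and Richards (2000); 100 random subsamples are drawn for each length; and the best-fitting D is found by a simple grid search between 0.1 and 200 instead of a proper optimiser. All function names are hypothetical.

```python
import random
from math import sqrt


def model_ttr(n, d):
    """TTR predicted for a sample of n tokens by the curve with parameter d."""
    return (d / n) * (sqrt(1.0 + 2.0 * n / d) - 1.0)


def mean_sampled_ttr(tokens, size, trials, rng):
    """Average TTR over several random subsamples of the given size."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(tokens, size)  # sampling without replacement
        total += len(set(sample)) / size
    return total / trials


def fit_d(observed, d_max=200.0, step=0.1):
    """Grid-search the d that minimises squared error against {n: mean TTR}."""
    best_d, best_err = None, float("inf")
    for i in range(1, int(round(d_max / step)) + 1):
        d = i * step
        err = sum((model_ttr(n, d) - ttr) ** 2 for n, ttr in observed.items())
        if err < best_err:
            best_d, best_err = d, err
    return best_d


def vocd(tokens, trials=100, seed=None):
    """Sketch of the VOCD method: sample, average the TTRs, fit the curve."""
    if len(tokens) < 50:
        raise ValueError("need at least 50 tokens")
    rng = random.Random(seed)
    observed = {n: mean_sampled_ttr(tokens, n, trials, rng) for n in range(35, 51)}
    return fit_d(observed)


# Hypothetical example text of 75 tokens.
text = ("the manager thinks the boss ponders the chief reflects "
        "and the leader finishes the report").split() * 5
print("D estimate:", vocd(text, seed=1))
```

Without a fixed seed, each call draws different subsamples and so returns a slightly different D, which is why repeating the procedure and averaging the results is recommended.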

Text copyright @ 2015-2017

Other sites

Another useful online tool you could look at is Paul Meara’s tool for measuring D.

Note, however, that results from Paul Meara’s tool are not directly comparable with results from Text Inspector, as his tool measures on a scale from 0-100, whereas Text Inspector (TI) measures on a scale from 0-200.