What is Lexical Diversity?
Text Inspector is a professional online tool for measuring Lexical Diversity.
Lexical Diversity refers to “the range of different words used in a text, with a greater range indicating a higher diversity” (McCarthy and Jarvis 2010: 381).
So what does Lexical Diversity mean? Imagine a text which keeps repeating the same few words again and again – for example: ‘manager’, ‘thinks’ and ‘finishes’.
Compare this with a text which avoids that sort of repetition and instead uses different vocabulary for the same ideas: ‘manager, boss, chief, head, leader’, ‘thinks, deliberates, ponders, reflects’, and ‘finishes, completes, finalises’.
The second text is likely to be more complex and more difficult. It is said to have more ‘Lexical Diversity’ than the first text, and this is why Lexical Diversity (LD) is thought to be an important measure of text difficulty. All other things being equal, a text with a higher index of LD (or D) is likely to be more complex, more advanced and more difficult.
As Duran et al. have said:
“A key to understanding the general principle is that lexical diversity is about more than vocabulary range. Alternative terms, ‘flexibility’, ‘vocabulary richness’ (Read 2000), ‘verbal creativity’ (Fradis, Mihailescu, and Jipescu 1992), or ‘lexical range and balance’ (Crystal 1982), indicate that it has to do with how vocabulary is deployed as well as how large the vocabulary might be.” (Duran, Malvern, Richards, Chipere 2004: 220)
What do the measures mean?
The Text Inspector tool allows you to measure Lexical Diversity instantly.
Research has shown significant differences in this measure (D) as children develop into adults, and Duran et al. offer a useful scale of typical values.
For the purposes of Text Inspector, the key finding from this is that, in very general terms, adult second-language learner writing would typically have a D measure of around 40-70, while native-speaker adult academic writing would typically have a measure of around 80-105.
How can we measure Lexical Diversity?
McCarthy and Jarvis note that:
“[a] reliable index of lexical diversity (LD) has remained stubbornly elusive for over 60 years” (McCarthy & Jarvis 2007: Abstract)
The main problem is that some measures of LD do not fully account for differences in text length. In other words, if they are used on texts of different lengths they can give misleading results. The traditional type/token ratio (TTR), for example, is susceptible to this problem: the longer a text is, the more its words tend to repeat, so TTR tends to fall as length increases.
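The length problem with TTR can be illustrated with a minimal Python sketch. The function name `type_token_ratio` and the artificial 200-word vocabulary are assumptions made purely for illustration; they are not part of Text Inspector.

```python
import random


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of words (tokens)."""
    return len(set(tokens)) / len(tokens)


# Hypothetical data: 2000 words drawn at random from a fixed 200-word vocabulary.
rng = random.Random(0)
vocabulary = [f"word{i}" for i in range(200)]
text = [rng.choice(vocabulary) for _ in range(2000)]

# The same "writer" measured on a short extract and on the full text.
short_ttr = type_token_ratio(text[:100])
long_ttr = type_token_ratio(text)

print(f"TTR of first 100 words: {short_ttr:.2f}")
print(f"TTR of all 2000 words:  {long_ttr:.2f}")
```

Because the full text cannot contain more than 200 different words, its TTR cannot exceed 0.1, while the short extract scores much higher even though both come from the same vocabulary.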
MTLD and vocd-D
However, researchers now tend to agree that two measures seem to be particularly reliable, namely MTLD and vocd-D. These are the two measures which Text Inspector calculates for you. McCarthy and Jarvis summarise a comprehensive analysis as follows:
“We conclude by advising researchers to consider using MTLD, vocd-D (or HD-D), and Maas in their studies, rather than any single index, noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation.” (McCarthy and Jarvis 2010: Abstract, p381)
They note, however, that it still retains an element of sensitivity to text length.
See also this interesting discussion on “Evaluating the Comparability of Two Measures of Lexical Diversity” by Fredrik deBoer. He also makes the point that: “VOCD-D is still affected by text length, and its developers caution that outside of an ideal range of perhaps 100-500 words, the figure is less reliable.” (np)
How to measure LD in Text Inspector
Text Inspector is perhaps the best place on the web to measure Lexical Diversity in your text using these two measures. However, you should use them carefully – find out what they measure and what they do not measure.
Text Inspector measures LD by sampling different parts of your text randomly. This means that each time you run an analysis you will get a slightly different figure for the same text!
You are therefore advised to run the LD test several times on the same text, and take the average.
Source, acknowledgements, and technical information
The Text Inspector LD tool is based on the Perl modules for measuring MTLD and voc-d developed by Aris Xanthos, which are copyright (c) 2011 Aris Xanthos (email@example.com) and are released under the GPL license (see http://www.gnu.org/licenses/gpl.html).
In the technical descriptors are the following notes, which should be borne in mind:
MTLD: “[In the module] [t]he computation of MTLD is performed two times, once in left-to-right text order and once in right-to-left text order. Each pass yields a weighted average (and variance), and the two averages are in turn averaged to get the value that is finally reported (the two variances are also averaged). This attribute indicates whether the reported average should itself be weighted according to the potentially different number of observations in the two passes (value ‘within_and_between’), or not (value ‘within_only’). The default value is ‘within_only’, to conform with McCarthy and Jarvis (2010), although the author of this implementation finds it more consistent to select ‘within_and_between’.”
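To make the two-pass idea concrete, here is a minimal Python sketch of MTLD in the commonly described McCarthy and Jarvis (2010) form. It is not a port of the Perl module: it does not compute the weighted averages or variances mentioned above, it simply takes the plain mean of the two passes. The function names and the 0.72 threshold (the default factor size in McCarthy and Jarvis's description) are not taken from this page.

```python
def mtld_one_pass(tokens, threshold=0.72):
    """Mean length of the word runs that keep TTR at or above the threshold."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count < threshold:
            # TTR has dropped below the threshold: one full factor is
            # complete, so start counting afresh from the next word.
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Leftover words form a partial factor, credited in proportion
        # to how far their TTR has fallen towards the threshold.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        # No repetition at all: the text is too short or too varied
        # for this sketch to give a finite value.
        return float("inf")
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Plain (unweighted) mean of a left-to-right and a right-to-left pass."""
    forward = mtld_one_pass(tokens, threshold)
    backward = mtld_one_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2


repetitive = ["manager", "thinks"] * 50
varied = [f"w{i % 50}" for i in range(100)]
print(mtld(repetitive), mtld(varied))
```

In this sketch, the highly repetitive text yields a much lower value than the more varied one, since its TTR drops below the threshold after only a few words each time.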
voc-d: “This module implements the ‘VOCD’ method for measuring the diversity of text units, cf. McKee, G., Malvern, D., & Richards, B. (2000). Measuring Vocabulary Diversity Using Dedicated Software, Literary and Linguistic Computing, 15(3): 323-337.
In a nutshell, this method consists in taking a number of subsamples of 35, 36, …, 49, and 50 tokens at random from the data, then computing the average type-token ratio for each of these lengths, and finding the curve that best fits the type-token ratio curve just produced (among a family of curves generated by expressions that differ only by the value of a single parameter). The parameter value corresponding to the best-fitting curve is reported as the result of diversity measurement. The whole procedure can be repeated several times and averaged.”
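The procedure described in the note can be sketched in Python as follows. This is an illustrative approximation, not the code Text Inspector runs. Several details are assumptions that do not come from this page: the curve family TTR = (D/N)(sqrt(1 + 2N/D) - 1) is the one usually attributed to Malvern and Richards, while the 100 subsamples per size, the simple grid search over D from 0.1 to 200, and all function names are choices made for this sketch.

```python
import math
import random


def expected_ttr(n, d):
    """TTR predicted by the assumed curve family for n tokens and parameter d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def vocd(tokens, samples_per_size=100, rng=None):
    """Fit D to the average TTRs of random subsamples of 35 to 50 tokens."""
    if len(tokens) < 50:
        raise ValueError("at least 50 tokens are needed")
    rng = rng or random.Random()
    sizes = list(range(35, 51))

    # Average TTR of random subsamples (without replacement) at each size.
    mean_ttrs = []
    for n in sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n
                for _ in range(samples_per_size)]
        mean_ttrs.append(sum(ttrs) / len(ttrs))

    # Grid search (an assumption of this sketch): try D = 0.1, 0.2, ..., 200.0
    # and keep the value whose curve has the smallest squared error.
    best_d, best_error = None, float("inf")
    for step in range(1, 2001):
        d = step / 10
        error = sum((expected_ttr(n, d) - t) ** 2
                    for n, t in zip(sizes, mean_ttrs))
        if error < best_error:
            best_d, best_error = d, error
    return best_d


def averaged_vocd(tokens, runs=3, rng=None):
    """Repeat the whole procedure and average, since each run samples randomly."""
    rng = rng or random.Random()
    return sum(vocd(tokens, rng=rng) for _ in range(runs)) / runs
```

Because the subsamples are drawn at random, two calls to `vocd` on the same text will generally return slightly different values, which is why `averaged_vocd` repeats the fit and averages the results.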
Text copyright © Textinspector.com 2015
Another useful online tool you could look at is Paul Meara’s tool for measuring D at http://www.lognostics.co.uk/tools/D_Tools/D_Tools.htm.
Note, however, that results from Paul Meara’s tool are not directly comparable with results from Text Inspector, as his tool measures on a scale from 0-100, whereas Text Inspector measures on a scale from 0-200.