Comments and Questions

Use this page to ask a question or tell us something about Text Inspector. Or you can email us at


  1. Sarah

    How would you like your website (particularly the D tool) to be referenced?
    Many thanks,

    • Text Inspector Help Team

      Dear Sarah

      You could cite Text Inspector as a whole as:

      Text Inspector (2018) Online lexis analysis tool at [Accessed 16/08/2018]

      Particularly for the D tool you could also reference, depending on your focus, the following:

      Duran, P, D. Malvern, B. Richards, N. Chipere (2004) “Developmental Trends in Lexical Diversity” Applied Linguistics OUP 25/2: 220-242

      McCarthy, P. M., & Jarvis, S. (2007) ‘vocd: A theoretical and empirical evaluation’. Language Testing, 24, 459-488

      McCarthy, P.M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2): 381-392

      More information on these three can be found under the Lexical Diversity heading on our sidebar.

      Julie Harris
      Text Inspector Help Team

  2. Domitille Lochet

    Hello, is it possible to analyse a text in Spanish?

    • Text Inspector Help Team

      Dear Domitille Lochet,

      Thanks for your comment!

      We currently do not recommend using Text Inspector as a whole for Spanish. There are some tools (such as the basic statistics) that will work in languages other than English, but the majority do not. The results are not reliable so if you do use it for Spanish caution is advised.

      We are currently developing a Spanish version of Text Inspector, but it is not close to release just yet.

      Julie Harris
      Text Inspector Help Team



    I am looking for some texts that fit the APTIS framework with Flesch Kincaid Grade Level (FKGL) (A2 4-6; B1 6-8; B2 9-12). After analysing some texts, I found that some were rated C1 overall even though their FKGL was at B2, e.g. C1 with FKGL 10.44 (B2), C1 with FKGL 9.80 (B2), etc.

    Could you help to clarify this matter? Thank you.

    • Text Inspector Help Team

      Dr. Suhaida Omar,

      Thank you for your comment.

      In answer to your question the FKGL score is just one of many measures for the level of the text. The overall score on the scorecard is made up of many different measures.

      It really depends on the purposes of the material, but a text may be readable as per FKGL by B1 students while, for example, the “Lexis: EVP” level of the text is C1, in which case those B1 students may not know the vocabulary. Vocabulary tends to be quite important (see this link here:

      One measure alone (such as FKGL) cannot determine the CEFR score of the text as a whole. For that reason we use many different measures to give as complete a picture as possible. When you get a B1 FKGL result, Text Inspector is telling you that graded texts with the same or similar FKGL as your text tend to be graded at the B1 level.
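To illustrate why FKGL alone cannot capture vocabulary level, here is a minimal sketch of the standard published Flesch-Kincaid Grade Level formula. It uses only sentence length and syllables per word, so two texts with very different vocabulary can score the same. The function name and the example counts are hypothetical, and this is not Text Inspector's own code.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Only average sentence length and average syllables per word matter;
    # nothing here looks at which words were used.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (assumed values, not real data).
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(grade, 2))  # 9.91
```

Any mapping from a grade such as 9.91 to a CEFR band is a separate step that the formula itself does not provide.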

      It is important to be clear that Text Inspector is not able to give a perfect CEFR score but rather is there as an aid to students, teachers and assessors.

      This is because it compares the inputted text to a large corpus of CEFR levelled texts ONLY for the measures in the scorecard. However there are many aspects of a text that are not measurable through Text Inspector and rather require human judgement (not least grammar).

      I hope this helps, but please let me know if you have any more questions,

      All the best,


  4. Irina Rets

    Dear Text Inspector team,

    I was wondering how to interpret the frequency metrics in your tool (Lexis: BNC or COCA). In my report, I have 1K -2K as 11.26%, for example. What does that mean? Does it mean that I have 11% of words with 1000-2000 word frequency (which is not very frequent)? The higher the K is, the higher the frequency?

    • Text Inspector Help Team

      Dear Irina

      Thank you for your query.

      So, if the metric says 1K-2K is 11.26%, it means that 11.26% of the words used in the text belong to the 1K-2K lists in BNC or COCA.

      The 1K-2K lists contain words that are found very frequently in the respective corpus (BNC, COCA). A term like ‘1K list’ means a vocabulary list of the 1000 words most frequently found in that corpus. So 1K means the 1000 most frequently used words.

      The higher the percentage of K1-K2 (or whatever early numbers come with the ‘K’), the more frequently used vocabulary the text includes. If there are some words used from the K6, 7 or higher, then they are regarded as less frequently used words and therefore tend to be more sophisticated or difficult words. (There’s also discussion around the issue that the less frequently used doesn’t necessarily mean they are sophisticated or more difficult – but as a general trend, that’s the way it’s interpreted.)
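The idea of frequency bands can be sketched as follows. This is an illustrative toy, not Text Inspector's implementation: `band_coverage`, the tiny ranked list, and the band size of 2 are all assumptions chosen to keep the example readable (a real list would use bands of 1000 words from BNC or COCA). Exactly how Text Inspector labels its bands is not reproduced here.

```python
from collections import Counter


def band_coverage(tokens, ranked_words, band_size=1000):
    """Percentage of tokens falling in each frequency band.

    ranked_words is ordered from most to least frequent, so band 1 holds
    the most frequent band_size words, band 2 the next band_size, etc.
    """
    rank = {word: i for i, word in enumerate(ranked_words)}  # 0-based rank
    counts = Counter()
    for tok in tokens:
        r = rank.get(tok.lower())
        label = "off-list" if r is None else f"band {r // band_size + 1}"
        counts[label] += 1
    total = len(tokens)
    return {label: 100 * n / total for label, n in counts.items()}


# Toy frequency list, most frequent first (hypothetical, not from a corpus).
ranked = ["the", "of", "and", "cat", "sat", "ephemeral"]
text = "the the the of and and cat sat ephemeral zebra".split()

coverage = band_coverage(text, ranked, band_size=2)
print(coverage)
# {'band 1': 40.0, 'band 2': 30.0, 'band 3': 20.0, 'off-list': 10.0}
```

A high share in the early bands means the text leans on very common words; tokens in later bands or off the list are the rarer ones.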

      Thomas Cobb at the University of Quebec, and Norbert Schmitt at the University of Nottingham are academics who have published a lot in the area of frequency metrics if you would like to read further.

      If you have any other questions please do not hesitate to ask.

      Text Inspector Help Team.

      • Irina Rets

        Thank you so much for your reply! Textinspector is a great tool, I am using it in my PhD readability research and I am so happy it shows a variety of readability metrics for a given text.

  5. Sonthaya Rattanasak


    I intend to subscribe for a one-month period. Do I need to turn off any auto renewal ?
    Please advise as I don’t seem to find the setting for this.


    • Text Inspector Help Team

      Hello – If you subscribe and then turn off auto renewal, you will still get your full month’s use as expected.

      If you ever have any problems, please simply email us and we can give you a refund for any payment you did not mean to send!

      Best wishes,

      Julie Harris
      Text Inspector Help Team

  6. nathan

    Hi, can I purchase the individual copy and then later upgrade to the organization one? What are the charges for an upgrade?

    Kind Regards,

    Nathan M. Kimaku
    Information Technology (IT) Assistant
    East African Educational Publishers Limited

    • vanity22

      Hello, there is no charge for an upgrade. If you switch from one package to another and there is a price difference we will refund you.

  7. Philip McCarthy

    Hi … I’m Phil McCarthy … I’m the one who made MTLD.

    First, this is a GREAT TOOL … awesome that you’ve put it out there so that people can use lexical diversity in assessments.

    Second, I read here that some people wonder why we recommended that researchers use MTLD, voc-D (HD-D) and MAAS given that only MTLD seems to be completely independent of text length. I think there were at least three reasons for that. (Note, the below are my thoughts, and Scott Jarvis may think otherwise … indeed, see his book on the matter).

    1/ We didn’t make MTLD just to have a competition with other researchers (something like “Hey, we’re the best!”). We did a lot of work to validate MTLD, but it would be arrogant of us to assume we had covered every angle. The other measures certainly seem to do a very good job, and the other measures only become “problematic” when text length is quite varied … leading to … if the text lengths are THAT varied then is there something else of importance that the researcher(s) should be considering? (Scott Jarvis has written extensively on this.)

    2/ The three measures we recommend work in very different ways; most notably, MTLD assesses text sequentially whereas the other measures swallow the text whole. This means that if you randomize the text, MTLD would give you a completely different value whereas the others would give you the same value each time. So, the question becomes, “Is the sequence of the wording of the text a/the quintessential characteristic of the text?” Well, if for you it is, then MTLD is the only way to go because only MTLD assesses text in that way. So, bottom line, the researcher(s) have to make that call.

    3/ I’d like to see a study where certain texts were identified as having marked variation depending on the LD measure used. For example, imagine a corpus of 1000 texts where Text X was 12th most diverse by Measure A, 136th most diverse by Measure B, and 389th most diverse by Measure C. I think it would be valuable to identify such texts and work out “what’s going on!” Such analysis would undoubtedly be revealing of textual characteristics and many other things. If we only go in armed with one measure then we miss the potential to see such outcomes, which may be critical to the understanding of the research.

    Finally, when we say “voc-D OR HD-D”, we have to remember that voc-D is … really speaking … just an estimation (albeit a great one) of HD-D (see McCarthy and Jarvis 2007). voc-D has been so widely used that HD-D is unlikely to “replace” it, but … again … really speaking …if your research question/text types make you want to use voc-D then … hmmm … you probably really should be using HD-D. That said, again, because voc-D has so much history now, you’d have less problems getting published with voc-D than HD-D. So, on that score, if you’re looking for some research to do, then writing a paper called “Use HD-D, not voc-D” would possibly be useful.
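The point in 2/ about sequence can be illustrated with a rough sketch. The two functions below are simplified readings of MTLD and HD-D as described in McCarthy and Jarvis (2010), using the commonly cited defaults (a TTR threshold of 0.72 and a sample size of 42); they are assumptions for illustration, not the code behind Text Inspector or any published tool. A deterministic reordering is used instead of a random shuffle so that the result is reproducible.

```python
from collections import Counter
from math import comb


def _mtld_one_pass(tokens, threshold=0.72):
    """One directional pass: text length divided by the number of factors."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count < threshold:  # running TTR fell below threshold
            factors += 1
            types, count = set(), 0
    if count:  # leftover segment contributes a partial factor
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float("inf")


def mtld(tokens, threshold=0.72):
    """Mean of a forward and a backward pass; sensitive to word order."""
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


def hdd(tokens, sample_size=42):
    """Sum over types of P(type appears in a random sample) / sample_size.

    Depends only on the frequency of each type, not on word order.
    """
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    all_samples = comb(n, sample_size)
    score = 0.0
    for freq in sorted(Counter(tokens).values()):
        p_absent = comb(n - freq, sample_size) / all_samples
        score += (1 - p_absent) / sample_size
    return score


# Same 60 tokens (6 types x 10 occurrences) in two different orders.
words = ["a", "b", "c", "d", "e", "f"]
clustered = [w for w in words for _ in range(10)]  # a a a ... b b b ...
cycled = words * 10                                # a b c d e f a b c ...

print(mtld(clustered), mtld(cycled))   # 2.0 10.0
print(hdd(clustered) == hdd(cycled))   # True
```

The two orderings contain exactly the same words, yet MTLD differs by a factor of five while HD-D is identical, which is the distinction described above.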

    • Text Inspector Help Team

      Many thanks – it is great to get an expert view!
      I hope that you don’t mind if we put this onto our Lexical Diversity HELP pages too so that it will be in a more accessible place for people to see?

      Thanks again

      Stephen Bax

  8. svo

    I would like to ask a question. I plan to use this tool for analyzing texts, and I was wondering if there are any validation research studies on this tool.

    Thank you so much,

    I look forward to hearing from you.

    Thank you,

    • Text Inspector Help Team

      Hello and thank you for your query. The main research study, which lies behind the statistics in the Scorecard, was carried out by Professor Stephen Bax on data including thousands of writing, reading and listening texts. This gave the metrics which we use to calculate the statistics. At the moment it is in preparation for publication, but we aim to release some pre-publication details on our Help pages very soon. Thanks.

  9. Rcp

    I just found out about your web page from the ELTons awards, and I want to be clear about using this measuring tool for my classes at school.
    I wonder in what ways I can use this tool for primary and secondary classes.

    • Text Inspector Help Team

      Hello and thank you for your question. You can use Text Inspector by checking any reading or listening text you want to use in your teaching to see if the vocabulary level is correct for the class you are teaching.

      You can also use it to check the written work of your students to see the level of vocabulary they are using, and help them to use more advanced vocabulary.

      We will soon put some videos on our Help pages to give teachers more ideas.


  10. Thomas Zapounidis

    Dear Textinspector,
    Congratulations once more on your software. It is a handy tool in the hands of many researchers.
    I am currently analysing some learner words from different language skills as to their EVP and BNC coverage. I have two questions:
    1) How is the token defined by your software? I’m asking because running the same amount of text with other software (e.g. AntConc) I get different token counts.
    2) I have run some of my words through Text Inspector and found that, with reference to months, “April” is given as not listed in your BNC list (while in the written BNC list that I found on Laurence Anthony’s page it is ranked 660).
    Thank you for your time,
    Kind Regards,
    Thomas Zapounidis

    • Text Inspector Help Team

      Hello Thomas and thank you for your question and kind words!

      Tokens and Types:
      It is true that different software tools will analyse tokens and types slightly differently. It is difficult for us to say how we compare with others, but the point is that we show you the analysis in full detail and allow you to change the number of tokens and types yourself if you want to count them differently. This is unique to Text Inspector.
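As a rough illustration of why token counts differ between tools, the sketch below applies three plausible tokenisation rules to the same sentence. None of these is claimed to be the rule used by Text Inspector or AntConc; the function names and the sample sentence are hypothetical.

```python
import re


def tokens_whitespace(text):
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def tokens_alnum_runs(text):
    """Every run of letters or digits is a token; hyphens and apostrophes split."""
    return re.findall(r"[A-Za-z0-9]+", text)


def tokens_keep_inner_marks(text):
    """Like alnum runs, but internal apostrophes and hyphens stay in the token."""
    return re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)


sample = "Don't re-run the well-known test - it's 2019's best."

print(len(tokens_whitespace(sample)))        # 9
print(len(tokens_alnum_runs(sample)))        # 13
print(len(tokens_keep_inner_marks(sample)))  # 8
```

The same ten-word sentence yields 9, 13 or 8 tokens depending on how contractions, hyphenated words and stray punctuation are treated, so small discrepancies between tools are expected.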

      We use the BNC list from and you are right – it seems to be missing April for some strange reason (though the other months are there). We will add it in the next upgrade. Thanks for noticing it.

      To include it in your analysis, you could add it to the count manually as a Known Word by using the tool on the main page:

      a) click on Use custom known words list below the Text entry box
      b) add ‘April’
      c) run the analysis

      Your word will appear in the count as a Known Word so you can calculate its frequency.


  11. Muadh Al-Essa

    Thanks for making such an amazing tool.
    Is it alright if I use this tool in my research project?
    And if so, how would you prefer to be cited?

    Thank you very much.

    • Text Inspector Help Team

      Hello, thank you for your comments, and yes please do use it in your research!

      You could cite it as
      Text Inspector (2016) Online lexis analysis tool at [Accessed 14/12/2016]

      Best wishes,
      Text Inspector Help Team

  12. Iswara

    Hello there. I’ve just scanned a reading text here and then I saw its lexical profile was displayed as D ..
    My question:
    1. Does this D mean “Distinguished (Above C2)”?
    2. or Does this D mean “Level 4 D = B2”?

    I am confused because I don’t know what this D means ..
    When I follow up this information to ACTFL Proficiency guidelines D means above C2
    When I follow up this information to UCL CLIE ( D means B2

    Need your explanation. Thank you.

    • Text Inspector Help Team

      Hello – on this website we use D to mean “a level above C2”.

  13. Gina

    First of all, I would like to thank you for the incredible tool the simple versions of which you provide for free.
    I intend to use Text Inspector in my post-graduate research to investigate lexical diversity. After having browsed your site and tried the tool a couple of times, I’m not sure which score I should be paying attention to. I mean, I’ve seen the number of types, tokens, type/token ratio, syllables and so on (all very interesting), but I expected to see one numerical result that expressed lexical diversity. Am I missing something?

    • Text Inspector Help Team

      Thank you Gina. We also think that Text Inspector is incredible!

      The score you should look at is the top line on the Lexical Diversity page. This gives you two different measures of Lexical Diversity, called VocD and MTLD.

      However, both of them are approximate, and involve SAMPLING parts of your text, so if you put a text in twice you will get a slightly different result! You will also sometimes get a very different result from MTLD and VocD, owing to the different ways they sample the text. If you are doing research you should compare different measures and then evaluate the results, also looking closely at other statistics about the vocabulary in your text (e.g. in the BNC and COCA areas).

      Another useful online tool you could look at is Paul Meara’s tool for measuring D at If you put the same text into that tool and Text Inspector you will get approximately the same measure, but again with slight differences owing to differences in calculation and sampling.

      Best wishes, Tom

      Text Inspector Help Team

      • Gina

        Thank you for your reply, your guidance and your suggestions. I found it and I also subscribed for the standard plan, so I have more info at my disposal.
        I know about the two different measurements, Voc-D and MTLD, that attempt to measure lexical diversity better than type/token ratio, but could you explain why sometimes the two scores vary a lot and sometimes very little? And am I supposed to find the mean score of these two? I probably need to study this further, but your explanation would be a good start.
        Thank you.

        • Text Inspector Help Team

          I agree that it is not clear or easy. If you look at the article here (which is McCarthy, P.M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2): 381-392) you will see that they say this (in the abstract):

          “MTLD performs well with respect to all four types of validity and is, in fact, the only index not found to vary as a function of text length.”

          So if there is a difference when you use voc-D and MTLD, it might be because of text length, and it seems that MTLD might be the more reliable. They also say:

          “three of the indices—MTLD, vocd-D (or HD-D), and Maas—appear to capture unique lexical information”. (My emphasis) And also:

          “We conclude by advising researchers to consider using MTLD, vocd-D (or HD-D), and Maas in their studies, rather than any single index, noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation.”

          I agree that this is not too helpful – they are basically saying that these indices give different information about your text but they don’t say what! They then say to try using several of them, not only one.

          I suggest that to start with, if you are comparing texts, you choose ONE of them (we prefer MTLD) and also research what that index actually means by looking in depth at the actual words in the text.

  14. Mary Grace Portelli

    I like to write my own tests for my students, but I never quite know what is a safe vocabulary profile percentage to aim for with regard to the targeted level, i.e. if I’m planning a level B2 test, what, in your opinion, would be a safe percentage of level B2 vocabulary to aim for in the input text? Would you say that around 12% level B2 vocabulary in a text (+ around 1% at C1) is safe enough to ensure that the level of the text is B2?

    • Text Inspector Help Team

      Mary, your question is exactly what we are working on now. We are researching large samples of reading texts at different CEFR levels, so that in future you can put a text into TI and then it will give you an approximate CEFR level based on a number of key indicators. (At the moment you get a Lexical Profile but it is based on student writing, so it is not fully applicable to reading texts.)

      An example of our work on reading texts: we have found that the percentage of B2 vocabulary in a reading text, as measured using the EVP tool, seems to be a statistically significant indicator of your text’s overall level.

      So in the very near future TI will offer a more advanced measure of a range of indicators so you, as a teacher, can get a quick idea of whether your chosen reading text is in fact right for your students’ level.

  15. David Allen

    I’m currently conducting post-graduate research in corpus linguistics using statistical analysis and text mining techniques and would like to expand on this in future research. Has this corpus and corresponding tools been used in academic research in fields like applied linguistics? Are resources available for someone who might wish to use these tools in a research project?

    • Text Inspector Help Team

      David, thanks for your question. As Text Inspector is quite new it has not yet been used in many research projects. Also, since some of the measures in TI are unique (such as the EVP tool), they are only now available to researchers.

      However, research is now being conducted using TI with large banks of texts to see which measures seem to correlate with which levels of difficulty. At the moment, when you input a text of more than 100 words, you get a Text Inspector Lexical Profile, and this is based on extensive research using written texts. Similar work is under way using reading and listening texts.

      To answer your other question – we feel that TI is very well suited to large scale research projects….

  16. abdikarim

    I find Text Inspector very useful, but it does not clearly state the level of the text. So I would like to know which indicator gives the exact overall level of a text, like B1.

    • Text Inspector help team

      Thanks abdikarim

      You are right. We are developing a measure so that it will tell you more clearly what the statistics mean for each text, so it will give you a good guide about the level of your text – A1, B2 and so on.

      This needs some careful research on a large body of texts, so it will take some time, but when it is ready we will publish it on the site and also here.

      Thanks for your interest.

