  1. I would like to ask a question. I plan to use this tool for analyzing texts, and I was wondering if there are any validation research studies on this tool.

    Thank you so much,

    I look forward to hearing from you.

    Thank you,


    1. Text Inspector Help Team June 10, 2017 at 2:17 pm

      Hello and thank you for your query. The main research study, which lies behind the statistics in the Scorecard, was carried out by Professor Stephen Bax on data including thousands of writing, reading and listening texts. This gave the metrics which we use to calculate the statistics. At the moment it is in preparation for publication, but we aim to release some pre-publication details on our Help pages very soon. Thanks.


  2. Hi,
    I have just learned about your web page from the ELTons awards, and I want to be clear about how to use this measuring tool for my classes at school.
    I wonder in what ways I can use this tool for primary and secondary classes.


    1. Text Inspector Help Team May 12, 2017 at 5:31 pm

      Hello and thank you for your question. You can use Text Inspector by checking any reading or listening text you want to use in your teaching to see if the vocabulary level is correct for the class you are teaching.

      You can also use it to check the written work of your students to see the level of vocabulary they are using, and help them to use more advanced vocabulary.

      We will soon put some videos on our Help pages to give teachers more ideas.



  3. Thomas Zapounidis December 30, 2016 at 11:32 am

    Dear Textinspector,
    Congratulations once more on your software. It is a handy tool in the hands of many researchers.
    I am currently analysing some learner words from different language skills as to their EVP and BNC coverage. I have two questions:
    1) How is the token defined by your software? I’m asking because running the same amount of text with other software (e.g. AntConc) I get different token counts.
    2) I have run some of my words through Text Inspector and found that, among the months, “April” is given as not listed in your BNC list (while in the written BNC list that I found on Laurence Anthony’s page it is ranked 660).
    Thank you for your time,
    Kind Regards,
    Thomas Zapounidis


    1. Text Inspector Help Team January 14, 2017 at 4:51 pm

      Hello Thomas and thank you for your question and kind words!

      Tokens and Types:
      It is true that different software tools will analyse tokens and types slightly differently. It is difficult for us to say how we compare with others, but the point is that we show you the analysis in full detail and allow you to change the number of tokens and types yourself if you want to count them differently. This is unique to Text Inspector.

      We use the BNC list from and you are right – it seems to be missing April for some strange reason (though the other months are there). We will add it in the next upgrade. Thanks for noticing it.

      To include it in your analysis, you could add it to the count manually as a Known Word by using the tool on the main page:

      a) click on Use custom known words list below the Text entry box
      b) add ‘April’
      c) run the analysis

      Your word will appear in the count as a Known Word so you can calculate its frequency.
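
On the token-count question in this exchange, the following is a minimal sketch of why two tools can disagree on the same text. The three rules and the sample sentence are illustrative assumptions; they are not the tokenisation rules of Text Inspector, AntConc or any other tool.

```python
import re

# Invented sample sentence, used only for illustration.
SAMPLE = "I'm analysing learner-produced texts, e.g. 3,000 words from mid-April."

def whitespace_tokens(text):
    # Rule A: split on whitespace only, so punctuation stays attached ("texts,").
    return text.split()

def letter_tokens(text):
    # Rule B: every unbroken run of letters is a token; digits are ignored.
    return re.findall(r"[A-Za-z]+", text)

def word_tokens(text):
    # Rule C: letters with internal apostrophes or hyphens, or digits with commas.
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*|\d+(?:,\d+)*", text)

for rule in (whitespace_tokens, letter_tokens, word_tokens):
    tokens = rule(SAMPLE)
    types = {t.lower() for t in tokens}  # types = distinct tokens, ignoring case
    print(f"{rule.__name__}: {len(tokens)} tokens, {len(types)} types")
```

On this sample the three rules return 9, 12 and 10 tokens respectively, so when comparing counts across tools it helps to check how each one treats contractions, hyphens, abbreviations and numbers.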



  4. Hello,
    Thanks for making such an amazing tool.
    Is it alright if I use this tool in my research project?
    And if so, how would you prefer to be cited?

    Thank you very much.


    1. Text Inspector Help Team December 14, 2016 at 5:54 pm

      Hello, thank you for your comments, and yes please do use it in your research!

      You could cite it as
      Text Inspector (2016) Online lexis analysis tool at [Accessed 14/12/2016]

      Best wishes,
      Text Inspector Help Team


  5. Hello there. I’ve just scanned a reading text here and then I saw its lexical profile was displayed as D ..
    My question:
    1. Does this D mean “Distinguished (Above C2)”?
    2. or Does this D mean “Level 4 D = B2”?

    I am confused because I don’t know what this D means.
    When I check this against the ACTFL Proficiency Guidelines, D means above C2.
    When I check this against UCL CLIE, D means B2.

    Need your explanation. Thank you.


    1. Text Inspector Help Team May 20, 2016 at 6:33 pm

      Hello – on this website we use D to mean “a level above C2”.


  6. First of all, I would like to thank you for the incredible tool the simple versions of which you provide for free.
    I intend to use Text Inspector in my post-graduate research to investigate lexical diversity. After having browsed your site and tried the tool a couple of times, I’m not sure which score is the one I should be paying attention to. I mean, I’ve seen the number of types, tokens, type/token ratio, syllables and so on (all very interesting), but I expected to see one numerical result that expressed lexical diversity. Am I missing something?


    1. Text Inspector Help Team April 28, 2016 at 2:05 pm

      Thank you Gina. We also think that Text Inspector is incredible!

      The score you should look at is the top line on the Lexical Diversity page. This gives you two different measures of Lexical Diversity, called VocD and MTLD.

      However, both of them are approximate, and involve SAMPLING parts of your text, so if you put a text in twice you will get a slightly different result! You will also sometimes get a very different result from MTLD and VocD, owing to the different ways they sample the text. If you are doing research you should compare different measures and then evaluate the results, also looking closely at other statistics about the vocabulary in your text (e.g. in the BNC and COCA areas).

      Another useful online tool you could look at is Paul Meara’s tool for measuring D at If you put the same text into that tool and Text Inspector you will get approximately the same measure, but again with slight differences owing to differences in calculation and sampling.

      Best wishes, Tom

      Text Inspector Help Team


      1. Thank you for your reply, your guidance and your suggestions. I found it and I also subscribed to the standard plan, so I have more info at my disposal.
        I know about the two different measurements, Voc-D and MTLD, that attempt to measure lexical diversity better than type/token, but could you explain why sometimes the two scores vary a lot and sometimes very little? And am I supposed to find the mean score of these two? I probably need to study this, but your explanation would be a good start.
        Thank you.


        1. Text Inspector Help Team April 28, 2016 at 9:41 pm

          I agree that it is not clear or easy. If you look at the article here (which is McCarthy, P.M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment, Behavior Research Methods, 42(2): 381-392) you will see that they say this (in the abstract):

          “MTLD performs well with respect to all four types of validity and is, in fact, the only index not found to vary as a function of text length.”

          So if there is a difference when you use voc-D and MTLD, it might be because of text length, and it seems that MTLD might be the more reliable. They also say:

          “three of the indices—MTLD, vocd-D (or HD-D), and Maas — appear to capture unique lexical information”. (My emphasis) And also:

          “We conclude by advising researchers to consider using MTLD, vocd-D (or HD-D), and Maas in their studies, rather than any single index, noting that lexical diversity can be assessed in many ways and each approach may be informative as to the construct under investigation.”

          I agree that this is not too helpful – they are basically saying that these indices give different information about your text, but they don’t say what! They then say to try using several of them, not only one.

          I suggest that to start with, if you are comparing texts, you choose ONE of them (we prefer MTLD) and also research what that index actually means by looking in depth at the actual words in the text.
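
For readers who want to see what an MTLD-style score involves, here is a simplified sketch based on the general description in McCarthy & Jarvis (2010): count how many times the running type/token ratio falls to a threshold (0.72 by default), add a partial factor for the leftover tokens, and average a forward and a backward pass. It is an assumption-laden illustration, not the calculation used by Text Inspector, and published implementations differ in details such as whether the comparison is `<` or `<=`.

```python
def mtld_one_direction(tokens, threshold=0.72):
    """One pass of an MTLD-style calculation over a list of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        # A "factor" is complete once the running TTR falls to the threshold.
        if len(types) / count <= threshold:
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Leftover tokens contribute a partial factor, scaled by how far
        # their TTR has fallen from 1.0 towards the threshold.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        # No repetition at all: the score is unbounded under this definition.
        return float("inf")
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Average of a forward and a backward pass, ignoring case."""
    lowered = [t.lower() for t in tokens]
    forward = mtld_one_direction(lowered, threshold)
    backward = mtld_one_direction(lowered[::-1], threshold)
    return (forward + backward) / 2


# Invented sample sentences, used only for illustration.
repetitive = "the cat saw the cat and the cat saw the dog".split()
varied = "a quick brown fox jumps over one lazy dog near the old barn".split()
print("repetitive:", mtld(repetitive))
print("varied:", mtld(varied))
```

Higher values mean the text goes longer before its vocabulary starts repeating, which is why looking at the actual words alongside the number remains useful.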


  7. Mary Grace Portelli August 21, 2015 at 2:16 pm

    I like to write my own tests for my students, but I never quite know what is a safe vocabulary profile percentage to aim for with regard to the targeted level. For example, if I’m planning a level B2 test, what, in your opinion, would be a safe percentage of level B2 vocabulary to aim for in the input text? Would you say that around 12% level B2 vocabulary in a text (plus around 1% at C1) is safe enough to ensure that the level of the text is B2?


    1. Text Inspector Help Team September 7, 2015 at 10:00 am

      Mary, your question is exactly what we are working on now. We are researching large samples of reading texts at different CEFR levels, so that in future you can put a text into TI and then it will give you an approximate CEFR level based on a number of key indicators. (At the moment you get a Lexical Profile but it is based on student writing, so it is not fully applicable to reading texts.)

      An example of our work on reading texts: we have found that the percentage of B2 vocabulary in a reading text, as measured using the EVP tool, seems to be a statistically significant indicator of your text’s overall level.

      So in the very near future TI will offer a more advanced measure of a range of indicators so you, as a teacher, can get a quick idea of whether your chosen reading text is in fact right for your students’ level.
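
The "percentage of B2 vocabulary" discussed in this exchange can be sketched as a simple count over a graded word list. The tiny `TOY_LEVELS` dictionary below is a hypothetical placeholder whose levels are invented for illustration; it is not taken from the EVP, and a real analysis would rely on a full graded list and the tool's own matching rules.

```python
from collections import Counter

# Hypothetical word-to-level mapping; the levels here are placeholders only.
TOY_LEVELS = {
    "the": "A1",
    "cat": "A1",
    "sat": "A1",
    "nevertheless": "B2",
    "reluctantly": "B2",
}

def level_profile(tokens, levels):
    """Return the percentage of tokens at each level (or 'unlisted')."""
    if not tokens:
        return {}
    counts = Counter(levels.get(t.lower(), "unlisted") for t in tokens)
    total = len(tokens)
    return {level: 100 * n / total for level, n in counts.items()}

tokens = "The cat sat reluctantly nevertheless the visitor stayed".split()
for level, pct in sorted(level_profile(tokens, TOY_LEVELS).items()):
    print(f"{level}: {pct:.1f}%")
```

A profile like this gives the share of tokens at each level; whether a given share indicates a particular overall text level is the research question described in the reply.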


  8. I’m currently conducting post-graduate research in corpus linguistics using statistical analysis and text mining techniques and would like to expand on this in future research. Has this corpus and corresponding tools been used in academic research in fields like applied linguistics? Are resources available for someone who might wish to use these tools in a research project?


    1. Text Inspector Help Team September 7, 2015 at 9:53 am

      David, thanks for your question. As Text Inspector is quite new it has not yet been used in many research projects. Also, since some of the measures in TI are unique (such as the EVP tool), they are only now available to researchers.

      However, research is now being conducted using TI with large banks of texts to see which measures seem to correlate with which levels of difficulty. At the moment, when you input a text of more than 100 words, you get a Text Inspector Lexical Profile, and this is based on extensive research using written texts. Similar work is under way using reading and listening texts.

      To answer your other question – we feel that TI is very well suited to large scale research projects….


  9. I find Text Inspector very useful, but it does not clearly state the level of the text. So I would like to know which indicator tells the exact overall level of a text, such as B1.


    1. Text Inspector help team December 27, 2014 at 11:40 pm

      Thanks abdikarim

      You are right. We are developing a measure so that it will tell you more clearly what the statistics mean for each text, so it will give you a good guide about the level of your text – A1, B2 and so on.

      This needs some careful research on a large body of texts, so it will take some time, but when it is ready we will publish it on the site and also here.

      Thanks for your interest.


