Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kucera and Francis. Behavior Research Methods, Instruments, & Computers, 30, 272-277.


Word frequency is one of the strongest determiners of reaction time (RT) in word recognition tasks; it is an important theoretical and methodological variable. The Kucera and Francis (1967) word frequency count (derived from the 1-million-word Brown corpus) is used by most investigators concerned with the issue of word frequency. Word frequency estimates from the Brown corpus were compared with those from a 131-million-word corpus (the HAL corpus; conversational text gathered from Usenet) in a standard word naming task with 32 subjects. RT was predicted equally well by both corpora for high-frequency words, but the larger corpus provided better predictors for low- and medium-frequency words. Furthermore, the larger corpus provides estimates for 97,261 lexical items; the smaller corpus, for 50,406 items.