Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening

Table 2 Corpora descriptive statistics for characters, words, and tokens

Corpus	Num. chars (mean)	Num. chars (max)	Num. chars (min)	Num. words (mean)	Num. words (max)	Num. words (min)	Num. tokens (mean)	Num. tokens (max)	Num. tokens (min)
ACT (training)	1229.6	52,491.0	0	216.9	9350.0	0	260.8	10,946.0	0
Gen. Pop. (training)	1324.9	76,831.0	0	233.0	15,029.0	0	276.9	17,422.0	0
Gen. Pop. (validation)	1118.7	58,080.0	0	196.9	9588.0	0	234.9	11,251.0	0

ISSN: 1472-6947