Frequencies of the collected EHR fields (a) and descriptive statistics of the unstructured clinical notes (b). *A data entry is a piece of information (e.g., a diagnosis) documented during a patient’s visit. If a patient has the same diagnosis or ICD-9 code at multiple visits, that diagnosis or code is counted only once for that patient. **Tokens include words, numbers, symbols, and punctuation marks in the clinical narratives.
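
The once-per-patient counting rule in the first footnote can be illustrated with a minimal sketch. The function name `count_unique_per_patient`, the `(patient_id, visit_id, code)` tuple layout, and the sample records are all hypothetical, chosen only for illustration; they are not taken from the dataset or from the authors' pipeline.

```python
from collections import Counter


def count_unique_per_patient(entries):
    """Count each code at most once per patient.

    `entries` is an iterable of (patient_id, visit_id, code) tuples.
    This layout is an assumption made for the example.
    """
    seen = set()        # (patient_id, code) pairs already counted
    counts = Counter()  # code -> number of distinct patients with that code
    for patient_id, _visit_id, code in entries:
        key = (patient_id, code)
        if key not in seen:
            seen.add(key)
            counts[code] += 1
    return counts


# Made-up records: patient p1 has the same code at two visits.
entries = [
    ("p1", "v1", "250.00"),
    ("p1", "v2", "250.00"),  # repeat for p1, not counted again
    ("p1", "v2", "401.9"),
    ("p2", "v1", "250.00"),
]
counts = count_unique_per_patient(entries)
```

Under this rule, a frequency is the number of distinct patients who have a given code, not the number of visits at which it was recorded.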