Latent Class Analysis (LCA) is a statistical method for identifying subtypes of related cases (latent classes) in multivariate categorical data [1]. Its most common use is to discover case subtypes, or to confirm hypothesized subtypes, from such data [1–4]. LCA is well suited to many health applications that aim to identify disease subtypes or diagnostic subcategories [1–4]. LCA models do not rely on traditional modeling assumptions (normal distribution, linear relationship, homogeneity) and are therefore less subject to the biases that arise when data do not conform to those assumptions [1–4]. In this paper, we demonstrate the utility of LCA for predicting falls among community-dwelling elderly people.

Falls among the elderly are a major public health concern. Falls are the leading cause of injury deaths among individuals over 65 years of age, and sixty percent of fall-related deaths occur among individuals aged 75 or older [5–11]. Demographic projections estimate that by 2030 the population aged 65 or older will double, and that by 2050 the population aged 85 or older will quadruple [5–11].

Predicting falls among the elderly is complex and often involves heterogeneous markers. Identifying more homogeneous subgroups of individuals and refining the measurement criteria are therefore typically interrelated research goals. Statistical methods such as latent class analysis are now available to model such complex, heterogeneous measurements.

Latent class models are used to cluster participants. This type of model is appropriate when the sample consists of distinct subtypes and it is not known beforehand which participant belongs to which subtype [2]. A latent categorical variable is used to model this heterogeneity. In the classic form of the latent class model, the observed variables are assumed to be independent within each latent class, and no structure is specified for the covariances of the observed variables [2].
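The text does not say how the class model is estimated. A common choice is the expectation-maximization (EM) algorithm, and the sketch below assumes it, together with binary (0/1) indicators. `fit_lca` is a hypothetical helper written for illustration, not the authors' code or any package's API. The E-step uses exactly the within-class independence assumption described above: a participant's likelihood under class t is the product of that class's item probabilities.

```python
import math
import random

# Bounds keep conditional probabilities away from exactly 0 or 1 so log-likelihoods stay finite.
EPS = 1e-6

def fit_lca(data, n_classes, n_iter=500, tol=1e-8, seed=0):
    """Fit a latent class model to binary (0/1) indicators with EM (illustrative sketch).

    data: list of response patterns, one list of 0/1 values per participant.
    Returns (class_probs, item_probs, log_likelihood), where class_probs[t] = P(class t),
    item_probs[t][j] = P(item j = 1 | class t), and log_likelihood is evaluated at the
    parameters used in the final E-step.
    """
    rng = random.Random(seed)
    n_cases, n_items = len(data), len(data[0])
    class_probs = [1.0 / n_classes] * n_classes
    # Random starting values: identical starts would leave the classes indistinguishable.
    item_probs = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
                  for _ in range(n_classes)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior class membership, using local independence within each class.
        posteriors, ll = [], 0.0
        for y in data:
            joint = []
            for t in range(n_classes):
                p = class_probs[t]
                for j, yj in enumerate(y):
                    p *= item_probs[t][j] if yj else 1.0 - item_probs[t][j]
                joint.append(p)
            total = sum(joint)
            ll += math.log(total)
            posteriors.append([p / total for p in joint])
        # M-step: re-estimate class sizes and conditional item probabilities.
        for t in range(n_classes):
            weight = max(sum(post[t] for post in posteriors), 1e-12)
            class_probs[t] = weight / n_cases
            for j in range(n_items):
                num = sum(post[t] * y[j] for post, y in zip(posteriors, data))
                item_probs[t][j] = min(max(num / weight, EPS), 1.0 - EPS)
        # Stop once the log-likelihood stops improving by more than tol.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return class_probs, item_probs, ll
```

EM can converge to local maxima, so in practice one would run several random starts (different `seed` values) and keep the solution with the highest log-likelihood. With many items, the products in the E-step should be computed in log space to avoid underflow.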

LCA is one of the most widely used latent structure models for categorical data [12]. LCA differs from better-known methods such as K-means clustering, which apply arbitrary distance metrics to group individuals by their similarity [13–15]. Instead, LCA derives clusters from conditional independence assumptions applied to multivariate categorical data distributed as binomial or multinomial variables [16, 17]. Because clusters are defined by statistical distributions rather than distance metrics, one can evaluate whether a model with a particular number of clusters fits the data: observed frequencies ($n_i$) can be tested against model-expected frequencies ($m_i$), using exact methods as recommended [18, 19]. This comparison gives rise to a $\chi^2$ test of global model fit, in which significant values indicate lack of fit [20]. Here, lack of fit means that the model-predicted frequencies ($m$) deviate from the observed frequencies ($n$) [16].
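The comparison of observed and expected frequencies can be sketched as follows. This computes the asymptotic Pearson $X^2$ and likelihood-ratio $G^2$ statistics over the cells of the response-pattern table, where each expected count is $m_i = N \cdot P(\text{pattern } i)$ under the fitted model. It does not implement the exact methods the text recommends, which matter most when many cells are sparse. The helper names are hypothetical, and the degrees-of-freedom formula assumes binary items and an unrestricted model.

```python
import math

def fit_statistics(observed, expected):
    """Pearson X^2 and likelihood-ratio G^2 comparing observed (n_i) with expected (m_i) counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    x2 = sum((n - m) ** 2 / m for n, m in zip(observed, expected))
    # Empty cells (n_i = 0) contribute nothing to G^2, by the convention 0 * log(0) = 0.
    g2 = 2.0 * sum(n * math.log(n / m) for n, m in zip(observed, expected) if n > 0)
    return x2, g2

def lca_degrees_of_freedom(n_items, n_classes):
    """df for an unrestricted LCA with binary items: (2^J - 1) free cells minus free parameters."""
    # Free parameters: (T - 1) class probabilities plus T * J conditional item probabilities.
    n_params = (n_classes - 1) + n_classes * n_items
    return (2 ** n_items - 1) - n_params
```

For example, a two-class model on seven binary items leaves $127 - 15 = 112$ degrees of freedom; a statistic that is large relative to a $\chi^2$ distribution with that many degrees of freedom signals lack of fit.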

Latent class analysis assumes that each observation is a member of one and only one (unobservable) latent class, and that the indicator (manifest) variables are mutually independent within each latent class, that is, conditionally independent given class membership [20]. The model is expressed in terms of the probabilities of class membership and of the item responses conditional on class. For example, a model with seven manifest variables A through G and latent variable X can be written as:

$$\pi_{ijklmnot} = \pi_{t}^{X}\,\pi_{it}^{A|X}\,\pi_{jt}^{B|X}\,\pi_{kt}^{C|X}\,\pi_{lt}^{D|X}\,\pi_{mt}^{E|X}\,\pi_{nt}^{F|X}\,\pi_{ot}^{G|X}$$

where $\pi_{t}^{X}$ denotes the probability of being in latent class $t$ ($t = 1, 2, \ldots, T$) of latent variable $X$; $\pi_{it}^{A|X}$ denotes the conditional probability that a member of class $t$ gives the $i$th response to item A ($i = 1, 2, \ldots, I$); and $\pi_{jt}^{B|X}$, $\pi_{kt}^{C|X}$, $\pi_{lt}^{D|X}$, $\pi_{mt}^{E|X}$, $\pi_{nt}^{F|X}$, and $\pi_{ot}^{G|X}$, with $j = 1, \ldots, J$; $k = 1, \ldots, K$; $l = 1, \ldots, L$; $m = 1, \ldots, M$; $n = 1, \ldots, N$; and $o = 1, \ldots, O$, are the corresponding conditional probabilities for items B, C, D, E, F, and G, respectively.
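The product formula can be evaluated directly. The sketch below computes the joint probability of a response pattern and each class, as in the equation, then sums over classes to obtain the marginal probability of the pattern. Dividing each joint term by that sum gives the posterior probability of class membership by Bayes' rule, a standard step that the text does not spell out. The function names, the 0-based category indexing, and the numbers in the usage note are illustrative assumptions.

```python
def pattern_probability(pattern, class_probs, cond_probs):
    """Evaluate pi_t^X * prod_items pi_{r,t}^{item|X} for each class t, then sum over t.

    pattern: observed response category for each item (0-based index).
    class_probs[t]: pi_t^X, the probability of latent class t.
    cond_probs[t][item][r]: probability of response r on that item for members of class t.
    Returns (marginal probability of the pattern, list of joint probabilities per class).
    """
    joint = []
    for t, pi_t in enumerate(class_probs):
        p = pi_t
        # Local independence: multiply the conditional probabilities of each item response.
        for item, r in enumerate(pattern):
            p *= cond_probs[t][item][r]
        joint.append(p)
    return sum(joint), joint

def posterior_membership(pattern, class_probs, cond_probs):
    """P(class t | pattern) via Bayes' rule: joint probability divided by the marginal."""
    total, joint = pattern_probability(pattern, class_probs, cond_probs)
    return [p / total for p in joint]
```

With two hypothetical classes (sizes 0.6 and 0.4) and two binary items, a pattern whose responses are likely in class 1 and unlikely in class 2 receives a posterior close to 1 for class 1; assigning each participant to the class with the highest posterior is one common way to form the clusters.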

We test the hypothesis that a two-class model, linked to the distal outcome (fall/no fall), can explain the relationships among the biomedical, pharmacological, and demographic variables. Proper analysis of these data requires understanding two interdependent outcomes [2]: first, the binary outcome of whether or not the event occurred (fall or no fall), and second, which covariates increase or decrease the likelihood of that occurrence. The study has four specific aims: to identify items that indicate classes, estimate class probabilities, relate the class probabilities to covariates, and predict a distal outcome (fall/no fall) based on class membership. We model this process through the application of latent class analysis (Figure 1).
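The last two aims can be illustrated with a small sketch. It assumes one common specification that the text does not confirm: class probabilities depend on covariates through a multinomial logit (latent class regression, with class 0 as the reference), and the probability of a fall is the class-weighted average of class-specific fall rates, $P(\text{fall}) = \sum_t P(t)\,P(\text{fall} \mid t)$. Both function names and all coefficients are hypothetical. The class probabilities passed to `fall_probability` could equally be the posterior memberships computed from a participant's item responses.

```python
import math

def class_probs_from_covariates(x, coefs):
    """Multinomial-logit class probabilities for one participant (illustrative sketch).

    x: covariate values for the participant.
    coefs: one (intercept, slopes) pair per non-reference class; class 0 is the reference.
    """
    logits = [0.0]  # reference class has a fixed logit of 0
    for intercept, slopes in coefs:
        logits.append(intercept + sum(b * v for b, v in zip(slopes, x)))
    top = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fall_probability(class_probs, fall_rate_by_class):
    """Distal outcome: P(fall) = sum over t of P(class t) * P(fall | class t)."""
    return sum(p * r for p, r in zip(class_probs, fall_rate_by_class))
```

For instance, a participant with equal probability of belonging to a hypothetical low-risk class (fall rate 0.1) and high-risk class (fall rate 0.5) has a predicted fall probability of 0.3.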