- Research article
- Open Access
- Published:

# Medical practices display power law behaviors similar to spoken languages

*BMC Medical Informatics and Decision Making*
**volume 13**, Article number: 102 (2013)

## Abstract

### Background

Medical care commonly involves the apprehension of complex patterns of patient derangements to which the practitioner responds with patterns of interventions, as opposed to single therapeutic maneuvers. This complexity renders the objective assessment of practice patterns using conventional statistical approaches difficult.

### Methods

Combinatorial approaches drawn from symbolic dynamics are used to encode the observed patterns of patient derangement and associated practitioner response patterns as sequences of symbols. Concatenating each patient derangement symbol with the contemporaneous practitioner response symbol creates “words” encoding the simultaneous patient derangement and provider response patterns and yields an observed vocabulary with quantifiable statistical characteristics.

### Results

A fundamental observation in many natural languages is the existence of a power law relationship between the rank order of word usage and the absolute frequency with which particular words are uttered. We show that population level patterns of patient derangement: practitioner intervention word usage in two entirely unrelated domains of medical care display power law relationships similar to those of natural languages, and that–in one of these domains–power law behavior at the population level reflects power law behavior at the level of individual practitioners.

### Conclusions

Our results suggest that patterns of medical care can be approached using quantitative linguistic techniques, a finding that has implications for the assessment of expertise, machine learning identification of optimal practices, and construction of bedside decision support tools.

## Background

Medical care commonly involves the apprehension of complex patterns of patient derangements to which the practitioner responds with patterns of interventions, as opposed to single therapeutic maneuvers. This complexity renders the objective assessment of practice patterns using conventional statistical approaches difficult. Combinatorial approaches drawn from symbolic dynamics can be used to encode the observed patterns of patient derangement and associated practitioner response patterns as sequences of symbols [1–7]. Concatenating each patient derangement symbol with the contemporaneous practitioner response symbol creates “words” encoding the simultaneous patient derangement and provider response patterns and yields an observed vocabulary with quantifiable statistical characteristics [7].

We reasoned that, if patient physiologic derangements are encoded as symbol streams and the provider responses are similarly characterized, patterns of medical practice may be quantitatively analyzed using linguistic approaches such as counting the frequency with which particular “words” (integrated symbols) appear. We applied this approach to individuals (subjects) managing simulated patients undergoing mechanical ventilation, as well as to the practices of experienced providers managing anemia in the setting of End Stage Renal Disease, two entirely unrelated clinical settings characterized by vastly different time scales.

A fundamental observation in many natural languages is the existence of a power law relationship between the rank order of word usage and the absolute frequency. In particular, suppose that we have *N* distinct words and one counts the frequency of each word and forms a histogram of the frequencies of the words, starting from the most frequently used word to the word of least frequency. Let *f*
_{
i
} denote the frequency of the *i*^{th} word and order the frequencies so that *f*
_{1} ≥ *f*
_{2} ≥ _{∙∙∙} ≥ *f*
_{
N
}. With each frequency *f*
_{
i
}, associate a rank *r*
_{
i
}∈{1,2,_{∙∙∙},*N*}; hence, *r*
_{
i
} = *i*. The frequencies satisfy *Zipf-like Law* if

where *a* and *β* are positive constants. Alternatively, this relationship could be expressed as

Plotting the variable *y* = ln(*f*) and *x* = ln(*r*) gives a linear relations between *x* and *y* slope − *a* and *y-* intercept at ln(*β*). Figure 1 contains illustrations for text taken from famous literary works [8–10].

Our work investigates the usage of “words” in 2 different domains of medical practice and the degree to which word usage displays Zipf-like laws. In the following sections we will discuss what constitute words for each domain and the extent to which their frequencies satisfy ln (*f*
_{
i
}) = *-a* ln (*r*
_{
i
}) + ln (*β*).

## Methods

The general algorithm for encoding patient derangements, provider responses, and integrated derangement response patterns as symbols is depicted in the subsection below that demonstrates the encoding steps. Analyses were performed using either *Mathematica* 9 (Wolfram Research, Champaign, IL, USA) or *Visual Basic for Applications* running in *Microsoft Excel* (Redmond, Washington, USA). The mechanical ventilation study was deemed anonymous Exempt Educational Research by the University of Pittsburgh and University of Hawaii Institutional Review Boards; the dialysis investigation was approved as Minimal Risk by the Veterans Affairs- Pittsburgh Institutional Review Board. For the textual examples, source files were obtained from the integrated *Mathematica* database.

### Word usage in works of literature

Text files for the first ~ 2000 words of Beowulf, Through the Looking Glass (Lewis Carroll), The Origin of the Species (Charles Darwin), and Genesis (King James Version), as well as the first ~ 1100 words of The Love Song of J Alfred Prufrock (TS Eliot) were obtained from the Wolfram Research database. These files were converted to tab delimited strings after elimination of punctuation marks. For each corpus, a dictionary of all observed words was constructed; these were combined to form a total lexicon. For each work, and for the total lexicon, the appearance frequency of each word was determined and these appearance frequencies were rank ordered to generate the figures.

### Mechanical ventilation

We applied symbolic dynamics approaches to analyze practice patterns germane to managing machine-based respiratory support in patients unable to breathe independently (mechanical ventilation). Mechanical ventilation is a lifesaving but potentially harmful intervention in which a practitioner adjusts machine support parameters (such as oxygen concentration, breath frequency, and breath size) on a minute-by-minute or day-by-day basis. Using previously developed models of mechanical ventilation, a digital training simulator was developed that confronted subjects with a physiologically realistic population of 100 dynamically active “virtual patients,” in concert with defined physiologic goals [11–17]. All sequential physiologic values taken on by the virtual patients, as well as the sequential adjustments made by the subjects, were recorded. A more recent version of the simulator can be downloaded, free of charge, from http://www.math.vanderbilt.edu/~pscrooke/CANVENT/upload.html.

Elements of the pattern of physiologic derangement comprised oxygen saturation (goal: > 90%), pH (goal: lower bound < pH <7.45, where lower bound is disease process specific), mean arterial blood pressure (goal: > 65 mmHg), and end-of-breath distending pressure in the airspaces (goal: < 30 cmH_{2}O); subjects were provided goal ranges for each of these variables for each virtual patient. Elements of the intervention pattern comprised inspired concentration of oxygen, applied end expiratory pressure, breath frequency, and mode of ventilation (pressure controlled/time cycled or volume controlled). For pressure controlled ventilation, applied inspiratory pressure and inspiratory time were recorded; for volume controlled ventilation breath size, inspiratory gas flow rate, and inspiratory pause time were recorded.

Combinatorial techniques were applied to encode the pattern of patient derangement prior to each adjustment, as well as the integrated pattern of interventions implemented by the subject at that point. Contemporaneous symbols (disorder and intervention) were combined to define “words” encoding the disorder: intervention pattern. 29 subjects ranging from trainees to experienced faculty members each managed 100 virtual patients, attempting to attain pre-specified and explicit physiologic endpoints. The algorithm that is used to construct the derangement: practitioner intervention words is summarized in a word-algorithm (see Additional file 1).

### Dialysis

The second domain of medical interaction analyzed was the management of anemia in End Stage Renal Disease, an area of great clinical scrutiny with shifting clinical goals generally encompassing a hemoglobin value of greater than 10 and less than 12 g/L. Therapeutic adjustments–such as increasing the dose of erythropoiesis stimulating agents or the administration of iron–are made on a time scale of weeks to months.

Elements of the patient patterns for these analyses included hemoglobin and ferritin concentrations, iron saturation, tertile of erythopoiesis stimulating agent dosage, rate of change in hemoglobin from the previous month, and the difference between current hemoglobin and the specified target range. The target ranges used in the analysis were hemoglobin 10–12; iron saturation >20%; and ferritin > 200 and less than 500. Elements of the provider response pattern included changes in erythropoiesis stimulating agent (start, stop, increase or decrease by 25 to 50%, increase or decrease by >50%) and iron management (start, stop, or intensify).

A database interrogation tool was constructed to capture the prevailing pattern of derangement in each patient, as well as the pattern of response implemented by the practitioner in response to the observed ensemble of patient derangements. The database incorporated preexisting data from approximately 30 patients in a medium size metropolitan dialysis unit who were followed over a 3-year period. Management was implemented by nine trainees and one Advanced Practice Registered Nurse, all supervised by one of five senior attending physicians who rotated in through the unit on a monthly basis. There is not a specific anemia treatment protocol in this dialysis unit.

### Intermittent silence analysis of mechanical ventilation

We now address an interesting question that is based on earlier work by Miller: to what extent might the observed patterns reflect a statistical process driven by random selection of “keystrokes” (selected interventions) punctuated by intermittent spaces [18]? This is equivalent to a generating model in which “…a monkey hits the keys of a typewriter at random…” and the length distribution of the terms thus produced is conditioned by the number of keys and the probability of striking the space key. Miller’s argument was subsequently formalized by Li [19]. We addressed this question both for the individual interventions implemented and for the lengths of solutions observed for the virtual patients.

To investigate the patterns displayed in individual interventions during simulated volume controlled mechanical ventilation, we conceptualized a keyboard having 14 keys: 12 setting change keys (increase tidal volume, decrease tidal volume; increase frequency, decrease frequency, *etc*.), one key for a fluid bolus, and one space bar (no other interventions). For each parameter, only the presence of a change in a setting was considered, not the magnitude of the change itself. A similar approach was implemented for pressure controlled ventilation, albeit using the 10 setting changes possible in this mode, one key for a fluid bolus, and one space bar. The monkey randomly strikes keys on the keyboard until the space key is struck, and the number of inputs changed (up to 7 in volume controlled ventilation and 6 in pressure controlled ventilation) corresponds to the word length in Miller and Li’s work, and is denoted by “intervention pattern length.” The rank ordered distributions of intervention pattern lengths > 0 for volume cycled and pressure control ventilation were determined separately. These were compared to the distributions expected from Miller’s intermittent silence and Li’s random text analyses.

To address solution lengths, we considered the lengths of “solutions:” sequences of interventions required to complete the solution (or non-solution) of an individual virtual patient in the mechanical ventilation data. The number of interventions required to complete each virtual patient was counted, yielding a distribution of solution lengths that were then rank ordered by appearance frequency, and the effective frequency of a “space” was determined within the dataset. For the combination of volume controlled and pressure controlled ventilation there were 1946 discrete possible interventions (1458 in volume controlled + 486 in pressure controlled + 2 for changes between modes), corresponding to the “keys” in [18] and “symbols” in [19]. Only solutions of length greater than or equal to 1 and having a total appearance count >1 were considered. The values obtained were used to parameterize equations 6 of [18] and [19], and observed behaviors versus those anticipated from “intermittent silence” and “random text” processes were compared.

## Results and discussion

### Mechanical ventilation

The mechanical ventilation database contained 14,629 disorder: intervention dyads. Incomplete pairs resulting from keystroke errors accounted for 0.6% and were excluded, leaving 14,552 dyads for analysis. We analyzed the last 25 virtual patients managed by each subject, comprising a database of 2926 dyads. Comparing the appearance frequency of each word to its rank order revealed a power law relationship. Plotting the natural logarithm of appearance frequency against the natural logarithm of rank order over approximately 6 orders of magnitude yielded a frequency: rank slope of -0.95 and an R^{2} of 0.96 (Figure 2).

### Dialysis

797 unique exemplars of patient derangement: practitioner intervention were available for analysis. As above, combinatorial techniques were used to encode the observed pattern of patient derangement and the integrated pattern of the practitioner’s response. These symbols were then concatenated to form words, and the frequency of word usage in the population of practitioners was characterized. Comparing the appearance frequency of each word to its rank order revealed a power law relationship. Plotting the natural logarithm of appearance frequency against the natural logarithm of rank order over approximately 6 orders of magnitude yielded a frequency: rank slope of -0.73 and an R^{2} of 0.93 (Figure 3). An analysis at the individual level was not possible for this population, due to infeasibility of precisely ascribing interventions to unique providers.

### Power law behavior at the population level reflects similar behavior at the individual level

An important question is whether the population level power law behavior reflected power law behavior at the individual level or was attributable to characteristics of the provider population and did not reflect power law behavior at the individual level. We used the mechanical ventilation data to address this issue, plotting overall word frequency versus word rank for each subject (Figure 4). These analyses revealed power law behavior at the level of individuals, with a mean slope of the natural logarithm of appearance frequency versus natural logarithm of rank of -0.74 (range, -0.48 to -0.93; standard deviation 0.13) and a mean correlation coefficient of -0.96 (range, -0.92 to -0.98, standard deviation 0.015).

### Observed intervention lengths did not fit either an intermittent silence or random text model

Plots of intervention length frequency versus intervention length rank for both volume controlled ventilation and pressure controlled ventilation were characterized by steep declines in frequency as rank number increased; this trajectory was smooth, did not follow a clear power law, and was much more marked than the behavior anticipated from implementing an intermittent silence model or a random text model (Figure 5).

### Observed solution lengths did not fit either an intermittent silence or random text model

The observed frequency of an effective space (patient completed) was 0.2. The observed data display a strong power law behavior that is much more dramatic than that anticipated in an intermittent silence model using 1946 keys and a pause frequency of 0.2 or a random text model (Figure 6). When an effective space frequency of 5.1 × 10^{-4} (corresponding to a purely random selection of the space bar) is used, the difference is more marked.

Our findings suggest that patterns of complex, multidimensional medical care can be captured using symbolic dynamic techniques, and that the quantitative languages thereby constructed display statistical similarities to natural languages. This appears to be a population level phenomenon reflective of behaviors at the individual level (Figure 4). Moreover, during the simulated practice of mechanical ventilation the “solution lengths” also followed a power law, and the observed trajectories of both intervention lengths and the solutions into which they are concatenated diverged sharply from those predicted by intermittent silence and random text models. The observation of power law behavior in the “languages” adopted by practitioners as they addressed problems in two vastly different domains of care and characterized by entirely different timescales suggests that this statistical finding may be germane to other arenas of titratable care.

Power law behavior has been observed in many settings, ranging from the growth of cities through income levels and including the “coding” of clinical diagnoses [20–23]. As noted in [18, 19], there is debate regarding the extent to which such dynamics reflect meaningful dynamic underpinnings. In particular, Miller suggested that the power law behavior observed by Zipf was a consequence of “intermittent silences” punctuating a random process, and Li contends that the choice of rank as the independent variable accounts for the findings of Zipf.

In the case of languages, at least 3 mechanistic explanations for power law behavior have been put forth. In one, the natural tension between communicating parties leads to differential encoding between “sender” and “receiver,” with consequent power law behavior [24–27]. Alternatively, the principle of “selective attachment,” possibly acting through Pittman-Yor or similar mechanisms, may lead to clustering of “concepts” [28]. Nowack and Krakauer provide an evolutionary analysis of language development in which a limited number of phonemes, the potential for erroneous message transmission/interpretation, and the advantages of linking objects to actions leads to the development of languages having structure at multiple levels [29]. Finally, a combination of preferential attachment and “similarity” between sets of interventions may be at play [30].

Our finding that observed behaviors in the practice of mechanical ventilation deviate dramatically from those expected under either intermittent silence or random text analyses suggest that–in this context of problem solving–Zipf like behavior is not merely a consequence of intermittent silences or the selection of rank as the independent variable, but might reflect one of the alternative mechanisms enumerated above. Given that our data did not require “external communication,” we favor selective attachment mechanisms as the dominant factors in this context; however, the possibility that the practitioner is–to some extent–“speaking” to her/himself cannot be excluded, and would align with [27] and [29]. If this latter explanation holds, the multi-scale behaviors (“word length” and “solution length”) observed may share causes similar to the evolutionary pattern described by Nowak.

A data mining approach in which patient derangement: practitioner intervention dyads are encoded as symbol streams could prove useful for gauging the likely patient-specific efficacy of multicomponent therapeutic interventions (bedside decision support) as well as stratifying the potential effectiveness of multidimensional interventions at the population level. The power law behavior we observed suggests that, if data are continuously collected with periodic updating of the database, the rate of knowledge accrual will be greatest for more common problems, and will fall off in an exponential fashion as the frequency of specific derangement: intervention dyads declines.

In addition, the language-like statistical characteristics we observed may shed light on the cognitive structures underlying medical decision-making. Our subjects appear to have been “speaking mechanical ventilation” or “anemia management.” This suggests that characterizing the patterns of practice adopted by individual practitioners might be a natural approach to assessment of expertise or quality of care, an approach that could more effectively capture the nuances germane to management of individual patients. Similarly, the deviation of observed behaviors on a particular problem set from that anticipated in an intermittent silence model could provide a novel metric of the “difficulty” of the problems included in the set.

## Conclusions

Our results demonstrate that population level patterns of patient derangement:practitioner intervention word usage in two entirely unrelated domains of medical care display power law relationships identical to those of natural languages, and that- in one of these domains- power law behavior at the population level reflects power law behavior at the level of individual practitioners. In addition, the distribution of solution lengths in one of the domains does not follow that expected for a process driven by intermittent silence. Our results suggest that patterns of medical care can be approached using quantitative linguistic techniques, a finding that has implications for the assessment of expertise, machine learning identification of optimal practices, and construction of bedside decision support tools.

## References

Engbert R, et al: Tempo-induced transitions in polyrhythmic hand movements. Physical Review E. 1997, 56: 5823-5833. 10.1103/PhysRevE.56.5823.

Hao B: Symbolic dynamics and characterization of complexity. Physica D. 1991, 51: 161-176. 10.1016/0167-2789(91)90229-3.

Engbert R, et al: Symbolic dynamics of physiological synchronization: examples from bimanual movements and cardiorespiratory interaction. Nonlinear Anal Theory, Methods, and Appl. 1997, 30: 973-984. 10.1016/S0362-546X(96)00137-X.

Schwarz U, Benz AO, Kurths J, Witt A: Analysis of solar spike events by means of symbolic dynamics methods. Astron Astrophys. 1993, 277: 215-224.

Tang XZ, Tracy ER, Boozer AD, De Brauw A: Symbol sequence statistics in noisy chaotic signal reconstruction. Physical Review E. 1995, 51: 3871-3889. 10.1103/PhysRevE.51.3871.

Daw CS, Kennel MB, Finney CEA, Connolly FT: Observing and modeling nonlinear dynamics in an internal combustion engine. Physical Review E. 1998, 57: 2811-2819. 10.1103/PhysRevE.57.2811.

Paladino JD, Kaynar AM, Crooke PS, Hotchkiss JR: The language of caring: quantitating medical practice patterns using symbolic dynamics. Math Model Nat Phenomena. 2010, 5: 165-172. 10.1051/mmnp/20105310.

Cantone D, Faro S: On the frequency of characters in natural language texts. Available at: http://www.dmi.unict.it/~faro/papers/conference/faro1.pdf

Ferrer I, Cancho R: The variation of Zipf’s law in human language. Eur Phys J B. 2005, 44: 249-257. 10.1140/epjb/e2005-00121-8.

Debowski L: On the vocabulary of grammar-based codes and the logical consistency of texts. IEEE Trans Inf Theory. 2011, 57: 4589-4599.

Crooke PS, Head JD, Marini JJ, Hotchkiss JR: Patient-ventilator interaction: a general model for non-passive mechanical ventilation. Int Math Association J. 1998, 15: 321-337.

Hotchkiss JR, Dries DJ, Marini JJ, Crooke PS: Dynamical behavior during noninvasive ventilation: chaotic support?. Am J Resp Crit Care Med. 2001, 163: 374-378. 10.1164/ajrccm.163.2.2004004.

Hotchkiss JR, et al: Oscillations and noise: inherent instability of pressure support ventilation?. Am J Resp Crit Care Med. 2002, 165: 47-53. 10.1164/ajrccm.165.1.2101025.

Crooke PS, Hotchkiss JR, Marini JJ: Linear and nonlinear mathematical models for non-invasive, passive ventilation. Math Comput Model. 2002, 35: 1297-1313. 10.1016/S0895-7177(02)00086-9.

Crooke PS, Marini JJ, Hotchkiss JR: Modeling recruitment maneuvers with a variable compliance model for pressure preset ventilation. J Theoretical Medicine. 2002, 4: 197-207. 10.1080/1027366021000023124.

Crooke PS, Hota S, Marini JJ, Hotchkiss JR: Mathematical models of pressure preset ventilation with different resistance assumptions. Math Comput Model. 2003, 38: 495-502. 10.1016/S0895-7177(03)90021-5.

Crooke PS, et al: Mathematical models for the pressure-preset ventilation of oleic acid-injured pigs. Math Med Biol. 2005, 22: 99-112. 10.1093/imammb/dqh023.

Miller GA: Some effects of intermittent silence. Am J Psychol. 1957, 70 (2): 311-314. 10.2307/1419346.

Li W: Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Trans Inf Theory. 1992, 38 (6): 1842-1845. 10.1109/18.165464.

Jiang B, Jia T: Zipf’s law for all the natural cities in the United States: a geospatial perspective. Int J Geogr Inf Sci. 2011, 25 (8): 1269-1281. 10.1080/13658816.2010.510801.

Pareto V: The new theories of economics. J Pol Eco. 1897, 5 (4): 485-502. 10.1086/250454.

Rozenfeld HD, Rybski D, Andrade JS, Batty M, Stanley HE, Maske HA: Laws of population growth. Proc Nat Acad Sci USA. 2008, 105 (48): 18702-18707. 10.1073/pnas.0807435105.

Tachimori Y, Tahara T: Clinical diagnoses following Zipf’s law. Fractals. 2002, 10 (3): 341-351. 10.1142/S0218348X02001117.

Corminas-Murtra B, Fortuny J, Sole R: Emergence of Zipf’s law in the evolution of communication. Physical Review E. 2011, 83: 036115-

Huang S, Renals S: Power law discounting for n-gram language models. 2010, Vancouver, Canada: Proc. IEEE ICASSP-10, 5178-5181.

Czirok A, Mantegna R, Havlin S, Stanley H: Correlations in binary sequences and a generalized Zipf analysis. Physical Review E. 1995, 52: 446-452. 10.1103/PhysRevE.52.446.

Ferrer I, Cancho R, Sole RV: Least Effort and the origins of scaling in human language. Proc Nat Acad Sci USA. 2003, 100 (3): 788-791. 10.1073/pnas.0335980100.

Goldwater S, Griffiths T, Johnson M: Interpolating between types and tokens by estimating power-law generators. Advances in neural information processing systems. 2006, Cambridge, MA: MIT Press, 18

Nowak MA, Krakauer DCR: The evolution of language. Proc Nat Acad Sci USA. 1999, 96: 8028-8033. 10.1073/pnas.96.14.8028.

Papadopoulos F, et al: Popularity versus similarity in growing networks. Nature. 2012, 489: 537-10.1038/nature11459.

### Pre-publication history

The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1472-6947/13/102/prepub

## Acknowledgements

This research was funded by NIH R01 HL084113. Dr. Paladino received fellowship support from NIH T32HL007820. Drs. Linda Fried and Paul Palevsky provided guidance in regulatory compliance and variable selection for the anemia analyses. The contents do not represent the views of the Department of Veterans Affairs or the United States Government.

## Author information

### Affiliations

### Corresponding author

## Additional information

### Competing interests

The authors declared that they have no competing interests.

### Authors’ contributions

JRH, PSC, and JDP developed the mechanical ventilation simulator. JDP, CRB, and JRH developed the population of virtual patients. JDP and AMK recruited subjects for the ventilation study. JRH and PSC developed the mechanical ventilation and dialysis practice pattern characterization tools. JRH collected and collated the data for the dialysis study. JRH and PSC conducted the analyses. All authors read and approved the final manuscript.

## Electronic supplementary material

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

**Open Access**
This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License (
https://creativecommons.org/licenses/by/2.0
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Paladino, J.D., Crooke, P.S., Brackney, C.R. *et al.* Medical practices display power law behaviors similar to spoken languages.
*BMC Med Inform Decis Mak* **13, **102 (2013). https://doi.org/10.1186/1472-6947-13-102

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/1472-6947-13-102

### Keywords

- Power law
- Natural languages
- Mechanical ventilation
- Dialysis
- Medical practices