Common patient-reported outcomes across ICHOM Standard Sets: the potential contribution of PROMIS®

Background The International Consortium for Health Outcomes Measurement (ICHOM) develops condition-specific Standard Sets of outcomes to be measured in clinical practice for value-based healthcare evaluation. Standard Sets are developed by different working groups, which is inefficient and may lead to inconsistencies in selected PROs and PROMs. We aimed to identify common PROs across ICHOM Standard Sets and examined to what extent these PROs can be measured with a generic set of PROMs: the Patient-Reported Outcomes Measurement Information System (PROMIS®). Methods We extracted all PROs and recommended PROMs from 39 ICHOM Standard Sets. Similar PROs were categorized into unique PRO concepts. We examined which of these PRO concepts can be measured with PROMIS. Results A total of 307 PROs were identified in 39 ICHOM Standard Sets and 114 unique PROMs are recommended for measuring these PROs. The 307 PROs could be categorized into 22 unique PRO concepts. More than half (17/22) of these PRO concepts (covering about 75% of the PROs and 75% of the PROMs) can be measured with a PROMIS measure. Conclusion Considerable overlap was found in PROs across ICHOM Standard Sets, along with large differences in the terminology used and the PROMs recommended, even for the same PROs. We recommend a more universal and standardized approach to the selection of PROs and PROMs. Such an approach, focusing on a set of core PROs for all patients, measured with a system like PROMIS, may provide more opportunities for patient-centered care and facilitate the uptake of Standard Sets in clinical practice.

among a team of experts and patient representatives in the field [5]. Up to May 2021, 39 Standard Sets were published and another five were in progress [4,.
A potential barrier for implementing ICHOM Standard Sets is that they are independently developed by different working groups. Although it is important that Standard Sets are developed by people who have expertise in the particular condition, collaboration and harmonization across Standard Sets are currently limited, which leads to large differences and inconsistencies in selected PROs, terminology used, and recommended Patient-Reported Outcome Measures (PROMs), even for the same PROs [4, 6–17]. This complicates the implementation and use of Standard Sets in clinical practice.
A system of common data elements across conditions could improve the situation and may speed up the development and uptake of Standard Sets considerably. Generally, all people want to feel and function 'normally', i.e., live without symptoms, such as pain, fatigue or depression, and be able to carry out daily activities and social roles. These feelings and functions can be affected by different health conditions. For example, climbing stairs can be affected by knee osteoarthritis (e.g. because of pain), lung disease (e.g. because of breathlessness) or heart failure (e.g. because of fatigue). Different conditions can result in the same patient-reported problem; in this case, difficulty climbing stairs. Because human values, experiences, and desires cut across health status, there is considerable overlap in relevant PROs across conditions [36].
Ideally, these non-condition-specific (or common) outcomes could be measured with just one set of universal (i.e. generic) PROMs across conditions. This is, however, currently not the case. A large number of different PROMs are being used for measuring common outcomes within and between different patient groups. One reason for this is that it has been considered important to develop disease-specific PROMs, while for common outcomes like pain or fatigue, this may not be necessary. Another reason is that many PROMs were developed because of criticism of the content or insufficient measurement properties of existing PROMs. However, with recent methodological innovations in PROM development, such as the application of item response theory (IRT), universal PROMs with good measurement properties have been developed that can be applied across medical conditions, including patients without a medical diagnosis and patients with multiple (chronic) conditions [37,38]. One such cross-cutting IRT-based measurement option is the Patient-Reported Outcomes Measurement Information System (PROMIS®, Table 1). PROMIS researchers developed a conceptual framework of commonly relevant PROs across the broad domains of physical, mental, and social functioning. They also developed PROMs within each of these domains that are universally applicable across patient populations [39,40]. PROMIS researchers used IRT to create item banks (i.e. large sets of questions), making it possible to apply short forms where needed and computerized adaptive tests (CAT) where possible, which yield highly reliable and comparable scores with only a few relevant items [41–43].
The aim of this study was to identify common PROs across ICHOM Standard Sets and examine the extent to which these PROs can be measured with PROMIS.

Methods
One author (MZ) downloaded the reference guides of all 39 available ICHOM Standard Sets from the ICHOM website in June 2021 and extracted all individual PROs and recommended PROMs (including single items,

Results
A total of 307 PROs were extracted from the 39 Standard Sets. Table 3 shows which PROMIS item banks are available to measure these PRO concepts. Figure 1 shows the most commonly included PROs in ICHOM Standard Sets that can be measured with PROMIS.
There is room for harmonization of PROs across ICHOM Standard Sets. While value-based healthcare is trying to eliminate the silos between medical specialties, it currently seems to have created new silos between conditions. Some PROs are included in only a few Standard Sets, while they seem relevant for many patients. For example, fatigue is included in the Standard Sets for inflammatory arthritis and heart failure, but not in the Standard Set for coronary artery disease. Sleep disturbance is included in only seven of the 39 Standard Sets, although it is a common symptom in many other diseases [45]. These results may partly be explained by a selection of only the most relevant PROs per condition. However, it is questionable to what extent this ranking represents the patient's perspective. A more universal and standardized approach, focusing on a core set of PROs that are relevant for most patients, may provide more opportunities for patient-centered care. The profile domains from the PROMIS conceptual framework, including Fatigue, Pain Intensity, Pain Interference, Physical Function, Sleep Disturbance, Anxiety, Depression, and the Ability to Participate in Social Roles and Activities (included in the PROMIS-29 Profile measure [46] and also available as CATs), have been found relevant to many disease populations [47–52] and seem to be a good starting point. The core PROs could then be supplemented with disease-specific PROs (e.g. disease-specific symptoms) where needed, covering the additional 25% of PROs included in the ICHOM sets that cannot be measured with PROMIS.

Discussion
We also found large differences in PRO terminology used. For example, the PRO concept physical function
[Table footnote: The numbers in each cell represent the number of outcomes covering the respective PRO concept included in the ICHOM Standard Set Reference Guide.]
Many different PROMs are being recommended, even when assessing the same PRO. For some patient groups a disease-specific PROM is recommended, while for another patient group, a domain-specific or generic PROM is recommended to measure the same PRO. For example, to measure depression, disease-specific instruments (e.g. the Movement Disorders Society Unified Parkinson Disease Rating Scale (MDS-UPDRS) for Parkinson's disease), a cancer-specific instrument (EORTC QLQ-C30), several domain-specific instruments (e.g. Hospital Anxiety and Depression Scale (HADS), PHQ-2, PHQ-9, WHO-5) and several generic instruments (e.g. PROMIS-29 Profile, PROMIS Global Health and 36-Item Short Form Health Survey (SF-36)) are being recommended. Furthermore, nine different PROMs are being recommended to measure fatigue. This variability may partly be explained by (lack of) available evidence to support the use of a particular PROM in a specific condition. It is time-consuming and costly to translate and validate so many PROMs across countries. Recommending different PROMs for the same PROs hampers outcome measurement in daily clinical practice and comparisons across patient groups with different conditions. Harmonization of PROMs can improve this situation. For example, a PROMIS Depression or Fatigue measure could be used across all patient groups for which these PROs are relevant. Research has shown that it may not be necessary to measure common symptoms like fatigue or depression with a different instrument, validated for each different patient group.
A study in patients with rheumatoid arthritis, for example, found that up to 90% of patients with arthritis would rate their level of fatigue similarly when asked in a general sense about fatigue or when asked about the fatigue they attributed to their rheumatoid arthritis, which suggests that a generic PROM can be used instead of a disease-specific PROM [53]. Furthermore, evidence is growing for the validity of generic PROMIS measures across patient populations [47–52, 54].
Another issue we identified is that there is often not an exact match in the ICHOM sets between the PROs and PROMs. This means that not every PRO is measured with a separate (sub)scale. For example, in the ICHOM Standard Set for Overall Adult Health, five PROs are included that address mental health (general mental health, sleep, depression, vitality, and anxiety). Two PROMs are recommended (PROMIS Global Health and WHO-5) for measuring these PROs. However, neither the PROMIS Global Health nor the WHO-5 provides separate scores for sleep, depression, vitality, and anxiety. If PROMs are to be used in the consultation room, separate scores for each PRO may be more helpful.
We argue that there is room for harmonization of PROs and PROMs across ICHOM Standard Sets. Our study showed that many PROs that matter to patients are common across patient groups. It may not be necessary to use a different PROM to measure common PROs, like pain, fatigue, or depression, in different patient populations. It is hard to identify the best PROM for a specific patient population because the number of validation studies of PROMs in specific patient populations is limited and evidence on important measurement properties (e.g. responsiveness) is often lacking (see, for example, [55–57]). It is too expensive and time-consuming to develop, validate, translate, and maintain different high-quality PROMs for every patient group. It is also too expensive, complex and time-consuming to implement all these different PROMs in electronic health records, give the right PROM to the right patient, interpret the scores for every PROM in the correct way, and discuss them appropriately with patients in the consultation room. It is burdensome and confusing for patients with multi- or comorbidity to complete multiple, partly overlapping, PROMs for every health care professional they consult.
To go forward, we recommend a more universal and standardized approach to PRO and PROM selection. Much can be gained by selecting common PROs and PROMs across conditions wherever possible, for example using the conceptual framework and measures of PROMIS. There are several initiatives ongoing in this direction. One is the recently developed ICHOM adult overall health and pediatric overall health Standard Sets [58,59]. These Standard Sets contain 15 and 10 PROs respectively, to be measured in all adult or pediatric patients. Eleven of the 15 PROs for adults and eight of the 10 PROs for children can be measured with PROMIS. The relevance and feasibility of measuring a common set of PROs in all patients need to be evaluated. In the Netherlands, an alternative approach is being proposed. A national PROM working group developed a 'menu' of commonly relevant PROs and recommended PROMs that can be used to select relevant PROs for specific patient group(s) [60]. Both the ICHOM overall health Standard Sets and the Dutch 'menu' recommend measuring generic PROs where possible, supplemented with disease-specific PROs where needed, an approach which may be referred to as 'generic unless'. Both initiatives include PROMIS measures in their recommendations to standardize and reduce the number of PROMs being used. PROMIS measures are also included in 14 out of the 39 ICHOM Standard Sets. There is, however, much room left for improving the efficiency and validity of ICHOM Standard Sets.
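The 'generic unless' idea can be illustrated with a minimal sketch. It assumes a condition's PROs are given as plain strings and treats only the eight PROMIS profile domains named in this article as generic; the function name `select_measures` and the example PRO "Joint stiffness" are hypothetical, and the set below is not a complete list of PROMIS item banks.

```python
# PROMIS profile domains as named in this article (PROMIS-29 Profile).
# Assumption: only these eight domains are treated as "generic" here;
# PROMIS offers more item banks than this illustrative set.
PROMIS_PROFILE_DOMAINS = {
    "fatigue",
    "pain intensity",
    "pain interference",
    "physical function",
    "sleep disturbance",
    "anxiety",
    "depression",
    "ability to participate in social roles and activities",
}


def select_measures(pros):
    """Split PROs into generic (PROMIS profile) and disease-specific ones."""
    generic, disease_specific = [], []
    for pro in pros:
        # Match case-insensitively against the generic domain names.
        if pro.strip().lower() in PROMIS_PROFILE_DOMAINS:
            generic.append(pro)
        else:
            disease_specific.append(pro)
    return {"generic": generic, "disease_specific": disease_specific}


# Hypothetical PRO list for one condition (not taken from an ICHOM set).
print(select_measures(["Fatigue", "Depression", "Joint stiffness"]))
```

In practice the matching step would require the concept-level classification described in this study, since Standard Sets use different terms for the same PRO; exact string matching is a simplification.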
PROMIS has several advantages over traditional PROMs. Since it is applicable across disease populations, it enables benchmarking, learning and improving quality of care in patient groups with multimorbidity, the main cost drivers of healthcare, and across many of these patient groups. Moreover, it is also suitable for patients without a definite diagnosis or for patients with rare diseases, for which validated disease-specific PROMs are quite often not available. An additional advantage of PROMIS is the possibility of CAT [37]. With CAT the computer selects items from an item bank, based on answers to previous items. This yields highly reliable scores with only a few relevant items, which is an important benefit for using PROMs in clinical practice. Technical solutions for CAT application are currently available in a limited number of countries, but this is expected to increase in the near future. As long as these technical solutions are lacking, PROMIS short forms can be applied as an alternative. Finally, PROMIS is a sustainable system, maintained by the PROMIS Health Organization, an international network of researchers and clinicians across a large number of countries, who collaborate to facilitate widespread use and adoption of PROMIS in research and clinical practice.
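The item selection step of a CAT can be sketched as follows. This is a simplified illustration, not the PROMIS implementation: PROMIS item banks are calibrated with polytomous IRT models, whereas this sketch swaps in a two-parameter logistic model for yes/no items. The item parameters, the stopping rule values and all function names are hypothetical.

```python
import math

# Hypothetical item bank: item -> (discrimination a, difficulty b).
# These values are invented for illustration; they are not PROMIS parameters.
ITEM_BANK = {
    "item_1": (1.8, -1.5),
    "item_2": (2.0, -0.5),
    "item_3": (2.2, 0.0),
    "item_4": (1.9, 0.8),
    "item_5": (1.7, 1.6),
}

GRID = [x / 10 for x in range(-40, 41)]  # trait levels from -4.0 to 4.0


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for item, answer in responses.items():
            p = prob_endorse(theta, *ITEM_BANK[item])
            w *= p if answer else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta, administered):
    """Pick the unused item that is most informative at the current estimate."""
    remaining = [i for i in ITEM_BANK if i not in administered]
    return max(remaining, key=lambda i: information(theta, *ITEM_BANK[i]))


def run_cat(answer_fn, max_items=3, se_target=0.3):
    """Administer items until the SE target or the item limit is reached."""
    responses = {}
    theta, se = eap_estimate(responses)
    while len(responses) < max_items and se > se_target:
        item = next_item(theta, responses)
        responses[item] = answer_fn(item)
        theta, se = eap_estimate(responses)
    return theta, se, list(responses)


# Simulated respondent who endorses every item.
theta, se, items = run_cat(lambda item: True)
print(items, round(theta, 2), round(se, 2))
```

Each answer updates the trait estimate, and the next question is the one expected to reduce uncertainty most at that estimate, which is why a CAT can reach a given precision with fewer items than a fixed questionnaire.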
A limitation of our study is that the classification of PROs was not done by raters independently. The initial classification was done by one rater and then reviewed and confirmed by a second rater. Furthermore, classification was based on information in the ICHOM reference guides only. We did not map the recommended PROMs on item level to the PROMIS measures. Our classification may therefore not be completely correct. However, the exact numbers are not important for our call for a more universal and standardized approach to PRO and PROM selection.

Conclusion
We found considerable overlap in selected PROs across ICHOM Standard Sets, and large differences in terminology and recommended PROMs for the same PROs. For measuring 307 different PROs, covering 22 unique PRO concepts, a total of 114 different PROMs are currently being recommended. We recommend a more universal and standardized approach to the selection of PROs and PROMs. PROMIS offers an evidence-based conceptual framework of commonly relevant PROs and provides a sustainable set of validated PROMs that are applicable across patient populations and medical specialties.