Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities in a community (or communities) of interest. Numerical data might include vital statistics (e.g., births and deaths), registry data (e.g., cancer), inpatient and outpatient hospitalization data, and population (census) data. Spatial data might consist of spatial boundary files (such as 'shape' files) that contain geographically defined coordinates. Combining numerical and spatial data is important for answering community health questions such as: "How does region A compare to its surrounding regions in relation to the incidence of asthma?" or "What are the top five causes of cancer deaths in a region, and how do these compare to the top five cancer deaths for the country?"
Geographic Information Systems (GIS) are applications that enable the management and analysis of spatial data [1]. Publications on the use of GIS in public health [2–8] suggest that many professionals view it as a useful tool for decision making. However, the technology has limitations in performing analysis of numerical data because of its traditional database architecture.
On-Line Analytical Processing (OLAP) is a multidimensional data warehouse environment designed to facilitate querying of large numerical data sets [9, 10]. Data in an OLAP data warehouse can be stored as a multidimensional cube in which all the numerical values are pre-calculated. While this can lead to high memory requirements, querying only requires OLAP functions to fetch the data, without the need to perform complex joins between tables. The software has been around since the 1990s and was initially very popular in the corporate environment for supporting high-level decision making. OLAP has begun to gain popularity in the healthcare field but remains unfamiliar to most health science researchers.
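The trade-off described above (more memory in exchange for queries that are simple lookups) can be illustrated with a minimal sketch. It is not how any particular OLAP product stores data. The records, the dimension names (county, year, cause), and the helpers `build_cube` and `query` are hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical fact records: (county, year, cause, deaths). Illustrative values only.
FACTS = [
    ("A", 2000, "asthma", 4),
    ("A", 2000, "cancer", 10),
    ("A", 2001, "asthma", 6),
    ("B", 2000, "asthma", 3),
    ("B", 2001, "cancer", 7),
]

ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(facts):
    """Pre-calculate death totals for every cell, including rolled-up cells."""
    cube = {}
    for county, year, cause, deaths in facts:
        # Each record contributes to 2**3 = 8 cells: every combination of
        # "specific value" or "ALL" across the three dimensions.
        for key in product((county, ALL), (year, ALL), (cause, ALL)):
            cube[key] = cube.get(key, 0) + deaths
    return cube


CUBE = build_cube(FACTS)


def query(county=ALL, year=ALL, cause=ALL):
    """Answer a query with one dictionary lookup; no joins or scans at query time."""
    return CUBE.get((county, year, cause), 0)
```

In this sketch, `query(county="A")` returns the total for county A across all years and causes, and `query(cause="asthma")` returns the asthma total across all counties and years. The cube holds more cells than there are input records, which is the memory cost mentioned above.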
Coupling the spatial capabilities of GIS with On-Line Analytical Processing (OLAP), a powerful technology for numerical analysis, might enhance community health assessment data analysis. Online Analytical Processing-Geospatial Information System (OLAP-GIS) decision support systems have already been used for analysis in environmental health, community health, motor vehicle safety, and healthcare quality [11–14].
Combining Numerical and Spatial Data for Community Health Assessment
Modern-day CHA professionals in developed countries frequently analyze public health data in order to identify health priorities. The steps in the process might include:
• Identification of the spatial location of a geographic community using GIS or a paper map;
• Identification of health factors within the community using numerical data such as death counts, disease incidence or prevalence rates;
• Identification of the spatial location of bordering communities of interest using GIS or a paper map;
• Identification of health factors within bordering communities using numerical data such as death counts, disease incidence, or prevalence rates;
• Comparison of factors within the community against factors of the bordering community using statistical methods for adjustment and calculations such as relative risk and odds ratios;
• Viewing of results using tables, graphs, or spatial visualization.
The first step, identifying the location of a geographic community, is a spatial component: it merely involves locating the area or region of interest on a map. The second step, identifying the health factors within the community, is purely numerical. For example, ranking the top five diseases per 100,000 for a particular age category, aggregated at the community level, is a numerical process. However, the next step, identifying the bordering communities of interest, is purely spatial. Like the first step, this can be done by using a map. The identification of health factors in these bordering communities is purely numerical, as in the second step. Finally, statistical measures and adjustments are performed in order to determine health priorities.
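The numerical steps can be sketched in a few lines. The example below is a minimal illustration, assuming crude (unadjusted) rates; it omits the age adjustment a real assessment would normally apply. The function names and the counts in the usage note are hypothetical.

```python
def rate_per_100k(cases, population):
    """Crude rate per 100,000 population (no age adjustment)."""
    return cases * 100_000 / population


def relative_risk(cases_a, pop_a, cases_b, pop_b):
    """Ratio of the crude rate in community A to the crude rate in community B."""
    return (cases_a * pop_b) / (cases_b * pop_a)


def top_causes(counts, population, n=5):
    """Rank causes by crude rate per 100,000 and keep the n highest."""
    rates = {cause: rate_per_100k(c, population) for cause, c in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]
```

For instance, with 50 cases in a community of 200,000 and 20 cases in a bordering community of 100,000, `relative_risk(50, 200_000, 20, 100_000)` compares a rate of 25 per 100,000 with a rate of 20 per 100,000, giving 1.25.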
Many community health experts use information technology (IT) for this type of data analysis. We conducted a survey of CHA professionals and found that many of them use software such as databases, statistical packages, and even GIS [15]. The potential for OLAP-GIS in community health data analysis is not well understood. We therefore conducted an evaluation comparing OLAP-GIS to commonly used IT, including GIS and traditional analytical/statistical tools. We hypothesized that using an OLAP-GIS system instead of the combined use of SPSS and GIS would greatly facilitate CHA data analysis in terms of efficiency, accuracy, and user satisfaction.
SOVAT
At the University of Pittsburgh, we have developed an OLAP-GIS system called the Spatial OLAP Visualization and Analysis Tool (SOVAT) [16, 17]. SOVAT is intended to support community health assessment data analysis. The system combines large amounts of health and population data and displays the information through a graphical user interface. The interface, developed using an iterative design approach [18], supports direct user manipulation as well as analysis of numerical and spatial components (Figure 1).
The SOVAT interface allows users to navigate through large public health data sets by using OLAP functions such as drill-down (view more detailed data), drill-up (view more aggregated data), and slice and dice (view specific variables of data). In addition to these functions, SOVAT contains unique functions that are not standard in OLAP but were believed to enhance community health assessment. One such feature is called drill-out, which enables the user to click on a map object, such as a county, and submit a query that contains both numerical and spatial aggregation. For example, to perform drill-out on 'region A', SOVAT would first identify the regions that border region A. This would be done through spatial analysis of the coordinates. Then the system would aggregate the numerical measures (such as an incidence rate) for each bordering region. This function enables the user to quickly perform comparisons of different geographical areas across different numerical public health measures.
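The two phases of drill-out (spatial identification of bordering regions, then numerical aggregation) can be sketched as follows. This is not SOVAT's implementation, which is not described here. It assumes, for illustration only, that each region is a simple polygon given as a list of vertices and that two regions border each other when they share an identical boundary edge. Real boundary files would typically require a geometry library and tolerance handling. All names and values are hypothetical.

```python
# Hypothetical region boundaries as polygon vertex lists (unit squares on a grid).
BOUNDARIES = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],  # shares an edge with A
    "C": [(1, 1), (2, 1), (2, 2), (1, 2)],  # touches A at one corner only
    "D": [(3, 0), (4, 0), (4, 1), (3, 1)],  # isolated
}
CASES = {"A": 30, "B": 12, "C": 5, "D": 8}  # illustrative counts
POPULATION = {"A": 100_000, "B": 60_000, "C": 50_000, "D": 40_000}


def edges(polygon):
    """Return the polygon's edges as unordered vertex pairs."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def bordering(region, boundaries):
    """Spatial phase: regions sharing at least one identical edge with `region`."""
    target = edges(boundaries[region])
    return sorted(
        name
        for name, polygon in boundaries.items()
        if name != region and target & edges(polygon)
    )


def drill_out(region, boundaries, cases, population):
    """Numerical phase: crude rate per 100,000 for the region and each neighbor."""
    selected = [region] + bordering(region, boundaries)
    return {name: cases[name] * 100_000 / population[name] for name in selected}
```

Under these assumptions, `drill_out("A", BOUNDARIES, CASES, POPULATION)` returns rates for A and B only, because C meets A at a single corner and D is not adjacent. Whether corner contact should count as bordering is a design choice that this sketch resolves arbitrarily.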
We evaluated SOVAT against technology that we previously determined to be commonly used by CHA professionals, namely the combined use of SPSS statistical software and GIS software (referred to here as SPSS-GIS) [15], in order to understand its potential as a data analysis tool during community health assessments.