Fig. 1 | BMC Medical Informatics and Decision Making

Fig. 1

From: CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital

CogStack Architecture and Dataflow. All components can be deployed via the Docker containerisation software.

1. New job execution: The master instance of CogStack identifies new data in Trust Data Sources at intervals.
2. Partitioning: The job is partitioned into a user-definable number of work units.
3a. Derive the free-text content: Extract plain and/or formatted text from common proprietary binary document formats (performing OCR where necessary), using the Tika library to enable the downstream processing of high-value unstructured data elements.
3b. Supplement the text content with metadata: Filter and de-normalise a subset of the structured clinical data to provide a patient-orientated, transparent representation of high-value metadata concepts. For example, this might include calculated fields representing patient age at document date, the first part of the postcode, ethnicity and lab results.
3c. De-identification: Transform the resulting text documents into de-identified text documents by masking personal health identifiers using the Cognition de-identification algorithms. This is necessary to address governance concerns associated with the secondary use of patient data. Identifiers in structured data can be excluded via SQL query, according to business requirements.
4. Information Extraction: Apply generic clinical IE pipelines to derive additional structured data from free text, supplementing the structured data available at the point of query.
5. Indexing: Build a JSON object from the resulting structured and unstructured data, which can then be readily indexed into an Elasticsearch cluster.
6. Visualisation: The Kibana suite provides a range of options for viewing, aggregating and dashboarding the loaded data.
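
To make steps 2, 3b, 3c and 5 more concrete, the following is a minimal Python sketch of the dataflow, not CogStack's actual implementation. All function names, field names and the row layout are hypothetical, and the masking routine is a naive string replacement standing in for the Cognition de-identification algorithms, which are not described in the caption. Text extraction with Tika, the IE pipelines and the Elasticsearch indexing call itself are deliberately left out.

```python
import json
import re
from datetime import date


def partition(first_id, last_id, n_units):
    """Step 2: split the inclusive ID range into at most n_units contiguous work units."""
    total = last_id - first_id + 1
    if total <= 0 or n_units <= 0:
        return []
    size = -(-total // n_units)  # ceiling division
    return [(start, min(start + size - 1, last_id))
            for start in range(first_id, last_id + 1, size)]


def age_at(birth, doc_date):
    """Step 3b: patient age in whole years on the document date."""
    years = doc_date.year - birth.year
    if (doc_date.month, doc_date.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the document year
    return years


def outward_postcode(postcode):
    """Step 3b: first part of a UK postcode (the inward part is always 3 characters)."""
    compact = "".join(postcode.split()).upper()
    return compact[:-3]


def mask_identifiers(text, identifiers, mask="XXXXX"):
    """Step 3c (assumed stand-in): mask known identifier strings, ignoring case."""
    for value in identifiers:
        if value:
            text = re.sub(re.escape(value), mask, text, flags=re.IGNORECASE)
    return text


def build_document(row, text):
    """Step 5: combine derived metadata and de-identified text into one JSON-ready dict."""
    doc_date = row["document_date"]
    return {
        "document_id": row["document_id"],
        "document_date": doc_date.isoformat(),
        "age_at_document": age_at(row["date_of_birth"], doc_date),
        "postcode_outward": outward_postcode(row["postcode"]),
        "ethnicity": row["ethnicity"],
        "text": mask_identifiers(text, [row["name"], row["postcode"]]),
    }


if __name__ == "__main__":
    # Entirely made-up example record; no real patient data.
    row = {
        "document_id": 42,
        "name": "Jane Doe",
        "postcode": "SE5 8AF",
        "date_of_birth": date(1950, 6, 15),
        "document_date": date(2017, 6, 14),
        "ethnicity": "Not stated",
    }
    doc = build_document(row, "Jane Doe, SE5 8AF, reviewed in clinic.")
    print(json.dumps(doc, indent=2))
```

The resulting dictionary serialises directly with `json.dumps`, which is the shape an Elasticsearch index request would expect as a document body. Note that direct identifiers such as the name and full postcode are left out of the structured fields, mirroring the caption's point that identifiers in structured data can be excluded at query time.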