BioSunMS is an Eclipse Rich Client Platform (RCP) application that uses the lab's network to access and manage large amounts of information. It can be installed on Windows or Linux. Researchers can enter patient data in a customizable template and group the data by queries. A security system protects the data, with access granted to user groups rather than to individual users.
Security and access
BioSunMS is a network-based system, similar to an intranet. It can be accessed from any computer on the same local network, and even over the Internet. For this reason, data access and management are password protected. The system records every action performed by any user, and the administrator can review this action history at any time.
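One way to picture the action history is as an append-only log that the administrator can filter. The Python sketch below is illustrative only and assumes a simple in-memory store; the `Action` and `ActionLog` names and their fields are hypothetical, since the text does not describe how BioSunMS stores its history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


# Hypothetical record of one audited event; not the BioSunMS schema.
@dataclass(frozen=True)
class Action:
    timestamp: datetime
    user: str
    action: str
    target: str


@dataclass
class ActionLog:
    """Hypothetical append-only action history: entries are never edited or deleted."""
    _entries: List[Action] = field(default_factory=list)

    def record(self, user: str, action: str, target: str) -> Action:
        entry = Action(datetime.now(timezone.utc), user, action, target)
        self._entries.append(entry)
        return entry

    def history(self, user: Optional[str] = None) -> List[Action]:
        """Administrator view: every action, optionally restricted to one user."""
        # A new list is returned, so callers cannot alter the stored log.
        return [a for a in self._entries if user is None or a.user == user]


log = ActionLog()
log.record("alice", "edit", "sample:17")
log.record("bob", "view", "sample:17")
print([a.action for a in log.history("alice")])  # ['edit']
```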
From the User and Group Management View, the BioSunMS administrator can add or edit users and groups. If a user group has been granted access to a type of data, every user belonging to that group can read the data in the Sample View; users outside the group cannot. Any user can create a new record and modify records that belong to his or her own user group. However, only administrators can create new users and user groups. By default, a new record is accessible to all users; to restrict access to a record, assign it to the desired user group.
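The access rules above can be summarized as a small permission check. This Python sketch is a hedged reading of the text, not BioSunMS code: `User`, `Record`, `can_access` and `Directory` are hypothetical names, and treating an unassigned record as both readable and editable by everyone is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


# Hypothetical data model for illustration only.
@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)
    is_admin: bool = False


@dataclass
class Record:
    record_id: str
    group: Optional[str] = None  # None: new record, not yet assigned to a group


def can_access(user: User, record: Record) -> bool:
    """Assumed read/modify rule for a record in the Sample View."""
    if record.group is None:
        return True  # by default a new record is accessible to all users
    return record.group in user.groups


class Directory:
    """Hypothetical user/group registry; only administrators may change it."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = {}

    def _require_admin(self, actor: User) -> None:
        if not actor.is_admin:
            raise PermissionError(f"{actor.name} is not an administrator")

    def create_group(self, actor: User, group: str) -> None:
        self._require_admin(actor)
        self.members.setdefault(group, set())

    def add_member(self, actor: User, user: User, group: str) -> None:
        self._require_admin(actor)
        if group not in self.members:
            raise KeyError(f"unknown group: {group}")
        self.members[group].add(user.name)
        user.groups.add(group)


admin = User("root", is_admin=True)
alice, bob = User("alice"), User("bob")
directory = Directory()
directory.create_group(admin, "liver_study")
directory.add_member(admin, alice, "liver_study")
record = Record("S-001")
print(can_access(bob, record))  # True (record not yet assigned)
record.group = "liver_study"
print(can_access(alice, record), can_access(bob, record))  # True False
```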
Sample and clinical information
Recording detailed information on the collection, processing and storage of samples is crucial both for efficient reporting of biomedical studies and for subsequent data analysis [25]. Many patient- and sample-related variables, such as gender, age, genetic factors, drug treatment, symptoms, and dietary and family history, are particularly important in clinical proteomics studies. BioSunMS provides an efficient link between sample data and experimental results. Users can define a condition by selecting a series of fields that are used to automatically segregate spectra into groups. For example, in a typical profiling study, a condition might consist of the array type, sample type, subject class (e.g. healthy and diseased), clinical laboratory results and diagnosis. After grouping the data by condition, BioSunMS creates a table summarizing the characteristics of the patients in each group (Figure 4). Finally, the data produced by these steps can be exported for further processing outside BioSunMS using the export function of the Bioresource View.
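Grouping by condition amounts to keying each sample on the values of the selected fields. The sketch below assumes samples are plain dictionaries; the field names (`sample_type`, `status`, `age`, `sex`), the helper names and the columns of the summary table are hypothetical and need not match Figure 4.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, object]  # hypothetical: one sample's annotation fields


def group_by_condition(samples: Sequence[Sample],
                       fields: Sequence[str]) -> Dict[Tuple, List[Sample]]:
    """Segregate samples into groups keyed by the values of the chosen fields."""
    groups: Dict[Tuple, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[tuple(sample[f] for f in fields)].append(sample)
    return dict(groups)


def summarize(groups: Dict[Tuple, List[Sample]]) -> List[Dict[str, object]]:
    """One row per group with simple patient characteristics (hypothetical columns)."""
    rows = []
    for key in sorted(groups):
        members = groups[key]
        rows.append({
            "condition": key,
            "n": len(members),
            "mean_age": mean(m["age"] for m in members),
            "n_female": sum(1 for m in members if m["sex"] == "F"),
        })
    return rows


samples = [
    {"id": "s1", "sample_type": "serum", "status": "healthy", "age": 40, "sex": "F"},
    {"id": "s2", "sample_type": "serum", "status": "diseased", "age": 60, "sex": "M"},
    {"id": "s3", "sample_type": "serum", "status": "diseased", "age": 55, "sex": "F"},
]
for row in summarize(group_by_condition(samples, ["sample_type", "status"])):
    print(row)
```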
Pre-processing and visualization
BioSunMS is useful for users who need to store, process and analyze MALDI-TOF or SELDI-TOF MS data. It supports common mass spectrometry file formats such as mzML, mzXML, mzData and CSV. For mzML, the data format standard released by the HUPO-PSI and the Institute for Systems Biology in June 2008, BioSunMS reads and writes files using packages from the ProteomeCommons.org IO Framework [26] and ProteoWizard [27]. Currently, most instruments store mass spectra in vendor-specific formats. Users can convert raw files to an open format, for example mzXML, using the converter programs provided by the vendors. We have developed a plug-in that integrates CompassXport [28] to convert raw files from Bruker Corporation [29] instruments to mzXML. Other format-conversion plug-ins are under development.
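mzXML stores each spectrum's peak list in a `<peaks>` element as base64-encoded binary values. As an illustration of what a reader must do, independent of the ProteomeCommons.org IO Framework and ProteoWizard APIs, this Python sketch decodes such a payload. It assumes network (big-endian) byte order, interleaved m/z-intensity pairs, and optional zlib compression as used in mzXML 3.x; `decode_mzxml_peaks` is a hypothetical helper.

```python
import base64
import struct
import zlib
from typing import List, Tuple


def decode_mzxml_peaks(text: str, precision: int = 32,
                       compressed: bool = False) -> List[Tuple[float, float]]:
    """Hypothetical helper: decode an mzXML <peaks> payload into (m/z, intensity) pairs.

    Assumes big-endian ("network") byte order and m/z-intensity interleaving.
    """
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(text)
    if compressed:  # zlib-compressed payloads (mzXML 3.x)
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"
    count = len(raw) // (precision // 8)
    values = struct.unpack(f">{count}{code}", raw)
    # Values alternate m/z, intensity, m/z, intensity, ...
    return list(zip(values[0::2], values[1::2]))


payload = base64.b64encode(struct.pack(">4f", 1000.0, 5.0, 1001.5, 7.25)).decode()
print(decode_mzxml_peaks(payload))  # [(1000.0, 5.0), (1001.5, 7.25)]
```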
BioSunMS provides basic visualization tools and advanced processing algorithms. Baseline subtraction, smoothing, normalization and peak identification use the methods and procedures of the Bioconductor package PROcess. Our data visualization module helps analysts find interesting peaks in spectra, select them for further analysis, and visualize the selected peaks, whether they were detected automatically or manually.
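The order of the preprocessing steps can be shown with a compact pipeline. The Python sketch below deliberately swaps in simpler methods than those in PROcess: a moving-minimum baseline, a moving-average smoother, total ion current (TIC) normalization and local-maximum peak picking. All function names and default window sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-ins for the PROcess steps; not the PROcess algorithms.


def _window(y: Sequence[float], i: int, half: int) -> Sequence[float]:
    """Values within `half` points of index i, truncated at the spectrum edges."""
    return y[max(0, i - half): min(len(y), i + half + 1)]


def subtract_baseline(y: Sequence[float], half: int) -> List[float]:
    """Subtract a moving-minimum baseline; results are always >= 0."""
    return [v - min(_window(y, i, half)) for i, v in enumerate(y)]


def smooth(y: Sequence[float], half: int) -> List[float]:
    """Centered moving average."""
    out = []
    for i in range(len(y)):
        w = _window(y, i, half)
        out.append(sum(w) / len(w))
    return out


def normalize_tic(y: Sequence[float]) -> List[float]:
    """Scale so the intensities sum to 1 (total ion current normalization)."""
    total = sum(y)
    return list(y) if total == 0 else [v / total for v in y]


def find_peaks(y: Sequence[float], min_height: float) -> List[int]:
    """Indices of local maxima at or above min_height."""
    return [i for i in range(1, len(y) - 1)
            if y[i] >= min_height and y[i] > y[i - 1] and y[i] >= y[i + 1]]


def preprocess(y: Sequence[float], baseline_half: int = 5, smooth_half: int = 1,
               min_height: float = 0.05) -> Tuple[List[float], List[int]]:
    spectrum = normalize_tic(smooth(subtract_baseline(y, baseline_half), smooth_half))
    return spectrum, find_peaks(spectrum, min_height)


y = [10.0] * 21  # flat baseline with one peak at index 10
y[9] = y[11] = 20.0
y[10] = 30.0
spectrum, peaks = preprocess(y)
print(peaks)  # [10]
```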
Sample classification and sample prediction
The analysis of spectra from biological samples belonging to different subject classes (e.g. healthy and diseased) focuses on identifying disease-related discriminant features in the spectra. BioSunMS provides a wizard that, with default parameters, lets users select features using the t-test, build models using support vector machines (SVM), and predict the class of new samples. Users can compare the generalization performance of a range of classifiers by plotting their performance on the test set as ROC curves. BioSunMS also provides heat maps and hierarchical clustering. In the future, we will gradually incorporate into BioSunMS the classification system Tclass [30] and the sample class discovery system SamCluster [31], both developed by our center for gene expression profile-based analysis.
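The feature selection and evaluation steps can be sketched without the SVM itself, because ROC analysis only needs one score per test sample from whatever classifier was trained. This Python sketch assumes Welch's unequal-variance t-test (the text does not say which t-test variant is used) and labels coded as 1 (diseased) and 0 (healthy); `select_features`, `roc_points` and `auc` are hypothetical names.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple

# Hypothetical helpers; the SVM training step is omitted.


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t statistic; 0.0 when both groups are constant."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k features (peaks) with the largest |t| between classes 1 and 0."""
    ranked = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        ranked.append((abs(welch_t(pos, neg)), j))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in ranked[:k]]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) for every distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative test samples")
    ordered = sorted(zip(scores, labels), key=lambda item: -item[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ordered) and ordered[i][0] == threshold:
            tp += ordered[i][1]
            fp += 1 - ordered[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```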
R package
BioSunMS uses the R statistical programming language and several third-party packages to process and analyze MS data. Communication between BioSunMS and R is implemented in the class ncbaSpanker.System.R.RPackage, which provides methods for executing R scripts and commands. All communication with R takes place via the file system: BioSunMS writes an R script, has R execute the script file, which writes its results to an output file, and then reads that output file to retrieve the results. Because the RPackage class hides all the details of communication with R, users with little knowledge of the R language can accomplish their tasks easily. We use R as the back-end computation engine because many programs for gene expression profile-based analysis are available as R packages, and some of their algorithms can be applied directly to peptide profiling analysis. In addition, many R packages relevant to MS data analysis and visualization are available, for example GALGO [32] and caMassClass [33]; we plan to incorporate them into BioSunMS in the future.
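The file-based exchange with R follows a write, execute, read pattern. Because the methods of the Java class RPackage are not listed here, the following Python sketch only mimics that pattern: `write_job`, `read_result`, `summarize_with_r` and the file names are hypothetical, and running it for real requires the `Rscript` front end on the PATH. The `runner` parameter lets the R call be replaced, for example in tests.

```python
import csv
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict, Sequence

# Hypothetical helpers illustrating file-based communication with R.


def write_job(workdir: Path, values: Sequence[float]) -> Path:
    """Write the input data and an R script that summarizes it; return the script path."""
    data = workdir / "input.csv"
    result = workdir / "output.csv"
    with data.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["intensity"])
        writer.writerows([v] for v in values)
    script = workdir / "job.R"
    script.write_text(
        f'x <- read.csv("{data.as_posix()}")\n'
        "res <- data.frame(mean = mean(x$intensity), sd = sd(x$intensity))\n"
        f'write.csv(res, "{result.as_posix()}", row.names = FALSE)\n'
    )
    return script


def run_rscript(script: Path) -> None:
    """Execute the script with the Rscript front end (must be on PATH)."""
    subprocess.run(["Rscript", str(script)], check=True, capture_output=True)


def read_result(workdir: Path) -> Dict[str, float]:
    """Parse the single-row CSV that the R script wrote."""
    with (workdir / "output.csv").open(newline="") as f:
        row = next(csv.DictReader(f))
    return {name: float(value) for name, value in row.items()}


def summarize_with_r(values: Sequence[float],
                     runner: Callable[[Path], None] = run_rscript) -> Dict[str, float]:
    """Write the job, let R run it, and read the results back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        runner(write_job(workdir, values))
        return read_result(workdir)
```

With R installed, `summarize_with_r([1.0, 2.0, 3.0])` should return `{'mean': 2.0, 'sd': 1.0}`.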
Development environment
Here we want to emphasize an important characteristic of BioSunMS: it can be used not only as a standalone program but also as a framework for developers. Because BioSunMS is implemented on the Eclipse platform, developers can take full advantage of the editing and visualization components Eclipse provides and focus entirely on the problem at hand, and new algorithms can easily be added as plug-ins. For example, we have developed a plug-in, cn.org.biosun.knn, for sample classification using the k-nearest neighbour (kNN) method.
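The decision rule behind a kNN classifier is short. The source of cn.org.biosun.knn is not shown, so this Python sketch illustrates only the generic method, assuming Euclidean distance between peak-intensity vectors, a default of k = 3 and a nearest-neighbour tie-break; `knn_predict` is a hypothetical name.

```python
from collections import Counter
from math import dist
from typing import Hashable, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Hypothetical helper: majority vote among the k training samples nearest to the query."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    labels = [train_y[i] for i in nearest]
    votes = Counter(labels)
    top = max(votes.values())
    # Tie-break: the tied class whose member is nearest to the query wins.
    return next(label for label in labels if votes[label] == top)


train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["healthy"] * 3 + ["diseased"] * 3
print(knn_predict(train_X, train_y, [0.2, 0.3]))  # healthy
```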