A study of health effects of long-distance ocean voyages on seamen using a data classification approach
- Yunmei Lu†1,
- Yanhong Gao†2,
- Zhongbo Cao1,
- Juan Cui3, 4,
- Zhennan Dong2,
- Yaping Tian2 (corresponding author) and
- Ying Xu1, 3, 4 (corresponding author)
© Lu et al; licensee BioMed Central Ltd. 2010
- Received: 30 May 2009
- Accepted: 10 March 2010
- Published: 10 March 2010
Long-distance ocean voyages may have substantial impacts on seamen's health, possibly causing malnutrition and other illnesses. If the specific health effects of such voyages on seamen were understood in detail, these problems could possibly be prevented through special diets and special precautions taken prior to or during the voyage.
We present a computational study of 200 seamen using 41 chemistry indicators measured in blood samples collected before and after sailing. The study uses a data classification approach with a support vector machine-based classifier in conjunction with feature selection using a recursive feature elimination procedure.
Our analysis suggests that, among the 41 blood chemistry measures, nine are most likely to be affected during sailing, which provides important clues about the specific effects of ocean voyages on seamen's health.
The identification of the nine blood chemistry measures provides important clues about the effects of long-distance voyages on seamen's health. These findings should be useful for improving the living and working environment, as well as food preparation, on ships.
- Support Vector Machine
- Feature Selection
- Trained Classifier
- Recursive Feature Elimination
- Trained Support Vector Machine
Ocean-going seamen live on ships in a confined environment for long periods of time. Such an environment, often not particularly human-friendly, may cause various changes in the bodies of people who work and live there for an extended period. Seamen may experience subtle changes in physiological [2–5] and psychological functions [6, 7]. Many studies have been conducted on different aspects of maritime health. It has been reported that ocean voyages can affect the human immune system and result in various illnesses [8, 9]. Specifically, the risk of lethal ischemic heart disease (IHD) on board was found to be much higher than on land, based on an analysis of data on 124 seamen who died suddenly of myocardial infarction. Diet and the lack of physical exercise while living on a ship are believed to be the top two contributing factors to IHD. In addition, cardiovascular disease is another serious maritime health problem. We expect that these illnesses and/or changes in physiological conditions will be reflected in changes in some blood chemistry measures of the seamen. In this paper, we focus on identifying the most significantly changed chemistry measures in seamen's blood samples that we collected before and after their ocean voyage.
Statistical methods have often been used to analyze the health effects of long-distance ocean voyages on seamen [12, 13]; however, simple statistical methods are often inadequate for dealing with the complex relationships among the physiological and psychological functions under study. In this paper, we present a computational study of this type of problem using an approach different from the traditional statistical methods. We treat the identification of the health effects of ocean voyages on seamen as a classification problem, i.e., distinguishing the blood chemistry measures that are consistently affected by ocean voyage from those that are not, and apply a supervised machine learning method, specifically support vector machines (SVM), to solve it. Support vector machines have been widely used for classification problems and have been found to be particularly effective for discovering informative feature patterns in small data sets.
The identification of discriminant features (e.g., chemistry measures of blood) among pre-defined classes of objects is of fundamental and practical interest. By identifying relationships linking specific features (e.g., certain blood chemistry measures) and feature values with certain classes of objects such as diseases, one can possibly derive new insights about the disease and its development.
For our problem, we used a feature selection method called recursive feature elimination (RFE) in conjunction with the SVM-based classifier to find blood chemistry measures that show consistent and substantial changes caused by ocean voyage. RFE takes the mutual information between features into consideration during the feature selection process, and it has been shown to work better than other correlation-based methods for solving similar problems. We anticipate that our findings, namely the blood chemistry measures with the most substantial and consistent changes due to ocean voyage, will provide useful guidance for food preparation and intake during ocean voyages and for designing a healthier living and working environment on ships.
Our goal is to find a discriminant function F(x) = ω · x + b that can best separate the two subsets of the given seamen blood samples with different y labels, where ω is a weight vector and b is a bias value, both determined by finding the optimal discriminant function. For this problem, we chose an SVM approach coupled with an RFE procedure to find the optimal discriminant function F(x), as well as a subset of blood measures that show substantial and consistent changes between the two subsets.
Support vector machine
An SVM separates two classes of samples by:
- mapping the input space into a higher dimensional feature space through a kernel function, and
- constructing in this feature space two maximal margin hyperplanes to separate the mapped data samples in the higher dimensional space.
where $K(x_h, x_k)$ is the kernel function that maps a data vector in the input space to a higher dimensional feature space, and SVs are all possible support vectors on the parallel hyperplanes mentioned above. For further details of support vector machines, we refer the reader to the SVM references listed at the end of this article.
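To make the decision rule concrete, the following is a minimal sketch of how a trained kernel SVM labels one sample, using the standard textbook form F(x) = sign(Σ α_k y_k K(x_k, x) + b) over the support vectors and the Gaussian radial basis kernel reported for this study. The multipliers `alphas`, the bias `b`, the value of `gamma` and the function names are illustrative assumptions, not values or code from the study.

```python
import math

def rbf_kernel(x_h, x_k, gamma):
    """Gaussian radial basis kernel: exp(-gamma * ||x_h - x_k||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_h, x_k))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, b, gamma):
    """Return +1 or -1 for sample x from an already trained kernel SVM.

    support_vectors, labels and alphas describe the trained model;
    they are assumed inputs here, since training is not shown.
    """
    score = b
    for x_k, y_k, a_k in zip(support_vectors, labels, alphas):
        score += a_k * y_k * rbf_kernel(x_k, x, gamma)
    return 1 if score >= 0 else -1

# Toy model with two support vectors (made-up numbers, not study data).
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
pred_near_negative = svm_decision([0.1, 0.0], svs, ys, alphas, b=0.0, gamma=0.5)
pred_near_positive = svm_decision([1.9, 2.0], svs, ys, alphas, b=0.0, gamma=0.5)
```

In this toy model each query point receives the label of the support vector it lies closest to, because the Gaussian kernel decays quickly with distance.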
Recursive feature selection
- train a classifier using an SVM;
- compute the ranking for all features, based on some pre-defined criteria, and
- remove the feature with the lowest ranking.
where H is the matrix with elements $y_h y_k K(x_h, x_k)$, H(-i) is H with the ith feature removed, and K is a kernel function that measures the similarity between $x_h$ and $x_k$. We used a Gaussian radial basis kernel function, $K(x_h, x_k) = \exp(-\gamma \|x_h - x_k\|^2)$, in this study.
In this procedure, features are removed one at a time, and the SVM is then retrained to update the ranking of the remaining features. It should be noted that the top-ranked features are not necessarily the ones that are individually most relevant to the classification performance, since the relevance of each feature is evaluated in the context of the other features, i.e., the mutual information among features, in terms of their collective discerning power, is considered.
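The ranking step can be sketched as follows. This is a hedged illustration that assumes the criterion takes the form given by Guyon et al. (see the reference list), DJ(i) = ½ αᵀHα − ½ αᵀH(-i)α, with the multipliers α held fixed while each feature is removed in turn. The SVM training that produces `alphas` is not shown, and all names and toy numbers are hypothetical.

```python
import math

def rbf_kernel(x_h, x_k, gamma):
    """Gaussian radial basis kernel: exp(-gamma * ||x_h - x_k||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_h, x_k))
    return math.exp(-gamma * sq_dist)

def cost(alphas, ys, X, gamma):
    """0.5 * alpha^T H alpha, where H[h][k] = y_h * y_k * K(x_h, x_k)."""
    n = len(X)
    total = 0.0
    for h in range(n):
        for k in range(n):
            total += (alphas[h] * alphas[k] * ys[h] * ys[k]
                      * rbf_kernel(X[h], X[k], gamma))
    return 0.5 * total

def rank_features(alphas, ys, X, gamma):
    """Score each feature by DJ(i) and return (order, scores).

    order lists feature indices from lowest to highest score, so
    order[0] is the feature one RFE iteration would remove.
    """
    full = cost(alphas, ys, X, gamma)
    n_features = len(X[0])
    scores = []
    for i in range(n_features):
        X_minus_i = [row[:i] + row[i + 1:] for row in X]  # drop feature i
        scores.append(full - cost(alphas, ys, X_minus_i, gamma))
    order = sorted(range(n_features), key=lambda i: scores[i])
    return order, scores

# Toy data: feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 1.0], [0.2, 1.0], [2.0, 1.0], [2.2, 1.0]]
ys = [-1, -1, 1, 1]
alphas = [0.5, 0.5, 0.5, 0.5]  # assumed output of a trained SVM
order, scores = rank_features(alphas, ys, X, gamma=0.5)
```

In the toy data the constant feature changes nothing when removed, so it receives a score of zero and is eliminated first. A full RFE run would retrain the SVM after each removal and repeat the ranking on the remaining features.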
Seamen's Blood Chemistry Data
This study involved 200 seamen, all healthy Han Chinese males aged 19 to 38 (mean 25.2 ± 4.9). Before and after a 3-month voyage, 5 milliliters of venous blood was collected from each of the 200 seamen after 12 hours of fasting and then centrifuged for 5 minutes. Serum was collected and analyzed for 41 chemistry measures (listed in Table 1) using a HITACHI 7600 modular fully automatic biochemical analyzer. (The seamen blood chemistry data is available upon request.) After removing samples with any missing information, a total of 170 pre-sailing and 170 post-sailing samples with complete blood chemistry measures were compiled. In our computational study, each seaman sample is represented as a feature vector consisting of 41 blood chemistry measures.
41 Blood chemistry measures used in this study
Generation of training and test Datasets
Our goal is to identify a subset of the 41 blood chemistry measures that show consistent and significant changes between the seamen's blood samples collected before and after sailing. We first split the whole dataset randomly into two subsets, one for training and another for testing. The training set is used to select features (blood chemistry measures) and to find the right feature weights so that an optimal separating hyperplane between the two labeled subsets can be derived. The test set is used to evaluate how well the trained SVM generalizes, where the evaluation criterion for the trained SVM is the sign function in Eq.(6).
We first pool all the blood chemistry data, both pre- and post-sailing, into one set while keeping the "pre-sailing" or "post-sailing" label (-/+) for each vector, and then randomly assign the samples to a training set and a test set. A key to establishing a good training set is that it should capture all the variety present in our seamen samples. To accomplish this, we used multiple training sets with their associated test sets to assess each trained classifier, and at the end built a combined classifier from all the individually trained classifiers (each based on a specific training set) using a majority-rule vote. To obtain good training sets through random partition of the original dataset, we considered seven different ratios between the numbers of samples in the training and test sets: 1:1, 1.5:1, 2:1, 2.5:1, 3:1, 3.5:1 and 4:1. In total, we generated 300 pairs of training and test sets, and trained a classifier for each of the 300 training sets. Note that samples with the same or similar observed values but different pre- and post-sailing labels are considered noise that affects performance. We checked our dataset to ensure that no two vectors have conflicting labels.
All participants in this study signed the Informed Consent Form, and the study was approved by the Medical Ethics Committee of the Chinese PLA General Hospital, China.
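The random partitioning described above can be sketched as follows. The text does not say how the 300 pairs were distributed across the seven ratios, so cycling through the ratios in turn is an assumption made only for illustration, as are the function names, the seed and the placeholder sample identifiers.

```python
import random

# Training:test ratios named in the text (1:1 up to 4:1).
RATIOS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

def split_once(samples, ratio, rng):
    """Shuffle a copy of samples and cut it so that train:test is about ratio:1."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * ratio / (ratio + 1.0))
    return shuffled[:n_train], shuffled[n_train:]

def make_splits(samples, n_pairs=300, seed=0):
    """Generate n_pairs (train, test) pairs, cycling through RATIOS (assumed scheme)."""
    rng = random.Random(seed)
    pairs = []
    for i in range(n_pairs):
        ratio = RATIOS[i % len(RATIOS)]
        pairs.append(split_once(samples, ratio, rng))
    return pairs

# Placeholder dataset: 170 pre-sailing (-1) and 170 post-sailing (+1) samples.
samples = [("s%d" % i, -1 if i < 170 else 1) for i in range(340)]
pairs = make_splits(samples)
```

With 340 labeled vectors, a 1:1 ratio gives 170 training and 170 test samples, while a 4:1 ratio gives 272 and 68.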
A ranking list by one classifier trained on one specific training dataset
Majority-rule for selecting important features
As discussed above, different ways of partitioning the original dataset into training and test sets may lead to (somewhat) different performance by the trained classifier. We generated 300 pairs of training and test sets, and trained an SVM-based classifier for each training set. To decide on the final subset of features for the best classifier, we used a majority-rule voting strategy. The premise of this strategy is that the intrinsically important features will always be chosen by the best trained classifier, independent of the specific sampling. The majority-rule voting process is as follows: for each of the 41 features, we count the number of times the feature is among the remaining features for the ith training set, i = 1, 2, ..., 300. For example, feature MAO (32) is present in all 300 subsets, so its count is 300, while the count of feature PHOS (17) is 286. After obtaining the count for each of the 41 features, we re-rank all the selected features based on this count, as shown in the "Count" column of Table 3.
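The counting step of the majority-rule vote can be sketched in a few lines. The three feature subsets below are made-up toy inputs standing in for the 300 subsets retained by the trained classifiers. They are not results from the study, and the function names are hypothetical.

```python
from collections import Counter

def count_feature_votes(selected_subsets):
    """Count in how many subsets each feature survives feature elimination."""
    counts = Counter()
    for subset in selected_subsets:
        counts.update(set(subset))  # each subset votes at most once per feature
    return counts

def rank_by_votes(selected_subsets):
    """Return (feature, count) pairs, highest count first, ties broken by name."""
    counts = count_feature_votes(selected_subsets)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy input: features retained by three hypothetical classifiers.
subsets = [["MAO", "PHOS", "Ca"], ["MAO", "Ca"], ["MAO", "PHOS"]]
ranking = rank_by_votes(subsets)
```

A feature kept by every classifier ends up at the top of the ranking, which mirrors how a count of 300 marks a feature used in all classifiers.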
Most commonly used features and associated prediction accuracy
Count (percentage %)
Features used in all classifiers
Comparison with t-test-based feature selection
Blood measures chosen by the t-test with p-value <= 0.05
Before sailing       After sailing
0.85 ± 0.21          0.36 ± 0.18
1.17 ± 0.17          1.39 ± 0.16
186.02 ± 37.99       158.20 ± 28.07
81.64 ± 20.44        74.34 ± 18.46
2.54 ± 0.13          2.44 ± 0.10
9.77 ± 4.76          6.20 ± 3.25
52.19 ± 2.49         50.47 ± 2.42
4.05 ± 0.33          4.35 ± 0.36
11.5 ± 2.23          10.38 ± 2.09
14.15 ± 10.66        11.63 ± 9.11
4.12 ± 0.64          3.90 ± 0.69
144.50 ± 2.74        142.86 ± 2.47
2.56 ± 1.22          2.87 ± 1.42
32.21 ± 9.89         36.50 ± 10.84
2.03 ± 0.51          1.91 ± 0.53
169.57 ± 29.52       181.18 ± 23
3.62 ± 0.46          4.26 ± 1.97
169.57 ± 29.52       181.18 ± 23
54.99 ± 8.68         56.99 ± 7.56
79.82 ± 5.08         78.02 ± 5
0.85 ± 0.2           0.82 ± 0.22
22.79 ± 7.02         20.49 ± 6.94
22.38 ± 5.07         21.14 ± 4.97
2.43 ± 0.34          2.36 ± 0.36
171.81 ± 117.62      142.78 ± 100.49
27.56 ± 4.34         28.60 ± 4.38
18.45 ± 8.68         19.98 ± 10.68
79.44 ± 9.34         78.40 ± 8.23
6.94 ± 2.07          7.47 ± 3.45
We compared the ranked lists of features produced by SVM-RFE and by the paired t-test, and found that seven of the top nine features are common to both lists. The difference is mostly due to the way features are selected in the two methods: SVM-RFE uses more global information, whereas the paired t-test relies solely on individual features, independent of the others. We believe that the substantially smaller set of features selected by our method, compared to the other methods, provides a more focused and informative subset of features for further studies.
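For comparison with the SVM-RFE ranking, the per-feature paired t-test can be illustrated with a short sketch that computes the t statistic from matched before and after values of a single measure. The numbers are toy values, not study data, and the function name is hypothetical. Converting the statistic into a p-value additionally requires the t distribution, for example through a statistics library such as SciPy.

```python
import math

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for matched before/after measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_d == 0.0:
        raise ValueError("all paired differences are identical")
    return mean_d / math.sqrt(var_d / n), n - 1

# Toy values for one blood measure in four subjects.
before = [10.0, 12.0, 9.0, 11.0]
after = [11.0, 14.0, 12.0, 13.0]
t_stat, dof = paired_t_statistic(before, after)
```

Because the test is applied to one measure at a time, it cannot account for interactions between measures, which is the contrast with SVM-RFE drawn above.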
Comparison of blood chemistry measures before and after sailing
Before sailing       After sailing
0.85 ± 0.21          0.36 ± 0.18
1.17 ± 0.17          1.39 ± 0.16
9.77 ± 4.76          6.20 ± 3.25
2.54 ± 0.13          2.44 ± 0.10
169.57 ± 29.52       181.18 ± 23
186.02 ± 37.99       158.20 ± 28
4.05 ± 0.33          4.35 ± 0.36
144.49 ± 2.74        142.86 ± 2.47
52.19 ± 2.49         50.47 ± 2.42
Creatine kinase (CK) is typically present in the cytoplasm and mitochondria of organs such as the heart, muscle and brain. It is directly related to cellular energy conversion, muscle contraction and the regeneration of ATP, and it reversibly catalyzes the phosphoryl transfer reaction between creatine and ATP. We found that after the ocean voyage, the mean value of CK fell to 142.46 U/L from the 171.68 U/L measured before sailing. This is probably due to the seamen's lack of physical exercise, which reduced the energy demand of the body's muscles and, in turn, the level of CK. The mean value of CKMB, one of the CK isoenzymes, fell to 6.2 U/L from 9.73 U/L, following the same trend as CK. Lactate dehydrogenase (LDH) is a key enzyme in the glycolytic process. It exists in virtually all organs, particularly the liver, kidney, cardiac muscle, skeletal muscle, pancreas and lung. Under anaerobic conditions, NAD+ is regenerated by the reaction in which LDH catalyzes the conversion of pyruvic acid to lactic acid; LDH can also catalyze the conversion of lactic acid to pyruvic acid, with hydrogen being transferred to its coenzyme to form NADH. The mean value of LDH changed from 185.87 U/L pre-sailing to 158.15 U/L post-sailing. This could be due to the reduced intensity of the seamen's physical exercise during the voyage, which lowered the energy supplied by the glycolytic process, as distinct from the body's normal aerobic metabolism.
Albumin (ALB) is made by the liver. Decreased serum albumin may indicate liver disease, or kidney disease that allows albumin to escape into the urine. Decreased ALB could also be explained by malnutrition or a low-protein diet. Altered ALB levels often suggest changed liver metabolism. Increased protein intake may be needed in the seamen's diet during ocean voyages.
Fructosamine (FRUC) is a substance formed from plasma protein during the life cycle of glucose. Since the half-life of plasma protein is 17 days, the measured FRUC level reflects the blood glucose level generated by the food taken in the 1-3 weeks prior to the measurement. Hence the observed change of FRUC from 169.74 μmol/L to 181.41 μmol/L after sailing probably reflects the type of food taken during the ocean voyage, which suggests that seamen should take in less sugar during future voyages.
The observed changes in the levels of the inorganic ions Ca, PHOS, K and Na may be caused by electromagnetic radiation and stress during sailing. Though they cannot be used to diagnose specific diseases, their decreased levels generally indicate a poor state of health. Among the measured electrolytes, the level of serum sodium (Na) decreased noticeably, possibly because sodium was lost through sweating without being adequately replenished. In addition, calcium (Ca) is lacking in the seamen's diet, so special measures should be adopted in their food preparation.
The possible effects of long-distance ocean voyages on seamen's health have received increasing attention in recent years. Living on ships in confined environments, ocean-going seamen may suffer from various health problems due to abnormal electromagnetic radiation, large temperature changes and poor diet, which may cause subtle changes in their physiological and psychological functions. In this study, we used an SVM-RFE approach to identify important blood chemistry measures with significant and consistent changes before and after a voyage. A number of features were identified as having such changes, including MAO, PHOS, CK-MB, Ca and FRUC. Their identification provides important clues about how ocean voyages may affect seamen's health. Our findings could provide useful guidance for making necessary changes to seamen's living environments, food preparation and exercise routines.
YML and ZBC would like to thank Professors Chunguang Zhou and Yanchun Liang for their support and encouragement during this research project, and also Dr. Yan Wang and You Zhou for their help and advice. The authors are grateful for the support of the NSFC (60873146, 60903097) and the National High-Tech R&D Program of China (863) (grant 2009AA02Z307). This study is also supported by the Ministry of Science and Technology of China (2006FY230300). The work by JC and YX is supported in part by the US National Science Foundation (NSF/DBI-0354771, NSF/ITR-IIS-0407204, NSF/CCF-0621700, NSF/DBI-0542119).
- Zhang RP, Sun XC, Zhang B: Advance in Research of Effects of Ship Environment on Seamen. Prev Med Chin PLA. 2006, 24 (2): 149-151.
- Kamada T, Iwata N, Kojima Y: Analyses of neurotic symptoms and subjective symptoms of fatigue in seamen during a long voyage. Sangyo Igaku. 1990, 32 (6): 461-469.
- Myznikov IL, Shcherbina: Central hemodynamics in seamen during trans-latitudinal voyage. Gig Sanit. 2004, 1: 34-37.
- Luger TJ, Giner R, Lorenz IH: Cardiological monitoring of sailors via offshore Internet connection. J Sports Med Phys Fitness. 2001, 41 (4): 486-490.
- Shcherbina FA, Myznikov IL: Parameters of central hemodynamics in sailors on voyage cruises of varying length. Aviakosm Ekolog Med. 2000, 34 (4): 67-68.
- Leka S: Psychosocial hazards and seafarer health: priorities for research and practice. International Maritime Health. 2004, 55 (1-4): 137-153.
- Comperatore CA, Rivera PK, Kingsley L: Enduring the ship-board stressor complex: a systems approach. Aviation, Space, and Environmental Medicine. 2005, 76 (6, Suppl): B108-118.
- Protasov VV, Slezinger VM, Antiukhova MP: The dynamics of anti-influenza immunity in sailors of the Baltic Fleet. Voen Med Zh. 1996, 317 (9): 33-34.
- Myznikov IL, Makhrov MG, Rogovanov DIu: Morbidity in seamen during long voyages according to the results of long-term studies. Voen Med Zh. 2000, 321 (7): 60-63.
- Serdechnaia EV, Kazakevich EV, Popov VV: Myocardial infarction and sudden cardiac death in seamen of the north shipline. Klin Med (Mosk). 1999, 77 (11): 19-21.
- Filikowski J, Rzepiak M, Renke W: Selected risk factors of ischemic heart disease in Polish seafarers. International Maritime Health. 2003, 54 (1-4): 40-46.
- Abo J, KoikeYoshio: Lone-term Voyages and Bone Mass Among Seamen. The Report of Tokyo University of Fisheries. 2003, 39: 25-33.
- Gao YH, Yu QL: Effects of different work environment on the immune function of naval personel. Prev Med Chin PLA. 2008, 33 (2): 226-228.
- Guyon I, Weston J, Barnhill S: Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning. 2002, 46: 389-422. 10.1023/A:1012487302797.
- Vapnik V: The nature of statistical learning theory. Springer Verlag. 1995.
- Gunn SR: Support Vector Machines for Classification and Regression. Technical Report. 1998.
- Mao Y, Zhou XB, Pi DY, Sun YX, Wong STC: Multi-Class cancer classification by using fuzzy support vector machine and binary decision tree with gene selection. Biomed Biotechnol. 2005, 2: 160-171. 10.1155/JBB.2005.160.
- Mao Y, Pi DY, Liu YM, Sun YX: Accelerated Recursive Feature Elimination Based on Support Vector Machine for Key Variable Identification. Chinese J Chem Eng. 2006, 14 (1): 65-72. 10.1016/S1004-9541(06)60039-6.
- John GH, Kohavi R, Pfleger K: Irrelevant Features and the Subset Selection Problem. Proc of 11th National Conf on Machine learning, New Brunswick. Edited by: Cohen WW, Hirsh H. 1994, 121-129.
- Kira K, Rendell LA: The feature selection problem: Traditional methods and a new algorithm. Proc of 9th National Conf on AI, San Jose. Edited by: William RS. 1992, 129-134.
- Medical Encyclopedia. [http://www.nlm.nih.gov/medlineplus/encyclopedia.html]
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/10/13/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.