Table 1 Dataset description

From: Application of machine learning techniques for predicting survival in ovarian cancer

| Feature | Domain of values | Type of feature |
| --- | --- | --- |
| County | 165 different values | Categorical |
| Histologic type ICD-O-3 | 150 different values | Categorical |
| Laterality | 1. Bilateral, single primary; 2. Paired site, but no information concerning laterality; 3. Right - origin of primary; 4. Left - origin of primary; 5. Only one side - side unspecified | Categorical |
| Radiation sequence with surgery | 1. No radiation and/or cancer-directed surgery; 2. Radiation after surgery; 3. Radiation prior to surgery; 4. Sequence unknown, but both were given; 5. Radiation before and after surgery; 6. Intraoperative radiation | Categorical |
| Reason no cancer-directed surgery | 1. Surgery performed; 2. Not recommended; 3. Recommended but not performed, unknown reason; 4. Not recommended, contraindicated due to other cond; autopsy only; 5. Not performed, patient died prior to recommended surgery | Categorical |
| Sequence number | 1. One primary only; 2. 1st of 2 or more primaries | Categorical |
| Race recode | 1. White; 2. Black; 3. Asian or Pacific Islander; 4. American Indian/Alaska Native | Categorical |
| Marital status at diagnosis | 1. Married (including common law); 2. Widowed; 3. Single (never married); 4. Divorced; 5. Separated; 6. Unmarried or Domestic Partner | Categorical |
| PRCDA region | 1. Pacific Coast; 2. East; 3. Northern Plains; 4. Southwest; 5. Alaska | Categorical |
| Summary stage | 1. Distant; 2. Regional; 3. Localized | Categorical |
| Insurance recode | 1. Insured; 2. Insured/No specifics; 3. Any Medicaid; 4. Insurance status unknown; 5. Uninsured | Categorical |
| CS site-specific factor 1 | 6 different numeric values (Mean: 509, Standard deviation: 535.71, Range: 10–999) | Numerical |
| Year of diagnosis | 17 different years (Range: 2000–2016) | Numerical |
| Age at diagnosis | 100 different ages (Range: 0–113) | Numerical |
| Chemotherapy recode | 1. yes; 2. no | Categorical |
| Rural-Urban continuum code | 1. Counties in metropolitan areas GE 1 million pop; 2. Counties in metropolitan areas of 250,000 to 1 million pop; 3. Counties in metropolitan areas of LT 250 thousand pop; 4. Urban pop of 2,500 to 19,999, adjacent to a metro area; 5. Urban pop of 2,500 to 19,999, not adjacent to a metro area; 6. Urban pop of GE 20,000 adjacent to a metropolitan area; 7. Urban pop of GE 20,000 not adjacent to a metropolitan area; 8. Comp rural LT 2,500 urban pop, not adjacent to metro area; 9. Comp rural LT 2,500 urban pop, adjacent to a metro area | Categorical |
| Grade | 1. Well differentiated; Grade I; 2. Moderately differentiated; Grade II; 3. Poorly differentiated; Grade III; 4. Undifferentiated; anaplastic; Grade IV | Categorical |
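
The split between categorical and numerical features in Table 1 matters when a record is turned into model input. The following is a minimal sketch, assuming one-hot encoding for categorical features and pass-through for numerical ones. That is one common choice and not necessarily the preprocessing the authors used. The names `FEATURES`, `one_hot`, and `encode` are hypothetical, and only four of the features from the table are included for brevity.

```python
# Hypothetical schema covering a few rows of Table 1.
# A list holds the domain of a categorical feature; None marks a numerical one.
FEATURES = {
    "Summary stage": ["Distant", "Regional", "Localized"],
    "Chemotherapy recode": ["yes", "no"],
    "Year of diagnosis": None,
    "Age at diagnosis": None,
}


def one_hot(feature, value):
    """Return a one-hot list for a value of a categorical feature."""
    domain = FEATURES[feature]
    if domain is None:
        raise ValueError(f"{feature!r} is numerical, not categorical")
    if value not in domain:
        raise ValueError(f"{value!r} is not in the domain of {feature!r}")
    return [1 if value == level else 0 for level in domain]


def encode(record):
    """Flatten a record (dict) into a numeric vector, in FEATURES order."""
    vector = []
    for feature, domain in FEATURES.items():
        if domain is None:
            # Numerical features are passed through unchanged (as floats).
            vector.append(float(record[feature]))
        else:
            vector.extend(one_hot(feature, record[feature]))
    return vector


example = {
    "Summary stage": "Regional",
    "Chemotherapy recode": "yes",
    "Year of diagnosis": 2010,
    "Age at diagnosis": 63,
}
encoded = encode(example)
```

High-cardinality features such as County (165 values) or Histologic type (150 values) would produce very wide one-hot vectors, so a different encoding may be preferable for them in practice.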