Table 1 Internal clinical genomics data model for genetic test result curation

From: Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

Field_Name | Description | Allowed_Values | Example | Completeness
--- | --- | --- | --- | ---
Hugo_Symbol | HUGO gene symbol | String | EGFR | 75% for Positive, 20.8% for VUS
Ensemble_Gene_ID | Ensembl gene ID | String starting with prefix "ENSG" | ENSG00000146648 | 0%
Transcript_ID | Transcript ID | String starting with prefix "NM" or "NR" | NM_005228 | 0%
De_sample_ID | De-identified sample ID | String | MCM123 | -
Pathogeneity_Report_Date | Date of initial genetic report | String of mm/dd/yyyy | 2/12/2006 | 0%
Variant_Type | Type of variant | String of "SNP", "INDEL", "CNV", or "Rearrangement" | SNP | 50%
Variant_Source | Somatic or germline | String of "somatic" or "germline" | germline | 26.4%
Variant_Pathogenicity | Initially reported pathogenicity | String of "actionable", "pathogenic", or "VUS" | pathogenic | 43.8%
Variant_Classification | | Selected from the following strings: Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Missense_Mutation, Nonsense_Mutation, Silent, Splice_Site, Translation_Start_Site, Nonstop_Mutation, 3′UTR, 3′Flank, 5′UTR, 5′Flank, IGR, Intron, RNA, Targeted_Region, De_novo_Start_InFrame, De_novo_Start_OutOfFrame | Missense_Mutation | 6.3%
HGVS_Short | HGVS nomenclature for cDNA and amino acid change | String following HGVS nomenclature that denotes the protein amino acid change | p.Arg149Trp | 25%
NCBI_Build | Genome Reference Consortium build | | "GRCh37" | 0%
Chromosome | Chromosome of event | String of "1"-"22", "X", "Y", or "M" | "7" | 0%
Start_Position | Start position of event | Numerical | | 0%
End_Position | End position of event | Numerical | | 0%
Strand | Strand on which the mutation is reported | Character "+" or "−" | "+" | 0%
Variant_Allele_Freq | Variant allele frequency in the sample, as a percentage | Numerical between 0 and 100 | 30 | 0%
BP_Coverage | Base-pair coverage | Numerical | 270 | 0%
Variant_Pathogenicity_Updated | Updated pathogenicity | | | -
Pathogeneity_Update_Date | Date of pathogenicity update | | | -