An explainable CNN approach for medical codes prediction from clinical text

Table 1 Table of Notations

Notation	Description
\({\mathcal {L}}\)	The set of ICD-9 codes
\(y_{i,\ell }\in \{0,1\}\)	The ground-truth value of label \(\ell \in {\mathcal {L}}\) for instance i; 1 indicates that label \(\ell\) applies to instance i
\(d_e\)	The size of the input embedding
\(d_c\)	The size of the convolution output, i.e., the number of convolution filters
\({\mathbf {X}}=[{\mathbf {x}}_1,{\mathbf {x}}_2,\ldots ,{\mathbf {x}}_N]\)	The matrix of a document instance, where \(N\) is the length of the document and \({\mathbf {x}}_i\) is the \(d_e\)-dimensional vector representation of the i-th word
\({\mathbf {W}}_c\in {\mathbb {R}}^{k\times d_e\times d_c}\)	Convolution filters, where k is the width of the filter window
\({\mathbf {H}}\in {\mathbb {R}}^{d_c\times N}\)	Convolutional representation of the document
\(*\)	Convolution operator
\(g\)	An element-wise nonlinear transformation
\({\mathbf {b}}_c\in {\mathbb {R}}^{d_c}\)	The bias of the convolution operation
\({\mathbf {u}}_\ell \in {\mathbb {R}}^{d_c}\)	Attention parameter vector for label \(\ell\)
\(\varvec{\alpha }_\ell \in {\mathbb {R}}^N\)	Attention weight vector for label \(\ell\)
\(b_\ell\)	Scalar offset (bias) in the linear layer for label \(\ell\)
\(\varvec{\beta }_\ell \in {\mathbb {R}}^{d_c}\)	Vector of prediction weights for label \(\ell\)
\(\sigma\)	Sigmoid function
\({\text {SoftMax}}(\cdot )\)	\({\text {SoftMax}}({\mathbf {x}}) = \frac{\exp ({\mathbf {x}})}{\sum _{i} \exp (x_{i})}\), where \(\exp ({\mathbf {x}})\) is the element-wise exponentiation of the vector \({\mathbf {x}}\)
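Table 1 lists the symbols but does not show how they combine. Assuming they follow the label-wise attention CNN formulation of CAML (Mullenbach et al., 2018), which uses closely matching notation, the forward pass would be \(\mathbf{H} = g(\mathbf{W}_c * \mathbf{X} + \mathbf{b}_c)\), \(\boldsymbol{\alpha}_\ell = \text{SoftMax}(\mathbf{H}^\top \mathbf{u}_\ell)\), \(\mathbf{v}_\ell = \mathbf{H}\boldsymbol{\alpha}_\ell\) and \(\hat{y}_\ell = \sigma(\boldsymbol{\beta}_\ell^\top \mathbf{v}_\ell + b_\ell)\). The NumPy sketch below illustrates that assumed arrangement. The intermediate \(\mathbf{v}_\ell\), the prediction \(\hat{y}_\ell\), the choice \(g = \tanh\), the zero padding that keeps \(\mathbf{H}\) at length \(N\), and the helper names `conv_layer` and `label_scores` are assumptions for illustration, not details taken from this paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # SoftMax(x) = exp(x) / sum_i exp(x_i); subtracting the max avoids overflow
    # without changing the result.
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_layer(X, W_c, b_c):
    """Hypothetical helper: H = g(W_c * X + b_c), assuming g = tanh.

    X: (d_e, N), W_c: (k, d_e, d_c), b_c: (d_c,)  ->  H: (d_c, N).
    Zero padding (an assumption) keeps the output length equal to N.
    """
    k, _, d_c = W_c.shape
    N = X.shape[1]
    left = k // 2
    Xp = np.pad(X, ((0, 0), (left, k - 1 - left)))
    H = np.empty((d_c, N))
    for n in range(N):
        window = Xp[:, n:n + k]  # (d_e, k): the k word vectors around position n
        # Sum over embedding dim e and window offset j for every filter c.
        H[:, n] = np.einsum("ej,jec->c", window, W_c) + b_c
    return np.tanh(H)

def label_scores(H, U, beta, b):
    """Hypothetical helper for per-label attention and prediction.

    H: (d_c, N), U: (L, d_c) with rows u_l, beta: (L, d_c) with rows beta_l, b: (L,).
    Returns y_hat: (L,) probabilities and alpha: (L, N) attention weights.
    """
    alpha = softmax(U @ H, axis=1)  # row l is SoftMax(H^T u_l)
    V = alpha @ H.T  # row l is v_l = H alpha_l
    y_hat = sigmoid(np.sum(beta * V, axis=1) + b)  # sigma(beta_l^T v_l + b_l)
    return y_hat, alpha

rng = np.random.default_rng(0)
# Toy sizes chosen arbitrarily; L stands in for |L|, the number of ICD-9 codes.
d_e, d_c, k, N, L = 8, 4, 3, 10, 5
X = rng.normal(size=(d_e, N))
W_c = rng.normal(size=(k, d_e, d_c))
b_c = np.zeros(d_c)
U = rng.normal(size=(L, d_c))
beta = rng.normal(size=(L, d_c))
b = np.zeros(L)

H = conv_layer(X, W_c, b_c)
y_hat, alpha = label_scores(H, U, beta, b)
print(H.shape, alpha.shape, y_hat.shape)  # (4, 10) (5, 10) (5,)
```

In this sketch each row of `alpha` is a distribution over the \(N\) word positions, so inspecting it (for example, `alpha[l].argmax()`) shows which position receives the most attention for label \(\ell\). This is one way attention weights can support explanations; the excerpt does not specify the paper's exact explanation procedure.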

ISSN: 1472-6947