From: An explainable CNN approach for medical codes prediction from clinical text
Notation | Description |
---|---|
\({\mathcal {L}}\) | The set of ICD-9 codes |
\(y_{i,\ell }\in \{0, 1\}\) | The ground-truth value of label \(\ell \in {\mathcal {L}}\) for instance i; 1 indicates that label \(\ell\) applies to instance i |
\(d_e\) | The size of the input embedding |
\(d_c\) | The size of the convolution output, a.k.a. the number of convolution filters |
\({\mathbf {X}}=[{\mathbf {x}}_1,{\mathbf {x}}_2,\ldots ,{\mathbf {x}}_N]\) | The matrix representation of a document instance, where \(N\) is the length of the document in words and \({\mathbf {x}}_i \in {\mathbb {R}}^{d_e}\) is the vector representation of the i-th word |
\({\mathbf {W}}_c\in {\mathbb {R}}^{k\times d_e\times d_c}\) | Convolution filters, where k is the width of the filter window |
\({\mathbf {H}}\in {\mathbb {R}}^{d_c\times N}\) | Convolutional representation of the document |
\(*\) | Convolution operator |
\(g\) | An element-wise nonlinear transformation (activation function) |
\({\mathbf {b}}_c\in {\mathbb {R}}^{d_c}\) | The bias of the convolution operation |
\({\mathbf {u}}_\ell \in {\mathbb {R}}^{d_c}\) | Attention parameter vector for label \(\ell\) |
\(\varvec{\alpha }_\ell \in {\mathbb {R}}^N\) | Attention weight vector for label \(\ell\) over the N word positions |
\(b_\ell\) | Scalar offset (bias) of the output linear layer for label \(\ell\) |
\(\varvec{\beta }_\ell \in {\mathbb {R}}^{d_c}\) | Vector of prediction weights for label \(\ell\) |
\(\sigma\) | Sigmoid function |
\({\text { SoftMax }}()\) | \({\text{SoftMax}}({\mathbf{x}}) = \frac{{\exp ({\mathbf{x}})}}{{\sum\nolimits_{i} {\exp (x_{i} )} }}\), where \({\text{exp }}({\mathbf {x}})\) is the element-wise exponentiation of the vector \({\mathbf {x}}\) |
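To show how the symbols in this table fit together, here is a minimal NumPy sketch of a label-wise attention CNN forward pass. The table does not spell out the equations, so the following are assumptions: \(g\) is taken to be tanh; the convolution uses zero padding so that \({\mathbf {H}}\) keeps N columns; \(\varvec{\alpha }_\ell = {\text{SoftMax}}({\mathbf {H}}^\top {\mathbf {u}}_\ell)\); the label-specific document vector (called `V` here, a hypothetical name not in the table) is \({\mathbf {H}}\varvec{\alpha }_\ell\); the prediction is \(\sigma(\varvec{\beta }_\ell^\top {\mathbf {H}}\varvec{\alpha }_\ell + b_\ell)\); and training uses binary cross-entropy against \(y_{i,\ell}\). Function names are illustrative only.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """SoftMax(x) = exp(x) / sum_i exp(x_i).
    Subtracting max(x) leaves the result unchanged but avoids overflow."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def conv1d(X: np.ndarray, W_c: np.ndarray, b_c: np.ndarray) -> np.ndarray:
    """W_c * X + b_c over word positions.
    X: (d_e, N), W_c: (k, d_e, d_c), b_c: (d_c,) -> (d_c, N).
    Assumption: zero padding so the output length stays N; like most
    deep-learning libraries this computes cross-correlation."""
    k, _, d_c = W_c.shape
    N = X.shape[1]
    left = k // 2
    Xp = np.pad(X, ((0, 0), (left, k - 1 - left)))
    out = np.empty((d_c, N))
    for n in range(N):
        window = Xp[:, n:n + k]                       # (d_e, k) words n..n+k-1
        out[:, n] = np.einsum("ek,kec->c", window, W_c) + b_c
    return out

def forward(X, W_c, b_c, U, B, b):
    """U: (L, d_c) rows u_l; B: (L, d_c) rows beta_l; b: (L,) offsets b_l.
    Returns y_hat (L,) and attention weights alphas (L, N)."""
    H = np.tanh(conv1d(X, W_c, b_c))                  # H = g(W_c * X + b_c), g assumed tanh
    alphas = np.stack([softmax(H.T @ u) for u in U])  # alpha_l = SoftMax(H^T u_l)
    V = alphas @ H.T                                  # row l is H alpha_l, shape (L, d_c)
    logits = np.sum(B * V, axis=1) + b                # beta_l^T (H alpha_l) + b_l
    y_hat = 1.0 / (1.0 + np.exp(-logits))             # sigma, the sigmoid
    return y_hat, alphas

def bce_loss(y_hat: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross-entropy summed over labels (assumed training objective)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

# Usage with random parameters: 4 labels, a 10-word document.
rng = np.random.default_rng(0)
d_e, d_c, k, N, L = 8, 6, 3, 10, 4
X = rng.normal(size=(d_e, N))
W_c = rng.normal(scale=0.1, size=(k, d_e, d_c))
b_c = np.zeros(d_c)
U = rng.normal(size=(L, d_c))
B = rng.normal(size=(L, d_c))
b = np.zeros(L)
y_hat, alphas = forward(X, W_c, b_c, U, B, b)
y = np.array([1, 0, 0, 1])                            # y_{i,l} for this instance
print(y_hat.shape, alphas.shape)                      # (4,) (4, 10)
print(bce_loss(y_hat, y))
```

Each row of `alphas` sums to 1 over the N word positions, which is what makes \(\varvec{\alpha }_\ell\) usable as a per-label explanation: the largest entries point at the words that contributed most to the prediction for label \(\ell\).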