From: Tongue image quality assessment based on a deep convolutional neural network
| Layers | Feature map size | Structure |
|---|---|---|
| Conv1 | 200 × 200 | 7 × 7 conv, 64, stride 2 |
| Conv2_x | 100 × 100 | 3 × 3 max pool, stride 2 |
| | | \(\left[\begin{array}{l} 1 \times 1\ \text{conv},\ 64 \\ 3 \times 3\ \text{conv},\ 64 \\ 1 \times 1\ \text{conv},\ 256 \end{array}\right] \times 3\) |
| Conv3_x | 50 × 50 | \(\left[\begin{array}{l} 1 \times 1\ \text{conv},\ 128 \\ 3 \times 3\ \text{conv},\ 128 \\ 1 \times 1\ \text{conv},\ 512 \end{array}\right] \times 8\) |
| Conv4_x | 25 × 25 | \(\left[\begin{array}{l} 1 \times 1\ \text{conv},\ 256 \\ 3 \times 3\ \text{conv},\ 256 \\ 1 \times 1\ \text{conv},\ 1024 \end{array}\right] \times 36\) |
| Conv5_x | 13 × 13 | \(\left[\begin{array}{l} 1 \times 1\ \text{conv},\ 512 \\ 3 \times 3\ \text{conv},\ 512 \\ 1 \times 1\ \text{conv},\ 2048 \end{array}\right] \times 3\) |
| Classification layer | 1 × 1 | 13 × 13 global average pool |
| | | 2208-D fully connected layer with ReLU |
| | | 2-D fully connected layer |
| | | Softmax |
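The block counts in the table (3, 8, 36 and 3 bottleneck blocks) match the standard ResNet-152 configuration. The sketch below checks the table's arithmetic: it applies the usual output-size formula \(\lfloor (n + 2p - k)/s \rfloor + 1\) stage by stage, counts the convolutional layers, and counts the parameters of the classification head. It is not the authors' code. The 400 × 400 input size, the padding values (3 for the 7 × 7 conv, 1 for the 3 × 3 max pool and 3 × 3 convs), and the placement of downsampling on the first 3 × 3 conv of Conv3_x to Conv5_x are assumptions (standard ResNet choices), not details given in the table.

```python
"""Shape and size check for the architecture table (a sketch, not the authors' code).

Assumptions, not stated in the table: a 400 x 400 input (one size consistent
with the 200 x 200 Conv1 output), standard ResNet padding (3 for the 7x7 conv,
1 for the 3x3 max pool and 3x3 convs), and stride-2 downsampling on the first
3x3 conv of stages Conv3_x to Conv5_x.
"""


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_sizes(input_size: int = 400) -> dict:
    """Return the spatial size after each stage listed in the table."""
    sizes = {}
    n = conv_out(input_size, kernel=7, stride=2, padding=3)  # Conv1
    sizes["Conv1"] = n
    n = conv_out(n, kernel=3, stride=2, padding=1)  # 3x3 max pool
    sizes["Conv2_x"] = n  # Conv2_x blocks use stride 1, so the size is kept
    for stage in ("Conv3_x", "Conv4_x", "Conv5_x"):
        n = conv_out(n, kernel=3, stride=2, padding=1)  # assumed stride-2 3x3 conv
        sizes[stage] = n
    return sizes


# Bottleneck blocks per stage and the channel width of each block's output.
BLOCKS = {"Conv2_x": 3, "Conv3_x": 8, "Conv4_x": 36, "Conv5_x": 3}
OUT_CHANNELS = {"Conv2_x": 256, "Conv3_x": 512, "Conv4_x": 1024, "Conv5_x": 2048}


def conv_layer_count() -> int:
    """Conv1 plus three convs (1x1, 3x3, 1x1) per bottleneck block.

    Projection shortcuts are not counted, following the usual ResNet
    depth convention.
    """
    return 1 + 3 * sum(BLOCKS.values())


def fc_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def head_params() -> int:
    """Classification head: pooled 2048-D vector -> 2208-D (ReLU) -> 2-D."""
    pooled = OUT_CHANNELS["Conv5_x"]  # global average pooling keeps the channel count
    return fc_params(pooled, 2208) + fc_params(2208, 2)


if __name__ == "__main__":
    print(stage_sizes())  # {'Conv1': 200, 'Conv2_x': 100, 'Conv3_x': 50, 'Conv4_x': 25, 'Conv5_x': 13}
    print(conv_layer_count())  # 151
    print(head_params())  # 4528610
```

Under these assumptions the computed sizes reproduce the "Feature map size" column, including the 13 × 13 map at Conv5_x (25 is odd, so the stride-2 conv with padding 1 rounds up). The head count assumes the 2208-D layer takes the 2048-channel output of Conv5_x after global average pooling.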