From: Tongue image quality assessment based on a deep convolutional neural network
Layers | Feature map size | Structure |
---|---|---|
Convolution | 200 × 200 | 7 × 7 conv, 32, stride 2 |
Pooling | 100 × 100 | 3 × 3 max pool, stride 2 |
Dense Block (1) | 100 × 100 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 128 \\ 3 \times 3 & \text{conv}, & 32 \end{array} \right] \times 6\) |
Transition Layer (1) | 100 × 100 | 1 × 1 conv, 112 |
Transition Layer (1) | 50 × 50 | 2 × 2 average pool, stride 2 |
Dense Block (2) | 50 × 50 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 128 \\ 3 \times 3 & \text{conv}, & 32 \end{array} \right] \times 12\) |
Transition Layer (2) | 50 × 50 | 1 × 1 conv, 248 |
Transition Layer (2) | 25 × 25 | 2 × 2 average pool, stride 2 |
Dense Block (3) | 25 × 25 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 128 \\ 3 \times 3 & \text{conv}, & 32 \end{array} \right] \times 32\) |
Transition Layer (3) | 25 × 25 | 1 × 1 conv, 636 |
Transition Layer (3) | 12 × 12 | 2 × 2 average pool, stride 2 |
Dense Block (4) | 12 × 12 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 128 \\ 3 \times 3 & \text{conv}, & 32 \end{array} \right] \times 32\) |
Classification Layer | 1 × 1 | 12 × 12 global average pool |
Classification Layer | | 2208D fully connected layer with ReLU |
Classification Layer | | 2D fully connected layer |
Classification Layer | | Softmax |
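
The channel widths and feature map sizes in the table can be checked with a little arithmetic. The sketch below is not the authors' code. It assumes a 400 × 400 input (implied by the 200 × 200 map after the stride-2 convolution), padding that makes each stride-2 step halve the map with floor division, and the usual DenseNet convention that every dense layer concatenates its 32 new channels onto the block input. The names `trace`, `GROWTH_RATE`, `BLOCK_LAYERS`, and `TRANSITION_CHANNELS` are hypothetical.

```python
# Minimal sketch: trace feature-map size and channel count through the table.
# Assumptions (not stated in the table): 400 x 400 input, stride-2 steps halve
# the map (floor division), dense layers concatenate their outputs.

GROWTH_RATE = 32                       # channels added by each 3 x 3 conv
BLOCK_LAYERS = [6, 12, 32, 32]         # repetitions per dense block (table)
TRANSITION_CHANNELS = [112, 248, 636]  # 1 x 1 conv widths (table)


def trace(input_size=400, stem_channels=32):
    """Return (layer name, feature-map side, channels) after each stage."""
    rows = []
    size = input_size // 2             # 7 x 7 conv, stride 2
    channels = stem_channels
    rows.append(("Convolution", size, channels))
    size //= 2                         # 3 x 3 max pool, stride 2
    rows.append(("Pooling", size, channels))
    for i, n_layers in enumerate(BLOCK_LAYERS, start=1):
        # Each dense layer appends GROWTH_RATE channels to its input.
        channels += n_layers * GROWTH_RATE
        rows.append((f"Dense Block ({i})", size, channels))
        if i <= len(TRANSITION_CHANNELS):
            # 1 x 1 conv sets the width, then 2 x 2 average pool halves the map.
            channels = TRANSITION_CHANNELS[i - 1]
            size //= 2
            rows.append((f"Transition Layer ({i})", size, channels))
    return rows


if __name__ == "__main__":
    for name, size, channels in trace():
        print(f"{name:22s} {size:3d} x {size:<3d} {channels:5d} channels")
```

Under these assumptions the dense blocks end with 224, 496, 1272, and 1660 channels, so each transition width in the table (112, 248, 636) is exactly half of the preceding block's output, and the spatial sizes reproduce the 200, 100, 50, 25, 12 sequence. The table does not say how the pooled features after Dense Block (4) map onto the 2208D fully connected layer, so the sketch stops before the classification layer.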