
Table 1 ResNet-152 for tongue image quality control

From: Tongue image quality assessment based on a deep convolutional neural network

| Layers | Feature map size | Structure |
| --- | --- | --- |
| Conv1 | 200 × 200 | 7 × 7 conv, 64, stride 2 |
| Conv2_x | 100 × 100 | 3 × 3 max pool, stride 2; \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 64 \\ 3 \times 3 & \text{conv}, & 64 \\ 1 \times 1 & \text{conv}, & 256 \end{array} \right] \times 3\) |
| Conv3_x | 50 × 50 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 128 \\ 3 \times 3 & \text{conv}, & 128 \\ 1 \times 1 & \text{conv}, & 512 \end{array} \right] \times 8\) |
| Conv4_x | 25 × 25 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 256 \\ 3 \times 3 & \text{conv}, & 256 \\ 1 \times 1 & \text{conv}, & 1024 \end{array} \right] \times 36\) |
| Conv5_x | 13 × 13 | \(\left[ \begin{array}{lll} 1 \times 1 & \text{conv}, & 512 \\ 3 \times 3 & \text{conv}, & 512 \\ 1 \times 1 & \text{conv}, & 2048 \end{array} \right] \times 3\) |
| Classification Layer | 1 × 1 | 13 × 13 global average pool; 2208D fully connected layer with ReLU; 2D fully connected layer; Softmax |

  1. Building blocks are shown in brackets, followed by the number of blocks stacked. Downsampling is performed by Conv3_1, Conv4_1, and Conv5_1 with a stride of 2.
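
The feature map sizes and the "152" in the network name can be checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the table does not state: the input image is 400 × 400 (inferred from the 200 × 200 output of the stride-2 Conv1), and each stride-2 step produces an output of size ceil(input / 2), which is what turns 25 × 25 into 13 × 13. The names `INPUT_SIZE`, `STAGES`, `feature_map_sizes`, and `weighted_layer_count` are hypothetical helpers, not part of the paper.

```python
import math

# ASSUMPTION: 400 x 400 input, inferred from Conv1's 200 x 200 output at stride 2.
INPUT_SIZE = 400

# (stage name, number of bottleneck blocks, stride applied when entering the stage)
# Block counts are taken from the table; Conv2_x is downsampled by the max pool.
STAGES = [
    ("Conv2_x", 3, 2),
    ("Conv3_x", 8, 2),
    ("Conv4_x", 36, 2),
    ("Conv5_x", 3, 2),
]


def downsample(size, stride):
    """ASSUMPTION: padding is chosen so that output = ceil(input / stride)."""
    return math.ceil(size / stride)


def feature_map_sizes(input_size=INPUT_SIZE):
    """Return the spatial side length after Conv1 and after each stage."""
    sizes = {}
    size = downsample(input_size, 2)  # Conv1: 7 x 7 conv, stride 2
    sizes["Conv1"] = size
    for name, _blocks, stride in STAGES:
        size = downsample(size, stride)
        sizes[name] = size
    return sizes


def weighted_layer_count():
    """Conventional ResNet depth: Conv1 + 3 convs per bottleneck block + 1 FC layer."""
    total_blocks = sum(blocks for _name, blocks, _stride in STAGES)
    return 1 + 3 * total_blocks + 1


if __name__ == "__main__":
    for layer, side in feature_map_sizes().items():
        print(f"{layer}: {side} x {side}")
    print("weighted layers:", weighted_layer_count())
```

Under these assumptions the computed sizes match the table's column (200, 100, 50, 25, 13). The depth count follows the usual ResNet naming convention, which counts a single classifier layer; the table's classification head has two fully connected layers, so a literal count of its weighted layers would be one higher.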