Fig. 3

From: Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

Schematic diagram of (a) W-MSA-3D and (b) SW-MSA-3D with a window size of 2. In (a), tokens of the same color belong to the same window, and self-attention is computed only within each window. To exchange dependency information between adjacent windows, (b) applies a cyclic shift so that some tokens from neighboring windows are grouped into the same window, and window self-attention is computed only among tokens that satisfy this condition. Tokens that do not satisfy it are prevented from attending to each other by a masking mechanism, even if they fall into the same window after the cyclic shift.
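A minimal NumPy sketch of the mechanism described in the caption follows. It assumes the region-labelling mask construction of the original 2D Swin Transformer, extended here to three spatial axes. The Swin Unet3D code may implement it differently. The function names, the shift of 1 (half the window size of 2), and the -100 mask constant are illustrative assumptions.

```python
import numpy as np

def cyclic_shift_3d(x, shift):
    """Roll a (D, H, W, ...) volume by -shift along the three spatial axes."""
    return np.roll(x, shift=(-shift, -shift, -shift), axis=(0, 1, 2))

def window_partition_3d(x, ws):
    """Split (D, H, W, ...) into non-overlapping ws x ws x ws windows -> (nW, ws**3, ...)."""
    D, H, W = x.shape[:3]
    rest = x.shape[3:]
    x = x.reshape(D // ws, ws, H // ws, ws, W // ws, ws, *rest)
    extra = tuple(range(6, x.ndim))
    x = x.transpose((0, 2, 4, 1, 3, 5) + extra)
    return x.reshape((-1, ws ** 3) + rest)

def shifted_window_mask_3d(D, H, W, ws, shift):
    """Additive mask of shape (nW, N, N): 0 where two tokens in a shifted window
    came from the same region, -100 otherwise. Assumes 0 < shift < ws and that
    D, H, W are divisible by ws."""
    region = np.zeros((D, H, W), dtype=np.int64)
    # Three slabs per axis in the shifted frame: untouched, border, wrapped-around.
    slabs = (slice(0, -ws), slice(-ws, -shift), slice(-shift, None))
    label = 0
    for d in slabs:
        for h in slabs:
            for w in slabs:
                region[d, h, w] = label
                label += 1
    windows = window_partition_3d(region, ws)            # (nW, N) region labels
    same = windows[:, :, None] == windows[:, None, :]     # (nW, N, N)
    return np.where(same, 0.0, -100.0)

def window_attention(q, k, v, mask=None):
    """Scaled dot-product attention computed independently inside each window."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask                            # block cross-region pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Demo: 4x4x4 volume, window size 2 (as in Fig. 3), shift 1, 8 channels.
rng = np.random.default_rng(0)
D = H = W = 4
ws, shift, C = 2, 1, 8
x = rng.standard_normal((D, H, W, C))
tokens = window_partition_3d(cyclic_shift_3d(x, shift), ws)  # (8 windows, 8 tokens, C)
mask = shifted_window_mask_3d(D, H, W, ws, shift)            # (8, 8, 8)
out, weights = window_attention(tokens, tokens, tokens, mask)
```

In this demo the first window contains tokens from a single region and needs no masking. In the last window, every token came from a different region before the shift, so each token effectively attends only to itself. Dropping `mask` gives the plain W-MSA-3D computation of panel (a), applied to the unshifted volume.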