Model User Guide

[ ]:
This user guide requires `keras-unet-collection==0.1.9` or higher.

## Content

* [**U-net**](#U-net)
* [**V-net**](#V-net)
* [**Attention-Unet**](#Attention-Unet)
* [**U-net++**](#U-net++)
* [**UNET 3+**](#UNET-3+)
* [**R2U-net**](#R2U-net)
* [**ResUnet-a**](#ResUnet-a)
* [**U^2-Net**](#U^2-Net)
* [**TransUNET**](#TransUNET)
* [**Swin-UNET**](#Swin-UNET)
[1]:
import tensorflow as tf
from tensorflow import keras
[2]:
print('TensorFlow {}; Keras {}'.format(tf.__version__, keras.__version__))
TensorFlow 2.5.0; Keras 2.5.0

Step 1: Importing models from swiss_army_keras

[3]:
from swiss_army_keras import models

Step 2: Defining your hyper-parameters

Commonly used hyper-parameter options are listed as follows; full details are available through the Python helper function (see the sketch after this list):

  • input_size: a tuple or list that defines the shape of input tensors.

    • models.resunet_a_2d, models.transunet_2d, and models.swin_unet_2d support int only; others also support input_size=(None, None, 3).

    • activation='PReLU' is not compatible with input_size=(None, None, 3).

  • filter_num: a list that defines the number of convolutional filters per down- and upsampling block.

    • For unet_2d, att_unet_2d, unet_plus_2d, r2_unet_2d, depth \(\ge\) 2 is expected.

    • For resunet_a_2d and u2net_2d, depth \(\ge\) 3 is expected.

  • n_labels: number of output targets, e.g., n_labels=2 for binary classification.

  • activation: the activation function of hidden layers. Available choices are 'ReLU', 'LeakyReLU', 'PReLU', 'ELU', 'GELU', 'Snake'.

  • output_activation: the activation function of the output layer. Recommended choices are 'Sigmoid', 'Softmax', None (linear), 'Snake'.

  • batch_norm: if specified as True, all convolutional layers will be configured as stacks of “Conv2D-BN-Activation”.

  • stack_num_down: number of convolutional layers per downsampling level.

  • stack_num_up: number of convolutional layers (after concatenation) per upsampling level.

  • pool: the configuration of downsampling (encoding) blocks.

    • pool=False: downsampling with a convolutional layer (2-by-2 convolution kernels with a stride of 2; optional batch normalization and activation).

    • pool=True or pool='max': downsampling with a max-pooling layer.

    • pool='ave': downsampling with an average-pooling layer.

  • unpool: the configuration of upsampling (decoding) blocks.

    • unpool=False: upsampling with a transpose convolutional layer (2-by-2 convolution kernels with a stride of 2; optional batch normalization and activation).

    • unpool=True or unpool='bilinear': upsampling with bilinear interpolation.

    • unpool='nearest': upsampling with nearest-neighbor interpolation.

  • name: user-specified prefix of the configured layer and model. Use keras.models.Model.summary to identify the exact name of each layer.
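
For example, the full signature, argument defaults, and docstring of a model can be printed with Python's built-in help:

[ ]:
help(models.unet_2d)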

Step 3: Configuring your model

Note

Configured models can be saved through model.save(filepath, save_traces=True), but they may contain Python objects that are not part of tensorflow.keras. Thus, when loading the model, it is preferable to load the weights only and set/freeze them within a new configuration.

e.g.

weights = dummy_loader(model_old_path)
model_new = swin_transformer_model(...)
model_new.set_weights(weights)
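
A more self-contained sketch of the same weights-only idea, using plain tensorflow.keras calls (the small configuration and file name below are illustrative, not one of this guide's examples):

[ ]:
# configure a small model and save its weights only (file name is illustrative)
model_old = models.unet_2d((128, 128, 3), [32, 64], n_labels=2, name='unet_demo')
model_old.save_weights('unet_demo.h5')

# rebuild an identical configuration, restore the weights, and freeze them
model_new = models.unet_2d((128, 128, 3), [32, 64], n_labels=2, name='unet_demo')
model_new.load_weights('unet_demo.h5')
model_new.trainable = False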

U-net

Example 1: U-net for binary classification with:

  1. Five down- and upsampling levels (or four downsampling levels and one bottom level).

  2. Two convolutional layers per downsampling level.

  3. One convolutional layer (after concatenation) per upsampling level.

  4. Gaussian Error Linear Unit (GELU) activation, Softmax output activation, batch normalization.

  5. Downsampling through Maxpooling.

  6. Upsampling through nearest-neighbor interpolation.

[4]:
model = models.unet_2d((None, None, 3), [64, 128, 256, 512, 1024], n_labels=2,
                      stack_num_down=2, stack_num_up=1,
                      activation='GELU', output_activation='Softmax',
                      batch_norm=True, pool='max', unpool='nearest', name='unet')
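
Since input_size=(None, None, 3), the same configured model accepts different spatial sizes at inference, as long as each dimension is divisible by the total downsampling factor (2^4 = 16 for five levels). A quick sanity check with random inputs (the sizes below are arbitrary):

[ ]:
import numpy as np

# two different input sizes through the same model
x_small = np.random.uniform(size=(1, 128, 128, 3)).astype('float32')
x_large = np.random.uniform(size=(1, 256, 256, 3)).astype('float32')
print(model.predict(x_small).shape)  # expected: (1, 128, 128, 2)
print(model.predict(x_large).shape)  # expected: (1, 256, 256, 2)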

V-net

Example 2: V-net (originally proposed for 3-D inputs, here modified for 2-D inputs) for binary classification with:

  1. Input size of (256, 256, 1); PReLU does not support input tensors with None (dynamic) spatial dimensions.

  2. Five down- and upsampling levels (or four downsampling levels and one bottom level).

  3. The number of stacked convolutional layers in the residual path increases with downsampling level from one to three (and symmetrically decreases with upsampling level).

    • res_num_ini=1

    • res_num_max=3

  4. PReLU activation, Softmax output activation, batch normalization.

  5. Downsampling through stride convolutional layers.

  6. Upsampling through transpose convolutional layers.

[5]:
model = models.vnet_2d((256, 256, 1), filter_num=[16, 32, 64, 128, 256], n_labels=2,
                      res_num_ini=1, res_num_max=3,
                      activation='PReLU', output_activation='Softmax',
                      batch_norm=True, pool=False, unpool=False, name='vnet')

Attention-Unet

Example 3: Attention-Unet for single-target regression with:

  1. Four down- and upsampling levels.

  2. Two convolutional layers per downsampling level.

  3. Two convolutional layers (after concatenation) per upsampling level.

  4. ReLU activation, linear output activation (None), batch normalization.

  5. Additive attention, ReLU attention activation.

  6. Downsampling through stride convolutional layers.

  7. Upsampling through bilinear interpolation.

[6]:
model = models.att_unet_2d((None, None, 3), [64, 128, 256, 512], n_labels=1,
                           stack_num_down=2, stack_num_up=2,
                           activation='ReLU', atten_activation='ReLU', attention='add', output_activation=None,
                           batch_norm=True, pool=False, unpool='bilinear', name='attunet')
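
Given the linear (None) output activation, a natural training setup for this single-target regression model is a mean-squared-error loss; a minimal sketch (the optimizer and learning rate are illustrative assumptions):

[ ]:
# MSE loss suits the linear (None) output activation
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4))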

U-net++

Example 4: U-net++ for three-label classification with:

  1. Four down- and upsampling levels.

  2. Two convolutional layers per downsampling level.

  3. Two convolutional layers (after concatenation) per upsampling level.

  4. LeakyReLU activation, Softmax output activation, no batch normalization.

  5. Downsampling through Maxpooling.

  6. Upsampling through transpose convolutional layers.

  7. Deep supervision.

[7]:
model = models.unet_plus_2d((None, None, 3), [64, 128, 256, 512], n_labels=3,
                            stack_num_down=2, stack_num_up=2,
                            activation='LeakyReLU', output_activation='Softmax',
                            batch_norm=False, pool='max', unpool=False, deep_supervision=True, name='xnet')
----------
deep_supervision = True
names of output tensors are listed as follows ("sup0" is the shallowest supervision layer;
"final" is the final output layer):

        xnet_output_sup0_activation
        xnet_output_sup1_activation
        xnet_output_sup2_activation
        xnet_output_final_activation
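
Because deep_supervision=True produces four output tensors, training requires one loss per output; Keras matches a list of losses to the outputs by order. A minimal sketch (the loss weights are illustrative assumptions):

[ ]:
# one loss per supervision head; weight the final output most heavily
model.compile(loss=['categorical_crossentropy']*4,
              loss_weights=[0.25, 0.25, 0.25, 1.0],
              optimizer=keras.optimizers.Adam(learning_rate=1e-4))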

UNET 3+

Example 5: UNet 3+ for binary classification with:

  1. Four down- and upsampling levels.

  2. Two convolutional layers per downsampling level.

  3. One convolutional layer (after concatenation) per upsampling level.

  4. ReLU activation, Sigmoid output activation, batch normalization.

  5. Downsampling through Maxpooling.

  6. Upsampling through transpose convolutional layers.

  7. Deep supervision.

[8]:
model = models.unet_3plus_2d((128, 128, 3), n_labels=2, filter_num_down=[64, 128, 256, 512],
                             filter_num_skip='auto', filter_num_aggregate='auto',
                             stack_num_down=2, stack_num_up=1, activation='ReLU', output_activation='Sigmoid',
                             batch_norm=True, pool='max', unpool=False, deep_supervision=True, name='unet3plus')
Automated hyper-parameter determination is applied with the following details:
----------
        Number of convolution filters after each full-scale skip connection: filter_num_skip = [64, 64, 64]
        Number of channels of full-scale aggregated feature maps: filter_num_aggregate = 256
----------
deep_supervision = True
names of output tensors are listed as follows ("sup0" is the shallowest supervision layer;
"final" is the final output layer):

        unet3plus_output_sup0_activation
        unet3plus_output_sup1_activation
        unet3plus_output_sup2_activation
        unet3plus_output_final_activation
  • filter_num_skip and filter_num_aggregate can be specified explicitly:

[9]:
model = models.unet_3plus_2d((128, 128, 3), n_labels=2, filter_num_down=[64, 128, 256, 512],
                             filter_num_skip=[64, 64, 64], filter_num_aggregate=256,
                             stack_num_down=2, stack_num_up=1, activation='ReLU', output_activation='Sigmoid',
                             batch_norm=True, pool='max', unpool=False, deep_supervision=True, name='unet3plus')
----------
deep_supervision = True
names of output tensors are listed as follows ("sup0" is the shallowest supervision layer;
"final" is the final output layer):

        unet3plus_output_sup0_activation
        unet3plus_output_sup1_activation
        unet3plus_output_sup2_activation
        unet3plus_output_final_activation

R2U-net

Example 6: R2U-net for binary classification with:

  1. Four down- and upsampling levels.

  2. Two stacked recurrent convolutional layers (two iterations each, recur_num=2) per downsampling level, and one per upsampling level.

  3. ReLU activation, Softmax output activation, batch normalization.

  4. Downsampling through Maxpooling.

  5. Upsampling through nearest-neighbor interpolation.

[10]:
model = models.r2_unet_2d((None, None, 3), [64, 128, 256, 512], n_labels=2,
                          stack_num_down=2, stack_num_up=1, recur_num=2,
                          activation='ReLU', output_activation='Softmax',
                          batch_norm=True, pool='max', unpool='nearest', name='r2unet')

ResUnet-a

Example 7: ResUnet-a for 16-label classification with:

  1. Input size of (128, 128, 3).

  2. Six downsampling levels followed by an Atrous Spatial Pyramid Pooling (ASPP) layer with 256 filters.

  3. Six upsampling levels followed by an ASPP layer with 128 filters.

  4. Dilation rates of {1, 3, 15, 31} for shallow layers, {1, 3, 15} for intermediate layers, and {1} for deep layers.

  5. ReLU activation, Sigmoid output activation, batch normalization.

  6. Downsampling through stride convolutional layers.

  7. Upsampling through nearest-neighbor interpolation.

[11]:
model = models.resunet_a_2d((128, 128, 3), [32, 64, 128, 256, 512, 1024],
                            dilation_num=[1, 3, 15, 31],
                            n_labels=16, aspp_num_down=256, aspp_num_up=128,
                            activation='ReLU', output_activation='Sigmoid',
                            batch_norm=True, pool=False, unpool='nearest', name='resunet')
Received dilation rates: [1, 3, 15, 31]
Received dilation rates are not defined on a per downsampling level basis.
Automated determinations are applied with the following details:
        depth-0, dilation_rate = [1, 3, 15, 31]
        depth-1, dilation_rate = [1, 3, 15, 31]
        depth-2, dilation_rate = [1, 3, 15]
        depth-3, dilation_rate = [1, 3, 15]
        depth-4, dilation_rate = [1]
        depth-5, dilation_rate = [1]
  • dilation_num can be specified per down- and upsampling level:

[12]:
model = models.resunet_a_2d((128, 128, 3), [32, 64, 128, 256, 512, 1024],
                            dilation_num=[[1, 3, 15, 31], [1, 3, 15, 31], [1, 3, 15], [1, 3, 15], [1,], [1,],],
                            n_labels=16, aspp_num_down=256, aspp_num_up=128,
                            activation='ReLU', output_activation='Sigmoid',
                            batch_norm=True, pool=False, unpool='nearest', name='resunet')

U^2-Net

Example 8: U^2-Net for binary classification with:

  1. Six downsampling levels, with the first four built with RSU and the last two (one downsampling layer, one bottom layer) built with RSU-4F.

    • filter_num_down=[64, 128, 256, 512]

    • filter_mid_num_down=[32, 32, 64, 128]

    • filter_4f_num=[512, 512]

    • filter_4f_mid_num=[256, 256]

  2. Five upsampling levels, with the deepest built with RSU-4F and the other four built with RSU.

    • filter_num_up=[64, 64, 128, 256]

    • filter_mid_num_up=[16, 32, 64, 128]

  3. ReLU activation, linear output activation (None), batch normalization.

  4. Deep supervision.

  5. Downsampling through stride convolutional layers.

  6. Upsampling through transpose convolutional layers.

*In the original work of U^2-Net, down- and upsampling were achieved through maxpooling (pool=True or pool='max') and bilinear interpolation (unpool=True or unpool='bilinear').

[13]:
model = models.u2net_2d((128, 128, 3), n_labels=2,
                        filter_num_down=[64, 128, 256, 512], filter_num_up=[64, 64, 128, 256],
                        filter_mid_num_down=[32, 32, 64, 128], filter_mid_num_up=[16, 32, 64, 128],
                        filter_4f_num=[512, 512], filter_4f_mid_num=[256, 256],
                        activation='ReLU', output_activation=None,
                        batch_norm=True, pool=False, unpool=False, deep_supervision=True, name='u2net')
----------
The depth of u2net_2d = len(filter_num_down) + len(filter_4f_num) = 6
----------
deep_supervision = True
names of output tensors are listed as follows ("sup0" is the shallowest supervision layer;
"final" is the final output layer):

        u2net_output_sup0_trans_conv
        u2net_output_sup1_trans_conv
        u2net_output_sup2_trans_conv
        u2net_output_sup3_trans_conv
        u2net_output_sup4_trans_conv
        u2net_output_sup5_trans_conv
        u2net_output_final
  • u2net_2d supports automated determination of filter numbers per down- and upsampling level. Auto-mode may produce a slightly larger network.

[14]:
model = models.u2net_2d((None, None, 3), n_labels=2,
                        filter_num_down=[64, 128, 256, 512],
                        activation='ReLU', output_activation='Sigmoid',
                        batch_norm=True, pool=False, unpool=False, deep_supervision=True, name='u2net')
Automated hyper-parameter determination is applied with the following details:
----------
        Number of RSU output channels within downsampling blocks: filter_num_down = [64, 128, 256, 512]
        Number of RSU intermediate channels within downsampling blocks: filter_mid_num_down = [16, 32, 64, 128]
        Number of RSU output channels within upsampling blocks: filter_num_up = [64, 128, 256, 512]
        Number of RSU intermediate channels within upsampling blocks: filter_mid_num_up = [16, 32, 64, 128]
        Number of RSU-4F output channels within downsampling and bottom blocks: filter_4f_num = [512, 512]
        Number of RSU-4F intermediate channels within downsampling and bottom blocks: filter_4f_mid_num = [256, 256]
----------
Explicitly specifying keywords listed above if their "auto" settings do not satisfy your needs
----------
The depth of u2net_2d = len(filter_num_down) + len(filter_4f_num) = 6
----------
deep_supervision = True
names of output tensors are listed as follows ("sup0" is the shallowest supervision layer;
"final" is the final output layer):

        u2net_output_sup0_activation
        u2net_output_sup1_activation
        u2net_output_sup2_activation
        u2net_output_sup3_activation
        u2net_output_sup4_activation
        u2net_output_sup5_activation
        u2net_output_final_activation

TransUNET

Example 9: TransUNET for 12-label classification with:

  • Input size of (512, 512, 3).

  • Four down- and upsampling levels.

  • Two convolutional layers per downsampling level.

  • Two convolutional layers (after concatenation) per upsampling level.

  • 12 transformer blocks (num_transformer=12).

  • 12 attention heads (num_heads=12).

  • 3072 MLP nodes per vision transformer (num_mlp=3072).

  • 768 embedding dimensions (embed_dim=768).

  • Gaussian Error Linear Unit (GELU) activation for transformer MLPs.

  • ReLU activation, Softmax output activation, batch normalization.

  • Downsampling through maxpooling.

  • Upsampling through bilinear interpolation.

[15]:
model = models.transunet_2d((512, 512, 3), filter_num=[64, 128, 256, 512], n_labels=12, stack_num_down=2, stack_num_up=2,
                                embed_dim=768, num_mlp=3072, num_heads=12, num_transformer=12,
                                activation='ReLU', mlp_activation='GELU', output_activation='Softmax',
                                batch_norm=True, pool=True, unpool='bilinear', name='transunet')
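
Unlike the U-net example above, transunet_2d requires a fixed input size (it does not accept (None, None, 3)); a quick forward-pass sanity check (sketch):

[ ]:
import numpy as np

# input must match the configured (512, 512, 3) shape
x = np.random.uniform(size=(1, 512, 512, 3)).astype('float32')
print(model.predict(x).shape)  # expected: (1, 512, 512, 12)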

Swin-UNET

Example 10: Swin-UNET for 3-label classification with:

  • Input size of (128, 128, 3).

  • Four down- and upsampling levels (or three downsampling levels and one bottom level) (depth=4).

  • Two Swin-Transformers per downsampling level.

  • Two Swin-Transformers (after concatenation) per upsampling level.

  • Extract 2-by-2 patches from the input (patch_size=(2, 2)).

  • Embed 2-by-2 patches to 64 dimensions (filter_num_begin=64, i.e., the number of embedded dimensions).

  • Number of attention heads for each down- and upsampling level: num_heads=[4, 8, 8, 8].

  • Size of attention windows for each down- and upsampling level: window_size=[4, 2, 2, 2].

  • 512 nodes per Swin-Transformer (num_mlp=512).

  • Shifted attention windows (i.e., Swin-MSA) (shift_window=True).

[16]:
model = models.swin_unet_2d((128, 128, 3), filter_num_begin=64, n_labels=3, depth=4, stack_num_down=2, stack_num_up=2,
                            patch_size=(2, 2), num_heads=[4, 8, 8, 8], window_size=[4, 2, 2, 2], num_mlp=512,
                            output_activation='Softmax', shift_window=True, name='swin_unet')
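
As with the other models in this guide, the configured network is a regular keras.models.Model and can be compiled and inspected as usual (the optimizer choice is an illustrative assumption):

[ ]:
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-4))
print('Total parameters: {}'.format(model.count_params()))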
[ ]: