📦 Segmentation Models

Unet

class segmentation_models_pytorch.Unet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None)

Unet is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a decoder connected by skip connections. The encoder extracts features at different spatial resolutions (the skip connections), which the decoder uses to produce an accurate segmentation mask. Decoder blocks are fused with skip connections by concatenation.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers specifying the in_channels parameter for the convolutions used in the decoder. The length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer is used between the Conv2D and Activation layers. If “inplace”, InplaceABN is used instead, which reduces memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in the decoder of the model. Available options are None and scse. SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

Unet

Return type

torch.nn.Module
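A minimal usage sketch for the signature above, assuming torch and segmentation_models_pytorch are installed. The helper names decoder_channels_for and build_unet are hypothetical and not part of the library. The first helper truncates the default decoder_channels so that its length matches encoder_depth, as the parameter description requires; encoder_weights=None is passed so that no pretrained weights are downloaded.

```python
# Default taken from the Unet signature above.
DEFAULT_DECODER_CHANNELS = (256, 128, 64, 32, 16)


def decoder_channels_for(encoder_depth):
    """Return a decoder_channels tuple whose length equals encoder_depth."""
    if not 3 <= encoder_depth <= 5:
        raise ValueError("encoder_depth must be in range [3, 5]")
    return DEFAULT_DECODER_CHANNELS[:encoder_depth]


def build_unet(encoder_depth=5, classes=1):
    """Build a randomly initialized Unet (hypothetical helper).

    The library is imported lazily so the channel helper above can be
    used without it.
    """
    import segmentation_models_pytorch as smp

    return smp.Unet(
        encoder_name="resnet34",
        encoder_depth=encoder_depth,
        encoder_weights=None,  # random initialization, no download
        decoder_channels=decoder_channels_for(encoder_depth),
        in_channels=3,
        classes=classes,
    )


print(decoder_channels_for(4))
```

Calling build_unet(encoder_depth=4, classes=2) and passing a float tensor of shape (1, 3, 64, 64) through the model should give a mask with one channel per class and the same height and width as the input.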

Unet++

class segmentation_models_pytorch.UnetPlusPlus(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None)

Unet++ is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a decoder connected by skip connections. The encoder extracts features at different spatial resolutions (the skip connections), which the decoder uses to produce an accurate segmentation mask. The decoder of Unet++ is more complex than that of the usual Unet.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers specifying the in_channels parameter for the convolutions used in the decoder. The length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer is used between the Conv2D and Activation layers. If “inplace”, InplaceABN is used instead, which reduces memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in the decoder of the model. Available options are None and scse. SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

Unet++

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1807.10165
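The shape pattern given in the encoder_depth description can be written out as a small pure-Python sketch. The function name encoder_feature_shapes is hypothetical; channel counts depend on the chosen encoder, so the placeholder "C" is used unless a per-stage sequence is supplied.

```python
def encoder_feature_shapes(n, h, w, depth, channels=None):
    """List the (N, C, H, W) shape of each encoder feature map.

    Index 0 is at the input resolution; every later stage halves the
    height and width of the previous one.
    """
    if channels is None:
        channels = ["C"] * (depth + 1)
    if len(channels) != depth + 1:
        raise ValueError("channels must have depth + 1 entries")
    return [(n, channels[i], h // 2**i, w // 2**i) for i in range(depth + 1)]


shapes = encoder_feature_shapes(n=1, h=256, w=256, depth=5)
for shape in shapes:
    print(shape)
```

With the default depth of 5, a 256 x 256 input yields six feature maps, the last of them at 8 x 8.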

Linknet

class segmentation_models_pytorch.Linknet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, in_channels=3, classes=1, activation=None, aux_params=None)

Linknet is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a decoder connected by skip connections. The encoder extracts features at different spatial resolutions (the skip connections), which the decoder uses to produce an accurate segmentation mask. Decoder blocks are fused with skip connections by summation.

Note

This implementation by default has 4 skip connections (original - 3).

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_use_batchnorm – If True, a BatchNorm2d layer is used between the Conv2D and Activation layers. If “inplace”, InplaceABN is used instead, which reduces memory consumption. Available options are True, False, “inplace”

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

Linknet

Return type

torch.nn.Module
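The difference between fusing by sum (Linknet) and fusing by concatenation (Unet) can be illustrated without any library. This is a toy sketch, not the library's decoder code: a feature map is represented as a list of channels, each channel a flat list of pixel values, and the function names fuse_concat and fuse_sum are hypothetical.

```python
def fuse_concat(decoder_feat, skip_feat):
    """Concatenation: channels are stacked, so channel counts add up."""
    return decoder_feat + skip_feat


def fuse_sum(decoder_feat, skip_feat):
    """Summation: element-wise add, so channel counts must match."""
    if len(decoder_feat) != len(skip_feat):
        raise ValueError("summation needs equal channel counts")
    return [
        [d + s for d, s in zip(d_chan, s_chan)]
        for d_chan, s_chan in zip(decoder_feat, skip_feat)
    ]


# Two channels with two pixels each.
decoder_feat = [[1.0, 2.0], [3.0, 4.0]]
skip_feat = [[10.0, 20.0], [30.0, 40.0]]

print(len(fuse_concat(decoder_feat, skip_feat)))  # channel count after concat
print(fuse_sum(decoder_feat, skip_feat))  # same channel count as the inputs
```

Summation keeps the channel count fixed, which is why the skip and decoder features must agree in channels, whereas concatenation lets them differ at the cost of wider convolutions afterwards.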

FPN

class segmentation_models_pytorch.FPN(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_pyramid_channels=256, decoder_segmentation_channels=128, decoder_merge_policy='add', decoder_dropout=0.2, in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)

FPN is a fully convolutional neural network for semantic image segmentation.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_pyramid_channels – A number of convolution filters in Feature Pyramid of FPN

  • decoder_segmentation_channels – A number of convolution filters in segmentation blocks of FPN

  • decoder_merge_policy – Determines how to merge pyramid features inside FPN. Available options are add and cat

  • decoder_dropout – Spatial dropout rate in range (0, 1) for feature pyramid in FPN

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

FPN

Return type

torch.nn.Module
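The aux_params dictionary described above can be assembled and checked before the model is built. The sketch below is illustrative only: make_aux_params and build_fpn_with_aux are hypothetical helper names, and the second one assumes segmentation_models_pytorch is installed (it is imported lazily, so the first helper works without it).

```python
def make_aux_params(classes, pooling="avg", dropout=None, activation=None):
    """Validate and assemble an aux_params dictionary."""
    if pooling not in ("max", "avg"):
        raise ValueError('pooling must be "max" or "avg"')
    if dropout is not None and not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    params = {"classes": classes, "pooling": pooling}
    # Optional keys are only added when the caller sets them.
    if dropout is not None:
        params["dropout"] = dropout
    if activation is not None:
        params["activation"] = activation
    return params


def build_fpn_with_aux(num_labels):
    """Build an FPN with an auxiliary classification head (hypothetical helper)."""
    import segmentation_models_pytorch as smp

    return smp.FPN(
        encoder_name="resnet34",
        encoder_weights=None,  # random initialization, no download
        classes=1,
        aux_params=make_aux_params(num_labels, dropout=0.5),
    )


print(make_aux_params(4, dropout=0.5))
```

When aux_params is set, the forward pass should return a (mask, label) pair rather than a single mask tensor, so calling code needs to unpack two outputs.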

PSPNet

class segmentation_models_pytorch.PSPNet(encoder_name='resnet34', encoder_weights='imagenet', encoder_depth=3, psp_out_channels=512, psp_use_batchnorm=True, psp_dropout=0.2, in_channels=3, classes=1, activation=None, upsampling=8, aux_params=None)

PSPNet is a fully convolutional neural network for semantic image segmentation. It consists of an encoder and a Spatial Pyramid (decoder). The Spatial Pyramid is built on top of the encoder and does not use “fine-features” (features of high spatial resolution). PSPNet can be used for multiclass segmentation of high-resolution images; however, it is not well suited to detecting small objects or producing accurate, pixel-level masks.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 3

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • psp_out_channels – Number of filters in the Spatial Pyramid

  • psp_use_batchnorm – If True, a BatchNorm2d layer is used between the Conv2D and Activation layers. If “inplace”, InplaceABN is used instead, which reduces memory consumption. Available options are True, False, “inplace”

  • psp_dropout – Spatial dropout rate in [0, 1) used in the Spatial Pyramid

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 8 to preserve input-output spatial shape identity

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

PSPNet

Return type

torch.nn.Module
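The relation between encoder_depth and the default upsampling factor can be checked with simple arithmetic. The sketch below assumes the decoder works at the resolution of the last encoder stage, which is consistent with the description above (no fine features are used) but is an assumption rather than something the parameter list states. The helper names encoder_stride and output_size are hypothetical.

```python
def encoder_stride(encoder_depth):
    """Each encoder stage halves H and W, so the total stride is 2 ** depth."""
    return 2 ** encoder_depth


def output_size(h, w, encoder_depth=3, upsampling=8):
    """Spatial size of the mask after the final upsampling step.

    Assumes the decoder output has the resolution of the last encoder stage.
    """
    stride = encoder_stride(encoder_depth)
    return (h // stride * upsampling, w // stride * upsampling)


# Defaults from the signature: depth 3 gives stride 8, matched by upsampling 8.
print(output_size(384, 480))

# A deeper encoder needs a larger factor to restore the input size.
print(output_size(384, 480, encoder_depth=5, upsampling=32))
```

Under this assumption, changing encoder_depth without adjusting upsampling to 2 ** encoder_depth yields a mask smaller than the input.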

PAN

class segmentation_models_pytorch.PAN(encoder_name='resnet34', encoder_weights='imagenet', encoder_dilation=True, decoder_channels=32, in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)

Implementation of PAN (Pyramid Attention Network).

Note

Currently works with input tensors of shape >= [B x C x 128 x 128] for pytorch <= 1.1.0 and of shape >= [B x C x 256 x 256] for pytorch == 1.3.1

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • encoder_dilation – Flag to use dilation in the last layer of the encoder. Doesn’t work with *ception*, vgg*, densenet* backbones. Default is True

  • decoder_channels – A number of convolution layer filters in decoder blocks

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

PAN

Return type

torch.nn.Module

DeepLabV3

class segmentation_models_pytorch.DeepLabV3(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_channels=256, in_channels=3, classes=1, activation=None, upsampling=8, aux_params=None)

DeepLabV3 implementation from “Rethinking Atrous Convolution for Semantic Image Segmentation”

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – A number of convolution filters in ASPP module. Default is 256

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 8 to preserve input-output spatial shape identity

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

DeepLabV3

Return type

torch.nn.Module
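With the default activation=None the model output is left as raw logits, so converting it to probabilities is up to the caller. The pure-Python sketch below shows the two usual conversions on per-pixel values: a sigmoid for a single-channel mask and a softmax over the class channel for a multi-class mask. The function names and sample numbers are illustrative, not part of the library.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for one logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(logits):
    """Softmax over the class logits of one pixel."""
    peak = max(logits)  # subtract the maximum to avoid overflow
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# classes=1: threshold the sigmoid output to get a binary mask value.
binary_logit = 1.2
is_foreground = sigmoid(binary_logit) > 0.5

# classes=3: take the most probable class per pixel.
pixel_logits = [2.0, 0.5, -1.0]
probs = softmax(pixel_logits)
predicted_class = probs.index(max(probs))

print(is_foreground, predicted_class)
```

Passing activation="sigmoid" or activation="softmax" to the constructor moves this step into the model; keeping None is common when the loss function expects logits.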

DeepLabV3+

class segmentation_models_pytorch.DeepLabV3Plus(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', encoder_output_stride=16, decoder_channels=256, decoder_atrous_rates=(12, 24, 36), in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)

DeepLabV3+ implementation from “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation”

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 the features have shapes [(N, C, H, W),], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • encoder_output_stride – Downsampling factor for last encoder features (see original paper for explanation)

  • decoder_atrous_rates – Dilation rates for ASPP module (should be a tuple of 3 integer values)

  • decoder_channels – A number of convolution filters in ASPP module. Default is 256

  • in_channels – Number of input channels for the model. Default is 3 (RGB images)

  • classes – Number of classes for the output mask (equivalently, the number of channels of the output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): Number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply, “sigmoid” or “softmax” (can be None to return logits)

Returns

DeepLabV3Plus

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1802.02611v3
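To give a feel for the default decoder_atrous_rates, the sketch below computes the effective extent of a dilated convolution kernel using the standard formula k + (k - 1) * (rate - 1). It assumes 3 x 3 kernels in the ASPP branches, which is the usual choice but is not stated in the parameter list; the function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def check_atrous_rates(rates):
    """Enforce the documented shape: a tuple of 3 integer values."""
    if len(rates) != 3 or not all(isinstance(r, int) for r in rates):
        raise ValueError("decoder_atrous_rates should be a tuple of 3 integers")
    return tuple(rates)


# Default rates from the signature above, assuming 3 x 3 kernels.
for rate in check_atrous_rates((12, 24, 36)):
    print(rate, effective_kernel_size(3, rate))
```

Larger rates widen the area each branch sees without adding weights, which is the reason ASPP combines several rates in parallel.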