dmx.compressor.numerical.smoothquant

Classes

`ActivationWeightSmoothQuant`(ch_axis, win_ch_axis)	This is the derived class for Activation x Weight smoothQuant.
`SmoothQuant`(a_ch_axis, b_ch_axis[, ...])	SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).

class dmx.compressor.numerical.smoothquant.ActivationWeightSmoothQuant(ch_axis: int, win_ch_axis: int, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', dynamic: bool = False, scale_min: float = 1e-05, **kwargs)

Bases: SmoothQuant

Derived class of SmoothQuant for an activation x weight MatMul, where input A is the activation and input B is the weight.

Parameters:

ch_axis (int) – channel axis for the input activation tensor
win_ch_axis (int) – channel axis for the weight tensor
migration_strength (float) – controls how much quantization difficulty we want to migrate from activations to weights, should be between 0 and 1, default is 0.5.
scale_format (str or dmx.Format) – the numerical format used to store and compute the scaling factors, default is “SAME”.
dynamic (bool) – If set to True, the maximum value of activations will be calculated dynamically, default is False.
scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors, default is 1e-5.

`ch_axis`

channel axis for the input activation tensor

Type: int

`win_ch_axis`

channel axis for the weight tensor

Type: int

`fused_to_weight`

If set to True, the scaling factors will be fused to the weights. Cannot be enabled when dynamic is set.

Type: bool

compute_scale(inp_maxabs: Tensor) → None

Computes the scaling tensor.

Parameters: inp_maxabs (Tensor) – the maximum absolute value of the input activation

property dynamic: Tensor

Checks if the dynamic flag is set for the input activation.

Returns: A boolean tensor set to True if the dynamic flag is one, and set to False otherwise.

extra_repr() → str: Returns the extra representation of Activation x Weight smoothQuant

forward(inp: Tensor, wgt: Tensor) → None

Computes the smoothQuant scaling tensor and scales input activation and weight

Parameters:

inp (Tensor) – the input activation tensor
wgt (Tensor) – the weight tensor

fuse_to_weight(wgt: Tensor) → None

Fuses the scaling factor to the weight tensor.

Parameters: wgt (Tensor) – the weight tensor

property input_maxabs_exists: bool

Checks if input_maxabs is already calculated.

Returns: True if input_maxabs is calculated, False otherwise.

reset_weight_maxabs() → None: Resets weight maxabs.

scale_input(inp)

Scales the input activation.

Parameters: inp (Tensor) – the input tensor that scaling will be applied on
Returns: scaled input activation tensor

scale_weight(wgt)

Scales weight.

Parameters: wgt (Tensor) – the weight tensor that scaling will be applied on
Returns: scaled weight tensor

set_dynamic(dynamic: bool = True) → None

Sets/resets the dynamic flag for the input activation

Parameters: dynamic (bool) – if set to True, the maximum value of the input activation will be calculated dynamically, default is True.
Raises: RuntimeError – If the dynamic and the fused_to_weight flags are both enabled.
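
The constraint behind this RuntimeError can be illustrated with a small stand-in class. A plausible reading (an assumption, not confirmed by this reference) is that once the scale has been fused to the weight, the activation statistics must stay fixed, so switching to dynamic scaling would leave the fused weight inconsistent. `FusionGuard` and its error message are hypothetical and only mimic the documented behavior.

```python
class FusionGuard:
    """Hypothetical stand-in that mimics the documented dynamic/fused check."""

    def __init__(self):
        self.dynamic = False
        self.fused_to_weight = False

    def set_dynamic(self, dynamic=True):
        # Enabling dynamic scaling after fusion would change the scale
        # while the weight still carries the old one, so refuse it.
        if dynamic and self.fused_to_weight:
            raise RuntimeError("dynamic cannot be enabled when fused_to_weight is set")
        self.dynamic = dynamic


guard = FusionGuard()
guard.set_dynamic(True)       # allowed: nothing has been fused yet
guard.set_dynamic(False)
guard.fused_to_weight = True  # pretend the scale was fused to the weight
try:
    guard.set_dynamic(True)
    raised = False
except RuntimeError:
    raised = True
```

Disabling the flag (`set_dynamic(False)`) stays legal in this sketch even after fusion, since only the combination of both flags is documented as an error.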

property weight_maxabs_computed: bool

Checks if weight_maxabs is already calculated.

Returns: True if weight_maxabs is calculated, False otherwise.

class dmx.compressor.numerical.smoothquant.SmoothQuant(a_ch_axis: int, b_ch_axis: int, a_dynamic: bool = False, b_dynamic: bool = False, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', scale_min: float = 1e-05, **kwargs)

Bases: Module

SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).

https://arxiv.org/pdf/2211.10438.pdf
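
The scaling rule from the SmoothQuant paper can be sketched in plain Python as follows. This is an illustration, not the dmx implementation: the helper name `smoothquant_scale` is hypothetical, tensors are modeled as lists of per-channel maxima, and the way `scale_min` is applied here (as a floor on both operands) is an assumption.

```python
def smoothquant_scale(a_maxabs, b_maxabs, migration_strength=0.5, scale_min=1e-5):
    """Per-channel scale s_j = a_j**alpha / b_j**(1 - alpha).

    Hypothetical helper: a_maxabs and b_maxabs are lists holding the
    maximum absolute value of each shared channel of inputs A and B.
    """
    alpha = migration_strength
    scale = []
    for a_j, b_j in zip(a_maxabs, b_maxabs):
        # Assumption: floor both operands so an all-zero channel cannot
        # produce a zero scale or a division by zero.
        a_j = max(a_j, scale_min)
        b_j = max(b_j, scale_min)
        scale.append(a_j ** alpha / b_j ** (1.0 - alpha))
    return scale


a_maxabs = [16.0, 1.0]   # input A has an outlier in channel 0
b_maxabs = [1.0, 4.0]
scale = smoothquant_scale(a_maxabs, b_maxabs)

# Ranges after smoothing: A is divided by the scale, B is multiplied by it.
a_smoothed = [a / s for a, s in zip(a_maxabs, scale)]
b_smoothed = [b * s for b, s in zip(b_maxabs, scale)]
```

With `migration_strength=0.5` both inputs end up with the same per-channel range, the geometric mean of the two original ranges.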

Parameters:

a_ch_axis (int) – channel axis for input A of the MatMul
b_ch_axis (int) – channel axis for input B of the MatMul
a_dynamic (bool) – If set to True, the maximum value of input A will be calculated dynamically, default is False.
b_dynamic (bool) – If set to True, the maximum value of input B will be calculated dynamically, default is False.
migration_strength (float) – controls how much quantization difficulty we want to migrate from input A to input B, should be between 0 and 1, default is 0.5.
scale_format (str or dmx.Format) – the numerical format used to store and compute the scaling factors, default is “SAME”.
scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors, default is 1e-5.

`a_ch_axis`

channel axis for input A of the MatMul

Type: int

`b_ch_axis`

channel axis for input B of the MatMul

Type: int

`a_dynamic`

If set to True, the maximum value of input A will be calculated dynamically, default is False.

Type: bool

`b_dynamic`

If set to True, the maximum value of input B will be calculated dynamically, default is False.

Type: bool

`migration_strength`

controls how much quantization difficulty we want to migrate from input A to input B, should be between 0 and 1, default is 0.5.

Type: float

`scale_format`

the numerical format used to store and compute the scaling factors, default is “SAME”.

Type: str or dmx.Format

`scale_min`

minimum epsilon value used to prevent division by zero when calculating the scaling factors, default is 1e-5.

Type: float

`enabled`

If set to True, smoothQuant will be enabled for both input A and input B.

Type: bool

`scale`

scaling factors used to scale input A and input B to (input A / scale) and (input B * scale), respectively.

Type: Tensor
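
To illustrate why dividing input A and multiplying input B by the same per-channel factors leaves the MatMul result unchanged, here is a small pure-Python check. It assumes the scaled channel axis is the contraction (inner) axis of the MatMul; the matrices, the `scale` values, and the `matmul` helper are made up for the illustration.

```python
def matmul(x, y):
    """Plain nested-list matrix product: x is (m, k), y is (k, n)."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [
        [sum(x[i][p] * y[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[8.0, 1.0], [-16.0, 0.5]]   # input A, channels along the last axis
b = [[0.5, -1.0], [4.0, 2.0]]    # input B, channels along the first axis
scale = [4.0, 0.5]               # one factor per shared channel

# input A / scale: every column (channel) of A is divided by its factor.
a_scaled = [[v / s for v, s in zip(row, scale)] for row in a]
# input B * scale: every row (channel) of B is multiplied by its factor.
b_scaled = [[v * s for v in row] for row, s in zip(b, scale)]

reference = matmul(a, b)
smoothed = matmul(a_scaled, b_scaled)
```

The factors cancel inside each term of the inner product, so `smoothed` equals `reference` while the outlier in A's first channel shrinks from 16 to 4.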

`a_maxabs`

the maximum absolute value of input A

Type: Tensor

`b_maxabs`

the maximum absolute value of input B

Type: Tensor
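
A per-channel maximum absolute value can be sketched as below, under the assumption that the channel axis is kept and all other axes are reduced. `channel_maxabs` is a hypothetical helper operating on 2-D nested lists, not part of dmx.

```python
def channel_maxabs(matrix, ch_axis):
    """Return max(|x|) for every channel of a 2-D nested list.

    ch_axis=0 treats rows as channels, ch_axis=1 treats columns as channels.
    """
    if ch_axis not in (0, 1):
        raise ValueError("this sketch only handles 2-D inputs, so ch_axis must be 0 or 1")
    # Transpose when channels run along columns so each line is one channel.
    channels = matrix if ch_axis == 0 else zip(*matrix)
    return [max(abs(v) for v in channel) for channel in channels]


a = [[8.0, 1.0], [-16.0, 0.5]]
b = [[0.5, -1.0], [4.0, 2.0]]
a_maxabs = channel_maxabs(a, ch_axis=1)   # channels along the last axis of A
b_maxabs = channel_maxabs(b, ch_axis=0)   # channels along the first axis of B
```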

property a_maxabs_exists: bool

Checks if a_maxabs is already calculated.

Returns: True if a_maxabs is calculated, False otherwise.

property b_maxabs_exists: bool

Checks if b_maxabs is already calculated.

Returns: True if b_maxabs is calculated, False otherwise.

calibrating: bool = False

compute_scale(a_maxabs: Tensor, b_maxabs: Tensor) → None

Computes the scaling tensor.

Parameters:

a_maxabs (Tensor) – the maximum absolute value of input A
b_maxabs (Tensor) – the maximum absolute value of input B

disable() → None: Disables smoothQuant.

enable(enabled: bool = True) → None

Sets/resets the enabled flag.

Parameters: enabled (bool) – if set to True, smoothQuant is enabled, default is True.

extra_repr() → str: Returns the extra representation of smoothQuant

forward(a: Tensor, b: Tensor) → None

Computes the smoothQuant scaling tensor and scales inputs A and B

Parameters:

a (Tensor) – input tensor A
b (Tensor) – input tensor B

reset_a_maxabs() → None: Resets a_maxabs to an empty tensor.

reset_b_maxabs() → None: Resets b_maxabs to an empty tensor.

reset_scale() → None: Resets the scaling tensor to an empty tensor.

scale_a(a: Tensor) → Tensor

If smoothQuant is enabled, scales input A.

Parameters: a (Tensor) – input tensor that scaling will be applied on
Returns: scaled input tensor

scale_b(b: Tensor) → Tensor

If smoothQuant is enabled, scales input B.

Parameters: b (Tensor) – input tensor that scaling will be applied on
Returns: scaled input tensor

set_dynamic(a_dynamic: bool = True, b_dynamic: bool = True) → None

Sets/resets the dynamic flag for inputs A and B.

Parameters:

a_dynamic (bool) – if set to True, the maximum value of input A will be calculated dynamically, default is True.
b_dynamic (bool) – if set to True, the maximum value of input B will be calculated dynamically, default is True.

set_migration_strength(migration_strength: float) → None

Sets the migration_strength factor.

Parameters: migration_strength (float) – quantization difficulty migration factor, should be between 0 and 1.
Raises: ValueError – If migration_strength is less than 0.0 or greater than 1.0.
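
The range check and the effect of the two extreme settings can be sketched as follows. Both helpers are hypothetical and work on a single channel with scalar maxima; the scale formula is the one from the SmoothQuant paper, not code taken from dmx.

```python
def check_migration_strength(migration_strength):
    """Reject values outside [0, 1], mirroring the documented ValueError."""
    if not 0.0 <= migration_strength <= 1.0:
        raise ValueError(
            f"migration_strength must be between 0.0 and 1.0, got {migration_strength}"
        )
    return float(migration_strength)


def scaled_ranges(a_maxabs, b_maxabs, migration_strength):
    """Return the (input A, input B) ranges of one channel after smoothing."""
    alpha = check_migration_strength(migration_strength)
    scale = a_maxabs ** alpha / b_maxabs ** (1.0 - alpha)
    return a_maxabs / scale, b_maxabs * scale


no_migration = scaled_ranges(16.0, 1.0, 0.0)     # input A keeps the outlier
balanced = scaled_ranges(16.0, 1.0, 0.5)         # difficulty is shared
full_migration = scaled_ranges(16.0, 1.0, 1.0)   # input B absorbs the outlier
```

At 1.0 input A is normalized to a range of 1 and input B absorbs the product of both ranges; at 0.0 the roles are reversed.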

set_scale_format(format: str | Format = 'SAME') → None

Sets/resets the scale_format.

Parameters: format (str or dmx.Format) – the numerical format used to store and compute the scaling factors, default is “SAME”.