dmx.compressor.numerical.smoothquant

Classes

ActivationWeightSmoothQuant(ch_axis, win_ch_axis)

Derived class that applies SmoothQuant to an activation x weight MatMul.

SmoothQuant(a_ch_axis, b_ch_axis[, ...])

SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).

class dmx.compressor.numerical.smoothquant.ActivationWeightSmoothQuant(ch_axis: int, win_ch_axis: int, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', dynamic: bool = False, scale_min: float = 1e-05, **kwargs)

Bases: SmoothQuant

Derived class that applies SmoothQuant to an activation x weight MatMul.

Parameters:
  • ch_axis (int) – channel axis for the input activation tensor

  • win_ch_axis (int) – channel axis for the weight tensor

  • migration_strength (float) – controls how much quantization difficulty is migrated from activations to weights; should be between 0 and 1. Default is 0.5.

  • scale_format (str or dmx.Format) – the numerical format used to store and compute the scale. Default is “SAME”.

  • dynamic (bool) – If set to True, the maximum value of activations is calculated dynamically. Default is False.

  • scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.

`ch_axis`

channel axis for the input activation tensor

Type:

int

`win_ch_axis`

channel axis for the weight tensor

Type:

int

`fused_to_weight`

If set to True, the scaling factors are fused into the weights. Cannot be enabled when dynamic is set.

Type:

bool

compute_scale(inp_maxabs: Tensor) → None

Computes the scaling tensor.

Parameters:

inp_maxabs (Tensor) – the maximum absolute value of the input activation

property dynamic: Tensor

Checks if the dynamic flag is set for the input activation.

Returns:

A boolean tensor set to True if the dynamic flag is one, and set to False otherwise.

extra_repr() → str

Returns the extra representation of Activation x Weight smoothQuant.

forward(inp: Tensor, wgt: Tensor) → None

Computes the smoothQuant scaling tensor and scales the input activation and the weight.

Parameters:
  • inp (Tensor) – the input activation tensor

  • wgt (Tensor) – the weight tensor
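The two channel axes can be illustrated with a small NumPy sketch. It is not the dmx implementation: the helper names `maxabs_along` and `reshape_for_axis`, the tensor shapes, and the values `ch_axis = -1` and `win_ch_axis = 1` (a linear layer with weights stored as (out_features, in_features)) are assumptions chosen for illustration. The point is only that the same per-channel scale divides the activation along one axis and multiplies the weight along another, leaving the layer output unchanged.

```python
import numpy as np

def maxabs_along(x, ch_axis):
    """Max |x| per channel, reducing every axis except ch_axis."""
    ch_axis = ch_axis % x.ndim
    reduce_axes = tuple(i for i in range(x.ndim) if i != ch_axis)
    return np.abs(x).max(axis=reduce_axes)

def reshape_for_axis(scale, ndim, ch_axis):
    """Reshape a 1-D per-channel scale so it broadcasts along ch_axis."""
    shape = [1] * ndim
    shape[ch_axis] = scale.shape[0]
    return scale.reshape(shape)

rng = np.random.default_rng(0)
inp = rng.normal(size=(2, 3, 4))   # (batch, tokens, in_features)
wgt = rng.normal(size=(5, 4))      # (out_features, in_features)
ch_axis, win_ch_axis = -1, 1       # assumed: both point at in_features

inp_maxabs = maxabs_along(inp, ch_axis)
wgt_maxabs = maxabs_along(wgt, win_ch_axis)
scale = np.sqrt(inp_maxabs / wgt_maxabs)   # migration strength of 0.5

inp_s = inp / reshape_for_axis(scale, inp.ndim, ch_axis)
wgt_s = wgt * reshape_for_axis(scale, wgt.ndim, win_ch_axis)
out = inp_s @ wgt_s.T              # same result as inp @ wgt.T
```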

fuse_to_weight(wgt: Tensor) → None

Fuses the scaling factor to the weight tensor.

Parameters:

wgt (Tensor) – the weight tensor
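A rough sketch of what fusing means, written in NumPy with made-up scale values rather than the dmx API: when the scale is fixed after calibration, it can be multiplied into the stored weight once, so only the activation needs rescaling at run time. This also suggests why fusing would not combine with dynamic scaling, since a scale recomputed per input cannot be baked into the weight ahead of time.

```python
import numpy as np

rng = np.random.default_rng(1)
wgt = rng.normal(size=(3, 4))            # (out_features, in_features)
scale = np.array([0.5, 2.0, 4.0, 1.0])   # fixed per-channel scale (made-up values)

# Done once, ahead of time: fold the scale into the stored weight.
wgt_fused = wgt * scale                  # broadcasts over in_features

def linear_fused(inp):
    # Per call, only the activation is rescaled; the weight is used as stored.
    return (inp / scale) @ wgt_fused.T

inp = rng.normal(size=(2, 4))
out = linear_fused(inp)                  # matches inp @ wgt.T
```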

property input_maxabs_exists: bool

Checks if input_maxabs is already calculated.

Returns:

True if input_maxabs is calculated, False otherwise.

reset_weight_maxabs() → None

Resets weight maxabs.

scale_input(inp)

Scales the input activation.

Parameters:

inp (Tensor) – the input tensor that scaling will be applied on

Returns:

scaled input activation tensor

scale_weight(wgt)

Scales weight.

Parameters:

wgt (Tensor) – the weight tensor that scaling will be applied on

Returns:

scaled weight tensor

set_dynamic(dynamic: bool = True) → None

Sets/resets the dynamic flag for the input activation.

Parameters:

dynamic (bool) – if set to True, the maximum value of the input activation will be calculated dynamically, default is True.

Raises:

RuntimeError – If the dynamic and the fused_to_weight flags are both enabled.
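The documented guard can be pictured with a toy class. `ToySmoother` below is a hypothetical stand-in that models only the two flags; it is not the dmx class, and the error message is invented.

```python
class ToySmoother:
    """Hypothetical stand-in, not the dmx class: models only the two flags."""

    def __init__(self):
        self.dynamic = False
        self.fused_to_weight = False

    def set_dynamic(self, dynamic=True):
        # Refuse to turn on dynamic scaling once the scale lives in the weight.
        if dynamic and self.fused_to_weight:
            raise RuntimeError(
                "dynamic scaling cannot be combined with fused_to_weight"
            )
        self.dynamic = dynamic

smoother = ToySmoother()
smoother.fused_to_weight = True
try:
    smoother.set_dynamic(True)
    raised = False
except RuntimeError:
    raised = True
```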

property weight_maxabs_computed: bool

Checks if weight_maxabs is already calculated.

Returns:

True if weight_maxabs is calculated, False otherwise.

class dmx.compressor.numerical.smoothquant.SmoothQuant(a_ch_axis: int, b_ch_axis: int, a_dynamic: bool = False, b_dynamic: bool = False, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', scale_min: float = 1e-05, **kwargs)

Bases: Module

SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).

https://arxiv.org/pdf/2211.10438.pdf
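As a toy illustration of the idea (NumPy only, not this module's code), the sketch below applies the per-channel smoothing factor from the paper, max|A_j|^α / max|B_j|^(1-α) with α = 0.5, to a 2 x 2 example with one outlier channel. The helper `fake_quant`, the 15-level symmetric per-tensor quantizer, and the matrices are assumptions made for the demonstration.

```python
import numpy as np

def fake_quant(x, levels=7):
    """Symmetric per-tensor quantization to integers in [-levels, levels]."""
    step = np.abs(x).max() / levels
    return np.round(x / step) * step

# A is (tokens, channels) with an outlier channel; B is (channels, out_features).
A = np.array([[0.4, 70.0], [0.3, -70.0]])
B = np.array([[1.0], [0.01]])
exact = A @ B

# Per-channel statistics and the smoothing factor with migration strength 0.5.
a_maxabs = np.abs(A).max(axis=0)
b_maxabs = np.abs(B).max(axis=1)
scale = a_maxabs ** 0.5 / b_maxabs ** 0.5

A_s = A / scale            # the outlier channel of A is scaled down
B_s = B * scale[:, None]   # the matching row of B is scaled up

# Worst-case error of the quantized product, without and with smoothing.
err_plain = np.abs(fake_quant(A) @ fake_quant(B) - exact).max()
err_smooth = np.abs(fake_quant(A_s) @ fake_quant(B_s) - exact).max()
```

Before quantization, `A_s @ B_s` equals `A @ B`; after quantization, the smoothed pair gives the smaller error in this example because the outlier channel no longer dominates the quantization step of A.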

Parameters:
  • a_ch_axis (int) – channel axis for input A of the MatMul

  • b_ch_axis (int) – channel axis for input B of the MatMul

  • a_dynamic (bool) – If set to True, the maximum value of input A will be calculated dynamically, default is False.

  • b_dynamic (bool) – If set to True, the maximum value of input B will be calculated dynamically, default is False.

  • migration_strength (float) – controls how much quantization difficulty is migrated from input A to input B; should be between 0 and 1. Default is 0.5.

  • scale_format (str or dmx.Format) – the numerical format used to store and compute the scale. Default is “SAME”.

  • scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.
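The effect of migration_strength and scale_min can be sketched as follows. This assumes the scale follows the formula in the SmoothQuant paper and that scale_min is applied by clamping the per-channel statistics; the helper `smooth_scale` is hypothetical and the clamping strategy is an assumption about, not a description of, the dmx code.

```python
import numpy as np

def smooth_scale(a_maxabs, b_maxabs, migration_strength, scale_min=1e-5):
    # Assumption: clamp both statistics so a dead (all-zero) channel cannot
    # produce a zero or infinite scale.
    a = np.maximum(a_maxabs, scale_min)
    b = np.maximum(b_maxabs, scale_min)
    return a ** migration_strength / b ** (1.0 - migration_strength)

a_maxabs = np.array([16.0, 0.0])   # the second channel of A never activates
b_maxabs = np.array([4.0, 2.0])

scale_0 = smooth_scale(a_maxabs, b_maxabs, migration_strength=0.0)
scale_half = smooth_scale(a_maxabs, b_maxabs, migration_strength=0.5)
scale_1 = smooth_scale(a_maxabs, b_maxabs, migration_strength=1.0)
```

With a strength of 1 the scale equals the clamped maxabs of A, so A / scale has a per-channel maximum of 1 and all of the range moves into B. With a strength of 0 the scale is the reciprocal of the maxabs of B, so the roles are reversed. A strength of 0.5 splits the range evenly between the two inputs.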

`a_ch_axis`

channel axis for input A of the MatMul

Type:

int

`b_ch_axis`

channel axis for input B of the MatMul

Type:

int

`a_dynamic`

If set to True, the maximum value of input A will be calculated dynamically, default is False.

Type:

bool

`b_dynamic`

If set to True, the maximum value of input B will be calculated dynamically, default is False.

Type:

bool

`migration_strength`

controls how much quantization difficulty is migrated from input A to input B; should be between 0 and 1. Default is 0.5.

Type:

float

`scale_format`

the numerical format used to store and compute the scale. Default is “SAME”.

Type:

str or dmx.Format

`scale_min`

minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.

Type:

float

`enabled`

If set to True, smoothQuant will be enabled for both input A and input B

Type:

bool

`scale`

scaling factors used to scale input A and input B to (input A / scale) and (input B * scale), respectively.

Type:

Tensor

`a_maxabs`

the maximum absolute value of input A

Type:

Tensor

`b_maxabs`

the maximum absolute value of input B

Type:

Tensor

property a_maxabs_exists: bool

Checks if a_maxabs is already calculated.

Returns:

True if a_maxabs is calculated, False otherwise.

property b_maxabs_exists: bool

Checks if b_maxabs is already calculated.

Returns:

True if b_maxabs is calculated, False otherwise.

calibrating: bool = False

compute_scale(a_maxabs: Tensor, b_maxabs: Tensor) → None

Computes the scaling tensor.

Parameters:
  • a_maxabs (Tensor) – the maximum absolute value of input A

  • b_maxabs (Tensor) – the maximum absolute value of input B

disable() → None

Disables smoothQuant.

enable(enabled: bool = True) → None

Sets/resets the enabled flag.

Parameters:

enabled (bool) – if set to True, smoothQuant is enabled, default is True.

extra_repr() → str

Returns the extra representation of smoothQuant.

forward(a: Tensor, b: Tensor) → None

Computes the smoothQuant scaling tensor and scales inputs A and B.

Parameters:
  • a (Tensor) – input tensor A

  • b (Tensor) – input tensor B

reset_a_maxabs() → None

Resets a_maxabs to an empty tensor.

reset_b_maxabs() → None

Resets b_maxabs to an empty tensor.

reset_scale() → None

Resets the scaling tensor to an empty tensor.

scale_a(a: Tensor) → Tensor

If smoothQuant is enabled, scales input A.

Parameters:

a (Tensor) – input tensor that scaling will be applied on

Returns:

scaled input tensor

scale_b(b: Tensor) → Tensor

If smoothQuant is enabled, scales input B.

Parameters:

b (Tensor) – input tensor that scaling will be applied on

Returns:

scaled input tensor

set_dynamic(a_dynamic: bool = True, b_dynamic: bool = True) → None

Sets/resets the dynamic flag for inputs A and B.

Parameters:
  • a_dynamic (bool) – if set to True, the maximum value of input A will be calculated dynamically, default is True.

  • b_dynamic (bool) – if set to True, the maximum value of input B will be calculated dynamically, default is True.
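The difference between the two modes can be sketched in NumPy. The running-maximum calibration shown for the static case is an assumption about how a frozen statistic might be collected, not a statement of what dmx does; `channel_maxabs` and the sample data are made up.

```python
import numpy as np

def channel_maxabs(x):
    """Per-channel max |x| for a (tokens, channels) array."""
    return np.abs(x).max(axis=0)

calibration_batches = [
    np.array([[1.0, -8.0], [2.0, 4.0]]),
    np.array([[-3.0, 0.5], [0.5, 1.0]]),
]

# Static (assumed behaviour): keep a running maximum over calibration data,
# then freeze it for inference.
static_maxabs = np.zeros(2)
for batch in calibration_batches:
    static_maxabs = np.maximum(static_maxabs, channel_maxabs(batch))

# Dynamic: recompute the statistic from whatever tensor arrives at run time.
runtime_input = np.array([[0.25, -16.0]])
dynamic_maxabs = channel_maxabs(runtime_input)
```

The static statistic is cheap at run time but can miss values outside the calibration range, such as the -16.0 above. The dynamic one tracks every input at the cost of an extra reduction per call.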

set_migration_strength(migration_strength: float) → None

Sets the migration_strength factor.

Parameters:

migration_strength (float) – quantization difficulty migration factor, should be between 0 and 1.

Raises:

ValueError – If migration_strength is less than 0.0 or greater than 1.0.
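The range check can be pictured as below. `check_migration_strength` is a hypothetical helper that mirrors the documented behaviour; the error message is invented.

```python
def check_migration_strength(migration_strength: float) -> float:
    """Hypothetical validator mirroring the documented range check."""
    if not 0.0 <= migration_strength <= 1.0:
        raise ValueError(
            f"migration_strength must be in [0, 1], got {migration_strength}"
        )
    return migration_strength

accepted = check_migration_strength(0.5)   # inside the range, returned unchanged
try:
    check_migration_strength(1.5)          # outside the range
    rejected = False
except ValueError:
    rejected = True
```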

set_scale_format(format: str | Format = 'SAME') → None

Sets/resets the scale_format.

Parameters:

format (str or dmx.Format) – the numerical format used to store and compute the scale. Default is “SAME”.