dmx.compressor.numerical.smoothquant
Classes
- ActivationWeightSmoothQuant: This is the derived class for Activation x Weight smoothQuant.
- SmoothQuant: SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).
- class dmx.compressor.numerical.smoothquant.ActivationWeightSmoothQuant(ch_axis: int, win_ch_axis: int, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', dynamic: bool = False, scale_min: float = 1e-05, **kwargs)
Bases: SmoothQuant
This is the derived class for Activation x Weight smoothQuant.
- Parameters:
ch_axis (int) – channel axis for the input activation tensor
win_ch_axis (int) – channel axis for the weight tensor
migration_strength (float) – controls how much quantization difficulty is migrated from activations to weights; must be between 0 and 1. Default is 0.5.
scale_format (str or dmx.Format) – the numerical format used to store and compute the scaling factors. Default is “SAME”.
dynamic (bool) – if set to True, the maximum absolute value of the activations is calculated dynamically. Default is False.
scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.
- `ch_axis`
channel axis for the input activation tensor
- Type:
int
- `win_ch_axis`
channel axis for the weight tensor
- Type:
int
- `fused_to_weight`
If set to True, the scaling factors are fused into the weights. Cannot be enabled when dynamic is set.
- Type:
bool
- compute_scale(inp_maxabs: Tensor) → None
Computes the scaling tensor.
- Parameters:
inp_maxabs (Tensor) – the maximum absolute value of the input activation
- property dynamic: Tensor
Checks if the dynamic flag is set for the input activation.
- Returns:
A boolean tensor set to True if the dynamic flag is one, and False otherwise.
- extra_repr() → str
Returns the extra representation of Activation x Weight smoothQuant.
- forward(inp: Tensor, wgt: Tensor) → None
Computes the smoothQuant scaling tensor and scales the input activation and weight.
- Parameters:
inp (Tensor) – the input activation tensor
wgt (Tensor) – the weight tensor
- fuse_to_weight(wgt: Tensor) → None
Fuses the scaling factor to the weight tensor.
- Parameters:
wgt (Tensor) – the weight tensor
- property input_maxabs_exists: bool
Checks if input_maxabs is already calculated.
- Returns:
True if input_maxabs is calculated, False otherwise.
- reset_weight_maxabs() → None
Resets weight maxabs.
- scale_input(inp)
Scales the input activation.
- Parameters:
inp (Tensor) – the input tensor that scaling will be applied on
- Returns:
scaled input activation tensor
- scale_weight(wgt)
Scales the weight.
- Parameters:
wgt (Tensor) – the weight tensor that scaling will be applied on
- Returns:
scaled weight tensor
- set_dynamic(dynamic: bool = True) → None
Sets/resets the dynamic flag for the input activation.
- Parameters:
dynamic (bool) – if set to True, the maximum absolute value of the input activation will be calculated dynamically. Default is True.
- Raises:
RuntimeError – If the dynamic and the fused_to_weight flags are both enabled.
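The mutual exclusion between the dynamic and fused_to_weight flags can be pictured with a small stand-in class. This is only a sketch: `FlagGuardSketch` and its method bodies are hypothetical and do not reproduce the library's implementation. The rationale suggested in the comments (fusing bakes a fixed scale into the weights, while dynamic mode recomputes the scale per input) is an assumption, not something this reference states.

```python
class FlagGuardSketch:
    """Hypothetical stand-in for the two flags; not the dmx implementation."""

    def __init__(self, dynamic: bool = False) -> None:
        self.dynamic = dynamic
        self.fused_to_weight = False

    def set_dynamic(self, dynamic: bool = True) -> None:
        # Assumed rationale: a fused scale is fixed inside the weights, so it
        # cannot follow a scale that is recomputed from every new input.
        if dynamic and self.fused_to_weight:
            raise RuntimeError("dynamic and fused_to_weight cannot both be enabled")
        self.dynamic = dynamic

    def fuse_to_weight(self) -> None:
        # Assumed symmetric check: refuse to fuse while dynamic is enabled.
        if self.dynamic:
            raise RuntimeError("dynamic and fused_to_weight cannot both be enabled")
        self.fused_to_weight = True


guard = FlagGuardSketch()
guard.fuse_to_weight()
try:
    guard.set_dynamic(True)
except RuntimeError as err:
    print("rejected:", err)
```

Turning the flag off with `guard.set_dynamic(False)` remains valid in this sketch, since only the combination of both flags is rejected.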
- property weight_maxabs_computed: bool
Checks if weight_maxabs is already calculated.
- Returns:
True if weight_maxabs is calculated, False otherwise.
- class dmx.compressor.numerical.smoothquant.SmoothQuant(a_ch_axis: int, b_ch_axis: int, a_dynamic: bool = False, b_dynamic: bool = False, migration_strength: float = 0.5, scale_format: str | Format = 'SAME', scale_min: float = 1e-05, **kwargs)
Bases: Module
SmoothQuant is a quantization technique that reduces MatMul quantization error by migrating the quantization difficulty from the first input of the MatMul (input A) to the second input (input B).
https://arxiv.org/pdf/2211.10438.pdf
- Parameters:
a_ch_axis (int) – channel axis for input A of the MatMul
b_ch_axis (int) – channel axis for input B of the MatMul
a_dynamic (bool) – if set to True, the maximum absolute value of input A is calculated dynamically. Default is False.
b_dynamic (bool) – if set to True, the maximum absolute value of input B is calculated dynamically. Default is False.
migration_strength (float) – controls how much quantization difficulty is migrated from input A to input B; must be between 0 and 1. Default is 0.5.
scale_format (str or dmx.Format) – the numerical format used to store and compute the scaling factors. Default is “SAME”.
scale_min (float) – minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.
- `a_ch_axis`
channel axis for input A of the MatMul
- Type:
int
- `b_ch_axis`
channel axis for input B of the MatMul
- Type:
int
- `a_dynamic`
If set to True, the maximum absolute value of input A is calculated dynamically. Default is False.
- Type:
bool
- `b_dynamic`
If set to True, the maximum absolute value of input B is calculated dynamically. Default is False.
- Type:
bool
- `migration_strength`
Controls how much quantization difficulty is migrated from input A to input B; must be between 0 and 1. Default is 0.5.
- Type:
float
- `scale_format`
The numerical format used to store and compute the scaling factors. Default is “SAME”.
- Type:
str or dmx.Format
- `scale_min`
Minimum epsilon value used to prevent division by zero when calculating the scaling factors. Default is 1e-5.
- Type:
float
- `enabled`
If set to True, smoothQuant is enabled for both input A and input B.
- Type:
bool
- `scale`
scaling factors used to scale input A and input B to (input A / scale) and (input B * scale), respectively.
- Type:
Tensor
- `a_maxabs`
the maximum absolute value of input A
- Type:
Tensor
- `b_maxabs`
the maximum absolute value of input B
- Type:
Tensor
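As an illustration of what a per-channel maximum-absolute statistic looks like, the sketch below reduces a 2-D nested list along every axis except the channel axis. It uses plain Python lists instead of tensors, and both helper names are hypothetical. The running-maximum update is an assumption about how a static (non-dynamic) statistic could be accumulated over calibration batches; this reference does not describe the actual procedure.

```python
def maxabs_per_channel(x, ch_axis):
    """Return the largest absolute value in each channel of a 2-D nested list."""
    if ch_axis not in (0, 1):
        raise ValueError("this sketch only handles 2-D inputs")
    if ch_axis == 0:
        # Channels are rows: reduce over the values in each row.
        return [max(abs(v) for v in row) for row in x]
    # Channels are columns: reduce over the values in each column.
    return [max(abs(v) for v in col) for col in zip(*x)]


def update_running_maxabs(running, batch_maxabs):
    """Assumed calibration update: keep the element-wise maximum seen so far."""
    if running is None:
        return list(batch_maxabs)
    return [max(r, b) for r, b in zip(running, batch_maxabs)]


a = [[1.0, -100.0],
     [-2.0, 80.0]]
per_column = maxabs_per_channel(a, ch_axis=1)
running = update_running_maxabs(None, per_column)
running = update_running_maxabs(running, [3.0, 50.0])
print(per_column, running)
```

In a dynamic setting the statistic would presumably be recomputed from each incoming input rather than accumulated, which is why the two helpers are kept separate here.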
- property a_maxabs_exists: bool
Checks if a_maxabs is already calculated.
- Returns:
True if a_maxabs is calculated, False otherwise.
- property b_maxabs_exists: bool
Checks if b_maxabs is already calculated.
- Returns:
True if b_maxabs is calculated, False otherwise.
- calibrating: bool = False
- compute_scale(a_maxabs: Tensor, b_maxabs: Tensor) → None
Computes the scaling tensor.
- Parameters:
a_maxabs (Tensor) – the maximum absolute value of input A
b_maxabs (Tensor) – the maximum absolute value of input B
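The SmoothQuant paper linked in the class description defines the per-channel scale as `a_maxabs ** migration_strength / b_maxabs ** (1 - migration_strength)`. The sketch below applies that formula to plain Python lists. It is not the library's code: the free function is a hypothetical stand-in for the method, and clamping both statistics to at least `scale_min` is an assumption about how the epsilon is used.

```python
def compute_scale(a_maxabs, b_maxabs, migration_strength=0.5, scale_min=1e-5):
    """Per-channel scale following the SmoothQuant paper's formula."""
    if not 0.0 <= migration_strength <= 1.0:
        raise ValueError("migration_strength must be between 0 and 1")
    if len(a_maxabs) != len(b_maxabs):
        raise ValueError("a_maxabs and b_maxabs must have one entry per channel")
    scale = []
    for a_j, b_j in zip(a_maxabs, b_maxabs):
        # Assumption: clamp both statistics so neither the division nor the
        # resulting scale can hit zero.
        a_j = max(a_j, scale_min)
        b_j = max(b_j, scale_min)
        scale.append(a_j ** migration_strength / b_j ** (1.0 - migration_strength))
    return scale


print(compute_scale([4.0, 16.0], [1.0, 4.0]))
```

With the default strength of 0.5, each scale is the square root of the ratio between the two statistics, so a channel where input A is much larger than input B receives a scale above 1.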
- disable() → None
Disables smoothQuant.
- enable(enabled: bool = True) → None
Sets/resets the enabled flag.
- Parameters:
enabled (bool) – if set to True, smoothQuant is enabled, default is True.
- extra_repr() → str
Returns the extra representation of smoothQuant.
- forward(a: Tensor, b: Tensor) → None
Computes the smoothQuant scaling tensor and scales inputs A and B.
- Parameters:
a (Tensor) – input tensor A
b (Tensor) – input tensor B
- reset_a_maxabs() → None
Resets a_maxabs to an empty tensor.
- reset_b_maxabs() → None
Resets b_maxabs to an empty tensor.
- reset_scale() → None
Resets the scaling tensor to an empty tensor.
- scale_a(a: Tensor) → Tensor
If smoothQuant is enabled, scales input A.
- Parameters:
a (Tensor) – input tensor that scaling will be applied on
- Returns:
scaled input tensor
- scale_b(b: Tensor) → Tensor
If smoothQuant is enabled, scales input B.
- Parameters:
b (Tensor) – input tensor that scaling will be applied on
- Returns:
scaled input tensor
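Since input A is divided by the scale and input B is multiplied by it, the product of the two scaled inputs should match the product of the originals. The sketch below checks this with plain nested lists. It is illustrative only: the free functions are hypothetical stand-ins for the methods, and the layout (channels along the columns of A and the rows of B) is an assumption made for the example.

```python
def matmul(a, b):
    """Plain-Python matrix product of two 2-D nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def scale_a(a, scale):
    # Divide every column (channel) of A by its scale.
    return [[v / s for v, s in zip(row, scale)] for row in a]


def scale_b(b, scale):
    # Multiply every row (channel) of B by the matching scale.
    return [[v * s for v in row] for row, s in zip(b, scale)]


a = [[1.0, 100.0],
     [2.0, -80.0]]
b = [[0.5, -0.25],
     [0.01, 0.02]]
scale = [0.5, 8.0]

reference = matmul(a, b)
smoothed = matmul(scale_a(a, scale), scale_b(b, scale))
print(reference)
print(smoothed)
```

In this example the second channel of A shrinks from a magnitude of 100 to 12.5 while the matching row of B grows by the same factor, which is the sense in which quantization difficulty moves from input A to input B.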
- set_dynamic(a_dynamic: bool = True, b_dynamic: bool = True) → None
Sets/resets the dynamic flag for inputs A and B.
- Parameters:
a_dynamic (bool) – if set to True, the maximum absolute value of input A will be calculated dynamically. Default is True.
b_dynamic (bool) – if set to True, the maximum absolute value of input B will be calculated dynamically. Default is True.
- set_migration_strength(migration_strength: float) → None
Sets the migration_strength factor.
- Parameters:
migration_strength (float) – quantization difficulty migration factor; must be between 0 and 1.
- Raises:
ValueError – If migration_strength is less than 0.0 or greater than 1.0.
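To get a feel for what different migration_strength values do, the sketch below tracks a single channel and reports the maximum absolute values of input A and input B after scaling. It relies on the scale formula from the SmoothQuant paper. The helper `scaled_maxabs` is hypothetical, and the range check only mirrors the documented ValueError condition rather than the library's code.

```python
def scaled_maxabs(a_maxabs, b_maxabs, migration_strength):
    """Return (A / scale, B * scale) for one channel's maximum absolute values."""
    if not 0.0 <= migration_strength <= 1.0:
        raise ValueError("migration_strength must be between 0.0 and 1.0")
    scale = a_maxabs ** migration_strength / b_maxabs ** (1.0 - migration_strength)
    return a_maxabs / scale, b_maxabs * scale


for strength in (0.0, 0.5, 1.0):
    print(strength, scaled_maxabs(16.0, 4.0, strength))
```

Under this formula a strength of 0.5 leaves both inputs with the same post-scaling magnitude, while a strength of 1.0 normalizes input A to 1 and pushes the entire range onto input B.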