dmx.compressor.sparse.BlockTopK
- class dmx.compressor.sparse.BlockTopK(K=4, block_size=8, block_dim=-1, mask_gradient=False)
Fine-grain structured sparsity with K non-zeros out of block_size elements along block_dim.
- __init__(K=4, block_size=8, block_dim=-1, mask_gradient=False)
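To make the "K non-zeros out of block_size elements" pattern concrete, the following is a minimal pure-Python sketch of a block-wise top-K mask. It is an illustration under stated assumptions, not the library's implementation: the helper name block_topk_mask is hypothetical, it works on a flat list rather than a tensor dimension selected by block_dim, and it assumes that the entries with the largest scores in each block are the ones kept.

```python
def block_topk_mask(scores, K=4, block_size=8):
    """Return a 0/1 mask keeping the K largest scores in each block.

    Hypothetical helper for illustration only; it expects a flat list
    whose length is a multiple of block_size.
    """
    if len(scores) % block_size != 0:
        raise ValueError("len(scores) must be a multiple of block_size")
    mask = [0] * len(scores)
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        # Indices of the K largest scores in this block
        # (sorted() is stable, so ties go to the lower index).
        keep = sorted(range(block_size), key=lambda i: -block[i])[:K]
        for i in keep:
            mask[start + i] = 1
    return mask


scores = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.4,
          0.0, 1.0, 0.2, 0.6, 0.1, 0.9, 0.3, 0.5]
mask = block_topk_mask(scores, K=4, block_size=8)
print(mask)  # [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]
```

With the default K=4 and block_size=8, every block of eight consecutive elements keeps exactly four entries, which gives a fixed 50% sparsity with a regular per-block structure.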
Methods
- __init__([K, block_size, block_dim, ...])
- apply(*args, **kwargs)
- backward(ctx, grad_output): Define a formula for differentiating the operation with backward mode automatic differentiation.
- forward(ctx, score, params): Define the forward of the custom autograd Function.
- from_shorthand(sh)
- get_mask(score)
- jvp(ctx, *grad_inputs): Define a formula for differentiating the operation with forward mode automatic differentiation.
- mark_dirty(*args): Mark given tensors as modified in an in-place operation.
- mark_non_differentiable(*args): Mark outputs as non-differentiable.
- mark_shared_storage(*pairs)
- maybe_clear_saved_tensors
- name
- register_hook
- register_prehook
- save_for_backward(*tensors): Save given tensors for a future call to backward().
- save_for_forward(*tensors): Save given tensors for a future call to jvp().
- set_materialize_grads(value): Set whether to materialize grad tensors.
- setup_context(ctx, inputs, output): There are two ways to define the forward pass of an autograd.Function.
- vjp(ctx, *grad_outputs): Define a formula for differentiating the operation with backward mode automatic differentiation.
- vmap(info, in_dims, *args): Define the behavior for this autograd.Function underneath torch.vmap().
Attributes
- dirty_tensors
- generate_vmap_rule
- materialize_grads
- metadata
- needs_input_grad
- next_functions
- non_differentiable
- requires_grad
- saved_for_forward
- saved_tensors
- saved_variables
- to_save