dmx.compressor.modeling.hf.DmxPreTrainedModel

class dmx.compressor.modeling.hf.DmxPreTrainedModel(config: PretrainedConfig, *inputs, **kwargs)

__init__(config: PretrainedConfig, *inputs, **kwargs): Initialize internal Module state, shared by both nn.Module and ScriptModule.

Methods

`__init__`(config, *inputs, **kwargs)	Initialize internal Module state, shared by both nn.Module and ScriptModule.
`active_adapter`()
`active_adapters`()	Get the currently active adapters of the model. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`add_adapter`(adapter_config[, adapter_name])	Add a fresh adapter to the current model for training. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`add_memory_hooks`()	Add a memory hook before and after each sub-module forward pass to record increase in memory consumption.
`add_model_tags`(tags)	Add custom tags into the model that gets pushed to the Hugging Face Hub.
`add_module`(name, module)	Add a child module to the current module.
`apply`(fn)	Apply `fn` recursively to every submodule (as returned by `.children()`) as well as self.
`bfloat16`()	Casts all floating point parameters and buffers to `bfloat16` datatype.
`buffers`([recurse])	Return an iterator over module buffers.
`can_generate`()	Returns whether this model can generate sequences with .generate().
`check_dim_consistency`()	Check format dimension consistency and sparseness dimension consistency for all applicable dmx modules in the model.
`children`()	Return an iterator over immediate children modules.
`compile`(*args, **kwargs)	Compile this Module's forward using `torch.compile()`.
`compute_transition_scores`(sequences, scores)	Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was used).
`configure`(config, *rules)	Configure Dmx-specific numerics/sparsity/logics
`counting_flops`([zero])
`cpu`()	Move all model parameters and buffers to the CPU.
`create_extended_attention_mask_for_decoder`(...)
`create_submod_transform_forward`(model, ...)	Only supported for the fx path; in the export path, the submodule forward can be called directly.
`cuda`([device])	Move all model parameters and buffers to the GPU.
`deepcopy_args`(args)
`delete_adapter`(adapter_names)	Delete an adapter's LoRA layers from the underlying model.
`dequantize`()	Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.
`disable_adapters`()	Disable all adapters attached to the model. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`disable_input_require_grads`()	Removes the _require_grads_hook.
`double`()	Casts all floating point parameters and buffers to `double` datatype.
`enable_adapters`()	Enable the adapters attached to the model. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`enable_input_require_grads`()	Enables the gradients for the input embeddings.
`estimate_tokens`(input_dict)	Helper function to estimate the total number of tokens from the model inputs.
`eval`()	Set the module in evaluation mode.
`extra_repr`()	Return the extra representation of the module.
`float`(*args)	Casts all floating point parameters and buffers to `float` datatype.
`floating_point_ops`(input_dict[, ...])	Get number of (optionally, non-embeddings) floating-point operations for the forward and backward passes of a batch with this transformer model.
`fold_weights_and_biases`()	A function that applies the ops to the weights and biases using the corresponding formats.
`forward`(*input)	Define the computation performed at every call.
`forward_weight_hypernets`()
`freeze`([config_file])	A function that stores the state and ops format of the model to a config file
`from_pretrained`(*args, **kwargs)	Instantiate a pretrained PyTorch model from a pre-trained model configuration.
`generate`([inputs, generation_config, ...])	Generates sequences of token ids for models with a language modeling head.
`get_adapter_state_dict`([adapter_name])	Get the adapter state dict, which should contain only the weight tensors of the specified adapter. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`get_buffer`(target)	Return the buffer given by `target` if it exists, otherwise throw an error.
`get_compiled_call`(compile_config)	Return a torch.compile'd version of self.__call__.
`get_extended_attention_mask`(attention_mask, ...)	Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
`get_extra_state`()	Return any extra state to include in the module's state_dict.
`get_head_mask`(head_mask, num_hidden_layers)	Prepare the head mask if needed.
`get_input_embeddings`()	Returns the model's input embeddings.
`get_memory_footprint`([return_buffers])	Get the memory footprint of a model.
`get_monitoring_records`([submodules_to_monitor])
`get_output_embeddings`()	Returns the model's output embeddings.
`get_parameter`(target)	Return the parameter given by `target` if it exists, otherwise throw an error.
`get_position_embeddings`()
`get_runtime_records`()
`get_submodule`(target)	Return the submodule given by `target` if it exists, otherwise throw an error.
`gradient_checkpointing_disable`()	Deactivates gradient checkpointing for the current model.
`gradient_checkpointing_enable`([...])	Activates gradient checkpointing for the current model.
`half`(*args)	Casts all floating point parameters and buffers to `half` datatype.
`heal_tokens`(input_ids[, tokenizer])	Generates sequences of token ids for models with a language modeling head.
`init_weights`()	If needed prunes and maybe initializes weights.
`invert_attention_mask`(encoder_attention_mask)	Invert an attention mask (e.g., switches 0. and 1.).
`ipu`([device])	Move all model parameters and buffers to the IPU.
`is_backend_compatible`()
`is_same_signature`(_model, args, kwargs)
`keep_dmx_config`()
`load_adapter`([peft_model_id, adapter_name, ...])	Load adapter weights from file or remote Hub folder.
`load_state_dict`(state_dict, *[, strict, assign])
`measure_runtimes`(device[, submodules_to_measure])
`modules`()	Return an iterator over all modules in the network.
`monitoring`([submodules_to_monitor, ...])
`mtia`([device])	Move all model parameters and buffers to the MTIA.
`named_buffers`([prefix, recurse, ...])	Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
`named_children`()	Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
`named_dmx_modules`()	Returns a list of named modules that are dmx configurable.
`named_modules`([memo, prefix, remove_duplicate])	Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
`named_parameters`([prefix, recurse, ...])	Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
`num_parameters`([only_trainable, ...])	Get number of (optionally, trainable or non-embeddings) parameters in the module.
`parameters`([recurse])	Return an iterator over module parameters.
`post_init`()	A method executed at the end of each Transformer model initialization, to execute code that needs the model's modules properly initialized (such as weight initialization).
`post_process_gm`(_model, args, kwargs)
`prepare_inputs_for_generation`(input_ids[, ...])	Prepare the model inputs for generation.
`print_model_tree`([include_type])	A function that prints out the tree structure of a model
`process_inputs_for_export`(model, args, kwargs)
`prune_heads`(heads_to_prune)	Prunes heads of the base model.
`push_to_hub`(repo_id[, use_temp_dir, ...])	Upload the model file to the 🤗 Model Hub.
`register_backward_hook`(hook)	Register a backward hook on the module.
`register_buffer`(name, tensor[, persistent])	Add a buffer to the module.
`register_for_auto_class`([auto_class])	Register this class with a given auto class.
`register_forward_hook`(hook, *[, prepend, ...])	Register a forward hook on the module.
`register_forward_pre_hook`(hook, *[, ...])	Register a forward pre-hook on the module.
`register_full_backward_hook`(hook[, prepend])	Register a backward hook on the module.
`register_full_backward_pre_hook`(hook[, prepend])	Register a backward pre-hook on the module.
`register_load_state_dict_post_hook`(hook)	Register a post-hook to be run after module's `load_state_dict()` is called.
`register_load_state_dict_pre_hook`(hook)	Register a pre-hook to be run before module's `load_state_dict()` is called.
`register_module`(name, module)	Alias for `add_module()`.
`register_parameter`(name, param)	Add a parameter to the module.
`register_state_dict_post_hook`(hook)	Register a post-hook for the `state_dict()` method.
`register_state_dict_pre_hook`(hook)	Register a pre-hook for the `state_dict()` method.
`requires_grad_`([requires_grad])	Change if autograd should record operations on parameters in this module.
`reset_memory_hooks_state`()	Reset the `mem_rss_diff` attribute of each module (see `add_memory_hooks`).
`resize_position_embeddings`(...)
`resize_token_embeddings`([new_num_tokens, ...])	Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
`retrieve_modules_from_names`(names[, ...])
`reverse_bettertransformer`()	Reverts the transformation from `to_bettertransformer` so that the original modeling is used, for example in order to save the model.
`save_pretrained`(save_directory[, ...])	Save a model and its configuration file to a directory, so that it can be re-loaded using the `from_pretrained` class method.
`set_adapter`(adapter_name)	Set a specific adapter by forcing the model to use that adapter and disabling the others. For background on adapters and PEFT methods, see the PEFT documentation: https://huggingface.co/docs/peft
`set_extra_state`(state)	Set extra state contained in the loaded state_dict.
`set_input_embeddings`(value)	Set model's input embeddings.
`set_submodule`(target, module[, strict])	Set the submodule given by `target` if it exists, otherwise throw an error.
`share_memory`()	See `torch.Tensor.share_memory_()`.
`state_dict`(*args[, destination, prefix, ...])	Return a dictionary containing references to the whole state of the module.
`tensor_parallel`(device_mesh)	Tensor parallelize the model across the given device mesh.
`thaw`([config_file])	A function that transforms the model in place from a config file.
`tie_weights`()	Tie the weights between the input embeddings and the output embeddings.
`to`(*args, **kwargs)	Move and/or cast the parameters and buffers.
`to_baseline_mode`()
`to_basic_mode`([sbfp_weight_storage])	Configures a transformed DmxModel to the BASIC mode on dmx hardware.
`to_bettertransformer`()	Converts the model to use [PyTorch's native attention implementation](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html), integrated to Transformers through [Optimum library](https://huggingface.co/docs/optimum/bettertransformer/overview).
`to_empty`(*, device[, recurse])	Move the parameters and buffers to the specified device without copying storage.
`to_fp8_mode`()	Configures a transformed DmxModel to the FP8 mode on dmx hardware.
`to_old_forward`(_m)
`to_signature_key`(_m, _args, _kwargs)
`to_transformed_forward`(_m)
`train`([mode])	Set the module in training mode.
`transform`(config, *rules)	Configure Dmx-specific numerics/sparsity/logics
`type`(dst_type)	Casts all parameters and buffers to `dst_type`.
`warn_if_padding_and_no_attention_mask`(...)	Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
`xpu`([device])	Move all model parameters and buffers to the XPU.
`zero_grad`([set_to_none])	Reset gradients of all model parameters.
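
Many of the methods listed above (`named_parameters`, `parameters`, `named_children`, `register_forward_hook`, `eval`) are described in the table as standard `nn.Module` behavior. The following is a minimal sketch of how they fit together. It uses a plain `torch.nn.Module` as a stand-in, not a `DmxPreTrainedModel`, and the class name `TinyNet` and the helper `make_shape_hook` are hypothetical names chosen for illustration only.

```python
import torch
from torch import nn


# Hypothetical stand-in module; this is NOT a DmxPreTrainedModel.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyNet()

# named_parameters(): yields (name, parameter) pairs in registration order.
param_names = [name for name, _ in model.named_parameters()]

# parameters(): iterate over parameters to count their elements.
total_params = sum(p.numel() for p in model.parameters())

# register_forward_hook(): record the output shape of each immediate child.
output_shapes = {}


def make_shape_hook(name):
    def record_shape(module, inputs, output):
        output_shapes[name] = tuple(output.shape)

    return record_shape


handles = [
    child.register_forward_hook(make_shape_hook(name))
    for name, child in model.named_children()
]

# eval(): switch to evaluation mode before running inference.
model.eval()
with torch.no_grad():
    y = model(torch.zeros(3, 4))

# Remove the hooks once the shapes have been collected.
for handle in handles:
    handle.remove()
```

Since the hooks return removable handles, they can be attached temporarily for inspection without changing the model's behavior afterwards.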

Attributes

`T_destination`
`base_model`	The main body of the model.
`base_model_prefix`
`call_super_init`
`config_class`
`device`	The device on which the module is (assuming that all the module parameters are on the same device).
`dmx_config`	Returns the DmxConfig object for the model.
`dmx_module_names`	Returns a list of module names listed in a dmx_config.
`dtype`	The dtype of the module (assuming that all the module parameters have the same dtype).
`dummy_inputs`	Dummy inputs to do a forward pass in the network.
`dump_patches`
`framework`	Identifies that this is a PyTorch model.
`is_gradient_checkpointing`	Whether gradient checkpointing is activated for this model or not.
`is_parallelizable`
`loss_function`
`main_input_name`
`model_tags`
`op_set`	Returns a set of unique ops present in the model
`supports_gradient_checkpointing`
`supports_pp_plan`
`supports_tp_plan`	Returns whether the model has a tensor parallelism plan.
`training`
`transformed`
`additional_dmx_aware_mappings`
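
The `device` and `dtype` attributes above are described as assuming that all module parameters share one device and one dtype, and `training` reflects the mode set by `train()` and `eval()`. The sketch below illustrates that assumption on a plain `torch.nn.Module` stand-in, which does not itself expose `device` or `dtype`. The variable names are hypothetical, and this is not how `DmxPreTrainedModel` computes these attributes.

```python
import torch
from torch import nn

# Hypothetical stand-in model; this is NOT a DmxPreTrainedModel.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))

# Infer device and dtype from the first parameter, which is only meaningful
# if every parameter lives on the same device with the same dtype.
first_param = next(model.parameters())
inferred_device = first_param.device
inferred_dtype = first_param.dtype

# Check that the single-device assumption actually holds for this model.
all_same_device = all(p.device == inferred_device for p in model.parameters())

# double() casts all floating point parameters and buffers in place.
model.double()
dtype_after_cast = next(model.parameters()).dtype

# train() and eval() toggle the `training` flag.
model.eval()
training_after_eval = model.training
model.train()
training_after_train = model.training
```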