pytorch_extra_mhirano.nn.SelfAttention
- class pytorch_extra_mhirano.nn.SelfAttention(qdim: int, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False)
Self-attention module using DotProductAttention.
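The core computation can be sketched framework-free in NumPy. This assumes DotProductAttention computes softmax(q kᵀ) v over the key axis and divides the scores by √d only when scaled=True, as in standard scaled dot-product attention. The function names dot_product_attention and self_attention are hypothetical and are not part of pytorch_extra_mhirano.

```python
import numpy as np

def dot_product_attention(q, k, v, scaled=False):
    """Hypothetical sketch of (optionally scaled) dot-product attention.

    q: (..., L, d), k: (..., S, d), v: (..., S, dv)
    Returns (output, weights) with shapes (..., L, dv) and (..., L, S).
    """
    scores = q @ np.swapaxes(k, -1, -2)  # (..., L, S)
    if scaled:
        # scaled=True: divide by sqrt(d) to keep score magnitudes moderate
        scores = scores / np.sqrt(q.shape[-1])
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def self_attention(x, scaled=False):
    # Self-attention: queries, keys, and values all come from the same input
    return dot_product_attention(x, x, x, scaled=scaled)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))  # (batch, seq, feature)
out, w = self_attention(x, scaled=True)
print(out.shape, w.shape)  # (2, 5, 8) (2, 5, 5)
```

Each row of the weights sums to 1, so every output position is a convex combination of the value vectors.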
- Parameters
qdim – dimension of the model, i.e., dimension of Q.
dropout – dropout probability applied to attn_output_weights. Default: 0.0.
transform – if False, no linear projection is applied, i.e., q = Q, k = K, v = V. Default: True.
bias – if True, add bias as a module parameter. Default: True.
same_embd – if True, the projections share weights and biases, i.e., W1 = W2 = W3 and b1 = b2 = b3. Default: True.
add_bias_kv – add bias to the key and value sequences at dim=0. Default: None.
kdim – total number of features in key. Default: None.
vdim – total number of features in value. Default: None.
batch_first – if True, the input and output tensors are provided as (batch, seq, feature); if False, as (seq, batch, feature). Default: True.
scaled – if True, this performs scaled dot-product attention. Default: False.
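How transform, bias, same_embd, and batch_first could shape the inputs before attention is sketched below in NumPy. This is an assumption drawn only from the parameter descriptions: transform=True applies affine maps, same_embd=True uses one shared (W, b) for query, key, and value, and bias=False drops the additive term. The helpers project_qkv, _linear, and to_batch_first and the params dictionary layout are hypothetical.

```python
import numpy as np

def _linear(x, w, b, bias):
    # Affine map x @ w (+ b); bias=False drops the additive term
    return x @ w + b if bias else x @ w

def project_qkv(x, params, transform=True, bias=True, same_embd=True):
    """Hypothetical sketch of how transform, bias, and same_embd shape q, k, v.

    x has shape (batch, seq, qdim); params holds weight matrices and bias vectors.
    """
    if not transform:
        # transform=False: q = Q, k = K, v = V (no projection at all)
        return x, x, x
    if same_embd:
        # same_embd=True: one shared projection (W1 = W2 = W3, b1 = b2 = b3)
        y = _linear(x, params["w"], params["b"], bias)
        return y, y, y
    # same_embd=False: independent projections for query, key, and value
    q = _linear(x, params["w1"], params["b1"], bias)
    k = _linear(x, params["w2"], params["b2"], bias)
    v = _linear(x, params["w3"], params["b3"], bias)
    return q, k, v

def to_batch_first(x, batch_first):
    # batch_first=False means x is laid out as (seq, batch, feature)
    return x if batch_first else np.swapaxes(x, 0, 1)

rng = np.random.default_rng(0)
qdim = 4
x = rng.standard_normal((3, 6, qdim))  # (batch, seq, feature)
shared = {"w": rng.standard_normal((qdim, qdim)), "b": np.zeros(qdim)}
q, k, v = project_qkv(x, shared)
print(q.shape, np.array_equal(q, k))  # (3, 6, 4) True
```

In self-attention, sharing the projection (same_embd=True) makes q, k, and v identical, which reduces the parameter count to a single weight matrix and bias.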
- __init__(qdim: int, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False) -> None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
Methods
__init__(qdim[, dropout, transform, bias, ...]) – Initializes internal Module state, shared by both nn.Module and ScriptModule.
add_module(name, module) – Adds a child module to the current module.
apply(fn) – Applies fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Returns an iterator over module buffers.
children() – Returns an iterator over immediate children modules.
cpu() – Moves all model parameters and buffers to the CPU.
cuda([device]) – Moves all model parameters and buffers to the GPU.
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Sets the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(inputs[, key_padding_mask, ...]) – Defines the computation performed at every call.
get_buffer(target) – Returns the buffer given by target if it exists, otherwise throws an error.
get_extra_state() – Returns any extra state to include in the module's state_dict.
get_parameter(target) – Returns the parameter given by target if it exists, otherwise throws an error.
get_submodule(target) – Returns the submodule given by target if it exists, otherwise throws an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Moves all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict]) – Copies parameters and buffers from state_dict into this module and its descendants.
modules() – Returns an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Returns an iterator over module parameters.
register_backward_hook(hook) – Registers a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Adds a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Registers a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Registers a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Registers a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Registers a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Registers a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Adds a parameter to the module.
register_state_dict_pre_hook(hook) – These hooks will be called with arguments: self, prefix, and keep_vars before calling state_dict on self.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – This function is called from load_state_dict() to handle any extra state found within the state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Returns a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Moves and/or casts the parameters and buffers.
to_empty(*, device) – Moves the parameters and buffers to the specified device without copying storage.
train([mode]) – Sets the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Sets gradients of all model parameters to zero.
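forward() accepts an optional key_padding_mask. A common way such a mask enters dot-product attention is sketched below in NumPy: padded key positions get a score of -inf before the softmax, so they receive zero weight. This page does not state the mask convention, its dtype, or what forward() returns, so the sketch assumes the torch.nn.MultiheadAttention convention (a boolean mask where True marks padding). masked_self_attention is a hypothetical name.

```python
import numpy as np

def masked_self_attention(x, key_padding_mask=None, scaled=False):
    """Hypothetical self-attention over x of shape (batch, seq, feature).

    key_padding_mask: optional bool array (batch, seq); True marks a padded
    key to ignore (assumed convention, borrowed from torch.nn.MultiheadAttention).
    Every row must keep at least one unmasked key, or the softmax yields NaN.
    """
    scores = x @ np.swapaxes(x, -1, -2)  # (batch, seq_q, seq_k)
    if scaled:
        scores = scores / np.sqrt(x.shape[-1])
    if key_padding_mask is not None:
        # (batch, 1, seq_k) broadcasts over every query position
        scores = np.where(key_padding_mask[:, None, :], -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)  # exp(-inf) == 0, so masked keys get zero weight
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

x = np.arange(12, dtype=float).reshape(1, 4, 3)
mask = np.array([[False, False, True, True]])  # last two positions are padding
out, w = masked_self_attention(x, mask, scaled=True)
print(w[0, :, 2:].max())  # 0.0
```

Because masked keys receive exactly zero weight, every output row depends only on the unpadded positions.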
Attributes
T_destination – alias of TypeVar('T_destination', bound=Dict[str, Any])
call_super_init
dump_patches