pytorch_extra_mhirano.nn.DotProductAttention

class pytorch_extra_mhirano.nn.DotProductAttention(qdim: int, output_dim: Optional[int] = None, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False)

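DotProductAttention. Each query attends over the keys through softmax-normalized dot products, and the resulting weights are used to average the values.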

\[ \begin{aligned} \mathrm{DotProductAttention}(Q, K, V) &= \mathrm{softmax}(q k^\top)\, v \\ q &= Q W_1 + b_1 \\ k &= K W_2 + b_2 \\ v &= V W_3 + b_3 \end{aligned} \]
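The equation can be read as the computation below. This is a minimal sketch in plain PyTorch, not the library's implementation: the function name `dot_product_attention_sketch` is hypothetical, inputs are assumed to be batch_first, and the 1/sqrt(d) factor applied when `scaled=True` is an assumption based on the usual definition of scaled dot-product attention.

```python
import math

import torch


def dot_product_attention_sketch(
    Q: torch.Tensor,
    K: torch.Tensor,
    V: torch.Tensor,
    w_q: torch.nn.Linear,
    w_k: torch.nn.Linear,
    w_v: torch.nn.Linear,
    scaled: bool = False,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical reference for softmax(q k^T) v.

    Shapes (batch_first assumed): Q (N, L, E), K (N, S, E), V (N, S, E).
    nn.Linear computes x @ weight.T + bias, so W_1 corresponds to w_q.weight.T.
    """
    q = w_q(Q)  # q = Q W_1 + b_1
    k = w_k(K)  # k = K W_2 + b_2
    v = w_v(V)  # v = V W_3 + b_3
    scores = q @ k.transpose(-2, -1)  # (N, L, S) dot products
    if scaled:
        # Assumption: the standard 1/sqrt(d) scaling of scaled dot-product attention.
        scores = scores / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1 over the S keys
    return weights @ v, weights


torch.manual_seed(0)
E = 8
w_q, w_k, w_v = (torch.nn.Linear(E, E) for _ in range(3))
Q = torch.randn(2, 5, E)   # (batch, target length, features)
KV = torch.randn(2, 7, E)  # (batch, source length, features)
out, attn = dot_product_attention_sketch(Q, KV, KV, w_q, w_k, w_v)
print(out.shape, attn.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 5, 7])
```

The output keeps the query length L and the value feature size, while the weights have one row per query and one column per key.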

Parameters

qdim – dimension of the model, i.e., the feature dimension of Q.
output_dim – dimension of the output layer, i.e., the feature dimension of the output. Default: None.
dropout – dropout probability of a Dropout layer applied to attn_output_weights. Default: 0.0.
transform – If False, no projections are applied, i.e., q = Q, k = K, v = V. Default: True.
bias – If True, the biases b1, b2, b3 are added as learnable module parameters. Default: True.
same_embd – If True, the query, key, and value projections share parameters: W1 = W2 = W3 and b1 = b2 = b3. Default: True.
add_bias_kv – If specified, adds bias to the key and value sequences at dim=0. Default: None.
kdim – total number of features in key. Default: None.
vdim – total number of features in value. Default: None. Note: if kdim and vdim are None, they are set to qdim, so that query, key, and value have the same number of features.
batch_first – If True, the input and output tensors are provided as (batch, seq, feature); if False, as (seq, batch, feature). Default: True.
scaled – If True, this performs scaled dot-product attention. Default: False.
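The interaction of transform, bias, and same_embd can be illustrated by building the three projections by hand. This is a minimal sketch, assuming each projection maps qdim features to qdim features and ignoring kdim, vdim, and add_bias_kv; `make_projections_sketch` and `count_unique_params` are hypothetical helpers, not part of the library.

```python
import torch
from torch import nn


def make_projections_sketch(
    qdim: int, transform: bool = True, bias: bool = True, same_embd: bool = True
) -> tuple[nn.Module, nn.Module, nn.Module]:
    """Hypothetical: build the maps (W1, b1), (W2, b2), (W3, b3).

    Assumes qdim -> qdim projections; kdim/vdim are not modeled here.
    """
    if not transform:
        # transform=False: q = Q, k = K, v = V, no learnable parameters
        return nn.Identity(), nn.Identity(), nn.Identity()
    if same_embd:
        # same_embd=True: one shared layer, so W1 = W2 = W3 and b1 = b2 = b3
        shared = nn.Linear(qdim, qdim, bias=bias)
        return shared, shared, shared
    # same_embd=False: three independent layers
    return tuple(nn.Linear(qdim, qdim, bias=bias) for _ in range(3))


def count_unique_params(modules) -> int:
    """Count parameter elements, counting a shared tensor only once."""
    unique = {id(p): p for m in modules for p in m.parameters()}
    return sum(p.numel() for p in unique.values())


print(count_unique_params(make_projections_sketch(16)))                   # 272
print(count_unique_params(make_projections_sketch(16, same_embd=False)))  # 816
print(count_unique_params(make_projections_sketch(16, transform=False)))  # 0
```

Sharing one layer cuts the projection parameters to a third (16 * 16 + 16 = 272 instead of 816 here), at the cost of forcing q, k, and v to live in the same projected space.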

Examples

>>> from pytorch_extra_mhirano.nn import DotProductAttention
>>> attn = DotProductAttention(query_dim)
>>> attn_output, attn_output_weights = attn(query, key, value)
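Because batch_first defaults to True, tensors laid out as (seq, batch, feature) need to be transposed before they are passed in. A minimal sketch in plain PyTorch; the sizes are arbitrary illustrative values.

```python
import torch

seq_len, batch, feature = 5, 2, 16
x_seq_first = torch.randn(seq_len, batch, feature)  # (seq, batch, feature)
# Swap the first two axes to get the (batch, seq, feature) layout batch_first=True expects.
x_batch_first = x_seq_first.transpose(0, 1)
print(x_batch_first.shape)  # torch.Size([2, 5, 16])
```

transpose returns a view, so no data is copied; element [b, s] of the new tensor is element [s, b] of the original.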

__init__(qdim: int, output_dim: Optional[int] = None, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False) → None

Initializes internal Module state, shared by both nn.Module and ScriptModule.

Methods

`__init__`(qdim[, output_dim, dropout, ...])	Initializes internal Module state, shared by both nn.Module and ScriptModule.
`add_module`(name, module)	Adds a child module to the current module.
`apply`(fn)	Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self.
`bfloat16`()	Casts all floating point parameters and buffers to `bfloat16` datatype.
`buffers`([recurse])	Returns an iterator over module buffers.
`children`()	Returns an iterator over immediate children modules.
`cpu`()	Moves all model parameters and buffers to the CPU.
`cuda`([device])	Moves all model parameters and buffers to the GPU.
`double`()	Casts all floating point parameters and buffers to `double` datatype.
`eval`()	Sets the module in evaluation mode.
`extra_repr`()	Set the extra representation of the module
`float`()	Casts all floating point parameters and buffers to `float` datatype.
`forward`(query, key, value[, ...])	query: query embeddings of shape (L, E_q) for unbatched input, (L, N, E_q) when `batch_first=False`, or (N, L, E_q) when `batch_first=True`.
`generate_square_subsequent_mask`(sz)	Generate a square mask for the sequence.
`get_buffer`(target)	Returns the buffer given by `target` if it exists, otherwise throws an error.
`get_extra_state`()	Returns any extra state to include in the module's state_dict.
`get_parameter`(target)	Returns the parameter given by `target` if it exists, otherwise throws an error.
`get_submodule`(target)	Returns the submodule given by `target` if it exists, otherwise throws an error.
`half`()	Casts all floating point parameters and buffers to `half` datatype.
`ipu`([device])	Moves all model parameters and buffers to the IPU.
`load_state_dict`(state_dict[, strict])	Copies parameters and buffers from `state_dict` into this module and its descendants.
`modules`()	Returns an iterator over all modules in the network.
`named_buffers`([prefix, recurse, ...])	Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
`named_children`()	Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
`named_modules`([memo, prefix, remove_duplicate])	Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
`named_parameters`([prefix, recurse, ...])	Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
`parameters`([recurse])	Returns an iterator over module parameters.
`register_backward_hook`(hook)	Registers a backward hook on the module.
`register_buffer`(name, tensor[, persistent])	Adds a buffer to the module.
`register_forward_hook`(hook, *[, prepend, ...])	Registers a forward hook on the module.
`register_forward_pre_hook`(hook, *[, ...])	Registers a forward pre-hook on the module.
`register_full_backward_hook`(hook[, prepend])	Registers a backward hook on the module.
`register_full_backward_pre_hook`(hook[, prepend])	Registers a backward pre-hook on the module.
`register_load_state_dict_post_hook`(hook)	Registers a post hook to be run after module's `load_state_dict` is called.
`register_module`(name, module)	Alias for `add_module()`.
`register_parameter`(name, param)	Adds a parameter to the module.
`register_state_dict_pre_hook`(hook)	These hooks will be called with arguments: `self`, `prefix`, and `keep_vars` before calling `state_dict` on `self`.
`requires_grad_`([requires_grad])	Change if autograd should record operations on parameters in this module.
`set_extra_state`(state)	This function is called from `load_state_dict()` to handle any extra state found within the state_dict.
`share_memory`()	See `torch.Tensor.share_memory_()`
`state_dict`(*args[, destination, prefix, ...])	Returns a dictionary containing references to the whole state of the module.
`to`(*args, **kwargs)	Moves and/or casts the parameters and buffers.
`to_empty`(*, device)	Moves the parameters and buffers to the specified device without copying storage.
`train`([mode])	Sets the module in training mode.
`type`(dst_type)	Casts all parameters and buffers to `dst_type`.
`xpu`([device])	Moves all model parameters and buffers to the XPU.
`zero_grad`([set_to_none])	Sets gradients of all model parameters to zero.
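`generate_square_subsequent_mask` is listed without details. The sketch below assumes it follows the convention of `torch.nn.Transformer.generate_square_subsequent_mask`: a float mask that is -inf above the diagonal and 0 elsewhere, added to the attention scores before the softmax. `square_subsequent_mask_sketch` is a hypothetical stand-in, not the library's method.

```python
import torch


def square_subsequent_mask_sketch(sz: int) -> torch.Tensor:
    """Hypothetical: causal mask with -inf above the diagonal and 0 on/below it.

    Assumes the same convention as torch.nn.Transformer.generate_square_subsequent_mask.
    """
    return torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)


mask = square_subsequent_mask_sketch(3)
# Adding the mask to (here, all-zero) scores blocks attention to future positions.
scores = torch.zeros(3, 3) + mask
weights = torch.softmax(scores, dim=-1)
# Row i spreads its weight uniformly over positions 0..i only.
print(weights)
```

Since exp(-inf) = 0, masked positions receive exactly zero weight, and every row keeps at least its diagonal entry, so the softmax is always well defined.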

Attributes

`T_destination`	alias of TypeVar('T_destination', bound=Dict[str, Any])
`call_super_init`
`dump_patches`
