pytorch_extra_mhirano.nn.DotProductAttention
- class pytorch_extra_mhirano.nn.DotProductAttention(qdim: int, output_dim: Optional[int] = None, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False)
DotProductAttention.
\[ \begin{aligned} \mathrm{DotProductAttention}(Q, K, V) &= \mathrm{softmax}(qk^T) v \\ q &= QW_1 + b_1 \\ k &= KW_2 + b_2 \\ v &= VW_3 + b_3 \end{aligned} \]
- Parameters
qdim – dimension of the model, i.e., dimension of Q
output_dim – dimension of output layer, i.e., dimension of output. Default: None
dropout – dropout probability applied to attn_output_weights. Default: 0.0.
transform – q = Q, k = K, v = V if it is False. Default: True
bias – add bias as module parameter. Default: True.
same_embd – W1 = W2 = W3, b1 = b2 = b3 if it is True. Default: True
add_bias_kv – add bias to the key and value sequences at dim=0. Default: None.
kdim – total number of features in key. Default: None.
vdim – total number of features in value. Default: None. Note: if kdim and vdim are None, they will be set to qdim such that query, key, and value have the same number of features.
batch_first – If True, the input and output tensors are provided as (batch, seq, feature); if False, as (seq, batch, feature). Default: True.
scaled – If True, performs scaled dot-product attention. Default: False.
- Examples::
>>> attn = DotProductAttention(query_dim)
>>> attn_output, attn_output_weights = attn(query, key, value)
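The formula above can be sketched without any framework, to make the arithmetic visible. The following is a minimal illustration in plain Python, not the library's implementation: it takes already-projected q, k, v (the transform=False case), and the helper names `softmax` and `dot_product_attention` are hypothetical. It also assumes that scaled=True divides the scores by the square root of the feature dimension, which is the usual convention but is not confirmed by this page.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(
    q: Matrix, k: Matrix, v: Matrix, scaled: bool = False
) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(q k^T) v on projected inputs.

    Assumption: scaled=True multiplies the scores by 1/sqrt(feature dim);
    the library may define the scaling differently.
    """
    dim = len(q[0])
    scale = 1.0 / math.sqrt(dim) if scaled else 1.0
    # scores[i][j] = <q_i, k_j> * scale
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # output[i] = sum_j weights[i][j] * v_j
    output = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    return output, weights


# Two queries attending over three key/value pairs (toy numbers).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, weights = dot_product_attention(q, k, v)
scaled_output, scaled_weights = dot_product_attention(q, k, v, scaled=True)
```

Each row of `weights` sums to 1, and `output` has one row per query with the feature width of `v`. Under the stated scaling assumption, the scaled variant yields a flatter weight distribution for the same inputs.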
- __init__(qdim: int, output_dim: Optional[int] = None, dropout: float = 0.0, transform: bool = True, bias: bool = True, same_embd: bool = True, add_bias_kv: Optional[bool] = None, kdim: Optional[int] = None, vdim: Optional[int] = None, batch_first: bool = True, scaled: bool = False) -> None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
Methods
__init__(qdim[, output_dim, dropout, ...]) – Initializes internal Module state, shared by both nn.Module and ScriptModule.
add_module(name, module) – Adds a child module to the current module.
apply(fn) – Applies fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Returns an iterator over module buffers.
children() – Returns an iterator over immediate children modules.
cpu() – Moves all model parameters and buffers to the CPU.
cuda([device]) – Moves all model parameters and buffers to the GPU.
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Sets the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(query, key, value[, ...]) – param query: Query embeddings of shape \((L, E_q)\) for unbatched input, \((L, N, E_q)\) when batch_first=False
generate_square_subsequent_mask(sz) – Generate a square mask for the sequence.
get_buffer(target) – Returns the buffer given by target if it exists, otherwise throws an error.
get_extra_state() – Returns any extra state to include in the module's state_dict.
get_parameter(target) – Returns the parameter given by target if it exists, otherwise throws an error.
get_submodule(target) – Returns the submodule given by target if it exists, otherwise throws an error.
half() – Casts all floating point parameters and buffers to half datatype.
ipu([device]) – Moves all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict]) – Copies parameters and buffers from state_dict into this module and its descendants.
modules() – Returns an iterator over all modules in the network.
named_buffers([prefix, recurse, ...]) – Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]) – Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]) – Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Returns an iterator over module parameters.
register_backward_hook(hook) – Registers a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Adds a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]) – Registers a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]) – Registers a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]) – Registers a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]) – Registers a backward pre-hook on the module.
register_load_state_dict_post_hook(hook) – Registers a post hook to be run after module's load_state_dict is called.
register_module(name, module) – Alias for add_module().
register_parameter(name, param) – Adds a parameter to the module.
register_state_dict_pre_hook(hook) – These hooks will be called with arguments: self, prefix, and keep_vars before calling state_dict on self.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
set_extra_state(state) – This function is called from load_state_dict() to handle any extra state found within the state_dict.
share_memory() – See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]) – Returns a dictionary containing references to the whole state of the module.
to(*args, **kwargs) – Moves and/or casts the parameters and buffers.
to_empty(*, device) – Moves the parameters and buffers to the specified device without copying storage.
train([mode]) – Sets the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Sets gradients of all model parameters to zero.
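The generate_square_subsequent_mask(sz) entry above is described only as generating a square mask for the sequence. As a rough illustration of what such a mask typically does, the sketch below builds an additive causal mask in plain Python and applies it before a softmax. It assumes the convention of torch.nn.Transformer.generate_square_subsequent_mask (0.0 where attention is allowed, -inf above the diagonal); this class may represent the mask differently, and the names `square_subsequent_mask` and `masked_softmax` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def square_subsequent_mask(sz: int) -> Matrix:
    """Additive causal mask: 0.0 on and below the diagonal, -inf above it.

    Assumption: mirrors torch.nn.Transformer.generate_square_subsequent_mask;
    the method on this class may use another representation.
    """
    return [[0.0 if j <= i else -math.inf for j in range(sz)] for i in range(sz)]


def masked_softmax(scores: Matrix, mask: Matrix) -> Matrix:
    """Add the mask to the scores, then apply a row-wise softmax."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        shifted = [s + m for s, m in zip(score_row, mask_row)]
        top = max(shifted)  # finite, because the diagonal is never masked
        exps = [math.exp(x - top) for x in shifted]  # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


mask = square_subsequent_mask(3)
# Uniform scores: each position spreads its weight over itself and earlier positions.
weights = masked_softmax([[0.0, 0.0, 0.0] for _ in range(3)], mask)
```

With uniform scores, position 0 attends only to itself, position 1 splits its weight evenly over positions 0 and 1, and no position places weight on a later one.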
Attributes
T_destination – alias of TypeVar('T_destination', bound=Dict[str, Any])
call_super_init
dump_patches