Attention layer
- Original Link : https://keras.io/api/layers/attention_layers/attention/
- Last Checked at : 2024-11-25
Attention class
keras.layers.Attention(
    use_scale=False, score_mode="dot", dropout=0.0, seed=None, **kwargs
)

Dot-product attention layer, a.k.a. Luong-style attention.
Inputs are a list with 2 or 3 elements: 1. A query tensor of shape (batch_size, Tq, dim). 2. A value tensor of shape (batch_size, Tv, dim). 3. An optional key tensor of shape (batch_size, Tv, dim). If none is supplied, value will be used as the key.
The calculation follows the steps: 1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv). 2. Use scores to calculate a softmax distribution with shape (batch_size, Tq, Tv). 3. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim).
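The following is a minimal sketch of these three steps written with keras.ops, assuming the default "dot" score mode with no scaling and no dropout; the tensor shapes are arbitrary example values.

```python
import keras
from keras import ops

# Illustrative shapes only: batch_size=2, Tq=4, Tv=6, dim=8.
query = keras.random.normal((2, 4, 8))
value = keras.random.normal((2, 6, 8))
key = value  # if no key is supplied, value is used as the key

# 1. Attention scores from query and key: (batch_size, Tq, Tv)
scores = ops.matmul(query, ops.transpose(key, axes=(0, 2, 1)))
# 2. Softmax distribution over the Tv positions: (batch_size, Tq, Tv)
distribution = ops.softmax(scores, axis=-1)
# 3. Linear combination of value: (batch_size, Tq, dim)
output = ops.matmul(distribution, value)
print(output.shape)  # (2, 4, 8)
```

Calling keras.layers.Attention()([query, value]) on the same inputs should perform the equivalent computation.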
Arguments
- use_scale: If True, will create a scalar variable to scale the attention scores.
- dropout: Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
- seed: A Python integer to use as random seed in case of dropout.
- score_mode: Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.
Call arguments
- inputs: List of the following tensors:
  - query: Query tensor of shape (batch_size, Tq, dim).
  - value: Value tensor of shape (batch_size, Tv, dim).
  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
- mask: List of the following tensors:
  - query_mask: A boolean mask tensor of shape (batch_size, Tq). If given, the output will be zero at the positions where mask==False.
  - value_mask: A boolean mask tensor of shape (batch_size, Tv). If given, will apply the mask such that values at positions where mask==False do not contribute to the result.
- return_attention_scores: bool, if True, returns the attention scores (after masking and softmax) as an additional output argument.
- training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout).
- use_causal_mask: Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. Defaults to False.
Output:
- Attention outputs of shape (batch_size, Tq, dim).
- (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv).
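A minimal usage sketch of the call arguments and output shapes above; the input tensors and masks are made-up example data.

```python
import keras
from keras import ops

# Illustrative inputs: batch_size=2, Tq=4, Tv=6, dim=8.
query = keras.random.normal((2, 4, 8))
value = keras.random.normal((2, 6, 8))
query_mask = ops.ones((2, 4), dtype="bool")
value_mask = ops.ones((2, 6), dtype="bool")

layer = keras.layers.Attention()
outputs, scores = layer(
    [query, value],
    mask=[query_mask, value_mask],
    return_attention_scores=True,
)
print(outputs.shape)  # (2, 4, 8) -> (batch_size, Tq, dim)
print(scores.shape)   # (2, 4, 6) -> (batch_size, Tq, Tv)

# Causal self-attention: each position attends only to itself and earlier positions.
self_attended = keras.layers.Attention()([query, query], use_causal_mask=True)
print(self_attended.shape)  # (2, 4, 8)
```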