@@ -797,9 +797,9 @@
     "- Implementing the self-attention mechanism step by step, we will start by introducing the three training weight matrices $W_q$, $W_k$, and $W_v$\n",
     "- These three matrices are used to project the embedded input tokens, $x^{(i)}$, into query, key, and value vectors via matrix multiplication:\n",
     "\n",
-    " - Query vector: $q^{(i)} = W_q \\,x^{(i)}$\n",
-    " - Key vector: $k^{(i)} = W_k \\,x^{(i)}$\n",
-    " - Value vector: $v^{(i)} = W_v \\,x^{(i)}$\n"
+    " - Query vector: $q^{(i)} = x^{(i)}\\,W_q $\n",
+    " - Key vector: $k^{(i)} = x^{(i)}\\,W_k $\n",
+    " - Value vector: $v^{(i)} = x^{(i)}\\,W_v $\n"
    ]
   },
   {
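For context, a minimal sketch of what the corrected notation corresponds to in code. The change replaces the column-vector convention $q^{(i)} = W_q\,x^{(i)}$ with the row-vector convention $q^{(i)} = x^{(i)}\,W_q$, which matches how the projection is typically written as `x @ W` in PyTorch. The dimensions, seed, and variable names below are illustrative assumptions, not taken from the notebook cell in this hunk:

```python
import torch

torch.manual_seed(123)

# Illustrative dimensions (assumed): input embedding size and projection size
d_in, d_out = 3, 2

# One embedded input token x^(i), stored as a row vector of shape (d_in,)
x_i = torch.rand(d_in)

# The three trainable weight matrices W_q, W_k, W_v, each of shape (d_in, d_out)
W_query = torch.nn.Parameter(torch.rand(d_in, d_out))
W_key   = torch.nn.Parameter(torch.rand(d_in, d_out))
W_value = torch.nn.Parameter(torch.rand(d_in, d_out))

# With row-vector inputs, the projections are x^(i) @ W, matching the
# updated notation q^(i) = x^(i) W_q (rather than W_q x^(i))
query_i = x_i @ W_query   # q^(i), shape (d_out,)
key_i   = x_i @ W_key     # k^(i), shape (d_out,)
value_i = x_i @ W_value   # v^(i), shape (d_out,)

print(query_i.shape, key_i.shape, value_i.shape)  # torch.Size([2]) each
```

The same `x @ W` expression extends unchanged to a batch of tokens: for an input matrix of shape `(num_tokens, d_in)`, the product with a `(d_in, d_out)` weight matrix projects every row at once, which is why the row-vector convention is the natural one for code.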