YYMuse

Attention Mechanism

Large Language Models

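Lets the model dynamically attend to the most relevant parts of the context when processing each token; this is the key to how the Transformer captures long-range dependencies.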

# Transformer # Architecture

Intuition: When translating "The animal didn't cross the street because it was too tired", the model needs to know that "it" refers to "animal" rather than "street".

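Computation steps:
1. Each token produces three vectors: Query, Key, and Value.
2. The Query is compared against every Key to compute similarity scores.
3. Softmax normalizes the scores into attention weights.
4. The Values are summed, weighted by the attention weights, to produce the output.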

Formula: Attention(Q, K, V) = softmax(QK^T / √d_k) × V, where d_k is the dimension of the Key vectors.
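
The formula can be traced by hand with a minimal sketch in plain Python. The Q, K, and V vectors below are made-up toy values for three tokens, and the function names `softmax` and `attention` are illustrative, not taken from any library. Step 1 (producing Q, K, V from learned projections) is assumed to have already happened.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (one per token)."""
    d_k = len(keys[0])  # dimension of each Key vector
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this Query with every Key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        # Softmax normalizes the scores into attention weights
        weights = softmax(scores)
        # Weighted sum of the Value vectors, one output dimension at a time
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy vectors for three tokens (assumed values, for illustration only).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Each printed row holds one token's attention weights over all three tokens. Real implementations batch this as matrix multiplications and usually add multiple heads and masking, which this sketch leaves out.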