CS224n-lecture10-Question Answering

SQuAD

Evaluation

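For each question, the authors provide 3 correct answers (gold answers).
A model is scored with two metrics:
- Exact Match (EM): 1 if the model's answer is exactly one of the three gold answers, otherwise 0.
- F1: treat the model's answer and each gold answer as bags of words. Suppose the model outputs answer x, a of x's words appear in a given gold answer, x has b words in total, and that gold answer has c words. On that gold answer, Precision = a / b, Recall = a / c, and F1 = 2PR / (P + R). For each question, the answer's score is the maximum over its gold answers.
Both metrics ignore punctuation and the articles (a, an, the).
The final F1 score for the whole dataset is the average of the per-question F1 scores (EM is averaged the same way).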
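The scoring rules above can be sketched in plain Python. This is a minimal illustration, not the official SQuAD evaluation script; lowercasing the text before comparison is an assumption added here, and the function names (`normalize_answer`, `f1_score`, `score_question`, `score_dataset`) are hypothetical.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)

def normalize_answer(s):
    """Lowercase (assumption), drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold):
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction, gold):
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(gold).split()
    common = Counter(pred) & Counter(ref)  # bag-of-words overlap
    a = sum(common.values())               # number of overlapping words
    if a == 0:
        return 0.0
    precision = a / len(pred)              # a / b
    recall = a / len(ref)                  # a / c
    return 2 * precision * recall / (precision + recall)

def score_question(prediction, golds):
    """Per-question EM and F1: the best score over all gold answers."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(f1_score(prediction, g) for g in golds)
    return em, f1

def score_dataset(predictions, gold_lists):
    """Dataset-level EM and F1: averages of the per-question scores."""
    pairs = [score_question(p, g) for p, g in zip(predictions, gold_lists)]
    n = len(pairs)
    return sum(e for e, _ in pairs) / n, sum(f for _, f in pairs) / n

print(score_question("the Denver Broncos", ["Denver Broncos", "Broncos", "Denver"]))  # (1.0, 1.0)
print(f1_score("Broncos won", "Denver Broncos"))  # 0.5
```

In the second call the prediction shares one word with the gold answer, so P = 1/2, R = 1/2 and F1 = 0.5, while EM for the same pair is 0.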

Stanford Attentive Reader

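Given a question q and a passage t, feed the question into a bidirectional LSTM. Assuming the forward and backward LSTM hidden states are both d-dimensional, concatenate the final hidden state of each direction into a 2d-dimensional vector q, the question vector. The passage t is likewise fed into a bidirectional LSTM, which yields a hidden state p_i (the passage representation) for every word. Attention scores between each p_i and q are then used to predict the answer's start position and end position separately, each with its own bilinear attention: P_start(i) ∝ exp(q^T W_s p_i) and P_end(i) ∝ exp(q^T W_e p_i).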
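A minimal NumPy sketch of the Stanford Attentive Reader's question vector and start/end prediction. Random arrays stand in for trained BiLSTM outputs and weights, the helper names are hypothetical, and the span-decoding rule in `best_span` (maximize P_start(i) × P_end(j) with i ≤ j ≤ i + max_len) is one common choice, assumed here rather than taken from these notes.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_vector(fwd_states, bwd_states):
    """q = [final forward state; final backward state], shape (2d,).
    Assumes both arrays are (m, d) and indexed by token position, so the
    backward LSTM (which reads right to left) ends at position 0."""
    return np.concatenate([fwd_states[-1], bwd_states[0]])

def start_end_distributions(q, P, W_s, W_e):
    """Bilinear attention: P_start(i) ∝ exp(q^T W_s p_i), P_end(i) ∝ exp(q^T W_e p_i).
    q: (2d,), P: (n, 2d) passage states, W_s and W_e: (2d, 2d)."""
    p_start = softmax(q @ W_s @ P.T)
    p_end = softmax(q @ W_e @ P.T)
    return p_start, p_end

def best_span(p_start, p_end, max_len=15):
    """Pick (i, j) with i <= j <= i + max_len maximizing p_start[i] * p_end[j]."""
    best, span = -1.0, (0, 0)
    for i in range(len(p_start)):
        for j in range(i, min(i + max_len + 1, len(p_end))):
            score = p_start[i] * p_end[j]
            if score > best:
                best, span = score, (i, j)
    return span

# Random arrays stand in for trained BiLSTM outputs and weights (hypothetical).
rng = np.random.default_rng(0)
d, m, n = 4, 5, 9
q = question_vector(rng.normal(size=(m, d)), rng.normal(size=(m, d)))
P = rng.normal(size=(n, 2 * d))
W_s, W_e = rng.normal(size=(2 * d, 2 * d)), rng.normal(size=(2 * d, 2 * d))
p_start, p_end = start_end_distributions(q, P, W_s, W_e)
print(best_span(p_start, p_end))
```

Taking the two argmaxes independently could return an end before the start; searching jointly over i ≤ j avoids that.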

BiDAF: Bi-Directional Attention Flow for Machine Comprehension

Besides GloVe word embeddings, BiDAF also uses a Char-CNN to produce character-level embeddings; the two are concatenated for each word.
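A rough NumPy sketch of a character-level word vector (embed characters, 1-D convolution, ReLU, max-pool over positions) concatenated with a word vector. All sizes, the random parameters, and the function name are hypothetical stand-ins for trained weights; the BiDAF paper additionally passes the concatenation through a highway network, which is omitted here.

```python
import numpy as np

def char_cnn_word_vector(char_ids, char_emb, filters, bias):
    """Character-level word vector: embed chars, 1-D convolution, ReLU, max-pool.
    char_ids: (L,) character indices, with L >= filter width
    char_emb: (V, e) character embedding table
    filters:  (k, w, e) k convolution filters of width w
    bias:     (k,)"""
    x = char_emb[char_ids]                                            # (L, e)
    w = filters.shape[1]
    windows = np.stack([x[t:t + w] for t in range(len(x) - w + 1)])  # (L-w+1, w, e)
    feats = np.einsum("twe,kwe->tk", windows, filters) + bias        # (L-w+1, k)
    return np.maximum(feats, 0.0).max(axis=0)                        # (k,)

# Hypothetical sizes and random parameters, standing in for trained weights.
rng = np.random.default_rng(0)
V, e, k, w = 30, 8, 16, 3
char_emb = rng.normal(size=(V, e))
filters = rng.normal(size=(k, w, e))
bias = np.zeros(k)
glove_vec = rng.normal(size=100)   # stand-in for a 100-d GloVe vector
char_vec = char_cnn_word_vector(np.array([3, 1, 20, 7, 9]), char_emb, filters, bias)
word_repr = np.concatenate([glove_vec, char_vec])  # shape (116,)
```

Max-pooling over positions makes the character vector a fixed size (k) regardless of word length, so it can be concatenated with the fixed-size GloVe vector.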

Attention Flow layer

Attention should flow in both directions: from the context to the question, and from the question to the context.

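For C2Q (context-to-question) attention, let c_i be the i-th context hidden state and q_j the j-th question hidden state. Concatenate c_i, q_j, and their element-wise product c_i ∘ q_j (commonly used in neural networks to learn whether two vectors are similar), then take the dot product with a learned weight vector w_sim to get the scalar similarity score S_ij = w_sim^T [c_i; q_j; c_i ∘ q_j]. For each c_i, take a softmax over its scores against all question words, α^i = softmax(S_i,:), and use these weights to form a weighted sum of all q_j, a_i = Σ_j α^i_j q_j. This a_i is an attention-weighted view of the question mapped onto position i of the passage. Repeat this for every i.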
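The similarity matrix and C2Q attention can be sketched in NumPy as follows. This is a minimal illustration with random stand-in states and weights; splitting w_sim into three pieces is an implementation choice made here that yields the same S_ij as concatenating first.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def similarity_matrix(C, Q, w_sim):
    """S[i, j] = w_sim^T [c_i; q_j; c_i * q_j].
    C: (N, D) context states, Q: (M, D) question states, w_sim: (3D,)."""
    D = C.shape[1]
    w_c, w_q, w_cq = w_sim[:D], w_sim[D:2 * D], w_sim[2 * D:]
    # Splitting w_sim into three pieces gives the same scores without
    # building the (N, M, 3D) tensor of concatenations.
    return (C @ w_c)[:, None] + (Q @ w_q)[None, :] + (C * w_cq) @ Q.T

def context_to_question(S, Q):
    """alpha_i = softmax_j(S[i, :]);  a_i = sum_j alpha_i[j] * q_j.  Returns A: (N, D)."""
    alpha = softmax(S, axis=1)
    return alpha @ Q

# Random stand-ins for BiLSTM states (D = 2h) and the learned w_sim (hypothetical).
rng = np.random.default_rng(0)
N, M, D = 6, 4, 8
C, Q = rng.normal(size=(N, D)), rng.normal(size=(M, D))
w_sim = rng.normal(size=3 * D)
S = similarity_matrix(C, Q, w_sim)   # (6, 4)
A = context_to_question(S, Q)        # (6, 8): row i is a_i
```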

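Q2C (question-to-context) attention works slightly differently. For each context word c_i, find the question word q_j most similar to it (the highest similarity score, i.e. the largest S_ij) and keep that score, m_i = max_j S_ij. Apply a softmax to these scores across all context positions, β = softmax(m), and use β to form a weighted sum of the c_i. The result is a single vector c' = Σ_i β_i c_i, which captures the context words most important with respect to the query.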
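A minimal NumPy sketch of Q2C attention, taking the similarity matrix S as given. The toy S and the one-hot context states are made-up values chosen so that c' equals the softmax weights β and is easy to check.

```python
import numpy as np

def question_to_context(S, C):
    """m_i = max_j S[i, j];  beta = softmax over i of m;  c' = sum_i beta_i * c_i.
    S: (N, M) similarity matrix, C: (N, D) context states. Returns c': (D,)."""
    m = S.max(axis=1)              # each context word's best match in the question
    beta = np.exp(m - m.max())
    beta = beta / beta.sum()       # softmax over context positions
    return beta @ C

S = np.array([[1.0, 0.0],
              [0.0, -2.0],
              [3.0, -1.0]])        # 3 context words, 2 question words (toy values)
C = np.eye(3)                      # one-hot context states, so c' equals beta
c_prime = question_to_context(S, C)
```

The softmax runs over context positions rather than question positions, so a single c' is produced for the whole passage instead of one vector per context word.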

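Predicting the start and end tokens is also more complex. The attention flow layer outputs b_i = [c_i; a_i; c_i ∘ a_i; c_i ∘ c'] for each context position. A modeling layer (two bidirectional LSTM layers) runs over the b_i to produce m_i, and the start distribution is softmax_i(w_start^T [b_i; m_i]). For the end, one more bidirectional LSTM runs over the m_i to produce m'_i, and the end distribution is softmax_i(w_end^T [b_i; m'_i]).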
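A rough NumPy sketch of the BiDAF output layer. It assumes the attention-flow outputs b_i (8h-dimensional) and the modeling-layer outputs m_i and m'_i (2h-dimensional each) are already computed. Random arrays stand in for those LSTM outputs and for the learned vectors w_start and w_end, and the function name is hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bidaf_output_layer(B, M1, M2, w_start, w_end):
    """p_start(i) = softmax_i(w_start^T [b_i; m_i]);  p_end(i) = softmax_i(w_end^T [b_i; m2_i]).
    B:  (N, 8h) attention-flow outputs b_i
    M1: (N, 2h) modeling-layer outputs m_i (two BiLSTM layers over B)
    M2: (N, 2h) outputs m2_i of one more BiLSTM over M1
    w_start, w_end: (10h,) learned weight vectors"""
    p_start = softmax(np.concatenate([B, M1], axis=1) @ w_start)
    p_end = softmax(np.concatenate([B, M2], axis=1) @ w_end)
    return p_start, p_end

# Random arrays stand in for the LSTM outputs and trained weights (hypothetical).
rng = np.random.default_rng(0)
N, h = 7, 3
B = rng.normal(size=(N, 8 * h))
M1, M2 = rng.normal(size=(N, 2 * h)), rng.normal(size=(N, 2 * h))
w_start, w_end = rng.normal(size=10 * h), rng.normal(size=10 * h)
p_start, p_end = bidaf_output_layer(B, M1, M2, w_start, w_end)
```

Both distributions see b_i directly as well as the modeling-layer outputs. The end distribution uses the extra BiLSTM output m'_i, so it can condition on more of the passage-level context than the start distribution does.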

Several ways of computing attention