@listenviolet 2019-01-25T16:09:33.000000Z

CS224n 1. Introduction, SVD and Word2Vec Note

cs224n

Negative Sampling vs. Hierarchical Softmax

The notes for Lecture 1 of CS224n 2019 say:

“In practice, hierarchical softmax tends to be better for infrequent words, while negative sampling works better for frequent words and lower dimensional vectors.”

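This raised a question for me: why would hierarchical softmax do better than negative sampling on low-frequency words?
After thinking about it, I came up with the following conjecture.
Take an example,
using the numbers given in section 4.4 of the CS224n Lecture 1 notes: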
"is":

$P(\text{is}) = 0.9^{\frac{3}{4}} \approx 0.92$
"bombastic":

$P(\text{bombastic}) = 0.01^{\frac{3}{4}} \approx 0.032$
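
The two values above can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary and the helper name `raised_weight` are made up for illustration, and the results are unnormalized weights, not probabilities that sum to 1.

```python
# Unigram frequencies taken from the example above.
unigram = {"is": 0.9, "bombastic": 0.01}


def raised_weight(freq, power=0.75):
    """Return freq raised to the 3/4 power (an unnormalized sampling weight)."""
    return freq ** power


weights = {word: raised_weight(freq) for word, freq in unigram.items()}

# How much each word's weight grows relative to its raw frequency.
boost = {word: weights[word] / unigram[word] for word in unigram}

for word in unigram:
    print(f"{word}: weight={weights[word]:.3f}, boost={boost[word]:.2f}x")
```

The weight of "bombastic" comes out at roughly 3.16 times its raw frequency, while the weight of "is" barely changes (roughly 1.03 times), which is the effect the 3/4 power is meant to have.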
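Suppose "bombastic" were also reached in the tree with probability 0.032. If every branch is taken with probability 1/2, its leaf would sit at depth

$h(\text{bombastic}) = \log_2 \frac{1}{0.032} = \log_2 31.25 \approx 4.97$
For a real-world vocabulary, however, the tree is much deeper than 5.
($2^5 = 32$; a balanced binary tree over, say, 100,000 words already has depth 17.)
My conjecture was that, in the hierarchical tree, low-frequency words are selected with a larger probability than in negative sampling. This estimate does not support it: at a depth well beyond 5, fair-coin branching would give "bombastic" a probability below 0.032, so the claim in the notes needs a different explanation.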
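
As a quick numerical check of the depth formula $\log_2 \frac{1}{p}$, here is a minimal Python sketch. It rests on assumptions that are not in the lecture notes: every branch of the tree is taken with probability 1/2, the tree is balanced, and the vocabulary size of 100,000 is picked only for illustration. The helper names are hypothetical.

```python
import math


def depth_for_probability(p):
    """Depth at which a leaf is reached with probability p, assuming
    every branch is taken with probability 1/2 (an idealization)."""
    return math.log2(1.0 / p)


def balanced_tree_depth(vocab_size):
    """Depth of a balanced binary tree with vocab_size leaves."""
    return math.ceil(math.log2(vocab_size))


p_bombastic = 0.01 ** 0.75  # about 0.032, from the example above
implied_depth = depth_for_probability(p_bombastic)

# Hypothetical vocabulary size, chosen only for illustration.
vocab_size = 100_000
tree_depth = balanced_tree_depth(vocab_size)

print(f"implied depth for p={p_bombastic:.3f}: {implied_depth:.2f}")
print(f"balanced tree depth for {vocab_size} words: {tree_depth}")
```

Under these assumptions the implied depth is just under 5, while the balanced tree for 100,000 words has depth 17. A Huffman tree, which word2vec implementations commonly use, would place rare words deeper still.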
