The embeddings of the four words were each projected to a vector using a shared weight matrix (and possibly a shared bias term); the four projected vectors were then averaged, and the softmax of the average was computed to give the predicted word distribution.
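The forward pass described above can be sketched in NumPy. This is a minimal illustration, not the original implementation: the names `E`, `W`, `b`, and `cbow_forward` are hypothetical, and it assumes the shared projection maps each embedding directly to a vocabulary-sized vector of scores.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cbow_forward(context_ids, E, W, b=None):
    """E: (V, d) embedding table; W: (d, V) shared projection; b: optional (V,) shared bias."""
    projected = E[context_ids] @ W       # one projected vector per context word
    if b is not None:
        projected = projected + b        # the same bias is added to every projection
    averaged = projected.mean(axis=0)    # average the projected vectors
    return softmax(averaged)             # predicted distribution over the vocabulary

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
V, d = 10, 5
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, V))
cbow_probs = cbow_forward([1, 3, 5, 7], E, W)
```

Because the projection is linear, averaging the projected vectors gives the same result as averaging the embeddings first and projecting once, which is the cheaper way to compute it.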
The embedding of the input word was projected to one vector using a weight matrix (and possibly a bias term), and the softmax of the projected vector was computed to give the predicted word distribution.
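A minimal NumPy sketch of this single-word forward pass follows. The names `E`, `W`, `b`, and `skipgram_forward` are hypothetical, and the sketch assumes the projection produces one score per vocabulary word.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def skipgram_forward(input_id, E, W, b=None):
    """E: (V, d) embedding table; W: (d, V) projection; b: optional (V,) bias."""
    projected = E[input_id] @ W          # project the single input embedding
    if b is not None:
        projected = projected + b
    return softmax(projected)            # predicted distribution over the vocabulary

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
V, d = 10, 5
E = rng.normal(size=(V, d))
W = rng.normal(size=(d, V))
skipgram_probs = skipgram_forward(2, E, W)
```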
Since computing $kP_n(w)$ usually takes only $O(1)$ time, and since the authors themselves admitted that Negative Sampling no longer approximates maximum likelihood estimation, in my opinion Negative Sampling probably should not exist at all.
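To make the role of $kP_n(w)$ concrete, here is a small sketch. It assumes the usual noise contrastive estimation setup, in which the probability that a word came from the data rather than the noise distribution is $\exp(s) / (\exp(s) + kP_n(w))$ for an unnormalized model score $s$, and in which Negative Sampling uses the sigmoid $\exp(s) / (\exp(s) + 1)$ instead. The function names are hypothetical.

```python
import math

def nce_posterior(score, k, noise_prob):
    """P(D=1 | w, c) with an unnormalized model exp(score) and k noise samples."""
    u = math.exp(score)
    return u / (u + k * noise_prob)      # k * noise_prob is an O(1) lookup and multiply

def negative_sampling_posterior(score):
    """sigmoid(score), i.e. the NCE posterior with k * P_n(w) replaced by 1."""
    return 1.0 / (1.0 + math.exp(-score))

# The two agree only when k * P_n(w) happens to equal 1.
same = nce_posterior(0.3, k=5, noise_prob=0.2)
different = nce_posterior(0.3, k=5, noise_prob=0.01)
sigmoid_value = negative_sampling_posterior(0.3)
```

Under these assumptions the only difference between the two objectives is the constant in the denominator, which is why dropping $kP_n(w)$ saves almost no computation.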
But if Negative Sampling no longer performs maximum likelihood estimation, how did it still manage to train the word embeddings successfully in the original paper?
The earliest estimate we could find for the number of assault weapons in the United States comes from Mark Overstreet, a research coordinator at the National Rifle Association.
The 2015 National Firearms Survey, which drew on data collected from thousands of participants, placed the total American gun stock at 265 million weapons, and found that 33% of them were rifles, leading to an overall estimate of 87.4 million rifles.
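The rifle estimate is simply the total gun stock multiplied by the rifle share. A quick check of that arithmetic, using only the two figures quoted above (the variable names are ours):

```python
total_guns = 265_000_000     # total American gun stock reported by the survey
rifle_share = 0.33           # share of the stock that is rifles

rifle_estimate = total_guns * rifle_share
print(f"{rifle_estimate / 1e6:.2f} million rifles")
```

The product of the rounded inputs is about 87.45 million, in line with the 87.4 million quoted; the small gap presumably reflects rounding in the reported share.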
Most voluntary buybacks run by city and state governments have opted for the fixed-price model: for example, in 2016 the Boston Police Department handed out $200 Target gift cards in exchange for returned firearms, and a Los Angeles buyback program implemented after the Sandy Hook Elementary School shooting offered $200 gift cards for returned assault weapons.
Combining the above estimates of the number of guns that would be banned and the possible cost per weapon, we were able to generate a range of estimates for the cost of a national buyback.
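The combination step amounts to multiplying each estimate of the number of banned guns by each candidate price and reporting the smallest and largest products. A minimal sketch follows; the function name is hypothetical, and the inputs are placeholders chosen to show the arithmetic rather than figures from this analysis (only the $200 price echoes the gift-card programs mentioned earlier).

```python
def buyback_cost_range(gun_counts, prices_per_gun):
    """Return (lowest, highest) total cost over every count/price combination."""
    costs = [count * price for count in gun_counts for price in prices_per_gun]
    return min(costs), max(costs)

# Placeholder inputs for illustration only, not estimates from the analysis.
low, high = buyback_cost_range(
    gun_counts=[1_000_000, 2_000_000],
    prices_per_gun=[200, 500],
)
print(f"${low:,} to ${high:,}")
```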