(4)
Compared with the original SNE, t-SNE makes two major changes: 1) it uses
joint probabilities (Eq. 5) instead of conditional probabilities to
represent similarities, and 2) it uses a Student t-distribution
instead of a Gaussian distribution to compute similarities between two
datapoints in the low-dimensional space (Eq. 6).
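
The two changes can be illustrated with a minimal sketch in plain Python. It assumes Eq. 5 is the usual symmetrization p_ij = (p_j|i + p_i|j) / (2n) and Eq. 6 is the usual Student-t kernel with one degree of freedom, normalized over all pairs. The function names and the fixed bandwidth `sigma` are hypothetical and chosen only for illustration. In practice the bandwidth is normally set per datapoint from a target perplexity rather than fixed.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conditional_p(X, sigma=1.0):
    """Conditional similarities p_{j|i} as used in the original SNE.

    Assumption: one fixed Gaussian bandwidth `sigma` for every point
    (no perplexity-based search).
    """
    n = len(X)
    P = []
    for i in range(n):
        weights = [
            0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        P.append([w / total for w in weights])  # each row sums to 1
    return P


def joint_p(P_cond):
    """Change 1: symmetrized joint probabilities, assumed form of Eq. 5."""
    n = len(P_cond)
    return [
        [(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
        for i in range(n)
    ]


def student_t_q(Y):
    """Change 2: low-dimensional similarities from a Student-t kernel,
    assumed form of Eq. 6, normalized over all pairs i != j."""
    n = len(Y)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# Toy data: three points in the high-dimensional space and a 1-D embedding.
X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
Y = [[0.0], [0.5], [3.0]]
P = joint_p(conditional_p(X))
Q = student_t_q(Y)
```

Unlike the conditional matrix, whose rows each sum to one, both `P` and `Q` here are symmetric and sum to one over all pairs. The Student-t kernel decays polynomially rather than exponentially with distance, so distant pairs in the embedding keep a larger share of the probability mass than they would under a Gaussian.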