p_{i|j} = exp(−‖x_i − x_j‖² / 2σ_i²) / Σ_{k≠i} exp(−‖x_i − x_k‖² / 2σ_i²)   (3)
where x_i and x_j denote datapoints in the original feature space, ‖x_i − x_j‖² is the squared Euclidean distance, p_{i|j} is the conditional probability between x_i and x_j, and σ_i is the variance of the Gaussian distribution centered at x_i. The value of σ_i depends primarily on the data density at x_i, which varies among datapoints. Details on how to determine the value of σ_i can be found in van der Maaten and Hinton (2008). The conditional probability between the low-dimensional counterparts y_i and y_j is denoted q_{i|j}. By minimizing the differences between p_{i|j} and q_{i|j}, SNE preserves, as far as possible, the local and global structure of the datapoints when mapping from the original to the new feature space. Kullback-Leibler divergence (KLD), a measure of the difference between probability distributions, is employed by SNE as the loss function (see Eq. 4).
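The two quantities above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the reference implementation: it assumes the per-point variances `sigmas` are already given (in practice they are found by a binary search over a user-chosen perplexity, as described in van der Maaten and Hinton, 2008), and the function names `conditional_probs` and `kl_loss` are our own.

```python
import numpy as np

def conditional_probs(X, sigmas):
    """Gaussian conditional probabilities over neighbors (sketch of Eq. 3).

    Row i holds the distribution conditioned on datapoint x_i,
    using the Gaussian centered at x_i with variance sigmas[i].
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Unnormalized Gaussian affinities; each row uses its own sigma.
    P = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbor
    # Normalize each row so it is a valid conditional distribution.
    return P / P.sum(axis=1, keepdims=True)

def kl_loss(P, Q, eps=1e-12):
    """Sum of Kullback-Leibler divergences between the rows of P and Q
    (the SNE loss of Eq. 4); eps guards against log(0)."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))
```

In an actual SNE optimizer, `Q` would be recomputed from the low-dimensional points y_i at every gradient step and `kl_loss` minimized with respect to those points.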