p_{i|j} = exp(−‖xi − xj‖² / 2σi²) / Σ_{k≠i} exp(−‖xi − xk‖² / 2σi²)    (3)
where xi and xj denote datapoints in the original feature space, ‖xi − xj‖ is the Euclidean distance between them, p_{i|j} is the conditional probability between xi and xj, and σi is the variance of the Gaussian distribution centered at xi. The value of σi depends primarily on the data density around xi, which varies among datapoints. Details on how the value of σi is determined can be found in van der Maaten and Hinton (2008). The corresponding conditional probability between the low-dimensional counterparts yi and yj is denoted q_{i|j}. By minimizing the differences between p_{i|j} and q_{i|j}, SNE preserves, as far as possible, the local and global structure of the datapoints when mapping them from the original to the new feature space. The Kullback-Leibler divergence (KLD), a measure of the difference between probability distributions, is employed by SNE as the loss function (see Eq. 4).
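The computation described above can be illustrated with a minimal NumPy sketch (not the authors' implementation). It forms Gaussian conditional probabilities in both spaces and evaluates the summed KL divergence between them; the σ values here are fixed placeholders rather than the density-adapted values described in van der Maaten and Hinton (2008), and the function names are illustrative only.

```python
import numpy as np

def conditional_probs(X, sigmas):
    """Gaussian conditional probabilities in the spirit of Eq. 3.

    X: (n, d) array of datapoints; sigmas: (n,) per-point Gaussian widths.
    Row i holds the distribution over all j != i induced by the Gaussian
    centered at x_i.
    """
    # squared Euclidean distances ||x_i - x_j||^2
    sq_d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # unnormalized Gaussian affinities
    P = np.exp(-sq_d / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0.0)  # a point is never its own neighbor
    # normalize each row so the affinities form a probability distribution
    return P / P.sum(axis=1, keepdims=True)

def kl_loss(P, Q, eps=1e-12):
    """Sum of KL divergences between conditional distributions (cf. Eq. 4)."""
    return np.sum(P * np.log((P + eps) / (Q + eps)))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # toy high-dimensional data
Y = rng.normal(size=(5, 2))   # toy low-dimensional embedding
P = conditional_probs(X, sigmas=np.ones(5))
# SNE fixes the low-dimensional variance, so 2*sigma^2 = 1 in the map
Q = conditional_probs(Y, sigmas=np.full(5, 1.0 / np.sqrt(2.0)))
print(kl_loss(P, Q))
```

Gradient-based optimization of the yi positions would then decrease this loss, pulling the low-dimensional distributions q_{i|j} toward their high-dimensional counterparts p_{i|j}.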