# Definition of the similarity in a set of integers

Discussion in 'Mathematica' started by Ryan Markley, Feb 12, 2009.

1. ### Ryan Markley (Guest)

Hello, I have two sets of integers, e.g.

S1 = (25,14,32,45) and S2 = (26,12,31,48)

I want to define an operation, similar to the variance, that tells me how
similar the two sets are. In the example above, the results I get for
the two sets should be close to each other, because the sets themselves
are similar.

The problem with the variance is this:

S1 = (25,1,1,1) and S2 = (1,1,25,1). These two sets have the same
variance, but they are completely different. What mathematical operation
can I use to do what I am looking for?

Ryan Markley, Feb 12, 2009

2. ### dh (Guest)

Hi Ryan,

what about the difference of the ordered sets?

Daniel

dh, Feb 13, 2009
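
One possible reading of Daniel's suggestion, sketched in Mathematica: treat the inputs as ordered lists and subtract them element by element. Summing the absolute differences into a single score is an assumption added here for illustration, not something stated in the post.

```mathematica
(* The two similar lists from the question *)
S1 = {25, 14, 32, 45};
S2 = {26, 12, 31, 48};

diff = S1 - S2           (* element-wise difference: {-1, 2, 1, -3} *)
Total[Abs[diff]]         (* sum of absolute differences: 7 *)

(* The two lists with equal variance but different layout *)
T1 = {25, 1, 1, 1};
T2 = {1, 1, 25, 1};

Total[Abs[T1 - T2]]      (* 48, much larger than for S1 and S2 *)
```

A small total means the lists agree position by position; unlike the variance, this score does separate the second pair.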

3. ### Jean-Marc Gulliet (Guest)

Note that what you call "sets" are not sets as usually defined in
mathematics: a collection of *distinct* objects. That is S1 = (25,1,1,1)
as a set is {1, 25} and S2 = (1,1,25,1) as a set is {1, 25}, which
clearly shows that both sets S1 and S2 are equal. OTOH, the sets S1 =
{25,14,32,45} and S2 = {26,12,31,48} may be deemed as very dissimilar
since they have no element in common. I think the objects you are
dealing with can be described as vectors or ordered lists of integers.
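
The set-versus-list distinction above can be checked directly. This is a minimal sketch using the built-in Union (which sorts a list and drops duplicates) and Intersection; it is only an illustration of the point, not part of any similarity measure.

```mathematica
(* As sets, both lists reduce to {1, 25}, so they compare as equal *)
Union[{25, 1, 1, 1}]                            (* {1, 25} *)
Union[{25, 1, 1, 1}] === Union[{1, 1, 25, 1}]   (* True *)

(* As sets, the "similar" pair shares no element at all *)
Intersection[{25, 14, 32, 45}, {26, 12, 31, 48}]   (* {} *)
```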

Now, assuming you are comparing only vectors of equal length, you could
use the correlation or the cosine distance, among many others available
in Mathematica. See "Distance and Similarity Measures" at

http://reference.wolfram.com/mathematica/guide/DistanceAndSimilarityMeasures.html

For instance,

In:= S1 = {25, 14, 32, 45};
S2 = {26, 12, 31, 48};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out= 0.00361843

Out= 0.00152087

In:= S1 = {25, 1, 1, 1};
S2 = {1, 1, 25, 1};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out= 1.33333

Out= 0.917197

In:= S1 = {24, 1};
S2 = {25, 2};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out= 0.

Out= 0.00072905

In:= S1 = {25, 1};
S2 = {1, 25};

CorrelationDistance[S1, S2] // N
CosineDistance[S1, S2] // N

Out= 2.

Out= 0.920128
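
To make the numbers above easier to interpret, here is a sketch of what the two measures compute for real vectors. The helper names cosDist and corrDist are hypothetical and defined only for this illustration; they are meant to mirror the built-in CosineDistance and CorrelationDistance.

```mathematica
(* Cosine distance: 1 minus the cosine of the angle between u and v *)
cosDist[u_, v_] := 1 - u.v/(Norm[u] Norm[v])

(* Correlation distance: the cosine distance of the mean-centered vectors *)
corrDist[u_, v_] := cosDist[u - Mean[u], v - Mean[v]]

cosDist[{25, 1}, {1, 25}] // N    (* 0.920128 *)
corrDist[{25, 1}, {1, 25}]        (* 2 *)
corrDist[{24, 1}, {25, 2}]        (* 0 *)
```

Any two non-constant vectors of length 2 are perfectly correlated or perfectly anti-correlated once centered, which is why the correlation distance in the last two examples can only be 0 or 2.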

Regards,
--Jean-Marc

Jean-Marc Gulliet, Feb 13, 2009

4. ### Sjoerd C. de Vries (Guest)

Assuming S1 and S2 contain the same number of integers:

EuclideanDistance[S1,S2]^2/Length[S1]

or Correlation[S1,S2]

Cheers -- Sjoerd

Sjoerd C. de Vries, Feb 13, 2009
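
Applied to the two pairs from the original question, Sjoerd's two expressions can be sketched as follows. The helper name msd (mean squared difference) is hypothetical and introduced only for this example.

```mathematica
(* Mean squared difference between two equal-length lists *)
msd[a_, b_] := EuclideanDistance[a, b]^2/Length[a]

msd[{25, 14, 32, 45}, {26, 12, 31, 48}]   (* 15/4 *)
msd[{25, 1, 1, 1}, {1, 1, 25, 1}]         (* 288 *)

(* Correlation: close to 1 for similar lists, negative for the second pair *)
Correlation[{25, 14, 32, 45}, {26, 12, 31, 48}] // N   (* 0.996382 *)
Correlation[{25, 1, 1, 1}, {1, 1, 25, 1}] // N         (* -0.333333 *)
```

The two measures answer slightly different questions: msd is small when the values agree position by position, while Correlation ignores a constant offset or positive rescaling and only compares the pattern of ups and downs.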