Perspective in Informatics 3 - Assignment 1 - Answer Sheet

Subject: Perspective in Informatics 3 – Fall Semester 2014
Professor: Davood Rafiei
Assignment No. 1
HOANG Nguyen Phong (ID number: 6930-26-1264)
Submitted on November 3rd
Question 1 [30 marks]
• 3.5.1: on the space of nonnegative integers, which of the following functions are
distance measures? If so, prove it; if not, prove that it fails to satisfy one or more of the
axioms.
a) max(x, y) = the larger of x and y.
This function is not a distance measure function. Checking each axiom:
• In the space of nonnegative integers given at the beginning, the function never returns a negative value, so non-negativity holds.
• If x and y are the same point, the function returns max(x, x) = x, which is not 0 unless x = 0 (for example, max(3, 3) = 3). Hence the reflexive property of a distance measure, d(x, y) = 0 if and only if x = y, fails.
• Measuring from x to y or from y to x returns the same larger value, max(x, y) = max(y, x). This satisfies the symmetric property.
• The triangle inequality does hold. Let x ≤ y be two points and a be any third point. The three possible positions of a are checked in the table below:
Position of a | max(x,a) + max(a,y) ≥ max(x,y) | Check
x ≤ a ≤ y | a + y ≥ y | true
a < x | x + y ≥ y | true
a > y | a + a ≥ y | true (since a > y ⇒ 2a > y)
• Note that this function is not the L∞-norm distance. The L∞ distance between two points is the largest absolute difference over their dimensions, max_i |x_i − y_i|; in one dimension it reduces to |x − y|, which is case b) below.
b) diff(x, y) = |x − y| (the absolute magnitude of the difference between x and y).
By the same method as in the case above, this function is a distance measure function, for the following reasons:
• Since it is an absolute value, it always returns a nonnegative value.
• |x − y| = 0 if and only if x = y, so the function returns 0 exactly when x and y are the same point. That satisfies the reflexive property.
• |x − y| = |y − x|, so the function also satisfies the symmetric property.
• Let x ≤ y be two points and a be any third point. Then the triangle inequality can be proved as shown in the table below:
Position of a | diff(x,a) + diff(a,y) ≥ diff(x,y) | Check
x ≤ a ≤ y | (a − x) + (y − a) ≥ y − x ⟺ y − x ≥ y − x | true
a < x | (x − a) + (y − a) ≥ y − x ⟺ x + y − 2a ≥ y − x ⟺ x ≥ a | true (since a < x by the choice of a)
a > y | (a − x) + (a − y) ≥ y − x ⟺ 2a − x − y ≥ y − x ⟺ a ≥ y | true (since a > y by the choice of a)
• In fact, this function is the L1-norm (Manhattan) distance between x and y in one dimension, where it coincides with the Euclidean (L2) distance.
c) sum(x, y) = x + y.
It is easy to show that this function is not a distance measure function, since it does not satisfy the reflexive property. For instance, if x and y are the same nonzero point, the function returns x + x = 2x > 0 instead of 0, because all points lie in the nonnegative space.
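The axioms can also be spot-checked mechanically. The sketch below is written for illustration only; the helper name `violated_axioms` and the search bound n = 20 are assumptions. It tries every pair and triple of points in {0, ..., n} and reports which axioms each of the three candidate functions violates. A finite search can only find counterexamples, so an empty result is evidence rather than proof.

```python
from itertools import product

def violated_axioms(d, n=20):
    """Names of the distance axioms that d violates on the points 0..n.

    A finite search can only find counterexamples: an empty set is
    evidence, not a proof, that d is a distance measure.
    """
    points = range(n + 1)
    failed = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            failed.add("non-negativity")
        if (d(x, y) == 0) != (x == y):
            failed.add("d(x, y) = 0 iff x = y")
        if d(x, y) != d(y, x):
            failed.add("symmetry")
        if any(d(x, y) > d(x, z) + d(z, y) for z in points):
            failed.add("triangle inequality")
    return failed

candidates = {
    "max(x, y)": lambda x, y: max(x, y),
    "|x - y|": lambda x, y: abs(x - y),
    "x + y": lambda x, y: x + y,
}
for name, d in candidates.items():
    print(name, sorted(violated_axioms(d)) or "no violation found")
```

For max and sum the search finds pairs with x = y > 0 whose value is nonzero; for |x − y| it finds no violation.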

• 3.7.2: Let us compute sketches using the following four “random” vectors:
V1= [+1,+1,+1,-1] V2=[+1,+1,-1,+1]
V3=[+1,-1,+1,+1] V4=[-1,+1,+1,+1]
Compute the sketches of the following vectors.
(a) [2,3,4,5]
Random vector | Dot product | Sketch value
V1 = [+1,+1,+1,-1] | 4 | +1
V2 = [+1,+1,-1,+1] | 6 | +1
V3 = [+1,-1,+1,+1] | 8 | +1
V4 = [-1,+1,+1,+1] | 10 | +1
(b) [-2,3,-4,5]
Random vector | Dot product | Sketch value
V1 = [+1,+1,+1,-1] | -8 | -1
V2 = [+1,+1,-1,+1] | 10 | +1
V3 = [+1,-1,+1,+1] | -4 | -1
V4 = [-1,+1,+1,+1] | 6 | +1
(c) [2,-3,4,-5]
Random vector | Dot product | Sketch value
V1 = [+1,+1,+1,-1] | 8 | +1
V2 = [+1,+1,-1,+1] | -10 | -1
V3 = [+1,-1,+1,+1] | 4 | +1
V4 = [-1,+1,+1,+1] | -6 | -1
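The dot products and signs in these tables can be reproduced with a few lines of Python. This is a minimal sketch; the names `dot` and `sketch` are illustrative, and the rule that maps a zero dot product to +1 is a hypothetical tie rule (no ties occur for these three vectors).

```python
# The four "random" vectors V1..V4 of Exercise 3.7.2.
V = [[1, 1, 1, -1], [1, 1, -1, 1], [1, -1, 1, 1], [-1, 1, 1, 1]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def sketch(x, randoms):
    """+1 where x . r > 0, -1 where x . r < 0.

    Hypothetical tie rule: a zero dot product is mapped to +1 (no ties
    occur for the three vectors below).
    """
    return [1 if dot(x, r) >= 0 else -1 for r in randoms]

for label, x in [("a", [2, 3, 4, 5]), ("b", [-2, 3, -4, 5]), ("c", [2, -3, 4, -5])]:
    print(label, [dot(x, r) for r in V], sketch(x, V))
```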
For each pair, what is the estimated angle between them, according to the sketches? What are
the true angles?
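The following two formulas are used to calculate the estimated angle and the true angle:
• Estimated angle = $180^\circ \times (1 - \mathrm{sim})$, where sim is the fraction of components on which the two sketches agree
• True angle $\theta = \arccos\left(\dfrac{u \cdot v}{\|u\|\,\|v\|}\right)$, i.e. the dot product of the two vectors divided by the product of their magnitudes gives $\cos\theta$
All three vectors have magnitude √54, and a·b = 14, b·c = −54, a·c = −14.
Pair | Estimated angle | True angle
∠(a)(b) | 90° | arccos(14/54) ≈ 75°
∠(b)(c) | 180° | arccos(−54/54) = 180°
∠(a)(c) | 90° | arccos(−14/54) ≈ 105°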
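Both formulas are easy to evaluate numerically. The sketch below assumes the sketches are lists of ±1 values (copied from the tables above) and that sim is the fraction of positions on which they agree; `estimated_angle` and `true_angle` are hypothetical helper names. The cosine is clamped to [−1, 1] to guard against floating-point round-off before calling `math.acos`.

```python
import math

def estimated_angle(s1, s2):
    """180 degrees times (1 - fraction of positions where the sketches agree)."""
    agree = sum(p == q for p, q in zip(s1, s2)) / len(s1)
    return 180 * (1 - agree)

def true_angle(u, v):
    """Angle in degrees from cos(theta) = (u . v) / (|u| |v|)."""
    dot = sum(p * q for p, q in zip(u, v))
    norms = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clamp round-off
    return math.degrees(math.acos(cos_theta))

vectors = {"a": [2, 3, 4, 5], "b": [-2, 3, -4, 5], "c": [2, -3, 4, -5]}
sketches = {"a": [1, 1, 1, 1], "b": [-1, 1, -1, 1], "c": [1, -1, 1, -1]}
for p, q in [("a", "b"), ("b", "c"), ("a", "c")]:
    print(f"{p}{q}: estimated {estimated_angle(sketches[p], sketches[q]):.0f} deg, "
          f"true {true_angle(vectors[p], vectors[q]):.2f} deg")
```

The true angles come out at about 74.97°, 180°, and 105.03°, which is why the table rounds them to 75° and 105°.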
• 3.7.3: suppose we form sketches by using all sixteen of the vectors of length 4, whose
components are each +1 or -1. Compute the sketches of the three vectors in Exercise
3.7.2.
*Where the dot product is 0, the sketch value was chosen at random to be -1 or +1.

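Vector a = [2, 3, 4, 5]
Random vector | Dot product | Sketch value
v1 = [-1,-1,-1,-1] | -14 | -1
v2 = [-1,-1,-1,+1] | -4 | -1
v3 = [-1,-1,+1,-1] | -6 | -1
v4 = [-1,-1,+1,+1] | 4 | +1
v5 = [-1,+1,-1,-1] | -8 | -1
v6 = [-1,+1,-1,+1] | 2 | +1
v7 = [-1,+1,+1,-1] | 0 | +1 (random)
v8 = [-1,+1,+1,+1] | 10 | +1
v9 = [+1,-1,-1,-1] | -10 | -1
v10 = [+1,-1,-1,+1] | 0 | -1 (random)
v11 = [+1,-1,+1,-1] | -2 | -1
v12 = [+1,-1,+1,+1] | 8 | +1
v13 = [+1,+1,-1,-1] | -4 | -1
v14 = [+1,+1,-1,+1] | 6 | +1
v15 = [+1,+1,+1,-1] | 4 | +1
v16 = [+1,+1,+1,+1] | 14 | +1

Vector b = [-2, 3, -4, 5]
Random vector | Dot product | Sketch value
v1 = [-1,-1,-1,-1] | -2 | -1
v2 = [-1,-1,-1,+1] | 8 | +1
v3 = [-1,-1,+1,-1] | -10 | -1
v4 = [-1,-1,+1,+1] | 0 | -1 (random)
v5 = [-1,+1,-1,-1] | 4 | +1
v6 = [-1,+1,-1,+1] | 14 | +1
v7 = [-1,+1,+1,-1] | -4 | -1
v8 = [-1,+1,+1,+1] | 6 | +1
v9 = [+1,-1,-1,-1] | -6 | -1
v10 = [+1,-1,-1,+1] | 4 | +1
v11 = [+1,-1,+1,-1] | -14 | -1
v12 = [+1,-1,+1,+1] | -4 | -1
v13 = [+1,+1,-1,-1] | 0 | +1 (random)
v14 = [+1,+1,-1,+1] | 10 | +1
v15 = [+1,+1,+1,-1] | -8 | -1
v16 = [+1,+1,+1,+1] | 2 | +1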

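Vector c = [2, -3, 4, -5]
Random vector | Dot product | Sketch value
v1 = [-1,-1,-1,-1] | 2 | +1
v2 = [-1,-1,-1,+1] | -8 | -1
v3 = [-1,-1,+1,-1] | 10 | +1
v4 = [-1,-1,+1,+1] | 0 | +1 (random)
v5 = [-1,+1,-1,-1] | -4 | -1
v6 = [-1,+1,-1,+1] | -14 | -1
v7 = [-1,+1,+1,-1] | 4 | +1
v8 = [-1,+1,+1,+1] | -6 | -1
v9 = [+1,-1,-1,-1] | 6 | +1
v10 = [+1,-1,-1,+1] | -4 | -1
v11 = [+1,-1,+1,-1] | 14 | +1
v12 = [+1,-1,+1,+1] | 4 | +1
v13 = [+1,+1,-1,-1] | 0 | -1 (random)
v14 = [+1,+1,-1,+1] | -10 | -1
v15 = [+1,+1,+1,-1] | 8 | +1
v16 = [+1,+1,+1,+1] | -2 | -1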
How do the estimates of the angles between each pair compare with the true angles?
Pair | Fraction of disagreeing sketch components | Estimated angle | True angle
∠(a)(b) | 8/16 = 1/2 | 90° | ≈ 75°
∠(b)(c) | 16/16 = 1 | 180° | 180°
∠(a)(c) | 8/16 = 1/2 | 90° | ≈ 105°
Thus, even when all 16 vectors are used, the estimated angle for each pair is the same as in Exercise 3.7.2, so the estimates are no closer to the true angles.
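The 16-vector tables and the resulting estimates can be reproduced as follows. `itertools.product([-1, 1], repeat=4)` yields the vectors in the same order as v1 to v16. Because a zero dot product has no sign, the sketch function (hypothetically named `sketch16`) takes the random choices recorded in the tables as an explicit argument instead of drawing new ones, so its output matches the tables.

```python
from itertools import product

# All sixteen vectors of length 4 with components -1 or +1, in the
# same order as v1..v16 in the tables (last component varies fastest).
RANDOM_VECTORS = list(product([-1, 1], repeat=4))

def sketch16(x, tie_choices):
    """Sign of x . v for each of the 16 vectors.

    A zero dot product has no sign, so tie_choices maps the 1-based index
    of such a vector to the value (+1 or -1) picked for it in the tables.
    """
    result = []
    for i, v in enumerate(RANDOM_VECTORS, start=1):
        d = sum(p * q for p, q in zip(x, v))
        if d == 0:
            result.append(tie_choices[i])
        else:
            result.append(1 if d > 0 else -1)
    return result

def estimated_angle(s1, s2):
    """180 degrees times the fraction of positions where the sketches disagree."""
    agree = sum(p == q for p, q in zip(s1, s2))
    return 180 * (1 - agree / len(s1))

# Tie choices copied from the tables above (rows with dot product 0).
sk = {
    "a": sketch16([2, 3, 4, 5], {7: 1, 10: -1}),
    "b": sketch16([-2, 3, -4, 5], {4: -1, 13: 1}),
    "c": sketch16([2, -3, 4, -5], {4: 1, 13: -1}),
}
for p, q in [("a", "b"), ("b", "c"), ("a", "c")]:
    print(f"angle({p}, {q}) estimated as {estimated_angle(sk[p], sk[q])} degrees")
```

With the tie choices from the tables this gives 90°, 180°, and 90°. Other tie choices can change the estimates, since each tie position can turn an agreement into a disagreement or vice versa.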

Question 2 [10 marks]
3.7.4(a): Suppose we form sketches using the four vectors from Exercise 3.7.2. What are the constraints on a, b, c, and d that will cause the sketch of the vector [a, b, c, d] to be [+1,+1,+1,+1]? (Write your constraints in as simple a form as possible.)
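The dot products of the four random vectors with [a, b, c, d] can be written in matrix form as the following equation:

$$\begin{pmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$

Call this matrix M. The sketch of [a, b, c, d] is [+1, +1, +1, +1] when all of x1, x2, x3, x4 ≥ 0. Multiplying both sides on the left by M^{-1}:

$$M^{-1} M \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = M^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = M^{-1} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$

Since M M = 4I, we have

$$M^{-1} = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{pmatrix}$$

So a, b, c and d are constrained by the following equation:

$$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \qquad x_1, x_2, x_3, x_4 \ge 0$$

or, equivalently, since x1, x2, x3, x4 are the dot products themselves, in the simplest form:

a + b + c − d ≥ 0
a + b − c + d ≥ 0
a − b + c + d ≥ 0
−a + b + c + d ≥ 0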
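The inversion step and the final inequalities can be checked with exact rational arithmetic. In this sketch the helper names `matvec`, `matmul`, and `sketch_is_all_plus` are illustrative. M is the matrix whose rows are V1 to V4; the check confirms M M = 4I and shows that choosing any x ≥ 0 (here the arbitrary example x = [1, 2, 3, 4]) and setting [a, b, c, d] = Mx/4 yields a vector whose dot products with V1 to V4 are exactly x.

```python
from fractions import Fraction

# Rows are the random vectors V1..V4.
M = [[1, 1, 1, -1],
     [1, 1, -1, 1],
     [1, -1, 1, 1],
     [-1, 1, 1, 1]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# M M = 4I, hence M^{-1} = M / 4.
assert matmul(M, M) == [[4 if i == j else 0 for j in range(4)] for i in range(4)]

def sketch_is_all_plus(v):
    # The four inequalities: every dot product with V1..V4 is >= 0
    # (a zero dot product is counted as +1, as in the inequalities above).
    return all(d >= 0 for d in matvec(M, v))

# Any x >= 0 gives [a, b, c, d] = M x / 4, whose dot products are exactly x.
x = [1, 2, 3, 4]
abcd = [Fraction(t, 4) for t in matvec(M, x)]
print(abcd, matvec(M, abcd), sketch_is_all_plus(abcd))
```

Working in `Fraction` keeps the factor 1/4 exact, so the round trip M(Mx/4) = x holds with no rounding error.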

Question 3 [10 marks]
a) Consider a universe U with n elements, and let R and S be subsets of U both of size m,
chosen uniformly at random.
What is the expected value of the Jaccard similarity of R and S?
The expectation of a random variable X is calculated as $E(X) = \sum_x x \cdot P(X = x)$.
In this case, the Jaccard similarity of R and S is calculated as:

$$\mathrm{sim}(R,S) = \frac{|R \cap S|}{|R \cup S|} = \frac{k}{2m-k}$$

where 0 ≤ k ≤ m is the number of common elements of R and S, and |R ∪ S| = |R| + |S| − |R ∩ S| = 2m − k.
Next, the probability that sim(R,S) takes this value is calculated as follows:

$$P\left(\mathrm{sim}(R,S) = \frac{k}{2m-k}\right) = \frac{\binom{m}{k}\binom{n-m}{m-k}}{\binom{n}{m}}$$
Since:
• By symmetry, we can fix R. S is then one of the $\binom{n}{m}$ equally likely subsets of U of size m.
• For S to share exactly k elements with R, we first take the k common elements from R, which can be done in $\binom{m}{k}$ ways. The remaining (m − k) elements of S are then chosen from the (n − m) elements outside R, which can be done in $\binom{n-m}{m-k}$ ways.
As a result, the expected Jaccard similarity of R and S is:

$$E(\mathrm{sim}(R,S)) = \sum_{k=0}^{m} \frac{k}{2m-k} \cdot \frac{\binom{m}{k}\binom{n-m}{m-k}}{\binom{n}{m}}$$

where terms with m − k > n − m are zero, because then $\binom{n-m}{m-k} = 0$.
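The sum in part (a) can be evaluated exactly with Python's `math.comb` (Python 3.8+) and `fractions.Fraction`, and checked against a Monte Carlo simulation. This is a sketch assuming 1 ≤ m ≤ n; `expected_jaccard` and `simulate_jaccard` are hypothetical names. `math.comb(n, k)` returns 0 when k > n, which takes care of the vanishing terms.

```python
import random
from fractions import Fraction
from math import comb

def expected_jaccard(n, m):
    """Exact E[sim(R, S)] for independent uniform m-subsets R, S of an
    n-element universe (assumes 1 <= m <= n)."""
    total = sum(Fraction(k, 2 * m - k) * comb(m, k) * comb(n - m, m - k)
                for k in range(m + 1))
    return total / comb(n, m)

def simulate_jaccard(n, m, trials=100_000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        r = set(rng.sample(range(n), m))
        s = set(rng.sample(range(n), m))
        acc += len(r & s) / len(r | s)
    return acc / trials

print(expected_jaccard(4, 2), float(expected_jaccard(10, 3)), simulate_jaccard(10, 3))
```

For n = 4 and m = 2 the exact value is 7/18, which can be confirmed by hand: with R = {0, 1} fixed, one of the six choices of S gives similarity 1, four give 1/3, and one gives 0.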
b) How does your answer to part (a) change if R and S must include a certain element (say z)
of U?
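Now z belongs to both sets, so it is always a common element. Only the other m − 1 elements of each set are random, and they are chosen from the n − 1 elements of U other than z. If j of them are shared, then |R ∩ S| = j + 1 and |R ∪ S| = 2m − (j + 1), so the answer changes to:

$$E(\mathrm{sim}(R,S)) = \sum_{j=0}^{m-1} \frac{j+1}{2m-j-1} \cdot \frac{\binom{m-1}{j}\binom{n-m}{m-1-j}}{\binom{n-1}{m-1}}$$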
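The expectation for part (b) can be computed directly by conditioning on z: both sets contain z, and their other m − 1 elements are drawn from the remaining n − 1 elements, so with j further shared elements the similarity is (j + 1)/(2m − j − 1). The sketch below (hypothetical names `expected_jaccard_with_z` and `simulate_with_z`; it assumes 1 ≤ m ≤ n and labels the forced element z as 0) evaluates the sum exactly and compares it with a simulation that always puts z in both sets.

```python
import random
from fractions import Fraction
from math import comb

def expected_jaccard_with_z(n, m):
    """Exact E[sim(R, S)] when R and S are uniform m-subsets of an
    n-element universe that must both contain a fixed element z."""
    total = sum(Fraction(j + 1, 2 * m - j - 1) * comb(m - 1, j) * comb(n - m, m - 1 - j)
                for j in range(m))
    return total / comb(n - 1, m - 1)

def simulate_with_z(n, m, trials=100_000, seed=0):
    """Monte Carlo check: z = 0, the other m - 1 elements come from 1..n-1."""
    rng = random.Random(seed)
    others = range(1, n)
    acc = 0.0
    for _ in range(trials):
        r = {0, *rng.sample(others, m - 1)}
        s = {0, *rng.sample(others, m - 1)}
        acc += len(r & s) / len(r | s)
    return acc / trials

print(expected_jaccard_with_z(4, 2), float(expected_jaccard_with_z(10, 3)), simulate_with_z(10, 3))
```

For n = 4 and m = 2 the value is 5/9: the second elements of R and S coincide with probability 1/3 (similarity 1), and otherwise the similarity is 1/3.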
c) How does your answer to part (a) change if R and S must be disjoint?
It means k = 0 for every admissible pair, so sim(R,S) = 0/(2m) = 0 and the answer changes to:

$$E(\mathrm{sim}(R,S)) = 0$$
