Data Streaming Algorithms
TABLE OF CONTENTS
01 Algorithm
What do you mean by an algorithm
02 Hashing
What do you mean by hashing, along with algorithms for hashing
03 Approximate counting
Allows counting of large numbers of events using low memory
04 Counting Distinct Elements
The input is a stream of data and the output is the number of distinct elements in the stream.
05 Frequency estimation
Frequency estimation is to estimate the frequency of any item x, i.e. the number of occurrences of x
06 References
All the research papers referred to
01
What is an Algorithm?
An algorithm is a finite set of instructions that produces an output or a result.
It tells the system what to do in order to achieve the desired result. The result itself need not be
known beforehand; only the steps that lead to it are specified.
02
Hashing
Hashing is a type of algorithm that takes input of any size and converts it into output of a
fixed size. The principal difference between hashing and encryption is that a hash is irreversible.
Hashing is most commonly used to implement hash tables. A hash table stores key/value pairs in
an array, where the hash of a key gives the index at which its value is stored.
Hashing is also used in data security. Passwords can be stored in the form of their hashes so that
even if a database is breached, plaintext passwords are not accessible. MD5, SHA-1 and SHA-2 are
well-known cryptographic hashes, although MD5 and SHA-1 are no longer considered
collision-resistant.
Hashing algorithms are functions that generate a fixed-length result (the hash, or hash value)
from a given input. The hash value is a summary of the original data.
Definition:
A hash function is a function h: D -> R,
where the domain D = {0,1}* and the range R = {0,1}^n for some n >= 1
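As a small illustration of the definition above, the sketch below uses Python's standard `hashlib` module to show that the output length n stays fixed however long the input is. The helper name `digest_bits` is made up for this example.

```python
import hashlib

def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the output length n (in bits) of the named hash applied to data."""
    return len(hashlib.new(algorithm, data).digest()) * 8

if __name__ == "__main__":
    short_input = b"x"
    long_input = b"x" * 1_000_000
    for name in ("md5", "sha1", "sha256", "sha3_256"):
        # The digest length depends only on the algorithm, not on the input size.
        print(name, digest_bits(name, short_input), digest_bits(name, long_input))
```

Both columns printed for each algorithm are identical, which is the "fixed-length result" property described above.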
DESCRIPTION OF A HASH FUNCTION
In general, hash functions work as follows:
● The input message is divided into fixed-size blocks.
● The hash of the first block, a value with a fixed size, is calculated.
● Then the second block is processed together with the previous output to obtain the
next value.
● This process is repeated until all blocks have been processed.
● Unique Hash value
● Hashing Speed
● Secure hash
● Hash functions are widely used in IT.
● We can use them for digital signatures, message authentication codes (MACs), and other
forms of authentication.
● We can also use them for indexing data in hash tables, for fingerprinting, identifying files,
detecting duplicates or as checksums (we can detect if a sent file didn’t suffer accidental
or intentional data corruption).
● We can also use them for password storage.
Some Hashing Algorithms:
1. MD5
2. SHA-1
3. SHA-2
4. SHA-3
[Table: Hash Algorithms Comparisons]
1. MD5 PSEUDOCODE
//All additions are modulo 2^32
Step 1: //Define r as the per-round left-rotation amounts
var int[64] r, k
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
Step 2: //Use binary integer part of the sines of integers (in radians) as constants:
for i from 0 to 63
    k[i] := floor(abs(sin(i + 1)) × 2^32)
//Initialize variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476
Step 3: //Pre-processing:
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append original bit length of message as 64-bit little-endian integer to message
//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
    break chunk into sixteen 32-bit little-endian words w[i], 0 ≤ i ≤ 15
    Step 4: //Initialize hash value for this chunk:
    var int a := h0
    var int b := h1
    var int c := h2
    var int d := h3
    Step 5: //Main loop:
    for i from 0 to 63
        if 0 ≤ i ≤ 15 then
            f := (b and c) or ((not b) and d)
            g := i
        else if 16 ≤ i ≤ 31
            f := (d and b) or ((not d) and c)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            f := b xor c xor d
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            f := c xor (b or (not d))
            g := (7×i) mod 16
        temp := d
        d := c
        c := b
        b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
        a := temp
    Step 6: //Add this chunk's hash to result so far:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian)
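The pseudocode above can be transcribed into Python almost line for line. The following is a minimal sketch for study purposes only, not a replacement for a vetted library; the names `md5_hex`, `left_rotate` and `matches_hashlib` are chosen for this example, and `hashlib` is imported solely to cross-check the result.

```python
import hashlib  # used only to cross-check the sketch below
import math
import struct

# Step 1: per-round left-rotation amounts.
R = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Step 2: constants from the sines of the integers 1..64.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF

def left_rotate(x: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    x &= MASK
    return ((x << amount) | (x >> (32 - amount))) & MASK

def md5_hex(message: bytes) -> str:
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Step 3: pad with 0x80, then zeros up to 56 mod 64 bytes, then the bit length.
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_length)
    for offset in range(0, len(padded), 64):
        w = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h0, h1, h2, h3          # Step 4
        for i in range(64):                  # Step 5
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f &= MASK  # Python integers are unbounded, so mask after using ~
            rotated = left_rotate(a + f + K[i] + w[g], R[i])
            a, d, c, b = d, c, b, (b + rotated) & MASK
        # Step 6: add this chunk's result to the running hash.
        h0, h1 = (h0 + a) & MASK, (h1 + b) & MASK
        h2, h3 = (h2 + c) & MASK, (h3 + d) & MASK
    return struct.pack("<4I", h0, h1, h2, h3).hex()

def matches_hashlib(message: bytes) -> bool:
    """Compare the sketch against the standard library implementation."""
    return md5_hex(message) == hashlib.md5(message).hexdigest()
```

Calling `matches_hashlib(b"abc")` compares the sketch with the standard library on a short message; inputs of 55, 56 and 64 bytes exercise the padding boundaries.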
2. SHA-1 PSEUDOCODE
Step 1: Initialize all the variables:
    h0 = 0x67452301
    h1 = 0xEFCDAB89
    h2 = 0x98BADCFE
    h3 = 0x10325476
    h4 = 0xC3D2E1F0
    ml = message length in bits (always a multiple of the number of bits in a character)
Step 2: Pre-processing:
    append the bit '1' to the message, e.g. by adding 0x80 if characters are 8 bits
    append '0' bits until the message length in bits ≡ 448 (mod 512)
    append ml as a 64-bit big-endian integer
Step 3: Process the message in successive 512-bit chunks:
break message into 512-bit chunks
for each chunk
    break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15
    Step 4: Extend the sixteen 32-bit words into eighty 32-bit words:
    for i from 16 to 79
        w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1
    Step 5: Initialize hash value for this chunk:
    a = h0, b = h1, c = h2, d = h3, e = h4
    Main loop:
    for i from 0 to 79
        if 0 ≤ i ≤ 19 then
            f = (b and c) or ((not b) and d)
            k = 0x5A827999
        else if 20 ≤ i ≤ 39
            f = b xor c xor d
            k = 0x6ED9EBA1
        else if 40 ≤ i ≤ 59
            f = (b and c) or (b and d) or (c and d)
            k = 0x8F1BBCDC
        else if 60 ≤ i ≤ 79
            f = b xor c xor d
            k = 0xCA62C1D6
        temp = (a leftrotate 5) + f + e + k + w[i]
        e = d
        d = c
        c = b leftrotate 30
        b = a
        a = temp
    Step 6: Add this chunk's hash to result so far:
    h0 = h0 + a
    h1 = h1 + b
    h2 = h2 + c
    h3 = h3 + d
    h4 = h4 + e
Step 7: Produce the final hash value (big-endian) as a 160-bit number:
    h0 append h1 append h2 append h3 append h4
3. SHA-2 PSEUDOCODE (SHA-256)
Step 1: Initialize hash values h0..h7:
    first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19
Step 2: Initialize array of round constants k[0..63]:
    first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311
Step 3: Pre-processing:
    append the bit '1' to the message
    append k bits '0', where k is the minimum number >= 0 such that the resulting message
    length (modulo 512 in bits) is 448
    append length of message (without the '1' bit or padding), in bits, as 64-bit
    big-endian integer (this will make the entire post-processed length a multiple of 512 bits)
Step 4: Process the message in successive 512-bit chunks:
break message into 512-bit chunks
for each chunk
    create a 64-entry message schedule array w[0..63] of 32-bit words
    copy chunk into the first 16 words w[0..15] of the message schedule array
    Step 5: Extend the first 16 words into the remaining 48 words w[16..63] of the
    message schedule array:
    for i from 16 to 63
        s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
        s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
        w[i] := w[i-16] + s0 + w[i-7] + s1
    Step 6: Initialize working variables to current hash value:
    a := h0, b := h1, c := h2, d := h3, e := h4, f := h5, g := h6, h := h7
    Compression function main loop:
    for i from 0 to 63
        S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
        ch := (e and f) xor ((not e) and g)
        temp1 := h + S1 + ch + k[i] + w[i]
        S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
        maj := (a and b) xor (a and c) xor (b and c)
        temp2 := S0 + maj
        h := g
        g := f
        f := e
        e := d + temp1
        d := c
        c := b
        b := a
        a := temp1 + temp2
    Step 7: Add the compressed chunk to the current hash value:
    h0 := h0 + a, h1 := h1 + b, h2 := h2 + c, h3 := h3 + d,
    h4 := h4 + e, h5 := h5 + f, h6 := h6 + g, h7 := h7 + h
Step 8: Produce the final hash value (big-endian):
    digest := hash := h0 append h1 append h2 append h3 append h4 append h5
    append h6 append h7
SHA-224 is identical to SHA-256, except that the initial hash values h0
through h7 are different, and the output is constructed by omitting h7.
Algorithms and Limitations
Sr No | Hashing Algorithm | Limitations
1 | SHA-1 | Requires a lot of computing power and resources.
2 | SHA-2 | Increased resistance to collisions means SHA-256 and SHA-512 produce longer outputs (256 bits and 512 bits respectively) than SHA-1 (160 bits). Those defending the use of SHA-2 cite this increased output size as the reason behind its attack resistance.
3 | SHA-3 | SHA-3 is designed to be a good hash function, not a good password-hashing scheme (PHS), whereas bcrypt is designed to be a PHS and was analyzed in this direction as well.
4 | MD5 | Using salted MD5 for passwords is a bad idea. Not because of MD5's cryptographic weaknesses, but because it is fast. This means that an attacker can try billions of candidate passwords per second on a single GPU.
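To make the "fast hashes are bad for passwords" point concrete, here is a hedged sketch of salted, deliberately slow password hashing. It swaps in PBKDF2-HMAC-SHA256 from Python's standard library rather than the bcrypt mentioned in the table, because PBKDF2 needs no third-party package; the function names and the iteration count are assumptions for illustration, not recommendations taken from the slides.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    """Return (salt, iterations, digest) for storage; the salt is random per user."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

The iteration count is the knob that makes each guess expensive for an attacker, which is the property a plain salted MD5 lacks.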
03
Approximate counting
Overview
● The approximate counting algorithm, also known as the Morris algorithm, allows counting of large
numbers of events using low memory.
● Invented by Robert Morris, it uses probabilistic counting to increment the counter.
● This algorithm is considered one of the precursors of the current data streaming algorithms.
● The basic idea is to track log n instead of n, and so use log log n bits instead of log n bits.
● The space complexity of this technique is O(log log n)
Working of the Algorithm
We need log2 n bits to store an integer between 1 and n; with fewer bits,
two integers would map to the same bitstring and be
indistinguishable. But if we only care about recovering
the integer up to a constant factor, it suffices to
recover log n, and storing log n requires only O(log log n) bits.
Consider the streaming problem in which there is a stream of n
increments. We would like to compute n, though only
approximately, and with some small probability of
failure. We could keep an explicit counter in memory and
increment it after each stream update, but that would require
log2 n bits. Morris' clever algorithm works as follows: initialize
a counter c to 1, and after each update increment c with
probability 1/2^c and do nothing otherwise. Each update raises the
expected value of 2^c by exactly 1, so, as Philippe Flajolet
showed, the expected value of 2^c is n + 2 after n updates,
and thus 2^c − 2 is an unbiased estimator of n.
Applications
● The algorithm is useful in examining large data streams for patterns.
● It is particularly useful in applications of data compression
● Sight and sound recognition
● Artificial intelligence applications.
Morris's Counter:
1. Init():
   a. c ← 0
2. Update(item):
   a. Increment c with probability 2^−c
   b. Do nothing with probability 1 − 2^−c
3. Query():
   a. Return 2^c − 1
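The counter above fits in a few lines of Python. This is a minimal sketch following the Init/Update/Query steps with c starting at 0; the class name `MorrisCounter` and the helper `averaged_estimate` (which averages several independent counters to reduce the variance of a single one) are assumptions added for illustration.

```python
import random

class MorrisCounter:
    """Approximate counter that stores only the exponent c."""

    def __init__(self, rng=None):
        self.c = 0
        self.rng = rng or random.Random()

    def update(self):
        # Increment c with probability 2^-c, otherwise do nothing.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def query(self):
        # With c starting at 0, E[2^c] = n + 1, so 2^c - 1 is unbiased.
        return 2 ** self.c - 1

def averaged_estimate(n_events, copies, seed=0):
    """Feed n_events updates to several counters and average their estimates."""
    rng = random.Random(seed)
    counters = [MorrisCounter(rng) for _ in range(copies)]
    for _ in range(n_events):
        for counter in counters:
            counter.update()
    return sum(counter.query() for counter in counters) / copies
```

A single counter has a large variance (on the order of n^2 / 2), so one estimate can easily be off by a factor of two; averaging independent copies is one common way to tighten it.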
04
Counting Distinct Elements
What do you mean by counting distinct elements?
The input is a stream of data, and the output is the number of distinct elements in the
stream.
For example, count the number of distinct IP addresses you encounter.
Our first problem is to approximate the Fp-norm of the items in a stream.
Fp-norm: Let S be a multiset in which every item i of S is in [n]. Let m_i be the
number of occurrences of item i in S. Then the Fp-norm of S is defined
by
F_p = \sum_{i \in [n]} m_i^p,
where 0^0 is set to be 0. By definition, the F0-norm of S is the number of
distinct items in S, and the F1-norm of S is the total number of items in S.
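For a small multiset the Fp-norm can be computed exactly, which helps to see what the streaming algorithms are approximating. The sketch below stores every count, so it is not a streaming algorithm; the function name `frequency_moment` is made up for this example.

```python
from collections import Counter

def frequency_moment(stream, p):
    """Exact F_p = sum of m_i^p over the items that occur in the stream."""
    counts = Counter(stream)
    if p == 0:
        # Items that never occur contribute 0^0 = 0, so F_0 counts distinct items.
        return len(counts)
    return sum(m ** p for m in counts.values())

if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a"]   # m_a = 3, m_b = 1, m_c = 1
    print(frequency_moment(stream, 0))   # number of distinct items
    print(frequency_moment(stream, 1))   # total number of items
    print(frequency_moment(stream, 2))   # 3^2 + 1^2 + 1^2
```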
Problem statement: Let S be a data stream representing a multiset. Items of
S arrive consecutively and every item i ∈ [n]. Design a streaming algorithm that
(ε,δ)-approximates the F0-norm of S,
where ε is the approximation parameter and δ is the
confidence parameter: the output should be within a factor (1 ± ε) of F0 with
probability at least 1 − δ.
To solve this problem, we can implement 3
different algorithms:
1. The AMS Algorithm (primitive)
2. The BJKST Algorithm (basic)
3. The Indyk Algorithm (advanced)
1. The AMS Algorithm
This algorithm for approximating F0 is by Noga Alon, Yossi Matias, and Mario
Szegedy.
Assume that we have seen sufficiently many numbers, and that these numbers
are uniformly distributed. We look at the binary expression Binary(x) of
every item x, and we expect that for one out of 2^d distinct items Binary(x)
ends with d consecutive zeros. More generally, let
ρ(x) = max{i : x mod 2^i = 0}
be the number of zeros that Binary(x) ends with. We have the following observations:
1. If ρ(x) = 1 for some x, then it is likely that the number of distinct integers is 2^1 = 2.
2. If ρ(x) = 2 for some x, then it is likely that the number of distinct integers is 2^2 = 4.
3. If ρ(x) = 3 for some x, then it is likely that the number of distinct integers is 2^3 = 8.
4. If ρ(x) = r for some x, then it is likely that the number of distinct integers is 2^r.
To implement this idea, we use a hash function h so that, after applying h, all items
in S are uniformly distributed, and on average one out of F0 distinct numbers hits
ρ(h(x)) ≥ log F0. Hence the maximum value of ρ(h(x)) over all items x in the stream
gives a good approximation of the logarithm of the number of distinct items.
An Algorithm For Approximating F0:
1. Choose a random function h: [n]→[n] from a family of pairwise independent hash
functions;
2. z←0;
3. While an item x arrives do
   a. if ρ(h(x)) > z then
      i. z←ρ(h(x));
4. Return 2^(z+1/2)
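The four steps above can be sketched in Python as follows. The hash family h(x) = (a·x + b) mod p over a Mersenne prime is an assumption standing in for "a family of pairwise independent hash functions" (it maps into [0, p) rather than [n]), items are assumed to be non-negative integers smaller than p, and ρ(0) is treated as 0 for simplicity. The names `trailing_zeros` and `ams_estimate` are made up for this example.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # assumed modulus for the hash family

def trailing_zeros(value: int) -> int:
    """rho(value): number of zeros that the binary expression ends with."""
    if value == 0:
        return 0  # simplification: rho(0) is not defined in the text
    return (value & -value).bit_length() - 1

def ams_estimate(stream, a=None, b=None, p=MERSENNE_PRIME, rng=None):
    """Single-pass estimate of the number of distinct integers in stream."""
    rng = rng or random.Random()
    a = rng.randrange(1, p) if a is None else a
    b = rng.randrange(0, p) if b is None else b
    z = 0
    for x in stream:
        z = max(z, trailing_zeros((a * x + b) % p))
    return 2 ** (z + 0.5)
```

Only the single integer z is kept between items, which is where the small space usage comes from. Passing `a=1, b=0` turns h into the identity, which is useful for tracing the algorithm by hand.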
2. The BJKST Algorithm
Our second algorithm for approximating F0 is a simplified version of the algorithm by
Bar-Yossef et al. In contrast to the AMS algorithm, the BJKST algorithm uses a set to
keep the sampled items.
The basic idea behind the sampling scheme of the BJKST algorithm is as follows:
1. Let B be a set that is used to retain sampled items, with B=∅ initially. The size of
B is O(1/ε^2) and depends only on the approximation parameter ε.
2. The initial sampling probability is 1, i.e. the algorithm keeps all items seen so far
in B.
3. When the set B becomes full, shrink B by removing about half of the items; from
then on the sampling probability becomes smaller.
4. In the end, the number of items in B and the current sampling probability are
used to approximate the F0-norm.
The BJKST Algorithm (Simplified Version)
1. Choose a random function h: [n]→[n] from a family of pairwise independent hash
functions.
2. z←0 //z is the index of the current level
3. B←∅ //Set B keeps sampled items
4. While an item x arrives do
   a. if ρ(h(x)) ≥ z then
      i. B←B∪{(x,ρ(h(x)))}
      ii. while |B| ≥ c/ε^2 do //Set B becomes full
         1. z←z+1 //Increase the level
         2. Shrink B by removing all (x,ρ(h(x))) with ρ(h(x)) < z
5. Return |B|·2^z
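A compact Python sketch of the simplified BJKST steps is given below. As assumptions for illustration, the capacity c/ε^2 is passed in directly as `capacity`, the hash is h(x) = (a·x + b) mod p over a Mersenne prime, items are non-negative integers, and ρ(0) is treated as 0. The function names are made up for this example.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # assumed modulus for the hash family

def trailing_zeros(value: int) -> int:
    """rho(value): number of zeros that the binary expression ends with."""
    if value == 0:
        return 0  # simplification: rho(0) is not defined in the text
    return (value & -value).bit_length() - 1

def bjkst_estimate(stream, capacity, a=None, b=None, p=MERSENNE_PRIME, rng=None):
    """Estimate the number of distinct integers using a bounded sample set B."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = rng or random.Random()
    a = rng.randrange(1, p) if a is None else a
    b = rng.randrange(0, p) if b is None else b
    z = 0          # index of the current level
    sample = {}    # set B, stored as item -> rho(h(item))
    for x in stream:
        rho = trailing_zeros((a * x + b) % p)
        if rho >= z:
            sample[x] = rho
            while len(sample) >= capacity:   # B is full
                z += 1                       # raise the level
                sample = {item: r for item, r in sample.items() if r >= z}
    return len(sample) * 2 ** z
```

With the identity hash (`a=1, b=0`), the stream 1..8 and `capacity=4`, the level rises twice and the sample ends as {4, 8}, so the function returns 2 · 2^2 = 8.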
3. The Indyk Algorithm
We next show that the F0-norm of a set S can be estimated in dynamic streams. This
algorithm, due to Piotr Indyk, presents beautiful applications of the so-called
stable distributions in designing streaming algorithms.
Let S be a stream consisting of pairs of the form (s_i, U_i), where s_i ∈ [n] and
U_i ∈ {+, −} represents a dynamic change (insertion or deletion) of s_i. Design a data streaming
algorithm that, for any ε and δ, (ε,δ)-approximates the F0-norm of S.
Assume that every item in the stream is in [n], and we want to achieve an (ε,δ)-
approximation of the Fp-norm. Let us further assume that we have a matrix M of
k=Θ(ε^−2 log(1/δ)) rows and n columns, where every entry of M is a random variable drawn
from a p-stable distribution. Given matrix M, the Indyk algorithm keeps a vector
z ∈ R^k which can be expressed as a linear combination of the columns of M.
The F0-norm of the multiset S can be approximated by the Indyk algorithm by choosing a sufficiently
small p, assuming that we have an upper bound K on the number of occurrences of every
item in the stream.
Approximating Fp-norm in a Turnstile Stream (An Idealized Algorithm):
1. For i←1,k do
   a. z_i←0
2. While an operation arrives do
   a. If item j is added then
      i. For i←1,k do
         1. z_i←z_i+M[i,j]
   b. If item j is deleted then
      i. For i←1,k do
         1. z_i←z_i−M[i,j]
   c. If Fp-norm is asked then
      i. Return median_{1≤i≤k}{|z_i|^p}·scalefactor(p)
This idealized algorithm relies on a matrix M of size k×n, and for every occurrence of item j, the
algorithm needs the j-th column of matrix M.
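To illustrate the idealized algorithm, the sketch below handles only the special case p = 1, where the standard Cauchy distribution is 1-stable and the median of |Cauchy| equals 1, so scalefactor(1) = 1. It therefore estimates the F1-norm, not F0, and it stores the full k×n matrix M explicitly, exactly as the idealized description assumes. The class name `CauchySketch` and its methods are made up for this example.

```python
import math
import random
import statistics

class CauchySketch:
    """Idealized turnstile sketch for the F1-norm (the p = 1 case)."""

    def __init__(self, n, k, seed=None):
        rng = random.Random(seed)
        # M[i][j] is a standard Cauchy sample, drawn via the inverse CDF.
        self.matrix = [
            [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(k)
        ]
        self.z = [0.0] * k

    def update(self, j, delta):
        """Apply delta = +1 (item j added) or delta = -1 (item j deleted)."""
        for i, row in enumerate(self.matrix):
            self.z[i] += delta * row[j]

    def estimate_f1(self):
        # Each z_i is distributed as F1 times a Cauchy variable,
        # so the median of |z_i| concentrates around F1.
        return statistics.median(abs(value) for value in self.z)
```

Because z is a linear function of the frequency vector, a deletion exactly undoes the matching insertion, which is what makes the sketch work in dynamic streams.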
Complexity of Various Algorithms:
1. The AMS Algorithm
By running k = Θ(log(1/δ)) independent copies of the algorithm above and returning the median value, we can
make the probability of overestimating and the probability of underestimating by more than a constant factor each at most δ. This gives an (O(1),δ)-approximation of the number of
distinct items over the stream.
2. The BJKST Algorithm
By running Θ(log(1/δ)) independent copies in parallel and returning the median of these outputs, the
BJKST algorithm (ε,δ)-approximates the F0-norm of the multiset S.
3. The Indyk Algorithm
For any parameters ε, δ, there is an algorithm that (ε,δ)-approximates the number of distinct elements in a
turnstile stream. The algorithm needs O(ε^−2 log n log(1/δ)) bits of space. The update time for every
arriving item is O(ε^−2 log(1/δ)).
05
Frequency estimation
- Frequency estimation is to estimate the frequency of any item x, i.e.
the number of occurrences of x.
- The basic setting is as follows:
- Let S be a multiset that is initially empty.
- The data stream consists of a sequence of update operations on S,
and each operation has one of the following three forms:
Three forms
INSERT(S, x): performing the operation S ← S ∪ {x}
DELETE(S, x): performing the operation S ← S \ {x}
QUERY(S, x): querying the number of occurrences of x in the multiset S
Algorithm
- Count-Min sketch: we use the Count-Min sketch for this frequency estimation problem.
- It consists of a fixed array C of counters of width w and depth d.
- These counters are all initialized to zero. Each row j is associated with a pairwise independent hash function h_j,
where each h_j maps an element from the universe U to {1, . . . , w}.
Algorithm
d = ⌈ln(1/δ)⌉
w = ⌈e/ε⌉
while an operation arrives do
    if Insert(S, x) then
        for j ← 1, d do
            C[j, h_j(x)] ← C[j, h_j(x)] + 1
    if Delete(S, x) then
        for j ← 1, d do
            C[j, h_j(x)] ← C[j, h_j(x)] − 1
    if the number of occurrences of x is asked then
        return m_x = min_{1≤j≤d} C[j, h_j(x)]
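The Count-Min steps above can be sketched in Python as follows. As assumptions for illustration, items are non-negative integers, each row's hash is h_j(x) = ((a_j·x + b_j) mod p) mod w over a Mersenne prime p, and columns are indexed from 0 rather than 1. The class name `CountMinSketch` is made up for this example.

```python
import math
import random

class CountMinSketch:
    """d x w array of counters with one hash function per row."""

    PRIME = (1 << 61) - 1  # assumed modulus for the hash family

    def __init__(self, epsilon, delta, seed=None):
        self.width = math.ceil(math.e / epsilon)                 # w = ceil(e / eps)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))    # d = ceil(ln(1 / delta))
        rng = random.Random(seed)
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row, x):
        a, b = self.hashes[row]
        return ((a * x + b) % self.PRIME) % self.width

    def insert(self, x):
        for j in range(self.depth):
            self.table[j][self._column(j, x)] += 1

    def delete(self, x):
        for j in range(self.depth):
            self.table[j][self._column(j, x)] -= 1

    def query(self, x):
        # Collisions can only inflate a counter, so the row minimum
        # never underestimates the true count.
        return min(self.table[j][self._column(j, x)] for j in range(self.depth))
```

For ε = δ = 0.01 this gives w = 272 and d = 5, that is 1360 counters regardless of how many distinct items the stream contains.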
where ε is the approximation
parameter and δ is the
confidence parameter
Choosing w and d
- For given parameters ε and δ, the width and depth of the Count-Min sketch are set to w = ⌈e/ε⌉ and
d = ⌈ln(1/δ)⌉.
- Hence for constant ε and δ, the sketch consists of only a constant number of counters.
- Note that the size of the Count-Min sketch depends only on the accuracy of the approximation,
and is independent of the size of the universe.
06
Research papers Referenced:
1. https://inst.eecs.berkeley.edu/~cs170/fa18/assets/streaming-170.pdf
2. https://www.cs.dartmouth.edu/~ac/Teach/CS35-Spring20/Notes/lecnotes.pdf
3. https://resources.mpi-inf.mpg.de/departments/d1/teaching/ss14/gitcs/notes3.pdf
4. https://people.seas.harvard.edu/~minilek/publications/papers/xrds.pdf
5. https://www.quantamagazine.org/best-ever-algorithm-found-for-huge-streams-of-data-20171024/
THANKS!
Our team:
1.Aryan Singh(18070124017)
2.Hridyesh Singh Bisht(18070124030)
3.Kavya Suthar(18070124037)
4.Sejal Shrestha(18070124064)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Data streaming algorithms

  • 2. TABLE OF CONTENTS: 01 Algorithm: What do you mean by an algorithm? 02 Hashing: What do you mean by hashing, along with algorithms for hashing. 03 Approximate counting: Allows counting of large numbers of events using low memory. 04 Counting Distinct Elements: The input is a stream of data and the output is the number of distinct elements in the stream. 05 Frequency estimation: Estimate the frequency of any item x, i.e. the number of occurrences of x. 06 References: All the research papers referred to.
  • 3. 01
  • 4. What is an Algorithm? An algorithm is a set of instructions that produces an output or a result. It tells the system what to do in order to achieve the desired result. It may not know what the result is beforehand, but it knows that it wants one.
  • 5. 02
  • 6. Hashing is a kind of algorithm that takes input of any size and converts it into output of a fixed size. The principal difference between hashing and encryption is that a hash is irreversible. Hashing is most commonly used to implement hash tables. A hash table stores key/value pairs in the form of a list where any element can be accessed using its index. Hashing is also used in data security. Passwords can be stored in the form of their hashes so that even if a database is breached, plaintext passwords are not accessible. MD5, SHA-1 and SHA-2 are popular cryptographic hashes.
  • 7. Hashing algorithms are functions that generate a fixed-length result (the hash, or hash value) from a given input. The hash value is a summary of the original data. Definition: A hash function is a function h: D -> R, where the domain D = {0,1}* and the range R = {0,1}^n for some n >= 1. DESCRIPTION OF A HASH FUNCTION In general, hash functions work as follows: ● The input message is divided into blocks. ● The hash for the first block, a value with a fixed size, is calculated. ● Then the hash for the second block is obtained and combined with the previous output. ● This process is repeated until all blocks are processed.
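The block-by-block description above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch, not the deck's own code: the helper name `digest_in_blocks` and the 64-byte block size are assumptions chosen for illustration. It shows that the digest length is fixed regardless of input size, and that feeding the input in blocks gives the same result as hashing it in one call.

```python
import hashlib


def digest_in_blocks(message: bytes, block_size: int = 64) -> str:
    """Feed the message to SHA-256 one block at a time and return the hex digest.

    The helper name and the 64-byte block size are illustrative assumptions.
    """
    hasher = hashlib.sha256()
    for start in range(0, len(message), block_size):
        # Each update folds the next block into the running hash state.
        hasher.update(message[start:start + block_size])
    return hasher.hexdigest()


short_digest = digest_in_blocks(b"a")
long_digest = digest_in_blocks(b"a" * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input length.
print(len(short_digest), len(long_digest))
```

The inputs differ by four orders of magnitude in size, yet both digests occupy the same number of bits.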
  • 8. ● Unique hash value ● Hashing speed ● Secure hash ● Hash functions are widely used in IT. ● We can use them for digital signatures, message authentication codes (MACs), and other forms of authentication. ● We can also use them for indexing data in hash tables, for fingerprinting, identifying files, detecting duplicates, or as checksums (to detect whether a sent file suffered accidental or intentional corruption). ● We can also use them for password storage.
  • 9. Some Hashing Algorithms: 1. MD5 2. SHA-1 3. SHA-2 4. SHA-3. Hash Algorithms Comparisons
  • 10. 1. MD-5 PSEUDOCODE
    Step 1: //Define r as the following
    var int[64] r, k
    r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
    r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
    r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
    r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
    Step 2: //Use binary integer part of the sines of integers as constants:
    for i from 0 to 63
        k[i] := floor(abs(sin(i + 1)) × 2^32)
    //Initialize variables:
    var int h0 := 0x67452301
    var int h1 := 0xEFCDAB89
    var int h2 := 0x98BADCFE
    var int h3 := 0x10325476
    Step 3: //Pre-processing:
    append "1" bit to message
    append "0" bits until message length in bits ≡ 448 (mod 512)
    append bit length of message as 64-bit little-endian integer to message
    //Process the message in successive 512-bit chunks:
    for each 512-bit chunk of message
        break chunk into sixteen 32-bit little-endian words w[i], 0 ≤ i ≤ 15
        Step 4: //Initialize hash value for this chunk:
        var int a := h0
        var int b := h1
        var int c := h2
        var int d := h3
        Step 5: //Main loop:
        for i from 0 to 63
            if 0 ≤ i ≤ 15 then
                f := (b and c) or ((not b) and d)
                g := i
            else if 16 ≤ i ≤ 31
                f := (d and b) or ((not d) and c)
                g := (5×i + 1) mod 16
            else if 32 ≤ i ≤ 47
                f := b xor c xor d
                g := (3×i + 5) mod 16
            else if 48 ≤ i ≤ 63
                f := c xor (b or (not d))
                g := (7×i) mod 16
            temp := d
            d := c
            c := b
            b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
            a := temp
        Step 6: //Add this chunk's hash to result so far:
        h0 := h0 + a
        h1 := h1 + b
        h2 := h2 + c
        h3 := h3 + d
    var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian)
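The MD5 pseudocode above can be turned into a short runnable sketch. The version below is for study only: the function names `md5_sketch` and `left_rotate` are assumptions made here, and production code should call `hashlib` instead. It follows the same six steps (shift table, sine-derived constants, padding, per-chunk initialization, the 64-round main loop, and the final addition) and compares its result with the standard library.

```python
import hashlib
import math
import struct

# Step 1: per-round left-rotation amounts (the r table in the pseudocode).
R = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Step 2: constants from the integer part of abs(sin(i + 1)) * 2^32.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF


def left_rotate(x: int, amount: int) -> int:
    """Rotate a 32-bit value left by the given number of bits."""
    x &= MASK
    return ((x << amount) | (x >> (32 - amount))) & MASK


def md5_sketch(message: bytes) -> str:
    """Educational MD5 following the slide's pseudocode (hypothetical helper)."""
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Step 3: append the '1' bit (0x80), zero-pad to 56 mod 64 bytes,
    # then append the original bit length as a 64-bit little-endian integer.
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_length)
    for offset in range(0, len(padded), 64):
        w = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h0, h1, h2, h3  # Step 4
        for i in range(64):  # Step 5
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            rotated = left_rotate(a + (f & MASK) + K[i] + w[g], R[i])
            a, d, c, b = d, c, b, (b + rotated) & MASK
        # Step 6: add this chunk's result to the running hash.
        h0, h1 = (h0 + a) & MASK, (h1 + b) & MASK
        h2, h3 = (h2 + c) & MASK, (h3 + d) & MASK
    return struct.pack("<4I", h0, h1, h2, h3).hex()


print(md5_sketch(b"abc") == hashlib.md5(b"abc").hexdigest())
```

Python integers are unbounded, so every addition is masked back to 32 bits; in the pseudocode that wrap-around is implicit.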
  • 11. 2. SHA-1 PSEUDOCODE
    Step 1: initialize all the variables
    ml = message length in bits (always a multiple of the number of bits in a character).
    Step 2: Pre-processing:
    append the bit '1' to the message, i.e. by adding 0x80 if characters are 8 bits.
    Step 3: Process the message in successive 512-bit chunks:
    break message into 512-bit chunks
    for each chunk
        break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15
        Step 4: Extend the sixteen 32-bit words into eighty 32-bit words:
        for i from 16 to 79
            w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1
        Step 5: Initialize hash value for this chunk:
        a = h0
        b = h1
        c = h2
        d = h3
        e = h4
        Main loop:
        for i from 0 to 79
            if 0 ≤ i ≤ 19 then
                f = (b and c) or ((not b) and d)
                k = 0x5A827999
            else if 20 ≤ i ≤ 39
                f = b xor c xor d
                k = 0x6ED9EBA1
            else if 40 ≤ i ≤ 59
                f = (b and c) or (b and d) or (c and d)
                k = 0x8F1BBCDC
            else if 60 ≤ i ≤ 79
                f = b xor c xor d
                k = 0xCA62C1D6
            temp = (a leftrotate 5) + f + e + k + w[i]
            e = d
            d = c
            c = b leftrotate 30
            b = a
            a = temp
        Step 6: Add this chunk's hash to result so far:
        h0 = h0 + a
        h1 = h1 + b
        h2 = h2 + c
        h3 = h3 + d
        h4 = h4 + e
    Step 7: Produce the final hash value (big-endian) as a 160-bit number
  • 12. 3. SHA-2 PSEUDOCODE
    Step 1: Initialize hash values:
    first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19
    Step 2: Initialize array of round constants:
    first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311
    Step 3: Pre-processing:
    append the bit '1' to the message
    append k bits '0', where k is the minimum number >= 0 such that the resulting message length (modulo 512 in bits) is 448.
    append length of message (without the '1' bit or padding), in bits, as 64-bit big-endian integer (this will make the entire post-processed length a multiple of 512 bits)
    Step 4: Process the message in successive 512-bit chunks:
    break message into 512-bit chunks
    for each chunk
        create a 64-entry message schedule array w[0..63] of 32-bit words
        Step 5: Extend the first 16 words into the remaining 48 words w[16..63] of the message schedule array:
        for i from 16 to 63
            s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
            s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
            w[i] := w[i-16] + s0 + w[i-7] + s1
        Step 6: Initialize working variables to current hash value
        Compression function main loop:
        for i from 0 to 63
            S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
            ch := (e and f) xor ((not e) and g)
            temp1 := h + S1 + ch + k[i] + w[i]
            S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
            maj := (a and b) xor (a and c) xor (b and c)
            temp2 := S0 + maj
            h := g
            g := f
            f := e
            e := d + temp1
            d := c
            c := b
            b := a
            a := temp1 + temp2
        Step 7: Add the compressed chunk to the current hash value
    Step 8: Produce the final hash value (big-endian):
    digest := hash := h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7
    SHA-224 is identical to SHA-256, except that: the initial hash values h0 through h7 are different, and the output is constructed by omitting h7.
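Step 1 of the SHA-2 pseudocode derives the initial hash values from square roots of small primes. The sketch below reproduces that derivation for SHA-256. The helper names `first_primes` and `fractional_bits` are assumptions made here, and the code relies on double-precision floats carrying enough accuracy for these particular inputs.

```python
import math


def first_primes(count: int) -> list:
    """Return the first `count` primes by trial division (fine for small counts)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def fractional_bits(value: float) -> int:
    """First 32 bits of the fractional part of a positive float."""
    return int((value - math.floor(value)) * 2**32)


# h0..h7 for SHA-256: square roots of the first 8 primes (2..19).
initial_hash = [fractional_bits(math.sqrt(p)) for p in first_primes(8)]
print([hex(h) for h in initial_hash])
```

The round constants in Step 2 follow the same recipe with cube roots of the first 64 primes. Deriving constants from well-known irrational numbers is a common way to show that they were not hand-picked.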
  • 13. Algorithms and Limitations
    Sr No | Hashing Algorithm | Limitations
    1 | SHA-1 | Requires a lot of computing power and resources.
    2 | SHA-2 | For increased collision resistance, SHA-256 and SHA-512 produce longer outputs (256 and 512 bits respectively) than SHA-1 (160 bits). Proponents of SHA-2 cite this larger output size as the reason for its attack resistance.
    3 | SHA-3 | SHA-3 is designed to be a good hash function, not a good password-hashing scheme (PHS), whereas bcrypt is designed to be a PHS and was analyzed in this direction as well.
    4 | MD5 | Using salted MD5 for passwords is a bad idea, not because of MD5's cryptographic weaknesses, but because it is fast. An attacker can try billions of candidate passwords per second on a single GPU.
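The output sizes quoted in the table can be checked with Python's `hashlib`. The second half of this sketch illustrates the "fast hashes are bad for passwords" point with PBKDF2, which is a stand-in chosen here because it ships with the standard library; the table itself names bcrypt, which is not shown. The function name `slow_password_hash` and the iteration count are illustrative assumptions.

```python
import hashlib

# Digest sizes in bits, read from the standard library's hash objects.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512", "sha3_256")
}
print(digest_bits)


def slow_password_hash(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256.

    PBKDF2 is a stand-in for a dedicated password-hashing scheme such as
    bcrypt; the iteration count is an illustrative assumption, not advice.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
```

The iteration count makes each guess deliberately expensive, which is the property a single fast MD5 or SHA call lacks.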
  • 14. 03
  • 15. Overview ● The approximate counting algorithm, also known as the Morris algorithm, allows counting of large numbers of events using low memory. ● Invented by Robert Morris, it uses probabilistic counting to increment the counter. ● This algorithm is considered one of the precursors of the current data streaming algorithms. ● The basic idea is to track log n instead of n and use log log n bits instead of log n bits.
  • 16. Origin ● The approximate counting algorithm, also known as the Morris algorithm, allows counting of large numbers of events using low memory. ● Invented by Robert Morris, it uses probabilistic counting to increment the counter. ● This algorithm is considered one of the precursors of the current data streaming algorithms. ● The basic idea is to track log n instead of n and use log log n bits instead of log n bits. ● The space complexity of this technique is O(log log n).
  • 17. Working of the Algorithm: We need log_2 n bits to store an integer between 1 and n; otherwise two integers would map to the same bitstring and be indistinguishable. But if we only care about recovering the integer up to a constant factor, it suffices to recover log n, and storing log n requires only O(log log n) bits. Consider the streaming problem in which there is a stream of n increments. We would like to compute n approximately, with some small probability of failure. We could keep an explicit counter in memory and increment it after each stream update, but that would require log_2 n bits. Morris' clever algorithm works as follows: initialize a counter c to 1, and after each update increment c with probability 1/2^c and do nothing otherwise. Philippe Flajolet showed that the expected value of 2^c is n + 2 after n updates, and thus 2^c − 2 is an unbiased estimator of n.
  • 18. Applications ● The algorithm is useful in examining large data streams for patterns. ● It is particularly useful in applications of data compression, ● sight and sound recognition, ● and artificial intelligence applications.
  • 19. Morris's Counter:
    1. Init():
       a. c ← 0
    2. Update(item):
       a. Increment c with probability 2^−c
       b. Do nothing with probability 1 − 2^−c
    3. Query():
       a. Return 2^c − 1
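The Init/Update/Query steps above translate almost line for line into code. The sketch below is a minimal version: the class name `MorrisCounter` and the helper `average_estimate` are assumptions made here. Because a single counter has high variance, the helper averages many independent counters to show that the estimate is centred on the true count.

```python
import random


class MorrisCounter:
    """Approximate counter storing only the exponent c (hypothetical helper)."""

    def __init__(self, rng=None):
        self.c = 0  # Init(): c <- 0
        self.rng = rng if rng is not None else random.Random()

    def update(self):
        # Increment c with probability 2^-c, otherwise do nothing.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def query(self):
        # 2^c - 1 is the estimate of the number of updates seen so far.
        return 2 ** self.c - 1


def average_estimate(n, copies, seed=0):
    """Average the estimates of `copies` independent counters after n updates."""
    rng = random.Random(seed)
    total = 0
    for _ in range(copies):
        counter = MorrisCounter(rng)
        for _ in range(n):
            counter.update()
        total += counter.query()
    return total / copies


print(average_estimate(500, 500, seed=7))
```

Each counter keeps only c, which stays near log_2 n, so its storage is about log log n bits rather than log n.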
  • 20. 04 The input is a stream of data and the output is the number of distinct elements in the stream. For example, count the number of distinct IP addresses you encounter.
  • 21. What do you mean by Counting Distinct Elements? Our first problem is to approximate the Fp-norm of items in a stream. Fp-norm: Let S be a multi-set, where every item i of S is in [N]. Let m_i be the number of occurrences of item i in set S. Then the Fp-norm of set S is defined by F_p = Σ_{i ∈ [N]} |m_i|^p, where 0^p is set to be 0. By definition, the F0-norm of set S is the number of distinct items in S, and the F1-norm of S is the number of items in S.
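As a small worked example of the Fp-norm definition, the sketch below computes it exactly by storing all counts, which is the memory cost that streaming algorithms try to avoid. The function name `frequency_moment` is an assumption made here.

```python
from collections import Counter


def frequency_moment(stream, p):
    """Exact F_p: the sum of m_i^p over the items that occur in the stream.

    Items that never occur contribute nothing, matching the 0^p = 0 convention.
    """
    counts = Counter(stream)
    return sum(m ** p for m in counts.values())


stream = [3, 1, 3, 2, 3, 1]
print(frequency_moment(stream, 0))  # number of distinct items
print(frequency_moment(stream, 1))  # total number of items
print(frequency_moment(stream, 2))
```

In this stream item 3 occurs three times, item 1 twice, and item 2 once, so F0 = 3, F1 = 6 and F2 = 9 + 4 + 1 = 14.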
  • 22. Problem Statement: Let S be a data stream representing a multi-set S. Items of S arrive consecutively and every item i ∈ [n]. Design a streaming algorithm to (ε,δ)-approximate the F0-norm of set S, where ε is the approximation parameter and δ is the confidence parameter. To solve this problem, we can implement 3 different algorithms: 1. The AMS Algorithm (primitive) 2. The BJKST Algorithm (basic) 3. The Indyk Algorithm (advanced)
  • 23. 1. The AMS Algorithm: This algorithm for approximating F0 is by Noga Alon, Yossi Matias, and Mario Szegedy. Assume that we have seen sufficiently many numbers, and these numbers are uniformly distributed. We look at the binary expression Binary(x) of every item x, and we expect that for one out of 2^d distinct items Binary(x) ends with d consecutive zeros. More generally, let ρ(x) = max{i : x mod 2^i = 0}
  • 24. be the number of zeros that Binary(x) ends with, and we have the following observation: 1. If ρ(x) = 1 for some x, then it is likely that the number of distinct integers is 2^1 = 2. 2. If ρ(x) = 2 for some x, then it is likely that the number of distinct integers is 2^2 = 4. 3. If ρ(x) = 3 for some x, then it is likely that the number of distinct integers is 2^3 = 8. 4. If ρ(x) = r for some x, then it is likely that the number of distinct integers is 2^r. To implement this idea, we use a hash function h so that, after applying h, all items in S are uniformly distributed, and on average one out of F0 distinct numbers hits ρ(h(x)) ≥ log F0. Hence the maximum value of ρ(h(x)) over all items x in the stream could give us a good approximation of the number of distinct items.
  • 25. An Algorithm For Approximating F0:
    1. Choose a random function h: [n]→[n] from a family of pairwise independent hash functions;
    2. z ← 0;
    3. While an item x arrives do
       a. if ρ(h(x)) > z then
          i. z ← ρ(h(x));
    4. Return 2^(z+½)
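The algorithm above can be sketched in a few lines. Several details here are assumptions rather than part of the slides: items are taken to be non-negative integers, the pairwise independent family is h(x) = (a·x + b) mod P with the Mersenne prime P = 2^61 − 1 (so h maps into [0, P) rather than [n]), and `median_of_copies` applies the usual median amplification over independent runs.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # assumed modulus for the hash family


def trailing_zeros(value, width=61):
    """rho(value): number of zero bits at the end of the binary expansion."""
    if value == 0:
        return width  # convention for the all-zero word
    return (value & -value).bit_length() - 1


def make_hash(rng):
    """Draw h(x) = (a*x + b) mod PRIME from a pairwise independent family."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(PRIME)
    return lambda x: (a * x + b) % PRIME


def ams_estimate(stream, rng):
    """One run of the estimator: track z = max rho(h(x)), return 2^(z + 1/2)."""
    h = make_hash(rng)
    z = 0
    for x in stream:
        z = max(z, trailing_zeros(h(x)))
    return 2 ** (z + 0.5)


def median_of_copies(stream, copies=15, seed=0):
    """Median over independent runs; `stream` must be re-iterable (e.g. a list)."""
    rng = random.Random(seed)
    return statistics.median(ams_estimate(stream, rng) for _ in range(copies))


print(median_of_copies(list(range(1000)), seed=1))
```

Each run stores only the hash coefficients and z, so the memory use is logarithmic in n. The price is accuracy: the estimate is only correct up to a constant factor.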
  • 26. 2. The BJKST Algorithm: Our second algorithm for approximating F0 is a simplified version of the algorithm by Bar-Yossef et al. In contrast to the AMS algorithm, the BJKST algorithm uses a set to keep the sampled items. The basic idea behind the sampling scheme of the BJKST algorithm is as follows: 1. Let B be a set that is used to retain sampled items, and B = ∅ initially. The size of B is O(1/ε^2) and only depends on the approximation parameter ε. 2. The initial sampling probability is 1, i.e. the algorithm keeps all items seen so far in B. 3. When the set B becomes full, shrink B by removing about half of the items; from then on the sampling probability becomes smaller. 4. In the end, the number of items in B and the current sampling probability are used to approximate the F0-norm.
  • 27. The BJKST Algorithm (Simplified Version)
    1. Choose a random function h: [n]→[n] from a family of pairwise independent hash functions.
    2. z ← 0 //z is the index of the current level
    3. B ← ∅ //Set B keeps sampled items
    4. While an item x arrives do
       a. if ρ(h(x)) ≥ z then
          i. B ← B ∪ {(x, ρ(h(x)))}
          ii. while |B| ≥ c/ε^2 do //Set B becomes full
              1. z ← z + 1 //Increase the level
              2. Shrink B by removing all (x, ρ(h(x))) with ρ(h(x)) < z
    5. Return |B|·2^z
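A minimal sketch of the simplified BJKST steps above follows. The slides leave the constant c unspecified, so the default c = 24 is purely an illustrative assumption, as are the hash family h(x) = (a·x + b) mod (2^61 − 1), the integer item type, and the function names.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus for the hash family


def trailing_zeros(value, width=61):
    """rho(value): number of zero bits at the end of the binary expansion."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1


def bjkst_estimate(stream, epsilon, c=24, seed=0):
    """Simplified BJKST: keep items whose hash level is at least z."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    capacity = c / epsilon ** 2  # |B| must stay below c / eps^2
    z = 0                        # current level
    sample = {}                  # B: item -> rho(h(item))
    for x in stream:
        level = trailing_zeros((a * x + b) % PRIME)
        if level >= z:
            sample[x] = level
            while len(sample) >= capacity:
                z += 1  # raise the level, roughly halving the sample
                sample = {item: lvl for item, lvl in sample.items() if lvl >= z}
    return len(sample) * 2 ** z


print(bjkst_estimate(list(range(5000)), epsilon=0.5))
```

While the number of distinct items stays below the capacity, z remains 0 and the answer is exact. After that, each surviving item stands in for about 2^z distinct items.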
  • 28. 3. The Indyk Algorithm: We next show that the F0-norm of a set S can be estimated in dynamic streams. This algorithm, due to Piotr Indyk, presents beautiful applications of the so-called stable distributions in designing streaming algorithms. Let S be a stream consisting of pairs of the form (s_i, U_i), where s_i ∈ [n] and U_i = +/− represents dynamic changes of s_i. Design a data streaming algorithm that, for any ε and δ, (ε,δ)-approximates the F0-norm of S. Assume that every item in the stream is in [n], and we want to achieve an (ε,δ)-approximation of the Fp-norm. Let us further assume that we have a matrix M of k = Θ(ε^−2 log(1/δ)) rows and n columns, where every entry of M is a random variable drawn from a p-stable distribution. Given matrix M, the algorithm keeps a vector z ∈ R^k which can be expressed as a linear combination of columns of matrix M.
  • 29. The F0-norm of multi-set S can be approximated by Indyk's algorithm by choosing a sufficiently small p, assuming that we have an upper bound K on the number of occurrences of every item in the stream. Approximating Fp-norm in a Turnstile Stream (An Idealized Algorithm):
    1. For i ← 1, k do
       a. z_i ← 0
    2. While an operation arrives do
       a. If item j is added then
          i. For i ← 1, k do z_i ← z_i + M[i,j]
       b. If item j is deleted then
          i. For i ← 1, k do z_i ← z_i − M[i,j]
       c. If Fp-norm is asked then
          i. Return median_{1≤i≤k}{|z_i|^p} · scalefactor(p)
    This idealized algorithm relies on matrix M of size k×n, and for every occurrence of item j, the algorithm needs the j-th column of matrix M.
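The idealized turnstile algorithm above can be sketched for the concrete case p = 1, where the 1-stable distribution is the standard Cauchy distribution. Note that this case estimates F1, not F0, and is chosen here only because Cauchy samples are easy to generate; it is an assumption of this sketch, as are the class name `StableSketchF1` and storing M explicitly. Since the median of |Cauchy| is 1, the scale factor is 1 in this case.

```python
import math
import random
import statistics


def cauchy(rng):
    """Sample a standard Cauchy (1-stable) variable by inverse transform."""
    return math.tan(math.pi * (rng.random() - 0.5))


class StableSketchF1:
    """Idealized sketch for p = 1 with an explicit k x n matrix M."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        self.matrix = [[cauchy(rng) for _ in range(n)] for _ in range(k)]
        self.z = [0.0] * k

    def add(self, j):
        # Item j is added: z_i <- z_i + M[i, j] for every row i.
        for i, row in enumerate(self.matrix):
            self.z[i] += row[j]

    def delete(self, j):
        # Item j is deleted: z_i <- z_i - M[i, j] for every row i.
        for i, row in enumerate(self.matrix):
            self.z[i] -= row[j]

    def query(self):
        # Median of |z_i| estimates the sum of |m_j| (scale factor 1 for p = 1).
        return statistics.median(abs(v) for v in self.z)


sketch = StableSketchF1(n=20, k=1001, seed=3)
for _ in range(5):
    sketch.add(3)
for _ in range(3):
    sketch.add(7)
for _ in range(2):
    sketch.delete(3)
print(sketch.query())  # true F1 is 3 + 3 = 6
```

Each z_i is a sum of Cauchy variables weighted by the item counts, so it is itself Cauchy with scale equal to the F1-norm. That is why the median of the absolute values recovers it.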
  • 30. Complexity of Various Algorithms: 1. The AMS Algorithm: By running k = Θ(log(1/δ)) independent copies of the algorithm above and returning the median value, we can make the failure probabilities at most δ. This gives an (O(1),δ)-approximation of the number of distinct items over the stream. 2. The BJKST Algorithm: By running Θ(log(1/δ)) independent copies in parallel and returning the median of these outputs, the BJKST algorithm (ε,δ)-approximates the F0-norm of the multiset S. 3. The Indyk Algorithm: For any parameters ε, δ, there is an algorithm that (ε,δ)-approximates the number of distinct elements in a turnstile stream. The algorithm needs O(ε^−2 log n log(1/δ)) bits of space. The update time for every arriving item is O(ε^−2 log(1/δ)).
  • 31. 05
  • 32. Frequency estimation: Frequency estimation is to estimate the frequency of any item x, i.e. the number of occurrences of x. The basic setting is as follows: Let S be a multi-set that is initially empty. The data stream consists of a sequence of update operations on S, and each operation takes one of the following three forms:
  • 33. Three forms: INSERT(S, x), performing the operation S ← S ∪ {x}; DELETE(S, x), performing the operation S ← S \ {x}; QUERY(S, x), querying the number of occurrences of x in the multiset S.
  • 34. Algorithm: Count-Min sketch. We use the Count-Min sketch for this frequency estimation problem. It consists of a fixed array C of counters of width w and depth d. These counters are all initialized to zero. Each row is associated with a pairwise independent hash function h_i, where each h_i maps an element from U to {1, . . . , w}.
  • 35. Algorithm
    d = ⌈ln(1/δ)⌉
    w = ⌈e/ε⌉
    while an operation arrives do
        if Insert(S, x) then
            for j ← 1, d do
                C[j, h_j(x)] ← C[j, h_j(x)] + 1
        if Delete(S, x) then
            for j ← 1, d do
                C[j, h_j(x)] ← C[j, h_j(x)] − 1
        if the number of occurrences of x is asked then
            return m_x = min_{1≤j≤d} C[j, h_j(x)]
    Here ε is the approximation parameter and δ is the confidence parameter.
  • 36. Choosing w and d: For given parameters ε and δ, the width and depth of the Count-Min sketch are set to w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉. Hence for constant ε and δ, the sketch consists of only a constant number of counters. Note that the size of the Count-Min sketch depends only on the accuracy of the approximation and is independent of the size of the universe.
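The Count-Min sketch described above fits in a short class. The version below is a sketch under stated assumptions: items are non-negative integers, each row uses the hash h(x) = ((a·x + b) mod P) mod w with P = 2^61 − 1, columns are 0-indexed, and the class and method names are chosen here for illustration.

```python
import math
import random

PRIME = (1 << 61) - 1  # assumed modulus for the per-row hash functions


class CountMinSketch:
    """d x w array of counters with one hash function per row."""

    def __init__(self, epsilon, delta, seed=0):
        self.width = math.ceil(math.e / epsilon)       # w = ceil(e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))  # d = ceil(ln(1 / delta))
        rng = random.Random(seed)
        self.hashes = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                       for _ in range(self.depth)]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row, x):
        a, b = self.hashes[row]
        return ((a * x + b) % PRIME) % self.width

    def insert(self, x):
        for row in range(self.depth):
            self.table[row][self._column(row, x)] += 1

    def delete(self, x):
        for row in range(self.depth):
            self.table[row][self._column(row, x)] -= 1

    def query(self, x):
        # Collisions only add to a counter, so the minimum is the tightest row.
        return min(self.table[row][self._column(row, x)]
                   for row in range(self.depth))


cms = CountMinSketch(epsilon=0.01, delta=0.01)
for item in [5, 5, 5, 9, 9, 42]:
    cms.insert(item)
print(cms.width, cms.depth, cms.query(5))
```

For ε = δ = 0.01 this gives 272 columns and 5 rows, that is 1360 counters, however large the universe of items is. When no item's count goes negative, a query never underestimates the true frequency.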
  • 37. 06
  • 38. Research papers referenced:
    1. https://inst.eecs.berkeley.edu/~cs170/fa18/assets/streaming-170.pdf
    2. https://www.cs.dartmouth.edu/~ac/Teach/CS35-Spring20/Notes/lecnotes.pdf
    3. https://resources.mpi-inf.mpg.de/departments/d1/teaching/ss14/gitcs/notes3.pdf
    4. https://people.seas.harvard.edu/~minilek/publications/papers/xrds.pdf
    5. https://www.quantamagazine.org/best-ever-algorithm-found-for-huge-streams-of-data-20171024/
  • 39. CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik. THANKS! Our team: 1. Aryan Singh (18070124017) 2. Hridyesh Singh Bisht (18070124030) 3. Kavya Suthar (18070124037) 4. Sejal Shrestha (18070124064)