This document discusses 10 algorithms that have changed the future. It begins by defining what an algorithm is and why algorithms are essential to the way computers process data. It then provides examples of specific algorithms, including the search engine algorithms used by companies like Google to match and rank webpages, public key encryption algorithms for secure data transmission, error-correcting codes that ensure accurate data transfer, and pattern recognition algorithms that enable applications like facial recognition. The document emphasizes that while algorithms may seem complex, they are based on precise sets of rules, and they have revolutionized fields like internet search and security.
2. ALGORITHM-1
“Algorithm” stems from “Algoritmi”, the Latin form of the name of the Persian mathematician al-Khwarizmi, whose books taught methods of calculation.
An informal definition now commonly used in mathematics and computer science is “a set of rules that precisely defines a sequence of operations”. This covers both arithmetic calculation and computer programming in general.
An algorithm is no longer just a calculation; it can also be understood as a “self-contained step-by-step set of operations to be performed”. In other words, algorithms are essential to the way computers process data. Many computer programs contain algorithms that detail the specific instructions the computer should perform (in a specific order) to carry out a specified task.
Johnson Chen, April, 2015
An example of an algorithm
• Nowadays, “algorithm” is almost synonymous with computer programming; it covers all computer-performed calculation, data processing, indexing and automated reasoning.
3. ALGORITHM as a precise mechanical recipe
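The original slide's example image did not survive extraction. As a stand-in illustration (not the author's own example), Euclid's method for the greatest common divisor shows what “a precise mechanical recipe” means: every step is fixed, and following the steps always produces the answer. The function name `gcd` is our own.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two non-negative integers."""
    # Rule: replace (a, b) with (b, a mod b) until b becomes 0.
    while b != 0:
        a, b = b, a % b
    # When b reaches 0, a holds the greatest common divisor.
    return a


print(gcd(48, 18))  # prints 6
```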
5. Search Engine and the internet
Many people wonder what exactly search engines are and how they have made such a huge impact on the internet. Without search engines, using the internet would be much more difficult and troublesome, and people would never have been able to achieve all that they can from the internet today. Search engines have truly made the internet far more convenient and more fun to use; without them, it would never have become so popular.
Without our realizing it, every search is like looking for a needle in a vast haystack of knowledge. It relies on algorithms (computer programs and instructions) working behind the scenes to find what we want in just a click!
Leading search engine companies (e.g. Google, Yahoo, Bing) spend millions of dollars on server hardware and invest in the best brains to refine their algorithms ahead of their competitors.
This competitive edge is built on accurate keyword search, and also on techniques for matching and ranking the data and webpages on the internet.
6. Search Engines before Google
• Search engine giant Google was not the first in the internet search business. It was developed by two Stanford graduate students, Larry Page and Sergey Brin, in 1998. Before that, there were other search engines that provided basic search results, but they were based on less sophisticated algorithms and did not have a cash-generating business model.
• Lycos, 1994
• Excite, 1994
• Infoseek, 1994
• Alta Vista, 1995
• Yahoo!, 1996
• Google, 1998
7. 1. Key Word Search, Matching & Ranking
Internet search engines all started out with the simple concept of “keyword search”, where a user types in the words most related to the subject or information he or she is looking for.
The search engine then draws, from the many millions of pages on the internet, the results that “match” the same or similar words the user typed in, and ideally also “ranks” and displays them in the order that most closely reflects the user's intention or the information the user is looking for.
• A good search engine, with sophisticated algorithms and computer operations, will not only find the best-matching results but also list them efficiently in the ranking order best suited to the user. Some search engines also use this technology to match advertisers' messages or hyperlinks to the search results, and draw income from the advertisers.
• The quality and efficiency of the ranking system determine the life or death of a search engine. Currently, Google is the dominant one, with over 70% of the market share, followed by Yahoo (which took over Alta Vista) and Bing (formerly MSN Live Search).
8. How A Search Engine Works?-1
1. INQUIRY
2. MATCHING & PAIRING, giving the qualified/matched search results
3. RANKING, after which the web page is displayed
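The three steps above (inquiry, matching, ranking) can be sketched as a toy search function. This is a minimal illustration with hypothetical names (`search`, `pages`) that ranks simply by how many times the query words occur; real engines use far more signals.

```python
def search(query, pages):
    """Match pages containing the query words, then rank them by match count."""
    words = query.lower().split()                        # 1. inquiry
    scores = {}
    for page_id, text in pages.items():
        tokens = text.lower().replace(",", " ").split()
        score = sum(tokens.count(w) for w in words)      # 2. matching
        if score > 0:
            scores[page_id] = score
    # 3. ranking: most matches first, ties broken by page number
    return sorted(scores, key=lambda p: (-scores[p], p))


pages = {
    1: "The Cat Sat On The Ground",
    2: "The Dog is On The Ground, But the Cat is Inside",
    3: "The Dog Sat On The Ground",
}
print(search("dog sat", pages))  # prints [3, 1, 2]
```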
9. How A Search Engine Works?-2
Spiders are search and verification programs (algorithms), patented by Google.
10. 2. Indexing in Search Engine-1
The concept of indexing derives from books, where almost every book lists key words in an index so readers can conveniently and quickly find references in the book. A typical index entry looks like this, for example:
- Internet Algorithm, 2, 15, 33, 45, 200
This indicates that the term “Internet Algorithm”, or references to it, can be found on pages 2, 15, 33, 45 and 200 of the book.
• A search engine uses the same principle as a book index, recording a number (the page and the position) for each word on each web page. The computer's powerful calculation capability lets it quickly index the words in web pages in a meaningful way, so that the pages that best match a searched word can be found quickly.
11. 2. Indexing in Search Engine-2
• Suppose that the current WWW (World Wide Web) has only three web pages, 1, 2 and 3, with the following content:
1. The Cat Sat On The Ground
2. The Dog is On The Ground, But the Cat is Inside
3. The Dog Sat On The Ground
• The search engine first makes a simple index, recording for each word the page number and the word's “location” (position) on that page, written page-position:
• The 1-1, 1-5, 2-1, 2-5, 2-8, 3-1, 3-5
• Cat 1-2, 2-9
• Dog 2-2, 3-2
• Sat 1-3, 3-3
• Ground 1-6, 2-6, 3-6
• On 1-4, 2-4, 3-4
• Inside 2-11
• But 2-7
• Is 2-3, 2-10
• The search engine can then quickly match the searched key word and rank the pages by how often the word appears, assuming the page where it appears most is the most relevant search result.
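The page-position index can be built mechanically by walking through each page word by word. A minimal sketch, assuming pages are plain strings, words are split on spaces (commas ignored), case is ignored and positions count from 1; `build_index` and `rank_pages` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_index(pages):
    """Map each lower-cased word to a list of (page, position) pairs."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        words = text.replace(",", " ").split()
        for position, word in enumerate(words, start=1):
            index[word.lower()].append((page_id, position))
    return index


def rank_pages(index, word):
    """Rank pages by how often `word` appears on them (most first)."""
    counts = Counter(page for page, _ in index.get(word.lower(), []))
    return [page for page, _ in counts.most_common()]


pages = {
    1: "The Cat Sat On The Ground",
    2: "The Dog is On The Ground, But the Cat is Inside",
    3: "The Dog Sat On The Ground",
}
index = build_index(pages)
print(index["cat"])               # prints [(1, 2), (2, 9)]
print(rank_pages(index, "the"))   # prints [2, 1, 3]
```

`Counter.most_common()` keeps pages with equal counts in the order they were first seen, so ties stay in page order.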
12. 3. Metaword Trick
In HTML, the language in which web pages are written, there are tags that clearly distinguish the page title from the body of content. Early search engines such as Alta Vista took advantage of this separation between title and body and developed the “Metaword Trick”, which provided the most accurate search results at that time.
The trick treats the title markers as special indexed words (“metawords”), so the engine can search for key words in the webpage TITLE instead of among the individual words of the entire page. This matters because in English the most frequently found words are likely to be “THE”, “IN” or “ON”, rather than the key word the user is looking for.
Fortunately, the title of a web page is marked by the tags <title> and </title>, so this trick cuts out a lot of work for the computer and at the same time raises the quality and accuracy of the search.
The “Metaword Trick” is what made Alta Vista successful in the 1990s, before Google came into play.
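Example pages, each with a title (marked in the figure as the metaword part) and a body:
1. Title: My Cat. Body: The Cat Sat On The Ground
2. Title: My Pets. Body: The Dog is On The Ground, But the Cat is Inside
3. Title: My Dog. Body: The Dog Sat On The Ground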
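The Metaword Trick can be sketched by indexing the <title> and </title> tags as if they were ordinary words, then accepting a hit only when the word's position falls between them. The pages below are hypothetical HTML versions of the three example pages; the function names are our own, and each page is assumed to have exactly one title.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Index every word AND the <title>/</title> tags (the metawords) by position."""
    index = defaultdict(list)
    for page_id, html in pages.items():
        tokens = re.findall(r"</?title>|[A-Za-z]+", html)
        for position, token in enumerate(tokens, start=1):
            index[token.lower()].append((page_id, position))
    return index


def title_search(index, word):
    """Pages where `word` sits between the <title> and </title> metawords."""
    starts = dict(index.get("<title>", []))   # page -> position of <title>
    ends = dict(index.get("</title>", []))    # page -> position of </title>
    hits = {
        page
        for page, pos in index.get(word.lower(), [])
        if page in starts and page in ends and starts[page] < pos < ends[page]
    }
    return sorted(hits)


pages = {
    1: "<title>My Cat</title> The Cat Sat On The Ground",
    2: "<title>My Pets</title> The Dog is On The Ground, But the Cat is Inside",
    3: "<title>My Dog</title> The Dog Sat On The Ground",
}
index = build_index(pages)
print(title_search(index, "cat"))  # prints [1] (page 2 mentions a cat only in its body)
```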
13. 4. Page Rank by Google
One of the most revolutionary ideas for refining search was proposed by Google's founders, Larry Page and Sergey Brin, in 1998, in a published paper titled “The Anatomy of a Large-Scale Hypertextual Web Search Engine”.
Larry Page and Sergey Brin proposed using hyperlinks (instead of basic word search or the Metaword Trick) to decide the ranking of search results. Their algorithm, which is the foundation of Google's technology, is called “Page Rank”.
Google not only indexes web pages but also examines and counts the hyperlinks pointing to a particular webpage, and gives the page a score based on the number of those hyperlinks and the authority of the linking sources (authority score).
The greater the score of a webpage, the higher it ranks compared with other pages with similar content. This gave much more refined search results than other search engines had provided.
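The scoring idea above can be sketched with the standard power-iteration form of Page Rank: each page repeatedly passes a share of its score along its outgoing links. This is a minimal illustration on a hypothetical four-page web, not Google's production system; the damping factor 0.85 is the commonly cited default, and the page names and the `pagerank` function are our own.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration Page Rank; `links` maps each page to the pages it links to."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base score (the "random jump" share).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links[page] or pages      # a page with no links spreads evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share       # pass authority along each link
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
print(max(scores, key=scores.get))  # prints C (linked from A, B and D)
```

The sketch assumes every link target also appears as a key in `links`.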
14. 4. Page Rank and Random Surfer Trick
To refine web search results and ranking further, Google regularly dispatches Web Crawlers to systematically examine newly posted web pages every day! A Web crawler is not a physical bug; it is an Internet algorithm program that systematically browses the World Wide Web, typically for the purpose of Web indexing. Exactly how Google's crawlers select, check and score a web page remains one of Google's top trade secrets.
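A crawler's “systematic browsing” is commonly done breadth-first: visit a page, queue every new link found on it, repeat. The sketch below assumes a hypothetical in-memory `web` dictionary in place of real HTTP fetching, and says nothing about how Google's own crawlers choose or score pages.

```python
from collections import deque


def crawl(web, start, max_pages=100):
    """Breadth-first crawl; returns pages in the order they were visited."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)               # a real crawler would download and index it here
        for link in web.get(page, []):
            if link not in seen:           # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


web = {"home": ["news", "about"], "news": ["story1", "home"], "about": [], "story1": ["news"]}
print(crawl(web, "home"))  # prints ['home', 'news', 'about', 'story1']
```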
• The “Random Surfer Trick” explains how Page Rank scores can be calculated: imagine a surfer who starts on a random page and keeps clicking randomly chosen hyperlinks, now and then jumping to a completely random page. The pages the surfer visits most often, because many pages (and important pages) link to them, receive the highest scores. Separately, web search engines and some other sites use Web crawling or spidering software (algorithms) to update their web content or their indexes of other sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search much more efficiently.
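The Random Surfer Trick can be simulated directly: follow a random link most of the time, occasionally jump to a random page, and count visits. This is a sketch on a hypothetical four-page web; the 85%/15% split and the random seed are illustrative assumptions. With enough steps, the visit frequencies approach the Page Rank scores.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, follow_prob=0.85, seed=0):
    """Fraction of steps a random surfer spends on each page."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    pages = sorted(links)
    visits = Counter()
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if outgoing and rng.random() < follow_prob:
            page = rng.choice(outgoing)    # click a random link on this page
        else:
            page = rng.choice(pages)       # get bored and jump anywhere
    return {p: visits[p] / steps for p in pages}


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
frequencies = random_surfer(web)
print(max(frequencies, key=frequencies.get))  # the most-visited page
```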
15. 5. Public Key Cryptography
Every day, thousands of credit card numbers and pieces of personal information are exchanged between servers on the internet, across borders and continents. To keep personal and financial data from being intercepted by hackers as it flows across the internet, and to deliver it safely and accurately to its recipients (e.g. banks, sellers on eBay, Amazon.com), the messages must be encrypted.
Computer cryptography for transmitting data is based on a simple concept: a “commonly shared secret”. The sender and the recipient share a common secret, and the data to be sent is scrambled by an algorithm that uses that shared secret.
• The individual operations involved in encryption are usually not that complicated, even today; they are often simple arithmetic or bit operations. However, because keys are now 128-bit numbers (about 39 decimal digits), cracking them by trying out all the possibilities has become extremely difficult. Furthermore, “block cipher” techniques break the data into fixed-size blocks (128 bits each in AES, for example) and put each block through several rounds of encryption to make the data even safer.
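To make the “commonly shared secret” idea concrete, here is a toy cipher in which both sides hold the same secret key and the “simple arithmetic operation” is a bitwise XOR. It is an illustration only and is not secure; real systems use vetted block ciphers such as AES. The key and message are made up.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy shared-secret cipher: XOR each byte with the repeating key.

    Applying it twice with the same key restores the original data.
    NOT secure; for illustration only.
    """
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(data))


shared_secret = b"laurel-and-bob"          # hypothetical secret known to both sides
message = b"Credit $5,213.75 to Laurel"
ciphertext = xor_cipher(message, shared_secret)
print(xor_cipher(ciphertext, shared_secret))  # prints b'Credit $5,213.75 to Laurel'
```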
16. 5. Public Key Cryptography
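Diffie-Hellman Key Exchange Protocol Algorithm (figure summary):
• Common number, publicly known: 7
• Bob's secret number is 4, so Bob's common-number computation result is 28 (4x7), which he sends openly.
• Laurel's secret number is 6, so Laurel's common-number computation result is 42 (6x7), which she sends openly.
• Bob computes 4 x 42 and Laurel computes 6 x 28; both arrive at the common secret number 168 (4x6x7).
• Potential 3rd-party hackers (Mary in the figure) see only 7, 28 and 42, and are meant to be unable to figure out the relationship that produces 168.
The “secret key” in this diagram is just a simple multiplication, which an observer could undo by division (28 / 7 = 4). Real systems therefore replace it with an operation that is hard to reverse, such as exponentials (modular exponentiation).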
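The multiplication in the diagram can be swapped for modular exponentiation, the operation the real Diffie-Hellman protocol uses. This sketch keeps the secret numbers 4 and 6 but uses a small, hypothetical public prime (23) and base (5) so the arithmetic stays readable; real systems use 2048-bit primes or elliptic curves.

```python
P = 23  # public prime modulus (tiny, for illustration only)
G = 5   # public base, the "common number"

bob_secret = 4      # known only to Bob
laurel_secret = 6   # known only to Laurel

# Each side publishes G**secret mod P; an eavesdropper sees these values.
bob_public = pow(G, bob_secret, P)         # 5**4 mod 23 = 4
laurel_public = pow(G, laurel_secret, P)   # 5**6 mod 23 = 8

# Each side combines the other's public value with its own secret.
bob_shared = pow(laurel_public, bob_secret, P)
laurel_shared = pow(bob_public, laurel_secret, P)

print(bob_public, laurel_public, bob_shared, laurel_shared)  # prints 4 8 2 2
```

Both sides end with G**(4*6) mod 23, but recovering 4 or 6 from the public values requires solving a discrete logarithm, which is infeasible at real key sizes.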
17. 6. Error Correcting Code-1
Every computer performs three basic functions:
1. Calculating (computing) data
2. Storing data
3. Transferring data
• Without the storing and transferring functions of the hard disk or RAM, a computer is basically just a simple desktop calculator that gives only one result at a time, and cannot be used to produce complicated financial reports or analyses.
• However, there is a big challenge in making sure the data we key in are correct and complete every time we retrieve or transfer them.
• This is particularly important when dealing with bank or financial accounts, passwords, e-mail boxes and other important, sensitive personal information. Even 99.9999% accuracy is not enough; it has to be 100% correct all the time!
18. 6. Repetition Trick
• To make sure data were transmitted correctly, a computer can repeat the transmission several times and keep the most common value in each position. This is called the Repetition Trick, and it relies on both probability and a simple algorithmic operation (a majority vote).
Transmission 1: 5 2 9 3 7 5
Transmission 2: 5 2 1 3 7 5
Transmission 3: 5 2 1 3 1 1
Transmission 4: 5 4 4 3 7 5
Transmission 5: 7 2 1 8 7 5
Most common number: 5 2 1 3 7 5
• Example: A bank is trying to credit a $5,213.75 tax rebate to Laurel's account, but it knows that, due to poor line quality, each digit it transmits has a 20% chance of arriving wrong. How can the bank make sure it credits Laurel's tax rebate properly with the Repetition Trick?
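The majority vote behind the Repetition Trick fits in a few lines. This sketch uses the five received transmissions from the table, written as digit strings; `majority_decode` is a hypothetical name.

```python
from collections import Counter


def majority_decode(transmissions):
    """Repetition Trick: keep the most common digit at each position."""
    return "".join(
        Counter(column).most_common(1)[0][0]   # most frequent digit in this column
        for column in zip(*transmissions)      # walk the strings position by position
    )


received = ["529375", "521375", "521311", "544375", "721875"]
print(majority_decode(received))  # prints 521375
```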
19. 7. Redundancy Trick
• To ensure the accuracy of transmitted data, computer scientists also use the “Redundancy Trick”: sending more than the original data to enhance accuracy and reliability.
• The original data is extended into a longer string containing helpful but redundant digits, while converting the extended data back to the original string remains an easy algorithmic operation.
• Using the previous example of a $5,213.75 tax rebate transferred to Laurel, the bank would add an extra string of words after the $5,213.75 data string:
• $5,213.75 five two one three seven five
• Each character still has a 20% probability of error (or of being tampered with).
• Even if the redundant part of the string arrives looking like this: Fiqe kwo one thrxp sivpn fivq, we can still easily judge that it means Five Two One Three Seven Five.
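The “easy judgement” in the last step can be automated by picking, for each received word, the digit name that needs the fewest letter changes to match it (Levenshtein edit distance). This decoding rule is our own choice for illustration; the slide does not say how the comparison is done.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-letter edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete
                               current[j - 1] + 1,             # insert
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def decode_words(garbled: str) -> str:
    """Map each garbled word to the closest digit word and return the digits."""
    digits = []
    for word in garbled.lower().split():
        best = min(range(10), key=lambda d: edit_distance(word, DIGIT_WORDS[d]))
        digits.append(str(best))
    return "".join(digits)


print(decode_words("Fiqe kwo one thrxp sivpn fivq"))  # prints 521375
```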
20. 8. Check Sum Trick
The Check Sum Trick is another technique that computer programmers use to detect errors in data. The logic and operation are simple.
Assume we have a set of data: 4, 6, 7, 5, 6
We add the numbers together: 4+6+7+5+6 = 28
We take the last digit of the sum 28, which is 8, as the checksum.
So the data is transmitted as 4, 6, 7, 5, 6 and (8).
Compared with the repetition and redundancy tricks, which may take a long time, a checksum is a simple and fast way to tell whether we have transferred the data correctly, or whether our data has been tampered with. Note that it only detects an error; it does not tell us which number is wrong.
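The Check Sum Trick as code: a minimal sketch (hypothetical function names) that computes the last digit of the sum and compares it with the checksum that arrived alongside the data.

```python
def checksum(digits):
    """Last digit of the sum of the data values."""
    return sum(digits) % 10


def looks_correct(digits, received_checksum):
    """True if the received data agree with the received checksum."""
    return checksum(digits) == received_checksum


data = [4, 6, 7, 5, 6]
print(checksum(data))                      # prints 8
print(looks_correct([4, 6, 1, 5, 6], 8))   # prints False: a digit changed in transit
```

This simple sum cannot catch every error: swapping two values (6, 4, 7, 5, 6) leaves the sum unchanged, so the check still passes.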
21. 9. Pattern Recognition
“Pattern Recognition” is perhaps the hottest topic in computer algorithms right now, and its latest convenient applications demonstrate the huge capabilities of modern computers.
Pattern recognition is also part of artificial intelligence. It is widely used in handwriting entry on mobile devices, voice recognition, retina recognition, criminal fingerprint matching, DNA tests, finding friends on Facebook and more.
• With a computer algorithm (a set of rules), we can use a computer's data processing capabilities to make an accurate guess about who might be in a Facebook picture.
• In fact, no one taught the computer to recognize faces the way our brains do; instead, it is all about digits and numbers. Each picture or handwriting entry is divided into squares of pixels, each assigned a value. The computer then compares these values with those in its data bank to find the closest match, or ranks several close matches for the user to decide.
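The compare-and-rank step described above is, in its simplest form, a nearest-neighbour classifier. This sketch uses made-up 3x3 pixel patterns instead of real photos and counts differing pixels as the distance; real face recognition uses far richer features, so treat it only as an illustration of the idea.

```python
# Hypothetical 3x3 black-and-white "pictures": 1 = dark pixel, 0 = light pixel.
TEMPLATES = {
    "X": [1, 0, 1,
          0, 1, 0,
          1, 0, 1],
    "O": [1, 1, 1,
          1, 0, 1,
          1, 1, 1],
    "+": [0, 1, 0,
          1, 1, 1,
          0, 1, 0],
}


def distance(a, b):
    """Number of pixels that differ between two pictures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(picture, templates=TEMPLATES):
    """Rank the known patterns from closest match to furthest."""
    return sorted(templates, key=lambda name: distance(picture, templates[name]))


smudged_x = [1, 0, 1,
             0, 1, 0,
             1, 0, 0]   # an X with one pixel missing
print(recognize(smudged_x))  # prints ['X', 'O', '+']
```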
22. 10. Data Compression-1
Nowadays photos, CD music, voice recordings and lots of other things are all digitized, thanks to advances in computer technology. Once they are digitized, pictures and music can be transferred via the internet and shared among friends and families no matter where they are, or copied conveniently onto portable devices, computer hard disks and our mobile phones!
We are probably all very familiar with MP3 and JPEG files and with the WinZip program. These are compressed data formats, or tools for compressing data, that let us send data across the internet or store it on our devices efficiently without losing much of its quality and authenticity.
There are two types of data compression: (1) lossless compression and (2) lossy compression. Losslessly compressed data, when unzipped or decompressed, returns exactly to the original file, and is considered the ideal way to store or exchange data.
23. 10. Data Compression-2
• Suppose we have a 50-character string of data like this:
• AAAAAAAAAAAAAAAAAABCBCBCBCBCBCBCBCAAAAAAADEFDEFDEF
• We can clearly find a pattern in it, and rewrite (compress) the data string as:
• 18A 8BC 7A 3DEF. This new description of the same data is only 12 characters long, a reduction of about 75%. This is the most basic trick, called “Run Length Encoding”.
• The 7A portion of the data can be compressed further with a simple rule that says “same as before”, pointing back to the earlier run of A's. This is commonly used in the ZIP format of the WinZip program, and is known as the “Same as Earlier Trick”; ZIP combines it with the “Shorter Symbol Trick”, which gives frequently used symbols shorter codes.
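Classic run-length encoding repeats single characters. To reproduce runs like “8BC” and “3DEF” from the example, this sketch generalizes it to short repeated units of up to three characters (an assumption of ours, not a standard format). Function names are hypothetical, and the decoder assumes the data contains only letters.

```python
import re


def pattern_encode(data: str, max_unit: int = 3) -> str:
    """Run-length style encoding where a 'run' may repeat a short unit (e.g. 'BC')."""
    runs = []
    i = 0
    while i < len(data):
        best_count, best_unit = 1, data[i]
        for size in range(1, max_unit + 1):
            unit = data[i:i + size]
            if len(unit) < size:
                break  # not enough characters left for a unit this long
            count = 1
            while data[i + count * size:i + (count + 1) * size] == unit:
                count += 1
            # Prefer the unit covering the most characters; ties keep the shorter unit.
            if count * size > best_count * len(best_unit):
                best_count, best_unit = count, unit
        runs.append(f"{best_count}{best_unit}")
        i += best_count * len(best_unit)
    return " ".join(runs)


def pattern_decode(encoded: str) -> str:
    """Expand '18A 8BC ...' back to the original string."""
    return "".join(unit * int(count)
                   for count, unit in re.findall(r"(\d+)([A-Za-z]+)", encoded))


original = "A" * 18 + "BC" * 8 + "A" * 7 + "DEF" * 3   # the 50-character example
encoded = pattern_encode(original)
print(encoded)                               # prints 18A 8BC 7A 3DEF
print(pattern_decode(encoded) == original)   # prints True
```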
24. 10. Data Compression-3
• Lossy Compression: the “Leave It Out Trick”
• Converting a large, many-megapixel picture into a smaller one with fewer megapixels, for storage and portability, is a common application of lossy compression. The JPEG file format is the prime example of lossy compression.
• JPEG first cuts a photo into many 8x8-pixel squares; how many depends on the actual size of the photo.
• A black-and-white square therefore holds 8 x 8 = 64 numbers, and a color square holds 64 numbers for each color component (usually three).
• If a certain area of the picture shares the same color shade, we can “leave out” some of the information, for example by deleting a row or column of repeated digits. Deleting just a few rows of same-shade data in a square does not, in reality, lose much picture quality, and the picture is still recognizable, but we get the benefit of data compression. (Real JPEG achieves this by transforming each square into frequency components and discarding the fine detail the eye barely notices.)
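A toy version of the “Leave It Out Trick” as described above: within an 8x8 square, drop any row whose shade is within a small tolerance of the last row kept, and fill it back in by repetition when decoding. The tolerance, the block values and the function names are illustrative assumptions; real JPEG discards detail after a frequency transform rather than deleting rows, so this shows only the lossy trade-off, not the JPEG algorithm.

```python
def leave_it_out(block, tolerance=2):
    """Keep a row only if it differs noticeably from the last kept row."""
    kept = []                                   # list of (row_index, row)
    for index, row in enumerate(block):
        if kept and max(abs(a - b) for a, b in zip(row, kept[-1][1])) <= tolerance:
            continue                            # nearly the same shade: leave it out
        kept.append((index, row))
    return kept


def restore(kept, n_rows=8):
    """Rebuild the block by repeating each kept row until the next one starts."""
    rows, k = [], 0
    for index in range(n_rows):
        if k + 1 < len(kept) and kept[k + 1][0] == index:
            k += 1
        rows.append(list(kept[k][1]))
    return rows


# A hypothetical 8x8 grey-scale square (0-255) with three bands of shade.
block = [[100] * 8, [101] * 8, [100] * 8,
         [150] * 8, [151] * 8, [152] * 8,
         [200] * 8, [200] * 8]
kept = leave_it_out(block)
print(len(kept))  # prints 3 (only 3 of the 8 rows are stored)
```

Every restored value is within the tolerance of the original, but not always equal to it, which is exactly what makes the trick lossy.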
25. 11. Database-1
A database is another significant application of computers. Today banks, companies and even government social security and welfare agencies rely on huge databases, and the algorithms built around them, to perform millions of functions across the board, such as mailing tax rebates, processing online orders and registering students for schools and classes.
• The common Excel spreadsheet is a familiar, database-like example: values and numbers are stored in cells identified by column and row labels, and each data value only needs to be entered once.
• Indexing, ranking, summation, and arithmetic or statistical computations can be performed on any column or row, and even on data from other worksheets.
• With a database, we can process thousands of data items at the same time, knowing that the data are well protected in their cells and that the computed results are reliable, free of manual calculation errors.
26. 11. Database-2: Write-Ahead Log
• To make a database work, we need not only the ranking and indexing mentioned previously but also a clear, step-by-step algorithm that records the operations to be performed in a time-sequenced log before they are carried out. This is called “Write-Ahead Logging”.
• Take a bank example: Lisa wants to write a $150 check, but she keeps most of her funds in an interest-bearing savings account, so $150 must first be moved from savings to checking. This is exactly what happens in the bank's database system:
Write-Ahead Log
1 Begin Transaction
2 Change Lisa's saving balance to $650
3 Change Lisa's checking balance to $250
4 End Transaction

Account  Type      Balance before  Balance after (permanent database change)
Lisa     Checking  $100            $250
Lisa     Saving    $800            $650

If the computer goes down for some reason at any point along this timeline, the complete transaction is still safely kept in the log.
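The logging idea can be sketched in a few lines: every change is written to the log first, and after a crash the recovery step replays only transactions whose End Transaction record made it into the log. The log format, the account names and the `crash_before_apply` flag are hypothetical; a real database writes the log to durable storage, not to a Python list.

```python
def transfer(log, db, changes, crash_before_apply=False):
    """Write the whole transaction to the log first, then apply it."""
    log.append(("BEGIN",))
    for account, new_balance in changes:
        log.append(("SET", account, new_balance))
    log.append(("END",))
    if crash_before_apply:
        return                      # simulate a power cut after logging
    for account, new_balance in changes:
        db[account] = new_balance


def recover(log, db):
    """Replay every transaction that reached its END record; skip incomplete ones."""
    pending = []
    for entry in log:
        if entry[0] == "BEGIN":
            pending = []
        elif entry[0] == "SET":
            pending.append(entry[1:])
        elif entry[0] == "END":
            for account, new_balance in pending:
                db[account] = new_balance
            pending = []
    return db


db = {"lisa_checking": 100, "lisa_saving": 800}
log = []
transfer(log, db, [("lisa_saving", 650), ("lisa_checking", 250)], crash_before_apply=True)
print(db)                # prints {'lisa_checking': 100, 'lisa_saving': 800}
print(recover(log, db))  # prints {'lisa_checking': 250, 'lisa_saving': 650}
```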
27. 11. Database-3 : Prepare and Commit
1. The database prepares two independent virtual copies of the table, each complete with a Write-Ahead Log, and locks the affected data in the main database.
2. The database simulates and executes the commands on the virtual tables and waits for the results.
3. Once the results are confirmed, the main database table is unlocked and allows the change.
4. The main database confirms the change by performing the operation, and updates the other two virtual tables. The operation ends.
Figure: in the Prepare phase (steps 2 and 3), Table A's data is locked while the operation runs on Tables B and C; in the Commit phase (step 4), Tables A, B and C all hold the new data.
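The prepare-and-commit steps follow the two-phase commit idea: every copy first prepares the change and votes, and the change is made permanent only if every copy voted yes. The `Replica` class and its `healthy` flag are hypothetical stand-ins for real database servers, and the sketch ignores timeouts and coordinator failure.

```python
class Replica:
    """A hypothetical database copy that can stage, commit or discard a change."""

    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = dict(data)
        self.healthy = healthy
        self.staged = None

    def prepare(self, changes):
        """Phase 1: stage the change (data stays locked) and vote yes or no."""
        if not self.healthy:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        """Phase 2: make the staged change permanent."""
        self.data.update(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(replicas, changes):
    votes = [replica.prepare(changes) for replica in replicas]
    if all(votes):
        for replica in replicas:
            replica.commit()
        return True
    for replica in replicas:        # any "no" vote cancels the change everywhere
        replica.abort()
    return False


start = {"lisa_checking": 100, "lisa_saving": 800}
tables = [Replica(name, start) for name in ("A", "B", "C")]
print(two_phase_commit(tables, {"lisa_checking": 250, "lisa_saving": 650}))  # prints True
```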
28. 12. Digital Signature
A digital signature is a mathematical
scheme for demonstrating the authenticity of
a digital message or document. A valid
digital signature gives a recipient reason to
believe that the message was created by a
known sender, such that the sender cannot
deny having sent the message
(authentication and non-repudiation) and
that the message was not altered in transit.
Digital signatures are commonly used for
software distribution, financial transactions,
and in other cases where it is important to
detect forgery or tampering.
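• The “public key” used to check a signature is usually a sequence of numbers produced by a predetermined algorithm, very much like the encryption mechanism described earlier. Only the holder of the matching private key can produce a signature that this public key will accept.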
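The public-key idea behind signatures can be shown with textbook RSA and tiny primes: the signer raises the message number to a private exponent, and anyone can check the result with the public exponent. The numbers below (p = 61, q = 53, e = 17) are a common classroom example; real signatures use keys of 2048 bits or more and sign a padded hash of the message, so this is an illustration, not a usable scheme.

```python
P, Q = 61, 53
N = P * Q                    # 3233, part of the public key
PHI = (P - 1) * (Q - 1)      # 3120, kept secret
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent 2753 (needs Python 3.8+)


def sign(message: int) -> int:
    """Only the holder of D can produce this value."""
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == message % N


signature = sign(65)
print(verify(65, signature))  # prints True
print(verify(66, signature))  # prints False: the message was altered
```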
29. Appendix: Binary Number System
All data (including letters and numbers) are stored and transferred between computers in binary code. This means we have to imagine living in a world where there are only two digits, 0 and 1, and convert all numbers (i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...) into 0-and-1 form.
Is it difficult to switch to a number system other than decimal? Actually it is not: besides the decimal (base 10) system, we use other numbering systems in daily life. For example, time on a clock is based on 12, as are the dozen (12) and the gross (144 = 12 x 12); an inch on a ruler is often divided into 16 parts, and 3 feet make a yard. We can probably all switch back and forth between these systems without any problem.
• First of all, 0 and 1 do not change. The number 2 becomes 10; three is 2+1, so it is 11; four is 100; five is 4+1, so it is 101; six is 4+2 = 5+1, so it is 101+1 = 110; and seven is 6+1 = 110+1 = 111. Eight is two times four, so it would be “200”, which is rewritten as 1000 because binary has no digit 2. Nine is 1001 and ten is 1010. In fact, every number can be expressed in this binary system, and this is what all computers and data are based on today.
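The counting-up explanation above is equivalent to repeatedly dividing by 2 and reading the remainders backwards. A short sketch (the function name `to_binary` is our own) that can be checked against Python's built-in `format(n, "b")`:

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to its binary digits."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))     # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(bits))  # remainders come out lowest digit first


print(to_binary(10))                       # prints 1010
print([to_binary(n) for n in range(6)])    # prints ['0', '1', '10', '11', '100', '101']
```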