SlideShare ist ein Scribd-Unternehmen logo
1 von 21
구글을 지탱하는 기술
구글을 지탱하는 기술 – chapter1.ppt
1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
    - ‘s Roll
    - Back-end Structure
    - Index Structure
4. Total Structure
First Appearance of Google


• Why?
           Get useful results


• Who?
           Sergey Brin & Larry Page
Main Concepts



Hardware expands


Ranking Function
         – Page Rank
         – Anchor Text
         – Word
Search Engine Structure




                      Internet
    Search Engine
Search Engine
Structure



Search Server’s Roll



• 통신 관리                                 Back-
                       Search
                                Index
                       Server            end
• 요청 해석하여 처리할 내용 판단

• 인덱스에서 필요한 정보 찾아냄

• 결과를 편집해 이용자에게 보냄
Search Engine
Structure



Back-end’s Roll

• Crawling

     •Web page 수집해 오는 기술
                                                  Back-
                                 Search
                                          Index
                                 Server            end
     •많은 시간 -> 복수의 crawler 사용

     •수집한 것을 Repository에 보관


• Creating Index

     •Repository에 저장된 web page
     로 Index를 만들어 냄

     •구조분석, 단어처리, 링크 처리
      랭킹 등
Search Engine
Structure



Index’s Roll



• 주어진 Data를 안전하게 저장                             Back-
                               Search
                                        Index
                               Server            end
• 요청 받은 Data를 찾아냄

• Search Engine의 Data Base 역
할
Search Engine
Structure
Back-end Structure



Crawling

Web page 수집해 오는 기술



초기 Google 2400만개 Web Page 등록

초당 avg40page를 유지하기 위해선
동시에 수백 개의 download유지

-> 현재는??

구글 검색했을 때 3,070,000,000개 결과
Search Engine
Structure
Back-end Structure
                               URL
                              server
                                                     crawler
Crawler

                                          crawler
URL server 가 전체 crawler 지휘

각 crawler는 지시에 따라             crawler
                                                           Internet
Web Page download

Repository에 임시 저장

• docID – 고유 숫자 값
                                        Repository
• url  – URL
• text – 압축물
• etc. – date, page length…
Search Engine
Structure
Back-end Structure
                       URL
                      server
                                             crawler
Crawler

                                  crawler
주소해석이 시간 많이 소요
-> 내부에 DNS cache 관리
                      crawler
                                                   Internet
Repository에 저장후
URL server가 다음주소 할당



                                Repository
Search Engine
Structure
Back-end Structure
                                                         docID   Sejong.ac.k
                                                          url         r
                                        <html>
                                                           1
                                        <head>
Creating Index                  <title>세종대학교</title>
                                        </body>
                                   <h1>학사정보<h1>
                                                                 세종대학교
                                                         Title
                                           ….
                                                         기타        …
Analyzing Web Page structures


DocIndex
– Web Page의 기본정보 저장
– docID를 key로 사용

                                       DocIndex              URLlist
URLlist
– url을 key로 사용                    docID url title etc.     url docID
– docID를 가져오기 위함
Search Engine                           Lexicon
Structure
                                     word    wordID
Back-end Structure
                                     세종       101
                                                                      Barrels
                                     대학교      102
                                     학사       201
Creating Index                       정보       202


                                                         Barrels
                                     docID    wordID#1   Position#1   Size#1    Etc.#1
Word Index
                                                         Position#2   Size#2    Etc.#2

Lexicon                                       wordID#2   Position#1   Size#1    Etc.#1
 – word -> wordID
                                                         Position#2   Size#2    Etc.#2

                                                            …
Barrels
 – docID wordID position size etc.

Inverted Index
 – wordID를 Key로 사용
Search Engine
Structure
Back-end Structure


                                 docID    Sejong.ac.k
                                                               docID       3
Creating Index                    url          r
                                                                url    Cyworld.com
                                   1

                                                        Link

Link Index


URLlist
                                          URLlist
Links                                                                Links
                                 Sejong.ac.kr       1              1     3
                                 Cyworld.com        3
Anchortext
- A information of linked page
Search Engine
Structure
Back-end Structure



Creating Index



Ranking Index


Page Rank - Link
                       Web Page 사이의 link를 일종의 투표처럼 분석
                       -> 더 많은 link를 받은 문서 = 더 좋은 문서
Anchortext
Word       - Barrels
Search Engine
Structure
                      DocIndex
Index Structure


                       Lexicon

DocIndex
– Web Page의 기본정보 저장
– docID를 key로 사용


Lexicon
– word -> wordID


                        Barrels
Barrels
– storages
Total Structure

User

         Index                   Back-end           Internet


                                  crawler
         DocIndex
Search
Server                            crawler

          Lexicon
                                  crawler

                     Structure
                                                         URL
                                                        server
                       word
         Barrels
          Barrels
           Barrels               Repository

                       Link
                                              URLlist

                     Ranking
                                    Links
Thanks for your attention
구글을 지탱하는 기술

Weitere ähnliche Inhalte

Ähnlich wie 구글을 지탱하는 기술

Microsoft SharePoint Server 2007
Microsoft SharePoint Server 2007Microsoft SharePoint Server 2007
Microsoft SharePoint Server 2007ITDogadjaji.com
 
Stephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, GoogleStephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, GoogleIE Group
 
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro Blunders
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro BlundersTips and Tricks for SharePoint 2010 - Avoiding IT Pro Blunders
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro BlundersDan Usher
 
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...Dan Usher
 
E Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical OverviewE Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical Overviewguru122
 
E Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical OverviewE Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical Overviewguestd9aa5
 
SharePoint Saturday Philly - SharePoint 2010 Administrative Blunders
SharePoint Saturday Philly - SharePoint 2010 Administrative BlundersSharePoint Saturday Philly - SharePoint 2010 Administrative Blunders
SharePoint Saturday Philly - SharePoint 2010 Administrative BlundersDan Usher
 
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5Cengage Learning
 
Working With Rails
Working With RailsWorking With Rails
Working With RailsDali Wang
 
Website architecture 2013
Website architecture 2013Website architecture 2013
Website architecture 2013Stoney deGeyter
 
The things we found in your website
The things we found in your websiteThe things we found in your website
The things we found in your websitehernanibf
 
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot
 
Share Point2007 Best Practices Final
Share Point2007 Best Practices FinalShare Point2007 Best Practices Final
Share Point2007 Best Practices FinalMarianne Sweeny
 
REST Introduction (PHP London)
REST Introduction (PHP London)REST Introduction (PHP London)
REST Introduction (PHP London)Paul James
 
Project Tools in Web Development
Project Tools in Web DevelopmentProject Tools in Web Development
Project Tools in Web Developmentkmloomis
 
BADCamp 2008 DB Sync
BADCamp 2008 DB SyncBADCamp 2008 DB Sync
BADCamp 2008 DB SyncShaun Haber
 

Ähnlich wie 구글을 지탱하는 기술 (20)

Microsoft SharePoint Server 2007
Microsoft SharePoint Server 2007Microsoft SharePoint Server 2007
Microsoft SharePoint Server 2007
 
Stephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, GoogleStephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, Google
 
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro Blunders
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro BlundersTips and Tricks for SharePoint 2010 - Avoiding IT Pro Blunders
Tips and Tricks for SharePoint 2010 - Avoiding IT Pro Blunders
 
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...
SharePoint 2010 - Tips and Tricks of the Trade - Avoiding Administrative Blun...
 
E Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical OverviewE Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical Overview
 
E Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical OverviewE Pi Server Easy Search Technical Overview
E Pi Server Easy Search Technical Overview
 
Best Practices For Centrally Governing Your Portal And Taxonomy Echo Techno...
Best Practices For Centrally Governing Your Portal And Taxonomy   Echo Techno...Best Practices For Centrally Governing Your Portal And Taxonomy   Echo Techno...
Best Practices For Centrally Governing Your Portal And Taxonomy Echo Techno...
 
Websites On Speed
Websites On SpeedWebsites On Speed
Websites On Speed
 
SharePoint Saturday Philly - SharePoint 2010 Administrative Blunders
SharePoint Saturday Philly - SharePoint 2010 Administrative BlundersSharePoint Saturday Philly - SharePoint 2010 Administrative Blunders
SharePoint Saturday Philly - SharePoint 2010 Administrative Blunders
 
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5
Course Tech 2013, Sasha Vodnik, A Crash Course in HTML5
 
Working With Rails
Working With RailsWorking With Rails
Working With Rails
 
Google
GoogleGoogle
Google
 
Website architecture 2013
Website architecture 2013Website architecture 2013
Website architecture 2013
 
The things we found in your website
The things we found in your websiteThe things we found in your website
The things we found in your website
 
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's GuidePardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
Pardot Webinar - Unlocking the Mysteries of SEO - A B2B Marketer's Guide
 
Share Point2007 Best Practices Final
Share Point2007 Best Practices FinalShare Point2007 Best Practices Final
Share Point2007 Best Practices Final
 
REST Introduction (PHP London)
REST Introduction (PHP London)REST Introduction (PHP London)
REST Introduction (PHP London)
 
Session6
Session6Session6
Session6
 
Project Tools in Web Development
Project Tools in Web DevelopmentProject Tools in Web Development
Project Tools in Web Development
 
BADCamp 2008 DB Sync
BADCamp 2008 DB SyncBADCamp 2008 DB Sync
BADCamp 2008 DB Sync
 

Mehr von sid choi

벤치마킹
벤치마킹벤치마킹
벤치마킹sid choi
 
Google을 지탱하는 기술4
Google을 지탱하는 기술4Google을 지탱하는 기술4
Google을 지탱하는 기술4sid choi
 
Google을 지탱하는 기술5
Google을 지탱하는 기술5Google을 지탱하는 기술5
Google을 지탱하는 기술5sid choi
 
Google을 지탱하는 기술3
Google을 지탱하는 기술3Google을 지탱하는 기술3
Google을 지탱하는 기술3sid choi
 
벤치 마킹
벤치 마킹벤치 마킹
벤치 마킹sid choi
 
미코노미
미코노미미코노미
미코노미sid choi
 
웹기획, 사용자를 배려하는
웹기획, 사용자를 배려하는웹기획, 사용자를 배려하는
웹기획, 사용자를 배려하는sid choi
 
Google을 지탱하는 기술2
Google을 지탱하는 기술2Google을 지탱하는 기술2
Google을 지탱하는 기술2sid choi
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술sid choi
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술sid choi
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술sid choi
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술sid choi
 
구글을 지탱하는 기술
구글을 지탱하는 기술구글을 지탱하는 기술
구글을 지탱하는 기술sid choi
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술sid choi
 

Mehr von sid choi (15)

벤치마킹
벤치마킹벤치마킹
벤치마킹
 
Meconomy
MeconomyMeconomy
Meconomy
 
Google을 지탱하는 기술4
Google을 지탱하는 기술4Google을 지탱하는 기술4
Google을 지탱하는 기술4
 
Google을 지탱하는 기술5
Google을 지탱하는 기술5Google을 지탱하는 기술5
Google을 지탱하는 기술5
 
Google을 지탱하는 기술3
Google을 지탱하는 기술3Google을 지탱하는 기술3
Google을 지탱하는 기술3
 
벤치 마킹
벤치 마킹벤치 마킹
벤치 마킹
 
미코노미
미코노미미코노미
미코노미
 
웹기획, 사용자를 배려하는
웹기획, 사용자를 배려하는웹기획, 사용자를 배려하는
웹기획, 사용자를 배려하는
 
Google을 지탱하는 기술2
Google을 지탱하는 기술2Google을 지탱하는 기술2
Google을 지탱하는 기술2
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술
 
구글을 지탱하는 기술
구글을 지탱하는 기술구글을 지탱하는 기술
구글을 지탱하는 기술
 
구글을지탱하는기술
구글을지탱하는기술구글을지탱하는기술
구글을지탱하는기술
 

Kürzlich hochgeladen

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Kürzlich hochgeladen (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

구글을 지탱하는 기술

  • 1.
  • 3. 구글을 지탱하는 기술 – chapter1.ppt
  • 4. 1. First Appearance of Google 2. Main Concepts 3. Search Engine Structure - ‘s Roll - Back-end Structure - Index Structure 4. Total Structure
  • 5. First Appearance of Google • Why? Get useful results • Who? Sergey Brin & Larry Page
  • 6. Main Concepts Hardware expands Ranking Function – Page Rank – Anchor Text – Word
  • 7. Search Engine Structure Internet Search Engine
  • 8. Search Engine Structure Search Server’s Roll • 통신 관리 Back- Search Index Server end • 요청 해석하여 처리할 내용 판단 • 인덱스에서 필요한 정보 찾아냄 • 결과를 편집해 이용자에게 보냄
  • 9. Search Engine Structure Back-end’s Roll • Crawling •Web page 수집해 오는 기술 Back- Search Index Server end •많은 시간 -> 복수의 crawler 사용 •수집한 것을 Repository에 보관 • Creating Index •Repository에 저장된 web page 로 Index를 만들어 냄 •구조분석, 단어처리, 링크 처리 랭킹 등
  • 10. Search Engine Structure Index’s Roll • 주어진 Data를 안전하게 저장 Back- Search Index Server end • 요청 받은 Data를 찾아냄 • Search Engine의 Data Base 역 할
  • 11. Search Engine Structure Back-end Structure Crawling Web page 수집해 오는 기술 초기 Google 2400만개 Web Page 등록 초당 avg40page를 유지하기 위해선 동시에 수백 개의 download유지 -> 현재는?? 구글 검색했을 때 3,070,000,000개 결과
  • 12. Search Engine Structure Back-end Structure URL server crawler Crawler crawler URL server 가 전체 crawler 지휘 각 crawler는 지시에 따라 crawler Internet Web Page download Repository에 임시 저장 • docID – 고유 숫자 값 Repository • url – URL • text – 압축물 • etc. – date, page length…
  • 13. Search Engine Structure Back-end Structure URL server crawler Crawler crawler 주소해석이 시간 많이 소요 -> 내부에 DNS cache 관리 crawler Internet Repository에 저장후 URL server가 다음주소 할당 Repository
  • 14. Search Engine Structure Back-end Structure docID Sejong.ac.k url r <html> 1 <head> Creating Index <title>세종대학교</title> </body> <h1>학사정보<h1> 세종대학교 Title …. 기타 … Analyzing Web Page structures DocIndex – Web Page의 기본정보 저장 – docID를 key로 사용 DocIndex URLlist URLlist – url을 key로 사용 docID url title etc. url docID – docID를 가져오기 위함
  • 15. Search Engine Lexicon Structure word wordID Back-end Structure 세종 101 Barrels 대학교 102 학사 201 Creating Index 정보 202 Barrels docID wordID#1 Position#1 Size#1 Etc.#1 Word Index Position#2 Size#2 Etc.#2 Lexicon wordID#2 Position#1 Size#1 Etc.#1 – word -> wordID Position#2 Size#2 Etc.#2 … Barrels – docID wordID position size etc. Inverted Index – wordID를 Key로 사용
  • 16. Search Engine Structure Back-end Structure docID Sejong.ac.k docID 3 Creating Index url r url Cyworld.com 1 Link Link Index URLlist URLlist Links Links Sejong.ac.kr 1 1 3 Cyworld.com 3 Anchortext - A information of linked page
  • 17. Search Engine Structure Back-end Structure Creating Index Ranking Index Page Rank - Link Web Page 사이의 link를 일종의 투표처럼 분석 -> 더 많은 link를 받은 문서 = 더 좋은 문서 Anchortext Word - Barrels
  • 18. Search Engine Structure DocIndex Index Structure Lexicon DocIndex – Web Page의 기본정보 저장 – docID를 key로 사용 Lexicon – word -> wordID Barrels Barrels – storages
  • 19. Total Structure User Index Back-end Internet crawler DocIndex Search Server crawler Lexicon crawler Structure URL server word Barrels Barrels Barrels Repository Link URLlist Ranking Links
  • 20. Thanks for your attention