Spider vs. Spike

                                                               http://www.flickr.com/photos/blueblankut/497571704/sizes/z/in/photostream/




http://www.flickr.com/photos/coreyburger/2481836757/sizes/z/in/photostream/
Dust
ip

     protocol   sitemap   robots.txt
Link
Trie
Query
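
The robots.txt item on this slide can be illustrated with Python's standard library. This is only a minimal sketch: the user agent name `ExampleSpider` and the rules in `ROBOTS_TXT` are made up for illustration, and a real crawler would download the file from each host rather than embed it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption, not from the slides).
# A real crawler would fetch it from the host before crawling that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# True when the rules allow this user agent to fetch the URL.
allowed = parser.can_fetch("ExampleSpider", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleSpider", "http://example.com/private/a.html")
# Seconds to wait between requests, or None if the file does not say.
delay = parser.crawl_delay("ExampleSpider")
```

Checking `can_fetch` before a link enters the download queue keeps disallowed URLs out of the pipeline altogether.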




Cache
……




http://www.flickr.com/photos/regolare/791385521/
Spider……


Map-Reduce
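Crawler Architecture

Components shown in the diagram: Crawler main loop, TaskLoader, Scope.txt, Priority Heap (Sites), links queue, Downloader (Download Worker), pages queue, Extractor Worker, Repository, Linkbase.

TaskLoader / Priority Heap: holds the ordered sites and their links. A site refills itself when it is empty.
Crawler main loop: peeks a site and puts its links on the links queue.
Download Worker: gets a link from the links queue; if a 302 is found, updates the link's HTTP status; puts the downloaded page on the pages queue.
Extractor Worker: takes a page from the pages queue, saves the page to the Repository, then extracts links and saves them (Linkbase).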
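
The flow in the architecture diagram can be sketched in a few lines of Python. This is a simplified, single-threaded illustration, not the presenter's implementation: the workers become steps of one loop, 302 handling and site refill are left out, and `FAKE_WEB`, `LINK_RE`, and `crawl` are hypothetical names introduced here so the example runs without network access.

```python
import heapq
import re
from collections import deque

# Hypothetical in-memory "web" (assumption) standing in for real downloads.
FAKE_WEB = {
    "http://a.example/": '<a href="http://a.example/1">1</a>',
    "http://a.example/1": '<a href="http://b.example/">b</a>',
    "http://b.example/": "no links here",
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seed_sites):
    """seed_sites: list of (priority, url); a lower number is crawled first."""
    site_heap = list(seed_sites)
    heapq.heapify(site_heap)   # priority heap of sites
    links_queue = deque()      # links waiting for the download step
    pages_queue = deque()      # downloaded pages waiting for extraction
    repository = {}            # url -> page body
    linkbase = set()           # every link seen so far

    while site_heap or links_queue or pages_queue:
        # Main loop: take the best site and put its link on the links queue.
        if site_heap and not links_queue:
            _, site = heapq.heappop(site_heap)
            if site not in linkbase:
                linkbase.add(site)
                links_queue.append(site)
        # Download step: get a link, put the downloaded page on the pages queue.
        if links_queue:
            url = links_queue.popleft()
            body = FAKE_WEB.get(url)
            if body is not None:
                pages_queue.append((url, body))
        # Extract step: save the page, then extract new links and save them.
        if pages_queue:
            url, body = pages_queue.popleft()
            repository[url] = body
            for link in LINK_RE.findall(body):
                if link not in linkbase:
                    linkbase.add(link)
                    links_queue.append(link)
    return repository, linkbase


repository, linkbase = crawl([(0, "http://a.example/")])
```

Keeping the two queues between the steps is what lets the download and extract stages run as separate worker pools in a real deployment.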
Dust

            Simhash

       PageRank
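
Simhash is named on this slide without further detail. A common use in crawlers is near-duplicate detection, and the sketch below shows that general technique under stated assumptions: whitespace tokens with equal weight, MD5 as the per-token hash, and a 64-bit fingerprint. The function names are hypothetical and none of these choices come from the slides.

```python
import hashlib

BITS = 64  # fingerprint width (assumption)


def simhash(text: str) -> int:
    """Return a 64-bit simhash over lowercase whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the token hash
        for i in range(BITS):
            # Each token votes +1 for a set bit and -1 for a clear bit.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two pages are then treated as near-duplicates when the Hamming distance between their fingerprints falls below a threshold chosen for the corpus.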
robust   html      css selector   lxml   tidy




   url

           proxy            UA
1-NoSQL
          NOSQL
-NoSQL
               NOSQL

   Heap




FIFO
-NoSQL


HBase

Cassandra
-NoSQL
 Cassandra

              bug

                                       Random
partitioning




        Crawler              Crawler
-NoSQL
CAP

   HBase -> CP    Cassandra -> AP

      Cassandra   C
Crawler   link
2-Google
Incremental processing system:
  Percolator, the system behind Google's Caffeine web index
2-Google
   BigTable

                   timestamp oracle   lightweight
lock

              Notification                trigger

       Observer      Notification

                  Notification   Percolator Worker
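
The Observer / Notification / Worker relationship on this slide can be illustrated with a toy in-memory model. This is a sketch of the general idea only: a write to an observed column leaves a notification, and a worker later runs the observers for it. `ToyTable` and every method name are hypothetical, and nothing here models Bigtable, transactions, locks, or the timestamp oracle.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with per-column observers (hypothetical)."""

    def __init__(self):
        self.cells = {}                     # (row, column) -> value
        self.observers = defaultdict(list)  # column -> list of callbacks
        self.notifications = []             # pending (row, column), in write order

    def observe(self, column, callback):
        """Register callback(table, row, value) for writes to a column."""
        self.observers[column].append(callback)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        # A write to an observed column leaves a notification for a worker.
        if column in self.observers and (row, column) not in self.notifications:
            self.notifications.append((row, column))

    def run_worker(self):
        """Run observers until no notifications remain; return the count handled."""
        handled = 0
        while self.notifications:
            row, column = self.notifications.pop(0)
            for callback in self.observers[column]:
                callback(self, row, self.cells[(row, column)])
            handled += 1
        return handled


seen = []


def extract_links(table, row, html):
    # Writing a derived column may trigger further observers.
    table.write(row, "link_count", html.count("href="))


def record(table, row, value):
    seen.append((row, value))


table = ToyTable()
table.observe("raw_html", extract_links)
table.observe("link_count", record)
table.write("http://a.example/", "raw_html", '<a href="x">x</a>')
handled = table.run_worker()
```

Because each observer only reacts to the rows that changed, a small update touches a small amount of work instead of reprocessing the whole repository.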
2-Google
Map-Reduce




             Locality
2-Google
Trade-off

                    trillion:million
       Map-Reduce

Page   RPC     MR
                    10   MR   RPC
2-Google
Percolator           DBMS       DBMS




             scale
      Percolator




  Percolator           shared-nothing parallel
  databases
Thanks

Crawler Notes (爬虫点滴)
