© 2019 Deep SEARCH 9 GmbH
Deep SEARCH 9
Distributing AI to the Amazon cloud
IC-SDV 2019, 8-9 April, Nice, France
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
Sources
Surface Web
Deep Web
Databases
Repositories
Scheduled execution
Unattended retrieval/crawling
Prepare semantic search
Automatic publication
Deep SEARCH 9
Information Consumers
Ontology management
SEARCHCORPORA
• Biotech
• CROs
• Digital Therapeutics
• Technology Transfer Offices
• Clinical trials
• Other scopes of information
• Known (trusted) sources
• More complete
• Faster
Search applications for specific scopes of information
Why move to the cloud?
DS9 needs more and more resources…
…because our search engines keep gobbling information like the Cookie Monster gobbles cookies!
Timeline: 2015, 2016, 2017, 2018, 2019
• 30,000 company websites, link depth 3, once every 3 months, ca. 50 GB of data
• 60,000 company websites, link depth 5, every month, ca. 1 TB of data
• 250,000 company websites, link depth 5, twice a month, ?
Therefore we need:
More CPU power
• Content classification
• Semantic tagging
• Machine learning
Faster networks
• High bandwidth requirements
• Network latency problems
Scalability
• Availability and responsiveness for users
• CPU during analysis
• Bandwidth during crawling
The only place to get all of this is the cloud!
More CPU power
EC2 dynamic scaling          Price per hour     Hours    Yearly budget
EC2 r5.4xlarge + 2 TB SSD    1.22 €             8,250    10,065 €

Bare metal hardware          Price per month    Hours    Yearly budget
Bare metal hardware          839.00 €           8,760    10,068 €
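The two yearly budgets above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names are made up for the example, and the prices are simply the ones quoted on the slide.

```python
from decimal import Decimal

# Prices as quoted on the slide (assumed to be in euros, excluding any extras).
EC2_PRICE_PER_HOUR = Decimal("1.22")       # EC2 r5.4xlarge + 2 TB SSD
BARE_METAL_PER_MONTH = Decimal("839.00")   # bare metal server


def ec2_yearly_cost(hours: int) -> Decimal:
    """Cost of running one EC2 instance for the given number of hours."""
    return EC2_PRICE_PER_HOUR * hours


def bare_metal_yearly_cost() -> Decimal:
    """Cost of renting the bare metal server for twelve months."""
    return BARE_METAL_PER_MONTH * 12


print(ec2_yearly_cost(8250))      # 10065.00
print(bare_metal_yearly_cost())   # 10068.00
```

In other words, roughly 8,250 EC2 instance-hours cost about the same as one bare metal server running all 8,760 hours of the year.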
More CPU power
But we need to be able to do the job in about 2 days.
EC2 runtime compared to bare metal server:
Configuration        Budget (year)    Concurrent DS9 nodes    Hours / day    Hours / month    Hours / year
EC2 10 instances     10,065 €         10                      2              69               825
EC2 20 instances     10,065 €         20                      1              34               413
EC2 50 instances     10,065 €         50                      -              14               165
EC2 100 instances    10,065 €         100                     -              7                83
20x as much CPU for the same price!
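The runtime figures on this slide follow from spreading the same pool of about 8,250 instance-hours over more concurrent nodes. A minimal sketch, assuming the slide rounds half up and that "-" stands for less than half an hour per day; `hours_per_node` is a name invented for this example:

```python
def hours_per_node(total_instance_hours: float, nodes: int) -> dict:
    """Split a fixed pool of instance-hours evenly across concurrent nodes."""

    def round_half_up(value: float) -> int:
        # Python's built-in round() rounds halves to even, so round manually.
        return int(value + 0.5)

    per_year = total_instance_hours / nodes
    return {
        "year": round_half_up(per_year),
        "month": round_half_up(per_year / 12),
        "day": round_half_up(per_year / 365),
    }


for nodes in (10, 20, 50, 100):
    print(nodes, hours_per_node(8250, nodes))
```

With 20 nodes this gives about 413 hours per node and year: the same budget as one bare metal server, but 20 machines working in parallel whenever a job runs.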
Next bullet point: Faster networks
Viewers show the global distribution of companies in our SEARCHCORPORA.
Obviously there are many activities in Japan (JPN), India (IND), China (CHN), Korea (KOR), Hong Kong (HKG), Iran (IRN), Pakistan (PAK), Taiwan (TWN), Malaysia (MYS), Bangladesh (BGD), Singapore (SGP), …
Faster networks
Ping time from DS9 server
Note that Tokyo and Seattle are the same distance from our servers (9,300 km), as are Boston and New Delhi (6,000 km), yet network latency is much higher going east or south.
Circles are simply squeezed to compensate for Mercator distortion.
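Distance alone sets only a lower bound on ping time, which is why equal distances can still show very different latencies. The sketch below estimates that bound, assuming signals travel through optical fiber with a refractive index of about 1.47 along a straight line; both are simplifying assumptions, not measurements from DS9.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical value for optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a straight fiber of this length."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return 2 * distance_km / speed_in_fiber * 1000


print(f"{min_round_trip_ms(9300):.0f} ms")   # about 91 ms for 9,300 km
print(f"{min_round_trip_ms(6000):.0f} ms")   # about 59 ms for 6,000 km
```

Since the bound is identical for two cities at the same distance, any extra latency towards the east or south presumably comes from routing detours, additional hops, or congested links rather than from distance.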
Faster networks
But can we make the network connection faster?
Simple calculation
Typical page: 30 kB
Typical webserver: 500 pages
From Tokyo
Transferring 1 page from Tokyo: 1,200 ms
500 pages: 500 x 1,200 ms = 10 minutes
1,000 servers: 6 days 23 hours
From London
Transferring 1 page from London: 82 ms
500 pages: 500 x 82 ms = 41 seconds
1,000 servers: ca. 11.4 hours
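The calculation above can be reproduced as follows. The sketch assumes pages are fetched strictly one after another with no parallelism, which is what the slide's multiplication implies; `crawl_time` is a name chosen for this example.

```python
from datetime import timedelta


def crawl_time(ms_per_page: int, pages_per_server: int = 500,
               servers: int = 1000) -> timedelta:
    """Total time to fetch every page sequentially at a fixed per-page cost."""
    return timedelta(milliseconds=ms_per_page * pages_per_server * servers)


print(crawl_time(1200))   # 6 days, 22:40:00  (per-page time from Tokyo)
print(crawl_time(82))     # 11:23:20          (per-page time from London)
```

The per-page transfer time enters the total linearly, so cutting it from 1,200 ms to 82 ms shortens the whole crawl by the same factor of roughly 15.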
No. But we could distribute DS9!
We can distribute DS9 instances across the world using the Amazon cloud
This map shows the Amazon EC2 data center locations
Challenges
Use standards or develop something proprietary?
Hadoop is what one thinks of when hearing "distributed analytics"…
MapReduce algorithms are good at distributing cut-down analytics tasks across multiple CPUs. This is what we would use at the filter-step level. But MapReduce is not suited to distributing whole filter chains with arbitrary analytics tasks, such as text annotation with ontologies or Deep Web crawling with URLs constructed in real time.
How can we minimize I/O operations?
I/O operations (especially indexing of data) and data transfer are the bottlenecks and could potentially eat up all the benefits of distribution.
1. Data must be read only once from the DS9 backend (no copying).
2. Data must be transferred in compressed chunks (to overcome latency issues).
3. Data must be indexed only once, at the final destination on the DS9 backend.
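The second rule, transferring data in compressed chunks, might look roughly like this. It is a sketch under assumptions and not the DS9 implementation: records are taken to be JSON-serializable dictionaries, the chunk size of 500 is arbitrary, and `compress_chunks` and `decompress_chunk` are names invented here.

```python
import json
import zlib
from typing import Iterable, Iterator


def compress_chunks(records: Iterable[dict], chunk_size: int = 500) -> Iterator[bytes]:
    """Group records into batches and yield each batch as one compressed blob."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield zlib.compress(json.dumps(batch).encode("utf-8"))
            batch = []
    if batch:  # send the final, possibly smaller, batch
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def decompress_chunk(chunk: bytes) -> list:
    """Inverse of one compressed batch, run on the receiving side."""
    return json.loads(zlib.decompress(chunk).decode("utf-8"))


pages = [{"url": f"https://example.com/{i}", "text": "lorem ipsum " * 20}
         for i in range(1200)]
chunks = list(compress_chunks(pages))
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "bytes on the wire")
```

Batching means the latency of a long-distance connection is paid once per chunk instead of once per record, and compression reduces the bytes that have to cross that connection.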
Distributing DS9 instances

DS9 standard node
ds9App
Frontdoor
• Webserver
Firewall
Browserfarm
• DS9
• MySQL
• Elasticsearch
• Blazegraph
• DS9 Farming
• MySQL
• DS9 App
• MySQL

DS9 distributed node (smaller footprint!)
Frontdoor
• Webserver
Firewall
• DS9
• MySQL
• Blazegraph
Two new types of DS9 jobs were implemented:
That's what we always did
• Execute a job from the main DS9 host remotely on some other DS9 host for load distribution
• Execute a job from the main DS9 host on a dynamically allocated cluster of EC2 instances that have DS9 Solutions installed
DS9 EC2 cloud clusters
• Controlled by DS9 Farming
• URLs read from DS9 main host
• Results written back to DS9 main host
• Start 20 nodes in DS9 cluster mode
• Use t3.xlarge node type (4 vCPUs, 96 GB)
• Run all instances at Amazon in Tokyo
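The cluster settings on this slide (20 nodes, t3.xlarge, Tokyo) could be captured in a small request object like the one below. This is a hedged sketch: `ClusterRequest` and the placeholder image ID are hypothetical, and the code only builds a parameter dictionary using the field names of the EC2 RunInstances API (`ImageId`, `InstanceType`, `MinCount`, `MaxCount`) without contacting AWS. How DS9 Farming actually starts instances is not described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRequest:
    """Hypothetical description of one DS9 cluster job's EC2 needs."""
    node_count: int = 20
    instance_type: str = "t3.xlarge"
    region: str = "ap-northeast-1"         # AWS region code for Tokyo
    image_id: str = "ami-EXAMPLE"          # placeholder, not a real image ID

    def run_instances_kwargs(self) -> dict:
        """Parameters in the shape expected by the EC2 RunInstances call."""
        return {
            "ImageId": self.image_id,
            "InstanceType": self.instance_type,
            "MinCount": self.node_count,   # all-or-nothing: request exactly
            "MaxCount": self.node_count,   # node_count instances
        }


request = ClusterRequest()
print(request.region, request.run_instances_kwargs())
```

Setting `MinCount` equal to `MaxCount` asks for the full cluster or nothing, which keeps the URL distribution across nodes predictable.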
Dynamic cloud allocation

DS9 standard installation
ds9App
Frontdoor
• Webserver
Firewall
Browserfarm
DS9 / IDE
• DS9
• MySQL
• Elasticsearch
• Blazegraph
• DS9 Farming
• MySQL
• DS9 App
• MySQL
Accounting

AWS Region Tokyo
20x: deployment takes < 5 minutes
• DS9
• MySQL
• Blazegraph
Each node is a full installation of DS9 Solutions (without Elasticsearch).

Instances are dynamically allocated, deployed and started, jobs are executed, and at the end all instances are terminated.
Finally fully scalable (this satisfies our third need).
Execute Distributed Job
Only move necessary resources to EC2

DS9 Farming
1. export
2. unpack
3. start nodes
4. import job
5. execute job
6. stop nodes

Claim containers
input

EC2 nodes (DS9 Solutions), each with:
• DS9
• MySQL
• Blazegraph

Data flow:
• remote read
• equally distribute URLs among EC2 nodes
• write
• cache
• remote write
…
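"Equally distribute URLs among EC2 nodes" can be done with a simple round-robin split. The slides do not say which scheme DS9 uses, so the following is only one plausible sketch, with `distribute_urls` as an invented name.

```python
from typing import List, Sequence


def distribute_urls(urls: Sequence[str], node_count: int) -> List[List[str]]:
    """Deal URLs out to nodes one at a time so bucket sizes differ by at most one."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(node_count)]
    for index, url in enumerate(urls):
        buckets[index % node_count].append(url)
    return buckets


urls = [f"https://example.com/company/{i}" for i in range(10)]
for node, bucket in enumerate(distribute_urls(urls, 4)):
    print(node, len(bucket))
```

Round-robin balances the number of URLs but not the work behind each one; if some websites are much larger than others, a work queue that nodes pull from would even things out better.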
Managed Intelligence 2018

Sources
• Surface Web
• Deep Web
• Databases
• Repositories

Information Scientists
Search Competence Center
• Information source selection
• Content structuring
• Linking of disparate sources
• Ontology management
• SEARCHCORPUS® management
Expertise of information scientist needed

SEARCHCORPORA
• Start-ups
• Competitors
• Regulatory
• New technology
• …

Unattended automatic execution of jobs
Scheduled execution
Unattended updates
Prepare semantic search
Ontology management
Automatic publication

Information Consumers
Internal Customers
• Known (trusted) sources
• More complete
• Faster
Managed Intelligence 2019
Fully distributed

Surface / Deep Web
Company repositories, e.g. Crunchbase
Crawling

Master SEARCHCORPUS®
• Hundreds of thousands of websites
• Many millions of web pages
• PDF-based publications
• Structured data
• Other sources
Extraction using Lucene query + classification

Qualification SEARCHCORPUS®
Crawling and automatic classification for classes of interest
Classified targets

SEARCHCORPORA
• Biotech
• CROs
• Digital Therapeutics
• Technology Transfer Offices
• Clinical trials
• Other scopes of information
Customize for research target
Quality assurance

Automatic publication:
• target specific focus
Information Consumers
Internal Customers

Information Scientists
Search Competence Center
Expertise of information scientist needed
Unattended automatic execution of jobs
Distributed automatic execution of jobs

IC-SDV 2019: Distributing AI to the Amazon Cloud - Klaus Kater (Deep SEARCH 9, Germany )

  • 1. 1 © 2019 Deep SEARCH 9 GmbH1 Deep SEARCH 9 Distributing AI to the Amazon cloud IC-SDV 2019 08 - 09 April Nice, France Klaus Kater Deep SEARCH 9 GmbH Managing Partner https://deepsearchnine.com
  • 2. 2 © 2019 Deep SEARCH 9 GmbH2 Sources Surface Web Deep Web Databases Repositories Scheduled execution Unattendedretrieval/crawling Prepare semantic search Automatic publication Deep SEARCH 9 Information Consumers Ontology management SEARCHCORPORA • Biotech • CROs • Digital Therapeutics • Technology Transfer Offices • Clinical trials • Other scopes of information • Known (trusted) sources • More complete • Faster Search applications for specific scopes of information
  • 3. 3 © 2019 Deep SEARCH 9 GmbH3 Why moving to the cloud? DS9 needs more and more resources… 2015 2016 2017 2018 2019 • 30.000 company websites • Link depth 3 • Once every 3 months • ca. 50 GB of data • 60.000 company websites • Link depth 5 • Every month • ca. 1 TB of data …because our search engines keep gobbling information like the cookie monster gobbles cookies! • 250.000 company websites • Link depth 5 • Twice a month • ?
  • 4. Therefore we need:
      • More CPU power: content classification, semantic tagging, machine learning
      • Faster networks: high bandwidth requirements, network latency problems
      • Scalability: availability and responsiveness for users, CPU during analysis, bandwidth during crawling
      The only place to get all of this is the cloud!
  • 5. More CPU power
  • 6. More CPU power
      EC2 Dynamic Scaling          price per hour    hours    yearly budget
      EC2 r5.4xlarge + 2 TB SSD    1.22 €            8,250    10,065 €

      Bare metal hardware          price per month   hours    yearly budget
      Bare metal hardware          839.00 €          8,760    10,068 €
  • 7. More CPU power. But we need to be able to do the job in about 2 days.
      EC2 runtime compared to bare metal server:
                           Budget (year)   Concurrent DS9 nodes   Hours / day   Hours / month   Hours / year
      EC2 10 instances     10,065 €        10                     2             69              825
      EC2 20 instances     10,065 €        20                     1             34              413
      EC2 50 instances     10,065 €        50                     -             14              165
      EC2 100 instances    10,065 €        100                    -             7               83
      Same yearly budget in both cases: EC2 r5.4xlarge + 2 TB SSD at 1.22 € per hour for 8,250 hours = 10,065 €; bare metal hardware at 839.00 € per month (8,760 hours) = 10,068 €.
      20x as much CPU for the same price!
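The budget arithmetic on slides 6 and 7 can be reproduced with a few lines of Python. This is only a sketch: the prices and hour counts are the slide's figures, while the function names and the round-half-up convention are assumptions chosen so that the results match the table.

```python
# Figures taken from the slides; everything else is illustrative.
EC2_RATE_EUR_PER_HOUR = 1.22      # EC2 r5.4xlarge + 2 TB SSD
EC2_BUDGET_HOURS = 8250           # instance-hours bought per year
BARE_METAL_EUR_PER_MONTH = 839.00


def round_half_up(value: float) -> int:
    """Round to the nearest integer, with .5 going up (assumed slide convention)."""
    return int(value + 0.5)


def ec2_yearly_budget() -> int:
    """Yearly EC2 cost in euros for the budgeted instance-hours."""
    return round_half_up(EC2_RATE_EUR_PER_HOUR * EC2_BUDGET_HOURS)


def bare_metal_yearly_budget() -> int:
    """Yearly bare metal cost in euros (12 monthly payments)."""
    return round_half_up(BARE_METAL_EUR_PER_MONTH * 12)


def runtime_per_instance(instances: int) -> dict:
    """Split the budgeted instance-hours evenly across concurrent instances."""
    per_year = EC2_BUDGET_HOURS / instances
    return {
        "hours_per_year": round_half_up(per_year),
        "hours_per_month": round_half_up(per_year / 12),
    }


if __name__ == "__main__":
    print("EC2:", ec2_yearly_budget(), "EUR; bare metal:", bare_metal_yearly_budget(), "EUR")
    for n in (10, 20, 50, 100):
        print(n, "instances:", runtime_per_instance(n))
```

The point of the table is that the same number of instance-hours can be spent on one machine running all year or on many machines running briefly, so peak CPU capacity scales with the instance count at a constant budget.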
  • 8. Next bullet point: Faster networks. Viewers show the global distribution of companies in our SEARCHCORPORA. Obviously there are many activities in Japan (JPN), India (IND), China (CHN), Korea (KOR), Hong Kong (HKG), Iran (IRN), Pakistan (PAK), Taiwan (TWN), Malaysia (MYS), Bangladesh (BGD), Singapore (SGP), …
  • 9. Faster networks. Map: ping time from the DS9 server. Note how Tokyo and Seattle are the same distance from our servers (9,300 km each), as are Boston and New Delhi (6,000 km each), but network latency is much higher going east or south. The circles are simply squeezed to compensate for Mercator distortion.
  • 10. Faster networks. But can we make the network connection faster? Simple calculation:
      Typical page: 30 kB. Typical webserver: 500 pages.
      From Tokyo: transferring 1 page: 1,200 ms; 500 pages: 500 x 1,200 ms = 10 minutes; 1,000 servers: 6 days 23 hours
      From London: transferring 1 page: 82 ms; 500 pages: 500 x 82 ms = 41 seconds; 1,000 servers: 11.5 hours
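The calculation on slide 10 multiplies the per-page transfer time by the number of pages and servers. A minimal sketch, assuming pages are fetched strictly one after another with no parallel connections (which is what the slide's arithmetic implies); the function name is made up for illustration:

```python
def crawl_seconds(page_ms: float, pages_per_server: int, servers: int) -> float:
    """Total sequential transfer time in seconds."""
    return page_ms * pages_per_server * servers / 1000.0


if __name__ == "__main__":
    # Per-page times and page counts are the slide's figures.
    for origin, page_ms in (("Tokyo", 1200), ("London", 82)):
        one_server = crawl_seconds(page_ms, 500, 1)
        all_servers = crawl_seconds(page_ms, 500, 1000)
        print(f"{origin}: {one_server:.0f} s per server, "
              f"{all_servers / 3600:.1f} h for 1,000 servers")
```

Run as a script, this gives results close to the slide's rounded figures: about a week for 1,000 Tokyo servers versus roughly half a day for 1,000 London servers.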
  • 11. No. But we could distribute DS9! We can distribute DS9 instances across the world using the Amazon cloud. This map shows the Amazon EC2 data center locations.
  • 12. Distributing DS9. We can distribute DS9 instances across the world using the Amazon cloud. This map shows the Amazon EC2 data center locations.
  • 13. Challenges
      Use standards or develop proprietary? Hadoop is what one thinks of when hearing "distributed analytics". MapReduce algorithms are good at distributing cut-down analytics tasks across multiple CPUs. This is what we would use at the filter-step level. But MapReduce is not suited to distributing whole filter chains with arbitrary analytics tasks, such as text annotation with ontologies or Deep Web crawling with URLs constructed in real time.
      How can we minimize I/O operations? I/O operations (especially indexing of data) and data transfer are the bottlenecks and could potentially eat up all the benefits of distribution.
      1. Data must be read only once from the DS9 backend (no copying)
      2. Data must be transferred in compressed chunks (to overcome latency issues)
      3. Data must be indexed only once at the final destination on the DS9 backend
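Requirement 2 on slide 13 (transfer in compressed chunks) can be illustrated with Python's standard library. This is a sketch only: the slides do not say which serialization or compression DS9 uses, so JSON plus zlib, the chunk size, and the function names are all assumptions.

```python
import json
import zlib
from typing import Iterable, Iterator, List


def pack_chunks(records: Iterable[dict], chunk_size: int = 500) -> Iterator[bytes]:
    """Group records into batches and yield each batch as one compressed blob.

    Sending a few large blobs instead of many small messages means the
    round-trip latency is paid once per chunk rather than once per record.
    """
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield zlib.compress(json.dumps(batch).encode("utf-8"))
            batch = []
    if batch:  # final, possibly smaller chunk
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def unpack_chunk(blob: bytes) -> List[dict]:
    """Inverse of pack_chunks for a single blob."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical crawl results; URLs and fields are placeholders.
    results = [{"url": f"https://example.com/{i}", "text": "lorem ipsum " * 20}
               for i in range(1200)]
    chunks = list(pack_chunks(results))
    raw_size = len(json.dumps(results).encode("utf-8"))
    print(len(chunks), "chunks,", sum(map(len, chunks)), "bytes compressed vs", raw_size, "raw")
```

Requirements 1 and 3 (read once, index once) are about where work happens rather than about encoding, so they are not modeled here.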
  • 14. Distributing DS9 instances
      DS9 standard node: ds9App, Frontdoor (Webserver), Firewall, Browserfarm; components: DS9, MySQL, Elasticsearch, Blazegraph; DS9 Farming, MySQL; DS9 App, MySQL
      DS9 distributed node: Frontdoor (Webserver), Firewall; components: DS9, MySQL, Blazegraph. Smaller footprint!
  • 15. DS9 EC2 cloud clusters. Two new types of DS9 jobs were implemented:
      That's what we always did
      • Execute a job from the main DS9 host remotely on some other DS9 host for load distribution
      • Execute a job from the main DS9 host on a dynamically allocated cluster of EC2 instances that have DS9 Solutions installed
      Controlled by DS9 Farming. URLs are read from the DS9 main host; results are written back to the DS9 main host.
      • Start 20 nodes in DS9 cluster mode
      • Use t3.xlarge node type (4 vCPUs, 96 GB)
      • Run all instances at Amazon in Tokyo
  • 16. Dynamic cloud allocation
      DS9 standard installation (DS9 / IDE): ds9App, Frontdoor (Webserver), Firewall, Browserfarm, Accounting; components: DS9, MySQL, Elasticsearch, Blazegraph; DS9 Farming, MySQL; DS9 App, MySQL
      AWS Region Tokyo: 20x nodes (DS9, MySQL, Blazegraph); deployment takes < 5 minutes
      Instances are dynamically allocated, deployed and started, jobs are executed, and at the end all instances are terminated.
      Each node is a full installation of DS9 Solutions (without Elasticsearch).
      Finally fully scalable (this satisfies our 3rd need).
  • 17. Execute Distributed Job
      1. export
      2. unpack
      3. start nodes
      4. import job
      5. execute job: remote read, equally distribute URLs among EC2 nodes, write cache, remote write
      6. stop nodes
      …
      Diagram labels: DS9 Farming, Claim containers, input. Each EC2 node runs DS9 Solutions (DS9, MySQL, Blazegraph).
      Only move necessary resources to EC2.
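Step 5 on slide 17 mentions distributing URLs equally among the EC2 nodes. The slides do not describe how DS9 does this; one simple way to get an even split is a round-robin partition, sketched below with a hypothetical function name.

```python
from typing import List, Sequence


def distribute_urls(urls: Sequence[str], node_count: int) -> List[List[str]]:
    """Round-robin partition: node i gets urls[i], urls[i + n], urls[i + 2n], ...

    Partition sizes differ by at most one URL.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [list(urls[i::node_count]) for i in range(node_count)]


if __name__ == "__main__":
    # Placeholder URLs; 20 nodes as in the Tokyo cluster example on slide 15.
    urls = [f"https://example.com/company/{i}" for i in range(45)]
    for node_id, part in enumerate(distribute_urls(urls, 20)):
        print(node_id, len(part))
```

An even URL count does not guarantee an even workload, since sites differ in size and response time; a production scheduler might balance by host or by estimated page count instead.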
  • 18. Managed Intelligence 2018
      Sources: Surface Web, Deep Web, databases, repositories
      Information Scientists, Search Competence Center (expertise of information scientist needed): information source selection, content structuring, linking of disparate sources, ontology management, SEARCHCORPUS® management
      Unattended automatic execution of jobs: scheduled execution, unattended updates, prepare semantic search, ontology management, automatic publication
      SEARCHCORPORA: start-ups, competitors, regulatory, new technology, …
      Information Consumers, Internal Customers: known (trusted) sources, more complete, faster
  • 19. Managed Intelligence 2019: fully distributed
      Sources: Surface / Deep Web; company repositories, e.g. Crunchbase
      Crawling and automatic classification for classes of interest; classified targets; Qualification SEARCHCORPUS®; quality assurance
      Master SEARCHCORPUS®: hundreds of thousands of websites, many million web pages, PDF-based publications, structured data, other sources
      Extraction using Lucene query + classification
      SEARCHCORPORA: Biotech, CROs, Digital Therapeutics, Technology Transfer Offices, clinical trials, other scopes of information
      Customize for research target. Automatic publication: target-specific focus
      Information Scientists, Search Competence Center: expertise of information scientist needed
      Unattended automatic execution of jobs; distributed automatic execution of jobs
      Information Consumers, Internal Customers
  • 20. Deep SEARCH 9: Distributing AI to the Amazon cloud. IC-SDV 2019, 08 - 09 April, Nice, France. Klaus Kater, Managing Partner, Deep SEARCH 9 GmbH, https://deepsearchnine.com