RELATIONSHIP EXTRACTION FROM UNSTRUCTURED TEXT-BASED ON STANFORD NLP WITH SPARK
Yana Ponomarova
Head of Data Science France - Capgemini
Nicolas Claudon
Head of Big Data Architects France - Capgemini
Taming the Text
• 80% of the world’s information is in the form of text
• Text documents are highly unstructured
• Search engines do not satisfy all information extraction needs
What if you could « structure » text?
Use Cases and Benefits
• Querying the Supply Chain graph
Select all receivers of crude oil from site A
• Alerts for undesirable contract commitments
« The Defence Ministry has decided to impose unlimited penalty on foreign vendors »
• Alerts for desirable opportunities from your financial news feeds
« Macquarie upgraded AUY from Neutral to Outperform rating »
Relation Extraction : Approaches
• ACE : 17 relations (role, contain, location, family)
• UMLS : 134 entity types, 54 relations
• Freebase : nationality, contains, profession, place of birth, etc.
• Approaches (D. Jurafsky : Speech and Language Processing)
1. Search for patterns (our approach)
2. Supervised machine learning (needs a training set)
3. Semi-supervised and unsupervised (needs a smaller training set)
4. Deep Learning (needs huge training sets)
Relation Extraction : Search for Pattern
[Figure: supply chain entity types (REFINERY, PIPELINE, PRODUCT, TERMINAL, LOCATION, OIL FIELD) with unlabeled « ? » relations between them]
• Hearst Patterns : Ontology construction
Pattern : Example
X and other Y : ...temples, treasuries, and other important civic buildings.
X or other Y : Bruises, wounds, broken bones or other injuries...
Y such as X : The bow lute, such as the Bambara ndang...
• Relations between specific entities
– located-in (ORGANIZATION, LOCATION)
– founded (PERSON, ORGANIZATION)
– employs (ORGANIZATION, PERSON)
– cures (DRUG, DISEASE)
– causes (DRUG, DISEASE)
• Supply Chain :
– direction matters!
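The three Hearst patterns quoted on this slide can be sketched with regular expressions. This is a minimal illustration, not the pipeline used in the talk: noun phrases are approximated as one to three plain words (an assumption made here to avoid a parser), only the last X of an enumeration is captured, and the names `HEARST_PATTERNS` and `extract_hyponym_pairs` are hypothetical.

```python
import re

# Assumption: a noun phrase is approximated as 1 to 3 alphabetic words.
NP = r"[A-Za-z]+(?: [A-Za-z]+){0,2}"
# Optional leading determiner, kept out of the captured phrase.
DET = r"(?:(?:the|an|a) )?"

HEARST_PATTERNS = [
    # "X and other Y" / "X or other Y": X is a kind of Y
    re.compile(rf"\b{DET}(?P<x>{NP}),? (?:and|or) other (?P<y>{NP})", re.IGNORECASE),
    # "Y such as X": X is a kind of Y
    re.compile(rf"\b{DET}(?P<y>{NP}),? such as {DET}(?P<x>{NP})", re.IGNORECASE),
]


def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the patterns above."""
    pairs = []
    for pattern in HEARST_PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append((match.group("x"), match.group("y")))
    return pairs
```

Applied to the slide's example « The bow lute, such as the Bambara ndang... », the sketch pairs « Bambara ndang » with « bow lute ». Note that these patterns yield undirected "is-a" knowledge, whereas the supply chain relations need a direction, which is why the talk moves on to dependency-based patterns.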
Stanford CoreNLP
Arzew refinery processes Saharan Crude Oil piped to it by the Haoudh El Hamra-Arzew Oil Pipeline.
Supply Chain Relation Extraction Pipeline
[Pipeline diagram. Streaming : Documents Ingestion, Supply Chain NER annotation, Sentence Simplification, Relation Extraction. Batch : Supply Chain NER Machine Learning (context learning).]
Why Use Spark ?
1. Code reuse between the batch layer & the streaming processing layer
2. Easy to distribute Stanford NLP processing
3. Spark brings fault tolerance
4. Near real time is made easy for data scientists and developers compared to Apache Storm
Supply Chain Relation Extraction Pipeline
Supply Chain NER annotation
Input :
“Arzew refinery processes Saharan Crude Oil piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.”
Output :
– Arzew refinery (REF)
– Saharan Crude Oil (PROD)
– Haoudh El Hamra-Arzew Oil Pipeline (PIPE)
NER for Oil & Gas Supply Chain
• Oil & Gas Sites & Products
– PROD - product
– TERM - terminal
– OFI - Oil field
– REF - refinery
– PIPE - pipeline
– O – other
• Feature Engineering :
– useChunks = true
– useNPGovernor = true
– useLemmas
– maxRight = 6
– useClassFeature = true
– useWordTag
– etc.
• Example of token-level annotation :
Crude PROD
oil PROD
can O
be O
supplied O
from O
the O
Mediterranean LOC
Sea LOC
by O
the PIPE
Janaf PIPE
Crude PIPE
Oil PIPE
Pipeline PIPE
• Resulting entities :
PROD : Crude oil
LOC : Mediterranean Sea
PIPE : the Janaf Crude Oil Pipeline
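Turning the per-token labels above into entities amounts to merging consecutive tokens that carry the same label. The following is a small sketch of that step under the assumption that labels are plain (no B-/I- prefixes, as on the slide); `group_entities` is a hypothetical helper, not part of Stanford NER.

```python
from itertools import groupby

# Token/label pairs copied from the slide's example sentence.
TAGGED = [
    ("Crude", "PROD"), ("oil", "PROD"), ("can", "O"), ("be", "O"),
    ("supplied", "O"), ("from", "O"), ("the", "O"),
    ("Mediterranean", "LOC"), ("Sea", "LOC"), ("by", "O"),
    ("the", "PIPE"), ("Janaf", "PIPE"), ("Crude", "PIPE"),
    ("Oil", "PIPE"), ("Pipeline", "PIPE"),
]


def group_entities(tagged_tokens, outside="O"):
    """Merge runs of consecutive tokens sharing a label into (label, text) pairs."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != outside:
            entities.append((label, " ".join(token for token, _ in run)))
    return entities
```

One limitation of plain labels is that two adjacent entities of the same type are merged into one; a B-/I- scheme would be needed to keep them apart.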
NER for Oil & Gas Supply Chain
• Linear chain Conditional Random Field (CRF) sequence models
– Lafferty, McCallum, and Pereira (2001)
• CRF combines discriminative modeling and sequence modeling (robust to violation of the i.i.d. assumption)
• A state in a CRF can depend on observations from any (even future) state.
• Training process is lengthy
Image : Getoor L. and Taskar B., 2007 : Introduction to Statistical Relational Learning
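For reference, the usual textbook form of a linear-chain CRF makes the two properties above visible. The notation is an assumption of this note (label sequence $y$, observation sequence $x$ of length $T$, feature functions $f_k$ with weights $\lambda_k$) and may differ from the one used in the cited works.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Each feature $f_k$ receives the whole observation sequence $x$, so the label at position $t$ may depend on earlier or later tokens. The normalizer $Z(x)$ sums over all label sequences and must be recomputed for every training sentence at every optimization step, which is one reason training is slow.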
Supply Chain Relation Extraction Pipeline
Input :
“Arzew refinery processes Saharan Crude Oil piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.”
Sentence Simplification :
1) Arzew refinery processes Saharan Crude Oil.
2) Saharan Crude Oil is piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.
Relation Extraction :
1) supplier => none,
receiver => Arzew refinery (REF),
theme => Saharan Crude Oil (PROD)
2) supplier => Haoudh El Hamra-Arzew Oil Pipeline (PIPE),
receiver => Arzew refinery (REF),
theme => Saharan Crude Oil (PROD)
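Once relations are stored as (supplier, receiver, theme) records, the use-case query « select all receivers of crude oil from site A » becomes a simple filter. The sketch below assumes an in-memory list; the `Relation` type and `receivers_of` function are hypothetical names, and the talk's actual storage layer is not shown here.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    supplier: Optional[str]  # None when the sentence names no supplier
    receiver: Optional[str]
    theme: Optional[str]


# The two relations extracted from the example sentence on this slide.
RELATIONS = [
    Relation(None, "Arzew refinery", "Saharan Crude Oil"),
    Relation("Haoudh El Hamra-Arzew Oil Pipeline", "Arzew refinery", "Saharan Crude Oil"),
]


def receivers_of(relations, theme, supplier):
    """Return the sorted, de-duplicated receivers of `theme` coming from `supplier`."""
    return sorted({
        r.receiver
        for r in relations
        if r.receiver is not None and r.theme == theme and r.supplier == supplier
    })
```

Because supplier and receiver are distinct fields, the direction of the flow is preserved, which is the point the deck makes about supply chain relations.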
Relation Extraction : active voice
• VerbNet : fulfilling-13.4.1: supply, provide, serve, resupply
• VerbNet : obtain-13.5.2: receive, obtain, gain, retrieve, collect
• Copula verbs : VerbNet seem-109-1-1: be, seem (with attribute « available »)

Example : Refinery A sends oil to Terminal B
Frame : NP V NP {To} PP.recipient
Our approach : Supplier (subj) V Theme (obj) {To} Recipient (nmod = « to »)

Example : Refinery A presents Terminal B with oil
Frame : NP V NP {With} PP.theme
Our approach : Supplier (subj) V Recipient {With} Theme (obj)

Example : Refinery A delivers oil
Frame : NP V NP
Our approach : Supplier (subj) V Theme (obj)

Example : Terminal B obtained oil
Frame : NP V NP
Our approach : Recipient (subj) V Theme (obj)

Example : Terminal B obtained oil from Refinery A
Frame : NP V NP {From} PP.source
Our approach : Recipient (subj) V Theme (obj) {From} Supplier (nmod = « from »)

Example : Gas is available from Refinery A
Frame : NP V PP {From} NP
Our approach : Theme (subj) V Attribute (available) {From} Supplier (nmod = « from »)
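The frame-to-role mapping on this slide can be sketched as a lookup over the dependents of the main verb. This is an illustration under several assumptions: the clause has already been parsed and reduced to a dictionary of dependency labels (the keys `nsubj`, `obj`, `nmod:to`, `nmod:from` and `attribute` are simplified stand-ins for the parser's labels), the verb sets combine the VerbNet members listed above with the verbs used in the examples, and the {With} frame is left out. `assign_roles` is a hypothetical name.

```python
# Illustrative verb lemmas only; a real system would load full VerbNet classes.
SUPPLY_VERBS = {"supply", "provide", "serve", "resupply", "send", "deliver"}
OBTAIN_VERBS = {"receive", "obtain", "gain", "retrieve", "collect"}
COPULA_VERBS = {"be", "seem"}


def assign_roles(verb_lemma, deps):
    """Map the dependents of an active-voice verb to supply chain roles.

    `deps` maps a simplified dependency label to the dependent's text.
    Returns None when the verb matches none of the supported frames.
    """
    roles = {"supplier": None, "recipient": None, "theme": None}
    if verb_lemma in SUPPLY_VERBS:
        # Supplier (subj) V Theme (obj) {To} Recipient
        roles["supplier"] = deps.get("nsubj")
        roles["theme"] = deps.get("obj")
        roles["recipient"] = deps.get("nmod:to")
    elif verb_lemma in OBTAIN_VERBS:
        # Recipient (subj) V Theme (obj) {From} Supplier
        roles["recipient"] = deps.get("nsubj")
        roles["theme"] = deps.get("obj")
        roles["supplier"] = deps.get("nmod:from")
    elif verb_lemma in COPULA_VERBS and deps.get("attribute") == "available":
        # Theme (subj) V available {From} Supplier
        roles["theme"] = deps.get("nsubj")
        roles["supplier"] = deps.get("nmod:from")
    else:
        return None
    return roles
```

For « Terminal B obtained oil from Refinery A », calling `assign_roles("obtain", {"nsubj": "Terminal B", "obj": "oil", "nmod:from": "Refinery A"})` assigns Refinery A as supplier and Terminal B as recipient, so the same surface position (subject) maps to opposite roles depending on the verb class.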
Sentence Simplification
• Del Corro L. and Rainer Gemulla (WWW 2013) : ClausIE: Clause-Based Open Information Extraction
• Construct a clause for every subject dependency :
– Clause: part of a sentence that expresses some coherent piece of information
– Consists of: Subject (S), Verb (V), and optionally: Indirect object (O), Direct object (O), Complement (C), one or more adverbials (A)
• Replace the relative pronoun (e.g., who or which) of a relative clause by its antecedent (relcl dependency).
• Generate a clause from participial modifiers (ccomp, amod), which indicate reduced relative clauses.
• Result :
– “Arzew refinery processes Saharan Crude Oil.”
– “Saharan Crude Oil is piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.”
Solution architecture
[Architecture diagram with four stages : Loading, Structuring, Publication, Usage. Components shown : sources (engineer reports, contracts, technical descriptions, stock exchange data, partner sources, streams, REST); Kafka; Spark batch layer (NER ML); Spark Streaming speed layer (NER prediction, sentence simplification, relation extraction); HDFS (raw and serialized data; refined and aggregated data); HBase; historic data; YARN; Elasticsearch; REST API; ODBC, JDBC and HTTP access; Kibana, Tableau, HUE, DWH, datalab, internal applications, partners.]
Reuse
• Streaming object reused
• Annotators are costly to initialize (3-4 seconds), so we initialize them only once per executor
Once processing & Fault tolerance
• Receiver : at-least-once processing
• Direct stream : exactly-once processing
TAKE AWAY
Data Science
NER
• External libraries can be organically integrated in a Spark (Spark Streaming) pipeline
• A distributed Spark CRF implementation would be very welcome
Relation Extraction
• Pattern-based implementation provides a good initial solution
• Next step : use its output as a training set for further bootstrapping / ML
Sentence Simplification
• Very sensitive to changes in the Stanford Parser
All 3 blocks
• Share common Stanford NLP annotators. The dependency parser is the most expensive.
• Nevertheless, an implementation that reuses those annotators across the three blocks was found inefficient.
• An implementation with three consecutive maps is preferred.
Architecture & Tuning
Architecture
• Choose your Kafka connector according to your existing monitoring tools
• The YARN scheduler resource calculator should take both RAM & CPU into account (yarn.scheduler.capacity.resource-calculator)
• Do checkpoints
Memory
• Memory was a chokepoint
• Turn on Kryo serialization and verify that you have registered all your classes (spark.kryo.registrationRequired)
• Use fastutil
• Filter as much as possible
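The Kryo advice above maps onto three Spark configuration keys. The sketch below collects them in a dictionary and renders them as `spark-submit --conf` arguments; the class names under `spark.kryo.classesToRegister` are hypothetical placeholders, and `to_submit_args` is a helper invented for this example.

```python
KRYO_CONF = {
    # Switch from Java serialization to Kryo.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Fail fast when an unregistered class is serialized.
    "spark.kryo.registrationRequired": "true",
    # Hypothetical application classes; replace with your own.
    "spark.kryo.classesToRegister": "com.example.nlp.Relation,com.example.nlp.Entity",
}


def to_submit_args(conf):
    """Render a configuration dictionary as spark-submit command-line arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

With `spark.kryo.registrationRequired` set to `true`, a forgotten registration surfaces as an exception instead of silently falling back to writing full class names, which is how the setting helps verify that every class is registered.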
Tuning
• Tune your partitions according to data volume
• Use coalesce instead of repartition when you are reducing the number of partitions
• Prefer DataFrames to RDDs for batch processing
And, of course, Capgemini is hiring
THANK YOU.
Yana Ponomarova : yana.ponomarova@capgemini.com
@yponomarova
Nicolas Claudon : nicolas.claudon@capgemini.com
@nicolasclaudon

April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 

Relationship Extraction from Unstructured Text Based on Stanford NLP with Spark, by Nicolas Claudon and Yana Ponomarova

  • 1. RELATIONSHIP EXTRACTION FROM UNSTRUCTURED TEXT BASED ON STANFORD NLP WITH SPARK • Yana Ponomarova, Head of Data Science France, Capgemini • Nicolas Claudon, Head of Big Data Architects France, Capgemini
  • 2. Taming the Text • 80% of the world’s information is in the form of text • Text documents are highly unstructured • Search engines do not satisfy all information extraction needs
  • 3. What if you could « structure » text…
  • 4. Use Cases and Benefits • Querying the Supply Chain graph : « Select all receivers of crude oil from site A » • Alerts for undesirable contract commitments : « The Defence Ministry has decided to impose unlimited penalty on foreign vendors » • Alerts for desirable opportunities from your financial news feeds : « Macquarie upgraded AUY from Neutral to Outperform rating »
  • 5. Relation Extraction : Approaches • ACE : 17 relations (role, contain, location, family) • UMLS : 134 entity types, 54 relations • Freebase : nationality, contains, profession, place of birth, etc. • Approaches (D. Jurafsky : Speech and Language Processing) : 1. Search for patterns (our approach) 2. Supervised machine learning (needs a training set) 3. Semi-supervised and unsupervised (needs a smaller training set) 4. Deep Learning (needs huge training sets)
  • 6. Relation Extraction : Search for Pattern
      • Hearst Patterns : Ontology construction
          – X and other Y : « ...temples, treasuries, and other important civic buildings. »
          – X or other Y : « Bruises, wounds, broken bones or other injuries... »
          – Y such as X : « The bow lute, such as the Bambara ndang... »
      • Relations between specific entities
          – located-in (ORGANIZATION, LOCATION)
          – founded (PERSON, ORGANIZATION)
          – employs (ORGANIZATION, PERSON)
          – cures (DRUG, DISEASE)
          – causes (DRUG, DISEASE)
      • Supply Chain : direction matters!
          – Entity types (diagram) : REFINERY, PIPELINE, PRODUCT, TERMINAL, LOCATION, OIL FIELD
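The three Hearst cue phrases on this slide can be tried out with a few lines of plain Python. This is only a minimal sketch: the function name `hearst_pairs` and the comma-splitting heuristic are assumptions made for illustration, and a real system would match over noun-phrase chunks from a parser rather than raw strings.

```python
import re

# Cue phrases from the slide: "X and other Y", "X or other Y", "Y such as X".
CUE_X_THEN_Y = re.compile(r",?\s+(?:and|or) other\s+")
CUE_Y_THEN_X = re.compile(r",?\s+such as\s+")


def _clean(phrase):
    """Strip surrounding spaces and punctuation from a phrase."""
    return phrase.strip(" .,;…")


def _split_list(text):
    """Split an enumeration such as 'a, b and c' into its items."""
    parts = re.split(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", text)
    return [c for c in (_clean(p) for p in parts) if c]


def hearst_pairs(sentence):
    """Return (hyponym, hypernym) pairs for the first cue phrase found."""
    m = CUE_X_THEN_Y.search(sentence)
    if m:
        hypernym = _clean(sentence[m.end():])
        return [(x, hypernym) for x in _split_list(sentence[:m.start()])]
    m = CUE_Y_THEN_X.search(sentence)
    if m:
        hypernym = _clean(sentence[:m.start()])
        return [(x, hypernym) for x in _split_list(sentence[m.end():])]
    return []


for text in (
    "temples, treasuries, and other important civic buildings.",
    "Bruises, wounds, broken bones or other injuries...",
    "The bow lute, such as the Bambara ndang...",
):
    print(hearst_pairs(text))
```

Because the sketch treats everything before or after the cue as the phrase, it only behaves well on short fragments like the slide's examples.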
  • 7. Stanford CoreNLP : « Arzew refinery processes Saharan Crude Oil piped to it by the Haoudh El Hamra-Arzew Oil Pipeline. »
  • 8. Supply Chain Relation Extraction Pipeline (diagram) • Stages : Documents Ingestion, Supply Chain NER annotation, Sentence Simplification, Relation Extraction, Context learning, Supply Chain NER Machine Learning • Layers : Streaming, Batch
  • 9. Why Use Spark ? 1. Code reuse between the batch layer and the streaming processing layer 2. Easy to distribute Stanford NLP processing 3. Spark brings fault tolerance 4. Near-real-time processing is easier for data scientists and developers than with Apache Storm
  • 10. Supply Chain Relation Extraction Pipeline : Supply Chain NER annotation • Input : “Arzew refinery processes Saharan Crude Oil piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.” • Annotated entities : Arzew refinery (REF), Saharan Crude Oil (PROD), Haoudh El Hamra-Arzew Oil Pipeline (PIPE)
  • 11. NER for Oil & Gas Supply Chain (NER Machine Learning)
      • Oil & Gas Sites & Products
          – PROD : product
          – TERM : terminal
          – OFI : oil field
          – REF : refinery
          – PIPE : pipeline
          – O : other
      • Feature Engineering : useChunks = true, useNPGovernor = true, useLemmas, maxRight = 6, useClassFeature = true, useWordTag, etc.
      • Token-level annotation : Crude/PROD oil/PROD can/O be/O supplied/O from/O the/O Mediterranean/LOC Sea/LOC by/O the/PIPE Janaf/PIPE Crude/PIPE Oil/PIPE Pipeline/PIPE
      • Resulting entities : PROD : Crude oil ; LOC : Mediterranean Sea ; PIPE : the Janaf Crude Oil Pipeline
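The step from token-level tags to entity phrases on this slide can be sketched as follows. The function name `group_entities` and the list-of-pairs input format are assumptions for illustration. The sketch merges runs of identical labels, so two adjacent entities of the same type would be fused; a BIO-style tagging scheme would avoid that.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside="O"):
    """Merge consecutive tokens sharing a non-O label into (label, phrase) spans."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != outside:
            entities.append((label, " ".join(token for token, _ in run)))
    return entities


# Token/tag pairs copied from the slide's example sentence.
tagged = [
    ("Crude", "PROD"), ("oil", "PROD"), ("can", "O"), ("be", "O"),
    ("supplied", "O"), ("from", "O"), ("the", "O"),
    ("Mediterranean", "LOC"), ("Sea", "LOC"), ("by", "O"),
    ("the", "PIPE"), ("Janaf", "PIPE"), ("Crude", "PIPE"),
    ("Oil", "PIPE"), ("Pipeline", "PIPE"),
]

for label, phrase in group_entities(tagged):
    print(f"{label}: {phrase}")
```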
  • 12. NER for Oil & Gas Supply Chain (NER Machine Learning) • Linear-chain Conditional Random Field (CRF) sequence models – Lafferty, McCallum, and Pereira (2001) • CRF combines discriminative modeling and sequence modeling (robust to violation of the i.i.d. assumption) • A state in a CRF can depend on observations from any (even future) state • Training process is lengthy • Image : Getoor L. and Taskar B., 2007 : Introduction to Statistical Relational Learning
  • 13. Supply Chain Relation Extraction Pipeline : Sentence Simplification and Relation Extraction
      • Input : “Arzew refinery processes Saharan Crude Oil piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.”
      • Sentence Simplification :
          1) Arzew refinery processes Saharan Crude Oil.
          2) Saharan Crude Oil is piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.
      • Relation Extraction :
          1) supplier => none, receiver => Arzew refinery (REF), theme => Saharan Crude Oil (PROD)
          2) supplier => Haoudh El Hamra-Arzew Oil Pipeline (PIPE), receiver => Arzew refinery (REF), theme => Saharan Crude Oil (PROD)
  • 14. Relation Extraction : active voice
      • VerbNet fulfilling-13.4.1 : supply, provide, serve, resupply
          – Example : Refinery A sends oil to Terminal B
            Frame : NP V NP {To} PP.recipient
            Our approach : Supplier (subj) V Theme (obj) {To} Recipient (nmod = « to »)
          – Example : Refinery A presents Terminal B with oil
            Frame : NP V NP {With} PP.theme
            Our approach : Supplier (subj) V Recipient {With} Theme (obj)
          – Example : Refinery A delivers oil
            Frame : NP V NP
            Our approach : Supplier (subj) V Theme (obj)
      • VerbNet obtain-13.5.2 : receive, obtain, gain, retrieve, collect
          – Example : Terminal B obtained oil
            Frame : NP V NP
            Our approach : Recipient (subj) V Theme (obj)
          – Example : Terminal B obtained oil from Refinery A
            Frame : NP V NP {From} PP.source
            Our approach : Recipient (subj) V Theme (obj) {From} Supplier (nmod = « from »)
      • Copula verbs : VerbNet seem-109-1-1 : be, seem available
          – Example : Gas is available from Refinery A
            Frame : NP V PP {From} NP
            Our approach : Theme (subj) V Attribute (available) {From} Supplier (nmod = « from »)
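The frame-to-role mapping above can be sketched as a small lookup over an already-parsed clause. Everything about the input format is an assumption for illustration: the clause is a hypothetical dictionary holding a verb lemma, `subj`, `obj`, an optional `attribute`, and `nmod` phrases keyed by preposition. The sketch also assumes that in the « with » frame the direct object is the recipient and the « with » phrase is the theme. The output keys follow the supplier / receiver / theme naming of slide 13.

```python
# Verb lists copied from the slide (VerbNet class members named there).
SUPPLY_VERBS = {"supply", "provide", "serve", "resupply"}            # fulfilling-13.4.1
OBTAIN_VERBS = {"receive", "obtain", "gain", "retrieve", "collect"}  # obtain-13.5.2
COPULA_VERBS = {"be", "seem"}                                        # seem-109-1-1


def extract_relation(clause):
    """Map a simplified clause (hypothetical dict format) to supply-chain roles."""
    verb = clause["verb"]
    subj = clause.get("subj")
    obj = clause.get("obj")
    nmod = clause.get("nmod", {})

    if verb in SUPPLY_VERBS:
        if "with" in nmod:
            # "A provides B with oil": object is the recipient, with-phrase the theme.
            return {"supplier": subj, "receiver": obj, "theme": nmod["with"]}
        # "A supplies oil (to B)": object is the theme, to-phrase the recipient.
        return {"supplier": subj, "receiver": nmod.get("to"), "theme": obj}
    if verb in OBTAIN_VERBS:
        # "B obtained oil (from A)": subject is the recipient.
        return {"supplier": nmod.get("from"), "receiver": subj, "theme": obj}
    if verb in COPULA_VERBS and clause.get("attribute") == "available":
        # "Gas is available from A": subject is the theme.
        return {"supplier": nmod.get("from"), "receiver": None, "theme": subj}
    return None


print(extract_relation(
    {"verb": "obtain", "subj": "Terminal B", "obj": "oil",
     "nmod": {"from": "Refinery A"}}
))
```

Keeping direction explicit in the output (supplier versus receiver) is what lets the extracted triples feed a directed supply chain graph.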
  • 15. Sentence Simplification
      • Del Corro L. and Gemulla R. (WWW 2013) : ClausIE: Clause-Based Open Information Extraction
      • Construct a clause for every subject dependency :
          – Clause : part of a sentence that expresses some coherent piece of information
          – Consists of : Subject (S), Verb (V), and optionally Indirect object (O), Direct object (O), Complement (C), one or more adverbials (A)
      • Replace the relative pronoun (e.g., who or which) of a relative clause by its antecedent (relcl dependency).
      • Generate a clause from participial modifiers (ccomp, amod), which indicate reduced relative clauses.
      • Result :
          – “Arzew refinery processes Saharan Crude Oil.”
          – “Saharan Crude Oil is piped to it along the Haoudh El Hamra-Arzew Oil Pipeline.”
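The reduced-relative-clause step can be illustrated on the slide's own sentence. This is a toy sketch, not ClausIE: the function `split_reduced_relative` is hypothetical, the antecedent span and participle index are supplied by hand (in a real pipeline they would come from the dependency parse), and it assumes the participle directly follows a singular antecedent.

```python
def split_reduced_relative(tokens, antecedent, participle_index):
    """Split a sentence at a participial modifier into two simple clauses.

    tokens           -- words of the sentence, without the final period
    antecedent       -- (start, end) slice of the noun phrase being modified
    participle_index -- index of the participle that modifies that noun phrase
    """
    start, end = antecedent
    # Main clause: everything before the participle.
    main_clause = tokens[:participle_index]
    # New clause: antecedent as subject, "is" restores the reduced passive.
    reduced_clause = tokens[start:end] + ["is"] + tokens[participle_index:]
    return [" ".join(main_clause) + ".", " ".join(reduced_clause) + "."]


sentence = ("Arzew refinery processes Saharan Crude Oil piped to it "
            "along the Haoudh El Hamra-Arzew Oil Pipeline")
tokens = sentence.split()

# "Saharan Crude Oil" spans tokens 3..5; "piped" is token 6.
for clause in split_reduced_relative(tokens, antecedent=(3, 6), participle_index=6):
    print(clause)
```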
  • 16. Solution architecture (diagram)
      • Sources : Stream Engineer reports, Stream Stock exchange data, Stream Partners sources, Contracts, Technical description
      • Loading : Kafka, REST
      • Structuring : Batch layer (SPARK : NER ML) ; Speed layer (SPARK Streaming : NER Prediction, Sentence Simplification, Relation Extraction)
      • Storage and resources : HDFS (Raw data, Serialized data ; Refined data, aggregate), HBase, Historic data, YARN
      • Publication : Elasticsearch, API REST, ODBC, JDBC, HTTP API
      • Usage : Kibana, Tableau, HUE, DWH, Datalab, Internal Applications, Partners
  • 17. Streaming : object reuse • Annotators are costly to initialize (3-4 seconds), so we initialize them only once per executor
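The "initialize once, reuse for every record" idea can be sketched without Spark as a lazily created module-level object shared by a per-partition function. `SlowAnnotator` is a hypothetical stand-in for an expensive NLP pipeline, and the list of lists only simulates partitions; in PySpark a function shaped like `annotate_partition` could be passed to `rdd.mapPartitions`, while on the JVM the same effect is commonly obtained with a lazily initialized singleton.

```python
_PIPELINE = None   # one shared instance per worker process
INIT_COUNT = 0     # counts how often the expensive constructor runs


class SlowAnnotator:
    """Hypothetical stand-in for a pipeline that takes seconds to load."""

    def __init__(self):
        global INIT_COUNT
        INIT_COUNT += 1  # a real pipeline would load its models here

    def annotate(self, sentence):
        return sentence.split()


def get_pipeline():
    """Create the annotator on first use, then keep returning the same one."""
    global _PIPELINE
    if _PIPELINE is None:
        _PIPELINE = SlowAnnotator()
    return _PIPELINE


def annotate_partition(sentences):
    """Process one partition, paying the start-up cost at most once per process."""
    pipeline = get_pipeline()
    for sentence in sentences:
        yield pipeline.annotate(sentence)


# Simulated partitions; each would be handled by one task on an executor.
partitions = [["a b", "c d"], ["e f"]]
results = [list(annotate_partition(p)) for p in partitions]
print(results, "initializations:", INIT_COUNT)
```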
  • 18. Once processing & Fault tolerance • Receiver : at-least-once processing • Direct stream : exactly-once processing
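The difference between the two guarantees can be made concrete with a small simulation. It is a sketch, not Spark or Kafka code: `deliver_at_least_once` is a hypothetical generator that replays records after a simulated crash. It shows why replayed records create duplicates in a naive sink, and how keying writes by offset keeps the end result free of duplicates.

```python
def deliver_at_least_once(records, fail_after):
    """Yield (offset, value) pairs, replaying everything after a simulated crash."""
    delivered = list(enumerate(records))
    # First attempt: the job crashes after `fail_after` records, before committing.
    yield from delivered[:fail_after]
    # Restart: nothing was committed, so the stream is replayed from the start.
    yield from delivered


records = ["r0", "r1", "r2"]
naive_sink = []        # appends blindly, so replays show up as duplicates
idempotent_sink = {}   # keyed by offset, so a replay overwrites the same entry

for offset, value in deliver_at_least_once(records, fail_after=2):
    naive_sink.append(value)
    idempotent_sink[offset] = value

print(naive_sink)
print(idempotent_sink)
```

Even with a source that tracks offsets precisely, the output side usually has to be idempotent or transactional for the end-to-end result to be exactly-once.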
  • 19. TAKE AWAY
      • Data Science : NER
          – External libraries can be organically integrated in a Spark (Spark Streaming) pipeline
          – A distributed Spark CRF implementation would be very welcome
      • Data Science : Relation Extraction
          – A pattern-based implementation provides a good initial solution
          – Next step : use its output as a training set for further bootstrapping / ML
      • Data Science : Sentence Simplification
          – Very sensitive to changes in the Stanford Parser
      • Data Science : all 3 blocks
          – They share common Stanford NLP annotators. The Dependency Parser is the most expensive.
          – Nevertheless, an implementation that reuses those annotators across the three blocks was found inefficient.
          – An implementation with three consecutive maps is preferred.
      • Architecture & Tuning : Architecture
          – Choose your Kafka connector according to your existing monitoring tools
          – The YARN scheduler resource calculator should take both RAM and CPU into account (yarn.scheduler.capacity.resource-calculator)
          – Do checkpoints
      • Architecture & Tuning : Memory
          – Memory was a chokepoint
          – Turn on Kryo serialization and verify that you have registered all your classes (spark.kryo.registrationRequired)
          – Use fastutil
          – Filter as much as possible
      • Architecture & Tuning : Tuning
          – Tune your partitions according to data volume
          – Use coalesce instead of repartition if you are reducing the number of partitions
          – Prefer DataFrame to RDD for batch processing
  • 20. And, of course, Capgemini is hiring
  • 21. THANK YOU. Yana Ponomarova : yana.ponomarova@capgemini.com @yponomarova Nicolas Claudon : nicolas.claudon@capgemini.com @nicolasclaudon