LEBONCOIN DE LA DATA
Stéphanie Baltus – Head of Data Engineering – @steph_baltus
Meetup Duchess France @ TheFamily – 01/19/2016
PLAN
■ About leboncoin
■ Data, data everywhere!
■ To infinity and beyond…
ABOUT LEBONCOIN
LEBONCOIN...AND FRIENDS
IN A FEW WORDS
■ A Schibsted Media Group company
■ Since 2006
■ 320+ people
■ Located in Paris, Montceau-les-Mines, Reims
■ 2014 revenue: €150M+
NOT JUST A WEBSITE
NOT JUST A CLASSIFIED ADS COMPANY
■ Classified ads:
■ Professional
■ Personal
■ Premium offer:
■ Highlight products
■ Ad import tools
■ Ad display
DATA, DATA EVERYWHERE
A BIT OF HISTORY
■ Building a team
■ Provide daily batch DWH
■ Website traffic (sort of)
■ Ad activity & validation
■ Sales & Coin usage
■ User information
■ Support
■ Try near-real-time processing
SO, WE DID SOME BI STUFF (2012-2015)
IT LOOKS LIKE THIS
IT WORKS! BUT…
■ A lot of uncovered scope
■ Incremental load only
■ Inability to load historical data: stuck from 2013 to today
■ A business team unable to query the database
■ A lot of "no!" when asked for changes
■ Vertical scalability only
■ No possible sharing policy with the product (website, app, CRM, …)
TO INFINITY AND BEYOND!
THE FUTURE
■ Share data services with the website and apps
■ Build a single source of truth
■ Provide raw data to our analysts
■ Provide real-time data
■ Cover the whole data scope of leboncoin
FUNCTIONAL ARCHITECTURE
DATA ARCHITECTURE: DUMBO STYLE
ONE STACK TO RULE THEM ALL
SOME IMPLEMENTATIONS
■ Centralized data cleaning / streamlining
■ Extended analytics apps
■ Ad and customer indexes
■ Ad import web service
■ Data lake indexing through Bloom filters
■ Anomaly detection
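
The "data lake indexing through Bloom filter" item can be illustrated with a minimal sketch. The slides do not say how the index is actually built, so the following assumes one small Bloom filter per data lake file, keyed by ad ID, so that a lookup can skip files that definitely do not contain a given ID. The class, the file paths, the ad IDs and the filter sizes are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def build_index(files: dict) -> dict:
    """Build one filter per file from the ad IDs it holds (assumed layout)."""
    index = {}
    for path, ad_ids in files.items():
        bf = BloomFilter()
        for ad_id in ad_ids:
            bf.add(ad_id)
        index[path] = bf
    return index


def candidate_files(index: dict, ad_id: str) -> list:
    """Files that may contain ad_id; files not listed definitely do not."""
    return sorted(path for path, bf in index.items() if bf.might_contain(ad_id))


# Hypothetical data lake layout: one entry per daily file.
lake = {
    "ads/2016-01-17.json": ["ad-1", "ad-2"],
    "ads/2016-01-18.json": ["ad-3"],
}
index = build_index(lake)
```

Calling `candidate_files(index, "ad-3")` always includes the file that holds the ad. It may occasionally include another file as a false positive, which only costs an unnecessary read.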
CATCH 'EM ALL!
■ Goal: help the sysadmin team catch bots that crawl our website and apps to steal our ads or people's phone numbers => anomaly detection
■ How:
■ Use HTTP logs (150 GB per day)
■ Build KPIs and vectors
■ Apply a logistic regression to identify suspicious sessions
■ Next steps:
■ Test the K-Means algorithm
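
As a rough illustration of the "KPIs and vectors, then logistic regression" step, here is a minimal sketch. It is not the team's code: the deck indicates Spark for machine learning, whereas this example swaps in a hand-rolled batch gradient descent in plain Python so that it stays self-contained. The two features and all the sample values are invented for the example.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def suspicion_score(w, b, x) -> float:
    """Estimated probability that a session vector belongs to a bot."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical, hand-made session vectors (not real leboncoin data):
# [requests per second, share of requests hitting ad pages]
X = [
    [0.05, 0.30], [0.10, 0.40], [0.08, 0.20], [0.12, 0.35],  # human-like
    [2.00, 0.95], [3.50, 0.99], [1.80, 0.90], [2.70, 0.97],  # bot-like
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(X, y)
human_score = suspicion_score(w, b, [0.07, 0.25])
bot_score = suspicion_score(w, b, [3.00, 0.98])
```

On this toy data the slow, varied session scores below 0.5 and the fast, ad-focused one scores above it. With real logs, the features and the decision threshold would have to be chosen and validated against labelled sessions.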
DIVE INTO DATA SHARING
■ Unified view of the data
■ Home-built data extractor + Spark MDM jobs
■ Build a next-generation BI app
■ Spark ETL + Redshift
■ Share built information with other apps
■ Spark ETL + ES + Kafka
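
The "share built information with other apps" path (ETL, then a stream plus a search service) can be sketched in miniature. This is a toy, in-memory illustration only: `InMemoryTopic` and `InMemorySearchIndex` are hypothetical stand-ins for a Kafka topic and an Elasticsearch index, and the record fields are invented.

```python
from collections import defaultdict


def clean(raw: dict) -> dict:
    """Toy ETL step: normalise the fields of one raw ad record."""
    return {
        "ad_id": str(raw["ad_id"]).strip(),
        "title": raw.get("title", "").strip().lower(),
        "category": raw.get("category", "unknown").strip().lower(),
    }


class InMemoryTopic:
    """Stand-in for a streaming topic: an append-only log read by offset."""

    def __init__(self):
        self.log = []

    def publish(self, message: dict) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the published message

    def read_from(self, offset: int) -> list:
        return self.log[offset:]


class InMemorySearchIndex:
    """Stand-in for a search service: inverted index from word to ad IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc: dict) -> None:
        for word in doc["title"].split():
            self.postings[word].add(doc["ad_id"])

    def search(self, word: str) -> list:
        return sorted(self.postings.get(word.lower(), set()))


def share(raw_records, topic, search_index) -> None:
    """Clean each record once, then fan it out to both consumers."""
    for raw in raw_records:
        doc = clean(raw)
        topic.publish(doc)
        search_index.index(doc)


topic, search_index = InMemoryTopic(), InMemorySearchIndex()
share(
    [
        {"ad_id": 1, "title": "  Road bike ", "category": "Sport"},
        {"ad_id": 2, "title": "Kids bike", "category": "Sport"},
    ],
    topic,
    search_index,
)
```

The stream suits consumers that want every change in order (`topic.read_from(offset)`), while the index answers targeted lookups (`search_index.search("bike")`). Cleaning the record once before the fan-out keeps both views consistent.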
NOW IT LOOKS LIKE THIS
WHAT'S NEXT?
■ Being production ready
■ New app, new services
■ More machine-learning-oriented apps
■ Feeding the website
■ Recruiting
QUESTIONS?

Editor's Notes

  1. From Mexico to Malaysia, from Brazil to Norway – millions of people interact with Schibsted companies every day. We’re meeting our customers’ needs with our ever expanding range of smart products and services. Schibsted is increasingly international, and we’re moving forward. Fast. Through all this diversity, we provide similar solutions to make everyday life for millions of people a little bit easier, a little bit better. In doing this we are committed to always try to innovate and deliver new, even smarter services that will meet the needs of people today and tomorrow around the world.
  2. A bit of history. Beyond the fact that leboncoin is part of a group: leboncoin was born in 2006 from a joint venture between Schibsted and Spir (the group that owns 20 minutes).
  3. leboncoin is mainly known for its website, but …
  4. In the beginning there was the website, then the product teams needed stats. So the Core team computed stats in batch and sent them as text files by email. These batches could run for days and nights and took up a good deal of the team's time… And we all agree that this is not their job.
  5. 1) The leboncoin application landscape (WWW, Mobile, CP, CRM, OTRS). 2) Batch extraction of the required raw data, stored in a working database => no load on the source systems. 3) Raw data has to be cleaned, refined and joined => ETL. 4) The enriched ("hydrated") data is stored in a data warehouse: a database with a dimensional model and columnar storage, which gives strong aggregation performance => analyses. Technology: PSQL, PDI, MonetDB. We used quite a few PostgreSQL features that let us meet needs that are hard to address with relational functions alone: hll and ranges (I can explain why outside the presentation). To give an idea of volume, we store roughly 6 TB of data in the working databases. In terms of infrastructure we were rather well off; personally, I had never had machines like these on my previous assignments. Across the database and ETL servers: about 300 GB of RAM, 10 TB. All this to say that we gave ourselves every chance of meeting the needs of the analysts and product managers.
  6. Despite all this goodwill: 1) The transformed data stays locked in: it feeds analyses but is not made directly available to the product side (in the broad sense). 2) Good tools, but vertical scalability only => complexity in persisting certain information, plus performance complexity. So we started to think bigger and to consider a so-called "big data" architecture.
  7. Building on these observations, we redefined the team's mission, with big data technologies serving as a lever to widen our scope.
  8. Functionally, what does it consist of? 1) Still batch extraction, but rather than a database: extensible cloud file storage holding all the data in raw format => data lake. 2) Likewise, we still clean/refine/join our data => ETL, but one able to distribute its processing over a scalable cluster. 3) Likewise, still a DWH with dimensional modelling and columnar storage, but on a distributed SQL database. => At this stage we have "only" addressed the scalability problem of our BI architecture; the feedback problem remains. 4) A constraint inherent in exchanging information between applications => a "real-time" communication system. Ideally it works in both directions => real-time ingestion plus streaming feedback of enriched data. It must also be scalable and robust. 5) While streaming suits synchronization and alerting needs, it is poorly suited to looking up specific data. => So we expose search services to address those needs. They must be scalable and robust. 6) Finally, we do not want to settle for making recycled data available (however refined); we also want to create new services and products from it. Machine learning. Many fields of application: anomaly detection, fraud prevention, content suggestion, detection of purchase intent, ... Once this architecture is laid out, the concrete implementation remains to be chosen.
  9. When we started thinking about this question, the state of the art looked like this: a Hadoop stack for storage and batch + Storm and Kafka for real-time processing. It met the functional needs but conflicted with some of our design choices. 1) 6 months of technology watch => in the "big data" world things change very quickly => it seemed critical to keep some agility, to be able to switch from one technology to another at low cost. Yet an architecture built on Hadoop introduces tight coupling between its components. 2) Second problem: the Hadoop ecosystem = some twenty Apache projects => heterogeneous tools (Pig, HiveQL, Java, Scala, ...) => hard to rationalize/deploy/maintain => functional redundancy between projects + longevity hard to anticipate => makes architecture choices complex and critical.
  10. => So we tried to keep the yellow elephant as far away as possible. We ended up with the following: S3 => elastic, distributed storage. Redshift => DWH database (=> consistency with the group). Kafka => streaming. ES => search services. Cassandra => persistence on the ML side. Spark => batch and real-time ETL + machine learning (uniform code). In the end we get a modular architecture, built on components that look durable but that we can leave at low cost. E.g. Redshift --> Vertica or S3 --> HDFS (1-2 weeks of work). Implementation started in May. The paint is faaaar from dry, but we have first lessons learned. => Transition to Nico: some concrete applications of this architecture.
  11. Data = not only Blocket data; audience logs and HTTP logs too. Goal => help the sysadmins pin down the wicked competitors who steal our ads. We recently launched a project to learn user behaviour. The sysadmins identify the bulk of it via Elasticsearch, but it is hard for them to identify sessions and their activity over time.
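
The last note mentions how hard it is to identify sessions and their activity over time from raw HTTP logs. A minimal sketch of that sessionization step is shown here, under stated assumptions: events are already parsed into `(client_id, timestamp_seconds, path)` tuples, a session ends after 30 minutes of inactivity, and the KPI names are illustrative. None of these choices come from the talk.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the talk


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (client_id, timestamp, path) events into per-client sessions.

    A new session starts when a client stays silent for more than `gap` seconds.
    """
    by_client = defaultdict(list)
    for client_id, ts, path in events:
        by_client[client_id].append((ts, path))

    sessions = []
    for client_id, hits in by_client.items():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > gap:
                sessions.append((client_id, current))
                current = []
            current.append(hit)
        sessions.append((client_id, current))
    return sessions


def session_kpis(client_id, hits):
    """Turn one session into a small KPI record (hypothetical feature set)."""
    duration = hits[-1][0] - hits[0][0]
    return {
        "client_id": client_id,
        "requests": len(hits),
        "duration_s": duration,
        "distinct_paths": len({path for _, path in hits}),
        # Sessions shorter than one minute are counted as one minute.
        "requests_per_minute": len(hits) / max(duration / 60.0, 1.0),
    }


# Hypothetical parsed log events: (client_id, unix_seconds, path)
events = [
    ("a", 0, "/"), ("a", 60, "/ad/1"), ("a", 4000, "/"),
    ("b", 10, "/ad/1"), ("b", 11, "/ad/2"), ("b", 12, "/ad/3"), ("b", 13, "/ad/4"),
]
sessions = sessionize(events)
kpis = [session_kpis(client_id, hits) for client_id, hits in sessions]
```

Client "a" yields two sessions because of the long pause before its third request, while client "b" yields a single dense session. KPI records like these are the kind of per-session vector a classifier could then score.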