PYTHON IN
AN EVOLVING
ENTERPRISE SYSTEM
EVALUATING INTEGRATION
SOLUTIONS WITH HADOOP
DAVE HIMROD
STEVE KANNAN
ANGELICA PANDO
Building today’s most powerful,
open, and customizable advertising
technology platform.
Ad is served in
<100 milliseconds

[Diagram: auction request for a 300x250 ad slot; Advertiser 1 bid: $2.50, Advertiser 2 bid: $3.25, Advertiser 3 bid: $4.10; AppNexus optimization; winning bid; ad response]
Evolution of AppNexus

FROM 20 to 350 to 430 PEOPLE
FROM 100M to 39B to 45B AD REQUESTS
5000+ SERVERS: MYSQL, HADOOP/HBASE, AEROSPIKE, NETEZZA, VERTICA
38+ TB OF DATA EVERY DAY
99.99% UPTIME
Evolution of AppNexus

ENG OFFICES IN PORTLAND & SF
ENGINEERING HQ IN NYC
Data-Driven Decisioning (D3)

[Diagram: bidders, data pipeline, D3 processing]
Python at AppNexus
Python enables us to scale our team and rapidly
iterate and prototype technologies.
Hadoop at AppNexus

Hadoop enables us to do aggregations for reporting and other data pipeline jobs.

1 PB CLUSTER
862 NODES ACROSS SEVERAL CLUSTERS
40 BILLION LOG RECORDS DAILY
5.6 BILLION LOG RECORDS/HOUR AT PEAK
Data modeling today

[Diagram: big data (TBs/hour): tasks and logs on Hadoop; Σ aggregation; medium data (GBs/hour): Vertica, cache, data services, data-driven decisioning]
To enable the next
generation of data modeling,
we need to leverage our
Hadoop cluster
What are we trying to do
Access the data on Hadoop
Continue to use Python to model
à No consensus on the best solution




So we conducted our own research
to evaluate integration options
The budget problem
We have thousands of bidders buying billions
of ads per hour in real-time auctions.
We need to create a model that can manipulate
how our bidders spend their budgets and
purchase ads.
Data modeling today

[Diagram: big data (TBs/hour): tasks and logs on Hadoop, now with data-driven decisioning alongside; Σ aggregation; medium data (GBs/hour): Vertica, cache, data services, data-driven decisioning]
Test problem:
Budget aggregation
SCENARIO:
Each auction creates a row in a log.

 timestamp, auction_id, object_type, object_id, method, value


We need to aggregate and model to update
bidders.
Method:
Budget aggregation
STEP 1: De-duplicate records where
KEY: object_type, object_id, method, auction_id

STEP 2: Aggregate value where
KEY: object_type, object_id, method
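The two steps can be sketched in plain Python on a handful of in-memory rows. This is a minimal illustration only, not any of the implementations benchmarked in the talk: the sample rows, the name `budget_aggregate`, and the choice to keep the first record seen for a duplicate key are all assumptions.

```python
from collections import defaultdict

# Field order taken from the log schema in the scenario above.
FIELDS = ("timestamp", "auction_id", "object_type", "object_id", "method", "value")


def budget_aggregate(rows):
    """Step 1: de-duplicate on (object_type, object_id, method, auction_id).
    Step 2: sum value per (object_type, object_id, method)."""
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        rec = dict(zip(FIELDS, row))
        dedup_key = (rec["object_type"], rec["object_id"],
                     rec["method"], rec["auction_id"])
        if dedup_key in seen:
            continue  # assumption: the first record for a key wins
        seen.add(dedup_key)
        # The aggregation key is the de-duplication key without auction_id.
        totals[dedup_key[:3]] += float(rec["value"])
    return dict(totals)


# Hypothetical sample data; the second row repeats auction a1.
sample_rows = [
    ("2013-03-06 19:00:01", "a1", "campaign", "42", "spend", "1.50"),
    ("2013-03-06 19:00:02", "a1", "campaign", "42", "spend", "1.50"),
    ("2013-03-06 19:00:03", "a2", "campaign", "42", "spend", "2.25"),
    ("2013-03-06 19:00:04", "a3", "advertiser", "7", "spend", "0.75"),
]
result = budget_aggregate(sample_rows)
```

On a cluster the `seen` set would not fit in memory at 300 GB of logs, which is why the implementations that follow rely on MapReduce sorting and grouping instead.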
HARDWARE
•  300 GB of log data
•  5 nodes running Scientific Linux 6.3 (Carbon)
   •  Intel Xeon CPU @ 2.13 GHz, 4 cores
   •  2 TB disk
•  CDH4
•  45 map, 35 reduce tasks at a time
Research: Potential solutions
1.   Native Java
2.   Streaming ‒ no framework
3.   mrjob
4.   Happy / Jython / PyCascading
5.   Pig + Jython UDF
6.   Pydoop: prohibitive installation
7.   Disco: not Hadoop (we are evaluating Hadoop)
8.   Hadoopy / dumbo: similar to mrjob
9.   Hipy: effectively an ORM for Hive
Research: Criteria
1. Usability
2. Performance
3. Versatility / Flexibility
Research: Native Java

Benchmark for comparison, using new Hadoop Java API

BudgetAgg.java Mapper class




BudgetAgg.java Reducer class
Research: Native Java
USABILITY:
 ›  Not straightforward for analysts to implement, launch, or tweak

PERFORMANCE:
 ›  Fastest implementation.
 ›  Can further enhance by overriding comparators for grouping and sorting
Research: Native Java


VERSATILITY / FLEXIBILITY:
 ›  Ability to customize pretty much everything
 ›  Custom Partitioner, Comparator, Grouping Comparator in our implementation
 ›  Can use complex objects as keys or values
Research: Streaming
Supplies an executable to Hadoop that reads from stdin
and writes to stdout
mapper.py   
                 reducer.py
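The original mapper.py and reducer.py are not reproduced in this transcript. A minimal sketch of what such a pair could look like, combined in one file for brevity, follows. The comma-separated input format, the function names, and the `map`/`reduce` command-line switch are assumptions. The reducer relies on its input arriving sorted by the four key fields, so duplicates are adjacent and de-duplication and aggregation happen in a single pass.

```python
import sys


def map_line(line):
    """Turn one raw log line into a tab-separated streaming record whose
    first four fields are the de-duplication key."""
    # Assumption: raw log lines are comma-separated in the schema's order.
    _ts, auction_id, object_type, object_id, method, value = [
        field.strip() for field in line.split(",")
    ]
    return "\t".join([object_type, object_id, method, auction_id, value])


def reduce_lines(sorted_lines):
    """Yield one total per (object_type, object_id, method) from records
    sorted by the four key fields."""
    current_group, total, last_key = None, 0.0, None
    for line in sorted_lines:
        object_type, object_id, method, auction_id, value = (
            line.rstrip("\n").split("\t")
        )
        key = (object_type, object_id, method, auction_id)
        if key == last_key:
            continue  # adjacent duplicate: same auction already counted
        last_key = key
        group = key[:3]
        if group != current_group:
            if current_group is not None:
                yield "\t".join(current_group + (str(total),))
            current_group, total = group, 0.0
        total += float(value)
    if current_group is not None:
        yield "\t".join(current_group + (str(total),))


if __name__ == "__main__":
    # Hypothetical switch: "map" or "reduce" selects the streaming role.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "map":
        for raw in sys.stdin:
            if raw.strip():
                print(map_line(raw))
    elif role == "reduce":
        for out in reduce_lines(sys.stdin):
            print(out)

# Local simulation of map -> sort -> reduce on hypothetical sample lines.
sample = [
    "2013-03-06 19:00:01,a1,campaign,42,spend,1.50",
    "2013-03-06 19:00:02,a1,campaign,42,spend,1.50",
    "2013-03-06 19:00:03,a2,campaign,42,spend,2.25",
    "2013-03-06 19:00:04,a3,advertiser,7,spend,0.75",
]
output = list(reduce_lines(sorted(map_line(s) for s in sample)))
```

The `sorted` call only stands in for Hadoop's shuffle and sort phase on this small sample.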
Research: Streaming
USABILITY:
 ›  Key/value detection has to be done by the user
 ›  Still, straightforward for relatively simple jobs


  hadoop jar /usr/lib/hadoop-0.23.0-mr1-cdh4b1/contrib/streaming/hadoop-*streaming*.jar \
  -D stream.num.map.output.key.fields=4 \
  -D num.key.fields.for.partition=3 \
  -D mapred.reduce.tasks=35 \
  -file mapper.py \
  -mapper mapper.py \
  -file reducer.py \
  -reducer reducer_nongroup.py \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input /logs/log_budget/v002/2013/03/06/19/ \
  -output bidder_logs/streaming_output
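The two key settings work together: the sort key is the first four fields, but records are partitioned on only the first three, so every auction for a given (object_type, object_id, method) reaches the same reducer. A rough Python sketch of that idea follows. The `partition` function is hypothetical and uses CRC32 purely as a stand-in hash; it is not the hash that KeyFieldBasedPartitioner actually applies.

```python
import zlib


def partition(record, num_key_fields_for_partition=3, num_reducers=35):
    """Pick a reducer from the first N tab-separated fields of a record."""
    fields = record.split("\t")
    partition_key = "\t".join(fields[:num_key_fields_for_partition])
    # Stand-in hash (assumption): any deterministic hash illustrates the point.
    return zlib.crc32(partition_key.encode("utf-8")) % num_reducers


# Two records that differ only in auction_id (the fourth key field).
r1 = "campaign\t42\tspend\ta1\t1.50"
r2 = "campaign\t42\tspend\ta2\t2.25"
same_reducer = partition(r1) == partition(r2)
```

Because `auction_id` is left out of the partition key, `same_reducer` is true for any pair of records that share the first three fields.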
Research: Streaming
PERFORMANCE:
 ›  ~50% slower than Java



VERSATILITY / FLEXIBILITY:
 ›  Inputs in reducer are iterated line-by-line
 ›  Straightforward to get de-duplication and aggregation to work in a single step
Research: mrjob
Open-source Python framework that wraps Hadoop Streaming


USABILITY:
 ›  “Simplified Java”
 ›  Great docs, actively developed

python budget_agg.py -r hadoop --hadoop-bin /usr/bin/hadoop \
--jobconf stream.num.map.output.key.fields=4 \
--jobconf num.key.fields.for.partition=3 \
--jobconf mapred.reduce.tasks=35 \
--partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-o hdfs:///user/apando/budget_logs/mrjob_output \
hdfs:///logs/log_budget/v002/2013/03/06/19/
Research: mrjob
PERFORMANCE:
 ›  Not much slower than Streaming if only using RawValueProtocol
Research: mrjob
PERFORMANCE:
 ›  Involving objects or multiple steps slows it down a lot

VERSATILITY / FLEXIBILITY:
 ›  Can define Input / Internal / Output protocols
Research: Happy / Jython
HAPPY:
 ›  Full access to Java MapReduce API
 ›  Happy project is deprecated
    ›  Depends on Hadoop 0.17


JYTHON:
 ›  Doesn’t work easily out of the box
    ›  Relies on deprecated Jython compiler in Jython 2.2
 ›  Limited to Jython implementation of Python
    ›  NumPy/SciPy and pandas unavailable
Research: PyCascading
Python wrapper around the Cascading framework for data processing workflows.
Uses Jython as a high-level language for defining workflows.
Research: PyCascading
USABILITY:
 ›  Relatively new project
 ›  Cascading API is simple and intuitive
 ›  Job Planner abstracts details of MapReduce


PERFORMANCE:
 ›  Abstraction makes performance tuning challenging
 ›  Does not support Combiner operation
 ›  Dev time was fast, runtime was slow
Research: PyCascading
VERSATILITY / FLEXIBILITY:
 ›  Allows Jython UDFs
 ›  Rich set of built-in functions: GroupBy, Join, Merge
Research: Pig
Provides a high-level language for data analysis
which is compiled into a sequence of MapReduce
operations.

USABILITY:
Research: Pig
USABILITY:
 ›  Powerful debugging and optimization tools (e.g. explain, illustrate)




 ›  Automatically optimizes MapReduce operations:
    ›  Applies Combiner operations where applicable
    ›  Reorders and conflates data flow for efficiency
Research: Pig
PERFORMANCE:
 ›  Pig compiler produces performant code
 ›  Complex operations might require manual optimization
 ›  Budget aggregation required implementing a User Defined Function (UDF) in Jython to eliminate an unnecessary MapReduce step
Research: Pig
VERSATILITY / FLEXIBILITY:
USING PIG + JYTHON UDF
 ›  Pig Latin is expressive and can capture most use cases
 ›  Define custom data operations in Jython, called UDFs
 ›  UDFs can implement custom loaders, partitioners, and other advanced features
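The UDF used in the talk is not shown in this transcript. As a hedged sketch, a Jython UDF for this problem might take the bag of (auction_id, value) tuples that Pig produces for one group and return the de-duplicated sum. The function name `dedup_sum`, the bag layout, and the schema string are assumptions. Under Pig's Jython engine the `outputSchema` decorator is, to our understanding, supplied by Pig; the fallback below exists only so the file also runs as plain Python.

```python
# Fallback decorator for running outside Pig (assumption: Pig's Jython
# engine provides outputSchema itself when the script is registered).
if "outputSchema" not in globals():
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("total:double")
def dedup_sum(bag):
    """Sum value over a bag of (auction_id, value) tuples, counting each
    auction_id only once."""
    seen = set()
    total = 0.0
    for auction_id, value in bag:
        if auction_id in seen:
            continue  # duplicate record for the same auction
        seen.add(auction_id)
        total += float(value)
    return total


# Hypothetical bag for one (object_type, object_id, method) group.
example_total = dedup_sum([("a1", 1.5), ("a1", 1.5), ("a2", 2.25)])
```

In a Pig Latin script, a file like this would typically be registered with a statement such as REGISTER 'budget_udf.py' USING jython AS budget; and called on the grouped bag. The file name and namespace here are illustrative.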
Research: Summary
[Bar chart: Running Time / Lines of Code for Implementations. Bars for Pig, PyCascading, MRJob, Streaming, and Java; series: Lines of Code, Running Time; x-axis: Running Time (minutes), Lines of Code, 0 to 300.]
Research: Recommendations

•  Pig and PyCascading enable complex
   pipelines to be expressed simply
•  Pig is more mature and the most viable
   option for ad-hoc analysis
QUESTIONS
pydata@appnexus.com

Python in an Evolving Enterprise System (PyData SV 2013)