SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Introduction to Augustus OVERVIEW Open Data Group September 17, 2009
Website and Community Augustus is an open source scoring engine for statistical and data mining models based on the Predictive Model Markup Language (PMML). It is written in Python and is freely available. http://augustus.googlecode.com
 
Getting Augustus ,[object Object],[object Object],[object Object],[object Object]
 
Source ,[object Object],[object Object],[object Object]
 
 
 
Documentation and Community ,[object Object],[object Object],[object Object],[object Object]
 
 
Using Augustus ,[object Object],[object Object],[object Object]
Development and Use Cycle ,[object Object],[object Object],[object Object],[object Object],[object Object]
Development and Use Cycle 2. Model schema 1. Data Inputs
Running Augustus 3. Obtain new model with Producer 4. Score with Consumer
Work Flows ,[object Object],[object Object]
Components ,[object Object],[object Object],[object Object],[object Object]
Producers and Consumers ,[object Object],[object Object],[object Object]
Post Processing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Segments ,[object Object],[object Object],[object Object],[object Object]
Result of Scoring
Case Study: Auto ,[object Object],[object Object],[object Object],[object Object],[object Object]
Work Flow Overview
Auto: Weighted Batch Using the Baseline for Training: $ cd WeightedBatch `-- scripts |-- consume.py |-- postprocess.py `-- produce.py http://code.google.com/p/augustus/source/browse/#svn/trunk/examples/auto/WeightedBatch
Input for the Producer The Producer takes the training data set.  In the code, we have declared how we want to test the data import augustus.modellib.baseline.producer.Producer as Producer def makeConfigs(inFile, outFile, inPMML, outPMML): #open data file inf = uni.UniTable().fromfile(inFile) #start the configuration file   test = ET.SubElement(root, "test") test.set("field", "Automaker") test.set("weightField", "Count") test.set("testStatistic", "dDist") test.set("testType", "threshold") test.set("threshold", "0.475")
Input for the Producer Continued # use a discrete distribution model for test baseline = ET.SubElement(test, "baseline") baseline.set("dist", "discrete") baseline.set("file", str(inFile)) baseline.set("type", "UniTable") # create the segmentation declarations for the two fields at this level ''' Taken out for the example, other Use Cases will focus on Segments segmentation = ET.SubElement(test, "segmentation") makeSegment(inf, segmentation, "Color") ''' #output the configuration file tree = ET.ElementTree(root) tree.write(outFile)
Running the Producer( Training) $ cd scripts $ python2.5 produce.py -f wtraining.nab -t20 (0.000 secs)  Beginning timing (0.000 secs)  Creating configuration file (0.001 secs)  Creating input PMML file (0.001 secs)  Starting producer (0.000 secs)  Inputting configurations (0.001 secs)  Inputting model (0.008 secs)  Collecting stats for baseline distribution (0.011 secs)  Events 20.067% processed (0.009 secs)  Events 40.134% processed (0.009 secs)  Events 60.201% processed (0.009 secs)  Events 80.268% processed (0.009 secs)  Events 100.000% processed (0.000 secs)  Making test distributions from statistics (0.002 secs)  Outputting PMML (0.062 secs)  Lifetime of timer
Model generated by the Producer <PMML version=&quot;3.1&quot;> <Header copyright=&quot; &quot; /> < DataDictionary > < DataField  dataType=&quot;string&quot; name=&quot;Automaker&quot; optype=&quot;categorical&quot; /> < DataField  dataType=&quot;string&quot; name=&quot;Color&quot; optype=&quot;categorical&quot; /> < DataField  dataType=&quot;float&quot; name=&quot;Count&quot; optype=&quot;continuous&quot; /> </ DataDictionary > < BaselineModel  functionName=&quot;baseline&quot;> < MiningSchema > < MiningField  name=&quot;Automaker&quot; /> < MiningField  name=&quot;Color&quot; /> < MiningField  name=&quot;Count&quot; /> </ MiningSchema > </ BaselineModel > </PMML>
Model generated by the Producer (Cont) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Producer Output The training step used the code in producer.py to generate a model and get expected results.  Training generated the following files: . |-- consumer |  `-- wtraining.nab.pmml  MODEL WITH EXPECTED VALUES BASED ON THE TRAINING DATA `-- producer |-- wtraining.nab.pmml  BASELINE DATA, DATA DICTIONARY, MINING SCHEMA `-- wtraining.nab.xml  MODEL FILE USED FOR TRAINING
Training XML ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Unitable ,[object Object],[object Object],[object Object],[object Object]
Running the Consumer cd script $ python2.5 consume.py -b wtraining.nab -f wscoring.nab Ready to score . |-- consumer |  |-- wscoring.nab.wtraining.nab.xml |  `-- wtraining.nab.pmml |-- postprocess |  `-- wscoring.nab.wtraining.nab.xml `-- producer |-- wtraining.nab.pmml `-- wtraining.nab.xml This examples generates a report in the post process directory.
Consumer (Scoring) output $ cat consumer/wscoring.nab.wtraining.nab.xml <pmmlDeployment> <inputData> <readOnce /> <batchScoring /> <fromFile name=&quot;../data/wscoring.nab&quot; type=&quot;UniTable&quot; /> </inputData> <inputModel> <fromFile name=&quot;../consumer/wtraining.nab.pmml&quot; /> </inputModel> <output> <report name=&quot;report&quot;> <toFile name=&quot;../postprocess/wscoring.nab.wtraining.nab.xml&quot; /> <outputRow name=&quot;event&quot;> <score name=&quot;score&quot; /> <alert name=&quot;alert&quot; /> <segments name=&quot;segments&quot; /> </outputRow> </report> </output> </pmmlDeployment>
Scoring Report $ cat postprocess/ wscoring.nab.wtraining.nab.xml <report> < event > < score >0.471458430077</ score > < alert >True</ alert > < Segments ></ Segments > </ event > </report>
Unitable ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Key Features of Unitable ,[object Object],[object Object],[object Object],[object Object],[object Object]
Key Features of Unitable (cont) ,[object Object],[object Object],[object Object]
For more information ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Weitere ähnliche Inhalte

Ähnlich wie Augustus Overview Open Source Analytics

Demystifying Amazon Sagemaker (ACD Kochi)
Demystifying Amazon Sagemaker (ACD Kochi)Demystifying Amazon Sagemaker (ACD Kochi)
Demystifying Amazon Sagemaker (ACD Kochi)AWS User Group Pune
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningParis Data Engineers !
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
ACDKOCHI19 - Demystifying amazon sagemaker
ACDKOCHI19 - Demystifying amazon sagemakerACDKOCHI19 - Demystifying amazon sagemaker
ACDKOCHI19 - Demystifying amazon sagemakerAWS User Group Kochi
 
Build, Train & Deploy Your ML Application on Amazon SageMaker
Build, Train & Deploy Your ML Application on Amazon SageMakerBuild, Train & Deploy Your ML Application on Amazon SageMaker
Build, Train & Deploy Your ML Application on Amazon SageMakerAmazon Web Services
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsBigML, Inc
 
Spring batch for large enterprises operations
Spring batch for large enterprises operations Spring batch for large enterprises operations
Spring batch for large enterprises operations Ignasi González
 
RPG Program for Unit Testing RPG
RPG Program for Unit Testing RPG RPG Program for Unit Testing RPG
RPG Program for Unit Testing RPG Greg.Helton
 
Developing Drizzle Replication Plugins
Developing Drizzle Replication PluginsDeveloping Drizzle Replication Plugins
Developing Drizzle Replication PluginsPadraig O'Sullivan
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
Measurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetMeasurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetVasyl Senko
 

Ähnlich wie Augustus Overview Open Source Analytics (20)

Test automation process
Test automation processTest automation process
Test automation process
 
Test automation process _ QTP
Test automation process _ QTPTest automation process _ QTP
Test automation process _ QTP
 
Demystifying Amazon Sagemaker (ACD Kochi)
Demystifying Amazon Sagemaker (ACD Kochi)Demystifying Amazon Sagemaker (ACD Kochi)
Demystifying Amazon Sagemaker (ACD Kochi)
 
QSpiders - Installation and Brief Dose of Load Runner
QSpiders - Installation and Brief Dose of Load RunnerQSpiders - Installation and Brief Dose of Load Runner
QSpiders - Installation and Brief Dose of Load Runner
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learning
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
ACDKOCHI19 - Demystifying amazon sagemaker
ACDKOCHI19 - Demystifying amazon sagemakerACDKOCHI19 - Demystifying amazon sagemaker
ACDKOCHI19 - Demystifying amazon sagemaker
 
Build, Train & Deploy Your ML Application on Amazon SageMaker
Build, Train & Deploy Your ML Application on Amazon SageMakerBuild, Train & Deploy Your ML Application on Amazon SageMaker
Build, Train & Deploy Your ML Application on Amazon SageMaker
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic Workflows
 
Spring batch for large enterprises operations
Spring batch for large enterprises operations Spring batch for large enterprises operations
Spring batch for large enterprises operations
 
About QTP 9.2
About QTP 9.2About QTP 9.2
About QTP 9.2
 
About Qtp_1 92
About Qtp_1 92About Qtp_1 92
About Qtp_1 92
 
About Qtp 92
About Qtp 92About Qtp 92
About Qtp 92
 
RPG Program for Unit Testing RPG
RPG Program for Unit Testing RPG RPG Program for Unit Testing RPG
RPG Program for Unit Testing RPG
 
Developing Drizzle Replication Plugins
Developing Drizzle Replication PluginsDeveloping Drizzle Replication Plugins
Developing Drizzle Replication Plugins
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
Measurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNetMeasurement .Net Performance with BenchmarkDotNet
Measurement .Net Performance with BenchmarkDotNet
 

Kürzlich hochgeladen

Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in managementchhavia330
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒anilsa9823
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageMatteo Carbone
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Tina Ji
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...Any kyc Account
 
Best Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaBest Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaShree Krishna Exports
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxAndy Lambert
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 

Kürzlich hochgeladen (20)

Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in management
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
 
Best Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaBest Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in India
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 

Augustus Overview Open Source Analytics

  • 1. Introduction to Augustus OVERVIEW Open Data Group September 17, 2009
  • 2. Website and Community Augustus is an open source scoring engine for statistical and data mining models based on the Predictive Model Markup Language (PMML). It is written in Python and is freely available. http://augustus.googlecode.com
  • 3.  
  • 4.
  • 5.  
  • 6.
  • 7.  
  • 8.  
  • 9.  
  • 10.
  • 11.  
  • 12.  
  • 13.
  • 14.
  • 15. Development and Use Cycle 2. Model schema 1. Data Inputs
  • 16. Running Augustus 3. Obtain new model with Producer 4. Score with Consumer
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 23.
  • 25. Auto: Weighted Batch Using the Baseline for Training: $ cd WeightedBatch `-- scripts |-- consume.py |-- postprocess.py `-- produce.py http://code.google.com/p/augustus/source/browse/#svn/trunk/examples/auto/WeightedBatch
  • 26. Input for the Producer The Producer takes the training data set. In the code, we have declared how we want to test the data import augustus.modellib.baseline.producer.Producer as Producer def makeConfigs(inFile, outFile, inPMML, outPMML): #open data file inf = uni.UniTable().fromfile(inFile) #start the configuration file test = ET.SubElement(root, &quot;test&quot;) test.set(&quot;field&quot;, &quot;Automaker&quot;) test.set(&quot;weightField&quot;, &quot;Count&quot;) test.set(&quot;testStatistic&quot;, &quot;dDist&quot;) test.set(&quot;testType&quot;, &quot;threshold&quot;) test.set(&quot;threshold&quot;, &quot;0.475&quot;)
  • 27. Input for the Producer Continued # use a discrete distribution model for test baseline = ET.SubElement(test, &quot;baseline&quot;) baseline.set(&quot;dist&quot;, &quot;discrete&quot;) baseline.set(&quot;file&quot;, str(inFile)) baseline.set(&quot;type&quot;, &quot;UniTable&quot;) # create the segmentation declarations for the two fields at this level ''' Taken out for the example, other Use Cases will focus on Segments segmentation = ET.SubElement(test, &quot;segmentation&quot;) makeSegment(inf, segmentation, &quot;Color&quot;) ''' #output the configuration file tree = ET.ElementTree(root) tree.write(outFile)
  • 28. Running the Producer( Training) $ cd scripts $ python2.5 produce.py -f wtraining.nab -t20 (0.000 secs) Beginning timing (0.000 secs) Creating configuration file (0.001 secs) Creating input PMML file (0.001 secs) Starting producer (0.000 secs) Inputting configurations (0.001 secs) Inputting model (0.008 secs) Collecting stats for baseline distribution (0.011 secs) Events 20.067% processed (0.009 secs) Events 40.134% processed (0.009 secs) Events 60.201% processed (0.009 secs) Events 80.268% processed (0.009 secs) Events 100.000% processed (0.000 secs) Making test distributions from statistics (0.002 secs) Outputting PMML (0.062 secs) Lifetime of timer
  • 29. Model generated by the Producer <PMML version=&quot;3.1&quot;> <Header copyright=&quot; &quot; /> < DataDictionary > < DataField dataType=&quot;string&quot; name=&quot;Automaker&quot; optype=&quot;categorical&quot; /> < DataField dataType=&quot;string&quot; name=&quot;Color&quot; optype=&quot;categorical&quot; /> < DataField dataType=&quot;float&quot; name=&quot;Count&quot; optype=&quot;continuous&quot; /> </ DataDictionary > < BaselineModel functionName=&quot;baseline&quot;> < MiningSchema > < MiningField name=&quot;Automaker&quot; /> < MiningField name=&quot;Color&quot; /> < MiningField name=&quot;Count&quot; /> </ MiningSchema > </ BaselineModel > </PMML>
  • 30.
  • 31. Producer Output The training step used the code in producer.py to generate a model and get expected results. Training generated the following files: . |-- consumer | `-- wtraining.nab.pmml MODEL WITH EXPECTED VALUES BASED ON THE TRAINING DATA `-- producer |-- wtraining.nab.pmml BASELINE DATA, DATA DICTIONARY, MINING SCHEMA `-- wtraining.nab.xml MODEL FILE USED FOR TRAINING
  • 32.
  • 33.
  • 34. Running the Consumer cd script $ python2.5 consume.py -b wtraining.nab -f wscoring.nab Ready to score . |-- consumer | |-- wscoring.nab.wtraining.nab.xml | `-- wtraining.nab.pmml |-- postprocess | `-- wscoring.nab.wtraining.nab.xml `-- producer |-- wtraining.nab.pmml `-- wtraining.nab.xml This examples generates a report in the post process directory.
  • 35. Consumer (Scoring) output $ cat consumer/wscoring.nab.wtraining.nab.xml <pmmlDeployment> <inputData> <readOnce /> <batchScoring /> <fromFile name=&quot;../data/wscoring.nab&quot; type=&quot;UniTable&quot; /> </inputData> <inputModel> <fromFile name=&quot;../consumer/wtraining.nab.pmml&quot; /> </inputModel> <output> <report name=&quot;report&quot;> <toFile name=&quot;../postprocess/wscoring.nab.wtraining.nab.xml&quot; /> <outputRow name=&quot;event&quot;> <score name=&quot;score&quot; /> <alert name=&quot;alert&quot; /> <segments name=&quot;segments&quot; /> </outputRow> </report> </output> </pmmlDeployment>
  • 36. Scoring Report $ cat postprocess/ wscoring.nab.wtraining.nab.xml <report> < event > < score >0.471458430077</ score > < alert >True</ alert > < Segments ></ Segments > </ event > </report>
  • 37.
  • 38.
  • 39.
  • 40.