How to build a SQL-based data warehouse for a trillion rows in Python by Ville Tuulos PyData SV 2014

In this talk, we show how and why AdRoll built a custom, high-performance data warehouse in Python that can handle hundreds of billions of data points with sub-minute latency on a small cluster of servers. This feat is made possible by a non-trivial combination of compressed data structures, meta-programming, and just-in-time compilation using Numba, a compiler for numerical Python. To enable smooth interoperability with existing tools, the system provides a standard SQL interface using Multicorn and Foreign Data Wrappers in PostgreSQL.
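To make the combination of compressed data structures and meta-programming more concrete, here is a minimal sketch in plain Python. It is not AdRoll's implementation: the function names `dictionary_encode` and `compile_count_query`, the sample columns, and the choice of dictionary encoding as the compression scheme are all assumptions made for illustration. The idea shown is that string columns are replaced by small integer codes, and that a query is turned into the source code of a specialised function, which is then compiled at run time.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code (hypothetical helper)."""
    codes = {}
    encoded = []
    for value in values:
        # setdefault assigns the next free code the first time a value is seen.
        encoded.append(codes.setdefault(value, len(codes)))
    return codes, encoded


def compile_count_query(column_names, where):
    """Generate and compile a counting function specialised for one query.

    `column_names` must be valid Python identifiers; `where` maps a column
    name to the integer code that column must equal. Both are assumptions
    of this sketch, not part of any real API.
    """
    args = ", ".join(column_names)
    conditions = [f"{name}[i] == {int(code)}" for name, code in sorted(where.items())]
    condition = " and ".join(conditions) or "True"
    source = (
        f"def query({args}, n):\n"
        f"    total = 0\n"
        f"    for i in range(n):\n"
        f"        if {condition}:\n"
        f"            total += 1\n"
        f"    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source into a function
    return namespace["query"]


# Illustrative data, invented for this example.
countries = ["US", "FI", "US", "DE", "US"]
devices = ["mobile", "desktop", "mobile", "mobile", "desktop"]

country_codes, country_col = dictionary_encode(countries)
device_codes, device_col = dictionary_encode(devices)

# Roughly: SELECT count(*) WHERE country = 'US' AND device = 'mobile'
query = compile_count_query(
    ["country", "device"],
    {"country": country_codes["US"], "device": device_codes["mobile"]},
)
result = query(country_col, device_col, len(country_col))
print(result)
```

The generated function contains only a tight loop over integer columns with the filter constants baked in. With NumPy arrays in place of lists, a loop of this shape is the kind of code a JIT compiler such as Numba is designed to speed up, although this sketch deliberately avoids that dependency.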

How to build a SQL-based data warehouse for a trillion rows in Python
By Ville Tuulos
http://tuulos.github.io/pydata-2014/#/
