Data Science for Business


What are Data Science and Artificial Intelligence / Machine Learning, how did we arrive at them, and what are they used for in our day-to-day lives? Not a technical presentation, but one for anyone looking to understand what "AI", "Machine Learning" and "Data Science" are all about.


  2. WHAT IS THIS?
     Bob: i can i i everything else . . . . . . . . . . . . . .
     Alice: balls have zero to me to me to me to me to me to me to me to me to
     Bob: you i everything else . . . . . . . . . . . . . .
     Alice: balls have a ball to me to me to me to me to me to me to me
     Bob: i i can i i i everything else . . . . . . . . . . . . . .
     Alice: balls have a ball to me to me to me to me to me to me to me
     Bob: i . . . . . . . . . . . . . . . . . . .
     Alice: balls have zero to me to me to me to me to me to me to me to me to
     Bob: you i i i i i everything else . . . . . . . . . . . . . .
     Alice: balls have 0 to me to me to me to me to me to me to me to me to
  3. WHAT IS DATA SCIENCE? Data science is the extraction of relevant insights from data.
  4. WHAT DO YOU DO IN DATA SCIENCE?
     • Classification (e.g., spam or not spam)
     • Pattern detection and grouping (classification without known classes)
     • Anomaly detection (e.g., fraud detection)
     • Recognition (image, text, audio, video, facial, …)
     • Actionable insights (via dashboards, reports, visualizations, …)
     • Automated processes and decision-making (e.g., credit card approval)
     • Scoring and ranking (e.g., FICO score)
     • Segmentation (e.g., demographic-based marketing)
     • Optimization (e.g., risk management)
  5. WHAT CAN YOU DO WITH DATA SCIENCE?
     • Recommendations
     • Fraud detection
     • Customer sentiment analysis
     • Churn, Next Best Action, Propensity
     • Predictive Shipping
     • Supply Chain Optimization
     • Price Optimization
     • Clickstream analytics
  8. GROWTH OF CONNECTED THINGS: 127 new devices connect to the Internet every second.
  10. A BRIEF HISTORY OF … DATA: 2.5 quintillion bytes (2.5 exabytes) of data per day (2018). 10 TB = printed Library of Congress.
  11. 90% of the world’s data was created in the last 2 years. 1.7 megabytes of new information will be created per second per person by 2020.
  12. A Boeing 787 aircraft could generate 40 TB per hour of flight.
  13. An ADV car will churn out 4,000 GB of data per hour of driving.
  14. In just 10 minutes, 16 players with 6 balls can produce almost 13 million data points! (soccer) “You’re capturing real-time data at every point, on every single food product.” - Walmart, Food Trust blockchain
  16. NASDAQ, NYSE, CBOE: ~1 TB / day each
  18. Self-Driving Car
  21. DEEP FAKES
  22. Deep dreaming
  23. AlphaGo first learned from studying 30 million moves of expert human play. AlphaGo Zero just learned the rules and played.
  24. DATA SCIENCE IN FINANCIAL SERVICES
     • Dataminr analyzes billions of tweets to monitor the entire world, predicting stock movements
     • 56% of hedge funds said they used AI/ML for investing
     • BlackRock (largest asset manager, 6.5 TN AUM) is using AI for investing
     • JPMorgan Chase (largest bank, 2.6 TN assets) is using AI to “deepen customer engagements”
     • Risk management
     • Algorithmic trading
  25. “Machine managed portfolio will outperform a human managed one, in 7 years” – AW
  27. MARKET VOLATILITY PREDICTION - DATA: Financial Conditions; Policy Liquidity; Quantity Liquidity; Domestic Liquidity; Equity Exposures; Bond Exposures; Money Flows; Monetized Savings; Momentum; TedSpread; OIS spread; 10yr, 2yr CMT; Convexity at 5yr; 5 yr inflation + 5 years; Banks' swap spreads; CB Credit Risk Index; Sentiment Index; Dollar Sentiment; Trade weighted $; 2-10yr Yield Curve; BAA-AAA credit spread; Mkt PE & EPS; VIX, S&P 500, FTSE, EuroStoxx, MSCI EM; USD/GBP, EUR
  28. MODEL BUILDING PROCESS
     • Rule modeling: Input enhancements; Classify output; Lag/lead the factors; Remove correlations; Induction for optimization; Transform inputs, outputs; Analyze explanatory power
     • Operational Monitoring: Compute risk & probability; What if scenario modeling; Monitor incoming data
     • Expert Analysis: Analyze rules for purity; Analyze rules for causality; Analyze new requirements
     • Feedback
  29. PREDICTIVE RISK MAP VS. MARKETS… What is happening in global markets? The RBA surprised … cutting its … rate by 0.25 … a level last seen in late 2009. Heightened global risk in May – Jul 2013. Japan starts QE. The US Fed “Taper” talk starts.
  31. FINANCIAL PORTFOLIO CONSTRUCTION: Factors for a high outperformance… Insight: Brazilian Real…for real?
  32. PERSONA OF A DATA SCIENTIST: Business Domain Knowledge & soft skills; Math, Stats, Data Engineering, Programming. Credit: Stephan Kolassa, Data Science Expert, SAP Switzerland AG
  33. WHY BE A DATA SCIENTIST
     • If you like data
     • Data scientists today are akin to the Wall Street “quants” of the 1980s, 1990s, and 2000s
     • Salary $120-160K+
     • “Sexiest Job of the 21st Century” – Harvard Business Review

Editor's notes

  • Summer 2017, Facebook’s AI research lab: researchers set out to make chatbots that could negotiate with people. Their thinking: negotiation and cooperation will be necessary for bots to work more closely with humans. First, they fed the computers dialog from thousands of games between humans to give the system a sense of the language of negotiation. Then they allowed the bots to use trial and error, in the form of a technique called reinforcement learning, which helped Google’s Go bot AlphaGo defeat champion players. When two bots using reinforcement learning played each other, they stopped using recognizable sentences.
  • Data Science is an interdisciplinary field for extracting insights from data.
    AI is the science of making machines perform intelligent tasks the way humans do.
    To do this, machines have to learn from data, and that process is called machine learning.

    Deep learning is a type of ML generally modeled after the human brain, using neural networks. DL is more scalable than other ML, allowing improved learning on larger data.

    Data science allows AIs to find appropriate and meaningful information in huge pools of data faster and more efficiently. Machine learning is the process of learning from data over time.

    Artificial intelligence refers to the simulation of human brain functions by machines. One way to achieve this is by creating an artificial neural network that can mimic human intelligence. The primary human functions that an AI machine performs include logical reasoning, learning and self-correction. Machines are not inherently smart; to make them so, we need a lot of computing power and data to enable them to simulate human thinking.
    Artificial intelligence is classified into two parts, general AI and narrow AI. General AI refers to making machines intelligent in a wide array of activities that involve thinking and reasoning. Narrow AI, on the other hand, involves the use of artificial intelligence for a very specific task. For instance, general AI would mean an algorithm that is capable of playing all kinds of board games, while narrow AI limits the range of machine capabilities to a specific game like chess or Scrabble.

    Machine learning is the ability of a computer system to learn from its environment and improve from experience without explicit programming. Machine learning focuses on enabling algorithms to learn from the data provided, gather insights, and use what they have learned to make predictions on previously unseen data. Machine learning can be performed using multiple approaches. The three basic models of machine learning are supervised, unsupervised and reinforcement learning.
    In supervised learning, labeled data is used to help machines recognize characteristics and apply them to future data. For instance, if you want to classify pictures of cats and dogs, you can feed in a few labeled pictures and the machine will then classify all the remaining pictures for you. In unsupervised learning, on the other hand, we simply supply unlabeled data and let the machine discover the characteristics and group the data itself. Reinforcement learning algorithms interact with the environment by producing actions and then analyzing the resulting errors or rewards. For example, to understand a game of chess, an ML algorithm will not analyze individual moves but will study the game as a whole.
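
As a rough illustration of the supervised case described above, here is a minimal sketch of a nearest-centroid classifier in plain Python. The toy measurements, the labels, and the function names `train_centroids` and `predict` are all hypothetical and chosen only for illustration; real image classification would use far richer features and a proper ML library.

```python
from statistics import mean

# Hypothetical toy data: (weight_kg, height_cm) measurements with known labels.
labeled = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]

def train_centroids(examples):
    """Supervised step: average the features of each labeled class."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], features))

centroids = train_centroids(labeled)
prediction_small = predict(centroids, (4.5, 24.0))   # unlabeled, cat-sized animal
prediction_large = predict(centroids, (25.0, 58.0))  # unlabeled, dog-sized animal
print(prediction_small, prediction_large)
```

The labels are what make this "supervised": the model only learns where each class sits because the training examples say which class they belong to. An unsupervised method would have to find the two groups without being told their names.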

  • Solve real world problems or improve things, using data & AI/ML
  • The Manhattan Population Explorer provides a visual representation of the dynamic population shifts within the borough. In this example it synthesizes a heartbeat of New York
  • 24 BN connected devices in 2018
  • Atoms in universe 10^80
  • 2.5 quintillion bytes of data are created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT). Over the last two years alone, 90 percent of the data in the world was generated.

    Data is growing at a rapid pace. By 2020, the new information generated per second for every human being will amount to approximately 1.7 megabytes.
    By 2020, the accumulated volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes, or 44 trillion GB.
    Originally, data scientists maintained that the volume of data would double every two years, thus reaching the 44 ZB point by 2020 with the IoT.
    The rate at which data is created is increasing exponentially. For instance, 40,000 search queries are performed per second (on Google alone), which makes 3.46 billion searches per day and 1.2 trillion every year.
    Every minute, Facebook users send roughly 31.25 million messages and watch 2.77 million videos.
    The data gathered is no longer text-only. Exponential growth in videos and photos is equally prominent. On YouTube alone, 300 hours of video are uploaded every minute.
    IDC estimates that by 2020, business transactions (including both B2B and B2C) via the internet will reach up to 450 billion per day.
    Globally, the number of smartphone users will grow to 6.1 billion by 2020.
    In just 5 years, the number of smart connected devices in the world will be more than 50 billion, all of which will create data that can be shared, collected and analyzed.
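
A quick way to sanity-check the unit conversions behind figures like these is to work them through in a few lines of Python. This is only a sketch of the arithmetic; the input numbers are the ones quoted in the notes, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Search volume: 40,000 queries per second, scaled to a day and a year.
queries_per_second = 40_000
queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_year = queries_per_day * 365

# Data volume: 44 zettabytes expressed in gigabytes (decimal units).
ZETTABYTE = 10**21
GIGABYTE = 10**9
gb_in_44_zb = 44 * ZETTABYTE // GIGABYTE

print(f"{queries_per_day:,} searches per day")    # billions, not millions
print(f"{queries_per_year:,} searches per year")  # a bit over 1.2 trillion
print(f"{gb_in_44_zb:,} GB in 44 ZB")             # 44 trillion GB
```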
  • A typical human genome contains more than 20,000 genes, with each made up of millions of base pairs. Simply mapping a genome requires a hundred gigabytes of data, and sequencing multiple genomes and tracking gene interactions multiplies that number many times — hundreds of petabytes in some cases. 
  • Physicists use the 17-mile (27 km) LHC tunnel to accelerate particles almost to light speed and smash them together.
    This produces about 30 million collisions per second for 120 billion protons.
    One billion collisions per second generates one petabyte per second.
    To keep all 30 million events per second, we would need about 2,000 petabytes to store a typical 12-hour run.
    For a typical running year of 150 days of uptime, this would mean almost 400 exabytes per year.
    About 99.99% of that 400 EB is thrown away.

    The Large Hadron Collider is the world's largest and most powerful particle collider and the largest machine in the world.
    CERN has released about 300 TB of Large Hadron Collider (LHC) data online, and it is completely free.

    The world’s first electronic stock market, NASDAQ OMX, owns and operates three clearing houses, five central securities depositories, and 26 markets (including the NASDAQ Stock Market) with a combined value that exceeds US$8 trillion. Its trading engine is used by 80 global marketplaces.
    When markets open, the company processes more than 1 million messages per second.
    The Director of Database Structures at NASDAQ OMX says, “Just our US Options and Equity data archive handles billions of transactions per day, stores multiple petabytes of online data, and has tables that contain quintillions of records about business transactions.”
    The Options and Equity archive measures 2 petabytes (PB).
  • US Department of Energy’s Oak Ridge National Laboratory announced the top speeds of its Summit supercomputing machine, which nearly laps the previous record-holder, China’s Sunway TaihuLight. The Summit’s theoretical peak speed is 200 petaflops, or 200,000 teraflops. To put that in human terms, approximately 6.3 billion people would all have to make a calculation at the same time, every second, for an entire year, to match what Summit can do in just one second.

    In 2015, Google and NASA reported that their new 1097-qubit D-Wave quantum computer had solved an optimization problem in a few seconds. That’s 100 million times faster than a regular computer chip. They claimed that a problem their D-Wave 2X machine processed inside one second would take a classical computer 10,000 years to solve.

    Your brain is 10 million times slower than a computer.
    Brain: ~1,000 operations/s
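
The "6.3 billion people for a year" comparison for Summit can be checked with simple arithmetic. The sketch below assumes a 365-day year and one calculation per person per second, as the note describes; the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

# Summit's theoretical peak: 200 petaflops (1 petaflop = 10**15 operations/s).
summit_peak_flops = 200 * 10**15
summit_peak_teraflops = summit_peak_flops // 10**12

# 6.3 billion people, each doing one calculation per second for a year.
people = 6_300_000_000
human_calcs_in_a_year = people * SECONDS_PER_YEAR

# Ratio of a year of human effort to one second of Summit at peak.
ratio = human_calcs_in_a_year / summit_peak_flops

print(f"{summit_peak_teraflops:,} teraflops")
print(f"{human_calcs_in_a_year:,} human calculations in a year")
print(f"ratio to one Summit-second: {ratio:.3f}")
```

The ratio comes out just under 1, so the comparison in the note holds to within about one percent.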
  • Google offers an option to download all of the data it stores about you. I requested mine, and the file is 5.5 GB.
    Facebook offers a similar option to download all your information. Mine was roughly 600 MB.
  • 8x8 pixel photos were fed into a Deep Learning network, which tried to guess what the original face looked like. As you can see, it was fairly close (the correct answer is under "ground truth", which was the real face originally in the photos).
  • https://youtu.be/aKed5FHzDTw?t=43
  • Natural language processing (NLP) deals with building computational algorithms to automatically analyze and represent human language. NLP-based systems have enabled a wide range of applications such as Google’s powerful search engine, and more recently, Amazon’s voice assistant named Alexa. NLP is also useful to teach machines the ability to perform complex natural language related tasks such as machine translation and dialogue generation.
  • Gebru et al. took 50 million Google Street View images and explored what a Deep Learning network can do: "if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%).”

    Harvard scientists used Deep Learning to teach a computer to perform viscoelastic computations, which are the computations used in predictions of earthquakes.
    Deep Learning improved calculation time by 50,000%.
  • The total number of possible games of Go has been estimated at 10^761, compared to 10^120 for chess. Both are very large numbers: the entire universe is estimated to contain "only" about 10^80 atoms.

    The original AlphaGo first learned from studying 30 million moves of expert human play.

  • Fifty-six percent of the survey’s respondents said they used AI or machine learning in their investment processes. Just 20 percent had said the same in a BarclayHedge poll last August.
    Among current users, slightly more than two-thirds said they relied on these quantitative techniques for idea generation, while 58 percent said they used them for portfolio construction. Other applications of AI and machine learning included risk management.
  • Why is liquidity?
  • Trivia question: what factor was very highly correlated with the S&P in the late 80s and 90s? Butter production in Bangladesh.
    Don’t pick on one country: enhance the model by adding another factor to the mix, US cheese production.
    And Bangladeshi sheep.

    We do some intelligent things to eliminate correlations and reduce noise, but don't dwell on this too much here. Take it offline.
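
One common way to eliminate correlations among input factors is to drop any factor that is almost perfectly correlated with one already kept. The sketch below illustrates that general idea only; it is not the presenter's actual method. The factor names, the sample values, the 0.9 threshold, and the helper `drop_correlated` are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def drop_correlated(factors, threshold=0.9):
    """Keep a factor only if |r| with every already-kept factor is below threshold."""
    kept = {}
    for name, series in factors.items():
        if all(abs(pearson(series, other)) < threshold for other in kept.values()):
            kept[name] = series
    return kept

# Hypothetical factor histories (made-up numbers, for illustration only).
factors = {
    "vix": [12.0, 15.0, 14.0, 20.0, 18.0, 25.0],
    "vix_rescaled": [24.0, 30.0, 28.0, 40.0, 36.0, 50.0],  # exact multiple of "vix"
    "ted_spread": [0.30, 0.22, 0.35, 0.25, 0.40, 0.28],
}

kept = drop_correlated(factors)
print(list(kept))
```

Filtering like this removes redundant inputs, but it does not protect against spurious correlations of the butter-production kind: a factor can be uncorrelated with the other inputs and still match the target purely by chance.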
  • Use for macro risk, asset allocation, portfolio construction and management, whether you are a CRO, CIO, CXO, strategist, asset allocator, etc.