Back to the Classroom:
The Road to Business Analytics
Professor Hector Guerrero
Disk Storage
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
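The ladder above is easy to check in code. A minimal sketch in Python, assuming the power-of-1000 convention the slide uses; the names `UNITS`, `bytes_in`, and `humanize_bytes` are hypothetical helpers made up for this illustration.

```python
# Unit names in the order the slide lists them; each step is a factor of 1000.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
         "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte"]

def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the named unit."""
    return 1000 ** UNITS.index(unit)

def humanize_bytes(n: int) -> str:
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(n)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit in the list if the count is enormous.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(bytes_in("Petabyte"))        # 10**15 bytes
print(humanize_bytes(2_500_000))   # 2.5 Megabyte
```

Some software counts in powers of 1024 instead; swapping the factor in both functions would model that convention.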
Data Science / Big Data / Business or Data Analytics
The tasty recipe for Business Analytics--
Business Analytics, 2nd Edition James R. Evans, ©2016 | Pearson
Origins of the 3 Ingredients
• Probability/Statistics …..1700’s
• Early efforts to understand uncertainty (modern Stats 1900)
• Operations Research …..1940’s
• Attempt to bring greater efficiency to use of scarce resources
• Computer Technology Science …..1940’s
• 1st Ph.D. in Computer Science awarded at Purdue in 1966
What is Data Science and a Data Scientist?
• “Data science, also known as data-driven science, is an interdisciplinary
field about scientific methods, processes, and systems to extract knowledge or
insights from data in various forms, either structured or unstructured, similar to
Knowledge Discovery in Databases (KDD).”
• “Data science is a "concept to unify statistics, data analysis and their related
methods" in order to "understand and analyze actual phenomena" with data. It
employs techniques and theories drawn from many fields within the broad
areas of mathematics, statistics, information science, and computer science,
in particular from the subdomains of machine learning, classification, cluster
analysis, data mining, databases, and visualization.”
Modified from… https://en.wikipedia.org/wiki/Data_science
According to Wikipedia--
“When Harvard Business Review called it "The Sexiest Job of the 21st
Century" the term became a buzzword, and is now often applied to
business analytics, or even arbitrary use of data, or used as a sexed-up
term for statistics. While many university programs now offer a data
science degree, there exists no consensus on a definition or curriculum
contents. Because of the current popularity of this term, there are many
"advocacy efforts" surrounding it.”
Modified from … https://en.wikipedia.org/wiki/Data_science
A Process Map of Data Science
https://en.wikipedia.org/wiki/Data_science
A simple timeline of lessons learned and observations
• 1966– off to Univ. Texas as an EE
• 1970– off to what would become Silicon Valley
• 1978/80– off to Univ. Texas-MBA/Univ. Washington-Ph.D.
• 1982– off to Tuck School at Dartmouth
• 1986– off to Notre Dame
• 1990– off to W&M
• 2017– off to retirement (?)
1966– off to Univ. Texas as an EE
• No computers at my high school, or likely at many high schools at that time.
During orientation all Engineering majors were required to learn Fortran
programming and write a complex program in 2.5 days. Went from 4500 to 500
majors!
• Lesson-- It was hard to become an engineer at UT, and one way to cull the herd
is to terrorize students and see who survives
• Take first Operations Research classes—I’m in heaven!
• First Ph.D. in Computer Science offered at Purdue Univ.
1970– off to what would become Silicon Valley
• Lockheed Missiles and Space company– 30k employees
• Play Pong by Atari at Andy Capp’s Tavern in Sunnyvale
• Realize that computers are going to be the most important tool in my professional life, and
that my training in math was equally important
• Attend Engineering Economics program at Stanford and introduced to Decision Analysis—
read about early AI concepts, Neural Networks, Rule-Based Systems, Bayesian analysis,
Logic (fuzzy), Expert Systems, etc.
• All seemed important, but a little distant due to lack of computer power; for the most part
it was conceptual. It was difficult, or impossible, to actually use these concepts
1978/80–off to Univ. Texas MBA/ Univ. Washington Ph.D.
• MBA was King/Queen of all Degrees
• I learned there were firms that would pay for abilities in operations research, but very focused (for
example--my ability to do time series forecast models)
• Ex. Later… can you build a model for efficient distribution of natural gas/purchase futures contracts?
• Learned to do modeling of many types—simulation, optimization, etc.
• Still, the capabilities of these techniques were limited by the processing capabilities of computers!
• My dissertation was typed manually—next year a student colleague used an IBM personal computer.
Ms. Lupe Lopez lost job—she had typed dissertations for 40 years (sad).
1982– off to Tuck School at Dartmouth
Data General One
• one or two 3.5-inch floppy drives - the first
portable computer to incorporate the new
Sony 3.5-inch disks.
• a huge 11-inch display - the largest of any
portable computer - capable of displaying a
full 25 lines of text with 80 characters per
line.
• weighing only 10 pounds, it is significantly
lighter than competing CRT-based portable
systems, like the IBM Portable
• up to eight hours of run time using the
internal rechargeable batteries.
• The MBA is still King/Queen as long as you are
Finance or Marketing Major– especially
Investment Banking. Jim Bradley was a student in
my classes and a real Geek!
• I was NOT a Dartmouth Man!
• I did begin to see a break to more high-tech jobs
and Entrepreneurship that required technology
• I was still using “baby problems”, “Little–Data” in
the classroom
1986– off to Notre Dame
• I began research on Rule-Based Robotics—simple AI
• Excel comes to forefront as “the working man’s/woman’s analytic platform”
• I had Bill Jelen, Mr. Excel on the internet, in class– He convinced me!!
• Apple produces a video predicting the use of computers and smart
assistants
1990– off to W&M
• Deep Blue (IBM) partially defeats Kasparov in Chess
• Watson was not far behind and more sophisticated use of AI
• Technology became omnipresent
• Could do real demos of analyses in classroom
• Students could follow and try themselves
• Statisticians debate whether they should call themselves Data Scientists
• Big Data and Analytics emerges as the way to compete
“Companies questing for killer apps generally focus all their firepower on the one area that promises to create the greatest
competitive advantage. But a new breed of company is upping the stakes. Organizations such as Amazon, Harrah’s,
Capital One, and the Boston Red Sox have dominated their fields by deploying industrial-strength analytics across a wide
variety of activities. In essence, they are transforming their organization.”
Competing on Analytics, Thomas H. Davenport, January 2006
2017– off to retirement (?)
• I develop and teach an online Business Analytics class in our
OMBA– I was skeptical, but it’s a big success
• I teach Intermediate Probability and Statistics to our inaugural
MSBA class– soon to also be an online program
• I teach an online Business Analytics class to our MAcc program
• I develop an online Business Analytics class for undergraduates (UGs)
• I wonder if it was the right time to retire– then I remember IT WAS!
Where are we in this brave new world?
• What’s working and Hot? …..AI!!
• The future of the “customer experience”
• Replacement of humans in work
• Autonomous agents, including vehicles
• What’s the future?......AI!!
• Questions about displacement
• Questions about ethics
• Questions about the effect on human existence
August May
What does a Business Analytics Degree look like?
A brief glossary of terms
http://data-informed.com/glossary-of-big-data-terms/ (modified through some omission)
Some important terms--
Algorithm
• A process or set of rules to be followed in calculations or other problem-solving
operations, especially by a computer.
Analytics
• The discovery, interpretation, and communication of meaningful patterns in data.
Artificial Intelligence
• The theory and development of computer systems able to perform tasks that
normally require human intelligence, such as visual perception, speech
recognition, decision-making, and translation between languages.
Contd.
Data management
According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in
an enterprise:
data governance
data architecture, analysis, and design
database management
data security management
data quality management
reference and master data management
data warehousing and business intelligence management
document, record, and content management
metadata management
contact data management
Data mining
The process of deriving patterns or knowledge from large data sets.
Data science
A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer
programming, data mining, machine learning, and database engineering to solve complex problems.
Data scientist
A practitioner of data science.
Data visualization
A visual abstraction of data designed for the purpose of deriving meaning or communicating
information more effectively.
Data warehouse
A place to store data for the purpose of reporting and analysis.
Database
A digital collection of data and the structure around which the data is organized. The data is typically
entered into and accessed via a database management system (DBMS).
Enterprise resource planning (ERP)
A software system that allows an organization to coordinate and manage all its resources, information,
and business functions.
Exploratory data analysis
An approach to data analysis focused on identifying general patterns in data, including outliers and
features of the data that are not anticipated by the experimenter’s current knowledge or
preconceptions. EDA aims to uncover underlying structure, test assumptions, detect mistakes, and
understand relationships between variables.
Internet of Things (IoT)
The network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to
achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each
thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet
infrastructure.
Machine learning
A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
Machine learning focuses on the development of computer programs that can change when exposed to new data.
Metadata
Any data used to describe other data–for example, a data file’s size or date of creation.
Natural language processing
The ability of a computer program or system to understand human language. Applications of natural language processing
include enabling humans to interact with computers using speech, automated language translation, and deriving meaning
from unstructured data such as text or speech data.
NoSQL
A class of database management system that does not use the relational model. NoSQL is designed to handle large data
volumes that do not follow a fixed schema. It is ideally suited for use with very large data volumes that do not require the
relational model.
A more complete glossary
http://data-informed.com/glossary-of-big-data-terms/ (modified through some omissions)
Analytics and Big Data Glossary
Last updated: 3/16/17
Algorithm
A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Analytics
The discovery, interpretation, and communication of meaningful patterns in data.
Application
Software that is designed to perform a specific task or suite of tasks.
Artificial Intelligence
The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
Behavioral analytics
Using data about people’s behavior to understand intent and predict future actions.
Big data
This term has been defined in many ways, but along similar lines. Doug Laney, then an analyst at the META Group, first defined big data in a 2001 report called “3-D Data Management: Controlling Data Volume, Velocity and Variety.” Volume refers to the sheer size of the datasets. The McKinsey report, “Big Data: The Next Frontier
for Innovation, Competition, and Productivity,” expands on the volume aspect by saying that, “’Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”
Velocity refers to the speed at which the data is acquired and used. Not only are companies and organizations collecting more and more data at a faster rate, they want to derive meaning from that data as soon as possible, often in real time.
Variety refers to the different types of data that are available to collect and analyze in addition to the structured data found in a typical database. Barry Devlin of 9sight Consulting identifies four categories of information that constitute big data:
1. Machine-generated data. This includes RFID data, geolocation data from mobile devices, and data from monitoring devices such as utility meters.
2. Computer log data, such as clickstreams from websites.
3. Textual social media information from sources such as Twitter and Facebook.
4. Multimedia social and other information from Flickr, YouTube, and other similar sites.
Business intelligence (BI)
The general term used for the identification, extraction, and analysis of data.
Classification analysis
Data analysis for the purpose of assigning the data to a particular group or class.
Cloud
A broad term that refers to any Internet-based application or service that is hosted remotely.
Clustering analysis
Data analysis for the purpose of identifying similarities and differences among data sets so that similar data sets can be clustered together.
Computer-generated data
Any data generated by a computer rather than a human–a log file for example.
Correlation analysis
• A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
• Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation
between the demand for a product and its price.
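To make the price and demand example concrete, here is a minimal sketch in Python of the Pearson correlation coefficient, one common measure of linear association. The function name `pearson_r` and the numbers are made up for illustration; they are not data from the talk.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustration data: unit price and units sold.
price = [10, 12, 14, 16, 18]
demand = [100, 90, 80, 70, 60]
print(pearson_r(price, demand))  # -1.0: demand falls in lockstep as price rises
```

A value near -1 or +1 signals a strong linear relationship and a value near 0 a weak one. The sketch does not guard against a constant series, which would make the denominator zero.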
Customer relationship management (CRM)
• Software that helps businesses manage sales and customer service processes.
Dashboard
• A graphical reporting of static or real-time data on a desktop or mobile device. The data represented is typically high-level to give managers a quick report on status or performance.
Data
• A quantitative or qualitative value. Common types of data include sales figures, marketing research results, readings from monitoring equipment, user actions on a website, market growth projections, demographic information, and customer lists.
Data analytics
• The application of software to derive information or meaning from data. The end result might be a report, an indication of status, or an action taken automatically based on the information received.
Data analyst
• A person responsible for the tasks of modeling, preparing, and cleaning data for the purpose of deriving actionable information from it.
Data architecture and design
• How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: the conceptual representation of business entities, the logical representation of
the relationships among those entities, and the physical construction of the system to support the functionality.
Data center
• A physical facility that houses a large number of servers and data storage devices. Data centers might belong to a single organization or sell their services to many organizations.
Data cleansing
• The act of reviewing and revising data to remove duplicate entries, correct misspellings, add missing data, and provide more consistency.
Data collection
• Any process that captures any type of data.
Data integration
• The process of combining data from different sources and presenting it in a single view.
Data integrity
• The measure of trust an organization has in the accuracy, completeness, timeliness, and validity of the data.
Data management
• According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in an enterprise:
• data governance
• data architecture, analysis, and design
• database management
• data security management
• data quality management
• reference and master data management
• data warehousing and business intelligence management
• document, record, and content management
• metadata management
• contact data management
Data marketplace
• A place where people can buy and sell data online.
Data mart
• The access layer of a data warehouse used to provide data to users.
Data migration
• The process of moving data between different storage types or formats, or between different computer systems.
Data mining
• The process of deriving patterns or knowledge from large data sets.
Data model, data modeling
• A data model defines the structure of the data for the purpose of communicating between functional and technical people to show data needed for business processes, or for communicating a plan to develop how data is stored and accessed among application
development team members.
Data science
• A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems.
Data scientist
• A practitioner of data science.
Data security
• The practice of protecting data from destruction or unauthorized access.
Data structure
• A specific way of storing and organizing data.
Data visualization
• A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.
Data warehouse
• A place to store data for the purpose of reporting and analysis.
Database
• A digital collection of data and the structure around which the data is organized. The data is typically entered into and accessed via a database management system (DBMS).
Database administrator (DBA)
• A person, often certified, who is responsible for supporting and maintaining the integrity of the structure and content of a database.
Database management system (DBMS)
• Software that collects and provides access to data in a structured format.
Demographic data
• Data relating to the characteristics of a human population.
Distributed processing
• The execution of a process across multiple computers connected by a computer network.
Document management
• The practice of tracking and storing electronic documents and scanned images of paper documents.
Electronic health records (EHR)
• A digitized health record meant to be usable across different health care settings.
Enterprise resource planning (ERP)
• A software system that allows an organization to coordinate and manage all its resources, information, and business functions.
Exploratory data analysis
• An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions. EDA aims to uncover underlying
structure, test assumptions, detect mistakes, and understand relationships between variables.
External data
• Data that exists outside of a system.
Extract, transform, and load (ETL)
• A process used in data warehousing to prepare data for use in reporting or analytics.
Information management
• The practice of collecting, managing, and distributing information of all types–digital, paper-based, structured, unstructured.
In-memory database
• Any database system that relies on memory for data storage.
In-memory data grid (IMDG)
• The storage of data in memory across multiple servers for the purpose of greater scalability and faster access or analytics.
Internet of Things (IoT)
• The network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each
thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure.
Location analytics
• Location analytics brings mapping and map-driven analytics to enterprise business systems and data warehouses. It allows you to associate geospatial information with datasets.
Location data
• Data that describes a geographic location.
Machine learning
• A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can change when exposed to new data.
Metadata
• Any data used to describe other data–for example, a data file’s size or date of creation.
Multidimensional database
• A type of database that stores data as multidimensional arrays, or “cubes,” as opposed to the row-and-column storage structure of relational databases. This enables data to be analyzed from different angles for complex queries and analytical
processing (OLAP) applications.
Natural language processing
• The ability of a computer program or system to understand human language. Applications of natural language processing include enabling humans to interact with computers using speech, automated language translation, and deriving meaning
from unstructured data such as text or speech data.
NoSQL
• A class of database management system that does not use the relational model. NoSQL is designed to handle large data volumes that do not follow a fixed schema. It is ideally suited for use with very large data volumes that do not require the
relational model.
Online analytical processing (OLAP)
• The process of analyzing multidimensional data using three operations: consolidation (the aggregation of available data), drill-down (the ability for users to see the underlying details), and slice and dice (the ability for users to select subsets and view
them from different perspectives).
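The three OLAP operations can be imitated on a tiny dataset. A minimal sketch in plain Python, assuming made-up sales records; `SALES`, `consolidate`, and `slice_rows` are hypothetical names for this illustration, and a real OLAP engine would do this work inside the database.

```python
from collections import defaultdict

# Made-up sales facts with three dimensions: region, quarter, product.
SALES = [
    {"region": "East", "quarter": "Q1", "product": "A", "amount": 100},
    {"region": "East", "quarter": "Q1", "product": "B", "amount": 150},
    {"region": "East", "quarter": "Q2", "product": "A", "amount": 120},
    {"region": "West", "quarter": "Q1", "product": "A", "amount": 80},
    {"region": "West", "quarter": "Q2", "product": "B", "amount": 200},
]

def consolidate(rows, *dims):
    """Consolidation: total the amounts for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Slice and dice: keep only the rows that match the fixed dimension values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

# Consolidation: roll everything up to one total per region.
by_region = consolidate(SALES, "region")
# Drill-down: look inside the East total at quarter and product detail.
east_detail = consolidate(slice_rows(SALES, region="East"), "quarter", "product")
# Slice: view only the first quarter.
q1_only = slice_rows(SALES, quarter="Q1")

print(by_region)    # {('East',): 370, ('West',): 280}
print(east_detail)  # {('Q1', 'A'): 100, ('Q1', 'B'): 150, ('Q2', 'A'): 120}
print(len(q1_only)) # 3
```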
Open source software
• Software with source code that is made available by the copyright holder free of charge to the general public. This code may be redistributed, and anyone can inspect and change it.
Pattern recognition
• The classification or labeling of an identified pattern in the machine learning process.
Petabyte
• One million gigabytes, or 1,000 terabytes.
Predictive analytics
• Using statistical functions on one or more datasets to predict trends or future events.
Predictive modeling
• The process of developing a model that will most likely predict a trend or outcome.
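One of the simplest predictive models is a straight trend line fitted by least squares. A minimal sketch in Python, assuming made-up quarterly sales figures; `fit_line` is a hypothetical helper, and this is only one of many modeling techniques that fit the definitions above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up quarterly sales, used only to illustrate the mechanics.
quarters = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(quarters, sales)
forecast_q6 = intercept + slope * 6   # extend the fitted trend one quarter ahead
print(intercept, slope, forecast_q6)  # 10.0 2.0 22.0
```

Real data is rarely this tidy, so a fitted line should be checked against held-out observations before its forecasts are trusted.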
Query analysis
• The process of analyzing a search query for the purpose of optimizing it for the best possible result.
R
• An open source software environment used for statistical computing.
Records management
• The process of managing an organization’s records throughout their entire lifecycle, from creation to disposal.
Risk analysis
• The application of statistical methods on one or more datasets to determine the likely risk of a project, action, or decision.
Root-cause analysis
• The process of determining the main cause of an event or problem.
Scalability
• The ability of a system or process to maintain acceptable performance levels as workload or scope increases.
Schema
• The structure that defines the organization of data in a database system.
Search
• The process of locating specific data or content using a search tool.
Search data
• Aggregated data about search terms used over time.
Storage
• Any means of storing data persistently.
Structured data
• Data that is organized by a predetermined structure.
Structured Query Language (SQL)
• A programming language designed specifically to manage and retrieve data from a relational database system.
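A short illustration of SQL managing and retrieving data, sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the customer names are made up for this example.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 50.0)],
)

# Retrieve total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 50.0)]
conn.close()
```

The `CREATE TABLE` and `INSERT` statements are the "manage" half of the definition and the `SELECT` is the "retrieve" half.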
Terabyte
• 1,000 gigabytes.
Text analytics
• The application of statistical, linguistic, and machine learning techniques on text-based sources to derive meaning or insight.
Transactional data
• Data that changes unpredictably. Examples include accounts payable and receivable data, or data about product shipments.
Transparency
• As more data becomes openly available, the idea of proprietary data as a competitive advantage is diminished.
Unstructured data
• Data that has no identifiable structure – for example, the text of email messages.
Weather data
• Real-time weather data is now widely available for organizations to use in a variety of ways. For example, a logistics company can monitor local weather conditions to optimize the transport of goods. A utility company can adjust energy distribution
in real time.
Whole Earth Model
• An integrated data management system that allows geophysicists, engineers, and financial managers in the oil and gas industry to evaluate the potential of oil and gas fields.

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist? HackerEarth
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebJames Hendler
 
Machine Learning in the age of Big Data
Machine Learning in the age of Big DataMachine Learning in the age of Big Data
Machine Learning in the age of Big DataDaniel Sârbe
 
Data Models And Details About Open Data
Data Models And Details About Open DataData Models And Details About Open Data
Data Models And Details About Open DataMichael Bostwick
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science James Hendler
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA DATASCIENCE
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceFerdin Joe John Joseph PhD
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for RealJames Hendler
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI dayMohammed Barakat
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloOCTO Technology
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data Vaibhav Kurkute
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSMSunView Software, Inc.
 

Was ist angesagt? (20)

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist?
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the Web
 
Machine Learning in the age of Big Data
Machine Learning in the age of Big DataMachine Learning in the age of Big Data
Machine Learning in the age of Big Data
 
Data Models And Details About Open Data
Data Models And Details About Open DataData Models And Details About Open Data
Data Models And Details About Open Data
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
 


Hector Guerrero- Road to Business Analytics

  • 1. Back to the Classroom: The Road to Business Analytics Professor Hector Guerrero
  • 2. Disk Storage 1 Bit = Binary Digit 8 Bits = 1 Byte 1000 Bytes = 1 Kilobyte 1000 Kilobytes = 1 Megabyte 1000 Megabytes = 1 Gigabyte 1000 Gigabytes = 1 Terabyte 1000 Terabytes = 1 Petabyte 1000 Petabytes = 1 Exabyte 1000 Exabytes = 1 Zettabyte 1000 Zettabytes = 1 Yottabyte 1000 Yottabytes = 1 Brontobyte 1000 Brontobytes = 1 Geopbyte Data Science / Big Data / Business or Data Analytics
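The storage ladder on slide 2 is a chain of factors of 1000, so converting between units is simple arithmetic. The following is a minimal Python sketch of that arithmetic; the function names `to_bytes` and `humanize` are hypothetical and chosen only for illustration. Note that Brontobyte and Geopbyte are informal names taken from the slide, not standardized prefixes.

```python
# Decimal storage units as listed on slide 2; each step up is a factor of 1000.
# "Brontobyte" and "Geopbyte" are informal names from the slide, not standard prefixes.
UNITS = [
    "Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
    "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte",
]

def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes (hypothetical helper)."""
    return value * 1000 ** UNITS.index(unit)

def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(n_bytes)
    idx = 0
    while value >= 1000 and idx < len(UNITS) - 1:
        value /= 1000
        idx += 1
    return f"{value:g} {UNITS[idx]}"

if __name__ == "__main__":
    print(to_bytes(1, "Terabyte"))
    print(humanize(1500))
```

For example, `to_bytes(1, "Petabyte")` multiplies by 1000 five times, which matches counting five steps up the ladder from Byte.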
  • 3. The tasty recipe for Business Analytics-- Business Analytics, 2nd Edition James R. Evans, ©2016 | Pearson
  • 4. Origins of the 3 Ingredients • Probability/Statistics …..1700’s • Early efforts to understand uncertainty (modern Stats 1900) • Operations Research …..1940’s • Attempt to bring greater efficiency to use of scarce resources • Computer Technology Science …..1940’s • 1st Ph.D. in Computer Science awarded at Purdue in 1966
  • 5. What is Data Science and a Data Scientist? • “Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge Discovery in Databases (KDD).” • “Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.” Modified from… https://en.wikipedia.org/wiki/Data_science
  • 6. According to Wikipedia-- “When Harvard Business Review called it "The Sexiest Job of the 21st Century" the term became a buzzword, and is now often applied to business analytics, or even arbitrary use of data, or used as a sexed-up term for statistics. While many university programs now offer a data science degree, there exists no consensus on a definition or curriculum contents. Because of the current popularity of this term, there are many "advocacy efforts" surrounding it.” Modified from … https://en.wikipedia.org/wiki/Data_science
  • 7. A Process Map of Data Science https://en.wikipedia.org/wiki/Data_science
  • 8. A simple timeline of lessons learned and observations • 1966– off to Univ. Texas as an EE • 1970– off to what would become Silicon Valley • 1978/80– off to Univ. Texas-MBA/Univ. Washington-Ph.D. • 1982– off to Tuck School at Dartmouth • 1986– off to Notre Dame • 1990– off to W&M • 2017– off to retirement (?)
  • 9. 1966– off to Univ. Texas as an EE • No computers at my high school, or likely at many high schools at that time. During orientation all Engineering majors were required to learn Fortran programming and write a complex program in 2.5 days. Went from 4500 to 500 majors! • Lesson-- It was hard to become an engineer at UT, and one way to cull the herd is to terrorize students and see who survives • Take first Operations Research classes– I’m in heaven! • First Ph.D. in Computer Science offered at Purdue Univ.
  • 10. 1970– off to what would become Silicon Valley • Lockheed Missiles and Space company– 30k employees • Play Pong by Atari at Andy Capp’s Tavern in Sunnyvale • Realize that computers are going to be the most important tool in my professional life, and that my training in math was equally important • Attend Engineering Economics program at Stanford and introduced to Decision Analysis– read about early AI concepts, Neural Networks, Rule-Based Systems, Bayesian analysis, Logic (fuzzy), Expert Systems, etc. • All seemed important, but a little distant due to lack of computer power– for the most part it was conceptual. It was impossible, or at least difficult, to actually use these concepts
  • 11. 1978/80–off to Univ. Texas MBA/ Univ. Washington Ph.D. • MBA was King/Queen of all Degrees • I learned there were firms that would pay for abilities in operations research, but very focused (for example--my ability to do time series forecast models) • Ex. Later… can you build a model for efficient distribution of natural gas/purchase futures contracts? • Learned to do modeling of many types: simulation, optimization, etc. • Still, the capabilities of these techniques were limited by the processing capabilities of computers! • My dissertation was typed manually; the next year a student colleague used an IBM personal computer. Ms. Lupe Lopez lost her job; she had typed dissertations for 40 years (sad).
  • 12. 1982– off to Tuck School at Dartmouth Data General One • one or two 3.5-inch floppy drives - the first portable computer to incorporate the new Sony 3.5-inch disks. • a huge 11-inch display - the largest of any portable computer - capable of displaying a full 25 lines of text with 80 characters per line. • weighing only 10 pounds, it is significantly lighter than competing CRT-based portable systems, like the IBM Portable • up to eight hours of run time using the internal rechargeable batteries. • The MBA is still King/Queen as long as you are a Finance or Marketing Major– especially Investment Banking. Jim Bradley was a student in my classes and a real Geek! • I was NOT a Dartmouth Man! • I did begin to see a break to more high-tech jobs and Entrepreneurship that required technology • I was still using “baby problems”, “Little–Data” in the classroom
  • 13. 1986– off to Notre Dame • I began research on Rule-Based Robotics—simple AI • Excel comes to forefront as “the working man’s/woman’s analytic platform” • I had Bill Jelen, Mr. Excel on the internet, in class– He convinced me!! • Apple produces a video predicting the use of computers and smart assistants
  • 14. 1990– off to W&M • Deep Blue (IBM) partially defeats Kasparov in Chess • Watson was not far behind, with more sophisticated use of AI • Technology became omnipresent • Could do real demos of analyses in classroom • Students could follow and try themselves • Statisticians debate whether they should call themselves Data Scientists • Big Data and Analytics emerge as the way to compete “Companies questing for killer apps generally focus all their firepower on the one area that promises to create the greatest competitive advantage. But a new breed of company is upping the stakes. Organizations such as Amazon, Harrah’s, Capital One, and the Boston Red Sox have dominated their fields by deploying industrial-strength analytics across a wide variety of activities. In essence, they are transforming their organization.” Competing on Analytics, Thomas H. Davenport, January 2006
  • 15. 2017– off to retirement (?) • I develop and teach an online Business Analytics class in our OMBA– I was skeptical, but it’s a big success • I teach Intermediate Probability and Statistics to our inaugural MSBA class– soon to also be an online program • I teach an online Business Analytics class to our MAcc program • I develop an online Business Analytics for UGs • I wonder if it was the right time to retire– then I remember IT WAS!
  • 16. Where are we in this brave new world? • What’s working and Hot? …..AI!! • The future of the “customer experience” • Replacement of humans in work • Autonomous agents, including vehicles • What’s the future?......AI!! • Questions about displacement • Questions about ethics • Questions about the effect on human existence
  • 17. August May What does a Business Analytics Degree look like?
  • 18. A brief glossary of terms http://data-informed.com/glossary-of-big-data-terms/ (modified through some omission)
  • 19. Some important terms-- Algorithm • A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Analytics • The discovery, interpretation, and communication of meaningful patterns in data. Artificial Intelligence • The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.
  • 20. Contd. Data management According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in an enterprise: data governance data architecture, analysis, and design database management data security management data quality management reference and master data management data warehousing and business intelligence management document, record, and content management metadata management contact data management Data mining The process of deriving patterns or knowledge from large data sets. Data science A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems. Data scientist A practitioner of data science.
  • 21. Data visualization A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively. Data warehouse A place to store data for the purpose of reporting and analysis. Database A digital collection of data and the structure around which the data is organized. The data is typically entered into and accessed via a database management system (DBMS). Enterprise resource planning (ERP) A software system that allows an organization to coordinate and manage all its resources, information, and business functions. Exploratory data analysis An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions. EDA aims to uncover underlying structure, test assumptions, detect mistakes, and understand relationships between variables. Contd.
  • 22. Contd. Internet of Things (IoT) The network of physical objects or “things” embedded with electronics, software, sensors and connectivity to enable it to achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices. Each thing is uniquely identifiable through its embedded computing system but is able to interoperate within the existing Internet infrastructure. Machine learning A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can change when exposed to new data. Metadata Any data used to describe other data–for example, a data file’s size or date of creation. Natural language processing The ability of a computer program or system to understand human language. Applications of natural language processing include enabling humans to interact with computers using speech, automated language translation, and deriving meaning from unstructured data such as text or speech data. NoSQL A class of database management system that does not use the relational model. NoSQL is designed to handle large data volumes that do not follow a fixed schema. It is ideally suited for use with very large data volumes that do not require the relational model.
  • 23. A more complete glossary http://data-informed.com/glossary-of-big-data-terms/ (modified through some omissions)
  • 24. Analytics and Big Data Glossary Last updated: 3/16/17 Algorithm A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Analytics The discovery, interpretation, and communication of meaningful patterns in data. Analytics platform Application Software that is designed to perform a specific task or suite of tasks. Artificial Intelligence The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. Behavioral analytics Using data about people’s behavior to understand intent and predict future actions. Big data This term has been defined in many ways, but along similar lines. Doug Laney, then an analyst at the META Group, first defined big data in a 2001 report called “3-D Data Management: Controlling Data Volume, Velocity and Variety.” Volume refers to the sheer size of the datasets. The McKinsey report, “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” expands on the volume aspect by saying that, “‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Velocity refers to the speed at which the data is acquired and used. Not only are companies and organizations collecting more and more data at a faster rate, they want to derive meaning from that data as soon as possible, often in real time. Variety refers to the different types of data that are available to collect and analyze in addition to the structured data found in a typical database. Barry Devlin of 9sight Consulting identifies four categories of information that constitute big data: 1. Machine-generated data. This includes RFID data, geolocation data from mobile devices, and data from monitoring devices such as utility meters. 2. Computer log data, such as clickstreams from websites. 3. Textual social media information from sources such as Twitter and Facebook. 4. Multimedia social and other information from Flickr, YouTube, and other similar sites. Business intelligence (BI) The general term used for the identification, extraction, and analysis of data. Classification analysis Data analysis for the purpose of assigning the data to a particular group or class. Cloud A broad term that refers to any Internet-based application or service that is hosted remotely. Clustering analysis Data analysis for the purpose of identifying similarities and differences among data sets so that similar data sets can be clustered together. Computer-generated data Any data generated by a computer rather than a human–a log file for example.
  • 25. Contd. Correlation analysis • A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables. • Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. Customer relationship management (CRM) • Software that helps businesses manage sales and customer service processes. Dashboard • A graphical reporting of static or real-time data on a desktop or mobile device. The data represented is typically high-level to give managers a quick report on status or performance. Data • A quantitative or qualitative value. Common types of data include sales figures, marketing research results, readings from monitoring equipment, user actions on a website, market growth projections, demographic information, and customer lists. Data analytics • The application of software to derive information or meaning from data. The end result might be a report, an indication of status, or an action taken automatically based on the information received. Data analyst • A person responsible for the tasks of modeling, preparing, and cleaning data for the purpose of deriving actionable information from it. Data architecture and design • How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: the conceptual representation of business entities, the logical representation of the relationships among those entities, and the physical construction of the system to support the functionality. Data center • A physical facility that houses a large number of servers and data storage devices. Data centers might belong to a single organization or sell their services to many organizations.
Data cleansing • The act of reviewing and revising data to remove duplicate entries, correct misspellings, add missing data, and provide more consistency. Data collection • Any process that captures any type of data. Data integration • The process of combining data from different sources and presenting it in a single view. Data integrity • The measure of trust an organization has in the accuracy, completeness, timeliness, and validity of the data.
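The correlation analysis entry on slide 25 mentions the relationship between a product's price and its demand. One common way to quantify such a relationship is the Pearson correlation coefficient; the slide does not name a specific measure, so that choice is an assumption here. The sketch below computes it in plain Python on made-up, hypothetical price and demand figures.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and root sums of squares (denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data for illustration only: demand falls linearly as price rises.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [10.0, 8.0, 6.0, 4.0, 2.0]

if __name__ == "__main__":
    print(round(pearson(prices, demand), 6))
```

A value near -1 indicates a strong negative linear relationship, near +1 a strong positive one, and near 0 little linear relationship. Correlation alone does not establish that one variable causes the other.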
  • 26. Contd. Data management • According to the Data Management Association, data management incorporates the following practices needed to manage the full data lifecycle in an enterprise: • data governance • data architecture, analysis, and design • database management • data security management • data quality management • reference and master data management • data warehousing and business intelligence management • document, record, and content management • metadata management • contact data management Data marketplace • A place where people can buy and sell data online. Data mart • The access layer of a data warehouse used to provide data to users. Data migration • The process of moving data between different storage types or formats, or between different computer systems. Data mining • The process of deriving patterns or knowledge from large data sets. Data model, data modeling • A data model defines the structure of the data for the purpose of communicating between functional and technical people to show data needed for business processes, or for communicating a plan to develop how data is stored and accessed among application development team members. Data science • A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems. Data scientist • A practitioner of data science.
  • 27.
Data security • The practice of protecting data from destruction or unauthorized access.
Data structure • A specific way of storing and organizing data.
Data visualization • A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.
Data warehouse • A place to store data for the purpose of reporting and analysis.
Database • A digital collection of data and the structure around which the data is organized. The data is typically entered into and accessed via a database management system (DBMS).
Database administrator (DBA) • A person, often certified, who is responsible for supporting and maintaining the integrity of the structure and content of a database.
Database management system (DBMS) • Software that collects and provides access to data in a structured format.
Demographic data • Data relating to the characteristics of a human population.
Distributed processing • The execution of a process across multiple computers connected by a computer network.
Document management • The practice of tracking and storing electronic documents and scanned images of paper documents.
Electronic health records (EHR) • A digitized health record meant to be usable across different health care settings.
Enterprise resource planning (ERP) • A software system that allows an organization to coordinate and manage all its resources, information, and business functions.
Exploratory data analysis • An approach to data analysis focused on identifying general patterns in data, including outliers and features of the data that are not anticipated by the experimenter’s current knowledge or preconceptions. EDA aims to uncover underlying structure, test assumptions, detect mistakes, and understand relationships between variables.
External data • Data that exists outside of a system.
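As a small illustration of the exploratory data analysis entry, the sketch below summarizes a sample and flags outliers. The interquartile-range (IQR) rule used here is one common convention chosen for the example, not a method named in the slides, and the `daily_orders` values are invented.

```python
import statistics

# Hypothetical sample: daily order counts with one suspicious value.
daily_orders = [12, 14, 15, 15, 16, 17, 18, 95]

# Basic summary of the general pattern in the data.
mean = statistics.mean(daily_orders)
median = statistics.median(daily_orders)

# Quartiles and the IQR rule (an assumption of this sketch):
# values beyond 1.5 * IQR from the quartiles are flagged as outliers.
q1, _, q3 = statistics.quantiles(daily_orders, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in daily_orders if x < low_fence or x > high_fence]

print(f"mean={mean}, median={median}")
print(f"outliers={outliers}")
```

The gap between the mean and the median is itself a hint that something unusual is in the data, which is the kind of unanticipated feature EDA is meant to surface.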
  • 28.
Extract, transform, and load (ETL) • A process used in data warehousing to prepare data for use in reporting or analytics.
Information management • The practice of collecting, managing, and distributing information of all types: digital, paper-based, structured, and unstructured.
In-memory database • Any database system that relies on memory for data storage.
In-memory data grid (IMDG) • The storage of data in memory across multiple servers for the purpose of greater scalability and faster access or analytics.
Internet of Things (IoT) • The network of physical objects or “things” embedded with electronics, software, sensors, and connectivity that enable them to achieve greater value and service by exchanging data with the manufacturer, operator, and/or other connected devices. Each thing is uniquely identifiable through its embedded computing system and can interoperate within the existing Internet infrastructure.
Location analytics • Location analytics brings mapping and map-driven analytics to enterprise business systems and data warehouses. It allows you to associate geospatial information with datasets.
Location data • Data that describes a geographic location.
Machine learning • A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can change when exposed to new data.
Metadata • Any data used to describe other data, for example a data file’s size or date of creation.
Multidimensional database • A type of database that stores data as multidimensional arrays, or “cubes,” as opposed to the row-and-column storage structure of relational databases. This enables data to be analyzed from different angles for complex queries and online analytical processing (OLAP) applications.
Natural language processing • The ability of a computer program or system to understand human language. Applications of natural language processing include enabling humans to interact with computers using speech, automated language translation, and deriving meaning from unstructured data such as text or speech data.
NoSQL • A class of database management system that does not use the relational model. NoSQL is designed for very large data volumes that do not follow a fixed schema and do not require the relational model.
Online analytical processing (OLAP) • The process of analyzing multidimensional data using three operations: consolidation (the aggregation of available data), drill-down (the ability for users to see the underlying details), and slice and dice (the ability for users to select subsets and view them from different perspectives).
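The three OLAP operations in the entry above can be sketched with plain Python tuples standing in for a data cube. Everything here is an assumption made for illustration: the `sales` rows, the dimension names, and the helper functions `consolidate` and `slice_rows` are hypothetical, and a real OLAP engine would do this very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
sales = [
    ("East", "Widget", "Q1", 100),
    ("East", "Widget", "Q2", 150),
    ("East", "Gadget", "Q1", 80),
    ("West", "Widget", "Q1", 120),
    ("West", "Gadget", "Q2", 60),
]
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}


def consolidate(rows, *dims):
    """Consolidation: aggregate sales over the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Slice and dice: keep rows matching the fixed dimension values."""
    return [
        row for row in rows
        if all(row[DIMENSIONS[d]] == v for d, v in fixed.items())
    ]


# Consolidation: total sales per region.
by_region = consolidate(sales, "region")
# Drill-down: the detail underneath the East total.
east_detail = consolidate(slice_rows(sales, region="East"), "product", "quarter")
# Slice: only Q1, viewed by region and product.
q1_view = consolidate(slice_rows(sales, quarter="Q1"), "region", "product")

print(by_region)
print(east_detail)
print(q1_view)
```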
  • 29.
Open source software • Software with source code that is made available by the copyright holder free of charge to the general public. This code may be redistributed, and anyone can inspect and change it.
Pattern recognition • The classification or labeling of an identified pattern in the machine learning process.
Petabyte • One million gigabytes, or 1,000 terabytes (1,024 terabytes under the binary convention).
Predictive analytics • Using statistical functions on one or more datasets to predict trends or future events.
Predictive modeling • The process of developing a model that will most likely predict a trend or outcome.
Query analysis • The process of analyzing a search query for the purpose of optimizing it for the best possible result.
R • An open source software environment used for statistical computing.
Records management • The process of managing an organization’s records throughout their entire lifecycle, from creation to disposal.
Risk analysis • The application of statistical methods to one or more datasets to determine the likely risk of a project, action, or decision.
Root-cause analysis • The process of determining the main cause of an event or problem.
Scalability • The ability of a system or process to maintain acceptable performance levels as workload or scope increases.
Schema • The structure that defines the organization of data in a database system.
Search • The process of locating specific data or content using a search tool.
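The petabyte figures above reflect two conventions: decimal units step by powers of 1,000, while binary units step by powers of 1,024. A short sketch of the arithmetic follows; the `convert` helper is hypothetical, and the binary unit names (KiB, MiB, and so on) follow the IEC naming rather than anything in the slides.

```python
# Decimal (SI) units: each step is a factor of 1,000.
DECIMAL_UNITS = {
    "KB": 1000**1, "MB": 1000**2, "GB": 1000**3,
    "TB": 1000**4, "PB": 1000**5,
}
# Binary (IEC) units: each step is a factor of 1,024.
BINARY_UNITS = {
    "KiB": 1024**1, "MiB": 1024**2, "GiB": 1024**3,
    "TiB": 1024**4, "PiB": 1024**5,
}
UNITS = {**DECIMAL_UNITS, **BINARY_UNITS}


def convert(value, from_unit, to_unit):
    """Convert a storage size between any two units in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


print(convert(1, "PB", "GB"))    # one petabyte in gigabytes (decimal)
print(convert(1, "PB", "TB"))    # one petabyte in terabytes (decimal)
print(convert(1, "PiB", "TiB"))  # one pebibyte in tebibytes (binary)
```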
  • 30.
Search data • Aggregated data about search terms used over time.
Storage • Any means of storing data persistently.
Structured data • Data that is organized by a predetermined structure.
Structured Query Language (SQL) • A programming language designed specifically to manage and retrieve data from a relational database system.
Terabyte • 1,000 gigabytes.
Text analytics • The application of statistical, linguistic, and machine learning techniques to text-based sources to derive meaning or insight.
Transactional data • Data that changes unpredictably. Examples include accounts payable and receivable data, or data about product shipments.
Transparency • As more data becomes openly available, the idea of proprietary data as a competitive advantage is diminished.
Unstructured data • Data that has no identifiable structure, for example the text of email messages.
Weather data • Real-time weather data is now widely available for organizations to use in a variety of ways. For example, a logistics company can monitor local weather conditions to optimize the transport of goods, and a utility company can adjust energy distribution in real time.
Whole Earth Model • An integrated data management system that allows geophysicists, engineers, and financial managers in the oil and gas industry to evaluate the potential of oil and gas fields.
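To make the Structured Query Language (SQL) entry on this slide concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `shipments` table, its columns, and the rows are invented for the example, echoing the product-shipment data mentioned under transactional data.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Manage data: define a table and insert a few hypothetical rows.
conn.execute(
    "CREATE TABLE shipments ("
    "id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO shipments (product, quantity) VALUES (?, ?)",
    [("widget", 10), ("gadget", 4), ("widget", 6)],
)

# Retrieve data: total quantity shipped per product.
rows = conn.execute(
    "SELECT product, SUM(quantity) FROM shipments "
    "GROUP BY product ORDER BY product"
).fetchall()
print(rows)

conn.close()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.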