This document provides an overview of big data and how it can be applied in the oil and gas industry. It discusses key aspects of big data including definitions, how big data analytics works and differs from conventional analytics, preparing for a big data project, potential business cases and benefits in oil and gas domains. Examples are given around using big data to better manage electrical submersible pumps, streamline drilling operations, and conduct well integrity risk analysis. The document emphasizes that big data projects require significant preparation, including defining success criteria and evaluating technology options before full implementation, in order to maximize chances of success.
Essential Prerequisites for Maximizing Success from Big Data
1. Society of Petroleum Engineers
Distinguished Lecturer Program
www.spe.org/dl
Muhammad Shehryar Khakwani
Essential Pre-Requisites for Maximizing
Success from Big Data
2. Overview
• What is Big Data?
• How does it apply to Oil & Gas?
• Why is it important?
• How does Big Data Analytics work, and how is it different?
• How to prepare for a Big Data project?
• Define Petroleum Engineering business cases for evaluation
• Big Data project must have collaboration between PE and IT
3. Key Points to Remember
1. Big Data can solve big Problems
2. Big Data technologies present big Opportunities
3. New technology which needs a big Evaluation
4. Holds big Rewards especially for big Organizations
5. It contains some big Risks
6. Mitigate risks by recognizing them and being prepared
4. Big Data – Academic Definition
Properties of Big Data
Data with high Velocity, Variety, and Volume
Oil & Gas Enterprises Data Gathering
• Velocity: Faster data gathering with sensors (Intelligent Fields, LWD, MWD)
• Variety: Files, project & master databases, documents, maps, spreadsheets
• Volume: More data collected today (computer devices, systems, real-time, workflows)
5. Big Data – Understandable Definition
Through the ages humans gained
knowledge and recorded information.
Big Data accomplishes the reverse
• Digital data left by devices
• Study the patterns
• Create the information
• Lights and Crime
• Flu Outbreak in real-time
6. Capitalizing on Analysis with Big Data
More Data Gathering Not Equivalent to More Information
The percentage of data an enterprise can understand is on the decline. A further complication is that
the data the enterprise is trying to understand is saturated with useful signals and lots of noise (IBM)
http://www.ibm.com/developerworks/bigdata/karentest/veracity.html
Big Data Analytics
A suite of applications providing solutions and analysis. A data-centric method adept at
uncovering otherwise invisible patterns and connections by linking disparate data types
http://www.spe.org/jpt/article/9969-management-using-big-data-analysis-tools-to-understand-bad-hole-sections-on-the-uk-continent by
Joe Johnston, and Aurelien Guichard
7. How is Big Data Analytics Different
Conventional Data Analytics vs. Big Data Analytics
10. Big Data Technology Providers
• Old names, New names, Catchy titles
• How do we understand all these different
products?
• How should we apply them?
• Who should we get with the right skill set?
11. Data Science
Insights and Correlations are
uncovered in this part of the
project.
Must involve Subject Matter
Experts, as they will validate
correlations.
Machine Learning, Data
Mining, Advanced Analytics
all play a part. Relatively
new skill set in industry.
Technology
Leading area of interest
for IT companies.
Too many choices.
List Important features:
• Handle Multiple Formats
• Rich Analytics Library
• Data Manipulators
• Performance & Scale
• Visualization
• Maintainability & Support
Business
Approach with Oil & Gas
business objectives in mind.
Technology is applied in Oil
& Gas to solve a business
problem.
Hint: Look at past projects
which were not “doable”, or
analytics did not yield
desirable results.
Big Data Project Plan
Get Organized and Plan with Focus on 3 Major Areas
12. Planning Business Case
• Identify Multiple Business Cases which are good candidates for a Big Data Project
• Emphasize “Analytical Component”
• Identify and Involve Subject Matter Experts. Explain and get Buy-in.
• SMEs critical in Evaluation, Adoption, Promotion, and Success in Big Data.
13. Potential Benefits for Oil & Gas
Business Domains Where Big Data Can Positively Impact
• Streamline Operations
• Reduce Equipment Failures
• Well Planning
• Improved Safety
• Optimize Production from Assets
• Environmental & Social Responsibilities
14. I-Field Data Collected by Sensors
• CHOKE: Position, Valve Position, Valve Opening, Upstream Flow, Downstream Pressure, Downstream Flow, Downstream Temperature, Upstream Pressure, Wellhead Flow/Rate
• CATHODIC PROTECTION: Input Current, Input Voltage, Output Current, Output Voltage
• ESP: Flow, Motor Current, Motor Current Leakage, Motor Speed, Motor Temperature, Motor Vibration, Motor Frequency, ESP Discharge Pressure, Energy Performance Indicator, Suction Pressure, Suction Temperature, Equipment Operating Hrs
• FLOWLINE: Upstream Pressure, Downstream Pressure
• GAS: Pressure, Density, Flow, Gas Liquid Ratio, Gas Oil Ratio, Rate, Temperature, Gas Volume Fraction, Gas Volume
• OIL: Density, Flow, Rate, Shrinkage Factor, Temperature, Oil Volume Fraction, Oil Volume
• PLANT: Gas Pressure, Gas Temperature, Water Cut, Oil Flow Rate, Gas Flow Rate
• WATER INJECTION: Discharge Pressure, Pump in GOSP, Pump in WIP, Pump Status
• WATER: Rate, Density, Flow, Liquid Ratio, Temperature
15. Example: Electrical Submersible Pumps
Better Management of Electrical Submersible Pumps
Minimize Trip Events
• More Oil Production
• Longer Life & Lower Maintenance
Analytics
• Average Time between Trips for different Pumps
Advanced Analytics
• Analyze ESP real-time data leading to failures
– Motor Speed, Vibration, Suction vs Discharge.
• Combine with Fluid or Reservoir properties
• Check Different Manufacturer Models
Reveal causes or leading indicators to reduce downtime
Typical ESP Installation.
Any sensor data patterns
correlating with failures
16. Example 2: Streamline Drilling Operations
Build Models & Apply in Real-Time
Analyze Real-Time Drilling Data to Reduce Downtime
• Construct Model from Historical Dataset – offset wells
– Drilling Trouble
• Drag Prediction
• Kick Detection
• Stand Pipe Pressure
• Real-Time Correlation & Advisory
• Adjust for Optimum ROP
• Drill faster safely
Salem Gharbi, Saudi Aramco
Drilling & Workover Systems Specialist
18. Example 4: Well Integrity Risk Analysis
Wells: 10K+
Risks: 461K+
Risk levels: HIGH, CRITICAL
19. Technology: Architecture
Major Technology Components
• Data Ingestion
• Compute cluster
• Analytics Library
• Query Engine
• Data Access, Presentation
Key Considerations
• Security
• Maintenance
20. Technology: Data Preparation
• Identify Input Data for your Business Cases. After all Big Data is about Data
• Preparing Input Data Sets is Difficult and Time-Consuming:
– Relevant Dataset. Get a large enough sample set
• A small sample size may not contain enough for patterns & correlation
– Data Ownership and Security
• Big Data projects involve large, cross-departmental data sets. Permission to use data.
• Confidentiality Restrictions on data security, data masking etc.
• Data masking is especially troublesome when combining Structured & Unstructured data.
– Data Migration Tools
• Many datasets used as input. Need tools with special connectors to move data
21. Technology: Data Preparation
– Data Quality, or Veracity Plays a Big part
• Garbage in = Garbage out.
• Performing analytics on Enterprise or Cross-departmental data.
• Varying levels of Data Quality from one department to another.
• Gathered from Sensors, Master & Project database, Internet, Documents
– Develop Policies for Managing Data in Project
• How will you assess the data quality for input data
• How will you clean bad data
• How will you deal with missing data
** Have SME sign off on procedures related to data cleanup
22. Data Science & Data Mining Lifecycle
Use a Reference Model
• Long Project with Iterative Activities. A Framework keeps track of where you are
• Data Mining & Data Science lifecycle similar
CRISP-DM: CRoss Industry Standard Process for Data Mining
Business Understanding → Data Understanding → Data Preparation → Model Construction → Evaluation & Conclusion → Deploy & Apply
23. Data Science – Evaluation By SME
Modeling, Validation & Conclusion
• Iterative process to validate model, re-try using different parameters
• SMEs must validate findings: Correlation ≠ Causation
• Note: Using the previous example, suppose for ESPs you find a correlation between water cut in fluid and pump failure. Could that be a valid finding? You need a pump expert to validate this correlation, not just rely on data points to establish a finding.
• SME time is usually not easy to get
Business Understanding → Data Understanding → Data Preparation → Model Construction → Evaluation & Conclusion → Deploy & Apply
24. Establish Success Criteria
Pilot Project and Define Success
• Business: How do you define success for correlations and patterns
which you are unaware of? Did you find anything unexpected?
• Technology: Get an idea of the capability, scale, and performance
needed for production implementation
25. Conclusion
• Big Data projects need significant preparation
• Preparation dramatically increases chances of success
• Decide upfront the business value you want from Big Data
• Resist temptation to jump into project before you are ready
• Don’t approach like typical IT projects.
• Evaluate Options Before Deciding on a Production Implementation
Don’t Chase Technology, Let Technology Serve Your Business Needs
26. Society of Petroleum Engineers
Distinguished Lecturer Program
www.spe.org/dl
Your Feedback is Important
Enter your section in the DL Evaluation Contest by
completing the evaluation form for this presentation
Visit SPE.org/dl
Editor's Notes
At the very onset, I will say that this presentation is a little bit different in that it addresses 2 types of audiences: PE and IT.
We will cover what Big Data is (so everyone is on the same page) and whether it is even relevant to Upstream Oil & Gas.
We have been collecting data, so let us see if we can unlock some of that potential and that is why it is important.
How Big Data (or Data Science) projects are different, and therefore how to prepare for them.
How you can define some business cases where it can help you – I will give some examples.
The last point I must emphasize is the collaboration between IT and PE.
Big Data is an emerging technology. It should be placed on the list of strategic technologies, especially for large enterprises. This holds true for large oil & gas enterprises, including service providers and producers.
As a technology it can solve some problems which were not doable in the past. Therefore, if applied correctly it offers some pretty big opportunities.
You have technological options and choices when you start. You need a methodical and careful approach. It is emerging and maturing, so an organization must invest in a plan to evaluate before proceeding.
If it pays off, it will hold big rewards especially for large enterprises. Big undertaking but it comes with a big pay off.
There are always risks that go along with big projects and new technology. However, this need not be a gamble, if you understand the risk, and you are aware, you can mitigate the risk and have the right enablers in place to boost your chances of success.
The name “Big Data” is misleading. It implies a large volume of data, like a long list of phone numbers in a city telephone directory. That is not what it is. Instead, it is a collection or grouping of data which has certain properties.
A standard or typical definition of Big Data talks of data with Velocity, Variety & Volume. (Other definitions add Value and Veracity.) The 3 V definition is the most widely accepted. The idea is that if these 3 properties are present in the problem you are looking at, you can consider the Big Data stack of solutions.
Large Oil & Gas companies have experienced growth in the data they collect, especially over the last decade or so, and that growth continues.
Sensors installed in Upstream assets are streaming data in many processes including I-Field initiatives, or Logging and Measuring while Drilling. This is data arriving at a fast rate, or Velocity.
Data is collected in a variety of formats. Files, databases, documents and maps. These contribute to Variety, and finally the overall volume of data produced increases every year. We just keep adding to what we have faster than we are archiving it.
Here is a more understandable definition to get a grasp of what Big Data is about.
Just because we are gathering more data does not mean we are deriving the most value out of it, or putting it all together in a cohesive manner.
IBM put it well by stating that as a percentage of the whole, we are understanding little of what we actually collect. Furthermore, what we are collecting is not necessarily of the highest quality, checked, approved and free of errors. So, there is a lot of “noise” in what we already have.
Big Data is meant to address this situation: it capitalizes on the “collection” of data from various sources and brings cohesion and coherency by enabling Superior Analytics.
Every good businessman or -woman carefully analyzes all the available facts before making a decision.
Its strengths lie in:
The ability to examine very large and disparate data sets.
Applying Data Science techniques, including cutting-edge methods such as Machine Learning, Data Mining, and advanced statistics packages (such as R).
Uncover patterns and insights by letting machines find them, since the data sets being analyzed are too big for humans to go through.
This leads to better problem solving, and better, more informed decision-making. More data and insights provide better understanding which leads to better decisions.
Let us compare Big Data Analytics to Conventional Data Analytics to get an idea of the new capabilities.
Conventional Data Analytics
Begin by covering conventional Data Analytics first. A typical analytical project works within a data repository. So, we typically generate cumulative values, averages, deviation from norm, trends within a repository, such as the master database.
In some areas where information needs to be combined, there are specialized techniques for roll-ups and cubes of information which can be derived from a data warehouse. A data warehouse is composed of data marts where data is either Federated or a process known as ETL (Extract Transform Load) is used. The results are aggregated and shown in a visual format as graphs, or charts, or trends. These lead to changes which result in optimized business processes.
State example with Production data.
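As a minimal sketch of the conventional analytics described above (cumulative values, averages, deviation from the norm within a single repository), consider daily production rates for one well. The rates and the 1.5 standard deviation cut-off are made-up assumptions for illustration, not figures from the presentation.

```python
from itertools import accumulate
from statistics import mean, pstdev

# Hypothetical daily oil rates (bbl/day) for one well; illustrative values only.
daily_rates = [1200.0, 1180.0, 1210.0, 1150.0, 900.0, 1190.0]

cumulative = list(accumulate(daily_rates))  # running cumulative production
average = mean(daily_rates)                 # average daily rate
spread = pstdev(daily_rates)                # population standard deviation

# Flag days whose rate deviates from the average by more than
# 1.5 standard deviations (an arbitrary, assumed threshold).
outliers = [
    (day, rate)
    for day, rate in enumerate(daily_rates, start=1)
    if abs(rate - average) > 1.5 * spread
]

print(cumulative[-1], outliers)
```

Everything here stays inside one data set, which is the point of the comparison: no documents, sensor streams, or other repositories are involved.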
Big Data Analytics
Big Data Analytics takes the standard repositories and adds a whole other set to them. Nothing is out of bounds. We take documents, emails, messages, maps, spreadsheets, PowerPoint presentations, and data from sensors. There is a lot of information here which is used in the Big Data Analytics technology stack. The result is patterns, correlations, and insights, and these lead to better decisions because everything is considered.
So, how does the Big Data technology stack relate to Oil & Gas?
Let us consider Upstream data.
If you take the information which lives in all the data stores and repositories, you can combine it in the Big Data technology stack.
Examples are correlations between Lithology and Drilling LWD data. Real-time data sent from downhole sensors combined with ESP data.
The Big Data technology stack is definitely applicable to Upstream Oil & Gas.
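The idea of combining information from separate data stores, such as lithology and LWD readings, can be sketched roughly as follows. The record layouts, well name, depths, and gamma ray values are all hypothetical; a real project would pull these from the actual repositories.

```python
# Hypothetical records from two separate repositories, keyed by well and depth (ft).
lithology = [
    {"well": "W-1", "top": 5000, "base": 5200, "rock": "shale"},
    {"well": "W-1", "top": 5200, "base": 5400, "rock": "sandstone"},
]
lwd_readings = [
    {"well": "W-1", "depth": 5100, "gamma_ray": 110.0},
    {"well": "W-1", "depth": 5250, "gamma_ray": 45.0},
    {"well": "W-1", "depth": 5350, "gamma_ray": 50.0},
]


def rock_at(well, depth):
    """Return the rock type whose interval [top, base) contains the depth, or None."""
    for layer in lithology:
        if layer["well"] == well and layer["top"] <= depth < layer["base"]:
            return layer["rock"]
    return None


# Attach a lithology label to every LWD reading, then average gamma ray per rock type.
by_rock = {}
for reading in lwd_readings:
    rock = rock_at(reading["well"], reading["depth"])
    by_rock.setdefault(rock, []).append(reading["gamma_ray"])

avg_gamma = {rock: sum(vals) / len(vals) for rock, vals in by_rock.items()}
print(avg_gamma)
```

The join key (well plus depth interval) is the part that usually needs agreement between data owners before any analytics can start.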
Now that we know what Big Data is, and that it applies to Upstream Oil & Gas – Great! Let us begin.
The impetus to charge ahead is there, and we seem to kinda, sorta, understand it… sure why not.
IT leaders see it and hear about it and want to try it, and perhaps overbuy or oversell it.
We are ready to pay.
Just bear with me a little more before you begin…
For those who have dealt with Information Technology in the industry and been around for a while, there are some Old Names. Established technology firms are looking at this area. There are also some very new names, and technology companies who are targeting this “niche” or emerging area.
We see some catchy titles, and the marketing literature certainly looks very promising. The presentations are appealing and one wants to just start and get going.
But just as one starts, some questions emerge right at the very start:
How do we understand these different products? There are a lot of them which are available. It is a bit too early to tell them apart also. This is typical of an emerging technology.
How do we actually apply these in our data centers?
Who do we get who has the right skill set and experience? Training, outsourcing, buying, building … who has the expertise to navigate us through the choices.
How do we make a plan that prepares us for a Big Data project?
A good Big Data project plan must focus on 3 big areas. I will go over these areas in more detail and elaborate on the different elements within each.
In general, however, there are 3 areas that must be addressed in a Big Data project plan:
Business: A Big Data project, like any other project, needs investment: resources, computer clusters, software, expertise etc. Make sure you approach it with the mindset of applying technology to the Oil & Gas business. Have some objectives in mind. The key thing to remember is that the Oil & Gas sector uses technology to enhance its core business.
Technology Stack: This is usually a starting point for many projects and receives the majority of attention. It is important to recognize this is where the execution occurs, but don’t fixate on this phase. This is a leading area of interest for companies. Many choices, but don’t get overwhelmed. A good way to approach this is to work with your technology development team and list features that are important to you and if possible prioritize the list or at least rank it.
Data Science: Data is going to play a huge role in a Big Data project. You will discover things you did not know. Involve SMEs. Hard to find the right experienced resources.
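The suggestion to list important technology features and rank them can be turned into a simple weighted scoring sheet. This is only a sketch: the weights, the two candidate stacks, and their scores are invented for illustration, and the feature names are shortened from the slide's list.

```python
# Feature weights (1 = nice to have, 5 = essential). Hypothetical values;
# each organization would agree its own with its technology team.
weights = {
    "multiple_formats": 5,
    "analytics_library": 4,
    "performance_scale": 4,
    "visualization": 2,
    "maintainability": 3,
}

# Hypothetical evaluation scores (0-10) for two made-up candidate stacks.
candidates = {
    "Stack A": {"multiple_formats": 8, "analytics_library": 6,
                "performance_scale": 9, "visualization": 5, "maintainability": 7},
    "Stack B": {"multiple_formats": 6, "analytics_library": 9,
                "performance_scale": 7, "visualization": 8, "maintainability": 6},
}


def weighted_score(scores):
    """Weighted average of feature scores, on the same 0-10 scale."""
    total_weight = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total_weight


ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]),
                 reverse=True)
for name in ranking:
    print(name, round(weighted_score(candidates[name]), 2))
```

Writing the weights down before looking at vendor material helps keep the evaluation tied to business needs rather than to whichever product presents best.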
The first major focus area in the plan.
Business: Good quote from Jake Porway and very relevant. Start with the question and keep the business in mind.
Since this is a new technology, do not just list 1 or 2 cases, it is better and safer to start with a slightly longer list. There are some cases that you will think are Big Data but they will not be. You can do them using other means, or they won’t be feasible to get data for etc. So, definitely meet with users, data managers, analysts. It is best to do a set of brainstorming sessions. Start with explaining Big Data, the concept, the capabilities and then follow up with meetings to list and flesh out business cases.
If possible get them from across the enterprise. PE, Exploration, Drilling. If you do just one area, others will think Big Data only applies to PE or Drilling. This has possibilities for all, so get everyone involved early. Make them reflective of a major exercise. Don’t do something too small or too big. Try and right-size it. Spend a bit of time studying what you are going to put in and get out of it.
Finally, definitely, absolutely, work with Subject Matter Experts. Get their buy-in from the beginning. You will see that they are critical in subsequent phases and steps, so involve them from the very beginning.
Drilling operations. Can you predict stuck pipe before it occurs? Formation collapse, or drilling mud problems before they occur.
Pump failures: the ability to predict them can help you take action to reduce downtime and increase production.
Better planning for new wells. Drilling plans by studying offset wells, optimal location to place them by studying offset wells etc.
Scan notes about safety on jobs and aggregate them to give pointers for newer jobs. Being aware of problems can help avoid them
Get more from existing assets. Better production, injection, identify PE problems. For PE operations, do holistic causal analysis on typical problems affecting production flow such as plugging, sand, water, and scale.
Early leak detection, or get there before the leak. Reduce corrosion. When fracking, ensure no harm to water resources etc.
Different types of readings collected during a large i-Field installation.
Need an SME to take a look at them and see which ones should be cross-correlated.
This is useful for planning further well installations: you can deploy the right types of sensors so that the incoming data can later answer the questions you are interested in.
An example from the Petroleum Engineering side.
Electrical Submersible Pumps are used to help bring fluid to the surface where the pressure is low. They are installed in the production tubing.
One problem with the pumps is to try and minimize a trip event. Every time a pump trips, it costs money in terms of lost production. Frequent tripping can also increase maintenance costs or require the pump to be changed out.
A typical application which analyses this data took into consideration the average time between trip events, at different wells, and began to do some predictive analytics on trends to show when a pump might trip (typical problems were with VSD controllers or Power outages which caused trips).
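The "average time between trips" metric mentioned above could be computed along these lines. The pump identifiers and trip dates are hypothetical, and the function name is just an illustrative choice.

```python
from datetime import datetime
from statistics import mean

# Hypothetical trip-event dates per pump; identifiers and dates are made up.
trip_events = {
    "ESP-01": ["2015-01-01", "2015-01-11", "2015-01-31"],
    "ESP-02": ["2015-01-05", "2015-01-08", "2015-01-12", "2015-01-14"],
}


def mean_days_between_trips(timestamps):
    """Average gap in days between consecutive trip events, or None if fewer than two."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d") for t in timestamps)
    gaps = [(later - earlier).days for earlier, later in zip(times, times[1:])]
    return mean(gaps) if gaps else None


mtbt = {pump: mean_days_between_trips(ts) for pump, ts in trip_events.items()}
print(mtbt)
```

A pump with a much shorter average gap than its peers is a natural candidate for the deeper analysis described next.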
However, Advanced Analytics can examine more data and see what the conditions and readings were before the outage: suction pressure, motor temperature, vibration. Perform analysis on the relationship between the pump trips and failures. Each trip puts a strain on the motor and shortens the life of the ESP. Fewer trips usually mean longer life.
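One way to look at the readings leading up to a trip is to cut a fixed window of rows before each trip event and compare it with normal operation. The sensor log, the column layout, and the window length of 3 readings below are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical sensor log for one ESP:
# (minute, motor_temperature_C, vibration_mm_s, tripped)
log = [
    (0, 90.0, 1.0, False),
    (1, 91.0, 1.1, False),
    (2, 95.0, 1.8, False),
    (3, 99.0, 2.6, False),
    (4, 104.0, 3.5, True),   # trip event
    (5, 88.0, 0.9, False),
    (6, 89.0, 1.0, False),
]

WINDOW = 3  # readings to inspect before each trip (an arbitrary, assumed choice)


def pre_trip_windows(rows, window):
    """Return, for each trip, the `window` rows recorded just before it."""
    return [rows[max(0, i - window):i] for i, row in enumerate(rows) if row[3]]


windows = pre_trip_windows(log, WINDOW)

# Compare vibration just before trips with vibration during normal operation.
pre_trip_vibration = mean(r[2] for w in windows for r in w)
normal_vibration = mean(
    r[2] for r in log if not r[3] and all(r not in w for w in windows)
)
print(pre_trip_vibration, normal_vibration)
```

A difference like this is only a candidate leading indicator; it still needs review by someone who knows the pumps.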
Oil & Gas enterprises deal with drilling wells, or performing workovers to deepen boreholes, or sidetrack them.
Combine real-time data with historical data and build models. Then use them to monitor and advise. Train for different situations, including choosing the best bit, mechanical stuck pipe failures, and stand pipe pressure.
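The train-on-history, then monitor-in-real-time pattern can be illustrated with a deliberately simple stand-in for a model: a normal operating band for stand pipe pressure learned from offset wells. The pressure values and the 3 standard deviation band are assumptions; an actual drilling model would be far richer.

```python
from statistics import mean, stdev

# Hypothetical stand pipe pressure (psi) from offset wells drilled without trouble.
historical_spp = [2950.0, 3000.0, 3050.0, 2980.0, 3020.0, 3010.0, 2990.0]

# "Model": a normal operating band of mean +/- 3 standard deviations.
# This only demonstrates the pattern, not a real drilling model.
center = mean(historical_spp)
band = 3 * stdev(historical_spp)


def advise(reading):
    """Return an advisory label for one real-time stand pipe pressure reading."""
    if reading > center + band:
        return "HIGH: above historical band"
    if reading < center - band:
        return "LOW: below historical band"
    return "OK"


# Hypothetical real-time readings arriving from the rig.
realtime_stream = [3005.0, 3040.0, 3400.0, 2600.0]
advisories = [advise(r) for r in realtime_stream]
print(advisories)
```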
In the Drilling domain, we have a high level of uncertainty encountered when conducting drilling operations.
We capitalize on sensing technology, advanced analytics and artificial intelligence to monitor, predict and provide advisory recommendations for the 220+ drilling rigs operating.
This is in order to ensure drilling safety, enhance drilling efficiency, and reduce cost.
In the next slides, I will share with you some of the examples:
We apply advanced data science to provide a live holistic view of the Risk Status of all Saudi Aramco wells, utilizing automated smart agents that securely collect data from all related data repositories.
These smart agents analyze the collected data, rank the risk factors associated with each well, and generate the appropriate alerts if necessary.
Using these advanced solutions, we have successfully managed to facilitate lowering the number of critical and high risk wells in our fields.
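The ranking and alerting step can be pictured with a small scoring sketch. This is not the system described in the notes: the risk factors, weights, thresholds, and well names below are all hypothetical.

```python
# Hypothetical risk factors per well, each scored 0 (none) to 5 (severe).
wells = {
    "W-101": {"annulus_pressure": 4, "corrosion": 5, "valve_test_overdue": 3},
    "W-102": {"annulus_pressure": 1, "corrosion": 0, "valve_test_overdue": 1},
    "W-103": {"annulus_pressure": 3, "corrosion": 2, "valve_test_overdue": 4},
}
# Assumed relative importance of each factor (sums to 1.0).
weights = {"annulus_pressure": 0.5, "corrosion": 0.3, "valve_test_overdue": 0.2}


def risk_score(factors):
    """Weighted sum of factor scores, on a 0-5 scale."""
    return sum(weights[name] * value for name, value in factors.items())


def risk_level(score):
    """Map a score to a level using assumed thresholds."""
    if score >= 4.0:
        return "CRITICAL"
    if score >= 2.5:
        return "HIGH"
    return "NORMAL"


# Rank wells from highest to lowest risk and raise alerts for the worst ones.
ranked = sorted(wells, key=lambda w: risk_score(wells[w]), reverse=True)
levels = {w: risk_level(risk_score(f)) for w, f in wells.items()}
alerts = {w: levels[w] for w in ranked if levels[w] != "NORMAL"}
print(ranked, alerts)
```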
The second focus area for a Big Data project plan is the Technology Stack.
To go over this, the diagram shows all the key components of a technology architecture which will make up the different pieces of the hardware and software needed for Big Data.
It is simplified here, and when you actually begin to install and work with the choices, there will be more to it, but these are the areas which need to be fleshed out.
Identify the source data repositories. The first thing to do is to “ingest”, or bring, the data into the Big Data processing area.
A computer cluster hosts and serves the software (it could be Windows or Linux based, depending on the choice of vendor or in-house technology). It will cater for both structured and unstructured data. (Structured data is your typical database storing data in relational tables; unstructured data refers to documents etc.)
The processing will include special rich data analytics libraries which can slice and dice different data types, batch processing capabilities for long-running queries, and handlers for real-time data.
Pay special attention to security, since restricted data should not become accessible here. Finally, the end-users care about how all the results are presented to them.
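To make the flow between the major components concrete, here is a toy pipeline in which each function stands in for one component: ingestion, analytics, query with a security filter, and presentation. The function names, record fields, and values are illustrative assumptions, and a real stack would run these stages on a cluster rather than in one script.

```python
def ingest(sources):
    """Data ingestion: pull records from every source into one list, tagging the origin."""
    return [dict(record, source=name)
            for name, records in sources.items()
            for record in records]


def analyze(records):
    """Analytics library stand-in: average pressure per well."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["well"], []).append(r["pressure"])
    return {well: sum(v) / len(v) for well, v in grouped.items()}


def query(results, allowed_wells):
    """Query engine with a security filter: return only wells the user may see."""
    return {well: value for well, value in results.items() if well in allowed_wells}


def present(results):
    """Presentation: format results for the end user."""
    return [f"{well}: {value:.1f} psi" for well, value in sorted(results.items())]


# Hypothetical source repositories.
sources = {
    "sensors": [{"well": "W-1", "pressure": 100.0}, {"well": "W-2", "pressure": 200.0}],
    "master_db": [{"well": "W-1", "pressure": 110.0}],
}
report = present(query(analyze(ingest(sources)), allowed_wells={"W-1"}))
print(report)
```

Note that the security check sits in the query step, so a user who is not entitled to W-2 never sees it even though it was ingested.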
The 3rd focus area on the plan is Data Preparation.
Begin by identifying what data is needed for each business case. I will venture and say this will be more difficult than you predicted. It will raise many questions when you begin to list the data needed. Better to think about these issues NOW than when you are in the middle of executing the project and run into the hurdles.
First, start with identifying a relevant data set. Make sure it is not too big or too small.
Deal with the data ownership and security issues listed. You need permission from different organizations as they own the data, and each data owner may place different restrictions on its use. Try to avoid data masking: it makes things very difficult when dealing with unstructured data such as documents, maps, and text.
Identify Data Migration tools. There are some on the market which help move data about. Some are better than others, so identify those that have good “connectors” for your data types.
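The data masking difficulty raised above can be demonstrated in a few lines. The sketch uses a salted hash as a simple pseudonym, which is an assumption for illustration rather than a recommended anonymization scheme; the well name, salt, and note text are made up.

```python
import hashlib


def mask(value, salt="pilot-project"):
    """Replace an identifier with a short, repeatable pseudonym (salted SHA-256)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "ID-" + digest[:8]


# Structured data: the identifier sits in a known column, so masking is simple.
row = {"well": "WELL-A7", "water_cut": 0.35}
masked_row = dict(row, well=mask(row["well"]))

# Unstructured data: the same well appears under a spelling that a plain
# search-and-replace will miss.
note = "Workover on WELL-A7 completed. Crew reports Well A7 tubing leak."
masked_note = note.replace("WELL-A7", mask("WELL-A7"))

leaked = "Well A7" in masked_note  # the free-text variant survives masking
print(masked_row, masked_note, leaked)
```

The structured row is masked cleanly, while the free-text note still reveals the well, which is why combining the two kinds of data under masking rules is so troublesome.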
The trickiest part is to understand the 4th “V” or veracity of data.
Data quality plays a big role in Big Data. It plays a bigger role than in other projects because we are combining data from different sources with a different “grain”, and because the origins of Big Data were more in “marketing” type applications, where 80% correct is just fine.
For Oil & Gas, or Engineering data, one needs to be more precise.
So, this is going to need special attention and planning. The better the quality of data, the better the analysis, the more reliable the results.
It is also very difficult to predict the resultant set. (refer to last bullet).
Think about what you are putting in, and how it will affect the outcome.
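The three policy questions (how to assess quality, how to clean bad data, how to handle missing data) could start from something as simple as the checks below. The field names and valid ranges are hypothetical; in practice the limits and the cleanup rule would be signed off by the SME.

```python
# Hypothetical valid ranges per field; real limits would come from the SME.
VALID_RANGES = {
    "motor_temperature": (0.0, 250.0),
    "suction_pressure": (0.0, 5000.0),
}

readings = [
    {"motor_temperature": 95.0, "suction_pressure": 1200.0},
    {"motor_temperature": None, "suction_pressure": 1180.0},    # missing value
    {"motor_temperature": 9999.0, "suction_pressure": 1190.0},  # implausible spike
    {"motor_temperature": 97.0, "suction_pressure": -5.0},      # out of range
]


def assess(rows):
    """Count missing and out-of-range values per field."""
    report = {field: {"missing": 0, "out_of_range": 0} for field in VALID_RANGES}
    for row in rows:
        for field, (low, high) in VALID_RANGES.items():
            value = row.get(field)
            if value is None:
                report[field]["missing"] += 1
            elif not low <= value <= high:
                report[field]["out_of_range"] += 1
    return report


def clean(rows):
    """Keep only rows where every field is present and within range (one possible policy)."""
    return [
        row for row in rows
        if all(row.get(f) is not None and lo <= row[f] <= hi
               for f, (lo, hi) in VALID_RANGES.items())
    ]


quality_report = assess(readings)
clean_rows = clean(readings)
print(quality_report, len(clean_rows))
```

Dropping rows is only one option; filling gaps or flagging values for review are alternatives, and the choice changes what the analytics will find.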
The timeline for a Big Data project is also long. It is not something you will finish very quickly. Remember this is a new area which tackles a very difficult problem, so it is not something which has been done frequently in the past that we can just replicate easily.
Use a Reference model. It will help you keep track of where you are. If you run into trouble, or need to re-work something, then it will tell you which phase you should go back to.
CRISP-DM is one such model. You can use others if you like. But use something as a guide.
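A reference model is easiest to use when the current phase is tracked explicitly. The sketch below encodes the six phases named on the slide and a small tracker; the problem-to-phase mapping and the class name are assumptions for illustration and are not part of CRISP-DM itself.

```python
from enum import IntEnum


class Phase(IntEnum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODEL_CONSTRUCTION = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Hypothetical mapping from a problem found mid-project to the phase to revisit.
REVISIT = {
    "unclear objective": Phase.BUSINESS_UNDERSTANDING,
    "bad or missing data": Phase.DATA_PREPARATION,
    "SME rejects correlation": Phase.MODEL_CONSTRUCTION,
}


class ProjectTracker:
    """Records which phase the project is in and every move it has made."""

    def __init__(self):
        self.phase = Phase.BUSINESS_UNDERSTANDING
        self.history = [self.phase]

    def advance(self):
        """Move to the next phase, stopping at deployment."""
        if self.phase < Phase.DEPLOYMENT:
            self.phase = Phase(self.phase + 1)
            self.history.append(self.phase)

    def rework(self, problem):
        """Jump back to the phase associated with the reported problem."""
        self.phase = REVISIT[problem]
        self.history.append(self.phase)


tracker = ProjectTracker()
for _ in range(4):
    tracker.advance()                      # now in EVALUATION
tracker.rework("SME rejects correlation")  # back to MODEL_CONSTRUCTION
print(tracker.phase.name, [p.name for p in tracker.history])
```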
I mentioned earlier that Subject Matter Experts play a big role in Big Data Evaluation.
Since these are “new” insights and correlations, it is not up to the data scientists and the developers to validate the findings. Do not rely on application developers to do this.
Any finding or correlation must be validated by the SME.
(go over the example shown).
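For the water cut versus pump failure example, the data side of the finding is just a correlation coefficient. The sketch below computes Pearson's r on made-up per-pump figures and uses an assumed cut-off of 0.7 to decide what gets sent to the SME; none of these numbers come from the presentation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-pump data: water cut (fraction) and failures per year.
water_cut = [0.10, 0.20, 0.35, 0.50, 0.65, 0.80]
failures = [1, 1, 2, 3, 3, 5]

r = pearson(water_cut, failures)

# A strong r only nominates the pair for SME review; it does not establish cause.
needs_sme_review = abs(r) >= 0.7
print(round(r, 2), needs_sme_review)
```

The code can say the two series move together; only the pump expert can say whether water cut plausibly drives failures or whether both follow from something else, such as well age.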
This may be an iterative process. The SME might ask you to add or drop parameters, include more data or exclude some data points etc. Account for this time. There is a fair bit of unknown here, so make sure you plan for the time.
One thing for sure is that the time for SMEs is valuable. On-going operations and business will take priority, so find a way to get the time and be judicious with their time.
Pilot a project to understand what you are dealing with. It is a new technology.
Big Data is different than dealing with a system comprised of data entry screens and reports. We typically mock these up in prototypes and then developers build them. Here we are not dealing with pre-defined output.
Did you gain any insight? Learn about combining data and the capabilities, and get a feel for the benefit to the business.
This may differ from organization to organization, and that is just fine. Remember we wanted to gain insights on our data and relationships to make better decisions for our core business.
** Make sure to have technological success factors also. These should address scale, performance, and processing power. These will help size and scale the production environment.
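A technological success factor can be as simple as using pilot measurements to estimate the size of the production cluster. The pilot figures and production targets below are invented, and the calculation assumes throughput scales linearly with node count, which real clusters rarely achieve, so treat the answer as a lower bound.

```python
from math import ceil

# Hypothetical pilot measurements and production targets; all numbers are made up.
pilot = {"records_processed": 2_000_000, "seconds": 400.0, "nodes": 4}
target = {"records_per_day": 500_000_000, "batch_window_hours": 6}

# Records per second handled by one node during the pilot.
throughput_per_node = pilot["records_processed"] / pilot["seconds"] / pilot["nodes"]

# Records per second needed to finish the daily load inside the batch window.
required_rate = target["records_per_day"] / (target["batch_window_hours"] * 3600)

# Assumes linear scaling with node count (an optimistic simplification).
nodes_needed = ceil(required_rate / throughput_per_node)
print(throughput_per_node, round(required_rate, 1), nodes_needed)
```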
In summary:
Make sure you prepare.
Good preparation will dramatically increase your chances of success.
Know what you want from the Big Data project.
Don’t jump in before you are ready.
Don’t approach it like a typical IT project.
Evaluate all your options before deciding on a production implementation technology stack. Think of how to adjust the stack as needed.
Remember, don’t chase technology…