© 2015 Concurrent, Inc. All rights reserved.
ACHIEVING OPERATIONAL
READINESS ON HADOOP
9 BEST PRACTICES FOR THE ENTERPRISE
BY SUPREET OBEROI
VP of Field Engineering, Concurrent
Getting started with Hadoop
is a little like buying a car
100 years ago.
Back then, only hobbyists and status seekers drove
their own vehicles, which were temperamental, to put
it mildly. To operate a car you pretty much had to be a
mechanic, because the only thing you could count on
was a breakdown. The dashboard showed little, if any,
actionable information. Only after that changed, and you
could reliably get from here to there without a wrench,
did automobiles really take off.
I draw the analogy to Hadoop because nearly every day
I hear from enterprise IT teams in industries like retail,
finance, health care, and insurance who thought Hadoop
was a Camry, and are learning that it’s more like whatever
predated the Model T. These teams were led to believe
they could replicate the business successes of Twitter,
LinkedIn, and Netflix simply by taking a shiny new cluster
for a spin, but now they’re struggling to deploy Hadoop
applications with the standards of quality, reliability, and
manageability that they have come to expect.
In short, Hadoop, out of the box, is not operationally ready.
In this paper, I’ll share 9 best practices for how IT
organizations can achieve operational readiness on
Hadoop. Of course, there are not yet formal certifications
or commonly accepted standards for overcoming the
many challenges. But there is an emerging consensus
around how Hadoop applications are best built, deployed,
and managed.
1. Build culture and tools supporting
collaboration between developers,
operators, and other Hadoop team members
Operators are not trained, and often can’t or don’t want to be trained, to look
at Java stack traces and debug code. Likewise, asking a developer to address
performance problems is a little like asking a passenger to get out of your car and look
under the hood (assuming your passenger is not a mechanic).
Therefore, at least for the foreseeable future, optimizing performance on Hadoop is
a team proposition. In the best case, an operator who detects a problem can easily
collaborate with the developers, data scientists, and even business managers who have
a stake in the application running smoothly. They should all see operational readiness as
a shared responsibility, and be armed with tools that show them the complete picture
when remediation is required.
2. Connect execution problems with
application context
At newer companies, especially those in the San Francisco Bay Area, I find that one
person typically handles all the work — the data science, the development, and the
deployment to production — around a big data application. If there are problems when
the app runs, that same person can usually fix them. After all, she wrote it.
For big, traditional enterprises, it’s a different story. The operations team running fraud
and risk detection apps on Hadoop might live in Phoenix, while the team that developed
them sits ten time zones away in India. In some cases, the operations team today is
completely different from the one that first deployed an application.
Therefore, when a Hadoop job fails or takes too long to execute, operators should be
able to quickly link problems not only to the application that caused them, but also to
the relevant data flow logic inside the application. It’s also great if the operator can
immediately see detailed mapper/reducer stats tied to the problem. That way, they can
more quickly understand if the problem is with the code, the data or the hardware.
Figure: When a performance problem arises, operators should be able to investigate
app logic and cluster usage.
3. Monitor the fleet, not the vehicle
To an operator running hundreds or thousands of applications on a Hadoop cluster,
all of them look the same until there’s a problem. So you need tools that let you
look at performance over groups of applications. Ideally, you should be able to
segment performance tracking by application types, departments, teams, and
data-sensitivity levels.
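To make the idea of segmenting concrete, here is a minimal Python sketch of grouping run times by a segment such as department or data-sensitivity level. The run records, field names, and numbers are invented for illustration; they are assumptions, not the output of any Hadoop API or monitoring product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; every field name and value here is illustrative.
RUNS = [
    {"app": "fraud-scoring", "department": "risk", "sensitivity": "high", "runtime_min": 42},
    {"app": "risk-rollup", "department": "risk", "sensitivity": "high", "runtime_min": 58},
    {"app": "recs-nightly", "department": "marketing", "sensitivity": "low", "runtime_min": 95},
    {"app": "email-segments", "department": "marketing", "sensitivity": "low", "runtime_min": 25},
]


def mean_runtime_by(runs, key):
    """Average runtime in minutes for each value of the given segment key."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[key]].append(run["runtime_min"])
    return {segment: mean(values) for segment, values in groups.items()}


print(mean_runtime_by(RUNS, "department"))
print(mean_runtime_by(RUNS, "sensitivity"))
```

The same function works for any tag you attach to a run, which is the point: the operator picks the grouping that matches the question being asked.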
4. Define and enforce service level
agreements: yes, even on Hadoop
Monitoring a fleet still means knowing when an individual vehicle performs poorly.
Similarly, operators need to set SLA bounds on performance and define alerts and
escalation paths for when they’re violated.
Every business is unique, so there’s no set list of performance metrics to monitor. But it
certainly means looking at more than what you can see in log files.
SLA bounds should incorporate both raw metadata, such as job status, and
business-level events, such as sensitive data access. Successful practitioners of operational
readiness also set up metrics that help predict future SLA violations, so they can
proactively address and avoid them.
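As a rough sketch of what an SLA check might look like, the Python below combines a runtime bound, an early-warning threshold, and one business-level event. The `Sla` class, the 80% warning fraction, and the alert strings are all assumptions made for this example, not part of Hadoop or any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    app: str
    max_runtime_min: float
    warn_fraction: float = 0.8  # assumed early-warning threshold, tune per business


def check_sla(sla, runtime_min, touched_sensitive_data=False):
    """Return the list of alerts raised by one completed run."""
    alerts = []
    if runtime_min > sla.max_runtime_min:
        alerts.append("VIOLATION: runtime exceeded SLA bound")
    elif runtime_min > sla.warn_fraction * sla.max_runtime_min:
        # Not yet a violation, but close enough to act on before it becomes one.
        alerts.append("WARNING: runtime approaching SLA bound")
    if touched_sensitive_data:
        alerts.append("EVENT: sensitive data accessed")
    return alerts


fraud_sla = Sla(app="fraud-scoring", max_runtime_min=60)
print(check_sla(fraud_sla, runtime_min=50))
print(check_sla(fraud_sla, runtime_min=61, touched_sensitive_data=True))
```

The warning branch is a simple stand-in for the predictive metrics mentioned above; a real deployment would more likely look at trends across many runs.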
5. Understand inter-app dependencies
At technology companies blessed with enough capital to run a dedicated Hadoop
cluster for every use case, applications run more or less independently. That’s not the
case, however, at larger, more traditional enterprises, which tend to run their clusters
as a shared service across lines of business. As a result, each application has at least
a few “roommates” in the cluster, some of which can be noisy, disruptive, or otherwise
detrimental to its own performance.
In other words, to understand what’s behind the errant behavior of one Hadoop
application, you first have to understand what others were doing on the cluster when it
ran. Did a rogue app hog resources, causing others to perform poorly? Was the poor
performance of one application actually due to its dependency on data from some other
application, upstream, that failed to operate as expected?
Provide your operations team with as much cluster-related context as you can. For
example, just by tracking cluster usage by application, you’ll more quickly understand
when an SLA violation is really about a rogue app, rather than a problem in the one that
triggered the alert.
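Tracking cluster usage by application can start very simply. The Python sketch below takes per-application resource usage for one time window and flags any application that took more than a chosen share of the cluster. The usage figures, the "container seconds" unit, and the 50% threshold are assumptions chosen for illustration.

```python
# Hypothetical usage for one time window, in container-seconds per application.
WINDOW_USAGE = {
    "recs-nightly": 7000,
    "fraud-scoring": 2000,
    "risk-rollup": 1000,
}


def usage_share(container_seconds):
    """Fraction of the window's total usage consumed by each application."""
    total = sum(container_seconds.values())
    if total == 0:
        return {}
    return {app: used / total for app, used in container_seconds.items()}


def rogue_apps(container_seconds, max_share=0.5):
    """Applications whose share of the cluster exceeded max_share."""
    shares = usage_share(container_seconds)
    return [app for app, share in shares.items() if share > max_share]


print(rogue_apps(WINDOW_USAGE))
```

If the fraud-scoring app missed its SLA during this window, a view like this points the operator at its noisy roommate first, before anyone starts debugging the fraud code.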
6. Ration your cluster
To optimize cluster usage and ROI, operators must ration resources on the cluster and
enforce the limits.
For example, an operator can budget 10,000 mappers for the execution of a particular
application. Then, the onus is on the application to do two things: comply with the
budget restriction, and then demonstrate that compliance. Lacking such proof, rationing
rules should prevent the application from being deployed on the cluster. After all, an
application that cannot show compliance is not trustworthy.
Establishing and enforcing the rules for rationing cluster resources is vital for achieving a
meaningful state of operational readiness and meeting SLA contracts. You may have to
handle unusual edge cases. For example, is it acceptable for a recommender engine to
meet its SLA contract in terms of spitting out recommendations while totally consuming a
700-node cluster for the duration of its execution? (I saw this happen in real life!)
Figure: Tracking applications that consume more than 10,000 mappers
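The "comply, then demonstrate compliance" rule can be expressed as a small admission check. This Python sketch is one possible shape for it: the budget table, the application name, and the idea of passing the observed peak mapper count from a test run are all assumptions for the example, not a description of how any scheduler enforces limits.

```python
# Hypothetical budget table: maximum mappers each application may use.
MAPPER_BUDGETS = {"recs-nightly": 10_000}


def may_deploy(app, observed_peak_mappers):
    """Admit an app only if it has a budget and has shown that it stays within it.

    observed_peak_mappers is None when the app has supplied no proof of compliance.
    """
    budget = MAPPER_BUDGETS.get(app)
    if budget is None or observed_peak_mappers is None:
        return False  # no budget or no proof: not trusted on the shared cluster
    return observed_peak_mappers <= budget


print(may_deploy("recs-nightly", 8_500))   # within budget, with proof
print(may_deploy("recs-nightly", 12_000))  # proof shows it exceeds the budget
print(may_deploy("recs-nightly", None))    # no proof supplied
```

Note that the default answer is "no". An application earns its place on the cluster by showing evidence, rather than losing it after causing a problem.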
7. Trace data access at the operational level
Good Hadoop management isn’t only about rationing compute resources; it also means
regulating access to sensitive data. This is especially true in industries with heightened
privacy concerns, such as financial services, health care, insurance, and even, these days,
social media.
For example, a data scientist may develop a vastly improved new model for reducing
lending risk, but unless the enterprise can prove that the application does not use any
private data, it cannot deploy the application to production.
Solving for data lineage and governance in an unstructured environment like Hadoop is
no easy task. Traditional techniques for manually maintaining a metadata dictionary quickly
lead to stale repositories. In addition, there is no proof that the model
deployed in production is actually using the fields described in the metadata repository.
What is required is visibility and enforcement at the operational level on the use of data
fields. If you can track if and when a data field is accessed by an app, you can prove
which data the application does and does not use.
Figure: Tracking the lineage of data fields at an operational level
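Here is a minimal Python sketch of tracking field access at run time and comparing it against a list of private fields. The `FieldAccessLog` class, the field names, and the application name are hypothetical; in practice the read events would have to come from whatever framework actually executes the data flow.

```python
from collections import defaultdict


class FieldAccessLog:
    """Records which data fields each application actually reads at run time."""

    def __init__(self):
        self._reads = defaultdict(set)

    def record(self, app, field):
        self._reads[app].add(field)

    def fields_read(self, app):
        # Return a copy so callers cannot alter the log.
        return set(self._reads.get(app, set()))


def private_fields_used(log, app, private_fields):
    """Sorted list of private fields that the app was observed reading."""
    return sorted(log.fields_read(app) & set(private_fields))


PRIVATE_FIELDS = {"ssn", "date_of_birth"}  # hypothetical sensitivity list

log = FieldAccessLog()
for field in ("loan_amount", "credit_score", "date_of_birth"):
    log.record("lending-risk-v2", field)

violations = private_fields_used(log, "lending-risk-v2", PRIVATE_FIELDS)
print(violations or "no private fields read")
```

Because the log is built from observed reads rather than from a hand-maintained dictionary, it cannot drift out of date the way a manual metadata repository does.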
8. Record data misfires
Compliance folks at large enterprises also want proof that a Hadoop application
processed every record in a dataset, and they look for documentation when it fails
to do so. Failures can result from format changes in upstream data sets or plain old
data corruption. Keeping track of all records that the application failed to process is
particularly vital in regulated industries.
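One simple way to keep that proof is to set failed records aside with the reason for the failure, instead of dropping them silently. The Python sketch below shows the pattern on a toy comma-separated format; the record layout and the `parse_amount` function are assumptions made up for the example.

```python
def parse_amount(raw):
    """Parse a hypothetical 'account,amount' record; raises ValueError on bad input."""
    account, amount = raw.split(",")
    return account, float(amount)


def process_all(records, parse):
    """Process every record, keeping the failures alongside the reason they failed."""
    processed, failed = [], []
    for index, raw in enumerate(records):
        try:
            processed.append(parse(raw))
        except ValueError as exc:
            failed.append({"index": index, "record": raw, "error": str(exc)})
    return processed, failed


RECORDS = ["a1,10.5", "a2,not-a-number", "a3;7.0", "a4,3"]
processed, failed = process_all(RECORDS, parse_amount)

# Every input record is accounted for, either as processed or as a documented failure.
assert len(processed) + len(failed) == len(RECORDS)
for failure in failed:
    print(failure["index"], failure["record"])
```

The third record simulates an upstream format change (a semicolon instead of a comma), and the second simulates corrupt data. Both end up in the failure list with enough detail to hand to a compliance reviewer.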
9. Tune your engine before you replace it
With new compute fabrics emerging all the time, teams are sometimes too quick to junk
their old ones in pursuit of better performance. However, it’s very often the case that
you can achieve equal or greater performance gains just by optimizing code and data
flows on your existing fabrics. That way you can avoid expensive infrastructure upgrades
unless they’re truly necessary.
Operational readiness on Hadoop:
You can get there from here
When Henry Ford launched the Model T, it was rugged and robust, the first
car fit for a broader market. The car was not only affordable to buy; it was
also practical to own and operate.
Perhaps Hadoop will get there, too. Until then, there’s a lot you can do to
avoid getting left on the side of the big data road.
If you’re looking for help with operational readiness on Hadoop, or are
curious about the charts and displays I’ve shown here, get in touch with me
at sales@concurrentinc.com or visit concurrentinc.com.
ABOUT SUPREET OBEROI
Supreet is the Vice President of Field
Engineering at Concurrent. Prior to that,
he was Director of Big Data application
infrastructure for American Express, where
he led the development of use cases for
fraud, operational risk, marketing and
privacy on Big Data platforms. He holds
multiple patents in data engineering and
has held leadership positions at Real-Time
Innovations, Oracle, and Microsoft.
@supreet_online
www.concurrentinc.com
sales@concurrentinc.com
