Digital transformation involves profoundly transforming business activities, processes, competencies, and models to leverage changes from digital technologies strategically. It requires new capabilities and data management maturity. There are three areas of data management: data in motion which involves transferring data between systems; data at rest which refers to how data is stored; and data in use which is about extracting, transforming and analyzing data. A modern data platform uses cloud native technologies to manage data in real-time across all three areas at massive scales.
The importance of efficient data management for Digital Transformation
1. The importance of efficient Data Management for Digital Transformation
Roman Gruhn
Director of Information Strategy (EMEA)
2. Digital transformation is the profound and accelerating transformation of business activities, processes, competencies and models to fully leverage the changes and opportunities of digital technologies and their impact across society in a strategic and prioritized way.
Digital transformation = Business transformation
Source: I-Scoop
3. Why Digital Transformation?
Want to:
• Launch new services
• Grow market share & revenue
• Increase customer satisfaction
• Digitise operations
• Offer new touch points
Have to:
• Defend against new competitors
• Respond to customer demands
• Comply with new regulations
• Increase scalability & resilience
• Reduce operational cost
6. Legacy Data Platform
[Diagram covering the three domains Data in Motion, Data at Rest, and Data in Use. Labels: Enterprise Service Bus; Application / Database Translation (ORM); Batch Processing; Bolt-on and/or 3rd Party Replication; Coarse Grained File Level Transfer; Structured, spreadsheet-like storage; "Data Accelerator Technologies"; Batch Analytics; Feature Backlogs; Centralized Data Platforms (Mainframes, RDBMS, etc.); Complex Extraction & Transformation; Timely Loading Processes]
7. Modern Data Platform
[Diagram covering the three domains Data in Motion, Data at Rest, and Data in Use. Labels: Stream Processing; Built-in Replication; Micro Services; Record Level Transfer; Native Drivers; Deployment Agnostic: Cloud Native, Distributed; Data Agnostic: Multi-Structured, Dynamic Schema; Everything Real-time; Cloud Native; Continuous Delivery; Practically limitless scale; APIs]
Digital Transformation - the term has been used quite a lot in recent years. It has developed from a hype into a standard tool within corporate strategy. But what exactly do we mean by it?
Before we continue, let’s look at a definition. You can find hundreds online, but I have picked one from I-Scoop which I think captures the essence of it incredibly well.
[QUOTE]
So effectively, Digital Transformation is Business Transformation with a slightly different label. And pretty much every organisation today has in some way or form an ongoing Digital Transformation programme.
Show of hands, who has an initiative like that going on in their own organisation? And show of hands, who would say they are not doing any form of Digital transformation right now? (Hopefully no hands)
I thought so. It isn’t rocket science, but it also isn’t easy. Otherwise every programme would be a major success, but reality has shown us otherwise.
Another show of hands, who would say digital transformation is easy? (Hopefully very few)
(Link to next slide)
So if DT isn’t that easy, why would you want to embark on such a difficult journey in the first place? Well, let’s look at some of the drivers behind many of the DT programmes I have worked on in recent years.
Basically, there are two major reasons:
Things you WANT to do
Things you HAVE to do
WANT TO:
Developing a new kind of service, application, or digital product often requires organisations to radically change what they do or how they do things.
The launch of a new offering is usually followed by efforts to expand market share and grow the business in current or new markets, scaling the business model to improve the bottom line more efficiently.
Focussing on existing and new customers and their satisfaction with the organization's services and products is another frequent reason for transformation programmes.
Digitising legacy processes, e.g. moving away from paper-based operations towards digital internal or external (self-service) offerings, is a great way to boost efficiency.
Lastly, by offering new touch points you can increase your potential target customer group. E.g. by providing a messaging bot/chat bot interface you could target younger, very mobile focussed customers.
But there are not only things you want to do. There are also external factors you have less control over that can trigger a digital transformation programme.
HAVE TO:
Defending existing business against new market entrants and start-ups is a major concern for established companies. For example, the London Black Cab community is trying to defend its position by offering apps like Hailo to compete with minicab services like Uber.
Changed customer demands and expectations are also driving business to rethink their offerings. For example, being able to compare lots of prices for products online has caused companies to now dynamically adjust their own prices to their competition. Amazon does this for a lot of their top products to ensure they are always just a little bit cheaper than their competitors.
New legislation and regulatory requirements are another trigger for many business transformation programmes and digital initiatives. For example, the “right to be forgotten” in the EU is causing many companies problems because they don’t know where in their long list of systems they store pieces of customer information.
Dealing with scalability issues of legacy services that fall over under increased consumer demand is important to ensure resilience and business continuity. Architecture patterns like micro services or cloud deployments, which we will hear more about later, are part of the solution to these problems.
Reducing the cost of doing business in general to improve margins or cope with decreasing top-line revenue drives some digital transformation projects, but it certainly isn’t the only reason why you should ever attempt to do it.
So, we understand what DT is and why we have to do it. But what do we need to be successful?
If you search online for DT capabilities or DT maturity, you can find a whole range of different models and opinions from very smart people and organisations like Gartner, Deloitte, IBM, IDC, McKinsey and others, all trying to describe and list the required capabilities and phases you will most likely go through.
When I think of efficient and effective Information Management (IM), I see capabilities required in 3 different domains:
Data in Motion: How do you get data into the system and exchange it between parts of the same system?
Data at Rest: How do you store and organise it physically and logically?
Data in Use: How do you make it available to applications? How quickly and efficiently can you process it?
Data in Motion
Let’s start with Data in Motion. Traditionally, loading any large or complex amount of data into a relational database or a data platform like a mainframe or Enterprise Data Warehouse (EDW) requires a considerable amount of ETL (Extract-Transform-Load) or ELT (Extract-Load-Transform).
ETL/ELT is typically facilitated by old-fashioned File Level Transfers, i.e. there is no direct system-to-system communication but data gets transferred via a “carrier” file. This could be very simple flat files, CSV files, or sometimes more advanced like XML.
Extracting as well as loading larger data sets can put considerable stress on systems. To avoid performance implications, these processes are commonly executed during “quiet” periods of the source and target systems.
When operating only locally, and maybe on an internally used business system with user activity between 9am and 5pm, this is no problem. But in a global economy, with externally facing systems and a customer base that interacts 24/7, finding “quiet” times becomes a lot more challenging and might require “maintenance windows” and downtime.
To further minimise the overhead, ETL/ELT processes and file transfers are usually processed in batches, aggregating data together in larger blocks. This can cause various problems. For example, if data is processed only on a nightly basis, the target system can lag the source system by up to a day or more.
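To make the batch delay concrete, here is a minimal sketch of a nightly, file-based transfer. Everything in it is an assumption for illustration: the field names, the 02:00 load window, and the in-memory CSV "carrier" file are hypothetical and not taken from any particular ETL tool.

```python
import csv
import io
from datetime import datetime

FIELDS = ["id", "amount", "updated_at"]  # hypothetical record layout


def extract_to_carrier_file(rows):
    """Extract: write source rows to a CSV carrier file (held in memory here)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_from_carrier_file(carrier_text):
    """Transform and load: parse the file and convert types for the target."""
    reader = csv.DictReader(io.StringIO(carrier_text))
    return [
        {
            "id": int(row["id"]),
            "amount": float(row["amount"]),
            "updated_at": datetime.fromisoformat(row["updated_at"]),
        }
        for row in reader
    ]


def staleness_at_load(records, load_time):
    """How old is each change by the time the batch finally loads it?"""
    return {record["id"]: load_time - record["updated_at"] for record in records}


source_rows = [
    {"id": 1, "amount": "19.99", "updated_at": "2024-01-01T09:00:00"},
    {"id": 2, "amount": "5.00", "updated_at": "2024-01-01T23:30:00"},
]
nightly_load = datetime(2024, 1, 2, 2, 0)  # assumed 02:00 batch window

target = load_from_carrier_file(extract_to_carrier_file(source_rows))
lag = staleness_at_load(target, nightly_load)
for record_id, delay in lag.items():
    print(record_id, delay)
```

A change made in the morning waits most of a day before the target sees it, and a change that misses the extract cut-off would wait for the following night's run.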
And on top of all that, any change in the source or target data structure has knock-on effects and requires change management throughout the entire pipeline to avoid failures. It is like throwing a small stone into a pond: the effects ripple throughout the entire ecosystem.
Data at Rest
Assuming you managed to load data into your target data platform, it is now sitting in a centralised system, often still hosted in self-managed data centres. Data is typically organised in rigid, relational schemas.
Those are well suited to hold data which was internally generated, predictable, structured, and in relatively small quantities, but are ill-suited for modern data requirements where data varies and is either semi-structured or entirely unstructured, for example digital assets like images.
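As a small illustration of what "rigid" means in practice, the sketch below uses Python's built-in sqlite3 module as a stand-in for a relational database. The table, the column names, and the `insert_row` helper are hypothetical. A record carrying one unexpected attribute is rejected until the schema itself is changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_row(table, row):
    """Insert a dict as a row; every key must already exist as a column."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )


insert_row("customer", {"id": 1, "name": "Ada"})

# A record with a new attribute does not fit the fixed schema.
new_record = {"id": 2, "name": "Grace", "chat_handle": "@grace"}
try:
    insert_row("customer", new_record)
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Only after a schema change (and its change management) does the record fit.
conn.execute("ALTER TABLE customer ADD COLUMN chat_handle TEXT")
insert_row("customer", new_record)
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, row_count)
```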
Data in Use
When applications finally get to use the data, a legacy data platform like this faces additional challenges. Because of the rigid schema and the dependency on batch loading, data is also processed and analysed in batches, working through the huge chunks that get piped into the system by the nightly ETL jobs. This limits the flexibility and accuracy of the data as well as of BI information and reports.
Accessing centrally organised data in a distributed application landscape adds challenges around latency, which are often only overcome by using “data accelerators” like operational caches holding copies of the data. Introducing another layer into the technology stack doesn’t only increase costs for hardware, software, and operations, but also increases complexity. And all of this just because the underlying database system wasn’t able to cope with the performance or availability requirements.
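The extra complexity of a cache layer can be sketched in a few lines. This is a toy model under stated assumptions: the `Database` and `ReadThroughCache` classes and the stock key are made up for illustration and do not represent any specific caching product.

```python
class Database:
    """Stand-in for the central system of record."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows[key]

    def put(self, key, value):
        self.rows[key] = value


class ReadThroughCache:
    """Operational cache that keeps its own copies of database values."""

    def __init__(self, database):
        self.database = database
        self.copies = {}

    def get(self, key):
        if key not in self.copies:
            self.copies[key] = self.database.get(key)
        return self.copies[key]

    def invalidate(self, key):
        self.copies.pop(key, None)


db = Database()
db.put("stock:bike-42", 3)
cache = ReadThroughCache(db)

first = cache.get("stock:bike-42")   # misses the cache, reads the database
second = cache.get("stock:bike-42")  # served from the copy, no database read

db.put("stock:bike-42", 0)           # the source of truth changes
stale = cache.get("stock:bike-42")   # the cache still serves the old copy

cache.invalidate("stock:bike-42")    # extra logic someone has to own
fresh = cache.get("stock:bike-42")
print(first, second, stale, fresh, db.reads)
```

The cache does reduce reads against the database, but it also introduces stale copies and invalidation logic that now has to be built, tested, and operated.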
Changing anything in a complex system like this is not an easy undertaking and often requires a lot of Change Management, affecting the frequency of releases and causing long release cycles with e.g. only quarterly or yearly releases being the norm in some organisations.
In this legacy world the 3 components of Information Management were fairly isolated from one another and we developed band-aid solutions to bridge the gaps. For example, we use complex enterprise service buses to pipe data in and out of a system and use non-native ODBC drivers or Object-relational mapping (ORM) to allow applications to access data in the database. To achieve local replication and high availability, most databases require additional bolt-ons or costly 3rd party solutions, as the original relational database systems were simply not designed with distribution in mind.
SUMMARY
Data in Motion: Coarse grained, slow, complicated
Data at Rest: Structured, inflexible, centralized and expensive
Data in Use: Feature backlogs, Batch everything, Long release cycles, accelerating technologies (caches, engineered systems)
(This is the anti-pattern for the 4 MongoDB value drivers)
High TCO
Slow time to value
Increased risk
Not using data or technology for competitive advantage
As we have seen, there are lots of challenges with a legacy platform like the one I described earlier. But what could it look like if you were to adopt a more modern approach to Information Management? How can a database like MongoDB unlock the potential for full stack modernisation and innovation?
Data in Use
Applications today are often real-time in nature, for example real-time stock level checks in an online shop or location-based searches for available bikes in the Santander bike scheme. Data needs to be available without unnecessary processing delays.
From an infrastructure perspective, the shift towards cloud-based infrastructure and services is a trend which will only increase over the coming years. Any technology now needs to be fully cloud native and truly elastically scalable to give your business the flexibility it needs to provide efficient and effective IT services.
And on top of those trends sit organisation and process changes like agile development and continuous delivery in a devops model. Releases happen on a weekly, daily, or even hourly basis and in small, isolated chunks. Increased business agility is crucial to keep a competitive edge.
Data in Motion
But not only the use of data has changed, we are encountering similar fundamental changes with regards to moving data around. To enable real-time data consumption for applications, we need to be able to load data in real-time as well. Stream processing enables just that, for example for real-time analytics on customer behaviour and predictive analytics.
So instead of looking at data in a coarse-grained way and in large batches, we are moving data around at record level, for example via a messaging system like Kafka or RabbitMQ.
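A minimal sketch of record-level transfer follows. It deliberately does not use the Kafka or RabbitMQ client APIs: Python's standard `queue.Queue` stands in for the messaging system, and the event shape (`sku`, `delta`) is a hypothetical example. The point is only that each record is applied as soon as it arrives rather than waiting for a batch.

```python
import queue
import threading

END_OF_STREAM = None  # marker telling the consumer to stop


def producer(events, channel):
    """Publish each change as its own message as soon as it happens."""
    for event in events:
        channel.put(event)
    channel.put(END_OF_STREAM)


def consumer(channel, stock):
    """Apply every record to the target the moment it arrives."""
    while True:
        event = channel.get()
        if event is END_OF_STREAM:
            break
        stock[event["sku"]] = stock.get(event["sku"], 0) + event["delta"]


events = [
    {"sku": "bike", "delta": 5},
    {"sku": "bike", "delta": -2},
    {"sku": "helmet", "delta": 1},
]

channel = queue.Queue()  # stand-in for a message broker topic or queue
stock = {}

worker = threading.Thread(target=consumer, args=(channel, stock))
worker.start()
producer(events, channel)
worker.join()
print(stock)
```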
And instead of adding complexity by replicating data for local high availability and resiliency using 3rd party tools, we need systems with built-in replication and self-healing features like automated failover in case of server or network problems.
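The idea of built-in replication with automated failover can be illustrated with a toy model. This is a simplified sketch, not how MongoDB or any real database elects a primary: the `Node` and `ReplicaSet` classes, the node names, and the "first healthy node wins" rule are all assumptions made for illustration.

```python
class Node:
    """One server holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True


class ReplicaSet:
    """Toy replica set: writes are copied to every healthy node."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]
        self.primary = self.nodes[0]

    def fail_over(self):
        """Promote the first healthy node (a deliberately naive rule)."""
        candidates = [node for node in self.nodes if node.healthy]
        if not candidates:
            raise RuntimeError("no healthy node available")
        self.primary = candidates[0]

    def write(self, key, value):
        if not self.primary.healthy:
            self.fail_over()
        for node in self.nodes:
            if node.healthy:
                node.data[key] = value

    def read(self, key):
        if not self.primary.healthy:
            self.fail_over()
        return self.primary.data[key]


replica_set = ReplicaSet(["node-a", "node-b", "node-c"])
replica_set.write("order:1", "paid")

replica_set.nodes[0].healthy = False  # simulate a server failure

value_after_failure = replica_set.read("order:1")
new_primary = replica_set.primary.name
print(value_after_failure, new_primary)
```

The application keeps reading and writing through the same interface while the failure is handled underneath it.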
A modern data platform now brings Data in Use and Data in Motion together, which enables modern architecture patterns like micro services and the breaking up of monolithic architectures.
Data at Rest
And as a 3rd point, how we handle data at rest has also changed. We require true flexibility and a data agnostic architecture that can handle structured as well as semi-structured and unstructured data equally well in a single system.
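To show what handling differently shaped data in a single system can look like, here is a minimal in-memory sketch. It is not the MongoDB driver API: the `insert` and `find` helpers, the plain list acting as a collection, and the example documents are all hypothetical.

```python
import json

collection = []  # stand-in for one collection in a document store


def insert(document):
    """Store a JSON-style copy of the document; no schema is declared up front."""
    collection.append(json.loads(json.dumps(document)))


def find(**criteria):
    """Return documents whose fields match all of the given values."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Three documents with three different shapes live side by side.
insert({"type": "customer", "name": "Ada", "email": "ada@example.com"})
insert({"type": "customer", "name": "Grace", "channels": ["chat", "mobile"]})
insert({"type": "image", "filename": "bike.jpg", "tags": ["bike", "red"]})

customers = find(type="customer")
images = find(type="image")
print(len(customers), images[0]["filename"])
```

A new attribute, such as `channels` on the second customer, needs no schema migration. The application simply starts writing it.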
All this is deployed in a flexible way: on-premises, on bare metal, virtualized, in the cloud, or in containers. It doesn’t matter. Our new, distributed data platform is accessed using native drivers, making it easy and fast for developers to build applications without the added complexity of legacy components like ORM.
We believe MongoDB is the only modern data platform that truly brings all 3 of these domains together successfully, providing you with a multi-use-case operational database that has been deployed in startups, in enterprises like Barclays, HSBC, and comparethemarket.com, and with public sector clients like HMRC or the Met Office.
SUMMARY:
Data in motion: fine grained, streaming, built-in replication
Data at rest: All Data native, All deployment native
Data in Use: no feature backlog, continuous delivery, no barriers to using data in motion or at rest
Enables:
Low TCO
Faster time to value
Reduced risk
Using data or technology for competitive advantage