The document discusses moving legacy data and workloads from traditional data warehouses to Hadoop. It describes how ELT processes on dormant data waste resources and how offloading this data to Hadoop can optimize costs and performance. The presentation includes a demonstration of using Tableau for self-service analytics on data in Hadoop and a case study of a financial organization reducing ELT development time from weeks to hours by offloading mainframe data to Hadoop.
Make agility clearer on this slide. Add something here about security/compliance as well.
Our data center footprint is global, spanning 5 continents with highly redundant clusters of data centers in each region. Our footprint is expanding continuously as we increase capacity, redundancy and add locations to meet the needs of our customers around the world.
TODO: add Infa and data movement into this slide. Put apps into the enterprise side, add a layer for Mercator / SFDC as another block on this diagram
A big reason people are moving so fast to the cloud is the breadth of services, features and geographic coverage AWS has
If you want to build new businesses from scratch or move some or all workloads to the cloud, you need a broad array of services and features to make this happen without having to piece it together yourself
Today, we’re extending these instance families further.
HS1 instance family which will double the number of vCPU threads
Increase storage throughput performance from 2.6 to 3.6 gigabits per second.
R3 instance family. R3 instances feature an 8:1 memory to CPU ratio, with up to 244 GiB of RAM, fast SSD-based local storage and enhanced networking.
R3 instances replace the M2 and CR1 instances, focusing on memory-optimized use cases.
R3 offers more instance sizes, up to 244 GiB of RAM, with around 27% faster memory than M2 based on STREAM performance.
Start an EMR cluster using the console or CLI tools
Master instance group created that controls the cluster
Core instance group created for life of cluster
Core instances run DataNode and TaskTracker daemons
Optional task instances can be added or removed to perform work (these can be Spot Instances)
S3 can be used as underlying ‘file system’ for input/output data
Master node coordinates distribution of work and manages cluster state
Core and Task instances read from and write to S3
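The cluster layout described in these steps can be sketched as the instance-group configuration that boto3's EMR `run_job_flow` call accepts in its `Instances` argument. This is a minimal sketch rather than a definitive setup: the helper name `build_emr_instances`, the instance type, the counts and the Spot bid price are all assumptions chosen for illustration. The code only builds the configuration and does not call AWS.

```python
def build_emr_instances(core_count=2, task_count=4, task_bid_price="0.10",
                        instance_type="m3.xlarge"):
    """Build an `Instances` dict shaped like the one boto3's EMR
    `run_job_flow` expects. All default values are illustrative assumptions."""
    groups = [
        {
            # One master instance group that controls the cluster.
            "Name": "master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            # Core instance group that lives for the life of the cluster.
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        # Optional task group, here requested on the Spot market.
        groups.append({
            "Name": "task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "BidPrice": task_bid_price,  # hypothetical bid, in USD per hour
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        })
    return {
        "InstanceGroups": groups,
        # Keep the cluster running after its steps finish.
        "KeepJobFlowAliveWhenNoSteps": True,
    }


if __name__ == "__main__":
    for group in build_emr_instances()["InstanceGroups"]:
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

To launch a cluster, the returned dict would be passed as `Instances=` to `run_job_flow` on a boto3 EMR client, together with a cluster name, a release label and the IAM roles for your account. Input and output locations on S3 are then supplied per step as `s3://` paths.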
As we’ve seen, AWS allows you to instantly provision a great platform to manage and process large amounts of data, with and without Hadoop. However, this is just part of the story. Without the right tools, collecting, processing and distributing data for valuable analytics requires either manual coding or writing hundreds of lines of SQL and, in the case of Hadoop, even Java, Pig, HiveQL, and more.
That’s why we developed Ironcluster. These are the first and only pure-play ETL solutions available on the AWS Marketplace, so you can instantly deploy a full-featured ETL environment to collect, process and distribute data in the cloud.
Ironcluster ETL, Amazon EC2 Edition allows you to instantly provision a full-featured ETL environment running on Amazon Elastic Compute Cloud (Amazon EC2). Ironcluster ETL takes away the complexity of data integration, delivering a much more agile ETL environment with the capacity you need, when you need it. No hardware to procure, no software licenses to buy.
Ironcluster Hadoop ETL runs natively within your Amazon EMR cluster, allowing you to leverage the massive scalability and performance of Hadoop in the cloud.
Both Ironcluster ETL and Ironcluster Hadoop ETL are available on the AWS Marketplace, this means
Let me tell you a bit about each…
Complete Customer Quote from Greg Sokol, Data Warehouse Architect, ModCloth, an early Ironcluster user.
“We needed an easy to install and upgrade, high-performance, lightweight ETL product that works well in the cloud with Amazon Web Services,”…“Ironcluster ETL has served as a great product given our requirements and priorities, helping us take full advantage of the cost and efficiency benefits we achieve with cloud computing as part of our data management architecture.”
Then Hadoop
First roadblock – How do you stand up your Hadoop cluster?
Solution -> Now you have it!
Second: -> Now What?
A bit more detail about Hadoop
The first and only ETL tool for Amazon EMR
GUI
Use Case Accelerators
Price point
FREE VERSION
Fully integrated Hadoop ETL – Smarter architecture – no code generation
Faster time to deployment
And lower costs
We’re part of the AWS marketplace
You don’t have to buy your license – we’re integrated into AWS marketplace for Amazon EMR
AWS
Marketplace
Partner network logo
Free online support for the free version
World-class support
Free online for free version
Personal support for paid version
In the end, it’s all about the insights you can get from your data, and we know people love data discovery and visualization tools.
The good news is you can use Syncsort DMX-h with the leading BI tool of your choice. I specifically wanted to mention Tableau, since they are one of our strategic partners and we just released a fully integrated connector that allows you to create a Tableau data extract file directly from our interface.
You simply select Tableau as the target and it will generate the TDE file. There is no need to install any additional software, since we include the Tableau API.
Now from the business perspective there are benefits too….