An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017

Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.

  1. Introduction to Apache Ignite and GridGain. Mandhir Gidda. Rome, 24-25 March 2017
  2. In-Memory Computing Platform Built on [Apache Ignite logo]. Mandhir Gidda, EMEA Solutions Architect. © 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation.
  3. Agenda
      • GridGain & Apache Ignite Project
      • Ignite In-Memory Computing Platform
      • Introduction to Clustering
      • Data Grid
      • Compute Grid, Service Grid & Streaming
      • Hadoop & Spark Integration
      • GridGain Roadmap
      • Q & A
  4. What is Apache Ignite? A high-performance distributed in-memory platform for computing and transacting on large-scale data sets in near real time. © 2016 GridGain Systems, Inc.
  5. Apache Ignite Project - Recap
      • 2007: First version of GridGain (compute grid)
      • Oct. 2014: GridGain contributes Ignite to the ASF
      • Aug. 2015: Ignite is the second-fastest project to graduate, after Spark
      • Today (February 2017) vs. Feb. 2016:
        – 28 more contributors: 88+ contributors
        – Huge development momentum: an estimated 248 years of effort since the first commit in February 2014, vs. 192 years last February [Openhub]
        – 200k more SLOC and 2.5k more commits: 900k+ SLOC and more than 18.5k commits
  6. What is GridGain Enterprise Edition?
      • A binary build of Apache Ignite™ created by GridGain
      • Adds enterprise features for enterprise deployments
      • Receives features and bug fixes a few weeks earlier
      • Heavily tested
  7. Customer Use Cases
      • Automated Trading Systems: real-time analysis of trading positions & market risk. High-volume transactions, ultra-low latencies.
      • Financial Services: fraud detection, risk analysis, insurance rating and modelling.
      • Online & Mobile Advertising: stream processing, geo-targeting, personalisation & segmentation.
      • Big Data/Visual Analytics: customer 360 view, real-time analysis of KPIs, up-to-the-second operational BI.
      • Online Gaming: real-time back-ends for mobile and massively parallel games.
      • SaaS Platforms & Apps: high-performance next-generation architectures for Software as a Service application vendors.
      • Travel & E-Commerce: high-performance next-generation architectures for online hotel booking.
  8. What In-Memory Capabilities are Supported?
      ‣ HPC ‣ Machine learning ‣ Risk analysis ‣ Grid computing ‣ HA API Services ‣ Scalable Middleware ‣ Web-session clustering ‣ Distributed caching ‣ In-Memory SQL ‣ Real-time Analytics ‣ Big Data ‣ Monitoring tools ‣ Big Data ‣ Real-time Analytics ‣ Batch processing ‣ Distributed In-Memory File System ‣ Node2Node & Topic-based Messaging ‣ Fault Tolerance ‣ Multiple backups ‣ Cluster groups ‣ Auto Rebalancing ‣ Complex event processing ‣ Event-driven design ‣ Distributed queues ‣ Atomic variables ‣ Distributed semaphore
  9. Introduction to Clustering
  10. Definitions and Terminology
      • An Ignite cluster is a group of Ignite nodes working together to accomplish tasks like distributed compute and caching.
      • An Ignite node is a single Ignite process running in a JVM.
      • Many Ignite nodes can live on one physical server or JVM.
      • Ignite nodes can be Clients or Servers.
      • Nodes can be named and logically grouped into Cluster Groups.
      [Diagram: servers/VMs/containers, each hosting one or more JVMs that run Ignite nodes]
  11. Ignite Clustering
      • Shared-nothing architecture involves multiple identical nodes forming a cluster with no single master or coordinator.
      • All nodes in a shared-nothing cluster run the exact same processes.
      • Nodes communicate peer to peer using message passing.
      • Unlike Master-Slave, there is no single point of failure or bottleneck = linearly scalable & highly available.
      • Operational efficiency at scale.
      [Diagram: four servers/VMs/containers, each with a JVM running an Ignite node]
  12. Ignite Clients & Servers
      • An Ignite node can be started as a client or a server.
      • Server nodes participate in caching and computations.
      • Client nodes can also participate in computations.
      • Client nodes are used for Ignite API operations from the client side, such as cache operations, issuing SQL, transactions, and data streaming.
      [Diagram: server nodes (data & compute nodes) and client nodes (client connectors)]
  13. High-level Architecture
      • Distributed Key-Value Store
      • Pluggable SPI Design
        – e.g. FailoverSPI, LoadBalancerSPI
      • Fault Tolerance and Scalability
        – CacheMode, DiscoverySPI
      • SQL Queries (ANSI 99)
      • ACID Transactions
      • In-Memory Indexes
      • RDBMS / NoSQL Integration
      • 100% JCache Compliant (JSR 107)
  14. In-Memory Data Grid
  15. Data Grid: Cache Modes & Horizontal Scaling (Replicated Cache)
      • All data is replicated to each node
      • Highest-availability topology
      • Every data update is propagated to all nodes, so scaling can become an issue
      • Best for scenarios where the dataset is small and read activity is high (> 80%)
  16. Data Grid: Cache Modes & Horizontal Scaling (Partitioned Cache)
      • Most scalable distributed topology
      • The dataset is divided into partitions and distributed equally amongst nodes (x TB)
      • Updates are cheap: 1x primary partition and optionally 1+ backup partitions
      • Reads are expensive if data is not collocated
      • Near-Cache use is more relevant
      • Best for large datasets and frequent updates
  17. Data Grid: Off-Heap Memory
      • Vertical Scale
        – OnHeap, OffHeap, OffHeap_Values, Swap Space
      • Avoid Java GC pauses
      • Off-Heap Indexes
        – OffHeap, OffHeap_Values
      • Full RAM Utilization
      • I/O + Network Efficiency
      • OffHeap to disk or network -> zero copies. OS optimized
      • Simple Configuration
  18. Data Grid: External Persistence
      • Read-through & Write-through
      • Support for Write-behind
      • Configurable eviction policies
        – LRU, FIFO, Sorted, TTL
      • DB schema mapping wizard:
        – Generates all the XML configuration and Java POJOs
  19. Data Grid: Cache APIs
      • Predicate-based Scan Queries
      • Text Queries based on Lucene indexing
      • Query configuration using annotations, Spring XML or simple Java code
      • SQL Queries: ANSI-99 Compliant
      • Memcached (PHP, Java, Python, Ruby)
      • HTTP REST API
      • JDBC & ODBC
  20. Data Grid: SQL Support (ANSI 99)
      • ANSI-99 SQL
      • In-Memory Indexes (On- and Off-Heap)
      • Automatic Group By, Aggregations, Sorting
      • Cross-Cache Joins, Unions
      • Uses the local H2 engine
  21. In-Memory SQL Grid (a.k.a. IMDB). © 2017 GridGain Systems, Inc. GridGain Company Confidential
      • JDBC and ODBC as a connection point
      • SQL for data access
      • DML for data modification
      • DDL for cache and index management
      • Advantages
        – GridGain as a distributed SQL-based storage
        – No need to rewrite application logic
        – Support for a variety of tools and languages
        – Ad-hoc queries (GeoSpatial)
      • Available in Apache Ignite (open source)
      [Diagram: Application Layer; GridGain Cluster (ANSI SQL-99, DML, DDL, ODBC/JDBC, ACID Transactions); Data Layer (Persistent Storage)]
  22. Data Grid: Transactions
      • Fully ACID
      • Support for Transactional & Atomic atomicity modes
      • Cross-cache transactions
      • Optimistic and Pessimistic concurrency modes with multiple isolation levels
      • Deadlock protection (Serializable)
      • JTA Integration
  23. Distributed Java Structures: use of java.util.concurrent
      • Distributed Map (cache)
      • Distributed Set
      • Distributed Queue
      • CountDownLatch
      • AtomicLong
      • AtomicSequence
      • AtomicReference
      • Distributed ExecutorService
  24. Continuous Queries
      • Execute a query and get notified of data changes that match the filter
      • A remote filter evaluates each event and a local listener receives the notification
      • Guarantees exactly-once delivery of an event
  25. Streaming and CEP
      • IgniteDataStreamer - Collocation & Indexing, millions of objects/sec
      • StreamReceiver/StreamVisitor
      • Sliding Windows for CEP/Continuous Query
      • JMS, Kafka, MQTT, Flume, Camel data streamer integrations
      • Real-time visual analytics/BI
  26. Event Processing using Ignite (SEDA)
      • Create chains of event processors & transform an object through various states
      • Synchronous or asynchronous execution of remote filters & listeners with thread control
      [Diagram: Payment Validator, Payment Verifier, Payment Processor, Ignite Cache]
  27. Event Processing using Ignite
  28. Data Grid: Tiered Memory & Local Store
      • Tiered Memory
        – On-Heap -> Off-Heap -> Swap (volatile)
      • Persistent On-Disk Store
      • Fast Recovery
      • Local Data Reload
      • Eliminate network and DB impacts when reloading the in-memory store
  29. Ultimate Edition: GridGain Disk Storage
  30. Current GridGain Model
      • Application Layer: Mobile, Cloud/SaaS, Social, IoT, Enterprise Applications
      • In-Memory Computing Layer: Data Grid, Compute Grid, Streaming, Service Grid, ANSI SQL-99, File System, Spark Integration, Unified API, ACID Transactions
      • Data Layer: RDBMS, NoSQL, Hadoop
  31. Current GridGain Model (with disk storage)
      • Application Layer: Mobile, Cloud/SaaS, Social, IoT, Enterprise Applications
      • In-Memory Computing Layer: Data Grid, Compute Grid, Streaming, Service Grid, ANSI SQL-99, File System, DML/DDL, Spark Integration, Unified API, ACID Transactions
      • Data Layer: RDBMS, NoSQL, Hadoop
      • Data Layer: Disk Storage
  32. Ultimate Edition: Feature Set
      • All the features of Enterprise Edition, including
        – Persistent Data Store
        – Ability to query memory and disk together
        – Instantaneous Restarts
        – Full and Incremental Cluster Snapshots
        – Point-In-Time Recovery
      • A special license is required
  33. Data Grid: DC Replication
      • Multiple (up to 32) Data Centres
      • Complex Replication Technologies
      • Active-Active & Active-Passive
      • Smart Conflict Resolution
      • Durable Persistent Queues
      • Automatic Throttling
      • GridGain Enterprise
  34. In-Memory Compute Grid & the rest
  35. Client-Server vs. Affinity Colocation
      • Client-Server (Client Node, Data Node 1, Data Node 2, Processing Node 1):
        1. Initial request
        2. Fetch data from remote nodes
        3. Process entire data-set
        4. Return to client
      • Affinity Colocation (Processing Node 1 with Data 1 and Job 1, Processing Node 2 with Data 2 and Job 2):
        1. Initial request
        2. Co-locate processing with data
        3. Return partial result
        4. Reduce & return to client
  36. In-Memory Compute Grid
      • Direct API for MapReduce (ComputeTask: map(), result(), reduce())
      • Cron-like Task Scheduling
      • State Checkpoints
      • Load Balancing
        – Round-robin
        – Random & weighted
        – Probe
      • Automatic Failover - FailoverSPI
      • Per-node Shared State
      • Zero Deployment
        – P2P distributed class loading
        – Inter-node bytecode transfer
  37. In-Memory Compute Grid
      • Distributed Closures
      • Java lambda expressions
      • Java Runnable(s)/Callable(s)
      • ExecutorService (JDK)
      • Distributed, Fault Tolerant, Load Balanced
      • Sync or Async
      • Task Deployment (GAR)
  38. Messaging & Events
      • Topic-based messaging
      • Ordered & Unordered messages
      • Local & Remote message listeners
      • Local & Remote event listeners
      • Trigger actions from any cluster events or operations
      • Query events via IgniteEvents API
      • Logging all events is not a best practice
  39. In-Memory Service Grid
      • Deploy arbitrary user-defined services on the cluster
      • Control (SLA) how many service instances are deployed on each cluster or node
      • Automatically ensure deployment and fault tolerance
      • Singletons on the Cluster
        – Cluster Singleton
        – Node Singleton
        – Key Singleton
      • Guaranteed Availability
        – Auto Redeployment in Case of Failures
  40. In-Memory Service Grid
      • Resilience - Build an in-memory resilient service layer between your client application and the grid
      • Shielding - Only expose application APIs and not direct grid APIs
      • Continuations - Call services internally via compute tasks to create service chains
  41. Hadoop & Spark Integration
  42. IGFS: In-Memory File System
      • Ignite In-Memory File System (IGFS)
        – Hadoop-compliant
        – Easy to Install
        – On-Heap and Off-Heap
        – Caching Layer for HDFS
        – Write-through and Read-through HDFS
        – Any Hadoop distribution
        – Performance Boost
      [Diagram: MR, Hive and Pig on top of In-Memory MapReduce and IGFS, over HDFS and YARN; any Hadoop distro]
  43. Hadoop Accelerator: Map Reduce
      • In-Memory Performance
      • Zero Code Change
      • Use existing MR code
      • Use existing Hive queries
      • No Name Node
      • No Network Noise
      • In-Process Data Colocation
      • Eager Push Scheduling
      [Diagram: Hadoop path (Hadoop Client, Jobtracker, Name Node, Tasktrackers, HDFS Data Nodes) vs. Ignite path (Ignite Client, Ignite Data Nodes with IGFS) from a user application]
  44. Spark & Ignite Integration
      • IgniteRDD
        – Share RDD across jobs on the host
        – Share RDD across jobs in the application
        – Share RDD globally
      • Faster SQL
        – In-Memory Indexes
        – SQL on top of Shared RDD
      [Diagram: servers each running a Spark worker with Spark jobs and an Ignite node, sharing in-memory RDDs; YARN, Mesos, Docker, Cloud]
  45. Deployment
      • Docker
      • Amazon AWS
      • Azure Marketplace
      • Google Cloud
      • Apache JClouds
      • Mesos
      • YARN
      • Apache Karaf (OSGi)
  46. Thank You! Thank you for joining us. Follow the conversation: www.gridgain.com, https://ignite.apache.org, @gridgain, #gridgain. Author: Mandhir Gidda
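
The replicated and partitioned cache modes on slides 15 and 16 can be illustrated with a toy model. The sketch below is plain Python and not Ignite's API: the class names, the MD5-based key-to-partition mapping and the round-robin owner assignment are all assumptions made for illustration (Ignite uses its own affinity function).

```python
import hashlib


def partition_for(key, partitions):
    """Map a key to a partition with a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def owners(partition, nodes, backups):
    """Return [primary, backup, ...] for a partition by walking the node list round-robin."""
    count = min(len(nodes), backups + 1)
    return [nodes[(partition + i) % len(nodes)] for i in range(count)]


class PartitionedCache:
    """Toy partitioned cache: each key lives on one primary node plus `backups` copies."""

    def __init__(self, nodes, partitions=16, backups=1):
        self.nodes = list(nodes)
        self.partitions = partitions
        self.backups = backups
        self.storage = {node: {} for node in self.nodes}  # node name -> local entries

    def put(self, key, value):
        part = partition_for(key, self.partitions)
        for node in owners(part, self.nodes, self.backups):
            self.storage[node][key] = value

    def get(self, key):
        part = partition_for(key, self.partitions)
        primary = owners(part, self.nodes, self.backups)[0]
        return self.storage[primary].get(key)


class ReplicatedCache(PartitionedCache):
    """Toy replicated cache: every node holds a copy of every entry."""

    def __init__(self, nodes):
        super().__init__(nodes, partitions=1, backups=len(nodes) - 1)
```

In this model an update to the partitioned cache touches `backups + 1` nodes regardless of cluster size, while an update to the replicated cache touches every node, which is the scaling trade-off the two slides describe.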
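
Slide 18 lists read-through, write-through and eviction policies for external persistence. A minimal sketch of how these three ideas fit together follows. It is plain Python rather than Ignite's API: the `ReadWriteThroughCache` class is hypothetical, and a dictionary stands in for the backing database.

```python
from collections import OrderedDict


class ReadWriteThroughCache:
    """Toy cache in front of a backing store, with LRU eviction."""

    def __init__(self, store, max_entries):
        self.store = store              # dict standing in for an RDBMS or NoSQL store
        self.max_entries = max_entries
        self.entries = OrderedDict()    # least recently used entry comes first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as recently used
            return self.entries[key]
        value = self.store.get(key)         # read-through: load on a cache miss
        if value is not None:
            self._remember(key, value)
        return value

    def put(self, key, value):
        self.store[key] = value             # write-through: update the store synchronously
        self._remember(key, value)

    def _remember(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

Eviction only drops the in-memory copy; the entry remains in the backing store and is loaded again on the next read. A write-behind variant would queue the store update in `put` and flush it asynchronously instead of writing immediately.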
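
The continuous query mechanism on slide 24 pairs a remote filter with a local listener. The sketch below models that flow in a single process. It is plain Python and not Ignite's API: the `ObservableCache` class and its method names are hypothetical, and it does not attempt to model the exactly-once delivery guarantee across nodes.

```python
class ObservableCache:
    """Toy cache that notifies registered listeners about matching updates."""

    def __init__(self):
        self.data = {}
        self.queries = []  # list of (remote_filter, local_listener) pairs

    def continuous_query(self, remote_filter, local_listener):
        # Initial query: report entries that already match the filter.
        for key, value in self.data.items():
            if remote_filter(key, value):
                local_listener(key, value)
        # Then keep the pair registered for future updates.
        self.queries.append((remote_filter, local_listener))

    def put(self, key, value):
        self.data[key] = value
        for remote_filter, local_listener in self.queries:
            # In a real cluster the filter would run on the node that owns the key,
            # so only matching events travel over the network to the listener.
            if remote_filter(key, value):
                local_listener(key, value)
```

Registering `lambda k, v: v > 40` as the filter, for example, means the listener only ever sees entries whose value exceeds 40, both for existing data and for later updates.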
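
The two request flows compared on slide 35 differ mainly in how much data crosses the network. A minimal sketch, in plain Python with hypothetical function names and each data node modelled as a list of numbers, makes the difference countable:

```python
def client_server_sum(data_nodes):
    """Client-server flow: fetch every value to the caller, then process the entire data set.

    Returns (result, number_of_items_moved_over_the_network).
    """
    fetched = [value for node in data_nodes for value in node]
    return sum(fetched), len(fetched)


def colocated_sum(data_nodes):
    """Affinity colocation flow: run a job where each data set lives, then reduce.

    Returns (result, number_of_items_moved_over_the_network).
    """
    partials = [sum(node) for node in data_nodes]  # map step, one partial result per node
    return sum(partials), len(partials)            # reduce step on the caller
```

Both functions return the same total, but the colocated version moves one partial result per node instead of every stored value.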

Editor's notes

  • Collection of logical in-memory components that solve high performance and scalability challenges
  • When it comes to querying and acting on data, including in Big Data/Fast Data environments, SQL still dominates. And no other database-agnostic in-memory solution handles SQL functionality like the Apache Ignite In-Memory Data Fabric.
  • Application-level soft locking using versioning
    Deadlock protection (Serializable)
  • Apache Ignite allows for most of the data structures from the java.util.concurrent framework to be used in a distributed fashion
  • ComputeTask with service injection
  • Direct API for Fork/Join
  • Affinity run of Spark jobs
  • Supports multiple infrastructures, including Google Cloud, AWS & Docker
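
The note on soft locking using versioning, together with the optimistic concurrency mode on slide 22, can be illustrated with a small version-check sketch. It is plain Python and not Ignite's transaction API: `VersionedStore` and `VersionConflict` are hypothetical names, and the model covers a single process only.

```python
class VersionConflict(Exception):
    """Raised when a key changed after the transaction read it."""


class VersionedStore:
    """Toy store where every entry carries a version number."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        # Missing keys report version 0 so they can still be validated at commit time.
        return self.data.get(key, (0, None))

    def commit(self, read_versions, writes):
        # Validation phase: no lock was held while working, so check that every key
        # still has the version the transaction saw when it read the key.
        for key, seen_version in read_versions.items():
            if self.read(key)[0] != seen_version:
                raise VersionConflict(key)
        # Write phase: apply the changes and bump each version.
        for key, value in writes.items():
            current_version = self.read(key)[0]
            self.data[key] = (current_version + 1, value)
```

If two transactions read the same version of a key, the first commit succeeds and the second raises `VersionConflict`, leaving the caller to retry. A pessimistic mode would instead take a lock at read time.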
