NoSQL Landscape and a Solution to Polyglot Persistence
2. © Impetus Technologies
Agenda
• Big Data Problems
• Transition from RDBMS to NoSQL
• NoSQL Landscape
• Challenges in transition
• Tools for NoSQL
• Kundera – an open source polyglot solution
Recorded version available at http://bit.ly/1hfz4Tn
4. © Impetus Technologies
Why not RDBMS?
• Scalability
  • Data volume in zettabytes, yottabytes
  • Horizontal scaling would be expensive
• Data format
  • Data format can be static or dynamic
  • Relational / Non-relational
• High availability
  • Data locality
  • No single point of failure
5. © Impetus Technologies
Non-RDBMS way
• Scale out instead of scale up
• Dynamic schema instead of static schema
• Decentralized instead of centralized
6. © Impetus Technologies
Introduction to NoSQL
“An approach to storing and retrieving data with horizontal scaling, simple design and high availability”
• Data-format-driven processing
• Distributed, with no single point of failure (SPOF)
• Thinking outside the SQL box
7. © Impetus Technologies
NoSQL: A Pragmatic Solution?
With NoSQL, data can be consistent and highly available, with no SPOF!
But not 100%!
8. © Impetus Technologies
CAP Theorem
• Consistency
• Availability
• Partition Tolerance
All three at once: N/A
10. © Impetus Technologies
Size
High data growth! Is scalability an issue?
Traditional RDBMS-based solutions will not work!
11. © Impetus Technologies
Velocity
Near-real-time / Big Data analytics
Parallel processing and a ready-for-read design are required
Traditional RDBMS solutions are not fast enough to meet the SLAs!
13. © Impetus Technologies
Format
Non-relational data formats.
Data sets differ in nature: graph-based, key-value-based access
Traditional databases are limited to static tables!
[Figure: logs]
15. © Impetus Technologies
Transition to NoSQL
• Datastore Selection
• API Exploration
• Landscape Understanding
• Implementation
16. © Impetus Technologies
Selecting a NoSQL Datastore
• Graph: Neo4j, Titan, Objectivity, OrientDB, VertexDB
• Columnar: Cassandra, HBase, Hypertable, BigTable
• Key-value: Oracle KV, Redis, CouchDB, Riak
• Document: MongoDB, Couchbase
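The grouping on this slide can be written down as a small lookup table. The following is only a sketch: `StoreFamily` and `DatastoreCatalog` are hypothetical names made up for illustration, and the assignments simply repeat the slide (CouchDB, for instance, is listed under key-value here although it is more commonly described as a document store).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** The four datastore families named on the slide. */
enum StoreFamily { GRAPH, COLUMNAR, KEY_VALUE, DOCUMENT }

/** Hypothetical catalog: maps each datastore on the slide to its family. */
final class DatastoreCatalog {
    // LinkedHashMap keeps the order in which the slide lists the stores.
    private static final Map<String, StoreFamily> FAMILY_BY_STORE = new LinkedHashMap<>();

    private static void register(StoreFamily family, String... stores) {
        for (String store : stores) {
            FAMILY_BY_STORE.put(store, family);
        }
    }

    static {
        register(StoreFamily.GRAPH, "Neo4j", "Titan", "Objectivity", "OrientDB", "VertexDB");
        register(StoreFamily.COLUMNAR, "Cassandra", "HBase", "Hypertable", "BigTable");
        register(StoreFamily.KEY_VALUE, "Oracle KV", "Redis", "CouchDB", "Riak");
        register(StoreFamily.DOCUMENT, "MongoDB", "Couchbase");
    }

    /** Returns all stores of one family, in slide order. */
    static List<String> storesIn(StoreFamily family) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, StoreFamily> entry : FAMILY_BY_STORE.entrySet()) {
            if (entry.getValue() == family) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Looks up the family of a store by name, ignoring case. */
    static Optional<StoreFamily> familyOf(String store) {
        for (Map.Entry<String, StoreFamily> entry : FAMILY_BY_STORE.entrySet()) {
            if (entry.getKey().equalsIgnoreCase(store)) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

For example, `DatastoreCatalog.familyOf("cassandra")` yields `COLUMNAR`, and `DatastoreCatalog.storesIn(StoreFamily.DOCUMENT)` lists MongoDB and Couchbase.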
17. © Impetus Technologies
High Level APIs
• Cassandra: Kundera, Hector, Easy-Cassandra, DataStax Java driver, Astyanax
• MongoDB: Kundera, Morphia, DataNucleus, Jongo, Spring Data
• Neo4j: Kundera, Spring Data Neo4j, Hibernate OGM
• HBase: DataNucleus, HBase API, Spring Data, Kundera
18. © Impetus Technologies
Hybrid Design
• Cassandra, HBase
• RDBMS
• Redis
• MongoDB, Couchbase
• Neo4j, Titan
• Hadoop, Spark
19. © Impetus Technologies
Bumpy Ride!
• Unlearn and learn new APIs!
• Index-based retrieval over multiple NoSQL data stores
• Atomic operations
• NoSQL world is still evolving; may need to explore among data stores
• Migration of existing production applications
• and many more…
20. © Impetus Technologies
One Stop Solution
Master key, possible?
Let’s explore!
21. © Impetus Technologies
Polyglot Way
• Migrating existing solutions
• Guarantee atomicity
• Switch databases
22. © Impetus Technologies
High Level Polyglot API
• Spring Data
• Kundera
Let’s implement it the JPA way!
23. © Impetus Technologies
Kundera to the Rescue!!
• Supports 8 data stores: Cassandra, HBase, MongoDB, Redis, Neo4j, Oracle NoSQL, CouchDB and any RDBMS
• CRUD / Strong Query Support
• Object Relationships Handling
• Datastore-Optimized Persistence and Query Approach
• Interceptors / Events / Caching
• Connection Pool / Fallback (Lucene) Indexing
• Flexibility
25. © Impetus Technologies
User Logs Sample App
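User Entity
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Embedded;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.OneToMany;
import javax.persistence.Table;
import com.impetus.kundera.index.Index;
import com.impetus.kundera.index.IndexCollection;

@Entity
@Table(name = "user")
@IndexCollection(columns = { @Index(name = "emailId") })
public class User {
    @Id
    @Column(name = "user_id")
    private String userId;
    @Column(name = "first_name")
    private String firstName;
    @Column(name = "last_name")
    private String lastName;
    @Column(name = "emailId")
    private String emailId;
    // One user owns many log entries; they are loaded lazily.
    @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id")
    private Set<UserLogs> logs;
    @Embedded
    private PersonalDetail personalDetail;
    public User() {
        // Default constructor.
    }
    // Setters and getters
}

UserLogs Entity
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
import com.impetus.kundera.index.Index;

@Entity
@Table(name = "logs")
@Index(columns = { "body", "created_at" }, index = true)
public class UserLogs {
    @Id
    @Column(name = "log_id")
    private String logId;
    @Column(name = "body")
    private String body;
    @Column(name = "created_at")
    @Temporal(TemporalType.DATE)
    private Date createdDate;
    public UserLogs() {
        // Default constructor.
    }
    // Setters and getters
}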
26. © Impetus Technologies
User Logs Sample App
Configuration: persistence.xml
<!-- Persistence unit for Cassandra persistence -->
<persistence-unit name="logCassandra">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <class>com.impetus.kvapps.entities.UserLogs</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="kundera.nodes" value="localhost" />
        <property name="kundera.port" value="9160" />
        <property name="kundera.keyspace" value="userstore" />
        <property name="kundera.dialect" value="cassandra" />
        <property name="kundera.client.lookup.class"
            value="com.impetus.client.cassandra.thrift.ThriftClientFactory" />
        <property name="kundera.ddl.auto.prepare" value="create" />
        <property name="index.home.dir" value="lucene" />
    </properties>
</persistence-unit>
<!-- Persistence unit for MySQL persistence -->
<persistence-unit name="logRdbms">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <class>com.impetus.kvapps.entities.User</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="kundera.client.lookup.class"
            value="com.impetus.client.rdbms.RDBMSClientFactory" />
        <property name="hibernate.hbm2ddl.auto" value="create" />
        <property name="hibernate.show_sql" value="false" />
        <property name="hibernate.format_sql" value="false" />
        <property name="hibernate.dialect"
            value="org.hibernate.dialect.MySQL5Dialect" />
        <property name="hibernate.connection.driver_class"
            value="com.mysql.jdbc.Driver" />
        <property name="hibernate.connection.url"
            value="jdbc:mysql://localhost:3306/userstore" />
        <property name="hibernate.connection.username" value="root" />
        <property name="hibernate.connection.password" value="root" />
    </properties>
</persistence-unit>
27. © Impetus Technologies
Switching Data stores
Configuration: persistence.xml
<!-- Persistence unit for Cassandra persistence -->
<persistence-unit name="logCassandra">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <class>com.impetus.kvapps.entities.UserLogs</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="kundera.nodes" value="localhost" />
        <property name="kundera.port" value="9160" />
        <property name="kundera.keyspace" value="userstore" />
        <property name="kundera.dialect" value="cassandra" />
        <property name="kundera.client.lookup.class"
            value="com.impetus.client.cassandra.thrift.ThriftClientFactory" />
        <property name="kundera.ddl.auto.prepare" value="create" />
        <property name="index.home.dir" value="lucene" />
    </properties>
</persistence-unit>
<!-- Persistence unit for Mongo persistence -->
<persistence-unit name="logMongo">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <class>com.impetus.kvapps.entities.User</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="kundera.nodes" value="localhost" />
        <property name="kundera.port" value="27017" />
        <property name="kundera.keyspace" value="userstore" />
        <property name="kundera.dialect" value="mongodb" />
        <property name="kundera.client.lookup.class"
            value="com.impetus.client.mongodb.MongoDBClientFactory" />
        <property name="kundera.ddl.auto.prepare" value="create" />
    </properties>
</persistence-unit>
Persist Data
// Create one entity manager factory spanning both persistence units.
// "properties" is a Map of optional overrides defined elsewhere.
EntityManagerFactory emf = Persistence.createEntityManagerFactory("logCassandra,logMongo", properties);
EntityManager em = emf.createEntityManager();
…..
em.persist(user);
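The idea behind passing several persistence units to one entity manager can be sketched in plain Java: each entity class is mapped to one store, and the calling code stays the same when the mapping changes. This is not Kundera's implementation. `InMemoryStore`, `PolyglotEntityManager`, `SampleUser` and `SampleLog` are hypothetical stand-ins, and the stores are in-memory maps rather than real Cassandra or MongoDB clients.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for one datastore; a real client would talk to Cassandra, MongoDB, etc. */
final class InMemoryStore {
    private final String name;
    private final Map<Object, Object> rows = new HashMap<>();

    InMemoryStore(String name) { this.name = name; }

    String name() { return name; }
    void put(Object id, Object entity) { rows.put(id, entity); }
    Object get(Object id) { return rows.get(id); }
    int size() { return rows.size(); }
}

/** Hypothetical facade that routes each entity class to the store mapped for it. */
final class PolyglotEntityManager {
    private final Map<Class<?>, InMemoryStore> storeByEntity = new LinkedHashMap<>();

    /** Plays the role of the class entry in a persistence unit. */
    PolyglotEntityManager map(Class<?> entityClass, InMemoryStore store) {
        storeByEntity.put(entityClass, store);
        return this;
    }

    void persist(Object id, Object entity) {
        storeFor(entity.getClass()).put(id, entity);
    }

    <T> T find(Class<T> entityClass, Object id) {
        return entityClass.cast(storeFor(entityClass).get(id));
    }

    private InMemoryStore storeFor(Class<?> entityClass) {
        InMemoryStore store = storeByEntity.get(entityClass);
        if (store == null) {
            throw new IllegalArgumentException("No store mapped for " + entityClass.getName());
        }
        return store;
    }
}

/** Minimal illustrative entities (not the slide's annotated classes). */
final class SampleUser {
    final String userId;
    final String emailId;
    SampleUser(String userId, String emailId) { this.userId = userId; this.emailId = emailId; }
}

final class SampleLog {
    final String logId;
    final String body;
    SampleLog(String logId, String body) { this.logId = logId; this.body = body; }
}

final class PolyglotDemo {
    public static void main(String[] args) {
        InMemoryStore cassandra = new InMemoryStore("logCassandra");
        InMemoryStore mongo = new InMemoryStore("logMongo");

        // Switching a data store means changing only this mapping.
        PolyglotEntityManager em = new PolyglotEntityManager()
                .map(SampleLog.class, cassandra)
                .map(SampleUser.class, mongo);

        em.persist("u1", new SampleUser("u1", "a@example.com"));
        em.persist("l1", new SampleLog("l1", "logged in"));

        System.out.println(mongo.name() + " holds " + mongo.size() + " entity");
        System.out.println(cassandra.name() + " holds " + cassandra.size() + " entity");
    }
}
```

Moving `SampleUser` to another store would mean changing only the `map(...)` call, which mirrors swapping the RDBMS persistence unit for the Mongo one in the configuration.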
29. © Impetus Technologies
Technical Challenges Addressed!
• Distributed indexing over multiple NoSQL databases, e.g. Solr, Elasticsearch
  • Plug in a Kundera-powered ES or Lucene indexer
  • Build your own library and simply plug it in
• Unlearn and learn new APIs!
  • Based on the popular JPA 2.0 specification
• Atomicity guarantee and transaction management
  • Built-in support for JPA/JTA transactions and batch operations
• NoSQL world is evolving; plan to switch databases?
  • Since it’s a JPA-powered solution, reuse the same code with almost no changes
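To make the atomicity point concrete, here is a minimal sketch of an all-or-nothing write across two stores using an undo log. It is an assumption-laden illustration, not how Kundera or JTA handle transactions (JTA coordinates resources through a transaction manager). `FlakyStore`, `CrossStoreTransaction` and `AtomicWriteDemo` are hypothetical names, and the stores are in-memory maps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a datastore that can be told to reject writes, simulating a failure. */
final class FlakyStore {
    private final Map<String, String> data = new HashMap<>();
    private final boolean failOnWrite;

    FlakyStore(boolean failOnWrite) { this.failOnWrite = failOnWrite; }

    /** Writes a value and returns the previous one (or null). */
    String put(String key, String value) {
        if (failOnWrite) {
            throw new IllegalStateException("write rejected");
        }
        return data.put(key, value);
    }

    /** Restores the state a key had before a write; used only during rollback. */
    void restore(String key, String previous) {
        if (previous == null) {
            data.remove(key);
        } else {
            data.put(key, previous);
        }
    }

    boolean contains(String key) { return data.containsKey(key); }
}

/** Hypothetical unit of work: applies writes in order and undoes them if one fails. */
final class CrossStoreTransaction {
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    void write(FlakyStore store, String key, String value) {
        String previous = store.put(key, value);           // may throw
        undoLog.push(() -> store.restore(key, previous));  // remember how to undo it
    }

    /** Undoes the applied writes in reverse order. */
    void rollback() {
        while (!undoLog.isEmpty()) {
            undoLog.pop().run();
        }
    }

    void commit() { undoLog.clear(); }
}

final class AtomicWriteDemo {
    /** Returns true if both writes were applied, false if everything was rolled back. */
    static boolean saveUserAndLog(FlakyStore users, FlakyStore logs) {
        CrossStoreTransaction tx = new CrossStoreTransaction();
        try {
            tx.write(users, "u1", "Alice");
            tx.write(logs, "l1", "u1 logged in");
            tx.commit();
            return true;
        } catch (IllegalStateException e) {
            tx.rollback();
            return false;
        }
    }
}
```

If the log store rejects its write, the user written a moment earlier is removed again, so the caller sees either both records or neither.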
31. © Impetus Technologies
Thank You!
• Meet us at
  • Hadoop Summit, San Jose
  • CIO Big Data Summit, Texas
  • Strata Conference + Hadoop World, New York
  • Gartner Symposium, Orlando
• Try / Recommend Kundera
  • https://github.com/impetus-opensource/Kundera
  • @impetustech
Editor's notes
TITLE: Real-time Streaming Analytics – Business Value, Use Cases and Architectural Considerations
Speaker: Anand Venugopal, Sr. Director of Business Development
Abstract: As IT and line-of-business executives begin to operationalize Hadoop and MPP-based batch big data analytics, it's time to understand and prepare for the next wave of innovation in data processing: analytics over real-time streaming data. This session will provide an overview and discussion of the business value, use cases and architectural considerations of integrating real-time streaming analytics into your Enterprise Big Data roadmap.