Slide presentation pycassa_upload

PYCON INDIA 2012

Pycassa – Python Cassandrified

28th-30th September 2012
Dharmaram Vidya Kshetram, Bangalore, Karnataka

Ramesh Rajini
Infosys Limited, Education & Research, Bangalore

Session Plan
• Need & Introduction to NoSQL DB
• Cassandra Introduction
• Data model creation
• Pycassa in action

Heard of NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
• No fixed table schema
• No joins!
• Relaxes one or more of the ACID properties in favour of
BASE, within the limits set by the CAP theorem

Do we “REALLY” need them ?

• RDBMS is so strong
• so crisp
• so vast
• and WE know it well!

Trends shrends!

– Gartner's 10 key IT trends for 2012
• Unstructured data will grow some 80% over the
course of the next five years


What made some apps go No-SQLized?
• Explosion of social media sites with large data needs
• Open-source community
• Upsurge of cloud-based solutions
• Migration to dynamically-typed languages

RDBMS..hmmm
• Normalization => Joins => Slow Queries /Complications
• Consistency => locks /transactions => Performance issues in
distributed environments
• Scalability becomes a mess as our apps grow in size and
demand

Current Approach to Scalability
• Add hardware
• Upgrade hardware
• More machines
• Turn off unwanted services
• Caching
• De-normalize…

RDBMS ..tends to

Massive [terabytes]

Elastic scalability

Easily achieve Fault tolerance

Tunable Consistency

But Why..

• ACID
– Transactions become slow under heavy load
– In a distributed/replicated environment, ACID needs two-phase
commit, where either a node or the coordinator can end up waiting indefinitely

But RDBMS is still holding up!!
• Yes..it is
• Will continue to Co-exist with NOSQL
• What if data volume were no longer a problem for me?
• What new problems would I like to have?

Seeds of NoSQL
• Three major papers
– BigTable (Google)
– Dynamo (Amazon)
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
– CAP Theorem

Brewer’s CAP Theorem
• Properties of a system:
– Consistency
– Availability
– Partition tolerance

Brewer’s CAP Theorem
• You can have it good, you can have it fast, you can have
it cheap: pick two


BASE Vs ACID - Eventual Consistency
• If no new updates arrive for long enough, all earlier updates
eventually propagate through the system and all the nodes
become consistent
• For any given accepted update and any given node, eventually
either the update reaches the node or the node is
removed from service
• Known as BASE (Basically Available, Soft state,
Eventual consistency)
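
A toy illustration of the idea, in plain Python rather than Cassandra itself. It assumes each replica stores (timestamp, value) pairs and that the newer timestamp wins when two replicas compare notes. The names merge and gossip_round are made up for this sketch.

```python
# Toy model only: each replica maps key -> (timestamp, value).

def merge(a, b):
    """Return the last-write-wins union of two replica states."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        # Keep whichever copy carries the newer timestamp.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


def gossip_round(replicas):
    """Each replica syncs with its right-hand neighbour; both keep the result."""
    n = len(replicas)
    for i in range(n):
        j = (i + 1) % n
        state = merge(replicas[i], replicas[j])
        replicas[i] = dict(state)
        replicas[j] = dict(state)


# Three replicas that start out disagreeing about the same column.
replicas = [
    {"tweet": (1, "Hello")},
    {"tweet": (2, "Hey, how r u ?")},
    {},
]

rounds = 0
while any(r != replicas[0] for r in replicas):
    gossip_round(replicas)
    rounds += 1
```

Once the updates stop, the loop ends with every replica holding the value that has the newest timestamp.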

What kinds of NoSQL
• 2 Major areas:
– Key/Value or 'the big hash table'.
• Dynamo
• Voldemort
• Scalaris
– Schema-less
• column-based, document-based or graph-based.
– Cassandra (column-based)
– CouchDB (document-based)
– Neo4J (graph-based)
– HBase (column-based)

Cassandra to the Rescue!
– Open source

Distributed, Decentralized,

Elastically scalable

Highly available / fault-tolerant

Tunably consistent

Column-oriented database

Automatic sharding

Gossip Architecture


Distributed and Decentralized

Distributed
• Can be running on multiple machines
• while appearing to users as a single instance

Decentralized
• There is no single point of failure.
• All the nodes in the cluster function exactly the same
[server symmetry]


Elastic Scalability

• Vertical scaling :
– more hardware capacity /memory

• Horizontal scaling :
• More machines that have all or some
of the data
• So that no machine is bearing the
complete load


Elastic Scalability, No Single Point of Failure
• Elastic scalability :
– Cluster will be able to scale up & down
• Master Slave issue


Scale UP & Scale down

• Scale up: add nodes and they can start serving clients!
– NO server restart / NO query change / NO
balancing
– JUST add another machine.
• Scale down: just unplug a machine.
– Since Cassandra keeps multiple copies of the same
data on more than one node [configurable], there
won't be any loss of data.
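
To see why unplugging a machine need not lose data, here is a simplified sketch of a hash ring with replication. It is an illustration, not Cassandra's actual partitioner: the Ring class, the node names and the replication factor of 2 are all assumptions made for the example, and MD5 is used only as a convenient hash.

```python
import bisect
import hashlib


class Ring:
    """Hypothetical token ring: one token per node, keys stored on N neighbours."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(text):
        # Map any string to a position on the ring.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self.tokens, (self._token(node), node))

    def remove_node(self, node):
        self.tokens.remove((self._token(node), node))

    def replicas_for(self, key):
        """Walk clockwise from the key's token and collect the owning nodes."""
        count = min(self.replication_factor, len(self.tokens))
        start = bisect.bisect_left(self.tokens, (self._token(key),))
        return [self.tokens[(start + i) % len(self.tokens)][1]
                for i in range(count)]


ring = Ring(["node1", "node2", "node3"], replication_factor=2)
owners = ring.replicas_for("010053")      # two distinct nodes hold this row

ring.remove_node(owners[0])               # "just unplug" the first owner
survivors = ring.replicas_for("010053")   # the second copy is still reachable
```

After the first owner is removed, the node that held the second copy becomes the first stop on the ring for that key, so the row is still served.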

High Availability and Fault Tolerance
• High availability + central-server-based system = problem
– Needs internal hardware redundancy
– Sounds cool but extremely costly


High Availability and Fault Tolerance
– Cassandra allows you to:
• replace failed nodes with no downtime
• replicate data to multiple data centers to prevent
downtime [automatic]

Tunable Consistency
• Consistency: all reads return the most recently written
value
– Cassandra follows an “eventually consistent” model by
default.
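
"Tunable" means that each read and write can choose how many replicas must answer. A read is guaranteed to see the latest write when the replicas read (R) plus the replicas written (W) add up to more than the replication factor (N). The sketch below only does that arithmetic. The helper names are made up, and N = 3 is an assumed replication factor.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor


N = 3  # assumed replication factor for the example
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

# ONE/ONE is fast but only eventually consistent; QUORUM/QUORUM overlaps.
fast = read_sees_latest_write(N, levels["ONE"], levels["ONE"])
safe = read_sees_latest_write(N, levels["QUORUM"], levels["QUORUM"])
```

In pycassa the level is chosen per column family or per call through its consistency-level settings. Check the API documentation for the exact names before relying on them.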


But then!

• Amazon, Facebook, Google and Twitter all use this
model.
– DATA is their main sales item
– High performance!

Setting up Apache Cassandra
• From the DataStax community Project
– www.datastax.com/download
• From the Apache Cassandra project:
– http://cassandra.apache.org/

Believe it.. It's easy to
install & set up!

Keyspace & Column Family creation

Column family 1
Key  | Columns (name = value)
Key1 | ColumnName1 = Value, ColumnName2 = Value
Key2 | ColumnName1 = Value, ColumnName2 = Value
Key3 | ColumnName1 = Value, ColumnName2 = Value, ColumnName3 = Value

Column family 2
Key  | Columns (name = value)
Key1 | ColumnName1 = Value, ColumnName2 = Value, ColumnName3 = Value

Data makes sense..

Column family: Close Friends
Key    | Columns (name = value)
010051 | Mail id = Ramesh_Rajini, tweets = Hello
010052 | Mail id = Vinz_Raj, tweets = I'm logged in!
010053 | Mail id = Ragh_Rao, tweet1 = Hey, how r u ?, tweet2 = Movie..

Column family: Colleagues
Key    | Columns (name = value)
020061 | Mail id = Puru_lal, City = Bangalore, Likes = Ladoos!

Cassandra Data Structure

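[Diagram: a key space contains column families, and a column family contains columns; each column holds a name, a value and a timestamp. Example labels shown: Colony, Name, UserIDs, Address, EmpIDs, Tweets, Likes, Skill Set]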

Key-in the Key space..


Multi-level Dictionary

{"FriendsInfo":                        # Keyspace
    {"closefriends":                   # Column family
        {"010053": OrderedDict([       # Row key
            ("MailId", "Ragh_Rao"),    # Columns: (column key, column value)
            ("tweet1", "Hey, how r u ?"),
            ("tweet2", "Movie..")]),
         ...                           # more row keys, each an OrderedDict
        }}}
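
The calls on the following slides use a col_fam object. A minimal sketch of how such an object might be obtained with pycassa is shown here. The server address localhost:9160, the replication factor of 1 and the two helper function names are assumptions for illustration, and the code needs a running Cassandra node plus an installed pycassa to actually do anything.

```python
def create_schema(server="localhost:9160"):
    """Create the FriendsInfo keyspace and the closefriends column family."""
    # Imported inside the function so this file still loads without pycassa.
    from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY

    manager = SystemManager(server)
    try:
        # Replication factor 1 is only sensible on a single test node.
        manager.create_keyspace("FriendsInfo", SIMPLE_STRATEGY,
                                {"replication_factor": "1"})
        manager.create_column_family("FriendsInfo", "closefriends")
    finally:
        manager.close()


def open_column_family(server="localhost:9160"):
    """Return a ColumnFamily handle, used as col_fam elsewhere in the deck."""
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool("FriendsInfo", [server])
    return ColumnFamily(pool, "closefriends")
```

Typical use would be create_schema() once, then col_fam = open_column_family(), followed by something like col_fam.insert('010053', {'MailId': 'Ragh_Rao'}).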

Can I insert in bulk?
• Yes, luckily as a dict of rows..
col_fam.batch_insert(
{'010054': {'Name': 'Vinayak', 'Id': '9308'},
'010057': {'Name': 'Poorvi'}
})
• Or add many columns to one row in a loop:
for i in range(1000, 1010):
    col_fam.insert('EmpIDs', {str(i): 'Hello'})


Is the data stored?
• With the key, get all details:
col_fam.get('010052')
OrderedDict([('MailId', 'Vinz_Raj'), ('tweets', "I'm logged in!")])

• With the key, get specific details:
col_fam.get('010053', columns=['MailId', 'tweet2'])
OrderedDict([('MailId', 'Ragh_Rao'), ('tweet2', 'Movie..')])
• Specifying start & end columns:
col_fam.get('EmpIDs', column_start='1002', column_finish='1006')
OrderedDict([('1002', 'Hello'), ('1003', 'Hello'), ('1004', 'Hello'),
('1005', 'Hello'), ('1006', 'Hello')])


Can the columns be sliced?
• Reversed order, limited to a column count
col_fam.get('EmpIDs', column_reversed=True, column_count=3)
OrderedDict([('1009', 'Hello'), ('1008', 'Hello'), ('1007', 'Hello')])
• Fetching multiple rows
col_fam.multiget(['010053', '010051'])
OrderedDict(
[('010053',
OrderedDict([('MailId', 'Ragh_Rao'), ('tweet1', 'Hey, how r u ?'),
('tweet2', 'Movie..')])),
('010051',
OrderedDict([('MailId', 'Ramesh_Rajini'), ('tweets', 'Hello')]))])
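
The slicing behaviour above can be imitated without a cluster. The sketch below is a plain-Python model, not pycassa code. slice_columns is a made-up helper that assumes column names sort as strings, that the start and finish bounds are inclusive, and that at most 100 columns come back unless a count is given.

```python
from collections import OrderedDict


def slice_columns(row, column_start="", column_finish="",
                  column_reversed=False, column_count=100):
    """Return the columns of one row that fall inside the requested slice."""
    names = sorted(row, reverse=column_reversed)

    def in_range(name):
        # An empty bound means "no limit on that side".
        if column_reversed:
            after_start = not column_start or name <= column_start
            before_finish = not column_finish or name >= column_finish
        else:
            after_start = not column_start or name >= column_start
            before_finish = not column_finish or name <= column_finish
        return after_start and before_finish

    picked = [n for n in names if in_range(n)][:column_count]
    return OrderedDict((n, row[n]) for n in picked)


# Same shape as the 'EmpIDs' row: columns '1000' .. '1009'.
emp_ids = {str(i): "Hello" for i in range(1000, 1010)}

middle = slice_columns(emp_ids, column_start="1002", column_finish="1006")
last_three = slice_columns(emp_ids, column_reversed=True, column_count=3)
```

Here middle holds columns '1002' to '1006' in order, and last_three holds '1009', '1008', '1007', matching the two results shown on the slides.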


Counting..
• get_count()
– Count the number of columns in the row with the given key.
• multiget_count()
– Perform a column count in parallel on a set of rows.
– Similar parameters as for multiget(), except that a list
of keys may be used.
– A dictionary of the form {key: int} is returned.
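
A quick way to see the shape of the two results without a cluster. FakeColumnFamily below is a hypothetical in-memory stand-in that copies only these two method names. It is not pycassa, and it ignores details such as missing keys and the optional parameters.

```python
class FakeColumnFamily:
    """Hypothetical stand-in exposing only get_count() and multiget_count()."""

    def __init__(self, rows):
        self.rows = rows

    def get_count(self, key):
        # Number of columns stored under one row key.
        return len(self.rows[key])

    def multiget_count(self, keys):
        # One count per requested row key, returned as {key: int}.
        return {key: len(self.rows[key]) for key in keys}


col_fam = FakeColumnFamily({
    "010051": {"MailId": "Ramesh_Rajini", "tweets": "Hello"},
    "010053": {"MailId": "Ragh_Rao", "tweet1": "Hey, how r u ?",
               "tweet2": "Movie.."},
})

one = col_fam.get_count("010053")
many = col_fam.multiget_count(["010051", "010053"])
```

With a real pycassa ColumnFamily the two calls are written the same way; only the object behind col_fam changes.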


What Next?
• Explore more on Pycassa modules..
– http://pycassa.github.com/pycassa/api/index.html
• Start using it.. I'm sure you'll enjoy it because it is simply
superb!


Recap
• Need & Introduction to NoSQL DB
• Cassandra Introduction
• Data model creation
• Pycassa in action



Time for R&R?
- Requests & Responses

Thank you!

- R&R
Ramesh Rajini

Disclaimer : All logos and images belong to the creator and companies which own them
