NoSQL (Not Only SQL)
1. Dr. Pouria Amirian
June 2014
Dr. Pouria Amirian
Big Data Project Manager and Data Scientist
University of Oxford
Pouria.Amirian@ndm.ox.ac.uk; Pouria.Amirian@gmail.com
@pouriaamirian
3. “By 2015, 4.4 million IT jobs globally will be created to support
Big Data.
But there is a challenge. There is not enough talent in the
industry. Our public and private education systems are failing us.
Therefore only one-third of the IT jobs will be filled. These jobs are
the future of the new information economy.”
Three major areas of demand in Computer Science and IT:
Big Data, Mobile and Social Computing
(the foundation of these three topics is Cloud Computing)
4. SQL
Advantages and Disadvantages
NoSQL
History
Common Traits
Categories
Examples
Trends
7. Proven, well known, and backed by available talent
Many programmers are already familiar with it.
Transactions and ACID make development easy.
Lots of tools to use.
Scalable
Free and commercial production support
SQL (general and high-level query language)
8. Create a database for the posts of a weblog:
Each post is authored by a user
Each post can have multiple comments from other users
Users can vote for a post (stars 0-5)
Users can like comments
Posts have date, comments have date
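To make these requirements concrete, here is a hypothetical relational-style design sketched as plain JavaScript arrays, one array per table (table names, column names, and ids are illustrative assumptions, not from the original slides). Even the simple question "what are Joe's posts and how many stars did they get?" needs a join across users, posts, and votes.

```javascript
// Hypothetical normalized "tables" for the weblog, one array per table.
const users = [
  { id: 1, name: "Joe" },
  { id: 2, name: "James" },
  { id: 3, name: "Ana" },
  { id: 4, name: "John" },
];
const posts = [
  { id: 10, authorId: 1, title: "How can I cast an object to an interface in C#?", date: "2011-07-26" },
];
const comments = [
  { id: 100, postId: 10, userId: 2, text: "use the cast operator of C#", date: "2011-07-26" },
  { id: 101, postId: 10, userId: 3, text: "you can use the 'as' keyword", date: "2011-07-27" },
];
const votes = [
  { postId: 10, userId: 2, stars: 4 },
  { postId: 10, userId: 4, stars: 5 },
];
// Likes on comments would need a fifth table, e.g. { commentId, userId }.

function userByName(name) {
  return users.find((u) => u.name === name);
}

// Roughly: SELECT p.title, SUM(v.stars), COUNT(v.stars) FROM users u
//   JOIN posts p ON p.authorId = u.id LEFT JOIN votes v ON v.postId = p.id
//   WHERE u.name = ? GROUP BY p.id, p.title
function starsForPostsBy(name) {
  const user = userByName(name);
  if (!user) return [];
  return posts
    .filter((p) => p.authorId === user.id)
    .map((p) => {
      const postVotes = votes.filter((v) => v.postId === p.id);
      const totalStars = postVotes.reduce((sum, v) => sum + v.stars, 0);
      return { title: p.title, totalStars, voteCount: postVotes.length };
    });
}

// Roughly: SELECT c.text FROM comments c JOIN users u ON c.userId = u.id WHERE u.name = ?
function commentsBy(name) {
  const user = userByName(name);
  if (!user) return [];
  return comments.filter((c) => c.userId === user.id).map((c) => c.text);
}

console.log(starsForPostsBy("Joe"));
console.log(commentsBy("James"));
```

Every answer is reassembled from several tables at query time, which is the cost the document-oriented version of the same data tries to avoid.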
9. How can I cast an object to an interface in C#?
I have to work with a COM-based system and the only way to
work with the system is to work with interfaces. The problem
is that when I worked in VB 6.0, the compiler could automatically
cast any object to an interface. However, since C# is more
type-safe, this is not done automatically. So how can I
convert an object to an interface in C#?
Joe “2011-07-26”
Tags: C#, Cast, Interface
James “2011-07-26”
use the cast operator of C#
Ana “2011-07-27”
you can use the ‘as’ keyword, look at the following code:
IInterface myInterface = myObj as IInterface;
11. What are the posts by “Joe”? How many stars did they get?
What are the comments written by “James”?
12.
{
  "_id" : ObjectId("4e2e3f92268cdda473b628f6"),
  "title" : "How can I cast an object to an interface in C#?",
  "when" : ISODate("2011-07-26"),
  "author" : "joe",
  "text" : "I have to work with a COM-based system and the only way to work with the system is to work with interfaces. The problem is …",
  "tags" : ["C#", "Cast", "Interface"],
  "voters" : [["James", "2011-07-26", 4], ["John", "2011-07-26", 5]],
  "comments" : [
    {"by" : "James", "text" : "use the cast operator of C#", "when" : "2011-07-26"},
    {"by" : "Ana", "text" : "you can use the 'as' keyword …", "when" : "2011-07-27"}
  ]
}
db.posts.find({"author" : "joe"}).sort({"when" : -1})
db.posts.find({"comments.by" : "James"})
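The second query relies on dot notation: "comments.by" matches a post if any element of its comments array has that value. The following is a minimal in-memory sketch of that equality-matching rule in plain JavaScript; it is not the MongoDB driver, and find, valuesAt, and totalStars are hypothetical helpers that support only exact equality on (possibly nested) fields.

```javascript
// One document shaped like the post above (dates kept as plain strings here).
const posts = [
  {
    title: "How can I cast an object to an interface in C#?",
    author: "joe",
    tags: ["C#", "Cast", "Interface"],
    voters: [["James", "2011-07-26", 4], ["John", "2011-07-26", 5]],
    comments: [
      { by: "James", text: "use the cast operator of C#", when: "2011-07-26" },
      { by: "Ana", text: "you can use the 'as' keyword ...", when: "2011-07-27" },
    ],
  },
];

// Collect every value reachable by a dotted path, descending into arrays.
function valuesAt(value, pathParts) {
  // At the end of the path, an array matches if any of its elements matches.
  if (pathParts.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, pathParts));
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[pathParts[0]], pathParts.slice(1));
}

// Equality-only filter: every field in the query must match some value at its path.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([path, expected]) =>
      valuesAt(doc, path.split(".")).some((v) => v === expected)
    )
  );
}

// Each voter entry is [name, date, stars].
function totalStars(post) {
  return post.voters.reduce((sum, [, , stars]) => sum + stars, 0);
}

const byJoe = find(posts, { author: "joe" });
console.log(byJoe.map((p) => [p.title, totalStars(p)]));

// Like the shell query, find returns whole posts; James's comments are then picked out.
const jamesComments = find(posts, { "comments.by": "James" })
  .flatMap((p) => p.comments.filter((c) => c.by === "James"));
console.log(jamesComments.map((c) => c.text));
```

Because the comments and votes live inside the post, both questions are answered from a single document with no joins.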
13. Rigid schema design
Hard to scale out (limited horizontal scalability)
Hard and complex joins across multiple nodes
Hard to handle data growth (schema changes, high
volume of data, high volume of transactions, …)
Need for an interface for data access (another layer of complexity)
Impedance mismatch
Mapping between relational storage and object-based
computing (object-relational mapping does not always work well)
14. Relational databases are no longer one-size-fits-all
Examples
Content Management Systems
Network Data (Social Networking, Location-Based
Applications)
Spatial Data Management Systems
High frequency of change (huge numbers of reads and
writes)
17. The needs of modern applications do not always
match what relational databases provide.
Success stories of Big Data management at
internet giants such as Google, Amazon,
Facebook, LinkedIn, …
These companies faced unique
challenges and developed their own
custom solutions
18. The Google File System, October 2003
MapReduce, December 2004
BigTable, November 2006
…
Google’s massively scalable infrastructure for:
Google Search Engine
Google Maps and Google Earth
Gmail, …
19. Open source developers have tried to replicate each
piece of Google’s technology stack
The Hadoop project and its subprojects were born at Yahoo!
Google Infrastructure        Hadoop Universe
Google File System (GFS)     Hadoop Distributed File System (HDFS)
MapReduce                    Hadoop MapReduce
BigTable                     HBase
20. Dynamo: Amazon’s Highly Available Key-value
Store, 2007
Followed by use cases from eBay, Facebook, Netflix,
Yahoo, IBM, and …
21.
2004 BigTable (Google)
2007 Dynamo (Amazon)
2008 Cassandra (Facebook)
In 2009, in San Francisco, Eric Evans proposed the name “NoSQL” to
describe the growing non-relational movement
In 1998, Carlo Strozzi used the word “NoSQL” to describe a relational database
that did not expose a SQL interface
22. Not based on the relational model
Flexible Schema
Supports distributed database architectures
Provides high scalability, high availability, and fault
tolerance
Supports very large amounts of sparse data
Geared toward performance rather than consistency
25. Memcached – in-memory key-value cache.
Membase – Memcached with persistence and
improved consistent hashing.
AppFabric Cache – multi-region cache.
Redis – data structure server.
Riak – based on Amazon’s Dynamo.
Project Voldemort – eventually consistent key-value
store, auto scaling.
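Several of the stores listed above (Membase, and the Dynamo-inspired Riak and Project Voldemort) spread keys across nodes with consistent hashing. Below is a simplified, hypothetical ring in plain JavaScript, assuming a 32-bit FNV-1a string hash and a fixed number of virtual nodes per server; real products use their own hash functions, replication, and rebalancing logic.

```javascript
// 32-bit FNV-1a hash of a string (simple and deterministic, not cryptographic).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  constructor(nodes, virtualNodes = 50) {
    this.ring = []; // [position, node] pairs, sorted by position
    for (const node of nodes) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push([fnv1a(`${node}#${v}`), node]);
      }
    }
    this.ring.sort((a, b) => a[0] - b[0]);
  }

  // A key belongs to the first node clockwise from the key's hash position.
  nodeFor(key) {
    const h = fnv1a(key);
    for (const [pos, node] of this.ring) {
      if (pos >= h) return node;
    }
    return this.ring[0][1]; // wrap around to the start of the ring
  }
}

// A toy key-value store: one Map per node, with every request routed through the ring.
class ShardedKV {
  constructor(nodes) {
    this.ring = new HashRing(nodes);
    this.shards = new Map(nodes.map((n) => [n, new Map()]));
  }
  put(key, value) {
    this.shards.get(this.ring.nodeFor(key)).set(key, value);
  }
  get(key) {
    return this.shards.get(this.ring.nodeFor(key)).get(key);
  }
}

const kv = new ShardedKV(["node-a", "node-b", "node-c"]);
kv.put("user:1", "John");
kv.put("user:2", "Mary");
console.log(kv.get("user:1"), "stored on", kv.ring.nodeFor("user:1"));
```

The design point is that adding a node only moves the keys that now fall on the new node's ring positions; every other key stays where it was. A production ring would use binary search instead of the linear scan in nodeFor.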
26. Schema Free.
Usually a JSON-like interchange model.
Query Model: JavaScript or custom.
Aggregations: Map/Reduce.
Indexes are done via B-Trees.
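The Map/Reduce aggregation style can be sketched in memory: a map function emits key/value pairs for each document, and a reduce function folds all values emitted for one key. In the plain JavaScript below, mapReduce is a hypothetical helper (not a database API) and the posts are made-up sample data; it counts how often each tag appears.

```javascript
// map(doc, emit) emits [key, value] pairs; reduce(key, values) folds one key's values.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Hypothetical sample documents.
const posts = [
  { author: "joe", tags: ["C#", "Cast", "Interface"] },
  { author: "ana", tags: ["C#", "LINQ"] },
  { author: "joe", tags: ["NoSQL"] },
];

// Tag frequency: emit (tag, 1) for each tag, then sum the ones.
const tagCounts = mapReduce(
  posts,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(tagCounts); // "C#" appears twice; every other tag once

// Posts per author, reusing the same helper with different map/reduce functions.
const postsPerAuthor = mapReduce(posts, (doc, emit) => emit(doc.author, 1), (key, values) => values.length);
console.log(postsPerAuthor);
```

Because map works on one document at a time and reduce only sees values for a single key, both steps can be spread across many nodes, which is why the model suits distributed stores.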
27.
{
  "_id" : ObjectId("4e2e3f92268cdda473b628f6"),
  "title" : "How can I cast an object to an interface in C#?",
  "when" : ISODate("2011-07-26"),
  "author" : "joe",
  "text" : "I have to work with a COM-based system and the only way to work with the system is to work with interfaces. The problem is …",
  "tags" : ["C#", "Cast", "Interface"],
  "voters" : [["James", "2011-07-26", 4], ["John", "2011-07-26", 5]],
  "comments" : [
    {"by" : "James", "text" : "use the cast operator of C#", "when" : "2011-07-26"},
    {"by" : "Ana", "text" : "you can use the 'as' keyword …", "when" : "2011-07-27"}
  ]
}
28. Row oriented (Relational)
Id   Username   Email          Department
1    John       john@foo.com   Sales
2    Mary       mary@foo.com   Marketing
3    Yoda       yoda@foo.com   IT
Column oriented
Id:          1, 2, 3
Username:    John, Mary, Yoda
Email:       john@foo.com, mary@foo.com, yoda@foo.com
Department:  Sales, Marketing, IT
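The two layouts above hold the same data; what changes is which values are stored together. A small plain-JavaScript sketch (toColumns and rowAt are hypothetical helper names) shows the row layout as an array of records, the column layout as one array per column, and why reading a single attribute for every record only touches one array in the columnar form.

```javascript
// Row oriented: all fields of one record are stored together.
const rows = [
  { id: 1, username: "John", email: "john@foo.com", department: "Sales" },
  { id: 2, username: "Mary", email: "mary@foo.com", department: "Marketing" },
  { id: 3, username: "Yoda", email: "yoda@foo.com", department: "IT" },
];

// Column oriented: all values of one column are stored together.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      if (!columns[name]) columns[name] = [];
      columns[name].push(value);
    }
  }
  return columns;
}

// Rebuilding record i means reading position i from every column.
function rowAt(columns, i) {
  const record = {};
  for (const [name, values] of Object.entries(columns)) record[name] = values[i];
  return record;
}

const columns = toColumns(rows);
// Reading one attribute for all records touches a single contiguous array.
console.log(columns.department); // [ 'Sales', 'Marketing', 'IT' ]
console.log(rowAt(columns, 1));
```

The trade-off runs both ways: column scans and aggregates are cheap, while fetching or writing a whole record has to touch every column.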
34. In a sample social network containing 1,000,000 nodes
(people), each with approximately 50 edges
(relationships)
Depth   RDBMS time (s)   Graph DB time (s)   Records returned
2       0.016            0.01                ~2,500
3       30.267           0.168               ~110,000
4       1543.505         1.359               ~600,000
5       Unfinished       2.132               ~800,000
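A graph database answers depth queries like these by walking edges outward from the start node, so the work grows with the part of the graph actually visited; in a relational schema each extra hop typically means another self-join on a relationships table. Below is a minimal breadth-first traversal sketch in plain JavaScript over a hypothetical adjacency list (not the benchmark's code or data).

```javascript
// Adjacency list: person -> friends (hypothetical sample data).
const friends = new Map([
  ["alice", ["bob", "carol"]],
  ["bob", ["alice", "dave"]],
  ["carol", ["alice", "erin"]],
  ["dave", ["bob", "frank"]],
  ["erin", ["carol"]],
  ["frank", ["dave"]],
]);

// Everyone reachable from `start` within `maxDepth` hops, excluding `start` itself.
function withinDepth(graph, start, maxDepth) {
  const visited = new Set([start]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const person of frontier) {
      for (const friend of graph.get(person) || []) {
        if (!visited.has(friend)) {
          visited.add(friend); // each person is expanded at most once
          next.push(friend);
        }
      }
    }
    frontier = next;
  }
  visited.delete(start);
  return [...visited];
}

console.log(withinDepth(friends, "alice", 2)); // [ 'bob', 'carol', 'dave', 'erin' ]
```

The visited set keeps each node from being expanded twice, which is what keeps deep traversals bounded by the number of reachable nodes rather than by the number of join combinations.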
36. 1- Non-relational
No tables
No joins
No ACID transactions *
No support for SQL *
*: a few NoSQL databases do support ACID transactions and SQL
37. 2- Schema Free
In a data collection:
There can be records with completely different data
items (fields)
▪ Book 1 {name, publicationYear}
▪ Book 2 {author, publisher}
The schema is in:
the data itself (JSON), or
usually the application, not the database
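When the collection is schema free, the checks a relational table definition would enforce move into application code. A minimal plain-JavaScript sketch, assuming hypothetical titles and values for the two book records above: both shapes live in one collection, and the application decides which fields it requires.

```javascript
// One "collection" holding records with completely different fields (hypothetical values).
const books = [
  { name: "NoSQL Basics", publicationYear: 2013 },      // Book 1
  { author: "A. Writer", publisher: "Example Press" },  // Book 2
];

// Similar in spirit to a MongoDB { field: { $exists: true } } filter.
function withField(collection, field) {
  return collection.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

// The "schema" lives in the application: a hypothetical rule checked on read,
// not enforced by the database on write.
function validateBook(doc) {
  const errors = [];
  if (!("name" in doc) && !("author" in doc)) errors.push("needs a name or an author");
  if ("publicationYear" in doc && !Number.isInteger(doc.publicationYear)) {
    errors.push("publicationYear must be an integer");
  }
  return errors;
}

console.log(withField(books, "publisher").length); // 1
console.log(books.map(validateBook));
```

The flexibility is real, but so is the cost: every reader of the collection has to agree on, and apply, the same rules.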
39. 4- Web-Scale Applications:
Simple requests (underlying database seems to be
unsophisticated)
However:
Sheer volume of data
Huge number of users (millions of users)
40. 5- Open source, but from large internet companies:
Google
Facebook
Twitter
LinkedIn
Yahoo
42.
Volume
• Huge amounts of data collected and generated by organizations or
individuals
• Need for huge amounts of storage and processing power
Velocity
• Frequency at which data is generated, captured, shared and processed
• Need for real-time retrieval and processing of data for large numbers of users
Variety
• Many formats, structures and sources
• Need for new types of storage and processing for structured and
unstructured data
44. Many different types of tools, techniques,
technologies, algorithms and computation models for
collection, generation, storage, management, analysis
and visualization of high-volume (of size), high-velocity
(of change) and high-variety (in nature) data sets.
48. Also known as Brewer’s Theorem, after Prof. Eric Brewer
(University of California, Berkeley), who presented it in 2000.
“Of three properties of a shared data system: data
consistency, system availability and tolerance to
network partitions, only two can be achieved at any
given moment.”
Proven by Seth Gilbert and Nancy Lynch at MIT (2002).
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
49. Consistency: All clients have the same view of the data
Availability: Each client can always read and write
data
Partition tolerance: The system keeps working despite
physical network partitions
The CAP theorem says a distributed database can guarantee at most
two of the three CAP properties at the same time
51. Atomicity: All or nothing.
Consistency: Consistent state of data
Isolation: Transactions are isolated from each other.
Durability: Once a transaction is committed, its state
is durable.
Any data store can achieve atomicity, isolation and
durability, but do you always need consistency? No.
By giving up ACID properties, one can achieve higher
performance and scalability.
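Atomicity ("all or nothing") can be illustrated with a tiny in-memory store that applies a batch of writes to a copy and swaps the copy in only if every write succeeds. This is a hypothetical plain-JavaScript sketch (AtomicStore and transfer are made-up names), not how any particular database implements transactions; real systems use logs, locks, or multi-version concurrency control, and this sketch covers neither isolation nor durability.

```javascript
class AtomicStore {
  constructor(initial) {
    this.data = new Map(Object.entries(initial));
  }

  // Apply all operations or none: work on a copy, commit by swapping it in.
  transaction(operations) {
    const draft = new Map(this.data);
    for (const op of operations) op(draft); // a throw here aborts before the commit below
    this.data = draft;
  }
}

// Move an amount between accounts; the credit runs first, the debit check second.
function transfer(from, to, amount) {
  return [
    (m) => m.set(to, (m.get(to) || 0) + amount),
    (m) => {
      if (m.get(from) < amount) throw new Error("insufficient funds");
      m.set(from, m.get(from) - amount);
    },
  ];
}

const store = new AtomicStore({ alice: 100, bob: 50 });
store.transaction(transfer("alice", "bob", 30)); // commits

try {
  store.transaction(transfer("alice", "bob", 500)); // the credit is discarded too
} catch (e) {
  console.log("rolled back:", e.message);
}
console.log(store.data.get("alice"), store.data.get("bob")); // 70 80
```

The failed transfer had already credited bob in the draft, yet the committed state shows no trace of it: that partial write is exactly what atomicity rules out.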
52. CAP in SQL databases >> CA (not distributed), CP (distributed,
but not always available)
ACID is guaranteed
The DBMS keeps users waiting (until all
the changes have propagated to all nodes)
53. CAP in NoSQL databases >> AP, CP
The DBMS will guarantee consistency eventually, but
meanwhile it gives control back to the application
(no waiting for users)
The NoSQL database doesn’t commit the changes
right away (buffers)
The data will be eventually consistent
54. BASE: an acronym contrived to be the opposite of ACID
Basically Available,
Soft state,
Eventually Consistent
55.
Basically Available
partial failures are possible, but not a failure of the whole system
Soft state
copies of a data item may be inconsistent
Eventual Consistency
When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
copies become consistent at some later time if there are no
more updates to that data item
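Soft state and eventual consistency can be simulated with replicas that accept writes locally and exchange state later. The plain-JavaScript sketch below is a hypothetical model using last-writer-wins timestamps, one common conflict-resolution rule (Dynamo-style systems may instead keep several versions tracked with vector clocks); it assumes timestamps are unique. The replicas first disagree, then converge once updates stop and they gossip.

```javascript
class Replica {
  constructor(name) {
    this.name = name;
    this.store = new Map(); // key -> { value, ts }
  }
  write(key, value, ts) {
    this.apply(key, { value, ts });
  }
  read(key) {
    const entry = this.store.get(key);
    return entry ? entry.value : undefined;
  }
  // Last-writer-wins: keep whichever entry carries the larger timestamp.
  apply(key, entry) {
    const current = this.store.get(key);
    if (!current || entry.ts > current.ts) this.store.set(key, entry);
  }
  // Anti-entropy: push every entry this replica holds to a peer.
  syncTo(peer) {
    for (const [key, entry] of this.store) peer.apply(key, entry);
  }
}

const a = new Replica("A");
const b = new Replica("B");
const c = new Replica("C");

a.write("cart", ["book"], 1);        // accepted by A only
b.write("cart", ["book", "pen"], 2); // a later write, accepted by B only
console.log(a.read("cart"), b.read("cart"), c.read("cart")); // soft state: replicas disagree

// No further updates: a couple of gossip rounds make every copy identical.
const replicas = [a, b, c];
for (let round = 0; round < 2; round++) {
  for (const from of replicas) {
    for (const to of replicas) if (from !== to) from.syncTo(to);
  }
}
console.log(replicas.map((r) => r.read("cart"))); // every replica now returns the later write
```

Note the trade-off built into last-writer-wins: A's earlier write is silently discarded rather than merged.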
56. ACID:
• Strong consistency.
• Less availability.
• Pessimistic concurrency.
• Complex.
BASE:
• Availability is the most important thing; willing to
sacrifice other properties for it (CAP).
• Weaker consistency (Eventual).
• Simple and fast.
• Optimistic concurrency.
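Optimistic concurrency usually means: take no locks, but check at write time that nobody changed the record since you read it. A hypothetical compare-and-set sketch in plain JavaScript, using a version number per document (VersionedStore is a made-up name, not a real API):

```javascript
class VersionedStore {
  constructor() {
    this.docs = new Map(); // id -> { version, value }
  }

  read(id) {
    const d = this.docs.get(id);
    return d ? { ...d } : undefined;
  }

  // Succeeds only if the caller's expected version is still the current one.
  update(id, expectedVersion, value) {
    const current = this.docs.get(id);
    const currentVersion = current ? current.version : 0;
    if (currentVersion !== expectedVersion) return false; // someone else wrote first
    this.docs.set(id, { version: currentVersion + 1, value });
    return true;
  }
}

const store = new VersionedStore();
store.update("post:1", 0, { stars: 0 }); // create: version 0 -> 1

// Two clients read the same version...
const r1 = store.read("post:1");
const r2 = store.read("post:1");

// ...the first write wins; the second is rejected and must re-read and retry.
console.log(store.update("post:1", r1.version, { stars: 4 })); // true
console.log(store.update("post:1", r2.version, { stars: 5 })); // false
```

A pessimistic scheme would instead make the second client wait on a lock; the optimistic version never blocks, at the price of occasional retries.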
59. Massive write performance
Fast key-value lookups
No single point of failure
Fast prototyping and development
Out of the box scalability (Horizontally Scalable)
Easy maintenance
60. Simple APIs
C# example: collection.Save(myDocument);
Seamless language integration
No impedance mismatch (look at the above C#
example)
Designed to be horizontally scalable (elastic)
Flexible data model and schema
Mostly free and/or open source
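The save operation above stores the application object directly. In the MongoDB shell, db.collection.save() inserts a document that has no _id and otherwise upserts by _id (replacing any existing document with that _id). A hypothetical in-memory sketch of that behavior in plain JavaScript; Collection and its counter-based ids are assumptions (real _id values are ObjectIds), not a driver API.

```javascript
class Collection {
  constructor() {
    this.docs = new Map();
    this.nextId = 1; // stand-in for ObjectId generation
  }

  // Insert if the document has no _id; otherwise insert-or-replace by _id.
  save(doc) {
    if (doc._id === undefined) doc._id = this.nextId++;
    this.docs.set(doc._id, JSON.parse(JSON.stringify(doc))); // store a copy
    return doc._id;
  }

  findOne(id) {
    return this.docs.get(id);
  }
}

// The application object is stored as-is, with no table-mapping layer in between.
const myDocument = { title: "How can I cast an object to an interface in C#?", tags: ["C#"] };
const posts = new Collection();
const id = posts.save(myDocument); // insert: assigns _id
myDocument.tags.push("Interface");
posts.save(myDocument); // same _id: replaces the stored copy
console.log(posts.findOne(id).tags); // [ 'C#', 'Interface' ]
```

This is the "no impedance mismatch" point in miniature: the nested tags array goes in and comes back without being split across tables.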
61. There are more than 140 NoSQL products
Many are not proven
Lack of SQL (the most missed feature)
Proprietary query languages
Lack of skilled people
Do you know a DBA for MarkLogic?
Lack of tools for modeling, documenting, reporting, …
(usually there are no good visual tools)
Lack of standards (the biggest threat)
67. It is not necessary for the application to use a single
data store for all of its needs, since different databases
are built for different purposes and not all problems
can be elegantly solved by a single database.
Using Different Data Storage Technologies for
Varying Data Storage Needs
68. Key-value stores:
Processing a constant stream of small reads and writes.
Document databases:
Natural data modeling. Programmer friendly. Rapid
development. Web friendly, CRUD.
RDBMS:
OLTP. SQL. Transactions. Relations.
Columnar:
Handles size well. Massive write loads. High availability.
Multiple data centers, MapReduce.
Graph:
Graph algorithms and relations.