The importance of data in your
         architecture… a view far beyond
         SQL Server

            Alexandre Porcelli - @porcelli
            Gleicon Moraes - @gleicon




Friday, June 3, 2011
Alexandre Porcelli
                                                                                  Creator &
                                                                                   Dictator




    Alexandre Porcelli
     Writer


                           Alexandre Porcelli
                            Organizer

                                                Alexandre Porcelli
                                                Committer / Parser Developer


                                                                                Alexandre Porcelli
                                                                                Core Developer / API Designer




Gleicon Moraes

                                  http://zenmachine.wordpress.com
                                       http://github.com/gleicon
                                                @gleicon




there is a world
 beyond...




or beyond...




beyond even
 the...




including the...




close to the
dark world
of...




nosql

a new school

context




lack of capital

big data




history...




models
                     • Hierarchical (IMS): late 1960’s and 1970’s
                     • Directed graph (CODASYL): 1970’s
                     • Relational: 1970’s and early 1980’s
                     • Entity-Relationship: 1970’s
                     • Extended Relational: 1980’s
                     • Semantic: late 1970’s and 1980’s
                     • Object-oriented: late 1980’s and early 1990’s
                     • Object-relational: late 1980’s and early 1990’s
                     • Semi-structured (XML): late 1990’s to late 2000’s
                     • The next big thing: ???




                                  ref: What Goes Around Comes Around, by Michael Stonebraker and Joey Hellerstein
next big thing?




definition...




down with the relational database!

down with the relational database!

                                  as a silver
                                   bullet!



historical
  moment...
solve specific problems

data structure

key-value




model




column family
model

  Keyspace
    Column Family
      row:  key -> column, column, column, ... , column
      ...
      row:  key -> column, column, ... , column

  Column
    name | timestamp | value
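
The layout above can be mimicked with nested dictionaries. This is only a minimal sketch: the function names and the rule that the newest timestamp wins are assumptions for illustration, not the API of any particular column-family store.

```python
import time

# keyspace -> column family -> row key -> column name -> (timestamp, value)
# Rows in the same family may carry completely different columns.


def put(keyspace, family, row_key, name, value, timestamp=None):
    """Store one column; an older timestamp never overwrites a newer one (assumed rule)."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace.setdefault(family, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts >= current[0]:
        row[name] = (ts, value)


def get(keyspace, family, row_key, name):
    """Return the column value, or None when the row or column is absent."""
    entry = keyspace.get(family, {}).get(row_key, {}).get(name)
    return None if entry is None else entry[1]
```

Note that a missing column is simply absent from the row, which is what lets each row have its own set of columns.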




document




model




graph




overview




architecture




Architectural Anti Patterns
                  Notes on Data Distribution and Handling Failures




Fail




Anti Patterns

   •    Evolution from SQL Anti Patterns (NoSQL:br May 2010)
   •    More than just RDBMS
   •    Large volumes of data
   •    Distribution
   •    Architecture
   •    Research on other tools
   •    Message Queues, DHT, Job Schedulers, NoSQL
   •    Indexing, Map/Reduce
   •    New revision since QConSP 2010: included Hierarchical
        Sharding, Embedded lists and Distributed Global Locking




RDBMS Anti Patterns
  Not everything fits in a relational database, single or distributed

   •    The eternal table-as-a-tree
   •    Dynamic table creation
   •    Table as cache, queue, log file
   •    Stoned Procedures
   •    Row Alignment
   •    Extreme JOINs
   •    Your schema must be printed on an A3 sheet.
   •    Your ORM issues full queries for dataset iterations
   •    Hierarchical Sharding
   •    Embedded lists
   •    Distributed global locking
   •    Throttle Control



The eternal tree
  Problem: Most threaded-discussion examples use a single
  table that holds all threads and answers, related to each
  other by an id. Usually the developer comes up with their
  own binary-tree variant to manage this mess.

  id - parent_id - author - text
  1 - 0 - gleicon - hello world
  2 - 1 - elvis - shout !

  Alternative: Document storage:
  { thread_id: 1, title: 'the meeting', author: 'gleicon', replies: [
       {
         author: 'elvis', text: 'shout', replies: [{...}]
       }
     ]
  }
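
A rough sketch of why the nested document is easier to work with than the id/parent_id table: the whole thread is one value, and walking it is a short recursion. The second-level reply and the helper names are made up for illustration.

```python
# Hypothetical thread document following the shape on the slide.
thread = {
    "thread_id": 1,
    "title": "the meeting",
    "author": "gleicon",
    "replies": [
        {
            "author": "elvis",
            "text": "shout",
            "replies": [
                # Invented nested reply, only to show a second level.
                {"author": "gleicon", "text": "louder", "replies": []},
            ],
        },
    ],
}


def count_replies(node):
    """Count every reply nested under a thread or under another reply."""
    return sum(1 + count_replies(child) for child in node.get("replies", []))


def flatten(node, depth=0):
    """Yield (depth, author, text) for each reply, in display order."""
    for child in node.get("replies", []):
        yield depth, child["author"], child["text"]
        yield from flatten(child, depth + 1)
```

No self-join is needed to render the thread: `flatten(thread)` already yields replies in the order they would be displayed.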

Dynamic table creation
  Problem: To avoid huge tables, one must come up with a
  "dynamic schema". For example, let's think about a document
  management company that is adding new facilities across the
  country. For each storage facility, a new table is created:

  item_id - row - column - stuff
  1 - 10 - 20 - cat food
  2 - 12 - 32 - trout

  Now you have to come up with "dynamic queries", which will
  probably query a "central storage" table and issue a huge join
  to check whether you have enough cat food across the country.

  Alternatives:
  - Document storage, modeling a facility as a document
  - Key/Value, modeling each facility as a SET
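
As a sketch of the first alternative, each facility becomes one document and the "enough cat food across the country" question becomes a loop over documents instead of a join over dynamically created tables. Facility names and the second facility's contents are invented for the example.

```python
from collections import Counter

# Hypothetical facility documents; one document replaces one per-facility table.
facilities = [
    {
        "facility": "north",
        "items": [
            {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food"},
            {"item_id": 2, "row": 12, "column": 32, "stuff": "trout"},
        ],
    },
    {
        "facility": "south",
        "items": [
            {"item_id": 1, "row": 3, "column": 7, "stuff": "cat food"},
        ],
    },
]


def stock_by_item(docs):
    """Count how many stored items of each kind exist across all facilities."""
    counts = Counter()
    for doc in docs:
        for item in doc["items"]:
            counts[item["stuff"]] += 1
    return counts
```

Adding a facility means inserting one more document; no schema change and no new query text are required.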

Table as cache
  Problem: Complex queries demand that a result be stored in a
  separate table so it can be queried quickly. Worse than views.


  Alternatives:

  - Really ?

  - Memcached

  - Redis + AOF + EXPIRE

  - De-normalization
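
The idea behind the Memcached and Redis + EXPIRE alternatives can be sketched with an in-memory stand-in: keep the expensive result under a key with a time to live, and recompute only after it expires. `ExpiringCache` and `cached_query` are hypothetical names, not part of any client library.

```python
import time


class ExpiringCache:
    """In-memory stand-in for a k/v store with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller recomputes it.
            del self._data[key]
            return None
        return value


def cached_query(cache, key, ttl_seconds, run_query):
    """Return the cached result, running the expensive query only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result, ttl_seconds)
    return result
```

The cache table, its purge job and its indexes disappear; the expiry time does the cleanup.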




Table as queue
  Problem: A table that holds messages waiting to be processed.
  Worse, they must be ordered by
  time of creation.

  Corollary: Job Scheduler table

  Alternatives:
  - RestMQ, Resque

  - Any other message broker

  - Redis (LISTS - LPUSH + RPOP)

  - Use the right tool
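
The LPUSH + RPOP pairing gives first-in, first-out order without an ORDER BY on creation time: producers push on the left, consumers pop from the right. The sketch below imitates those semantics with a local deque; `ListQueue` is a hypothetical name and this is not a Redis client.

```python
from collections import deque


class ListQueue:
    """Local imitation of a list used as a queue (push left, pop right)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        """Producer side: add the newest message at the left end."""
        self._items.appendleft(message)

    def rpop(self):
        """Consumer side: take the oldest message from the right end, or None."""
        return self._items.pop() if self._items else None
```

Ordering falls out of the data structure itself, so no timestamp column or sort is involved.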



Table as log file
  Problem: A table in which data gets written as a log file. From
  time to time it needs to be purged. Truncating this table once
  a day usually is the first task assigned to new DBAs.

  Alternative:

  - MongoDB capped collection

  - Redis, and RRD pattern

  - RIAK
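
What a capped collection buys you can be shown with a bounded buffer: once the size limit is reached, the oldest entries fall off by themselves and nobody has to truncate anything. This is a local sketch with a hypothetical `CappedLog` class, not MongoDB code.

```python
from collections import deque


class CappedLog:
    """Keeps only the most recent max_entries lines, discarding the oldest."""

    def __init__(self, max_entries):
        self._entries = deque(maxlen=max_entries)

    def append(self, line):
        self._entries.append(line)

    def tail(self):
        """Return the retained lines, oldest first."""
        return list(self._entries)
```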




Stoned procedures
  Problem: Stored procedures hold most of your application's
  logic. Also, some triggers are used to - well - trigger important
  data events.

  SPs and triggers have the magic property of vanishing from our
  memories and being impossible to keep versioned.

  Alternative:
  - Be careful not to use map/reduce as modern stoned
  procedures. It is unfit for real-time search/processing.

  - Use your preferred language for business logic, and leave
  event handling to pub/sub or message queues.
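
A minimal sketch of the second alternative: the application publishes an event after a write, and handlers written in ordinary, versioned code react to it, which is the role a trigger would otherwise play. `EventBus` and the topic name are assumptions for illustration; a real deployment would use a broker rather than an in-process dictionary.

```python
class EventBus:
    """Tiny in-process publish/subscribe registry (illustrative only)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        """Call every handler for the topic; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```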




Row Alignment
  Problem: Extra columns are created but not used, just in case.
  Usually they are named a1, a2, a3, a4 and called padding.

  There's good will behind that, especially when version 1 of the
  software needed an extra column in a 150M-row database
  and it took 2 days to run an ALTER TABLE. But that's no
  excuse.

  Alternative:

  - Quit being cheap. Quit feeling 'hacker' about padding.

  - Document-based databases such as MongoDB and CouchDB
  have no schema. New attributes are local to the document and
  can be added easily.


Extreme JOINs
  Problem: Business entities modeled as tables, with table inheritance
  (Product -> SubProduct_A). To find the complete data for a
  user plan, one must issue gigantic queries with lots of JOINs.

  Alternative:

  - Document storage, such as MongoDB,
    might help keep important
    information together.

  - De-normalization

  - Serialized objects




Your schema fits on an A3 sheet
  Problem: Huge data schemas are difficult to manage. Extreme
  specialization creates tables which converge to a key/value
  model. The normal form gets priority over common sense.

  Product_A                       Product_B
  id - desc                       id - desc

  Alternatives:

  - De-normalization
  - Another schema ?
  - Document store for flattening the model
  - Key/Value
  - See 'Extreme JOINs'



Your ORM ...
  Problem: Your ORM issues full queries for dataset iterations,
  your ORM maps and creates tables which mimic your
  classes, even the inheritance, and the performance is bad
  because the queries are huge, etc, etc.

  Alternative:

  - Apart from denormalization and good old common sense,
  remember that ORMs are trying to bridge two things with
  distinct impedance.

  - Nothing in the relational model maps cleanly to
  classes and objects. Not even the basic unit, which is the
  domain (set) of each column. Black Magic ?




Hierarchical Sharding
  Problem: Genius at work. Distinct databases inside an RDBMS
  ranging from A to Z; each database has tables for users
  starting with the proper letter. Each table has user data.
  Fictional example: e-mail account management

  > show databases;
  abcdefghijklmnopqrstuwxz
  > use a
  > show tables;
  ...alberto alice alma ... (and a lot more)

  There is no way to query anything in common for all users
  without application-side processing. In this particular case the
  sharding was uncalled for, as relational databases have all the
  tools to deal with this case of 'different clients and
  data'.

Embedded Lists
  Problem: As data complexity grows, one thinks that it's proper
  for the application to handle different data structures
  embedded in a single cell or row. The popular 'Let's use
  commas to separate it'. You may find distinct separators such as |, -, []
  and so on.

  > select group_field from that_email_sharded_database.user
  "a@email1.net, b@email1.net,c@email2.net"

  > select flags from stupid_email_admin where id = 2
  1|0|1|1|0|

  Either learn to model your data, or resort to modeling keys on K/V
  stores, or find any other way to represent flags; hopefully you are
  not programming in C on top of an RDBMS.
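
One way to get out of this, sketched under assumptions: parse the packed cells once and store real structures (a list of addresses, a set of named flags) instead. The flag names below are invented, since the slide does not say what each position means.

```python
# Hypothetical meaning of each flag position; the original does not define them.
FLAG_NAMES = ("active", "admin", "newsletter", "verified", "suspended")


def parse_addresses(cell):
    """Split a comma-separated cell into a clean list of addresses."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def parse_flags(cell, names=FLAG_NAMES):
    """Turn a '1|0|1|...' cell into the set of flag names that are switched on."""
    bits = [bit for bit in cell.split("|") if bit != ""]
    return {name for name, bit in zip(names, bits) if bit == "1"}
```

With named members, "is this user verified?" becomes a set membership test rather than string slicing, and a K/V store with native sets can hold the result directly.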


Distributed Global Locking
  Problem: Trying to emulate Java's synchronized in a distributed
  manner. As there is no primitive architectural block to do that,
  it sounds like the proper place to do it would be an RDBMS.

  It may start with a reference counter in a table and end up with
  this:

  > select COALESCE(GET_LOCK('my_lock',0 ),0 )

  Plain and simple, you might find it embedded in a magic class
  called DistributedSynchronize or ClusterSemaphore. Locks,
  transactions and reference counters (which may act as soft
  locks) don't belong in the database. While their use is
  questionable even in code, the fact of the matter is that you are
  doing it wrong if you are doing it like that.


Throttle Control
  Problem: To control and track access to a given resource, a
  sequence of statements is issued, varying from an
  update...select to a transaction block using a stored procedure:

  > select count(access) from throttle_ctl where ip_addr like ...
  > update .... or begin ... commit

  Apart from having IP addresses stored as strings, each request
  would have to go through this block. It gets worse if throttle
  control is mixed with a table-based access.

  Since the data is ephemeral, memcached (or any other k/v store)
  would work, as follows (after creating the entry and setting its
  expire time):

  if (add 'IPADDR:YYYY:MM:DD:HH', 1) < your_limit: do_stuff()
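
The one-liner above can be expanded into a small sketch. It uses a local dictionary as a stand-in for the k/v store, so the counter increment is not atomic across processes and old keys never expire here; a real store would handle both. `HourlyThrottle` is a hypothetical name, and "at most `limit` requests per hour" is an assumed reading of the limit.

```python
import time


class HourlyThrottle:
    """Counts requests per IP address per UTC hour using 'IP:YYYY:MM:DD:HH' keys."""

    def __init__(self, limit):
        self.limit = limit
        self._counters = {}  # stand-in for memcached or another k/v store

    def _key(self, ip_addr, now):
        # The hour bucket is part of the key, so a new hour starts from zero.
        return ip_addr + time.strftime(":%Y:%m:%d:%H", time.gmtime(now))

    def allow(self, ip_addr, now=None):
        """Record one request and report whether it is within the limit."""
        now = time.time() if now is None else now
        key = self._key(ip_addr, now)
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count
        return count <= self.limit
```

Because the hour is baked into the key, no reset job is needed; stale keys can simply be left to expire in the store.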

                                  tools
noSQL

   key-value     column family     document     graph


newSQL

noSQL                   newSQL
every choice,
                         a
                       trade-off

patterns




how-to




acid




                                  (
ACID nosql
                           exists



                                  )
A word about ops




Meaningful data
   • Email traffic accounting ~4500 msgs/sec in, ~2000 msgs
     out

   • 10 Gb SPAM/NOT SPAM tokenized data per day

   • 300 Gb logfile data/day

   • 80 Billion DNS queries per day

   • ~1k req/sec tcptable queries.

   • 0.8 Pb of data migrated over mixed quality network.
     Planned 3mo, executed 6mo, online, on production.

   • traffic from 400mb/s to 3.1Gb/s


Stuff to think about
  Consider whether the data you use isn't already de-normalized
  somewhere (cached).

  Most of the anti-patterns signal that there are architectural
  issues, not only database issues.

  Call it NoSQL, non-relational, or any other name, but assemble
  a toolbox to deal with different data types.

  Are you dependent on cache ? Does your application fail
  when there is no warm cache ? Does it just slow down ?

  Think about how you put data into the database and how you get
  it back (be it SQL or NoSQL). Migrations are painful.


Stuff to think about
  - Without operational requirements, it's a personal choice
  about data organization
  - Without time pressure, any migration is easy, regardless of data
  size.
  - Outside the production environment, events flow at a different
  pace
  - 'Normal Accidents' (Charles Perrow) - how resilient is your
  operation ? Are you ready to tackle your incidents ?
  - Everything breaks under scale (Benjamin Black)
  - "Picking up pennies in front of a steamroller" (Nassim N.
  Taleb)




Questions?


Thank you




           github.com/porcelli                 github.com/gleicon

           linkedin.com/in/alexandreporcelli   linkedin.com/in/gleicon

           @porcelli                           @gleicon

           porcelli.com.br                     zenmachine.wordpress.com

sexta-feira, 3 de junho de 2011

Weitere ähnliche Inhalte

Andere mochten auch

Instrumentos
InstrumentosInstrumentos
Instrumentos
elihappy
 
Good morning how_are_you
Good morning how_are_youGood morning how_are_you
Good morning how_are_you
seewolf64
 
Novetats setmana 1 al 7 juny
Novetats setmana 1 al 7 junyNovetats setmana 1 al 7 juny
Novetats setmana 1 al 7 juny
Purabiblioteca
 
Resumen de informática
Resumen de informáticaResumen de informática
Resumen de informática
machuca193
 
Myheath essential final final
Myheath essential final finalMyheath essential final final
Myheath essential final final
MHE2u
 
Públic o privat en sanitat a catalunya 1a part2011 bm86
Públic o privat en sanitat a catalunya 1a part2011 bm86Públic o privat en sanitat a catalunya 1a part2011 bm86
Públic o privat en sanitat a catalunya 1a part2011 bm86
Ramon Piñol Llovera
 
04_3danalyze를 이용한 렌더링 최적화
04_3danalyze를 이용한 렌더링 최적화04_3danalyze를 이용한 렌더링 최적화
04_3danalyze를 이용한 렌더링 최적화
noerror
 
Tutorial del blog (repaired)
Tutorial del blog (repaired)Tutorial del blog (repaired)
Tutorial del blog (repaired)
nayeli8a
 

Andere mochten auch (20)

Tec comemp972003
Tec comemp972003Tec comemp972003
Tec comemp972003
 
prof.ca
prof.caprof.ca
prof.ca
 
Instrumentos
InstrumentosInstrumentos
Instrumentos
 
Good morning how_are_you
Good morning how_are_youGood morning how_are_you
Good morning how_are_you
 
Novetats setmana 1 al 7 juny
Novetats setmana 1 al 7 junyNovetats setmana 1 al 7 juny
Novetats setmana 1 al 7 juny
 
Principais Tecnologias WEB
Principais Tecnologias WEBPrincipais Tecnologias WEB
Principais Tecnologias WEB
 
U turn
U turnU turn
U turn
 
Mimejoramigo
MimejoramigoMimejoramigo
Mimejoramigo
 
4º año Biología. Sistema digestivo
4º año Biología. Sistema digestivo4º año Biología. Sistema digestivo
4º año Biología. Sistema digestivo
 
Comparativo
ComparativoComparativo
Comparativo
 
Resumen de informática
Resumen de informáticaResumen de informática
Resumen de informática
 
3° año Biología. Las funciones de relación y control en las células
3° año Biología. Las funciones de relación y control en las células3° año Biología. Las funciones de relación y control en las células
3° año Biología. Las funciones de relación y control en las células
 
Myheath essential final final
Myheath essential final finalMyheath essential final final
Myheath essential final final
 
Uml orientado a objetos1
Uml orientado a objetos1Uml orientado a objetos1
Uml orientado a objetos1
 
Públic o privat en sanitat a catalunya 1a part2011 bm86
Públic o privat en sanitat a catalunya 1a part2011 bm86Públic o privat en sanitat a catalunya 1a part2011 bm86
Públic o privat en sanitat a catalunya 1a part2011 bm86
 
04_3danalyze를 이용한 렌더링 최적화
04_3danalyze를 이용한 렌더링 최적화04_3danalyze를 이용한 렌더링 최적화
04_3danalyze를 이용한 렌더링 최적화
 
Fraseshechas
FraseshechasFraseshechas
Fraseshechas
 
Tutorial del blog (repaired)
Tutorial del blog (repaired)Tutorial del blog (repaired)
Tutorial del blog (repaired)
 
Pourquoi espace-numerique
Pourquoi espace-numeriquePourquoi espace-numerique
Pourquoi espace-numerique
 
6° año Biología, Gen y Soc. TP Clonación y transgénicos
6° año Biología, Gen y Soc. TP Clonación y transgénicos6° año Biología, Gen y Soc. TP Clonación y transgénicos
6° año Biología, Gen y Soc. TP Clonación y transgénicos
 

Ähnlich wie A importância dos dados em sua arquitetura... uma visão muito além do SQL Server!

History of Search and Web Search Engines - Seminar on Web Search
History of Search and Web Search Engines - Seminar on Web SearchHistory of Search and Web Search Engines - Seminar on Web Search
History of Search and Web Search Engines - Seminar on Web Search
Beat Signer
 
Backbone.js - Michał Taberski (PRUG 2.0)
Backbone.js - Michał Taberski (PRUG 2.0)Backbone.js - Michał Taberski (PRUG 2.0)
Backbone.js - Michał Taberski (PRUG 2.0)
ecommerce poland expo
 
MySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP PerspectiveMySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP Perspective
Tim Juravich
 
Visualizations of Spatial and Social Data
Visualizations of Spatial and Social DataVisualizations of Spatial and Social Data
Visualizations of Spatial and Social Data
interface2011
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
jbellis
 

Ähnlich wie A importância dos dados em sua arquitetura... uma visão muito além do SQL Server! (20)

noSQL @ QCon SP
noSQL @ QCon SPnoSQL @ QCon SP
noSQL @ QCon SP
 
Pronk like you mean it
Pronk like you mean itPronk like you mean it
Pronk like you mean it
 
Theres a rabbit on my symfony
Theres a rabbit on my symfonyTheres a rabbit on my symfony
Theres a rabbit on my symfony
 
To be relational, or not to be relational? That's *not* the question!
To be relational, or not to be relational? That's *not* the question! To be relational, or not to be relational? That's *not* the question!
To be relational, or not to be relational? That's *not* the question!
 
The State of Web Typography
The State of Web TypographyThe State of Web Typography
The State of Web Typography
 
IAT334-Lec04-DesignIdeasPrinciples.pptx
IAT334-Lec04-DesignIdeasPrinciples.pptxIAT334-Lec04-DesignIdeasPrinciples.pptx
IAT334-Lec04-DesignIdeasPrinciples.pptx
 
BotNetBenchmark - A Benchmark for Social Network
BotNetBenchmark - A Benchmark for Social NetworkBotNetBenchmark - A Benchmark for Social Network
BotNetBenchmark - A Benchmark for Social Network
 
How Lanyrd uses Twitter
How Lanyrd uses TwitterHow Lanyrd uses Twitter
How Lanyrd uses Twitter
 
History of Search and Web Search Engines - Seminar on Web Search
History of Search and Web Search Engines - Seminar on Web SearchHistory of Search and Web Search Engines - Seminar on Web Search
History of Search and Web Search Engines - Seminar on Web Search
 
Persistence Smoothie
Persistence SmoothiePersistence Smoothie
Persistence Smoothie
 
Introducing Ext GWT 3.0
Introducing Ext GWT 3.0Introducing Ext GWT 3.0
Introducing Ext GWT 3.0
 
Backbone.js - Michał Taberski (PRUG 2.0)
Backbone.js - Michał Taberski (PRUG 2.0)Backbone.js - Michał Taberski (PRUG 2.0)
Backbone.js - Michał Taberski (PRUG 2.0)
 
Node js techtalksto
Node js techtalkstoNode js techtalksto
Node js techtalksto
 
Introduction to Accumulo
Introduction to AccumuloIntroduction to Accumulo
Introduction to Accumulo
 
MySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP PerspectiveMySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP Perspective
 
Visualizations of Spatial and Social Data
Visualizations of Spatial and Social DataVisualizations of Spatial and Social Data
Visualizations of Spatial and Social Data
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
 
Chapter4
Chapter4Chapter4
Chapter4
 
Web heresies
Web heresiesWeb heresies
Web heresies
 
Metadata for Visual Resources
Metadata for Visual ResourcesMetadata for Visual Resources
Metadata for Visual Resources
 

Mehr von Alexandre Porcelli

QConSP 2013 - Não confunda engenharia de software com lean startup
QConSP 2013 - Não confunda engenharia de software com lean startupQConSP 2013 - Não confunda engenharia de software com lean startup
QConSP 2013 - Não confunda engenharia de software com lean startup
Alexandre Porcelli
 
JUDCon São Paulo - Drools in a Nutshell
JUDCon São Paulo - Drools in a NutshellJUDCon São Paulo - Drools in a Nutshell
JUDCon São Paulo - Drools in a Nutshell
Alexandre Porcelli
 
noSQL e ORM, será que dá samba?
noSQL e ORM, será que dá samba?noSQL e ORM, será que dá samba?
noSQL e ORM, será que dá samba?
Alexandre Porcelli
 
noSQL - Uma nova escola de pensamento
noSQL - Uma nova escola de pensamentonoSQL - Uma nova escola de pensamento
noSQL - Uma nova escola de pensamento
Alexandre Porcelli
 
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em JavaJ1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
Alexandre Porcelli
 
ANTLR Conference - OpenSpotLight driven by ANTLR
ANTLR Conference - OpenSpotLight driven by ANTLRANTLR Conference - OpenSpotLight driven by ANTLR
ANTLR Conference - OpenSpotLight driven by ANTLR
Alexandre Porcelli
 

Mehr von Alexandre Porcelli (20)

Dawn of the citizen developer
Dawn of the citizen developerDawn of the citizen developer
Dawn of the citizen developer
 
Running rules and processes in the cloud
Running rules and processes in the cloudRunning rules and processes in the cloud
Running rules and processes in the cloud
 
Impulsione sua carreira contribuindo para projetos open source
Impulsione sua carreira contribuindo para projetos open sourceImpulsione sua carreira contribuindo para projetos open source
Impulsione sua carreira contribuindo para projetos open source
 
QConSP 2013 - Não confunda engenharia de software com lean startup
QConSP 2013 - Não confunda engenharia de software com lean startupQConSP 2013 - Não confunda engenharia de software com lean startup
QConSP 2013 - Não confunda engenharia de software com lean startup
 
JUDCon São Paulo - Drools in a Nutshell
JUDCon São Paulo - Drools in a NutshellJUDCon São Paulo - Drools in a Nutshell
JUDCon São Paulo - Drools in a Nutshell
 
NoSQL for the rest of us - a JBoss perspective over those hot tools and how y...
NoSQL for the rest of us - a JBoss perspective over those hot tools and how y...NoSQL for the rest of us - a JBoss perspective over those hot tools and how y...
NoSQL for the rest of us - a JBoss perspective over those hot tools and how y...
 
Armazenamento de Dados em Poucas Palavras ou Uma resposta definitiva para tod...
Armazenamento de Dados em Poucas Palavras ou Uma resposta definitiva para tod...Armazenamento de Dados em Poucas Palavras ou Uma resposta definitiva para tod...
Armazenamento de Dados em Poucas Palavras ou Uma resposta definitiva para tod...
 
DevinVale: SQL, noSQL ou newSQL - Onde armazenar meus dados?
DevinVale:  SQL, noSQL ou newSQL - Onde armazenar meus dados?DevinVale:  SQL, noSQL ou newSQL - Onde armazenar meus dados?
DevinVale: SQL, noSQL ou newSQL - Onde armazenar meus dados?
 
noSQL e ORM, será que dá samba?
noSQL e ORM, será que dá samba?noSQL e ORM, será que dá samba?
noSQL e ORM, será que dá samba?
 
noSQL - Uma nova escola de pensamento
noSQL - Uma nova escola de pensamentonoSQL - Uma nova escola de pensamento
noSQL - Uma nova escola de pensamento
 
noSQL @ MSTechDay São Paulo
noSQL @ MSTechDay São PaulonoSQL @ MSTechDay São Paulo
noSQL @ MSTechDay São Paulo
 
Integration & DSL
Integration & DSLIntegration & DSL
Integration & DSL
 
SQL, NoSQL ou NewSQL: Onde armazenar meus dados?
SQL, NoSQL ou NewSQL: Onde armazenar meus dados?SQL, NoSQL ou NewSQL: Onde armazenar meus dados?
SQL, NoSQL ou NewSQL: Onde armazenar meus dados?
 
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em JavaJ1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
J1Brasil: Persistência de Dados além do JPA, ou Como usar noSQL em Java
 
noSQL WTF?! - Citi2010
noSQL WTF?! - Citi2010noSQL WTF?! - Citi2010
noSQL WTF?! - Citi2010
 
noSQL além do buzz
noSQL além do buzznoSQL além do buzz
noSQL além do buzz
 
GraphDatabases @ TDC2010
GraphDatabases @ TDC2010GraphDatabases @ TDC2010
GraphDatabases @ TDC2010
 
Motor de Regras @ TDC2010
Motor de Regras @ TDC2010Motor de Regras @ TDC2010
Motor de Regras @ TDC2010
 
OpenSpotLight - Concepts
OpenSpotLight - ConceptsOpenSpotLight - Concepts
OpenSpotLight - Concepts
 
ANTLR Conference - OpenSpotLight driven by ANTLR
ANTLR Conference - OpenSpotLight driven by ANTLRANTLR Conference - OpenSpotLight driven by ANTLR
ANTLR Conference - OpenSpotLight driven by ANTLR
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategies for Landing an Oracle DBA Job as a Fresher

The importance of data in your architecture... a view far beyond SQL Server!

  • 1. The importance of data in your architecture… a view far beyond SQL Server. Alexandre Porcelli - @porcelli, Gleicon Moraes - @gleicon. Friday, June 3, 2011
  • 2. Alexandre Porcelli: Creator & Dictator; Writer; Organizer; Committer / Parser Developer; Core Developer / API Designer
  • 3. Gleicon Moraes: http://zenmachine.wordpress.com, http://github.com/gleicon, @gleicon
  • 5. there is a world beyond...
  • 6. or beyond...
  • 7. beyond even the...
  • 10. nosql
  • 11. a new school
  • 13. context
  • 15. lack of capital
  • 16. big data
  • 18. models
      • Hierarchical (IMS): late 1960s and 1970s
      • Directed graph (CODASYL): 1970s
      • Relational: 1970s and early 1980s
      • Entity-Relationship: 1970s
      • Extended Relational: 1980s
      • Semantic: late 1970s and 1980s
      • Object-oriented: late 1980s and early 1990s
      • Object-relational: late 1980s and early 1990s
      • Semi-structured (XML): late 1990s to late 2000s
      • The next big thing: ???
      ref: "What Goes Around Comes Around" by Michael Stonebraker and Joey Hellerstein
  • 19. next big thing?
  • 21. down with the relational database!
  • 22. down with the relational database... as a silver bullet!
  • 25. solve specific problems
  • 26. data structure
  • 28. model
  • 29. column family
  • 30. model (diagram): Keyspace > Column Family > row (key) > column, column, column, ...; Column = name, timestamp, value
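The nesting on the column-family slide can be sketched with plain Python dictionaries. This is only an illustration of the layout (keyspace, column family, row key, columns made of name, timestamp and value), not the API of any particular database. The function names and the "newest timestamp wins" rule are assumptions added for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: str
    timestamp: float


# keyspace -> column family -> row key -> column name -> Column
keyspace = {}


def insert(column_family, row_key, name, value, timestamp=None):
    """Store one column in a row; rows in the same family may have different columns."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(name)
    # Assumption for this sketch: the write with the newest timestamp wins.
    if current is None or ts >= current.timestamp:
        row[name] = Column(name, value, ts)


def get_row(column_family, row_key):
    """Return the columns of a row as a dict keyed by column name."""
    return keyspace.get(column_family, {}).get(row_key, {})
```

Each row carries only the columns it needs, which is the main difference from a fixed relational schema.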
  • 32. model
  • 33. graph
  • 34. overview
  • 38. Architectural Anti Patterns: Notes on Data Distribution and Handling Failures
  • 39. Fail
  • 40. Anti Patterns
      • Evolution from SQL Anti Patterns (NoSQL:br May 2010)
      • More than just RDBMS
      • Large volumes of data
      • Distribution
      • Architecture
      • Research on other tools
      • Message Queues, DHT, Job Schedulers, NoSQL
      • Indexing, Map/Reduce
      • New revision since QConSP 2010: included Hierarchical Sharding, Embedded lists and Distributed Global Locking
  • 41. RDBMS Anti Patterns
      Not all things fit in a relational database, single or distributed
      • The eternal table-as-a-tree
      • Dynamic table creation
      • Table as cache, queue, log file
      • Stoned Procedures
      • Row Alignment
      • Extreme JOINs
      • Your scheme must be printed on an A3 sheet.
      • Your ORM issues full queries for dataset iterations
      • Hierarchical Sharding
      • Embedded lists
      • Distributed global locking
      • Throttle Control
  • 42. The eternal tree
      Problem: Most threaded-discussion examples use a single table that holds all threads and answers, related to each other by an id. Usually the developer will come up with his own binary-tree version to manage this mess.
        id - parent_id - author - text
        1 - 0 - gleicon - hello world
        2 - 1 - elvis - shout !
      Alternative: Document storage:
        { thread_id: 1, title: 'the meeting', author: 'gleicon', replies: [ { author: 'elvis', text: 'shout', replies: [{...}] } ] }
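As a rough sketch of the document alternative, the thread can live in one nested structure and be walked recursively, with no parent_id joins. The helper names `add_reply` and `walk` are made up for this example, and the nested replies that the slide elides are simply left empty here.

```python
# The thread document from the slide; deeper replies are elided there, so they start empty.
thread = {
    "thread_id": 1,
    "title": "the meeting",
    "author": "gleicon",
    "replies": [
        {"author": "elvis", "text": "shout", "replies": []},
    ],
}


def add_reply(node, author, text):
    """Attach a reply directly under a thread or under another reply."""
    reply = {"author": author, "text": text, "replies": []}
    node.setdefault("replies", []).append(reply)
    return reply


def walk(node, depth=0):
    """Yield (depth, author, text) for every reply nested under node, depth first."""
    for reply in node.get("replies", []):
        yield depth, reply["author"], reply["text"]
        yield from walk(reply, depth + 1)
```

The whole discussion is read and written as one unit, which is what a document store would persist under a single key.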
  • 43. Dynamic table creation
      Problem: To avoid huge tables, one comes up with a "dynamic schema". For example, think of a document management company that is adding new storage facilities across the country. For each storage facility, a new table is created:
        item_id - row - column - stuff
        1 - 10 - 20 - cat food
        2 - 12 - 32 - trout
      Now you have to come up with "dynamic queries", which will probably query a "central storage" table and issue a huge join to check whether you have enough cat food across the country.
      Alternatives:
        - Document storage, modeling a facility as a document
        - Key/Value, modeling each facility as a SET
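A minimal sketch of the first alternative, assuming each facility is stored as one document under its own key. The facility names and the helper functions are hypothetical; a plain dict stands in for the document store.

```python
# One document per storage facility, keyed by a hypothetical facility id.
facilities = {
    "facility:north": {
        "items": [
            {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food"},
            {"item_id": 2, "row": 12, "column": 32, "stuff": "trout"},
        ]
    },
    "facility:south": {
        "items": [
            {"item_id": 1, "row": 3, "column": 7, "stuff": "cat food"},
        ]
    },
}


def facilities_with(store, stuff):
    """Return the keys of all facilities holding at least one matching item."""
    return sorted(
        key
        for key, doc in store.items()
        if any(item["stuff"] == stuff for item in doc["items"])
    )


def count_items(store, stuff):
    """Count matching items across every facility, with no per-facility table."""
    return sum(
        1 for doc in store.values() for item in doc["items"] if item["stuff"] == stuff
    )
```

Adding a facility means adding a document, not creating a table and rewriting the "dynamic queries".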
  • 44. Table as cache
      Problem: Complex queries demand that a result be stored in a separate table so it can be queried quickly. Worse than views.
      Alternatives:
        - Really ?
        - Memcached
        - Redis + AOF + EXPIRE
        - De-normalization
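The cache alternatives boil down to "store the result under a key with an expiry time". The sketch below uses an in-memory stand-in rather than a real Memcached or Redis client, so the class `ExpiringCache` and the function `cached_query` are illustrative names only. The injectable clock is there to make the expiry testable.

```python
import time


class ExpiringCache:
    """In-memory stand-in for a key/value store whose entries expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on read.
            del self._data[key]
            return None
        return value


def cached_query(cache, key, run_query, ttl_seconds=60):
    """Return the cached result, running the expensive query only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result, ttl_seconds)
    return result
```

The expensive query still lives in the database; only its result moves out of the "cache table".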
  • 45. Table as queue
      Problem: A table which holds messages to be completed. Worse, they must be ordered by time of creation. Corollary: the job scheduler table.
      Alternatives:
        - RestMQ, Resque
        - Any other message broker
        - Redis (LISTS - LPUSH + RPOP)
        - Use the right tool
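The LPUSH + RPOP pairing gives first-in, first-out ordering: producers push at the head of the list and consumers pop from the tail. The sketch below imitates those two operations with a `deque` so it runs without a Redis server. `ListQueue` is a made-up name, not a client class.

```python
from collections import deque


class ListQueue:
    """In-memory imitation of a Redis list used as a queue."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        """Push at the head and return the new length, as LPUSH does."""
        self._items.appendleft(message)
        return len(self._items)

    def rpop(self):
        """Pop from the tail; return None when the list is empty."""
        return self._items.pop() if self._items else None
```

Ordering by creation time comes for free from the list itself, with no ORDER BY over a growing table.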
  • 46. Table as log file
      Problem: A table in which data gets written as a log file. From time to time it needs to be purged. Truncating this table once a day is usually the first task assigned to new DBAs.
      Alternatives:
        - MongoDB capped collection
        - Redis, and RRD pattern
        - Riak
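The idea behind a capped collection (and the RRD pattern) is a fixed-size log that discards its oldest entries on its own, so nobody has to truncate it. A minimal in-memory sketch, assuming a cap expressed as a number of entries; `CappedLog` is a hypothetical name and not a MongoDB API.

```python
from collections import deque


class CappedLog:
    """Fixed-size log: once full, each append drops the oldest entry."""

    def __init__(self, max_entries):
        self._entries = deque(maxlen=max_entries)

    def append(self, entry):
        self._entries.append(entry)

    def entries(self):
        """Return the retained entries, oldest first."""
        return list(self._entries)
```

With the cap enforced at write time, the daily purge job disappears.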
  • 47. Stoned procedures
      Problem: Stored procedures hold most of your application's logic. Also, some triggers are used to - well - trigger important data events. SPs and triggers have the magic property of vanishing from our memories and being impossible to keep versioned.
      Alternatives:
        - Be careful not to use map/reduce as modern stoned procedures. It is unfit for real-time search/processing.
        - Use your preferred language for business stuff, and leave event handling to pub/sub or message queues.
  • 48. Row Alignment
      Problem: Extra columns are created but not used, just in case. Usually they are named a1, a2, a3, a4 and called padding. There's good will behind that, especially when version 1 of the software needed an extra column in a 150M-row database and it took 2 days to run an ALTER TABLE. But that's no excuse.
      Alternatives:
        - Quit being cheap. Quit feeling 'hacker' about padding.
        - Document-based databases such as MongoDB and CouchDB have no schema. New attributes are local to the document and can be added easily.
  • 49. Extreme JOINs
      Problem: Business stuff modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs.
      Alternatives:
        - Document storage such as MongoDB might help by keeping important information together.
        - De-normalization
        - Serialized objects
  • 50. Your scheme fits in an A3 sheet
      Problem: Huge data schemes are difficult to manage. Extreme specialization creates tables which converge to a key/value model. The normal form gets priority over common sense.
        Product_A: id - desc
        Product_B: id - desc
      Alternatives:
        - De-normalization
        - Another scheme ?
        - Document store for flattening the model
        - Key/Value
        - See 'Extreme JOINs'
  • 51. Your ORM ...
      Problem: Your ORM issues full queries for dataset iterations, your ORM maps and creates tables which mimic your classes, even the inheritance, and the performance is bad because the queries are huge, etc.
      Alternatives:
        - Apart from denormalization and good old common sense, remember that ORMs are trying to bridge two things with distinct impedance.
        - There is nothing in relational models which maps cleanly to classes and objects. Not even the basic unit, which is the domain (set) of each column. Black magic ?
  • 52. Hierarchical Sharding
      Problem: Genius at work. Distinct databases inside an RDBMS ranging from A to Z; each database has tables for users starting with the proper letter. Each table has user data.
      Fictional example: e-mail accounts management
        > show databases;
        a b c d e f g h i j k l m n o p q r s t u w x z
        > use a
        > show tables;
        ... alberto alice alma ... (and a lot more)
      There is no way to query anything in common for all users without application-side processing. In this particular case the sharding was uncalled for, as relational databases have all the tools to deal with this particular case of 'different clients and data'.
  • 53. Embedded Lists
      Problem: As data complexity grows, one thinks that it's proper for the application to handle different data structures embedded in a single cell or row. The popular 'Let's use commas to separate it'. You may find distinct separators such as |, -, [] and so on.
        > select group_field from that_email_sharded_database.user
        "a@email1.net, b@email1.net,c@email2.net"
        > select flags from stupid_email_admin where id = 2
        1|0|1|1|0|
      Either learn to model your data, or resort to modeling keys on K/V stores. Or find any other way to represent flags, as you are hopefully not programming in C over an RDBMS.
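One way to read "model keys on K/V stores" is to unpack each embedded cell into separate, named entries. The sketch below does that for the two cells shown on the slide. The flag names are invented for the example, since the slide does not say what each position means.

```python
def parse_embedded_list(cell, separator=","):
    """Split a separator-packed cell into clean items, ignoring empty trailing parts."""
    return [item.strip() for item in cell.split(separator) if item.strip()]


def flags_to_keys(prefix, cell, names, separator="|"):
    """Turn a packed flag string into one boolean entry per named key."""
    values = parse_embedded_list(cell, separator)
    return {f"{prefix}:{name}": value == "1" for name, value in zip(names, values)}
```

For instance, `flags_to_keys("admin:2", "1|0|1|1|0|", names)` produces one key per flag, so a single flag can be read or changed without parsing the whole string.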
  • 54. Distributed Global Locking
      Problem: Trying to emulate Java's synchronized in a distributed manner. As there is no primitive architectural block to do that, it sounds like the proper place to do it would be an RDBMS. It may start with a reference counter in a table and end up with this:
        > select COALESCE(GET_LOCK('my_lock',0 ),0 )
      Plain and simple, you might find it embedded in a magic class called DistributedSynchronize or ClusterSemaphore. Locks, transactions and reference counters (which may act as soft locks) don't belong in the database. While their use is questionable even in code, the fact of the matter is that if you are doing it like that, you are doing it wrong.
  • 55. Throttle Control
      Problem: To control and track access to a given resource, a sequence of statements is issued, varying from an update...select to a transaction block using a stored procedure:
        > select count(access) from throttle_ctl where ip_addr like ...
        > update .... or begin ... commit
      Apart from having IP addresses stored as strings, each request would have to check on this block. It gets worse if throttle control is mixed with table-based access. Since the data is ephemeral, using memcached (or any other k/v) would work as follows (after creating the entry and setting the expire time):
        if (add 'IPADDR:YYYY:MM:DD:HH', 1) < your_limit: do_stuff()
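The key/value approach above can be sketched as a counter per IP address and hour. A dict stands in for memcached here, so nothing expires. With a real store the key would be created with an expire time, as the slide says. `HourlyThrottle` and `allow` are hypothetical names, and allowing exactly `limit` requests per hour is a choice made for this example.

```python
from datetime import datetime, timezone


class HourlyThrottle:
    """Per-IP, per-hour request counter kept in a key/value mapping."""

    def __init__(self, limit):
        self.limit = limit
        self._counters = {}  # stand-in for the k/v store; keys never expire here

    @staticmethod
    def _key(ip_addr, now):
        # Same shape as the slide's key: IPADDR:YYYY:MM:DD:HH
        return f"{ip_addr}:{now.strftime('%Y:%m:%d:%H')}"

    def allow(self, ip_addr, now=None):
        """Count this request and report whether it is within the hourly limit."""
        if now is None:
            now = datetime.now(timezone.utc)
        key = self._key(ip_addr, now)
        count = self._counters.get(key, 0) + 1
        self._counters[key] = count
        return count <= self.limit
```

Because the hour is part of the key, the counter resets by itself when the clock rolls over, with no SELECT or UPDATE per request.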
  • 56. tools
  • 57. noSQL
  • 58. column family, key-value, document, graph
  • 59. column family, key-value, document, graph
  • 61. newSQL
  • 65. noSQL newSQL
  • 68. every choice is a renunciation
  • 69. patterns
  • 70. how-to
  • 72. acid
  • 73. (
  • 74. ACID NoSQL exists
  • 75. )
  • 76. A word about ops
  • 77. Meaningful data
      • Email traffic accounting: ~4500 msgs/sec in, ~2000 msgs out
      • 10 Gb SPAM/NOT SPAM tokenized data per day
      • 300 Gb logfile data/day
      • 80 billion DNS queries per day
      • ~1k req/sec tcptable queries
      • 0.8 Pb of data migrated over a mixed-quality network. Planned 3 months, executed in 6 months, online, in production.
      • Traffic from 400mb/s to 3.1Gb/s
  • 78. Stuff to think about
      Consider whether the data you use isn't already de-normalized somewhere (cached).
      Most of the anti-patterns signal that there are architectural issues, not only database issues.
      Call it NoSQL, non-relational, or any other name, but assemble a toolbox to deal with different data types.
      Are you dependent on cache ? Does your application fail when there is no warm cache ? Does it just slow down ?
      Think about how you put data into and get it back from the database (be it SQL or NoSQL). Migrations are painful.
  • 79. Stuff to think about
      - Without operational requirements, data organization is a personal choice
      - Without time pressure, any migration is easy, regardless of data size
      - Outside the production environment, events flow at a different pace
      - 'Normal Accidents' (Charles Perrow): how resilient is your operation ? Are you ready to tackle your incidents ?
      - Everything breaks under scale (Benjamin Black)
      - "Picking up pennies in front of a steamroller" (Nassim N. Taleb)
  • 81. Thank you. github.com/porcelli, github.com/gleicon, linkedin.com/in/alexandreporcelli, linkedin.com/in/gleicon, @porcelli, @gleicon, porcelli.com.br, zenmachine.wordpress.com