SlideShare ist ein Scribd-Unternehmen logo
1 von 48
Downloaden Sie, um offline zu lesen
Databases




Elena Punskaya, op205@cam.ac.uk

                                  1
Big Data
•   Facebook stores over 100 petabytes of media (photos and
    videos) uploaded by its 845 million users
•   There are 762 billion objects stored in Amazon S3 that
    processes over 500,000 requests per second for these objects
    at peak times




          Bryce Durbin, Techcrunch              aws.typepad.com Amazon


                                                                     © 2012 Elena Punskaya
                                                                                            2
                                                Cambridge University Engineering Department

                                                                                                2
Big Data
•   Storing large amounts of data requires managing complexity:
    - mapping real world to data
    - providing concurrent access to creating, reading and changing of the data
    - providing distributed access and storage of the data

•   Database Management Systems decouples business logic of
    applications working with data from the details of physical
    storage and transaction (operations on data) management
•   Any non-trivial system needs to
    store its application data:
    - user/password
    - credit cards
    - product information
    - health records ...

•   It is possible to store all data
    directly as files but a typical
    filesystem isn’t build for
    transaction management and
    high performance                                     Cambridge High Performance Computing Cluster Darwin

                                                                                  © 2012 Elena Punskaya
                                                                                                         3
                                                             Cambridge University Engineering Department

                                                                                                               3
Database Systems I                                                                    1


                               Transaction Processing
                               Transaction Processing
•   Databases Databases common component of distributeddistributed systems.
                are a are a common component of many many sys-
    They storetems. They store records for a large number of distinct entities
                records for a large number of distinct entities and will
    typically support a small setsmall operations to access and and manipulate
               and will typically support a of set of operations to access
    those entities. These operations can be assumedto be atomic i.e.
               manipulate those entities. These operations can be assumed
                                                                             to
               be atomic i.e. they cannot be interrupted.
    they cannot be interrupted.
             External clients execute transactions which are sequences of op-
• External clients execute transactions which areto
             erations applied to one or more database entities designed sequences of
  operations achieve a single logical affect. more database entities designed to
              applied to one or
    achieve a single logical affect.
                                                     Recovery Log

                         Client A

                                          TA
                                                      Transaction
                                                                                        Database
                                                       Manager
                         Client B         TB



                                      Transactions                  Atomic Operations
•   The transaction manager ensures that transactions appear atomic
    to clients. Client receives an acknowledgement atomic
                The transaction manager ensures that transactions appear of every successful
    transaction. clients. Client receives an acknowledgement of every successful
                to
                 transaction.                                                                                     © 2012 Elena Punskaya
                                                                                                                                         4
                                                                                             Cambridge University Engineering Department

                                                                                                                                             4
Example: Bank Transfer
•   Each account is represented by a different database object,
    which guarantees that each operation is atomic

                   class Account { // link to required account records
                       DBaseAccessInfo dbinfo;
                   public:
                   // Constructor - open an account
                       account(string account_name);
                   // Atomic operations
                       void debit(float amount);
                       void credit(float amount);
                       float read_balance();
                   };



                   // A typical transaction would be
                   void transfer(account& A, account& B, float amount)
                   {
                       float balance = A.read_balance();
                       if (balance >= amount) {
                           A.debit(amount);
                           B.credit(amount);
                       }
                   }




•   A key issue is what happens if there is a failure part-way
    through the transaction?
                                                                                 © 2012 Elena Punskaya
                                                                                                        5
                                                            Cambridge University Engineering Department

                                                                                                            5
System Crash
•   What happens if the system crashes in the middle of
    transaction?

                 // a typical transaction
                 void transfer(account& A, account& B, float amount) {
                     float balance = A.read_balance();
                     if(balance >= amount) {
                         A.debit(amount);
                         <-----------------------------CRASH!




•   Account A will have had its money debited, but it will never
    appear in account B! – invalid state
•   The transaction manager (or any transaction processing
    system) must have a means of recovering from errors, and
    always leaving the system in a valid state
•   Need to ensure that Credit/Debit is ATOMIC, i.e. can only be
    preformed as a WHOLE not in parts



                                                                                   © 2012 Elena Punskaya
                                                                                                          6
                                                              Cambridge University Engineering Department

                                                                                                              6
ACID
•   A transaction my fail in many different ways (e.g. two clients
    try to access the same entity at the same time, temporary
    network failure, software fault, disk crash, etc). The
    transaction processor tries to ensure that transactions have
    the following properties
•   Atomicity
    - Either all or none of the transaction’s operations are performed

•   Consistency
    - Transactions transform the system from one consistent state to another

•   Isolation
    - An incomplete transaction cannot reveal its result to other transactions before it is
      complete

•   Durability
    - Once the transaction is committed, the system must guarantee that the results of its
      operations will persist, even if there are subsequent system failures




                                                                                  © 2012 Elena Punskaya
                                                                                                         7
                                                             Cambridge University Engineering Department

                                                                                                             7
ems I                                                            5


                 Recovery                  Recovery
       In order to maintain the ACID properties, a transaction
        •
       processor must be able to recover from errors by restoring
o maintain the ACID properties, a transaction processor
       the system to a consistent state.
ble to recover from errors by restoring the system to a
     • To achieve this, transactions are modelled on the following
 state.
            state machine
e this, transactions are modelled on the following state
                                              // A typical transaction with commit
                                                    void transfer(account& A, account& B, float amount)
                                                    {
                                                        try {
                                                             // Record transaction start
                                                             int id = BeginTransaction();
                                                   Transaction balance = A.read_balance();
                                                             float

                                                   processor might >= amount) {
                                                             if (balance
                                                                 A.debit(amount); B.credit(amount);
                                          ←        invalidate this
                                                             }
                                                   transaction success, so commit (finish)
                                                             // (see
                                                             Commit(id);
                                                   last slide)
                                                        }
                                                        catch(..) {
                                                             // transaction failed, recover/revert
                                                             Abort(id);
                                                        }
 the transfer transaction                           }


sfer(account& A, account& B, float amount)
                                                                                          © 2012 Elena Punskaya
                                                                                                                 8
                                                                     Cambridge University Engineering Department

                                                                                                                     8
Database Systems I                                                      9



                                     Concurrency
                                      Concurrency
•   In practice, a database transaction processor will be receiving
              In practice, a database transaction processor will be receiving a
    a stream of transaction requests, and will need to execute
              stream of transaction requests, and will need to execute transac-
    transactions in parallel in order to provide acceptable
    response tions in parallel in order to provide acceptable response times.
               times.
•   When two transactions reference the same account,
              When two transactions reference the same account, uncontrolled
    uncontrolled interleaving of operations can produce an
              interleaving of operations can produce an incorrect result. There
    incorrect are three classes of concurrency problem:
               result. There are three classes of concurrency
    problem
                • The uncommitted dependency problem

                      Time Transaction 1 Transaction 2
                       t1        –         A.write()
                       t2    A.read()          –
                       t3        –          abort()

            In this case, transaction 1 reads an updated account value, but
• In this case, transaction 1 reads an updated account value,
            transaction 2 aborts undoing the effect of the update. Transac-
  but transaction 2 aborts undoing the effect of the update.
            tion 1 is then left holding an incorrect account value.
  Transaction 1 is then left holding an incorrect account value
                                                                                   © 2012 Elena Punskaya
                                                                                                          9
                                                              Cambridge University Engineering Department
                Note: A.read() indicates any operation which reads a value from ac-                           9
• The lost update problem

10                                            Concurrency
        Time Transaction 1 Transaction 2 Software Engineering and Design
                           Engineering Part IIA: 3F6 -
          t1        A.read()                –
     • The lost update problem
          t2            –              A.read()
          t3        A.write()               –          • The change made to account
        Time Transaction 1 Transaction 2
          t4            –             A.write()          A at t3 by transaction 1 is lost
          t1        A.read()                –            because it is overwritten at
       In t2 case, the change made to account A at t3 by t4 by transaction 2
          this          –              A.read()          time transac-
          t         A.write()               –
       tion3 1 is lost because it is overwritten at time t4 by transac-
          t
       tion4 2.         –             A.write()

       In this case, the change made to account A at t3 by transac-
       The is lost because it is overwritten at time t4 by transac-
     • tion 1inconsistent analysis problem
       tion 2.
                                                    • Transaction 2 updates
        Time Transaction 1 Transaction 2
                                                      account A after transaction 1
          t1       A.read()           –
                                                      has read its value
     • The inconsistent analysis problem
          t2           –           A.read()
                                                    • Hence, transaction 1 is left
          t3           –           A.write()
        Time Transaction 1 Transaction 2
          t4           –           commit()           holding an incorrect value for
          t1       A.read()           –               account A
       In t2 case, transaction 2A.read() account A after transac-
           this        –            updates
       tion3 1 has read its value. Hence, transaction 1 is left holding University Engineering Department 10
          t            –           A.write()                     Cambridge
                                                                                   © 2012 Elena Punskaya

       an tincorrect value for account A.
            4          –           commit()                                                                    10
Managing Concurrency
•   The problems discussed can be managed by applying a
    Pessimistic or Optimistic concurrency control
•   Pessimistic
    - When a transaction wishes to access an account it first secures a lock on that account,
      when it has finished it releases the lock. If a lock is already taken, the transaction must
      wait until it is released.
    - Locking could be on the whole table or a single row and could be declared at different
      levels of exclusivity (e.g. no one else can access data or some access is allowed)
    - Could cause deadlocks, e.g. Tx1 and Tx2 require two resources R1 and R2 to proceed:
       ‣ T1 holds R1 and is waiting for R2
       ‣ T2 holds R2 and is waiting for R1
    - Useful when there is a lot of data that is often updated by many users

•   Optimistic
    - Allows uncontrolled access to accounts, and then simply abort any transactions which
      might have suffered a conflict
    - Implemented by creating a new copy of the data that maybe be updated and when the
      update is completed checks if the master copy hasn’t changed in meantime
       ‣ if changed – aborted
       ‣ if not – complete
    - Useful when most operations are reading data and changes occur rarely

                                                                                   © 2012 Elena Punskaya
                                                                                                          11
                                                              Cambridge University Engineering Department

                                                                                                               11
Relational Databases
•   By late 1960s, the “Software Crisis” was
    already declared and data storage wasn’t
    doing much better

    Computer calculations cost hundreds of dollars a minute, so great
    human effort was spent to make programs as efficient as possible
    before they were run. Early databases used either a rigid hierarchical
    structure or a complex navigational plan of pointers to the physical
    locations of the data on magnetic tapes. Teams of programmers were
    needed to express queries to extract meaningful information. While such
    databases could be efficient in handling the specific data and queries
    they were designed for, they were absolutely inflexible. New types of                 Edgar 'Ted' Codd, 1923-2003
    queries required complex reprogramming, and adding new types of data                  image © IBM

    forced a total redesign of the database itself.
                                                              IBM Research News,
                 www.research.ibm.com/resources/news/20030423_edgarpassaway.shtml



•   In 1970, Edgar Codd, an English mathematician working for
    IBM, published a paper “"A Relational Model of Data for Large
    Shared Data Banks", it started with the following words:
    “Future users of large data banks must be protected from
    having to know how the data is organised in the machine...”
                                                                                             © 2012 Elena Punskaya
                                                                                                                    12
                                                                        Cambridge University Engineering Department

                                                                                                                         12
Relational Databases
•   Codd suggested to move away from hierarchical or
    navigational structure of early databases to simple tables
    with rows and columns
•   Based on Relational Algebra, this approach allowed to greatly
    simplify database queries (ability to access and analyse data)
•   Many relational database management systems (RDBMS) are
    accessed using SQL (Structured Query Language)
    - SQL is defined by industry standards and has been developed over many revisions from
      SQL-87 to SQL 2008

•   There are many free and commercial databases available:
    - Free: PostgreSQL, MySQL, SQLite...
    - Commercial: Oracle, DB2, SQL Server...

•   SQLite is the easiest database to start using as it requires no
    setup, and is available on the teaching system.
    - Type: sqlite3 <db-name>
    - Then enter SQL commands followed by a ‘;’. The database will be stored in a file called
      <db-name> which will be created if it does not already exist.


                                                                                 © 2012 Elena Punskaya
                                                                                                        13
                                                            Cambridge University Engineering Department

                                                                                                             13
ble. A relation contains a set of tuples (rows).
        2               2
                 relation The Relational IIA:Engineering andEngineering and Design
                                  Engineering Part IIA: 3F6 - Software 3F6 - Software Design
                                                   Engineering Part
                                                                     Model
                                                                  course
                 scheme Title The relationalLeader                         model Lectures
                        The relational model
• The relational model is related to set theory. A relation is a
                 t1           RISC Processors                            Sanchez                 8
        Therelation contains a settheory. A relation is a ta- is 34ta-
  table. A    relationalThe QAM related to is related tuples (rows).
                 t2       model is for model set of to set theory. A relation a
                              relational
                                            modems                       Sanchez
        ble. A t        ble. A relation contains a set of tuples (rows).
                 relation contains a set of to Mainframes Belford
                  3           Introduction tuples (rows). course
                          relation                                                              20
          relation                                   course
                 t4       scheme refresh LCDs
                              Fast Title                                 Richard
                                                                            Leader               1
                                                                                             Lectures
          scheme Title                                       Leader              Lectures
                 t5       t1 t5 [T itle] Processors
                                     RISC                                t [Leader] t5[Lectures]
                                                                            Sanchez              8
          t1         RISC Processors                         Sanchez 5                8
                          t2
                 t6 QAM for modems for[T itle, Leader] Sanchez
                                     QAM t6 modems                                              34
          t   2                                              Sanchez                 34
                           t3        Introduction to Mainframes Belford            20
             t3   The Introduction theMainframes Belfordby the scheme, which is a
                                     to
                       meaning of Fastdata is LCDs                         20
                           t4              refresh described        Richard         1
             t4        Fast refresh LCDs
                           t5        t5[T itle]
                                                         Richard t [Leader] t [Lectures]
                                                                            1
                  set of column names. Column names are known as attributes.
                                                                     5           5
             t5        t5[Ttitle]                 t6[T itle, Leader] t5 [Lectures]
                                                         t5[Leader]
                            6
             t6   course scheme =6[T itle, Leader] Lectures)
                                t (Title, Leader,
                       The meaning of the data is described by the scheme, which is a
   - The meaning of theof thecolumn names. by the scheme, which is as attributes.
          The meaning set of is described Column names are known a set of column names.
                        data data is described by the scheme, which is a
               There is no ordering or grouping of attributes. The table is a
     Column names are names. as attributes. are known as attributes.
          set of column known Column names
                 relationcourse this scheme. A relationLectures) scheme R is written
                          over scheme = (Title, Leader, r over a
   - There courseas r(R). = (Title, Leader,attributes. TheD. So: a relation over this scheme.
                  scheme Each column Lectures)
           is no ordering or grouping ofhas a domain, table is
     A relation r over a scheme R is writtenor grouping of attributes.has a table is a D.
                         There is no ordering as r(R). Each column The domain,
            ThereDTitle ordering overgrouping of attributes. r The a scheme a is written
                    is no = strings, this scheme. A relation over table is R
                           relation or      DLectures = Z+
            relation over thisr(R). Each column hasover a scheme R is written
                           as scheme. A relation r a domain, D. So:
            as r(R). Each elementhas a domain, D.tiSo: D1 × D2 × · · · × Dn
                   So each column ti[j] ∈ Dj and ∈ +
                           DTitle = strings,     DLectures = Z
            DTitle For example, the schemeZ+ y) with Dx = Dy = R, the domain
                   = strings,      DLectures =
                           So each element ti[j] (x,Dj and ti ∈ D1 × D2 × · · · × Dn
                                                 ∈
            So each  the tuples       D and ti ∈ D1 × D2     dimensional vectors. 2012 Elena Punskaya
                                                                  ×D
                                                                                       ©
                  of element t [j]is∈the domain of all two× · · · Cambridge University Engineering Department 14
                          For example,j the scheme (x, y) with Dx = Dn = R, the domain
                              i
                                                                       y                                           14
DTitle = strings,         DLectures = Z+

       So each element tThe Dj and ti ∈ D1 × D2 × · · · × Dn
                        i [j] ∈ Relational Model

       For example, the scheme (x, y) with Dx = Dy = R, the domain
       of the tuples is the domain of all two dimensional vectors.
       SQL:                            domain↓                 Constraint↓
       CREATE TABLE course (Title text, Leader text, Lectures int, CHECK(Lectures > 0))
       INSERT INTO course VALUES ("RISC Processors", "Sanchez", 10)
       UPDATE course SET Lectures=8 WHERE Leader="Sanchez"
       DELETE FROM course WHERE Lectures=8 AND Leader="Sanchez"
       DROP TABLE course

        SQL allows domain of tuples: Dt ⊆ D1 × D2 × · · · × Dn.

•   Built on principles of Relational Algebra
    - Projection, Selection, Union, Intersection, Subtraction, Join




                                                                                   © 2012 Elena Punskaya
                                                                                                          15
                                                              Cambridge University Engineering Department

                                                                                                               15
Relational algebra: Projection Π Π
       Relational algebra: Projection

The projection operator, Π, removes columns by listing the ones
to be retained. The operator is written as:

Πcolumn1,column2,. . . (relation).
                                             Leader             Lectures
An example of applying projection is:        Sanchez            8
                 ΠLeader,Lectures (course) = Sanchez            34
                                             Belford            20
                                             Richard            1

Consider a relation, r(R) where R=(x,y,z) and x, y, z ∈ R. Each
row represents a 3D vector. The relation Πx,y(r) contains the
projection of the vectors onto the x, y plane.

In SQL the SELECT statement performs all of the primitive
relational algebra funcionality. The selection above is rendered
                                                               © 2012 Elena Punskaya
                                          Cambridge University Engineering Department
                                                                                      16


as:                                                                                        16
Consider a relation, r(R) where R=(x,y,z) and x, y, z ∈ R. Each
row represents a 3D vector. The relation Πx,y(r) contains the
projectionRelational algebra: Projection Π
           of the vectors onto the x, y plane.

In SQL the SELECT statement performs all of the primitive
relational algebra funcionality. The selection above is rendered
as:

SELECT Leader,Lectures FROM course

The general form being:

SELECT Col1[, Col2, [· · · ]] FROM table

Note that SQL is not entirely relational and the expression:
SELECT Leader FROM course
has duplicate rows. To remove duplicates, use:
SELECT DISTINCT Leader FROM course
The there is a shorthand for the identity projection:
SELECT * FROM table
                                                                   © 2012 Elena Punskaya
                                                                                          17
                                              Cambridge University Engineering Department

                                                                                               17
4                      Engineering Part IIA: 3F6 - Software Engineering and Design


     Relational algebra: Selection σ σ
     Relational algebra: Selection

The selection operator accepts a predicate, Θ and a relation.
Rows matching the predicate are retained:

                            Title           Leader Lectures
σLeader=”Sanchez”(course) = RISC Processors Sanchez    8
                            QAM for modems Sanchez    34

The general form of the resulting relation can be written in set
builder notation

σΘ(r) = {t|t ∈ r, Θ(t)}

That is, the result consists of all tuples t such that each tuple is
both in the relation r and for which the predicate applied to the
tuple, i.e. Θ(t), is true.

In SQL, selection is also performed with the select statement with
the predicate being specified by the WHERE clause:

SELECT * FROM course WHERE Leader=”Sanchez”

Predicates can contain expressions involving any or all of the
                                                                   © 2012 Elena Punskaya
rows. SQL has more or less the same set of numeric operators as
                                              Cambridge University Engineering Department
                                                                                          18


C and also AND, OR, NOT, BETWEEN:                                                              18
the predicate being specified by the WHERE clause:
         Relational algebra: Selection σ
SELECT * FROM course WHERE Leader=”Sanchez”

Predicates can contain expressions involving any or all of the
rows. SQL has more or less the same set of numeric operators as
C and also AND, OR, NOT, BETWEEN:
SELECT * FROM course WHERE Lectures BETWEEN 2 AND 10
and IN: WHERE Leader IN ("Belford", "Richard")
Projection and selection can be readily composed, so in general:

ΠS (σΘ(r)) translates to SELECT S FROM r WHERE Θ




                                                               © 2012 Elena Punskaya
                                                                                      19
                                          Cambridge University Engineering Department

                                                                                           19
In SQL, union intersection and subtraction behave much more
                  Union, intersection, subtraction
                                           like set theory than relational algebra. For these operations it is
                                           the order of the attributes not the names of the attributes which
                                           have significance.
•   In SQL, union intersection and subtraction behave much more
    like set theory than relational algebra. For theseofoperations If there
                                   Set union, ∪ aggregates the rows two sets together. it
                                   are two relations, r(R) and s(R), then the union, r ∪ s
    is the order of the attributes not the names of the attributes can be
                                   computed:
    which have significance.
                                           SELECT * FROM r UNION SELECT * FROM s
•   Set union, ∪ aggregates the
                                           Likewise, intersection can be computed using:
    rows of two sets together.
    If there are two relations,            SELECT * FROM r INTERSECT SELECT * FROM s
    r(R) and s(R), then the                Set differencing is either MINUS or EXCEPT depending on the
    union, r ∪ s can be                    database.
    computed:                                                     r             s

                                                                     r−s


                                           SELECT * FROM r EXCEPT SELECT * FROM s

                                           Since ordering, not naming matters, with the schema R=(a,b),
                                           S=(b,a) and the tables r(R), s(S):

                                             r       s
                                           a b      b a                         a b
                                                                       r-s=
                                           1 2      3 5                         3 4
                                           3 4      1 2                            © 2012 Elena Punskaya
                                                                                                          20
                                                              Cambridge University Engineering Department

                                                                                                               20
information.
              students Join / cartesian product ×
                                              labs
        Student Supervisor Lab Demonstrator
• The cartesian product is the Cook primitive operator which
        Gibson Sanchez                 3F27        only
  combines two tables3F89 different schemes. Joining two
        Murphy Belford                  with Libby
  relations, a Goldstein Part IIA: 3F6 - Software Engineering and Design
     6   Libby       × b generates a Margo relation with every row in in
                          Engineering 4F185       new
  a paired with every row in b.Ray
         Cook        Sanchez           3F34         Joining is very useful for
  extracting related information. ×
            Join / cartesian product
       The table students × labs is on the next page. Note that the                                                               7
• Joining students and labs:
                                                                    Database Systems II


     The cartesian get augmented with the table name tocom- ambiguity.
       attributes product is the only primitive operator which avoid
     bines two tables with different schemes. Joining two relations,
       The table name may be omitted if it is not ambiguous. SQL: students × labs
     a × b generates a new relation with every row in in a paired
                                                                  students.Student        students.Supervisor labs.Lab labs.Demonstrator
     with every row in b. Joining is very useful for extracting related
       SELECT * FROM students, labs                                    Gibson                  Sanchez          3F27         Cook
     information.                                                      Gibson                  Sanchez          3F89         Libby
                                                                   Gibson                      Sanchez         4F185         Margo
      Find all students of “Sanchez” who are demonstrating:
            students                 labs                          Gibson                      Sanchez          3F34          Ray
                                                                  Murphy                        Belford         3F27         Cook
      Student Supervisor Lab Demonstrator                         Murphy                        Belford         3F89         Libby
      Π         (σ
      Gibson Sanchez         3F27        Cook
         Student Student=Demonstrator∧Supervisor=“Sanchez”        (students × labs))
                                                                  Murphy                        Belford        4F185         Margo
      Murphy Belford         3F89        Libby                    Murphy                        Belford         3F34          Ray
                                                                   Libby                       Goldstein        3F27         Cook
       Libby Goldstein 4F185             Margo                     Libby                       Goldstein        3F89         Libby
      SELECT Student FROM students, labs
       Cook       Sanchez    3F34         Ray                      Libby                       Goldstein       4F185         Margo
                  WHERE Student=Demonstrator AND                   Libby                       Goldstein        3F34          Ray
                                                                   Cook                        Sanchez          3F27         Cook
     The table students × labs is on the next page. Note that the
                        Supervisor="Sanchez"                       Cook                        Sanchez          3F89         Libby
     attributes get augmented with the table name to avoid ambiguity.
                                                                   Cook                        Sanchez         4F185         Margo
     The table name may be omitted if it is not ambiguous. SQL: Cook                           Sanchez          3F34          Ray


     SELECTresult is students, Selection
      The * FROM Cook . labs            is often composed with joining,
                                                                                  © 2012 Elena Punskaya
      so it is given the non primitive operator, the theta join:
                                                             Cambridge University Engineering Department
                                                                                                         21
     Find all students of “Sanchez” who are demonstrating:
                                                                                                                                           21
students                      labs
Database Systems II                                       7         Student Supervisor        Lab     Demonstrator
                                           Join / cartesian product ×
                                                                    Gibson Sanchez            3F27        Cook
                                                                    Murphy Belford            3F89        Libby
                         students × labs
 students.Student students.Supervisor labs.Lab labs.Demonstrator
                                                                     Libby Goldstein         4F185        Margo
      Gibson           Sanchez          3F27         Cook            Cook     Sanchez         3F34         Ray
      Gibson           Sanchez          3F89         Libby
      Gibson           Sanchez         4F185         Margo
      Gibson           Sanchez          3F34          Ray
                                                                   The table students × labs is on the next page. Note that the
     Murphy             Belford         3F27         Cook          attributes get augmented with the table name to avoid ambiguity.
     Murphy             Belford         3F89         Libby
     Murphy             Belford        4F185         Margo
                                                                   The table name may be omitted if it is not ambiguous. SQL:
     Murphy             Belford         3F34          Ray
      Libby            Goldstein        3F27         Cook          SELECT * FROM students, labs
      Libby            Goldstein        3F89         Libby
      Libby            Goldstein       4F185         Margo
      Libby            Goldstein        3F34          Ray          Find all students of “Sanchez” who are demonstrating:
      Cook             Sanchez          3F27         Cook
      Cook             Sanchez          3F89         Libby         ΠStudent(σStudent=Demonstrator∧Supervisor=“Sanchez”(students × labs))
      Cook             Sanchez         4F185         Margo
      Cook             Sanchez          3F34          Ray
                                                                   SELECT Student FROM students, labs
                                                                               WHERE Student=Demonstrator AND
                                                                                     Supervisor="Sanchez"


                                                                   The result is Cook . Selection is often composed with joining,
                                                                   so it is given the non primitive operator, the theta join:

                                                                   a Θ b ≡ σΘ(a × b).


                                                                                                                © 2012 Elena Punskaya
                                                                                                                                       22
                                                                                           Cambridge University Engineering Department

                                                                                                                                            22
Database Systems II                              • Perform a join.   9

                                                •
                      Natural Join              Perform selection so that attributes with the same name must
                                               Natural Join
                                                  be equal.
                                              • Perform projection to remove duplicated attributes.
   • A ‘natural join’ is a join followed by some selection and
A ‘natural join’ is a join followed by some selection and projec- attribute ambiguities.
                                             Note that there are no
tion: projection:
  • Perform a join. join
       - Perform a                             If attributes with the same name are semantically the same, then
                                               the natural join is usually the correct kind of join to use. In ad-
        - Perform selection attributes with the same name must name must be equal
   • Perform selection so that  so that attributes with the same
                                               dition to the ‘labs’ table, we also have a table listing lab sessions:
        - Perform projection to remove duplicated attributes
     be equal.
                                                                sessions
   •• If attributes remove duplicated attributes.
     Perform projection to with the same name are semantically the same,
                                                  Lab                 Title
Note that there are no attribute ambiguities.usually the correct kind of join to use.
       then the natural join is 3F27                         Mainframe filesystems
       In addition to the ‘labs’ 3F27             table, we also have a table listing lab
                                                              Filesystem security
If attributes with the same name are semantically the same, then
       sessions:                                 3F89        Large vehicle control
the natural join is usually the correct kind of join to use. In ad- finance systems
                                                4F185 Networks for
dition to the ‘labs’ table, we also have a table 3F34 lab sessions:
                                                 listing Magnetic storage forensics

                sessions                       The natural join matches up the shared attributes
  Lab                 Title
                                                                 Demonstrator       Lab             Title
  3F27        Mainframe filesystems                                  Cook            3F27     Filesystem security
  3F27         Filesystem security                                  Cook            3F27    Mainframe filesystems
  3F89        Large vehicle control            sessions  labs =
                                                                    Libby           3F89    Large vehicle control
 4F185     Networks for finance systems                             Margo           4F185 Networks for finance systems
  3F34      Magnetic storage forensics                               Ray            3F34 Magnetic storage forensics

The natural join matches up the shared attributes

                      Demonstrator Lab                    Title                                   © 2012 Elena Punskaya
                                                                                                                         23
                                                                             Cambridge University Engineering Department
                         Cook      3F27            Filesystem security
                                                                                                                              23
10                     Engineering Part IIA: 3F6 - Software Engineering and Design



                       Natural Join
More formally:

There are two relations r(R) and s(S).

The set of shared attributes is A:

A = {A1, · · · , An} = R ∩ S

where n = |A|. The set of all attributes with no duplicates is:

R ∪ S.

The natural join is therefore:

r  s ≡ ΠR ∪ Sσr.A1=s.A1∧···∧r.An=s.An (r × s)

In SQL, natural joins are performed with NATURAL JOIN:
SELECT * FROM sessions NATURAL JOIN labs

In practice, you will usually design databases by considering the
type of data, how it is stored in tables and how to extract the
relevant information. Relation algebra will not crop up much in
day-to-day design, but it is essential for understanding how the
various operations in a relational database work.
                                                                                 © 2012 Elena Punskaya
                                                                                                        24
                                                            Cambridge University Engineering Department

                                                                                                             24
C Processors, Sanchez, 10) course SET Lectures=8 WHERE Leader=Sanchez FROM courCREATE




                                                                               Example
     •   Let’s consider an example of movies database for LOVEFiLM.com
     •   It is likely to have


                                                                                    movie
                                                 Title                                       Year                 Actor
                     Pulp Fiction                                                        1994           John Travolta
                     Hackers                                                             1995           Angelina Jolie
                     The Matrix                                                          1999           Keanu Reeves
                     The Devil’s Advocate                                                1997           Keanu Reeves

     •   SQL:
         CREATE TABLE movie (Title text, Year int, Actor text)

         INSERT INTO movie VALUES (Pulp Fiction, 1994, “John Travolta”)

         INSERT INTO movie VALUES (Hackers, 1995, “Angelina Jolie”)

                                                                        etc.
                                                                                                                         © 2012 Elena Punskaya
                                                                                                                                                25
                                                                                                    Cambridge University Engineering Department

                                                                                                                                                     25
Example
•   Projection                                    •   Distinct
                                    Actor             SELECT DISTINCT Actor                 Actor
    SELECT Actor FROM movie
                              John Travolta           FROM movie
                                                                                   John Travolta
                              Angelina Jolie                                       Angelina Jolie
                              Keanu Reeves                                         Keanu Reeves
                              Keanu Reeves
•   Selection
    SELECT * FROM movie WHERE Actor=”Keanu
                                                                    movie
    Reeves”                                            Title           Year                    Actor
                                             The Matrix                   1999 Keanu Reeves
                                             The Devil’s Advocate 1997 Keanu Reeves

•   Projection and Selection
    composed                                               Title
    SELECT Title FROM movie WHERE               The Matrix
    Actor=”Keanu Reeves”
                                                The Devil’s Advocate

                                                                                     © 2012 Elena Punskaya
                                                                                                            26
                                                                Cambridge University Engineering Department

                                                                                                                 26
Example


•   Selection may use AND, OR, NOT, BETWEEN, IN and etc.
    SELECT * FROM movie WHERE Year BETWEEN 1995 AND 1997

    (BETWEEN 1995 AND 1997 Inclusive)


                                         movie
                            Title              Year        Actor
                Hackers                       1995 Angelina Jolie
                The Devil’s Advocate 1997 Keanu Reeves




                                                                                   © 2012 Elena Punskaya
                                                                                                          27
                                                              Cambridge University Engineering Department

                                                                                                               27
Example
•   Let us take now a simplified table

                                    movie
                          Title                   Actor
               Pulp Fiction                 John Travolta
               Hackers                      Angelina Jolie


•   Imagine we also have some info regarding the number of won
    Oscars

                                   people
                         Actor               Oscars
              John Travolta                       0
              Angelina Jolie                      1



                                                                   © 2012 Elena Punskaya
                                                                                          28
                                              Cambridge University Engineering Department

                                                                                               28
Example
•   Cartesian product        SELECT Title, Actor, Actor, Oscars FROM movie, people

                                  movie x people
     movie.Title       movie.Actor            people.Actor             people.Oscars
    Pulp Fiction      John Travolta          John Travolta                         0
    Pulp Fiction      John Travolta Angelina Jolie                                 1
    Hackers           Angelina Jolie John Travolta                                 0
    Hackers           Angelina Jolie Angelina Jolie                                1
    - the only one that can create new record (if one doesn’t count renaming)
    - BUT it creates too many records!

•   Natural join would give information on whether there are
    Oscar winning actors in the movie SELECT * FROM movie, people WHERE
    movie.Actor = people.Actor or SELECT * FROM movie NATURAL JOIN people

                                       movie
                        Title                       Actor            Oscars
          Pulp Fiction                             John                   0
          Hackers                                Travolta
                                                 Angelina                 1
                                                    Jolie                             © 2012 Elena Punskaya
                                                                                                             29
                                                                 Cambridge University Engineering Department

                                                                                                                  29
Example
 •   Let us consider two tables with Oscar and BAFTA nominations

                 Oscar                                          BAFTA

John Travolta Pulp Fiction                      John Travolta      Pulp Fiction
Angelina Jolie Girl, Interrupted               Angelina Jolie      Changeling

Angelina Jolie Changeling                    Jesse Eisenberg The Social Network

 •   Union
     (SELECT * FROM Oscar) UNION (SELECT * FROM BAFTA)


                                      Oscar ∪ BAFTA

                        John Travolta       Pulp Fiction
                       Angelina Jolie       Girl, Interrupted

                       Angelina Jolie       Changeling

                                                                                © 2012 Elena Punskaya
                                                                                                       30
                                                           Cambridge University Engineering Department

                                                                                                            30
Example
•   Intersection
    (SELECT * FROM Oscar) INTERSECT (SELECT * FROM BAFTA)


                                   Oscar ∩ BAFTA
                     John Travolta        Pulp Fiction

                     Angelina Jolie       Changeling

•   Difference
    (SELECT * FROM Oscar) EXCEPT (SELECT * FROM BAFTA)


                                     Oscar – BAFTA

                      Angelina Jolie       Girl, Interrupted
                    Jesse Eisenberg The Social Network

    NOTE: some operators are treated differently in different databases,
    some may not be present
                                                                                 © 2012 Elena Punskaya
                                                                                                        31
                                                            Cambridge University Engineering Department

                                                                                                             31
Keys and Uniqueness
•   Rows in a relation can be uniquely identified by a key, which
    can consist of one or more columns
    - A key must be able to uniquely identify all possible rows that relation could have in the
      domain of tuples, not just the rows that currently exist.

•   Superkey
    - Any collection of columns which can uniquely identify a row. There may be more than
      one valid superkey.

•   Candidate key
    - A minimal superkey, i.e. a superkey with the minimal number of columns. I.e. there is
      no subset of the columns in a candidate key which will also form a candidate key. There
      may be more than one candidate key.

•   Primary key
    - A superkey or candidate key which has been selected to have a special status. A table
      can have at most one primary key. Should be small and constant.

•   Foreign key
     -If two relations r and s share a key k, then r[k] is a foreign
      key if k is the primary key of s. Therefore, the foreign key k
      does not necessarily uniquely identify the rows of r
                                                                                  © 2012 Elena Punskaya
                                                                                                         32
                                                             Cambridge University Engineering Department

                                                                                                              32
Keys

Name       Address     DoB           Gender      Relation Email
                                                 ship


John Smith 34 West rd, 2 Jan 1981    Male        Single           john@smith.
           Cambridge                                              com


Thomas     Flat 303,   11 March      Male        Single           neo@matrix.
Anderson               1962                                       org


...



Mia        20 Sunset    10 October   Female      Married          m.wallace@h
Wallace    rd, Carlsbad 1994                                      otmail.com


                                                                 © 2012 Elena Punskaya
                                                                                        33
                                            Cambridge University Engineering Department

                                                                                             33
Keys

id    Name        Address      DoB           Gender           Relation Email
                                                              ship


1     John Smith 34 West rd,   2 Jan 1981    Male             Single         john@smith.
                 Cambridge                                                   com


2     Thomas      Flat 303, 101 11 March     Male             Single         neo@matrix.
      Anderson    Red st, Zion 1962                                          org


...   ...         ...          ...           ...              ...            ...



10001 Mia Wallace 20 Sunset rd, 10 October   Female           Married m.wallace@
                  Carlsbad      1994                                  hotmail.com



                                                                        © 2012 Elena Punskaya
                                                                                               34
                                                   Cambridge University Engineering Department

                                                                                                    34
Normalization
                          Normalization
                           Normalization
    If a database has has duplicated information thenisitsubject it up-up-
         If a database duplicated information then it     is subject it
• Ifdate anomalies, and the information can become inconsistent.
     a database has duplicated information then it is subject it
         date anomalies, and the information can become inconsistent.
  update anomalies, and the information can becomelec-
    Imagine adding contact details to the the ‘course’ table allow
         Imagine adding contact detailscontact detailsto to allow lec-
  inconsistent. Imagine adding
                                        to ‘course’ table to the ‘course’
    turers toallow contacted to be
               be contacted easily:
  table turers to belecturers easily: contacted easily:
          to
      Title
          Title                   Leader Lectures Telephone
                                     Leader Lectures Telephone
      RISC Processors
          RISC Processors         Sanchez
                                     Sanchez 8 8 6596065960
      QAM for modems
          QAM for modems          Sanchez
                                     Sanchez 34 34 65960
                                                      65960
      Introduction to Mainframes Belford
          Introduction to Mainframes Belford 20 20 65536
                                                      65536
      LowLow latency LCD screens Richard
            latency LCD screens      Richard 1 1 3276832768

•
      If the table is updated, for for instance the the the SQL command:
    If the
           If table is updated, instance with with SQL command:
              the table is updated, for instance with SQL command:
     UPDATE course SET SET Leader=Libby WHERE Title=RISC Processors
         UPDATE course Leader=Libby WHERE Title=RISC Processors
•   Then Then contact details will become incorrect. The process of
          the the contact details
      Then the contact details will will become incorrect. The process of
                                    become incorrect. The process of
    normalizing a database involves splitting up large tables with
      normalizing related information splitting up large tables with
          normalizing a database involves into a large tables with
    only weakly a database involves splitting upnumber of smaller
    tables. Normalized data is theninto a numbersmaller tables.
      onlyonly weakly related information accessed by joining tables.
           weakly related information into a number of of smaller tables
    together and performing accessed joining tablesresults. and
      Normalized datadata is then selectionsjoining tables together and
          Normalized is then accessed by by on the together
                                                                      © 2012 Elena Punskaya
     performing selections on the the results.
         performing selections on results.                                                   35
                                                 Cambridge University Engineering Department

                                                                                                  35
date anomalies, and the information can become inconsistent.
         Imagine adding contact details to the ‘course’ table to allow lec-
                              Normalization
         turers to be contacted easily:

          Title                               Leader Lectures Telephone
          RISC Processors                     Sanchez    8    65960
          QAM for modems                      Sanchez   34    65960
          Introduction to Mainframes          Belford   20    65536
          Low latency LCD screens             Richard    1    32768

•   The database above is not normalised the SQL command:
         If the table is updated, for instance with because there is
    duplicated data. More intuitively, the telephone number has
         UPDATE course SET Leader=Libby WHERE Title=RISC Processors
    merely been inserted as a convenience and has nothing
    directly to do with courses
       Then the contact details will become incorrect. The process of
• Much like type safety and object oriented design, database
       normalizing a database involves splitting up large tables with
  normalization allows databases to be designed such that
  certain errors (for instance data inconsistency) are tables.likely.
       only weakly related information into a number of smaller less
       Normalized data is then accessed by joining tables together and
• Normalization is the process of designing the database
       performing selections on the results.
    comply with normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF
    and DKNF).
         The database above is not normalized because there is duplicated
         data. More intuitively, the telephone number has merely been  © 2012 Elena Punskaya
                                                                                              36
         inserted as a convenience and has nothing directly to do with
                                                  Cambridge University Engineering Department

                                                                                                   36
First Normal Form (1NF)

                                First Normal Form
                       1. Make sure that your database really obeys the relational
                          model:
•   Make sure that your database really obeys the relational
                  (a) No ordering over rows
    model:
                         (b) No ordering over columns
    - No ordering over rows
    - No ordering over columns duplicates
                         (c) No
    - No duplicates    2. Each row/column intersection contains exactly one datum.
•   Each row/column intersection contains exactly one datum
              Consider trying to extend the earlier design to allow for multiple
• Consider storing numbers:
              phone multiple phone numbers for the Leader

                      Title   Lectures    ID     Numbers
                      ···     8           456    65950, 60294, 70231
                      ···     8           456    65950, 60294, 70231
              BAD     ···     34          20     65536
                      ···     1           82     32768, 16384
                      Title   Lectures   ID     Phone 1 Phone 2 Phone 3
                      ···     8          456      65960 60294        70231
                      ···     34         456      65960 60294        70231
              BAD     ···     20         9        65536
                      ···     1          82       32768              16384

                      Note the use of IDs to avoid duplicates as names make bad keys:
                                                                                      © 2012 Elena Punskaya
                                                                                                             37
                                                                 Cambridge University Engineering Department
                employees
                                         The     list   of   phone   numbers        for     the                   37
···           1                    82
      456        65960 First Normal Form
                         60294    70231
      9          65536                      Note the use of IDs
    • Employees table is used for
      details of 32768 Leaders
      82         course           16384 employees
    •   Adding a Phone number to store                        Th
                                        Name        ID Phone
        employee’s contact details
 of IDs to avoid duplicates as names make bad keys:65960 lea
                                        Sanchez 456
    • To support multiple phone numbers                       be
      need to duplicate Name/ID data    Belford      9 65536
                                                              ΠP
      The list of phone numbers         Richard for 82 the
                                                        32768
one                                     Sanchez 456 60294
     leader of a particular course can now
60 • The list of phone numbers for the leader of a particular course
     can now be extracted using relational algebra: algebra:
     be extracted using relational
36
     ΠPhone(σTitle=“RISC Processors”(course  employees))
68
     or in SQL:
94
        SELECT phone FROM course NATURAL JOIN employees WHERE
        Title=”RISC Processors”

                                                                 © 2012 Elena Punskaya
                                                                                        38
                                            Cambridge University Engineering Department

                                                                                             38
Second Normal Form (2NF)
                   A tableSecond normal form if it satisfies:
                             is in second Normal Form (2NF)
                     1. It is in first normal form (1NF).
•   A table is in second normal form if it satisfies:
                     2. All non-prime attributes depend on the whole candidate key.
     - It is in first normal form (1NF).
     - All non-prime attributes depend on the whole candidate key.
             From the previous example, the complete relation, employees(E),
•   From the previous example, the complete relation, employees
    (E), is:
             is:
                                                 Lack of normalization allows
                  employees                      buggy programs to create incon-
                                                 sistencies:
                 Name     ID     Phone
                 Sanchez 456     65960           Inserting the record (“Belford”,
                 Belford    9    65536           10, 131072) leads to a mismatch
                                                 between the name and id.
                 Richard 82      32768
                 Sanchez 456     60294           An employee name change re-
                                                 quires updates across multiple
                 Sanchez 456     70231           rows, which may be done incor-
                 Richard 82      16384           rectly. It also requires more lock-
                                                 ing.
•   The candidate key is Cis= (ID, Phone). The non prime attribute
             The candidate key C = (ID, Phone). The non prime attribute
    is therefore E − CE − C =(Name). The employees’ names do do not
             is therefore =(Name). The employees’ names not
    depend on theon the phone number, onlythe ID. Therefore the table table
             depend phone number, only the ID. Therefore the
    is not in 2NF.
             is not in 2NF. A 2NF design is:
                                                                                    © 2012 Elena Punskaya
                                     contacts                  Cambridge University Engineering Department
                                                                                                           39


                                    ID Phone                                                                    39
The candidate key is C = (ID, Phone). The non prime attri
                   is therefore E − C Form (2NF) employees’ names do
               Second Normal           =(Name). The
                   depend on the phone number, only the ID. Therefore the
• A 2NF design is: is not in 2NF. A 2NF design is:


                                        contacts
                                      ID Phone
                     employee names
                                      456 65960
                     Name        ID
                                      9     65536
                     Sanchez    456
                                      82 32768
                     Belford      9
                                      456 60294
                     Richard     82
                                      456 70231
                                      82 16384

•   ID is the Primary Key in employee_names
•   Phone is the Primary Key in contacts
•   ID is a Foreign Key in contacts connecting employee names
    and their phone numbers

                                                                   © 2012 Elena Punskaya
                                                                                          40
                                              Cambridge University Engineering Department

                                                                                               40
upon the key, the whole key and nothing but the key.”
                         Third Normal Form (3NF)
      More formally a table over R is in 3NF iff:
•   “I swear by Codd that each non-prime attribute shall depend
    upon It is in 2NF the whole key and nothing but the key.”
       1. the key, (and therefore 1NF)
•   More Every non-prime attribute R is in 3NF if and only if:
      2. formally a table over is directly dependent on every
    - It is in 2NF (and key of R. 1NF)
            candidate therefore
    - Every non-prime attribute is directly dependent on every candidate key of R

     Practical             Date      Demonstrator Contact Pay rate
     Acoustic coupling     Mon 1 Feb Dade         45102   10
     Acoustic coupling     Sat 7 Feb Dade         45102   15
     Self-propagating code Tue 2 Mar Joey         67822   10
     Self-propagating code Sun 9 Mar Kate         62341   15
    The candidate key is:
• The candidate key is (Practical, Date), however, the table not
    fully normalized because there is repetition of data (the
      (Practical,Date)
    contact numbers and the pay rates). The table is not in 3NF
    because:
      Table is not fully normalized because there is repetition of data
    - Pay rate depends on the key, but not the whole key. Specifically, it only depends on the
      date. contact numbers and the pay rates). The table is not in
       (the
    - Contact depends upon the whole key, but the dependence is transitive, not direct, that
       3NF because:
      is: Contact → Demonstrator → (Practical, Date)

       • Pay rate depends on the key, but not the whole key. Specif- Elena Punskaya 41
                                                                         © 2012
                                                    Cambridge University Engineering Department

         ically, it only depends on the date.                                                     41
to abort, rather than Engineering Part IIA: 3F6 -data. Engineering and Design
 16                    make inconsistent Software
 In addition to normal forms, which can be represented in rela-
                          SQL attributes (helpful for 1NF)
                                Constraints
                   SQL Constraints
 NOT NULL prevents missing tables to be constructed with addi-
 tional algebra, SQL allows
• In addition to normal forms, which can be represented in
 tional constraints which(Name the database more robust. Unlike
 CREATE TABLE course make string NOT NULL, ...)
  relational algebra, SQL allows tables to be constructed with
 normalization,normal forms, whichThis theimpossible to ID is
 In primary key constraints do not makewillrepresented more robust.
 Aadditional to can be specified. make be database in construct
    addition constraints which       can it ensure that rela-
 errors.algebra, SQL data that break unique. cause transactions
 tional and invalid allows tables alsobe constructedcauses
  Providing
 unique,  However, constraintsare to constraints with addi-
              therefore all rows do make errors
  transactions to abort, rather than make inconsistent data
 to abort, rather than make inconsistent data. robust. Unlike
 tional constraints which make the database more
•CREATE
  Types    TABLE people
          of constraints   (Name string, ID int PRIMARY KEY)
 normalization, constraints do not make it impossible to construct
•NOT NULL – ensures that the value unique: column can not be
 Known candidate constraints do attributes (helpfultransactions
 errors.NULL prevents can be marked as of this
  NOT However, keys missing make errors cause for 1NF)
  omitted
 CREATE TABLE than make (Name UNIQUE(a, b),
 CREATE TABLE r course inconsistent data. NOT NULL, ...)
 to abort, rather (a, b, c, d, string
• UNIQUE – ensures that the value of this column is unique
                 UNIQUE(a, c, d))
•A primary key can missing– designates will column that ID is
 NOT NULL prevents be specified. This the for 1NF)as a key
  PRIMARY/FOREIGN KEY attributes (helpful ensure
 unique, and therefore all(Name are also unique. KEY which
 A particularly important constraint is FOREIGN
 CREATE TABLE course         rows string NOT NULL, ...)
 ensures that an attribute is a primary key in another table:
 CREATE TABLEcan be specified. This will ensure that ID is KEY)
 A primary key people (Name string, ID int PRIMARY
 CREATE and therefore all rows are also unique.
 unique, candidate keys can be marked asPRIMARY KEY, ID int,
          TABLE course (Title string
 Known                                        unique:
                 Lectures int,
 CREATE TABLEFOREIGN KEY c, string, ID intemployees)
 CREATE TABLE people (Name d, UNIQUE(a, b),
                   r (a, b, (ID) REFERENCES PRIMARY KEY)
 Known candidate keys can be c, d)) as unique:
                  UNIQUE(a, marked                                  © 2012 Elena Punskaya
 The ID of the course leader is now constrained to be a valid em-Department
                                               Cambridge University Engineering
                                                                                    42

 A particularly important c, d, UNIQUE(a, b),
 CREATE TABLE r (a, b, constraint is FOREIGN KEY which                                   42
Entity-Relationship (E/R) Modelling
 •   As in Object Oriented approach, designing a database schema
     requires finding conceptual abstractions (that represent the
     data) and defining relationships between them

entity set   Employee           Leads        Course                 No. lectures


                              relationship
                                   set
        Name                                  Title
                  attribute


 •   Notation suggested by Peter Chen in “The Entity Relationship
     Model: Toward a Unified View of Data”, 1976
     - UML can also be used

 •   Relationships have cardinality
     - 1 to 1
     - 1 to Many
     - Many to Many etc.


                                                                     © 2012 Elena Punskaya
                                                                                            43
                                                Cambridge University Engineering Department

                                                                                                 43
Entity-Relationship (E/R) Modelling
                                                                                   Name
                                                                                                 Number


                                                                             Employee



                                                                                   ISA



                                      Does                   Mechanic                        Salesman


             Description
                                                                                     Price                                    Date


            Number                 RepairJob                                Date                 Buys              Sells             Value

                                                                License
  Parts             Cost                                                            Value                                     Comission

       Work
                                     Repairs                      Car
                                                                                                    seller     buyer

                                                                                     Year               Client                 ID
                                   Manufacturer                  Model


                                                                                          Name                             Phone
Pável Calado, http://www.texample.net/tikz/examples/entity-relationship-diagram/                        Address
                                                                                                                                  © 2012 Elena Punskaya
                                                                                                                                                         44
                                                                                                             Cambridge University Engineering Department

                                                                                                                                                              44
Objects and Databases
•   Relational Database Management Systems were mature
    stable products by 1980s
•   Object-Oriented approach reached wide adoption in 1990s
•   Any large software system still needs to persist data, hence
    store it in databases
•   Question: how we map Objects in a software system at
    runtime to Data stored in databases?
•   Originally, two options emerged:
    - Object to Relationship Mapping – a software layer that can provide database persistent
      to OO system (e.g. Hibernate, TopLink) – commonly used
    - Object Databases – a nice idea that failed to reach mainstream adoption

•   Most recently, further developments included non-
    relationship approaches (NoSQL) to working with large
    distributed datasets, e.g. Hadoop (hadoop.apache.org)
    - Map/Reduce: distributed processing of large data sets on compute clusters
    - Hive: A data warehouse infrastructure that provides data summarization and ad hoc
      querying
    - Cassandra: A scalable multi-master database with no single points of failure
                                                                                © 2012 Elena Punskaya
                                                                                                       45
                                                           Cambridge University Engineering Department

                                                                                                            45
No(t only)SQL at guardian.co.uk
•   The Guardian online, 1999

                                                                     Web server          Web server                  Web server

 Guardian journalism online: 1999

                                                                     App bring
                                                                       I server   you NEWS!!!
                                                                                        App server                   App server



                                                                                      Memcached (20Gb)




                                                                                          Oracle


                                                                              CMS                           Data feeds

                                                                                                              © 2012 Elena Punskaya
Matthew Wall, Simon Willison, www.slideshare.net/matwall/nosql-presentation                                                          46
                                                                                         Cambridge University Engineering Department

                                                                                                                                          46
No(t only)SQL at guardian.co.uk
•   The Guardian online, 2010
                                                                                               Core
                                                                                                                                     Out
                                                             In                               Web servers
Guardian journalism online: 2010
                                                                App                                                                   Solr




                                                                              Proxy
                                                                                              App server
                                                                App                                                                   Solr
                                                                                        Memcached (20Gb)
                                                                App                                                                   Solr
                                                                App           CMS         Data feeds            Solr
                                                                                                                                      Solr
                                                                App
                                                                                              M/Q
                                                                                                                                      Solr
                                                                App
                                                                                      rdbms         CouchDB?
                                                    external hosting                                                           Cloud, EC2
                                                     app engine etc




                                                                                                                   © 2012 Elena Punskaya
Matthew Wall, Simon Willison, www.slideshare.net/matwall/nosql-presentation                                                               47
                                                                                              Cambridge University Engineering Department

                                                                                                                                               47
Security and SQL Injection
•   Consider the following example
         // allowing a user to search by product name
         string name;
         cout  Enter product name:  endl;
         getline(cin, name);
         string query = SELECT * FROM products WHERE name= + name + ;
         do_sql(query);


•   What happens if the user enters:  ; DROP TABLE products; --
•   The query becomes
          // going to delete the table Products
          SELECT * FROM products WHERE name= ; DROP TABLE products; -- 




•   SQL Injection could be used
    to steal data from a database




                                                                                         news.bbc.co.uk/1/hi/8206305.stm

                                                                                        © 2012 Elena Punskaya
                                                                                                               48
                                                                   Cambridge University Engineering Department

                                                                                                                           48

Weitere ähnliche Inhalte

Was ist angesagt?

Vb net xp_10
Vb net xp_10Vb net xp_10
Vb net xp_10Niit Care
 
MoDisco EclipseCon2010
MoDisco EclipseCon2010MoDisco EclipseCon2010
MoDisco EclipseCon2010fmadiot
 
20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment20110507 Implementing Continuous Deployment
20110507 Implementing Continuous DeploymentXebiaLabs
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The EnterpriseDaniel Egan
 
Pal gov.tutorial3.session4.rest
Pal gov.tutorial3.session4.restPal gov.tutorial3.session4.rest
Pal gov.tutorial3.session4.restMustafa Jarrar
 
Rolling out Real User Monitoring at Telefónica Germany
Rolling out Real User Monitoring at Telefónica GermanyRolling out Real User Monitoring at Telefónica Germany
Rolling out Real User Monitoring at Telefónica GermanyOliver Ochs
 

Was ist angesagt? (16)

Dacj 1-2 b
Dacj 1-2 bDacj 1-2 b
Dacj 1-2 b
 
Ajs 3 a
Ajs 3 aAjs 3 a
Ajs 3 a
 
Ajs 4 a
Ajs 4 aAjs 4 a
Ajs 4 a
 
Vb net xp_10
Vb net xp_10Vb net xp_10
Vb net xp_10
 
Ajs 3 b
Ajs 3 bAjs 3 b
Ajs 3 b
 
MoDisco EclipseCon2010
MoDisco EclipseCon2010MoDisco EclipseCon2010
MoDisco EclipseCon2010
 
Ajs 4 b
Ajs 4 bAjs 4 b
Ajs 4 b
 
Ajs 2 b
Ajs 2 bAjs 2 b
Ajs 2 b
 
Ajs 3 c
Ajs 3 cAjs 3 c
Ajs 3 c
 
Ajs 2 a
Ajs 2 aAjs 2 a
Ajs 2 a
 
Lifecycle
LifecycleLifecycle
Lifecycle
 
20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment20110507 Implementing Continuous Deployment
20110507 Implementing Continuous Deployment
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The Enterprise
 
Pal gov.tutorial3.session4.rest
Pal gov.tutorial3.session4.restPal gov.tutorial3.session4.rest
Pal gov.tutorial3.session4.rest
 
Ajs 2 c
Ajs 2 cAjs 2 c
Ajs 2 c
 
Rolling out Real User Monitoring at Telefónica Germany
Rolling out Real User Monitoring at Telefónica GermanyRolling out Real User Monitoring at Telefónica Germany
Rolling out Real User Monitoring at Telefónica Germany
 

Andere mochten auch

3 f6 9a_corba
3 f6 9a_corba3 f6 9a_corba
3 f6 9a_corbaop205
 
02 JavaScript Syntax
02 JavaScript Syntax02 JavaScript Syntax
02 JavaScript SyntaxYnon Perek
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Michael Rys
 
Web Programming Intro
Web Programming IntroWeb Programming Intro
Web Programming IntroYnon Perek
 
3 f6 security
3 f6 security3 f6 security
3 f6 securityop205
 
авиаперевозки Clients 2009
авиаперевозки Clients 2009авиаперевозки Clients 2009
авиаперевозки Clients 2009Pavel Red'ko
 
Money Management CH 16
Money Management CH 16Money Management CH 16
Money Management CH 16marshalls1
 
cleveland_district3_narrative
cleveland_district3_narrativecleveland_district3_narrative
cleveland_district3_narrativeandy biggin
 
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומן
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומןמצגת אחריות הרשויות המקומיות לניצולי השואה בתחומן
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומןelio2b
 
Trade with China. AsstrA Presentation 2009
Trade with China. AsstrA Presentation 2009Trade with China. AsstrA Presentation 2009
Trade with China. AsstrA Presentation 2009Pavel Red'ko
 
Trapper School Grad P P
Trapper  School  Grad  P PTrapper  School  Grad  P P
Trapper School Grad P Pnsbsd
 
mtzbarcelona
mtzbarcelonamtzbarcelona
mtzbarcelonaMTZDMC
 
DiffCalculus: August 27, 2012
DiffCalculus: August 27, 2012DiffCalculus: August 27, 2012
DiffCalculus: August 27, 2012Carlos Vázquez
 
Huetouch Partner Proposal 2009
Huetouch Partner Proposal 2009Huetouch Partner Proposal 2009
Huetouch Partner Proposal 2009cookielady
 
Vakfotografie Sonny Lips
Vakfotografie Sonny LipsVakfotografie Sonny Lips
Vakfotografie Sonny LipsSonny1967
 

Andere mochten auch (20)

3 f6 9a_corba
3 f6 9a_corba3 f6 9a_corba
3 f6 9a_corba
 
02 JavaScript Syntax
02 JavaScript Syntax02 JavaScript Syntax
02 JavaScript Syntax
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
Web Programming Intro
Web Programming IntroWeb Programming Intro
Web Programming Intro
 
3 f6 security
3 f6 security3 f6 security
3 f6 security
 
авиаперевозки Clients 2009
авиаперевозки Clients 2009авиаперевозки Clients 2009
авиаперевозки Clients 2009
 
Teaching with social media in classroom settings: Top ten practices from teac...
Teaching with social media in classroom settings: Top ten practices from teac...Teaching with social media in classroom settings: Top ten practices from teac...
Teaching with social media in classroom settings: Top ten practices from teac...
 
Aaa! Newsletter
Aaa! NewsletterAaa! Newsletter
Aaa! Newsletter
 
WIRA brochure 2010
WIRA brochure 2010WIRA brochure 2010
WIRA brochure 2010
 
Money Management CH 16
Money Management CH 16Money Management CH 16
Money Management CH 16
 
cleveland_district3_narrative
cleveland_district3_narrativecleveland_district3_narrative
cleveland_district3_narrative
 
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומן
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומןמצגת אחריות הרשויות המקומיות לניצולי השואה בתחומן
מצגת אחריות הרשויות המקומיות לניצולי השואה בתחומן
 
Trade with China. AsstrA Presentation 2009
Trade with China. AsstrA Presentation 2009Trade with China. AsstrA Presentation 2009
Trade with China. AsstrA Presentation 2009
 
Andresferran
AndresferranAndresferran
Andresferran
 
Trapper School Grad P P
Trapper  School  Grad  P PTrapper  School  Grad  P P
Trapper School Grad P P
 
mtzbarcelona
mtzbarcelonamtzbarcelona
mtzbarcelona
 
DiffCalculus: August 27, 2012
DiffCalculus: August 27, 2012DiffCalculus: August 27, 2012
DiffCalculus: August 27, 2012
 
C sharp
C sharpC sharp
C sharp
 
Huetouch Partner Proposal 2009
Huetouch Partner Proposal 2009Huetouch Partner Proposal 2009
Huetouch Partner Proposal 2009
 
Vakfotografie Sonny Lips
Vakfotografie Sonny LipsVakfotografie Sonny Lips
Vakfotografie Sonny Lips
 

Ähnlich wie 3 f6 8_databases

Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...
Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...
Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...Michaël Figuière
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud applicationNoam Sheffer
 
Leverage Azure and SQL Azure to build SaaS applications
Leverage Azure and SQL Azure to build SaaS applications Leverage Azure and SQL Azure to build SaaS applications
Leverage Azure and SQL Azure to build SaaS applications Common Sense
 
Micro services - Practicalities & things to watch out for
Micro services - Practicalities & things to watch out forMicro services - Practicalities & things to watch out for
Micro services - Practicalities & things to watch out forParthiban J
 
Iot cloud service v2.0
Iot cloud service v2.0Iot cloud service v2.0
Iot cloud service v2.0Vinod Wilson
 
BASE: An Acid Alternative
BASE: An Acid AlternativeBASE: An Acid Alternative
BASE: An Acid AlternativeHiroshi Ono
 
Migrating From Java EE To Cloud-Native Reactive Systems
Migrating From Java EE To Cloud-Native Reactive SystemsMigrating From Java EE To Cloud-Native Reactive Systems
Migrating From Java EE To Cloud-Native Reactive SystemsLightbend
 
Migrating from Java EE to cloud-native Reactive systems
Migrating from Java EE to cloud-native Reactive systemsMigrating from Java EE to cloud-native Reactive systems
Migrating from Java EE to cloud-native Reactive systemsMarkus Eisele
 
Streaming to a new Jakarta EE / JOTB19
Streaming to a new Jakarta EE / JOTB19Streaming to a new Jakarta EE / JOTB19
Streaming to a new Jakarta EE / JOTB19Markus Eisele
 
Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts weili_at_slideshare
 
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011Eric D. Boyd
 
Decomposing the Monolith using Microservices that don't give you pain
Decomposing the Monolith using Microservices that don't give you painDecomposing the Monolith using Microservices that don't give you pain
Decomposing the Monolith using Microservices that don't give you painDennis Doomen
 
Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Derek Ashmore
 
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...Andrew Miller
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservicesAnil Allewar
 
Reactive programming
Reactive programmingReactive programming
Reactive programmingSUDIP GHOSH
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Tammy Bednar
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application DesignGlobalLogic Ukraine
 
Reactive Integrations - Caveats and bumps in the road explained
Reactive Integrations - Caveats and bumps in the road explained  Reactive Integrations - Caveats and bumps in the road explained
Reactive Integrations - Caveats and bumps in the road explained Markus Eisele
 

Ähnlich wie 3 f6 8_databases (20)

Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...
Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...
Xebia Knowledge Exchange (jan 2011) - Trends in Enterprise Applications Archi...
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Leverage Azure and SQL Azure to build SaaS applications
Leverage Azure and SQL Azure to build SaaS applications Leverage Azure and SQL Azure to build SaaS applications
Leverage Azure and SQL Azure to build SaaS applications
 
Micro services
Micro servicesMicro services
Micro services
 
Micro services - Practicalities & things to watch out for
Micro services - Practicalities & things to watch out forMicro services - Practicalities & things to watch out for
Micro services - Practicalities & things to watch out for
 
Iot cloud service v2.0
Iot cloud service v2.0Iot cloud service v2.0
Iot cloud service v2.0
 
BASE: An Acid Alternative
BASE: An Acid AlternativeBASE: An Acid Alternative
BASE: An Acid Alternative
 
Migrating From Java EE To Cloud-Native Reactive Systems
Migrating From Java EE To Cloud-Native Reactive SystemsMigrating From Java EE To Cloud-Native Reactive Systems
Migrating From Java EE To Cloud-Native Reactive Systems
 
Migrating from Java EE to cloud-native Reactive systems
Migrating from Java EE to cloud-native Reactive systemsMigrating from Java EE to cloud-native Reactive systems
Migrating from Java EE to cloud-native Reactive systems
 
Streaming to a new Jakarta EE / JOTB19
Streaming to a new Jakarta EE / JOTB19Streaming to a new Jakarta EE / JOTB19
Streaming to a new Jakarta EE / JOTB19
 
Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts
 
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011
Architecting for Massive Scalability - St. Louis Day of .NET 2011 - Aug 6, 2011
 
Decomposing the Monolith using Microservices that don't give you pain
Decomposing the Monolith using Microservices that don't give you painDecomposing the Monolith using Microservices that don't give you pain
Decomposing the Monolith using Microservices that don't give you pain
 
Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15
 
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...
Varrow Q4 Lunch & Learn Presentation - Virtualizing Business Critical Applica...
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservices
 
Reactive programming
Reactive programmingReactive programming
Reactive programming
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Reactive Integrations - Caveats and bumps in the road explained
Reactive Integrations - Caveats and bumps in the road explained  Reactive Integrations - Caveats and bumps in the road explained
Reactive Integrations - Caveats and bumps in the road explained
 

Mehr von op205

Lecture 7 Software Engineering and Design User Interface Design
Lecture 7 Software Engineering and Design User Interface Design Lecture 7 Software Engineering and Design User Interface Design
Lecture 7 Software Engineering and Design User Interface Design op205
 
Lecture 6 Software Engineering and Design Good Design
Lecture 6 Software Engineering and Design Good Design Lecture 6 Software Engineering and Design Good Design
Lecture 6 Software Engineering and Design Good Design op205
 
Lecture 5 Software Engineering and Design Design Patterns
Lecture 5 Software Engineering and Design Design PatternsLecture 5 Software Engineering and Design Design Patterns
Lecture 5 Software Engineering and Design Design Patternsop205
 
Lecture 4 Software Engineering and Design Brief Introduction to Programming
Lecture 4 Software Engineering and Design Brief Introduction to ProgrammingLecture 4 Software Engineering and Design Brief Introduction to Programming
Lecture 4 Software Engineering and Design Brief Introduction to Programmingop205
 
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...Lecture 2 Software Engineering and Design Object Oriented Programming, Design...
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...op205
 
More on DFT
More on DFTMore on DFT
More on DFTop205
 
Digital Signal Processing Summary
Digital Signal Processing SummaryDigital Signal Processing Summary
Digital Signal Processing Summaryop205
 
Implementation of Digital Filters
Implementation of Digital FiltersImplementation of Digital Filters
Implementation of Digital Filtersop205
 
Basics of Analogue Filters
Basics of Analogue FiltersBasics of Analogue Filters
Basics of Analogue Filtersop205
 
Design of IIR filters
Design of IIR filtersDesign of IIR filters
Design of IIR filtersop205
 
Design of FIR filters
Design of FIR filtersDesign of FIR filters
Design of FIR filtersop205
 
Basics of Digital Filters
Basics of Digital FiltersBasics of Digital Filters
Basics of Digital Filtersop205
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processingop205
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transformop205
 
Brief Review of Fourier Analysis
Brief Review of Fourier AnalysisBrief Review of Fourier Analysis
Brief Review of Fourier Analysisop205
 
3F3 – Digital Signal Processing (DSP) - Part1
3F3 – Digital Signal Processing (DSP) - Part13F3 – Digital Signal Processing (DSP) - Part1
3F3 – Digital Signal Processing (DSP) - Part1op205
 

Mehr von op205 (16)

Lecture 7 Software Engineering and Design User Interface Design
Lecture 7 Software Engineering and Design User Interface Design Lecture 7 Software Engineering and Design User Interface Design
Lecture 7 Software Engineering and Design User Interface Design
 
Lecture 6 Software Engineering and Design Good Design
Lecture 6 Software Engineering and Design Good Design Lecture 6 Software Engineering and Design Good Design
Lecture 6 Software Engineering and Design Good Design
 
Lecture 5 Software Engineering and Design Design Patterns
Lecture 5 Software Engineering and Design Design PatternsLecture 5 Software Engineering and Design Design Patterns
Lecture 5 Software Engineering and Design Design Patterns
 
Lecture 4 Software Engineering and Design Brief Introduction to Programming
Lecture 4 Software Engineering and Design Brief Introduction to ProgrammingLecture 4 Software Engineering and Design Brief Introduction to Programming
Lecture 4 Software Engineering and Design Brief Introduction to Programming
 
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...Lecture 2 Software Engineering and Design Object Oriented Programming, Design...
Lecture 2 Software Engineering and Design Object Oriented Programming, Design...
 
More on DFT
More on DFTMore on DFT
More on DFT
 
Digital Signal Processing Summary
Digital Signal Processing SummaryDigital Signal Processing Summary
Digital Signal Processing Summary
 
Implementation of Digital Filters
Implementation of Digital FiltersImplementation of Digital Filters
Implementation of Digital Filters
 
Basics of Analogue Filters
Basics of Analogue FiltersBasics of Analogue Filters
Basics of Analogue Filters
 
Design of IIR filters
Design of IIR filtersDesign of IIR filters
Design of IIR filters
 
Design of FIR filters
Design of FIR filtersDesign of FIR filters
Design of FIR filters
 
Basics of Digital Filters
Basics of Digital FiltersBasics of Digital Filters
Basics of Digital Filters
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processing
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transform
 
Brief Review of Fourier Analysis
Brief Review of Fourier AnalysisBrief Review of Fourier Analysis
Brief Review of Fourier Analysis
 
3F3 – Digital Signal Processing (DSP) - Part1
3F3 – Digital Signal Processing (DSP) - Part13F3 – Digital Signal Processing (DSP) - Part1
3F3 – Digital Signal Processing (DSP) - Part1
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

3 f6 8_databases

  • 2. Big Data • Facebook stores over 100 petabytes of media (photos and videos) uploaded by its 845 million users • There are 762 billion objects stored in Amazon S3 that processes over 500,000 requests per second for these objects at peak times Bryce Durbin, Techcrunch aws.typepad.com Amazon © 2012 Elena Punskaya 2 Cambridge University Engineering Department 2
  • 3. Big Data • Storing large amounts of data requires managing complexity: - mapping real world to data - providing concurrent access to creating, reading and changing of the data - providing distributed access and storage of the data • Database Management Systems decouples business logic of applications working with data from the details of physical storage and transaction (operations on data) management • Any non-trivial system needs to store its application data: - user/password - credit cards - product information - health records ... • It is possible to store all data directly as files but a typical filesystem isn’t build for transaction management and high performance Cambridge High Performance Computing Cluster Darwin © 2012 Elena Punskaya 3 Cambridge University Engineering Department 3
  • 4. Database Systems I 1 Transaction Processing Transaction Processing • Databases Databases common component of distributeddistributed systems. are a are a common component of many many sys- They storetems. They store records for a large number of distinct entities records for a large number of distinct entities and will typically support a small setsmall operations to access and and manipulate and will typically support a of set of operations to access those entities. These operations can be assumedto be atomic i.e. manipulate those entities. These operations can be assumed to be atomic i.e. they cannot be interrupted. they cannot be interrupted. External clients execute transactions which are sequences of op- • External clients execute transactions which areto erations applied to one or more database entities designed sequences of operations achieve a single logical affect. more database entities designed to applied to one or achieve a single logical affect. Recovery Log Client A TA Transaction Database Manager Client B TB Transactions Atomic Operations • The transaction manager ensures that transactions appear atomic to clients. Client receives an acknowledgement atomic The transaction manager ensures that transactions appear of every successful transaction. clients. Client receives an acknowledgement of every successful to transaction. © 2012 Elena Punskaya 4 Cambridge University Engineering Department 4
  • 5. Example: Bank Transfer • Each account is represented by a different database object, which guarantees that each operation is atomic class Account { // link to required account records DBaseAccessInfo dbinfo; public: // Constructor - open an account account(string account_name); // Atomic operations void debit(float amount); void credit(float amount); float read_balance(); }; // A typical transaction would be void transfer(account& A, account& B, float amount) { float balance = A.read_balance(); if (balance >= amount) { A.debit(amount); B.credit(amount); } } • A key issue is what happens if there is a failure part-way through the transaction? © 2012 Elena Punskaya 5 Cambridge University Engineering Department 5
  • 6. System Crash • What happens if the system crashes in the middle of transaction? // a typical transaction void transfer(account& A, account& B, float amount) { float balance = A.read_balance(); if(balance >= amount) { A.debit(amount); <-----------------------------CRASH! • Account A will have had its money debited, but it will never appear in account B! – invalid state • The transaction manager (or any transaction processing system) must have a means of recovering from errors, and always leaving the system in a valid state • Need to ensure that Credit/Debit is ATOMIC, i.e. can only be preformed as a WHOLE not in parts © 2012 Elena Punskaya 6 Cambridge University Engineering Department 6
  • 7. ACID • A transaction my fail in many different ways (e.g. two clients try to access the same entity at the same time, temporary network failure, software fault, disk crash, etc). The transaction processor tries to ensure that transactions have the following properties • Atomicity - Either all or none of the transaction’s operations are performed • Consistency - Transactions transform the system from one consistent state to another • Isolation - An incomplete transaction cannot reveal its result to other transactions before it is complete • Durability - Once the transaction is committed, the system must guarantee that the results of its operations will persist, even if there are subsequent system failures © 2012 Elena Punskaya 7 Cambridge University Engineering Department 7
  • 8. ems I 5 Recovery Recovery In order to maintain the ACID properties, a transaction • processor must be able to recover from errors by restoring o maintain the ACID properties, a transaction processor the system to a consistent state. ble to recover from errors by restoring the system to a • To achieve this, transactions are modelled on the following state. state machine e this, transactions are modelled on the following state // A typical transaction with commit void transfer(account& A, account& B, float amount) { try { // Record transaction start int id = BeginTransaction(); Transaction balance = A.read_balance(); float processor might >= amount) { if (balance A.debit(amount); B.credit(amount); ← invalidate this } transaction success, so commit (finish) // (see Commit(id); last slide) } catch(..) { // transaction failed, recover/revert Abort(id); } the transfer transaction } sfer(account& A, account& B, float amount) © 2012 Elena Punskaya 8 Cambridge University Engineering Department 8
  • 9. Database Systems I 9 Concurrency Concurrency • In practice, a database transaction processor will be receiving In practice, a database transaction processor will be receiving a a stream of transaction requests, and will need to execute stream of transaction requests, and will need to execute transac- transactions in parallel in order to provide acceptable response tions in parallel in order to provide acceptable response times. times. • When two transactions reference the same account, When two transactions reference the same account, uncontrolled uncontrolled interleaving of operations can produce an interleaving of operations can produce an incorrect result. There incorrect are three classes of concurrency problem: result. There are three classes of concurrency problem • The uncommitted dependency problem Time Transaction 1 Transaction 2 t1 – A.write() t2 A.read() – t3 – abort() In this case, transaction 1 reads an updated account value, but • In this case, transaction 1 reads an updated account value, transaction 2 aborts undoing the effect of the update. Transac- but transaction 2 aborts undoing the effect of the update. tion 1 is then left holding an incorrect account value. Transaction 1 is then left holding an incorrect account value © 2012 Elena Punskaya 9 Cambridge University Engineering Department Note: A.read() indicates any operation which reads a value from ac- 9
  • 10. • The lost update problem 10 Concurrency Time Transaction 1 Transaction 2 Software Engineering and Design Engineering Part IIA: 3F6 - t1 A.read() – • The lost update problem t2 – A.read() t3 A.write() – • The change made to account Time Transaction 1 Transaction 2 t4 – A.write() A at t3 by transaction 1 is lost t1 A.read() – because it is overwritten at In t2 case, the change made to account A at t3 by t4 by transaction 2 this – A.read() time transac- t A.write() – tion3 1 is lost because it is overwritten at time t4 by transac- t tion4 2. – A.write() In this case, the change made to account A at t3 by transac- The is lost because it is overwritten at time t4 by transac- • tion 1inconsistent analysis problem tion 2. • Transaction 2 updates Time Transaction 1 Transaction 2 account A after transaction 1 t1 A.read() – has read its value • The inconsistent analysis problem t2 – A.read() • Hence, transaction 1 is left t3 – A.write() Time Transaction 1 Transaction 2 t4 – commit() holding an incorrect value for t1 A.read() – account A In t2 case, transaction 2A.read() account A after transac- this – updates tion3 1 has read its value. Hence, transaction 1 is left holding University Engineering Department 10 t – A.write() Cambridge © 2012 Elena Punskaya an tincorrect value for account A. 4 – commit() 10
  • 11. Managing Concurrency • The problems discussed can be managed by applying a Pessimistic or Optimistic concurrency control • Pessimistic - When a transaction wishes to access an account it first secures a lock on that account, when it has finished it releases the lock. If a lock is already taken, the transaction must wait until it is released. - Locking could be on the whole table or a single row and could be declared at different levels of exclusivity (e.g. no one else can access data or some access is allowed) - Could cause deadlocks, e.g. Tx1 and Tx2 require two resources R1 and R2 to proceed: ‣ T1 holds R1 and is waiting for R2 ‣ T2 holds R2 and is waiting for R1 - Useful when there is a lot of data that is often updated by many users • Optimistic - Allows uncontrolled access to accounts, and then simply abort any transactions which might have suffered a conflict - Implemented by creating a new copy of the data that maybe be updated and when the update is completed checks if the master copy hasn’t changed in meantime ‣ if changed – aborted ‣ if not – complete - Useful when most operations are reading data and changes occur rarely © 2012 Elena Punskaya 11 Cambridge University Engineering Department 11
  • 12. Relational Databases • By late 1960s, the “Software Crisis” was already declared and data storage wasn’t doing much better Computer calculations cost hundreds of dollars a minute, so great human effort was spent to make programs as efficient as possible before they were run. Early databases used either a rigid hierarchical structure or a complex navigational plan of pointers to the physical locations of the data on magnetic tapes. Teams of programmers were needed to express queries to extract meaningful information. While such databases could be efficient in handling the specific data and queries they were designed for, they were absolutely inflexible. New types of Edgar 'Ted' Codd, 1923-2003 queries required complex reprogramming, and adding new types of data image © IBM forced a total redesign of the database itself. IBM Research News, www.research.ibm.com/resources/news/20030423_edgarpassaway.shtml • In 1970, Edgar Codd, an English mathematician working for IBM, published a paper “"A Relational Model of Data for Large Shared Data Banks", it started with the following words: “Future users of large data banks must be protected from having to know how the data is organised in the machine...” © 2012 Elena Punskaya 12 Cambridge University Engineering Department 12
  • 13. Relational Databases • Codd suggested to move away from hierarchical or navigational structure of early databases to simple tables with rows and columns • Based on Relational Algebra, this approach allowed to greatly simplify database queries (ability to access and analyse data) • Many relational database management systems (RDBMS) are accessed using SQL (Structured Query Language) - SQL is defined by industry standards and has been developed over many revisions from SQL-87 to SQL 2008 • There are many free and commercial databases available: - Free: PostgreSQL, MySQL, SQLite... - Commercial: Oracle, DB2, SQL Server... • SQLite is the easiest database to start using as it requires no setup, and is available on the teaching system. - Type: sqlite3 <db-name> - Then enter SQL commands followed by a ‘;’. The database will be stored in a file called <db-name> which will be created if it does not already exist. © 2012 Elena Punskaya 13 Cambridge University Engineering Department 13
  • 14. ble. A relation contains a set of tuples (rows). 2 2 relation The Relational IIA:Engineering andEngineering and Design Engineering Part IIA: 3F6 - Software 3F6 - Software Design Engineering Part Model course scheme Title The relationalLeader model Lectures The relational model • The relational model is related to set theory. A relation is a t1 RISC Processors Sanchez 8 Therelation contains a settheory. A relation is a ta- is 34ta- table. A relationalThe QAM related to is related tuples (rows). t2 model is for model set of to set theory. A relation a relational modems Sanchez ble. A t ble. A relation contains a set of tuples (rows). relation contains a set of to Mainframes Belford 3 Introduction tuples (rows). course relation 20 relation course t4 scheme refresh LCDs Fast Title Richard Leader 1 Lectures scheme Title Leader Lectures t5 t1 t5 [T itle] Processors RISC t [Leader] t5[Lectures] Sanchez 8 t1 RISC Processors Sanchez 5 8 t2 t6 QAM for modems for[T itle, Leader] Sanchez QAM t6 modems 34 t 2 Sanchez 34 t3 Introduction to Mainframes Belford 20 t3 The Introduction theMainframes Belfordby the scheme, which is a to meaning of Fastdata is LCDs 20 t4 refresh described Richard 1 t4 Fast refresh LCDs t5 t5[T itle] Richard t [Leader] t [Lectures] 1 set of column names. Column names are known as attributes. 5 5 t5 t5[Ttitle] t6[T itle, Leader] t5 [Lectures] t5[Leader] 6 t6 course scheme =6[T itle, Leader] Lectures) t (Title, Leader, The meaning of the data is described by the scheme, which is a - The meaning of theof thecolumn names. by the scheme, which is as attributes. The meaning set of is described Column names are known a set of column names. data data is described by the scheme, which is a There is no ordering or grouping of attributes. The table is a Column names are names. as attributes. are known as attributes. set of column known Column names relationcourse this scheme. A relationLectures) scheme R is written over scheme = (Title, Leader, r over a - There courseas r(R). = (Title, Leader,attributes. TheD. So: a relation over this scheme. scheme Each column Lectures) is no ordering or grouping ofhas a domain, table is A relation r over a scheme R is writtenor grouping of attributes.has a table is a D. There is no ordering as r(R). Each column The domain, ThereDTitle ordering overgrouping of attributes. r The a scheme a is written is no = strings, this scheme. A relation over table is R relation or DLectures = Z+ relation over thisr(R). Each column hasover a scheme R is written as scheme. A relation r a domain, D. So: as r(R). Each elementhas a domain, D.tiSo: D1 × D2 × · · · × Dn So each column ti[j] ∈ Dj and ∈ + DTitle = strings, DLectures = Z DTitle For example, the schemeZ+ y) with Dx = Dy = R, the domain = strings, DLectures = So each element ti[j] (x,Dj and ti ∈ D1 × D2 × · · · × Dn ∈ So each the tuples D and ti ∈ D1 × D2 dimensional vectors. 2012 Elena Punskaya ×D © of element t [j]is∈the domain of all two× · · · Cambridge University Engineering Department 14 For example,j the scheme (x, y) with Dx = Dn = R, the domain i y 14
  • 15. DTitle = strings, DLectures = Z+ So each element tThe Dj and ti ∈ D1 × D2 × · · · × Dn i [j] ∈ Relational Model For example, the scheme (x, y) with Dx = Dy = R, the domain of the tuples is the domain of all two dimensional vectors. SQL: domain↓ Constraint↓ CREATE TABLE course (Title text, Leader text, Lectures int, CHECK(Lectures > 0)) INSERT INTO course VALUES ("RISC Processors", "Sanchez", 10) UPDATE course SET Lectures=8 WHERE Leader="Sanchez" DELETE FROM course WHERE Lectures=8 AND Leader="Sanchez" DROP TABLE course SQL allows domain of tuples: Dt ⊆ D1 × D2 × · · · × Dn. • Built on principles of Relational Algebra - Projection, Selection, Union, Intersection, Subtraction, Join © 2012 Elena Punskaya 15 Cambridge University Engineering Department 15
  • 16. Relational algebra: Projection Π Π Relational algebra: Projection The projection operator, Π, removes columns by listing the ones to be retained. The operator is written as: Πcolumn1,column2,. . . (relation). Leader Lectures An example of applying projection is: Sanchez 8 ΠLeader,Lectures (course) = Sanchez 34 Belford 20 Richard 1 Consider a relation, r(R) where R=(x,y,z) and x, y, z ∈ R. Each row represents a 3D vector. The relation Πx,y(r) contains the projection of the vectors onto the x, y plane. In SQL the SELECT statement performs all of the primitive relational algebra funcionality. The selection above is rendered © 2012 Elena Punskaya Cambridge University Engineering Department 16 as: 16
  • 17. Consider a relation, r(R) where R=(x,y,z) and x, y, z ∈ R. Each row represents a 3D vector. The relation Πx,y(r) contains the projectionRelational algebra: Projection Π of the vectors onto the x, y plane. In SQL the SELECT statement performs all of the primitive relational algebra funcionality. The selection above is rendered as: SELECT Leader,Lectures FROM course The general form being: SELECT Col1[, Col2, [· · · ]] FROM table Note that SQL is not entirely relational and the expression: SELECT Leader FROM course has duplicate rows. To remove duplicates, use: SELECT DISTINCT Leader FROM course The there is a shorthand for the identity projection: SELECT * FROM table © 2012 Elena Punskaya 17 Cambridge University Engineering Department 17
  • 18. 4 Engineering Part IIA: 3F6 - Software Engineering and Design Relational algebra: Selection σ σ Relational algebra: Selection The selection operator accepts a predicate, Θ and a relation. Rows matching the predicate are retained: Title Leader Lectures σLeader=”Sanchez”(course) = RISC Processors Sanchez 8 QAM for modems Sanchez 34 The general form of the resulting relation can be written in set builder notation σΘ(r) = {t|t ∈ r, Θ(t)} That is, the result consists of all tuples t such that each tuple is both in the relation r and for which the predicate applied to the tuple, i.e. Θ(t), is true. In SQL, selection is also performed with the select statement with the predicate being specified by the WHERE clause: SELECT * FROM course WHERE Leader=”Sanchez” Predicates can contain expressions involving any or all of the © 2012 Elena Punskaya rows. SQL has more or less the same set of numeric operators as Cambridge University Engineering Department 18 C and also AND, OR, NOT, BETWEEN: 18
  • 19. the predicate being specified by the WHERE clause: Relational algebra: Selection σ SELECT * FROM course WHERE Leader=”Sanchez” Predicates can contain expressions involving any or all of the rows. SQL has more or less the same set of numeric operators as C and also AND, OR, NOT, BETWEEN: SELECT * FROM course WHERE Lectures BETWEEN 2 AND 10 and IN: WHERE Leader IN ("Belford", "Richard") Projection and selection can be readily composed, so in general: ΠS (σΘ(r)) translates to SELECT S FROM r WHERE Θ © 2012 Elena Punskaya 19 Cambridge University Engineering Department 19
  • 20. In SQL, union intersection and subtraction behave much more Union, intersection, subtraction like set theory than relational algebra. For these operations it is the order of the attributes not the names of the attributes which have significance. • In SQL, union intersection and subtraction behave much more like set theory than relational algebra. For theseofoperations If there Set union, ∪ aggregates the rows two sets together. it are two relations, r(R) and s(R), then the union, r ∪ s is the order of the attributes not the names of the attributes can be computed: which have significance. SELECT * FROM r UNION SELECT * FROM s • Set union, ∪ aggregates the Likewise, intersection can be computed using: rows of two sets together. If there are two relations, SELECT * FROM r INTERSECT SELECT * FROM s r(R) and s(R), then the Set differencing is either MINUS or EXCEPT depending on the union, r ∪ s can be database. computed: r s r−s SELECT * FROM r EXCEPT SELECT * FROM s Since ordering, not naming matters, with the schema R=(a,b), S=(b,a) and the tables r(R), s(S): r s a b b a a b r-s= 1 2 3 5 3 4 3 4 1 2 © 2012 Elena Punskaya 20 Cambridge University Engineering Department 20
  • 21. information. students Join / cartesian product × labs Student Supervisor Lab Demonstrator • The cartesian product is the Cook primitive operator which Gibson Sanchez 3F27 only combines two tables3F89 different schemes. Joining two Murphy Belford with Libby relations, a Goldstein Part IIA: 3F6 - Software Engineering and Design 6 Libby × b generates a Margo relation with every row in in Engineering 4F185 new a paired with every row in b.Ray Cook Sanchez 3F34 Joining is very useful for extracting related information. × Join / cartesian product The table students × labs is on the next page. Note that the 7 • Joining students and labs: Database Systems II The cartesian get augmented with the table name tocom- ambiguity. attributes product is the only primitive operator which avoid bines two tables with different schemes. Joining two relations, The table name may be omitted if it is not ambiguous. SQL: students × labs a × b generates a new relation with every row in in a paired students.Student students.Supervisor labs.Lab labs.Demonstrator with every row in b. Joining is very useful for extracting related SELECT * FROM students, labs Gibson Sanchez 3F27 Cook information. Gibson Sanchez 3F89 Libby Gibson Sanchez 4F185 Margo Find all students of “Sanchez” who are demonstrating: students labs Gibson Sanchez 3F34 Ray Murphy Belford 3F27 Cook Student Supervisor Lab Demonstrator Murphy Belford 3F89 Libby Π (σ Gibson Sanchez 3F27 Cook Student Student=Demonstrator∧Supervisor=“Sanchez” (students × labs)) Murphy Belford 4F185 Margo Murphy Belford 3F89 Libby Murphy Belford 3F34 Ray Libby Goldstein 3F27 Cook Libby Goldstein 4F185 Margo Libby Goldstein 3F89 Libby SELECT Student FROM students, labs Cook Sanchez 3F34 Ray Libby Goldstein 4F185 Margo WHERE Student=Demonstrator AND Libby Goldstein 3F34 Ray Cook Sanchez 3F27 Cook The table students × labs is on the next page. Note that the Supervisor="Sanchez" Cook Sanchez 3F89 Libby attributes get augmented with the table name to avoid ambiguity. Cook Sanchez 4F185 Margo The table name may be omitted if it is not ambiguous. SQL: Cook Sanchez 3F34 Ray SELECTresult is students, Selection The * FROM Cook . labs is often composed with joining, © 2012 Elena Punskaya so it is given the non primitive operator, the theta join: Cambridge University Engineering Department 21 Find all students of “Sanchez” who are demonstrating: 21
  • 22. students labs Database Systems II 7 Student Supervisor Lab Demonstrator Join / cartesian product × Gibson Sanchez 3F27 Cook Murphy Belford 3F89 Libby students × labs students.Student students.Supervisor labs.Lab labs.Demonstrator Libby Goldstein 4F185 Margo Gibson Sanchez 3F27 Cook Cook Sanchez 3F34 Ray Gibson Sanchez 3F89 Libby Gibson Sanchez 4F185 Margo Gibson Sanchez 3F34 Ray The table students × labs is on the next page. Note that the Murphy Belford 3F27 Cook attributes get augmented with the table name to avoid ambiguity. Murphy Belford 3F89 Libby Murphy Belford 4F185 Margo The table name may be omitted if it is not ambiguous. SQL: Murphy Belford 3F34 Ray Libby Goldstein 3F27 Cook SELECT * FROM students, labs Libby Goldstein 3F89 Libby Libby Goldstein 4F185 Margo Libby Goldstein 3F34 Ray Find all students of “Sanchez” who are demonstrating: Cook Sanchez 3F27 Cook Cook Sanchez 3F89 Libby ΠStudent(σStudent=Demonstrator∧Supervisor=“Sanchez”(students × labs)) Cook Sanchez 4F185 Margo Cook Sanchez 3F34 Ray SELECT Student FROM students, labs WHERE Student=Demonstrator AND Supervisor="Sanchez" The result is Cook . Selection is often composed with joining, so it is given the non primitive operator, the theta join: a Θ b ≡ σΘ(a × b). © 2012 Elena Punskaya 22 Cambridge University Engineering Department 22
  • 23. Database Systems II • Perform a join. 9 • Natural Join Perform selection so that attributes with the same name must Natural Join be equal. • Perform projection to remove duplicated attributes. • A ‘natural join’ is a join followed by some selection and A ‘natural join’ is a join followed by some selection and projec- attribute ambiguities. Note that there are no tion: projection: • Perform a join. join - Perform a If attributes with the same name are semantically the same, then the natural join is usually the correct kind of join to use. In ad- - Perform selection attributes with the same name must name must be equal • Perform selection so that so that attributes with the same dition to the ‘labs’ table, we also have a table listing lab sessions: - Perform projection to remove duplicated attributes be equal. sessions •• If attributes remove duplicated attributes. Perform projection to with the same name are semantically the same, Lab Title Note that there are no attribute ambiguities.usually the correct kind of join to use. then the natural join is 3F27 Mainframe filesystems In addition to the ‘labs’ 3F27 table, we also have a table listing lab Filesystem security If attributes with the same name are semantically the same, then sessions: 3F89 Large vehicle control the natural join is usually the correct kind of join to use. In ad- finance systems 4F185 Networks for dition to the ‘labs’ table, we also have a table 3F34 lab sessions: listing Magnetic storage forensics sessions The natural join matches up the shared attributes Lab Title Demonstrator Lab Title 3F27 Mainframe filesystems Cook 3F27 Filesystem security 3F27 Filesystem security Cook 3F27 Mainframe filesystems 3F89 Large vehicle control sessions labs = Libby 3F89 Large vehicle control 4F185 Networks for finance systems Margo 4F185 Networks for finance systems 3F34 Magnetic storage forensics Ray 3F34 Magnetic storage forensics The natural join matches up the shared attributes Demonstrator Lab Title © 2012 Elena Punskaya 23 Cambridge University Engineering Department Cook 3F27 Filesystem security 23
  • 24. 10 Engineering Part IIA: 3F6 - Software Engineering and Design Natural Join More formally: There are two relations r(R) and s(S). The set of shared attributes is A: A = {A1, · · · , An} = R ∩ S where n = |A|. The set of all attributes with no duplicates is: R ∪ S. The natural join is therefore: r s ≡ ΠR ∪ Sσr.A1=s.A1∧···∧r.An=s.An (r × s) In SQL, natural joins are performed with NATURAL JOIN: SELECT * FROM sessions NATURAL JOIN labs In practice, you will usually design databases by considering the type of data, how it is stored in tables and how to extract the relevant information. Relation algebra will not crop up much in day-to-day design, but it is essential for understanding how the various operations in a relational database work. © 2012 Elena Punskaya 24 Cambridge University Engineering Department 24
  • 25. C Processors, Sanchez, 10) course SET Lectures=8 WHERE Leader=Sanchez FROM courCREATE Example • Let’s consider an example of movies database for LOVEFiLM.com • It is likely to have movie Title Year Actor Pulp Fiction 1994 John Travolta Hackers 1995 Angelina Jolie The Matrix 1999 Keanu Reeves The Devil’s Advocate 1997 Keanu Reeves • SQL: CREATE TABLE movie (Title text, Year int, Actor text) INSERT INTO movie VALUES (Pulp Fiction, 1994, “John Travolta”) INSERT INTO movie VALUES (Hackers, 1995, “Angelina Jolie”) etc. © 2012 Elena Punskaya 25 Cambridge University Engineering Department 25
  • 26. Example • Projection • Distinct Actor SELECT DISTINCT Actor Actor SELECT Actor FROM movie John Travolta FROM movie John Travolta Angelina Jolie Angelina Jolie Keanu Reeves Keanu Reeves Keanu Reeves • Selection SELECT * FROM movie WHERE Actor=”Keanu movie Reeves” Title Year Actor The Matrix 1999 Keanu Reeves The Devil’s Advocate 1997 Keanu Reeves • Projection and Selection composed Title SELECT Title FROM movie WHERE The Matrix Actor=”Keanu Reeves” The Devil’s Advocate © 2012 Elena Punskaya 26 Cambridge University Engineering Department 26
  • 27. Example • Selection may use AND, OR, NOT, BETWEEN, IN and etc. SELECT * FROM movie WHERE Year BETWEEN 1995 AND 1997 (BETWEEN 1995 AND 1997 Inclusive) movie Title Year Actor Hackers 1995 Angelina Jolie The Devil’s Advocate 1997 Keanu Reeves © 2012 Elena Punskaya 27 Cambridge University Engineering Department 27
  • 28. Example • Let us take now a simplified table movie Title Actor Pulp Fiction John Travolta Hackers Angelina Jolie • Imagine we also have some info regarding the number of won Oscars people Actor Oscars John Travolta 0 Angelina Jolie 1 © 2012 Elena Punskaya 28 Cambridge University Engineering Department 28
  • 29. Example • Cartesian product SELECT Title, Actor, Actor, Oscars FROM movie, people movie x people movie.Title movie.Actor people.Actor people.Oscars Pulp Fiction John Travolta John Travolta 0 Pulp Fiction John Travolta Angelina Jolie 1 Hackers Angelina Jolie John Travolta 0 Hackers Angelina Jolie Angelina Jolie 1 - the only one that can create new record (if one doesn’t count renaming) - BUT it creates too many records! • Natural join would give information on whether there are Oscar winning actors in the movie SELECT * FROM movie, people WHERE movie.Actor = people.Actor or SELECT * FROM movie NATURAL JOIN people movie Title Actor Oscars Pulp Fiction John 0 Hackers Travolta Angelina 1 Jolie © 2012 Elena Punskaya 29 Cambridge University Engineering Department 29
  • 30. Example • Let us consider two tables with Oscar and BAFTA nominations Oscar BAFTA John Travolta Pulp Fiction John Travolta Pulp Fiction Angelina Jolie Girl, Interrupted Angelina Jolie Changeling Angelina Jolie Changeling Jesse Eisenberg The Social Network • Union (SELECT * FROM Oscar) UNION (SELECT * FROM BAFTA) Oscar ∪ BAFTA John Travolta Pulp Fiction Angelina Jolie Girl, Interrupted Angelina Jolie Changeling © 2012 Elena Punskaya 30 Cambridge University Engineering Department 30
  • 31. Example • Intersection (SELECT * FROM Oscar) INTERSECT (SELECT * FROM BAFTA) Oscar ∩ BAFTA John Travolta Pulp Fiction Angelina Jolie Changeling • Difference (SELECT * FROM Oscar) EXCEPT (SELECT * FROM BAFTA) Oscar – BAFTA Angelina Jolie Girl, Interrupted Jesse Eisenberg The Social Network NOTE: some operators are treated differently in different databases, some may not be present © 2012 Elena Punskaya 31 Cambridge University Engineering Department 31
  • 32. Keys and Uniqueness • Rows in a relation can be uniquely identified by a key, which can consist of one or more columns - A key must be able to uniquely identify all possible rows that relation could have in the domain of tuples, not just the rows that currently exist. • Superkey - Any collection of columns which can uniquely identify a row. There may be more than one valid superkey. • Candidate key - A minimal superkey, i.e. a superkey with the minimal number of columns. I.e. there is no subset of the columns in a candidate key which will also form a candidate key. There may be more than one candidate key. • Primary key - A superkey or candidate key which has been selected to have a special status. A table can have at most one primary key. Should be small and constant. • Foreign key -If two relations r and s share a key k, then r[k] is a foreign key if k is the primary key of s. Therefore, the foreign key k does not necessarily uniquely identify the rows of r © 2012 Elena Punskaya 32 Cambridge University Engineering Department 32
  • 33. Keys Name Address DoB Gender Relation Email ship John Smith 34 West rd, 2 Jan 1981 Male Single john@smith. Cambridge com Thomas Flat 303, 11 March Male Single neo@matrix. Anderson 1962 org ... Mia 20 Sunset 10 October Female Married m.wallace@h Wallace rd, Carlsbad 1994 otmail.com © 2012 Elena Punskaya 33 Cambridge University Engineering Department 33
  • 34. Keys id Name Address DoB Gender Relation Email ship 1 John Smith 34 West rd, 2 Jan 1981 Male Single john@smith. Cambridge com 2 Thomas Flat 303, 101 11 March Male Single neo@matrix. Anderson Red st, Zion 1962 org ... ... ... ... ... ... ... 10001 Mia Wallace 20 Sunset rd, 10 October Female Married m.wallace@ Carlsbad 1994 hotmail.com © 2012 Elena Punskaya 34 Cambridge University Engineering Department 34
  • 35. Normalization Normalization Normalization If a database has has duplicated information thenisitsubject it up-up- If a database duplicated information then it is subject it • Ifdate anomalies, and the information can become inconsistent. a database has duplicated information then it is subject it date anomalies, and the information can become inconsistent. update anomalies, and the information can becomelec- Imagine adding contact details to the the ‘course’ table allow Imagine adding contact detailscontact detailsto to allow lec- inconsistent. Imagine adding to ‘course’ table to the ‘course’ turers toallow contacted to be be contacted easily: table turers to belecturers easily: contacted easily: to Title Title Leader Lectures Telephone Leader Lectures Telephone RISC Processors RISC Processors Sanchez Sanchez 8 8 6596065960 QAM for modems QAM for modems Sanchez Sanchez 34 34 65960 65960 Introduction to Mainframes Belford Introduction to Mainframes Belford 20 20 65536 65536 LowLow latency LCD screens Richard latency LCD screens Richard 1 1 3276832768 • If the table is updated, for for instance the the the SQL command: If the If table is updated, instance with with SQL command: the table is updated, for instance with SQL command: UPDATE course SET SET Leader=Libby WHERE Title=RISC Processors UPDATE course Leader=Libby WHERE Title=RISC Processors • Then Then contact details will become incorrect. The process of the the contact details Then the contact details will will become incorrect. The process of become incorrect. The process of normalizing a database involves splitting up large tables with normalizing related information splitting up large tables with normalizing a database involves into a large tables with only weakly a database involves splitting upnumber of smaller tables. Normalized data is theninto a numbersmaller tables. onlyonly weakly related information accessed by joining tables. weakly related information into a number of of smaller tables together and performing accessed joining tablesresults. and Normalized datadata is then selectionsjoining tables together and Normalized is then accessed by by on the together © 2012 Elena Punskaya performing selections on the the results. performing selections on results. 35 Cambridge University Engineering Department 35
  • 36. date anomalies, and the information can become inconsistent. Imagine adding contact details to the ‘course’ table to allow lec- Normalization turers to be contacted easily: Title Leader Lectures Telephone RISC Processors Sanchez 8 65960 QAM for modems Sanchez 34 65960 Introduction to Mainframes Belford 20 65536 Low latency LCD screens Richard 1 32768 • The database above is not normalised the SQL command: If the table is updated, for instance with because there is duplicated data. More intuitively, the telephone number has UPDATE course SET Leader=Libby WHERE Title=RISC Processors merely been inserted as a convenience and has nothing directly to do with courses Then the contact details will become incorrect. The process of • Much like type safety and object oriented design, database normalizing a database involves splitting up large tables with normalization allows databases to be designed such that certain errors (for instance data inconsistency) are tables.likely. only weakly related information into a number of smaller less Normalized data is then accessed by joining tables together and • Normalization is the process of designing the database performing selections on the results. comply with normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF and DKNF). The database above is not normalized because there is duplicated data. More intuitively, the telephone number has merely been © 2012 Elena Punskaya 36 inserted as a convenience and has nothing directly to do with Cambridge University Engineering Department 36
  • 37. First Normal Form (1NF) First Normal Form 1. Make sure that your database really obeys the relational model: • Make sure that your database really obeys the relational (a) No ordering over rows model: (b) No ordering over columns - No ordering over rows - No ordering over columns duplicates (c) No - No duplicates 2. Each row/column intersection contains exactly one datum. • Each row/column intersection contains exactly one datum Consider trying to extend the earlier design to allow for multiple • Consider storing numbers: phone multiple phone numbers for the Leader Title Lectures ID Numbers ··· 8 456 65950, 60294, 70231 ··· 8 456 65950, 60294, 70231 BAD ··· 34 20 65536 ··· 1 82 32768, 16384 Title Lectures ID Phone 1 Phone 2 Phone 3 ··· 8 456 65960 60294 70231 ··· 34 456 65960 60294 70231 BAD ··· 20 9 65536 ··· 1 82 32768 16384 Note the use of IDs to avoid duplicates as names make bad keys: © 2012 Elena Punskaya 37 Cambridge University Engineering Department employees The list of phone numbers for the 37
  • 38. ··· 1 82 456 65960 First Normal Form 60294 70231 9 65536 Note the use of IDs • Employees table is used for details of 32768 Leaders 82 course 16384 employees • Adding a Phone number to store Th Name ID Phone employee’s contact details of IDs to avoid duplicates as names make bad keys:65960 lea Sanchez 456 • To support multiple phone numbers be need to duplicate Name/ID data Belford 9 65536 ΠP The list of phone numbers Richard for 82 the 32768 one Sanchez 456 60294 leader of a particular course can now 60 • The list of phone numbers for the leader of a particular course can now be extracted using relational algebra: algebra: be extracted using relational 36 ΠPhone(σTitle=“RISC Processors”(course employees)) 68 or in SQL: 94 SELECT phone FROM course NATURAL JOIN employees WHERE Title=”RISC Processors” © 2012 Elena Punskaya 38 Cambridge University Engineering Department 38
  • 39. Second Normal Form (2NF) A tableSecond normal form if it satisfies: is in second Normal Form (2NF) 1. It is in first normal form (1NF). • A table is in second normal form if it satisfies: 2. All non-prime attributes depend on the whole candidate key. - It is in first normal form (1NF). - All non-prime attributes depend on the whole candidate key. From the previous example, the complete relation, employees(E), • From the previous example, the complete relation, employees (E), is: is: Lack of normalization allows employees buggy programs to create incon- sistencies: Name ID Phone Sanchez 456 65960 Inserting the record (“Belford”, Belford 9 65536 10, 131072) leads to a mismatch between the name and id. Richard 82 32768 Sanchez 456 60294 An employee name change re- quires updates across multiple Sanchez 456 70231 rows, which may be done incor- Richard 82 16384 rectly. It also requires more lock- ing. • The candidate key is Cis= (ID, Phone). The non prime attribute The candidate key C = (ID, Phone). The non prime attribute is therefore E − CE − C =(Name). The employees’ names do do not is therefore =(Name). The employees’ names not depend on theon the phone number, onlythe ID. Therefore the table table depend phone number, only the ID. Therefore the is not in 2NF. is not in 2NF. A 2NF design is: © 2012 Elena Punskaya contacts Cambridge University Engineering Department 39 ID Phone 39
  • 40. The candidate key is C = (ID, Phone). The non prime attri is therefore E − C Form (2NF) employees’ names do Second Normal =(Name). The depend on the phone number, only the ID. Therefore the • A 2NF design is: is not in 2NF. A 2NF design is: contacts ID Phone employee names 456 65960 Name ID 9 65536 Sanchez 456 82 32768 Belford 9 456 60294 Richard 82 456 70231 82 16384 • ID is the Primary Key in employee_names • Phone is the Primary Key in contacts • ID is a Foreign Key in contacts connecting employee names and their phone numbers © 2012 Elena Punskaya 40 Cambridge University Engineering Department 40
  • 41. upon the key, the whole key and nothing but the key.” Third Normal Form (3NF) More formally a table over R is in 3NF iff: • “I swear by Codd that each non-prime attribute shall depend upon It is in 2NF the whole key and nothing but the key.” 1. the key, (and therefore 1NF) • More Every non-prime attribute R is in 3NF if and only if: 2. formally a table over is directly dependent on every - It is in 2NF (and key of R. 1NF) candidate therefore - Every non-prime attribute is directly dependent on every candidate key of R Practical Date Demonstrator Contact Pay rate Acoustic coupling Mon 1 Feb Dade 45102 10 Acoustic coupling Sat 7 Feb Dade 45102 15 Self-propagating code Tue 2 Mar Joey 67822 10 Self-propagating code Sun 9 Mar Kate 62341 15 The candidate key is: • The candidate key is (Practical, Date), however, the table not fully normalized because there is repetition of data (the (Practical,Date) contact numbers and the pay rates). The table is not in 3NF because: Table is not fully normalized because there is repetition of data - Pay rate depends on the key, but not the whole key. Specifically, it only depends on the date. contact numbers and the pay rates). The table is not in (the - Contact depends upon the whole key, but the dependence is transitive, not direct, that 3NF because: is: Contact → Demonstrator → (Practical, Date) • Pay rate depends on the key, but not the whole key. Specif- Elena Punskaya 41 © 2012 Cambridge University Engineering Department ically, it only depends on the date. 41
  • 42. to abort, rather than Engineering Part IIA: 3F6 -data. Engineering and Design 16 make inconsistent Software In addition to normal forms, which can be represented in rela- SQL attributes (helpful for 1NF) Constraints SQL Constraints NOT NULL prevents missing tables to be constructed with addi- tional algebra, SQL allows • In addition to normal forms, which can be represented in tional constraints which(Name the database more robust. Unlike CREATE TABLE course make string NOT NULL, ...) relational algebra, SQL allows tables to be constructed with normalization,normal forms, whichThis theimpossible to ID is In primary key constraints do not makewillrepresented more robust. Aadditional to can be specified. make be database in construct addition constraints which can it ensure that rela- errors.algebra, SQL data that break unique. cause transactions tional and invalid allows tables alsobe constructedcauses Providing unique, However, constraintsare to constraints with addi- therefore all rows do make errors transactions to abort, rather than make inconsistent data to abort, rather than make inconsistent data. robust. Unlike tional constraints which make the database more •CREATE Types TABLE people of constraints (Name string, ID int PRIMARY KEY) normalization, constraints do not make it impossible to construct •NOT NULL – ensures that the value unique: column can not be Known candidate constraints do attributes (helpfultransactions errors.NULL prevents can be marked as of this NOT However, keys missing make errors cause for 1NF) omitted CREATE TABLE than make (Name UNIQUE(a, b), CREATE TABLE r course inconsistent data. NOT NULL, ...) to abort, rather (a, b, c, d, string • UNIQUE – ensures that the value of this column is unique UNIQUE(a, c, d)) •A primary key can missing– designates will column that ID is NOT NULL prevents be specified. This the for 1NF)as a key PRIMARY/FOREIGN KEY attributes (helpful ensure unique, and therefore all(Name are also unique. KEY which A particularly important constraint is FOREIGN CREATE TABLE course rows string NOT NULL, ...) ensures that an attribute is a primary key in another table: CREATE TABLEcan be specified. This will ensure that ID is KEY) A primary key people (Name string, ID int PRIMARY CREATE and therefore all rows are also unique. unique, candidate keys can be marked asPRIMARY KEY, ID int, TABLE course (Title string Known unique: Lectures int, CREATE TABLEFOREIGN KEY c, string, ID intemployees) CREATE TABLE people (Name d, UNIQUE(a, b), r (a, b, (ID) REFERENCES PRIMARY KEY) Known candidate keys can be c, d)) as unique: UNIQUE(a, marked © 2012 Elena Punskaya The ID of the course leader is now constrained to be a valid em-Department Cambridge University Engineering 42 A particularly important c, d, UNIQUE(a, b), CREATE TABLE r (a, b, constraint is FOREIGN KEY which 42
  • 43. Entity-Relationship (E/R) Modelling • As in Object Oriented approach, designing a database schema requires finding conceptual abstractions (that represent the data) and defining relationships between them entity set Employee Leads Course No. lectures relationship set Name Title attribute • Notation suggested by Peter Chen in “The Entity Relationship Model: Toward a Unified View of Data”, 1976 - UML can also be used • Relationships have cardinality - 1 to 1 - 1 to Many - Many to Many etc. © 2012 Elena Punskaya 43 Cambridge University Engineering Department 43
  • 44. Entity-Relationship (E/R) Modelling Name Number Employee ISA Does Mechanic Salesman Description Price Date Number RepairJob Date Buys Sells Value License Parts Cost Value Comission Work Repairs Car seller buyer Year Client ID Manufacturer Model Name Phone Pável Calado, http://www.texample.net/tikz/examples/entity-relationship-diagram/ Address © 2012 Elena Punskaya 44 Cambridge University Engineering Department 44
  • 45. Objects and Databases • Relational Database Management Systems were mature stable products by 1980s • Object-Oriented approach reached wide adoption in 1990s • Any large software system still needs to persist data, hence store it in databases • Question: how we map Objects in a software system at runtime to Data stored in databases? • Originally, two options emerged: - Object to Relationship Mapping – a software layer that can provide database persistent to OO system (e.g. Hibernate, TopLink) – commonly used - Object Databases – a nice idea that failed to reach mainstream adoption • Most recently, further developments included non- relationship approaches (NoSQL) to working with large distributed datasets, e.g. Hadoop (hadoop.apache.org) - Map/Reduce: distributed processing of large data sets on compute clusters - Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying - Cassandra: A scalable multi-master database with no single points of failure © 2012 Elena Punskaya 45 Cambridge University Engineering Department 45
  • 46. No(t only)SQL at guardian.co.uk • The Guardian online, 1999 Web server Web server Web server Guardian journalism online: 1999 App bring I server you NEWS!!! App server App server Memcached (20Gb) Oracle CMS Data feeds © 2012 Elena Punskaya Matthew Wall, Simon Willison, www.slideshare.net/matwall/nosql-presentation 46 Cambridge University Engineering Department 46
  • 47. No(t only)SQL at guardian.co.uk • The Guardian online, 2010 Core Out In Web servers Guardian journalism online: 2010 App Solr Proxy App server App Solr Memcached (20Gb) App Solr App CMS Data feeds Solr Solr App M/Q Solr App rdbms CouchDB? external hosting Cloud, EC2 app engine etc © 2012 Elena Punskaya Matthew Wall, Simon Willison, www.slideshare.net/matwall/nosql-presentation 47 Cambridge University Engineering Department 47
  • 48. Security and SQL Injection • Consider the following example // allowing a user to search by product name string name; cout Enter product name: endl; getline(cin, name); string query = SELECT * FROM products WHERE name= + name + ; do_sql(query); • What happens if the user enters: ; DROP TABLE products; -- • The query becomes // going to delete the table Products SELECT * FROM products WHERE name= ; DROP TABLE products; -- • SQL Injection could be used to steal data from a database news.bbc.co.uk/1/hi/8206305.stm © 2012 Elena Punskaya 48 Cambridge University Engineering Department 48