This document discusses why databases can be difficult. It begins by noting that databases are selfish, want entire systems to themselves, are messy and suck up resources. It then compares databases to toddlers. It identifies problems like most PHP developers lacking SQL training. It provides quizzes and discusses concepts like joins, indexes, foreign keys, transactions and query plans. It offers programming advice like checking return codes and scrubbing data. Finally, it recommends books and invites questions.
PNWPHP -- Why Are Databases So &#%-ing Difficult
1. Why Are Databases So &#%-ing Difficult?!?!
Dave Stokes
MySQL Community Manager
David.Stokes@Oracle.com @Stoker
Slideshare.net/DavidMStokes
2. 2
Safe Harbor Agreement
The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions. The development, release, and timing of any
features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
8. 8
So back to the subject at hand
Why Are Databases So &#%-ing Difficult?!?!
9. 9
Databases are:
● Selfish
● Want the entire system to themselves
● Messy
● Suck up memory, disk space,
bandwidth, sanity
● Growing all the time
● Need updates
● Suck up a good part of your life
12. 12
Quiz #1
Select City.Name, Country.Name
FROM City
Join Country On (City.CountryCode = Country.Code)
WHERE City.Population > 2000000
ORDER By Country.Name, City.Name
● Can you describe the desired output of this query?
● Is the above SQL code good?
● Will it perform well for 10 records? 100000 records? 10^10 records?
13. 13
Quiz #1 – First Answer
Select City.Name, Country.Name
FROM City
Join Country On (City.CountryCode = Country.Code)
WHERE City.Population > 2000000
ORDER By Country.Name, City.Name
Desired Output
● Return the name of the Cities and their
corresponding Countries with Populations over
two million and sort by Country and City names
14. 14
Quiz #1 – Second Answer
Select City.Name, Country.Name
FROM City
Join Country On (City.CountryCode = Country.Code)
WHERE City.Population > 2000000
ORDER By Country.Name, City.Name
Is the above SQL code good?
● Syntax is good.
● But you cannot tell just by looking at the query
– May not be answering desired question
– May not map to the underlying data effectively
– Unless joining on foreign keys, may be wrong relationship between tables
– What are the units for population? Is it safe to assume here?
15. 15
Quiz #1
Select City.Name, Country.Name
FROM City
Join Country On (City.CountryCode = Country.Code)
WHERE City.Population > 2000000
ORDER By Country.Name, City.Name
Will it perform well for 10 records?
100000 records? 10^10 records?
● No way to tell by observation!
16. 16
Can you tell if PHP code is bad by observation?
<?php
$foo = 8;
if( $foo<10 )
if( $foo>5 )
echo "Greater than 5!";
else
echo "Less than 5!";
else
echo "Greater than 10!";
echo "<br />Another note.";
● http://code.tutsplus.com/tutorials/why-youre-a-bad-php-programmer--net-18384
It can be relatively easy to recognize BAD PHP programs.
17. 17
So why is SQL Different
SQL is a Declarative Language
18. 18
So why is SQL Different
● In computer science, declarative programming is a
programming paradigm, a style of building the
structure and elements of computer programs, that
expresses the logic of a computation without
describing its control flow. Many languages applying
this style attempt to minimize or eliminate side
effects by describing what the program should
accomplish in terms of the problem domain,
rather than describing how to go about
accomplishing it as a sequence of the
programming language primitives (the how being
left up to the language's implementation).
● https://en.wikipedia.org/wiki/Declarative_programming
20. 20
Mix and/or Match
Can you Mix and Match Declarative with Object-Oriented/Procedural Programming Languages??
21. 21
Yes but carefully
$QUERY = "SELECT City.Name, Country.Name
          FROM City
          JOIN Country ON (City.CountryCode = Country.Code)
          WHERE City.Population > 2000000
          ORDER BY Country.Name, City.Name";
$res = $mysqli->query($QUERY);
$row = $res->fetch_assoc();
22. 22
SQL
● SQL (Structured Query Language) is a special-purpose
programming language designed for managing data
held in a relational database management system
(RDBMS), or for stream processing in a relational
data stream management system (RDSMS).
● Originally based upon relational algebra and tuple
relational calculus, SQL consists of a data definition
language, data manipulation language, and a data
control language. The scope of SQL includes data
insert, query, update and delete, schema creation
and modification, and data access control.
● https://en.wikipedia.org/wiki/SQL
23. 23
What many of the audience members look like right now!
Oh No! He said the words relational algebra and tuple relational calculus!
24. 24
Problem #2 – Joins
● A SQL join clause combines records from two or more tables in a relational database. It creates a set that can be saved as a table or used as it is. A JOIN is a means for combining fields from two tables (or more) by using values common to each.
● https://en.wikipedia.org/wiki/Join_(SQL)
26. 26
● A JOIN is a means for combining fields from two tables (or more) by using values common to each.
Join Country On (City.CountryCode = Country.Code)
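A minimal sketch of the join above in action. Assumptions, none of which come from the talk: PDO with an in-memory SQLite database stands in for MySQL so the example runs without a server, and the country and city rows are made-up sample data.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical sample data, shaped like the City/Country tables in the quiz.
$pdo->exec('CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Name VARCHAR(52))');
$pdo->exec('CREATE TABLE City (Name VARCHAR(35), CountryCode CHAR(3), Population INT)');
$pdo->exec("INSERT INTO Country VALUES ('AAA', 'Alphaland'), ('BBB', 'Betaland')");
$pdo->exec("INSERT INTO City VALUES
    ('Alpha City', 'AAA', 3000000),
    ('Alpha Town', 'AAA', 50000),
    ('Beta City',  'BBB', 2500000)");

// The join pairs each City row with the Country row that shares its code.
$sql = 'SELECT City.Name AS city, Country.Name AS country
        FROM City
        JOIN Country ON (City.CountryCode = Country.Code)
        WHERE City.Population > 2000000
        ORDER BY Country.Name, City.Name';
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    echo $row['city'], ', ', $row['country'], PHP_EOL;
}
```

Only the two cities above two million come back, each carrying the name of its country from the other table.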
27. 27
Data Helpers
● Do not make every column as large as it possibly can be. A BIGINT will take
8 bytes to store, 8 bytes to read off disk, 8 bytes to send over the network,
etc.
– You are not going to have 18,446,744,073,709,551,615 customers even
if your customer_id is a BIGINT.
● If you can get by with a simple character set like LATIN1, great; otherwise
stick to UTF8mb4, but realize you are going from one byte to as many as four
bytes per character for storage.
● SELECT the columns you are going to use, avoid the * wildcard
– The less you move across disk/memory/network, the better!
28. 28
Problem #2 – Let the Database Do the Heavy Lifting
● Need SUM, AVG, MAX, MIN, STD, STDDEV_POP, STDDEV_SAMP,
STDDEV, VARIANCE, VAR_SAMP, VAR_POP?
● Sorting is easier outside of your application!
● Joins!!!!!
● Do not INDEX everything
● Use Foreign Keys
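As a rough illustration of letting the database do the heavy lifting, the sketch below asks for per-country totals in one GROUP BY query instead of pulling every row into PHP and adding them up in a loop. It assumes PDO with in-memory SQLite as a stand-in for MySQL, and the rows are invented sample data.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical sample data.
$pdo->exec('CREATE TABLE City (Name VARCHAR(35), CountryCode CHAR(3), Population INT)');
$pdo->exec("INSERT INTO City VALUES
    ('Alpha City', 'AAA', 3000000),
    ('Alpha Town', 'AAA', 50000),
    ('Beta City',  'BBB', 2500000)");

// One query returns the finished summary; only the small result set
// crosses the wire, not every City row.
$sql = 'SELECT CountryCode,
               COUNT(*)        AS cities,
               SUM(Population) AS total,
               MAX(Population) AS largest
        FROM City
        GROUP BY CountryCode
        ORDER BY CountryCode';
$stats = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

foreach ($stats as $s) {
    printf("%s: %d cities, %d people, largest %d\n",
        $s['CountryCode'], $s['cities'], $s['total'], $s['largest']);
}
```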
31. 31
Indexes
● INDEX columns
– On the right side of WHERE
– Used in joins
● INDEXES have overhead – so do not index everything
– Maintenance
– Insert/update/delete
– Use mysqlindexcheck
● Finds Duplicate Indexes
– Use Sys Schema
● Find Unused Indexes
32. 32
Foreign Keys
● A foreign key is a field (or collection of fields) in one table that
uniquely identifies a row of another table. In simpler words, the
foreign key is defined in a second table, but it refers to the primary
key in the first table. For example, a table called Employee has a
primary key called employee_id. Another table called Employee
Details has a foreign key which references employee_id in
order to uniquely identify the relationship between both the
tables.
● The table containing the foreign key is called the child table, and the
table containing the candidate key is called the referenced or parent
table. In database relational modeling and implementation, a unique
key is a set of zero, one or more attributes, the value(s) of which are
guaranteed to be unique for each tuple (row) in a relation. The value
or combination of values of unique key attributes for any tuple
cannot be duplicated for any other tuple in that relation.
● https://en.wikipedia.org/wiki/Foreign_key
33. 33
Foreign Keys
CREATE TABLE employee (
  e_id INT NOT NULL,
  name CHAR(20),
  PRIMARY KEY (e_id)
);
CREATE TABLE building (
  office_nbr INT NOT NULL,
  description CHAR(20),
  e_id INT NOT NULL,
  PRIMARY KEY (office_nbr),
  FOREIGN KEY (e_id)
    REFERENCES employee (e_id)
    ON UPDATE CASCADE
    ON DELETE CASCADE
);
34. 34
Now Add Data
INSERT INTO employee VALUES
(10,'Larry'), (20,'Shemp'), (40,'Moe');
INSERT INTO building VALUES
(100,'Corner Office',10),
(101,'Lobby',40);
SELECT * FROM employee
JOIN building ON (employee.e_id = building.e_id);
e_id name office_nbr description e_id
10 Larry 100 Corner Office 10
40 Moe 101 Lobby 40
Where is SHEMP????
35. 35
How do we find Shemp?
mysql> SELECT * FROM employee
LEFT JOIN building ON(employee.e_id=building.e_id);
e_id name office_nbr description e_id
10 Larry 100 Corner Office 10
40 Moe 101 Lobby 40
20 Shemp NULL NULL NULL
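The same hunt for Shemp, sketched from PHP. This is an illustration only: PDO with in-memory SQLite stands in for MySQL, and the columns are named explicitly because SELECT * would return two e_id columns that collide in an associative array.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE employee (e_id INT NOT NULL, name CHAR(20), PRIMARY KEY (e_id))');
$pdo->exec('CREATE TABLE building (office_nbr INT NOT NULL, description CHAR(20),
            e_id INT NOT NULL, PRIMARY KEY (office_nbr))');
$pdo->exec("INSERT INTO employee VALUES (10, 'Larry'), (20, 'Shemp'), (40, 'Moe')");
$pdo->exec("INSERT INTO building VALUES (100, 'Corner Office', 10), (101, 'Lobby', 40)");

$columns = 'SELECT employee.name, building.office_nbr FROM employee ';
$on      = ' building ON (employee.e_id = building.e_id) ORDER BY employee.e_id';

// Inner join: only employees that have a matching building row.
$inner = $pdo->query($columns . 'JOIN' . $on)->fetchAll(PDO::FETCH_ASSOC);

// Left join: every employee, with NULLs where no building row matches.
$left = $pdo->query($columns . 'LEFT JOIN' . $on)->fetchAll(PDO::FETCH_ASSOC);

foreach ($left as $row) {
    $office = $row['office_nbr'] === null ? 'no office' : 'office ' . $row['office_nbr'];
    echo $row['name'], ': ', $office, PHP_EOL;
}
```

Adding WHERE building.e_id IS NULL to the left join would return only the employees without an office.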
36. 36
FKs save you from messy data
mysql> INSERT INTO building VALUES (120,'Cubicle',77);
ERROR 1452 (23000): Cannot add or update a child row: a
foreign key constraint fails (`test`.`building`, CONSTRAINT
`building_ibfk_1` FOREIGN KEY (`e_id`) REFERENCES
`employee` (`e_id`) ON DELETE CASCADE ON UPDATE
CASCADE)
Who is employee 77?
37. 37
Using Cascade
mysql> DELETE FROM employee WHERE e_id=40;
mysql> SELECT * FROM employee LEFT JOIN building ON
(employee.e_id=building.e_id);
e_id name office_nbr description e_id
10 Larry 100 Corner Office 10
20 Shemp NULL NULL NULL
38. 38
Updates
mysql> UPDATE employee SET e_id=21 WHERE e_id=20;
mysql> SELECT * FROM employee LEFT JOIN building ON
(employee.e_id=building.e_id);
e_id name office_nbr description e_id
10 Larry 100 Corner Office 10
21 Shemp NULL NULL NULL
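A sketch of how the foreign key behavior above might look from application code: the bad insert surfaces as an exception to catch, and deleting a parent row cascades to its children. Assumptions: PDO with in-memory SQLite stands in for MySQL, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas InnoDB enforces them by default.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL/InnoDB.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('PRAGMA foreign_keys = ON'); // SQLite-specific switch

$pdo->exec('CREATE TABLE employee (e_id INT NOT NULL, name CHAR(20), PRIMARY KEY (e_id))');
$pdo->exec('CREATE TABLE building (
    office_nbr INT NOT NULL,
    description CHAR(20),
    e_id INT NOT NULL,
    PRIMARY KEY (office_nbr),
    FOREIGN KEY (e_id) REFERENCES employee (e_id)
        ON UPDATE CASCADE
        ON DELETE CASCADE)');
$pdo->exec("INSERT INTO employee VALUES (10, 'Larry'), (20, 'Shemp'), (40, 'Moe')");
$pdo->exec("INSERT INTO building VALUES (100, 'Corner Office', 10), (101, 'Lobby', 40)");

// There is no employee 77, so the database refuses the row.
$rejected = false;
try {
    $pdo->exec("INSERT INTO building VALUES (120, 'Cubicle', 77)");
} catch (PDOException $e) {
    $rejected = true;
    echo 'Insert rejected: ', $e->getMessage(), PHP_EOL;
}

// Deleting Moe also removes the Lobby row that pointed at him.
$pdo->exec('DELETE FROM employee WHERE e_id = 40');
$remaining = (int) $pdo->query('SELECT COUNT(*) FROM building')->fetchColumn();
echo 'Building rows left: ', $remaining, PHP_EOL;
```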
41. 41
● What is the N+1 Query Problem ?
● This problem occurs when the code needs to load the children
of a parent-child relationship (the “many” in the “one-to-many”).
Most ORMs have lazy-loading enabled by default, so queries
are issued for the parent record, and then one query
for EACH child record. As you can expect, doing N+1
queries instead of a single query will flood your database with
queries, which is something we can and should avoid.
● http://www.sitepoint.com/silver-bullet-n1-problem/
42. 42
N+1 Problem
function get_author_id( $name )
{
    global $db; // shared database connection, created elsewhere
    $res = $db->query( "SELECT id FROM authors WHERE name=?", array( $name ) );
    $id = null;
    while( $res->fetchInto( $row ) ) { $id = $row[0]; }
    return $id;
}
function get_books( $id )
{
    global $db;
    $res = $db->query( "SELECT id FROM books WHERE author_id=?", array( $id ) );
    $ids = array();
    while( $res->fetchInto( $row ) ) { $ids []= $row[0]; }
    return $ids;
}
function get_book( $id )
{
    global $db;
    $res = $db->query( "SELECT * FROM books WHERE id=?", array( $id ) );
    while( $res->fetchInto( $row ) ) { return $row; }
    return null;
}
$author_id = get_author_id( 'Jack Herrington' );
$books = get_books( $author_id );
foreach( $books as $book_id ) {
    // one more round trip to the database for every book
    $book = get_book( $book_id );
    var_dump( $book );
}
http://www.ibm.com/developerworks/library/os-php-dbmistake/
Three query functions, and get_book() hits the database once per book!!!!
43. 43
N+1 Continued
function get_books( $name )
{
    global $db; // shared database connection, created elsewhere
    $res = $db->query(
        "SELECT books.* FROM authors,books WHERE
         books.author_id=authors.id AND
         authors.name=?",
        array( $name ) );
    $rows = array();
    while( $res->fetchInto( $row ) ) { $rows []= $row; }
    return $rows;
}
$books = get_books( 'Jack Herrington' );
var_dump( $books );
One read to get the same data!!!! Yea!!!
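To make the difference countable, here is a small sketch that runs both approaches against the same data and tallies the round trips. Everything around the queries is an assumption for illustration: PDO with in-memory SQLite instead of the PEAR-style $db handle in the slides, a hypothetical $run helper that counts queries, and invented book titles.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE authors (id INT PRIMARY KEY, name VARCHAR(50))');
$pdo->exec('CREATE TABLE books (id INT PRIMARY KEY, author_id INT, title VARCHAR(50))');
$pdo->exec("INSERT INTO authors VALUES (1, 'Jack Herrington')");
$pdo->exec("INSERT INTO books VALUES (1, 1, 'Book A'), (2, 1, 'Book B'), (3, 1, 'Book C')");

// Hypothetical helper: runs a prepared query and counts every round trip.
$queries = 0;
$run = function (string $sql, array $params) use ($pdo, &$queries): array {
    $queries++;
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
};

// N+1 style: find the author, list the book ids, then fetch each book.
$author = $run('SELECT id FROM authors WHERE name = ?', ['Jack Herrington']);
$ids    = $run('SELECT id FROM books WHERE author_id = ? ORDER BY id', [$author[0]['id']]);
$slow   = [];
foreach ($ids as $r) {
    $slow[] = $run('SELECT * FROM books WHERE id = ?', [$r['id']])[0];
}
$slowQueries = $queries;

// Join style: one query returns the same rows.
$queries = 0;
$fast = $run('SELECT books.* FROM authors
              JOIN books ON books.author_id = authors.id
              WHERE authors.name = ?
              ORDER BY books.id', ['Jack Herrington']);
$fastQueries = $queries;

echo "N+1 style: $slowQueries queries, join: $fastQueries query", PHP_EOL;
```

With three books the loop version issues five queries, and that number grows with every book the author writes; the join stays at one.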
44. 44
Transactions!!!
● A transaction symbolizes a unit of work performed within a database
management system (or similar system) against a database, and treated in a
coherent and reliable way independent of other transactions. A transaction generally
represents any change in database. Transactions in a database environment have
two main purposes:
– To provide reliable units of work that allow correct recovery from failures and keep a
database consistent even in cases of system failure, when execution stops (completely or
partially) and many operations upon a database remain uncompleted, with unclear status.
– To provide isolation between programs accessing a database concurrently. If this isolation
is not provided, the programs' outcomes are possibly erroneous.
● A database transaction, by definition, must be atomic, consistent, isolated and
durable. Database practitioners often refer to these properties of database
transactions using the acronym ACID.
● Transactions provide an "all-or-nothing" proposition, stating that each work-unit
performed in a database must either complete in its entirety or have no
effect whatsoever. Further, the system must isolate each transaction from other
transactions, results must conform to existing constraints in the database, and
transactions that complete successfully must get written to durable storage.
– https://en.wikipedia.org/wiki/Database_transaction
45. 45
More Transactions!!
● InnoDB or NDB only!
● Start with START TRANSACTION (or BEGIN)
– Do your SELECT & UPDATE
to change data
● COMMIT to save
ROLLBACK to undo
● SAVEPOINT foo
ROLLBACK TO foo
– Intermediate rollback point
● Note MySQL SQL Mode is
STRICT by DEFAULT
– So watch for missing data
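A minimal sketch of the COMMIT/ROLLBACK pattern from PHP, under stated assumptions: PDO with in-memory SQLite stands in for InnoDB, the accounts table and transfer() function are hypothetical, and a CHECK constraint is used here simply to force a failure partway through the unit of work.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL/InnoDB.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table; the CHECK makes an overdraft fail inside the transaction.
$pdo->exec('CREATE TABLE accounts (id INT PRIMARY KEY,
            balance INT NOT NULL CHECK (balance >= 0))');
$pdo->exec('INSERT INTO accounts VALUES (1, 100), (2, 50)');

// Hypothetical helper: move money as a single all-or-nothing unit of work.
function transfer(PDO $pdo, int $from, int $to, int $amount): bool
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('UPDATE accounts SET balance = balance + ? WHERE id = ?')
            ->execute([$amount, $to]);
        $pdo->prepare('UPDATE accounts SET balance = balance - ? WHERE id = ?')
            ->execute([$amount, $from]);
        $pdo->commit();   // both updates become permanent together
        return true;
    } catch (PDOException $e) {
        $pdo->rollBack(); // the credit that already ran is undone too
        return false;
    }
}

$first  = transfer($pdo, 1, 2, 30);  // fits within the balance
$second = transfer($pdo, 1, 2, 500); // would overdraw account 1

$balances = $pdo->query('SELECT id, balance FROM accounts ORDER BY id')
                ->fetchAll(PDO::FETCH_KEY_PAIR);
print_r($balances);
```

The second transfer credits account 2 before the debit fails, yet after the rollback neither balance has moved, which is the all-or-nothing behavior described above.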
48. 48
What is a Query Plan?
● After your query syntax is
checked, the optimizer looks for
the most efficient way to gather
the data requested.
– Each added column increases the
complexity roughly factorially
– Is the data in memory?
● A dive to disk is roughly 100,000
times slower
– Are the indexes fresh? Are there
indexes?
MySQL wants to create a query
plan for each execution of each
query
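One way to see a plan rather than guess at it is to ask the database. The sketch below is only an approximation of the idea: it uses PDO with in-memory SQLite, whose statement is EXPLAIN QUERY PLAN, while MySQL uses EXPLAIN with a different output format. The plan() helper and the index name are hypothetical, and the exact wording of the plan text varies by SQLite version.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE City (Name VARCHAR(35), CountryCode CHAR(3), Population INT)');

// Hypothetical helper: return the human-readable steps of the chosen plan.
function plan(PDO $pdo, string $sql): array
{
    $details = [];
    foreach ($pdo->query('EXPLAIN QUERY PLAN ' . $sql) as $row) {
        $details[] = $row['detail'];
    }
    return $details;
}

$sql = 'SELECT Name FROM City WHERE Population > 2000000';

// Without an index the optimizer has little choice but to read the table.
$before = plan($pdo, $sql);

// An index on the WHERE column gives it another option to consider.
$pdo->exec('CREATE INDEX idx_city_population ON City (Population)');
$after = plan($pdo, $sql);

echo 'Before: ', implode('; ', $before), PHP_EOL;
echo 'After:  ', implode('; ', $after), PHP_EOL;
```

The query text is identical in both calls; only the available indexes changed, which is why the SQL alone cannot tell you how it will perform.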
49. 49
Things to tune
● innodb_stats_persistent=ON will store statistics between restarts so
your server does not have to re-learn the best way to run the query
● innodb_stats_auto_recalc (default on, 10%) will automatically redo
stats after a limit of DML changes to a table
● Higher cardinality helps
– Design your indexes accordingly!!
● See MySQL Manual 14.3.11.1 Configuring Persistent Optimizer
Statistics Parameters for details
● Also add skip-name-resolve to keep bad DNS zone transfers from
killing your application
50. 50
Some Programming Advice
● Check return codes
<?php
// we connect to example.com and port 3307
$link = mysql_connect('example.com:3307', 'mysql_user', 'mysql_password');
if (!$link) {
    die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully';
mysql_close($link);
51. 51
Some More Programming Advice
● Check return codes
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
$sql = "UPDATE MyGuests SET lastname='Doe' WHERE id=2";
if ($conn->query($sql) === TRUE) {
    echo "Record updated successfully";
} else {
    echo "Error updating record: " . $conn->error;
}
52. 52
Some Programming Advice
● Scrub all data coming into your database
– Less SQL Injection, more data integrity
● Think in SETs of data not ROWs
– Let the database do the heavy lifting
● Try to write good SQL
– Learn to use Visual Explain
– Actively look to reduce/combine queries
● Use Sys Schema to find unused and redundant indexes, and the
slow query log to find queries that run slowly or without
indexes
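One common way to scrub incoming data is to validate it and then pass it through prepared statements so that it is never spliced into the SQL text. The sketch below assumes PDO with in-memory SQLite in place of MySQL; the MyGuests table mirrors the earlier slide, while the rows and the hostile input string are invented.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE MyGuests (id INT PRIMARY KEY, lastname VARCHAR(30))');
$pdo->exec("INSERT INTO MyGuests VALUES (1, 'Smith'), (2, 'Jones')");

// Hostile input: glued into the SQL string, this would match every row.
$input = "x' OR '1'='1";

// Bound as a parameter, it is compared as plain text and matches nothing.
$stmt = $pdo->prepare('SELECT id FROM MyGuests WHERE lastname = ?');
$stmt->execute([$input]);
$matches = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Validate the type of incoming values before they get near the database.
$id = filter_var('2', FILTER_VALIDATE_INT);
if ($id === false) {
    throw new InvalidArgumentException('id must be an integer');
}

$update = $pdo->prepare('UPDATE MyGuests SET lastname = ? WHERE id = ?');
$update->bindValue(1, 'Doe', PDO::PARAM_STR);
$update->bindValue(2, $id, PDO::PARAM_INT);
$update->execute();
$changed = $update->rowCount(); // check the result instead of assuming success

echo 'Rows matched by hostile input: ', count($matches), PHP_EOL;
echo 'Rows updated: ', $changed, PHP_EOL;
```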