Controversy persists to this day about the relative merits of dynamic languages such as PHP and Python versus static languages such as C# and Java. We’re finding more and more that the difference isn’t so much about static or dynamic typing, but more about the cultures of different languages. In this article, I discuss an efficient representations of SQL trees in a database, an algorithm for creating that representation, and a PHP implementation. The PHP implementations uses objects in a way foreign to many developers: rather than using objects to represent nouns (data), it uses a class to represent a verb (an algorithm.) I make the case that programmers shouldn’t feel compelled to create new classes to represent every data item: that verb objects often provide the right level of abstraction for many tasks.
How to Troubleshoot Apps for the Modern Connected Worker
Nested sets, php, verb objects and noun objects
1. Generation 5 » Nested Sets, PHP, Verb Objects and Noun Objects
Subscribe to our RSS Feed | About Us
Nested Sets, PHP, Verb Objects and Noun Objects
Introduction
Controversy persists to this day about the relative merits of dynamic languages such
as PHP and Python versus static languages such as C# and Java. We’re finding more
and more that the difference isn’t so much about static or dynamic typing, but more
about the cultures of different languages. In this article, I discuss an efficient
representations of SQL trees in a database, an algorithm for creating that
representation, and a PHP implementation. The PHP implementations uses objects in
a way foreign to many developers: rather than using objects to represent nouns
(data), it uses a class to represent a verb (an algorithm.) I make the case that
programmers shouldn’t feel compelled to create new classes to represent every data
item: that verb objects often provide the right level of abstraction for many tasks.
The Presenting Problem
Lately I’ve been collecting pictures of animals, and decided that incorporating the
taxonomic database from ITIS would be a big help. I’m interested in asking questions
like “What are all the species underneath the Tapiridae family?” The ITIS database
uses the adjacency list representation, where each row contains a column that
references the primary key of a parent row. Algorithms for the adjacency list are well
known, but are awkward to implement in SQL since it takes multiple SQL statements
to traverse a tree.
Nested sets are an alternative representation that makes it simple to write fast
queries on trees. Like the parts explosion diagram below, components of the hierarchy
are represented with contiguous numbers (parts 1-3 form one end piece of the
steering knuckle.)
This article discusses the adjacency list and nested set models and presents a simple
algorithm for converting an adjacency list into nested sets.
Adjacency lists
A common representation of a tree in a relational database is like this:
http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/[1/12/2014 8:20:27 PM]
Search for:
Search
Archives
June 2012 (1)
August 2010 (1)
May 2010 (1)
June 2009 (2)
April 2009 (1)
March 2009 (1)
February 2009 (3)
January 2009 (3)
November 2008 (1)
August 2008 (2)
July 2008 (5)
June 2008 (5)
May 2008 (2)
April 2008 (6)
March 2008 (8)
June 2006 (1)
February 2006 (1)
Categories
AJAX (2)
Asynchronous Communications (16)
Biology (1)
Books (1)
Design (1)
Distributed (1)
Exceptions (2)
Functional Programming (1)
GIS (1)
Ithaca (1)
Japan (1)
Math (1)
Media (3)
Nature (1)
Semantic Web (3)
Tools (28)
CRUD (1)
Dot Net (17)
Freebase (2)
GWT (9)
Java (7)
Linq (2)
PHP (6)
Server Frameworks (1)
Silverlight (12)
SQL (5)
Uncategorized (1)
Web (2)
Analytics (1)
2. Generation 5 » Nested Sets, PHP, Verb Objects and Noun Objects
[01] create table obviousTree (
[02]
id varchar(255),
[03]
parent varchar(255),
[04]
primary key(id),
[05]
foreign key(parent) references tree(id)
[06] )
The parent column is allowed to be null , so the root of the tree has a null parent. A
typical tree in this representation might look like
[07] sql> SELECT * FROM obviousTree;
[08]
+--------------+
[09]
| id | parent |
[10]
|-----|--------|
[11]
| 'a' | null |
[12]
| 'b' | 'a'
|
[13]
| 'c' | 'a'
|
[14]
| 'd' | 'b'
|
[15]
| 'e' | 'b'
|
[16]
| 'f' | 'b'
|
[17]
+--------------+
It’s simple in this representatation to find out what the parent of a node is,
[18] SELECT parent FROM obviousTree WHERE id=@Child
or to find the direct children of a node,
[19] SELECT id FROM obviousTree WHERE parent=@Parent
It’s not possible, however, to write a pure SQL statement that traverses all of the
descendants of a node (at least not in standard SQL.) You need to either write a
stored procedure or write a program in a language like C# or PHP to implements a
breadth-first or depth-first traversal of the tree. This isn’t conceptually hard, but it’s
inefficient and doesn’t take full advantage of the querying power of SQL.
Nested Sets
Joe Celko has promoted the Nested Set representation of trees in several of his
books. In the nested set representation, we represent the position of each node in
the tree with two numbers: the left value and the right value. We’ll call them lft and
rgt in our code, since LEFT and RIGHT are reserved words in SQL.
The left and right values of a parent node enclose the left and right values of the
children, so it’s easy to ask for the the descendants of a node
[20] SELECT * FROM fastTree WHERE
[21]
lft>@AncestorLft
[22]
AND lft<@AncestorRgt
to count the children
[23] SELECT (rgt-lft-1)/2 FROM fastTree WHERE lft=@AncestorLeft
or to find all the ancestors of a node
[24] SELECT * FROM fastTree WHERE
[25]
lft<@DescendentLft
[26]
AND rgt>@DescendentRgt
Granted, update operations are slower and more complex in the nested set
representation, but for my application, where I’m accessing a nearly 500,000-node
http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/[1/12/2014 8:20:27 PM]
3. Generation 5 » Nested Sets, PHP, Verb Objects and Noun Objects
tree that never changes, nested sets are the clear choice.
The Conversion Algorithm in PHP
Looking at the tree above, you can see a straightforward algorithm for creating the
nested set representation. We traverse the tree depth-first, keeping a counter, which
we’ll call cnt as we go — it works a lot like a thumb operated tally counter:
When we first encounter a node (going down), we write cnt into the lft field of that
node, then we increment cnt. When we encounter it again (going up), we write cnt
into the rgt field of the node an increment cnt.
Joe Celko makes a good case for implementing mutating operations on nested sets in
stored procedures. (see his book on Trees and Hiearchies and SQL) I think he’s
particularly right when it comes to adding, removing and moving nodes, where the
operation requires several statements that should be done in a transaction.
Transactional integrity is less important, however, in a one-time import procedure
which is done in a batch; although Mysql 5.0 (which I’m using) supports stored
procedures, it’s got less of a culture of stored procedure use than other databases, so
I felt comfortable writing the conversion script in PHP. This script converted a tree
with 477,797 nodes in less than a minute, so performance was adequate.
I’m going to store the nested set in a table that looks like
[27] create table fastTree (
[28] lft integer not null,
[29]
rgt integer not null,
[30]
id varchar(255),
[31]
primary key(lft),
[32]
unique(rgt),
[33]
unique(id)
[34] )
I include “_Config.php”, which initializes the database connection, $conn, using the
third-party ADODB library. After that I retrieve the list of parent-child relationships
from the database:
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
require_once("./_Config.php");
$rawLink=$conn->Execute("SELECT id,parent FROM obviousTree");
$link=array();
foreach($rawLink as $k=>$row) {
$parent=$row["parent"];
$child=$row["id"];
if (!array_key_exists($parent,$link)) {
$link[$parent]=array();
}
$link[$parent][]=$child;
}
Note that we’re building a complete copy of the adjacency list in RAM: the value of
$link[$parent] is an array that contains the id’s of all the children of the $parent . I’m
doing this for two reasons: (i) the adjacency list is small enough to fit in RAM, and
(ii) minimize the cost of I/O: if you expect to access all rows in a table, it’s a lot
cheaper to do a full table scan than it is to do hundreds of thousands of index
queries.
Next we define an object, TreeTransformer, that represents the algorithm. In
particular, it provides a scope for the $cnt variable, which has a lifespan apart from
the recursive function that represents the tree:
[47]
[48]
[49]
[50]
[51]
[52]
class TreeTransformer {
function __construct($link) {
$this->count=1;
$this->link=$link;
}
http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/[1/12/2014 8:20:27 PM]
4. Generation 5 » Nested Sets, PHP, Verb Objects and Noun Objects
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
function traverse($id) {
$lft=$this->count;
$this->count++;
$kid=$this->getChildren($id);
if ($kid) {
foreach($kid as $c) {
$this->traverse($c);
}
}
$rgt=$this->count;
$this->count++;
$this->write($lft,$rgt,$id);
}
Traverse() is the core of the algorithm: it works in three phases: (i) it assigns the $lft
value, (ii) it loops over all children, calling itself recursively for each child, and (iii)
assigns the $rgt value and writes an output record. The scope of $this->count is
exactly the scope we want for the variable, and saves the need of passing $count
back and forth between different levels of traverse() . Traverse calls two functions:
[67]
[68]
[69]
[70]
[71]
[72]
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
function getChildren($id) {
return $this->link[$id];
}
function write($lft,$rgt,$id) {
global $conn;
$conn->Execute("
INSERT INTO fastTree
(lft,rgt,id)
VALUES
(?,?,?)
",array($lft,$rgt,$id));
}
}
Noun Objects or Verb Objects?
This script uses a single class: instead of using classes to represent data, it uses a
class to represent an algorithm, a verb. Rather than create objects to create data
structures, I reuse data structures that come with PHP.
This is an extensible design.
TreeTransformer provides two extension points: getChildren() and write() . Most of
the objections a person could have to this implementation could be addressed here:
for instance, getChildren() could be modified to support a different data structure for
the adjacency list, or even to operate in a streaming mode that does an SQL query
for each node. write() , on the other hand, could be modified to avoid the limitations
of the global $conn, or to change the output format. If TreeTransformer were to
evolve in the future, it would make sense to push traverse() up to an
AbstractTreeWriter and define getChildren(), write() and $this->link in a subclass.
Noun objects (that represent things) can be useful, but a compulsion to create noun
objects can lead to an entanglement between algorithms and data structures, poor
code reuse, and a proliferation of artifacts that makes for a high defect count and
expensive maintainance. Even if were using noun objects in this program, it still
makes sense to implement this algorithm as a verb object: by creating interfaces for
the adjacancy list and for the output, I could keep the algorithm reusable while
keeping the noun objects simple.
Conclusion: Write Less Code
Many of the fashionable languages that people claim improve productivity, such as
Lisp, Python and Ruby, put powerful data structures at the programmer fingertips:
they encourage you to reuse data structures provided by the language rather than to
create a new object in response to every situation. Functional languages such as
CAML and F# show the power of programming by composition, enabled by the use of
standard data structures (and interfaces.)
What’s really exciting is that these methods are becoming increasingly available and
popular in mainstream languages, such as C# and PHP.
Generic classes and methods, as available in Java and C#, let you have both
reusable data structures and type safety. Although there are trade offs, you seriously
consider using a List<Thing> rather than creating a CollectionOfThings . Yes, a
collection of things that you create can present exactly the interface you need,
however, without a lot of care, you might find that another member of your team
wrote an incompatible JohnsCollectionOfThings, and you end up writing more glue
code because you’re interfacing with a third party vendor that uses a
MagicBoxFullOfThings instead.
http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/[1/12/2014 8:20:27 PM]