2. Who Am I?
• Software developer at Mozilla working on Socorro
• Author of the PHP Playbook
• Former frequent blogger on PHP topics
• Private pilot in my spare time
12. Solving the problem
• Separate the use of data from the retrieval of data.
• Think in terms of actions.
• Build our applications to be storage agnostic.
• Use the correct data storage medium.
17. <?php
class Data_Model {
    /* ... */
    public function getSomeData() {
        $data = $this->adapter->queryData();
        /** process data here **/
        return $processedData;
    }
}
class Data_Model_Adapter extends MySQL_Adapter implements Adapter
{
    public function queryData() {
        $sql = 'SELECT * FROM table';
        /** turn into common format **/
        return $commonFormatData;
    }
}
Years and years ago, when the web was young, state was maintained simply by creating a database. Web applications were mostly small, and databases could easily handle the traffic sent their way. Most of us learned how to write web applications against a database. Most of us used the “LAMP stack”: Linux, Apache, MySQL, and PHP.
As the web grew up, and grew bigger, methods for obtaining, storing and using data changed.
Developers began using data sources provided by others, first over SOAP, then REST. Other data stores such as NoSQL databases, Redis, Elasticsearch and Memcached came along to complicate things.
It was no longer all about the database. The database was just one piece of the puzzle.
Yet if we take a good look at most of the frameworks available, they’re database-centric. For a long time, Doctrine support for other data layers was non-existent. Support for something other than a database in Django is non-existent. We still think in a database-centric way. Our data layers are still database-focused.
The bottom line: we need to change our thinking.
Databases are not it. Even for applications that start against a database (and that’s most if not all of them), we need to think about the other ways that we’ll ingest data.
This lesson was painful for those of us working on Socorro. Socorro was initially built as a database-centric application, and we’ve slowly expanded our technology stack as new needs have arisen. While much of our webapp data comes from Postgres, we’ve begun moving our data layer to a more source-agnostic middleware layer.
It’s clear to us that a database-centric model doesn’t work anymore. We can’t think of data in terms of rows and columns. It doesn’t work like that.
So how do we solve this problem?
Large web applications don’t pursue abstraction as an art form. They pursue it as a necessity. Failing to properly abstract a large web application can result in catastrophic failure. It is therefore important to separate the layer that gets data out of a data store from the layers that use the data.
Here’s an example...
When programmers are in a hurry, they often don’t take the time to abstract their code in a way that makes it easy to come along later and make changes. I’ve seen this example hundreds of times in codebases I’ve worked on; many of you probably have too. The problem is that if the data source ever changes from some SQL-based database to something else, a programmer will have to rewrite the logic here and everywhere else all over again. This makes the cost of transition much higher than it has to be.
It would therefore make good sense to abstract the process of retrieving data.
We should instead use adapters to query the data and return it in an agreed-upon format. The processing takes place elsewhere.
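As a minimal sketch of that split (assuming PHP 7.4 or later), the example below puts two adapters behind one interface. Every name, column, and value in it is hypothetical and invented for illustration; neither adapter talks to a real data store, and none of this is Socorro's actual code.

```php
<?php
// Hypothetical sketch: adapters return a common format, the model processes it.

interface Adapter
{
    /** Return records in the agreed-upon format: a list of ['name', 'count'] arrays. */
    public function queryData(): array;
}

// Stand-in for a SQL-backed adapter; a real one would run a query here.
class Sql_Adapter implements Adapter
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function queryData(): array
    {
        // Map source-specific column names onto the common format.
        return array_map(
            fn(array $row): array => ['name' => $row['item_name'], 'count' => (int) $row['item_count']],
            $this->rows
        );
    }
}

// Stand-in for a JSON source, such as the body of a REST response.
class Json_Adapter implements Adapter
{
    private string $json;

    public function __construct(string $json)
    {
        $this->json = $json;
    }

    public function queryData(): array
    {
        $items = json_decode($this->json, true, 512, JSON_THROW_ON_ERROR);
        return array_map(
            fn(array $item): array => ['name' => $item['name'], 'count' => (int) $item['total']],
            $items
        );
    }
}

class Data_Model
{
    private Adapter $adapter;

    public function __construct(Adapter $adapter)
    {
        $this->adapter = $adapter;
    }

    /** Processing lives here and never touches the storage layer. */
    public function getTotalCount(): int
    {
        return array_sum(array_column($this->adapter->queryData(), 'count'));
    }
}

$sqlModel = new Data_Model(new Sql_Adapter([
    ['item_name' => 'alpha', 'item_count' => '3'],
    ['item_name' => 'beta', 'item_count' => '2'],
]));
$jsonModel = new Data_Model(new Json_Adapter('[{"name":"alpha","total":5}]'));

echo $sqlModel->getTotalCount(), "\n";
echo $jsonModel->getTotalCount(), "\n";
```

Swapping the data source then means writing one new adapter; Data_Model stays as it is.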
NAP story. Data layer Postgres-focused.
When retrieval and processing are combined, it is that much harder to separate one from the other in the future.
When you think in terms of actions, rather than data sources, you don’t care what happens behind the scenes. Instead, you start caring about the finished product. In Socorro, we have reports that use both HBase and Postgres data. If we cared about the data source, we’d have many more calls than we need.
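One way to picture an action-oriented call is the sketch below, assuming PHP 7.4 or later. Source, Array_Source, Report_Service, and the sample records are all hypothetical; the two in-memory sources only stand in for two different stores and do not reflect how Socorro is actually built.

```php
<?php
// Hypothetical sketch: one action for the caller, two stores behind it.

interface Source
{
    /** Return the fields this source holds for the given ID, or an empty array. */
    public function fetch(string $id): array;
}

// In-memory stand-in; a real source might wrap a relational or column store.
class Array_Source implements Source
{
    private array $records;

    public function __construct(array $records)
    {
        $this->records = $records;
    }

    public function fetch(string $id): array
    {
        return $this->records[$id] ?? [];
    }
}

class Report_Service
{
    private Source $metadata;
    private Source $rawData;

    public function __construct(Source $metadata, Source $rawData)
    {
        $this->metadata = $metadata;
        $this->rawData = $rawData;
    }

    /** The caller asks for a report; which store holds which field is hidden here. */
    public function getReport(string $id): array
    {
        return array_merge(
            ['id' => $id],
            $this->metadata->fetch($id),
            $this->rawData->fetch($id)
        );
    }
}

$service = new Report_Service(
    new Array_Source(['r1' => ['signature' => 'example_signature']]),
    new Array_Source(['r1' => ['dump' => 'example raw dump']])
);

$report = $service->getReport('r1');
echo json_encode($report), "\n";
```

The calling code makes a single getReport() call and never learns that two sources were involved.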
If we use JSON as a standard data format throughout our app, we can construct generic objects easily without worrying about what methods are automatically available to us.
Rather than relying upon model-constructed or ORM-built objects, we should create our own when and if the need arises.
It’s okay to process the results from a database query into some standard format or create an object using the data. But once the data has been retrieved, it should be pushed into a standard format that can be used in the app without caring about what the data source was.
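A rough sketch of pushing results into a standard format, assuming PHP 7.4 or later and that JSON is the agreed format. The helper toStandardFormat() and the Legacy_Row class are hypothetical names made up for this example.

```php
<?php
/**
 * Hypothetical helper: round-trip a result through JSON so the rest of the
 * app only sees generic stdClass objects and plain arrays.
 * Keyed records come back as stdClass; lists come back as arrays.
 */
function toStandardFormat($result)
{
    // JSON_THROW_ON_ERROR turns encoding and decoding failures into exceptions.
    $json = json_encode($result, JSON_THROW_ON_ERROR);
    return json_decode($json, false, 512, JSON_THROW_ON_ERROR);
}

// A row as a database driver might return it (associative array).
$fromDatabase = toStandardFormat(['id' => 7, 'tags' => ['a', 'b']]);

// An object as some ORM-like layer might return it (public properties only).
class Legacy_Row
{
    public $id = 7;
    public $tags = ['a', 'b'];
}
$fromObject = toStandardFormat(new Legacy_Row());

// Both results are now generic objects with the same shape.
var_dump($fromDatabase == $fromObject);
```

After the conversion, nothing downstream can call a source-specific method by accident, because the generic objects have none.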
Developers are drawn to things that are new, cool, or otherwise unique and special. But it’s important to use the correct storage medium for development.
Socorro uses Elasticsearch (not a NoSQL database) and HBase. We should have used Cassandra, but we have HBase instead.
External APIs and the file system are all valid data storage mechanisms. Just because we write database-driven applications doesn’t mean our data storage has to be entirely a database. A REST API to an external resource is a valid data storage mechanism that isn’t database-driven (at least as far as your app is concerned).
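To illustrate the file system as just another storage mechanism, here is a small sketch assuming PHP 7.4 or later. Adapter, File_Adapter, Memory_Adapter, and countRecords() are hypothetical names, and the file is a temporary one created only for the demonstration.

```php
<?php
// Hypothetical sketch: a file is read through the same interface as any other store.

interface Adapter
{
    /** Return a list of records as associative arrays. */
    public function queryData(): array;
}

class File_Adapter implements Adapter
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function queryData(): array
    {
        $contents = file_get_contents($this->path);
        if ($contents === false) {
            throw new RuntimeException("Cannot read {$this->path}");
        }
        return json_decode($contents, true, 512, JSON_THROW_ON_ERROR);
    }
}

class Memory_Adapter implements Adapter
{
    private array $records;

    public function __construct(array $records)
    {
        $this->records = $records;
    }

    public function queryData(): array
    {
        return $this->records;
    }
}

// Application code depends only on the interface, not on where the data lives.
function countRecords(Adapter $adapter): int
{
    return count($adapter->queryData());
}

$path = tempnam(sys_get_temp_dir(), 'data');
file_put_contents($path, json_encode([['id' => 1], ['id' => 2]]));

$fileCount = countRecords(new File_Adapter($path));
$memoryCount = countRecords(new Memory_Adapter([['id' => 1], ['id' => 2]]));
unlink($path);

echo $fileCount, ' ', $memoryCount, "\n";
```

An adapter wrapping a REST API could presumably take the same shape, fetching the response body over HTTP instead of reading a file.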