A short history of how we got stuck with the notion that web applications require an ORM on top of an RDBMS, and an examination of the pros and cons of such a tight coupling.
Perhaps ORM isn't as natural a fit for your application as a key-value store?
Hi, my name is Ronen Botzer, and if my shirt didn't give it away already, I'm an engineer at Aerospike. I help to get the clients for dynamic languages to a fully featured state. I wanted to thank Ed and Diana from SouthBay PHP and Claire from Scale Warriors of Silicon Valley for organizing this meetup.
I've worked in a stream of startups since 1999. First as an engineer in several web teams. Next as a data engineer at an ad-network, a social network, and a mobile SDK vendor (Appcelerator), where we were handling larger volumes of data each time. Most recently I was the architect for a mobile parking startup called QuickPay.
I have a hard time staying serious. You probably noticed that the abstract for this talk invoked a pseudo-relationship metaphor. Objects and relationships, get it? Yeah, that gets tired as a joke fairly quickly.
I am guilty of doing pretty much everything I'm going to mention in this talk.
We can expand on it over beers later, but here goes.
You fill in girlfriend/boyfriend as you see fit
Ok, fine, there are also names like Hibernate for the subdued "let's watch a period piece movie on the couch" metaphor. But in general, your ORM slows down your data access with its code abstractions. It produces suboptimal performance.
Can someone decipher the acronym?
How did we get stuck with applications "needing" an ORM layer above an RDBMS?
An interesting guy who earned degrees in math and chemistry, served as an RAF pilot in WW2, and then worked for IBM as an actual "programmer". He gets his PhD at UMich, and moves to IBM's San Jose Research Center.
Later renamed SQL due to trademark issues.
A bad decision in hindsight, as during the early 80s Oracle's SQL proves more popular, and is a reason for Ingres slipping in share of deployments.
A post-Ingres database that includes his insights about system architecture.
Who knows what's so special about Mosaic? Yes! A GUI!
Can someone please translate WORA? Write Once Run Anywhere.
Some software architects quit their day jobs and settle on the profitable business of spreading the Gospel of Patterns. Writing voluminous books, they continue to unearth new ways for us all to get more enterprise-y.
It was like Invasion of The Body Snatchers. Everybody you knew in the late 90s had at least one book about design patterns. If you acted like you found those cumbersome or a waste of time, they'd point and go <hhhhhhh>
Seriously, if a language is lucky and had one gorilla show up early (like ActiveRecord along with Ruby on Rails), it will have ONLY 4-5 ORMs. Java, PHP? You're in the dozens of known ORMs. Likely one ORM per startup that never got published. There are well over 100 published ones.
Laravel has Eloquent ORM, and I've just seen a Twitter shit storm of people critiquing it and threatening Taylor Otwell with statements like "You haven't seen what I've been working on the last four years… Be scared, be very scared". Well, I personally am scared.
Which is one tell-tale sign that such a database is not designed to scale.
An ORM user who is 'non-SQL' is a unicorn. If you see one, grab him by the bald head, rub it, and demand three wishes. :: A leaky abstraction refers to the way implementation details become visible through the abstraction when the operation is more complex (Joel Spolsky).
Schemas are rigid, so developers need an understanding of SQL, DML, and DDL. You can't change a schema as easily as an object: you need migrations. Therefore you learn SQL anyway.
For example, "the best" ORMs support a fluent style, which is basically pseudo-SQL.
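To make the "pseudo-SQL" point concrete, here is a minimal sketch of a fluent query builder in Python. The `Query` class and its methods are hypothetical and not taken from any real ORM. What it shows is that every chained call maps one-to-one onto a SQL clause, so you are still thinking in SQL.

```python
class Query:
    """Hypothetical fluent builder; not the API of any real ORM."""

    def __init__(self, table):
        self._table = table
        self._columns = ["*"]
        self._conditions = []
        self._params = []
        self._order = None
        self._limit = None

    def select(self, *columns):
        self._columns = list(columns)
        return self  # returning self is what makes the style "fluent"

    def where(self, condition, *params):
        self._conditions.append(condition)
        self._params.extend(params)
        return self

    def order_by(self, column):
        self._order = column
        return self

    def limit(self, n):
        self._limit = int(n)
        return self

    def to_sql(self):
        # Each builder call becomes exactly one SQL clause.
        sql = "SELECT " + ", ".join(self._columns) + " FROM " + self._table
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        if self._order:
            sql += " ORDER BY " + self._order
        if self._limit is not None:
            sql += " LIMIT " + str(self._limit)
        return sql, tuple(self._params)


sql, params = (
    Query("users")
    .select("id", "name")
    .where("age > ?", 21)
    .order_by("name")
    .limit(10)
    .to_sql()
)
print(sql)     # SELECT id, name FROM users WHERE age > ? ORDER BY name LIMIT 10
print(params)  # (21,)
```

Read the chain aloud and you have recited the SQL statement, just with more parentheses.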
Something that has to run against some unknown data store. The web applications with the highest loads (Facebook, Twitter, MediaWiki) are not portable.
Shocking fact: RDBMSs do not implement SQL the same way.
Sometimes really good ones like hints for the query optimizer.
When an ORM hides direct access to a database feature, then re-implements it on the application side. Auto-increment is one example.
Your app is not just dependent on your database, but also on the DBAL. Now you can have data access and persistence bugs in two places, and you have to worry whether two technologies you depend on will become obsolete.
Show of hands - does anybody think that switching a web application from one database to the next is trivial if you use a DBAL?
ORMs are overhead, see the previous SELECT wrapper example.
Allegedly you don't need a specialist such as a DBA if you use an ORM. But if you want to figure out why your ORM is generating slow queries you'll need to get one, or become an expert. At which point you need to hope your ORM makes it easy to switch to a raw query.
Queries fetch individual related rows in a loop when we have a many:one relationship, instead of properly using a join.
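A small sketch of that loop-versus-join behavior, using Python's built-in `sqlite3` module. The `authors` and `posts` tables and both function names are made up for illustration. The first function mimics what a lazily loading ORM does behind your back; the second is the join you would have written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")


def titles_with_authors_lazy(conn):
    """One query for the posts, then one more query per post."""
    queries = 1
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for author_id, title in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_authors_join(conn):
    """The same data in a single round trip."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


lazy_rows, lazy_queries = titles_with_authors_lazy(conn)
join_rows, join_queries = titles_with_authors_join(conn)
print(lazy_queries, join_queries)  # 4 1
```

With three posts the lazy version issues four queries. With a thousand posts it issues a thousand and one.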
The philosophy and behavior of objects and RDBMSs don't match up well. This is a big topic.
Some ORMs like ActiveRecord do address class hierarchies with a "type" column.
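Here is a rough sketch of the "type" column idea in Python with `sqlite3`. It is not ActiveRecord's implementation. The `Vehicle` hierarchy, the `vehicles` table, and the `save`/`load_all` helpers are invented for illustration. The whole class hierarchy shares one table, and the stored class name decides which class gets instantiated on load.

```python
import sqlite3


class Vehicle:
    def __init__(self, name):
        self.name = name


class Car(Vehicle):
    pass


class Truck(Vehicle):
    pass


# Lookup from the stored "type" string back to a Python class.
CLASSES = {cls.__name__: cls for cls in (Vehicle, Car, Truck)}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicles (id INTEGER PRIMARY KEY, type TEXT, name TEXT)"
)


def save(conn, obj):
    # The class name goes into the "type" column next to the data.
    conn.execute(
        "INSERT INTO vehicles (type, name) VALUES (?, ?)",
        (type(obj).__name__, obj.name),
    )


def load_all(conn):
    rows = conn.execute("SELECT type, name FROM vehicles ORDER BY id")
    return [CLASSES[type_name](name) for type_name, name in rows]


save(conn, Car("hatchback"))
save(conn, Truck("pickup"))
loaded = load_all(conn)
print([type(v).__name__ for v in loaded])  # ['Car', 'Truck']
```

It works, but every subclass-specific attribute ends up as a nullable column in the shared table, which is where the mismatch starts to show.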
Data structures on the application side are not - lists, maps, sets, etc. Complex data types are poorly expressed in RDBMSs.
If you don't know what first, second, third, Boyce-Codd normal form, etc. are, I won't get into it in detail now.
Unless it wants to melt down early and often. Fail whales abound.
WikiMedia example - direct access to MySQL and PostgreSQL
If you've been viewing SQL as something ugly you need to abstract away from, hello! Use NoSQL. Your persistence operations become much closer to how your application-side objects behave.
Stop making up for the inadequacies of your database by manually clustering. Remove sharding logic from your application. Again, faster and leaner application on top of a faster, leaner stack.
* Doesn't hurt to have founders who have been at a big internet company and seen firsthand how massive loads bring traditional databases to their knees.
* What design works:
With new hardware, new designs can perform better. Always develop with the current and upcoming hardware in mind.
Optimize for those hardware advances: utilize all cores, make optimal use of memory, use the best disk access patterns (low-IOPS bulk reads, for example).
VoltDB wrote an in-memory column-oriented database with a SQL dialect, but thinks about rotational disks for storage. Redis runs fully in-memory with weak durability (1 second fsync at best for the AOF).
Aerospike utilizes flash as storage and additional memory in a hybrid system with DRAM, because it is the most economical way to scale, while keeping speed advantage and predictable low latency as a feature.
Aerospike is a real distributed database. Clustering was built in from the very beginning and is core to the operation and performance of the database. It is not an after-the-fact bolted-on feature.
masterless with replication
Smart client connects and learns about the cluster topology. It only needs a single IP address.
Records are identified by a RIPEMD-160 digest of the PK, so the key stored in the index is always 20 bytes.
The client knows the partition map and will seek to write to the master and replica partitions synchronously.
The client knows which partition to read a record from, and knows where the replica is for failover.
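The routing idea in the last few notes can be sketched in a few lines of Python. This is a toy model, not the Aerospike client's code. Assumptions that are mine and not from the slides: 4096 partitions, the partition id taken from the first two bytes of the digest, the three node names, and a digest computed over just the set name plus the key. Some OpenSSL builds disable RIPEMD-160, so the sketch falls back to SHA-1 purely as a stand-in that also yields 20 bytes.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count for this sketch
NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

# Hypothetical partition map: partition id -> (master node, replica node).
# A real client learns this from the cluster instead of computing it.
PARTITION_MAP = {
    pid: (NODES[pid % 3], NODES[(pid + 1) % 3]) for pid in range(N_PARTITIONS)
}


def record_digest(set_name, user_key):
    """Return a 20-byte digest for a record (simplified input)."""
    data = set_name.encode("utf-8") + user_key.encode("utf-8")
    try:
        h = hashlib.new("ripemd160")
    except ValueError:
        # Stand-in only: SHA-1 also produces 20 bytes.
        h = hashlib.new("sha1")
    h.update(data)
    return h.digest()


def partition_id(digest):
    # Assumption: a fixed slice of the digest selects the partition.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def route(set_name, user_key):
    """Look up (master, replica) for a key with no server round trip."""
    return PARTITION_MAP[partition_id(record_digest(set_name, user_key))]


def node_for_read(set_name, user_key, down=frozenset()):
    """Read from the master, or fail over to the replica if it is down."""
    master, replica = route(set_name, user_key)
    return replica if master in down else master


master, replica = route("users", "ronen")
print(node_for_read("users", "ronen"))                 # the master node
print(node_for_read("users", "ronen", down={master}))  # the replica node
```

The point is that the application never decides where a key lives. The hash and the map do, which is exactly the sharding logic you no longer have to write.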
Stop writing sharding logic
Scaling in DRAM only is simply not economical.
SSDs are formatted, no filesystem, memory mapped I/O.
Raw device, direct access pattern.
Indexes are kept in DRAM to save on an extra IOP. Enterprise feature: fast restart (shared-memory)
Records ('rows') in sets ('tables') are kept contiguous for efficient bulk reads.
Each time a write happens the record is written to a new block, with the previous one marked for GC. This ensures even wear on the flash drive.
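A toy model of that write pattern, in Python. The `LogStructuredStore` class is invented for illustration and is not Aerospike's on-disk format. A list stands in for device blocks, a dict stands in for the in-DRAM index, and an update never overwrites in place.

```python
class LogStructuredStore:
    """Toy copy-on-write store; not Aerospike's actual storage layout."""

    def __init__(self):
        self.blocks = []      # append-only list standing in for device blocks
        self.index = {}       # key -> position of the live block (the "DRAM" index)
        self.garbage = set()  # positions of superseded blocks awaiting GC

    def put(self, key, value):
        old = self.index.get(key)
        self.blocks.append((key, value))        # always write to a new block
        self.index[key] = len(self.blocks) - 1  # repoint the index
        if old is not None:
            self.garbage.add(old)               # previous version marked for GC

    def get(self, key):
        # One index lookup in memory, then one block read.
        return self.blocks[self.index[key]][1]

    def collect(self):
        """Reclaim superseded blocks and return how many were freed."""
        reclaimed = len(self.garbage)
        for pos in self.garbage:
            self.blocks[pos] = None
        self.garbage.clear()
        return reclaimed


store = LogStructuredStore()
store.put("user:1", {"name": "Ann"})
store.put("user:1", {"name": "Ann", "city": "San Jose"})
print(store.get("user:1"))  # {'name': 'Ann', 'city': 'San Jose'}
print(store.collect())      # 1
```

Because updates land on fresh blocks instead of hammering the same location, writes spread across the device, and the in-memory index keeps reads to a single device access.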