8. ETL/Cache Loading Data Takes Too Long
[Diagram: data load process. Node 1: DSG DB Server (DSG Database); WSP ETL Server (Extract, Transform, Database Loading); DB (Scratch) Server(s) (CI Database); Backup & Restore; CI Table Loading Process; Cache Loading Process; MemcacheD Cluster (MemcacheD Server(s)). Node 2: DB Server (CI Database); Backup & Restore; MemcacheDB Cluster (MemcacheDB).]
This is the new Data Load Process. It makes it look easy…
…The reality is that it is quite complex. This is just one of our workflows. The orange/tan-ish boxes are Java map/reduce processes. The pink boxes are Pig processes. The white boxes are BCP processes. The green boxes are MongoDB collections.
Here is our sharding scheme. We actually have 6 more servers than are shown because we decided to have multiple replicas at each remote site.