Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system built on top of a large knowledge graph. AWS services such as Amazon EC2, Amazon SQS, and Amazon ElastiCache allowed the team to scale from a very small dataset to a large Cassandra cluster. By improving the big data processing algorithms in this in-house solution on AWS, they raised their revenue conversion rates by more than 25 percent compared with the market solutions they had used before.
2. About Magazine Luiza
Magazine Luiza is one of the largest household appliance retail chains in Brazil, focused on providing durable goods to Brazil's middle and lower-middle income classes.
• 731 stores
• 8 distribution centers
• more than 23,000 workers
• 22.8 million customers
• multi-channel strategy
Friday, November 15, 13
13. Graph Stack
Distributed Graph Database
Distributed database management system
• Used for OLTP queries
• Native integration with TinkerPop
• Continuously available with no single point of failure
• Elastic scalability
• Caching layer
• Built-in replication
14. Storing users' data
Diagram: Elastic Load Balancing distributes traffic to an Auto Scaling group of EC2 API instances, backed by a Cassandra cluster of six m2.xlarge instances.
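The slides do not say how the six-node Cassandra cluster spreads data across its nodes. As a rough mental model, the sketch below mimics Cassandra's ring partitioning with SimpleStrategy-style replica placement: each node owns a token, a row key is hashed onto the ring, and the key is stored on the first node whose token is at or after that hash, plus the next nodes clockwise. The replication factor of 3, the node names, and the use of MD5 (similar to the older RandomPartitioner; newer Cassandra versions default to Murmur3) are assumptions for illustration, not details from the talk.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space used by Cassandra's MD5-based RandomPartitioner


def token_for(key: str) -> int:
    """Hash a row key onto the ring (approximates RandomPartitioner with MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_tokens(nodes):
    """Give each node an evenly spaced initial token, sorted by token."""
    step = RING_SIZE // len(nodes)
    return sorted((i * step, node) for i, node in enumerate(nodes))


def replicas_for(key, ring, replication_factor=3):
    """Return the nodes holding `key`: its owner plus the next nodes clockwise."""
    tokens = [token for token, _ in ring]
    # Owner = first node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical names for the six m2.xlarge nodes shown on the slide.
nodes = [f"cassandra-{i}" for i in range(1, 7)]
ring = evenly_spaced_tokens(nodes)
print(replicas_for("customer:12345", ring))
```

Because replicas sit on consecutive ring positions, adding a node only moves the keys in the token range it takes over, which is what lets such a cluster grow without a full reshuffle.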
42. Gremlin Graph Language
• Groovy DSL for graph traversals
• Easy to learn
• Great community
• Part of the TinkerPop stack
• Works with any Blueprints-enabled graph database
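Part of Gremlin's appeal for recommendations is that "customers who viewed this product also viewed" is a short chain of steps: start at a product, walk `in` over viewed edges to customers, walk `out` to the other products they viewed, and count. The Python sketch below reproduces that traversal over a tiny in-memory adjacency map so the logic is visible without a graph database. The edge label `viewed`, the sample data, and the function name are hypothetical; this is not the team's actual Gremlin code.

```python
from collections import Counter, defaultdict

# Hypothetical "viewed" edges: (customer, product).
VIEWED = [
    ("ana", "tv"), ("ana", "soundbar"), ("ana", "hdmi-cable"),
    ("bia", "tv"), ("bia", "soundbar"),
    ("caio", "tv"), ("caio", "fridge"),
    ("davi", "fridge"), ("davi", "stove"),
]

# Index edges in both directions, like a graph database's adjacency lists.
out_edges = defaultdict(set)  # customer -> products viewed
in_edges = defaultdict(set)   # product -> customers who viewed it
for customer, product in VIEWED:
    out_edges[customer].add(product)
    in_edges[product].add(customer)


def also_viewed(product, limit=3):
    """Rank products co-viewed with `product` (a who-viewed-also-viewed walk)."""
    counts = Counter()
    for customer in in_edges[product]:       # product <-viewed- customer
        for other in out_edges[customer]:    # customer -viewed-> other
            if other != product:             # skip the starting product
                counts[other] += 1
    # Highest co-view count first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(also_viewed("tv"))  # [('soundbar', 2), ('fridge', 1), ('hdmi-cable', 1)]
```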
56. Processing data with Spot Instances
Diagram: Bob dispatches a task containing the product ID to Amazon Simple Queue Service (Amazon SQS). A fleet of m1.large EC2 Spot Instances consumes the SQS tasks and processes W*A* recommendations, syncing logs to Amazon Simple Storage Service (Amazon S3).
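Spot Instances can be reclaimed at any time, so a worker should delete an SQS message only after its task succeeds; a message that is received but never deleted becomes visible again after its visibility timeout and can be picked up by another instance. The sketch below shows that consume-then-delete loop. The method names `receive_message` and `delete_message` and their keyword arguments follow the boto3 SQS client, but the JSON message body with a `product_id` field, the queue URL, the `process_wa_recommendations` placeholder, and the in-memory `FakeSQS` used to run it locally are hypothetical stand-ins.

```python
import json


class FakeSQS:
    """In-memory stand-in for a boto3 SQS client (only the calls used below)."""

    def __init__(self, bodies):
        self.pending = [
            {"MessageId": str(i), "ReceiptHandle": f"rh-{i}", "Body": body}
            for i, body in enumerate(bodies)
        ]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages=1, WaitTimeSeconds=0):
        batch = self.pending[:MaxNumberOfMessages]
        self.pending = self.pending[MaxNumberOfMessages:]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)


def process_wa_recommendations(product_id):
    """Hypothetical placeholder for computing W*A* recommendations for a product."""
    if product_id < 0:
        raise ValueError("bad product id")
    return f"recommendations for {product_id}"


def drain(sqs, queue_url):
    """Consume tasks until a receive returns nothing; delete only successful ones."""
    results = []
    while True:
        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        messages = response.get("Messages", [])
        if not messages:
            return results
        for message in messages:
            task = json.loads(message["Body"])
            try:
                results.append(process_wa_recommendations(task["product_id"]))
            except ValueError:
                # Left undeleted: real SQS redelivers it after the visibility timeout.
                continue
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])


sqs = FakeSQS([json.dumps({"product_id": 42}), json.dumps({"product_id": -1})])
print(drain(sqs, "https://sqs.example.invalid/bob-tasks"))  # hypothetical queue URL
print(sqs.deleted)
```

Against real SQS, a client from `boto3.client("sqs")` could be passed to `drain` in place of `FakeSQS`.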
61. Personalized e-mails
Users receive e-mails when:
• A product's price drops
• They abandon a product in their cart
• They visit many similar products
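The three triggers can be read as simple rules evaluated over a customer's recent activity. The sketch below is one hypothetical way to express them; the `Activity` fields, the thresholds (a "drop" is any current price below the price the customer saw, and "many similar products" means at least three views in one category), and the reason strings are assumptions, since the talk does not specify them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Hypothetical snapshot of one customer's recent behavior."""
    seen_prices: dict = field(default_factory=dict)     # product -> price when viewed
    current_prices: dict = field(default_factory=dict)  # product -> price now
    cart: set = field(default_factory=set)              # products placed in the cart
    purchased: set = field(default_factory=set)         # products actually bought
    viewed_categories: list = field(default_factory=list)


def email_triggers(activity, similar_views_threshold=3):
    """Return the reasons, if any, to send this customer a personalized e-mail."""
    reasons = []
    for product, seen in activity.seen_prices.items():
        now = activity.current_prices.get(product, seen)
        if now < seen:
            reasons.append(f"price_drop:{product}")
    # A cart item that was never purchased counts as abandoned.
    for product in sorted(activity.cart - activity.purchased):
        reasons.append(f"abandoned_cart:{product}")
    for category, views in Counter(activity.viewed_categories).items():
        if views >= similar_views_threshold:
            reasons.append(f"similar_products:{category}")
    return reasons


activity = Activity(
    seen_prices={"tv": 2000.0, "fridge": 1500.0},
    current_prices={"tv": 1800.0, "fridge": 1500.0},
    cart={"soundbar"},
    viewed_categories=["tvs", "tvs", "tvs", "stoves"],
)
print(email_triggers(activity))
# ['price_drop:tv', 'abandoned_cart:soundbar', 'similar_products:tvs']
```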
66. Personalized e-mails
Diagram (Bobby Mailer): The Bob API notifies the Mailer Manager (m1.large) of a user interaction. The Mailer Manager dispatches a task containing the customer ID to Amazon SQS. A fleet of m1.large EC2 Spot Instances consumes the SQS tasks, finds the best recommendation for that user, and sends the e-mail through Amazon Simple Email Service (Amazon SES). Logs are synced to Amazon Simple Storage Service (Amazon S3).
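The "find the best recommendation for that user" step is not described further in the talk. The sketch below shows one hypothetical approach: sum precomputed co-view scores across the products a customer recently viewed, skip products to exclude (for example, already viewed or bought), and pick the top item. It then builds the keyword arguments in the shape the boto3 SES `send_email` call accepts. The score table, addresses, and e-mail wording are made up for illustration.

```python
from collections import Counter


def best_recommendation(seed_products, also_viewed_scores, exclude):
    """Sum co-view scores across the user's seed products and pick the top item."""
    totals = Counter()
    for seed in seed_products:
        for product, score in also_viewed_scores.get(seed, {}).items():
            if product not in exclude:
                totals[product] += score
    if not totals:
        return None
    # Highest total score wins; ties are broken alphabetically.
    return min(totals, key=lambda p: (-totals[p], p))


def ses_send_email_kwargs(sender, recipient, product):
    """Build keyword arguments in the shape boto3's SES `send_email` expects."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": f"We picked something for you: {product}"},
            "Body": {"Text": {"Data": f"Based on your recent visits, take a look at {product}."}},
        },
    }


# Hypothetical precomputed scores: seed product -> {co-viewed product: score}.
scores = {"tv": {"soundbar": 2, "hdmi-cable": 1}, "fridge": {"stove": 1, "soundbar": 1}}
pick = best_recommendation(["tv", "fridge"], scores, exclude={"tv", "fridge"})
kwargs = ses_send_email_kwargs("offers@example.com", "customer@example.com", pick)
print(pick)  # soundbar
# With real credentials: boto3.client("ses").send_email(**kwargs)
```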
72. Analytics with Faunus
Graph Analytics Engine
• Provides graph input/output formats and a traversal language for graphs
Amazon EMR
Distributed computing
• Distributed processing of large data sets across clusters
• Designed to scale
• Detects and handles failures at the application layer
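Faunus runs graph computations as Hadoop MapReduce jobs, which is why it pairs naturally with Amazon EMR. The sketch below simulates the map, shuffle, and reduce phases of one such job in plain Python, computing each vertex's degree from an edge list. It illustrates the MapReduce pattern only; it does not use Faunus's API, and the edge list is made up.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical edges (customer, product).
EDGES = [("ana", "tv"), ("bia", "tv"), ("bia", "soundbar"), ("caio", "fridge")]


def map_phase(edge):
    """Emit (vertex, 1) for both endpoints of an edge."""
    source, target = edge
    return [(source, 1), (target, 1)]


def shuffle(pairs):
    """Group intermediate values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the emitted ones to get the vertex degree."""
    return key, sum(values)


def degree_job(edges):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(edge) for edge in edges)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(degree_job(EDGES))
```

On a real cluster, each phase runs in parallel across many machines, and the framework re-runs failed map or reduce tasks.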
87. Metrics
• 4.3 million identified Magazine Luiza customers
• 50,000 “product” nodes
• 90 million total nodes
• 350 million total edges
• 700 GB of data
• Peaks of 20,000 reads/sec on the Cassandra cluster
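A quick back-of-the-envelope reading of these numbers helps size the graph. The calculation below assumes decimal units (1 GB = 10^9 bytes), that the 700 GB covers both nodes and edges, and that each identified customer is a node; the derived ratios are estimates, not figures reported in the talk.

```python
# Figures from the Metrics slide.
customers = 4.3e6
product_nodes = 50_000
total_nodes = 90e6
total_edges = 350e6
data_bytes = 700e9  # assumes 1 GB = 10**9 bytes

edges_per_node = total_edges / total_nodes
bytes_per_element = data_bytes / (total_nodes + total_edges)  # nodes plus edges
customer_product_share = (customers + product_nodes) / total_nodes

print(f"edges per node:    {edges_per_node:.2f}")            # 3.89
print(f"bytes per element: {bytes_per_element:,.0f}")        # 1,591
print(f"customer+product share of nodes: {customer_product_share:.1%}")  # 4.8%
```

Under these assumptions, customers and products make up under 5% of the nodes, so most of the graph consists of other node types.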
93. Results matter…
Chart of results from January to September 2013, annotated with these phases: Solution A alone; First Bob tests; Bob out for 2 weeks; Bob alone.
99. Next steps
• Use Faunus to pre-process all W*A* recommendations
• Algorithms to identify communities in the graph
• Cassandra replication between regions
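The slides do not name a community-detection algorithm. Label propagation is one common, easily parallelized choice: every vertex starts in its own community and repeatedly adopts the most frequent label among its neighbors until no label changes. The sketch below runs an asynchronous version (labels are updated in place during each sweep) on a small hypothetical graph of two tightly connected groups joined by one edge. The vertex names, the deterministic tie-breaking (smallest label wins), and the iteration cap are assumptions made to keep the output reproducible; with other visiting orders or random tie-breaking, results can differ.

```python
from collections import Counter, defaultdict

# Hypothetical undirected graph: two 4-cliques joined by the edge d-s.
EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("p", "q"), ("p", "r"), ("p", "s"), ("q", "r"), ("q", "s"), ("r", "s"),
    ("d", "s"),
]


def label_propagation(edges, max_iterations=20):
    """Asynchronous label propagation with deterministic tie-breaking."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    labels = {vertex: vertex for vertex in neighbors}  # each vertex starts alone
    for _ in range(max_iterations):
        changed = False
        for vertex in sorted(neighbors):  # fixed visiting order
            counts = Counter(labels[n] for n in neighbors[vertex])
            best = max(counts.values())
            # Adopt the most frequent neighbor label; ties go to the smallest label.
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[vertex]:
                labels[vertex] = new_label
                changed = True
        if not changed:
            break
    return labels


print(label_propagation(EDGES))
```

Each sweep only needs a vertex's neighbors, which is why this family of algorithms maps well onto batch frameworks such as Faunus on Hadoop.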
100. Please give us your feedback on this presentation (BDT303)
As a thank you, we will select prize winners daily for completed surveys!
Thank You