5. Which Cloud?
• Trove depends only on the APIs of other OpenStack services
• Overcloud (bare-metal)?
• In-cloud (VMs)?
8/19/14 tesora.com 5
6. HA Trove
• HA Overcloud
• Availability Zones
• HA Trove Control Plane
• Control Plane across availability zones
• Galera Cluster
• RabbitMQ Cluster
• Multiple Trove API, Task Manager, and Conductor instances
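Why spread the control plane across availability zones at all? The sketch below is one way to check this, under assumptions: the AZ names (`az1`…`az3`) and the helper functions `place` and `survives_az_loss` are hypothetical. It places three replicas of each service round-robin across the zones. It then tests whether the Galera nodes still form a strict majority after a whole zone is lost. The rule it encodes is Galera's quorum behaviour with equal node weights: a partition stays primary only if it holds more than half of the previous membership.

```python
# Hypothetical layout: three availability zones and the HA control-plane
# services from the slide, each deployed as three replicas.
AZS = ["az1", "az2", "az3"]
SERVICES = ["galera", "rabbitmq", "trove-api", "trove-taskmanager", "trove-conductor"]


def place(services, azs, replicas=3):
    """Spread each service's replicas round-robin across availability zones."""
    return {svc: [azs[i % len(azs)] for i in range(replicas)] for svc in services}


def survives_az_loss(node_azs, lost_az):
    """True if the nodes outside lost_az still form a strict majority.

    Galera keeps a primary component only when the surviving nodes hold more
    than half of the previous membership (equal node weights assumed here).
    """
    survivors = sum(1 for az in node_azs if az != lost_az)
    return survivors * 2 > len(node_azs)


layout = place(SERVICES, AZS)
for az in AZS:
    print(az, survives_az_loss(layout["galera"], az))  # prints True for each AZ
```

Now try only two zones. `place(SERVICES, ["az1", "az2"])` puts two of the three Galera nodes in `az1`. Losing that zone leaves one node out of three, which is not a majority. This is why three zones (or an arbitrator in a third location) is the usual minimum.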
8. How did we get here?
• SaltStack
• Salt-based Trove deployment
• https://github.com/saurabhsurana/trove-installer/tree/master/saltstack
• Salt-based OpenStack deployment
• https://github.com/EntropyWorks/salt-openstack
9. Configuration Management
• Helps define/control
• Packages and dependencies to be installed
• Configuration files to be copied
• Users / groups
• Provides a reproducible infrastructure state
• Runs highstate on Trove-managed VMs at first boot
10. Remote Execution
• No SSH required
• Can control the infrastructure from a single machine
• Can define user- and resource-level access
• Especially useful for Trove, to help manage DB instances
12. trove.conf
[DEFAULT]
# Number of API worker processes to run
trove_api_workers = {{ pillar['trove_worker_threads'] }}
# AMQP connection info
rabbit_password = {{ pillar['trove_rabbit_password'] }}
rabbit_hosts = {{ pillar['trove_rabbit_hosts'] }}
rabbit_userid = {{ pillar['trove_rabbit_userid'] }}
# Trove infrastructure database connection
sql_connection = {{ pillar['trove_mysql_connection'] }}
{% if not salt['pillar.get']('devstack_setup', False) %}
# Update service and instance task statuses if the instance fails to become active
update_status_on_fail = True
# How often to poll while waiting for the guest agent to become active (in sec)
usage_sleep_time = 30
# How long to wait for the guest agent to become active (in sec) (default is 300)
usage_timeout = {{ salt['pillar.get']('trove_guestagent_active_timeout', 600) }}
{% endif %}
# Path to the extensions
api_extensions_path = {{ pillar['trove_path'] }}/extensions/routes
13. Trove @ HP Helion
• Image-based Deploys
• TripleO
• Trove Heat Templates
• Trove Image Elements
• Salt Cloud / Nova wrapper -> Salt master -> Trove
• Seed -> Undercloud -> Overcloud -> Heat -> Trove
14. Operations - SaltStack
• Most DBaaS operations are based on SaltStack
• HA deployment of Salt masters
• Control access to the infrastructure with SaltStack
• Control access to customer instances
• To help debug issues
• But protect the data and access to the MySQL database
• Each Trove guest instance becomes a Salt minion
15. Trove Upgrades
• Trove Datastore must be usable during all upgrades
• Upgrades usually involve downtime
• RPC Versioning
• Upgrade Sequence that we follow:
• Upgrade all the guest agents first (trove-guestagent service)
• Upgrade Task Manager and Conductor
• Upgrade API servers
• If a new RPC method is introduced, it must be available on the guest before any API operation that uses it is performed
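Why must guest agents be upgraded before the API servers? Because callers should only send RPCs that the guest already understands. The sketch below shows one compatibility check. It follows the same idea as oslo.messaging-style versioned RPC: same major version, and the server's minor version is at least the one the call needs. Some names here are hypothetical placeholders, not actual Trove RPC methods: the method table, the `new_call`/`existing_call` names, and `check_call`.

```python
# Hypothetical table: minimum guest-agent RPC API version each call needs.
# "new_call" stands in for an RPC method added in the release being rolled out.
METHOD_MIN_VERSION = {"existing_call": "1.0", "new_call": "1.1"}

# Upgrade order from the slide: guest agents first, API servers last.
UPGRADE_ORDER = ["guest agents", "task manager and conductor", "API servers"]


def parse_version(version):
    """Split a "major.minor" string into a pair of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(required, served):
    """Same major version, and the guest serves at least the required minor."""
    req_major, req_minor = parse_version(required)
    have_major, have_minor = parse_version(served)
    return req_major == have_major and have_minor >= req_minor


def check_call(method, guest_version):
    """Refuse to send an RPC the guest agent cannot handle yet."""
    required = METHOD_MIN_VERSION[method]
    if not is_compatible(required, guest_version):
        raise RuntimeError(
            f"{method} needs guest RPC API >= {required}, guest serves {guest_version}"
        )
    return True
```

Consider a guest still on version 1.0. It keeps accepting `existing_call`, but `new_call` is rejected until that guest agent has been upgraded. Hence the order on the slide.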
16. Security of key Trove components
• Use SSL
• Trove API
• RabbitMQ
• Security Group
• Database
• Only Control Plane components need access
• RabbitMQ
• The Control Plane and all guest agents need access, but restrict the source IP range wherever possible
• Use separate DB and RabbitMQ credentials for each service
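How can these rules be checked mechanically? The sketch below covers both halves of the slide.

- **Port rules:** the database port is reachable only from the control-plane range. RabbitMQ over TLS is reachable from the control plane plus the guest-agent range.
- **Credentials:** it flags any username shared between services.

The CIDRs and service names are hypothetical. Ports 3306 (MySQL) and 5671 (AMQP over TLS) are the standard defaults.

```python
import ipaddress
from collections import Counter

# Hypothetical network plan; substitute your own CIDRs.
CONTROL_PLANE = ipaddress.ip_network("10.0.0.0/24")
GUEST_AGENTS = ipaddress.ip_network("10.1.0.0/16")

# Port -> source networks allowed to reach it.
RULES = {
    3306: [CONTROL_PLANE],                # Trove infrastructure DB: control plane only
    5671: [CONTROL_PLANE, GUEST_AGENTS],  # RabbitMQ over TLS: control plane + guests
}


def is_allowed(port, source_ip):
    """True if source_ip falls inside one of the ranges allowed for port."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in RULES.get(port, []))


def shared_users(service_users):
    """Return usernames used by more than one service (they should be unique)."""
    counts = Counter(service_users.values())
    return sorted(user for user, n in counts.items() if n > 1)
```

Two consequences follow. A guest at `10.1.2.3` can reach RabbitMQ on 5671 but not the database on 3306. Any port without a rule, such as plain-text AMQP on 5672, is denied by default.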
17. Monitoring of Trove Service / Instances
• Trove doesn’t ship with monitoring
• Upstart scripts respawn Trove services
• Monitor Trove API ports with Nagios
• Monitor RabbitMQ and DB connectivity from Control Plane nodes
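A port check of this kind fits in a few lines. In practice, Nagios's stock `check_tcp` plugin already covers it. The sketch below only illustrates what such a check does: a TCP connect, then a status line and a standard Nagios exit code (0 OK, 2 CRITICAL). The helper names are hypothetical. The default of 8779 is Trove API's default listening port.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Return (exit_code, message) for a plain TCP connect to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:  # refused, unreachable, or timed out
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


def main(argv):
    """Plugin entry point: print one status line and return the exit code.

    Wire it up as sys.exit(main(sys.argv)); 8779 is the default Trove API port.
    """
    host = argv[1] if len(argv) > 1 else "127.0.0.1"
    port = int(argv[2]) if len(argv) > 2 else 8779
    code, message = check_tcp(host, port)
    print(message)
    return code
```

The same function can probe RabbitMQ and the database from each Control Plane node. Run it against their host and port to get the connectivity checks listed above.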
18. Monitoring of key Trove components
• RabbitMQ
• Number of Queues
• Number of Sockets used
• Number of Established Connections
• Cluster Status
• Failed access attempts
• Database
• MySQL standard monitoring
• Cluster status
• Slow query log
• error.log for unauthorized/failed access attempts
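How can failed access attempts be pulled from the database error log? The sketch below counts them per user and host. It assumes the server writes MySQL's `Access denied for user 'name'@'host'` message to its error log. Whether it does depends on the MySQL version and log verbosity (for example, `log_warnings` greater than 1 on 5.5/5.6). The alert threshold and the `suspicious` helper are hypothetical.

```python
import re
from collections import Counter

# Matches MySQL's "Access denied for user 'name'@'host'" error-log message.
ACCESS_DENIED = re.compile(r"Access denied for user '([^']*)'@'([^']*)'")


def failed_logins(lines):
    """Count failed login attempts per (user, host) in error-log lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_DENIED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def suspicious(counts, threshold=5):
    """(user, host) pairs with at least `threshold` failures, most frequent first."""
    return [pair for pair, n in counts.most_common() if n >= threshold]
```

Feeding it the lines of the error log (for example, `failed_logins(open(path))`) yields a per-source count. That count can be alerted on the same way as the RabbitMQ failed-access metric.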
19. Monitoring of key Trove components
• Trove Guest Agent Heartbeat status
• Trove instance audit (catch failed instances to help identify service issues)
• Connectivity to Trove instances from outside
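The heartbeat and audit checks reduce to simple comparisons against the current time. The sketch below assumes you already collect two inputs: each guest agent's last heartbeat timestamp, and each instance's status and creation time. How you collect them (Trove's database, its API, or notifications) is left open. The 60-second heartbeat age and the 10-minute build timeout are hypothetical thresholds. `ERROR` and `BUILD` are Trove instance statuses.

```python
from datetime import datetime, timedelta

# Trove instance statuses that count as "failed" for this audit.
FAILED_STATUSES = {"ERROR"}


def stale_agents(heartbeats, now, max_age=timedelta(seconds=60)):
    """Instance IDs whose last guest-agent heartbeat is older than max_age."""
    return sorted(iid for iid, last in heartbeats.items() if now - last > max_age)


def audit_instances(instances, now, build_timeout=timedelta(minutes=10)):
    """Instances in ERROR, or stuck in BUILD longer than build_timeout."""
    flagged = []
    for inst in instances:
        stuck = inst["status"] == "BUILD" and now - inst["created"] > build_timeout
        if inst["status"] in FAILED_STATUSES or stuck:
            flagged.append(inst["id"])
    return sorted(flagged)


# Example with hypothetical instance IDs.
now = datetime(2014, 8, 19, 12, 0, 0)
beats = {"db-1": now - timedelta(seconds=15), "db-2": now - timedelta(minutes=5)}
print(stale_agents(beats, now))  # ['db-2']
```

A rising count from `audit_instances` across many tenants usually points at a service-side problem rather than a single bad instance. That is the signal the audit bullet is after.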
21. OpenStack Trove : RabbitMQ
• RabbitMQ
• Raise the default socket descriptor limit (the default is exhausted quickly)
• The number of queues and sockets keeps growing unless heartbeats are enabled on RabbitMQ connections
• Monitoring is key when running a RabbitMQ cluster configured with mirrored queues
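Two checks can be run on RabbitMQ node statistics. The first warns when descriptor or socket usage approaches its limit. The second spots a queue or socket count that only ever grows, the symptom described above when heartbeats are off. The sketch assumes node dicts shaped like the per-node JSON from the RabbitMQ management plugin's `/api/nodes` endpoint (`fd_used`, `fd_total`, `sockets_used`, `sockets_total`, `running`). Verify those field names against your RabbitMQ version. The 80% threshold is a hypothetical choice.

```python
# Assumed field names from the management plugin's /api/nodes output.
LIMITS = (("fd_used", "fd_total"), ("sockets_used", "sockets_total"))


def node_warnings(nodes, threshold=0.8):
    """Warn for stopped nodes and for descriptor/socket usage at or above threshold."""
    warnings = []
    for node in nodes:
        name = node.get("name", "?")
        if not node.get("running", False):
            warnings.append(f"{name}: not running")
            continue
        for used_key, total_key in LIMITS:
            used, total = node.get(used_key), node.get(total_key)
            if used is not None and total and used / total >= threshold:
                warnings.append(f"{name}: {used_key}={used} of {total}")
    return warnings


def steadily_growing(samples):
    """True if a series of queue (or socket) counts never drops and ends higher."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]
```

Sample the queue count periodically and run `steadily_growing` over a window of those samples. A sustained `True` on a cluster with mirrored queues is worth investigating before the descriptor limit is hit.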
22. OpenStack Trove
• Guest agent heartbeats (service status notifications) should be monitored for failure
• Upgrading the guest agent is tricky on xsmall instances
• A quota mismatch between Trove and Nova is the biggest cause of instance failures
• Resource mismatch between Trove and Nova
• Schedule jobs to detect and correct these mismatches
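One such scheduled job is a quota comparison. It checks whether Trove's remaining instance quota exceeds what the backing Nova quota can still absorb for a given flavor. If it does, Trove accepts create requests that then fail in Nova. The sketch uses Nova's `instances`, `cores`, and `ram` (MB) quota keys. The function names and example numbers are hypothetical, and Nova's `-1` (unlimited) quota value is deliberately not handled.

```python
def backed_instances(nova_limits, nova_usage, flavor):
    """How many more instances of `flavor` the Nova quota can still absorb.

    Assumes finite limits (Nova's -1 "unlimited" is not handled here).
    """
    free_instances = nova_limits["instances"] - nova_usage["instances"]
    free_by_cores = (nova_limits["cores"] - nova_usage["cores"]) // flavor["vcpus"]
    free_by_ram = (nova_limits["ram"] - nova_usage["ram"]) // flavor["ram"]
    return max(0, min(free_instances, free_by_cores, free_by_ram))


def quota_mismatch(trove_limit, trove_used, nova_limits, nova_usage, flavor):
    """True when Trove would still accept creates that Nova would reject."""
    trove_free = max(0, trove_limit - trove_used)
    return trove_free > backed_instances(nova_limits, nova_usage, flavor)
```

Take a tenant with Nova limits of 10 instances, 20 cores, and 51200 MB, of which 8 instances, 10 cores, and 8192 MB are used. Nova can back only 2 more 2-vCPU/4096 MB instances. A Trove quota leaving 4 free instances is therefore a mismatch worth correcting.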