Regulated Reactive - Security Considerations for Building Reactive Systems in Regulated Industries
1. The IBM Watson and Cloud Platform
Ryan Hodgin
Lead Solution Architect – Healthcare and Life Sciences
@rhodgin
Regulated Reactive: Security Considerations for Building Reactive Systems in Regulated Industries
One Platform. One architecture. Cloud-native. One IBM.
2. Background On Ryan
IBMer for 15 years based in Boulder, CO
Application Development and Architecture Background
Now on IBM’s Cloud Platform Team focused on Solution Architecture for
Healthcare and Life Sciences
Twitter: @rhodgin
LinkedIn: https://www.linkedin.com/in/rhodgin/
SlideShare: https://www.slideshare.net/RyanHodgin
13. Reactive Patterns – Event Sourcing
• What is it?
“Capture all changes to an application state as a sequence of events.” – Martin Fowler
• Motivations:
• Distributed Computing - Append only architecture distributes well
• Full visibility into the system’s history
• Natural audit log
• Snapshots with ability to replay events
• Speed to recovery
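The motivations above (append-only history, snapshots, replay) can be illustrated with a minimal sketch. It is not taken from the talk: the `HeartRateRecorded` event, the `EventLog` class, and the in-memory list standing in for a durable journal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HeartRateRecorded:
    """Hypothetical event: an immutable fact that is never updated in place."""
    patient_id: str
    bpm: int


State = Dict[str, int]  # latest heart rate per patient


def apply_event(state: State, event: HeartRateRecorded) -> State:
    """Return a new state with the event applied; the input is left untouched."""
    new_state = dict(state)
    new_state[event.patient_id] = event.bpm
    return new_state


class EventLog:
    """Append-only log with a single snapshot (in memory, for illustration)."""

    def __init__(self) -> None:
        self._events: List[HeartRateRecorded] = []
        # Snapshot = (number of events already folded in, state at that point).
        self._snapshot: Tuple[int, State] = (0, {})

    def append(self, event: HeartRateRecorded) -> None:
        self._events.append(event)

    def take_snapshot(self) -> None:
        self._snapshot = (len(self._events), self.replay())

    def replay(self, from_snapshot: bool = True) -> State:
        """Rebuild state from the snapshot, or from the very first event."""
        version, state = self._snapshot if from_snapshot else (0, {})
        state = dict(state)  # copy so callers cannot alter the snapshot
        for event in self._events[version:]:
            state = apply_event(state, event)
        return state


log = EventLog()
log.append(HeartRateRecorded("p1", 72))
log.append(HeartRateRecorded("p2", 88))
log.take_snapshot()
log.append(HeartRateRecorded("p1", 75))
```

Replaying from the snapshot and replaying the full history should give the same state; checking that equality regularly is one way to test recovery.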
17. Review Events and Make Corrections
• Change Days Absent (Command)
• Days Absent Changed (Event)
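A correction in an event-sourced system is usually a new event appended to the history, not an edit to an old one. The sketch below shows that idea with the command and event named on this slide; the fields (`record_id`, `days_absent`, `changed_by`), the validation rule, and the list used as a log are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChangeDaysAbsent:
    """Command: a request to change the value; it may be rejected."""
    record_id: str
    days_absent: int
    changed_by: str


@dataclass(frozen=True)
class DaysAbsentChanged:
    """Event: a fact recorded after the command was accepted."""
    record_id: str
    days_absent: int
    changed_by: str


def handle(command: ChangeDaysAbsent, log: List[DaysAbsentChanged]) -> DaysAbsentChanged:
    """Validate the command, then append a new event; nothing is overwritten."""
    if command.days_absent < 0:
        raise ValueError("days_absent must not be negative")
    event = DaysAbsentChanged(command.record_id, command.days_absent, command.changed_by)
    log.append(event)
    return event


def current_days_absent(log: List[DaysAbsentChanged], record_id: str) -> Optional[int]:
    """The current value is the one carried by the most recent matching event."""
    latest = None
    for event in log:
        if event.record_id == record_id:
            latest = event.days_absent
    return latest


history: List[DaysAbsentChanged] = []
handle(ChangeDaysAbsent("r1", 5, "alice"), history)  # original entry
handle(ChangeDaysAbsent("r1", 3, "bob"), history)    # later correction
```

Both the original entry and the correction remain in `history`, along with who made each change, which is what gives reviewers the audit trail.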
18. Event Sourcing and GDPR
• General Data Protection Regulation (GDPR) goes into effect May 25, 2018
• In situations where the “Right to erasure” applies, does personally identifying data
need to be removed from the event source history?
19. Reactive Patterns – CQRS
• Definition: CQRS (Command Query Responsibility Segregation) - “CQRS is
simply the creation of two objects where there was previously only one. The
separation occurs based upon whether the methods are a command or a
query.” (Greg Young)
• Motivations:
• Supports different rules for display of data (query model based on usage)
• Fits well with Bounded Contexts in Domain-Driven Design
• Supports separate access control / security rules for reads and writes
• Enables “Principle of Least Privilege (PoLP)”
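As a rough sketch of the definition above, the example below splits one service into a command object and a query object, each with its own required role. The class names, role names, and the shared dictionary standing in for separate read and write stores are assumptions for illustration.

```python
from typing import Dict, Optional, Set


def require_role(caller_roles: Set[str], role: str) -> None:
    """Reject the call unless the caller holds the one role this side needs."""
    if role not in caller_roles:
        raise PermissionError(f"missing role: {role}")


class VitalsCommands:
    """Write side: methods change state and return nothing."""
    REQUIRED_ROLE = "vitals-writer"  # hypothetical role name

    def __init__(self, store: Dict[str, int]) -> None:
        self._store = store

    def record_heart_rate(self, caller_roles: Set[str], patient_id: str, bpm: int) -> None:
        require_role(caller_roles, self.REQUIRED_ROLE)
        self._store[patient_id] = bpm


class VitalsQueries:
    """Read side: methods return data and never change state."""
    REQUIRED_ROLE = "vitals-reader"  # hypothetical role name

    def __init__(self, store: Dict[str, int]) -> None:
        self._store = store

    def latest_heart_rate(self, caller_roles: Set[str], patient_id: str) -> Optional[int]:
        require_role(caller_roles, self.REQUIRED_ROLE)
        return self._store.get(patient_id)


store: Dict[str, int] = {}
commands = VitalsCommands(store)
queries = VitalsQueries(store)
commands.record_heart_rate({"vitals-writer"}, "p1", 72)
```

A reporting tool can then be granted only the reader role, which is how the split supports least privilege.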
23. Istio
• Open source project led by
Google, IBM, and Lyft
• Service Mesh
• Offers features in:
• Traffic Management
• Observability
• Policy Enforcement
• Service Identity and Security
• Initial support for Kubernetes and
plans for VMs, Cloud Foundry,
and Mesos
25. Disclaimers
• Full stack not yet production ready
• Other aspects of security still matter (a lot!)
• Edge Security
• Database Security
• Access Management
• Policies and procedures
• Variations based on industry / organization
26. NIST’s Cybersecurity Framework
• Many organizations are standardizing on the
framework (with some customizations by
industry/organization)
• Defines 5 key categories:
• Identify
• Protect
• Detect
• Respond
• Recover
27. Identify
• Definition: Develop the organizational understanding to manage cybersecurity
risk to systems, assets, data, and capabilities.
• Opportunities to reduce risk:
• Catalog Services and Data - include risk potential and appeal to attackers
• Secure events throughout lifecycle
• Intelligent routing of sensitive messages
• Leverage labels in Kubernetes and tags in cloud providers to give more visibility / reporting
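The cataloguing and routing ideas above can be sketched as follows. The label key `data-sensitivity`, the value `phi`, the 1 to 5 risk score, and the channel names are all hypothetical; real labels would come from Kubernetes metadata or cloud provider tags.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """One service or data store in the catalog."""
    name: str
    labels: Dict[str, str]  # e.g. mirrored from Kubernetes labels
    risk: int               # assumed scale: 1 (low) to 5 (high appeal to attackers)


def high_risk(catalog: List[CatalogEntry], threshold: int = 4) -> List[str]:
    """Names of entries at or above the threshold, highest risk first."""
    selected = [entry for entry in catalog if entry.risk >= threshold]
    return [entry.name for entry in sorted(selected, key=lambda e: -e.risk)]


def route(message_labels: Dict[str, str]) -> str:
    """Send messages labeled as sensitive to a more tightly controlled channel."""
    if message_labels.get("data-sensitivity") == "phi":
        return "restricted-channel"
    return "standard-channel"


catalog = [
    CatalogEntry("vitals-service", {"data-sensitivity": "phi"}, risk=5),
    CatalogEntry("billing-service", {"data-sensitivity": "financial"}, risk=4),
    CatalogEntry("static-site", {"data-sensitivity": "public"}, risk=1),
]
```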
28. Protect
• Definition: Develop and implement the appropriate safeguards to ensure
delivery of critical infrastructure services.
• Opportunities to reduce risk:
• Use SSL/TLS Consistently (Akka, Lagom, Play, anything else)
• Use SSL/TLS with Mutual Authentication for Akka Remoting
• Disable Java serialization in Akka (option available since 2.4.11)
• Minimize container privileges
• Utilize Kubernetes Role Based Access Control (and record events) for changes
• Implement service identity and access control rules (service/data level authorization)
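The slide refers to Akka Remoting, which is configured differently. The sketch below uses Python's standard `ssl` module only to illustrate what mutual authentication means on the server side: the server requires and verifies a client certificate. The function name and the file-path parameters are assumptions, and the paths are optional here only so the sketch runs without certificate files.

```python
import ssl
from typing import Optional


def mutual_tls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    context.verify_mode = ssl.CERT_REQUIRED           # handshake fails without a client cert
    if cert_file and key_file:
        # The server's own identity, presented to clients.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA that client certificates must chain to.
        context.load_verify_locations(cafile=client_ca_file)
    return context


context = mutual_tls_server_context()
```

In a real deployment all three files would be supplied; without the CA file no client certificate can be validated, so every handshake would be rejected.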
29. Detect
• Definition: Develop and implement the appropriate activities to identify the
occurrence of a cybersecurity event.
• Opportunities to reduce risk:
• Use tracing – Lightbend Telemetry / OpenTracing / Zipkin / Jaeger
• Use monitors and alerts – OpsClarity Monitors, Prometheus
• Akka Supervisors – Naturally handle all Actor exceptions (forward messages and track patterns)
• Centralize logs, build benchmarks, and detect unusual patterns of activity
• Integrate AI/Machine Learning – normal day vs. abnormal day
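A very simple stand-in for the "normal day vs. abnormal day" idea is a statistical baseline. The sketch below flags a value that sits more than a chosen number of standard deviations from the baseline mean; the threshold of 3 and the sample numbers are assumptions, and a production system would likely use richer models.

```python
from statistics import mean, stdev
from typing import List


def is_abnormal(baseline: List[float], observed: float, threshold: float = 3.0) -> bool:
    """True when `observed` is more than `threshold` standard deviations from the mean.

    `baseline` needs at least two values so a standard deviation can be computed.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical requests-per-minute figures from normal days.
normal_days = [100.0, 102.0, 98.0, 101.0, 99.0]
```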
30. Respond
• Definition: Develop and implement the appropriate activities to take action
regarding a detected cybersecurity event.
• Opportunities to reduce risk:
• Elastic application design
• Quarantine a compromised Service/VM/container (fail fast)
• Utilize circuit breakers and rate limiting
• Patch rapidly and make changes without downtime
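The circuit breaker mentioned above can be sketched in a few lines. This is a simplified illustration, not the implementation shipped with Akka or any other library; the class name, defaults, and injectable clock are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_failures = max_failures
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Open on too many failures, or re-open if the trial call failed.
            if self._failures >= self._max_failures or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of piling load onto a service that may be compromised or quarantined.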
31. Recover
• Definition: Develop and implement the appropriate activities to maintain plans
for resilience and to restore any capabilities or services that were impaired due
to a cybersecurity event.
• Opportunities to reduce risk:
• Resiliency of the application
• Snapshots and Replay (Event Sourcing)
• Replication / Disaster Recovery strategy
• Use multi-data center capability with hot backup
• Utilize Kubernetes distributed clusters and federation
33. Reducing Risk in Reactive Patient Vitals App
• Use event sourcing (audit trail/recovery) and CQRS (controls for reporting)
• Create snapshots of events and prioritize ability to replay them (test it)
• Minimize instances of sensitive data - what information is really required?
• Restrict access to services - what should be able to call them?
• Secure communication between services (mutual TLS / service authentication)
• Capture and store metrics on caller, # of calls, and response times
• Capture container images, version deployment history
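Two of the points above, restricting who may call a service and capturing caller metrics, can be combined in a small wrapper. This is a sketch under stated assumptions: `GuardedService`, the caller names, and the in-memory metrics are hypothetical, and in a service mesh the caller identity would come from a verified certificate, not a plain string.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set


class GuardedService:
    """Wraps a handler with a caller allowlist and per-caller metrics."""

    def __init__(self, handler: Callable[[str], str], allowed_callers: Set[str]) -> None:
        self._handler = handler
        self._allowed = set(allowed_callers)
        self._response_times: DefaultDict[str, List[float]] = defaultdict(list)
        self._rejected: DefaultDict[str, int] = defaultdict(int)

    def call(self, caller: str, request: str) -> str:
        if caller not in self._allowed:
            self._rejected[caller] += 1  # keep a record of refused attempts too
            raise PermissionError(f"caller not allowed: {caller}")
        start = time.perf_counter()
        try:
            return self._handler(request)
        finally:
            # Record the duration whether the handler succeeded or raised.
            self._response_times[caller].append(time.perf_counter() - start)

    def call_count(self, caller: str) -> int:
        return len(self._response_times[caller])

    def rejected_count(self, caller: str) -> int:
        return self._rejected[caller]


service = GuardedService(lambda request: request.upper(), {"vitals-ui"})
reply = service.call("vitals-ui", "ok")
```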
34. Communicating the Shift to Reactive
• Explain business reasons for change (competitive pressure, speed to market,
risk reduction, etc.)
• Highlight features of the architectural patterns
• Provide real-time visibility (trust but verify)
• Automate auditing (follow up on inconsistencies) – Netflix’s Security Monkey
• Restrict changes (who is allowed to make changes with strong traceability)
• Highlight reductions in current targets (time to patch, RTO/RPO)