
Cloud Sobriety for Life Science IT Leadership (2018 Edition)

Candid/blunt AWS advice for research IT and life science IT leadership. Hard lessons learned from many years of AWS consulting. Contact dag@bioteam.net if you want a PDF copy of this presentation

  1. 1. 1 Cloud Sobriety 2018: Practical Tips for Life Science IT Leadership. May 2018, Third Rock Ventures Informatics/IT Symposium. Chris Dagdigian, Senior Director - BioTeam, chris@bioteam.net. v2
  2. 2. 2 Content Warning I am not an “expert” … or a “thought leader” I try to speak honestly about what I see, do and experience “on the ground” as an IT worker My views are biased by the types of work I perform. Filter my words through your own expertise … TRV has asked me to concentrate on topics relevant to IT Directors dealing with computational teams
  3. 3. 3 Been doing this a while. Got some certs.
  4. 4. 4 Why Cloud? Get yer mind right.
  5. 5. 5 If you have to support scientists … ‣ Science changes faster than we can refresh infrastructure ‣ You must be prepared to design, deploy and support complex IT infrastructure that lives for years ‣ … in an environment where scientists cannot really predict their requirements or tooling needs beyond 6 months ‣ This is what keeps Research IT professionals awake at night
  6. 6. Why Cloud: Core Life Science Adoption Drivers ‣ The primary driver for IaaS Cloud Adoption in our space is NOT SAVING MONEY ‣ Cloud is a CAPABILITY and a COLLABORATION IT strategy
  7. 7. Why Cloud: Core Life Science Adoption Drivers ‣ Capability Drivers • Pressure-relief valve for when on-prem IT is not aligned to current need • Delegate significant infrastructure controls to end-users & teams • Leverage game-changing AWS services that have no comparable on-prem option (s3:// + event triggers, Lambda, Glue, Batch, Polly, Lex, SageMaker, API Gateway, etc.) • Leverage hosted/managed services like RDS, ECS, EKS and Fargate that can reduce the operational burden on your SysAdmin & Ops people
  8. 8. Why Cloud: Core Life Science Adoption Drivers ‣ Collaboration Drivers • The future of pharmaceutical drug development increasingly relies on complex multi-party relationships. Companies may be “friends” in one area and fierce competitors in a different market/area. • Nobody wants to bring a frenemy inside the corporate firewall • Also: s3:// is ideal for high-velocity data exchange and ingest. Petabytes of open-access lifesci data are already hosted/available on AWS • Your third party data providers, outsourced genome sequencing shops, CROs and other business partners are already on AWS - data delivery is easy!
  9. 9. Why Cloud: Get yer mind right ‣ It is absolutely essential that Senior Leadership across the enterprise share the same vision for cloud including core use cases and security/risk model ‣ Or else this will happen …
  10. 10. Why Cloud: When Your Org is Not “Cloud Comfy” ‣ AD Administrators say this: “The cloud is scary and insecure, we will not allow your cloud servers to bind to our AD Forest” ‣ ITIL Addicted Support Org says this: “All linux hosts must run RedHat and be registered to our Satellite Server, Must be bound to Active Directory, Must run Anti-Virus. All servers must be listed in our CMDB. No exceptions” • This breaks tons of stuff. Plus heads explode when “serverless” is brought up! ‣ Security Team says this: “All egress traffic will be blocked unless you fill out this MS Word .doc listing all possible SRC and DST hostnames + all applicable protocols. We will not allow subnet-wide firewall policies - only policies that list specific SRC hostnames and IPs will be allowed.” • This breaks anything that auto-scales and any AWS service (like Lambda) that pulls an ENI from one of your private subnets for temporary use
  11. 11. 11 Biotech/Pharma Cloud: Most Common Mistakes ‣ C-Level executives shouting “cloud first!” without doing any sort of math or detailed planning ‣ Bypassing InfoSec and Legal stakeholders in early stages of cloud training, research and knowledge gathering - lack of awareness in these groups breeds horrific policy ‣ Treating cloud as a hostile/alien environment rather than a remote/virtual datacenter ‣ Forcing legacy IT design patterns, governance models, inventory management and provisioning mechanisms into the IaaS space without evaluating alternatives or even bothering to think that alternatives are worth considering ‣ Not enough bandwidth to the cloud footprint ‣ Improper IP space allocation and network subnet planning ‣ Allowing users and developers full access before “safety rails” and operational procedures are in place
  12. 12. 12 Which Cloud? IaaS Cloud Rankings
  13. 13. 13 Which Cloud? IaaS Cloud Ranking ‣ Data Intensive Science rarely fits into ‘canned’ solution stacks|services ‣ We need flexibility and diverse options to build suitable tooling ‣ The best IaaS scoring metric: “How many building blocks do you offer?” ‣ Best sign of a pathetic “pretender cloud” ‣ Only offers Object Storage, Block Storage and VMs (little else)
  14. 14. 14 IaaS Rankings: My $.02 1. AWS 2. Azure 3. Google* ‣ AWS is still ~2 years or more ahead of all others in terms of capability and useful IaaS building blocks ‣ AWS is best all around for exploratory and R&D type use cases and workloads ‣ Azure has proven itself and is improving very rapidly. Corporate/Enterprise workloads are well supported ‣ Google* (see next slide)
  15. 15. 15 IaaS Rankings: My $.02 1. AWS 2. Azure 3. Google* ‣ Google can be attractive for high-value pipelines and workflows where it makes sense to commit engineering and enhancement resources. Also attractive if you go “all-in” and are willing to rewrite and redesign to support “the google” way of operations ‣ Consider: Google should be evaluated as a potential competitive threat to your company and business model. Alphabet is far more likely to start life science, biotech, pharma or other companies in “our world” that could end up competing with you. ‣ ** Edit: <angsty personal anti-google opinions removed because it turns out to be impossible to separate ‘personal’ thoughts from my corporate role AND I was using old info/experiences in a 2018 talk which is not really fair to google cloud as they exist today>
  16. 16. 16 Screening the AWS Kool-Aid: Why AWS outreach and evangelists make my life a living hell :)
  17. 17. 17 Screening the AWS Kool-Aid: ‣ AWS is incredible at outreach and evangelism across all media and platforms. They give away tons of stuff including advice, code, CF templates, “how-to guides” etc. because they know this is a huge driver for client uptake and adoption. ‣ The AWS re:Invent youtube channel is among the best technical training resources I’ve ever encountered ‣ But … ‣ IT leadership must understand that AWS always presents the rosiest possible ‘cloud-native’ view of the universe in their public outreach • … and sometimes this conflicts with the real world.
  18. 18. 18 Screening the AWS Kool-Aid ‣ [1 of 2] In the AWS outreach universe … • Everybody is fluent in DevOps, CloudFormation, Lambda & API Gateway • Legacy things don’t exist except as migrate/rewrite opportunities • All workloads and apps can be rewritten to be serverless • All apps are stateless and thus targets for auto-scaling, multi-az, spot & load balancers • EC2 servers are ‘cattle-nodes’ or managed via immutable architecture design patterns. No server is ever special or manually configured with critical stuff. • Direct Connect, VPN Links, Routing and VPC Peering setup is easy/magical and rarely has to be discussed, let alone planned out carefully. VPC-to-Premise and VPC-to-VPC traffic flows are assumed.
  19. 19. 19 Screening the AWS Kool-Aid: ‣ [2 of 2] In the AWS outreach universe … • Unrestricted outbound internet access is assumed (even from within private subnets) • Everybody has all the IAM permissions they need and/or every company has a smooth IAM/RBAC implementation process • Most things you build will be public/internet facing • Route53 handles DNS for your domain name(s) so healthcheck-influenced failover, multi-az and geo-redundancy can magically occur • All AWS services are fair game for use, can store corporate data and are not subject to advance review/approval by internal Legal/IT/InfoSec teams • RDS is perfect for all the things and you’ll never have to run a gnarly Oracle or MS SQL Server setup on EC2
  20. 20. 20 Screening the AWS Kool-Aid: Real World Example #1 ‣ Midlevel Manager: “Why are you not granting me Administrator level IAM access in the prod VPC? This CloudFormation template I found on the internet looks awesome!” ‣ Narrator: “… the CF template in question would have created untagged non-compliant security groups, subnets and servers exposed to the internet via ELB and ElasticIP … within a production VPC explicitly designed to have zero internet-facing surfaces”
  21. 21. 21 Screening the AWS Kool-Aid: Real World Example #2 ‣ Watches re:Invent video and falls in love with containers: “Why is it taking you weeks to get me a functional ECS cluster within our production VPC’s private subnets?” ‣ Narrator: “… Because egress traffic to Internet IPs is screened by a firewall and AWS decided that it was acceptable for the ‘ecs-agent’ binary to communicate with external ‘telemetry’ endpoints that are undisclosed in any public documentation or list of AWS endpoints. Also, the hostname pattern used for that telemetry endpoint makes it impossible to globally whitelist that service on our firewall.”
  22. 22. 22 Screening the AWS Kool-Aid: Real World Example #3 ‣ IT Project Manager: “Please take Application X out of our managed datacenter environment and deploy it according to this neat AWS Best Practices diagram I saw on the AWS blog … we really need the HA, scaling and multi-az failover benefits!!!!” ‣ Narrator: “Your legacy application is not stateless, runs a database inside the app server, uses hardcoded hostnames in its configuration, uses insanely low TCP port #s (like TCP:16 !) for cluster communication and the domain/hostname you are demanding to use is not currently managed via AWS Route53”
  23. 23. 23 Screening the AWS Kool-Aid: Real World Example #4 ‣ Developer: “You folks are horrible at cloud, all my code is failing with ‘ssl verification’ errors!” ‣ Narrator: “Our organization intercepts and decrypts outbound SSL traffic to monitor for data exfiltration, malware, botnet control traffic and other bad things. If you are going to talk TLS/HTTPS to the outside world you need to install a few extra CA certificates … ”
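For Python code, one way to "install a few extra CA certificates" is to add the corporate CA bundle to a normal verifying TLS context. This is a minimal sketch, not a description of any particular organization's setup: the function name `build_tls_context` and the idea that the interception CAs are shipped as a single PEM file are assumptions for illustration.

```python
import ssl
from typing import Optional


def build_tls_context(extra_ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context that can also trust an extra CA bundle.

    extra_ca_bundle is a hypothetical path to a PEM file that holds the
    corporate interception CA certificates (an assumption, not from the talk).
    """
    # Start from the platform default trust store; certificate and
    # hostname verification stay switched on.
    context = ssl.create_default_context()
    if extra_ca_bundle is not None:
        # Trust the corporate CA certificates in addition to the defaults.
        context.load_verify_locations(cafile=extra_ca_bundle)
    return context
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=build_tls_context("/path/to/corp-ca.pem"))`, where the path is a placeholder. The point of the design is to extend trust rather than disable verification.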
  24. 24. 24 Screening the AWS Kool-Aid: Wrap Up ‣ The Good: AWS is incredible at open access training, documentation, “Getting Started with X” guides and other public-facing evangelism that truly drives excitement among potential users ‣ The Bad: Only the shiniest possible fully-automated cloud-native views are really expressed, legacy baggage is under-mentioned and the ‘ease of use’ message often runs up hard against real world corporate security, operational and provisioning rules. ‣ Advice: We need to be just as good at outreach and expectation setting as the AWS Evangelists are. Extensive work needs to be done to promote and advertise how we actually use the cloud within our Organization. Failures of expectation are OUR FAULT, not AWS.
  25. 25. 25 Multi-Cloud & Hybrid-Cloud: Beware marketing lies and sponsored/fake POCs
  26. 26. 26 Hybrid Cloud Thoughts (Scientific Computing Focus) ‣ Awesome in theory and on the whiteboard. Awkward in reality. ‣ Hybrid cloud is not ideal for data intensive science workloads where our apps are encumbered by tera|petabyte data access requirements - especially when egress data transfer fees enter the mix ‣ Can work great for Chemistry, Molecular Modeling, Protein structure/folding work where the compute demands are very high but the data movement requirements are very low ‣ Blunt Advice: Design hybrid cloud to spec for your requirements using your people. Buying a Hybrid Cloud “vision” from a vendor who does not understand your data footprint can be a recipe for failure
  27. 27. 27 Multi-Cloud Thoughts ‣ Also awesome in theory and on the whiteboard but … ‣ Egress data charges and DNS control can be annoying. Plus you’ve doubled your cloud training/knowledge/ops needs ‣ Blunt Advice: Start small and keep it simple. The best multi-cloud design pattern I’ve started to see lately is this: ‣ AWS for Scientific Computing, Collaboration & Commercial Ops ‣ Azure for Active Directory, Federation and all things SSO + a few Business applications running native in Azure stack
  28. 28. 28 Starting to Wrap Up: Specific Advice Time
  29. 29. 29 Data Intensive Science: Compute
  30. 30. 30 Data Intensive Science: Scalable Compute Power ‣ If you are cloud-native, greenfield or able to re-write your existing pipelines then use AWS Batch as the baseline compute platform ‣ If you need/prefer something that looks like a more traditional HPC Cluster (that still auto-scales) then use AWS CfnCluster ‣ If you are worried about AWS lock-in: Every commercial HPC scheduler or stack ISV has a cloud-friendly vision they’d be happy to sell you ‣ If you want to submit compute jobs via API but are worried about AWS Batch lock-in then code your submit scripts to the DRMAA API - that will at least make your stuff portable across many HPC schedulers and environments
  31. 31. 31 Data Intensive Science: Storage
  32. 32. 32 Data Intensive Science: Storage ‣ Target s3:// wherever possible. Object storage is the future of scientific data at rest. Period. ‣ Understand all the s3:// permutations including IA-class, Single-AZ IA-class and s3:// transfer acceleration options ‣ You will still probably need Windows CIFS shares at some point. Most of my clients just do this with EBS + Windows Servers
  33. 33. 33 Data Intensive Science: Storage, continued ‣ You will probably need NFS ‣ AWS EFS … is … kinda sorta ok … for some use cases ‣ Look at Avere vFXT for high-end options that can handle single namespace NFS across on-prem and multi-cloud environments ‣ Look at AWS Marketplace (SoftNas, etc) for other NAS/filer options ‣ If you need a parallel filesystem ‣ Look at GlusterFS - much easier to deploy on AWS vs. alternatives that require low-latency interconnects or metadata controllers ‣ And RedHat seems to be supporting/pushing it hard
  34. 34. 34 Data Intensive Science: Access & RBAC
  35. 35. 35 Data Intensive Science: Server Access ‣ Your staff will need friction-free access to deployed AWS servers. Despite what AWS says, not everything is an immutable cattle node ‣ Remember that one key cloud capability win is ability to delegate significant control over infrastructure to project teams — it won't ONLY be cloud Ops and Devs connecting to your footprint. Users too! ‣ This makes VPC design, multi-account strategy, VPC peering topology, Direct Connect decisions, Routing and connectivity config of extreme importance ‣ Don’t make the mistake of just designing to the technical need — account for how users/developers/support will connect!
  36. 36. 36 Data Intensive Science: Server Access, Continued ‣ Use of disconnected VPCs (no access to production resources or on-premise networks) and ‘sandbox accounts’ is totally legit ‣ Use of SSH Bastion or JumpHosts for access to VPCs when Direct Connect, VPN or VPC Peering is not wanted is totally legit ‣ Jumphost Advice: Invest major time and effort into training or wiki documentation on “SSH Tips & Tricks” including specific examples of how to do SSH port forwarding and proxy commands that tunnel through the JumpHost to hit the server the user wants ‣ This can have a 100x improvement on staff morale and productivity. One of my major early career mistakes was assuming that all SSH users were aware of all the convenient time- and effort-saving SSH features.
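The kind of "SSH Tips & Tricks" wiki entry described above could include a small helper that prints the exact command a user needs. The sketch below builds an `ssh` command line that tunnels through a jump host with OpenSSH's `-J` (ProxyJump) option and optionally adds a `-L` local port forward. The function name, the hostnames and the assumption that the same username is valid on both hosts are illustrative only.

```python
from typing import List, Optional


def ssh_via_jumphost(
    user: str,
    target: str,
    jumphost: str,
    local_forward: Optional[str] = None,
) -> List[str]:
    """Build an ssh argument list that reaches `target` through `jumphost`.

    Assumes OpenSSH 7.3 or newer (for -J) and the same username on both
    hosts. All names passed in are examples, not real infrastructure.
    """
    # -J tells ssh to connect to the jump host first and tunnel onward.
    argv = ["ssh", "-J", f"{user}@{jumphost}"]
    if local_forward is not None:
        # -L local_port:dest_host:dest_port exposes a remote port locally;
        # dest_host is resolved from the target server's point of view.
        argv += ["-L", local_forward]
    argv.append(f"{user}@{target}")
    return argv
```

For example, `" ".join(ssh_via_jumphost("alice", "10.0.1.25", "bastion.example.com", "8443:localhost:443"))` yields a one-line command that can be pasted into documentation; the list form can also be handed to `subprocess.run`.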
  37. 37. 37 Data Intensive Science: Linux Role Based Access Control ‣ Remember this is the real world ‣ Not all linux EC2 hosts are immutable cattle nodes ‣ You should be prepared to support humans logging in with corporate network usernames and credentials ‣ Your super awesome cloud-native architecture may not have this need now but “corporate cruft” grows over time and eventually you’ll have this need. Easier to prepare now and keep it on the shelf. ‣ Basic password checking is often not enough - what about sudo/admin access for certain users and groups?
  38. 38. 38 Data Intensive Science: Linux Role Based Access Control - Linux / AD Integration (Simple AD Topology) ‣ Modern linux w/ modern SSSD software can easily “realm join” an Active Directory domain and authenticate network users ‣ Sudo/admin access can be allowed/denied via AD group membership and an /etc/sudoers rule. SSH keys can be dropped by DevOps tooling ‣ But you won’t be able to do this: ‣ Cross transitive AD child domains. Your linux client will not be allowed to transit trust boundaries in the AD forest. Simple Linux/AD integration via “realm join” is insufficient for complex AD environments ‣ No central management of POSIX stuff like personal SSH keys, $SHELL and $HOME preferences. No custom per-host sudo rules etc.
  39. 39. 39 Wat? Linux and complex AD topologies ‣ Works great: ‣ Company AD is “COMPANY.COM” ‣ Single Domain/Forest ‣ ‘realm join company.com’ will work just fine ‣ Can login with user@company.com credentials ‣ DOES NOT WORK: ‣ Company AD is “COMPANY.COM” ‣ Company child domains in Forest: NAFTA.COMPANY.COM, EAME.COMPANY.COM, APAC.COMPANY.COM ‣ Linux can join ‘company.com’ but SSSD on Linux will not be allowed by Active Directory to cross transitive trust boundaries to access users in any child domain within the AD Forest ‣ Logging in as user@company.com will not work if ‘user’ is in a child domain
  40. 40. 40 Data Intensive Science: Linux Role Based Access Control - Linux / AD Integration (More capable or for more complex AD environments) ‣ Consider deploying FreeIPA (Open Source) or Redhat Identity Management (“RHEL IdM”) server as middleware ‣ FreeIPA/IdM kinda masquerades as a Domain Controller and can cross transitive trust boundaries AND maintain trust relationships across multiple domains and directories. SUPER USEFUL ‣ FreeIPA/IdM provides full RBAC controls for Linux users, offers fine-grained control over sudo rules, can store all of the POSIX info we care about ($shell, $home preferences, personal SSH keys for each user) without requiring an AD schema change
  42. 42. 42 Data Intensive Science: App & Data Security
  43. 43. 43 Data Intensive Science: App & Data Security ‣ AWS is excellent in this area. Use the AWS security offerings wherever possible ‣ s3:// SSE (server-side encryption) makes at-rest encryption trivial with no key management or user/code behavior change required ‣ AWS KMS makes encrypted EBS and RDS storage pretty easy ‣ KMS and cross-region data replication are not friendly to each other - research what this does to snapshot handling and replication costs if this applies to you ‣ If your threat model includes “AWS gets hacked or subpoenaed” then please understand that DIY encryption is EASY but DIY PKI management is super hard to do safely and securely ‣ AWS has great tools for automated scanning of your environment for governance, policy and security violations. Use them, log them and alert upon them!
  44. 44. 44 Other Security Related Topics …
  45. 45. 45 Other Security Related Topics … ‣ AWS design patterns, evangelists and docs all assume that everything we run has unrestricted access to the internet via IGW or NAT Gateway ‣ AWS does not hide the alternatives (Proxies, SGs, Firewalls) but they do not go out of their way to make them visible as viable design patterns ‣ And remember, AWS API endpoints are “on the internet” because the IPs still resolve to public IPv4 or IPv6 addresses. The only thing that changes with VPC Service Endpoints is the routing path ‣ I think this is a bad assumption for AWS to make. In 2018 we need to rethink unrestricted and unmonitored VPC egress traffic. ‣ Unrestricted egress should not just be enabled for everything by default.
  46. 46. 46 Other Security Related Topics … ‣ I don’t know when they did this but AWS did make my single biggest gripe about their IP space go away! ‣ AWS does publish all of their IP space in easy to find documentation but they used to make NO DISTINCTION between IP space used by their Service APIs and IP space that could be assigned to EC2 servers operated by AWS customers ‣ For a long time this meant that it was really hard to firewall access ONLY to AWS API services — we had to switch to name-based destination policies because the AWS API IP space was commingled with IPs that could be used by dodgy EC2 servers hacking us or slinging malware our way ‣ But this has changed! You can now filter “AMAZON” out of “EC2” lists! https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html#aws-ip-egress-control
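The "AMAZON minus EC2" filtering can be sketched in a few lines. AWS publishes its ranges as JSON (`https://ip-ranges.amazonaws.com/ip-ranges.json`) with a `prefixes` list whose entries carry `ip_prefix` and `service` fields. The function below works on an already-parsed copy of that document and covers IPv4 only; the function name is made up for illustration, and it uses a plain exact-match set difference, so it does not handle partially overlapping blocks.

```python
from typing import Any, Dict, List


def aws_service_only_prefixes(ip_ranges: Dict[str, Any]) -> List[str]:
    """Return IPv4 CIDR blocks listed under AMAZON but not under EC2.

    `ip_ranges` is the parsed ip-ranges.json document. Exact string
    matching is a simplification made for this sketch.
    """
    prefixes = ip_ranges.get("prefixes", [])
    # Every published block appears under AMAZON; customer-assignable
    # blocks appear a second time under EC2.
    amazon = {p["ip_prefix"] for p in prefixes if p["service"] == "AMAZON"}
    ec2 = {p["ip_prefix"] for p in prefixes if p["service"] == "EC2"}
    return sorted(amazon - ec2)
```

In practice the input would come from `json.load` on the downloaded file, and the output would feed a firewall or security group rule generator. The list changes over time, so any such rule set needs to be refreshed on a schedule.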
  47. 47. 47 Other Security Related Topics … My $.02 advice: ‣ Internet egress from a VPC should require informed action with a positive control mechanism. It should not be enabled globally by default. ‣ At a minimum distinguish between the following and build suitable controls for each option: ‣ Things that need no external access at all ‣ Things that need access ONLY to AWS API endpoints & AWS Services ‣ Things that require access that can be whitelisted by destination hostname (software repos, patching services, code repos, etc.) ‣ Things that really require unrestricted outbound access
  48. 48. 48 Other Security Related Topics … My $.02 advice: ‣ Filter access to AWS APIs via Security Group membership ‣ Considering operation of your own fleet of Proxy Servers to control outbound access is legit ‣ Squid is great software ‣ Security controls in squid.conf allow easy rules for clients and destinations ‣ Logging is very customizable and you can export the log stream to your SIEM system to look for abuse or dark patterns. Or just archive the logs in case of future audit need ‣ … you do have a SIEM, right?
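To make the idea of destination-hostname whitelisting concrete, here is a small sketch of the decision a proxy rule has to make: allow a request only if the destination is an approved domain or a subdomain of one. This illustrates the concept only and is not how Squid implements its ACLs; the function name and the example domains are hypothetical.

```python
from typing import Iterable


def host_allowed(hostname: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if hostname is an allowed domain or a subdomain of one.

    Conceptual sketch only; a real proxy has its own matching rules.
    """
    # Normalize case and drop a trailing dot from fully qualified names.
    host = hostname.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().strip(".")
        # Match the domain itself, or anything ending in ".domain" so that
        # "notexample.org" does not slip through for "example.org".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

The leading-dot check is the detail that matters: a naive suffix comparison would also approve look-alike domains.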
  49. 49. 49 Other Security Related Topics … My $.02 advice: ‣ Despite all of the awesome AWS security features and services it is still totally legit to consider running virtual firewall appliances at the edge of your VPCs ‣ If your on-premise firewall maker has a virtual AWS version there are management/visibility/trend-analysis advantages to running the same platform inside AWS
  50. 50. 50 Other Security Related Topics … My $.02 advice: ‣ Encrypted network traffic is a significant security threat surface ‣ IP Data theft & exfiltration of info out of your environment ‣ Malware downloads and payload delivery that bypass traditional firewall and screening methods. Control Plane for botnet management, etc. ‣ I fully support organizations that decide they need to intercept, decrypt and monitor encrypted traffic streams leaving their environments. However … ‣ This is a total nightmare when done promiscuously (like intercepting AWS API traffic). SSL decrypt needs to be weighed against the problems and hassles it causes. It should be applied judiciously ‣ Intercepting and decrypting AWS API traffic causes far more problems than the threat surface it purports to protect against (personal view)
  51. 51. 51 Final Wrap Up: Final Advice Time
  52. 52. 52 Final AWS Cloud Advice Lessons learned from many projects and lots of mistakes
  53. 53. 53 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Fat pipes benefit all. Investing in fast connectivity to the Internet is worthwhile. Having 1gbps links to the outside world should be the new normal ‣ This also means that 1gbps or 10gbps Direct Connect is also worthwhile - but maybe a bit more business justification is needed! ‣ All IT departments and senior leaders must be read-in on cloud plans — extend training and “why we are doing X” evangelism to these folks if needed. ‣ If you don’t have complete support for the cloud roadmap you’ll end up in nasty conflict with Risk Management, Compliance, InfoSec, Network teams and the AD admins
  54. 54. 54 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Carve out ALL YOUR RFC1918 PRIVATE IP SPACE NOW and reserve it for planned and future usage. Make sure those IP addresses and CIDR blocks are never deployed at any office, site location or site-to-site VPN ‣ Reserve a ton of space so you can grow into multi-region, multi-account and multi-cloud without hitting IP address and subnet allocation limits ‣ Any sort of CIDR overlap or IP address conflict causes massive headaches with essential cloud services including VPC design, VPC peering, VPC Routing, Direct Connect, VPN Connections, etc. — careful planning will help you avoid this nightmare.
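A lightweight way to catch CIDR overlap before it reaches a VPC design is to keep the allocation plan as data and check it with Python's standard `ipaddress` module. The sketch below is one possible approach; the function name and the allocation names and blocks in the usage note are invented for illustration, and it assumes all entries are IPv4.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_cidr_overlaps(allocations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of allocation names whose CIDR blocks overlap.

    `allocations` maps a label (site, VPC, VPN, ...) to a CIDR string.
    Assumes every entry is an IPv4 network.
    """
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()
    }
    # Compare every pair once; names are sorted so output is deterministic.
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]
```

Given a hypothetical plan such as `{"hq-office": "10.0.0.0/16", "aws-prod-vpc": "10.0.128.0/20", "aws-dev-vpc": "10.1.0.0/16"}`, the function reports the office and production VPC as conflicting, which is exactly the situation that later breaks peering, routing and VPN setup.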
  55. 55. 55 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Think about domain names and DNS. Seriously. ‣ AWS Route53 DNS is almost essential for seamless load balancer failover, multi-az high-availability and/or global service failover. ‣ But … not all companies can or want to place their primary business domain name and DNS zone records under control of Route53 ‣ Purchase additional domain names if it makes sense. Use those with Route53 and your AWS footprint ‣ Also legit: Different domain names for “private facing” vs. “public facing” services hosted in AWS
  56. 56. 56 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Use AWS Organizations and their multi-account best practices ‣ Seriously. My only wish is that this was available 5 years earlier!
  57. 57. 57 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Log all the things to a dedicated AWS account ‣ CloudTrail (including S3 data events and Lambda if desired), AWS Config Events, VPC Flow Logs and all CloudWatch log streams ‣ Do this on day 0 even if you don’t plan to set up your dashboard/monitoring service until much much later. This is the data you need to respond to a security event, API key breach or other serious incident. ‣ It’s also very useful for troubleshooting
  58. 58. 58 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Train your people and make training resources easy to access ‣ Cloud Unicorns are hard to find, hire and keep ‣ Everyone benefits with more skill, experience, training and lab access — and not just the front-line Ops and Dev teams. Security, Legal, networking, end-users and management all benefit from more info and awareness. ‣ Strongly advise online learning portals like https://acloud.guru/ and https://qwiklabs.com/ (there are many options in this space) ‣ My personal AWS certs are a result of cloudacademy.com, acloud.guru and the AWS re:Invent conference presentation videos
  59. 59. 59 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ It’s ok to hire outside experts and professional services ‣ I have a very positive impression of AWS Professional Services ‣ I have a very positive impression of AWS Database Migration Services ‣ The specific hairy areas where professional advice is helpful: ‣ VPC, VPN, direct-connect and peering setup/planning ‣ Complex IAM policies and access rationalization ‣ Complex database migration efforts ‣ Big Data guidance
  60. 60. 60 Final Thoughts Lessons learned from many projects and lots of mistakes ‣ Plan on (and budget for) AWS Support subscription ‣ If you can afford AWS Enterprise Support then go for it ‣ It’s very very very useful ‣ Named account managers, direct access to TAMs over email & Slack ‣ Fast service on Support Cases
  61. 61. 61 Final AWS Cloud Advice Things you should BUY vs things you should BUILD
  62. 62. 62 Final Thoughts Things you should BUY vs things you should BUILD ‣ AWS Backup, DR and Cross-Region/Cross-Account Replication ‣ Don’t even think about a DIY solution ‣ Just go to https://www.n2ws.com and sign up for their Cloud Protection Manager marketplace subscription ‣ CPM is probably the best AWS Marketplace product I’ve ever seen or used. It’s seriously fantastic and pricing is reasonable ‣ It provides fantastic capabilities but does so by simply placing a UI and scheduling layer over standard AWS API calls. Nothing proprietary and nothing to trap you into a particular backup/DR product.
  63. 63. 63 Final Thoughts Things you should BUY vs things you should BUILD ‣ SSO and Identity Federation ‣ This is a major hassle for mere IT mortals ‣ Just purchase SSO/Federation as a service from one of the many companies (Okta, etc.) that do this professionally and move on with your life
  64. 64. 64 Final Thoughts Things you should BUY vs things you should BUILD ‣ Cloud Monitoring, Reporting, Spend Optimization & Governance ‣ There are dozens of companies that offer this as a service - do yourself a favor and just screen the available options and purchase the one that best fits your need and budget model. They offer far more than native AWS services at the moment. ‣ Warning: These companies are still trying to figure out pricing models. Some charge by data volume (Splunk, etc.) while others have pricing models that may penalize Multi-Account usage. You need to screen vendors for the Features they offer as well as for any potential pricing gotchas
  65. 65. 65 Not a CloudCheckr endorsement :) All my other screenshots from dashboard systems had sensitive info requiring time consuming redaction efforts …
  69. 69. 69 end; Thanks! slideshare.net/chrisdag/ chris@bioteam.net @chris_dag
