AUTOSCALED DISTRIBUTED
AUTOMATION
SELENIUM GRID / AWS
Ragavan Ambighananthan
@ragsambi
Expedia
London Selenium MeetUp Group 2016!
AKA ‘RUNNING TESTS WITHIN THE TIME TAKEN BY THE SLOWEST TEST CASE’
WHAT DO I GET?
• SeleniumGridScaler = Selenium Grid + AWS + Autoscaling
• Distributed automation (DA) shortens the UI automation run time to a few minutes
• Faster feedback cycle
• A handful of Jenkins jobs to run automation, instead of a few hundred
• Cost effective and reliable
• Enables Continuous Integration / Continuous Deployment
AGENDA
• Setting up
• Making the Grid stable
• Grid topologies
• Cost saving
• Reporting / Dashboard
PROBLEM DESCRIPTION
• Too many UI tests
• Slow test execution
• Hundreds of Jenkins jobs are needed to run all the tests (monolithic apps)
• The lack of a system that runs hundreds of UI automation tests reliably, quickly and scalably in a cost-effective way is a blocker for CI / CD
• No intelligent automation report to narrow down failures quickly
SOLUTION
• Be able to run all UI automation scenarios within the time taken by the slowest test case
• Cost effective, scalable and reliable
• Teams focussing on automation
• Note: this is not about cross-browser test coverage, but about using the grid for parallel test execution
SETTING UP
TECHNOLOGIES / TOOLS USED
SELENIUMGRIDSCALER
BIG PICTURE
SAMPLE GENERATED SCENARIOS
checkout/lx:
  features/lx_fraud.feature:21:en_US
  features/lx_fraud.feature:47:en_US
  features/lx_responsive_design.feature:25:en_US
  features/lx_responsive_design.feature:26:en_GB
  features/lx_responsive_design.feature:27:en_US
  features/lx_responsive_design.feature:90:de_DE
  features/lx_responsive_design.feature:240:en_US
search_landing_pages/flights_tg:
  features/tg_flights_revamp_hero_image.feature:120:en_US
  features/tg_flights_revamp_social_sharing.feature:156:en_US
  features/tg_flights_revamp_search_wizard.feature:202:en_US
  features/tg_flights_revamp_search_wizard.feature:203:nl_NL
  features/tg_flights_revamp_top_destinations.feature:159:en_US
  features/tg_flights_revamp_top_destinations.feature:160:en_US
  features/tg_flights_revamp_top_destinations.feature:161:en_US
  features/tg_flights_revamp_top_destinations.feature:207:en_US
• Only scenarios that match @stubbed | @live and @acceptance | @regression are included in the list to run
• All of these tests are executed concurrently
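The tag rule above can be sketched in a few lines of Python. This is only an illustration, not the deck's actual generator: it assumes the rule groups as (@stubbed OR @live) AND (@acceptance OR @regression), and the `Scenario` tuple, `should_run` and `scenarios_to_run` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical in-memory form of a scenario: (feature location, locale, tags).
Scenario = Tuple[str, str, Set[str]]


def should_run(tags: Set[str]) -> bool:
    """Assumed grouping: (@stubbed OR @live) AND (@acceptance OR @regression)."""
    environment_ok = bool(tags & {"@stubbed", "@live"})
    suite_ok = bool(tags & {"@acceptance", "@regression"})
    return environment_ok and suite_ok


def scenarios_to_run(scenarios: Iterable[Scenario]) -> List[str]:
    """Return 'location:locale' entries shaped like the generated list."""
    return [
        f"{location}:{locale}"
        for location, locale, tags in scenarios
        if should_run(tags)
    ]
```

Each returned entry has the same `feature:line:locale` shape as the sample list, so it could be handed to one parallel worker per entry.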
./gradlew -PnumBrowsers=150 :modulex.ui:scalaAcceptance -i -Denvironment=JENKINS_STUBBED -Dbrowser=Grid
• Use the ParallelTestExecution trait
SELENIUM GRID HUB SETUP
• c3.4xlarge (16 CPU / 30 GB RAM / high bandwidth) for thousands of tests
• c3.large (2 CPU / 3.75 GB RAM / enhanced networking) for a few hundred tests
• The hub needs enough network bandwidth; low CPU / memory is fine
• AMI that bootstraps the SeleniumGridScaler jar, which acts as a hub that can autoscale
• https://github.com/mhardin/SeleniumGridScaler
SELENIUMGRIDSCALER - HUB
• Open source
• Acts as an intelligent hub
• Autoscales grid nodes depending on the number of tests
• Optimized termination of nodes when not in use
• Ad hoc launch of new nodes is also possible
• Talks to AWS using EC2
• Nodes are bootstrapped to attach themselves to the hub
• Supports Windows instances on AWS as well
SELENIUM GRID NODE SETUP
• c3.xlarge
• Can run a maximum of 24 Firefox instances
• Fewer Chrome instances can be run (~15)
• A node created from the AMI has bootstrap code that attaches it to the hub
SELENIUMGRIDSCALER - NODE
• To have your own node AMI:
  • Either obtain an existing node AMI, or create an AWS instance, bootstrap it, create an AMI from it and reference it in the hub config
• The hub creates nodes based on a config: AMI ID, subnet, security group, node type, etc.
SELENIUM NODE BOOTSTRAP CODE
[root@ip-10-2-12-167 ~]# more /home/grid/grid/grid_start_node.sh
#!/bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
cd /home/grid/grid
# Print an error message to stderr and abort the script
die() { echo "$*" >&2; exit 1; }
# Ask the EC2 instance metadata service for this instance's ID
EC2_INSTANCE_ID="$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)" || die "wget instance-id has failed: $?"
export EC2_INSTANCE_ID
# Pull down the user data, which will be a zip file containing necessary information
export NODE_TEMPLATE="/home/grid/grid/nodeConfigTemplate.json"
curl http://169.254.169.254/latest/user-data -o /home/grid/grid/data.zip
# Now, unzip the data downloaded from the userdata
unzip -o /home/grid/grid/data.zip -d /home/grid/ubuntu/grid
# Replace the instance ID in the node config file
sed "s/<INSTANCE_ID>/$EC2_INSTANCE_ID/g" "$NODE_TEMPLATE" > /home/grid/grid/nodeConfig.json
# Finally, run the java process in a window so browsers can run
xvfb-run --auto-servernum --server-args='-screen 0, 1600x1200x24' java -jar /home/grid/grid/selenium-server-node.jar -role node -nodeConfig /home/grid/grid/nodeConfig.json -Dwebdriver.chrome.driver="/home/grid/grid/chromedriver" -log /home/grid/grid/grid.log &
MAKING THE GRID STABLE
TIMEOUTS
• Timeouts in the JSON config:
  • "timeout": 240000 (ms)
  • "browserTimeout": 390000 (ms)
• browserTimeout has to be larger than both 'timeout' and the webDriver timeout
• On the command line, browserTimeout is specified in seconds
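The ordering rule for the timeouts can be checked before a node is started. The Python sketch below is only an illustration: `check_grid_timeouts` and `browser_timeout_for_command_line` are hypothetical helpers, and `webdriver_timeout_ms` stands for whatever client-side webDriver timeout your test code uses, which the slides do not state.

```python
import json
from typing import List


def check_grid_timeouts(config_json: str, webdriver_timeout_ms: int) -> List[str]:
    """Return the problems found; an empty list means the ordering rule holds."""
    config = json.loads(config_json)
    timeout_ms = config["timeout"]
    browser_timeout_ms = config["browserTimeout"]
    problems = []
    if browser_timeout_ms <= timeout_ms:
        problems.append("browserTimeout must be larger than timeout")
    if browser_timeout_ms <= webdriver_timeout_ms:
        problems.append("browserTimeout must be larger than the webDriver timeout")
    return problems


def browser_timeout_for_command_line(browser_timeout_ms: int) -> int:
    """Convert the JSON value (milliseconds) to the seconds used on the command line."""
    return browser_timeout_ms // 1000
```

With the values from the slide, `browser_timeout_for_command_line(390000)` gives 390, the figure to pass on the command line.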
• If a browser instance hangs (for whatever reason), it takes 3 hours (the HTTP client socket timeout) for that slot to become free
• This causes the Jenkins job to time out
• Solution:
  • Fix the test scenario causing the issue
  • Add a cron job to kill any browser instance that has been running for more than 10 minutes
  • Make this part of your Chef knife plugin
• Ref: selenium repo, PR: 227 / fixed in 285
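One way to implement the cron job is a small script on each node that lists processes with their age and kills stale browsers. The Python sketch below is an assumption-laden illustration, not the deck's script: the process names in `BROWSER_NAMES` are guesses, both function names are hypothetical, and it relies on a `ps` that supports the `etimes` (elapsed seconds) column, as procps-ng on Linux does.

```python
import os
import signal
import subprocess
from typing import List

MAX_AGE_SECONDS = 10 * 60              # the 10-minute limit from the slide
BROWSER_NAMES = {"firefox", "chrome"}  # assumed process names; adjust per node


def stale_browser_pids(ps_output: str, max_age: int = MAX_AGE_SECONDS) -> List[int]:
    """Pick PIDs of browser processes older than max_age from 'pid etimes comm' lines."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, elapsed, name = parts
        if name.strip() in BROWSER_NAMES and int(elapsed) > max_age:
            pids.append(int(pid))
    return pids


def kill_stale_browsers() -> None:
    """Intended to be called from cron on each node, for example once a minute."""
    listing = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in stale_browser_pids(listing):
        os.kill(pid, signal.SIGKILL)
```

Keeping the selection logic in `stale_browser_pids` separate from the kill step makes it possible to check the 10-minute rule against captured `ps` output before letting the job kill anything.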
AWS - SUBNET
• The whole grid setup should be in the same AWS subnet
• Using multiple subnets results in many FORWARDING_TO_NODE_FAILED errors
AWS - IP ADDRESS
• The subnet you are using should have enough free IP addresses
• Otherwise this becomes a blocker for autoscaling the grid nodes
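Whether a subnet has room for the nodes a run needs is simple arithmetic. The Python sketch below is an illustration under stated assumptions: the CIDR blocks in the usage note are made up, `usable_addresses` and `can_autoscale` are hypothetical helpers, the default of 24 browsers per node is the c3.xlarge Firefox figure from the node setup slide, and the constant reflects the five addresses AWS reserves in every subnet.

```python
import ipaddress
import math

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def usable_addresses(cidr: str) -> int:
    """Number of addresses AWS can hand out in the given subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def can_autoscale(cidr: str, addresses_in_use: int, tests: int,
                  browsers_per_node: int = 24) -> bool:
    """True if the subnet has a free address for every node the run needs."""
    nodes_needed = math.ceil(tests / browsers_per_node)
    free = usable_addresses(cidr) - addresses_in_use
    return free >= nodes_needed
```

For example, a /28 subnet leaves 11 usable addresses, which is exactly the 11 nodes that 250 Firefox tests need, so a single address already taken by the hub would block the scale-up.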
AWS - HUB BANDWIDTH
• webDriver object creation consumes roughly 6 Gbit of hub bandwidth per 5 minutes when 250+ tests run in parallel
• c3.4xlarge bandwidth is "High"
• c3.large can also be used for smaller apps
AWS - HUB / NODE MEMORY
• Fine-tune your:
  • -Xms
  • -Xmx
  • -DPOOL_MAX
AWS - RESTARTING HUB
• The hub becomes unstable after running thousands of tests
• Automate restarting the hub after every 2000+ tests, or at the end of your test job
AWS - JENKINS EXECUTOR CPU
• A Jenkins executor that runs hundreds of tests in parallel needs enough CPU power
• c3.8xlarge when running 250+ tests in parallel
HUB QUEUING POLICY
• Don't rely too much on Selenium Grid's queuing policy
• If your average test execution time is greater than the webDriver timeout, queued tests will time out at webDriver creation itself
UPDATE BROWSERS
• Update the browsers on the node and create a new node AMI
• Necessary browser settings:
require 'selenium-webdriver'

profile = Selenium::WebDriver::Firefox::Profile.new
# Turn off automatic updates so the browser version on the node stays fixed
profile['app.update.auto'] = false
profile['app.update.enabled'] = false
profile['app.update.service.enabled'] = false
# Allow scripts to run for up to 60 seconds before Firefox treats them as unresponsive
profile['dom.max_script_run_time'] = 60
profile['dom.max_chrome_script_run_time'] = 60
# Let focus-dependent events fire even when the browser window is not in the foreground
profile['focusmanager.testmode'] = true
# Certificate handling
profile['accept_untrusted_certs'] = true
profile['assume_untrusted_certificate_issuer'] = false
SCALE THE TEST INFRASTRUCTURE
GRID TOPOLOGIES
• Decide what you want before selecting a topology, so that it is cost efficient
• I want to release code to production:
  1. On every CL (change list)
  2. Once a day
  3. Once a week
  4. Whenever I want (on demand)
• Based on the answer above, do I want to run all UI automation:
  5. On every CL?
  6. Every 2 hours?
  7. Four times a day?
GRID TOPOLOGY - 1
• Parallel execution for small projects
• 1 executor - 1 hub - 14 nodes
• e.g. c3.8xlarge can execute 250+ tests in parallel
• The test run finishes in ~5 minutes
[Diagram labels: HUB, c3.8xlarge, c3.large, c3.xlarge]
GRID TOPOLOGY - 2
• Suitable for medium-size projects (500+ tests)
• Adding one more executor (2 executors - 1 hub - 28 nodes) could double the number of test cases executed in parallel, while still taking only ~5 minutes
[Diagram labels: HUB, c3.4xlarge, c3.8xlarge, c3.xlarge]
GRID TOPOLOGY - 3
• Takes twice as long as the previous topology, but at half the cost (1 executor - 1 hub - 14 nodes)
• Suitable for medium-size projects
• The job runs sequentially
• The test run finishes in ~10 minutes
[Diagram labels: HUB, c3.8xlarge, c3.4xlarge, c3.xlarge]
GRID TOPOLOGY
• One more job? Probably not, as the hub network traffic would make it unstable, especially during webDriver creation
• The c3.8xlarge network bandwidth limit is 10 Gbit/s
[Diagram labels: HUB, c3.4xlarge, c3.8xlarge, c3.xlarge]
GRID TOPOLOGY - 4
• Use two hubs to double the number of tests (1000+)
• Speed is the same as topology 2 (~5 minutes)
• Double the cost
[Diagram labels: HUB, HUB, c3.8xlarge, c3.4xlarge, c3.4xlarge, c3.xlarge]
COST SAVING
OPTIMAL USE OF GRID NODES
• Running 250+ tests on a grid setup with 250 slots takes around 5 minutes
• The nodes then idle for the remaining 55 minutes of the hour, which AWS has already billed
• Even during the 5-minute run, only a small minority of the tests take around 4 minutes; the majority complete in less than 1 minute
BATCH PROCESSING
• On a c3.8xlarge, 250 tests can be started in one go before all 32 CPUs reach 100%
• Start 250 test cases
• Then, every ~50 seconds or so, start another batch of 100 tests; repeat until all tests have been executed
• Fine-tune the delay according to your own observations
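The batching rule above amounts to a launch schedule. The Python sketch below only computes when each batch would start. The defaults (250, 100, 50 seconds) are the figures from the slide, while `batch_schedule` itself is a hypothetical helper and not part of SeleniumGridScaler.

```python
from typing import List, Tuple


def batch_schedule(total_tests: int, first_batch: int = 250,
                   batch_size: int = 100,
                   delay_seconds: int = 50) -> List[Tuple[int, int]]:
    """Return (start offset in seconds, number of tests to start) pairs."""
    schedule = []
    remaining = total_tests
    offset = 0
    size = min(first_batch, remaining)
    while size > 0:
        schedule.append((offset, size))
        remaining -= size
        offset += delay_seconds
        size = min(batch_size, remaining)
    return schedule
```

For 500 tests this yields four launches: 250 tests at 0 s, then 100 at 50 s, 100 at 100 s and the last 50 at 150 s. A test runner would sleep until each offset before starting that batch, and the delay is the knob to tune against observed CPU load.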
GRID TOPOLOGY - BATCH PROCESSING
• Cost-saving topology: 1 executor - 1 hub - 16 nodes
• Can run any number of tests
• Can run 5000 UI tests within ~1 hr 10 mins
• The job runs sequentially
[Diagram labels: HUB, c3.8xlarge, c3.4xlarge, c3.xlarge]
COMPARING AWS COST VS DATA CENTRE
• 1 medium box (~$8000 per month)
• 1 large box (~$10000 per month)
• 1 VM (~$2000 per month)
• Total AWS cost for 2 batch processing topologies:
  • ~$2400 per month (fully autoscaled, runs 9500+ UI tests)
  • Frequency: 9-11 times a day
AUTOSCALING OF GRID NODES
• SeleniumGridScaler autoscales the grid nodes
• It creates AWS nodes on demand, based on a configuration file and the number of tests to run
• Optimized termination of nodes
• http://x.x.x.x:4444/grid/admin/AutomationTestRunServlet?uuid=testRun1&threadCount=250&browser=firefox
• For 250 test cases, it will create 250 / 24 ≈ 10.4, rounded up to 11 nodes
• It returns status codes:
  • 202 - request can be fulfilled by current capacity
  • 201 - request can be fulfilled, but AMIs must be started to meet capacity (wait ~5 minutes)
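The request and the node arithmetic can be sketched as follows. This Python illustration only builds the URL and interprets the two status codes listed above; it makes no HTTP call. The servlet path, query parameters and status codes are taken from the slide, while `nodes_needed`, `build_test_run_url` and `describe_status` are hypothetical helper names.

```python
import math
from urllib.parse import urlencode

FIREFOX_PER_NODE = 24  # Firefox capacity of one c3.xlarge node, as stated in the deck


def nodes_needed(thread_count: int, browsers_per_node: int = FIREFOX_PER_NODE) -> int:
    """Round up, because a partially used node is still a whole node."""
    return math.ceil(thread_count / browsers_per_node)


def build_test_run_url(hub: str, uuid: str, thread_count: int, browser: str) -> str:
    """Build the capacity request URL for the hub's AutomationTestRunServlet."""
    query = urlencode({"uuid": uuid, "threadCount": thread_count, "browser": browser})
    return f"http://{hub}:4444/grid/admin/AutomationTestRunServlet?{query}"


def describe_status(status_code: int) -> str:
    """Map the two documented status codes to the action the test job should take."""
    if status_code == 202:
        return "capacity available, start the tests"
    if status_code == 201:
        return "nodes are being started, wait about 5 minutes"
    return "unexpected response"
```

A test job would request the URL before starting, then either start straight away (202) or wait for the new nodes to register (201).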
LARGE VS SMALLER NODE TYPES
• c3.xlarge = $0.21 per hour (can run 24 Firefox instances)
• t2.micro = $0.013 per hour
• 16 t2.micro for the price of 1 c3.xlarge = 16 Firefox instances
Conclusion:
• I prefer c3.xlarge, as it offers more value
• I do not have to use 15 extra IP addresses
But this always depends on what you observe in your own setup!
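The comparison comes down to cost per browser-hour. The Python sketch below reproduces the arithmetic with the prices quoted on the slide, assuming one Firefox instance per t2.micro as the "16 t2.micro = 16 Firefox" line implies. The helper names are hypothetical.

```python
import math


def cost_per_browser_hour(price_per_hour: float, browsers_per_instance: int) -> float:
    """Hourly instance price divided by the browsers that instance can run."""
    return price_per_hour / browsers_per_instance


def browsers_for_budget(budget_per_hour: float, price_per_hour: float,
                        browsers_per_instance: int) -> int:
    """Browsers obtainable for an hourly budget, buying whole instances only."""
    instances = math.floor(budget_per_hour / price_per_hour)
    return instances * browsers_per_instance
```

For the same $0.21 per hour, one c3.xlarge gives 24 Firefox slots at $0.00875 each, while 16 t2.micro instances give 16 slots at $0.013 each and occupy 15 more IP addresses.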
STOPPING THE HUB
• Shut down the hub when it is not in use
• Benefit: you pay AWS only a tiny amount for a stopped instance compared with a running one
• Automate these stop and start tasks
PIPELINE
[Diagram: CI build, deploy job and CI automation job, with the following hub steps]
• Start the hub
• Wait for the hub to come online
• Autoscale nodes (5 min)
• Check if nodes are still attached to the hub
  • If yes, shut down the hub
  • If no, let the hub run to terminate the rest of the nodes
• Create an alarm to stop the hub if there is no activity for 1 hr
REPORTING / DASHBOARD
TREND CHARTS
POINT OF SALE GRID
UNIQUE ERROR REPORT
FAILURE HISTORY / ONE PAGE
HIPCHAT NOTIFICATION
INTELLIGENCE REPORTING
Automate the decision whether a failure is a bug or an automation issue:
• Use OCR to read screenshots of failed tests and extract error messages not captured by automation
• Use JavaScript errors from the browser console
• Use logs (Splunk) to get exceptions specific to the test
• Follow good logging practices for automation failures
FEW WORDS
• There are a few differences in the Expedia-specific SeleniumGridScaler
• https://github.com/ambirag/SeleniumGridScaler, branch: SeleniumGridScalerExp
• Dockerised!
QUESTIONS

Autoscaled Distributed Automation using AWS at Selenium London MeetUp

  • 1. AUTOSCALED DISTRIBUTED AUTOMATION SELENIUM GRID / AWS Ragavan Ambighananthan @ragsambi Expedia London Selenium MeetUp Group 2016! 1 AKA ‘RUNNING TESTS WITHIN THE TIME TAKEN BY THE SLOWEST TEST CASE’
  • 2. WHAT DO I GET? • SeleniumGridScaler = Selenium Grid + AWS + Autoscaling • DA will phenomenally shorten the UI automation run time to few minutes • Faster feedback cycle • Fewer Jenkins jobs to run automation, instead of few hundreds • Cost effective and reliable • Enables Continuous Integration / Continuous Deployment 2
  • 3. AGENDA • Setting up • Making the Grid stable • Grid topologies • Cost saving • Reporting / Dashboard 3
  • 6. PROBLEM DESCRIPTION • Hundreds of Jenkins jobs to run all the tests (monolithic apps) • Not having a system to run hundreds of UI automation tests reliably, fast and scalable in a cost effective way is a blocker for CI / CD • No intelligent automation report to narrow down failures quickly! 6
  • 7. SOLUTION • To be able to run all UI automation scenarios within the time taken by the slowest test case • Cost effective, scalable and reliable • Teams focussing on automation • Note: This is not about cross browser test coverage rather using grid for parallel test execution 7
  • 8. SETTING UP 8 TECHNOLOGIES / TOOLS USED SELENIUMGRIDSCALER
  • 10. SETTING UP checkout/lx: features/lx_fraud.feature:21:en_US features/lx_fraud.feature:47:en_US features/lx_responsive_design.feature:25:en_US features/lx_responsive_design.feature:26:en_GB features/lx_responsive_design.feature:27:en_US features/lx_responsive_design.feature:90:de_DE features/lx_responsive_design.feature:240:en_US search_landing_pages/flights_tg: features/tg_flights_revamp_hero_image.feature:120:en_US features/tg_flights_revamp_social_sharing.feature:156:en_US features/tg_flights_revamp_search_wizard.feature:202:en_US features/tg_flights_revamp_search_wizard.feature:203:nl_NL features/tg_flights_revamp_top_destinations.feature:159:en_US features/tg_flights_revamp_top_destinations.feature:160:en_US features/tg_flights_revamp_top_destinations.feature:161:en_US features/tg_flights_revamp_top_destinations.feature:207:en_US • Only scenarios that matches @stubbed | @live and @acceptance | @regression will be included in the list to run • All these tests will be executed concurrently 10 SAMPLE GENERATED SCENARIOS
  • 11. SETTING UP ./gradlew -PnumBrowsers=150 :modulex.ui:scalaAcceptance -i - Denvironment=JENKINS_STUBBED -Dbrowser=Grid 11 SAMPLE GENERATED SCENARIOS use ParallelTestExecution Trait
  • 12. SETTING UP • c3.4xlarge (16 cpu / 30 GB RAM / High BW) for thousands of test • c3.large (2 cpu / 3.75 GB RAM / Enhanced Net) for fewer hundreds of tests • Hub should have enough network bandwidth but low CPU / Memory is fine • AMI with bootstrap SeleniumGridScaler jar, which will act as the hub that can autoscale • https://github.com/mhardin/SeleniumGridScaler 12 SELENIUM GRID HUB SETUP
  • 13. SETTING UP • Open Source • Acts as an intelligent hub • Auto scales grid nodes depending on the number of tests • Optimized termination of nodes when not in use • Adhoc launch of new nodes is also possible • Talks to AWS using EC2 • Nodes are bootstrapped to attach themselves to the hub • Supports AWS Windows as well 13 SELENIUMGRIDSCALER - HUB
  • 14. • c3.xlarge • Capable of running maximum 24 Firefox • Number of Chrome that can be run is lesser ~15 • Node created out of AMI has bootstrap code to help attach to the hub 14 SETTING UP SELENIUM GRID NODE SETUP
  • 15. SETTING UP • To have your own node AMI • Either you have to get the node AMI or create an AWS instance, bootstrap it,create an AMI out of it and refer it in the Hub config. • Hub creates the node based on a config: AMI ID, subnet, security group, node type,etc. 15 SELENIUMGRIDSCALER - NODE
  • 16. SELENIUM NODE BOOTSTRAP CODE [root@ip-10-2-12-167 ~]# more /home/grid/grid/grid_start_node.sh #!/bin/sh PATH=/sbin:/usr/sbin:/bin:/usr/bin cd /home/grid/grid export EC2_INSTANCE_ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id || die "wget instance-id has failed: $?"`" # Pull down the user data, which will be a zip file containing necessary information export NODE_TEMPLATE="/home/grid/grid/nodeConfigTemplate.json" curl http://169.254.169.254/latest/user-data -o /home/grid/grid/data.zip # Now, unzip the data downloaded from the userdata unzip -o /home/grid/grid/data.zip -d /home/grid/ubuntu/grid # Replace the instance ID in the node config file sed "s/<INSTANCE_ID>/$EC2_INSTANCE_ID/g" $NODE_TEMPLATE > /home/grid/grid/nodeConfig.json # Finally, run the java process in a window so browsers can run xvfb-run --auto-servernum --server-args='-screen 0, 1600x1200x24' java -jar /home/grid/grid/selenium-server-node.jar -role node - nodeConfig /home/grid/grid/nodeConfig.json -Dwebdriver.chrome.driver="/home/grid/grid/chromedriver" -log /home/grid/grid/grid.log & 16
  • 17. MAKING THE GRID STABLE: TIMEOUTS • Timeouts in the JSON config • "timeout": 240000 (ms) • "browserTimeout": 390000 (ms) • browserTimeout must be larger than both "timeout" and the webDriver timeout • On the command line, browserTimeout is specified in seconds
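The ordering rule on the timeouts slide can be checked before a node config is deployed. The following is a minimal sketch, assuming a flat JSON fragment that holds just the two keys, with values in milliseconds as on the slide. The function name `check_timeouts` and the 120-second webDriver timeout in the usage are illustrative assumptions, not part of the original setup.

```python
import json


def check_timeouts(node_config_json, webdriver_timeout_ms):
    """Return (ok, browser_timeout_secs) for a grid node config fragment.

    Assumes "timeout" and "browserTimeout" are given in milliseconds.
    The check is the rule from the slide: browserTimeout must exceed
    both "timeout" and the client-side webDriver timeout.
    """
    cfg = json.loads(node_config_json)
    timeout_ms = cfg["timeout"]
    browser_timeout_ms = cfg["browserTimeout"]
    ok = browser_timeout_ms > max(timeout_ms, webdriver_timeout_ms)
    # The command line expects seconds, so convert for that use.
    return ok, browser_timeout_ms // 1000


# Values from the slide; the webDriver timeout is a hypothetical 120 s.
config = '{"timeout": 240000, "browserTimeout": 390000}'
ok, browser_timeout_secs = check_timeouts(config, webdriver_timeout_ms=120000)
print(ok, browser_timeout_secs)
```

Running a check like this in the job that builds the node AMI would catch a misordered pair of timeouts before the grid starts dropping sessions.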
  • 18. MAKING THE GRID STABLE: TIMEOUTS • If a browser instance hangs for any reason, its slot takes 3 hours (the HTTP client socket timeout) to become free • This causes the Jenkins job to time out • Solution: • Fix the test scenario that causes the hang • Add a cron job that kills any browser instance running for more than 10 minutes • Make this part of your Chef knife plugin • Ref: selenium repo, PR: 227 / fixed in 285
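The cron job that kills long-running browsers could look roughly like the sketch below. It is not the script used in the talk. It assumes a Linux node whose `ps` supports the `etimes` column (elapsed seconds) and assumes the browser process names are `firefox` and `chrome`; both should be adjusted to the actual node image. Nothing is killed until `kill_stale_browsers()` is called explicitly.

```python
import os
import signal
import subprocess

BROWSER_NAMES = ("firefox", "chrome")  # assumed process names on the node
MAX_AGE_SECS = 10 * 60                 # the 10-minute limit from the slide


def stale_browser_pids(ps_output, names=BROWSER_NAMES, max_age_secs=MAX_AGE_SECS):
    """Pick PIDs of browser processes older than max_age_secs.

    ps_output is the text printed by `ps -e -o pid= -o etimes= -o comm=`:
    one process per line as "<pid> <elapsed seconds> <command name>".
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, elapsed, command = parts
        if command.strip() in names and int(elapsed) > max_age_secs:
            stale.append(int(pid))
    return stale


def kill_stale_browsers():
    """Run ps and send SIGKILL to every stale browser process."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in stale_browser_pids(out):
        os.kill(pid, signal.SIGKILL)
```

Keeping the selection logic in a pure function makes it easy to test against captured `ps` output before the script is allowed to kill anything on a live node.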
  • 19. MAKING THE GRID STABLE: AWS - SUBNET • Keep the whole grid setup in the same AWS subnet • Using multiple subnets results in many FORWARDING_TO_NODE_FAILED errors
  • 20. MAKING THE GRID STABLE: AWS - IP ADDRESS • The subnet you use should have enough free IP addresses • A shortage of addresses will block autoscaling of the grid nodes
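A quick capacity check makes the IP address requirement concrete. The sketch below assumes one private IP address per grid node and relies on the fact that AWS reserves five addresses in every subnet. The function name, the CIDR blocks and the count of addresses already in use are hypothetical values chosen for illustration.

```python
import ipaddress
import math

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address


def subnet_can_host(cidr, addresses_in_use, tests, slots_per_node):
    """Return (nodes_needed, free_addresses, fits) for a planned run.

    Assumes each grid node takes exactly one IP address in the subnet.
    """
    subnet = ipaddress.ip_network(cidr)
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    free = usable - addresses_in_use
    nodes_needed = math.ceil(tests / slots_per_node)
    return nodes_needed, free, nodes_needed <= free


# 250 parallel tests at 24 Firefox slots per node, as in the talk.
print(subnet_can_host("10.2.12.0/28", addresses_in_use=3, tests=250, slots_per_node=24))
print(subnet_can_host("10.2.12.0/24", addresses_in_use=3, tests=250, slots_per_node=24))
```

In this example the /28 subnet is too small for the 11 nodes that 250 tests need, while the /24 leaves ample room.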
  • 21. MAKING THE GRID STABLE: AWS - HUB BANDWIDTH • With 250+ tests in parallel, webDriver object creation consumes bandwidth in the range of 6 Gbit per 5 mins on the hub • c3.4xlarge bandwidth is "High" • c3.large can also be used for smaller apps
  • 22. MAKING THE GRID STABLE: AWS - HUB / NODE MEMORY • Fine-tune your • -Xms • -Xmx • -DPOOL_MAX
  • 23. MAKING THE GRID STABLE: AWS - RESTARTING HUB • The hub becomes unstable after running thousands of tests • Automate restarting the hub after every 2000+ tests or at the end of your test job
  • 24. MAKING THE GRID STABLE: AWS - JENKINS EXECUTOR CPU • A Jenkins executor that runs hundreds of tests in parallel needs enough CPU power • c3.8xlarge when running 250+ tests in parallel
  • 25. MAKING THE GRID STABLE: HUB QUEUING POLICY • Don't rely too much on Selenium Grid's queuing policy • If your average test execution time is greater than the webDriver timeout, tests will time out at webDriver creation itself
  • 26. MAKING THE GRID STABLE: UPDATE BROWSERS • Update the browsers on the node and create a new node AMI • Necessary browser settings:
    profile = Selenium::WebDriver::Firefox::Profile.new
    profile['app.update.auto'] = false
    profile['app.update.enabled'] = false
    profile['app.update.service.enabled'] = false
    profile['dom.max_script_run_time'] = 60
    profile['dom.max_chrome_script_run_time'] = 60
    profile['focusmanager.testmode'] = true
    profile['accept_untrusted_certs'] = true
    profile['assume_untrusted_certificate_issuer'] = false
  • 27. MAKING THE GRID STABLE: SCALE THE TEST INFRASTRUCTURE
  • 28. GRID TOPOLOGIES • To be cost efficient, decide what you want before selecting the topology • I want to release code to production: 1. Every CL (change list) 2. Once a day 3. Once a week 4. Whenever I want (on demand) • Based on the answer above, do I want to run all UI automation: 5. Every CL 6. Every 2 hours 7. Four times a day
  • 29. GRID TOPOLOGY - 1 • Parallel execution for small projects • 1 executor - 1 hub - 14 nodes • e.g. c3.8xlarge can execute 250+ tests in parallel • The test run finishes in ~5 mins • [Diagram: HUB; instance labels c3.8xlarge, c3.large, c3.xlarge]
  • 30. GRID TOPOLOGY - 2 • Suitable for medium-sized projects (500+ tests) • Adding one more executor (2 executors - 1 hub - 28 nodes) could double the number of tests run in parallel while still taking only ~5 mins • [Diagram: HUB; instance labels c3.4xlarge, c3.8xlarge, c3.xlarge]
  • 31. GRID TOPOLOGY - 3 • Takes twice as long as the previous topology, but at half the cost (1 executor - 1 hub - 14 nodes) • Suitable for medium-sized projects • Job runs sequentially • The test run finishes in ~10 mins • [Diagram: HUB; instance labels c3.8xlarge, c3.4xlarge, c3.xlarge]
  • 32. GRID TOPOLOGY • One more job? Probably not, as the hub's network traffic would make it unstable, especially during webDriver creation • The c3.8xlarge network bandwidth limit is 10 Gbit • [Diagram: HUB; instance labels c3.4xlarge, c3.8xlarge, c3.xlarge]
  • 33. GRID TOPOLOGY - 4 • Use two hubs to double the number of tests (1000+) • Speed is the same as topology 2 (~5 mins) • Double the cost • [Diagram: two HUBs; instance labels c3.8xlarge, c3.xlarge, c3.4xlarge, c3.4xlarge]
  • 35. COST SAVING: OPTIMAL USE OF GRID NODES • Running 250+ tests on a grid setup with 250 slots takes around 5 mins • The nodes then idle for the remaining 55 mins, which AWS has already billed • Even during the 5-minute run, only a small minority of tests take around 4 mins; the majority complete in less than 1 min
  • 36. COST SAVING: OPTIMAL USE OF GRID NODES
  • 37. COST SAVING: BATCH PROCESSING • On a c3.8xlarge, 250 tests can be started at one go before all 32 CPUs reach 100% • Start 250 test cases • Then, every ~50 seconds or so, start a batch of 100 tests; repeat until all tests have been executed • Fine-tune the delay according to your own observations
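The batching rule on this slide can be written down as a small launch plan. The sketch below only computes when each batch starts and how many tests it contains; it does not launch anything. The function name `plan_batches` is an assumption, and the defaults are the slide's numbers (250 first, then 100 every 50 seconds), which should be tuned per setup.

```python
def plan_batches(total_tests, first_batch=250, batch_size=100, delay_secs=50):
    """Return a list of (start_offset_secs, number_of_tests) launches.

    The first launch starts first_batch tests at offset 0; every later
    launch starts up to batch_size tests, delay_secs after the previous one.
    """
    plan = []
    remaining = total_tests
    offset = 0
    size = first_batch
    while remaining > 0:
        count = min(size, remaining)
        plan.append((offset, count))
        remaining -= count
        offset += delay_secs
        size = batch_size
    return plan


plan = plan_batches(5000)
print(len(plan), "launches; last one starts at", plan[-1][0], "seconds")
```

The plan only covers launch times; the total run time also depends on how long the tests in the final batches take to finish.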
  • 38. COST SAVING: GRID TOPOLOGY - BATCH PROCESSING • Cost-saving topology: 1 executor - 1 hub - 16 nodes • Can run any number of tests • Can run 5000 UI tests within ~1 hr 10 mins • Job runs sequentially • [Diagram: HUB; instance labels c3.8xlarge, c3.4xlarge, c3.xlarge]
  • 39. COST SAVING: COMPARING AWS COST VS DATA CENTRE • 1 medium box (~$8000 per month) • 1 large box (~$10000 per month) • 1 VM (~$2000 per month) • Total AWS cost for 2 batch processing topologies: ~$2400 per month (fully autoscaled, runs 9500+ UI tests) • Frequency: 9-11 times a day
  • 40. COST SAVING: AUTOSCALING OF GRID NODES • SeleniumGridScaler autoscales the grid nodes • It creates AWS nodes on demand based on a configuration file and the number of tests to run • Optimized termination of nodes
  • 41. COST SAVING: AUTOSCALING OF GRID NODES • http://x.x.x.x:4444/grid/admin/AutomationTestRunServlet?uuid=testRun1&threadCount=250&browser=firefox • For 250 test cases, it will create 250/24 ≈ 11 nodes (rounded up) • It returns one of these status codes: • 202 - request can be fulfilled by current capacity • 201 - request can be fulfilled, but new nodes must be started from the AMI to meet capacity (wait ~5 mins)
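The request and the node arithmetic on this slide can be sketched as follows. The servlet path, query parameters and status codes are the ones shown on the slide; the helper names, the hub address `10.0.0.5` and the wording of the status descriptions are illustrative assumptions. The sketch only builds the URL and does not send the request.

```python
import math
from urllib.parse import urlencode

# Status codes as described on the slide.
STATUS_MEANING = {
    202: "request can be fulfilled by current capacity",
    201: "new nodes must be started first; wait about 5 minutes",
}


def build_test_run_url(hub_host, uuid, thread_count, browser, port=4444):
    """Build the AutomationTestRunServlet URL for a planned test run."""
    query = urlencode({"uuid": uuid, "threadCount": thread_count, "browser": browser})
    return f"http://{hub_host}:{port}/grid/admin/AutomationTestRunServlet?{query}"


def nodes_needed(thread_count, slots_per_node=24):
    """Number of nodes for thread_count parallel tests, rounded up."""
    return math.ceil(thread_count / slots_per_node)


url = build_test_run_url("10.0.0.5", "testRun1", 250, "firefox")
print(url)
print(nodes_needed(250), "nodes for 250 Firefox sessions")
```

A test job would send this request before starting the run and, on a 201 response, wait for the new nodes to register with the hub.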
  • 42. COST SAVING: LARGE VS SMALLER NODE TYPES • c3.xlarge = $0.21 per hour (can run 24 Firefox instances) • t2.micro = $0.013 per hour • 16 t2.micro for the price of 1 c3.xlarge = 16 Firefox instances • Conclusion: • I would prefer c3.xlarge as it gives more value • I would not have to use 15 extra IP addresses • But this always depends on what you observe in your own setup!
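The comparison on this slide comes down to cost per browser slot. The sketch below reproduces the arithmetic with the slide's hourly prices and assumes one Firefox instance per t2.micro, which is what "16 t2.micro = 16 Firefox" implies. The variable names are illustrative, and current AWS prices will differ.

```python
C3_XLARGE_PER_HOUR = 0.21    # USD, from the slide
T2_MICRO_PER_HOUR = 0.013    # USD, from the slide
FIREFOX_PER_C3_XLARGE = 24   # from the slide
FIREFOX_PER_T2_MICRO = 1     # assumption implied by "16 t2.micro = 16 Firefox"

# How many t2.micro instances the price of one c3.xlarge buys.
micros_for_same_price = int(C3_XLARGE_PER_HOUR // T2_MICRO_PER_HOUR)

# Hourly cost of a single Firefox slot on each node type.
cost_per_slot_c3 = C3_XLARGE_PER_HOUR / FIREFOX_PER_C3_XLARGE
cost_per_slot_micro = T2_MICRO_PER_HOUR / FIREFOX_PER_T2_MICRO

# Each instance needs its own IP address, so many small nodes cost more IPs.
extra_ip_addresses = micros_for_same_price - 1

print(micros_for_same_price, "t2.micro per c3.xlarge")
print(f"c3.xlarge: ${cost_per_slot_c3:.5f} per Firefox slot per hour")
print(f"t2.micro:  ${cost_per_slot_micro:.5f} per Firefox slot per hour")
print(extra_ip_addresses, "extra IP addresses with t2.micro")
```

Under these assumptions the larger node is cheaper per slot and uses fewer IP addresses, which matches the conclusion on the slide.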
  • 43. COST SAVING: STOPPING THE HUB • Shut down the hub when it is not in use • Benefit: you pay AWS only a tiny amount for a stopped instance compared with a running one • Automate these stop and start tasks
  • 48. REPORTING / DASHBOARD: FAILURE HISTORY / ONE PAGE
  • 50. REPORTING / DASHBOARD: INTELLIGENCE REPORTING • Automate the decision whether a failure is a bug or an automation issue • Use OCR to read failed-screenshot images and extract error messages not captured by the automation • Use JavaScript errors from the browser console • Use logs (Splunk) to get exceptions specific to the test • Follow good logging practices for automation failures
  • 51. FEW WORDS • The Expedia-specific SeleniumGridScaler has a few differences • https://github.com/ambirag/SeleniumGridScaler, branch: SeleniumGridScalerExp • Dockerised!