SlideShare ist ein Scribd-Unternehmen logo
1 von 3
BeautifulSoup/Selenium
Deep dive
6th May, 2020 SAKURA Internet, Inc. Research Center SR / Naoto MATSUMOTO
(C) Copyright 1996-2020 SAKURA Internet Inc
BeautifulSoup sample code
2
# apt install python3-pip
# pip3 install BeautifulSoup4
# pip3 install lxml
# vi bs4.py
#!/usr/bin/env python3
import re
import sys
import requests
from bs4 import BeautifulSoup as bs4
url = sys.argv[1]
key = sys.argv[2]
html = requests.get(url)
soup = bs4(html.content,'lxml',from_encoding='utf-8')
for script in soup(["script", "style"]): script.decompose()
text = soup.get_text()
regex = re.compile(key)
for line in text.splitlines():
if line:
match = regex.search(line)
if match:
print("%s, %s" % (url, line))
# python3 bs4.py http://www.sakura.ad.jp VPS
http://www.sakura.ad.jp, さくらのVPS
SOURCE: SAKURA Internet Research Center (2020/05)
Selenium sample code
3
# apt install python-pip curl -y
# apt install -y unzip xvfb libxi6 libgconf-2-4 -y
# apt install chromium-chromedriver -y
# pip install selenium
# pip install pyvirtualdisplay
# vi se.py
# encoding: utf-8
import sys
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
options.add_argument('disable-infobars')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(chrome_options=options)
word = "site:" + sys.argv[1] + " " + sys.argv[2]
url= "https://www.google.com/search?q={}&safe=off".format(word)
driver.get(url)
time.sleep(1)
for i, g in enumerate(driver.find_elements_by_class_name("g")):
s = g.find_element_by_class_name("s")
x = s.find_element_by_class_name("st").text
print(x)
driver.quit()
SOURCE: SAKURA Internet Research Center (2020/05)

Weitere ähnliche Inhalte

Was ist angesagt?

gunicorn introduction
gunicorn introductiongunicorn introduction
gunicorn introductionAdam Lowry
 
AWS ロボを作ろう JAWSUG Kobe
AWS ロボを作ろう JAWSUG KobeAWS ロボを作ろう JAWSUG Kobe
AWS ロボを作ろう JAWSUG Kobe崇之 清水
 
Sensu wrapper-sensu-summit
Sensu wrapper-sensu-summitSensu wrapper-sensu-summit
Sensu wrapper-sensu-summitLee Briggs
 
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]Cool Git Tricks (That I Learn When Things Go Badly) [1/2]
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]Carina C. Zona
 
Redis is the answer, what's the question - Tech Nottingham
Redis is the answer, what's the question - Tech NottinghamRedis is the answer, what's the question - Tech Nottingham
Redis is the answer, what's the question - Tech NottinghamGarry Shutler
 
IRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapIRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapHenry Schreiner
 
Introduction of Python & Pygame
Introduction of  Python & PygameIntroduction of  Python & Pygame
Introduction of Python & PygameTakaki Yoneyama
 
用 Bitbar Tool 寫 Script 自動擷取外幣
用 Bitbar Tool 寫 Script 自動擷取外幣用 Bitbar Tool 寫 Script 自動擷取外幣
用 Bitbar Tool 寫 Script 自動擷取外幣Win Yu
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinChad Cooper
 
Altering the real world with JavaScript - Framsia
Altering the real world with JavaScript - FramsiaAltering the real world with JavaScript - Framsia
Altering the real world with JavaScript - FramsiaJan Jongboom
 
Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingHenry Schreiner
 
Golang Arg / CABA Meetup #5 - go-carbon
Golang Arg / CABA Meetup #5 - go-carbonGolang Arg / CABA Meetup #5 - go-carbon
Golang Arg / CABA Meetup #5 - go-carbonEzequiel Maraschio
 
忙しい人のためのSphinx 入門 demo
忙しい人のためのSphinx 入門 demo忙しい人のためのSphinx 入門 demo
忙しい人のためのSphinx 入門 demoFumihito Yokoyama
 

Was ist angesagt? (20)

Evolution and AI
Evolution and AIEvolution and AI
Evolution and AI
 
gunicorn introduction
gunicorn introductiongunicorn introduction
gunicorn introduction
 
AWS ロボを作ろう JAWSUG Kobe
AWS ロボを作ろう JAWSUG KobeAWS ロボを作ろう JAWSUG Kobe
AWS ロボを作ろう JAWSUG Kobe
 
Sensu wrapper-sensu-summit
Sensu wrapper-sensu-summitSensu wrapper-sensu-summit
Sensu wrapper-sensu-summit
 
GoLang & GoatCore
GoLang & GoatCore GoLang & GoatCore
GoLang & GoatCore
 
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]Cool Git Tricks (That I Learn When Things Go Badly) [1/2]
Cool Git Tricks (That I Learn When Things Go Badly) [1/2]
 
Redis is the answer, what's the question - Tech Nottingham
Redis is the answer, what's the question - Tech NottinghamRedis is the answer, what's the question - Tech Nottingham
Redis is the answer, what's the question - Tech Nottingham
 
Arfon-Smith-feb25
Arfon-Smith-feb25Arfon-Smith-feb25
Arfon-Smith-feb25
 
RingoJS
RingoJSRingoJS
RingoJS
 
IRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram RoadmapIRIS-HEP Retreat: Boost-Histogram Roadmap
IRIS-HEP Retreat: Boost-Histogram Roadmap
 
PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8PyHEP 2019: Python 3.8
PyHEP 2019: Python 3.8
 
C++
C++C++
C++
 
Practica54
Practica54Practica54
Practica54
 
Introduction of Python & Pygame
Introduction of  Python & PygameIntroduction of  Python & Pygame
Introduction of Python & Pygame
 
用 Bitbar Tool 寫 Script 自動擷取外幣
用 Bitbar Tool 寫 Script 自動擷取外幣用 Bitbar Tool 寫 Script 自動擷取外幣
用 Bitbar Tool 寫 Script 自動擷取外幣
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
 
Altering the real world with JavaScript - Framsia
Altering the real world with JavaScript - FramsiaAltering the real world with JavaScript - Framsia
Altering the real world with JavaScript - Framsia
 
Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meeting
 
Golang Arg / CABA Meetup #5 - go-carbon
Golang Arg / CABA Meetup #5 - go-carbonGolang Arg / CABA Meetup #5 - go-carbon
Golang Arg / CABA Meetup #5 - go-carbon
 
忙しい人のためのSphinx 入門 demo
忙しい人のためのSphinx 入門 demo忙しい人のためのSphinx 入門 demo
忙しい人のためのSphinx 入門 demo
 

Ähnlich wie BeautifulSoup / selenium Deep dive

05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR mattersAlexandre Moneger
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly LanguageMotaz Saad
 
Will iPython replace Bash?
Will iPython replace Bash?Will iPython replace Bash?
Will iPython replace Bash?Babel
 
Will iPython replace bash?
Will iPython replace bash?Will iPython replace bash?
Will iPython replace bash?Roberto Polli
 
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)Takayuki Shimizukawa
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?Andrii Soldatenko
 
Sphinx autodoc - automated api documentation - PyCon.MY 2015
Sphinx autodoc - automated api documentation - PyCon.MY 2015Sphinx autodoc - automated api documentation - PyCon.MY 2015
Sphinx autodoc - automated api documentation - PyCon.MY 2015Takayuki Shimizukawa
 
How to write rust instead of c and get away with it
How to write rust instead of c and get away with itHow to write rust instead of c and get away with it
How to write rust instead of c and get away with itFlavien Raynaud
 
Could you take a look at this Python code Towards the botto.pdf
Could you take a look at this Python code Towards the botto.pdfCould you take a look at this Python code Towards the botto.pdf
Could you take a look at this Python code Towards the botto.pdfdevangmittal4
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
Homer - Workshop at Kamailio World 2017
Homer - Workshop at Kamailio World 2017Homer - Workshop at Kamailio World 2017
Homer - Workshop at Kamailio World 2017Giacomo Vacca
 
Efficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsEfficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsGergely Szabó
 
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24Kei IWASAKI
 
SECCOM 2017 - Conan.io o gerente de pacote para C e C++
SECCOM 2017 - Conan.io o gerente de pacote para C e C++SECCOM 2017 - Conan.io o gerente de pacote para C e C++
SECCOM 2017 - Conan.io o gerente de pacote para C e C++Uilian Ries
 
Dev8d 2011-pipe2 py
Dev8d 2011-pipe2 pyDev8d 2011-pipe2 py
Dev8d 2011-pipe2 pyTony Hirst
 
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)Takayuki Shimizukawa
 
PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)Andrey Karpov
 
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 201910 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019Matt Raible
 
Secure Programming Practices in C++ (NDC Oslo 2018)
Secure Programming Practices in C++ (NDC Oslo 2018)Secure Programming Practices in C++ (NDC Oslo 2018)
Secure Programming Practices in C++ (NDC Oslo 2018)Patricia Aas
 

Ähnlich wie BeautifulSoup / selenium Deep dive (20)

05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
 
Will iPython replace Bash?
Will iPython replace Bash?Will iPython replace Bash?
Will iPython replace Bash?
 
Will iPython replace bash?
Will iPython replace bash?Will iPython replace bash?
Will iPython replace bash?
 
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)
Sphinx autodoc - automated API documentation (PyCon APAC 2015 in Taiwan)
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
Sphinx autodoc - automated api documentation - PyCon.MY 2015
Sphinx autodoc - automated api documentation - PyCon.MY 2015Sphinx autodoc - automated api documentation - PyCon.MY 2015
Sphinx autodoc - automated api documentation - PyCon.MY 2015
 
How to write rust instead of c and get away with it
How to write rust instead of c and get away with itHow to write rust instead of c and get away with it
How to write rust instead of c and get away with it
 
Could you take a look at this Python code Towards the botto.pdf
Could you take a look at this Python code Towards the botto.pdfCould you take a look at this Python code Towards the botto.pdf
Could you take a look at this Python code Towards the botto.pdf
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
 
Homer - Workshop at Kamailio World 2017
Homer - Workshop at Kamailio World 2017Homer - Workshop at Kamailio World 2017
Homer - Workshop at Kamailio World 2017
 
Python para equipos de ciberseguridad
Python para equipos de ciberseguridad Python para equipos de ciberseguridad
Python para equipos de ciberseguridad
 
Efficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsEfficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native Environments
 
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24
Collaboration hack with slackbot - PyCon HK 2018 - 2018.11.24
 
SECCOM 2017 - Conan.io o gerente de pacote para C e C++
SECCOM 2017 - Conan.io o gerente de pacote para C e C++SECCOM 2017 - Conan.io o gerente de pacote para C e C++
SECCOM 2017 - Conan.io o gerente de pacote para C e C++
 
Dev8d 2011-pipe2 py
Dev8d 2011-pipe2 pyDev8d 2011-pipe2 py
Dev8d 2011-pipe2 py
 
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)
 
PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)PVS-Studio for Linux (CoreHard presentation)
PVS-Studio for Linux (CoreHard presentation)
 
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 201910 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
 
Secure Programming Practices in C++ (NDC Oslo 2018)
Secure Programming Practices in C++ (NDC Oslo 2018)Secure Programming Practices in C++ (NDC Oslo 2018)
Secure Programming Practices in C++ (NDC Oslo 2018)
 

Mehr von Naoto MATSUMOTO

Alder Lake-S CPU Temperature Monitoring
Alder Lake-S CPU Temperature MonitoringAlder Lake-S CPU Temperature Monitoring
Alder Lake-S CPU Temperature MonitoringNaoto MATSUMOTO
 
CPU製品出荷状況と消費電力の見える化
CPU製品出荷状況と消費電力の見える化CPU製品出荷状況と消費電力の見える化
CPU製品出荷状況と消費電力の見える化Naoto MATSUMOTO
 
2023年以降のサーバークラスタリング設計(メモ)
2023年以降のサーバークラスタリング設計(メモ)2023年以降のサーバークラスタリング設計(メモ)
2023年以降のサーバークラスタリング設計(メモ)Naoto MATSUMOTO
 
防災を考慮した水中調査の一考察
防災を考慮した水中調査の一考察防災を考慮した水中調査の一考察
防災を考慮した水中調査の一考察Naoto MATSUMOTO
 
旅するパケットの見える化
旅するパケットの見える化旅するパケットの見える化
旅するパケットの見える化Naoto MATSUMOTO
 
LTE-M/NB IoTを試してみる nRF9160/Thingy:91
LTE-M/NB IoTを試してみる nRF9160/Thingy:91LTE-M/NB IoTを試してみる nRF9160/Thingy:91
LTE-M/NB IoTを試してみる nRF9160/Thingy:91Naoto MATSUMOTO
 
災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化Naoto MATSUMOTO
 
Network Adapter Deep dive
Network Adapter Deep diveNetwork Adapter Deep dive
Network Adapter Deep diveNaoto MATSUMOTO
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep diveNaoto MATSUMOTO
 
ADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheetADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheetNaoto MATSUMOTO
 
3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet3/4G USB modem Cheat Sheet
3/4G USB modem Cheat SheetNaoto MATSUMOTO
 
How To Train Your ARM(SBC)
How To  Train Your ARM(SBC)How To  Train Your ARM(SBC)
How To Train Your ARM(SBC)Naoto MATSUMOTO
 
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~Naoto MATSUMOTO
 
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)Naoto MATSUMOTO
 
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化Naoto MATSUMOTO
 
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)Naoto MATSUMOTO
 

Mehr von Naoto MATSUMOTO (20)

Alder Lake-S CPU Temperature Monitoring
Alder Lake-S CPU Temperature MonitoringAlder Lake-S CPU Temperature Monitoring
Alder Lake-S CPU Temperature Monitoring
 
CPU製品出荷状況と消費電力の見える化
CPU製品出荷状況と消費電力の見える化CPU製品出荷状況と消費電力の見える化
CPU製品出荷状況と消費電力の見える化
 
5Gの見える化
5Gの見える化5Gの見える化
5Gの見える化
 
2023年以降のサーバークラスタリング設計(メモ)
2023年以降のサーバークラスタリング設計(メモ)2023年以降のサーバークラスタリング設計(メモ)
2023年以降のサーバークラスタリング設計(メモ)
 
防災を考慮した水中調査の一考察
防災を考慮した水中調査の一考察防災を考慮した水中調査の一考察
防災を考慮した水中調査の一考察
 
旅するパケットの見える化
旅するパケットの見える化旅するパケットの見える化
旅するパケットの見える化
 
LTE-M/NB IoTを試してみる nRF9160/Thingy:91
LTE-M/NB IoTを試してみる nRF9160/Thingy:91LTE-M/NB IoTを試してみる nRF9160/Thingy:91
LTE-M/NB IoTを試してみる nRF9160/Thingy:91
 
災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化災害時における無線モニタリングによる社会インフラの見える化
災害時における無線モニタリングによる社会インフラの見える化
 
AMDGPU ROCm Deep dive
AMDGPU ROCm Deep diveAMDGPU ROCm Deep dive
AMDGPU ROCm Deep dive
 
Network Adapter Deep dive
Network Adapter Deep diveNetwork Adapter Deep dive
Network Adapter Deep dive
 
RTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep diveRTL2838 DVB-T Deep dive
RTL2838 DVB-T Deep dive
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep dive
 
ADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheetADS-B, AIS, APRS cheatsheet
ADS-B, AIS, APRS cheatsheet
 
curl --http3 cheatsheet
curl --http3 cheatsheetcurl --http3 cheatsheet
curl --http3 cheatsheet
 
3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet3/4G USB modem Cheat Sheet
3/4G USB modem Cheat Sheet
 
How To Train Your ARM(SBC)
How To  Train Your ARM(SBC)How To  Train Your ARM(SBC)
How To Train Your ARM(SBC)
 
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
全国におけるCOVID-19対策の見える化 ~宿泊業の場合~
 
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
我が国の電波の使用状況/携帯電話向け割当 (2019年3月1日現在)
 
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
私たちに訪れる(かもしれない)未来と計算機によるモノコトの見える化
 
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
仮想化環境におけるバイナリー・ポータビリティの考察 (WebAssemblyの場合)
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

BeautifulSoup / selenium Deep dive

  • 1. BeautifulSoup/Selenium Deep dive 6th May, 2020 SAKURA Internet, Inc. Research Center SR / Naoto MATSUMOTO (C) Copyright 1996-2020 SAKURA Internet Inc
  • 2. BeautifulSoup sample code 2 # apt install python3-pip # pip3 install BeautifulSoup4 # pip3 install lxml # vi bs4.py #!/usr/bin/env python3 import re import sys import requests from bs4 import BeautifulSoup as bs4 url = sys.argv[1] key = sys.argv[2] html = requests.get(url) soup = bs4(html.content,'lxml',from_encoding='utf-8') for script in soup(["script", "style"]): script.decompose() text = soup.get_text() regex = re.compile(key) for line in text.splitlines(): if line: match = regex.search(line) if match: print("%s, %s" % (url, line)) # python3 bs4.py http://www.sakura.ad.jp VPS http://www.sakura.ad.jp, さくらのVPS SOURCE: SAKURA Internet Research Center (2020/05)
  • 3. Selenium sample code 3 # apt install python-pip curl -y # apt install -y unzip xvfb libxi6 libgconf-2-4 -y # apt install chromium-chromedriver -y # pip install selenium # pip install pyvirtualdisplay # vi se.py # encoding: utf-8 import sys import time from selenium import webdriver from selenium.webdriver.chrome.options import Options options = Options() options.add_argument('--headless') options.add_argument('disable-infobars') options.add_argument('--no-sandbox') driver = webdriver.Chrome(chrome_options=options) word = "site:" + sys.argv[1] + " " + sys.argv[2] url= "https://www.google.com/search?q={}&safe=off".format(word) driver.get(url) time.sleep(1) for i, g in enumerate(driver.find_elements_by_class_name("g")): s = g.find_element_by_class_name("s") x = s.find_element_by_class_name("st").text print(x) driver.quit() SOURCE: SAKURA Internet Research Center (2020/05)