grep.metacpan.org
or ack.metacpan.org ?
or Yet Another grep for the CPAN !
TPC 2017, Nicolas, atoomic@cpan.org, @atoomic
The Original One
from David Leadbeater
Other alternatives ?
• ☞ CPAN::Visitor
• quick & dirty solution ?
  • download all the tarballs
  • extract them
  • use “grep -r” or ack ?
And YES this works !
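
The quick & dirty route above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the talk: the function names `extract_all` and `grep_tree` and the directory layout are assumptions, and a real run over the whole CPAN would more likely call `grep -r` or ack directly.

```python
import re
import tarfile
from pathlib import Path


def extract_all(tarball_dir: Path, dest: Path) -> None:
    """Extract the regular files of every *.tar.gz under tarball_dir into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    dest_root = dest.resolve()
    for tarball in sorted(tarball_dir.rglob("*.tar.gz")):
        with tarfile.open(tarball, "r:gz") as archive:
            for member in archive.getmembers():
                if not member.isfile():
                    continue  # skip directories, links and devices
                target = (dest / member.name).resolve()
                if dest_root not in target.parents:
                    continue  # skip members that would land outside dest
                target.parent.mkdir(parents=True, exist_ok=True)
                with archive.extractfile(member) as src:
                    target.write_bytes(src.read())


def grep_tree(root: Path, pattern: str):
    """Yield (relative path, line number, line) per matching line, like `grep -rn`."""
    regex = re.compile(pattern)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps binary or oddly encoded files from aborting the scan
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.relative_to(root), lineno, line.rstrip("\n")
```

Usage would look like `extract_all(Path("tarballs"), Path("extracted"))` followed by `for hit in grep_tree(Path("extracted"), r"My::Package"): print(*hit)`. Every search rereads every file, which is the cost the next slides try to reduce.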
Could we improve it ?
Yes we can !
> git init .
> git add .
> git commit -m init
> git tag root
> git grep My::Package
# git alias start
you said, git grep ?
-E, --extended-regexp, -G, --basic-regexp
Use POSIX extended/basic regexp for
patterns. Default is to use basic regexp.
-P, --perl-regexp
Use Perl-compatible regexp for patterns.
Requires libpcre to be compiled in.
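
To make the three regexp flavours concrete, here is a small Python sketch that builds and runs a `git grep` command for a chosen flavour. The helper names (`git_grep_argv`, `git_grep`) and the `flavour` parameter are made up for this example; only the `-G`, `-E`, `-P`, `-n` and `-e` options and the `root` tag come from git and the previous slide.

```python
import subprocess

# git grep options for each regexp flavour described above
FLAVOUR_FLAGS = {
    "basic": "-G",     # POSIX basic regexp (the default)
    "extended": "-E",  # POSIX extended regexp
    "perl": "-P",      # Perl-compatible regexp, needs git built with libpcre
}


def git_grep_argv(pattern, flavour="basic", tree="root"):
    """Build the argument list for `git grep` against a tagged tree."""
    if flavour not in FLAVOUR_FLAGS:
        raise ValueError(f"unknown regexp flavour: {flavour!r}")
    # -n adds line numbers; -e keeps a pattern starting with "-" from
    # being parsed as an option
    return ["git", "grep", "-n", FLAVOUR_FLAGS[flavour], "-e", pattern, tree]


def git_grep(repo_dir, pattern, flavour="basic", tree="root"):
    """Run git grep inside repo_dir and return the matching lines."""
    result = subprocess.run(
        git_grep_argv(pattern, flavour, tree),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # git grep exits with 1 when nothing matched; anything higher is an error
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

For instance, `git_grep(".", r"My::Package\b", flavour="perl")` would fail with a `RuntimeError` on a git built without libpcre, which is the limitation the man page excerpt warns about.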
Pushing git to the limits
1,042,384 files (1,008,366 for the extracted CPAN)
~17 GB git repo
~1.5 GB .git index
GitHub…
Build a frontend
very basic Dancer App
daily cron to update the git repo
Let’s use the website
Filter
How Fast ?
worst case: no match - timeout ~15 sec
easy search: pretty fast ~5 sec
Optimizations
• return when we have enough results (for first page)
• keep running the query in background
• queries are cached
• only do “grep -P” when required
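
Three of these optimizations can be sketched in Python as follows. This is an assumption-laden illustration, not the site's Dancer code: `PAGE_SIZE`, `needs_pcre` and `SearchCache` are hypothetical names, the PCRE check is a rough heuristic, and the "keep running the query in background" step is left out.

```python
import re
from itertools import islice

PAGE_SIZE = 25  # hypothetical size of the first result page

# Rough, non-exhaustive heuristic: constructs that need a Perl-compatible
# engine, namely \d or \D, "(?...)" groups, and lazy quantifiers such as ".*?"
_PCRE_ONLY = re.compile(r"\\[dD]|\(\?|[*+?]\?")


def needs_pcre(pattern):
    """Return True when the pattern looks like it needs `grep -P`."""
    return _PCRE_ONLY.search(pattern) is not None


class SearchCache:
    """Cache the first page of results for each pattern."""

    def __init__(self, search, page_size=PAGE_SIZE):
        self._search = search  # callable: pattern -> iterator of result lines
        self._page_size = page_size
        self._pages = {}

    def first_page(self, pattern):
        if pattern not in self._pages:
            # islice stops pulling from the search once the page is full,
            # so a lazy search does no more work than the page requires
            results = islice(self._search(pattern), self._page_size)
            self._pages[pattern] = list(results)
        return self._pages[pattern]
```

With a generator-based search function, `SearchCache(search).first_page("My::Package")` returns as soon as 25 lines are available, and a repeated query is answered from the dictionary without touching the repository. The cache here never expires, whereas a real service would have to invalidate it after the daily update.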
Where is it ?
grep.metacpan.org
Known Bugs
• pretty young project
• GitHub tickets
  • PCRE search not working in production
  • Links on module name broken in code extract
patches welcome !
Toolchain Summit 2017 Lyon
Thanks to everyone and all event sponsors
