Top 10 Perl Performance Tips

         Perrin Harkins
       We Also Walk Dogs
Devel::NYTProf
Ground Rules

● Make a repeatable test to measure progress with
   ○ Sometimes turns up surprises
● Use a profiler (Devel::NYTProf) to find where the time is
  going
   ○ Don't flail and waste time optimizing the wrong things!
● Try to weigh the cost of developer time vs buying more
  hardware
   ○ Optimization is crack for developers, hard to know when
     to stop
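
A repeatable test can be as small as a fixed input timed with the core Benchmark module. The sketch below is one possible shape, not a prescribed harness; `build_report` is a hypothetical stand-in for whatever code you are trying to speed up.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical workload standing in for the code under test.
sub build_report {
    my ($rows) = @_;
    my $out = '';
    $out .= join(',', $_, $_ * 2) . "\n" for 1 .. $rows;
    return $out;
}

# Same input on every run, so timings stay comparable across changes.
my $result = build_report(1_000);
my $t      = timeit(200, sub { build_report(1_000) });
print "200 runs: ", timestr($t), "\n";
```

Once the timing is stable, running the same script under `perl -d:NYTProf script.pl` and then `nytprofhtml` shows where the time goes.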
1. The Big Picture

● The biggest gains usually come from changing your
  high-level approach
    ○ Is there a more efficient algorithm?
    ○ Can you restructure to reduce duplicated effort?
● Sometimes you just need to tune your SQL
● A boatload of RAM hides a multitude of sins
● The bottleneck is usually I/O
    ○ Files
    ○ Database
    ○ Network
    ○ Batch I/O often makes a huge difference
2. Use DBI Efficiently

● Can make a huge difference in tight loops with many small
  queries
● connect_cached() avoids connection overhead
    ○ Or use your favorite connection cache, but beware
      overuse of ping()
● prepare_cached() avoids object creation and server-side
  prepare overhead
● Use bind parameters to reuse SQL statements instead of
  creating new ones
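
The calls on this slide fit together roughly as follows. This is a minimal sketch that assumes DBD::SQLite is installed and uses an in-memory database; the `users` table and `lookup_name` helper are made up for illustration.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; any DBI driver would do.
my @dsn = ('dbi:SQLite:dbname=:memory:', '', '',
           { RaiseError => 1, PrintError => 0 });

sub lookup_name {
    my ($id) = @_;
    # connect_cached() returns the same live handle for identical arguments.
    my $dbh = DBI->connect_cached(@dsn);
    # prepare_cached() reuses the statement handle; the ? placeholder keeps
    # the SQL text constant no matter which id is looked up.
    my $sth = $dbh->prepare_cached('SELECT name FROM users WHERE id = ?');
    $sth->execute($id);
    my ($name) = $sth->fetchrow_array;
    $sth->finish;
    return $name;
}

my $dbh = DBI->connect_cached(@dsn);
$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('INSERT INTO users (id, name) VALUES (?, ?)', undef, @$_)
    for [1, 'alice'], [2, 'bob'];

print lookup_name($_), "\n" for 1, 2;
```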
2. Use DBI Efficiently

● Use bind_columns() in a fetch() loop for most efficient retrieval.
    ○ Less copying is faster.
    ○ Alternatively, fetchrow_arrayref()
● prepare() once followed by many execute() calls is faster
  than repeated do() calls
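
A bound-column fetch loop might look like the sketch below. It uses DBI's `bind_columns()` method and again assumes DBD::SQLite with an in-memory database; the `items` table is invented for the example.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; table and rows are illustrative only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)');

# Prepare once, execute many times.
my $ins = $dbh->prepare('INSERT INTO items (id, price) VALUES (?, ?)');
$ins->execute($_, $_ * 1.5) for 1 .. 5;

my $sel = $dbh->prepare('SELECT id, price FROM items ORDER BY id');
$sel->execute;

# bind_columns() attaches each output column to a variable; fetch() then
# updates those variables in place instead of building a new list per row.
my ($id, $price);
$sel->bind_columns(\$id, \$price);

my $total = 0;
while ($sel->fetch) {
    $total += $price;
}
print "total: $total\n";
```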
2. Use DBI Efficiently

● Turn off AutoCommit for batch changes
   ○ Committing every thousand rows or so saves work for
     your database
● Use your database's bulk loader when possible
   ○ Writing rows to CSV and using MySQL's LOAD DATA
     INFILE crushes the fastest DBI code
   ○ 10X speedup is not unusual
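
Batched commits can be sketched like this, assuming DBD::SQLite and a made-up `log` table. The batch size of 1,000 follows the rule of thumb above and is not a tuned value.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the table is illustrative only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });
$dbh->do('CREATE TABLE log (n INTEGER)');
my $sth = $dbh->prepare('INSERT INTO log (n) VALUES (?)');

my $batch_size = 1_000;
$dbh->{AutoCommit} = 0;                      # stop committing after every row
for my $n (1 .. 2_500) {
    $sth->execute($n);
    $dbh->commit if $n % $batch_size == 0;   # one commit per batch
}
$dbh->commit;                                # flush the final partial batch
$dbh->{AutoCommit} = 1;                      # back to per-statement commits

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM log');
print "rows loaded: $count\n";
```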
2. Use DBI Efficiently

● Use ORMs Wisely
   ○ Consider using straight DBI for the most performance
     sensitive sections
      ■ Removing a layer means fewer method calls and
        faster code
   ○ Write report queries by hand if they seem slow
      ■ Optimizer hints and choices about SQL variations are
        beyond the scope of ORMs but make a huge
        difference for this kind of query
3. Choose the Fastest Hash Storage

● memcached is not the fastest option for a local cache
   ○ BerkeleyDB (not DB_File!) and Cache::FastMmap are
     about twice as fast
● CHI abstracts the storage layer
   ○ Useful if you think network strategy may change later
3. Choose the Fastest Hash Storage

Cache                     Get time   Set time   Run time
CHI::Driver::Memory       0.03ms     0.05ms     0.35s
BerkeleyDB                0.05ms     0.17ms     0.57s
Cache::FastMmap           0.06ms     0.09ms     0.62s
CHI::Driver::File         0.10ms     0.26ms     1.11s
Cache::Memcached::Fast    0.12ms     0.15ms     1.23s
Memcached::libmemcached   0.14ms     0.16ms     1.40s
CHI::Driver::DBI SQLite   0.11ms     1.94ms     2.05s
Cache::Memcached          0.29ms     0.21ms     2.88s
CHI::Driver::DBI MySQL    0.45ms     0.33ms     4.41s
4. Generate Code and Compile to a Subroutine
 ● This is how most templating tools work.
 ● Remove the cost of things that won't change for a while
    ○ Skip re-parsing templates
    ○ Skip large groups of conditionals
    ○ Choose architecture-specific code

my %subs;
my $thing = 'World';   # fixed at generation time, so it is baked into the code
my $code  = qq{print "Hello $thing\\n";};
$subs{'hello'} = eval "sub { $code }" or die $@;
$subs{'hello'}->();
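
To show how skipping the re-parse pays off, here is a toy template compiler. It is only a sketch: the `{name}` placeholder syntax and the `compile_template` helper are invented for this example and do not belong to any real templating module.

```perl
use strict;
use warnings;

my %compiled;   # template text => generated subroutine

# Parse the template once, emit Perl source, and cache the compiled sub.
sub compile_template {
    my ($template) = @_;
    return $compiled{$template} ||= do {
        my @parts;
        for my $piece (split /(\{\w+\})/, $template) {
            next if $piece eq '';
            if ($piece =~ /^\{(\w+)\}$/) {
                # Placeholder: becomes a hash lookup on the first argument.
                push @parts, '$_[0]{' . $1 . '}';
            }
            else {
                # Literal text: escape it for a single-quoted Perl string.
                (my $lit = $piece) =~ s/([\\'])/\\$1/g;
                push @parts, "'$lit'";
            }
        }
        my $src = 'sub { join "", ' . join(', ', @parts) . ' }';
        eval $src or die "compile failed: $@";
    };
}

my $hello = compile_template('Hello {name}, you have {count} messages.');
print $hello->({ name => 'World', count => 3 }), "\n";
```

Every later call with the same template text returns the cached subroutine, so the split and the `eval` happen only once.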
5. Sling Text Efficiently

 ● Slurp files when possible.

my $text = do { local $/; <$fh> };

 ● Seems obvious, but I still see people doing this:
my @lines = <$fh>;
my $text = join('', @lines);
 ● Consider memory with huge files.
5. Sling Text Efficiently

 ● Use a "sliding window" to search very large files.
    ○ Too big to slurp, but line-by-line is slow.
    ○ Chunks of 8K or 16K are much faster, but require
      bookkeeping code.
    ○ http://www.perlmonks.org/?node_id=128925
 ● Use the cheapest string tests you can get away with.
    ○ index() beats a regex when you just want to know if a
      string contains another string
 ● Use a fast CSV parser
    ○ Text::CSV_XS is much faster than the regexes you
      copied from that web page.
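
The bookkeeping for a sliding window mostly consists of carrying a small overlap between chunks. The sketch below is one way to do it for a fixed, non-empty search string using `index()`. The `find_in_stream` helper is hypothetical and not taken from the PerlMonks node above, and the tiny chunk size in the demo exists only to force a match across a chunk boundary.

```perl
use strict;
use warnings;

# Return the byte offset of the first occurrence of $needle in the file
# behind $fh, or -1. Reads fixed-size chunks and carries over the last
# length($needle) - 1 bytes so a match spanning two chunks is not missed.
sub find_in_stream {
    my ($fh, $needle, $chunk_size) = @_;
    $chunk_size ||= 16 * 1024;
    my $keep   = length($needle) - 1;
    my $buffer = '';
    my $base   = 0;    # file offset of the first byte in $buffer

    while (read($fh, my $chunk, $chunk_size)) {
        $buffer .= $chunk;
        my $pos = index($buffer, $needle);
        return $base + $pos if $pos >= 0;

        # Slide the window: drop everything except the overlap tail.
        if (length($buffer) > $keep) {
            $base  += length($buffer) - $keep;
            $buffer = $keep ? substr($buffer, -$keep) : '';
        }
    }
    return -1;
}

# Demo on an in-memory filehandle.
my $data = ('a' x 10) . 'needle' . ('b' x 10);
open my $fh, '<', \$data or die $!;
my $offset = find_in_stream($fh, 'needle', 4);
print "found at offset $offset\n";
```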
6. Replace LWP With Something Faster
● LWP is amazing, but modules built on C libraries tend to be
  faster.
    ○ LWP::Curl
    ○ HTTP::Lite
    ○ Maybe HTTP::Async for parallel

             LWP                32.8/s
             HTTP::Async        64.5/s
             HTTP::Lite         200/s
             LWP::Curl          1000/s
7. Use a Fast Serializer

 ● Data::Dumper is great for debugging, but slow for
   serialization.
 ● JSON::XS is the new speed king, and is human-readable
   and cross-language.
 ● Storable handles more and is second-best in speed.
7. Use a Fast Serializer

   YAML                84.7/s
   XML::Simple         800/s
   Data::Dumper        2143/s
   FreezeThaw          2635/s
   YAML::Syck          4307/s
   JSON::Syck          4654/s
   Storable            9774/s
   JSON::XS            41473/s
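
For reference, a round trip through the two fastest options looks like this. The sketch uses the core JSON::PP module as a stand-in so that it runs without extra installs. JSON::XS exports the same `encode_json` and `decode_json` functions, so switching the `use` line should be the only change needed, but that is an assumption to check in your own code.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use JSON::PP qw(encode_json decode_json);   # stand-in for JSON::XS (assumption)

my $data = { id => 42, tags => ['perl', 'speed'], nested => { ok => 1 } };

# Storable: binary, Perl-only, also copes with blessed objects.
my $frozen    = freeze($data);
my $from_stor = thaw($frozen);

# JSON: text, readable, cross-language.
my $json      = encode_json($data);
my $from_json = decode_json($json);

print "$from_stor->{tags}[1] $from_json->{id}\n";
```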
8. Avoid Startup Costs

● Use a daemon to run code persistently
   ○ Skip the costs of compiling
   ○ Cache data
   ○ Open connections ahead of time
● mod_perl, FastCGI, Plack, etc. for web
● PPerl for command-line
   ○ Or hit your web server with lwp-request
9. Sometimes You Have to Get Crazy

 ● Use the @_ array directly to avoid copying

sub add_to_sql {
    my $sqlbase = shift; # hashref
    my ($name, $value) = @_;
    if ($value) {
        push(@{ $sqlbase->{'names'} }, $name);
        push(@{ $sqlbase->{'values'} }, $value);
    }
    return $sqlbase;
}
9. Sometimes You Have to Get Crazy

sub add_to_sql {
   # takes 3 params: hashref, name, and value
   return if not $_[2];

     push(@{ $_[0]->{'names'} }, $_[1]);
     push(@{ $_[0]->{'values'} }, $_[2]);
}

    ● 40% faster than original
    ● More than 40% harder to read
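
The speed difference depends on the Perl build and the workload, so it is worth measuring yourself. A sketch using the core Benchmark module follows; the subroutines are renamed `add_copy` and `add_direct` here only so that both versions can live in one file.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Readable version: copies the arguments into lexicals.
sub add_copy {
    my $sqlbase = shift;
    my ($name, $value) = @_;
    if ($value) {
        push(@{ $sqlbase->{'names'} },  $name);
        push(@{ $sqlbase->{'values'} }, $value);
    }
    return $sqlbase;
}

# Fast version: works on @_ directly.
sub add_direct {
    return if not $_[2];
    push(@{ $_[0]->{'names'} },  $_[1]);
    push(@{ $_[0]->{'values'} }, $_[2]);
}

# Check that both build the same data before trusting any timing.
my (%h1, %h2);
add_copy(\%h1, 'color', 'red');
add_direct(\%h2, 'color', 'red');

cmpthese(-1, {
    copy   => sub { my %h; add_copy(\%h, 'color', 'red') },
    direct => sub { my %h; add_direct(\%h, 'color', 'red') },
});
```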
10. Consider Compiling Your Own Perl

● Compiling without threads can be good for a free 15% or so.
● No code changes needed!
● Has maintenance costs.
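
Before rebuilding anything, it helps to know whether the perl you already run has threads compiled in. A small check using the core Config module:

```perl
use strict;
use warnings;
use Config;

# $Config{usethreads} is 'define' on a threaded build and undef otherwise.
my $threaded = $Config{usethreads} ? 'yes' : 'no';
print "threads compiled in: $threaded\n";
```

The same information is available from the command line with `perl -V:usethreads`.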
Resources

Tim Bunce's Advanced DBI slides:
http://www.slideshare.net/Tim.Bunce/dbi-advanced-tutorial-2007

Also see Tim's NYTProf slides:
http://www.slideshare.net/Tim.Bunce/develnytprof-v4-at-oscon-201007

man perlperf

Programming Perl appendix on performance
Thank you!

Slides will be available on the
     conference website
Avoid tie()

 ● Slower than method calls!
 ● PITA to debug too.
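
To compare a tied read with a method call on your own perl, a sketch along these lines may help. `Counter` is a hypothetical class written for this example, and the `package NAME { ... }` block syntax assumes Perl 5.14 or later.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical counter usable both as a tied scalar and as a plain object.
package Counter {
    sub new        { my ($class) = @_; my $n = 0; return bless \$n, $class }
    sub TIESCALAR  { my ($class) = @_; return $class->new }
    sub FETCH      { my ($self) = @_; return ++$$self }
    sub next_value { my ($self) = @_; return ++$$self }
}

tie my $tied, 'Counter';    # every read of $tied calls FETCH behind the scenes
my $obj = Counter->new;

cmpthese(-1, {
    tied_read   => sub { my $v = $tied },
    method_call => sub { my $v = $obj->next_value },
});
```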
Use a Fast Sort

● For sorting on derived keys, consider a GRT sort.
   ○ Faster than Schwartzian Transform
   ○ Use Sort::Maker to build it.
