This document provides 10 tips for improving Perl performance. Some key tips include using a profiler like Devel::NYTProf to identify bottlenecks, optimizing database queries with DBI, choosing fast hash storage like BerkeleyDB, avoiding serialization with Data::Dumper in favor of faster options like JSON::XS, and considering compiling Perl without threads for a potential 15% speed boost. Proper use of profiling is emphasized to avoid wasting time optimizing the wrong parts of code.
3. Ground Rules
● Build a repeatable test to measure progress against
○ Sometimes turns up surprises
● Use a profiler (Devel::NYTProf) to find where the time is going
○ Don't flail and waste time optimizing the wrong things!
● Try to weigh the cost of developer time vs buying more hardware
○ Optimization is crack for developers, hard to know when to stop
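A repeatable test can be as small as a script built on the core Benchmark module. The sketch below is only an illustration of that idea, not something from the slides: the two subs (join_concat, dot_concat) and the data are hypothetical stand-ins for whatever code you actually want to measure.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical workload: two ways to build one string from many pieces.
my @pieces = map { "item$_" } 1 .. 1_000;

sub join_concat { return join ',', @pieces }

sub dot_concat {
    my $out = '';
    $out .= "$_," for @pieces;
    chop $out;    # drop the trailing comma
    return $out;
}

# Check that both candidates agree before timing them.
die "candidates disagree" unless join_concat() eq dot_concat();

# Run each candidate for at least 1 CPU second, quietly, then print
# a comparison table of the rates.
my $results = timethese( -1, {
    join => \&join_concat,
    dot  => \&dot_concat,
}, 'none' );
cmpthese($results);
```

Benchmark answers "which of these is faster"; to find where a whole program spends its time, run it as perl -d:NYTProf yourscript.pl and then run nytprofhtml to generate the report.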
4. 1. The Big Picture
● The biggest gains usually come from changing your high-level approach
○ Is there a more efficient algorithm?
○ Can you restructure to reduce duplicated effort?
● Sometimes you just need to tune your SQL
● A boatload of RAM hides a multitude of sins
● The bottleneck is usually I/O
○ Files
○ Database
○ Network
○ Batch I/O often makes a huge difference
5. 2. Use DBI Efficiently
● Can make a huge difference in tight loops with many small queries
● connect_cached() avoids connection overhead
○ Or use your favorite connection cache, but beware overuse of ping()
● prepare_cached() avoids object creation and server-side prepare overhead
● Use bind parameters to reuse SQL statements instead of creating new ones
6. 2. Use DBI Efficiently
● Use bind_columns() in a fetch() loop for the most efficient retrieval.
○ Less copying is faster.
○ Alternatively, use fetchrow_arrayref()
● prepare() followed by many execute() calls is faster than repeated do() calls
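The DBI calls named on the last two slides fit together roughly as follows. This is a minimal sketch under stated assumptions: it uses an in-memory SQLite database (so it needs DBD::SQLite installed) and a hypothetical users table and names_older_than() helper, none of which come from the slides.

```perl
use strict;
use warnings;
use DBI;

# Assumption: an in-memory SQLite database (needs DBD::SQLite) with a
# hypothetical "users" table stands in for your real database.
my @dsn = ( 'dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 } );
my $dbh = DBI->connect_cached(@dsn);

$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)');

# prepare() once, execute() many times.
my $insert = $dbh->prepare('INSERT INTO users (name, age) VALUES (?, ?)');
$insert->execute(@$_) for [ 'alice', 31 ], [ 'bob', 17 ], [ 'carol', 45 ];

# Return the names of users who are at least $min_age years old.
sub names_older_than {
    my ($min_age) = @_;

    # Same arguments as above, so DBI hands back the cached handle.
    my $dbh = DBI->connect_cached(@dsn);

    # Prepared on the first call; later calls reuse the cached statement.
    my $sth = $dbh->prepare_cached(
        'SELECT id, name FROM users WHERE age >= ? ORDER BY id');
    $sth->execute($min_age);    # bind parameter, no SQL string building

    # bind_columns() ties each column to a variable that fetch() refills,
    # so no per-row array is built.
    my ( $id, $name );
    $sth->bind_columns( \$id, \$name );

    my @names;
    while ( $sth->fetch ) {
        push @names, $name;
    }
    return @names;
}

print join( ', ', names_older_than(18) ), "\n";
print join( ', ', names_older_than(40) ), "\n";
```

The second call to names_older_than() pays for neither a new connection nor a new prepare; only the bound value changes.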
7. 2. Use DBI Efficiently
● Turn off AutoCommit for batch changes
○ Committing every thousand rows or so saves work for your database
● Use your database's bulk loader when possible
○ Writing rows to CSV and using MySQL's LOAD DATA INFILE crushes the fastest DBI code
○ 10X speedup is not unusual
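Batching commits might look something like this. Again a sketch, not the author's code: the in-memory SQLite database (needs DBD::SQLite), the events table, and the row count are all assumptions made for illustration, and the batch size of 1000 simply follows the rule of thumb above.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory SQLite (needs DBD::SQLite) and a hypothetical
# "events" table. AutoCommit is off, so nothing is committed until we say so.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 0 } );
$dbh->do('CREATE TABLE events (id INTEGER, label TEXT)');

my $sth        = $dbh->prepare('INSERT INTO events (id, label) VALUES (?, ?)');
my $batch_size = 1_000;
my $pending    = 0;

for my $id ( 1 .. 2_500 ) {
    $sth->execute( $id, "event $id" );
    if ( ++$pending >= $batch_size ) {
        $dbh->commit;    # one commit per batch instead of one per row
        $pending = 0;
    }
}
$dbh->commit if $pending;    # flush the final partial batch

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM events');
print "inserted $count rows\n";
$dbh->disconnect;
```

If a batch fails partway, rollback() discards only the rows since the last commit, so the batch size also bounds how much work you repeat after an error.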
8. 2. Use DBI Efficiently
● Use ORMs wisely
○ Consider using straight DBI for the most performance-sensitive sections
■ Removing a layer means fewer method calls and faster code
○ Write report queries by hand if they seem slow
■ Optimizer hints and choices about SQL variations are beyond the scope of ORMs but make a huge difference for this kind of query
9. 3. Choose the Fastest Hash Storage
● memcached is not the fastest option for a local cache
○ BerkeleyDB (not DB_File!) and Cache::FastMmap are about twice as fast
● CHI abstracts the storage layer
○ Useful if you think network strategy may change later
10. 3. Choose the Fastest Hash Storage
Cache                        Get time   Set time   Run time
CHI::Driver::Memory          0.03ms     0.05ms     0.35s
BerkeleyDB                   0.05ms     0.17ms     0.57s
Cache::FastMmap              0.06ms     0.09ms     0.62s
CHI::Driver::File            0.10ms     0.26ms     1.11s
Cache::Memcached::Fast       0.12ms     0.15ms     1.23s
Memcached::libmemcached      0.14ms     0.16ms     1.40s
CHI::Driver::DBI (SQLite)    0.11ms     1.94ms     2.05s
Cache::Memcached             0.29ms     0.21ms     2.88s
CHI::Driver::DBI (MySQL)     0.45ms     0.33ms     4.41s
11. 4. Generate Code and Compile to a Subroutine
● This is how most templating tools work.
● Remove the cost of things that won't change for a while
○ Skip re-parsing templates
○ Skip large groups of conditionals
○ Choose architecture-specific code
my %subs;
my $thing = 'world';    # fixed at generation time, baked into the code
my $code  = qq{print "Hello $thing\\n";};
$subs{'hello'} = eval "sub { $code }" or die $@;
$subs{'hello'}->();
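To make "skip large groups of conditionals" concrete, here is one way the idea could look in practice. Everything in it is hypothetical (the rule format, compile_filter(), the sample records); it is a sketch of the technique, not a recommendation of this exact design. The rule list is walked once at compile time, so the per-record sub contains only the comparisons themselves.

```perl
use strict;
use warnings;

# Hypothetical rule list, e.g. loaded from configuration at startup.
my @rules = (
    { field => 'age',  op => '>=', value => 18 },
    { field => 'city', op => 'eq', value => 'Oslo' },
);

# Only these operators may appear in generated code.
my %allowed_op = map { $_ => 1 } qw(== != >= <= > < eq ne);

# Turn the rules into Perl source for one anonymous sub and compile it.
sub compile_filter {
    my @rules = @_;
    my @tests;
    for my $rule (@rules) {
        die "bad operator: $rule->{op}" unless $allowed_op{ $rule->{op} };
        die "bad field: $rule->{field}" unless $rule->{field} =~ /\A\w+\z/;

        # Escape the value so it is safe inside a single-quoted literal.
        ( my $value = $rule->{value} ) =~ s/([\\'])/\\$1/g;
        push @tests, "\$_[0]{'$rule->{field}'} $rule->{op} '$value'";
    }
    my $source = 'sub { ' . ( join( ' && ', @tests ) || '1' ) . ' }';
    my $sub = eval $source or die "compile failed: $@";
    return $sub;
}

my $filter = compile_filter(@rules);    # pay the generation cost once

my @people = (
    { name => 'Ann', age => 30, city => 'Oslo' },
    { name => 'Bo',  age => 12, city => 'Oslo' },
    { name => 'Cy',  age => 40, city => 'Bergen' },
);
my @matches = grep { $filter->($_) } @people;
print "$_->{name}\n" for @matches;
```

Because the generated source is passed to eval, anything interpolated into it must be validated or escaped first, which is what the operator whitelist, the field-name check, and the quote escaping are for.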
12. 5. Sling Text Efficiently
● Slurp files when possible.
my $text = do { local $/; <$fh> };
● Seems obvious, but I still see people doing this:
my @lines = <$fh>;
my $text = join('', @lines);
● Consider memory with huge files.
13. 5. Sling Text Efficiently
● Use a "sliding window" to search very large files.
○ Too big to slurp, but line-by-line is slow.
○ Chunks of 8K or 16K are much faster, but require bookkeeping code.
○ http://www.perlmonks.org/?node_id=128925
● Use the cheapest string tests you can get away with.
○ index() beats a regex when you just want to know if a string contains another string
● Use a fast CSV parser
○ Text::CSV_XS is much faster than the regexes you copied from that web page.
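The bookkeeping for a sliding window is mostly about matches that straddle a chunk boundary. The following is a minimal sketch of one way to handle that, using index() for the literal search; find_in_file() and its parameters are hypothetical names, and it is not the code from the PerlMonks node linked above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the byte offset of the first occurrence of $needle in the file,
# or -1 if it is absent. Reads fixed-size chunks and carries over the last
# length($needle) - 1 bytes so a match split across two chunks is found.
sub find_in_file {
    my ( $path, $needle, $chunk_size ) = @_;
    $chunk_size ||= 16 * 1024;
    my $overlap = length($needle) - 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my $window = '';    # carried-over tail plus the newest chunk
    my $base   = 0;     # file offset of the first byte in $window

    while ( read( $fh, my $chunk, $chunk_size ) ) {
        $window .= $chunk;
        my $pos = index( $window, $needle );    # cheap literal search
        return $base + $pos if $pos >= 0;

        # Slide: drop everything except the tail that could start a match.
        my $drop = length($window) - $overlap;
        if ( $drop > 0 ) {
            substr( $window, 0, $drop, '' );
            $base += $drop;
        }
    }
    return -1;
}

# Demo: place the needle so that it straddles the first 8K boundary.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} 'x' x 8_190, 'needle', 'y' x 10_000;
close $tmp or die "Cannot close $path: $!";

print find_in_file( $path, 'needle', 8 * 1024 ), "\n";    # prints 8190
```

Memory use stays at roughly one chunk plus the overlap, no matter how large the file is.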
14. 6. Replace LWP With Something Faster
● LWP is amazing, but lighter modules, especially those built on C libraries, tend to be faster.
○ LWP::Curl (built on libcurl)
○ HTTP::Lite (pure Perl, but minimal)
○ Maybe HTTP::Async for parallel requests
LWP 32.8/s
HTTP::Async 64.5/s
HTTP::Lite 200/s
LWP::Curl 1000/s
15. 7. Use a Fast Serializer
● Data::Dumper is great for debugging, but slow for serialization.
● JSON::XS is the new speed king, and is human-readable and cross-language.
● Storable handles more and is second-best in speed.
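A round trip through both recommended serializers takes only a few lines. One assumption to flag: the sketch loads JSON::PP, which ships with Perl, so that it runs on a stock install. JSON::XS exports the same encode_json and decode_json functions, so for the speed shown in the benchmark you would swap the use line for use JSON::XS. The $record data is made up.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);    # assumption: stand-in for JSON::XS
use Storable qw(freeze thaw);

# Hypothetical data structure to serialize.
my $record = { id => 42, tags => [ 'perl', 'fast' ], nested => { ok => 1 } };

# JSON: text output, human-readable, usable from other languages.
my $json      = encode_json($record);
my $from_json = decode_json($json);

# Storable: binary and Perl-only, but it also handles data that JSON
# cannot represent directly, such as blessed objects.
my $frozen      = freeze($record);
my $from_frozen = thaw($frozen);

print "JSON round trip:     $from_json->{tags}[1]\n";
print "Storable round trip: $from_frozen->{tags}[1]\n";
```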
16. 7. Use a Fast Serializer
YAML 84.7/s
XML::Simple 800/s
Data::Dumper 2143/s
FreezeThaw 2635/s
YAML::Syck 4307/s
JSON::Syck 4654/s
Storable 9774/s
JSON::XS 41473/s
17. 8. Avoid Startup Costs
● Use a daemon to run code persistently
○ Skip the costs of compiling
○ Cache data
○ Open connections ahead of time
● mod_perl, FastCGI, Plack, etc. for web
● PPerl for command-line
○ Or hit your web server with lwp-request
18. 9. Sometimes You Have to Get Crazy
● Use the @_ array directly to avoid copying
sub add_to_sql {
    my $sqlbase = shift;    # hashref
    my ($name, $value) = @_;
    if ($value) {
        push(@{ $sqlbase->{'names'} }, $name);
        push(@{ $sqlbase->{'values'} }, $value);
    }
    return $sqlbase;
}
19. 9. Sometimes You Have to Get Crazy
sub add_to_sql {
    # takes 3 params: hashref, name, and value
    return if not $_[2];
    push(@{ $_[0]->{'names'} }, $_[1]);
    push(@{ $_[0]->{'values'} }, $_[2]);
}
● 40% faster than original
● More than 40% harder to read
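The trick works because the elements of @_ are aliases to the caller's arguments rather than copies. A small demonstration of the difference, using two hypothetical helper subs that are not from the slides:

```perl
use strict;
use warnings;

# Copying version: $text is a private copy, so the caller's variable
# is left untouched.
sub trim_copy {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Aliasing version: $_[0] is the caller's own variable, so this edits
# it in place and skips the copy.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

my $original = '  padded  ';
my $trimmed  = trim_copy($original);
print "[$original] [$trimmed]\n";    # caller's value unchanged by the copy

trim_in_place($original);
print "[$original]\n";               # caller's value modified through the alias
```

The aliasing cuts both ways: passing a literal, as in trim_in_place('  x  '), dies with "Modification of a read-only value attempted", which is part of why this style costs readability and safety.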
20. 10. Consider Compiling Your Own Perl
● Compiling without threads can be good for a free 15% or so.
● No code changes needed!
● Has maintenance costs.
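Before rebuilding anything, it is worth checking whether the perl you already run was built with threads. A small sketch using the core Config module:

```perl
use strict;
use warnings;
use Config;

# $Config{usethreads} is "define" when this perl was built with ithreads
# and undefined otherwise.
my $threaded = defined $Config{usethreads} && $Config{usethreads} eq 'define';

printf "perl %s at %s was built %s thread support\n",
    $Config{version}, $^X, $threaded ? 'WITH' : 'without';
```

The same answer is available from the command line with perl -V:usethreads.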
21. Resources
Tim Bunce's Advanced DBI slides:
http://www.slideshare.net/Tim.Bunce/dbi-advanced-tutorial-2007
Also see Tim's NYTProf slides:
http://www.slideshare.net/Tim.Bunce/develnytprof-v4-at-oscon-201007
man perlperf
Programming Perl appendix on performance