Even experienced software developers often guess wrong about where the performance bottlenecks are in their programs. Therefore, profile your program to see where the performance bottlenecks are and concentrate on optimizing them.
Erlang/OTP contains several tools to help find bottlenecks:
fprof provides the most detailed information about where the program time is spent, but it significantly slows down the program it profiles.
eprof provides timing information for each function used in the program. No call graph is produced, but eprof has considerably less impact on the program it profiles.
If the program is too large to be profiled by fprof or eprof, the cover and cprof tools can be used to locate code parts that are to be more thoroughly profiled using fprof or eprof.
cover provides execution counts per line per process, with less overhead than fprof. Execution counts can, with some caution, be used to locate potential performance bottlenecks.
cprof is the most lightweight tool, but it only provides execution counts on a function basis (for all processes, not per process).
The tools are further described below.
For a large system, it can be interesting to run profiling on a simulated and limited scenario to start with. But bottlenecks have a tendency to appear or cause problems only when many things are going on at the same time, and when many nodes are involved. Therefore, it is also desirable to run profiling in a system test plant on a real target system.
For a large system, you do not want to run the profiling tools on the whole system. Instead, concentrate on the central processes and modules, which account for a large part of the execution.
When analyzing the result file from the profiling activity, look for functions that are called many times and have a long "own" execution time (time excluding calls to other functions). Functions that are merely called very often can also be interesting, as even small costs add up if repeated often. Also ask yourself what you can do to reduce this time. The following are appropriate types of questions to ask yourself:
These questions are not always trivial to answer. Some benchmarks might be needed to back up your theory and to avoid making things slower if your theory is wrong. For details, see the discussion of benchmarking below.
fprof measures the execution time for each function, both own time, that is, how much time a function has used for its own execution, and accumulated time, that is, including called functions. The values are displayed per process. You also get to know how many times each function has been called.
fprof is based on trace to file to minimize runtime performance impact. Using fprof is just a matter of calling a few library functions; see the fprof manual page in Tools.
fprof was introduced in R8.
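As an illustration of the "few library functions" mentioned above, the following is a minimal sketch and not a recommended workflow. The module name fprof_sketch, the workload work/1, and the output file name fprof.analysis are assumptions made for this example. It traces one function call, processes the trace, and writes an analysis sorted by own time.

```erlang
%% Save as fprof_sketch.erl. All names in this module are hypothetical.
-module(fprof_sketch).
-export([run/0, work/1]).

%% A small workload to profile (assumed for illustration only).
work(N) ->
    lists:sum([X * X || X <- lists:seq(1, N)]).

run() ->
    %% Trace the call to work/1; the trace goes to the default file "fprof.trace".
    _ = fprof:apply(fun work/1, [10000]),
    %% Process the trace file into profiling data held by the fprof server.
    ok = fprof:profile(),
    %% Write the analysis to a file, sorted by own time instead of accumulated time.
    ok = fprof:analyse([{dest, "fprof.analysis"}, {sort, own}]),
    _ = fprof:stop(),
    ok.
```

Sorting by own time puts functions with a long "own" execution time near the top of the analysis file.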
eprof is based on the Erlang
eprof shows how much time has been used by each process, and in which function calls this time has been spent. Time is shown both as a percentage of total time and as absolute time. For more information, see the eprof manual page in Tools.
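A minimal sketch of an eprof session could look as follows. The module name eprof_sketch and the profiled fun are assumptions made for this example; the eprof calls are the ones described in the manual page.

```erlang
%% Save as eprof_sketch.erl. The module name and workload are hypothetical.
-module(eprof_sketch).
-export([run/0]).

run() ->
    %% Start the eprof server (the result is ignored in case it is already running).
    _ = eprof:start(),
    %% Profile a fun in the calling process; eprof returns the fun's result.
    {ok, Sum} = eprof:profile(fun() -> lists:sum(lists:seq(1, 100000)) end),
    %% Print time per function, as percentage and absolute time, to standard output.
    _ = eprof:analyze(),
    _ = eprof:stop(),
    Sum.
```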
The primary use of cover is coverage analysis to verify test cases, making sure that all relevant code is covered. cover counts how many times each executable line of code is executed when a program is run, on a per module basis.
This information can also be used to determine which code is run very frequently and is therefore a candidate for optimization. Using cover is just a matter of calling a few library functions; see the cover manual page in Tools.
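The sketch below shows one way the per-line counts could be used to find frequently executed lines. The module name cover_sketch and the helpers hot_lines/3 and top/2 are hypothetical. The sketch also assumes that the source file of the module under study can be found by cover:compile_module/1, for example in the current directory.

```erlang
%% Save as cover_sketch.erl. hot_lines/3 and top/2 are hypothetical helpers.
-module(cover_sketch).
-export([hot_lines/3, top/2]).

%% Cover-compile Module, run Fun to exercise it, and return the N most
%% frequently executed lines as [{{Module, Line}, Calls}].
hot_lines(Module, Fun, N) ->
    _ = cover:start(),
    {ok, Module} = cover:compile_module(Module),
    _ = Fun(),
    {ok, Counts} = cover:analyse(Module, calls, line),
    _ = cover:stop(),
    top(Counts, N).

%% Sort by execution count, highest first, and keep the first N entries.
top(Counts, N) ->
    lists:sublist(lists:reverse(lists:keysort(2, Counts)), N).
```

Treat the counts with the caution mentioned earlier: a line that is executed often is not necessarily a line where much time is spent.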
cprof is something in between fprof and cover regarding features. It counts how many times each function is called when the program is run, on a per module basis. cprof has a low performance degradation effect (compared with fprof) and does not need to recompile any modules to profile (compared with cover). For more information, see the cprof manual page in Tools.
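A minimal sketch of counting calls in a single module with cprof follows. The module name cprof_sketch and the helper count_calls/2 are hypothetical. The module under study is loaded first, on the assumption that call counting for a single module only applies to code that is already loaded.

```erlang
%% Save as cprof_sketch.erl. count_calls/2 is a hypothetical helper.
-module(cprof_sketch).
-export([count_calls/2]).

%% Count calls to functions in Module while Fun runs.
%% Returns {Module, ModuleCallCount, FunctionCallCounts}.
count_calls(Module, Fun) ->
    {module, Module} = code:ensure_loaded(Module),
    _ = cprof:start(Module),   % enable call counting for Module
    _ = Fun(),
    _ = cprof:pause(),         % freeze the counters before reading them
    Result = cprof:analyse(Module),
    _ = cprof:stop(),          % remove the call count breakpoints
    Result.
```

For example, `cprof_sketch:count_calls(calendar, fun() -> calendar:day_of_the_week(1896, 4, 27) end)` returns the call counts for the functions in calendar that were called.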
| Tool | Results | Size of Result | Effects on Program Execution Time | Records Number of Calls | Records Execution Time | Records Called by | Records Garbage Collection |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fprof | Per process to screen/file | Large | Significant slowdown | Yes | Total and own | Yes | Yes |
| eprof | Per process/function to screen/file | Medium | Small slowdown | Yes | Only total | No | No |
| cover | Per module to screen/file | Small | Moderate slowdown | Yes, per line | No | No | No |
| cprof | Per module to caller | Small | Small slowdown | Yes | No | No | No |
The main purpose of benchmarking is to find out which implementation of a given algorithm or function is the fastest. Benchmarking is far from an exact science. Today's operating systems generally run background tasks that are difficult to turn off. Caches and multiple CPU cores do not make benchmarking any easier. It would be best to run UNIX computers in single-user mode when benchmarking, but that is inconvenient, to say the least, for casual testing.
Benchmarks can measure wall-clock time or CPU time.
timer:tc/3 measures wall-clock time. The advantage of wall-clock time is that I/O, swapping, and other activities in the operating system kernel are included in the measurements. The disadvantage is that the measurements vary a lot. Usually it is best to run the benchmark several times and note the shortest time, which is the minimum time achievable under the best of circumstances.
runtime measures CPU time spent in the Erlang virtual machine. The advantage with CPU time is that the results are more consistent from run to run. The disadvantage is that the time spent in the operating system kernel (such as swapping and I/O) is not included. Therefore, measuring CPU time is misleading if any I/O (file or socket) is involved.
It is probably a good idea to do both wall-clock measurements and CPU time measurements.
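One way to take both measurements is sketched below. The module name bench_sketch and the helper measure/4 are hypothetical. The sketch assumes that CPU time is read with statistics(runtime), which returns the total runtime and the runtime since the previous call, both in milliseconds. It repeats the call with timer:tc/3 and keeps the shortest wall-clock time.

```erlang
%% Save as bench_sketch.erl. measure/4 is a hypothetical helper.
-module(bench_sketch).
-export([measure/4]).

%% Run apply(M, F, A) Runs times.
%% Returns {MinWallClockMicroseconds, CpuMillisecondsForAllRuns}.
measure(M, F, A, Runs) when is_integer(Runs), Runs > 0 ->
    %% Reset the "since last call" runtime counter.
    _ = statistics(runtime),
    %% timer:tc/3 returns {Microseconds, Result}; keep only the time.
    Walls = [element(1, timer:tc(M, F, A)) || _ <- lists:seq(1, Runs)],
    %% CPU time used since the reset above. The counter covers the whole
    %% virtual machine, so other active processes inflate it.
    {_, CpuMs} = statistics(runtime),
    {lists:min(Walls), CpuMs}.
```

For example, `bench_sketch:measure(lists, seq, [1, 1000000], 10)` returns the shortest wall-clock time of ten runs together with the CPU time consumed by all ten.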
Some final advice:
© 2010–2017 Ericsson AB
Licensed under the Apache License, Version 2.0.