PerfSuite
Description
PerfSuite is an easy-to-use collection of tools and libraries to support application software performance analysis on Linux-based systems (x86, x86-64, ia64, ppc64 and ppc32). It includes components to assist with performance measurement tasks, such as hardware performance counting and profiling, and itimer profiling.
How to use PerfSuite
-
PerfSuite's "psrun" tool requires the measured program to be dynamically linked. You can use "file <program_path>" to check this. The default compiler options on Blue Waters build a statically linked executable. Add the "-dynamic" option to the linker ("ld") or the compiler driver ("cc") to create a dynamically linked executable.
-
Load the PerfSuite module:
module load perfsuite
or
module load perfsuite/<specific_version>
-
Use PerfSuite's "psrun" to count or profile an executable without recompiling or relinking. To count, do:
aprun -n <num> psrun -f -p <my_program> <my_pgm_args>
To profile, do:
aprun -n <num> psrun -C -c <profiling_conf_xml> \
-f -p <my_program> <my_pgm_args>
-
Use PerfSuite's "psprocess" to post-process the generated XML files:
psprocess <my_pgm.*.xml>
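The dynamic-linking check from the first step can be scripted. The following is a minimal sketch, not part of PerfSuite: "linkage_of" is a hypothetical helper name, and it assumes the description printed by "file" contains the phrase "dynamically linked" or "statically linked", which is the usual wording for ELF executables.

```shell
# Hypothetical helper (not part of PerfSuite): classify a description
# string as printed by file(1) for an ELF executable.
linkage_of() {
    case "$1" in
        *"dynamically linked"*) echo dynamic ;;
        *"statically linked"*)  echo static ;;
        *)                      echo unknown ;;   # wording not recognized
    esac
}
```

A possible use, with "./my_program" as a placeholder path: linkage_of "$(file -b ./my_program)". If the result is "static", relink with "-dynamic" before running the program under "psrun".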
Examples
module load perfsuite
aprun -n 8 psrun -C -c papi_profile_cycles.xml \
/home/user123/namd-12.3/bin/namd -c a.conf
psprocess namd.0.98765.nid01234.xml
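A multi-rank run produces one XML file per process, so post-processing is often done in a loop. The sketch below is an illustration under stated assumptions: "process_all" and the "PSPROCESS" override variable are hypothetical names, the output files are assumed to match "<prefix>.*.xml" as in the example above, and "psprocess" is assumed to write its report to standard output.

```shell
# Hypothetical helper: post-process every XML file whose name starts with
# the given prefix, saving each report next to its input as a .txt file.
# PSPROCESS defaults to "psprocess"; set it to another command for a dry run.
process_all() {
    prefix=$1
    for xml in "$prefix".*.xml; do
        [ -e "$xml" ] || continue          # the glob matched nothing
        "${PSPROCESS:-psprocess}" "$xml" > "${xml%.xml}.txt"
    done
}
```

For the run above, "process_all namd" would write one text report for each "namd.*.xml" file in the current directory.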
How to use PerfSuite to access Cray Gemini network counters
-
Make sure the "craype-network-gemini", "craype-interlagos" and "papi-5.1.0.2 <or later>" modules are loaded. Currently the first two are in the default module list. The PAPI module is loaded automatically when a user loads the "perfsuite" or "perfsuite/<version>" module.
-
Set the environment variable CRAY_NPU_ACCESS to 1, 2, or 4 depending on your needs. An example:
export CRAY_NPU_ACCESS=1
Please see Cray documentation "Using the PAPI Cray NPU Component" for details.
-
Start the job with "aprun" so that it runs on a compute node and therefore actually uses the Gemini network.
-
Use a PerfSuite configuration file that contains only the Gemini events -- that is, no mixing of PAPI preset/native CPU events such as "PAPI_TOT_CYC" with the Gemini NPU events. This is a PAPI restriction.
An example:
nid25331 $ cat gm_events-2.xml
<?xml version="1.0" encoding="UTF-8" ?>
<ps_hwpc_eventlist class="PAPI">
<ps_hwpc_event name="GM_RMT_PERF_PUT_BYTES_RX" type="native" />
<ps_hwpc_event name="GM_RMT_PERF_SEND_BYTES_RX" type="native" />
</ps_hwpc_eventlist>
nid25331 $ aprun -n 1 psrun -c gm_events-2.xml top -b -n5 > /dev/null
The full list of available Gemini NPU events can be obtained by running "aprun -n 1 papi_native_avail" with the PAPI module loaded. They are the events named "craynpu:::GM_...", close to the end of the output. Event names work both with and without the leading "craynpu:::" prefix.
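Writing a Gemini-only configuration file by hand is error-prone when many events are involved. The sketch below shows one way to generate it. Both function names are hypothetical, "gemini_events" assumes the event names appear in the listing with the "craynpu:::GM_" prefix described above, and "make_eventlist" writes each event as a self-closing element so that the XML is well-formed.

```shell
# Hypothetical helper: extract Gemini NPU event names from
# papi_native_avail-style text read on standard input.
gemini_events() {
    grep -o 'craynpu:::GM_[A-Za-z0-9_]*' | sort -u
}

# Hypothetical helper: print a PerfSuite event list containing only the
# native events given as arguments (no PAPI preset/CPU events are added).
make_eventlist() {
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8" ?>'
    printf '%s\n' '<ps_hwpc_eventlist class="PAPI">'
    for ev in "$@"; do
        printf '<ps_hwpc_event name="%s" type="native" />\n' "$ev"
    done
    printf '%s\n' '</ps_hwpc_eventlist>'
}
```

For example, "make_eventlist GM_RMT_PERF_PUT_BYTES_RX GM_RMT_PERF_SEND_BYTES_RX > gm_events-2.xml" produces a file equivalent to the one shown above, and "aprun -n 1 papi_native_avail | gemini_events" lists candidate event names.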
Known Issues
There are two minor issues with the perfsuite/1.1.3 module. They occur only when GNU compilers are used, and only when doing profiling.
-
On both login and compute nodes, source code mapping in psprocess (finding the line numbers of hot spots from profiled samples) does not work for programs built with the GNU compilers. The likely cause is a problem in libbfd, as the "addr2line" utility in the bfd-utils package does not work either.
-
On login nodes, when run on psrun-generated profile XML files, "psprocess" prints error messages at startup complaining about the BFD DWARF version, such as:
ERROR> BFD: Dwarf Error: found dwarf version '4',
this reader only handles version 2 and 3 information.
This is because the libbfd version on the login nodes (2.20.0.20100122-0.7.9) differs from that on the compute nodes (2.21.1). You can safely ignore these error messages.
Additional Information / References
-
For MPI programs, please remember to use the "-f" option (meaning "fork") for "psrun"; for OpenMP programs, use the "-p" option (meaning "pthread"); for hybrid programs (MPI+OpenMP), use both "-f -p" options.
-
PerfSuite project web site: http://perfsuite.ncsa.illinois.edu.
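The "psrun" option rules for MPI, OpenMP, and hybrid programs listed above can be captured in a small lookup. This is only a sketch: "psrun_flags" and its type keywords ("mpi", "openmp", "hybrid") are hypothetical names, and any other program type is rejected because this page does not say which options it needs.

```shell
# Hypothetical helper: map a program type to the psrun options
# recommended on this page.
psrun_flags() {
    case "$1" in
        mpi)    echo "-f" ;;       # follow forked processes
        openmp) echo "-p" ;;       # follow pthreads
        hybrid) echo "-f -p" ;;    # MPI+OpenMP needs both
        *)      echo "unknown program type: $1" >&2; return 1 ;;
    esac
}
```

A possible use in a job script: aprun -n <num> psrun $(psrun_flags hybrid) <my_program> <my_pgm_args>.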