Parallelware Trainer is an integrated development environment (IDE) designed to facilitate learning, using, and implementing OpenMP and OpenACC parallel programming, and to test the performance improvements of particular parallel implementations.
The tool detects candidate loops for parallelization with OpenMP and OpenACC and proposes parallel constructs for such loops.
It supports the C and C++ programming languages, as well as the multi-threading, tasking (loop-level only), and GPU offloading paradigms of both OpenMP and OpenACC.
Using Parallelware Trainer¶
Parallelware Trainer has a graphical user interface, so you need to connect to a NERSC host either using the NoMachine client or with X11 forwarding enabled (`ssh -Y`) if you decide to use SSH.
You'll need to load the pwtrainer module:

```shell
module load pwtrainer
```
Although it is not compulsory, most of the time you will want to build and run your code from Parallelware Trainer. Therefore, you should also load the modules required to build and run your code.
You need to run Parallelware Trainer on compute nodes in an interactive batch job. This is especially true when your code uses Cray MPI, Cray SHMEM, UPC, etc., as such code will fail to run on login nodes. Note also that running compute-intensive work on login nodes is against NERSC policy.
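As a sketch, an interactive session might be requested as follows, assuming the module provides a `pwtrainer` command. The flag values below are illustrative placeholders, not site-specific instructions; use the queue, time limit, and other options appropriate for your allocation:

```shell
# Request an interactive allocation on a compute node, then launch the
# tool from within it. Flag values are placeholders -- adapt them to
# your allocation and target architecture.
salloc --nodes=1 --qos=interactive --time=30:00
module load pwtrainer
pwtrainer &
```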
Parallelware Trainer comes with bundled examples that you can use to get to know the tool and learn different parallelization strategies. You can install them through the 'Help > Install Examples' menu option:
You can open any folder containing code to start working right away without having to do any kind of project setup. Simply go to 'File > Open Project' and open either one of the examples you installed through the Install Examples menu option or any other folder with code:
You can then open a source code file by double-clicking it in the Project explorer panel on the left. Parallelization opportunities will be shown as green circles next to the corresponding line numbers:
You can click them to create different parallel versions of the loop using multi-threading, tasking (loop-level only) and GPU offloading paradigms with OpenMP or OpenACC:
After creating different versions of the code you can build and run them from Parallelware Trainer to experiment and compare their performance:
Parallelware Trainer will also detect and report defects and recommendations right in the code editor:
We strongly recommend that you take a look at the user manual to get to know all the functionalities of Parallelware Trainer. You can access it through the 'Help > Open User Manual' menu option.
Additional information about the tool can be found on the Appentra webpage.
Use Profiling Tools¶
Since the tool relies on static code pattern analysis to make parallelization suggestions, it cannot predict how much actual performance improvement a suggested change will yield. To assess the resulting performance, profile your code with profiling tools before and after the changes. If a suggested parallelization does not target a performance hotspot, expect only minor gains. Users are expected to work further on optimizing their code (cache use optimization, chunk scheduling, loop collapsing, etc.) with the help of a profiling tool.