Departmental seminar - John Craig Muller, Rajendra Singh and Eric Mellick
When: Tuesday, 7th March 2017, 1:00 pm
Where: EITC E2-105
Title: Use of Parallel and High Performance Computing in Power Systems Simulation and Its Challenges
Parallel processing is becoming a larger and more important area of study, both in academic research and in practical applications. Most power system simulation software utilizes parallel processing. Power system simulations are collections of smaller electrical networks, complex models and switching elements, which run iteratively at very small computation steps. Networks are solved independently and may need to exchange values at the end of each step. As network size and model detail increase, the computation becomes complex and intensive, and the need for advanced computing becomes evident. Data parallel processing increases the throughput of simulation studies by using a large number of computing units to solve multiple data sets in parallel with a single executable. Initial parameters are fed to each simulation and the final result is collected, incurring little inter-process communication (IPC). Task parallel processing, in contrast, is used to improve the performance of a single simulation.
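The data parallel pattern described above can be sketched in a few lines. This is a minimal illustration, not the seminar's software: `simulate` is a hypothetical stand-in for the single executable, the parameter tuples are invented, and a thread pool stands in for the many computing units.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(params):
    """Hypothetical stand-in for one simulation run: the single
    executable applied to one data set of initial parameters."""
    gain, steps = params
    value = 1.0
    for _ in range(steps):
        value *= gain
    return value  # the "final result" collected from this run

# Independent parameter sets solved in parallel; the only IPC is
# feeding parameters in and collecting results out.
parameter_sets = [(1.01, 100), (1.02, 100), (1.03, 100)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(simulate, parameter_sets))
```

Because each run is independent, throughput scales with the number of workers and no values are exchanged between runs mid-simulation.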
The challenge is to decompose a very large-scale simulation into multiple simulations/tasks that are connected to each other through an IPC mechanism to exchange values. Values are exchanged every computation step; for example, a 10 second simulation running at a 10 us computation step requires 1 million message exchanges. This makes task parallelism a very communication-intensive approach. In addition, the scale of the system makes it exceedingly expensive to build a real-time, hardware-based solution while still retaining the full fidelity desired.
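The million-message figure follows directly from the step size; a quick check using the 10 second / 10 microsecond numbers from the text:

```python
simulated_time = 10.0   # seconds of simulated time
time_step = 10e-6       # 10 microsecond computation step
# One value exchange per computation step between coupled tasks:
exchanges = round(simulated_time / time_step)
print(exchanges)  # 1000000
```

At such volumes, per-message latency rather than bandwidth dominates the cost of the IPC mechanism.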
IPC mechanisms based on local shared memory have low communication latencies and perform well on a single host. TCP/IP can be used for both local and distributed computing, but incurs high communication latencies.
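The two IPC options can be contrasted with a small sketch: a kernel-level socket pair as the single-host mechanism, and a loopback TCP connection standing in for the cross-host path. The exchanged value (a hypothetical boundary quantity at one time step) and all names here are illustrative.

```python
import socket
import threading

# Single-host IPC: a socketpair never leaves the local machine.
a, b = socket.socketpair()
a.sendall(b"42.0")              # e.g. a boundary value at one step
via_local = float(b.recv(16))
a.close()
b.close()

# TCP/IP: usable across hosts, at the cost of protocol overhead.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]

def peer():
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"42.0")

t = threading.Thread(target=peer)
t.start()
conn, _ = server.accept()
via_tcp = float(conn.recv(16))
conn.close()
server.close()
t.join()
```

Both paths deliver the same value; the difference that matters at a million exchanges per run is the per-message latency of each mechanism.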
InfiniBand network infrastructure and RDMA over Converged Ethernet (RoCE) provide cross-host latencies in the range of 1-2 us. Simulation software can be made RDMA/RoCE-aware to achieve unmatched performance, combining the scalability of distributed computing with local-memory-class communication latencies.
A problem generally faced in the task parallel approach is breaking apart a large, tightly coupled power system simulation into independent tasks. Due to the large number of inter-dependencies in a mesh configuration, it is very difficult to determine the optimal break points. Each break reduces the computational burden in the individual subnetworks; however, each break also adds to the number of processors and interconnections required. Since large systems can comprise hundreds of subnetworks and potentially hundreds of interconnects, optimally mapping them onto a fixed number of processors is challenging. Identifying the break points may require an intelligent method, using graph theory to quantify the influence of each subnetwork. Two load balancing criteria guide the mapping:
1) Each process should be evenly loaded, while inter-process communication is minimized.
2) Each machine must be maximally loaded to take advantage of its local processing capabilities.
Both of these load balancing challenges can be represented as a graph partitioning problem: given a graph and a required number of partitions, an optimal set of partitions can be obtained, knowing the target hardware platform in advance.
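As a minimal sketch of the load balancing half of this problem, consider subnetworks as weighted graph nodes and interconnects as edges. The subnetwork names, weights and edges below are hypothetical, and the greedy assignment only balances loads; production partitioners (e.g. Kernighan-Lin or METIS-style methods) additionally minimize the edge cut, i.e. inter-process communication.

```python
def greedy_partition(weights, edges, k=2):
    """Assign each subnetwork (node) to the least-loaded of k
    partitions (processors); count interconnects that are cut."""
    loads = [0.0] * k
    assignment = {}
    # Placing the heaviest subnetworks first balances loads better.
    for node in sorted(weights, key=weights.get, reverse=True):
        target = min(range(k), key=lambda p: loads[p])
        assignment[node] = target
        loads[target] += weights[node]
    # Each cut edge becomes an inter-process communication link.
    cut = sum(1 for a, b in edges if assignment[a] != assignment[b])
    return assignment, loads, cut

# Hypothetical subnetworks with computational weights and interconnects.
weights = {"subnetA": 5.0, "subnetB": 3.0, "subnetC": 4.0, "subnetD": 2.0}
edges = [("subnetA", "subnetB"), ("subnetB", "subnetC"),
         ("subnetC", "subnetD")]
assignment, loads, cut = greedy_partition(weights, edges)
```

Here the two processors end up evenly loaded (7.0 each) but two of the three interconnects are cut; a cut-aware partitioner would trade a little load imbalance for fewer communication links.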