Mastering Multi-GPU in Ansys Rocky Software and Enhancing Its Performance

August 13, 2024

Sergio Villalva | Senior R&D Verification Engineer, Ansys

Particulate material is encountered in many industries, including automotive, healthcare, high-tech, and mining. Modeling the dynamics of these bulk materials has been a challenge for years, both from a process perspective and regarding equipment performance.

Years ago, discrete element method (DEM) simulations were restricted to small problems that used, for example, only thousands of large, mostly spherical particles. Only recently have numerical methods and hardware technology reached a level of fidelity that provides true engineering value for decision-making. Innovative DEM algorithms and the computational power of graphics processing units (GPUs) have opened new horizons.

Continual improvements in DEM codes and computational power have enabled closer-to-reality particle simulations. Users today can expect to simulate problems using the real particle shape and the actual particle size distribution (PSD), creating DEM simulations with millions of particles.

However, these enhancements in simulation accuracy have come at the cost of heavier computational loads, in both processing time and memory requirements.

With Ansys Rocky particle dynamics simulation software, these loads can be offset by using GPUs, which provide the capacity to obtain results in a more practical time frame.

The Benefits of GPUs and How They Work

The addition of GPUs has helped make DEM a practical tool for engineering design. GPU-accelerated codes, such as Rocky software, can now use DEM’s full capacity and handle simulation models with tens of millions of particles. With multi-GPU parallel processing, it’s now possible to dramatically expand the range of applications and apply DEM simulations to problems that include hundreds of millions of particles.

Moreover, in a world where we push multiphysics simulations ever further, Rocky software GPU and multi-GPU processing enables you to free up all your CPUs for coupled simulations, avoiding hardware competition.

GPUs offer the promise of significant increases in throughput for DEM simulations, and Ansys is at the forefront of this revolution with its native GPU implementation of the Rocky software multi-GPU solver.

Large-scale DEM simulations with millions of particles use vast amounts of memory. In addition, CPU memory can be expensive and simulation performance can vary drastically. A single CPU or GPU has a limited amount of memory, and the particle count it can handle is restricted by this capacity.

The multi-GPU solver in Rocky software, however, overcomes this restriction by efficiently distributing and managing the combined memory of two or more GPUs on a single motherboard. For example, a cyclone separator simulation run with the multi-GPU solver handled 200 million particles. Particle counts this high were not possible previously but are now a reality thanks to the multi-GPU capabilities found in Rocky software.

Cyclone simulation with 200 million particles

Rotary Mill Performance Benchmark Case

To better illustrate the gains in processing speed that are possible for common applications and the Rocky software solver’s scalability when running on high-end GPUs, a performance benchmark case of a rotary mill was developed. The case consists of a drum, partially filled with particles, that rotates at a constant speed. From the beginning of the simulation, all the particles are already inside the drum in a steady condition.

Benchmark case parameters:

  • Drum rotation speed: 1 revolution per second
  • Simulation time: 0.01 second
  • Particle count: 16 million and 32 million
  • Particle type: polyhedron (16 triangles)

Rotary mill performance benchmark case

The particle type chosen for the benchmarking was a polyhedron with 16 triangles, which represents a realistic particle shape well.

To keep the same coordination number in both cases, the ratios of particles to mill length and of contacts to particles were held constant, so the mill length was increased along the rotation axis as the number of particles increased.

GPU configurations:

  • One NVIDIA H100 Tensor Core GPU (94GB total memory)
  • Two NVIDIA H100 Tensor Core GPUs (188GB total memory, NVLink)
  • Four NVIDIA H100 Tensor Core GPUs (320GB total memory, NVLink)
  • Eight NVIDIA H100 Tensor Core GPUs (640GB total memory, NVLink)

Polyhedron particle composed of 16 triangles

The rotary mill geometry used for 32 million particles was two times the length of the one used for 16 million particles.

In both cases, the Rocky software solver showed excellent scalability with the multi-GPU runs, achieving a relative speed-up of 6.7 times with 16 million particles and 7.1 times with 32 million particles, both on eight NVIDIA H100 Tensor Core GPUs.

Relative speed-up for the case with 16 million particles

Relative speed-up for the case with 32 million particles
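
One way to read these speed-up figures is as parallel efficiency, which is the speed-up divided by the number of GPUs. The short Python sketch below does that arithmetic for the two eight-GPU results. It assumes the reported speed-ups are relative to the single-GPU run; the function name and dictionary labels are illustrative only and are not part of Rocky software.

```python
def parallel_efficiency(speedup: float, gpu_count: int) -> float:
    """Return speed-up divided by GPU count (1.0 means ideal linear scaling)."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return speedup / gpu_count


# Relative speed-ups on eight GPUs, as reported for the rotary mill benchmark.
# Assumption: both are measured against the single-GPU run.
reported_speedups = {
    "16 million particles": 6.7,
    "32 million particles": 7.1,
}

efficiencies = {
    case: parallel_efficiency(speedup, gpu_count=8)
    for case, speedup in reported_speedups.items()
}

for case, efficiency in efficiencies.items():
    print(f"{case}: {efficiency:.1%} of ideal scaling on 8 GPUs")
```

Under that assumption, both cases run at well over 80% of ideal linear scaling, and the larger case scales slightly better.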

The total GPU memory used by the solver to run the case with 32 million particles was less than 90GB. Assuming memory use grows roughly in proportion to particle count, the 640GB available on eight GPUs means that Rocky software can handle a similar case with more than 200 million real-shaped particles.

Total GPU memory use for the case with 16 million particles

Total GPU memory use for the case with 32 million particles
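
The estimate of more than 200 million particles can be checked with a back-of-the-envelope calculation. The Python sketch below assumes that solver memory scales linearly with particle count, which the blog does not state explicitly. It uses 90GB as an upper bound for the 32 million particle case and 640GB as the combined memory of the eight-GPU configuration. The function name is hypothetical.

```python
def estimate_particle_capacity(
    ref_particles: int,
    ref_memory_gb: float,
    available_memory_gb: float,
) -> int:
    """Scale a reference particle count to the available memory.

    Assumption: memory use grows linearly with particle count.
    """
    if ref_particles <= 0 or ref_memory_gb <= 0 or available_memory_gb < 0:
        raise ValueError("inputs must be positive")
    return int(available_memory_gb * ref_particles / ref_memory_gb)


# Reference point from the benchmark: 32 million particles in under 90GB.
# Using 90GB as an upper bound makes the estimate conservative.
capacity = estimate_particle_capacity(
    ref_particles=32_000_000,
    ref_memory_gb=90.0,
    available_memory_gb=640.0,  # eight-GPU configuration
)

print(f"Estimated capacity: about {capacity / 1e6:.0f} million particles")
```

Because the measured memory use was below 90GB, the real capacity under this linear assumption would be somewhat higher than the estimate.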

Optimizing Results Through the Rocky Software Multi-GPU Solver

Multi-GPU processing capacity is a big enabler for the next level of physics simulation for engineering.

The rotary mill test results presented in this blog show that Rocky software is compatible with and performs extremely well on NVIDIA H100 GPUs. Moreover, the software can handle huge cases with over 200 million real-shaped particles due to its low memory consumption.

By aggregating computing power, the multi-GPU solver in Rocky software overcomes memory limitations and achieves a substantial performance increase. The software can speed up your particle simulations and facilitate large-scale simulations involving millions of particles.

For guidelines and recommendations before investing in new hardware, check out our GPU buying guide FAQ.

For more information about Rocky software, contact us.

