Executive Summary

Without losing you quickly due to fabulous marketing and product information. I thought it prudent if I get to the answer of my question as quickly as possible. What was the question? Should I invest in a companion processor and an ANSYS HPC Pack? Yes, now is the time.

If your current workstation is beginning to show its age and you are unable to purchase new hardware.
Get your critical results fast. It is painful waiting upwards of 50 hours for your solution to solve.
Assign a dollar value to your current frustration level.
Current pricing for Nvidia TESLA C2075 is around $2500. Current pricing for the NVidia Quadro 6000 is around $5,000.

“Keeping It Real”

Matt Sutton our Lead Software Development Engineer at PADT, Inc. was solving a very large ANSYS MAPDL acoustic ultrasound wave propagation model. The model labored over 12 hours to solve on Matt’s older Dell Precision 690 8 core workstation. We loaded the model onto our CUBE HVPC w8i with the NVidia Tesla C2075 and it solved the model in 50 minutes. That is a 15x speedup for Matt’s particular model!

Benchmarks

I started off with our benchmark assault using an industry standard ANSYS benchmark for HPC. This is a benchmark that NVidia requested we use for testing our TESLA C2075. The next benchmark that I used was an internal PADT, Inc. modal coupler benchmark.

Nvidia V14sp-5 ANSYS R14 Benchmark Matrix (Sparse solver, 2100k)

CUBE HVPC – w12a-GPU – 64GB

CONFIG A

64GB

Cores

Config A: Win v14 SMP In-Core

Config A: Win v14 SMP In-Core & GPU

Config A: Win v14 DIST In-Core

Config A: Win v14 DIST & GPU

Config A: Linux v14 SMP In-Core

Config A: Linux v14 SMP In-Core & GPU

Config A: Linux v14 DIST In-Core

Config A: Linux v14 DIST & GPU

speedup

1388.20

463.40

1425.50

294.40

1324.40

466.30

1312.30

292.10

870.00

447.20

855.00

232.40

817.00

422.00

975.40

221.80

670.60

440.20

588.00

245.10

625.50

394.00

601.60

208.10

594.30

439.10

605.00

222.00

480.10

383.90

542.30

181.10

539.50

426.90

435.00

283.40

491.30

392.80

396.90

220.80

538.40

437.10

402.20

211.10

480.10

394.80

343.80

183.10

CUBE HVPC – w8i-GPU – 64GB

CONFIG B

64GB

# Cores

Config B: Win v14 SMP In-Core

Config B: Win v14 SMP In-Core & GPU

Config B: Win v14 DIST In-Core

Config B: Win v14 DIST & GPU

Config B: Linux v14 SMP In-Core

Config B: Linux v14 SMP In-Core & GPU

Config B: Linux v14 DIST In-Core

Config B: Linux v14 DIST & GPU

1078.50

361.00

1111.90

235.70

1036.00

350.20

1072.90

250.60

645.50

330.70

652.30

185.50

608.10

312.00

790.90

193.20

494.30

322.70

458.30

233.70

464.90

303.20

502.70

178.70

438.50

328.30

462.20

230.40

406.10

304.80

451.20

166.00

2.5x

CUBE HVPC – w16i-GPU – 128GB

128 GB

# Cores

Config C: Linux v14 SMP In-Core & GPU

Config C: Linux v14 DIST & GPU

296.6

208.10

254.4

160.70

254.2

164.20

239.9

138.20

238.6

159.70

246.3

129.60

237.6

129.1

248.9

130.5

CUBE HVPC – PADT, Inc. – Coupling Modal Benchmark (PCG solver, ~1 Million DOF, 50 modes)

CUBE HVPC w12i w/GPU ANSYS 13.0	Shared Memory Parallel	INTEL XEON 2×6 @3.47GHz /144GB of RAM	w12i-GPU
	Processors	Time Spent Computing Solution (secs)	Date/Individuals Initials:
	2	5416.4	11/7/2011 – DRJM
	10+GPU	1914.2 (incore)	11/8/2011 – DRJM
	12+GPU	1946 (incore)	11/8/2011 – DRJM
CUBE HVPC w8i w/GPU ANSYS R14	Shared Memory Parallel	INTEL XEON 2×4 @2.8GHz /64GB of RAM	w8i-GPU
	6+GPU	3659.4 (out of core)	4/11/12 – DRJM
	8+GPU	3686.7 (out of core)	4/11/12 – DRJM
CUBE HVPC w16i w/GPU ANSYS R14	Shared Memory Parallel	INTEL XEON e5-2690 2×8 @2.9GHz /128GB of RAM	W16i-GPU
	14+GPU	2113 (incore)	4/18/12 – DRJM
	16+GPU	1533.9 (incore)	4/18/12 – DRJM

Summary, Debates, Controversy & Conclusions

One question that I hear often is this: “What operating systems is faster. Linux or Windows?”. My typical response will begin with “Well, it depends…” However, what the data illustrates with the two independent CPU workstations as well as Operating Systems. The ANSYS benchmarks were all performed in the same relative time frame. The only exception was the ANSYS modal analysis benchmark.

Are you ready for the answer? Here it is…yes Linux is faster than windows! Even if you come from the AMD CPU or INTEL CPU side of the tracks. Linux is faster! No big discovery right? I know we all knew this already but we were maybe afraid to ask by Operating system as well as CPU manufacture how much? Well with our 2.1 million degree of freedom Nvidia benchmark. Ummm, not very much as the data clearly indicates. However once again, the Linux based OS does give you a system performance advantage! If you use AMD or INTEL it is still faster. The next question that I hear next is: “What processor is faster AMD or INTEL?”. My typical response will begin with “Well, it depends…”

You did buy the NVidia TESLA C2075 GPU, okay this is great news! However, you figured it best that you keep it safe, stay the same, and continue to use Shared Memory Parallel solve method. As you ponder further on the speed up values. You will see once again that the best results are going to be found on the Linux Operating System and when you choose to solve in Distributed Memory Parallel mode. The AMD based CPU had the best speedup values when comparing the workstation against the two solve modes with a 2.2x’s speed up.

Lets begin to unpack this data even more and see if you can come up with your own judgments. This is where things might start to get controversial…so hang on.

Windows, Linux, AMD or INTEL

Time Spent Computing The Solution

ANSYS R14 Distributed Memory Parallel Results: (Incore)
- LINUX 64-bit:
  - One minute forty-seven seconds faster on 2x AMD CPU (343.80 seconds vs. 451.20 seconds)
- Windows 7 Professional 64-bit:
  - One minute faster on 2x AMD CPU (402.20 seconds vs. 462.20 seconds).
ANSYS R14 Shared Memory Parallel Results: (Incore)
- LINUX 64-bit
  - One minute fourteen seconds faster on 2x INTEL XEON CPU (406.10 seconds vs. 480.10 seconds)
- Windows 7 Professional 64-bit is sixty seconds faster solve time on AMD based CPU
  - One minute forty seconds faster on 2x INTEL XEON CPU (438.50 vs. 538.40 seconds)

The Nvidia TESLA C2075 GPU and ANSYS R14 – GPU to the rescue!

ANSYS R14 Distributed Memory Parallel with Nvidia TESLA C2075 GPU Assist Results
- LINUX 64-bit:
  - Seventeen seconds faster on 2x INTEL XEON CPU (166 seconds vs. 183.10 seconds)
  - 129.1 seconds was the fastest overall solve time achieved for any operating system on this benchmark!
  - 2 x INTEL XEON E5-2690 – fourteen cores with ANSYS R14 DMP w/GPU assist
- Windows 7 Professional 64-bit:
  - Nineteen seconds faster on 2x AMD CPU (211.10 seconds vs. 230.40 seconds).
  - 211 seconds was the fastest solve time achieved using Windows for this benchmark!
  - 2 x INTEL XEON X5560 – eight cores with ANSYS R14 DMP w/GPU assist
ANSYS R14 Shared Memory Parallel with Nvidia TESLA C2075 GPU Assist Results
- LINUX 64-bit:
  - Seventeen seconds faster on 2x INTEL XEON CPU (166 seconds vs. 183.10 seconds)
  - Config C: 237.6 seconds was the fastest overall solve time achieved for any operating system on this benchmark!
  - 2 x INTEL XEON E5-2690 – fourteen cores with ANSYS R14 SMP w/GPU assist
- Windows 7 Professional 64-bit:
  - Nineteen seconds faster on 2x AMD CPU (211.10 seconds vs. 230.40 seconds).
  - 211 seconds was the fastest solve time achieved using Windows for this benchmark!
  - 2 x INTEL XEON X5560 – eight cores with ANSYS R14 SMP w/GPU assist

ANSYS R14 Distributed Memory Parallel with GPU vs. ANSYS R14 Shared Memory Parallel with GPU speed up results:

Distributed Memory Parallel (DMP) or Shared Memory Parallel (SMP)

2 x AMD Opteron 4184 based workstation vs. “DMP or SMP”
- 2.2x’s speedup on Linux Operating System (183.10 seconds vs. 394.80 seconds)
- 2x’s speedup on Windows Operating System (211.10 seconds vs. 437.10 seconds)
2 x INTEL XEON E5-2690 based CPU vs. “DMP or SMP”
- 1.9x’s speedup on Linux Operating System (129.41 seconds vs. 237.6 seconds)

With or Without GPU? ANSYS R14 Distributed Memory Parallel speed up results

Distributed Memory Parallel (DMP) with GPU vs. Distributed Memory Parallel without GPU

AMD Opteron 4184 based workstation
- 2x’s speedup Linux Operating System (183.10 seconds vs. 343.80 seconds)
- 2x’s speedup on Windows Operating System (211.10 seconds vs. 402.2 seconds)
INTEL XEON x5560 based workstation
- 2.5x’s speedup on Linux Operating System (166 seconds vs. 451.20 seconds)
- 1.4x’s speedup on Windows Operating System (230.40 seconds vs. 462.20 seconds)

Some Notes:

ANSYS Base License – unlocks up to 2 CPU Cores
ANSYS HPC Pack – unlocks up to 8 CPU Cores and GPU
The total amount of system RAM you have affects your Distributed solve times. A minimum of 48GB of RAM is recommended.
The processing speed of your CPU affects your Shared Memory Parallel solve times.
Model limits for direct will depend on largest front sizes
- 6M DOF for 6GB Tesla C2075 and Quadro 6000
Model limits for iterative PCG and JCG
- 3M DOF for 6GB Tesla C2075 and Quadro 6000

Hardware Specs of Workstations:

CONFIG A – CUBE HVPC w12a-GPU

The AMD® based CUBE HVPC w12a FEA Simulation Workstation:

CPU: 2x AMD Opteron 4184 (2.8GHz Ghz 6 core)
GPU: NVIDIA TESLA C2075 Companion Processor
RAM: 64GB DDR3 1333Mhz ECC
HDD: (os and apps): 450GB WD Velociraptor 10k
HDD: (working directory): 6 x 1TB WD RE4 drives using LSI 2008 RAID 0
OS: Windows 7 Professional 64-bit, Linux 64-bit
OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

(Here is a picture of Sam with two full
tower CUBE HVPC workstations getting
ready to ship out!)

CONFIG B – CUBE HVPC w8i-GPU

The INTEL® based CUBE HVPC w8i FEA Simulation Workstation:

CPU: 2x INTEL XEON x5560 (2.8GHz 4 core)
GPU: NVIDIA TESLA C2075 Companion Processor
RAM: 64GB DDR3 1333Mhz ECC
HDD: (os and apps): 146GB SAS 15k
HDD: (working directory): 3 x 73GB SAS 15k – LSI RAID 0
OS: Windows 7 Professional 64-bit, Linux 64-bit
OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

CONFIG C – CUBE HVPC c16i-GPU

The INTEL® based CUBE HVPC w16i-GPU FEA Server:

CPU: 2x INTEL XEON e5-2690 (2.9GHz 8 core)
GRAPHICS/GPU: NVIDIA QUADRO 6000
RAM: 128GB DDR3 1600 MHz ECC
HDD: (os and apps): 300GB SATA III 10k WD Velociraptor
HDD: (working directory): 3 x 600GB SATA III 10k WD
OS: Linux 64-bit
OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

CUBE HVPC w12i-GPU

The INTEL® based CUBE HVPC w12i-GPU FEA Simulation Workstation:

CPU: 2x INTEL XEON x5690 (3.47GHz 6 core)
GRAPHICS/GPU: NVIDIA QUADRO 6000
RAM: 144GB DDR3 1333Mhz ECC
HDD: (os and apps): 256GB SSD SATA III
HDD: (working directory): 4 x 600GB SAS2 15k – LSI RAID 0
OS: Windows 7 Professional 64-bit
OTHER: ANSYS R13

References

Council on Competitiveness – http://www.compete.org/
ANSYS, Inc. – http://www.ansys.com/
NVidia – http://www.nvidia.com/

To GPU or Not, This is No Longer The Question…

Executive Summary

“Keeping It Real”

Benchmarks

Nvidia V14sp-5 ANSYS R14 Benchmark Matrix (Sparse solver, 2100k)

CUBE HVPC – PADT, Inc. – Coupling Modal Benchmark (PCG solver, ~1 Million DOF, 50 modes)

Summary, Debates, Controversy & Conclusions

Hardware Specs of Workstations:

CONFIG A – CUBE HVPC w12a-GPU

(Here is a picture of Sam with two full tower CUBE HVPC workstations getting ready to ship out!)

CONFIG B – CUBE HVPC w8i-GPU

CONFIG C – CUBE HVPC c16i-GPU

CUBE HVPC w12i-GPU

References

Categories

PADT’s Pulse Newsletter

Upcoming Events

Experience Stratasys Truck Tour: Albuquerque, NM

Dynamic Simulation for Rocket Propellant Systems! - Webinar

Fluent Materials Processing Updates in Ansys 2024 R1 - Webinar

Experience Stratasys Truck Tour: Tempe, AZ

Simulation World 2024

Simulation World 2024

Simulation World 2024

Optics Updates in Ansys 2024 R1 - Webinar

Connect Updates in Ansys 2024 R1 - Webinar

Structures Updates in Ansys 2024 R1 (3) - Webinar

E-Mobility and Clean Energy Summit

Fluids Updates in Ansys 2024 R1 - Webinar

2024 CEO Leadership Retreat

PADT30 | Nerdtoberfest 2024

We Make Innovation Work

FOLLOW US

Search in PADT site

Contact Us

(Here is a picture of Sam with two full
tower CUBE HVPC workstations getting
ready to ship out!)