This May Be the Fastest ANSYS Mechanical Workstation we Have Built So Far

The Build Up

Its 6:30am and a dark shadow looms in Eric’s doorway. I wait until Eric finishes his Monday morning company updates. “Eric check this out, the CUBE HVPC w16i-k20x we built for our latest customer ANSYS Mechanical scaled to 16 cores on our test run.” The left eyebrow of Eric’s slightly rises up. I know I have him now I have his full and complete attention.

Why is this huge news?

This is why; Eric knows and probably many of you reading this also know that solving differential equations, distributed, parallel along with using graphic processing unit makes our hearts skip a beat. The finite element method used for solving these equations is CPU intensive and I/O intensive. This is headline news type stuff to us geek types. We love scratching our way along the compute processing power grids to utilize every bit of performance out of our hardware!

Oh and yes a lower time to solve is better! No GPU’s were harmed in this tests. Only one NVIDIA TESLA k20X GPU was used during the test.

Take a Deep Breath and Start from the Beginning:

I have been gathering and hording years’ worth of ANSYS mechanical benchmark data. Why? Not sure really after all I am wanna-be ANSYS Analysts. However, it wasn’t until a couple weeks ago that I woke up to the why again. MY CUBE HVPC team sold a dual socket INTEL Ivy bridge based workstation to a customer out of Washington state. Once we got the order, our Supermicro reseller‘s phone has been bouncing of the desk. After some back and forth, this is how the parts arrive directly from Supermicro, California. Yes, designed in the U.S.A.  And they show up in one big box:

clip_image002[4]

Normal is as Normal Does

As per normal is as normal does, I ran the series of ANSYS benchmarks. You know the type of benchmarks that perform coupled-physics simulations and solving really huge matrix numbers. So I ran ANSYS v14sp-5, ANSYS FLUENT benchmarks and some benchmarks for this customer, the types of runs they want to use the new machine for. So I was talking these benchmark results over with Eric. He thought that now is a perfect time to release the flood of benchmark data. Well some/a smidge of the benchmark data. I do admit the data does get overwhelming so I have tried to trim down the charts and graphs to the bare minimum. So what makes this workstation recipe for the fastest ANSYS Mechanical workstation so special? What is truly exciting enough to tip me over in my overstuffed black leather chair?

The Fastest Ever? Yup we have been Changed Forever

Not only is it the fastest ANSYS Mechanical workstation running on CUBE HVPC hardware.  It uses two INTEL CPU’s at 22 nanometers. Additionally, this is the first time that we have had an INTEL dual socket based workstation continue to gain faster times on and up to its maximum core count when solving in ANSYS Mechanical APDL.

Previously the fastest time was on the CUBE HVPC w16i-GPU workstation listed below. And it peaked at 14 cores. 

Unfortunately we only had time before we shipped the system off to gather two runs: 14 and 16 cores on the new machine. But you can see how fast that was in this table.  It was close to the previous system at 14 cores, but blew past it at 16 whereas the older system actually got clogged up and slowed down:

  Run Time (Sec)
Cores Used Config B Config C Config D
14 129.1 95.1 91.7
16 130.5 99 83.5

And here are the results as a bar graph for all the runs with this benchmark:

CUBE-Benchmark-ANSYS-2013_11_01

  We can’t wait to build one of these with more than one motherboard, maybe a 32 core system with infinband connecting the two. That should allow some very fast run times on some very, very large problems.

ANSYS V14sp-5 ANSYS R14 Benchmark Details

  • Elements : SOLID187, CONTA174, TARGE170
  • Nodes : 715,008
  • Materials : linear elastic
  • Nonlinearities : standard contact
  • Loading : rotational velocity
  • Other : coupling, symentric, matrix, sparse solver
  • Total DOF : 2.123 million
  • ANSYS 14.5.7

Here are the details and the data of the March 8, 2013 workstation:

Configuration C = CUBE HVPC w16i-GPU

  • CPU: 2x INTEL XEON e5-2690 (2.9GHz 8 core)
  • GPU: NVIDIA TESLA K20 Companion Processor
  • GRAPHICS: NVIDIA QUADRO K5000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HD RAID Controller: SMC LSI 2208 6Gbps
  • HDD: (os and apps): 160GB SATA III SSD
  • HDD: (working directory):6x 600GB SAS2 15k RPM 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS R14.0.8 / ANSYS R14.5

Here are the details from the new, November 1, 2013 workstation:

Configuration D = CUBE HVPC w16i-k20x

  • CPU: 2x INTEL XEON e5-2687W V2 (3.4GHz)
  • GPU: NVIDIA TESLA K20X Companion Processor
  • GRAPHICS: NVIDIA QUADRO K4000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HDD: (os and apps): 4 x 240GB Enterprise Class Samsung SSD 6Gbps
  • HD RAID CONTROLLER: SMC LSI 2208 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS 14.5.7

You can view the output from the run on the newer box (Configuration D) here:

Here is a picture of the Configuration D machine with the info on its guts:

clip_image006[4]clip_image008[4]

What is Inside that Chip:

The one (or two) CPU that rules them all: http://ark.intel.com/products/76161/

Intel® Xeon® Processor E5-2687W v2

  • Status: Launched
  • Launch Date: Q3’13
  • Processor Number: E5-2687WV2
  • # of Cores: 8
  • # of Thread: 16
  • Clock Speed: 3.4 GHz
  • Max Turbo Frequency: 4 GHz
  • Cache:  25 MB
  • Intel® QPI Speed:  8 GT/s
  • # of QPI Link:  2
  • Instruction Se:  64-bit
  • Instruction Set Extension:  Intel® AVX
  • Embedded Options Available:  No
  • Lithography:  22 nm
  • Scalability:  2S Only
  • Max TDP:  150 W
  • VID Voltage Range:  0.65–1.30V
  • Recommended Customer Price:  BOX : $2112.00, TRAY: $2108.00

The GPU’s that just keep getting better and better:

Features

TESLA C2075

TESLA K20X

TESLA K20

Number and Type of GPU

FERMI

Kepler GK110

Kepler GK110

Peak double precision floating point performance

515 Gflops

1.31 Tflops

1.17 Tflops

Peak single precision floating point performance

1.03 Tflops

3.95 Tflops

3.52 Tflops

Memory Bandwidth (ECC off)

144 GB/sec

250 GB/sec

208 GB/sec

Memory Size (GDDR5)

6GB

6GB

5GB

CUDA Cores

448

2688

2496

clip_image012[4]

Ready to Try one Out?

If you are as impressed as we are, then it is time for you to try out this next iteration of the Intel chip, configured for simulation by PADT, on your problems.  There is no reason for you to be using a CAD box or a bloated web server as your HPC workstation for running ANSYS Mechanical and solving in ANSYS Mechanical APDL.  Give us a call, our team will take the time to understand the types of problems you run, the IT environment you run in, and custom configure the right system for you:

http://www.padtinc.com/products/hardware/cube-hvpc,
email: garrett.smith@padtinc.com,
or call 480.813.4884