I’m All Bound Up! : A Brief Discussion on What It Means To Be Compute Bound or I/O Bound

CUBE-HVPC-512-core-closeup2-1000hWe often get questions from our customers, both ANSYS product and CUBE HVPC users, on how to get their jobs to run faster. Should they get better disk drives or focus on CPU performance. We have found that disk drive performance often gets the blame when it is undeserving. To help figure this out, the first thing we do t is look at the output from an ANSYS Mechanical/Mechanical APDL run. Here is an email, slightly modified to leave the user anonymous, that shows our most recent case of this:

From: David Mastel – PADT, Inc.
To: John Engineering

Subject: Re: Re: Re: Relatively no difference between SSD vs. SAS2 15k RPM solve times?

Hi John, so I took a look at your ANSYS Mechanical output files – Based on the problem you are running the machine is Compute Bound. Here is the process on how I came to that conclusion. Additionally, at the end of this email I have included a few recommendations.

All the best,
David

Example 1:

The bottom section of an ANSYS out file for a 2 x 240GB
Samsung 843 SSD RAID0 array:

Total CPU time for main thread                    :      105.9 seconds
Total CPU time summed for all threads             :      119.1 seconds

Elapsed time spent pre-processing model (/PREP7)  :        0.0 seconds
Elapsed time spent solution – preprocessing       :       10.3 seconds
Elapsed time spent computing solution             :       83.5 seconds
Elapsed time spent solution – postprocessing      :        3.9 seconds
Elapsed time spent post-processing model (/POST1) :        0.0 seconds

Equation solver computational rate                :   319444.9 Mflops
Equation solver effective I/O rate                :    26540.1 MB/sec

Maximum total memory used                         :    48999.0 MB
Maximum total memory allocated                    :    54896.0 MB
Maximum total memory available                    :        128 GB

+—— E N D   D I S T R I B U T E D   A N S Y S   S T A T I S T I C S ——-+

*—————————————————————————*
|                                                                           |
|                       DISTRIBUTED ANSYS RUN COMPLETED                     |
|                                                                           |
|—————————————————————————|
|                                                                           |
|            Release 14.5.7         UP20130316         WINDOWS x64          |
|                                                                           |
|—————————————————————————|
|                                                                           |
| Database Requested(-db)   512 MB    Scratch Memory Requested       512 MB |
| Maximum Database Used     447 MB    Maximum Scratch Memory Used   4523 MB |
|                                                                           |
|—————————————————————————|
|                                                                           |
|        CP Time      (sec) =        119.01       Time  =  15:41:54         |
|        Elapsed Time (sec) =        117.000       Date  =  10/21/2013      |
|                                                                           |
*—————————————————————————*

For a quick refresher on what it means to be compute bound or I/O bound, let’s review what ANSYS Mechanical APDL tells you.

When looking at your ANSYS Mechanical APDL (this file is created during the solve in ANSYS Mechanical, since ANSYS Mechanical is just running ANSYS Mechanical APDL behind the scenes) out files; I/O bound and Compute bound are essentially the following:

I/O Bound:

  1. When Elapsed Time greater than Main Thread CPU time

Compute Bound:

  1. When Elapsed time is equals (approx) to Main Thread CPU time

Example 2:
CUBE HVPC – Samsung 843 – 240GB SATA III SSD 6Gbps – RAID 0
Total CPU time for main thread :  105.9  seconds
Elapsed Time (sec) :   117.000        seconds

CUBE HVPC – Hitachi 600GB SAS2 15k RPM – RAID 0
Total CPU time for main thread  :  109.0 seconds
Elapsed Time (sec) :   120.000       seconds

Recommendations for a CPU compute bound ANSYS server or workstation:

When computers are compute bound I normally recommend the following.

  1. Add faster processors – check!
  2. Use more cores for solve – I think you are in the process of doing this now?
  3. Instead of running SMP go DMP – unable to use DMP with your solve
  4. Add an Accelerator card (NVidia Tesla K20x). Which unfortunately does not help in your particular solving situation.

Please let me know if you need any further information:

David

David Mastel
Manager, Information Technology

Phoenix Analysis & Design Technologies
7755 S. Research Dr, Suite 110
Tempe, AZ  85284
David.Mastel@PADTINC.com

The hardware we have available changes every couple of months or so, but right now, for a user who is running this type of ANSYS Mechanical/Mechanical APDL model, we are recommending the following configuration:

CUBE HVPC Recommended Workstation for ANSYS Mechanical: CUBE HVPC w16-KGPU

CUBE HVPC
MODEL

COST

CHASIS

PROCESSOR

CORES

MEMORY

CUBE HVPC
w16i-kgpu

$ 16,164.00

Mid-Tower
(black quiet edition)

Dual Socket INTEL XEON
e5-2687 v2,
16 cores
@ 3.4GHz

16 = 2 x 8

128GB DDR3-1866 ECC REG

STORAGE

RAID
CONTROLLER

GRAPHICS

ACCELERATOR

OS

OTHER

4 x 240GB
SATA II SSD 6 Gbps
2 x 600GB
SAS2 6Gbps
(2.1 TB)

SMC LSI 2208
6Gbps

NVIDIA
QUADRO K5000

NVIDIA
TESLA K20

Microsoft
Windows 7
Professional 64-bit

ANSYS R14.5.7
ANSYS R15

Cube_logo_Trg_Wide_150w

Here are some references to the some basic information and the systems we recommend:
http://www.intel.com
http://en.wikipedia.org/wiki/CPU_bound
http://en.wikipedia.org/wiki/I/O_bound
http://www.supermicro.com
http://www.cube-hvpc.com
http://www.ansys.com
http://www.nvidia.com

One Reply to “I’m All Bound Up! : A Brief Discussion on What It Means To Be Compute Bound or I/O Bound”

  1. i’ve read your blog “ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON & part2” i have questions, is there any possible building a system which contain benefits for mechanical and CFD analysis? and i’ve ever compared mechanical analysis on Intel(i7 base, 4C8T) and AMD(Bulldozer base, 8C8T) system, the results showed Intel has much better performance. however, for CFD analysis AMD base is better but not much difference, why? btw, i used SMP setting for mechanical analysis on AMD Bulldozer base system, but there always ran with 4 cores, why? thank you