ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

Meanwhile, in the real world the land of the missing-middle:  To read and learn more about the missing middle please read this article by Dr. Stephen Wheat. Click Here

This blog post is about distributed parallel processing performance in a missing-middle world of science, tech, engineering & numerical simulation. I will be using two of PADT, Inc.’s very own CUBE workstations along with ANSYS 17.2. To illustrate facts and findings on the ANSYS HPC benchmarks. I will also show you how to decode and extract key bits of data out of your own ANSYS benchmark out files. This information will assist you with locating and describing the performance how’s and why’s on your own numerical simulation workstations and HPC clusters. With the use of this information regarding your numerical simulation hardware. You will be able to trust and verify your decisions. Assist you with understanding in addition to explaining the best upgrade path for your own unique situation. In this example, I am providing to you in this post. I am illustrating a “worst case” scenario.

You already know you need to increase your parallel processing solves times of your models. “No I am not ready with my numerical simulation results. No I am waiting on Matt to finish running the solve of his model.” “Matt said that it will take four months to solve this model using this workstation. Is this true?!”

  1. How do I know what to upgrade and/or you often find yourself asking yourself. What do I really need to buy?
    1. One or three ANSYS HPC Packs?
    2. Purchase more compute power? NVidia TESLA K80’s GPU Accelerators? RAM? A Subaru or Volvo?
  2. I have no budget. Are you sure? Often IT departments set a certain amount of money for component upgrades and parts. Information you learn in these findings may help justify a $250-$5000 upgrade for you.
  3. These two machines as configured will not break the very latest HPC performance speed records. This exercise is a live real world example of what you would see in the HPC missing middle market.
  4.  Benchmarks were formed months after a hardware and software workstation refresh was completed using NO BUDGET, zip, zilch, nada, none.

Backstory regarding the two real-world internal CUBE FEA Workstations.

  1. These two CUBE Workstations were configured on a tight budget. Only the components at a minimum were purchased by PADT, Inc.
  2. These two internal CUBE workstations have been in live production, in use daily for one or two years.
    1. Twenty-four hours a day seven days a week.
  3. These two workstations were both in desperate need of some sort of hardware and operating system refresh.
  4. As part of Microsoft upgrade initiative in 2016.  Windows 10 Professional was upgraded for free! FREE!

Again, join me in this post and read about the journey of two CUBE workstations being reborn and able to produce impressive ANSYS benchmarks to appease the sense of wining in pure geek satisfaction.

Uh-oh?! $$$

As I mentioned, one challenge that I set for myself on this mission is that I would not allow myself to purchase any new hardware or software. What? That is correct; my challenge was that I would not allow myself to purchase new components for the refresh.

How would I ever succeed in my challenge? Think and then think again.

Harvesting the components of old workstations recently piling up in the IT Lab over the past year! That was the solution. This idea just may be the idea I needed for succeeding in my NO BUDGET challenge. First, utilize existing compute components from old tired machines that had showed in the IT boneyard. Talk to your IT department, you never know what they find or remember that they had laying around in their own IT boneyard. Next, I would also use any RMA’d parts that I could find that had trickled in over the past year. Indeed, by utilizing these old feeder workstations, I was on my way to succeeding in my no budget challenge. The leftovers? Please do not email me for the discarded not worthy components handouts. There is nothing left, none, those components are long gone a nice benefit from our recent in-house next PADT Tech Recycle event.

*** Public Service Announcement *** Please remember to reuse, recycle and erase old computer parts from the landfills.

CUBE Workstation Specifications

PADT, Inc. – CUBE w12ik Numerical Simulation Workstation

(INTENAL PADT CUBE Workstation “CUBE #10”)
1 x CUBE Mid-Tower Chassis (SQ edition)

2 x 6c @3.4GHz/ea (INTEL XEON e5-2643 V3 CPU)

Dual Socket motherboard

16 x 16GB DDR4-2133 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS2 15k RPM – 6 Gb/s – RAID0

3 x 2TB SAS2 7200 RPM Hard Drives – 6 Gb/s (Mid-Term Storage Array – RAID5)

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

PADT, Inc. CUBE w16i-k Numerical Simulation Workstation

(INTENAL PADT CUBE Workstation “CUBE #14″)
1 x CUBE Mid-Tower Chassis

2 x 8c @3.2GHz/ea (INTEL XEON e5-2667 V4 CPU)

Dual Socket motherboard

8 x 32GB DDR4-2400 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS3 15k RPM 2.5” 12 Gb/s – RAID0

2 x 6TB SAS3 7.2k RPM 3.5” 12 Gb/s – RAID1

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

The ANSYS sp-5 Ball Grid Array Benchmark

ANSYS Benchmark Test Case Information

  • BGA (V17sp-5)
    • Analysis Type Static Nonlinear Structural
    • Number of Degrees of Freedom 6,000,000
    • Equation Solver Sparse
    • Matrix Symmetric
  • ANSYS 17.2
  • ANSYS HPC Licensing Packs required for this benchmark –> (2) HPC Packs
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM Designed, tested and configured within your budget. We are happy to help and to listen to your specific needs.

Comparing the data from the 12 core CUBE vs. a 16 core CUBE with and without GPU Acceleration enabled.

ANSYS 17.2 Benchmark  SP-5 Ball Grid Array
CUBE w12i-k 2643 v3 CUBE w12i-k 2643 v3 w/GPU Acceleration Total Speedup w/GPU CUBE w16i-k 2667 V4 CUBE w16i-k 2667 V4 w/GPU Acceleration Total Speedup w/GPU
Cores CUBE  w12i w/NVIDIA QUADRO K6000 CUBE  w12i w/NVIDIA QUADRO K6000 CUBE  w16i w/NVIDIA QUADRO K6000 CUBE  w16i w/NVIDIA QUADRO K6000
2 878.9 395.9 2.22 X 888.4 411.2 2.16 X
4 485.0 253.3 1.91 X 499.4 247.8 2.02 X
6 386.3 228.2 1.69 X 386.7 221.5 1.75 X
8 340.4 199.0 1.71 X 334.0 196.6 1.70 X
10 269.1 184.6 1.46 X 266.0 180.1 1.48 X
11 235.7 212.0 1.11 X
12 230.9 171.3 1.35 X 226.1 166.8 1.36 X
14 213.2 173.0 1.23 X
15 200.6 152.8 1.31 X
16 189.3 166.6 1.14 X
GPU NOT ENABLED ENABLED NOT ENABLED ENABLED
11/15/2016 & 1/5/2017
CUBE w12i-k v17sp-5 Benchmark Graph 2017
CUBE w12i-k v17sp-5 Benchmark Graph 2017
CUBE w16i-k v17sp-5 Benchmark Graph 2017
CUBE w16i-k v17sp-5 Benchmark Graph 2017

Initial impressions

  1. I was very pleased with the results of this experiment. Using the Am I bound bound or I/O bound overall parallel performance indicators the data showed healthy workstations that were both I/O bound. I assumed the I/O bound issue would happen. During several of the benchmarks, the data reveals almost complete system bandwidth saturation. Upwards of ~82 GB/s of bandwidth created during the in-core distributed solve!
  2. I was pleasantly surprised to see a 1.7X or greater solve speedup using one ANSYS HPC licensing pack and GPU Acceleration!

The when and where of numerical simulation performance bottleneck’s for numerical simulation. Similar to how the clock is ticking on the wall, over the years I have focused on the question of, “is your numerical simulation compute hardware compute bound or I/O bound”. This quick and fast benchmark result will show general parallel performance of the workstation and help you find the performance sweet spot for your own numerical simulation hardware.

As a reminder, to determine the answer to that question you need to record the results of your CPU Time For Main Thread, Time Spent Computing Solution and Total Elapsed Time. If the results time for my CPU Main is about the same as my Total Elapsed Time result. The compute hardware is in a Compute Bound situation. If the Total Elapsed Time result is larger than the CPU Time For Main Thread than the compute hardware is I/O bound. I did the same analysis with these two CUBE workstations. I am pickier than most when it comes to tuning my compute hardware. So often I will use a percentage around 95 percent. The percentage column below determines if the workstation is Compute Bound or O/O bound. Generally, what I have found in the industry, is that a percentage of greater than 90% indicates the workstation is wither Compute Bound, I/O bound or in worst-case scenario is both.

**** Result sets data garnered from the ANSYS results.out files on these two CUBE workstations using ANSYS Mechanical distributed parallel solves.

Data mine that ANSYS results.out file!

The data is all there, at your fingertips waiting for you to trust and verify.

Compute Bound or I/O bound

Results 1 – Compute Cores Only

w12i-k

“CUBE #10”

Cores CPU Time For Main Thread Time Spent Computing Solution Total Elapsed Time % Compute Bound IO Bound
2 2 914.2 878.9 917.0 99.69 YES NO
4 4 517.2 485.0 523.0 98.89 YES NO
6 6 418.8 386.3 422.0 99.24 YES NO
8 8 374.7 340.4 379.0 98.87 YES NO
10 10 302.5 269.1 307.0 98.53 YES NO
11 11 266.6 235.7 273.0 97.66 YES NO
12 12 259.9 230.9 268.0 96.98 YES NO
w16i-k

“CUBE #14”

Cores CPU Time For Main Thread Time Spent Computing Solution Total Elapsed Time % Compute Bound IO Bound
2 2 925.8 888.4 927.0 99.87 YES NO
4 4 532.1 499.4 535.0 99.46 YES NO
6 6 420.3 386.7 425.0 98.89 YES NO
8 8 366.4 334.0 370.0 99.03 YES NO
10 10 299.7 266.0 303.0 98.91 YES NO
12 12 258.9 226.1 265.0 97.70 YES NO
14 14 244.3 213.2 253.0 96.56 YES NO
15 15 230.3 200.6 239.0 96.36 YES NO
16 16 219.6 189.3 231.0 95.06 YES NO

Results 2 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

Cores  + GPU CPU Time For Main Thread Time Spent Computing Solution Total Elapsed Time % Compute Bound IO Bound
2 2 416.3 395.9 435.0 95.70 YES YES
4 4 271.8 253.3 291.0 93.40 YES YES
6 6 251.2 228.2 267.0 94.08 YES YES
8 8 219.9 199.0 239.0 92.01 YES YES
10 10 203.2 184.6 225.0 90.31 YES YES
11 11 227.6 212.0 252.0 90.32 YES YES
12 12 186.0 171.3 213.0 87.32 NO YES
CUBE 14 Cores + GPU CPU Time For Main Thread Time Spent Computing Solution Total Elapsed Time % Compute Bound IO Bound
2 2 427.2 411.2 453.0 94.30 YES YES
4 4 267.9 247.8 286.0 93.67 YES YES
6 6 245.4 221.5 259.0 94.75 YES YES
8 8 219.6 196.6 237.0 92.66 YES YES
10 10 201.8 180.1 222.0 90.90 YES YES
12 12 191.2 166.8 207.0 92.37 YES YES
14 14 195.2 173.0 217.0 89.95 NO YES
15 15 172.6 152.8 196.0 88.06 NO YES
16 16 177.1 166.6 213.0 83.15 NO YES

Identifying Memory, I/O, Parallel Solver Balance and Performance

Results 3 – Compute Cores Only

w12i-k

“CUBE #10”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve No GPU Maximum RAM used in GB
0.9376 0.8399 662.822706 5.609852 19123.88932 19.1 78
0.8188 0.8138 355.367914 3.082555 35301.9759 35.3 85
0.6087 0.6913 283.870728 2.729568 39165.1946 39.2 84
0.3289 0.4771 254.336758 2.486551 43209.70175 43.2 91
0.5256 0.644 191.218882 1.781095 60818.51624 60.8 94
0.5078 0.6805 162.258872 1.751974 61369.6918 61.4 95
0.3966 0.5287 157.315184 1.633994 65684.23821 65.7 96
w16i-k

“CUBE #14”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve No GPU Maximum RAM used in GB
0.9376 0.8399 673.225225 6.241678 17188.03613 17.2 78
0.8188 0.8138 368.869242 3.569551 30485.70397 30.5 85
0.6087 0.6913 286.269409 2.828212 37799.17161 37.8 84
0.3289 0.4771 251.115087 2.701804 39767.17792 39.8 91
0.5256 0.644 191.964388 1.848399 58604.0123 58.6 94
0.3966 0.5287 155.623476 1.70239 63045.28808 63.0 96
0.5772 0.6414 147.392121 1.635223 66328.7728 66.3 101
0.6438 0.5701 139.355605 1.484888 71722.92484 71.7 101
0.5098 0.6655 130.042438 1.357847 78511.36377 78.5 103

Results 4 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve % GPU Accelerated The Solve Maximum RAM used in GB
0.9381 0.8405 178.686155 5.516205 19448.54863 19.4 95.78 78
0.8165 0.8108 124.087864 3.031092 35901.34876 35.9 95.91 85
0.6116 0.6893 122.433584 2.536878 42140.01391 42.1 95.74 84
0.3365 0.475 112.33829 2.351058 45699.89654 45.7 95.81 91
0.5397 0.6359 103.586986 1.801659 60124.33358 60.1 95.95 94
0.5123 0.6672 137.319938 1.635229 65751.09125 65.8 85.17 95
0.4132 0.5345 97.252285 1.562337 68696.85627 68.7 95.75 97
w16i-k

“CUBE #14”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve % GPU Accelerated The Solve Maximum RAM used in GB
0.9381 0.8405 200.007118 6.054831 17718.44411 17.7 94.96 78
0.8165 0.8108 122.200896 3.357233 32413.68282 32.4 95.20 85
0.6116 0.6893 122.742966 2.624494 40733.2138 40.7 94.91 84
0.3365 0.475 114.618006 2.544626 42223.539 42.2 94.97 91
0.5397 0.6359 105.4884 1.821352 59474.26914 59.5 95.18 94
0.4132 0.5345 96.750618 1.988799 53966.06502 54.0 94.96 97
0.5825 0.6382 106.573973 1.989103 54528.26599 54.5 88.96 101
0.6604 0.566 91.345275 1.374242 77497.60151 77.5 92.21 101
0.5248 0.6534 107.672641 1.301668 81899.85539 81.9 85.07 103

The ANSYS results.out file – The decoding continues

CUBE w12i-k (“CUBE #10”)

  1. Elapsed Time Spent Computing The Solution
    1. This value determines how efficient or balanced the hardware solution for running in distributed parallel solving.
      1. Fastest Solve Time For CUBE 10
    2. 12 out of 12 Cores w/GPU @ 171.3 seconds Time Spent Computing The Solution
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process. The clock on the wall time.
    2. Fastest Time For CUBE10
      1. 12 out of 12 w/GPU @ 213.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the RAW number crunching time of the CPU.
    2. Fastest Time For CUBE10
      1. 12 out of 12 w/GPU @186.0 seconds
  4. GPU Acceleration
    1. The NVidia Quadro K6000 accelerated ~96% of the matrix factorization flops
    2. Actual percentage of GPU accelerated flops = 95.7456
  5. Cores and storage solver performance 12 out of 12 cores and using 1 NVidia Quadro K6000
    1. ratio of nonzeroes in factor (min/max) = 0.4132
    2. ratio of flops for factor (min/max) = 0.5345
      1. These two values above indicate to me that the system is well taxed for compute power/hardware viewpoint.
    3. Effective I/O rate (MB/sec) for solve = 68696.856274 (or 69 GB/sec)
      1. No issues here indicates that the workstation has ample bandwidth available for the solving.

CUBE w16i-k (“CUBE #14”)

  1. Elapsed Time Spent Computing The Solution
    1. This value determines how efficient or balanced the hardware solution for running in distributed parallel solving.
    2. Fastest Time For CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 152.8 seconds
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process. The clock on the wall time.
    2. CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 196.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the RAW number crunching time of the CPU.
    2. CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 172.6 seconds
  4. GPU Acceleration Percentage
    1. The NVIDIA QUADRO K6000 accelerated ~92% of the matrix factorization flops
    2. Actual percentage of GPU accelerated flops = 92.2065
  5. Cores and storage 12 out of 12 cores and one Nvidia Quadro K6000
    1. ratio of nonzeroes in factor (min/max) = 0.6604
    2. ratio of flops for factor (min/max) = 0.566
      1. These two values above indicate to me that the system is well taxed for compute power/hardware.
    3. Please note that when reviewing these two data points. A balanced solver performance is when both of these values are as close to 1.0000 as possible.
      1. At this point the compute hardware is no longer as efficient and these values will continue to move farther away from 1.0000.
    4. Effective I/O rate (MB/sec) for solve = 77497.6 MB/sec (or ~78 GB/sec)
      1. No issues here indicates that the workstation has ample bandwidth with fast I/O performance for in-core SPARSE Solver solving.
    1. Maximum amount of RAM used by the ANSYS distributed solve
      1. 103GB’s of RAM needed for in-core solve

Conclusions Summary And Upgrade Path Suggestions

It is important for you to locate your bottleneck on your numerical simulation hardware. By utilizing data provided in the ANSYS results.out files, you will be able to logically determine your worst parallel performance inhibitor and plan accordingly on how to resolve what is slowing the parallel performance of your distributed numerical simulation solve.

I/O Bound and/or Compute Bound Summary

  • I/O Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” are I/O Bound.
      • Almost immediately when GPU Acceleration is enabled.
      • When GPU Acceleration is not enabled, I/O bound is no longer an issue compute solving performance. However solve times are impacted due to available and unused compute power.
  • Compute Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” would benefit from additional Compute Power.
    • CUBE w12i-k “CUBE #10” would get the most bang for the buck by adding in the additional compute power.

Upgrade Path Recommendations

CUBE w12i-k “CUBE #10”

  1. I/O:
    1. Hard Drives
    2. Remove & replace the previous generation hard drives
      1. 3.5″ SAS2.0 6Gb/s 15k RPM Hard Drives
    3. Hard Drives could be upgraded to Enterprise Class SSD or PCIe NVMe
      1. COST =  HIGH
    1. Hard Drives could be upgraded to SAS 3.0 12 Gb/s Drives
      1. COST =  MEDIUM
  2.  RAM:
    1. Remove and replace the previous generation RAM
    2. Currently all available RAM slots of RAM are populated.
      1. Optimum slots per these two CPU’s are four slots of RAM per CPU. Currently eight slots of RAM per CPU are installed.
    3. RAM speeds 2133MHz ECC REG DIMM’
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH
  3. GPU Acceleration
    1. Install a dedicated GPU Accelerator card such as an NVidia Tesla K40 or K80
    2. COST =  HIGH
  4.  CPU:
    1. Remove and replace the current previous generation CPU’s:
    2. Currently installed dual  x INTEL XEON e5-2643 V3
    3. Upgrade the CPU’s to the V4 (Broadwell) CPU’s
      1. COST =  HIGH

CUBE w16i-k “CUBE #14”

  1. I/O: Hard Drives SAS3.0 15k RPM Hard Drives 12Gbps 2.5”
    1.  Replace the current 2.5” SAS3 12Gb/s 15k RPM Drives with Enterprise Class SSD’s or PCIe NVMe disk
      1. COST =  HIGH
    2. Replace the 2.5″ SAS3 12 Gb/s hard drives with 3.5″ hard drives.
      1. COST =  HIGH
    3. INTEL 1.6TB P3700 HHHL AIC NVMe
      1. Click Here: https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3700-series.html
  2. Currently a total of four Hard Drives are installed
    1. Increase existing hard drive count from four hard drives to a total ofsix or eight.
    2. Change RAID configuration to RAID 50
      1. COST =  HIGH
  3. RAM:
    1. Using DDR4-2400Mhz ECC REG DIMM’s
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH

Considering RAM: When determining how much System RAM you need to perform a six million degree of freedom ANSYS numerical simulation. Add the additional amounts to your Maximum Amount of RAM used number indicated in your ANSYS results.out file.

  • ANSYS reserves  ~5% of your RAM
  • Office products can use an additional l ~10-15% to the above number
  • Operating System please add an additional ~5-10% for the Operating System
  • Other programs? For example, open up your windows task manager and look at how much RAM your anti-virus program is consuming. Add for the amount of RAM consumed by these other RAM vampires.

Terms & Definition Goodies:

  • Compute Bound
    • A condition that occurs when your CPU processing power sites idle while the CPU waits for the next set of instructions to calculate. This occurs most often when hardware bandwidth is unable to feed the CPU more data to calculate.
  • CPU Time For Main Thread
    • CPU time (or process time) is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to, for example, waiting for input/output (I/O) operations or entering low-power (idle) mode.
  • Effective I/O rate (MB/sec) for solve
    • The amount of bandwidth used during the parallel distributed solve moving data from storage to CPU input and output totals.
    • For example the in-core 16 core + GPU solve using the CUBE w16i-k reached an effective I//O rate of 82 GB/s.
    • Theoretical system level bandwidth possible is ~96 GB/s
  • IO Bound
    • The ability for the input-output of the system hardware for reading, writing and flow of data pulsing through the system has become inefficient and/or detrimental to running an efficient parallel analysis.
  • Maximum total memory used
    • The maximum amount of memory used by analysis during your analysis.
  • Percentage (%) GPU Accelerated The Solve
    • The percentage of acceleration added to your distributed solve provided by the Graphics Processing Unit (GPU). The overall impact of the GPU will be diminished due to slow and saturated system bandwidth of your compute hardware.
  • Ratio of nonzeroes in factor (min/max)
    • A performance indicator of efficient and balanced the solver is performing on your compute hardware. In this example the solver performance is most efficient when this value is as close to the value of 1.0.
  • Ratio of flops for factor (min/max)
    • A performance indicator of efficient and balanced the solver is performing on your compute hardware. In this example the solver performance is most efficient when this value is as close to the value of 1.0.
  • Time (cpu & wall) for numeric factor
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When time (cpu & wall) for numeric factor & time (cpu & wall) for numeric solve values are somewhat equal it means that your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Time (cpu & wall) for numeric solve
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When time (cpu & wall) for numeric solve & time (cpu & wall) for numeric factor values are somewhat equal it means that your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Total Speedup w/GPU
    • Total performance gain for compute systems task using a Graphics Processing Unit (GPU).
  • Time Spent Computing Solution
    • The actual clock on the wall time that it took to compute the analysis.
  • Total Elapsed Time
    • The actual clock on the wall time that it took to complete the analysis.

References: