Spectre Side-Channel and Meltdown – How will living in this new reality affect the world of numerical simulation?

While I was sorting, running, and prepping data for a post originally titled "ANSYS Release 18.2 Ball Grid Array Benchmark," using two sixteen-core INTEL® XEON® Gold 6130 CPUs, I noticed that my news feeds had started to blow up with late-breaking HPC news. The news, as you may have guessed, was the recently published Spectre and Meltdown flaws.

I thought to myself, "Well, this is just great. The benchmarks that I just ran are no longer relevant." My next thought was: wait, now I can show a real-world example of the exact percentage change. I have waited this long to run the ANSYS numerical simulation benchmarks on this new CPU architecture; I can wait a little longer to post my findings. What now? Oh my, more late-breaking news! Research findings! Out-of-order execution with no barriers! Side channels used to get access to private address areas of the hardware! Wow, this is a bad day. As I sat reading more news I drifted off daydreaming, then back to my screen, then to the clock on the wall. Great, it is 2am already; just go home. Then my thoughts immediately shifted back to how these hardware flaws impact the missing-middle market: HPC numerical simulation! I dug in deep and pressed forward, content with starting over on the benchmarks, knowing that after the patches released around January 9th it would be a whole new world.

I decided to spare you the ugly details of the Spectre array-bounds and branch-prediction attack flaws, and the out-of-order-execution Meltdown vulnerability. UGH! I seriously believe that someone has an AI writing the same news article five or six different ways, with each somehow saying the same thing. Instead, here are links to the information and legal statements directly from a who's who list of accountable parties:

Executive Summary:

  • Remember, every case is different, so please run your own tests to verify how this new reality affects your hardware and software environment.
    • Due to costs, this machine has a single NVMe M.2 primary drive and a single 2TB SATA drive for its Mid-Term Storage area.
  • What was the impact for my benchmark?
    • Positive takeaway:
      • In all my years of running the sp5 benchmark, I recorded the fastest time ever using this CUBE w32s, dual INTEL® XEON® Gold 6130 CPU workstation.
      • Using all thirty-two cores: 125.7 seconds for Solution Time (Time Spent Computing Solution).
        • Coming in at 135.7 seconds, the Solution Time after running the OS patches is my second-fastest data point for the ANSYS sp5 benchmark.
          • ANSYS sp5 benchmark data – PADT, Inc., from 2005 to the present.
      • The Solution Times continued to solve faster with each bump in cores.
      • Performance per dollar was maximized in this configuration.
    • The impact depends on the number of cores used for the ANSYS sp5 benchmark. The actual data below shows the percentage differences before and after:
      • Largest percentage difference:
        • Solution Time: -9.81% using four CPU cores.
        • Total Time: -7.87% using two CPU cores.
  • Now is the time to turn down the security screws within your corporate enterprise network.
  • A rogue malicious agent needs to be on the inside of your corporate network to execute any sort of crafted attack. Many of these details are outlined in the Project Zero abstract.
  • Pay extra attention to just who you let on your internal network.
    • I reiterate the recommendation of many security professionals: you should already be restricting your internal company network and workstations to employee use. If you are not sure, ask again.
  1. Spectre flaw:
    1. INTEL, ARM, and AMD CPUs are affected by the Spectre array-bounds hardware attacks.
  2. Meltdown flaw:
    1. INTEL CPUs and some high-performance ARM CPUs are affected by the "easier to exploit" Meltdown vulnerability.

I am also interested to see how continued insertion of code barriers and changed memory mappings affect my gaming performance. Haha! No, I am just kidding my numerical simulation performance benchmarks.

Clarifications & Definitions:

  • Unpatched Benchmark Data – No mitigation patches from Microsoft or NVIDIA addressing the Spectre and Meltdown flaws have been applied to the Windows 10 Professional OS running on the CUBE w32s used in this benchmark.
  • Patched Benchmark Data – I installed the batch of patches released by Microsoft as well as the graphics driver update released by NVIDIA. NVIDIA indicates in their advisory that their GPU hardware is not affected, but that they are updating their drivers to help mitigate the CPU security issue. Huh? Installing now…
  • Solution Time – The amount of time, in seconds, that the CPUs spent computing the solution ("Time Spent Computing Solution").
  • Total Time – The total time, in seconds, that the entire process took; how the solve felt to the user, also known as wall-clock time.

The CUBE machine that I used in this ANSYS test case represents a fine balance of price, performance, and ANSYS HPC licenses used.

  • CUBE w32s, INTEL® XEON® Gold 6130 CPU, 128GB’s DDR4-2667MHz (1Rx4) ECC REG DIMM, Windows 10 Professional, ANSYS Release 18.2, INTEL MPI 5.0.1.3, 32 Total Cores, NVIDIA QUADRO P4000, Samsung EVO 960 Pro NVMe M.2, Toshiba 2TB 7200 RPM SATA 3 Drive.
  • Other notables, are you still paying attention?
    • My Supermicro X11Dai-N BIOS Settings:
      • BIOS Version: 2.0a
      • Execute Disable Bit: DISABLE
      • Hyper threading: ON
      • Intel Virtualization Technology: DISABLE
      • Core Enabled: 0
      • Power Technology: CUSTOM
      • Energy Performance Tuning: DISABLE
      • Energy performance BIAS setting: PERFORMANCE
      • P-State Coordination: HW_ALL
      • Package C-State Limit: C0/C1 State
      • CPU C3 Report: DISABLE
      • CPU C6 Report: DISABLE
      • Enhanced Halt State: DISABLE
    • With read performance of up to 3,200 MB/s and write performance of up to 1,900 MB/s, the Samsung NVMe M.2 drive was too tempting to pass up as my solve and temp-file location. The bandwidth from the little feller was impressive and continued to impress throughout the numerical simulation benchmarks.

My first overall impression of this configuration: Wow! This workstation is fast and quiet, and as you will see, it number-crunches its way right on through to my fastest documented workstation benchmark in this class. This extremely challenging and I/O-intensive ANSYS benchmark is no match for this solver! Thumbs up and cheers to happy solving!

  • Cube w32s by PADT, Inc. ANSYS Release 18.2 FEA Benchmark
  • BGA (V18sp-5)
  • Static nonlinear structural analysis of an electronic ball grid array
  • Analysis Type: Static Nonlinear Structural
  • Number of Degrees of Freedom: 6,000,000
  • Matrix: Symmetric

It Is All About The Data:

Benchmark data related to Pre and Post Spectre and Meltdown industry software patches on the CUBE w32s.

Table 1 – ANSYS sp5 Benchmark – Unpatched Windows 10 Professional

ANSYS sp5 Benchmark – Unpatched Windows 10 Professional for the Spectre and Meltdown hardware vulnerabilities – CUBE w32s
CPUs Solution Time Total Time
2 631.3 671
4 366.8 422
8 216 259
12 193 235
16 144.3 185
20 143.9 187
24 131.9 175
28 137.4 185
31 142.4 185
32 125.7 171
ANSYS Release 18.2 – SP5 Benchmark – Unpatched Windows 10 Professional CUBE w32s Solution and Total Time Values

Table 1.1 – ANSYS sp5 Benchmark  – Patched Windows 10 Professional

ANSYS sp5 Benchmark  – Patched Windows 10 Professional – CUBE w32s
CPUs Solution Time Total Time
2 683 726
4 405.5 446
8 235.8 277
12 209.2 251
16 148.8 191
20 145.7 189
24 136.3 182
28 138.7 186
31 134.6 179
32 135.7 179
ANSYS Release 18.2 – SP5 Benchmark – Patched Windows 10 Professional for the Spectre and Meltdown hardware flaws – Solution and Total Time Values

Table 2 – ANSYS sp5 Benchmark  – The Before and After In Percentage Difference.

Percentage Difference – Unpatched vs. Patched for Spectre, Meltdown
CPUs Solution Time Total Time
2 -7.94 -7.87
4 -9.81 -5.53
8 -8.34 -6.72
12 -7.57 -6.58
16 -2.73 -3.19
20 -1.09 -1.06
24 -2.87 -3.92
28 -0.81 -0.54
31 4.76 3.30
32 -6.74 -4.57
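For reference, the Total Time column above matches a midpoint-based percentage difference of the two runs. A minimal sketch in Python (the helper name is mine; the Solution Time column appears to have been computed with a slightly different formula):

```python
def pct_diff(unpatched_s, patched_s):
    """Signed percentage difference relative to the midpoint of the two times.
    Negative means the patched run took longer, i.e. a performance hit."""
    return (unpatched_s - patched_s) / ((unpatched_s + patched_s) / 2.0) * 100.0

# Total Time at 2 cores: 671 s unpatched vs. 726 s patched
print(f"{pct_diff(671, 726):.2f}")  # prints -7.87
```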

Fig 2.a

Percentage of impact for this example; a negative value means the patched Windows 10 Professional CUBE w32s is taking a performance hit. Notice the very interesting blip of positive percentage at 31 cores. The data from this Windows 10 Professional CUBE w32s, INTEL® XEON® Gold 6130 CPU machine shows an impact related to the Spectre and Meltdown patches.

Fig 2.b

Percentage of impact on Total Time; a negative value means the patched Windows 10 Professional CUBE w32s will feel slower to solve, judged by the clock on the wall.
CUBE w32s in action – January 2018

Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking more compute power!

What the heck is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM

Designed, tested and configured within your budget. We are happy to help and to listen to your specific needs.

CUBE w32s in action – January 2018

Distributed ANSYS 18.1 with the SP-5 Benchmark using an INTEL 1.6TB NVMe

I recently had a chance to run a series of benchmarks on one of our latest CUBE numerical simulation workstations. I was amazed by the impressive numbers and wanted to share the details of the SP-5 benchmark using ANSYS 18.1. Hopefully this information will help you make the best decision the next time you upgrade your numerical simulation C drive: now is the time to buy a Non-Volatile Memory Express (NVMe) drive. Total speedup using identical CUBE hardware, except for the INTEL DC P3700 NVMe drive, is a 1.19x speedup at 32 cores!

  • Time Spent Computing Solution ANSYS SP-5 Benchmark
    • 161.7 seconds vs. 135.6 seconds
    • ANSYS 17.1 & ANSYS 18.1 Benchmarks
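The 1.19x figure falls straight out of the two solve times above; a quick sketch, assuming speedup is computed as baseline time divided by improved time:

```python
def speedup(baseline_s, improved_s):
    """Classic speedup ratio: time before / time after."""
    return baseline_s / improved_s

# 32-core Time Spent Computing Solution: prior run vs. the NVMe-equipped run
print(f"{speedup(161.7, 135.6):.2f}x")  # prints 1.19x
```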

The link below is to a great article that I think will catch you up to speed regarding NVMe, PCIe and SSD Technology.

HDD Magazine hints NVME is coming, I say NVMe is already here…

CUBE w32iP Specifications (July 2017)

  • CUBE Mid-Tower Super Quiet Chassis (900W PS)
  • CPU: 32 INTEL Cores – 2 x INTEL XEON e5-2697A V4, 16c each @2.6GHz/3.6GHz Turbo
  • OS: INTEL NVMe – 1 x 1.6TB INTEL Enterprise Class SSD
  • Mid-Term Storage: – 1 x 10TB Enterprise Class SATA 6Gbp/s, 256M, Helium sealed
  • RAM: 256GB DDR4-2400MHz LRDIMM RAM
  • GRAPHICS: NVIDIA QUADRO P6000 (24GB GDDR5X RAM)
  • MEDIA: DVD-RW/Audio 7.1 HD
  • Windows 10 Professional

Just how much faster does the INTEL NVMe drive perform compared with previously run ANSYS benchmarks?

Check out the data for yourself:

  1. ANSYS 17.1 – SP-5 Benchmarks
  2. ANSYS Website
  3. HPC Advisory Council
  • ANSYS Benchmark Test Case Information.
  • ANSYS HPC Licensing Packs required for this benchmark
    • I used (2) HPC Packs to unlock all 32 cores.
  • 1.19x Total Speedup!
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM
    • Designed, tested and configured within your budget. We are happy to help and to  listen to your specific needs.

ANSYS SP-5 Benchmark Details

BGA (V18sp-5)

Analysis Type Static Nonlinear Structural
Number of Degrees of Freedom 6,000,000
Equation Solver Sparse
Matrix Symmetric
July 2017 – CUBE w32iP (all times in seconds)
# of Cores  Time Spent Computing Solution  Total CPU Time For Main Thread  Elapsed Time
2 1034.3 1073.7 1076
4 594.7 630.3 633
6 431.5 465.7 472
8 333.4 367.9 377
10 268.7 302.6 316
12 243.6 276.5 287
14 223 256.2 264
16 186.8 219.3 227
18 180 212.4 226
20 174.4 207.4 220
22 164.5 197.4 209
24 155.6 188.2 199
26 147.1 179.2 193
28 146.4 178.2 190
30 140.8 168.5 196
31 140.4 164 196
32 135.6 158.1 182
All runs without GPU acceleration.

July 2017, drjm, PADT, Inc.

CUBE W32iP SP-5 Benchmark Graph

CUBE w32iP with INTEL DC P3700 1.6TB

Click Here for more information on the engineering simulation workstations and clusters designed in-house at PADT, Inc. PADT, Inc. is happy to be a premier re-seller and dealer of Supermicro hardware.

How To Update The Firmware Of An Intel® Solid-State Drive DC P3600

How To Update The Firmware Of An Intel® Solid-State Drive DC P3600 in four easy steps!

The Dr. says to keep that firmware fresh! In this How-To blog post I illustrate how to verify and/or update the firmware on a 1.2TB Intel® Solid-State Drive DC P3600 Series NVMe MLC card.

CUBE Workstation Specifications – The Tester

PADT, Inc. – CUBE w32i Numerical Simulation Workstation

  • 2 x 16c @2.6GHz/ea. (INTEL XEON e5-2697A V4 CPU), 40M Cache, 9.6GT, 145 Watt/each
  • Dual Socket Super Micro X10DAi motherboard
  • 8 x 32GB DDR4-2400MHz ECC REG DIMM
  • 1 x NVIDIA QUADRO M2000 – 4GB GDDR5
  • 1 x  Intel® DC P3600 1.2TB, NVMe PCIe 3.0, MLC AIC 20nm
  • Windows 7 Ultimate Edition 64-bit

Step 1: Prepping

Check for and download the latest software for the Intel® Solid-State Drive DC P3600 here: https://downloadcenter.intel.com/product/81000/Intel-SSD-DC-P3600-Series

You will need the latest versions of the:

  • Intel® Solid State Drive Toolbox

  • Intel® SSD Data Center Tool

  • Intel® SSD Data Center Family for NVMe Drivers

Step 2: Installation

After installing the Intel® Solid State Drive Toolbox and the Intel® SSD Data Center Tool, reboot the workstation and move on to the next step.


INTEL SSD Toolbox Install

Step 3: Trust But Verify

Check the status of the 1.2TB NVMe card by running the INTEL SSD Data Center Tool. I am using Windows 7 Ultimate 64-bit for the operating system, running the INTEL Data Center Tool from an elevated command-line prompt.

Right-Click –> Run As…Administrator
Command Line Text: isdct show -intelssd

INTEL DATA Center Command Line Tool

As the image below indicates, the firmware on this 1.2TB NVMe card is happy and up to date! Yay!

If you have more than one SSD take note of the Drive Number.

  • Pro Tip – In this example the INTEL DC P3600 is Drive number zero. You can gather this information from the output syntax. –> Index : 0

Below is what the command line output text looks like while the firmware process is running.

C:\isdct>isdct.exe load -intelssd 0
WARNING! You have selected to update the drive's firmware! Proceed with the update? (Y|N): y
Updating firmware… The selected Intel SSD contains current firmware as of this tool release.

C:\isdct>isdct.exe load -intelssd 0
WARNING! You have selected to update the drive's firmware! Proceed with the update? (Y|N): n
Canceled.

C:\isdct>isdct.exe load -f -intelssd 0
Updating firmware… The selected Intel SSD contains current firmware as of this tool release.

C:\isdct>isdct.exe load -intelssd 0
WARNING! You have selected to update the drive's firmware! Proceed with the update? (Y|N): y
Updating firmware… Firmware update successful.

Step 4: Reboot Workstation

The firmware update process has been completed.

shutdown /r

How-To: ANSYS 18 RSM CLIENT SETUP on Windows 2012 R2 HPC

We put this simple how-to together to speed up the process of getting your Remote Solve Manager client up and running on Microsoft Windows 2012 R2 HPC.

Download the step-by-step slides here:

padt-ansys-18-RSM-client-setup-win2012r2HPC.pdf

You might also be interested in a short article on the setup and use of monitoring for ANSYS R18 RSM.

ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

Meanwhile, in the real world, the land of the missing middle: to learn more about the missing middle, please read this article by Dr. Stephen Wheat. Click Here

This blog post is about distributed parallel processing performance in the missing-middle world of science, tech, engineering, and numerical simulation. I will be using two of PADT, Inc.'s very own CUBE workstations along with ANSYS 17.2 to illustrate facts and findings from the ANSYS HPC benchmarks. I will also show you how to decode and extract key bits of data out of your own ANSYS benchmark out files. This information will assist you in locating and describing the performance hows and whys on your own numerical simulation workstations and HPC clusters, so you can trust and verify your decisions and explain the best upgrade path for your own unique situation. In this example, I am illustrating a "worst case" scenario.

You already know you need to improve the parallel processing solve times of your models. "No, I am not ready with my numerical simulation results; I am waiting on Matt to finish running the solve of his model." "Matt said that it will take four months to solve this model using this workstation. Is this true?!"

  1. How do I know what to upgrade? You often find yourself asking: what do I really need to buy?
    1. One or three ANSYS HPC Packs?
    2. Purchase more compute power? NVidia TESLA K80’s GPU Accelerators? RAM? A Subaru or Volvo?
  2. I have no budget. Are you sure? Often IT departments set aside a certain amount of money for component upgrades and parts. Information in these findings may help justify a $250–$5,000 upgrade for you.
  3. These two machines, as configured, will not break the very latest HPC performance speed records. This exercise is a live, real-world example of what you would see in the HPC missing-middle market.
  4. Benchmarks were performed months after a hardware and software workstation refresh was completed using NO BUDGET: zip, zilch, nada, none.

Backstory regarding the two real-world internal CUBE FEA Workstations.

  1. These two CUBE workstations were configured on a tight budget; only the minimum components were purchased by PADT, Inc.
  2. These two internal CUBE workstations have been in live production, in use daily for one or two years.
    1. Twenty-four hours a day seven days a week.
  3. These two workstations were both in desperate need of some sort of hardware and operating system refresh.
  4. As part of Microsoft's upgrade initiative in 2016, Windows 10 Professional was upgraded for free! FREE!

Again, join me in this post and read about the journey of two CUBE workstations being reborn, able to produce impressive ANSYS benchmarks and appease the sense of winning in pure geek satisfaction.

Uh-oh?! $$$

As I mentioned, one challenge that I set for myself on this mission is that I would not allow myself to purchase any new hardware or software. What? That is correct; my challenge was that I would not allow myself to purchase new components for the refresh.

How would I ever succeed in my challenge? Think and then think again.

Harvesting the components of old workstations that had been piling up in the IT lab over the past year! That was the solution, and it just might be the idea I needed to succeed in my NO BUDGET challenge. First, utilize existing compute components from old, tired machines that had ended up in the IT boneyard. Talk to your IT department; you never know what they will find, or remember they had laying around, in their own IT boneyard. Next, I would also use any RMA'd parts that had trickled in over the past year. Indeed, by utilizing these old feeder workstations, I was on my way to succeeding in my no-budget challenge. The leftovers? Please do not email me for the discarded, not-worthy component handouts. There is nothing left; those components are long gone, a nice benefit of our recent in-house PADT Tech Recycle event.

*** Public Service Announcement *** Please remember to reuse, recycle, and securely erase old computer parts, and keep them out of the landfills.

CUBE Workstation Specifications

PADT, Inc. – CUBE w12ik Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation "CUBE #10")
1 x CUBE Mid-Tower Chassis (SQ edition)

2 x 6c @3.4GHz/ea (INTEL XEON e5-2643 V3 CPU)

Dual Socket motherboard

16 x 16GB DDR4-2133 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS2 15k RPM – 6 Gb/s – RAID0

3 x 2TB SAS2 7200 RPM Hard Drives – 6 Gb/s (Mid-Term Storage Array – RAID5)

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

PADT, Inc. CUBE w16i-k Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation "CUBE #14")
1 x CUBE Mid-Tower Chassis

2 x 8c @3.2GHz/ea (INTEL XEON e5-2667 V4 CPU)

Dual Socket motherboard

8 x 32GB DDR4-2400 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS3 15k RPM 2.5” 12 Gb/s – RAID0

2 x 6TB SAS3 7.2k RPM 3.5” 12 Gb/s – RAID1

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

The ANSYS sp-5 Ball Grid Array Benchmark

ANSYS Benchmark Test Case Information

  • BGA (V17sp-5)
    • Analysis Type Static Nonlinear Structural
    • Number of Degrees of Freedom 6,000,000
    • Equation Solver Sparse
    • Matrix Symmetric
  • ANSYS 17.2
  • ANSYS HPC Licensing Packs required for this benchmark –> (2) HPC Packs
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM Designed, tested and configured within your budget. We are happy to help and to listen to your specific needs.

Comparing the data from the 12-core CUBE vs. the 16-core CUBE, with and without GPU acceleration enabled.

ANSYS 17.2 Benchmark  SP-5 Ball Grid Array
CUBE w12i-k 2643 v3 CUBE w12i-k 2643 v3 w/GPU Acceleration Total Speedup w/GPU CUBE w16i-k 2667 V4 CUBE w16i-k 2667 V4 w/GPU Acceleration Total Speedup w/GPU
Cores CUBE  w12i w/NVIDIA QUADRO K6000 CUBE  w12i w/NVIDIA QUADRO K6000 CUBE  w16i w/NVIDIA QUADRO K6000 CUBE  w16i w/NVIDIA QUADRO K6000
2 878.9 395.9 2.22 X 888.4 411.2 2.16 X
4 485.0 253.3 1.91 X 499.4 247.8 2.02 X
6 386.3 228.2 1.69 X 386.7 221.5 1.75 X
8 340.4 199.0 1.71 X 334.0 196.6 1.70 X
10 269.1 184.6 1.46 X 266.0 180.1 1.48 X
11 235.7 212.0 1.11 X
12 230.9 171.3 1.35 X 226.1 166.8 1.36 X
14 213.2 173.0 1.23 X
15 200.6 152.8 1.31 X
16 189.3 166.6 1.14 X
GPU NOT ENABLED ENABLED NOT ENABLED ENABLED
11/15/2016 & 1/5/2017
CUBE w12i-k v17sp-5 Benchmark Graph 2017
CUBE w16i-k v17sp-5 Benchmark Graph 2017

Initial impressions

  1. I was very pleased with the results of this experiment. Using the "am I compute bound or I/O bound" overall parallel-performance indicators, the data showed healthy workstations that were both I/O bound. I assumed the I/O bound condition would appear. During several of the benchmarks, the data reveals almost complete system bandwidth saturation: upwards of ~82 GB/s of bandwidth created during the in-core distributed solve!
  2. I was pleasantly surprised to see a 1.7X or greater solve speedup using one ANSYS HPC licensing pack and GPU Acceleration!

The when and where of numerical simulation performance bottlenecks: much like the clock ticking on the wall, over the years I have kept coming back to the question, "is your numerical simulation compute hardware compute bound or I/O bound?" This quick benchmark shows the general parallel performance of the workstation and helps you find the performance sweet spot for your own numerical simulation hardware.

As a reminder, to answer that question you need to record your CPU Time For Main Thread, Time Spent Computing Solution, and Total Elapsed Time results. If the CPU Time For Main Thread is about the same as the Total Elapsed Time, the compute hardware is in a compute-bound situation. If the Total Elapsed Time is noticeably larger than the CPU Time For Main Thread, the compute hardware is I/O bound. I did the same analysis with these two CUBE workstations. I am pickier than most when it comes to tuning my compute hardware, so I often use a percentage of around 95 percent. The percentage column below determines whether the workstation is compute bound or I/O bound. Generally, what I have found in the industry is that a percentage greater than 90% indicates the workstation is either compute bound, I/O bound, or, in the worst-case scenario, both.
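The ratio described above is easy to script. A minimal sketch, assuming the percentage column below is CPU Time For Main Thread divided by Total Elapsed Time (the function names and the 95% default cutoff are mine, following the text):

```python
def main_thread_pct(cpu_main_s, elapsed_s):
    """Percent of the wall-clock (elapsed) time the main thread spent on the CPU."""
    return 100.0 * cpu_main_s / elapsed_s

def is_compute_bound(cpu_main_s, elapsed_s, threshold=95.0):
    """Treat the run as compute bound above the threshold; below it, the gap
    between elapsed time and CPU time suggests the run is waiting on I/O."""
    return main_thread_pct(cpu_main_s, elapsed_s) >= threshold

# 2-core run on CUBE #10: 914.2 s CPU main thread, 917.0 s elapsed
print(f"{main_thread_pct(914.2, 917.0):.2f}")  # prints 99.69
```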

**** Result-set data was garnered from the ANSYS results.out files on these two CUBE workstations using ANSYS Mechanical distributed parallel solves.

Data mine that ANSYS results.out file!

The data is all there, at your fingertips waiting for you to trust and verify.

Compute Bound or I/O bound

Results 1 – Compute Cores Only

w12i-k

“CUBE #10”

Cores  CPU Time For Main Thread  Time Spent Computing Solution  Total Elapsed Time  %  Compute Bound  I/O Bound
2 914.2 878.9 917.0 99.69 YES NO
4 517.2 485.0 523.0 98.89 YES NO
6 418.8 386.3 422.0 99.24 YES NO
8 374.7 340.4 379.0 98.87 YES NO
10 302.5 269.1 307.0 98.53 YES NO
11 266.6 235.7 273.0 97.66 YES NO
12 259.9 230.9 268.0 96.98 YES NO
w16i-k

“CUBE #14”

Cores  CPU Time For Main Thread  Time Spent Computing Solution  Total Elapsed Time  %  Compute Bound  I/O Bound
2 925.8 888.4 927.0 99.87 YES NO
4 532.1 499.4 535.0 99.46 YES NO
6 420.3 386.7 425.0 98.89 YES NO
8 366.4 334.0 370.0 99.03 YES NO
10 299.7 266.0 303.0 98.91 YES NO
12 258.9 226.1 265.0 97.70 YES NO
14 244.3 213.2 253.0 96.56 YES NO
15 230.3 200.6 239.0 96.36 YES NO
16 219.6 189.3 231.0 95.06 YES NO

Results 2 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

Cores + GPU  CPU Time For Main Thread  Time Spent Computing Solution  Total Elapsed Time  %  Compute Bound  I/O Bound
2 416.3 395.9 435.0 95.70 YES YES
4 271.8 253.3 291.0 93.40 YES YES
6 251.2 228.2 267.0 94.08 YES YES
8 219.9 199.0 239.0 92.01 YES YES
10 203.2 184.6 225.0 90.31 YES YES
11 227.6 212.0 252.0 90.32 YES YES
12 186.0 171.3 213.0 87.32 NO YES
w16i-k

"CUBE #14"

Cores + GPU  CPU Time For Main Thread  Time Spent Computing Solution  Total Elapsed Time  %  Compute Bound  I/O Bound
2 427.2 411.2 453.0 94.30 YES YES
4 267.9 247.8 286.0 93.67 YES YES
6 245.4 221.5 259.0 94.75 YES YES
8 219.6 196.6 237.0 92.66 YES YES
10 201.8 180.1 222.0 90.90 YES YES
12 191.2 166.8 207.0 92.37 YES YES
14 195.2 173.0 217.0 89.95 NO YES
15 172.6 152.8 196.0 88.06 NO YES
16 177.1 166.6 213.0 83.15 NO YES

Identifying Memory, I/O, Parallel Solver Balance and Performance

Results 3 – Compute Cores Only

w12i-k

“CUBE #10”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve No GPU Maximum RAM used in GB
0.9376 0.8399 662.822706 5.609852 19123.88932 19.1 78
0.8188 0.8138 355.367914 3.082555 35301.9759 35.3 85
0.6087 0.6913 283.870728 2.729568 39165.1946 39.2 84
0.3289 0.4771 254.336758 2.486551 43209.70175 43.2 91
0.5256 0.644 191.218882 1.781095 60818.51624 60.8 94
0.5078 0.6805 162.258872 1.751974 61369.6918 61.4 95
0.3966 0.5287 157.315184 1.633994 65684.23821 65.7 96
w16i-k

“CUBE #14”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve No GPU Maximum RAM used in GB
0.9376 0.8399 673.225225 6.241678 17188.03613 17.2 78
0.8188 0.8138 368.869242 3.569551 30485.70397 30.5 85
0.6087 0.6913 286.269409 2.828212 37799.17161 37.8 84
0.3289 0.4771 251.115087 2.701804 39767.17792 39.8 91
0.5256 0.644 191.964388 1.848399 58604.0123 58.6 94
0.3966 0.5287 155.623476 1.70239 63045.28808 63.0 96
0.5772 0.6414 147.392121 1.635223 66328.7728 66.3 101
0.6438 0.5701 139.355605 1.484888 71722.92484 71.7 101
0.5098 0.6655 130.042438 1.357847 78511.36377 78.5 103

Results 4 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve % GPU Accelerated The Solve Maximum RAM used in GB
0.9381 0.8405 178.686155 5.516205 19448.54863 19.4 95.78 78
0.8165 0.8108 124.087864 3.031092 35901.34876 35.9 95.91 85
0.6116 0.6893 122.433584 2.536878 42140.01391 42.1 95.74 84
0.3365 0.475 112.33829 2.351058 45699.89654 45.7 95.81 91
0.5397 0.6359 103.586986 1.801659 60124.33358 60.1 95.95 94
0.5123 0.6672 137.319938 1.635229 65751.09125 65.8 85.17 95
0.4132 0.5345 97.252285 1.562337 68696.85627 68.7 95.75 97
w16i-k

“CUBE #14”

Ratio of nonzeroes in factor (min/max) Ratio of flops for factor (min/max) Time (cpu & wall) for numeric factor Time (cpu & wall) for numeric solve Effective I/O rate (MB/sec) for solve Effective I/O rate (GB/sec) for solve % GPU Accelerated The Solve Maximum RAM used in GB
0.9381 0.8405 200.007118 6.054831 17718.44411 17.7 94.96 78
0.8165 0.8108 122.200896 3.357233 32413.68282 32.4 95.20 85
0.6116 0.6893 122.742966 2.624494 40733.2138 40.7 94.91 84
0.3365 0.475 114.618006 2.544626 42223.539 42.2 94.97 91
0.5397 0.6359 105.4884 1.821352 59474.26914 59.5 95.18 94
0.4132 0.5345 96.750618 1.988799 53966.06502 54.0 94.96 97
0.5825 0.6382 106.573973 1.989103 54528.26599 54.5 88.96 101
0.6604 0.566 91.345275 1.374242 77497.60151 77.5 92.21 101
0.5248 0.6534 107.672641 1.301668 81899.85539 81.9 85.07 103

The ANSYS results.out file – The decoding continues

CUBE w12i-k (“CUBE #10”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware is when running distributed parallel solves.
    2. Fastest Solve Time For CUBE #10:
      1. 12 out of 12 Cores w/GPU @ 171.3 seconds Time Spent Computing The Solution
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process: the clock-on-the-wall time.
    2. Fastest Time For CUBE10
      1. 12 out of 12 w/GPU @ 213.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the RAW number crunching time of the CPU.
    2. Fastest Time For CUBE10
      1. 12 out of 12 w/GPU @186.0 seconds
  4. GPU Acceleration
    1. The NVidia Quadro K6000 accelerated ~96% of the matrix factorization flops
    2. Actual percentage of GPU accelerated flops = 95.7456
  5. Cores and storage solver performance: 12 out of 12 cores and one NVIDIA Quadro K6000
    1. ratio of nonzeroes in factor (min/max) = 0.4132
    2. ratio of flops for factor (min/max) = 0.5345
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
    3. Effective I/O rate (MB/sec) for solve = 68696.856274 (or ~69 GB/sec)
      1. No issues here; this indicates that the workstation has ample bandwidth available for solving.

CUBE w16i-k (“CUBE #14”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware solution is for distributed parallel solving.
    2. Fastest Time For CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 152.8 seconds
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process. The clock on the wall time.
    2. CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 196.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the RAW number crunching time of the CPU.
    2. CUBE w16i-k “CUBE #14”
      1. 15 out of 16 Cores w/GPU @ 172.6 seconds
  4. GPU Acceleration Percentage
    1. The NVIDIA QUADRO K6000 accelerated ~92% of the matrix factorization flops
    2. Actual percentage of GPU accelerated flops = 92.2065
  5. Cores and storage: 15 out of 16 cores and one NVIDIA Quadro K6000
    1. ratio of nonzeroes in factor (min/max) = 0.6604
    2. ratio of flops for factor (min/max) = 0.566
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
    3. Please note when reviewing these two data points that solver performance is balanced when both values are as close to 1.0000 as possible.
      1. As these values move farther away from 1.0000, the compute hardware is no longer as efficient.
    4. Effective I/O rate (MB/sec) for solve = 77497.6 MB/sec (or ~78 GB/sec)
      1. No issues here indicates that the workstation has ample bandwidth with fast I/O performance for in-core SPARSE Solver solving.
    5. Maximum amount of RAM used by the ANSYS distributed solve
      1. 103 GB of RAM needed for the in-core solve

Conclusions Summary And Upgrade Path Suggestions

It is important for you to locate the bottleneck in your numerical simulation hardware. By utilizing the data provided in the ANSYS results.out files, you can logically determine your worst parallel-performance inhibitor and plan accordingly to resolve what is slowing the parallel performance of your distributed numerical simulation solve.
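The decoding described above can be automated with a short script. Below is a minimal sketch that pulls a few of the metrics this article uses out of the solver output text. The excerpt and exact label strings are assumptions for illustration; real results.out files are much longer and label text can vary between ANSYS releases.

```python
import re

# Hypothetical excerpt of an ANSYS results.out file. Real files are much
# longer, and the exact label text can vary between ANSYS releases.
RESULTS_OUT = """
ratio of nonzeroes in factor (min/max)    =    0.4132
ratio of flops for factor (min/max)       =    0.5345
effective I/O rate (MB/sec) for solve     =    68696.856274
"""

def grab(label, text):
    """Return the numeric value that follows a metric label, or None."""
    match = re.search(re.escape(label) + r"\s*=\s*([0-9.]+)", text)
    return float(match.group(1)) if match else None

nonzero_ratio = grab("ratio of nonzeroes in factor (min/max)", RESULTS_OUT)
flops_ratio = grab("ratio of flops for factor (min/max)", RESULTS_OUT)
io_rate_mb = grab("effective I/O rate (MB/sec) for solve", RESULTS_OUT)

# The closer the two ratios are to 1.0, the more balanced the distributed solve.
print(f"nonzero ratio: {nonzero_ratio}, flops ratio: {flops_ratio}")
print(f"effective I/O rate: {io_rate_mb / 1000:.1f} GB/sec")
```

In practice you would read the whole file with `open("results.out").read()` and grab any other metrics you track the same way.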

I/O Bound and/or Compute Bound Summary

  • I/O Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” are I/O Bound.
      • They become I/O bound almost immediately when GPU Acceleration is enabled.
      • When GPU Acceleration is not enabled, being I/O bound is no longer an issue for compute solving performance. However, solve times are still impacted because available compute power goes unused.
  • Compute Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” would benefit from additional Compute Power.
    • CUBE w12i-k “CUBE #10” would get the most bang for the buck by adding additional compute power.

Upgrade Path Recommendations

CUBE w12i-k “CUBE #10”

  1. I/O: Hard Drives
    1. Remove & replace the previous generation hard drives
      1. 3.5″ SAS2.0 6Gb/s 15k RPM Hard Drives
    2. Hard Drives could be upgraded to Enterprise Class SSD or PCIe NVMe
      1. COST =  HIGH
    3. Hard Drives could be upgraded to SAS 3.0 12 Gb/s Drives
      1. COST =  MEDIUM
  2.  RAM:
    1. Remove and replace the previous generation RAM
    2. Currently all available RAM slots are populated.
      1. The optimum for these two CPU’s is four slots of RAM per CPU; currently eight slots of RAM per CPU are installed.
    3. Current RAM speed: 2133MHz ECC REG DIMM’s
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH
  3. GPU Acceleration
    1. Install a dedicated GPU Accelerator card such as an NVidia Tesla K40 or K80
    2. COST =  HIGH
  4.  CPU:
    1. Remove and replace the current previous generation CPU’s:
    2. Currently installed: dual INTEL XEON e5-2643 V3
    3. Upgrade the CPU’s to the V4 (Broadwell) CPU’s
      1. COST =  HIGH

CUBE w16i-k “CUBE #14”

  1. I/O: Hard Drives (currently 2.5″ SAS3.0 12Gb/s 15k RPM)
    1.  Replace the current 2.5” SAS3 12Gb/s 15k RPM Drives with Enterprise Class SSD’s or PCIe NVMe disk
      1. COST =  HIGH
    2. Replace the 2.5″ SAS3 12 Gb/s hard drives with 3.5″ hard drives.
      1. COST =  HIGH
    3. INTEL 1.6TB P3700 HHHL AIC NVMe
      1. Click Here: https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3700-series.html
  2. Currently a total of four Hard Drives are installed
    1. Increase the existing hard drive count from four to a total of six or eight.
    2. Change RAID configuration to RAID 50
      1. COST =  HIGH
  3. RAM:
    1. Currently using DDR4-2400MHz ECC REG DIMM’s
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH

Considering RAM: When determining how much system RAM you need to perform a six million degree of freedom ANSYS numerical simulation, add the following amounts to the Maximum Amount of RAM Used number indicated in your ANSYS results.out file.

  • ANSYS reserves ~5% of your RAM
  • Office products can use an additional ~10-15% on top of the above number
  • Operating System: please add an additional ~5-10% for the Operating System
  • Other programs? For example, open up your Windows Task Manager and look at how much RAM your anti-virus program is consuming. Add the amount of RAM consumed by these other RAM vampires.
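As a rough sketch of that arithmetic, here is a hypothetical sizing helper. The overhead percentages are the midpoints of the ranges above, and the 2 GB allowance for other programs is an assumption; substitute numbers measured on your own machine.

```python
def required_ram_gb(max_used_gb, ansys_pct=0.05, office_pct=0.125,
                    os_pct=0.075, other_gb=2.0):
    """Rule-of-thumb RAM sizing. The percentages are midpoints of the
    ranges listed above; the 2 GB 'other programs' allowance is an
    assumption, not a measurement."""
    overhead = max_used_gb * (ansys_pct + office_pct + os_pct)
    return max_used_gb + overhead + other_gb

# CUBE w16i-k needed 103 GB for the in-core solve, so size with headroom:
print(f"{required_ram_gb(103):.1f} GB recommended")
```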

Terms & Definition Goodies:

  • Compute Bound
    • A condition that occurs when your solve is limited by raw CPU processing power: the CPU is fully busy calculating and the rest of the system waits on it for results. (When hardware bandwidth cannot feed the CPU enough data and the CPU sits idle, the system is I/O bound instead.)
  • CPU Time For Main Thread
    • CPU time (or process time) is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to, for example, waiting for input/output (I/O) operations or entering low-power (idle) mode.
  • Effective I/O rate (MB/sec) for solve
    • The amount of bandwidth used during the parallel distributed solve, moving data between storage and CPU (input and output totals).
    • For example, the in-core 16 core + GPU solve using the CUBE w16i-k reached an effective I/O rate of 82 GB/s.
    • Theoretical system level bandwidth possible is ~96 GB/s
  • IO Bound
    • A condition in which the input-output capability of the system hardware (the reading, writing and flow of data pulsing through the system) has become inefficient and/or detrimental to running an efficient parallel analysis.
  • Maximum total memory used
    • The maximum amount of memory used during your analysis.
  • Percentage (%) GPU Accelerated The Solve
    • The percentage of the distributed solve’s matrix factorization flops accelerated by the Graphics Processing Unit (GPU). The overall impact of the GPU is diminished when the system bandwidth of your compute hardware is slow or saturated.
  • Ratio of nonzeroes in factor (min/max)
    • A performance indicator of how efficiently and how balanced the solver is performing on your compute hardware. Solver performance is most efficient when this value is as close to 1.0 as possible.
  • Ratio of flops for factor (min/max)
    • A performance indicator of how efficiently and how balanced the solver is performing on your compute hardware. Solver performance is most efficient when this value is as close to 1.0 as possible.
  • Time (cpu & wall) for numeric factor
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When time (cpu & wall) for numeric factor & time (cpu & wall) for numeric solve values are somewhat equal it means that your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Time (cpu & wall) for numeric solve
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When time (cpu & wall) for numeric solve & time (cpu & wall) for numeric factor values are somewhat equal it means that your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Total Speedup w/GPU
    • Total performance gain for compute systems task using a Graphics Processing Unit (GPU).
  • Time Spent Computing Solution
    • The actual clock on the wall time that it took to compute the analysis.
  • Total Elapsed Time
    • The actual clock on the wall time that it took to complete the analysis.
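The factor/solve timing rule of thumb in the definitions above can be expressed as a tiny check. The 25% closeness cutoff is an assumed illustration value, not an ANSYS-documented threshold.

```python
def looks_io_bound(factor_wall_s, solve_wall_s, tolerance=0.25):
    """Flag a solve as likely I/O bound when the numeric factor and numeric
    solve wall times land within `tolerance` of each other. The 25% cutoff
    is an assumed value for illustration, not an ANSYS-documented one."""
    if factor_wall_s <= 0:
        return False
    return abs(factor_wall_s - solve_wall_s) / factor_wall_s <= tolerance

# Healthy run from the tables above: factorization dwarfs the solve phase.
print(looks_io_bound(96.75, 1.99))
# Suspect run: both phases take roughly the same wall time.
print(looks_io_bound(96.75, 90.0))
```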

References:

ANSYS 17.2 FLUENT External Flow Over a Truck Body Polyhedral Mesh

Part 3: The ANSYS FLUENT Performance Comparison Series – CUBE Numerical Simulation Appliances by PADT, Inc.

November 22, 2016

External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)

  • External flow over a truck body using a polyhedral mesh
  • This test case has around 14 million polyhedral cells
  • Uses the Detached Eddy Simulation (DES) model with the segregated implicit solver

ANSYS Benchmark Test Case Information

  • ANSYS HPC Licensing Packs required for this benchmark
    • I used three (3) HPC Packs to unlock all of the cores used during the ANSYS Fluent Test Cases of the CUBE appliances shown on the Figure 1 chart.
    • I did use four (4) HPC Packs for the two 256 core benchmarks shown in the data, but only because I wanted the data for testing.
  • The best average seconds per iteration goes to the 2015 CUBE Intel® Xeon® e5-2667 V3 with a 0.625 seconds-per-iteration time using 128 compute cores.
    • The 2015 CUBE Intel® Xeon® e5-2667 V3 outperformed the 256 core AMD Opteron™ series ANSYS Fluent 17.2 benchmarks.
    • Please note that different numbers of CUBE Compute Nodes were used in this test. However, straight-across CPU times are also shown for single nodes at 64 cores.
  • To illustrate how this ANSYS Fluent test case relates to the real world: a completely new ANSYS HPC customer is likely to have up to two (2) of the entry level INTEL CUBE Compute Nodes versus an eight (8) CUBE compute node configuration.
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM Designed, tested and configured within your budget. We are happy to help and to listen to your specific needs.

Figure 1 – ANSYS 17.2 FLUENT Test Case Graph

ANSYS FLUENT External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) Test Case
Number of cells 14,000,000
Cell type polyhedral
Models DES turbulence
Solver segregated implicit

The CPU Information

The AMD Opteron™ 6000 Series Platform:

Yes, I am still impressed with the performance, day after day, 24×7, of these AMD Opteron™ CPU’s! After years of operation the AMD Opteron™ series of processors are still relevant and powerful numerical simulation processors. Heavy sigh… For example, after reviewing the ANSYS Fluent Test Case data below, you can see for yourselves that the 2012 AMD Opteron™ and 2013 AMD Opteron™ CPU’s can still hang in there with the INTEL XEON CPU’s. However, one INTEL CPU node vs. four AMD CPU nodes?

I thought a more realistic test case scenario would be to drop the number of AMD Compute Nodes down to four. Indeed, I could have thrown more CUBE Compute Nodes with the AMD Opteron™ series CPU’s inside of them at the problem. That is why you can see one 256 core benchmark score where I put all 64 cores on each node to the test. As one would hopefully see on their own hardware, unleashing ANSYS Fluent with 256 cores did drop the iteration solve time for the test case on the CUBE Compute Appliances.

Realistically a brand new ANSYS HPC customer is not likely to have:

a) Vast quantities of cores (AMD or INTEL) & compute nodes for optimal distributed numerical solving

b) ANSYS HPC licensing for 512 cores

c) The available circuit breakers to provide power

The Intel® Xeon® CPU’s used for this ANSYS Fluent Test Case

  1. Intel® Xeon® Processor E5-2690 v4  (35M Cache, 2.60 GHz)
  2. Intel® Xeon® Processor E5-2667 v4  (25M Cache, 3.20 GHz)
  3. Intel® Xeon® Processor E5-2667 v3  (20M Cache, 3.20 GHz)
  4. Intel® Xeon® Processor E5-2667 v2  (25M Cache, 3.30 GHz)

The Estimated Wattage?

No, the lights did not dim… but here is a quick comparison of energy use. The estimated maximum watts used metric shows up in volume (decibels) and in dollars ($$$) saved or spent.

Less & More!

Overall, the drop in estimated average watts consumed per CUBE Compute Node shows real progress over the past four years!

  • 2012 CUBE AMD Numerical Simulation Appliance with the Opteron™ 6278 – Four (4) Compute Nodes
    • Estimated CUBE Configuration @ Full Power: ~8000 Watts
  • 2013 CUBE AMD Numerical Simulation Appliance with the Opteron™ 6380
    • Estimated CUBE Configuration @ Full Power: ~7000 Watts
  • 2015 CUBE Numerical Simulation Appliance with the  Intel® Xeon® e5-2667 V3 – Eight (8) Compute Nodes
    • Estimated CUBE Configuration @ Full Power: ~4000 Watts
  • 2016 CUBE Numerical Simulation Appliance with the Intel® Xeon® e5-2667 V4 – One (1) Compute Node.
    • Estimated CUBE Configuration @ Full Power:  ~900 Watts
  • 2016 CUBE Numerical Simulation Appliance with the Intel® Xeon® e5-2690 V4 – Two (2) Compute Nodes
    • Estimated CUBE Configuration @ Full Power:  ~1200 Watts

Figure 2 – Estimated CUBE compute node power consumption as configured for this ANSYS FLUENT Test Case.

Power consumption means money
CUBE HPC Compute Node Power Consumption as configured

The CUBE phenomenon

2012 CUBE AMD Opteron™ 6278 – Four (4) Compute Node CUBE HPC Appliance
  • 4 x 16c @2.4GHz/ea
  • Quad Socket motherboard
  • DDR3-1866 MHz ECC REG
  • 5 x 600GB SAS2 15k RPM
  • 40Gbps Infiniband QDR High Speed Interconnect

2013 CUBE AMD Opteron™ 6380 – Four (4) Compute Node CUBE HPC Appliance
  • 4 x 16c @2.5GHz/ea
  • Quad Socket motherboard
  • DDR3-1866 MHz ECC REG
  • 3 x 600GB SAS2 15k RPM
  • 40Gbps Infiniband QDR High Speed Interconnect

2014 CUBE Intel® Xeon® e5-2667 V2 – 1 x CUBE HPC Workstation
  • 2 x 8c @3.3GHz/ea
  • Dual Socket motherboard
  • DDR3-1866 MHz ECC REG
  • 3 x 600GB SAS2 15k RPM

2015 CUBE Intel® Xeon® e5-2667 V3 – Eight (8) Compute Node CUBE HPC Appliance
  • 2 x 8c @3.2GHz/ea
  • Dual Socket motherboard
  • DDR4-2133 MHz ECC REG
  • 4 x 600GB SAS3 15k RPM

2016 CUBE Intel® Xeon® e5-2667 V4 – 1 x CUBE HPC Workstation
  • 2 x 8c @3.2GHz/ea
  • Dual Socket motherboard
  • DDR4-2400 MHz LRDIMM
  • 6 x 600GB SAS3 15k RPM

2016 CUBE Intel® Xeon® e5-2690 V4 – 1 x 1U CUBE APPLIANCE (2 Compute Nodes)
  • 2 x 14c @2.6GHz/ea
  • Dual Socket motherboard
  • DDR4-2400 MHz LRDIMM
  • 4 x 600GB SAS3 15k RPM – RAID 10
  • 56Gbps Infiniband FDR High Speed Interconnect
  • 10Gbps Ethernet Low Latency

Operating Systems Used

  1. Linux 64-bit
  2. Windows 7 Professional 64-Bit
  3. Windows 10 Professional 64-Bit
  4. Windows Server 2012 R2 Standard Edition w/HPC

It Is All About The Data

Test Metric – Average Seconds Per Iteration

  • Fastest Time: 0.625 seconds per iteration – 2015 CUBE Intel® Xeon® e5-2667 V3
  • ANSYS FLUENT 17.2
Cores | 2014 CUBE Intel® Xeon® e5-2667 V2 (1 x Node) | 2015 CUBE Intel® Xeon® e5-2667 V3 (8 x Nodes) | 2016 CUBE Intel® Xeon® e5-2667 V4 (1 x Node) | 2016 CUBE Intel® Xeon® e5-2690 V4 (2 x Nodes) | 2012 AMD Opteron™ 6278 (4 x Nodes) | 2013 CUBE AMD Opteron™ 6380 (4 x Nodes)

1 100.6 65.8 32.154 40.44 120.035 90.567
2 40.337 32.024 17.149 35.355 63.813 46.385
4 20.171 16.975 11.915 19.735 32.544 23.956
6 13.904 12.363 9.311 13.76 21.805 17.147
8 10.605 9.4 7.696 11.121 16.783 13.158
12 7.569 6.913 6.764 8.424 11.59 10.2
16 6.187 4.286 6.388 7.363 8.96 7.94
32 2.539 4.082 6.033 4.75
48 2.778 4.126 3.835
52 2.609 3.161 4.784
55 2.531 3.003 4.462
56 2.681 3.025 4.368
*64 3.871 5.004
64 2.688 2.746
96 2.433 2.202
128 0.625 2.112 2.367
256 1.461 3.531

* One (1) CUBE Compute Node with  4 x AMD Opteron™ Series CPU’s for a total of 64 cores was used to derive these two ANSYS Fluent Benchmark data points (Baseline).
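One way to read the seconds-per-iteration table above is to convert it into speedup and parallel efficiency. Below is a quick sketch using a subset of the 2015 CUBE Intel® Xeon® e5-2667 V3 column; it is illustrative arithmetic only, not additional benchmark data.

```python
# Average seconds per iteration for the 2015 CUBE Intel Xeon e5-2667 V3
# column of the table above (cores -> seconds per iteration).
sec_per_iter = {1: 65.8, 2: 32.024, 4: 16.975, 8: 9.4,
                16: 4.286, 32: 2.539, 128: 0.625}

baseline = sec_per_iter[1]
for cores, seconds in sec_per_iter.items():
    speedup = baseline / seconds
    efficiency = speedup / cores  # 1.0 would be perfect linear scaling
    print(f"{cores:>4} cores: speedup {speedup:6.1f}x, efficiency {efficiency:.0%}")
```

Efficiency well below 100% at the high core counts is exactly the I/O and interconnect drag discussed throughout this post.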

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

http://www.cube-hvpc.com/

Related Blog Posts

ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

Part 2: ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

ANSYS 17.2 CFX Benchmark External Flow Over a LeMans Car

Wow, yet another ANSYS benchmarking blog post? I know, but I have had four blog posts in limbo for months. There is no better time than now, and since it is Friday, time to knock out another one of these fine-looking ANSYS 17.2 benchmarking results off my list!

The ANSYS 17.2 CFX External Flow Over a LeMans Car Test Case

…dun dun dah!

On The Fast Track! ANSYS 17.2

The ANSYS CFX test case has approximately 1.8 million nodes

  • 10 million elements, all tetrahedral
  • Solves compressible fluid flow with heat transfer using the k-epsilon turbulence model.

ANSYS Benchmark Test Case Information

  • ANSYS HPC Licensing Packs required for this benchmark
    • I used (3) HPC Packs to unlock all 56 cores of the CUBE a56i.
    • The fastest solve time goes to the CUBE a56i – Boom!
      • From start to finish a total of forty-six (46) ticks on the clock on the wall occurred.
      • A total of fifty-five (55) cores in use between two twenty-eight (28) core nodes.
      • Windows 2012 R2 Standard Edition w/HPC update 3
      • MS-MPI v7.1
      • ANSYS CFX 17.2
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM Designed, tested and configured within your budget. We are happy to help and to listen to your specific needs.

Figure 1 – ANSYS CFX benchmark data for the tetrahedral, 10 million elements External Flow Over a LeMans Car Test Case

ANSYS CFX Benchmark Data

ANSYS CFX Test Case Details – Click Here for more information on this benchmark

External Flow Over a LeMans Car
Number of nodes 1,864,025
Element type Tetrahedral
Models k-epsilon Turbulence, Heat Transfer
Solver Coupled Implicit

The CPU Information

The benchmark data is derived from running the ANSYS CFX External Flow Over a LeMans Car test case. Take a minute or three to look at how these CPU’s perform with the very latest ANSYS releases, ANSYS Release 17.1 & ANSYS Release 17.2.

Wall Clock Time!

I have focused and tuned numerical simulation machines with an eye on wall clock time for years now. What is funny is that, if you ask Eric Miller, we were talking about wall clock times just this morning.

What is wall clock time? Simply put –> How does the solve time FEEL to the engineer… yes, I just equated a feeling to a non-human event. Ah yes, to feel… oh, and I was reminded of an old Van Halen song where David Lee Roth says:

“Oh man, I think the clock is slow.

  I don’t feel tardy.

Class Dismissed!”

The CUBE phenomenon

CUBE a56i Appliance – Windows 2012 R2 Standard w/HPC
1U CUBE APPLIANCE (2 x 28)
4 x 14c @2.6GHz/ea – Intel® Xeon® e5-2690 V4
Dual Socket motherboard
256GB DDR4-2400 MHz LRDIMM
4 x 600GB SAS3 15k RPM
56Gbps Infiniband FDR CPU High Speed Interconnect
10Gbps Ethernet Low Latency
CUBE w32i Workstation – Windows 10 Professional
1 x 4U CUBE APPLIANCE
2 x 16c @2.6GHz/ea – Intel® Xeon® e5-2697a V4
Dual Socket motherboard
256GB DDR4-2400 MHz LRDIMM
2 x 600GB SAS3 15k RPM
NVIDIA QUADRO M4000

It Is All About The Data

 11/17/2016

PADT, Inc. – Tempe, AZ

Total wall clock time (seconds)
Cores | CUBE w32i (ANSYS CFX 17.1) | CUBE a56i (ANSYS CFX 17.1) | CUBE a56i (ANSYS CFX 17.2)
2 555 636 609
4 304 332 332
8 153 191 191
16 105 120 120
24 78 84 84
32 73 68 68
38 0 61 59
42 0 55 55
48 0 51 51
52 0 52 48
55 0 47 46
56 0 52 51

Picture Sharing Time!

Check out the pictures below of the Microsoft Server 2012 R2  HPC Cluster Manager.

I used the Windows Server 2012 R2  on both of the two compute nodes that make up the CUBE a56i.

Microsoft 2012 R2 w/HPC – is very quick, and oh so very powerful!


Windows 2012 HPC
Microsoft Windows 2012 R2 HPC. It is time…
INTEL XEON e5-2690 v4
The INTEL XEON e5-2690 v4 loves the turbo mode vrrooom It is time…

Please be safe out there in the wilds, you are all dismissed for the weekend!

ANSYS 17.1 FEA Benchmarks using v17-sp5

The CUBE machines that I used in this ANSYS Test Case represent a fine balance based on price, performance and ANSYS HPC licenses used.

Click Here for more information on the engineering simulation workstations and clusters designed in-house at PADT, Inc.. PADT, Inc. is happy to be a premier re-seller and dealer of Supermicro hardware.

  • ANSYS Benchmark Test Case Information.
  • ANSYS HPC Licensing Packs required for this benchmark
    • I used (2) HPC Packs to unlock all 32 cores.
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM Designed, tested and configured within your budget. We are happy to help and to  listen to your specific needs.

Figure 1 – ANSYS benchmark data from three excellent machines.

CUBE
CUBE by PADT, Inc. ANSYS Release 17.1 FEA Benchmark

BGA (V17sp-5)

BGA (V17sp-5)
Analysis Type Static Nonlinear Structural
Number of Degrees of Freedom 6,000,000
Equation Solver Sparse
Matrix Symmetric

Click Here for more information on the ANSYS Mechanical test cases. The ANSYS website has great information pertaining to the benchmarks that I am looking into today.

Pro Tip –> Lastly, please check out this article by Greg Corke, one of my friends at ANSYS, Inc. I am using the ANSYS benchmark data from the Lenovo Thinkstation P910 as a baseline for my benchmark data. Enjoy Greg’s article here!

  • The CPU Information

The benchmark data is derived from running the BGA (sp-5) ANSYS test case. Take a look at these CPU’s and how they perform with one of the very latest ANSYS releases, ANSYS Release 17.1.

  1.  Intel® Xeon® e5-2680 V4
  2.  Intel® Xeon® e5-2667 V4
  3.  Intel® Xeon® e5-2697a V4
  • It Is All About The Data
    • Only one workstation was used for the data in this ANSYS Test Case
    • No GPU Accelerator cards are used for the data
    • Solution solve times are in seconds
ANSYS 17.1 Benchmark BGA v17sp-5
Cores | Lenovo ThinkStation P910 e5-2680 V4 (Customer X – 28 Core @2.4GHz/ea) | CUBE w16i e5-2667 V4 | CUBE w32i e5-2697A V4 | Speedup (x)
2 1016 380.9 989.6 1.03
4 626 229.6 551.1 1.14
8 461 168.7 386.6 1.19
12 323 160.7 250.5 1.29
16 265 161.7 203.3 1.30
20 261 0 176.9 1.48
24 246 0 158.1 1.56
28 327 0 151.8 2.15
31 0 0 145.2 2.25
32 0 0 161.7 2.02
15-Nov-16 PADT, Inc. – Tempe, AZ –
  • Cube w16i Workstation – Windows 10 Professional
    1 x 4U CUBE APPLIANCE
    2 x 8c @3.2GHz/ea
    Dual Socket motherboard
    256GB DDR4-2400 MHz LRDIMM
    6 x 600GB SAS3 15k RPM
    NVIDIA QUADRO K6000
  • CUBE w32i Workstation – Windows 10 Professional
    1 x 4U CUBE APPLIANCE
    2 x 16c @2.6GHz/ea
    Dual Socket motherboard
    256GB DDR4-2400 MHz LRDIMM
    2 x 600GB SAS3 15k RPM
    NVIDIA QUADRO M4000
  • Lenovo Thinkstation P910 Workstation – Windows 10 Professional
    Lenovo P910 Workstation
    2 x 14c @2.4GHz/ea
    Dual Socket motherboard
    128GB DDR4-2400 MHz
    512GB NVMe SSD / 2 x 4TB SATA HDD / 512GB SATA SSD
    NVIDIA QUADRO M2000

As you may have noticed above, the CUBE workstation with the Intel Xeon e5-2697A V4 had the fastest solution solve time for a single workstation.

  • *** Using 31 cores the CUBE w32i finished the sp-5 test case in 145.2 seconds.

See 32 Cores of Power! CUBE by PADT, Inc.


CUBE w32i

CUBE by PADT, Inc. of ANSYS 17.1 Benchmark Data for sp-5

Thank you!

http://www.cube-hvpc.com/

Just One CUBE With Just One Click! A 1.3x Speedup For ANSYS® Mechanical™

Greetings from the HPC numerical simulation proving grounds of PADT, Inc. in Tempe, Arizona. While benchmarking the very latest version of ANSYS® Mechanical™, I learned something very significant, and I need to share this information with you right now. As I gazed at the data outputs from the new solve.out files, I began to notice something. Yes, change indeed; something was different, something had changed.

A brief pause for emphasis: in regard to overall ANSYS® productivity and its amazing improvements, please read this post.

However, pertaining to this blog post, I am focusing on one HPC performance metric that is very important to me. It is one of the many HPC performance metrics that I have used when creating a balanced HPC server for engineering simulation. But wait, there is more! Please wait just a little bit longer, for very soon I will post even more juicy pieces of data garnered from these new ANSYS® benchmark solver files.

To recap, in all of its bullet points & glory:

  • For today and just for today, we are focusing on just one of the performance metrics.
    • The Time Spent Computing The Solution!
  • This 1.3x speedup in solve times was achieved using just one CUBE workstation and with just one click!
    • Open ANSYS® and, while you are creating your solve:
    • Select, with just one click, either the INTEL MPI or IBM Platform MPI.
    • Next, run your test; repeat as necessary using whichever MPI version you did not start your test with.

The ANSYS® Mechanical™ Benchmark Description:

  • V15sp-5
    • Sparse solver, symmetric matrix, 6000k DOFs, transient, nonlinear, structural analysis with 1 iteration
    • GPU Accelerator or Co-Processor enabled for: NVIDIA and Intel Phi
    • A large sized job for direct solvers, should run incore on machines with 128 GB or more of memory, good test of processor flop speed if running incore and I/O if running out-of-core

CUBE ANSYS Numerical Simulation Appliance Used:

The ANSYS® Mechanical™ Benchmark Results:


Time Spent Computing The Solution (seconds)
Cores | 2016 CUBE w16i-v4 (IBM Platform MPI) | 2016 CUBE w16i-v4 (INTEL MPI) | This speedup is …x faster!
2 396.1 380.9 1.04
4 239.7 229.6 1.04
6 210.1 196.7 1.07
8 182.9 168.7 1.08
10 167.2 161.4 1.04
12 167.1 160.7 1.04
14 196.1 151.3 1.30
16 184.7 161.7 1.14
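The speedup column in the table above is simply the IBM Platform MPI time divided by the INTEL MPI time at each core count. A quick sketch recomputing it from the table data:

```python
# Time Spent Computing The Solution (seconds) from the table above.
ibm_mpi   = {2: 396.1, 4: 239.7, 6: 210.1, 8: 182.9,
             10: 167.2, 12: 167.1, 14: 196.1, 16: 184.7}
intel_mpi = {2: 380.9, 4: 229.6, 6: 196.7, 8: 168.7,
             10: 161.4, 12: 160.7, 14: 151.3, 16: 161.7}

for cores in ibm_mpi:
    print(f"{cores:>2} cores: {ibm_mpi[cores] / intel_mpi[cores]:.2f}x")

# The headline 1.3x speedup comes from the 14-core run.
best = max(ibm_mpi, key=lambda c: ibm_mpi[c] / intel_mpi[c])
print(f"Largest gain at {best} cores: {ibm_mpi[best] / intel_mpi[best]:.2f}x")
```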


Wow! Using these latest 14nm INTEL® XEON® CPU’s, phew, I have been forever changed! As you can see from the data above, with just one simple click, changing from the IBM Platform MPI to the INTEL MPI makes the benchmark’s time spent computing faster! A 1.3x speedup!

In this specific benchmark example with the latest ANSYS® Mechanical™, achieving a 1.3x speedup without spending another penny is very wise and not so foolish.

Disclaimer: Please check with your ANSYS Software Sales Representative for the very latest on solver updates and information, because some of the models and their compatibility can vary. You may need to use MS-MPI, INTEL-MPI or IBM Platform MPI for your distributed solving. If you are not sure, please contact the local ANSYS® Corporate Software Sales office or ANSYS® Software Channel Partner that was assigned specifically to you and/or your company.

References:

http://www.ansys.com/Solutions/Solutions-by-Role/IT-Professionals/Platform-Support/Benchmarks-Overview/ANSYS-Mechanical-Benchmarks

How To Install And Configure xRDP and Same Session xRDP on CentOS 6.7 / RHEL 6.7

Introduction

What is xRDP? Taken directly from the xRDP website:

“Xrdp is the main server accepting connections from RDP clients. Xrdp contains the RDP, security, MCS, ISO, and TCP layers, a simple window manager and a few controls. Its a multi threaded single process server. It is in this process were the central management of the sessions are maintained. Central management includes shadowing a session and administrating pop ups to users. Xrdp is control by the configuration file xrdp.ini.

RDP has 3 security levels between the RDP server and RDP client. Low, medium and high. Low is 40 bit, data from the client to server is encrypted, medium is 40 bit encryption both ways and high is 128 bit encryption both ways. Xrdp currently supports all 3 encryption levels via the xrdp.ini file. RSA key exchange is used with both client and server randoms to establish the RC4 keys before the client connect.

Modules are loaded at runtime to provide the real functionality. Many different modules can be created to present one of many different desktops to the user. The modules are loadable to conserve memory and support both GPL and non GPL modules.

Multi threaded to provide optimal user performance. One client can’t slow them all down. One multi threaded process is also required for session shadowing with any module. The module doesn’t have to consider shadowing, the xrdp server does it. For example, you could shadow a VNC, RDP or a custom module session all from the same shadowing tool.

Build in window manager for sending pop ups to any user running any module. Also can be user to provide connection errors or prompts.

libvnc

Libvnc, a VNC module for xrdp. Libvnc provides a connection to VNC servers. Its a simple client only supporting a few VNC encodings(raw, cursor, copyrect). Emphasis on being small and fast. Normally, the xrdp server and the Xvnc server are the same machine so bitmap compression encodings would only slow down the session.

librdp

Librdp, an RDP module for xrdp. Librdp provides a connection to RDP servers. It only supports RDP4 connections currently.

sesman

Sesman, the session manager. Sesman is xrdp’s session manager. Xrdp connect to sesman to verify the user name / password, and also starts the user session if credentials are ok. This is a multi process / Linux only session manager. Sessions can be started or viewed from the command line via sesrun.”

STEP 1 – Setup xRDP Same on your CUBE Linux Compute Server:

  1. Add the following repository for the needed extra packages for enterprise linux
    • rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
      I am using the platform CentOS release 6.7 – 64 Bit for this installation of xRDP
  2. Install xRDP
    • yum install xrdp tigervnc-server -y
  3. Start xRDP
    • service xrdp start
  4. Enter the following commands to ensure that the xRDP services restart on a reboot
    • chkconfig xrdp on
    • chkconfig vncserver on
  5. Add the ANSYS linux users into the following groups:
    • users & video
  6. Now try it out!

mstsc

STEP 2.0.0 (optional) – How To Setup xRDP Same Session Remote Desktop on your CUBE Linux Compute Server:

2.0.1) Login as root via SSH

2.0.2) cd /etc/xrdp/

2.0.3) As the root user, open and edit the xrdp.ini file. For same session sharing, locate and modify the last line of the xrdp.ini configuration file.

  • Change from port=-1 to port=ask-1

vi /etc/xrdp/xrdp.ini
[globals]
bitmap_cache=yes
bitmap_compression=yes
port=3389
crypt_level=high
channel_code=1
max_bpp=24
#black=000000
#grey=d6d3ce
#dark_grey=808080
#blue=08246b
#dark_blue=08246b
#white=ffffff
#red=ff0000
#green=00ff00
#background=626c72
[xrdp1]
name=sesman-Xvnc
lib=libvnc.so
username=ask
password=ask
ip=127.0.0.1
port=ask-1

xrdp-1

2.0.4) Save the xrdp.ini and restart the xrdp service (command is below)

  • service xrdp restart

2.0.5) Next, for you local or distributed MPI users, edit the following file

  • cd /etc/pam.d/
  • edit the file xrdp-sesman
  • add –> session required pam_limits.so

2.0.6) For users of xRDP same session management.

  • cd /etc/pam.d/
  • edit the common-session file
  • add –> session required pam_limits.so
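Steps 2.0.5 and 2.0.6 both add the same pam_limits line, so here is a small idempotent sketch (the ensure_pam_limits name is my own invention) that will not duplicate the line if it is already present:

```shell
# ensure_pam_limits: append "session required pam_limits.so" to a PAM
# config file only if the line is not already present (safe to re-run).
ensure_pam_limits() {
  pam_file="$1"
  grep -qF 'pam_limits.so' "$pam_file" || \
    echo 'session    required     pam_limits.so' >> "$pam_file"
}

# Usage (as root), matching steps 2.0.5 and 2.0.6:
#   ensure_pam_limits /etc/pam.d/xrdp-sesman
#   ensure_pam_limits /etc/pam.d/common-session
```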

STEP 2.1.0 – Open the Microsoft Remote Desktop client on your Windows Machine.

  • Try logging in from two machines or two sessions of Microsoft Remote Desktop
  • Enter the hostname or IP address of your CUBE Linux Compute Server
  • Login

(see screen capture below)

xrdp-login-presentation-graphic

STEP 2.2.0 – Pay Attention to a few things while logging in.

  • For you, the originator of the RDP desktop session:
  • As you are logging into the Linux machine, note the port number used for your login as the login window script executes.
  • PORT 5910

(see screen capture below)

xrdp-port-num

  •  Login! The new xRDP console session has been created on the Linux machine.
    • This session is the remote desktop session that you created so that you can share the same desktop with another user.

STEP 2.3.0 – Login process for you the secondary RDP user:

  • As you begin the remote desktop login process, enter the port that was created for the primary user. Our primary user noted and informed you that the port number for his RDP session was 5910.
  • Enter this port number into your session window when entering your login information via RDP:

(see screen capture below)

xrdp-port-num-5910

  •  Click OK to login to the desktop
  • Success! You are now both logged into the same RDP session. Both users will see the same screen, and the cursor can be controlled by either user.

(see screen capture below)

xrdp-same-session-login

Final Thoughts pertaining to xRDP/remote desktop connections and screen sharing on 64-bit Linux.

Other/secondary users who do not need to log in to an already running/existing remote desktop session: do not enter a port number, leave the port setting as -1. Logging in this way ensures you will have a unique, NONSHARED remote desktop experience.

If you are the primary user or originator of the xRDP session, do not use SYSTEM –> LOGOUT to close the RDP session. Simply minimize the session or click the X to close your window.

(see screen capture below)

click-the-x

Are you aware that ANSYS recently released ANSYS 17.0? Well, for you ANSYS CFD users, check out the beautiful ANSYS FLUENT 17.0 GUI for Linux. If you look closely at the screen capture you will notice that I was running one of the ANSYS Fluent benchmarks.

The External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m), an ANSYS FLUENT benchmark.

(see screen capture below)

fluent-17.0-picture4

References/Notes/Performance Tuning for xRDP:

xRDP website – xRDP

Nvidia’s website reference: NVIDIA Graphics Cards: NVIDIA How To

Performance Tuning:

Verify that you have the latest NVIDIA graphics card driver, and/or troubleshoot OpenGL issues:

  1. Not sure what Nvidia graphics card you have?
    1. Try running this command –> lspci -k | grep -A 2 -E "(VGA|3D)"
  2. If you already have the Nvidia graphics card driver installed but you are unsure what driver version is currently installed.
    1. Try running this command –> nvidia-smi
  3. Direct rendering –> Yes or No
    1. glxinfo|head -n 25
    2. glxinfo | grep OpenGL

Uh Oh! If the output of these commands look something like what you see below:

$ glxinfo|head -n 25
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Error: couldn't find RGB GLX visual or fbconfig
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
Xlib: extension "GLX" missing on display ":11.0".
name of display: :11.0

Then please add this information to the end of the xorg.conf file and reboot the server.

  • The xorg.conf file is located in: /etc/X11
  • Section "Module"
    Load "extmod"
    Load "dbe"
    Load "type1"
    Load "freetype"
    Load "glx"
    EndSection
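If you want to script the check itself, here is a minimal sketch (the glx_check name and messages are my own) that scans glxinfo output for the Xlib error shown above:

```shell
# glx_check: read glxinfo output on stdin and report whether GLX is usable.
# Prints "FAIL: GLX missing" and returns nonzero if the Xlib error appears,
# otherwise prints "OK".
glx_check() {
  if grep -q 'extension "GLX" missing'; then
    echo "FAIL: GLX missing"
    return 1
  fi
  echo "OK"
}

# Usage on a live session:
#   glxinfo 2>&1 | glx_check
```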

Other Features of the NVIDIA Installer

Without options, the .run file executes the installer after unpacking it. The installer can be run as a separate step in the process, or can be run at a later time to get updates, etc. Some of the more important command-line options of nvidia-installer are:

nvidia-installer options

--uninstall
During installation, the installer will make backups of any conflicting files and record the installation of new files. The uninstall option undoes an install, restoring the system to its pre-install state.

--latest
Connect to NVIDIA’s FTP site, and report the latest driver version and the URL to the latest driver file.

--update
Connect to NVIDIA’s FTP site, download the most recent driver file, and install it.

--ui=none
The installer uses an ncurses-based user interface if it is able to locate the correct ncurses library. Otherwise, it will fall back to a simple command-line user interface. This option disables the use of the ncurses library.

This xRDP how-to installation was performed on CentOS release 6.7 (Final) Linux using PADT, Inc. – CUBE engineering simulation compute servers.

The very latest install guide for PuTTY and Xming is here!

cubelogo-2014

This how-to describes how to install PuTTY and Xming and then hook the two together to provide you, the end user, with an X Window System display server, a set of traditional sample X applications and tools, and a set of fonts. These two products will help to eliminate many of your frustrations! Xming features support for several languages that many of our ANSYS analysts use here at PADT, Inc. We truly enjoy and use these two products. One reason why you should be interested: by combining Xming and PuTTY for use in numerical simulation, the Mesa 3D, OpenGL, and GLX 3D graphics extension capabilities work amazingly well! Kudos to the programmers, we love you!

Program references:

Xming

PuTTY

Server: CUBE Linux 64-bit Server
Client: Windows 7 Professional 64-bit

Step 1 – Install PuTTY first (accept defaults)

Step 2 – Install Xming (accept defaults)
  • Download and install the Xming program and fonts files:
    • Program:
    • Fonts:

Double-check that the Normal PuTTY Link with SSH client option is checked

xming1

Step 3 – After the program has completed installation.

Step 4 – Install the Xming fonts that you had downloaded earlier.

Verify that Xming has been started. You will notice a new running task inside of your task bar. If you hover over the X icon in your taskbar, it should say something like “Xming Server:0.0”

Now let us hook them together. It is X and PUTTY time!

Step 5 – Open your PUTTY application.

xming2

  • Enter the hostname or IP address.
  • Enter in a Session name:
  • On the left side bar within PUTTY. Locate –> Connection  and then expand out –> SSH –> X11
    o Check –> Enable X11 forwarding

xming3

Save the new session –> look to the left panel of your PuTTY program (you may need to scroll up a little bit).

Click on the text –> Session and then Save the new session.

xming2

 

Yay! Now open your newly saved session and log in to a CUBE Linux server to test and verify.

I always forget to tell people this TIP, but for multi-display setups: start Xming in -multiwindow mode.

How? from Command Prompt (the Windows cmd console) or create a desktop shortcut.

"C:\Program Files\Xming\Xming.exe" -multiwindow -clipboard

Have a Happy Valentines Day Weekend and do not forget to show the penguin some love too. This penguin looks lonely and maybe needs a date?

penguin_sh

Keyless SSH in two easy steps. Wait, What?!

cubelogo-2014

Within the ANSYS community, and more specifically with regard to the various numerical simulation techniques ANSYS users employ to solve their problems, one of the most powerful approaches to overcoming the limitations of new, complex problems is to take multiple CPUs and link them together over a distributed network of computers. Unpacking this further for you, the reader: one critical piece of parallel processing is using a quality high performance message passing interface (MPI). The latest IBM® Platform™ MPI version 9.1.3 Fix Pack 1 is provided for you with your release of ANSYS 17.0.

When solving a model using distributed parallel algorithms, the protocol used to authenticate your credentials and make the login process seamless is known as Secure Shell, or SSH. SSH is a cryptographic (encrypted) network protocol that allows remote login and other network services to operate securely over an unsecured network.

Today let us take the mystery and hocus pocus out of setting up your keyless (passwordless) SSH keys. As you will see, this is a very easy process to complete.

I begin my voyage into keyless freedom by first logging into one of our CUBE Linux servers.

ssh-1

STEP 1 – Create the key

  • Type ssh-keygen -t rsa
  • Press the enter key three times
    • (In some instances, as shown in the screen capture below, you may see a prompt asking you to overwrite. In that case type y)

ssh-2

STEP 2 – Apply the key

  • Type ssh-copy-id -i ~/.ssh/id_rsa.pub mastel@cs0.padtinc.com
  • Type ssh-copy-id -i ~/.ssh/id_rsa.pub mastel@cs1.padtinc.com
  • Enter the current password that you would use to log in to cs1.padtinc.com

ssh-4

All Done!

Now give it a try and verify.
Log in to the first server you set up; in my case, CS0.
At the terminal command prompt, type ssh cs1

ssh-5

BEST PRACTICE TIP:

I find it is best practice to also repeat the ssh-copy-id command at the same time on each of the servers, using the short hostname.

That command would look like:
1. After you have completed Step 2 above, perform the same command using the short hostname.
a. ssh-copy-id -i ~/.ssh/id_rsa.pub mastel@cs0
b. Enter your password and press enter.

Done, Done!
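If you manage several compute servers, the ssh-copy-id calls (FQDN plus short name) can be wrapped in a loop. This is a sketch: the copy_key helper and its DRY_RUN switch are my own additions, while the hostnames and user are the examples used above.

```shell
# copy_key: run ssh-copy-id for one user@host target.
# Set DRY_RUN=1 to print the command instead of executing it.
copy_key() {
  cmd="ssh-copy-id -i $HOME/.ssh/id_rsa.pub $1"
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
}

# Usage -- copy the key under both the FQDN and the short name:
#   for host in cs0.padtinc.com cs0 cs1.padtinc.com cs1; do
#     copy_key "mastel@$host"
#   done
```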

Additional Links:

Making Old Desks New at PADT

  • whiteboard-desks-icon1It has been a long time since I have written any articles. I thought that to get me back into the flow of writing I would share a recent fun project I completed at work, where I was able to reuse and re-purpose abandoned 20-year-old office desks. The issue started out as a frustration related to note taking, and I wanted something better. What is my frustration, and how did it start? It started with a simple pet peeve of my own: I do not like using paper to jot down quick ideas, thoughts, or to-dos! I write numerous quick notes during my day at work.

    Some examples of my daily office dilemma:

  • Rapid fire phone calls that can bounce my phone off the desk.
  • I just have to jot something down, less than a single sentence.
  • A conference call occurs and I need to capture a couple of quick thoughts, because I am such a great active listener and don’t want to interrupt.
  • Even sketching out a quick design for a new CUBE HPC cluster or workstation.

My whys may not be your whys, but I feel like it is a waste of time & resources! You might too, especially when the thoughts go something like this.

Should I:

  • Use a new piece of paper to write quick notes on? Nope
  • Find the special square colored sticky things? Nope
  • Dig through the paper recycling bin and get strange looks from my co-workers? Nope
  • Cut my own square colored sticky note things? Nope
  • I can’t seem to find a pen, open a brand new box of pens? Nope
  • Take my notes on the electronic device of my choosing? Okay, which one: phone, laptop, and/or tablet, or how about that conference room computer? Then I end up having quick notes and scribbles EVERYWHERE!
  • Sigh…

I hope those points made you laugh and frame a picture showing that I was not in my comfort zone. I knew what I wanted; I had used the same note-taking process for years. Probably every day I would use my two whiteboards to write quick notes on. Whiteboards worked for me, I loved my whiteboards, and life was good. What happened, and where the frustration occurred, was that I had four office desk moves over a time span of a year at PADT, Inc. Guess what: the new office areas did not have whiteboards in them!

Here is a picture of a bunch of abandoned desks here at PADT, Inc. I walk past desks like these every day. Then, during the office moves, a thought occurred to me: maybe I could paste or mat a whiteboard-type surface onto them and make a whiteboard-type desk?

whiteboard-desks-01I figured that someone had already thought of this idea, and I remembered a business trip that I took to California this past year. I remember walking through the inside of a startup lab office building. You could feel the venture capital money pulsing through the office walls. This office building environment was sophisticated and exciting. What did I notice? I am sure you can think of some good examples. Haha, but what I found fascinating was groups of people collaborating with dry-erase markers in hand and notes scribbled over entire sections of walls. On huge conference room tables I even saw that large sections of glass were used. Boom! I had my solution.

I did my research and this is what I used.

The primer & the solution:

The cost:

  • About $50 and a few hours of time
    • One package of the dry-erase solution can do about 3-4 coats for a 30 sq ft area, or about two thick coats on two desks.

The steps:

  1. Lightly sand the top until smooth.
  2. Clean the top of the desk.
  3. Mask the ends of the table
  4. Apply coat of primer
  5. Apply the solution
    1. After the third or fourth coat is on, wait 3 days before use.

The results:

whiteboard-desks-02 whiteboard-desks-03 whiteboard-desks-04

Do It!

A 3D Mouse Testimonial

The following is from an email that I received from Johnathon Wright. I think he likes his brand new 3Dconnexion SpacePilot PRO.
-David Mastel
  IT Manager
  PADT, Inc.

——————-

top-panel-deviceRecently PADT became a certified reseller for 3Dconnexion. Shortly following the agreement a sleek and elegant SpacePilot PRO landed on my desk. Immediately the ergonomic design, LCD display, and blue LED under the space ball appealed to the techie inside of me. As a new 3D mouse user I was a little skeptical about the effectiveness of this little machine, yet it quickly has gained my trust as an invaluable tool to any Designer or Engineer. On a daily basis it allows me to seamlessly transition from CAD to 3D printing software and then to Geomagic Scanning software, allowing dynamic control of my models, screen views, hotkeys and shortcuts.

Outside of its consistency as an exceptional 3D modeling aid, the SpacePilot PRO also has a configurable home screen that allows quick navigation of email, calendar or tasks. This ensures that I can keep in touch with my team without having to ever leave my engineering programs, which is invaluable to my production on a daily basis. Whether you are a first time user who is looking to tryout a 3D Mouse for the first time or an experienced 3D mouse user who is looking to upgrade, you need to check out the SpacePilot Pro. I can’t imagine returning to producing CAD models or manipulating scan data without one. Combine the SpacePilot PRO cross-compatibility with its programmability and ease of use and you have a quality computer tool that applies to a wide range of users who are looking at new ways to increase productivity.

Link to You Tube video – watch it do its thing along with a look at my 3D scanning workstation, the GEOCUBE: http://youtu.be/fsfkLPaZJe4

Johnathon Wright
Applications Engineer,
Hardware Solutions
PADT, Inc.

———————————————————————————————-
Editor’s Note:

Not familiar with what a 3D Mouse is? It is a device that lets a user control 3D objects on their computer in an intuitive manner. Just as you move a 2D mouse on the plane of your desk, you spin a 3D Mouse in all three dimensions. Learn more here

spacepilot-pro-cad-professional-2-209-p

“Launch, Leave & Forget” – A Personal Journey of an IT Manager into Numerical Simulation HPC and how PADT is taking Compute Servers & Workstations to the Next Level

fire_and_forget_missileLaunch, Leave & Forget was a phrase first introduced in the 1960s. Basically, the US Government was developing missiles that, once fired, no longer needed to be guided or watched by the pilot. Before that, the fighter pilot directed the missile mostly by line of sight and calculated guesswork toward a target in the distance, and often would be shot down or would break away too early from guiding the launch vehicle. Hoping and guesswork are not something we strive for when lives are at stake.

So I say all of that to say this: as it relates to virtual prototyping, Launch, Leave & Forget for numerical simulation is something that I have been striving for at PADT, Inc., both internally and for our 1,800 unique customers who really need our help. We are passionate about empowering our customers to become comfortable, feel free to be creative, and be able to step back and let it go! Many of us have a unique and rewarding opportunity to work with customers from the point of design, or even from the first phone call, onward to virtual prototyping, product development, rapid manufacturing, and lastly to something you can bring into the physical world: a physical prototype that has already gone through 5,000 numerical simulations. Unlike the engineers of the 1960s, who would maybe get one, two, or three shots at a working prototype, I think it is amazing that a company can go through 5,000 different prototypes before finally introducing one into the real world.

clusterAt PADT I continue to look and search for new ways to Launch, Leave & Forget. One passion of mine is computers. I first started using a computer when I was nine years old, programming in BASIC and creating complex little FOR NEXT statements before I was in seventh grade. Let’s fast forward: I arrived at PADT in 2005. I was amazed at the small company I had arrived at; creativity and innovation were bouncing off the ceiling. I had never seen anything like it! I was humbled on more than one occasion, as most of the ANSYS CFD analysts knew as much about computers as I did! No, not the menial IT tasks like networking, domain user creation, and backups. What the PADT CFD/FEA analysts communicated, sometimes loudly, was that their computers were slow! Humbled again, I would retort: but you have the fastest machine in the building. How could it be slow?! Your machine here is faster than our web server; in fact, this was going to be our new web server. So in 2005, at a stalemate, we would walk away, both wondering why the solve was so slow!

Over the years I would observe numerous issues. I remember spending hours using this ANSYS numerical simulation software. It was new to me, and it was complicated! I would often knock on an analyst’s door and ask if they had a couple of minutes to show me how to run a simulation. For some of the programs I would have to ask two or three times: ANSYS FEA, ANSYS CFX, FLUENT, on and on, often using a round-robin approach because I didn’t want to inconvenience the ANSYS analysts. Then, probably some early morning around 3am, the various ANSYS programs and the hardware all clicked with me. I was off and running ANSYS benchmarks on my own! Freedom!! Now I could experiment with the hardware configs. Armed with the ANSYS Fluent and ANSYS FEA benchmark suites, I wanted to make the numerical simulations run as fast or faster than they ever imagined possible! I wanted to please these ANSYS guys. Why? Because I had never met anyone like them. I wanted to give them the power they deserved.

“What is the secret sauce or recipe for creating an effective numerical simulation?”

This is a comment that I would hear often. It could be on a conference call with a new customer, or internally from our own ANSYS CFD and/or FEA analysts. “David, all I really care about is: when I click ‘Calculate Run’ within ANSYS, when is it going to complete?” Or, “How can we make this solver run faster?”

The secret sauce recipe? Have we signed an NDA yet? Just kidding. I have had the unique opportunity to observe not just ANSYS but other CFD/FEA codes running on compute hardware, learning better ways of optimizing hardware and software. Here is how a fairly typical process for architecting hardware for use with ANSYS software goes.

Getting Involved Early

When the sales guys let me, I am often involved at the very beginning of a qualifying lead opportunity. My favorite time to talk to a customer is when a new customer calls me directly at the office.

Nothing but the facts sir!

I have years’ worth of benchmarking data. Do your users have any benchmarking data? Quickly have them run one of the ANSYS standard benchmarks. Just one benchmark can reveal to you a wealth of information about their current IT infrastructure.

Get your IT team onboard early!

This is a huge challenge! In general here are a few roadblocks that smart IT people have in place:

IT MANAGER RULES 101

1) No! talking to sales people
2) No! talking to sales people on the phone
3) No! talking to sales people via email
4) No! talking to sales people at seminars
5) If your boss emails or calls and says “please talk to this sales person @vulture & hawk”. Wait about a week. Then if the boss emails back and says “did you talk to this salesperson yet?” Pick up the phone and call sales rep @vulture & hawk.

it1What is this, a joke? Nope. Most IT groups operate like this. Many are understaffed and in constant fix-it mode. Most say and think like this: “I would appreciate it if you sat in my chair for one day. My phone constantly rings, so I don’t pick it up, or I let it go to voicemail (until the voicemail box fills up). Email constantly swoops in, so it goes to junk mail. Seminar invites and meet-and-greets keep coming in; nope, won’t go. Ultimately, I know you are going to try to sell me something.”

Who have they been talking to? Do they even know what ANSYS is? I have been humbled over the years when it comes to hardware. I seriously believed the fastest web server at that moment in time would make a fast numerical simulation server.

If I can get on the phone with another IT manager, 90% of the time the walls come down and we can talk our own language. What do they say to me? Well, I have had IT managers and directors tell me they would never buy a compute cluster or compute workstation from me. “Oh well, our policy states that we only buy from big-boy-pants Computer, Inc.,” or “mom & pop shop #343,” or the best one, “the owner’s nephew, he builds computers on the side.” They stand behind their walls of policy and circumstance. But at the end of the calls they are normally asking us to send them a quote.

repair

So, now what?

Well, do you really know your software? Have you spent hours running different hardware configurations of the same workstation, observing the reads/writes of an eight-drive 600GB SAS3 15k RPM 12Gbps RAID 0 configuration? Are 3 drives for the OS and 5 drives for the solving array the best configuration for the hardware and software? Huh? What’s that?? Oh boy…