ANSYS Discovery Live: Observations on What it Is and Suggestions for Trying it Out

Yesterday ANSYS, Inc. did a webinar about a technology that was going to “Change the way simulation is done.”  If you have been around the world of FEA and CFD for the 30+ years I have, you have heard that statement before, and rarely does the actual product change match the hype.  Not true for ANSYS Discovery Live.  If anything, I think they are holding back.  This is disruptive; this is a tool that will change how people do simulation.  In this post I’ll share my thoughts on what it is and why I think it is so transformative, and then in the second half (go ahead, if you don’t want to listen to me go on and on about how much I like this tool, skip ahead) there are some tips on how to get your hands on it and see for yourself.

What is ANSYS Discovery Live?

ANSYS Discovery Live is a new multiple-physics simulation platform that combines several key ingredients to produce a software tool engineers can use to create almost instantaneous virtual prototypes of the behavior of their designs, directly from their solid models. The developers at ANSYS, Inc. have combined their knowledge of advanced solver technology, solvers parallelized for Graphics Processing Units (GPUs, high-end graphics cards), direct solid modeling (SpaceClaim), and some advanced work on the discretization side I don’t think I can talk about. All of those things embedded inside SpaceClaim make ANSYS Discovery Live.

Once you have a solid model in the tool, you simply define what physics you want to solve and some boundary conditions, then it solves.  In almost real time. Right there in front of you. The equivalent steps of meshing, building the model, solving it, extracting results, and displaying the results are done automatically. It may iterate a few times to converge on a solution, but in a few seconds, you will have a good enough answer to give you insight into your design.

And that is the key point. This is not a replacement for ANSYS Mechanical, FLUENT, or HFSS. It is a tool for exploring your designs and gaining insight into their behavior. It allows the design engineer, with very little training or expertise, to exercise their design and see what happens.

The product lives inside ANSYS SpaceClaim and can be installed on its own.  It runs on Windows and requires an NVIDIA graphics card with a newer GPU (see below for more on that).  Right now the product is in pre-release mode and anyone, yes anyone, can go to www.ansys.com/discovery and download it and try it out. And please, share your feedback.  Expect the product to be released in the first quarter of 2018. Pricing and bundling have not been firmed up yet, but from what we have seen the plans are reasonable and make sense.

Why is it Unique in the Industry?

Some of the first comments I saw on social media about ANSYS Discovery Live after the webinar were that it is not a unique tool.  There are other GPU-based solvers out there. That is true. But even though those tools are super fast at solving, they have not been widely adopted.  The ANSYS product is unique because it: 1) combines GPU-based solvers for multiple physics and 2) is built into a fully functioning solid modeling tool.  A third reason might be that it is also an ANSYS product, which means it will be backed technically and supported well.

Why I Think the Simple Fact that it Exists is Important

During an interview for a magazine article about innovation in product development this week I was asked what is keeping innovation from happening more often.  My answer was that most companies with the resources, both money and people, to innovate are choosing to acquire rather than innovate internally.  They let others raise money, take all the risk, work out all the problems, deal with all the issues of trying to make something new. And then when they succeed, they buy them. There is nothing morally wrong with that approach, it is just inefficient and inaccurate.  Every innovation has to not only survive its technical challenges, it has to survive being a startup.

What ANSYS, Inc. has done is the opposite. They could have purchased a GPU-based solver startup and checked the box. But instead, they took people from different business units, several of which were acquired, and put them together and said: “innovate… but make it something very useful.”  And they did.  The fact that they executed on the logistics of a new product that used new and old technology, across physics and across software development realms, is fantastic.  It makes me feel good about ANSYS, Inc.’s true dedication to improving their products.

How will it Change Simulation?

In my career, I have had the same conversation dozens of times “Let me go out to the lab and tinker with it, I’ll figure out what is going on.” That is the way you had to explore your product to get a “feel” for what is going on. Simulation took too long and you became so wrapped up in the process of building and running a model that you could not really explore the behavior of your product. Now we can.

ANSYS Discovery Live is called Discovery Live not because anyone at ANSYS is a marketing genius (sorry guys…) but because that is what it lets you do: discover the behavior of your product, live. You simply play with it and see what happens. And this will change simulation because we now can move from verification or optimization to simply experimenting and gaining a deeper understanding, early in the design process. We will still do what I guess is now called traditional simulation; we will need more accuracy, more complex physics, loads, and behavior.  But early on we can learn so much by virtually experimenting.

Is it the Perfect Tool Right out of the Box?

This is not a perfect-does-everything tool.  First off, it is a pre-release.  The basic functionality to make it useful is there, more than I thought would be available in a first release. But there are limitations, because it is new or because of the approach.  It is not as accurate as more traditional approaches. The way it works takes some shortcuts on geometry and can’t include some behaviors. This should improve over time, but it will never be as accurate as more time-consuming approaches that simply have more functionality.

Over the next two to three years we will see it mature and add functionality and accuracy. The GPUs the tool depends on will offer more performance for less money as well. This is a journey, but right now everyone I have talked to who has actually played with the pre-release is very happy with the functionality and accuracy that is there now, because it is sufficient to do the experimentation and exploration it was designed to allow.

How do you Try it Out?

ANSYS, Inc. realized that this type of tool demos so well, and is so different, that a skeptical group of engineers will not accept what they see in a webinar as accurate.  So they have made the pre-release available for use. You can download it and install it, or explore with it in the cloud through your browser.

  • To get started, go to www.ansys.com/discovery and look around. The videos are awesome!  When you are ready to try it out, click on Download Now. Fill out the form. Don’t complain.  Yes you will get a few emails and a salesperson (gasp!) may call you. It’s worth some emails and maybe a phone call.
  • Set yourself up there.  There is a verification code step and once you put that in and create your login, you have to click on some legal agreements, including export controls.  Save your login info, you will need it to get back in.
  • After that, either start the download or the Cloud Trial option.  The cloud trial didn’t work for me; read below for how I got to that function.
  • If you chose download, it will download a big Zip file, over 1 GB. It is a full solid modeler and CFD/Structural/Thermal solver…  so it is big.
  • Once it is there, unzip it and run Setup.exe. Follow the steps and you will be there.
  • If you don’t have a graphics card that will run this, then use the cloud demo.  Like I said above, the button didn’t work for me.  If you have that problem, or you want to use it after your first login, go to https://discoveryforum.ansys.com/ and click on “Getting Started.”
  • Scroll down a bit and find the “Cloud Trial” post. That one takes you to the page where you can find a server near you to try things out on. It’s pretty slick.
  • If you need to get back here, use https://discoveryforum.ansys.com/ and log in with the email and password you gave at registration.
 Here is a PDF Guide with even more details and a quick start.

Hardware Requirements

The only sticky bit about this whole thing is that it runs only on a subset of NVIDIA graphics cards, so you have to have one of those cards. According to the information in the forum:

ANSYS Discovery Live relies on the latest GPU technology to provide its computation and visual experience.  To run the software, you will require:

– A dedicated NVIDIA GPU card based on the Kepler, Maxwell or Pascal architecture. Most dedicated NVIDIA GPU cards produced in 2013 or later will be based on one of these architectures.
– At least 4GB of video RAM (8GB preferred) on the GPU.

Also, please ensure you have the latest driver for your graphics card, available from NVIDIA Driver Downloads.  You can also refer to the post on Graphics Performance Benchmarks. Performance of Discovery Live is less dependent on machine CPU and RAM.  A recent generation 64-bit CPU running Windows, and at least 4GB of RAM will be sufficient. If you do not have a graphics card that meets these specifications, the software will not run. However, you can try ANSYS Discovery Live through an online cloud-based trial, which requires only an internet browser and a reasonably fast internet connection.

I didn’t know if the GPU on my laptop would work, so I went to https://www.techpowerup.com, put in my card model (NVIDIA M500M), and it told me it is Maxwell technology.
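If you would rather script that sanity check than look the card up by hand, here is a minimal Python sketch. It assumes nvidia-smi is installed with the NVIDIA driver, and the family keywords are my own rough shortlist, not an official list, so treat techpowerup.com as the authority:

```python
# check_gpu.py -- rough check of the Discovery Live pre-release GPU requirements:
# a Kepler, Maxwell, or Pascal NVIDIA card with at least 4 GB of video RAM.
# Assumes nvidia-smi is on the PATH; the family hints below are a best-effort guess.
import subprocess

SUPPORTED_FAMILY_HINTS = ("GTX 7", "GTX 9", "GTX 10", "Quadro K", "Quadro M",
                          "Quadro P", "Tesla K", "Tesla M", "Tesla P")

def main():
    # Ask the driver for each GPU's name and total memory (in MiB).
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    for line in out.strip().splitlines():
        name, mem_mib = [s.strip() for s in line.split(",")]
        vram_gb = float(mem_mib) / 1024.0
        family_ok = any(hint in name for hint in SUPPORTED_FAMILY_HINTS)
        print(f"{name}: {vram_gb:.1f} GB VRAM, "
              f"family match: {'yes' if family_ok else 'check techpowerup.com'}, "
              f"4 GB minimum met: {'yes' if vram_gb >= 4 else 'no'}")

if __name__ == "__main__":
    main()
```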

Go Forth and Discover, and Share

Don’t hesitate, download this and try it out.  Even if you are a high-end combustion simulation expert who will never need it, if you are interested in simulation you should still try it out.   Use the forum to share your thoughts and questions.  The gallery is already filling up with some fantastic real-world examples.

PADT Named ANSYS North American Channel Partner of the Year and Becomes an ANSYS Certified Elite Channel Partner

The ANSYS Sales Team at PADT was honored last week when we were recognized four times at the recent kickoff meeting for the ANSYS North American Sales organization.  The most humbling of those trips up to the stage was when PADT was recognized as the North American Channel Partner of the Year for 2016.  It was humbling because there are so many great partners that we have had the privilege of working with for almost 20 years now.  Our team worked hard, and our customers were fantastic, so we were able to make strides in adding capability at existing accounts, finding new customers that could benefit from ANSYS simulation tools, and expanding our reach further in Southern California.  It helps that simulation driven product development actually works, and ANSYS tools allow it to work well.

Here we are on stage, accepting the award:

PADT Accepts the Channel Partner of the Year Award. (L-R: ANSYS CEO Ajei Gopal, ANSYS VP Worldwide Sales and Customer Excellence Rick Mahoney, ANSYS Director of WW Channel Ravi Kumar, PADT Co-Owner Ward Rand, PADT Co-Owner Eric Miller, PADT Software Sales Manager Bob Calvin, ANSYS VP Sales for the Americas Ubaldo Rodriguez)

We were also recognized two other times: for exceeding our sales goals and for making the cut to the annual President’s Club retreat.   As a reminder, PADT sells the full ANSYS multiphysics product line in Southern California, Arizona, New Mexico, Colorado, Utah, and Nevada.  This is a huge geographic area with a very diverse set of industries and customers.

In addition, ANSYS, Inc. announced that PADT was one of several Channel Partners who had obtained Elite Certified Channel Partner status. This will allow PADT to provide our customers with better services and gives our team access to more resources within ANSYS, Inc.

Once we made it back from the forests and hills of Western Pennsylvania we were able to get a picture with the full sales team.  Great job guys:

We could not have had such a great 2016 without the support of everyone at PADT. The sales team, the application engineers, the support engineers, business operations, and everyone else that pitches in.   We look forward to making more customers happy in 2017 and coming back with additional hardware.

ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

Meanwhile, in the real world, the land of the missing middle: to read and learn more about the missing middle, please read this article by Dr. Stephen Wheat. Click Here

This blog post is about distributed parallel processing performance in the missing-middle world of science, tech, engineering, and numerical simulation. I will be using two of PADT, Inc.’s very own CUBE workstations along with ANSYS 17.2 to illustrate facts and findings from the ANSYS HPC benchmarks. I will also show you how to decode and extract key bits of data out of your own ANSYS benchmark output files. This information will assist you with locating and describing the performance hows and whys on your own numerical simulation workstations and HPC clusters, so that you can trust and verify your decisions and understand, and explain, the best upgrade path for your own unique situation. The example I am providing in this post illustrates a “worst case” scenario.

You already know you need to improve the parallel processing solve times of your models. “No, I am not ready with my numerical simulation results. No, I am waiting on Matt to finish running the solve of his model.” “Matt said that it will take four months to solve this model using this workstation. Is this true?!”

  1. How do I know what to upgrade? You often find yourself asking yourself, what do I really need to buy?
    1. One or three ANSYS HPC Packs?
    2. More compute power? NVIDIA TESLA K80 GPU Accelerators? RAM? A Subaru or a Volvo?
  2. I have no budget. Are you sure? IT departments often set aside a certain amount of money for component upgrades and parts. The information you learn from these findings may help justify a $250-$5,000 upgrade for you.
  3. These two machines as configured will not break the very latest HPC performance speed records. This exercise is a live, real-world example of what you would see in the HPC missing-middle market.
  4. The benchmarks were performed months after a hardware and software workstation refresh was completed using NO BUDGET, zip, zilch, nada, none.

Backstory regarding the two real-world internal CUBE FEA Workstations.

  1. These two CUBE Workstations were configured on a tight budget. Only the minimum necessary components were purchased by PADT, Inc.
  2. These two internal CUBE workstations have been in live production, in use daily for one or two years.
    1. Twenty-four hours a day seven days a week.
  3. These two workstations were both in desperate need of some sort of hardware and operating system refresh.
  4. As part of Microsoft’s upgrade initiative in 2016, Windows 10 Professional was upgraded for free! FREE!

Again, join me in this post and read about the journey of two CUBE workstations being reborn and able to produce impressive ANSYS benchmarks, appeasing the sense of winning in pure geek satisfaction.

Uh-oh?! $$$

As I mentioned, one challenge that I set for myself on this mission is that I would not allow myself to purchase any new hardware or software. What? That is correct; my challenge was that I would not allow myself to purchase new components for the refresh.

How would I ever succeed in my challenge? Think and then think again.

Harvesting the components of old workstations that had been piling up in the IT Lab over the past year! That was the solution, and it just may have been the idea I needed to succeed in my NO BUDGET challenge. First, utilize existing compute components from old, tired machines that had shown up in the IT boneyard. Talk to your IT department; you never know what they will find, or remember they had laying around, in their own IT boneyard. Next, I would also use any RMA’d parts that had trickled in over the past year. Indeed, by utilizing these old feeder workstations, I was on my way to succeeding in my no-budget challenge. The leftovers? Please do not email me for handouts of the discarded, not-worthy components. There is nothing left, none; those components are long gone, a nice benefit of our recent in-house PADT Tech Recycle event.

*** Public Service Announcement *** Please remember to reuse, recycle and erase old computer parts from the landfills.

CUBE Workstation Specifications

PADT, Inc. – CUBE w12i-k Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation “CUBE #10”)
1 x CUBE Mid-Tower Chassis (SQ edition)

2 x 6c @3.4GHz/ea (INTEL XEON e5-2643 V3 CPU)

Dual Socket motherboard

16 x 16GB DDR4-2133 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS2 15k RPM – 6 Gb/s – RAID0

3 x 2TB SAS2 7200 RPM Hard Drives – 6 Gb/s (Mid-Term Storage Array – RAID5)

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

PADT, Inc. CUBE w16i-k Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation “CUBE #14”)
1 x CUBE Mid-Tower Chassis

2 x 8c @3.2GHz/ea (INTEL XEON e5-2667 V4 CPU)

Dual Socket motherboard

8 x 32GB DDR4-2400 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS3 15k RPM 2.5” 12 Gb/s – RAID0

2 x 6TB SAS3 7.2k RPM 3.5” 12 Gb/s – RAID1

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

The ANSYS V17sp-5 Ball Grid Array Benchmark

ANSYS Benchmark Test Case Information

  • BGA (V17sp-5)
    • Analysis Type: Static Nonlinear Structural
    • Number of Degrees of Freedom: 6,000,000
    • Equation Solver: Sparse
    • Matrix: Symmetric
  • ANSYS 17.2
  • ANSYS HPC Licensing Packs required for this benchmark –> (2) HPC Packs
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our numerical simulation workstations and clusters, please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM. Designed, tested, and configured within your budget. We are happy to help and to listen to your specific needs.

Comparing the data from the 12 core CUBE vs. a 16 core CUBE with and without GPU Acceleration enabled.

ANSYS 17.2 Benchmark: V17sp-5 Ball Grid Array
Time Spent Computing Solution, in seconds. GPU acceleration on both machines uses the NVIDIA QUADRO K6000.

Cores | CUBE w12i-k (2643 v3) | CUBE w12i-k w/GPU | Speedup w/GPU | CUBE w16i-k (2667 V4) | CUBE w16i-k w/GPU | Speedup w/GPU
2     | 878.9                 | 395.9             | 2.22 X        | 888.4                 | 411.2             | 2.16 X
4     | 485.0                 | 253.3             | 1.91 X        | 499.4                 | 247.8             | 2.02 X
6     | 386.3                 | 228.2             | 1.69 X        | 386.7                 | 221.5             | 1.75 X
8     | 340.4                 | 199.0             | 1.71 X        | 334.0                 | 196.6             | 1.70 X
10    | 269.1                 | 184.6             | 1.46 X        | 266.0                 | 180.1             | 1.48 X
11    | 235.7                 | 212.0             | 1.11 X        | -                     | -                 | -
12    | 230.9                 | 171.3             | 1.35 X        | 226.1                 | 166.8             | 1.36 X
14    | -                     | -                 | -             | 213.2                 | 173.0             | 1.23 X
15    | -                     | -                 | -             | 200.6                 | 152.8             | 1.31 X
16    | -                     | -                 | -             | 189.3                 | 166.6             | 1.14 X

Benchmark dates: 11/15/2016 & 1/5/2017
[Graph: CUBE w12i-k v17sp-5 Benchmark, 2017]
[Graph: CUBE w16i-k v17sp-5 Benchmark, 2017]
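For reference, the “Total Speedup w/GPU” column above is simply the solve time without the GPU divided by the solve time with it. A few lines of Python, using the CUBE w12i-k solution times from the table, reproduce that column:

```python
# speedup.py -- recompute the "Total Speedup w/GPU" column from the table above:
# speedup = solve time without GPU / solve time with GPU.
# Times are the CUBE w12i-k "Time Spent Computing Solution" values in seconds.
cpu_only = {2: 878.9, 4: 485.0, 6: 386.3, 8: 340.4, 10: 269.1, 11: 235.7, 12: 230.9}
with_gpu = {2: 395.9, 4: 253.3, 6: 228.2, 8: 199.0, 10: 184.6, 11: 212.0, 12: 171.3}

for cores in sorted(cpu_only):
    speedup = cpu_only[cores] / with_gpu[cores]
    print(f"{cores:>2} cores: {speedup:.2f} X speedup with the K6000")
```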

Initial impressions

  1. I was very pleased with the results of this experiment. Using the “Am I compute bound or I/O bound” overall parallel performance indicators, the data showed healthy workstations that were both I/O bound. I assumed the I/O bound issue would happen. During several of the benchmarks, the data reveals almost complete system bandwidth saturation: upwards of ~82 GB/s of bandwidth during the in-core distributed solve!
  2. I was pleasantly surprised to see a 1.7X or greater solve speedup using one ANSYS HPC licensing pack and GPU Acceleration!

The when and where of numerical simulation performance bottlenecks: over the years, like a clock ticking on the wall, I have kept coming back to the question, “is your numerical simulation compute hardware compute bound or I/O bound?” This quick benchmark result will show the general parallel performance of the workstation and help you find the performance sweet spot for your own numerical simulation hardware.

As a reminder, to answer that question you need to record your CPU Time For Main Thread, Time Spent Computing Solution, and Total Elapsed Time results. If the CPU Time For Main Thread is about the same as the Total Elapsed Time, the compute hardware is in a compute-bound situation. If the Total Elapsed Time is noticeably larger than the CPU Time For Main Thread, the compute hardware is I/O bound. I did the same analysis with these two CUBE workstations. I am pickier than most when it comes to tuning my compute hardware, so I often use a cutoff of around 95 percent. The percentage column below determines whether the workstation is compute bound or I/O bound. Generally, what I have found in the industry is that a percentage greater than 90% indicates the workstation is either compute bound, I/O bound, or, in the worst-case scenario, both.

**** Result data was garnered from the ANSYS results.out files on these two CUBE workstations, using ANSYS Mechanical distributed parallel solves.

Data mine that ANSYS results.out file!

The data is all there, at your fingertips waiting for you to trust and verify.
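As a rough illustration of that data mining, here is a minimal Python sketch of the compute-bound vs. I/O-bound check described above. The label strings in results.out vary by ANSYS version, so the patterns below are assumptions; adjust them to match the exact wording in your own file.

```python
# bound_check.py -- sketch of the compute-bound vs. I/O-bound test on a
# results.out file from an ANSYS Mechanical distributed solve.
# The regular expressions are assumptions; edit them to match your file.
import re
import sys

PATTERNS = {
    "cpu_main": r"CPU time for main thread\s*[:=]\s*([\d.]+)",
    "solution": r"time spent computing solution\s*[:=]\s*([\d.]+)",
    "elapsed":  r"Total elapsed time\s*[:=]\s*([\d.]+)",
}

def extract(text, pattern):
    m = re.search(pattern, text, re.IGNORECASE)
    return float(m.group(1)) if m else None

def main(path):
    text = open(path).read()
    vals = {key: extract(text, pat) for key, pat in PATTERNS.items()}
    if None in vals.values():
        sys.exit("Could not find all three timings; adjust PATTERNS for your file.")
    pct = 100.0 * vals["cpu_main"] / vals["elapsed"]
    print(f"CPU main thread: {vals['cpu_main']:.1f} s, "
          f"solution: {vals['solution']:.1f} s, "
          f"elapsed: {vals['elapsed']:.1f} s  ({pct:.2f}%)")
    # ~95% rule of thumb from the text: a high ratio means the CPU was busy the
    # whole time (compute bound); a noticeably larger elapsed time means the
    # difference was spent waiting on I/O.
    print("Compute bound" if pct >= 95.0 else "I/O bound (elapsed time exceeds CPU time)")

if __name__ == "__main__":
    main(sys.argv[1])
```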

Compute Bound or I/O bound

Results 1 – Compute Cores Only

w12i-k

“CUBE #10”

Cores | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU main / elapsed) | Compute Bound | I/O Bound
2     | 914.2 | 878.9 | 917.0 | 99.69 | YES | NO
4     | 517.2 | 485.0 | 523.0 | 98.89 | YES | NO
6     | 418.8 | 386.3 | 422.0 | 99.24 | YES | NO
8     | 374.7 | 340.4 | 379.0 | 98.87 | YES | NO
10    | 302.5 | 269.1 | 307.0 | 98.53 | YES | NO
11    | 266.6 | 235.7 | 273.0 | 97.66 | YES | NO
12    | 259.9 | 230.9 | 268.0 | 96.98 | YES | NO
w16i-k

“CUBE #14”

Cores | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU main / elapsed) | Compute Bound | I/O Bound
2     | 925.8 | 888.4 | 927.0 | 99.87 | YES | NO
4     | 532.1 | 499.4 | 535.0 | 99.46 | YES | NO
6     | 420.3 | 386.7 | 425.0 | 98.89 | YES | NO
8     | 366.4 | 334.0 | 370.0 | 99.03 | YES | NO
10    | 299.7 | 266.0 | 303.0 | 98.91 | YES | NO
12    | 258.9 | 226.1 | 265.0 | 97.70 | YES | NO
14    | 244.3 | 213.2 | 253.0 | 96.56 | YES | NO
15    | 230.3 | 200.6 | 239.0 | 96.36 | YES | NO
16    | 219.6 | 189.3 | 231.0 | 95.06 | YES | NO

Results 2 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

Cores + GPU | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU main / elapsed) | Compute Bound | I/O Bound
2     | 416.3 | 395.9 | 435.0 | 95.70 | YES | YES
4     | 271.8 | 253.3 | 291.0 | 93.40 | YES | YES
6     | 251.2 | 228.2 | 267.0 | 94.08 | YES | YES
8     | 219.9 | 199.0 | 239.0 | 92.01 | YES | YES
10    | 203.2 | 184.6 | 225.0 | 90.31 | YES | YES
11    | 227.6 | 212.0 | 252.0 | 90.32 | YES | YES
12    | 186.0 | 171.3 | 213.0 | 87.32 | NO  | YES
w16i-k

“CUBE #14”

Cores + GPU | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU main / elapsed) | Compute Bound | I/O Bound
2     | 427.2 | 411.2 | 453.0 | 94.30 | YES | YES
4     | 267.9 | 247.8 | 286.0 | 93.67 | YES | YES
6     | 245.4 | 221.5 | 259.0 | 94.75 | YES | YES
8     | 219.6 | 196.6 | 237.0 | 92.66 | YES | YES
10    | 201.8 | 180.1 | 222.0 | 90.90 | YES | YES
12    | 191.2 | 166.8 | 207.0 | 92.37 | YES | YES
14    | 195.2 | 173.0 | 217.0 | 89.95 | NO  | YES
15    | 172.6 | 152.8 | 196.0 | 88.06 | NO  | YES
16    | 177.1 | 166.6 | 213.0 | 83.15 | NO  | YES

Identifying Memory, I/O, Parallel Solver Balance and Performance

Results 3 – Compute Cores Only

w12i-k

“CUBE #10”

(GPU not enabled; one row per core count, in the same order as Results 1 above.)

Nonzeroes ratio in factor (min/max) | Flops ratio for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | Maximum RAM used (GB)
0.9376 | 0.8399 | 662.822706 | 5.609852 | 19123.88932 | 19.1 | 78
0.8188 | 0.8138 | 355.367914 | 3.082555 | 35301.9759  | 35.3 | 85
0.6087 | 0.6913 | 283.870728 | 2.729568 | 39165.1946  | 39.2 | 84
0.3289 | 0.4771 | 254.336758 | 2.486551 | 43209.70175 | 43.2 | 91
0.5256 | 0.644  | 191.218882 | 1.781095 | 60818.51624 | 60.8 | 94
0.5078 | 0.6805 | 162.258872 | 1.751974 | 61369.6918  | 61.4 | 95
0.3966 | 0.5287 | 157.315184 | 1.633994 | 65684.23821 | 65.7 | 96
w16i-k

“CUBE #14”

(GPU not enabled; one row per core count, in the same order as Results 1 above.)

Nonzeroes ratio in factor (min/max) | Flops ratio for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | Maximum RAM used (GB)
0.9376 | 0.8399 | 673.225225 | 6.241678 | 17188.03613 | 17.2 | 78
0.8188 | 0.8138 | 368.869242 | 3.569551 | 30485.70397 | 30.5 | 85
0.6087 | 0.6913 | 286.269409 | 2.828212 | 37799.17161 | 37.8 | 84
0.3289 | 0.4771 | 251.115087 | 2.701804 | 39767.17792 | 39.8 | 91
0.5256 | 0.644  | 191.964388 | 1.848399 | 58604.0123  | 58.6 | 94
0.3966 | 0.5287 | 155.623476 | 1.70239  | 63045.28808 | 63.0 | 96
0.5772 | 0.6414 | 147.392121 | 1.635223 | 66328.7728  | 66.3 | 101
0.6438 | 0.5701 | 139.355605 | 1.484888 | 71722.92484 | 71.7 | 101
0.5098 | 0.6655 | 130.042438 | 1.357847 | 78511.36377 | 78.5 | 103

Results 4 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

(One row per core count, in the same order as Results 2 above.)

Nonzeroes ratio in factor (min/max) | Flops ratio for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | % GPU Accelerated The Solve | Maximum RAM used (GB)
0.9381 | 0.8405 | 178.686155 | 5.516205 | 19448.54863 | 19.4 | 95.78 | 78
0.8165 | 0.8108 | 124.087864 | 3.031092 | 35901.34876 | 35.9 | 95.91 | 85
0.6116 | 0.6893 | 122.433584 | 2.536878 | 42140.01391 | 42.1 | 95.74 | 84
0.3365 | 0.475  | 112.33829  | 2.351058 | 45699.89654 | 45.7 | 95.81 | 91
0.5397 | 0.6359 | 103.586986 | 1.801659 | 60124.33358 | 60.1 | 95.95 | 94
0.5123 | 0.6672 | 137.319938 | 1.635229 | 65751.09125 | 65.8 | 85.17 | 95
0.4132 | 0.5345 | 97.252285  | 1.562337 | 68696.85627 | 68.7 | 95.75 | 97
w16i-k

“CUBE #14”

(One row per core count, in the same order as Results 2 above.)

Nonzeroes ratio in factor (min/max) | Flops ratio for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | % GPU Accelerated The Solve | Maximum RAM used (GB)
0.9381 | 0.8405 | 200.007118 | 6.054831 | 17718.44411 | 17.7 | 94.96 | 78
0.8165 | 0.8108 | 122.200896 | 3.357233 | 32413.68282 | 32.4 | 95.20 | 85
0.6116 | 0.6893 | 122.742966 | 2.624494 | 40733.2138  | 40.7 | 94.91 | 84
0.3365 | 0.475  | 114.618006 | 2.544626 | 42223.539   | 42.2 | 94.97 | 91
0.5397 | 0.6359 | 105.4884   | 1.821352 | 59474.26914 | 59.5 | 95.18 | 94
0.4132 | 0.5345 | 96.750618  | 1.988799 | 53966.06502 | 54.0 | 94.96 | 97
0.5825 | 0.6382 | 106.573973 | 1.989103 | 54528.26599 | 54.5 | 88.96 | 101
0.6604 | 0.566  | 91.345275  | 1.374242 | 77497.60151 | 77.5 | 92.21 | 101
0.5248 | 0.6534 | 107.672641 | 1.301668 | 81899.85539 | 81.9 | 85.07 | 103

The ANSYS results.out file – The decoding continues

CUBE w12i-k (“CUBE #10”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware solution is for distributed parallel solving.
    2. Fastest solve time for CUBE #10: 12 out of 12 cores w/GPU @ 171.3 seconds Time Spent Computing The Solution.
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process; the clock-on-the-wall time.
    2. Fastest time for CUBE #10: 12 out of 12 cores w/GPU @ 213.0 seconds.
  3. CPU Time For Main Thread
    1. This value indicates the RAW number-crunching time of the CPU.
    2. Fastest time for CUBE #10: 12 out of 12 cores w/GPU @ 186.0 seconds.
  4. GPU Acceleration
    1. The NVIDIA QUADRO K6000 accelerated ~96% of the matrix factorization flops.
    2. Actual percentage of GPU-accelerated flops = 95.7456.
  5. Cores and storage solver performance: 12 out of 12 cores using one NVIDIA QUADRO K6000
    1. Ratio of nonzeroes in factor (min/max) = 0.4132
    2. Ratio of flops for factor (min/max) = 0.5345
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
    3. Effective I/O rate (MB/sec) for solve = 68696.856274 (or ~69 GB/sec)
      1. No issues here; this indicates that the workstation has ample bandwidth available for the solving.

CUBE w16i-k (“CUBE #14”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware solution is for distributed parallel solving.
    2. Fastest solve time for CUBE w16i-k “CUBE #14”: 15 out of 16 cores w/GPU @ 152.8 seconds.
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process; the clock-on-the-wall time.
    2. Fastest time for CUBE w16i-k “CUBE #14”: 15 out of 16 cores w/GPU @ 196.0 seconds.
  3. CPU Time For Main Thread
    1. This value indicates the RAW number-crunching time of the CPU.
    2. Fastest time for CUBE w16i-k “CUBE #14”: 15 out of 16 cores w/GPU @ 172.6 seconds.
  4. GPU Acceleration Percentage
    1. The NVIDIA QUADRO K6000 accelerated ~92% of the matrix factorization flops.
    2. Actual percentage of GPU-accelerated flops = 92.2065.
  5. Cores and storage solver performance: 15 out of 16 cores using one NVIDIA QUADRO K6000
    1. Ratio of nonzeroes in factor (min/max) = 0.6604
    2. Ratio of flops for factor (min/max) = 0.566
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
      2. Please note when reviewing these two data points: solver performance is balanced when both of these values are as close to 1.0000 as possible. Once the compute hardware is no longer as efficient, these values will continue to move farther away from 1.0000.
    3. Effective I/O rate (MB/sec) for solve = 77497.6 MB/sec (or ~78 GB/sec)
      1. No issues here; this indicates the workstation has ample bandwidth with fast I/O performance for in-core SPARSE solver solving.
  6. Maximum amount of RAM used by the ANSYS distributed solve
    1. 103 GB of RAM needed for the in-core solve.
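For quick screening of many runs, the balance check on those two ratios reduces to a few lines. This is only a sketch of the rule of thumb above; the 0.5 warning threshold is an illustrative cutoff I chose, not an ANSYS-documented limit.

```python
# balance_check.py -- flag solver load imbalance from the two min/max ratios
# reported in results.out; values near 1.0 mean the work was split evenly
# across the distributed processes.
def balance_report(nonzero_ratio, flops_ratio, warn_below=0.5):
    worst = min(nonzero_ratio, flops_ratio)
    verdict = "reasonably balanced" if worst >= warn_below else "poorly balanced"
    return (f"nonzeroes min/max = {nonzero_ratio:.4f}, "
            f"flops min/max = {flops_ratio:.4f} -> {verdict}")

# Example: the 15-core + GPU run on CUBE #14 from the tables above.
print(balance_report(0.6604, 0.566))
```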

Conclusions Summary And Upgrade Path Suggestions

It is important for you to locate the bottleneck on your numerical simulation hardware. By utilizing the data provided in the ANSYS results.out files, you will be able to logically determine your worst parallel performance inhibitor and plan accordingly to resolve what is slowing the parallel performance of your distributed numerical simulation solves.

I/O Bound and/or Compute Bound Summary

  • I/O Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” are I/O Bound.
      • Almost immediately when GPU Acceleration is enabled.
      • When GPU Acceleration is not enabled, being I/O bound is no longer an issue for solving performance; however, solve times are impacted because available compute power goes unused.
  • Compute Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” would benefit from additional Compute Power.
    • CUBE w12i-k “CUBE #10” would get the most bang for the buck by adding in the additional compute power.

Upgrade Path Recommendations

CUBE w12i-k “CUBE #10”

  1. I/O:
    1. Hard Drives
    2. Remove & replace the previous generation hard drives
      1. 3.5″ SAS2.0 6Gb/s 15k RPM Hard Drives
    3. Hard Drives could be upgraded to Enterprise Class SSD or PCIe NVMe
      1. COST =  HIGH
    4. Hard Drives could be upgraded to SAS 3.0 12 Gb/s Drives
      1. COST =  MEDIUM
  2.  RAM:
    1. Remove and replace the previous generation RAM
    2. Currently all available RAM slots of RAM are populated.
      1. The optimum configuration for these two CPUs is four slots of RAM per CPU; currently eight slots per CPU are populated.
    3. RAM speed: DDR4-2133 MHz ECC REG DIMMs
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH
  3. GPU Acceleration
    1. Install a dedicated GPU Accelerator card such as an NVidia Tesla K40 or K80
    2. COST =  HIGH
  4.  CPU:
    1. Remove and replace the current previous generation CPU’s:
    2. Currently installed: dual INTEL XEON e5-2643 V3
    3. Upgrade the CPU’s to the V4 (Broadwell) CPU’s
      1. COST =  HIGH

CUBE w16i-k “CUBE #14”

  1. I/O: Hard Drives (currently 2.5” SAS3.0 12 Gb/s 15k RPM drives)
    1.  Replace the current 2.5” SAS3 12Gb/s 15k RPM Drives with Enterprise Class SSD’s or PCIe NVMe disk
      1. COST =  HIGH
    2. Replace the 2.5″ SAS3 12 Gb/s hard drives with 3.5″ hard drives.
      1. COST =  HIGH
    3. INTEL 1.6TB P3700 HHHL AIC NVMe
      1. Click Here: https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3700-series.html
  2. Currently a total of four Hard Drives are installed
    1. Increase the existing hard drive count from four hard drives to a total of six or eight.
    2. Change RAID configuration to RAID 50
      1. COST =  HIGH
  3. RAM:
    1. Using DDR4-2400Mhz ECC REG DIMM’s
      1. Upgrade RAM to DDR4-2400MHz LRDIMM RAM
      2. COST =  HIGH

Considering RAM: when determining how much system RAM you need to perform a six-million degree-of-freedom ANSYS numerical simulation, add the following amounts to the Maximum Amount of RAM Used number indicated in your ANSYS results.out file (see the sketch after this list).

  • ANSYS reserves  ~5% of your RAM
  • Office products can add an additional ~10-15% to the above number
  • Operating System: add an additional ~5-10% for the Operating System
  • Other programs? For example, open up your windows task manager and look at how much RAM your anti-virus program is consuming. Add for the amount of RAM consumed by these other RAM vampires.
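Put together, the sizing arithmetic looks like the sketch below. The percentages are simply the rules of thumb from the list above, applied to the 103 GB maximum from this benchmark; they are not an ANSYS formula.

```python
# ram_sizing.py -- rough system RAM estimate from the "Maximum amount of RAM used"
# value in results.out, using the allowances listed above (rules of thumb only).
def recommended_ram_gb(solver_max_gb,
                       ansys_reserve=0.05,    # ~5% reserved by ANSYS
                       office_overhead=0.15,  # ~10-15% for office products
                       os_overhead=0.10,      # ~5-10% for the operating system
                       other_gb=2.0):         # anti-virus and other "RAM vampires"
    return solver_max_gb * (1.0 + ansys_reserve + office_overhead + os_overhead) + other_gb

# Example: the 103 GB in-core solve on CUBE #14 from this benchmark.
print(f"Suggested minimum system RAM: {recommended_ram_gb(103):.0f} GB")
```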

Terms & Definition Goodies:

  • Compute Bound
    • A condition in which the CPU is the limiting resource: the processors stay fully busy and the solve time is determined by raw compute speed rather than by waiting on data movement.
  • CPU Time For Main Thread
    • CPU time (or process time) is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to, for example, waiting for input/output (I/O) operations or entering low-power (idle) mode.
  • Effective I/O rate (MB/sec) for solve
    • The amount of bandwidth used during the parallel distributed solve moving data from storage to CPU input and output totals.
    • For example, the in-core 16-core + GPU solve using the CUBE w16i-k reached an effective I/O rate of ~82 GB/s.
    • Theoretical system level bandwidth possible is ~96 GB/s
  • IO Bound
    • A condition in which the input/output side of the system hardware (reading, writing, and moving data through the system) has become inefficient and/or detrimental to running an efficient parallel analysis.
  • Maximum total memory used
    • The maximum amount of memory used during your analysis.
  • Percentage (%) GPU Accelerated The Solve
    • The percentage of your distributed solve accelerated by the Graphics Processing Unit (GPU). The overall impact of the GPU will be diminished when the system bandwidth of your compute hardware is slow or saturated.
  • Ratio of nonzeroes in factor (min/max)
    • A performance indicator of how efficiently and how well balanced the solver is performing on your compute hardware. In this example, solver performance is most efficient when this value is as close as possible to 1.0.
  • Ratio of flops for factor (min/max)
    • A performance indicator of how efficiently and how well balanced the solver is performing on your compute hardware. In this example, solver performance is most efficient when this value is as close as possible to 1.0.
  • Time (cpu & wall) for numeric factor
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When the time (cpu & wall) for numeric factor and the time (cpu & wall) for numeric solve values are roughly equal, your compute hardware’s I/O bandwidth is having a negative impact on the distributed solver functions.
  • Time (cpu & wall) for numeric solve
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When the time (cpu & wall) for numeric solve and the time (cpu & wall) for numeric factor values are roughly equal, your compute hardware’s I/O bandwidth is having a negative impact on the distributed solver functions.
  • Total Speedup w/GPU
    • Total performance gain for a compute system task when using a Graphics Processing Unit (GPU).
  • Time Spent Computing Solution
    • The actual clock on the wall time that it took to compute the analysis.
  • Total Elapsed Time
    • The actual clock on the wall time that it took to complete the analysis.


CUBE Systems are Now Part of the ANSYS, Inc. HPC Partner Program


The relationship between ANSYS, Inc. and PADT is a long one that runs deep. And that relationship just got stronger with PADT joining the HPC Partner Program with our line of CUBE compute systems specifically designed for simulation. The partner program was set up by ANSYS, Inc. to work:

“… with leaders in high-performance computing (HPC) to ensure that the engineering simulation software is optimized on the latest computing platforms. In addition, HPC partners work with ANSYS to develop specific guidelines and recommended hardware and system configurations. This helps customers to navigate the rapidly changing HPC landscape and acquire the optimum infrastructure for running ANSYS software. This mutual commitment means that ANSYS customers get outstanding value from their overall HPC investment.”


PADT is very excited to be part of this program and to contribute to the ANSYS/HPC community as much as we can.  Users know they can count on PADT’s strong technical expertise with ANSYS Mechanical, ANSYS Mechanical APDL, ANSYS FLUENT, ANSYS CFX, ANSYS Maxwell, ANSYS HFSS, and other ANSYS, Inc. products, a true differentiator when compared with other hardware providers.

Customers around the US have fallen in love with their CUBE workstations, servers, mini-clusters, and clusters finding them to be the right mix between price and performance. CUBE systems let users carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Assembled by PADT’s IT staff, CUBE computing systems are delivered with the customer’s simulation software loaded and tested. We configure each system specifically for simulation, making choices based upon PADT’s extensive experience using similar systems for the same kind of work. We do not add things a simulation user does not need, and focus on the hardware and setup that delivers performance.


Is it time for you to upgrade your systems?  Is it time for you to “step out of the box, and step into a CUBE?”  Download a brochure of typical systems to see how much your money can actually buy, visit the website, or contact us.  Our experts will spend time with you to understand your needs, your budget, and what your true goals are for HPC. Then we will design your custom system to meet those needs.

 

Ready to go for Turbo Expo 2013 in San Antonio

The PADT and Flownex teams have our booth set up and ready to go for the next three days at Turbo Expo 2013.

This is always one of our favorite events because most of us came from this industry; in fact, all four of the founders were turbine-engine engineers before we started PADT.  A special part of this year’s event is that we are introducing Flownex to the North American Turbo community, as well as our CUBE HVPC computer systems.  So lots of new things to talk about along with our established offerings of ANSYS, Inc. software, consulting, customization, and training.

If you are there, please make sure you stop by our booth. We would love to see you and chat.

 

Here is our press release on the event:


 

Trusted Partners for Turbomachinery Simulation


The ASME Turbo Expo is the industry show where PADT feels at home the most.  Founded by experienced turbine engine simulation, design, and manufacturing engineers, the company has a true understanding of the real world needs of those who are focused on simulation for Turbomachinery.

Our primary focus for this year’s conference will be the full introduction of the Flownex Simulation Environment to North America.  This thermal-fluid system simulation tool started life as a solver for combustor analysis, and has grown up to be a full featured toolset that can model any fluid-thermal network in your engine or pump.  Flownex is ideal simulation software for the quick thermo-fluid analysis of gas turbine performance.

It provides aircraft engine design and system engineers with the ability to simulate complicated air and gas flow patterns through fans, compressors and turbines; match compressor and turbine power and compile maps; calculate thrust, shaft power, combustion calculations with convection, conduction and radiation heat transfer; and determine fuel consumption.  If you are using an in-house tool or software written for other applications to model your flow networks, please come by to see how Flownex can reduce the amount of time you spend modeling your systems while increasing the fidelity of your models.


PADT’s reputation in the Turbomachinery industry is built on our expertise selling, using, supporting, and customizing the complete suite of ANSYS FEA and CFD.  Turbo companies come to us for training on ANSYS software, customization of analysis tools, FEA and CFD outsourcing, and HPC hardware because they know we know their business and how to maximize the return on their investment in simulation.  We can help anyone doing simulation on Turbomachinery in a variety of ways, stop on by to find out how.


Another new area where PADT provides this type of help to turbo companies is by offering a complete line of High Value Performance Computing systems specifically designed for the simulation user.  From workstations to large clusters, PADT can custom design a system that hits the sweet spot between cost and performance, delivering faster turnaround of CFD and FEA runs for considerably less than systems offered by general purpose computer suppliers.

Stop by our booth to look at the hardware, software, training, and consulting that we offer to companies around the world to help them make their studies more efficient and effective.