Using External Data in ANSYS Mechanical to Apply Tabular Loads with Multiple Variables

ANSYS Mechanical is great at applying tabular loads that vary with an independent variable, say time or Z. But what if you want a tabular load that varies in multiple directions and time? You can use the External Data tool to do just that. You can also create a table with a single variable and then modify it in the Command Editor.

In the presentation below, I show how to do all of this with a step-by-step description.

PADT-ANSYS-Tabular-Loading-ANSYS-18

You can also download the presentation here.

Experiences with Developing a “Somewhat Large” ACT Extension in ANSYS

With each release of ANSYS the customization toolkit continues to evolve and grow.  Recently I developed what I would categorize as a decent sized ACT extension.    My purpose in this post is to highlight a few of the techniques and best practices that I learned along the way.

Why I Chose C#

Most ACT extensions are written in Python.  Python is a wonderfully useful language for quickly prototyping and building applications of all shapes and sizes.  Its dynamic typing, plethora of libraries, large ecosystem, and native support directly within the ACT console make it a natural choice for most ACT work.  So, why choose to move to C#?

The primary reasons I chose to use C# instead of Python for my ACT work were the following:

  1. I prefer the slightly stronger type safety afforded by the more strongly typed language. Having a definitive compilation step forces me to show my code first to a compiler.  Only if and when the compiler can generate an assembly for my source do I get to move to the next step of trying to run/debug.  Bugs caught at compile time are the cheapest and generally easiest bugs to fix.  And, by definition, they are the most likely to be fixed.  (You’re stuck until you do…)
  2. The C# development experience is deeply integrated into the Visual Studio developer tool. This affords not only a great editor in which to write the code, but more importantly perhaps the world’s best debugger to figure out when and how things went wrong.   While it is possible to both edit and debug python code in Visual Studio, the C# experience is vastly superior.

The Cost of Doing ACT Business in C#

Unfortunately, writing an ACT extension in C# does incur some development cost in terms of setting up the development environment to support the work.  When writing an extension solely in Python, you really only need a decent text editor.  Once you set up your ACT extension according to the documented directory structure protocol, you can just edit the Python script files directly within that directory structure.  If you recall, ACT requires an XML file to define the extension and then a directory with the same name that contains all of the assets defining the extension, like scripts, images, etc.  This "defines" the extension.
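
For reference, a bare-bones extension laid out according to that protocol looks something like the sketch below; the extension name and file names are placeholders, not a prescribed set.

```
MyExtension.xml       <- XML file that defines the extension
MyExtension\          <- directory with the same name as the XML file
    main.py           <- the extension's Python script(s)
    images\           <- icons and other image assets
```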

When it comes to laying out the requisite ACT extension directory structure on disk, C# complicates things a bit.  As mentioned earlier, C# involves a compilation step that produces a DLL.  This DLL must then somehow be loaded into Mechanical to be used within the extension.  To complicate things a little further, Visual Studio uses a predefined project directory structure that places the build products (DLLs, etc…) within specific directories of the project depending on what type of build you are performing.   Therefore the compiled DLL may end up in any number of different directories depending on how you decide to build the project.  Finally, I have found that the debugging experience within Visual Studio is best served by leaving the DLL located precisely wherever Visual Studio created it.

Here is a summary list of the requirements/problems I encountered when building an ACT extension using C#:

  1. I need to somehow load the produced DLL into Mechanical so my extension can use it.
  2. The DLL that is produced during compilation may end up in any number of different directories on disk.
  3. An ACT Extension must conform to a predefined structural layout on the filesystem. This layout does not map cleanly to the Visual Studio project layout.
  4. The debugging experience in Visual Studio is best served by leaving the produced DLL exactly where Visual Studio left it.

The solution that I came up with to solve these problems was twofold.

First, the issue of loading the proper DLL into Mechanical was solved by using a combination of environment variables on my development machine in conjunction with some Python programming within the extension's main Python script.  Yes, even though the bulk of the extension is written in C#, there is still a Python script to bootstrap the extension into Mechanical.  More on that below.

Second, I decided to completely rebuild the ACT extension directory structure on my local filesystem every time I built the project in C#.  To accomplish this, I created in Visual Studio what are known as post-build events, which allow you to specify an action to occur automatically after the project is successfully built.  This action can be quite generic.  In my case, the "action" was to locally run a Python script and provide it with a few arguments on the command line.  More on that below.

Loading the Proper DLL into Mechanical

As I mentioned above, even an ACT extension written in C# requires a bit of Python code to bootstrap it into Mechanical.  It is within this bit of Python that I chose to tackle the problem of deciding which DLL to actually load.  The code I came up with looks like the following:
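
(The listing below is a simplified sketch of that script rather than the production version: the environment variable names and the DLL file name are placeholders, and it assumes the IronPython clr module that ACT scripts run under.)

```python
import os
import clr

def load_extension_dll(extension_dir):
    """Load the C# assembly that backs this extension.

    MYEXT_DEV_MODE, MYEXT_DEBUG_DIR and MYEXT_RELEASE_DIR are placeholder
    environment variable names; MyExtension.dll is a placeholder assembly name.
    """
    if os.environ.get("MYEXT_DEV_MODE") == "1":
        # Developer machine: point at the Visual Studio build output.
        debug_dll = os.path.join(os.environ["MYEXT_DEBUG_DIR"], "MyExtension.dll")
        release_dll = os.path.join(os.environ["MYEXT_RELEASE_DIR"], "MyExtension.dll")
        # One simple way to decide debug vs. release: load whichever
        # assembly Visual Studio produced most recently.
        candidates = [p for p in (debug_dll, release_dll) if os.path.exists(p)]
        dll_path = max(candidates, key=os.path.getmtime)
    else:
        # End-user machine: the DLL ships inside the extension directory.
        dll_path = os.path.join(extension_dir, "bin", "MyExtension.dll")

    clr.AddReferenceToFileAndPath(dll_path)
    return dll_path
```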

Essentially what I am doing above is querying for the presence of a particular environment variable that is set on my machine.  (The assumption is that it wouldn't randomly show up on an end user's machine…) If that variable is found and its value is 1, then I determine whether to load a debug or release version of the DLL depending on the type of build.  I use two additional environment variables to specify where the debug and release directories for my Visual Studio project exist.  Finally, if I determine that I'm running on a user's machine, I simply look for the DLL in the proper location within the extension directory.  Setting up my Python script in this way means I never have to edit it once I'm ready to share my extension with someone else.  It just works.

Rebuilding the ACT Extension Directory Structure

The final piece of the puzzle involves rebuilding the ACT extension directory structure upon the completion of a successful build.  I do this for a few different reasons.

  1. I always want to have a pristine copy of my extension laid out on disk in a manner that could be easily shared with others.
  2. I like to store all of the various extension assets, like images, XML files, python files, etc… within the Visual Studio Project. In this way, I can force the project to be out of date and in need of a rebuild if any of these files change.  I find this particularly useful for working with the XML definition file for the extension.
  3. Having all of these files within the Visual Studio project makes tracking things within a version control system like SVN or git much easier.

As I mentioned before, to accomplish this task I use a combination of local python scripting and post build events in Visual Studio.  I won’t show the entire python code, but essentially what it does is programmatically work through my local file system where the C# code is built and extract all of the files needed to form the ACT extension.  It then deletes any old extension files that might exist from a previous build and lays down a completely new ACT extension directory structure in the specified location.  The definition of the post build event is specified within the project settings in Visual Studio as follows:

As you can see, all I do is call out to the system python interpreter and pass it a script with some arguments.  Visual Studio provides a great number of predefined variables that you can use to build up the command line for your script.  So, for example, I pass in a string that specifies what type of build I am currently performing, either “Debug” or “Release”.  Other strings are passed in to represent directories, etc…
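
To give a flavor of the script side, here is a trimmed-down sketch rather than the actual production script; the script name, argument order, directory names and extension name are all illustrative, while $(ConfigurationName), $(TargetDir) and $(ProjectDir) are the standard Visual Studio macros referred to above.

```python
# Invoked from the Visual Studio post-build event, along the lines of:
#   python "$(ProjectDir)rebuild_extension.py" "$(ConfigurationName)" "$(TargetDir)" "$(ProjectDir)"
import os
import shutil
import sys

def rebuild_extension(build_config, build_output_dir, project_dir, dest_root):
    """Recreate the ACT extension directory structure from the project tree."""
    ext_name = "MyExtension"  # placeholder extension name
    ext_dir = os.path.join(dest_root, ext_name)

    # Delete any extension files left over from a previous build.
    if os.path.isdir(ext_dir):
        shutil.rmtree(ext_dir)

    # Lay down a fresh copy of the assets stored inside the Visual Studio project.
    shutil.copytree(os.path.join(project_dir, "act_assets", ext_name), ext_dir)
    shutil.copy(os.path.join(project_dir, "act_assets", ext_name + ".xml"), dest_root)

    # Drop the freshly built DLL into the extension's bin folder.
    bin_dir = os.path.join(ext_dir, "bin")
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)
    shutil.copy(os.path.join(build_output_dir, ext_name + ".dll"), bin_dir)

if __name__ == "__main__":
    config, target_dir, proj_dir = sys.argv[1:4]
    rebuild_extension(config, target_dir, proj_dir,
                      dest_root=os.path.expanduser("~/ACT_extensions"))  # placeholder destination
```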

The Synergies of Using Both Approaches

Finally, I will conclude with a note on the synergies you can achieve by using both of the approaches mentioned above.  One of the final enhancements I made to my post build script was to allow it to “edit” some of the text based assets that are used to define the ACT extension.  A text based asset is something like an XML file or python script.  What I came to realize is that certain aspects of the XML file that define the extension need to be different depending upon whether or not I wish to debug the extension locally or release the extension for an end user to consume.  Since I didn’t want to have to remember to make those modifications before I “released” the extension for someone else to use, I decided to encode those modifications into my post build script.  If the post build script was run after a “debug” build, I coded it to configure the extension for optimal debugging on my local machine.  However, if I built a “release” version of the extension, the post build script would slightly alter the XML definition file and the main python file to make it more suitable for running on an end user machine.   By automating it in this way, I could easily build for either scenario and confidently know that the resulting extension would be optimally configured for the particular end use.
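
As a concrete, if simplified, illustration of the kind of edit the script makes, the snippet below swaps a placeholder token inside the XML definition depending on the build configuration; the token name is hypothetical, and the real script rewrites whichever XML attributes and Python lines actually differ between a local-debug extension and an end-user extension.

```python
def configure_for_build(ext_xml_path, build_config):
    """Tweak the extension definition for a debug or a release build."""
    with open(ext_xml_path) as f:
        text = f.read()

    # __DEBUG_MODE__ is an illustrative placeholder token in the XML template.
    flag = "true" if build_config.lower() == "debug" else "false"
    text = text.replace("__DEBUG_MODE__", flag)

    with open(ext_xml_path, "w") as f:
        f.write(text)
```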

Conclusions

Now that I have some experience writing ACT extensions in C#, I must honestly say that I prefer it over Python.  Much of the "extra plumbing" that one must invest in to get a C# extension up and running can be automated using the techniques described in this post.  After the requisite automation is set up, the development process is really straightforward.  From that point onward, the increased debugging fidelity, added type safety, and familiarity of a C-based language make the development experience that much better!  Also, there are some cool things you can do in C# that I'm not 100% sure you can accomplish in Python alone.  More on that in later posts!

If you have ideas for an ACT extension to better serve your business needs and would like to speak with someone who has developed some extensions, please drop us a line.  We’d be happy to help out however we can!

 

Connection Groups and Your Sanity in ANSYS Mechanical

You kids don't know how good you have it with automatic contact creation in Mechanical.  Back in my day, I'd have to use the contact wizard in MAPDL or show off my mastery of the ESURF command to define contacts between parts.  Sure, there were some macros somewhere on the interwebs that would loop over surfaces within a particular offset, but for the sake of this stereotypical "old-tyme" rant, I didn't use them (I actually didn't; I was just TOO good at using ESURF to need anyone else's help).


Hey, it gets me from point A to B

In Mechanical contact is automatically generated based on a set of rules contained in the ‘Connection Group’ object:

image

It might look a little overwhelming, but really the only thing you'll need to play around with is the 'Tolerance Type'.  This can be either 'Slider' or 'Value' (or use sheet thickness if you're working with shells).  What this controls is the face offset value for which Mechanical will automatically build contact.  So in the picture shown above, faces that are 5.9939E-3 in apart will automatically have contact created.  You can play around with the slider value to change what that tolerance is:

image image image

As you can see, the smaller the tolerance slider the larger the ‘acceptable’ gap becomes.  If you change the Tolerance Type to be ‘Value’ then you can just directly type in a number.

Typically the default values do a pretty good job of automatically defining contact.  However, what happens if you have a large assembly with a lot of thin parts?  Then what you run into is nonsensical contact between parts that don't actually touch (full disclosure: I actually had to modify the contact settings to have the auto-generated contact do something like this…but I have seen this in other assemblies with very thin/slender parts stacked on top of each other):

image

In the image above, we see that contact has been defined between the bolt head and a plate when there is clearly a washer present.  So we can fix this by going in and specifying a value of 0, meaning that only surfaces that are touching will have contact defined.  But now let’s say that some parts of your assembly aren’t touching (maybe it’s bad CAD, maybe it’s a welded assembly, maybe you suppressed parts that weren’t important).

image

The brute force way to handle this would be to set the auto-detection value to be 0 and then go back and manually define the missing contacts using the options shown in the image above.  Or, what we could do is modify the auto-contact to be broken up into groups and apply appropriate rules as necessary.  The other benefit to this is if you’re working in large assemblies, you can retain your sanity by having contact generated region by region.   In the words of the original FE-guru, Honest Abe, it’s easier to manage things when they’re logically broken up into chunks.

image

Said No One Ever

Sorry…that was bad.  I figured in the new alt-fact world with falsely-attributed quotes to historical leaders, I might as well make something up for the oft-overlooked FE-crowd.

So, how do you go about implementing this?  Easy, first just delete the default connection group (right-mouse-click on it and select delete).  Next, just select a group of bodies and click the ‘Connection Group’ button:

image image image

In the image series above, I selected all the bolts and washers, clicked the connection group, and now I have created a connection group that will only automatically generate contact between the bolts and washers.  I don’t have to worry about contact being generated between the bolt and plate.  Rinse, lather, and repeat the process until you’ve created all the groups you want:

image

ALL the Connection Groups!

Now that you have all these connection groups, you can fine-tune the auto-detection rules to meet the ‘needs’ of those individual body groups.  Just zooming in on one of the groups:

image

By default, when I generate contact for this group I’ll get two contact pairs:

image image

While this may work, let’s say I don’t want a single contact pair for the two dome-like structures, but 2.  That way I can just change the behavior on the outer ‘ring’ to be frictionless and force the top to be bonded:

image

I modified the auto-detection tolerance to be a user-defined distance (note that when you type in a number and move your mouse over into the graphics window you will see a bulls-eye that indicates the search radius you just defined).  Next, I told the auto-detection not to group any auto-detected contacts together.  The result is I now get 3 contact pairs defined:

image image image

Now I can just modify the auto-generated contacts to have the middle-picture shown in the series above to be frictionless.  I could certainly just manually define the contact regions, but if you have an assembly of dozens/hundreds of parts it’s significantly easier to have Mechanical build up all the contact regions and then you just have to modify individual contact pairs to have the type/behavior/etc you want (bonded, frictionless, symmetric, asymmetric, custom pinball radius, etc).  This is also useful if you have bodies that need to be connected via face-to-edge or edge-to-edge contact (then you can set the appropriate priority as to which, if any of those types should be preserved over others).

So the plus side to doing all of this is that after any kind of geometry update you shouldn’t have much, if any, contact ‘repair’ to do.  All the bodies/rules have already been fine tuned to automatically build what you want/need.  You also know where to look to modify contacts (although using the ‘go to’ functionality makes that pretty easy as well).  That way you can define all these connection groups, leave everything as bonded and do a preliminary solve to ensure things look ‘okay’.  Then go back and start introducing some more reality into the simulation by allowing certain regions to move relative to each other.

The downside to doing your contacts this way is you risk missing an interface because you’re now defining the load path.  To deal with that you can just insert a dummy-modal environment into your project, solve, and check that you don’t have any 0-Hz modes.

Exploring High-Frequency Electromagnetic Theory with ANSYS HFSS

I recently had the opportunity to present an interesting experimental research paper at DesignCon 2017, titled Replacing High-Speed Bottlenecks with PCB Superhighways. The motivation behind the research was to develop a new high-speed signaling system using rectangular waveguides, but the most exciting aspect for me personally was salvaging a (perhaps contentious) 70-year-old first-principles electromagnetic model. While it took some time to really understand how to apply the mathematics to design, its application led to an exciting convergence of theory, simulation, and measurement.

One of the most critical aspects of the design was exciting the waveguide with a monopole probe antenna. Many different techniques have been developed to match the antenna impedance to the waveguide impedance at the desired frequency, as well as increase the bandwidth. Yet, all of them rely on assumptions and empirical measurement studies. Optimizing a design to nanometer precision empirically would be difficult at best and even if the answer was found it wouldn’t inherently reveal the physics. To solve this problem, we needed a first-principles model, a simulation tool that could quickly iterate designs accurately, and some measurements to validate the simulation methodology.

A rigorous first-principles model was developed by Robert Collin in 1960, but this solution has since been forgotten and replaced by simplified rules. Unfortunately, these simplified rules are unable to deliver an optimal design or offer any useful insight to the critical parameters. In fairness, Collin’s equations are difficult to implement in design and validating them with measurement would be tedious and expensive. Because of this, empirical measurements have been considered a faster and cheaper alternative. However, we wanted the best of both worlds… we wanted the best design, for the lowest cost, and we wanted the results quickly.

For this study, we used ANSYS HFSS to simulate our designs. Before exploring new designs, we first wanted to validate our simulation methodology by correlating results with available measurements. We were able to demonstrate a strong agreement between Collin’s theory, ANSYS HFSS simulation, and VNA measurement.

Red simulated S-parameters strongly correlated with blue measurements.

To perform a series of parametric studies, we swept thousands of antenna design iterations across a 50 GHz-wide frequency range for structures ranging from 50 to 100 guide wavelengths long. High-performance computing gave us the ability to solve return loss and insertion loss S-parameters within just a few minutes for each design iteration by distributing across 48 cores.

Sample Parametric Design Sweep

Finally, we used the lessons we learned from Collin’s equations and the parametric study to develop a new signaling system with probe antenna performance never before demonstrated. You can read the full DesignCon paper here. The outcome also pertains to RF applications in addition to potentially addressing Signal Integrity concerns for future high-speed communication channels.

Rules of thumb are important to fast and practical design, but their application can often be limited. Competitive innovation demands we explore beyond these limitations, but the only way to match the speed and accuracy of design rules is to use simulations capable of offering fast design exploration with the same reliability as measurement. ANSYS HFSS gave us the ability not only to optimize our design, but also to learn about the physics that explains our design and to accurately predict the behavior of new, innovative designs.

Importing Material Properties from Solidworks into ANSYS Mechanical…Finally!

Finally! One of the most common questions we get from our customers who use Solidworks is "Why can't I transfer my materials from Solidworks? I have to type in the values all over again every time."  Unfortunately, until now, ANSYS has not been able to read that information from the Solidworks material library.

There is great news with ANSYS 18.  ANSYS is now able to import the material properties from Solidworks and use them in an analysis within Workbench.  Let’s see how it works.

I have a Solidworks assembly that I downloaded from Grabcad.  The creator had pre-defined all the materials for this model as you can see below.

Once you bring the geometry into Workbench, just ensure that the Material Properties item is checked under the Geometry cell's properties.  If you don't see the panel, just right-click on the Geometry cell and click on Properties.

Once you are in ANSYS Mechanical, for example, you will see that the parts already have the materials specified in Solidworks assigned to them.

The trick now is to find out where this material is getting stored. If we go to Engineering Data, the only thing we will see is Structural Steel. However, when we go to Engineering Data Sources, we see a new material library called CADMaterials.  It shows a list of all the materials and their properties that were imported from a CAD tool such as Solidworks, Creo, NX, etc.

You can of course copy the material and store it for future use in ANSYS like any other material.  This will save you from having to manually define all the materials for a part or assembly from scratch within ANSYS.

Please let us know if you have any questions and we’ll be happy to answer them for you.

ANSYS Video Tips: ANSYS SpaceClaim 18.0 Skin Surface Tool Changes

At 18.0 there were some changes in ANSYS SpaceClaim to the very useful tool that lets you create a surface patch on scan or STL data.  In this video we show how to create corner points for a surface patch boundary and how to get an accurate measurement of how far the surface you create deviates from the STL or scan data underneath.

How-To: ANSYS 18 RSM CLIENT SETUP on Windows 2012 R2 HPC

We put this simple how-to together to help users speed up the process of getting their Remote Solve Manager client up and running on Microsoft Windows 2012 R2 HPC.

Download the step-by-step slides here:

padt-ansys-18-RSM-client-setup-win2012r2HPC.pdf

You might also be interested in a short article on the setup and use of monitoring for ANSYS R18 RSM.

Monitoring Jobs Using ANSYS RSM 18.0

If you are an ANSYS RSM (Remote Solve Manager) user, you’ll find some changes in version 18.0. Most of the changes, which are improvements to the installation and configuration process, are under the hood from a user standpoint. One key change for users, though, is how you monitor a running job. This short entry shows how to do it in version 18.0.

Rather than bringing up the RSM monitor window from the Start menu as was done in prior versions, in 18.0 we launch the RSM job monitor directly from the Workbench window, by clicking on Jobs > Open Job Monitor… as shown here:

When a solution has been submitted to RSM for solution on a remote cluster or workstation, it will show up in the resulting Job Monitor window, like this:

Hopefully this saves some effort in trying to figure out where to monitor jobs you have submitted to RSM. Happy solving!

PADT Named ANSYS North American Channel Partner of the Year and Becomes an ANSYS Certified Elite Channel Partner

The ANSYS Sales Team at PADT was honored last week when we were recognized four times at the recent kickoff meeting for the ANSYS North American Sales organization.  The most humbling of those trips up to the stage was when PADT was recognized as the North American Channel Partner of the Year for 2016.  It was humbling because there are so many great partners that we have had the privilege of working with for almost 20 years now.  Our team worked hard, and our customers were fantastic, so we were able to make strides in adding capability at existing accounts, finding new customers that could benefit from ANSYS simulation tools, and expanding our reach further in Southern California.  It helps that simulation driven product development actually works, and ANSYS tools allow it to work well.

Here we are on stage, accepting the award:

PADT Accepts the Channel Partner of the Year Award. (L-R: ANSYS CEO Ajei Gopal, ANSYS VP Worldwide Sales and Customer Excellence Rick Mahoney, ANSYS Director of WW Channel Ravi Kumar, PADT Co-Owner Ward Rand, PADT Co-Owner Eric Miller, PADT Software Sales Manager Bob Calvin, ANSYS VP Sales for the Americas Ubaldo Rodriguez)

We were also recognized two other times: for exceeding our sales goals and for making the cut for the annual President's Club retreat.   As a reminder, PADT sells the full multiphysics product line from ANSYS, Inc. in Southern California, Arizona, New Mexico, Colorado, Utah, and Nevada.  This is a huge geographic area with a very diverse set of industries and customers.

In addition, ANSYS, Inc. announced that PADT was one of several Channel Partners who had obtained Elite Certified Channel Partner status. This will allow PADT to provide our customers with better services and gives our team access to more resources within ANSYS, Inc.

Once we made it back from the forests and hills of Western Pennsylvania, we were able to get a picture with the full sales team.  Great job, guys:

We could not have had such a great 2016 without the support of everyone at PADT. The sales team, the application engineers, the support engineers, business operations, and everyone else that pitches in.   We look forward to making more customers happy in 2017 and coming back with additional hardware.

ANSYS HPC Distributed Parallel Processing Decoded: CUBE Workstation

Meanwhile, in the real world, the land of the missing middle: to read and learn more about the missing middle, please see this article by Dr. Stephen Wheat.

This blog post is about distributed parallel processing performance in the missing-middle world of science, tech, engineering, and numerical simulation. I will be using two of PADT, Inc.'s very own CUBE workstations along with ANSYS 17.2 to illustrate facts and findings from the ANSYS HPC benchmarks. I will also show you how to decode and extract key bits of data out of your own ANSYS benchmark out files. This information will help you locate and explain the performance hows and whys of your own numerical simulation workstations and HPC clusters, so you can trust and verify your decisions and understand the best upgrade path for your own unique situation. The example I am providing in this post illustrates a "worst case" scenario.

You already know you need to improve the parallel processing solve times of your models. "No, I am not ready with my numerical simulation results. No, I am waiting on Matt to finish running the solve of his model." "Matt said that it will take four months to solve this model using this workstation. Is this true?!"

  1. How do I know what to upgrade? You often find yourself asking, what do I really need to buy?
    1. One or three ANSYS HPC Packs?
    2. More compute power? NVIDIA Tesla K80 GPU accelerators? RAM? A Subaru or a Volvo?
  2. I have no budget. Are you sure? IT departments often set aside a certain amount of money for component upgrades and parts, and the information you learn from these findings may help justify a $250-$5000 upgrade for you.
  3. These two machines as configured will not break the very latest HPC performance speed records. This exercise is a live, real-world example of what you would see in the HPC missing-middle market.
  4. Benchmarks were performed months after a hardware and software workstation refresh was completed using NO BUDGET, zip, zilch, nada, none.

Backstory regarding the two real-world internal CUBE FEA Workstations.

  1. These two CUBE Workstations were configured on a tight budget. Only the minimum necessary components were purchased by PADT, Inc.
  2. These two internal CUBE workstations have been in live production, in use daily for one or two years.
    1. Twenty-four hours a day seven days a week.
  3. These two workstations were both in desperate need of some sort of hardware and operating system refresh.
  4. As part of a Microsoft upgrade initiative in 2016, Windows 10 Professional was upgraded for free! FREE!

Again, join me in this post and read about the journey of two CUBE workstations being reborn and able to produce impressive ANSYS benchmarks, to appease the sense of winning in pure geek satisfaction.

Uh-oh?! $$$

As I mentioned, one challenge that I set for myself on this mission is that I would not allow myself to purchase any new hardware or software. What? That is correct; my challenge was that I would not allow myself to purchase new components for the refresh.

How would I ever succeed in my challenge? Think and then think again.

Harvesting the components of old workstations that had been piling up in the IT lab over the past year! That was the solution, and it just might be the idea I needed to succeed in my NO BUDGET challenge. First, utilize existing compute components from old, tired machines that had shown up in the IT boneyard. Talk to your IT department; you never know what they will find, or remember they had laying around, in their own boneyard. Next, I would also use any RMA'd parts that had trickled in over the past year. Indeed, by utilizing these old feeder workstations, I was on my way to succeeding in my no-budget challenge. The leftovers? Please do not email me for handouts of the discarded, not-worthy components. There is nothing left, none; those components are long gone, a nice benefit of our recent in-house PADT Tech Recycle event.

*** Public Service Announcement *** Please remember to reuse, recycle and erase old computer parts from the landfills.

CUBE Workstation Specifications

PADT, Inc. – CUBE w12i-k Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation "CUBE #10")
1 x CUBE Mid-Tower Chassis (SQ edition)

2 x 6c @3.4GHz/ea (INTEL XEON e5-2643 V3 CPU)

Dual Socket motherboard

16 x 16GB DDR4-2133 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS2 15k RPM – 6 Gb/s – RAID0

3 x 2TB SAS2 7200 RPM Hard Drives – 6 Gb/s (Mid-Term Storage Array – RAID5)

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

PADT, Inc. CUBE w16i-k Numerical Simulation Workstation

(INTERNAL PADT CUBE Workstation "CUBE #14")
1 x CUBE Mid-Tower Chassis

2 x 8c @3.2GHz/ea (INTEL XEON e5-2667 V4 CPU)

Dual Socket motherboard

8 x 32GB DDR4-2400 MHz ECC REG DIMM

1 x SMC LSI 3108 Hardware RAID Controller – 12 Gb/s

4 x 600GB SAS3 15k RPM 2.5” 12 Gb/s – RAID0

2 x 6TB SAS3 7.2k RPM 3.5” 12 Gb/s – RAID1

NVIDIA QUADRO K6000 (NVidia Driver version 375.66)

2 x LED Monitors (1920 x 1080)

Windows 10 Professional 64-bit

ANSYS 17.2

INTEL MPI 5.0.3

The ANSYS V17sp-5 Ball Grid Array Benchmark

ANSYS Benchmark Test Case Information

  • BGA (V17sp-5)
    • Analysis Type: Static Nonlinear Structural
    • Number of Degrees of Freedom: 6,000,000
    • Equation Solver: Sparse
    • Matrix: Symmetric
  • ANSYS 17.2
  • ANSYS HPC Licensing Packs required for this benchmark –> (2) HPC Packs
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our numerical simulation workstations and clusters, please contact our CUBE hardware sales representative at SALES@PADTINC.COM. Designed, tested, and configured within your budget. We are happy to help and to listen to your specific needs.

Comparing the data from the 12 core CUBE vs. a 16 core CUBE with and without GPU Acceleration enabled.

ANSYS 17.2 Benchmark: V17sp-5 Ball Grid Array
Time Spent Computing the Solution, in seconds (benchmarks run 11/15/2016 and 1/5/2017; GPU runs use one NVIDIA Quadro K6000 in each workstation)

| Cores | CUBE w12i-k (2643 v3) | CUBE w12i-k w/GPU | Total Speedup w/GPU | CUBE w16i-k (2667 V4) | CUBE w16i-k w/GPU | Total Speedup w/GPU |
|---|---|---|---|---|---|---|
| 2 | 878.9 | 395.9 | 2.22x | 888.4 | 411.2 | 2.16x |
| 4 | 485.0 | 253.3 | 1.91x | 499.4 | 247.8 | 2.02x |
| 6 | 386.3 | 228.2 | 1.69x | 386.7 | 221.5 | 1.75x |
| 8 | 340.4 | 199.0 | 1.71x | 334.0 | 196.6 | 1.70x |
| 10 | 269.1 | 184.6 | 1.46x | 266.0 | 180.1 | 1.48x |
| 11 | 235.7 | 212.0 | 1.11x | - | - | - |
| 12 | 230.9 | 171.3 | 1.35x | 226.1 | 166.8 | 1.36x |
| 14 | - | - | - | 213.2 | 173.0 | 1.23x |
| 15 | - | - | - | 200.6 | 152.8 | 1.31x |
| 16 | - | - | - | 189.3 | 166.6 | 1.14x |

CUBE w12i-k v17sp-5 Benchmark Graph 2017
CUBE w16i-k v17sp-5 Benchmark Graph 2017

Initial impressions

  1. I was very pleased with the results of this experiment. Using the "am I compute bound or I/O bound" overall parallel performance indicators, the data showed two healthy workstations that were both I/O bound. I assumed the I/O bound issue would come up: during several of the benchmarks, the data reveals almost complete system bandwidth saturation, upwards of ~82 GB/s of bandwidth during the in-core distributed solve!
  2. I was pleasantly surprised to see a 1.7X or greater solve speedup using one ANSYS HPC licensing pack and GPU Acceleration!

So when and where do numerical simulation performance bottlenecks occur? Over the years I have focused on the question, "Is your numerical simulation compute hardware compute bound or I/O bound?" This quick benchmark will show the general parallel performance of the workstation and help you find the performance sweet spot for your own numerical simulation hardware.

As a reminder, to determine the answer to that question you need to record your CPU Time For Main Thread, Time Spent Computing Solution, and Total Elapsed Time results. If the CPU Time For Main Thread is about the same as the Total Elapsed Time, the compute hardware is in a compute-bound situation. If the Total Elapsed Time is larger than the CPU Time For Main Thread, then the compute hardware is I/O bound. I did the same analysis with these two CUBE workstations. I am pickier than most when it comes to tuning my compute hardware, so I often use a cut-off of around 95 percent. The percentage column below determines whether the workstation is compute bound or I/O bound; generally, what I have found in the industry is that a percentage greater than 90% indicates the workstation is either compute bound, I/O bound, or, in the worst case, both.

**** The result sets below were garnered from the ANSYS results.out files on these two CUBE workstations using ANSYS Mechanical distributed parallel solves.

Data mine that ANSYS results.out file!

The data is all there, at your fingertips waiting for you to trust and verify.
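
If you want to automate that check, a small script along the lines of the sketch below will pull the three timing values out of a results.out file and compute the percentage used in the tables that follow. The label strings in the regular expressions are approximations of typical ANSYS Mechanical output and may differ between versions, so treat them as a starting point, and note that the 95% cut-off is simply my rule of thumb from above.

```python
import re
import sys

# Approximate label wording; check your own results.out file and adjust
# these patterns to match it exactly.
PATTERNS = {
    "cpu_main": r"CPU time for main thread\s*[:=]\s*([\d.]+)",
    "solution": r"time spent computing solution\s*[:=]\s*([\d.]+)",
    "elapsed":  r"Elapsed time\s*(?:\(sec\))?\s*[:=]\s*([\d.]+)",
}

def mine_results_out(path):
    """Pull the three timing values (seconds) out of an ANSYS results.out file."""
    text = open(path).read()
    values = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        values[key] = float(match.group(1)) if match else None
    return values

def classify(values, threshold=95.0):
    """Compare CPU time for the main thread against the total elapsed time."""
    pct = 100.0 * values["cpu_main"] / values["elapsed"]
    return pct, ("compute bound" if pct >= threshold else "I/O bound")

if __name__ == "__main__":
    vals = mine_results_out(sys.argv[1])
    pct, verdict = classify(vals)
    print("CPU main / elapsed = %.2f%% -> %s" % (pct, verdict))
```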

Compute Bound or I/O bound

Results 1 – Compute Cores Only

w12i-k

“CUBE #10”

| Cores | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU / Elapsed) | Compute Bound | I/O Bound |
|---|---|---|---|---|---|---|
| 2 | 914.2 | 878.9 | 917.0 | 99.69 | YES | NO |
| 4 | 517.2 | 485.0 | 523.0 | 98.89 | YES | NO |
| 6 | 418.8 | 386.3 | 422.0 | 99.24 | YES | NO |
| 8 | 374.7 | 340.4 | 379.0 | 98.87 | YES | NO |
| 10 | 302.5 | 269.1 | 307.0 | 98.53 | YES | NO |
| 11 | 266.6 | 235.7 | 273.0 | 97.66 | YES | NO |
| 12 | 259.9 | 230.9 | 268.0 | 96.98 | YES | NO |
w16i-k

“CUBE #14”

| Cores | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU / Elapsed) | Compute Bound | I/O Bound |
|---|---|---|---|---|---|---|
| 2 | 925.8 | 888.4 | 927.0 | 99.87 | YES | NO |
| 4 | 532.1 | 499.4 | 535.0 | 99.46 | YES | NO |
| 6 | 420.3 | 386.7 | 425.0 | 98.89 | YES | NO |
| 8 | 366.4 | 334.0 | 370.0 | 99.03 | YES | NO |
| 10 | 299.7 | 266.0 | 303.0 | 98.91 | YES | NO |
| 12 | 258.9 | 226.1 | 265.0 | 97.70 | YES | NO |
| 14 | 244.3 | 213.2 | 253.0 | 96.56 | YES | NO |
| 15 | 230.3 | 200.6 | 239.0 | 96.36 | YES | NO |
| 16 | 219.6 | 189.3 | 231.0 | 95.06 | YES | NO |

Results 2 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

| Cores (w/GPU) | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU / Elapsed) | Compute Bound | I/O Bound |
|---|---|---|---|---|---|---|
| 2 | 416.3 | 395.9 | 435.0 | 95.70 | YES | YES |
| 4 | 271.8 | 253.3 | 291.0 | 93.40 | YES | YES |
| 6 | 251.2 | 228.2 | 267.0 | 94.08 | YES | YES |
| 8 | 219.9 | 199.0 | 239.0 | 92.01 | YES | YES |
| 10 | 203.2 | 184.6 | 225.0 | 90.31 | YES | YES |
| 11 | 227.6 | 212.0 | 252.0 | 90.32 | YES | YES |
| 12 | 186.0 | 171.3 | 213.0 | 87.32 | NO | YES |
w16i-k

"CUBE #14"

| Cores (w/GPU) | CPU Time For Main Thread (s) | Time Spent Computing Solution (s) | Total Elapsed Time (s) | % (CPU / Elapsed) | Compute Bound | I/O Bound |
|---|---|---|---|---|---|---|
| 2 | 427.2 | 411.2 | 453.0 | 94.30 | YES | YES |
| 4 | 267.9 | 247.8 | 286.0 | 93.67 | YES | YES |
| 6 | 245.4 | 221.5 | 259.0 | 94.75 | YES | YES |
| 8 | 219.6 | 196.6 | 237.0 | 92.66 | YES | YES |
| 10 | 201.8 | 180.1 | 222.0 | 90.90 | YES | YES |
| 12 | 191.2 | 166.8 | 207.0 | 92.37 | YES | YES |
| 14 | 195.2 | 173.0 | 217.0 | 89.95 | NO | YES |
| 15 | 172.6 | 152.8 | 196.0 | 88.06 | NO | YES |
| 16 | 177.1 | 166.6 | 213.0 | 83.15 | NO | YES |

Identifying Memory, I/O, Parallel Solver Balance and Performance

Results 3 – Compute Cores Only

w12i-k

“CUBE #10”

| Cores | Ratio of nonzeroes in factor (min/max) | Ratio of flops for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | Maximum RAM used (GB) |
|---|---|---|---|---|---|---|---|
| 2 | 0.9376 | 0.8399 | 662.822706 | 5.609852 | 19123.88932 | 19.1 | 78 |
| 4 | 0.8188 | 0.8138 | 355.367914 | 3.082555 | 35301.9759 | 35.3 | 85 |
| 6 | 0.6087 | 0.6913 | 283.870728 | 2.729568 | 39165.1946 | 39.2 | 84 |
| 8 | 0.3289 | 0.4771 | 254.336758 | 2.486551 | 43209.70175 | 43.2 | 91 |
| 10 | 0.5256 | 0.644 | 191.218882 | 1.781095 | 60818.51624 | 60.8 | 94 |
| 11 | 0.5078 | 0.6805 | 162.258872 | 1.751974 | 61369.6918 | 61.4 | 95 |
| 12 | 0.3966 | 0.5287 | 157.315184 | 1.633994 | 65684.23821 | 65.7 | 96 |
w16i-k

“CUBE #14”

| Cores | Ratio of nonzeroes in factor (min/max) | Ratio of flops for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | Maximum RAM used (GB) |
|---|---|---|---|---|---|---|---|
| 2 | 0.9376 | 0.8399 | 673.225225 | 6.241678 | 17188.03613 | 17.2 | 78 |
| 4 | 0.8188 | 0.8138 | 368.869242 | 3.569551 | 30485.70397 | 30.5 | 85 |
| 6 | 0.6087 | 0.6913 | 286.269409 | 2.828212 | 37799.17161 | 37.8 | 84 |
| 8 | 0.3289 | 0.4771 | 251.115087 | 2.701804 | 39767.17792 | 39.8 | 91 |
| 10 | 0.5256 | 0.644 | 191.964388 | 1.848399 | 58604.0123 | 58.6 | 94 |
| 12 | 0.3966 | 0.5287 | 155.623476 | 1.70239 | 63045.28808 | 63.0 | 96 |
| 14 | 0.5772 | 0.6414 | 147.392121 | 1.635223 | 66328.7728 | 66.3 | 101 |
| 15 | 0.6438 | 0.5701 | 139.355605 | 1.484888 | 71722.92484 | 71.7 | 101 |
| 16 | 0.5098 | 0.6655 | 130.042438 | 1.357847 | 78511.36377 | 78.5 | 103 |

Results 4 – GPU Acceleration + Cores

w12i-k

“CUBE #10”

| Cores (w/GPU) | Ratio of nonzeroes in factor (min/max) | Ratio of flops for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | % GPU Accelerated the Solve | Maximum RAM used (GB) |
|---|---|---|---|---|---|---|---|---|
| 2 | 0.9381 | 0.8405 | 178.686155 | 5.516205 | 19448.54863 | 19.4 | 95.78 | 78 |
| 4 | 0.8165 | 0.8108 | 124.087864 | 3.031092 | 35901.34876 | 35.9 | 95.91 | 85 |
| 6 | 0.6116 | 0.6893 | 122.433584 | 2.536878 | 42140.01391 | 42.1 | 95.74 | 84 |
| 8 | 0.3365 | 0.475 | 112.33829 | 2.351058 | 45699.89654 | 45.7 | 95.81 | 91 |
| 10 | 0.5397 | 0.6359 | 103.586986 | 1.801659 | 60124.33358 | 60.1 | 95.95 | 94 |
| 11 | 0.5123 | 0.6672 | 137.319938 | 1.635229 | 65751.09125 | 65.8 | 85.17 | 95 |
| 12 | 0.4132 | 0.5345 | 97.252285 | 1.562337 | 68696.85627 | 68.7 | 95.75 | 97 |
w16i-k

“CUBE #14”

| Cores (w/GPU) | Ratio of nonzeroes in factor (min/max) | Ratio of flops for factor (min/max) | Time (cpu & wall) for numeric factor (s) | Time (cpu & wall) for numeric solve (s) | Effective I/O rate for solve (MB/sec) | Effective I/O rate for solve (GB/sec) | % GPU Accelerated the Solve | Maximum RAM used (GB) |
|---|---|---|---|---|---|---|---|---|
| 2 | 0.9381 | 0.8405 | 200.007118 | 6.054831 | 17718.44411 | 17.7 | 94.96 | 78 |
| 4 | 0.8165 | 0.8108 | 122.200896 | 3.357233 | 32413.68282 | 32.4 | 95.20 | 85 |
| 6 | 0.6116 | 0.6893 | 122.742966 | 2.624494 | 40733.2138 | 40.7 | 94.91 | 84 |
| 8 | 0.3365 | 0.475 | 114.618006 | 2.544626 | 42223.539 | 42.2 | 94.97 | 91 |
| 10 | 0.5397 | 0.6359 | 105.4884 | 1.821352 | 59474.26914 | 59.5 | 95.18 | 94 |
| 12 | 0.4132 | 0.5345 | 96.750618 | 1.988799 | 53966.06502 | 54.0 | 94.96 | 97 |
| 14 | 0.5825 | 0.6382 | 106.573973 | 1.989103 | 54528.26599 | 54.5 | 88.96 | 101 |
| 15 | 0.6604 | 0.566 | 91.345275 | 1.374242 | 77497.60151 | 77.5 | 92.21 | 101 |
| 16 | 0.5248 | 0.6534 | 107.672641 | 1.301668 | 81899.85539 | 81.9 | 85.07 | 103 |

The ANSYS results.out file – The decoding continues

CUBE w12i-k (“CUBE #10”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware is when running a distributed parallel solve.
    2. Fastest solve time for CUBE #10: 12 out of 12 cores w/GPU @ 171.3 seconds
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process: the clock-on-the-wall time.
    2. Fastest time for CUBE #10: 12 out of 12 cores w/GPU @ 213.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the raw number-crunching time of the CPU.
    2. Fastest time for CUBE #10: 12 out of 12 cores w/GPU @ 186.0 seconds
  4. GPU Acceleration
    1. The NVIDIA Quadro K6000 accelerated ~96% of the matrix factorization flops.
    2. Actual percentage of GPU-accelerated flops = 95.7456
  5. Core and storage solver performance, 12 out of 12 cores and one NVIDIA Quadro K6000
    1. Ratio of nonzeroes in factor (min/max) = 0.4132
    2. Ratio of flops for factor (min/max) = 0.5345
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
    3. Effective I/O rate (MB/sec) for solve = 68696.856274 (or ~69 GB/sec)
      1. No issues here; this indicates that the workstation has ample bandwidth available for the solve.

CUBE w16i-k (“CUBE #14”)

  1. Elapsed Time Spent Computing The Solution
    1. This value indicates how efficient and balanced the hardware is when running a distributed parallel solve.
    2. Fastest solve time for CUBE w16i-k "CUBE #14": 15 out of 16 cores w/GPU @ 152.8 seconds
  2. Elapsed Time
    1. This value is the actual time to complete the entire solution process: the clock-on-the-wall time.
    2. Fastest time for CUBE w16i-k "CUBE #14": 15 out of 16 cores w/GPU @ 196.0 seconds
  3. CPU Time For Main Thread
    1. This value indicates the raw number-crunching time of the CPU.
    2. Fastest time for CUBE w16i-k "CUBE #14": 15 out of 16 cores w/GPU @ 172.6 seconds
  4. GPU Acceleration Percentage
    1. The NVIDIA Quadro K6000 accelerated ~92% of the matrix factorization flops.
    2. Actual percentage of GPU-accelerated flops = 92.2065
  5. Core and storage solver performance, 15 out of 16 cores and one NVIDIA Quadro K6000
    1. Ratio of nonzeroes in factor (min/max) = 0.6604
    2. Ratio of flops for factor (min/max) = 0.566
      1. These two values indicate to me that the system is well taxed from a compute power/hardware viewpoint.
      2. Note when reviewing these two data points: solver performance is most balanced when both values are as close to 1.0 as possible; past that point the compute hardware is no longer as efficient, and the values move farther away from 1.0.
    3. Effective I/O rate (MB/sec) for solve = 77497.6 MB/sec (or ~78 GB/sec)
      1. No issues here; this indicates that the workstation has ample bandwidth and fast I/O performance for in-core SPARSE solver solving.
    4. Maximum amount of RAM used by the ANSYS distributed solve
      1. 103 GB of RAM needed for the in-core solve

Conclusions Summary And Upgrade Path Suggestions

It is important for you to locate your bottleneck on your numerical simulation hardware. By utilizing data provided in the ANSYS results.out files, you will be able to logically determine your worst parallel performance inhibitor and plan accordingly on how to resolve what is slowing the parallel performance of your distributed numerical simulation solve.

I/O Bound and/or Compute Bound Summary

  • I/O Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” are I/O Bound.
      • Almost immediately when GPU Acceleration is enabled.
      • When GPU Acceleration is not enabled, being I/O bound is no longer an issue for solving performance; however, solve times are impacted because available compute power goes unused.
  • Compute Bound
    • Both CUBE w12i-k “CUBE #10” and w16i-k “CUBE #14” would benefit from additional Compute Power.
    • CUBE w12i-k “CUBE #10” would get the most bang for the buck by adding in the additional compute power.

Upgrade Path Recommendations

CUBE w12i-k “CUBE #10”

  1. I/O:
    1. Remove and replace the previous generation hard drives
      1. Currently 3.5" SAS2 6 Gb/s 15k RPM hard drives
    2. The hard drives could be upgraded to Enterprise Class SSDs or PCIe NVMe
      1. COST = HIGH
    3. The hard drives could be upgraded to SAS3 12 Gb/s drives
      1. COST = MEDIUM
  2. RAM:
    1. Remove and replace the previous generation RAM
    2. Currently all available RAM slots are populated.
      1. The optimum configuration for these two CPUs is four DIMMs per CPU; currently eight DIMMs per CPU are installed.
    3. Current RAM is DDR4-2133 MHz ECC REG DIMM
      1. Upgrade the RAM to DDR4-2400 MHz LRDIMM RAM
      2. COST = HIGH
  3. GPU Acceleration:
    1. Install a dedicated GPU accelerator card such as an NVIDIA Tesla K40 or K80
    2. COST = HIGH
  4. CPU:
    1. Remove and replace the current previous generation CPUs
    2. Currently installed: dual INTEL XEON e5-2643 V3
    3. Upgrade the CPUs to the V4 (Broadwell) equivalents
      1. COST = HIGH

CUBE w16i-k “CUBE #14”

  1. I/O: currently 2.5" SAS3 12 Gb/s 15k RPM hard drives
    1. Replace the current 2.5" SAS3 12 Gb/s 15k RPM drives with Enterprise Class SSDs or PCIe NVMe disk
      1. COST = HIGH
    2. Replace the 2.5" SAS3 12 Gb/s hard drives with 3.5" hard drives
      1. COST = HIGH
    3. For example, the INTEL 1.6TB P3700 HHHL AIC NVMe: https://www-ssl.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-p3700-series.html
  2. Currently a total of four hard drives are installed
    1. Increase the existing hard drive count from four to a total of six or eight.
    2. Change the RAID configuration to RAID 50
      1. COST = HIGH
  3. RAM:
    1. Currently using DDR4-2400 MHz ECC REG DIMMs
      1. Upgrade the RAM to DDR4-2400 MHz LRDIMM RAM
      2. COST = HIGH

Considering RAM: when determining how much system RAM you need to perform a six-million degree of freedom ANSYS numerical simulation, add the following allowances to the Maximum Amount of RAM Used number indicated in your ANSYS results.out file (a short sketch after the list below shows the arithmetic).

  • ANSYS reserves ~5% of your RAM
  • Office products can add an additional ~10-15% to the above number
  • Operating system: add an additional ~5-10%
  • Other programs? For example, open up your Windows Task Manager and look at how much RAM your anti-virus program is consuming. Add in the amount of RAM consumed by these other RAM vampires.
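
One way to read those allowances as a back-of-the-envelope calculation is sketched below; the overhead fractions are the rough percentages listed above, not exact figures, and the extra gigabytes for other programs are a placeholder you should replace with what Task Manager actually shows.

```python
def estimate_system_ram_gb(max_ram_used_gb,
                           ansys_reserve=0.05,    # ~5% reserved by ANSYS
                           office_overhead=0.15,  # ~10-15% for Office products
                           os_overhead=0.10,      # ~5-10% for the operating system
                           other_gb=2.0):         # anti-virus and other "RAM vampires" (placeholder)
    """Rough system RAM sizing based on the Maximum RAM used value in results.out."""
    return max_ram_used_gb * (1.0 + ansys_reserve + office_overhead + os_overhead) + other_gb

# Example: the 16-core in-core solve above peaked at about 103 GB of RAM.
print(round(estimate_system_ram_gb(103.0)))  # ~136 GB
```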

Terms & Definition Goodies:

  • Compute Bound
    • A condition in which the CPU's processing power is the limiting factor in the solve: the processor stays busy computing for essentially the entire elapsed time, so additional or faster cores, rather than faster I/O, are what shorten the solution.
  • CPU Time For Main Thread
    • CPU time (or process time) is the amount of time for which a central processing unit (CPU) was used for processing instructions of a computer program or operating system, as opposed to, for example, waiting for input/output (I/O) operations or entering low-power (idle) mode.
  • Effective I/O rate (MB/sec) for solve
    • The total input/output bandwidth used during the parallel distributed solve to move data between storage/memory and the CPU.
    • For example, the in-core 16-core + GPU solve on the CUBE w16i-k reached an effective I/O rate of ~82 GB/s.
    • Theoretical system level bandwidth possible is ~96 GB/s
  • IO Bound
    • A condition in which the input/output side of the system hardware (reading, writing, and the flow of data through the system) has become inefficient and/or detrimental to running an efficient parallel analysis.
  • Maximum total memory used
    • The maximum amount of memory used during your analysis.
  • Percentage (%) GPU Accelerated The Solve
    • The percentage of acceleration added to your distributed solve by the Graphics Processing Unit (GPU). The overall impact of the GPU will be diminished if the system bandwidth of your compute hardware is slow or saturated.
  • Ratio of nonzeroes in factor (min/max)
    • A performance indicator of how efficiently and how well balanced the solver is running on your compute hardware. Solver performance is most efficient when this value is as close to 1.0 as possible.
  • Ratio of flops for factor (min/max)
    • A performance indicator of how efficiently and how well balanced the solver is running on your compute hardware. Solver performance is most efficient when this value is as close to 1.0 as possible.
  • Time (cpu & wall) for numeric factor
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When the time (cpu & wall) for numeric factor and time (cpu & wall) for numeric solve values are roughly equal, your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Time (cpu & wall) for numeric solve
    • A performance indicator used to determine how the compute hardware bandwidth is affecting your solve times. When the time (cpu & wall) for numeric solve and time (cpu & wall) for numeric factor values are roughly equal, your compute hardware I/O bandwidth is having a negative impact on the distributed solver functions.
  • Total Speedup w/GPU
    • The total performance gain for a compute task when using a Graphics Processing Unit (GPU).
  • Time Spent Computing Solution
    • The actual clock-on-the-wall time that it took to compute the solution.
  • Total Elapsed Time
    • The actual clock-on-the-wall time that it took to complete the entire analysis.


Modeling 3D Printed Cellular Structures: Approaches

How can the mechanical behavior of cellular structures (honeycombs, foams and lattices) be modeled?

This is the second in a two-part post on the modeling aspects of 3D printed cellular structures. If you haven’t already, please read the first part here, where I detail the challenges associated with modeling 3D printed cellular structures.

The literature on the 3D printing of cellular structures is vast, and growing. While the majority of the focus in this field is on the design and process aspects, there is a significant body of work on characterizing behavior for the purposes of developing analytical material models. I have found that these approaches fall into three different categories depending on the level of discretization at which the property is modeled: at the level of each material point, at the level of the connecting member, or at the level of the cell. At the end of this article I have compiled some of the best references I could find for each of the three broad approaches.

1. Continuum Modeling

The most straightforward approach is to use bulk material properties to represent what is happening to the material at the cellular level [1-4]. This approach does away with the need for any cellular-level characterization, and in so doing, we do not have to worry about the size or contact effects described in the previous post that are artifacts of having to characterize behavior at the cellular level. However, the assumption that the connecting struts/walls in a cellular structure behave the same way the bulk material does can be particularly erroneous for AM processes, which can introduce significant size-specific behavior and large anisotropy. It is important to keep in mind that factors that may not be significant at a bulk level (such as surface roughness, local microstructure or dimensional tolerances) can be very significant when the connecting member is under 1 mm thick, as is often the case.

The level of error introduced by a continuum assumption is likely to vary by process: processes like Fused Deposition Modeling (FDM) are already strongly anisotropic with highly geometry-specific meso-structures and an assumption like this will generate large errors as shown in Figure 1. On the other hand, it is possible that better results may be had for powder based fusion processes used for metal alloys, especially when the connecting members are large enough and the key property being solved for is mechanical stiffness (as opposed to fracture toughness or fatigue life).

Fig 1. Load-displacement curves for ULTEM-9085 Honeycomb structures made with different FDM toolpath strategies

2. Cell Level Homogenization

The most common approach in the literature is the use of homogenization – representing the effective property of the cellular structure without regard to the cellular geometry itself. This approach has significantly lower computational expense associated with its implementation. Additionally, it is relatively straightforward to develop a model by fitting a power law to experimental data [5-8] as shown in the equation below, relating the effective modulus E* to the bulk material property Es and their respective densities (ρ and ρs), by solving for the constants C and n.

E*/Es = C (ρ/ρs)^n
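
As a quick illustration of how C and n come out of such a fit, the relationship becomes a straight line in log-log space. The density and modulus numbers below are made-up placeholders, not measured data; the point is only the mechanics of the fit.

```python
import numpy as np

# Placeholder relative densities (rho/rho_s) and effective moduli E* in MPa.
rel_density = np.array([0.10, 0.15, 0.20, 0.30, 0.40])
e_eff = np.array([120.0, 260.0, 480.0, 1100.0, 2000.0])
e_solid = 70000.0  # bulk modulus Es of the parent material (MPa), also a placeholder

# E*/Es = C * (rho/rho_s)^n  ->  ln(E*/Es) = n * ln(rho/rho_s) + ln(C)
n, log_c = np.polyfit(np.log(rel_density), np.log(e_eff / e_solid), 1)
print("C = %.3f, n = %.2f" % (np.exp(log_c), n))
```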

While a homogenization approach is useful in generating comparative, qualitative data, it has some difficulties in being used as a reliable material model in analysis & simulation. This is first and foremost since the majority of the experiments do not consider size and contact effects. Secondly, even if these were considered, the homogenization of the cells only works for the specific cell in question (e.g. octet truss or hexagonal honeycomb) – so every new cell type needs to be re-characterized. Finally, the homogenization of these cells can lose insight into how structures behave in the transition region between different volume fractions, even if each cell type is calibrated at a range of volume fractions – this is likely to be exacerbated for failure modeling.

3. Member Modeling

The third approach involves describing behavior not at each material point or at the level of the cell, but at a level in-between: the connecting member (also referred to as strut or beam). This approach has been used by researchers [9-11] including us at PADT [12] by invoking beam theory to first describe what is happening at the level of the member and then use that information to build up to the level of the cells.

Fig 2. Member modeling approach: represent cellular structure as a collection of members, use beam theory for example, to describe the member’s behavior through analytical equations. Note: the homogenization equations essentially derive from this approach.

This approach, while promising, is beset with some challenges as well: it requires experimental characterization at the cellular level, which brings in the previously mentioned challenges. Additionally, from a computational standpoint, the validation of these models typically requires modeling the full cellular geometry, which can be prohibitively expensive. Finally, the theory involved in representing member-level detail is more complex and makes assumptions of its own (e.g. modeling the "fixed" ends), and it has not been adequately proven at this point that this complexity is justified by a significant improvement in the model's predictability compared to the above two approaches. This approach does have one significant promise: if we are able to accurately describe behavior at the level of a member, it is a first step towards a truly shape- and size-independent model that can bridge with ease between, say, an octet truss and an auxetic structure, or different sizes of cells, as well as the transitions between them, thus enabling true freedom for the designer and analyst. It is for this reason that we are focusing on this approach.
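
To make "building up from the member" concrete, the toy sketch below treats a single square-cross-section strut as an idealized Euler-Bernoulli cantilever and returns its axial and bending stiffnesses; the dimensions and modulus are placeholders, and real member models add joint, shear, and process-specific corrections on top of this.

```python
def strut_stiffnesses(E, t, L):
    """Idealized stiffnesses (N/mm) of a square strut, thickness t and length L in mm.

    E is the member-level modulus in MPa which, per the discussion above, may
    differ from the bulk modulus for thin additively manufactured members.
    """
    A = t * t           # cross-sectional area of the square section
    I = t ** 4 / 12.0   # second moment of area of the square section
    k_axial = E * A / L              # axial stiffness EA/L
    k_bending = 3.0 * E * I / L**3   # tip stiffness of a cantilevered beam, 3EI/L^3
    return k_axial, k_bending

# Placeholder numbers: a 0.5 mm strut, 5 mm long, member modulus of 2000 MPa.
print(strut_stiffnesses(2000.0, 0.5, 5.0))
```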

Conclusion

Continuum models are easy to implement and for relatively isotropic processes and materials such as metal fusion, may be a good approximation of stiffness and deformation behavior. We know through our own experience that these models perform very poorly when the process is anisotropic (such as FDM), even when the bulk constitutive model incorporates the anisotropy.

Homogenization at the level of the cell is an intuitive improvement and the experimental insights gained are invaluable – comparison between cell type performances, or dependencies on member thickness & cell size etc. are worthy data points. However, caution needs to be exercised when developing models from them for use in analysis (simulation), though the relative ease of their computational implementation is a very powerful argument for pursuing this line of work.

Finally, the member level approach, while beset with challenges of its own, is a promising direction forward since it attempts to address behavior at a level that incorporates process and geometric detail. The approach we have taken at PADT is in line with this approach, but specifically seeks to bridge the continuum and cell level models by using cellular structure response to extract a point-wise material property. Our preliminary work has shown promise for cells of similar sizes and ongoing work, funded by America Makes, is looking to expand this into a larger, non-empirical model that can span cell types. If this is an area of interest to you, please connect with me on LinkedIn for updates. If you have questions or comments, please email us at info@padtinc.com or drop me a message on LinkedIn.

References (by Approach)

Bulk Property Models

[1] C. Neff, N. Hopkinson, N.B. Crane, “Selective Laser Sintering of Diamond Lattice Structures: Experimental Results and FEA Model Comparison,” 2015 Solid Freeform Fabrication Symposium

[2] M. Jamshidinia, L. Wang, W. Tong, and R. Kovacevic. "The bio-compatible dental implant designed by using non-stochastic porosity produced by Electron Beam Melting® (EBM)," Journal of Materials Processing Technology 214, no. 8 (2014): 1728-1739

[3] S. Park, D.W. Rosen, C.E. Duty, “Comparing Mechanical and Geometrical Properties of Lattice Structure Fabricated using Electron Beam Melting“, 2014 Solid Freeform Fabrication Symposium

[4] D.M. Correa, T. Klatt, S. Cortes, M. Haberman, D. Kovar, C. Seepersad, “Negative stiffness honeycombs for recoverable shock isolation,” Rapid Prototyping Journal, 2015, 21(2), pp.193-200.

Cell Homogenization Models

[5] C. Yan, L. Hao, A. Hussein, P. Young, and D. Raymont. “Advanced lightweight 316L stainless steel cellular lattice structures fabricated via selective laser melting,” Materials & Design 55 (2014): 533-541.

[6] S. Didam, B. Eidel, A. Ohrndorf, H.‐J. Christ. “Mechanical Analysis of Metallic SLM‐Lattices on Small Scales: Finite Element Simulations versus Experiments,” PAMM 15.1 (2015): 189-190.

[7] P. Zhang, J. Toman, Y. Yu, E. Biyikli, M. Kirca, M. Chmielus, and A.C. To. “Efficient design-optimization of variable-density hexagonal cellular structure by additive manufacturing: theory and validation,” Journal of Manufacturing Science and Engineering 137, no. 2 (2015): 021004.

[8] M. Mazur, M. Leary, S. Sun, M. Vcelka, D. Shidid, M. Brandt. “Deformation and failure behaviour of Ti-6Al-4V lattice structures manufactured by selective laser melting (SLM),” The International Journal of Advanced Manufacturing Technology 84.5 (2016): 1391-1411.

Beam Theory Models

[9] R. Gümrük, R.A.W. Mines, “Compressive behaviour of stainless steel micro-lattice structures,” International Journal of Mechanical Sciences 68 (2013): 125-139

[10] S. Ahmadi, G. Campoli, S. Amin Yavari, B. Sajadi, R. Wauthle, J. Schrooten, H. Weinans, A. Zadpoor, "Mechanical behavior of regular open-cell porous biomaterials made of diamond lattice unit cells," Journal of the Mechanical Behavior of Biomedical Materials 34 (2014): 106-115.

[11] S. Zhang, S. Dilip, L. Yang, H. Miyanji, B. Stucker, “Property Evaluation of Metal Cellular Strut Structures via Powder Bed Fusion AM,” 2015 Solid Freeform Fabrication Symposium

[12] D. Bhate, J. Van Soest, J. Reeher, D. Patel, D. Gibson, J. Gerbasi, and M. Finfrock, “A Validated Methodology for Predicting the Mechanical Behavior of ULTEM-9085 Honeycomb Structures Manufactured by Fused Deposition Modeling,” Proceedings of the 26th Annual International Solid Freeform Fabrication, 2016, pp. 2095-2106

ANSYS 17.2 FLUENT External Flow Over a Truck Body Polyhedral Mesh

Part 3: The ANSYS FLUENT Performance Comparison Series – CUBE Numerical Simulation Appliances by PADT, Inc.

November 22, 2016

External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)

  • External flow over a truck body using a polyhedral mesh
  • This test case has around 14 million polyhedral cells
  • Uses the Detached Eddy Simulation (DES) model with the segregated implicit solver

ANSYS Benchmark Test Case Information

  • ANSYS HPC Licensing Packs required for this benchmark
    • I used three (3) HPC Packs to unlock all of the cores used during the ANSYS Fluent test cases on the CUBE appliances shown in the Figure 1 chart.
    • I used four (4) HPC Packs for the two 256-core benchmarks shown in the data; those runs were included purely for testing purposes.
  • The best average seconds per iteration goes to the 2015 CUBE Intel® Xeon® e5-2667 V3, at 0.625 seconds per iteration using 128 compute cores.
    • The 2015 CUBE Intel® Xeon® e5-2667 V3 outperformed the 256-core AMD Opteron™ series ANSYS Fluent 17.2 benchmarks.
    • Please note that different numbers of CUBE compute nodes were used in this test; however, straight-across CPU times are also shown for single nodes at 64 cores.
  • To put this ANSYS Fluent test case in real-world terms: a completely new ANSYS HPC customer is likely to have up to two (2) of the entry-level Intel CUBE compute nodes, rather than an eight (8) CUBE compute node configuration.
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters, please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM. Designed, tested, and configured within your budget. We are happy to help and to listen to your specific needs.

Figure 1 – ANSYS 17.2 FLUENT Test Case Graph

truck_poly_14m
ANSYS FLUENT 17.2 External Flow Over a Truck Body – Graph
ANSYS FLUENT External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) Test Case
Number of cells 14,000,000
Cell type polyhedral
Models DES turbulence
Solver segregated implicit

The CPU Information

The AMD Opteron™ 6000 Series Platform:

Yes, I am still impressed with the performance, day after day, 24×7, of these AMD Opteron™ CPUs! After years of operation, the AMD Opteron™ series of processors are still relevant and powerful numerical simulation processors. Heavy sigh… As you can see for yourselves in the ANSYS Fluent test case data below, the 2012 AMD Opteron™ and 2013 AMD Opteron™ CPUs can still hang in there with the INTEL XEON CPUs. However, one INTEL CPU node vs. four AMD CPU nodes?

I thought a more realistic test case scenario would be to drop the number of AMD compute nodes down to four. I could certainly have thrown more of the CUBE compute nodes with the AMD Opteron™ series CPUs inside them at the problem; that is why you see one 256-core benchmark score where I put all 64 cores on each node to the test. As one would hope, unleashing ANSYS Fluent on 256 cores did drop the iteration solve time for the test case on the CUBE compute appliances.

Realistically a brand new ANSYS HPC customer is not likely to have:

a) Vast quantities of cores (AMD or INTEL) & compute nodes for optimal distributed numerical solving

b) ANSYS HPC licensing for 512 cores

c) The available circuit breakers to provide power

The Intel® Xeon® CPU’s used for this ANSYS Fluent Test Case

  1. Intel® Xeon® Processor E5-2690 v4  (35M Cache, 2.60 GHz)
  2. Intel® Xeon® Processor E5-2667 v4  (25M Cache, 3.20 GHz)
  3. Intel® Xeon® Processor E5-2667 v3  (20M Cache, 3.20 GHz)
  4. Intel® Xeon® Processor E5-2667 v2  (25M Cache, 3.30 GHz)

The Estimated Wattage?

No, the lights did not dim… but here is a quick comparison of energy use. The estimated maximum wattage shows up both in volume (decibels) and in dollars ($$$) saved or spent.

Less & More!

Overall, the estimated average power consumption of a CUBE compute node has indeed dropped over the past four years – real progress!

  • 2012 CUBE AMD Numerical Simulation Appliance with the Opteron™ 6278 – Four (4) Compute Nodes
    • Estimated CUBE Configuration @ Full Power: ~8000 Watts
  • 2013 CUBE AMD Numerical Simulation Appliance with the Opteron™ 6380
    • Estimated CUBE Configuration @ Full Power: ~7000 Watts
  • 2015 CUBE Numerical Simulation Appliance with the  Intel® Xeon® e5-2667 V3 – Eight (8) Compute Nodes
    • Estimated CUBE Configuration @ Full Power: ~4000 Watts
  • 2016 CUBE Numerical Simulation Appliance with the Intel® Xeon® e5-2667 V4 – One (1) Compute Node.
    • Estimated CUBE Configuration @ Full Power:  ~900 Watts
  • 2016 CUBE Numerical Simulation Appliance with the Intel® Xeon® e5-2690 V4 – Two (2) Compute Nodes
    • Estimated CUBE Configuration @ Full Power:  ~1200 Watts

Figure 2 – Estimated CUBE compute node power consumption as configured for this ANSYS FLUENT Test Case.

Power consumption means money
CUBE HPC Compute Node Power Consumption as configured
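To translate those wattage estimates into money, here is a rough back-of-the-envelope sketch; the utilization and electricity rate are assumptions for illustration, not measured values.

```python
# Rough annual electricity cost from the estimated full-power wattages above.
# Duty cycle and $/kWh are assumptions for illustration only.
configs_watts = {
    "2012 CUBE AMD Opteron 6278 (4 nodes)":      8000,
    "2016 CUBE Intel Xeon E5-2690 v4 (2 nodes)": 1200,
}
hours_per_year = 24 * 365 * 0.6   # assume ~60% average utilization
usd_per_kwh = 0.12                # assumed electricity rate

for name, watts in configs_watts.items():
    kwh = watts / 1000.0 * hours_per_year
    print(f"{name}: ~${kwh * usd_per_kwh:,.0f} per year")
```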

The CUBE phenomenon

2012 AMD Opteron™ 6278
4 x Compute Node CUBE HPC Appliance
4 x 16c @2.4GHz/ea
Quad Socket motherboard
DDR3-1866 MHz ECC REG
5 x 600GB SAS2 15k RPM
40Gbps Infiniband QDR High Speed Interconnect

2013 CUBE AMD Opteron™ 6380
4 x Compute Node CUBE HPC Appliance
4 x 16c @2.5GHz/ea
Quad Socket motherboard
DDR3-1866 MHz ECC REG
3 x 600GB SAS2 15k RPM
40Gbps Infiniband QDR High Speed Interconnect

2014 CUBE Intel® Xeon® e5-2667 V2
1 x CUBE HPC Workstation
2 x 8c @3.3GHz/ea – Intel® Xeon® e5-2667 V2
Dual Socket motherboard
DDR3-1866 MHz ECC REG
3 x 600GB SAS2 15k RPM

2015 CUBE Intel® Xeon® e5-2667 V3
8 x Compute Node CUBE HPC Appliance
2 x 8c @3.2GHz/ea – Intel® Xeon® e5-2667 V3
Dual Socket motherboard
DDR4-2133 MHz ECC REG
4 x 600GB SAS3 15k RPM

2016 CUBE Intel® Xeon® e5-2667 V4
1 x CUBE HPC Workstation
2 x 8c @3.2GHz/ea – Intel® Xeon® e5-2667 V4
Dual Socket motherboard
DDR4-2400 MHz LRDIMM
6 x 600GB SAS3 15k RPM

2016 CUBE Intel® Xeon® e5-2690 V4
1 x 1U CUBE APPLIANCE – 2 Compute Nodes
2 x 14c @2.6GHz/ea – Intel® Xeon® e5-2690 V4
Dual Socket motherboard
DDR4-2400 MHz LRDIMM
4 x 600GB SAS3 15k RPM – RAID 10
56Gbps Infiniband FDR CPU High Speed Interconnect
10Gbps Ethernet Low Latency

Operating Systems Used

  1. Linux 64-bit
  2. Windows 7 Professional 64-Bit
  3. Windows 10 Professional 64-Bit
  4. Windows Server 2012 R2 Standard Edition w/HPC

It Is All About The Data

Test Metric – Average Seconds Per Iteration

  • Fastest Time: 0.625 seconds per iteration – 2015 CUBE Intel® Xeon® e5-2667 V3
  • ANSYS FLUENT 17.2 – average seconds per iteration by core count

Columns: Cores | 2014 CUBE Intel® Xeon® e5-2667 V2 (1 x Node) | 2015 CUBE Intel® Xeon® e5-2667 V3 (8 x Nodes) | 2016 CUBE Intel® Xeon® e5-2667 V4 (1 x Node) | 2016 CUBE Intel® Xeon® e5-2690 V4 (2 x Nodes) | 2012 AMD Opteron™ 6278 (4 x Nodes) | 2013 CUBE AMD Opteron™ 6380 (4 x Nodes). Not every system was run at every core count, so some rows contain fewer entries.

1 100.6 65.8 32.154 40.44 120.035 90.567
2 40.337 32.024 17.149 35.355 63.813 46.385
4 20.171 16.975 11.915 19.735 32.544 23.956
6 13.904 12.363 9.311 13.76 21.805 17.147
8 10.605 9.4 7.696 11.121 16.783 13.158
12 7.569 6.913 6.764 8.424 11.59 10.2
16 6.187 4.286 6.388 7.363 8.96 7.94
32 2.539 4.082 6.033 4.75
48 2.778 4.126 3.835
52 2.609 3.161 4.784
55 2.531 3.003 4.462
56 2.681 3.025 4.368
*64 3.871 5.004
64 2.688 2.746
96 2.433 2.202
128 0.625 2.112 2.367
256 1.461 3.531

* One (1) CUBE Compute Node with  4 x AMD Opteron™ Series CPU’s for a total of 64 cores was used to derive these two ANSYS Fluent Benchmark data points (Baseline).
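One way to read the data above is in terms of parallel speedup and efficiency. The short sketch below does exactly that for the 2015 CUBE Intel® Xeon® e5-2667 V3, using its 1-core and 128-core average seconds per iteration from the table.

```python
# Parallel speedup and efficiency from average seconds per iteration.
# Values taken from the 2015 CUBE Intel Xeon e5-2667 V3 column above.
t_serial   = 65.8    # s/iteration on 1 core
t_parallel = 0.625   # s/iteration on 128 cores
cores = 128

speedup = t_serial / t_parallel
efficiency = speedup / cores
print(f"Speedup:    {speedup:.1f}x on {cores} cores")
print(f"Efficiency: {efficiency:.0%}")
```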

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

http://www.cube-hvpc.com/

Related Blog Posts

ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

Part 2: ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

ANSYS 17.2 CFX Benchmark External Flow Over a LeMans Car

Wow, yet another ANSYS benchmarking blog post? I know, but I have had four blog posts in limbo for months. There is no better time than now, and since it is Friday, it is time to knock out another one of these fine-looking ANSYS 17.2 benchmarking results off my list!

The ANSYS 17.2 CFX External Flow Over a LeMans Car Test Case

…dun dun dah!

On The Fast Track! ANSYS 17.2

The ANSYS CFX test case has approximately 1.8 million nodes

  • 10 million elements, all tetrahedral
  • Solves compressible fluid flow with heat transfer using the k-epsilon turbulence model.

ANSYS Benchmark Test Case Information

  • ANSYS HPC Licensing Packs required for this benchmark
    • I used (3) HPC Packs to unlock all 56 cores of the CUBE a56i.
    • The fastest solve time goes to the CUBE a56i – Boom!
      • From start to finish, a total of forty-six (46) seconds ticked by on the wall clock.
      • A total of fifty-five (55) cores in use between two twenty-eight (28) core nodes.
      • Windows 2012 R2 Standard Edition w/HPC update 3
      • MS-MPI v7.1
      • ANSYS CFX 17.2
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters, please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM. Designed, tested, and configured within your budget. We are happy to help and to listen to your specific needs.

Figure 1 – ANSYS CFX benchmark data for the tetrahedral, 10 million elements External Flow Over a LeMans Car Test Case

ANSYS CFX Benchmark Data

ANSYS CFX Test Case Details – Click Here for more information on this benchmark

External Flow Over a LeMans Car
Number of nodes 1,864,025
Element type Tetrahedral
Models k-epsilon Turbulence, Heat Transfer
Solver Coupled Implicit

The CPU Information

The benchmark data is derived from running the ANSYS CFX External Flow Over a LeMans Car test case. Take a minute or three to look at how these CPUs perform with the very latest ANSYS releases, ANSYS Release 17.1 and ANSYS Release 17.2.

Wall Clock Time!

I have tuned these numerical simulation machines with a focus on wall clock time for years now. Funnily enough, if you ask Eric Miller, he and I were talking about wall clock times just this morning.

What is wall clock time? Simply put –> how does the solve time FEEL to the engineer… yes, I just equated a feeling to a non-human event. Ah yes, to feel… oh, and I was reminded of an old Van Halen song where David Lee Roth says:

"Oh man, I think the clock is slow.

  I don’t feel tardy.

Class Dismissed!”

The CUBE phenomenon

CUBE a56i Appliance – Windows 2012 R2 Standard w/HPC
1U CUBE APPLIANCE (2 x 28)
4 x 14c @2.6GHz/ea – Intel® Xeon® e5-2690 V4
Dual Socket motherboard
256GB DDR4-2400 MHz LRDIMM
4 x 600GB SAS3 15k RPM
56Gbps Infiniband FDR CPU High Speed Interconnect
10Gbps Ethernet Low Latency
CUBE w32i Workstation – Windows 10 Professional
1 x 4U CUBE APPLIANCE
2 x 16c @2.6GHz/ea – Intel® Xeon® e5-2697a V4
Dual Socket motherboard
256GB DDR4-2400 MHz LRDIMM
2 x 600GB SAS3 15k RPM
NVIDIA QUADRO M4000

It Is All About The Data

 11/17/2016

PADT, Inc. – Tempe, AZ

Total wall clock time in seconds. Columns: Cores | CUBE w32i (ANSYS CFX 17.1) | CUBE a56i (ANSYS CFX 17.1) | CUBE a56i (ANSYS CFX 17.2). A 0 indicates the configuration was not run at that core count.
2 555 636 609
4 304 332 332
8 153 191 191
16 105 120 120
24 78 84 84
32 73 68 68
38 0 61 59
42 0 55 55
48 0 51 51
52 0 52 48
55 0 47 46
56 0 52 51
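For the curious, Amdahl's law can be roughly back-fitted to two of the CUBE a56i wall clock times above (609 seconds on 2 cores and 46 seconds on 55 cores, ANSYS CFX 17.2) to estimate how much of the job behaves as if it were serial. This is a crude estimate that lumps I/O, MPI overhead, and everything else into a single "serial fraction."

```python
# Back out an approximate Amdahl serial fraction s from two wall clock
# measurements, T(n) = T1 * (s + (1 - s)/n), using CUBE a56i / CFX 17.2 data.
n1, t1 = 2, 609.0    # cores, seconds (from the table above)
n2, t2 = 55, 46.0

# Solve t1/t2 = (s + (1 - s)/n1) / (s + (1 - s)/n2) for s.
r = t1 / t2
s = (r / n2 - 1.0 / n1) / ((1.0 - 1.0 / n1) - r * (1.0 - 1.0 / n2))
print(f"Estimated serial fraction: {s:.1%}")   # roughly 2% for these numbers
```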

Picture Sharing Time!

Check out the pictures below of the Microsoft Windows Server 2012 R2 HPC Cluster Manager.

I used Windows Server 2012 R2 on both of the two compute nodes that make up the CUBE a56i.

Microsoft 2012 R2 w/HPC – is very quick, and oh so very powerful!

winhpc-cfx-56c-cpu

Windows 2012 HPC
Microsoft Windows 2012 R2 HPC. It is time…
INTEL XEON e5-2690 v4
The INTEL XEON e5-2690 v4 loves the turbo mode. Vrrooom! It is time…

Please be safe out there in the wilds, you are all dismissed for the weekend!

ANSYS R17 Topological Optimization Application Example – Saxophone Brace

topo-opt-sax-a2

What is Topological Optimization? If you're not familiar with the concept, in finite element terms it means performing a shape optimization utilizing mesh information to achieve a goal such as minimizing volume subject to certain loads and constraints. Unlike parameter optimization such as with ANSYS DesignXplorer, we are not varying geometry parameters. Rather, we're letting the program decide on an optimal shape based on the removal of material, accomplished by deactivating mesh elements. If the mesh is fine enough, we are left with an 'organic' sculpted shape made up of the remaining elements. Ideally we can then create CAD geometry from this organic-looking mesh shape. ANSYS SpaceClaim has tools available to facilitate doing this.

topo-opt-sax-a1

Topological optimization has seen a return to prominence in the last couple of years due to advances in additive manufacturing. With additive manufacturing, it has become much easier to make parts with the organic shapes resulting from topological optimization. ANSYS has had topological optimization capability in both Mechanical APDL and Workbench in the past, but the capabilities as well as the applications at the time were limited, so those tools eventually died off. New to the fold are ANSYS ACT Extensions for Topological Optimization in ANSYS Mechanical for versions 17.0, 17.1, and 17.2. These are free to customers with current maintenance and are available on the ANSYS Customer Portal.

In deciding to write this piece, I decided an interesting example would be the brace that is part of all curved saxophones. This brace connects the bell to the rest of the saxophone body, and provides stiffness and strength to the instrument. Various designs of this brace have been used by different manufacturers over the years. Since saxophone manufacturers like those in other industries are often looking for product differentiation, the use of an optimized organic shape in this structural component could be a nice marketing advantage.

This article is not intended to be a technical discourse on the principles behind topological optimization, nor is it intended to show expertise in saxophone design. Rather, the intent is to show an example of the kind of work that can be done using topological optimization and will hopefully get the creative juices flowing for lots of ANSYS users who now have access to this capability.

That being said, here are some images of example bell to body braces in vintage and modern saxophones. Like anything collectible, saxophones have fans of various manufacturers over the years, and horns going back to production as early as the 1920’s are still being used by some players. The older designs tend to have a simple thin brace connecting two pads soldered to the bell and body on each end. Newer designs can include rings with pivot connections between the brace and soldered pads.

topo-opt-sax-01
Half Ring Brace

 

Solid connection to bell, screw joint to body
Older thin but solid brace rigidly connected to soldered pads
topo-opt-sax-04
Modern ring design
Modern Dual Degree of Freedom with Revolute Joint Type Connections

Hopefully those examples show there can be variation in the design of this brace, while not largely tampering with the musical performance of the saxophone in general. The intent was to pick a saxophone part that could undergo topological optimization which would not significantly alter the musical characteristics of the instrument.

The first step was to obtain a CAD model of a saxophone body. Since I was not able to easily find one freely available on the internet that looked accurate enough to be useful, I created my own in ANSYS SpaceClaim using some basic measurements of an example instrument. I then modeled a ‘blob’ of material at the brace location. The idea is that the topological optimization process will remove non-needed material from this blob, leaving an optimized shape after a certain level of volume reduction.

Representative Solid Model Geometry Created in ANSYS SpaceClaim. Note ‘Blob’ of Material at Brace Location.

In ANSYS Mechanical, the applied boundary conditions consisted of frictionless support constraints at the thumb rest locations and a vertical displacement constraint at the attachment point for the neck strap. Acceleration due to gravity was applied as well. Other loads, such as sideways inertial acceleration, could have been considered as well but were ignored for the sake of simplicity for this article. The material property used was brass, with values taken from Shigley and Mitchell’s Mechanical Engineering Design text, 1983 edition.

topo-opt-sax-07
Applied Boundary Conditions Were Various Constraints at A, B, and C, as well as Acceleration Due to Gravity.

This plot shows the resulting displacement distribution due to the gravity load:

topo-opt-sax-08

Now that things are looking as I expect, the next step is performing the topological optimization.

Once the topological optimization ACT Extension has been downloaded from the ANSYS Customer Portal and installed, ANSYS Mechanical will automatically include a Topological Optimization menu:

topo-opt-sax-09

I set the Design Region to be the blob of material that I want to end up as the optimized brace. I did a few trials with varying mesh refinement. Obviously, the finer the mesh, the smoother the surface of the optimized shape, as elements that are determined to be unnecessary are removed from consideration. The optimization Objective was set to minimize compliance (maximize stiffness). The optimization Constraint was set to volume at 30%, meaning reduce the volume to 30% of the current value of the 'blob'.
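In general terms – and this is the textbook, density-based statement of the problem, not necessarily the internal formulation used by the ACT extension – minimizing compliance subject to a 30% volume constraint can be written as:

\[
\begin{aligned}
\min_{\rho}\;\; & c(\rho) = \mathbf{u}^{T}\mathbf{K}(\rho)\,\mathbf{u} \\
\text{subject to}\;\; & \mathbf{K}(\rho)\,\mathbf{u} = \mathbf{f}, \quad V(\rho) \le 0.3\,V_{0}, \quad 0 < \rho_{\min} \le \rho_{e} \le 1
\end{aligned}
\]

where u is the displacement vector, K the stiffness matrix, f the applied loads, ρe the element "densities" acting as design variables, and V0 the volume of the original 'blob'.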
After running the solution and plotting Averaged Node Values, we can see the ANSYS-determined optimized shape:

topo-opt-sax-10
Two views of the optimized shape.

What is apparent when looking at these shapes is that the ‘solder patch’ where the brace attaches to the bell on one end and the body on the other end was allowed to be reduced. For example, in the left image we can see that a hole has been ‘drilled’ through the patch that would connect the brace to the body. On the other end, the patch has been split through the middle, making it look something like an alligator clip.

 

Another optimization run was performed in which the solder pads were held as surfaces that were not to be changed by the optimization. The resulting optimized shape is shown here:

topo-opt-sax-11

Noticing that my optimized shape seemed on the thick side when compared to production braces, I then changed the 'blob' in ANSYS SpaceClaim so that it was thinner to start with. With ANSYS it's very easy to propagate geometry changes, as all of the simulation and topological optimization settings stay tied to the geometry as long as the topology of those items stays the same.

Here is the thinner chunk after making a simple change in ANSYS SpaceClaim:

topo-opt-sax-12

And here is the result of the topological optimization using the thinner blob as the starting point:

topo-opt-sax-13

Using the ANSYS SpaceClaim Direct Modeler, the faceted STL file that results from the ANSYS topological optimization can be converted into a geometry file. This can be done in a variety of ways, including a 'shrink wrap' onto the faceted geometry as well as surfaces fit onto the facets. Another option is to fit geometry in a more general way in and around the faceted result. These methods can also be combined. SpaceClaim is really a great tool for this. Using SpaceClaim and the topological optimization (faceted) result, I came up with three different 'looks' for the optimized part.

Using ANSYS Workbench, it's very easy to plug the new geometry component into the simulation model that I had already set up and run in ANSYS Mechanical, using the 'blob' as the brace in the original model. I then checked the displacement and stress results to see how they compared.

First, we have an organic looking shape that is mostly faithful to the results from the topological optimization run. This image is from ANSYS SpaceClaim, after a few minutes of ‘digital filing and sanding’ work on the STL faceted geometry output from ANSYS Mechanical.

topo-opt-sax-14

This shows the resulting deflection from this first, ‘organic’ candidate:

topo-opt-sax-15

The next candidate is one where more traditional looking solid geometry was created in SpaceClaim, using the topological optimization result as a guide. This is what it looks like:

topo-opt-sax-16

This is the same configuration, but showing it in place within the saxophone bell and body model in ANSYS SpaceClaim:

topo-opt-sax-17

Next, here is the deformation result for our simple loading condition using this second geometry configuration:

topo-opt-sax-18

The third and final design candidate uses the second set of geometry as a starting point, and then adds a bit of style while still maintaining the topological optimization shape as an overall guide. Here is this third candidate in ANSYS SpaceClaim:

topo-opt-sax-19

Here is the resulting displacement distribution using this design:

topo-opt-sax-20

This shows the maximum principal stress distribution within the brace for this candidate:

topo-opt-sax-21

Again, I want to emphasize that this was a simple example and there are other considerations that could have been included, such as loading conditions other than acceleration due to gravity. Also, while it’s simple to include modal analysis results, in the interest of brevity I have not included them here. The main point is that topological optimization is a tool available within ANSYS Mechanical using the ACT extension that’s available for download on the customer portal. This is yet another tool available to us within our ANSYS simulation suite. It is my hope that you will also explore what can be done with this tool.

Regarding this effort, clearly a next step would be to 3D print one or more of these designs and test it out for real. Time permitting, we’ll give that a try at some point in the future.

ANSYS 17.1 FEA Benchmarks using v17-sp5

The CUBE machines that I used in this ANSYS Test Case represent a fine balance based on price, performance and ANSYS HPC licenses used.

Click Here for more information on the engineering simulation workstations and clusters designed in-house at PADT, Inc. PADT, Inc. is happy to be a premier reseller and dealer of Supermicro hardware.

  • ANSYS Benchmark Test Case Information.
  • ANSYS HPC Licensing Packs required for this benchmark
    • I used (2) HPC Packs to unlock all 32 cores.
  • Please contact your local ANSYS Software Sales Representative for more information on purchasing ANSYS HPC Packs. You too may be able to speed up your solve times by unlocking additional compute power!
  • What is a CUBE? For more information regarding our Numerical Simulation workstations and clusters, please contact our CUBE Hardware Sales Representative at SALES@PADTINC.COM. Designed, tested, and configured within your budget. We are happy to help and to listen to your specific needs.

Figure 1 – ANSYS benchmark data from three excellent machines.

CUBE
CUBE by PADT, Inc. ANSYS Release 17.1 FEA Benchmark

BGA (V17sp-5)

BGA (V17sp-5)
Analysis Type Static Nonlinear Structural
Number of Degrees of Freedom 6,000,000
Equation Solver Sparse
Matrix Symmetric

Click Here for more information on the ANSYS Mechanical test cases. The ANSYS website has great information pertaining to the benchmarks that I am looking into today.

Pro Tip –> Lastly, please check out this article by Greg Corke, one of my friends at ANSYS, Inc. I am using the ANSYS benchmark data from the Lenovo ThinkStation P910 as a baseline for my benchmark data. Enjoy Greg's article here!

  • The CPU Information

The benchmark data is derived from running the BGA (sp-5) ANSYS test case. Take a minute to look at how these CPUs perform with one of the very latest ANSYS releases, ANSYS Release 17.1.

  1.  Intel® Xeon® e5-2680 V4
  2.  Intel® Xeon® e5-2667 V4
  3.  Intel® Xeon® e5-2697a V4
  • It Is All About The Data
    • Only one workstation was used for the data in this ANSYS Test Case
    • No GPU Accelerator cards are used for the data
    • Solution solve times are in seconds
ANSYS 17.1 Benchmark BGA v17sp-5 – solve times in seconds
Columns: Cores | Lenovo ThinkStation P910, Customer X (Intel® Xeon® e5-2680 V4, 28 cores @2.4GHz) | CUBE w16i (Intel® Xeon® e5-2667 V4) | CUBE w32i (Intel® Xeon® e5-2697A V4) | tS (Lenovo P910 solve time ÷ CUBE w32i solve time). A 0 indicates the configuration was not run at that core count.
2 1016 380.9 989.6 1.03
4 626 229.6 551.1 1.14
8 461 168.7 386.6 1.19
12 323 160.7 250.5 1.29
16 265 161.7 203.3 1.30
20 261 0 176.9 1.48
24 246 0 158.1 1.56
28 327 0 151.8 2.15
31 0 0 145.2 2.25
32 0 0 161.7 2.02
15-Nov-16 PADT, Inc. – Tempe, AZ –
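If you are wondering where the tS column comes from, it lines up with the Lenovo P910 solve time divided by the CUBE w32i solve time; here is a two-line check in Python using the 8-core row from the table above.

```python
# Sanity check of the tS column using the 8-core row above.
p910_time, w32i_time = 461.0, 386.6          # seconds, from the table
print(f"tS = {p910_time / w32i_time:.2f}")   # -> 1.19, matching the table
```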
  • Cube w16i Workstation – Windows 10 Professional
    1 x 4U CUBE APPLIANCE
    2 x 8c @3.2GHz/ea
    Dual Socket motherboard
    256GB DDR4-2400 MHz LRDIMM
    6 x 600GB SAS3 15k RPM
    NVIDIA QUADRO K6000
  • CUBE w32i Workstation – Windows 10 Professional
    1 x 4U CUBE APPLIANCE
    2 x 16c @2.6GHz/ea
    Dual Socket motherboard
    256GB DDR4-2400 MHz LRDIMM
    2 x 600GB SAS3 15k RPM
    NVIDIA QUADRO M4000
  • Lenovo Thinkstation P910 Workstation – Windows 10 Professional
    Lenovo P910 Workstation
    2 x 14c @2.4GHz/ea
    Dual Socket motherboard
    128GB DDR4-2400 MHz
    512GB NVMe SSD / 2 x 4TB SATA HDD / 512GB SATA SSD
    NVIDIA QUADRO M2000

As you may have noticed above, the CUBE workstation with the Intel Xeon e5-2697A V4 had the fastest solution solve time of the single workstations.

  • *** Using 31 cores the CUBE w32i finished the sp-5 test case in 145.2 seconds.

See 32 Cores of Power! CUBE by PADT, Inc.

cube-w32i-cores

CUBE w32i

CUBE by PADT, Inc. of ANSYS 17.1 Benchmark Data for sp-5

Thank you!

http://www.cube-hvpc.com/