Part 2: ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308, INTEL XEON e5-2690 & INTEL XEON e5-2667V2 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article were compiled and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using ANSYS FLUENT 14.5.7. Please remember that hardware and software change with new releases, and you should always run your own benchmarks, on your own typical problems, to understand how performance will impact you.

By David Mastel

Due to the response to the original article on this subject, I thought it would be good to do a quick follow-up using one of our latest CUBE HVPC builds. Again, the standard ANSYS Fluent benchmarks were used to gather the statistics on this dual-socket INTEL XEON e5-2667V2 configuration.

CUBE HVPC Test configurations (Same as in last comparison)

  • Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.el6.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card was installed; however, the solves were run distributed locally
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch

Server 2: CUBE HVPC c16i (Intel server from last comparison)

  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

Server 3: CUBE HVPC c16ivy (New “Ivy” based Intel server)

  • CPU: 2, INTEL XEON e5-2667V2 @ 3.3GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.el6.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC – QDR Infiniband
    • The IB card was installed; however, the solves were run distributed locally

ANSYS FLUENT 14.5.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

ANSYS Fluent Benchmark page link: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks

Release ANSYS FLUENT 14.5.7 Test Cases
(20 Iterations each)

  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)

Here are the results for all three machines, total and average time:

[Chart 1: total solve time for all three machines. Chart 2: average solve time per iteration.]

Summary: Are you sure? Part 2

So I didn’t have to have the “Are you sure?” conversation with Eric this time, and I didn’t bother triple-checking the results, because the Ivy Bridge-EP Socket 2011 part is indeed one fast CPU! Combine that with its 22nm (0.022 micron) manufacturing process and the data speaks for itself. For example, let's dig back into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find:

[Image: Intel-AMD-FLUENT-Details – detailed solve data for the truck_poly_14m benchmark]

[Image: Intel-AMD-FLUENT-summary – summary of the benchmark results]

Current Pricing of INTEL® and AMD® CPUs

Here is up-to-the-minute pricing for each CPU. I took these prices from the NewEgg and IngramMicro websites; they were captured on October 4, 2013.

Note that AMD's price per CPU went up and the INTEL XEON e5-2690's price went down. Again, these prices are based on pricing as of October 4, 2013.

AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF

  •  $501 x 4 = $2004.00

Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI

  • $1986.48 x 2 = $3972.96

Intel Xeon E5-2667V2 3.3 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 25 MB, 8 GT/s QPI,

  • $1933.88 x 2 = $3867.76

REFERENCES:
http://www.ingrammicro.com
http://www.newegg.com

INTEL XEON e5-2667V2
http://ark.intel.com/products/75273/Intel-Xeon-Processor-E5-2667-v2-25M-Cache-3_30-GHz

INTEL XEON e5-2690
http://ark.intel.com/products/64596/

AMD Opteron 6308
http://www.amd.com/us/Documents/Opteron_6300_QRG.pdf

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

http://en.wikipedia.org/wiki/Central_processing_unit#Integer_range

http://en.wikipedia.org/wiki/Floating_point

STEP OUT OF THE BOX, STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Let CUBE HVPC by PADT, Inc. quote you a configuration today!

 

ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308 & INTEL XEON e5-2690 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article were compiled and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using ANSYS FLUENT 14.5.7. Please remember that hardware and software change with new releases, and you should always run your own benchmarks, on your own typical problems, to understand how performance will impact you.

A potential customer of ours was interested in a CUBE HVPC mini-cluster. They asked me to run benchmarks and gather data on two CPUs. The CPUs were benchmarked on two of our CUBE HVPC systems: one mini-cluster has dual INTEL® XEON e5-2690 CPUs and the other has quad AMD® Opteron 6308 CPUs. The benchmarks were run on a single server, using a total of 16 cores on each machine. The same DDR3-1600 ECC Reg RAM, Supermicro LSI 2208 RAID controller, and Hitachi SAS2 15k RPM hard drives were used in each system.


CUBE HVPC Test configurations:

Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.el6.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card was installed; however, the solves were run distributed locally
  • Stack: RDMA 3.6-1.el6
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch

Server 2: CUBE HVPC c16i
  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

ANSYS FLUENT 14.5.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

The models we used can be downloaded from the ANSYS Fluent Benchmark page link: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks

Release ANSYS FLUENT 14.5.7 Test Cases (20 Iterations each):
  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)
Chart 1: Total Wall Clock Time in seconds: (smaller bar is better)


Chart 2: Average wall-clock time per iteration in seconds: (smaller bar is better)


Summary:

Are you sure?

That was the question Eric posed to me after he reviewed the data and read this blog article before posting. I told him, "Yes, I am sure. Data is data, and I even triple-checked." I re-ran several of the benchmarks to confirm that the solve times came out the same on these two CUBE HVPC systems. I went on to tell Eric, "For example, let's dig into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find."

Quad socket Supermicro motherboard: 4 x 4c AMD Opteron 6308 @ 3.5GHz

Dual socket Supermicro motherboard: 2 x 8c INTEL e5-2690 @ 2.9GHz

The dual-socket INTEL XEON e5-2690 motherboard is impressive; it might well have made the Top500 list of the fastest computers in the world ten years ago. After each solve I captured the solve data, and as you can see below, the AMD Opteron wall-clock time was faster than the INTEL XEON wall-clock time.

So why did the AMD Opteron 6308 CPU pull away from the INTEL in the ANSYS FLUENT solve times? Let's take a look at a couple of reasons why this may have happened. I will let you draw your own conclusions.

  • Clock speed: but would a 9.6GHz difference in total theoretical CPU speed produce a 100% speedup in ANSYS Fluent wall-clock times?
  • Theoretical totals:
  • AMD® OPTERON 6308 = 16 x 3.5GHz = 56.0 GHz
  • INTEL® XEON e5-2690 = 16 x 2.9GHz = 46.4 GHz
  • The floating point argument? The tick and tock of the great CPU saga continues.
  • At this moment, it is a known fact that the AMD Opteron 6308, like many of its siblings, has one floating point unit for every two integer cores, while INTEL has one floating point unit per integer core. What this means to ANSYS CFD users, in my MIS/IT simpleton terms, is that the AMD CPU was simply able to handle and process more data in this example.
  • It is possible that more integer calculations were required than floating point calculations. If that is the case, the AMD CPU would have had eight pipelines available for integer calculations. The AMD Opteron can process four floating point pipelines, while the INTEL CPU can process eight floating point pipelines.

Let us look at the details of what is on the motherboards as well; 4 data paths vs. 2 can make a difference:

Dual socket Supermicro motherboard – 2 x 8c INTEL e5-2690 @ 2.9GHz:

  • Processor technology: 32-nanometer
  • Interconnect: QuickPath Interconnect – two links at up to 8GT/s per link, up to 16 GB/s peak bandwidth per direction per port
  • Memory: integrated DDR3 memory controller, up to 51.2 GB/s memory bandwidth per socket, quad channel support
  • Packaging: Socket LGA2011-0

Quad socket Supermicro motherboard – 4 x 4c AMD Opteron 6308 @ 3.5GHz:

  • Processor technology: 32-nanometer SOI (silicon-on-insulator)
  • Interconnect: HyperTransport™ Technology – four x16 links at up to 6.4GT/s per link
  • Memory: integrated DDR3 memory controller, quad channel support
  • Packaging: Socket G34 – 1944-pin organic Land Grid Array (LGA)
Current pricing of the CPUs

Here is up-to-the-minute pricing for each CPU. I took these prices from the NewEgg and IngramMicro websites; they were captured on September 12, 2013.

  • AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF
    • $499.99 x 4 = $1999.96
  • Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI,
    • $2010.02 x 2 = $4020.04

STEP OUT OF THE BOX,
STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Let CUBE HVPC by PADT, Inc. quote you a configuration today!

Columbia: PADT’s Killer Kilo-Core CUBE Cluster is Online

In the back of PADT's product development lab is a closet. Yesterday afternoon PADT's tireless IT team crammed themselves into the back of that closet and powered up our new cluster, bringing 1104 connected cores online. It sounded like a jet taking off when we submitted a test FLUENT solve across all the cores. Music to our ears.

We have recently become slammed with benchmarks for ANSYS and CUBE customers as well as our normal load of services work, so we decided it was time to pull the trigger and double the size of our cluster while adding a storage node.  And of course, we needed it yesterday.  So the IT team rolled up their sleeves, configured a design, ordered hardware, built it up, tested it all, and got it on line, in less than two weeks.  This was while they did their normal IT work and dealt with a steady stream of CUBE sales inquiries.  But it was a labor of love. We have all dreamed about breaking that thousand core barrier on one system, and this was our chance to make it happen.

If you need more horsepower and are looking for a solution that hits that sweet spot between cost and performance, visit our CUBE page at www.cube-hvpc.com and learn more about our workstations, servers, and clusters.  Our team (after they get a little rest) will be more than happy to work with you to configure the right system for your real world needs.

Now that the sales plug is done, let's take a look at the stats on this bad boy:

Name: Columbia
After the class of battlestars in Battlestar Galactica
Brand: CUBE High Value Performance Compute Cluster, by PADT
Nodes: 18
17 compute, 1 storage/control node, 4 CPU per Node
Cores: 1104
AMD Opteron: 4 x 6308 3.5 GHz, 32 x 6278 2.4 GHz, 36 x 6380 2.5 GHz
Interconnect: 18 port MELLANOX IB 4X QDR Infiniband switch
Memory: 4.864 Terabytes
Solve Disk: 43.5 TB RAID 0
Storage Disk: 64 TB RAID 50

Here are some pictures of the build and the final product:

  • A huge delivery from our supplier, Supermicro, started the process. This was the first pallet.
  • The build included installing the largest power strip any of us had ever seen.
  • Building a cluster consists of doing the same thing, over and over and over again.
  • We took over PADT's clean room because it turns out you need a lot of space to build something this big.
  • It is fun to get the chance to build the machine you always wanted to build.
  • 2AM selfie: still going strong!
  • Almost there. After blowing a breaker, we needed to wait for some more power to be routed to the closet.
  • Up and running! Ratchet and Clank providing cooling air containment.

David, Sam, and Manny deserve a big shout-out for doing such a great job getting this thing up and running so fast!

When I logged on to my first computer, a TRS-80, in my high-school computer lab, I never, ever thought I would be running on a machine this powerful.  And I would have told people they were crazy if they said a machine with this much throughput would cost less than $300,000.  It is a good time to be a simulation user!

Now I just need to find a bigger closet for when we double the size again…


CFX Expression Language – Part 1: Accessing CFD Simulation Information in CFX (and FLUENT)

This week we are presenting an introduction to CFX Expression Language. If you’re not familiar with CFX, it is one of the two CFD tools available from ANSYS, Inc., the other being Fluent. CFX has been part of the ANSYS family of engineering tools since 2003. It is relatively easy to use and can be run stand-alone or tightly integrated with other ANSYS products within ANSYS Workbench. We have some general information on CFX available at this link.

CFX Expression Language, or CEL, is the scripting language that allows us to define inputs as variables, capture outputs as variables, and perform operations on those variables. Through the use of CEL we can be more efficient in our CFD runs and better capture results that we need. With CEL we can access and manipulate information without needing to recompile code or access separate routines besides the main CFX applications.
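As a minimal sketch of what this looks like (the names inletVel, rhoRef, and dynHead are invented for illustration, not taken from a specific model), a few CEL expressions might be defined as:

  inletVel = 10 [m s^-1]
  rhoRef   = 1.185 [kg m^-3]
  dynHead  = 0.5 * rhoRef * inletVel^2

Here inletVel and rhoRef are inputs we might want to vary between runs, and dynHead is derived from them; because units are carried along with each definition, dynHead evaluates to a pressure.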

Also note that since CEL can be used in CFD-Post, it is useful for postprocessing FLUENT solutions in addition to CFX solutions, since CFD-Post is common to both CFX and FLUENT. There are some things to be aware of regarding FLUENT files in CFD-Post. This entry in the ANSYS 14.5 Help system explains them:

// User’s Guide :: 0 // 7. CFD-Post File Menu // 7.15. File Types Used and Produced by CFD-Post // 7.15.10. Limitations with FLUENT Files

If you are a user of APDL, ANSYS Parametric Design Language, what I have written above about CEL should look familiar. One difference, though, is that while Mechanical APDL is dimensionless, CFX is not. Therefore, CEL definitions contain units where appropriate.

CEL is typically used in CFX-Pre and CFD-Post. A handy editor is available to assist in the definition of the expressions. Most of the activity is enabled by right clicking.

Virtually any quantity in CFX that requires a value input can make use of CEL, including boundary conditions and material properties. CEL can also be used to access and enhance results information. Expressions defined in CEL can be used in design point studies in ANSYS Workbench, either as input or output parameters.

So, what kind of things can you do in an expression? In addition to accessing simulation information and storing it as a variable, you can manipulate values using operators such as add, subtract, multiply, divide, and raise to a power. You can also use built-in functions such as sine, cosine, tangent and other trig functions, exponent, log, square root, absolute value, minimum, maximum, etc.

There are many predefined values, including some common CFD constants such as pi, the universal gas constant, and Avogadro's number. The available options are different in CFX-Pre vs. CFD-Post, with relevant choices for each.
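As a rough example of combining operators, built-in functions, and a predefined constant (uIn, vIn, and the other names here are made up for illustration, and this assumes uIn and vIn are defined elsewhere as velocity-component expressions), expressions might look like:

  velMag  = sqrt(uIn^2 + vIn^2)
  velComp = sin(pi/6) * velMag
  capVel  = min(velMag, 25 [m s^-1])

Note how pi is used directly as a predefined constant, velComp takes the component of velMag at a 30-degree (pi/6) angle, and min() keeps capVel from exceeding a fixed limit.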

In CFX Pre, expressions are accessed by double clicking on Expressions in the tree. That takes you to the expression editor, as shown here:

[Screenshot: the Expressions editor in CFX-Pre]

Notice how units are defined for each expression, but they can be mixed if desired.
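For example, compatible units can be mixed within a single definition and CFX handles the conversion (the lengths here are arbitrary):

  ductLength = 2 [m] + 6 [in]

Both terms are lengths, so this is a valid expression and the result is reported in consistent units.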

Regarding CFD Post, the example below shows three expressions defined in CFD Post. The expressions within the box are user-defined. The other expressions listed are setup automatically.

The values for forceX1 and forceX2 are calculated by extracting X-direction forces on two different surfaces. The surface names were defined in ANSYS Meshing in this case, as Named Selections. The value fdiffx is calculated by subtracting forceX1 from forceX2. The resulting value, fdiffx, has been specified as an output parameter in Workbench; hence the P-> symbol next to the name.
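Based on that description, the three user-defined expressions would look something like the following (the location names surf_front and surf_rear are placeholders for whatever the Named Selections were actually called):

  forceX1 = force_x()@surf_front
  forceX2 = force_x()@surf_rear
  fdiffx  = forceX2 - forceX1

Here force_x() is the built-in CFD-Post function that returns the X-direction force on the named location.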

[Screenshot: user-defined expressions in the CFD-Post Expressions tab]

New expressions are created by right-clicking in the Expressions tab. The new expression value is given a name, then the definition is input, typically by right clicking and selecting from the menus of available quantities, like this:

[Screenshot: right-click menus for selecting quantities in an expression definition]

The location of application for an expression can also be selected by right clicking:

[Screenshot: selecting the location of application for an expression]

So we’ve got our variables defined using CEL. Now what? Here are some things we can do with CEL variables:

1. Use them as inputs such as material properties or boundary condition values in CFX. If we are running multiple cases, it is typically much easier to define quantities that we want to vary this way. The values can then be changed in the Expressions window or, if defined as parameters in Workbench, in the parameters view as part of a parameter study (see the short sketch after this list).

2. Use them for reporting results quantities of interest, such as forces at a desired location.

3. Use them as input or output parameters in a design point study or design optimization.
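To make items 1 and 2 a little more concrete, here is a hedged sketch (inletMassFlow, inlet, and outlet are assumed names, not taken from a specific case): an input expression that could be referenced by a boundary condition in CFX-Pre, and a reporting expression evaluated in CFD-Post:

  inletMassFlow = 0.5 [kg s^-1]
  dpInOut       = areaAve(Pressure)@inlet - areaAve(Pressure)@outlet

inletMassFlow could be exposed as an input parameter in Workbench and varied across design points, while dpInOut could be flagged as an output parameter, which is exactly the kind of use item 3 describes.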

Hopefully this brief introduction gives you a glimpse at the power of CEL. In a future article we will look at using CEL for more advanced functionality, such as applying ramped or time varying boundary conditions, using IF statements, and monitoring expression values during solution.

Monster in the Closet: PADT Goes Live with 512 Core HVPC CUBE Cluster

There is a closet in the back of PADT's product development lab. It does not store empty boxes, old files, or obsolete hardware. Within that closet is a monster. Not the sort of monster that scares little children at night. No, this is a monster that puts fear into the hearts of those who try to paint high performance computing as a difficult and expensive task only to be undertaken by those in the priesthood. It makes salespeople who earn fat commissions by selling consulting services and unnecessary add-ons quake in fear.

This closet holds PADT’s latest upgrade to our compute infrastructure: a 512 core CUBE HVPC Cluster.  No data center, no special consultants, no expensive add-ons. Just 512 cores chugging away at solving FLUENT and CFX problems, and pumping a large amount of heat up into the ceiling.

Here are the specifics:

CUBE C512 Columbia Class Cluster

  • 512 AMD 2.4GHz Cores (in 8 nodes, 4 sockets per node, 16 cores per socket)
  • 2TB RAM (256 GB per node of DDR3 1600 ECC RAM)
  • Raid Controller Card (1 per node)
  • 24TB Data Disk Space (3TB per node of SAS2 15k drives in RAID0)
  • Infiniband (8 Port switch, 40 Gbps)
  • 52 Port GIGE switch connected to 2 GIGE ports per node
  • 42 U Rack with thermal convection ducting (chimney)
  • Keyboard, monitor, mouse in drawer
  • CENTOS (switching to RedHat soon)

We built this system with CFD simulation in mind. The original goal was to provide a proof of concept to expand our CUBE HVPC offering, showing that you can create a cluster of this size, with very good speed, for a price that small and medium sized companies can afford. We also needed a way to run large problems for benchmarks in support of our ANSYS sales efforts and to provide faster technical support to our FLUENT and CFX customers. We already have a growing queue of benchmarks waiting to get into the machine.

The image above is the glamour shot.  Here is what it looks like in the closet:

[Photo: the CUBE C512 cluster rack in the closet]

Keeping with our theme of High Value Performance Computing, we stuck it into this closet, which was built for telephone and networking equipment back at the turn of the century when Motorola had this suite. We were able to fit a modern rack in next to an old rack that was already in there. We then used the included duct to push the air up into our ceiling space and moved the A/C ducting to blow right into the front of the units. We did need to keep the flow going into the rack instead of into the area under the networking and telephone switches, so we used an old video game poster:

Anyone remember Ratchet and Clank? 
Best PS2 games ever.

It works well and adds a little color to the closet.

So far our testing has shown some great numbers. It is not the fastest cluster out there, but if you look at the cost, it offers incredible performance. You could add a drive array over Infiniband, faster chips, and some redundant power, and it would run faster and more reliably, but it would also cost much more. We are cheap, so we like this solution.

Oh yeah, with the parts from our old CFD cluster and some new bits, we will be building a smaller mini-cluster using INTEL chips, a GPU or two, and a ton of fast disk and RAM as our FEA cluster. Look for an update on that in a couple of months.

Interested in getting a cluster like this for your computing pleasure? A system configured like this one will run about $150,000 (video game poster is extra). Visit our CUBE page to learn more or just shoot an email to sales@padtinc.com. Don't worry, we don't sell these with salespeople; someone from IT will get back to you.