ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308 & INTEL XEON e5-2690 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article was complied and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using FLUEN 14.5.7.  Please remember that hardware and software change with new releases and you should always try to run your own benchmarks, on your own typical problems, to understand how performance will impact you. A potential customer of ours was interested in a CUBE HVPC mini-cluster. They requested that I run benchmarks and garner some data on a two CPU’s. The CPU’s were benchmarked on two of our CUBE HVPC systems. One mini-cluster has dual INTEL® XEON e5-2690 CPU’s and another mini-cluster has quad AMD® Opteron 8308 CPU’s. The benchmarking was only run on a single server using a total of 16 cores on each machine. The same DDR3-1600 ECC Reg RAM, Supermicro LSI 2208 RAID Controller and Hitachi SAS2 15k RPM hard drives were used on each system.

clip_image002clip_image004clip_image006clip_image008

CUBE HVPC Test configurations:

Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card installed however solves were run distributed locally
  • Stack: RDMA 3.6-1.el6
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch
Server 2: CUBE HVPC c16i
  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

ANSYS FLUENT 14.6.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

The models we used can be downloaded from the ANSYS Fluent Benchmark page link: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks
Release ANSYS FLUENT 14.5.7 Test Cases  (20 Iterations each):
  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)
Chart 1: Total Wall Clock Time in seconds: (smaller bar is better)
clip_image011
Chart 2: Average wall-clock time per iteration in seconds: (smaller bar is better)
clip_image015  

Summary:

Are you sure?
That was the question Eric proposed to me after he reviewed the data and read this blog article before posting. I told him “yes I am sure data is data, and I even triple checked.” I basically re-ran several of the benchmarks to see if the solve times came out the same on these two CUBE HVPC workstations. I went on to tell Eric , “For example, lets dig into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find.”
Quad socket Supermicro motherboard 4 x 4c AMD Opteron 6308 @3.5GHz Dual socket Supermicro motherboard 2 x 8c INTEL e5-2690 @2.9GHz
clip_image002[1] clip_image004[1]
The INTEL XEON e5-2690 INTEL CPU dual socket motherboard is impressive; it may have been on the Top500 list of some of the fastest computers in the world ten years ago. Anyways, so after each solve I captured the solve data and as you can see below. The AMD Opteron wall clock time was faster than the INTEL XEON wall clock time. So why did the AMD Opteron 6308 CPU pull away from the INTEL for the ANSYS FLUENT solve times? Lets take a look at couple of reasons why this happened. I will let you make your own conclusions.
  • Clock Speed, but would a 10.4GHz difference in total CPU speed make a 100% speedup in ANSYS Fluent wall-clock times?
  • Theoretical total of:
  • AMD® OPTERON 6308 = 16 x 3.5GHz = 56.0 GH
  • INTEL® XEON e5-2690 = 16 x 2.9GHz – 46.4 GHz
  • The floating point argument? The tic and tock of the great CPU saga continues.
  • At this moment in eternity, it is a known fact that the AMD Opteron 6308 and many of its brothers, have one floating point unit per two integer cores. INTEL has one integer core per one floating point core. However what this means to ANSYS CFD users in my MIS/IT simpleton terms is the AMD CPU was simply able to handle and process more data in this example.
  • It’s possible that there were more integer calculations required than floating point? If that is the case then the AMD CPU would have had eight pipelines for integer calculations. The AMD Opteron is able to process four floating point pipelines. While the INTEL CPU can process eight floating point pipelines.
Let us look at the details of what is on the motherboards as well.  4 data paths vs 2 can make a difference:

Dual socket Supermicro motherboard

2 x 8c INTEL e5-2690 @2.9GHz

Quad socket Supermicro motherboard

4 x 4c AMD Opteron 6308 @3.5GHz

Processor Technology 32-Naometer 32-Naometer SOI (silicon-on-insulator) technology
HyperTransport™ Technology Links Quick Path Interconnect Links Two links at up to 8GT/s per link up to 16 GB/s direction peak bandwidth per port Four x16 links at up to 6.4GT/s per link
Memory Integrated DDR3 memory controller – Up to 51.2 GB/s memory bandwidth per socket
Number of Channels and Types of Memory Four links at up to 51.2GB/s per link Four x16 links at up to 6.4GT/s per link
Number of Channels and Types of Memory Quad channel support Quad channel support
Packaging LGA2011-0 Socket G34 – 1944-pin organic Land Grid Array (LGA)
Current pricing of the CPU’s
Here is the up to the minute pricing for each CPU’s. I took these prices off of NewEgg and IngramMicro’s website. The date of the monetary values was captured on September 12, 2013.
  • AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF
    • $499.99 x 4 = $1999.96
  • Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI,
    • $2010.02 x 2 = $4020.40

STEP OUT OF THE BOX, STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software. Let CUBE HVPC by PADT, Inc. quote you a configuration today!
This entry was posted in The Focus and tagged , , , . Bookmark the permalink.
  • http://twitter.com/jdrch jdrch

    Great post.

    They requested that I run benchmarks and garner some data on a two CPU’s.

    Fair enough, but you built 2 machines with different RAM and different OSes. Is that really a CPU comparison, or a total machine comparison?

    That said, however, not only is there the total clock speed advantage you mentioned, but also a system bus advantage. The 4 Opterons can get a total of 4 * 6.4 = 26.4 GT/s, while the Xeons are stuck with 2 * 8 = 16 GT/s. That’s a huge difference for something as RAM intensive as CFD.

    One last consideration: although the Opteron CUBE costs less, it also uses 460 W vs. the Xeon CUBE’s 270 W. But then again your computation is done in less than half the time.

    Personally, I’d pick the Opteron CUBE. Ironically from what I’ve seen so many people believe in the absolute dominance of Intel in the x86 world – as you alluded to – that high end AMD OEM machines have all but disappeared from the market. That’s sad.

  • Pingback: Workstation with new E5 Xeon Ivy-bridge -- CFD Online Discussion Forums

  • Pingback: The Focus | PADT, Inc. – The Blog | Page 2672

  • Hunter

    Also pay attention to the total memory channels of the 2 systems. Dual socket Xeon E5-2690 has total 8 memory channels (4 each), while the quad socket Opteron 6308 has total 16 memory channels (4 each). Both at 1600 MT/s. So for same 16-core run, quad socket Opteron has 2X memory bandwidth than dual socket E5-2690. I think this should be the 3rd reason why AMD is faster than Intel, besides CPU clock speed and bus bandwidth.