Caps and Limits on Hardware Resources in Microsoft Windows and Red Hat Enterprise Linux

(Revised and updated February 10, 2014 to include relevant Windows Server 2012 information as it relates to the world of numerical simulation)

Hi – this is one of our more popular blog articles, originally published January 14, 2011. It has been over three years now and the article needs a refresh. It seems that each time an operating system provider releases a new OS iteration, whether Windows or Linux, it adds to the confusion when selecting the proper licensing for a numerical simulation computer's physical hardware.

Hopefully this updated blog article will assist you in making sure your numerical simulation machines are licensed properly.

Sometime around 3am one night in October 2010, I found myself beating my head against a server rack, frustrated with trying to figure out what was limiting my server hardware. I was aware of a couple of limits that Microsoft had placed in its OS software, but I had no idea how far-reaching those limits were. So I researched the two most-used operating system families on the planet to get a better understanding of the physical socket and memory caps they place on the hardware: Microsoft Windows 7, Windows Server 2008 R2, Windows Server 2012, and Red Hat Enterprise Linux.

Now let us fast-forward three years. Windows Server 2012 changes the naming convention on us IT geeks, so pay attention, because the Windows Server Standard or Enterprise edition you may have been used to has changed. (If you want to check a given machine against these caps, a short script follows the list below.)

Limits on Cores, RAM, and USERS by Operating System

  • Microsoft Windows Operating Systems
    • Windows 7
      • Professional / Enterprise / Ultimate
        • Processor: 2 Socket limit (many cores)
        • Core limits:
          • 64-bit: 256 cores maximum in one physical processor
          • 32-bit: 32 cores maximum in one physical processor
        • RAM: 192 GB maximum accessible memory
      • Home Premium
        • RAM: 16GB
      • Home Basic
        • RAM: 8GB
      • Starter Edition
        • RAM: 2 GB
    • Windows Server 2008
      • Standard & R2
        • Processor: 4 socket limit – (many cores)
          • (e.g., 4 parts x 12 cores = 48 cores)
        • RAM: 32 GB
      • Windows Server 2008 R2 Foundation  (R2 releases are 64-bit only)
        • RAM: 128 GB
      • HPC Edition 2008 R2 (R2 releases are 64-bit only)
        • RAM: 128 GB
      • Windows Server 2008 R2 Datacenter (R2 releases are 64-bit only)
        • Processor: 8 socket limit
        • RAM: 2TB
      • Windows Server 2008 R2 Enterprise (R2 releases are 64-bit only)
        • Processor: 8 socket limit
        • RAM: 2TB
    • Windows Server 2012
      • Foundation
        • Processor: 1 socket licensed – (many cores)
        • RAM: 32 GB
        • User Limit: 15 users
      • Essentials
        • Processor: 2 socket licensed – (many cores)
        • RAM: 64 GB
        • User Limit: 25 users
      • Standard
        • Processor:  4 socket licensed* – (many cores)
        • RAM: 4TB
        • User Limit: unlimited
      • Datacenter
        • Processor: 4 socket licensed* – (many cores)
        • RAM: 4TB
        • User Limit: unlimited
      • R2
        • Processor: 4 socket licensed* – (many cores)
        • RAM: 4TB
        • User Limit: unlimited
  • Red Hat Enterprise Linux – 64-bit
    • Red Hat defines a logical CPU as any schedulable entity. So every core/thread in a multi-core/thread processor is a logical CPU
    • This information reflects product defaults, not the maximums of a fully licensed/subscribed RHEL product.
    • Desktop
      • Processor: 1-2 CPU
      • RAM: 64 GB
    • Basic
      • Processor: 1-2 CPU
      • RAM: 16 GB
    • Enterprise
      • Processor: 1-8 CPU
      • RAM: 64 GB
    • NOTE: Red Hat would be happy to create custom subscriptions with yearly fees for other configurations to fit your specific environment. Please contact Red Hat to check on costs.
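
If you want to sanity-check a particular machine against these caps, here is a minimal sketch in Python. It assumes a Linux box (it reads /proc for socket count and RAM), and the cap values are simply copied from the list above for a few common editions; adjust them for the edition you actually run.

# Minimal sketch (Linux only): compare this machine's socket count and RAM
# against a few of the OS caps listed above. Values copied from this article.
CAPS = {
    "Windows 7 Professional/Enterprise/Ultimate": {"sockets": 2, "ram_gb": 192},
    "Windows Server 2008 Standard / R2":          {"sockets": 4, "ram_gb": 32},
    "Windows Server 2008 R2 Enterprise":          {"sockets": 8, "ram_gb": 2048},
    "Windows Server 2012 Foundation":             {"sockets": 1, "ram_gb": 32},
    "Windows Server 2012 Essentials":             {"sockets": 2, "ram_gb": 64},
    "Windows Server 2012 Standard/Datacenter":    {"sockets": 4, "ram_gb": 4096},
}

def linux_sockets_and_ram_gb():
    """Count physical CPU sockets and total RAM (GB) from /proc."""
    with open("/proc/cpuinfo") as f:
        sockets = {line.split(":")[1].strip()
                   for line in f if line.startswith("physical id")}
    with open("/proc/meminfo") as f:
        mem_kb = int(next(l for l in f if l.startswith("MemTotal")).split()[1])
    return max(len(sockets), 1), mem_kb / 1024.0 / 1024.0

if __name__ == "__main__":
    sockets, ram_gb = linux_sockets_and_ram_gb()
    print("This machine: %d socket(s), %.0f GB RAM" % (sockets, ram_gb))
    for edition, cap in CAPS.items():
        ok = sockets <= cap["sockets"] and ram_gb <= cap["ram_gb"]
        print("  %s %s (max %d sockets / %d GB)" %
              ("OK  " if ok else "OVER", edition, cap["sockets"], cap["ram_gb"]))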

Okay great but what operating system platforms can I use with ANSYS R15?

ANSYS 15.0 Supported Platforms

ANSYS 15.0 is the currently released version. The specific operating system versions supported by ANSYS 15.0 products and License Manager are documented and posted at: 
   www.ansys.com/Support/Platform+Support.

ANSYS 15.0 includes support for the following:

  • Windows XP and Windows 7 (32-bit and 64-bit Professional and Enterprise versions)
  • Windows 8 (64-bit Professional and Enterprise versions)
  • Windows Server 2008 R2 Enterprise
  • Windows HPC Server 2008 R2 (64-bit)
  • Windows Server 2012 Standard version
  • Red Hat Enterprise Linux (RHEL) 5.7-5.9 and 6.2-6.4 (64-bit)
  • SUSE Enterprise Linux Server and Desktop (SLES / SLED) 11 SP1-SP2 (64-bit)

Not all applications are supported on all of these platforms. See detailed information, by product, at the URL noted above.

Final Thoughts

Approximate additional licensing cost to license Windows Server 2012 for a quad socket CPU motherboard:

  • Windows Server 2012 Foundation: Please call your OEM partner
  • Windows Server 2012 Essentials: $429 + User Client Access Licensing $$$
  • Windows Server 2012 Standard:  $ 1,500  + User Client Access Licensing $$$
  • Windows Server 2012 Datacenter: $ 10,500 + User Client Access Licensing $$$


I’m All Bound Up! : A Brief Discussion on What It Means To Be Compute Bound or I/O Bound

We often get questions from our customers, both ANSYS product and CUBE HVPC users, on how to get their jobs to run faster. Should they get better disk drives or focus on CPU performance? We have found that disk drive performance often gets the blame when it is undeserved. To help figure this out, the first thing we do is look at the output from an ANSYS Mechanical/Mechanical APDL run. Here is an email, slightly modified to keep the user anonymous, that shows our most recent case of this:

From: David Mastel – PADT, Inc.
To: John Engineering

Subject: Re: Re: Re: Relatively no difference between SSD vs. SAS2 15k RPM solve times?

Hi John, I took a look at your ANSYS Mechanical output files. Based on the problem you are running, the machine is compute bound. Here is the process of how I came to that conclusion. Additionally, at the end of this email I have included a few recommendations.

All the best,
David

Example 1:

The bottom section of an ANSYS out file for a 2 x 240GB
Samsung 843 SSD RAID0 array:

Total CPU time for main thread                    :      105.9 seconds
Total CPU time summed for all threads             :      119.1 seconds

Elapsed time spent pre-processing model (/PREP7)  :        0.0 seconds
Elapsed time spent solution – preprocessing       :       10.3 seconds
Elapsed time spent computing solution             :       83.5 seconds
Elapsed time spent solution – postprocessing      :        3.9 seconds
Elapsed time spent post-processing model (/POST1) :        0.0 seconds

Equation solver computational rate                :   319444.9 Mflops
Equation solver effective I/O rate                :    26540.1 MB/sec

Maximum total memory used                         :    48999.0 MB
Maximum total memory allocated                    :    54896.0 MB
Maximum total memory available                    :        128 GB

+—— E N D   D I S T R I B U T E D   A N S Y S   S T A T I S T I C S ——-+

*—————————————————————————*
|                                                                           |
|                       DISTRIBUTED ANSYS RUN COMPLETED                     |
|                                                                           |
|—————————————————————————|
|                                                                           |
|            Release 14.5.7         UP20130316         WINDOWS x64          |
|                                                                           |
|—————————————————————————|
|                                                                           |
| Database Requested(-db)   512 MB    Scratch Memory Requested       512 MB |
| Maximum Database Used     447 MB    Maximum Scratch Memory Used   4523 MB |
|                                                                           |
|—————————————————————————|
|                                                                           |
|        CP Time      (sec) =        119.01       Time  =  15:41:54         |
|        Elapsed Time (sec) =        117.000       Date  =  10/21/2013      |
|                                                                           |
*—————————————————————————*

For a quick refresher on what it means to be compute bound or I/O bound, let’s review what ANSYS Mechanical APDL tells you.

When looking at your ANSYS Mechanical APDL out file (this file is created during the solve in ANSYS Mechanical, since ANSYS Mechanical is just running ANSYS Mechanical APDL behind the scenes), I/O bound and compute bound essentially mean the following:

I/O Bound:

  1. The elapsed time is noticeably greater than the main thread CPU time.

Compute Bound:

  1. The elapsed time is approximately equal to the main thread CPU time.

A short script that automates this check on an out file is shown after Example 2 below.

Example 2:
CUBE HVPC – Samsung 843 – 240GB SATA III SSD 6Gbps – RAID 0
Total CPU time for main thread :  105.9  seconds
Elapsed Time (sec) :   117.000        seconds

CUBE HVPC – Hitachi 600GB SAS2 15k RPM – RAID 0
Total CPU time for main thread  :  109.0 seconds
Elapsed Time (sec) :   120.000       seconds
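
Here is a minimal Python sketch of that check. It is not an ANSYS or PADT tool, just a convenience script: it pulls the "Total CPU time for main thread" and "Elapsed Time (sec)" lines out of a Mechanical APDL out file (formatted like Example 1 above) and compares them. The 15% tolerance is an assumption; pick whatever threshold makes sense for your runs.

import re

def classify_ansys_run(out_file, tolerance=0.15):
    """Compare elapsed time vs. main-thread CPU time from an APDL out file."""
    cpu = elapsed = None
    with open(out_file) as f:
        for line in f:
            if "Total CPU time for main thread" in line:
                cpu = float(re.search(r"([\d.]+)\s*seconds", line).group(1))
            elif "Elapsed Time (sec)" in line:
                elapsed = float(re.search(r"[=:]\s*([\d.]+)", line).group(1))
    if cpu is None or elapsed is None:
        raise ValueError("Could not find both timing lines in %s" % out_file)
    ratio = elapsed / cpu
    verdict = "compute bound" if ratio <= 1.0 + tolerance else "I/O bound (or waiting on something else)"
    return cpu, elapsed, ratio, verdict

# The SSD run above gives 117.0 / 105.9, a ratio of about 1.10, which is
# well inside the tolerance: the job is compute bound, not I/O bound.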

Recommendations for a CPU compute bound ANSYS server or workstation:

When computers are compute bound I normally recommend the following.

  1. Add faster processors – check!
  2. Use more cores for the solve – I think you are in the process of doing this now?
  3. Switch from SMP to DMP – unfortunately you are unable to use DMP with your solve.
  4. Add an accelerator card (NVIDIA Tesla K20X) – which unfortunately does not help in your particular solving situation.

Please let me know if you need any further information:

David

David Mastel
Manager, Information Technology

Phoenix Analysis & Design Technologies
7755 S. Research Dr, Suite 110
Tempe, AZ  85284
David.Mastel@PADTINC.com

The hardware we have available changes every couple of months or so, but right now, for a user who is running this type of ANSYS Mechanical/Mechanical APDL model, we are recommending the following configuration:

CUBE HVPC Recommended Workstation for ANSYS Mechanical: CUBE HVPC w16i-KGPU

  • MODEL: CUBE HVPC w16i-kgpu
  • COST: $16,164.00
  • CHASSIS: Mid-Tower (black quiet edition)
  • PROCESSORS: Dual socket INTEL XEON e5-2687W v2, 16 cores @ 3.4GHz
  • CORES: 16 (2 x 8)
  • MEMORY: 128GB DDR3-1866 ECC REG
  • STORAGE: 4 x 240GB SATA III SSD 6Gbps, plus 2 x 600GB SAS2 6Gbps (2.1 TB)
  • RAID CONTROLLER: SMC LSI 2208 6Gbps
  • GRAPHICS: NVIDIA QUADRO K5000
  • ACCELERATOR: NVIDIA TESLA K20
  • OS: Microsoft Windows 7 Professional 64-bit
  • OTHER: ANSYS R14.5.7, ANSYS R15


Here are some references to basic information and the systems we recommend:
http://www.intel.com
http://en.wikipedia.org/wiki/CPU_bound
http://en.wikipedia.org/wiki/I/O_bound
http://www.supermicro.com
http://www.cube-hvpc.com
http://www.ansys.com
http://www.nvidia.com

This May Be the Fastest ANSYS Mechanical Workstation we Have Built So Far

The Build Up

It's 6:30am and a dark shadow looms in Eric's doorway. I wait until Eric finishes his Monday morning company updates. "Eric, check this out: the CUBE HVPC w16i-k20x we built for our latest customer scaled ANSYS Mechanical to 16 cores on our test run." Eric's left eyebrow rises slightly. I know I have him now; I have his full and complete attention.

Why is this huge news?

This is why: Eric knows, and probably many of you reading this also know, that solving differential equations distributed and in parallel, along with using a graphics processing unit, makes our hearts skip a beat. The finite element method used for solving these equations is CPU intensive and I/O intensive. This is headline news type stuff to us geek types. We love scratching our way along the compute processing power grid to squeeze every bit of performance out of our hardware!

Oh, and yes, a lower time to solve is better! No GPUs were harmed in these tests; only one NVIDIA TESLA K20X GPU was used during the test.

Take a Deep Breath and Start from the Beginning:

I have been gathering and hoarding years' worth of ANSYS Mechanical benchmark data. Why? Not sure really; after all, I am only a wannabe ANSYS analyst. It wasn't until a couple of weeks ago that I woke up to the why again. My CUBE HVPC team sold a dual socket INTEL Ivy Bridge based workstation to a customer out of Washington state. Once we got the order, our Supermicro reseller's phone was bouncing off the desk. After some back and forth, the parts arrived directly from Supermicro in California. Yes, designed in the U.S.A. And they show up in one big box:


Normal is as Normal Does

As normal is as normal does, I ran our usual series of ANSYS benchmarks; you know, the type of benchmarks that perform coupled-physics simulations and solve really huge matrices. So I ran ANSYS V14sp-5, the ANSYS FLUENT benchmarks, and some benchmarks for this customer, the types of runs they want to use the new machine for. I was talking these benchmark results over with Eric, and he thought that now is the perfect time to release the flood of benchmark data. Well, some smidge of the benchmark data; I do admit the data gets overwhelming, so I have tried to trim the charts and graphs down to the bare minimum. So what makes this workstation recipe for the fastest ANSYS Mechanical workstation so special? What is truly exciting enough to tip me over in my overstuffed black leather chair?

The Fastest Ever? Yup we have been Changed Forever

Not only is it the fastest ANSYS Mechanical workstation running on CUBE HVPC hardware, it uses two 22-nanometer INTEL CPUs. Additionally, this is the first time we have had an INTEL dual socket based workstation keep getting faster all the way up to its maximum core count when solving in ANSYS Mechanical APDL.

Previously the fastest time was on the CUBE HVPC w16i-GPU workstation listed below, and it peaked at 14 cores.

Unfortunately we only had time before we shipped the system off to gather two runs on the new machine: 14 and 16 cores. But you can see how fast it was in this table, and in the quick scaling check that follows it. It was close to the previous system at 14 cores, but blew past it at 16, whereas the older system actually got clogged up and slowed down:

Run Time (Sec)

Cores Used    Config B    Config C    Config D
14            129.1       95.1        91.7
16            130.5       99          83.5
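
Here is a small Python sketch of that scaling check, using the 14 and 16 core numbers from the table above (Config C is the older w16i-GPU, Config D the new w16i-k20x). The calculation is just a ratio of the two run times; nothing ANSYS-specific about it.

# Quick scaling check on the run times in the table above (seconds).
times = {
    "Config C (w16i-GPU)":  {14: 95.1, 16: 99.0},
    "Config D (w16i-k20x)": {14: 91.7, 16: 83.5},
}

for config, t in times.items():
    speedup = t[14] / t[16]   # > 1.0 means adding the last two cores helped
    print("%s: 14 -> 16 core speedup = %.2fx" % (config, speedup))

# Config C slows down going from 14 to 16 cores (about 0.96x), while
# Config D keeps scaling (about 1.10x) - the point made in the text above.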

And here are the results as a bar graph for all the runs with this benchmark:

[Bar graph: CUBE ANSYS Mechanical benchmark results, November 2013]

We can't wait to build one of these with more than one motherboard, maybe a 32 core system with InfiniBand connecting the two. That should allow some very fast run times on some very, very large problems.

ANSYS V14sp-5 ANSYS R14 Benchmark Details

  • Elements : SOLID187, CONTA174, TARGE170
  • Nodes : 715,008
  • Materials : linear elastic
  • Nonlinearities : standard contact
  • Loading : rotational velocity
  • Other : coupling, symmetric matrix, sparse solver
  • Total DOF : 2.123 million
  • ANSYS 14.5.7

Here are the details and the data of the March 8, 2013 workstation:

Configuration C = CUBE HVPC w16i-GPU

  • CPU: 2x INTEL XEON e5-2690 (2.9GHz 8 core)
  • GPU: NVIDIA TESLA K20 Companion Processor
  • GRAPHICS: NVIDIA QUADRO K5000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HD RAID Controller: SMC LSI 2208 6Gbps
  • HDD: (os and apps): 160GB SATA III SSD
  • HDD: (working directory):6x 600GB SAS2 15k RPM 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS R14.0.8 / ANSYS R14.5

Here are the details from the new, November 1, 2013 workstation:

Configuration D = CUBE HVPC w16i-k20x

  • CPU: 2x INTEL XEON e5-2687W V2 (3.4GHz)
  • GPU: NVIDIA TESLA K20X Companion Processor
  • GRAPHICS: NVIDIA QUADRO K4000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HDD: (os and apps): 4 x 240GB Enterprise Class Samsung SSD 6Gbps
  • HD RAID CONTROLLER: SMC LSI 2208 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS 14.5.7

You can view the output from the run on the newer box (Configuration D) here:

Here is a picture of the Configuration D machine with the info on its guts:


What is Inside that Chip:

The one (or two) CPU that rules them all: http://ark.intel.com/products/76161/

Intel® Xeon® Processor E5-2687W v2

  • Status: Launched
  • Launch Date: Q3’13
  • Processor Number: E5-2687WV2
  • # of Cores: 8
  • # of Threads: 16
  • Clock Speed: 3.4 GHz
  • Max Turbo Frequency: 4 GHz
  • Cache:  25 MB
  • Intel® QPI Speed:  8 GT/s
  • # of QPI Links:  2
  • Instruction Set:  64-bit
  • Instruction Set Extension:  Intel® AVX
  • Embedded Options Available:  No
  • Lithography:  22 nm
  • Scalability:  2S Only
  • Max TDP:  150 W
  • VID Voltage Range:  0.65–1.30V
  • Recommended Customer Price:  BOX : $2112.00, TRAY: $2108.00

The GPUs just keep getting better and better (a quick ratio check follows the table):

Features                                           TESLA C2075     TESLA K20X       TESLA K20
Number and Type of GPU                             FERMI           Kepler GK110     Kepler GK110
Peak double precision floating point performance   515 Gflops      1.31 Tflops      1.17 Tflops
Peak single precision floating point performance   1.03 Tflops     3.95 Tflops      3.52 Tflops
Memory Bandwidth (ECC off)                         144 GB/sec      250 GB/sec       208 GB/sec
Memory Size (GDDR5)                                6GB             6GB              5GB
CUDA Cores                                         448             2688             2496
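
As a quick way to quantify "better and better," here is a small Python sketch that computes the K20X-to-C2075 ratios straight from the table above; the single precision figure for the C2075 is its 1.03 Tflops entry expressed in Gflops.

# How much better did the Kepler-based K20X get vs. the Fermi-based C2075?
# Simple ratios computed from the table above.
c2075 = {"dp_gflops": 515,  "sp_gflops": 1030, "bw_gb_s": 144, "cuda_cores": 448}
k20x  = {"dp_gflops": 1310, "sp_gflops": 3950, "bw_gb_s": 250, "cuda_cores": 2688}

for key in c2075:
    print("K20X / C2075 %s: %.1fx" % (key, float(k20x[key]) / c2075[key]))

# Roughly 2.5x double precision, 3.8x single precision, 1.7x memory
# bandwidth, and 6.0x the CUDA core count.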


Ready to Try one Out?

If you are as impressed as we are, then it is time for you to try out this next iteration of the Intel chip, configured for simulation by PADT, on your problems.  There is no reason for you to be using a CAD box or a bloated web server as your HPC workstation for running ANSYS Mechanical and solving in ANSYS Mechanical APDL.  Give us a call, our team will take the time to understand the types of problems you run, the IT environment you run in, and custom configure the right system for you:

http://www.padtinc.com/products/hardware/cube-hvpc,
email: garrett.smith@padtinc.com,
or call 480.813.4884

Part 2: ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308, INTEL XEON e5-2690 & INTEL XEON e5-2667V2 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article were compiled and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using ANSYS FLUENT 14.5.7. Please remember that hardware and software change with new releases, and you should always try to run your own benchmarks, on your own typical problems, to understand how performance will impact you.

By David Mastel

Due to the response to the original article on this subject,  I thought it would be good to do a quick follow-up using one of our latest CUBE HVPC builds. Again, the ANSYS Fluent standard benchmarks were used in garnering the stats on this dual socket INTEL XEON e5-2667V2 configuration.

CUBE HVPC Test configurations (Same as in last comparison)

  • Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  •  OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card was installed; however, solves were run distributed locally
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch

Server 2: CUBE HVPC c16i (Intel server from last comparison)

  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

Server 3: CUBE HVPC c16ivy (New “Ivy” based Intel server)

  • CPU: 2, INTEL XEON e5-2667V2 @ 3.3GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC – QDR Infiniband
    • The IB card was installed; however, solves were run distributed locally

ANSYS FLUENT 14.5.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

ANSYS Fluent Benchmark page link: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks

Release ANSYS FLUENT 14.5.7 Test Cases
(20 Iterations each)

  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)

Here are the results for all three machines, total and average time:

[Charts 1 and 2: total and average solve times for all three machines]

 

Summary: Are you sure? Part 2

So I didn't have to have the "Are you sure?" conversation with Eric this time, and I didn't bother triple checking the results, because indeed the Ivy Bridge-EP Socket 2011 is one fast CPU! Combine that with a 0.022 micron (22 nanometer) manufacturing process and the data speaks for itself. For example, let's re-dig into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find:

[Table: truck_poly_14m benchmark detail results]

[Table: truck_poly_14m benchmark summary]

Current Pricing of INTEL® and AMD® CPUs

Here is up-to-the-minute pricing for each CPU. I took these prices from the Newegg and Ingram Micro websites; the values were captured on October 4, 2013. A short price-per-performance sketch follows the list below.

Note that AMD's price per CPU went up and the INTEL XEON e5-2690 went down. Again, these prices are based on today's pricing, October 4, 2013.

AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF

  •  $501 x 4 = $2004.00

Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI

  • $1986.48 x 2 = $3972.96

Intel Xeon E5-2667V2 3.3 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 25 MB, 8 GT/s QPI,

  • $1933.88 x 2 = $3867.76
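
Here is a small Python sketch that turns the prices above into rough price-per-performance numbers. The "aggregate GHz" figure is just cores times clock speed, a crude proxy rather than a benchmark result, and the core counts and clocks are the ones quoted in this article.

# Crude price comparison from the numbers quoted above (October 4, 2013).
cpus = {
    "AMD Opteron 6308 (4 sockets)":     {"unit_price": 501.00,  "qty": 4, "cores": 16, "ghz": 3.5},
    "Intel Xeon E5-2690 (2 sockets)":   {"unit_price": 1986.48, "qty": 2, "cores": 16, "ghz": 2.9},
    "Intel Xeon E5-2667V2 (2 sockets)": {"unit_price": 1933.88, "qty": 2, "cores": 16, "ghz": 3.3},
}

for name, c in cpus.items():
    total_cost = c["unit_price"] * c["qty"]
    total_ghz = c["cores"] * c["ghz"]
    print("%s: $%.2f total, %.1f GHz aggregate, $%.2f per GHz" %
          (name, total_cost, total_ghz, total_cost / total_ghz))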

REFERENCES:
http://www.ingrammicro.com
http://www.newegg.com

INTEL XEON e5-2667V2
http://ark.intel.com/products/75273/Intel-Xeon-Processor-E5-2667-v2-25M-Cache-3_30-GHz

INTEL XEON e5-2690
http://ark.intel.com/products/64596/

AMD Opteron 6308
http://www.amd.com/us/Documents/Opteron_6300_QRG.pdf

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

http://en.wikipedia.org/wiki/Central_processing_unit#Integer_range

http://en.wikipedia.org/wiki/Floating_point

STEP OUT OF THE BOX, STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Let CUBE HVPC by PADT, Inc. quote you a configuration today!

 

ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308 & INTEL XEON e5-2690 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article were compiled and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using ANSYS FLUENT 14.5.7. Please remember that hardware and software change with new releases, and you should always try to run your own benchmarks, on your own typical problems, to understand how performance will impact you.

A potential customer of ours was interested in a CUBE HVPC mini-cluster. They requested that I run benchmarks and gather some data on two CPUs. The CPUs were benchmarked on two of our CUBE HVPC systems: one mini-cluster has dual INTEL® XEON e5-2690 CPUs and the other has quad AMD® Opteron 6308 CPUs. The benchmarking was run on a single server using a total of 16 cores on each machine. The same DDR3-1600 ECC Reg RAM, Supermicro LSI 2208 RAID controller, and Hitachi SAS2 15k RPM hard drives were used on each system.


CUBE HVPC Test configurations:

Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card was installed; however, solves were run distributed locally
  • Stack: RDMA 3.6-1.el6
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch
Server 2: CUBE HVPC c16i
  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

ANSYS FLUENT 14.5.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

The models we used can be downloaded from the ANSYS Fluent Benchmark page link: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks

Release ANSYS FLUENT 14.5.7 Test Cases  (20 Iterations each):
  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)
Chart 1: Total Wall Clock Time in seconds: (smaller bar is better)


Chart 2: Average wall-clock time per iteration in seconds: (smaller bar is better)


 

Summary:

Are you sure?

That was the question Eric posed to me after he reviewed the data and read this blog article before posting. I told him, "Yes, I am sure. Data is data, and I even triple checked." I basically re-ran several of the benchmarks to see if the solve times came out the same on these two CUBE HVPC workstations. I went on to tell Eric, "For example, let's dig into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find."

Quad socket Supermicro motherboard: 4 x 4c AMD Opteron 6308 @ 3.5GHz

Dual socket Supermicro motherboard: 2 x 8c INTEL e5-2690 @ 2.9GHz


The INTEL XEON e5-2690 dual socket motherboard is impressive; it would have been on the Top500 list of the fastest computers in the world ten years ago. Anyway, after each solve I captured the solve data, and as you can see below, the AMD Opteron wall clock time was faster than the INTEL XEON wall clock time.

So why did the AMD Opteron 6308 CPU pull away from the INTEL for the ANSYS FLUENT solve times? Let's take a look at a couple of reasons why this may have happened. I will let you make your own conclusions.

  • Clock speed, but would a 9.6GHz difference in total aggregate CPU speed make a 100% speedup in ANSYS Fluent wall-clock times? (See the short sketch after this list.)
  • Theoretical total of:
    • AMD® OPTERON 6308 = 16 x 3.5GHz = 56.0 GHz
    • INTEL® XEON e5-2690 = 16 x 2.9GHz = 46.4 GHz
  • The floating point argument? The tick and tock of the great CPU saga continues.
  • At this moment in eternity, it is a known fact that the AMD Opteron 6308, and many of its brothers, have one floating point unit per two integer cores, while INTEL has one floating point unit per integer core. What this means to ANSYS CFD users, in my MIS/IT simpleton terms, is that the AMD CPU was simply able to handle and process more data in this example.
  • It's possible that more integer calculations were required than floating point. If that is the case, then the AMD CPU would have had eight pipelines for integer calculations, while the AMD Opteron can process four floating point pipelines and the INTEL CPU can process eight floating point pipelines.
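
Here is a small Python sketch of that core/FPU bookkeeping, using the socket counts, clocks, and the one-FPU-per-two-cores rule described above. Treat it as an illustration of the argument, not vendor data; verify the details against AMD and Intel documentation for your exact parts.

# Core, FPU, and aggregate clock bookkeeping for the two test systems,
# following the description in the bullets above.
def resources(sockets, cores_per_socket, clock_ghz, cores_per_fpu):
    cores = sockets * cores_per_socket
    return {
        "integer cores": cores,
        "floating point units": cores // cores_per_fpu,
        "aggregate clock (GHz)": cores * clock_ghz,
    }

amd   = resources(sockets=4, cores_per_socket=4, clock_ghz=3.5, cores_per_fpu=2)  # Opteron 6308
intel = resources(sockets=2, cores_per_socket=8, clock_ghz=2.9, cores_per_fpu=1)  # Xeon e5-2690

print("AMD  :", amd)    # 16 cores, 8 FPUs, 56.0 GHz aggregate
print("INTEL:", intel)  # 16 cores, 16 FPUs, 46.4 GHz aggregate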

Let us look at the details of what is on the motherboards as well.  4 data paths vs 2 can make a difference:

Dual socket Supermicro motherboard, 2 x 8c INTEL e5-2690 @ 2.9GHz vs. quad socket Supermicro motherboard, 4 x 4c AMD Opteron 6308 @ 3.5GHz:

  • Processor Technology: INTEL: 32-nanometer; AMD: 32-nanometer SOI (silicon-on-insulator) technology
  • Interconnect: INTEL: Quick Path Interconnect, two links at up to 8GT/s per link, up to 16 GB/s peak bandwidth per port per direction; AMD: HyperTransport™ Technology, four x16 links at up to 6.4GT/s per link
  • Memory: Integrated DDR3 memory controller, up to 51.2 GB/s memory bandwidth per socket
  • Number of Channels and Types of Memory: INTEL: four links at up to 51.2GB/s per link; AMD: four x16 links at up to 6.4GT/s per link; quad channel support on both
  • Packaging: INTEL: LGA2011-0; AMD: Socket G34, 1944-pin organic Land Grid Array (LGA)

Current pricing of the CPUs

Here is up-to-the-minute pricing for each CPU. I took these prices from the Newegg and Ingram Micro websites; the values were captured on September 12, 2013.

  • AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF
    • $499.99 x 4 = $1999.96
  • Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI,
    • $2010.02 x 2 = $4020.40

STEP OUT OF THE BOX,
STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Let CUBE HVPC by PADT, Inc. quote you a configuration today!

Why do my ANSYS jobs take days and weeks to finish? Well it depends…

Real World Lessons on How to Minimize Run Time for ANSYS HPC

Recently I had a VP of Engineering start a phone conversation with me that went something like this. “Well Dave, you see this is how it is. We just spent a truckload of money on a 256 core cluster and our solve times are slower now than with our previous 128 core cluster. What the *&(( is going on here?!”

I imagine many of us have heard similar stories or even received the same questions from our co-workers, CEOs, and directors. I immediately had my concerns, and I thought carefully about what I should say next. I recalled a conversation I had with one of my college professors. He told me that when I find myself stepping into gray areas, a good start to the conversation is to say, "Well, it depends…"

Guess what, that is exactly what I said. I said "Well, it depends…" and went on to explain two fundamental pillars of computer science that have plagued most of us since computers were created: "Well, you may be CPU bound (compute bound) or I/O bound." He told me that they had paid a premium for the best CPUs on the market and shared some other details about the HPC cluster. From those details, my hunch was that his HPC cluster was actually I/O bound.

I/O Bound

Basically this means that your cluster's expensive CPUs are stalled out and sitting idle, waiting for new data to process before they can move on. I also briefly explained that his HPC cluster might instead be compute bound, but I quickly reassured him that the likelihood of that was maybe 10 percent, very unlikely. I knew the specifications of the CPUs in this HPC cluster, and the likelihood that they were the cause of his slow ANSYS run times was low on my radar. These were literally the latest and greatest CPUs ever to hit this planet (at that moment in time). So, let me step back a minute to refresh our memories on what it means when a system is compute bound.

Compute Bound

Being compute bound means that the HPC cluster's CPUs are sitting at 99 or 100% utilization for long periods of time. When this happens, very bad things can begin to happen to your HPC cluster: CPU requests to peripherals are delayed or lost to the ether, and the HPC cluster may become unresponsive and even lock up.

All I could hear was silence on the other end. "Dave, I get it, I understand. Please find the problem and fix our HPC cluster for us." I happily agreed to help out! I concluded our phone conversation by asking that he send me the specific details, down to the nuts and bolts of the hardware, along with the operating system and software that were installed and used on the 256 core HPC cluster.

What NOT to do when configuring an ANSYS Distributed HPC cluster.

Seeking that perfect balance!

After a quick NDA signing, a few dollars exchanged, and a sprinkle of some other legal things that lawyers get excited about, I set out to discover the cause. After reviewing the information provided to me, I almost immediately saw three concerns:

To interconnect what?

Let Merriam-Webster describe it:

Definition of INTERCONNECT

transitive verb
: to connect with one another
intransitive verb
: to be or become mutually connected

— in·ter·con·nec·tion noun
— in·ter·con·nec·tiv·i·ty noun

1. The systems are interconnected with a series of wires.
2. The lessons are designed to show students how the two subjects interconnect
3.  A series of interconnecting stories

First Known Use of INTERCONNECT: 1865

Concern numeral Uno!!! Interconnect me

The company's 256 core HPC cluster did have a second, dedicated GigE interconnect. However, Distributed ANSYS is highly bandwidth and latency bound, often requiring more bandwidth than a dedicated NIC (Network Interface Card) can provide. Yes, the dedicated second GigE interconnect was much better than trying to use a single NIC for all of the network traffic, which would also include the CPU interconnect traffic. I did have a few of the MAPDL output files from the customer that I could take a peek at, and after reviewing them it became fairly clear that the interconnect communication speed between the sixteen 16-core servers in the cluster was not adequate. The master Message Passing Interface (MPI) process that Distributed ANSYS uses requires a high amount of bandwidth and low latency for proper distributed scaling to the other processes. Theoretically, the data bandwidth between cores solving locally on one machine will be higher than the bandwidth traveling across the various interconnect methods (see below). ANSYS, Inc. recommends Infiniband for CPU interconnect traffic; here are a couple of reasons why. See how the theoretical data limits increase going from Gigabit Ethernet up to FDR Infiniband.

Theoretical lane bandwidth limits for:

  • Gigabit Ethernet (GigE): ~128MB/s
  • Single Data Rate (SDR): ~328 MB/s
  • Double Data Rate (DDR): ~640 MB/s
  • Quad Data Rate (QDR): ~1,280 MB/s
  • Fourteen Data Rate (FDR): ~1,800 MB/s

GEEK CRED: A few years ago, companies such as MELLANOX started aggregating Infiniband channels; the typical aggregate modifiers are 4X or even 12X. So, for example, the 4X QDR Infiniband switch and cards I use at PADT, and recommended to this customer, would have (4 x 10Gbit/s) or roughly 5,120 MB/s of throughput! Here is a quick video that I made of a MELLANOX IS5023 18-port 4X QDR full bi-directional switch in action:
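
Here is a small Python sketch of that throughput math: nominal lane signaling rate times lane count, divided by 8 bits per byte. It ignores 8b/10b encoding and protocol overhead, so treat the results as theoretical ceilings; depending on rounding conventions the numbers land slightly above or below the per-lane figures listed above.

# Rough interconnect throughput math: lane rate (Gbit/s) x lanes / 8.
LANE_RATE_GBIT = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "FDR": 14.0}

def aggregate_mb_per_s(rate_name, lanes=4):
    gbit_per_s = LANE_RATE_GBIT[rate_name] * lanes
    return gbit_per_s * 1000 / 8    # Gbit/s -> MB/s (decimal units)

for rate in ("SDR", "DDR", "QDR", "FDR"):
    print("4X %s: ~%.0f MB/s" % (rate, aggregate_mb_per_s(rate)))

# 4X QDR: 4 lanes x 10 Gbit/s = 40 Gbit/s, about 5,000 MB/s raw - the same
# ballpark as the 5,120 MB/s figure quoted above.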

This is how you do it with a CUBE HVPC! MAPDL output file from our CUBE HVPC w16i-GPU workstation. This is running the ANSYS industry benchmark V14sp-5. I wanted to show the communication speeds between the master MPI process and the other solver processes to see just how fast the solvers can communicate. With a peak communication speed of 9593 MB/s this CUBE HVPC workstation rocks!

  • Chassis Profile: 4U standard depth, rackmountable
  • CPU: 1 x dual socket motherboard
  • Chipset: INTEL 602 chipset
  • Processors: 2 x INTEL e5-2690 @ 2.9GHz
  • Cores: 2 x 8
  • Memory: 128GB DDR3-1600 ECC Reg RAM
  • OS Drives: 2 x 2.5″ SATA III 256GB SSD drives, RAID 0
  • DATA/HOME Hard Disk Drives: 4 x 3.5″ SAS2 600GB 15k RPM drives, RAID 0
  • SAS RAID (onboard, optional): RAID 0 (OS RAID)
  • SAS RAID (RAID card, optional): LSI 2208 (DATA VOL RAID)
  • Networking (onboard): Dual GigE (Intel i350)
  • Video (onboard): NVIDIA QUADRO K5000
  • GPU (optional): NVIDIA TESLA K20
  • Operating System: Windows 7 Professional 64-bit
  • Optional Installed Software: ANSYS 14.5 Release

Stats for CUBE HVPC Model Number: w16i-KGPU

Learn more about this and other CUBE HVPC systems here.

Concern #2: Using RAID 5 Array for Solving Disk Volume

The hard drives used for I/O during a solve, the solving volume, were configured in a RAID 5 hard disk array. Some sample data below shows the minimum write speed of a similar RAID 5 array. These are speeds that are better off seen on your long-term storage volume, not on your solving/working directory.

LSI 2008 with HITACHI ULTRASTAR 15K600 drives
Qty / Type / Size / RAID: 8 x 3.5″ SAS2 15k 600GB, RAID 5
TEST #: p1
min Read: 204 MB/s
max Read: 395 MB/s
Avg Read: N/A
min Write: 106 MB/s
max Write: 243.5 MB/s
Avg Write: N/A
Access Time: N/A

Concern #3: Using RAID 1 for Operating System

The hard drive array for the OS was configured as RAID 1. For a number-crunching server, RAID 1 is not necessary. If you absolutely have to have RAID 1, please spend the extra money and go to a RAID 10 configuration.

I really don't want to get into the seemingly infinite details of hard drive speeds and latency, or even begin to explain whether you should be using an onboard RAID controller, a dedicated RAID controller, or a software RAID configuration completed within the OS. There is so much information available on the web that a person gets overloaded. When it comes to Distributed ANSYS, think fast hard drives and fast RAID controllers. Start researching your hard drives and RAID controllers using the list provided below, again, only as a suggestion. I have listed the drives in order based on a very scientific and nerdy method: if I saw a pile of hard drives, which hard drive would I reach for first?

  1. I prefer using SEAGATE SAVVIO or HITACHI enterprise class drives. (Serial Attached SCSI) SAS2 6Gbit/s 3.5”15,000 RPM spindle drives (best bang for your dollar of space, more read & write heads over a 2.5” spindle hard drive).
  2. I prefer using Micron or INTEL SSD enterprise class SSD. SATA III Solid State Drive 6 Gbit/s (SSD sizes have increased however you will need more of these for an effective solving array and they still are not cheap).
  3. I prefer using the SEAGATE SAVVIO 2.5” enterprise class spindle drives: SAS2 6Gbit/s 2.5” 15,000 RPM spindle drives, if you need a small form factor, speed, and additional storage, for example when I need to slam 4 or 8 drives into a tight location. The 2.5” drives do not have as many read & write heads as a 3.5” drive, but right now the SEAGATE SAVVIO 2.5” is the way to go! Here is a link to a data sheet.
    Another similar option is the HITACHI ULTRASTAR 15k600. Its spec sheet is here.
  4. SATA II 3Gbit/s 3.5” 7,200 RPM spindle drives are also a good option. I prefer Western Digital RE4 1TB or 2TB drives. Their spec sheet is here.

LSI 2108 RAID Controller and Hard Drive data/details:


How a CUBE HVPC System from PADT, Inc. balanced out this configuration and how much would it cost?

I quoted out the below items, installed and out the door (including my travel expenses, etc.) at: $30,601

The company ended up going with their own preferred hardware vendor. Understandable, and one good thing is that we are now on their preferred purchasing supplier list. They were greatly appreciative of my consulting time and indicated that they will request a "must have" quote for a CUBE HVPC system at their next refresh in a year, when they want to go over 1,000 cores.

I recommended that they install the following into the HPC cluster (note: they already had blazing fast hard drives):

  • 16 – Supermicro AOC-S2208L-H8iR LSI 2208 RAID controller cards.
  • 32 – Supermicro CBL-0294L-01 cabling to connect the LSI RAID cards to the SAS2 hard drives.
  • 1 – MELLANOX IS5023 18-port 4X QDR Infiniband switch
  • 16 – Supermicro AOC-UIBQ-M2 Dual port 4X QDR Infiniband card
  • 16 – Supermicro QSFP Infiniband cables in a couple different lengths

A special thanks and shout out to Sheldon Imaoka of ANSYS, Inc. for inspiring me to write this blog article!

To GPU or Not, This is No Longer The Question…

Executive Summary

Without losing you quickly to fabulous marketing and product information, I thought it prudent to get to the answer to my question as quickly as possible. What was the question? Should I invest in a companion processor and an ANSYS HPC Pack? Yes, now is the time. Consider the following:

  • Your current workstation is beginning to show its age and you are unable to purchase new hardware.
  • You want your critical results fast; it is painful waiting upwards of 50 hours for your solution to solve.
  • Assign a dollar value to your current frustration level.
  • Current pricing for the NVIDIA TESLA C2075 is around $2,500; current pricing for the NVIDIA Quadro 6000 is around $5,000.
 

“Keeping It Real”

Matt Sutton, our Lead Software Development Engineer at PADT, Inc., was solving a very large ANSYS MAPDL acoustic ultrasound wave propagation model. The model labored over 12 hours to solve on Matt's older 8 core Dell Precision 690 workstation. We loaded the model onto our CUBE HVPC w8i with the NVIDIA Tesla C2075 and it solved in 50 minutes. That is roughly a 15x speedup for Matt's particular model!

 

Benchmarks

I started off our benchmark assault using an industry standard ANSYS benchmark for HPC; this is the benchmark that NVIDIA requested we use for testing our TESLA C2075. The next benchmark that I used was an internal PADT, Inc. coupling modal benchmark.

Nvidia V14sp-5 ANSYS R14 Benchmark Matrix (Sparse solver, 2100k)

All values are ANSYS R14 (v14) solution times in seconds. Win = Windows 64-bit, Lin = Linux 64-bit, SMP = Shared Memory Parallel (In-Core), DIST = Distributed ANSYS (In-Core), +GPU = with GPU assist (see the hardware configurations at the end of this article).

CUBE HVPC w12a-GPU – 64GB (CONFIG A)

Cores   Win SMP   Win SMP+GPU   Win DIST   Win DIST+GPU   Lin SMP   Lin SMP+GPU   Lin DIST   Lin DIST+GPU
2       1388.20   463.40        1425.50    294.40         1324.40   466.30        1312.30    292.10
4       870.00    447.20        855.00     232.40         817.00    422.00        975.40     221.80
6       670.60    440.20        588.00     245.10         625.50    394.00        601.60     208.10
8       594.30    439.10        605.00     222.00         480.10    383.90        542.30     181.10
10      539.50    426.90        435.00     283.40         491.30    392.80        396.90     220.80
12      538.40    437.10        402.20     211.10         480.10    394.80        343.80     183.10   (~2x speedup)

CUBE HVPC w8i-GPU – 64GB (CONFIG B)

Cores   Win SMP   Win SMP+GPU   Win DIST   Win DIST+GPU   Lin SMP   Lin SMP+GPU   Lin DIST   Lin DIST+GPU
2       1078.50   361.00        1111.90    235.70         1036.00   350.20        1072.90    250.60
4       645.50    330.70        652.30     185.50         608.10    312.00        790.90     193.20
6       494.30    322.70        458.30     233.70         464.90    303.20        502.70     178.70
8       438.50    328.30        462.20     230.40         406.10    304.80        451.20     166.00   (~2.5x speedup)

CUBE HVPC w16i-GPU – 128GB (CONFIG C)

Cores   Lin SMP+GPU   Lin DIST+GPU
2       296.6         208.10
4       254.4         160.70
6       254.2         164.20
8       239.9         138.20
10      238.6         159.70
12      246.3         129.60
14      237.6         129.1
16      248.9         130.5

 

CUBE HVPC – PADT, Inc. – Coupling Modal Benchmark (PCG solver, ~1 Million DOF, 50 modes)

CUBE HVPC w12i w/GPU – ANSYS 13.0, Shared Memory Parallel, INTEL XEON 2 x 6 @ 3.47GHz, 144GB of RAM (w12i-GPU)

Processors   Time Spent Computing Solution (secs)   Date / Initials
2            5416.4                                 11/7/2011 – DRJM
10+GPU       1914.2 (incore)                        11/8/2011 – DRJM
12+GPU       1946 (incore)                          11/8/2011 – DRJM

CUBE HVPC w8i w/GPU – ANSYS R14, Shared Memory Parallel, INTEL XEON 2 x 4 @ 2.8GHz, 64GB of RAM (w8i-GPU)

Processors   Time Spent Computing Solution (secs)   Date / Initials
6+GPU        3659.4 (out of core)                   4/11/12 – DRJM
8+GPU        3686.7 (out of core)                   4/11/12 – DRJM

CUBE HVPC w16i w/GPU – ANSYS R14, Shared Memory Parallel, INTEL XEON e5-2690 2 x 8 @ 2.9GHz, 128GB of RAM (w16i-GPU)

Processors   Time Spent Computing Solution (secs)   Date / Initials
14+GPU       2113 (incore)                          4/18/12 – DRJM
16+GPU       1533.9 (incore)                        4/18/12 – DRJM

 


Summary, Debates, Controversy & Conclusions

One question that I hear often is this: "Which operating system is faster, Linux or Windows?" My typical response will begin with "Well, it depends…" However, the data above illustrates the answer across two independent CPU workstations as well as both operating systems. The ANSYS benchmarks were all performed in the same relative time frame; the only exception was the ANSYS modal analysis benchmark.

Are you ready for the answer? Here it is… yes, Linux is faster than Windows! Whether you come from the AMD CPU or the INTEL CPU side of the tracks, Linux is faster. No big discovery, right? I know we all knew this already, but maybe we were afraid to ask, by operating system as well as CPU manufacturer, how much faster. Well, with our 2.1 million degree of freedom Nvidia benchmark: not very much, as the data clearly indicates. However, once again, a Linux based OS does give you a system performance advantage, whether you use AMD or INTEL. The next question I hear is: "Which processor is faster, AMD or INTEL?" My typical response will begin with "Well, it depends…"

Say you did buy the NVIDIA TESLA C2075 GPU; okay, this is great news! However, you figured it best to keep it safe, stay the same, and continue to use the Shared Memory Parallel solve method. As you ponder the speedup values further, you will see once again that the best results are found on the Linux operating system and when you choose to solve in Distributed Memory Parallel mode. The AMD based CPU had the best speedup value when comparing the two solve modes on the same workstation, with a 2.2x speedup.

Let's begin to unpack this data even more and see if you can come up with your own judgments. This is where things might start to get controversial… so hang on.

Windows, Linux, AMD or INTEL

Time Spent Computing The Solution

  • ANSYS R14 Distributed Memory Parallel Results: (Incore)
    • LINUX 64-bit:
      • One minute forty-seven seconds faster on 2x AMD CPU (343.80 seconds vs. 451.20 seconds)
    • Windows 7 Professional 64-bit:
      • One minute faster on 2x AMD CPU (402.20 seconds vs. 462.20 seconds).
  • ANSYS R14 Shared Memory Parallel Results: (Incore)
    • LINUX 64-bit
      • One minute fourteen seconds faster on 2x INTEL XEON CPU (406.10 seconds vs. 480.10 seconds)
    • Windows 7 Professional 64-bit:
      • One minute forty seconds faster on 2x INTEL XEON CPU (438.50 vs. 538.40 seconds)

The Nvidia TESLA C2075 GPU and ANSYS R14 – GPU to the rescue!

  • ANSYS R14 Distributed Memory Parallel with Nvidia TESLA C2075 GPU Assist Results
    • LINUX 64-bit:
      • Seventeen seconds faster on 2x INTEL XEON CPU (166 seconds vs. 183.10 seconds)
      • 129.1 seconds was the fastest overall solve time achieved for any operating system on this benchmark!
      • 2 x INTEL XEON E5-2690 – fourteen cores with ANSYS R14 DMP w/GPU assist
    • Windows 7 Professional 64-bit:
      • Nineteen seconds faster on 2x AMD CPU (211.10 seconds vs. 230.40 seconds).
      • 211 seconds was the fastest solve time achieved using Windows for this benchmark!
      • 2 x INTEL XEON X5560 – eight cores with ANSYS R14 DMP w/GPU assist
  • ANSYS R14 Shared Memory Parallel with Nvidia TESLA C2075 GPU Assist Results
    • LINUX 64-bit:
      • Seventeen seconds faster on 2x INTEL XEON CPU (166 seconds vs. 183.10 seconds)
      • Config C: 237.6 seconds was the fastest overall solve time achieved for any operating system on this benchmark!
      • 2 x INTEL XEON E5-2690 – fourteen cores with ANSYS R14 SMP w/GPU assist
    • Windows 7 Professional 64-bit:
      • Nineteen seconds faster on 2x AMD CPU (211.10 seconds vs. 230.40 seconds).
      • 211 seconds was the fastest solve time achieved using Windows for this benchmark!
      • 2 x INTEL XEON X5560 – eight cores with ANSYS R14 SMP w/GPU assist

ANSYS R14 Distributed Memory Parallel with GPU vs. ANSYS R14 Shared Memory Parallel with GPU speed up results:

Distributed Memory Parallel (DMP) or Shared Memory Parallel (SMP)

  • 2 x AMD Opteron 4184 based workstation vs. “DMP or SMP”
    • 2.2x’s speedup on Linux Operating System (183.10 seconds vs. 394.80 seconds)
    • 2x’s speedup on Windows Operating System (211.10 seconds vs. 437.10 seconds)
  • 2 x INTEL XEON E5-2690 based CPU vs. “DMP or SMP”
    • 1.9x's speedup on Linux Operating System (129.1 seconds vs. 237.6 seconds)

With or Without GPU? ANSYS R14 Distributed Memory Parallel speed up results

Distributed Memory Parallel (DMP) with GPU vs. Distributed Memory Parallel without GPU

  • AMD Opteron 4184 based workstation
    • 2x’s speedup Linux Operating System (183.10 seconds vs. 343.80 seconds)
    • 2x’s speedup on Windows Operating System (211.10 seconds vs. 402.2 seconds)
  • INTEL XEON x5560 based workstation
    • 2.5x’s speedup on Linux Operating System (166 seconds vs. 451.20 seconds)
    • 1.4x’s speedup on Windows Operating System (230.40 seconds vs. 462.20 seconds)

Some Notes:

  • ANSYS Base License – unlocks up to 2 CPU Cores
  • ANSYS HPC Pack – unlocks up to 8 CPU Cores and GPU
  • The total amount of system RAM you have affects your Distributed solve times. A minimum of 48GB of RAM is recommended.
  • The processing speed of your CPU affects your Shared Memory Parallel solve times.
  • Model limits for the direct (sparse) solver will depend on largest front sizes
    • 6M DOF for the 6GB Tesla C2075 and Quadro 6000
  • Model limits for the iterative PCG and JCG solvers
    • 3M DOF for the 6GB Tesla C2075 and Quadro 6000 (a quick check based on these limits follows this list)
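
Here is a tiny Python sketch of that check, using the DOF rules of thumb above for a 6GB card. These are rough limits, not guarantees; front size and model details still matter.

# Rough "will it fit on the GPU?" check based on the DOF limits quoted above
# for a 6GB Tesla C2075 or Quadro 6000.
GPU_DOF_LIMITS = {"sparse (direct)": 6000000, "PCG/JCG (iterative)": 3000000}

def gpu_fit(dofs, solver):
    limit = GPU_DOF_LIMITS[solver]
    status = "likely fits" if dofs <= limit else "likely too large"
    return "%s: %s DOF vs. ~%s limit -> %s" % (solver, "{:,}".format(dofs),
                                               "{:,}".format(limit), status)

print(gpu_fit(2123000, "sparse (direct)"))       # the 2.1M DOF V14sp-5 benchmark
print(gpu_fit(2123000, "PCG/JCG (iterative)"))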

 


Hardware Specs of Workstations:

CONFIG A – CUBE HVPC w12a-GPU

The AMD® based CUBE HVPC w12a FEA Simulation Workstation:

  • CPU: 2x AMD Opteron 4184 (2.8GHz Ghz 6 core)
  • GPU: NVIDIA TESLA C2075 Companion Processor
  • RAM: 64GB DDR3 1333Mhz ECC
  • HDD: (os and apps): 450GB WD Velociraptor 10k
  • HDD: (working directory): 6 x 1TB WD RE4 drives using LSI 2008 RAID 0
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

(Here is a picture of Sam with two full
tower CUBE HVPC workstations getting
ready to ship out!)

CONFIG B – CUBE HVPC w8i-GPU

The INTEL® based CUBE HVPC w8i FEA Simulation Workstation:

  • CPU: 2x INTEL XEON x5560 (2.8GHz 4 core)
  • GPU: NVIDIA TESLA C2075 Companion Processor
  • RAM: 64GB DDR3 1333Mhz ECC
  • HDD: (os and apps): 146GB SAS 15k
  • HDD: (working directory): 3 x 73GB SAS 15k – LSI RAID 0
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

CONFIG C – CUBE HVPC c16i-GPU

The INTEL® based CUBE HVPC w16i-GPU FEA Server:

  • CPU: 2x INTEL XEON e5-2690 (2.9GHz 8 core)
  • GRAPHICS/GPU: NVIDIA QUADRO 6000
  • RAM: 128GB DDR3 1600 MHz ECC
  • HDD: (os and apps): 300GB SATA III 10k WD Velociraptor
  • HDD: (working directory): 3 x 600GB SATA III 10k WD
  • OS: Linux 64-bit
  • OTHER: ANSYS R14, latest NVIDIA TESLA Drivers

CUBE HVPC w12i-GPU

The INTEL® based CUBE HVPC w12i-GPU FEA Simulation Workstation:

  • CPU: 2x INTEL XEON x5690 (3.47GHz 6 core)
  • GRAPHICS/GPU: NVIDIA QUADRO 6000
  • RAM: 144GB DDR3 1333Mhz ECC
  • HDD: (os and apps): 256GB SSD SATA III
  • HDD: (working directory): 4 x 600GB SAS2 15k – LSI RAID 0
  • OS: Windows 7 Professional 64-bit
  • OTHER: ANSYS R13


How To update ANSYS Release 13.0 to ANSYS Release 13.0 SP2 for 64-bit Linux

Pre-Install Tasks

· ANSYS Release 13.0 must already be installed on your machine

· Time needed to complete update: 15-30 minutes

Begin ANSYS 13.0 SP2 update procedure

1. Login and download the appropriate Linux version update from the ANSYS Customer Portal.

2. Save ANSYS130SP2_LINX54.tar to your Desktop.

3. Extract the files to the Desktop.

4. Open a Terminal window session and change directory to your Desktop, then to the extracted folder on the Desktop. In my example the folder is ANSYS130SP2_LINX54.tar_FILES.

5. After you have verified that you are in this folder, type ./INSTALL


7. Review the ANSYS Software License agreement and click I AGREE to continue.


8. Verify your installation directory and click Next.

You must install ANSYS RELEASE 13.0 SP2 into the same location as your original ANSYS RELEASE 13.0 installation.


9. Select the components you would like to install. By default, the installation GUI will have your currently installed products selected.

You can choose to update fewer products, but you cannot select products to add with the update package. To add products you would need to install from the main ANSYS RELEASE 13.0 media and then install SP2 on top of that installation.


Please note the amount of disk space required for the update: approximately 7.2 GB. You will need to make sure that your Linux machine has at least 7.3GB free and available.


10. Verification screens – Dates


11. Installation screens – Various screens will scroll through as the installation manager extracts the package files. Screen shot of Extraction: Package 4 of 16


12. Installation screens – Various screens will scroll through as the installation manager extracts the package files. Screen shot of Extraction: Package 5 of 16


13. Installation screens – Various screens will scroll through as the installation manager extracts the package files. Screen shot of Extraction: Package 14 of 16


14. Completed update – Click Next to begin the ANSYS 13.0 Licensing Client Installation update


15. ANSYS 13.0 Licensing Client Installation – Begin verification


16. ANSYS 13.0 Licensing Client Installation – Configuration log and successful completion of ANSYS 13.0 Licensing Client update.


17. Click Finish to end the update installation script routine.

As indicated in the IMPORTANT note below, please run the ANSYS, Inc. License Manager SP2 update after completing this procedure. This download is also available through the ANSYS Customer Portal.


How to Get a Cool Grip on Your Data Center Cooling in Three Easy Steps!

Over the years I have learned to do more with less. In the information systems world, you all know the equation is often much more with much less. One of my to-dos over the years, one that continued to get bumped down the priority list, is the juggling act of making sure that the data center has enough cooling vs. power vs. (yes, again, in AZ) cooling. If you are an IT professional or even an engineer, you really don't have time to try to convince someone, anyone, that we need to spend more money, even if you use the effective philosophy of Time, Money, and Quality. After dealing with "wish ware" software vendors this past year, I added a fourth dimension to the above philosophy: Functionality. Here is what Merriam-Webster has to say about functionality: the quality or state of being functional; especially: the set of functions or capabilities associated with computer software or hardware or an electronic device. http://www.merriam-webster.com/dictionary/functionality?show=0&t=1307146105

  1. Time – Try searching the internet for terms such as data center cooling calculator, data center cooling costs or how can I save money with our data center cooling? You will suddenly have millions of search results at your beck and call, from video blogs on data center cooling to white papers on optimizing IT strategy for data center cooling. It is endless: three million plus hits on just the one search term “data center cooling calculator”. Wow. Start researching, my young padawan learner, and fill out those lead-generating white papers. Keep it simple…
  2. Money – I would prefer that we used our dollars on buying a server with a couple of Intel Xeon E7-8870 processors, or how about a QUAD based AMD FX-8130P processor server! I do not have any budget for an (I am sure fabulous) data center cost-benefit crisis analysis.
  3. Quality – Will I even be able to understand what the end-result white paper says? Will I look like even more of an idiot? This needs to be accurate information. I will have to do it myself or use a third-party data center analysis.
  4. Functionality – Will our current air conditioners hold up this year? What about when we hit 120 degrees? Oh my, do I need to add more cooling power?

Wikipedia on British Thermal Unit (BTU) – http://en.wikipedia.org/wiki/British_thermal_unit

So why do I care about a BTU? Because approximately one "ton of cooling", which is a typical way people talk about cooling devices in the USA, is 12,000 BTU/h, the amount of power needed to melt one short ton of ice in 24 hours. Locked away in a climate-controlled vault is one of my data centers. “Said vault may or may not contain the following items on any given day.” After all, this is a mobile compute server world these days.

  • 13 Servers
  • 174 Cores (Mix of Windows/Linux servers)
  • 2 – Network Routers
  • 3 – Network Switches
  • Phone System
  • Voice Mail System
  • 1 LCD 20” monitor/KVM

Go Green in the Data Center! First, let’s get a “Cool Grip” on your data center…16,484.058 BTU/h

A couple of years ago one of our ANSYS Mechanical Simulation Engineers, Jason Krantz, told me about a handy little watt meter monitoring device designed by P3 International, the KILL A WATT™. Over the years that little watt meter has become one of my closest friends and allies in IT. Today I was able to quickly assess (realistically about four hours of time) just how many watts of power each one of our servers, network devices, etc. used. I tried to be as accurate as I could without having to take out a second mortgage, so I made sure and verified that one of our PhD FE or CFD analysts had our servers at or near 100% CPU use.

YOUTUBE VIDEO :: Check out this real-world example of an AMD Opteron 6174 287-hour electrical cost usage test. The data shown in this video is from a server that has four AMD Opteron 6174 processors installed.

AMD Opteron 6174 Electrical Cost Usage

So, what is your magical number? Ours is 4,831 Watts

Do you know how many watts of power your server room is using? Could you even make an educated guess at that number? Our magical number for server room #1 turned out to be 4,831 Watts. I do need to state that I was unable to take some of the devices offline; when that was the case I used data pulled from the technical documents on the device manufacturer's website.

So what is your BTU/h number? Ours is 16,484.058 BTU/h. Oh, and I don't even like math. I know, I know, math was solved and perfected centuries ago. But how do I convert Watts into BTU?

I used a 99 cent app that I bought off of the iTunes App store called “Convert Units”.

  • Step 1 – Convert Watts into BTU/min.
  • Step 2 – Then multiply by 60 to get that value into BTU/h.
  • Step 3 – Speak to your Operations Department, send an email, shout it from the rooftops!! We need at least a two-ton air conditioning unit for Server Room #1.
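
If you would rather skip the 99-cent app, the same conversion is one line of arithmetic at any Linux terminal (1 W ≈ 3.412142 BTU/h, and 1 ton of cooling = 12,000 BTU/h); a quick sketch using bc with our numbers plugged in:

echo "4831 * 3.412142" | bc -l    # watts to BTU/h: ~16484.06
echo "16484.06 / 12000" | bc -l   # BTU/h to tons of cooling: ~1.37, so a two-ton unit has headroom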

Now, with the precious BTU/h value in hand, I was able to speak the same language as our Director of Operations & Facilities Manager.

I wish you all could have been there when I walked up to Scott and told him the news. The dialogue went something like this:

“Scott, I wanted to talk to you about server room #1’s cooling situation…” (pause for dramatic expression). Almost immediately you could see Scott’s blood pressure rising, his brain quickly churning through mountains of air conditioning cooling information and data. I quickly calmed his anxiety and said these exact words: “Server Room #1’s BTU/h ratio is approximately 16,484.058 BTU/h.” It took Scott just a moment for this bit of information to register. I do believe that I actually heard the hallelujah chorus in the heavens, and I could see the peace that passes all understanding come across Scott’s face. It was as if I could read his mind, and he was thinking: how is this non-operations/facilities-type humanoid speaking my language? For Scott knew immediately that he had enough cooling power at this moment to cool that data center down all summer long.

DATA CENTER #1 – 274.7343 BTU/min × 60 = 16,484.058 BTU/h

How the heck are you making money today? Step Out of the Box, Step Into a CUBE: Computers for CFD and FEA Simulation. http://www.cube-hvpc.com/

How To run ANSYS Release 13.0 Workbench on 64-bit Linux

Getting ANSYS Workbench up and running on Linux at R13 is pretty simple.  You just have to make sure that a few things are in place and some packages are loaded.  Then it works great.  Here is a quick HOW-TO on getting things going:

Pre-Install Tasks

  • Install CentOS 5.3 or greater, or RHEL 5.
    • Download and install the latest graphics card drivers for your video card, then restart.
      image
  • Next, the GNOME Desktop Environment is required for optimum use.

  • Next, using the Linux Package Manager, select the Development main group and then select all of the additional libraries needed (see images below).

image

image

  • Select Optional packages and then select the additional MESA libraries (see below).

image

  • Next, select the Base System main group, then X Window System and Legacy Software Support. With Legacy Software Support still selected, click Optional Packages, select the additional package openmotif22, and click Close. (A command-line alternative is sketched after this list.)

image

image

  • Restart the system
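
If you prefer the command line to the Package Manager GUI, something like the following yum commands should pull in the equivalent packages. The group and package names are my assumptions based on the GUI selections described above, so verify them with yum grouplist and yum search on your system:

yum groupinstall "Development Tools"                          # Development main group
yum install mesa-libGL mesa-libGLU                            # additional MESA libraries
yum groupinstall "X Window System" "Legacy Software Support"  # Base System selections
yum install openmotif22                                       # optional package under Legacy Software Support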

Post ANSYS Install Setup Tasks

  • Within your Terminal session, type ProductConfig.sh

image

    • Click Configure Products, then select the products to configure or reconfigure
  • Pro/E Configuration GUI

image

  • Unigraphics NX install Configuration GUI

image

  • Click Continue and the product configuration script will run.

image

  • Click Finish

How to launch ANSYS Release 13.0 Workbench

  • Open a Linux terminal session:
  • Change your path to include /ansys_inc/v130/Framework/bin/Linux64

image

  • Next, launch the program by typing ./runwb2 and pressing Enter (the launch steps are summarized in a short sketch after this list)

image

  • Basic opening up of a Design Modeler project
  • image
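
Taken together, the launch steps above boil down to a couple of terminal commands; this sketch assumes the default /ansys_inc install location:

export PATH=$PATH:/ansys_inc/v130/Framework/bin/Linux64    # make runwb2 findable from anywhere
runwb2                                                     # launch ANSYS Workbench

Equivalently, cd into /ansys_inc/v130/Framework/bin/Linux64 and run ./runwb2.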

Here it is: ANSYS 13 Workbench on CentOS 5.5 64-bit Linux

image

Done!

To 40 Gb/s Infiniband QDR and Beyond! HOW-TO

In this article I will show you how to install and configure two 64-bit Linux servers with a 40Gb/s Infiniband interconnect. The two servers are connected directly together with a 7mm QDR Cable. No switch is used in this configuration or setup document.

The InfiniBand cards used for this How-To have the Mellanox® ConnectX®-2 IB QDR chip, MT26428. They are dual InfiniBand QSFP port cards with a 40Gb/s data rate per port. The cards use a PCI-E 2.0 x8 (5GT/s) slot in a UIO low-profile, half-length form factor designed by SuperMicro.

Step 1: Install OpenFabric software – CentOS

  • Select OpenFabrics Enterprise Distribution

f1

  • Next, install the openib and mstflint
    • Select Optional Packages

f2

      • Select openib
      • Select mstflint
      • Click Close
    • Click Apply
    • Allow download and install to complete

Step 2: Verify that you have the most recent firmware for the Infiniband cards.

  • Go to the manufacturer's website and locate the firmware update .bin file for your InfiniBand card.
    • While you are at the site you may also want to download the latest software drivers for your specific InfiniBand card.
    • www.mellanox.com – Mellanox Technologies
  • Next, update the firmware for our Mellanox Technologies MT26428 InfiniBand card.
  • Open a Terminal session and type the following command; your output will look similar to the output listed below: lspci

f3

Please note the device ID of your InfiniBand card. In the above screen shot, the PCI device ID for the card is 41:00.0.
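
If the full lspci listing is long, you can narrow it down with grep; a quick sketch (the vendor strings are assumptions about how the card reports itself, so adjust them as needed):

lspci | grep -i -e mellanox -e infiniband    # show only the InfiniBand adapter(s) and their device IDs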

 

  • Next, begin the firmware update:
    • Make sure that your terminal window session is in your firmware update location. I saved the update .bin file on the Desktop of the root user.
    • Next type: mstflint -d 41:00.0 -i AOC.bin b
      • Check out the screen shot below, and make sure there are no hyphens in the file name of the update .bin file. If the .bin file in the command line entry below read AOC-1.bin, the firmware update would fail.

f4

  • After successful firmware update, restart the server and move on to Step 3.

Step 3: Configuring the network-scripts on Linux

  • In a terminal window change your path location to /etc/sysconfig/network-scripts/
  • Type vi ifcfg-ib0 to enter into text editor mode.

f5

  • Next, enter in the following text for the Infiniband card.

DEVICE=ib0
ONBOOT=yes
IPADDR=192.168.0.x
NETMASK=255.255.255.0
TYPE=Ethernet

PLEASE NOTE: Pertaining to the IPADDR entry above, a good rule of thumb that I use is to increment the first octet of your server's IP address by one. For example, if your server IP is 10.0.0.100 I would make the InfiniBand IP address for IB port 0 11.0.0.100, and use the correct subnet mask for your IP address range. (A worked example for a second node follows below.)
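
As a worked example of that rule of thumb, here is what ifcfg-ib0 might look like on a second node whose regular IP is 10.0.0.101; the addresses are hypothetical, and TYPE is kept exactly as written in the file above:

DEVICE=ib0
ONBOOT=yes
IPADDR=11.0.0.101
NETMASK=255.255.255.0
TYPE=Ethernet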

  • Finally, save the ifcfg-ib0 file by performing the following keystroke commands:
    • Press Shift and then :
    • Next type wq
      • Here is a link to a blog article that I wrote listing several vi commands that I use the most (you will need to scroll down; they are under Step 2).

http://www.padtinc.com/blog/post/2011/01/07/IT-Tips-Setting-Up-FreeNX-for-Remote-Access-to-Linux-Machines-from-Windows.aspx

Step 4: Verify that the following InfiniBand services start up on reboot:

  • Open the Service Configuration manager: from the menu, click System > Administration > Services.
    • Select start on boot.
  • Select opensm & openib on the MASTER node.
    • Please note: only one subnet manager needs to be running on any given InfiniBand interconnect network.
  • Select openib on each additional server within the InfiniBand interconnect network.

f6

  • Select the check box. Click Start and click Save
  • Restart servers

Done – The installation and configuration of the Infiniband card is now completed.

Important Linux InfiniBand commands that I used during the installations

  • lspci – lists all of the PCI device IDs
  • ibv_devinfo – checks that the IB driver is running on all nodes and shows port status
  • sminfo – checks whether the subnet manager is running
  • ibchecknet – checks the network connectivity status
  • ibdiagnet – performs a set of tests on the IB network
  • ibhosts – simple discovery of IB hosts
  • ibstat – checks the state of the local IB port
  • ibnodes – discovery of IB nodes
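
After the final reboot, a quick sanity-check sequence using the commands above might look like this (run on the master node; output will vary with your hardware):

ibstat          # local port state should report Active / LinkUp
ibv_devinfo     # confirms the IB driver sees the MT26428 and shows port status
sminfo          # confirms the opensm subnet manager is running
ibhosts         # should list both servers on the fabric
ibdiagnet       # optional: run the full set of fabric tests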

IT Tips – Is SP1 for Windows 7 Going to Slow Down my Machine?

ANSYS R13 Le Mans CPU Wall Clock Benchmark Results Before and After a Windows 7 64-bit SP1 Update

I was curious what sort of impact the Windows 7 SP1 upgrade would have on a new install of ANSYS R13. Would the Service Pack upgrade have a positive or negative impact on ANSYS R13 and, additionally, on the Le Mans benchmarks? Here is what we found:

  • Hardware test machine: Single socket 6 core AMD C32 running 2.6GHz, 8GB of RAM, 2 x 160GB Intel SSD drives in RAID0, Windows 7 Professional
  • Benchmarks completed by Clinton Smith, Consulting Mechanical Engineer CFD and Thermal Analysis at PADT, Inc.

Results:

  • The benchmark results appear approximately the same before and after the install of the Service Pack.
  • No apparent issues or crashes with ANSYS R13 after installing the Windows 7 Service Pack 1 update.
  • Six-core and one-core benchmarks are slightly faster after the SP1 upgrade.
  • Four-core and two-core benchmarks are slightly slower after the SP1 upgrade:

ANSYS Le Mans Model, 1,864,025 Nodes

Cores | Wall Time (s) Pre SP1 | Wall Time (s) With SP1 | Diff (s) | % Diff
1     | 7,220                 | 7,190                  | -30      | -0.4%
2     | 3,970                 | 3,980                  | +10      | +0.3%
4     | 2,300                 | 2,320                  | +20      | +0.9%
6     | 1,850                 | 1,840                  | -10      | -0.5%

Conclusion:

It appears that overall the impact of the upgrade is positive, with no system stability or ANSYS R13 issues related to the upgrade.

Caps and Limits on Hardware Resources in Microsoft Windows

Sometime around 3am last October I found myself beating my head up against a server rack, frustrated with trying to figure out what was limiting my server hardware. I was aware of a couple of limits that Microsoft had placed into its OS software; however, I had no idea how far-reaching the limits were. I figured it would be best if I had a better understanding of these hardware limits, so I researched the caps that are placed on the hardware by two of the most popular Operating Systems on the planet: Microsoft Windows 7, the Windows Server 2008 R2 editions, and RHEL.

I have compiled this easy to read superlative analysis of my facts and findings. Read on, I know you are curious to find out what I’ve uncovered.

Enjoy!


“They capped and limited US!”

Microsoft Windows Operating Systems

  • Windows 7
    • Professional / Enterprise / Ultimate
      • Processor: 2 Socket limit (many cores)
      • Core limits:
        • 64-bit: 256 max quantity of cores in 1 physical processor
        • 32-bit: 32 max quantity of cores in 1 physical processor
      • RAM: 192 GB limit to amount of accessible RAM
    • Home Premium
      • RAM: 16 GB
    • Home Basic
      • RAM: 8 GB
    • Starter Edition
      • RAM: 2 GB
  • Windows Server 2008
    • Standard & R2
      • Processor: 4 socket limit (many cores)
        • (4 parts x 12 cores) = 48 cores
      • RAM: 32 GB
    • Windows Server 2008 R2 Foundation (R2 releases are 64-bit only)
      • RAM: 128 GB
    • HPC Edition 2008 R2 (R2 releases are 64-bit only)
      • RAM: 128 GB
    • Windows Server 2008 R2 Datacenter (R2 releases are 64-bit only)
      • Processor: 8 socket limit
      • RAM: 2 TB
    • Windows Server 2008 R2 Enterprise (R2 releases are 64-bit only)
      • Processor: 8 socket limit
      • RAM: 2 TB

Red Hat Enterprise Linux – 64-bit

  • Red Hat defines a logical CPU as any schedulable entity, so every core/thread in a multi-core/thread processor is a logical CPU.
  • This information reflects product defaults, not the maximums of a fully licensed/subscribed RHEL product.
  • Desktop
    • Processor: 1-2 CPU
    • RAM: 64 GB
  • Basic
    • Processor: 1-2 CPU
    • RAM: 16 GB
  • Enterprise
    • Processor: 1-8 CPU
    • RAM: 64 GB
  • *** Red Hat Enterprise Linux: Red Hat would be happy to create custom subscriptions with yearly fees for other configurations to fit your specific environment. Please contact Red Hat to check on costs.

References

http://msdn.microsoft.com/en-us/library/aa366778(VS.85).aspx#memory_limits

http://www.microsoft.com/hpc/en/us/default.aspx

http://www.redhat.com/rhel/compare/

IT Tips: Setting Up FreeNX for Remote Access to Linux Machines from Windows

How To Install FreeNX with Key-based authentication on CentOS 5.5 in a Windows Platform Environment

 

In this article I will show you how to install and configure one of our more popular remote access programs here at PADT, Inc. The software that I will be showing you how to install is called FreeNX. FreeNX is the open-source version of a product distributed by the company NOMACHINE, http://www.nomachine.com/. Another program that the analysts have struggled with over the years, and that is still a favorite with some of the analysts here at PADT, Inc., is CYGWIN, http://www.cygwin.com/. However, the analysts seem to prefer the fast interface and overall robustness of the FreeNX server.

Within this guide I will attempt to break the install down into two components so that Network Administrators and IT Managers are able to have this up and running in 30 minutes. For my installation I used the How To Install NX Server using FreeNX guide freely provided by the CentOS wiki as a reference: http://wiki.centos.org/HowTos/FreeNX

Of course, it's okay to install the FreeNX NX server using the graphical user interface.

STEP 1 – SERVER: NOMACHINE FREENX SERVER INSTALLATION USING GRAPHICAL INTERFACE

1. Install FreeNX on your CentOS 5.5 server. My test NX server is a Dell PowerEdge 1950 server with an install of CentOS 5.5 64-bit edition.

   a. Console installation

      i. Log in to the console as root.

      ii. Click Applications > Add/Remove Software.

      iii. You will see the following dialogue box.

      image

      iv. On the left-hand side, scroll down the list until you see CentOS extras. In the right window select FreeNX and NX, then select Apply. A screen shot is shown below.

      image

      v. Allow the dependencies check to run. When you see the dialogue prompting you to Cancel or Continue, select Continue.

      image

      vi. Next, allow the system to download and install the packages.

      image

      vii. When it has completed, the check marks on FreeNX and NX will be selected and you can close out of the package manager.

STEP 2  – SERVER: Terminal Session command line changes for FreeNX Key-based authentication

1. You may have thought you could get away with a GUI-only installation of FreeNX, but not this time.

2. Click Applications > Accessories > Terminal

   image

3. For my server example I chose to use vi as my text editor; however, you may use whatever text editor you like. I guarantee any Linux user will immediately be impressed if you open up a terminal session and start editing the file using vi.

First let's go over my most used vi commands: vi basics for Windows Network Administrators

   i. i = insert text

   ii. d = delete text

   iii. d[space bar] = delete the one character to the right

   iv. dd = oops, did I just delete the entire line?

   v. :q! = Eek! Get me out of here right now, I screwed up the file really badly. No, I don't want to save the file!

   vi. :wq! = Yes, save the file right now

   vii. For further research and light reading please go here: http://www.uic.edu/depts/accc/software/unixgeneral/vi101.html

Editing the sshd_config file using vi

b. Within your open terminal window type:

c. cd /etc/ssh/

   image

d. Log in as root, because you are the Network Administrator or IT Manager. Hopefully you are already logged in as root.

e. Next type vi sshd_config (reference or print out David's most used vi commands first!)

   image

4. Now don't panic… Using your down-arrow key, tap down until you get to the line that reads PasswordAuthentication. Remove the # mark by pressing (gently) d and then the space bar. This will remove the # (remark) comment.

5. Per the documentation, modify this if you have disallowed ssh cleartext tunneled passwords. You will need to make the PasswordAuthentication line look like the highlighted text above.

6. Next, add the following line below the text PasswordAuthentication yes

   i. Within vi press i

      1. This will put you into INSERT mode.

      2. Add the text AllowUsers nx

      3. Add any additional users, similar to how I have it above.

         a. i.e. AllowUsers nx userid1 userid2 ansys

      4. Now that you have that text added, use the vi basics for Windows Network Administrators command list from above and press :wq!
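
After these edits, the relevant portion of /etc/ssh/sshd_config should look something like the snippet below (the user names are just the examples used in this guide; substitute your own):

PasswordAuthentication yes
AllowUsers nx userid1 userid2 ansys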

Configure the NX database to allow passthrough authentication.

Editing the node.conf file using vi

7.       Next, we need to edit the node.conf file within the /etc/nxserver/ folder

a. You should be back at your terminal session.

b. Within your open terminal window type:

c. cd /etc/nxserver/

   i. image

   ii. vi node.conf

      1. With your vi editing session open, tap down to the place in the file that reads ## Authentication / Security directives

         a. image

         b. Un-remark (remove the #) from the ENABLE_PASSDB_AUTHENTICATION="0" line and modify it to read as highlighted above:

            i. ENABLE_PASSDB_AUTHENTICATION="1"

Add your nx users to the NX Database

Add yourself to the nxserver database.

Suppose your username is ansys

·         [root@ben1]# nxserver --adduser ansys

·         NX> 100 NXSERVER – Version 1.5.0-60 OS (GPL)

·         NX> 1000 NXNODE – Version 1.5.0-60 OS (GPL)

·         NX> 716 Public key added to: /home/ansys/.ssh/authorized_keys2

·         NX> 1001 Bye.

·         NX> 999 Bye

Assign a password for the userids.

Add your nx server password to the NX Database

·         [root@ben1]# nxserver --passwd ansys

·         NX> 100 NXSERVER – Version 1.5.0-60 OS (GPL)

·         New password:

·         Password changed.

·         NX> 999 Bye

Verify that you have added userid1 on the AllowUsers line in the /etc/ssh/sshd_config file, and then reload sshd:

·         In your Terminal session type:

·         service sshd reload

 

·         image

 

STEP 3 – CLIENT INSTALLATION

·         You can download FreeNX windows client from here and install. Follow the instructions.

Key points to remember on your Windows Client Installation:

 

·         Follow the excellent instructions provided by NOMACHINE; however:

·         I suggest that you change the Desktop setting to GNOME.

o    ANSYS, Inc. installations prefer the GNOME desktop. Also adjust your bandwidth slider to match your current network connection.

·         I choose:

o   Host: your NX Server

o    LAN

o   Display of 1024×768 (or all available)

·         image image

 

The critical piece to finish the installation is to copy and paste the client key from the nxserver into your Windows Client install.

·         Located under the GENERAL tab (see image above), click Key… and delete the key within your client installation.

o   Paste in the key from your new nx server install

o   Next, click Save.

·         To locate your nxserver key copy the text out of the file located in the directory:

o   /etc/nxserver/

§  vi client.id_dsa.key

§  copy the text out

o   client.id_dsa.key – Copy all of the text from this file and paste it into the Key.. file on your Windows Client installation.

§  As root user – highlight the text and Copy then paste into your Client

§  Again the location of the client.id_dsa.key is below

o   image
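
A quick way to get the key text on screen for copying (run as root on the NX server):

cat /etc/nxserver/client.id_dsa.key    # paste this entire output into the Key… field of the Windows client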