CoresOnDemand: Helping Engineers Do Their Magic

Engineers Do Magic

In the world of simulation there are two facts of life. First, the deadline of “yesterday would be good” is not uncommon. Funding deadlines, product roll-out dates, and unexpected project requirements are all reliable sources of last-minute changes. Engineers are expected to do quality work and deliver reliable results with limited time and resources. In essence, to perform sorcery.


Second, the size and complexity of models can vary wildly. Anything from fasteners and gaskets to complete systems or structures can be in the pipeline. Engineers can be looking at any combination of hundreds of variables that impact the resources required for a successful simulation.

Required CPU cores, RAM per core, interconnect speeds, available disk space, operating system and ANSYS version all vary depending on the model files, simulation type, size, run-time and target date for the results.

Engineers usually manage the magic. But sometimes limited time, or resources that are out of reach, can delay on-time delivery of project tasks.

At PADT, We Can Help

PADT Inc. has been nostrils deep in engineering services and simulation products for over 20 years. We know engineering, we know how to simulate engineering and we know ANSYS very well. To address the challenges our customers are facing, in 2015 PADT introduced CoresOnDemand to the engineering community.


CoresOnDemand combines our proven CUBE clusters, ANSYS simulation tools, and PADT's experience and support into an on-demand simulation resource. By focusing on the specific needs of ANSYS users, CoresOnDemand was built to deliver performance and flexibility for the full range of applications. Specifics about the clusters and their configurations can be found at CoresOnDemand.com.

CoresOnDemand is a high performance computing environment purpose built to help customers address numerical simulation needs that require compute power that isn’t available or that is needed on a temporary basis.

Call Us, We’re Nice

CoresOnDemand is a new service in the world of on-demand computing. Prospective customers just need to give us a call or send us an inquiry here to get all of their questions answered. The engineers behind CoresOnDemand have a deep understanding of the ANSYS tools and distributed computing, and are able to assess and properly size a compute environment that matches the needed resources.

Call us, we’re nice!

Two Halves of the Nutshell

The process for executing a lease on a CoresOnDemand cluster is quite straightforward. There are two parts to a lease:

PART 1: How many cores & how long is the lease for?

By working with the PADT engineers – and possibly benchmarking their models – customers can set a realistic estimate of how many cores are required and how long their models need to run on the CoresOnDemand clusters. Normally, leases are in one-week blocks with incentives for longer or regular lease requirements.

Clusters are leased in one-week blocks, but we’re flexible.

PART 2: How will ANSYS be licensed?

An ANSYS license is required to run in the CoresOnDemand environment. A license lease can be generated by contacting any ANSYS channel partner. PADT can generate license leases in Arizona, Colorado, New Mexico, Utah & Nevada. Licenses can also be borrowed from the customer's existing license pool.

An ANSYS license may be leased from an ANSYS channel partner or borrowed from the customer's existing license pool.

Using the Cluster

Once the CoresOnDemand team has completed the cluster setup and user creation (a couple of hours in most cases), customers can log in and begin using the cluster. The CoresOnDemand clusters allow customers to use the connection method they are comfortable with. All connections to CoresOnDemand are encrypted and are protected by a firewall and an isolated network environment.

Step 1: Transfer files to the cluster:

Files can be transferred to the cluster using Secure Copy Protocol (SCP), which creates an encrypted tunnel for copying files. A graphical tool is also available for Windows users (and it's free!). Larger files can also be loaded onto the cluster manually by sending a DVD, Blu-ray disc or external storage device to PADT. The CoresOnDemand team will mount the volume and can assist in the copying of data.
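
For reference, here is a minimal sketch of what that transfer looks like when scripted. It wraps the standard OpenSSH scp client in Python; the host name, user name and paths are placeholders for illustration, not actual CoresOnDemand addresses.

```python
# Minimal sketch: push a model file to the cluster over SCP, then pull results back.
# The host, user and paths below are hypothetical placeholders - use the values
# provided with your lease. Requires the standard OpenSSH "scp" client.
import subprocess

HOST = "cluster.example.com"   # placeholder, not a real CoresOnDemand address
USER = "jdoe"                  # placeholder account name

def push(local_path: str, remote_path: str) -> None:
    """Copy a local file to the cluster over an encrypted SCP tunnel."""
    subprocess.run(["scp", local_path, f"{USER}@{HOST}:{remote_path}"], check=True)

def pull(remote_path: str, local_path: str) -> None:
    """Copy a results file from the cluster back to the local machine."""
    subprocess.run(["scp", f"{USER}@{HOST}:{remote_path}", local_path], check=True)

if __name__ == "__main__":
    push("bracket_model.cdb", "/scratch/jdoe/bracket_model.cdb")
    # ... run the solve on the cluster ...
    pull("/scratch/jdoe/bracket_results.rst", "bracket_results.rst")
```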

Step 2: Connect to the cluster and start jobs

Customers can connect to the cluster through an SSH connection. This is the most basic interface, where users can launch interactive or batch processing jobs on the cluster. SSH is secure, fast and very stable. The downside of SSH is that it has limited graphical capabilities.

Another option is to use the NICE Software Desktop Cloud Visualization (DCV) interface. DCV provides enhanced interactive 2D/3D access over a standard network. It enables users to access the cluster from anywhere, on virtually any device with a screen and an internet connection. The main advantage of DCV is the ability to start interactive ANSYS jobs and monitor them without the need for a continuous connection. For example, a user can connect from a laptop to launch the job and later use an iPad to monitor the progress.


Figure 1. 12 Million cell model simulated on CoresOnDemand

The CoresOnDemand environment also has the Torque resource manager installed, so customers can submit multiple jobs to a job queue and run them in sequence without any manual intervention.
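
To show how batch submission through Torque typically looks, here is a hedged sketch that writes a small job script and hands it to the scheduler with qsub. The node counts, wall time and solver launch line are example values only; match them to your lease and installed ANSYS release. Job status can then be checked with the scheduler's own tools (for example, qstat) or through DCV.

```python
# Minimal sketch: queue a distributed ANSYS Mechanical APDL run through Torque.
# Node/core counts, wall time and the solver command line are example values only -
# match them to your lease and installed ANSYS release.
import subprocess
from pathlib import Path

JOB_SCRIPT = """#!/bin/bash
#PBS -N bracket_solve
#PBS -l nodes=2:ppn=16
#PBS -l walltime=12:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
# Example MAPDL batch launch; swap in the solver and options you actually use.
ansys150 -b -dis -np 32 -i bracket_model.dat -o bracket_model.out
"""

def submit(workdir: str) -> str:
    """Write the job script into the working directory and submit it with qsub."""
    script = Path(workdir) / "bracket_solve.pbs"
    script.write_text(JOB_SCRIPT)
    result = subprocess.run(["qsub", str(script)], cwd=workdir,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()   # Torque prints the new job ID, e.g. 1234.headnode

if __name__ == "__main__":
    print("Submitted job:", submit("/scratch/jdoe/bracket"))
```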

Customers can use SCP or ship external storage to get data onto the cluster. SSH or DCV can be used to access the cluster. Batch, interactive or the Torque scheduler can be used to submit and monitor jobs.

All Done?

Once the simulation runs are completed, customers usually choose one of two methods to transfer data back: download the results over the internet using SCP (mentioned earlier), or have external media shipped back (external media can be encrypted if needed).

After the customer receives the data and confirms that all useful data was recovered from the cluster, CoresOnDemand engineers re-image the cluster to remove all user data, user accounts and logs. This marks the end of the lease engagement and customers can rest assured that CoresOnDemand is available to help…and it’s pretty fast too.

At the end of the lease customers can download their data or have it shipped on external media. The cluster is later re-imaged and all user data, accounts & logs are also deleted in preparation for the next customer.


Five Ways CoresOnDemand is Different than the Cloud

In a recent press release, PADT Inc. announced the launch of CoresOnDemand.com. CoresOnDemand offers CUBE simulation clusters for customers’ ANSYS numerical simulation needs. The clusters are designed from the ground up for running ANSYS numerical simulation codes and are tested and proven to deliver performance results.


POWERFUL CLUSTER INFRASTRUCTURE

The current clusters available as part of the CoresOnDemand offering are:
1- CoresOnDemand – Paris:

An 80-core, Intel-based cluster. Built on the Intel Xeon E5-2667 v2 3.30GHz CPUs, the cluster utilizes a 56Gbps InfiniBand interconnect and runs a modified version of CentOS 6.6.


2- CoresOnDemand – Athena:

A 544-core, AMD-based cluster. Built on the AMD Opteron 6380 2.50GHz CPUs, the cluster utilizes a 40Gbps InfiniBand interconnect and runs a modified version of CentOS 6.6.


Five Key Differentiators

The things that make CoresOnDemand different than most other cloud computing providers are:

  1. CoresOnDemand is a non-traditional cloud. It is not an instance based cluster. There is no hypervisor or any virtualization layer. Users know what resources are assigned exclusively to them every time. No layers, no emulation, no delay and no surprises.
  2. CoresOnDemand utilizes all of the standard software designed to maximize the full use of hardware features and interconnect. There are no layers between the hardware and operating system.
  3. CoresOnDemand utilizes hardware that is purpose built and benchmarked to maximize performance of simulation tools instead of a general purpose server on caffeine.
  4. CoresOnDemand provides the ability to complete high performance runs on the specialized compute nodes and later perform post-processing on a node suited to post-processing.
  5. CoresOnDemand is a way to lease compute nodes completely and exclusively for the specified duration including software licenses, compute power and hardware interconnect.

CoresOnDemand is backed by over 20 years of PADT Inc. experience and engineering know-how. Looking at the differentiating features of CoresOnDemand, it becomes apparent that the performance and flexibility of this solution are great advantages for addressing numerical simulation requirements of any type.

To learn more visit www.coresondemand.com or fill out our request form.

Or contact our experts at coresondemand@padtinc.com or 480.813.4884 to schedule a demo or to discuss your requirements.


Announcing CoresOnDemand.com – Dedicated Compute Power when you Need It

We are pleased to announce a new service that we feel is remote solving for FEA and CFD done right: CoresOnDemand.com. We have taken our proven CUBE Simulation Computers and built a cluster that users can simply rent. So you get fast hardware, you get it all to yourself, and you receive fantastic support from the ANSYS experts at PADT.

It is not a time-share system, and it is not a true "cloud" solution. You tell us how many nodes you need and for how long, and we rent them to you. You can submit batch jobs or you can configure the machines however you need them. Submit on the command line, through a batch scheduler, or run interactive. And when you are done, you do not have to send your files back to your desktop. We've loaded NICE DCV so you can do graphics-intensive pre- and post-processing from work or home, over the internet to our head nodes. You can even work through your iPad.


If you visit our Blog page a lot, you may have noticed the gray cloud logo with a big question mark next to it. If you guessed that was a hint that we were working on a cloud solution for ANSYS users, you were correct. We've had it up and running for a while but we kept "testing" it with  benchmarks for people buying CUBE computers. Plus we kept tweaking the setup to get the best user experience possible.  With today's announcement we are going live.

We created this service for a simple reason. Customers kept calling or emailing and asking if they could rent time on our machines.  We got started with the hardware but also started surveying and talking to users. Everyone is talking about the cloud and HPC, but we found few providers understood how to deliver the horsepower people needed in a usable way, and that users were frustrated with the offerings they had available. So we took our time and built a service that we would want to use, a service we would find considerable value in.

Simulation Hardware – ANSYS Expertise – Dependability

You can learn more by visiting www.CoresOnDemand.com, or by reading the official press release included below. To get you started, here are some key facts you should know:

  1. We are running PADT CUBE computers, hooked together with InfiniBand. They are fast, they are loaded with RAM, and they have a ton of disk space. Since we do this type of solving all the time, we know what is needed.
  2. This is a Bring Your Own License (BYOL) service. You will need to lease the licenses you need from whoever you get your ANSYS from.  As an ANSYS Channel partner we can help that process go smoothly.
  3. You do not share the hardware.  If you reserve a node, it is your node. No one else but your company can log in.  You can rent by the week, or the day.
  4. When you are done, we save the data you want us to save and then wipe the machines.  If you want us to save your "image" we can do that for a fee so next time you use the service, we can restore it to right where you were last time.
  5. Right now we are focused on ANSYS software products only. We feel strongly about focusing on what we know and maximizing value to the customers.
  6. This service is backed by PADT's technical support and IT staff. You would be hard pressed to find any other HPC provider out there who knows more about how to run ANSYS Mechanical, ANSYS Mechanical APDL, ANSYS FLUENT, ANSYS CFX, ANSYS HFSS, ANSYS MAXWELL, ANSYS LS-DYNA, ANSYS AUTODYN, ICEM CFD, and much more.

To talk to our team about running your next big job on CoresOnDemand.com, contact us at 480-813-4884 or email cod@padtinc.com.


See the official Press Release here

Press Release:

CoresOnDemand.com Launches as Dedicated ANSYS Simulation High Performance Cloud Compute Resource

PADT launches CoresOnDemand.com, a dedicated resource for users who need to run ANSYS simulation software in the cloud on optimized high performance computers.

Tempe, AZ – April 29, 2015 – Phoenix Analysis & Design Technologies, Inc. (PADT), the Southwest’s largest provider of simulation, product development, and 3D Printing services and products, is pleased to announce the launch of a new dedicated high performance compute resource for users of ANSYS simulation software – CoresOnDemand.com.  The team at PADT used their own experience, and the experience of their customers, to develop this unique cloud-based solution that delivers exceptional performance and a superior user experience. Unlike most cloud solutions, CoresOnDemand.com does not use virtual machines, nor do users share compute nodes. With CoresOnDemand.com users reserve one or more nodes for a set amount of time, giving them exclusive access to the hardware, while allowing them to work interactively and to set up the environment the way they want it.

The cluster behind CoresOnDemand.com is built by PADT’s IT experts using their own CUBE Simulation Computers (http://www.padtinc.com/cube), systems that are optimized for solving numerical simulation problems quickly and efficiently. This advantage is coupled with support from PADT’s experienced team, recognized technical experts in all things ANSYS. As a certified ANSYS channel partner, PADT understands the product and licensing needs of users, a significant advantage over most cloud HPC solutions.

“We kept getting calls from people asking if they could rent time on our in-house cluster. So we took a look at what was out there and talked to users about their experiences with trying to do high-end simulation in the cloud,” commented Eric Miller, Co-Owner of PADT. “What we found was that almost everyone was disappointed with the pay-per-cpu-second model, with the lack of product understanding on the part of the providers, and mediocre performance.  They also complained about having to bring large files back to their desktops to post-process. We designed CoresOnDemand.com to solve those problems.”

In addition to exclusive nodes, great hardware, and ANSYS expertise, CoresOnDemand.com adds another advantage by leveraging NICE Desktop Cloud Visualization (https://www.nice-software.com/products/dcv) to allow users to have true interactive connections to the cluster with real-time 3D graphics. This avoids the need to download huge files or run blind in batch mode to review results. And as you would expect, the network connection and file transfer protocols available are industry standards and encrypted.

The initial cluster is configured with Intel and AMD-based CUBE Simulation nodes, connected through a high-speed Infiniband interconnect.  Each compute node has enough RAM and disk space to handle the most challenging FEA or CFD solves.  All ANSYS solvers and prep/post tools are available for use including: ANSYS Mechanical, ANSYS Mechanical APDL, ANSYS FLUENT, ANSYS CFX, ANSYS HFSS, ANSYS MAXWELL, ANSYS LS-DYNA, ANSYS AUTODYN, ICEM CFD, and much more. Users can serve their own licenses to CoresOnDemand.com or obtain a short-term lease, and PADT’s experts are on hand to help design the most effective licensing solution.

Pre-launch testing by PADT’s customers has shown that this model for remote on-demand solving works well.  Users were able to log in, configure their environment from their desktop at work or home, mesh, solve, and review results as if they had the same horsepower sitting right next to their desk.

To learn more about CoresOnDemand, visit http://www.coresondemand.com, email cod@padtinc.com, or contact PADT at 480.813.4884.

About Phoenix Analysis and Design Technologies

Phoenix Analysis and Design Technologies, Inc. (PADT) is an engineering product and services company that focuses on helping customers who develop physical products by providing Numerical Simulation, Product Development, and Rapid Prototyping solutions. PADT’s worldwide reputation for technical excellence and experienced staff is based on its proven record of building long term win-win partnerships with vendors and customers. Since its establishment in 1994, companies have relied on PADT because “We Make Innovation Work. “  With over 75 employees, PADT services customers from its headquarters at the Arizona State University Research Park in Tempe, Arizona, and from offices in Littleton, Colorado, Albuquerque, New Mexico, and Murray, Utah, as well as through staff members located around the country. More information on PADT can be found at http://www.PADTINC.com.

Using Bright CM to Manage a Linux Cluster

What goes into managing a Linux HPC (High Performance Computing) cluster?

There is an endless list of software, tools and configurations that are required or recommended for efficiently managing a shared HPC cluster environment.

A shared HPC cluster typically has many layers that deliver a usable environment that doesn’t have to  depend on the users coordinating closely or the system administrators being superheroes of late-night patching and just-in-time recovery.


Figure 1. Typical layers of a shared HPC cluster.

For each layer in the diagram above there are numerous open-source and paid software tools to choose from. The thing to note is that it's not just a matter of picking tools: system administrators have to weigh the user requirements, compatibility tweaks, and ease of implementation and use to come up with a perfect recipe (much like carrot cake). Once the choices have been made, users and system administrators have to train, learn and start utilizing these tools.

HPC @ PADT Inc.

At PADT Inc. we have several Linux-based HPC clusters that are in high demand. Our clusters are based on the CUBE High Value Performance Computing (HVPC) systems and are designed to optimize the performance of numerical simulation software. We were facing several challenges that are common with building and maintaining HPC clusters, mainly in the areas of security, imaging and deployment, resource management, monitoring and maintenance.

To solve these challenges there is an endless list of software tools and packages, both open-source and commercial. Each one of these tools comes with its own steep learning curve, and the time required to test and implement it adds up quickly.

Enter – Bright Computing

After testing several tools we came across Bright Cluster Manager (Bright CM) from Bright Computing. Bright CM eliminates the need for system administrators to manually install and configure the most common HPC cluster components. On top of that, it provides the majority of the HPC software packages, tools and software libraries in its default software image.

A Bright CM cluster installation starts off with an extremely useful installation wizard that asks all of the right questions while giving the user full control to customize the installation. With a note pad, a couple of hours and a basic understanding of HPC clusters, you are ready to install your applications.


Figure 2. Installation Wizard

An all-knowing dashboard helps system admins manage and monitor the cluster(s), or, if you prefer the CLI, the CM shell provides full functionality through the command line. From the dashboard, system admins can manage multiple clusters down to the finest details.


Figure 3. Cluster Management Interface.

An extensive cluster monitoring interface allows system admins, users and key stakeholders to generate and view detailed reports about the different cluster components.


Figure 4. Cluster Monitoring Interface.

Bright CM has proven to be a valuable tool in managing and optimizing our HPC environment. For further information and a demo of Bright Cluster Manager please contact sales@padtinc.com.

From Piles to Power – My First PADT PC Build

Welcome to the PADT IT Department – now build your own PC

[Editor's Note: Ahmed has been here a lot longer than 2 weeks, but we have been keeping him busy, so he is just now finding the time to publish this.]

I have been working for PADT for a little over 2 weeks now. After taking the ceremonial office tour that left me with a fine white powder all over my shoes (it's a PADT Inc. special treat), I was taken to meet my team: David Mastel – "My Boss" for short – who is the IT commander-in-chief at PADT Inc., and Sam Goff, the all-knowing systems administrator.

I was shown to a cubicle that reminded me of the shady computer “recycling” outfits you’d see on a news report highlighting the vast amounts of abandoned hardware; except there were no CRT (tube) screens or little children working as slave labor.

Sacred Tradition

This tradition started with Sam, then Manny, and now it was my turn to take this rite of passage. As part of the PADT IT department, I am required by sacred tradition to build my own desktop with my bare hands – then I was handed a screwdriver.

My background is mixed and diverse, but the places I have worked mostly had one thing in common: we depended on pre-built servers, systems and packages. Branded machines have an embedded promise of reliability, support and superiority over custom-built machines.

What most people don't know about branded machines is that they carry two pretty heavy tariffs:

  1. First, you are paying upfront for the support structure, development, R&D and supply chains that are required to pump out thousands of machines.
  2. Second, because these large companies are trying to maximize their margins, they will look for a proprietary, cost-effective configuration that will:
    1. Most probably fail or become obsolete as close as possible to the 3-year “expected” life-span of computers.
    2. Lock users into buying any subsequent upgrade or spare part from them.

Long story short, the last time I fully built a desktop computer was back in college, when a 2GB hard disk was a technological breakthrough and we could only imagine how many MP3s we could store on it.

The Build

There were two computer cases on the ground. One resembled a 1990 Mercury Sable that was, at best, tolerable as a new car; the other looked more like a 1990 BMW 325ci – a little old, but carrying a heritage and the potential to be great once again.

So, with my obvious choice for a case, I began to collect parts from the different bins and drawers, and I was immediately shocked at how “organized” this room really was. So I picked up the following:

There are a few things that I would have chosen differently, but they were either not available at the time of the build or ridiculous for a work desktop:

  • Replaced 2 drives with SSD disks to hold OS and applications
  • Explored a more powerful Nvidia card (not really required but desired)

So after a couple of hours of fidgeting and checking manuals, this is what the build looks like.
[Photo: the finished build]

(The case above was the first prototype ANSYS numerical simulation workstation in 2010. It has a special place in David's heart.)

Now to the Good STUFF! – Benchmarking the rebuilt CUBE prototype

ANSYS R15.0.7 FEA Benchmarks

Below are the results for the v15sp5 benchmark running distributed parallel on 4 cores.
[Chart: v15sp5 benchmark results]

ANSYS R15.0.7 CFD Benchmarks

Below are the results for the aircraft_2m benchmark using parallel processing on 4 cores.
[Chart: aircraft_2m benchmark results]

This machine is a really cool sleeper computer that is more than capable of handling whatever I throw at it.

The only thing that worries me is that when Sam handed me the case to get started, David was trying – but failing – to hide a smile that makes me feel there is something obviously wrong with my first build that I failed to catch. I guess I will just wait and see.

Home Grown HPC on CUBE Systems


A Little Project Background

Recently I’ve been working on developing a computer vision system for a long-standing customer. We are developing software that enables them to use computers to “see” where a particular object is in space, and accurately determine its precise location with respect to the camera. From that information, they can do all kinds of useful things.

In order to figure out where something is in 3D space from a 2D image you have to perform what is commonly referred to as pose estimation. It’s a highly interesting problem by itself, but it’s not something I want to focus on in detail here. If you are interested in obtaining more information, you can Google pose estimation or PnP problems. There are, however, a couple of aspects of that problem that do pertain to this blog article. First, pose estimation is typically a nonlinear, iterative process. (Not all algorithms are iterative, but the ones I’m using are.) Second, like any algorithm, its output is dependent upon its input; namely, the accuracy of its pose estimate is dependent upon the accuracy of the upstream image processing techniques. Whatever error happens upstream of this algorithm typically gets magnified as the algorithm processes the input.

The Problem I Wish to Solve

You might be wondering where we are going with HPC given all this talk about computer vision. It’s true that computer vision, especially image processing, is computationally intensive, but I’m not going to focus on that aspect. The problem I wanted to solve was this: Is there a particular kind of pattern that I can use as a target for the vision system such that the pose estimation is less sensitive to the input noise? In order to quantify “less sensitive” I needed to do some statistics. Statistics is almost-math, but just a hair shy. You can translate that statement as: My brain neither likes nor speaks statistics… (The probability of me not understanding statistical jargon is statistically significant. I took a p-test in a cup to figure that out…) At any rate, one thing that ALL statistics requires is a data set. A big data set. Making big data sets sounds like an HPC problem, and hence it was time to roll my own HPC.

The Toolbox and the Solution

My problem reduced down to a classic Monte Carlo type simulation. This particular type of problem maps very nicely onto a parallel processing paradigm known as Map-Reduce. The concept is shown below:
[Diagram: the Map-Reduce concept]

The idea is pretty simple. You break the problem into chunks and you “Map” those chunks onto available processors. The processors do some work and then you “Reduce” the solution from each chunk into a single answer. This algorithm is recursive. That is, any single “Chunk” can itself become a new blue “Problem” that can be subdivided. As you can see, you can get explosive parallelism.
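
To make the pattern concrete, here is a minimal Map-Reduce sketch written with Python's multiprocessing module for illustration (the system described in this article was actually built on node.js, as discussed below). The noisy "pose estimate" inside each chunk is just a stand-in for the real computer vision work.

```python
# Minimal map-reduce sketch of a Monte Carlo study, in Python for illustration only.
# (The system described in this article was actually built on node.js.)
# Each chunk draws noisy samples, "estimates" something from them, and returns a
# partial statistic; the reduce step combines the partial results into one answer.
import random
from multiprocessing import Pool

SAMPLES_PER_CHUNK = 100_000

def map_chunk(seed: int) -> tuple[int, float]:
    """Stand-in for one chunk of work: accumulate squared error of noisy samples."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(SAMPLES_PER_CHUNK):
        noise = rng.gauss(0.0, 0.5)   # stand-in for upstream image-processing noise
        estimate = 1.0 + noise        # stand-in for a pose estimate
        sq_err += (estimate - 1.0) ** 2
    return SAMPLES_PER_CHUNK, sq_err

def reduce_chunks(partials):
    """Combine the per-chunk counts and squared errors into one RMS error."""
    n = sum(count for count, _ in partials)
    total_sq_err = sum(err for _, err in partials)
    return (total_sq_err / n) ** 0.5

if __name__ == "__main__":
    with Pool() as pool:                       # map chunks onto available cores
        partials = pool.map(map_chunk, range(64))
    print("RMS error over", 64 * SAMPLES_PER_CHUNK, "samples:", reduce_chunks(partials))
```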

Now, there are tools that exist for this kind of thing. Hadoop is one such tool. I’m sure it is vastly superior to what I ended up using and implementing. However, I didn’t want to invest at this time in learning a specialized tool for this particular problem. I wanted to investigate a lower level tool on which this type of solution can be built. The tool I chose was node.js (www.nodejs.org).

I’m finding Node to be an awesome tool for hooking computers together in new and novel ways. It acts kind of like the post office in that you can send letters and messages and get letters and messages all while going about your normal day. It handles all of the coordinating and transporting. It basically sends out a helpful postman who taps you on the shoulder and says, “Hey, here’s a letter”. You are expected to do something (quickly) and maybe send back a letter to the original sender or someone else. More specifically, node turns everything that a computer can do into a “tap on the shoulder”, or an event. Things like: “Hey, go read this file for me.”, turns into, “OK. I’m happy to do that. I tell you what, I’ll tap you on the shoulder when I’m done. No need to wait for me.” So, now, instead of twiddling your thumbs while the computer spins up the harddrive, finds the file and reads it, you get to go do something else you need to do. As you can imagine, this is a really awesome way of doing things when stuff like network latency, hard drives spinning and little child processes that are doing useful work are all chewing up valuable time. Time that you could be using getting someone else started on some useful work. Also, like all children, these little helpful child processes that are doing real work never seem to take the same time to do the same task twice. However, simply being notified when they are done allows the coordinator to move on to other children. Think of a teacher in a class room. Everyone is doing work, but not at the same pace. Imagine if the teacher could only focus on one child at a time until that child fully finished. Nothing would ever get done!
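
As a rough analogy only (the cluster glue described here is node.js, not Python), the same "tap on the shoulder" idea can be sketched with Python's asyncio: the coordinator hands out work and simply reacts to each completion event, in whatever order the children finish.

```python
# Rough analogy of the event-driven coordinator, sketched with asyncio.
# (The cluster described in this article actually uses node.js; this just
# illustrates reacting to completion events instead of waiting on each child in turn.)
import asyncio
import random

async def child_work(name: str) -> str:
    """A child-process stand-in that takes an unpredictable amount of time."""
    await asyncio.sleep(random.uniform(0.1, 1.0))
    return f"{name} finished"

async def coordinator() -> None:
    tasks = [asyncio.create_task(child_work(f"chunk-{i}")) for i in range(8)]
    # React to each "tap on the shoulder" as it arrives, in whatever order they finish.
    for finished in asyncio.as_completed(tasks):
        print(await finished)

if __name__ == "__main__":
    asyncio.run(coordinator())
```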

Here is a little graph of our internal cluster at PADT cranking away on my Monte Carlo simulation.
[Graph: PADT's internal cluster cranking away on the Monte Carlo simulation]

It’s probably impossible to read the axes, but that’s 1200+ cores cranking away. Now, here is the real kicker. All of the machines have an instance of node running on them, but one machine is coordinating the whole thing. The CPU on the master node barely nudges above idle. That is, this computer can manage and distribute all this work by barely lifting a finger.

Conclusion

There are a couple of things I want to draw your attention to as I wrap this up.

  1. CUBE systems aren’t only useful for CAE simulation HPC! They can be used for a wide range of HPC needs.
  2. PADT has a great deal of experience in software development both within the CAE ecosystem and outside of this ecosystem. This is one of the more enjoyable aspects of my job in particular.
  3. Learning new things is a blast and can have benefit in other aspects of life. Thinking about how to structure a problem as a series of events rather than a sequential series of steps has been very enlightening. In more ways than one, it is also why this blog article exists. My Monte Carlo simulator is running right now. I’m waiting on it to finish. My natural tendency is to busy wait. That is, spin brain cycles watching the CPU graph or the status counter tick down. However, in the time I’ve taken to write this article, my simulator has proceeded in parallel to my effort by eight steps. Each step represents generating and reducing a sample of 500,000,000 pose estimates! That is over 4 billion pose estimates in a little under an hour. I’ve managed to write 1,167 words…


Slide Rules, Logarithms, and Compute Servers

If any of you have been to PADT’s headquarters in Tempe, Arizona, you probably noticed the giant slide rule in the middle of our building.  You can see a portion of it in the picture below, at the top of our Training, Mentoring, and Support group picture.

[Photo: PADT's Training, Mentoring, and Support group, with the giant slide rule above]

This thing is huge, over 6 feet (2 m) from side to side, in its un-extended position hanging on the wall.

In theory a gigantic slide rule could provide more accuracy, but our trophy, a Keuffel & Esser model 68-1929, copyrighted 1947 and 1961, was intended for teaching purposes in classrooms. Most engineers had essentially pocket-sized or belt-holder-sized slide rules, also known as slip sticks.

For the real thing, here is a picture of a slide rule used by Eric Miller’s father Col. BT Miller while at West Point from 1955 to 1958 as well as during his Master’s program in 1964.

[Photo: Col. BT Miller's slide rule]

Why do we care about the slide rule today?  Have you ever seen World War II aircraft, submarines, or aircraft carriers?  These were designed using slide rules and/or logarithms.  The early space program?  Slide rules were used then too.  Some phenomenal engineering was accomplished by our predecessors using these devices.  Back then the numerical operations were just a tool to utilize their engineering knowledge.  Now I think we have a tendency to focus on the numerical due to its ease of use and impressive presentation, while perhaps forgetting or at least de-emphasizing the underlying engineering.  That’s not to say that we don’t have great engineers out there; rather it’s a call to energize you all to remember, consider, and utilize your engineering knowledge as you use your simulation tools.

By contrast, here is a picture of PADT’s brand new server room, with cluster machines being put together in the big cabinets.  Hundreds of cores.

[Photo: PADT's server room]

What about the giant slide rule?

My father found a thick book at an estate sale a few months ago.  There are a lot of retirees living in Arizona, so estate sales are quite common and popular.  They occur at a life stage when due to death or the need for assisted living, folks are no longer able to live in their home so the contents are sold, clearing out the home and generating some cash for the family.  This particular estate sale was for a retired engineer.  The book caught my father’s eye, first because it was quite thick and second because the title was, Mechanical Engineers’ Handbook.  Figuring it was a bargain for the amazing price of $1.00, he bought it for me.  This book is better known as Marks’ Handbook.  It’s apparently still in publication, at least as late as the 11th Edition in 2006, but the particular edition my father bought for me is the Fifth Edition from 1951.


Although the slide rule is mostly a curiosity to us today, in 1951 it was state of the art for numerical computation. While Marks’ has a couple of paragraphs on “Computing Machines”, described as “electrically driven mechanical desk calculators such as the Marchant, Monroe, or Friden”, the slide rule was what I will call the calculator of choice for mechanical engineers at the beginning of the 2nd half of the 20th century.

As an aside, these mechanical calculators performed multiplication and division, using what I will describe as incredibly complex mechanisms.  Here is a link to a Wikipedia article on the Marchant Calculator:  http://en.wikipedia.org/wiki/Marchant_Calculator

Marks’ Handbook devotes about 3 pages to the operation of the slide rule, starting with simple multiplication and division and then discussing various methods of utilization and various types of slide rules.  It starts off by stating, “The slide rule is an indispensable aid in all problems in multiplication, division, proportion, squares, square roots, etc., in which a limited degree of accuracy is sufficient.” 

The slide rule operates using logarithms. If you’re not familiar with using logarithms then you are probably younger than me, since I recall learning them in math class, probably in junior high, in the late 1970’s. The slide rule uses common logarithms, meaning the log of a number is the exponent to which a base of 10 must be raised to get that number. For example, the common log of 100 is 2. The common log table in the 1951 edition of Marks shows us that the common log of 4.44 is 0.6474. For the sake of completeness, the ‘other’ logarithm is the natural log, meaning the base is the irrational number e, approximated as 2.718.


Getting back to common (base of 10) logs, the math magic is that logarithms allow for shortcuts in fairly complex computations. For example, log (ab) = log a + log b. That means if we want to multiply two fairly complicated numbers, we can simply look up the common log of each and add them together. Similarly, log (a/b) = log a – log b.

Here is an example, which I will keep simple. Let’s say we want to multiply 0.0512 by 0.624. On a calculator this is simple, but what if you are stranded on a remote island and all you have is a log table? Knowing the equations above, you can look up the log of 0.0512, which is 0.7093 – 2, and the log of 0.624, which is 0.7952 – 1. We now add:
(0.7093 – 2) + (0.7952 – 1) = 1.5045 – 3 = 0.5045 – 2

Writing that sum as a positive decimal minus an integer is important to being able to look up the antilogarithm or number whose log is 0.5045 – 2.

Looking up the number whose log is 0.5045, we get 3.195, using a little bit of linear interpolation.  The “-2” tells us to shift the decimal point to the left twice, meaning our answer is 0.03195.  Thus, using a little addition, some table lookup, a bit of in-the-head interpolation, and some knowledge of how to shift decimal points, we fairly easily arrive at the product of two three-digit fractional numbers.  Now you are free to look for more coconuts on the island.  Or maybe get back to a hatch in the ground where you need to type in the numbers 4, 8, 15, 16, 23, and 42 every 108 minutes.  Oops, I’m really becoming Lost here…
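
If your desert island happens to have Python instead of a log table, the same bookkeeping can be checked in a few lines; the 3.195 mantissa and the characteristic of -2 fall right out of math.log10:

```python
# Check the worked example: multiply 0.0512 by 0.624 using common logarithms.
import math

log_a = math.log10(0.0512)          # -1.2907..., i.e. 0.7093 - 2
log_b = math.log10(0.624)           # -0.2048..., i.e. 0.7952 - 1
log_sum = log_a + log_b             # -1.4955..., i.e. 0.5045 - 2

mantissa = log_sum - math.floor(log_sum)    # 0.5045, the positive decimal part
characteristic = math.floor(log_sum)        # -2, which places the decimal point

print(10 ** mantissa)               # ~3.195, the antilogarithm from the table
print(10 ** log_sum)                # ~0.03195, the product via logs
print(0.0512 * 0.624)               # 0.0319488, the direct answer for comparison
```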


Getting back to the slide rule, one way to think of it is as a graphical representation of the log tables.  In its most basic form, the slide rule consists of two logarithmic scales.  By lining up the scales, the log values can be added or subtracted.  For example, if we want to multiply something simple, like 4 x 6, we simply look from left to right on the scale on the ‘fixed’ portion of the slide rule to get to 4, then slide the moving portion so that its 1 lines up with the 4 found on the fixed portion.  We then move left to right on the movable scale to find the 6.  Where the 6 on the movable scale lines up on the fixed portion is our solution, 24.  What we’ve really done is add the log of 4 to the log of 6 and then find the antilog of that result, which is 24.  Now that we’ve found 24, we’re not Lost.

We don’t intend to give detailed instructions on all phases of performing calculations using slide rules here, but hopefully you get the basics of how it is done.  There are plenty of online resources as well as slide rule apps that provide all sorts of details.  Besides multiplication and division, slide rules can be used for squares and square roots.  There are (were) specialty slide rules for other purposes.  Note that with additional knowledge and skill in visually interpolating on a log scale, up to 3 or even 4 significant digits can be determined depending on the size of the slide rule.

The author, attempting to prove that 4 x 6 is indeed 24

After having studied the Marks’ section on slide rules, and experimenting with a slide rule app on an iPad as well as the PADT behemoth on the wall, I conclude that it was a very elegant method for calculating numbers much more quickly than could be done with traditional pencil and paper.  It’s much faster to add and subtract than to do complicated multiplication and long division.  My high school physics teacher actually spent a day or two teaching us how to use slide rules back in the early 1980’s.  By then they had been made functionally obsolete by scientific calculators, so looking back it was perhaps more about nostalgia than the math needed.  It does help me to appreciate the accomplishments made in science and engineering before the advent of numerical computing.

The preparation of this article has made me wonder what the guys and gals who used these tools proficiently back in the 1930’s, 40’s, and 50’s would think if they had access to the kind of compute power we have available today.  It also makes me wonder what people will think of our current tools 50 or 60 years from now.  When I first started in simulation over 25 years ago, it would have seemed quite a stretch to be able to solve simultaneously on hundreds if not thousands of compute cores as can be done today.  Back then we were happy to get time on the one number cruncher we had that was dedicated to ANSYS simulation.

Incidentally, this article was inspired by my colleague David Mastel’s recent blog entry on numerical simulation and how PADT is helping our customers take compute servers and work stations to the next level:

http://www.padtinc.com/blog/the-focus/launch-leave-forget-hpc-and-it-ansys

If you are ever in our PADT headquarters building in Tempe, don’t forget to look for the giant slide rule.  Now you will know its original purpose.

“Launch, Leave & Forget” – A Personal Journey of an IT Manager into Numerical Simulation HPC and how PADT is taking Compute Servers & Workstations to the Next Level

Launch, Leave & Forget was a phrase first introduced in the 1960’s. Basically, the US Government was developing missiles that, once fired, no longer needed to be guided or watched by the pilot. Before that, the fighter pilot was directing the missile mostly by line of sight and calculated guesswork toward a target in the distance, and the pilot often would be shot down or would break away too early from guiding the launch vehicle. Hoping and guesswork are not something we strive for when lives are at stake.

So I say all of that to say this: as it relates to virtual prototyping, Launch, Leave & Forget for numerical simulation is something that I have been striving for at PADT, Inc. – internally and for our 1,800 unique customers that really need our help. We are passionate about empowering our customers to become comfortable, feel free to be creative, and be able to step back and let it go! Many of us have a unique and rewarding opportunity to work with customers from the point of design, or even from the first phone call, onward to virtual prototyping, product development, rapid manufacturing and, lastly, on to something you can bring into the physical world: a physical prototype that has already gone through 5,000 numerical simulations. Unlike the engineers of the 1960’s, who would maybe get one, two or three shots at a working prototype, I think it is amazing that a company can go through 5,000 different prototypes before finally introducing one into the real world.

At PADT I continue to look and search for new ways to Launch, Leave & Forget. One passion of mine is computers. I first started using a computer when I was nine years old, and I was programming in BASIC, creating complex little FOR NEXT statements, before I was in seventh grade. Let’s fast forward… I arrived at PADT in 2005 and was amazed at the small company I had landed in; creativity and innovation were bouncing off the ceiling. I had never seen anything like it! I was humbled on more than one occasion, as most of the ANSYS CFD analysts knew as much about computers as I did! No, not the menial IT tasks like networking, domain user creation and backups. What the PADT CFD/FEA analysts communicated, sometimes loudly, was that their computers were slow! Humbled again, I would retort, “But you have the fastest machine in the building. How could it be slow?! Your machine here is faster than our web server – in fact, this was going to be our new web server.” In 2005, at a stalemate, we would walk away both wondering why the solve was so slow! Over the years I would observe numerous issues. I remember spending hours using the ANSYS numerical simulation software. It was new to me and it was complicated! I would often knock on an analyst’s door and ask if they had a couple of minutes to show me how to run a simulation – for some of the programs I would have to ask two or three times: ANSYS FEA, ANSYS CFX, FLUENT, on and on – often using a round-robin approach because I didn’t want to inconvenience the ANSYS analysts. Then, probably some early morning around 3am, the various ANSYS programs and the hardware all clicked with me. I was off and running ANSYS benchmarks on my own! Freedom!! Now I could experiment with the hardware configs. Armed with the ANSYS FLUENT and ANSYS FEA benchmark suites, I wanted to make the numerical simulations run as fast or faster than they ever imagined possible! I wanted to please these ANSYS guys. Why? Because I had never met anyone like them, and I wanted to give them the power they deserved.

“What is the secret sauce or recipe for creating an effective numerical simulation?”

This is a comment that I would hear often. It could be on a conference call with a new customer, or internally from our own ANSYS CFD and FEA analysts. “David, all I really care about is this: when I click ‘Calculate’ to run within ANSYS, when is it going to complete?” Or, “How can we make this solver run faster?”

The secret sauce recipe? Have we signed an NDA yet? Just kidding. I have had the unique opportunity to observe not just ANSYS but other CFD/FEA codes running on compute hardware, and to learn better ways of optimizing hardware and software. Here is how a typical process for architecting hardware for use with ANSYS software goes.

Getting Involved Early

When the sales guys let me, I am often involved at the very beginning of a qualifying lead opportunity. My favorite time to talk to a customer is when a new customer calls me directly at the office.

Nothing but the facts, sir!

I have years’ worth of benchmarking data. Do your users have any benchmarking data? Quickly have them run one of the ANSYS standard benchmarks. Just one benchmark can reveal to you a wealth of information about their current IT infrastructure.

Get your IT team onboard early!

This is a huge challenge! In general here are a few roadblocks that smart IT people have in place:

IT MANAGER RULES 101

1) No! talking to sales people
2) No! talking to sales people on the phone
3) No! talking to sales people via email
4) No! talking to sales people at seminars
5) If your boss emails or calls and says “please talk to this sales person @vulture & hawk”. Wait about a week. Then if the boss emails back and says “did you talk to this salesperson yet?” Pick up the phone and call sales rep @vulture & hawk.

What is this, a joke? Nope. Most IT groups operate like this. Many are understaffed and in constant fix-it mode. Most say and think like this: “I would appreciate it if you sat in my chair for one day. My phone constantly rings, so I don’t pick it up, or I let it go to voicemail (until the voicemail box fills up). Email constantly swoops in, so it goes to junk mail. Seminar invites and meet-and-greets keep coming in – nope, won’t go. Ultimately I know you are going to try to sell me something.”

Who have they been talking to? Do they even know what ANSYS is? I have been humbled over the years when it comes to hardware. I seriously believed the fastest web server at that moment in time would make a fast numerical simulation server.

If I can get on the phone with another IT manager, 90% of the time the walls come down and we can talk our own language. What do they say to me? Well, I have had IT managers and directors tell me they would never buy a compute cluster or compute workstation from me. “Oh, well, our policy states that we only buy from Big Boy Pants Computers, Inc.,” or “from mom & pop shop #343,” or – the best one – “from the owner’s nephew; he builds computers on the side.” They stand behind their walls of policy and circumstance. But at the end of the calls they are normally asking us to send them a quote.


So, now what?

Well, do you really know your software? Have you spent hours running different hardware configurations of the same workstation? Observing the reads and writes of an eight-drive 600GB SAS3 15k RPM 12Gbps RAID 0 configuration? Is 3 drives for the OS and 5 drives for the solving array the best configuration for the hardware and software? Huh? What’s that?? Oh boy…

Help! My New HPC System is not High Performance!

It is an all too common feeling, that sinking feeling that leads to the phrase “Oh Crap” being muttered under your breath. You just spent almost a year getting management to pay for a new compute workstation, server or cluster. You did the ROI and showed an eight-month payback because of how much faster your team’s runs will be. But now you have the benchmark data on real models, and they are not good. “Oh Crap”

Although this is a frequent problem, and the root causes are often the same, the solutions can vary. In this posting I will try and share with you what our IT and ANSYS technical support staff here at PADT have learned.

Hopefully this article can help you learn what to do to avoid or circumvent any current or future pitfalls if you order an HPC system. PADT loves numerical simulation; we have been doing this for twenty years now. We enjoy helping, and if you are stuck in this situation, let us know.

Wall Clock Time

It is very easy to get excited about clock speeds, bus bandwidth, and disk access latency. But if you are solving large FEA or CFD models you really only care about one thing: wall clock time. We cannot tell you how many times we have worked with customers, hardware vendors, and sometimes developers, who get all wrapped up in the optimization of one little aspect of the solving process. The problem with this is that high performance computing is about working in a system, and the system is only as good as its weakest link.

We see people spend thousands on disk drives and high speed disk controllers, only to discover that their solves are CPU bound, so adding better disk drives makes no difference. We also see people blow their budget on the very best CPUs but not invest in enough memory to solve their problems in-core. This often happens because when they look at benchmark data they focus on one small portion and maximize that measurement, when that measurement often doesn’t really matter.

The fundamental thing that you need to keep in mind while ordering or fixing an HPC system for numerical simulation is this: all that matters is how long it takes in the real world from when you click “Solve” till your job is finished. I bring this up first because it is so fundamental, and so often ignored.
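
When we benchmark, this is literally what we measure: a clock started when the solve is launched and stopped when the last result file is written. Here is a minimal sketch of that measurement; the solver command line and paths are placeholders, so substitute whatever your users actually run.

```python
# Minimal sketch: measure end-to-end wall clock time for a solve.
# The command and paths below are placeholders - substitute the exact solver
# invocation your users run, so the number reflects the whole system.
import subprocess
import time

def timed_solve(cmd: list[str], workdir: str) -> float:
    """Run one solve and return the wall clock seconds from launch to finish."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=workdir, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = timed_solve(
        ["ansys150", "-b", "-dis", "-np", "16", "-i", "benchmark.dat", "-o", "benchmark.out"],
        "/scratch/benchmarks/run01",
    )
    print(f"Wall clock time: {elapsed / 60.0:.1f} minutes")
```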

The Causes

As mentioned above, an HPC server or cluster is a system made up of hardware, software, and the people who support it, and it is only as good as its weakest link. The key to designing or fixing your HPC system is to look at it as a system, find the weakest links, and improve each link’s performance. (OK, who remembers the “Weakest Link” lady? You know you kind of miss her…)

In our experience we have found that the cause for most poorly performing systems can be grouped into one of these categories:

  • Unbalanced System for the Problems Being Solved

    One of the components in the system cannot keep up with the others. This can be hardware or software. More often than not it is the hardware being used. Let’s take a quick look at several gotchas in a misconfigured numerical simulation machine.

  • I/O is a Bottleneck

    Number crunching, memory, and storage are only as fast as the devices that transfer data between them.

  • Configured Wrong

    Out of simple lack of experience the wrong hardware is used, the OS settings are wrong, or drivers are not configured properly.

  • Unnecessary Stuff Added out of Fear

    People tend to overcompensate out of fear that something bad might happen, so they burden a system with software and redundant hardware to avoid a one in a hundred chance of failure, and slow down the other ninety-nine runs in the process.

Avoiding an Expensive Medium Performance Computing (MPC) System

The key to avoiding these situations is to work with an expert who knows the hardware AND the software, or become that expert yourself. That starts with reading the ANSYS documentation, which is fairly complete and detailed.

Oftentimes your hardware provider will present themselves as the expert, and their heart may be in the right place. But only a handful of hardware providers really understand HPC for simulation. Most simply try to sell you the “best” configuration you can afford and don’t understand the causes of poor performance listed above. More often than we would like, they sell a system that is great for databases, web serving, or virtual machines. That is not what you need.

A true numerical simulation hardware or software expert should ask you questions about the following, if they don’t, you should move on:

  • What solver will you use the most?
  • What is more important, cost or performance? Or better: Where do you want to be on the cost vs. performance curve?
  • How much scratch space do you need during a solve? How much storage do you need for the files you keep from a run?
  • How will you be accessing the systems, sending data back and forth, and managing your runs?

Another good test of an expert is if you have both FEA and CFD needs, they should not recommend a single system for you. You may be constrained by budget, but an expert should know the difference between the two solvers vis-à-vis HPC and design separate solutions for each.

If they push virtual machines on you, show them the door.

The next thing you should do is step back and take the advice of writing instructors: start cutting stuff. (I know, if you have read my blog posts for a while, you know I’m not practicing what I preach. But you should see the first drafts…) You really don’t need huge costly UPSs, the expensive archival backup system, or some arctic-chill bubbling liquid nitrogen cooling system. Think of it as a race car: if it doesn’t make the car go faster or keep the driver safe, you don’t need it.

A hard but important step in cutting things down to the basics is to try and let go of the emotional aspect. It is in many ways like picking out a car, and the truth is, the red paint job doesn’t make it go any faster, and the fancy tail pipes look good but also don’t help. Don’t design for the worst-case model either. If 90% of your models run in 32GB of RAM, don’t build a 128GB system for that one run a year that is that big. Suffer a slow solve on that one and use the money to get a faster CPU, a better disk array, or maybe a second box.

Pull back, be an engineer, and just get what you need. Tape robots look cool, blinky lights and flashy plastic case covers even cooler. Do you really need that? Most of the time the numerical simulation cruncher is locked up in a cold dark room. Having an intern move data to USB drives once a month may be a more practical solution.

Another aspect of cutting back is dealing with that fear thing. The most common mistake we see is people using RAID configurations for storing redundant data rather than for read/write speed. Turn off that redundant writing and stripe across as many drives as you can in parallel: RAID 0. Yes, you may lose a drive. Yes, that means you lose a run. But if that happens once every six months, which is very unlikely, the lost productivity from those lost runs is small compared to the lost productivity of solving all those other runs on a slow disk array.
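
One quick way to see what a RAID 0 scratch array (or any disk change) buys you is to measure sequential write throughput on the scratch volume before and after the change. Below is a rough sketch with a made-up test path and size; real solver scratch I/O is more complicated than one big sequential write, so treat this as a sanity check rather than a benchmark.

```python
# Rough sanity check of sequential write throughput on a scratch volume.
# Path and size are placeholders; real solver I/O is more complex than one big write.
import os
import time

TEST_FILE = "/scratch/io_test.bin"   # placeholder path on the scratch array
CHUNK = b"\0" * (64 * 1024 * 1024)   # 64 MB chunk
CHUNKS = 32                          # ~2 GB total

def write_throughput() -> float:
    """Write a large file, force it to disk, and return MB/s."""
    start = time.perf_counter()
    with open(TEST_FILE, "wb") as f:
        for _ in range(CHUNKS):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())         # make sure the data actually hit the disks
    elapsed = time.perf_counter() - start
    os.remove(TEST_FILE)
    return (len(CHUNK) * CHUNKS) / (1024 ** 2) / elapsed

if __name__ == "__main__":
    print(f"Sequential write: {write_throughput():.0f} MB/s")
```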

Lastly, benchmark. This is obvious but often hard to do right. The key is to find real problems that represent a spectrum of the runs you plan on doing. Often different runs, even within the same solver, have different HPC needs. It is a good idea to understand which are more common and bias your design toward those. Do not benchmark with generic HPC benchmarks; use industry-accepted benchmarks for numerical simulation. Yes, it’s an amazing feeling knowing that your new cluster is number 500 on the Top 500 list. However, if it is number 5000 on the ANSYS numerical simulation benchmark list, nobody wins.

Fixing the System You Have

As of late we have started tearing down clusters at numerous companies around the US. Of course we would love to sell you new hardware; however, at PADT, as mentioned before, we love numerical simulation, and fixing your current system may allow you to stretch that investment another year or more. As a co-owner of a twenty-year-old company, this makes me feel good about that initial investment. When we sic our IT team on extending the life of one of our systems, I start thinking about and planning for that next $150k investment we will need to make in a year or more.

Breathing new life into your existing hardware requires almost the same steps as avoiding a bad system in the first place. PADT has sent our team around the country helping companies breathe new life into their existing infrastructure. The steps they use are the same, but instead of designing things, we change them: work with an expert, start cutting stuff out, breathe new life into the aging hardware, avoid fear- and “cool factor”-based choices, and verify everything.

Take a look at, and understand, the output from your solvers; there is a lot of data in there. As an example, here is an article we wrote describing some of those hidden gems within your numerical simulation outputs: http://www.padtinc.com/blog/the-focus/ansys-mechanical-io-bound-cpu-bound
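As one starting point for digging through those output files, the sketch below scans a solver output for its CPU-time and elapsed-time summary lines and flags a large gap between the two, which usually means the cores spent the wall clock waiting on disk. The exact wording of those summary lines varies by product and version, so treat the regular expressions here as assumptions to adjust against your own solve output.

```python
import re
import sys

# These patterns are assumptions: check the end of your own solver output
# and adjust the wording to match what your version actually prints.
CPU_PATTERN = re.compile(r"Total CPU time.*?([\d.]+)", re.IGNORECASE)
ELAPSED_PATTERN = re.compile(r"Elapsed time.*?([\d.]+)", re.IGNORECASE)

def check_io_bound(output_file):
    cpu_time = elapsed_time = None
    with open(output_file, errors="ignore") as f:
        for line in f:
            m = CPU_PATTERN.search(line)
            if m:
                cpu_time = float(m.group(1))      # keep the last value found
            m = ELAPSED_PATTERN.search(line)
            if m:
                elapsed_time = float(m.group(1))  # keep the last value found
    if not cpu_time or not elapsed_time:
        print("Could not find timing summary lines; adjust the patterns.")
        return
    ratio = cpu_time / elapsed_time
    print(f"CPU time: {cpu_time:.1f} s, elapsed: {elapsed_time:.1f} s (ratio {ratio:.2f})")
    if ratio < 0.5:
        print("The CPU was idle for much of the wall clock; the run may be I/O bound.")

if __name__ == "__main__":
    check_io_bound(sys.argv[1])
```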

Play with things, see what helps and what hurts. It may be time to bring in an outside expert to look at things with fresh eyes.

Do not be afraid to push back against what IT is suggesting. Unless you are very fortunate, they probably don’t have the same understanding of numerical simulation computing that you do. They care about security and minimizing the cost of maintaining systems, they tend not to be risk takers, and they don’t like non-standard solutions. All of that can result in a system that is configured for IT, not for fast numerical simulation solves. You may have to bring in senior management to solve this issue.

PADT is Here to Help

The easiest way to avoid all of this is to simply purchase your HPC hardware from PADT. We know simulation, we know HPC, and we can translate between engineers and IT. This is simply because simulation is what we do, and have done since 1994. We can configure the right system to meet your needs, at the point on the price-performance curve you want. Our CUBE systems also come preloaded and tested with your simulation software, so you don’t have to worry about getting things to work once the hardware shows up.

If you already have a system or are locked in to a provider, we are still here to help. Our system architects can consult over the phone or in person, bringing their expertise to the table on fixing existing systems or spec’ing new ones. In fact, the idea for this article came when our IT manager was reconfiguring a customer’s “name brand” cluster here in Phoenix, and he got a call from a user in the Midwest with the exact same problem: lots of expensive hardware and disappointing performance. Both had the wrong hardware for their problems, system bottlenecks, and configuration issues.

Learn more on our HPC Server and Cluster Performance Tuning page, or by contacting us. We would love to help out. It is what we like to do and we are good at it.

Video Tips: Parallel Part by Part Meshing in ANSYS v15.0

This video shows you a new capability in ANSYS v15.0 that allows multiple parts to be simultaneously meshed on multiple CPU cores…with no additional licenses required!

Exercising Parallel Meshing in ANSYS Mechanical R15

[The following is an email that Manoj sent the tech support staff at PADT. I thought it was perfect for a The Focus posting, so here it is – Eric]

First of all, I found a way to get mesh generation time (in case no one knew about this). In ANSYS Mechanical go to Tools->Options->Miscellaneous and set “Report Performance Diagnostics in Messages” to Yes. It will give you “Elapsed Time for Last Mesh Generation” in the Messages window.


Next I benchmarked Parallel Part by Part meshing of a helicopter rotor hub with 502 bodies. The mesh settings produced a mesh of roughly 560,026 elements and 1.23 million nodes.


I did Parallel Part by Part meshing on this model with 1, 2, 4, 6, and 8 cores, and here are the results.

Can I say “I LIKE IT!”

1 core:  172 seconds (1.0x)
2 cores:  89 seconds (1.9x)
4 cores:  52 seconds (3.3x)
6 cores:  38 seconds (4.5x)
8 cores:  33 seconds (5.2x)


Of course this is a small mesh, so as the number of cores goes up, the benefit goes down. I will be doing some testing on models that take a lot longer to mesh, but I wanted to start simple. I’ll make a video summarizing that study, showing how to set up the whole process and the results.
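For the curious, here is the speedup and parallel efficiency implied by the timings above; the numbers are copied straight from Manoj’s list, and efficiency is just speedup divided by core count.

```python
# Elapsed meshing times from the rotor hub test above (cores: seconds).
times = {1: 172, 2: 89, 4: 52, 6: 38, 8: 33}

serial = times[1]
for cores, t in times.items():
    speedup = serial / t
    efficiency = speedup / cores
    print(f"{cores} core(s): {t:4d} s  speedup {speedup:.1f}x  efficiency {efficiency:.0%}")
```

Efficiency drops from about 97% on 2 cores to about 65% on 8 cores for this small mesh, which is exactly the diminishing return described above.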

If you are curious, Manoj is running on a PADT CUBE server. As configured it would cost around $19k. You could drop a few thousand off the price if you changed up the cards or went with CPUs that were not so leading-edge.

Here are the SPECs:

CUBE HVPC w8i-KGPU
CUBE Mid-Tower Chassis – 26db quiet edition
Two XEON e5-2637 v2 CPUs (4 cores each, 3.5GHz)
128 GB of DDR3-1600 ECC Reg RAM
NVIDIA QUADRO K5000
NVIDIA TESLA K20x
7.1 HD Audio (to really rock your webinars…)
SMC LSI 2208 RAID Card – 6Gbps
OS Drive: 2 x 256GB SSD 6Gbps
Solver Array: 3 x 600GB SAS2 15k RPM 6Gbps

CUBE Systems are Now Part of the ANSYS, Inc. HPC Partner Program


The relationship between ANSYS, Inc. and PADT is a long one that runs deep. And that relationship just got stronger with PADT joining the HPC Partner Program with our line of CUBE compute systems specifically designed for simulation. The partner program was set up by ANSYS, Inc. to work:

“… with leaders in high-performance computing (HPC) to ensure that the engineering simulation software is optimized on the latest computing platforms. In addition, HPC partners work with ANSYS to develop specific guidelines and recommended hardware and system configurations. This helps customers to navigate the rapidly changing HPC landscape and acquire the optimum infrastructure for running ANSYS software. This mutual commitment means that ANSYS customers get outstanding value from their overall HPC investment.”


PADT is very excited to be part of this program and to contribute to the ANSYS/HPC community as much as we can.  Users know they can count on PADT’s strong technical expertise with ANSYS Mechanical, ANSYS Mechanical APDL, ANSYS FLUENT, ANSYS CFX, ANSYS Maxwell, ANSYS HFSS, and other ANSYS, Inc. products, a true differentiator when compared with other hardware providers.

Customers around the US have fallen in love with their CUBE workstations, servers, mini-clusters, and clusters, finding them to be the right mix of price and performance. CUBE systems let users carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Assembled by PADT’s IT staff, CUBE computing systems are delivered with the customer’s simulation software loaded and tested. We configure each system specifically for simulation, making choices based upon PADT’s extensive experience using similar systems for the same kind of work. We do not add things a simulation user does not need, and focus on the hardware and setup that delivers performance.


Is it time for you to upgrade your systems?  Is it time for you to “step out of the box, and step in to a CUBE?”  Download a brochure of typical systems to see how much your money can actually buy, visit the website, or contact us.  Our experts will spend time with you to understand your needs, your budget, and what your true goals are for HPC. Then we will design your custom system to meet those needs.

 

This May Be the Fastest ANSYS Mechanical Workstation we Have Built So Far

The Build Up

It’s 6:30 am and a dark shadow looms in Eric’s doorway. I wait until Eric finishes his Monday morning company updates. “Eric, check this out: on the CUBE HVPC w16i-k20x we built for our latest customer, ANSYS Mechanical scaled all the way to 16 cores on our test run.” Eric’s left eyebrow rises slightly. I know I have him now; I have his full and complete attention.

Why is this huge news?

This is why: Eric knows, and probably many of you reading this also know, that solving differential equations in distributed parallel, along with using a graphics processing unit, makes our hearts skip a beat. The finite element method used for solving these equations is both CPU intensive and I/O intensive. This is headline-news type stuff to us geek types. We love scratching our way along the compute power grid to squeeze every bit of performance out of our hardware!

Oh, and yes, a lower time to solve is better! No GPUs were harmed in these tests; only one NVIDIA TESLA K20X GPU was used during the test.

Take a Deep Breath and Start from the Beginning:

I have been gathering and hoarding years’ worth of ANSYS Mechanical benchmark data. Why? Not sure really; after all, I am a wannabe ANSYS analyst. It wasn’t until a couple of weeks ago that I woke up to the why again. My CUBE HVPC team sold a dual-socket INTEL Ivy Bridge based workstation to a customer out of Washington state. Once we got the order, our Supermicro reseller‘s phone was bouncing off the desk. After some back and forth, the parts arrived directly from Supermicro in California. Yes, designed in the U.S.A. And they show up in one big box:

[Photo: the parts arriving from Supermicro in one big box]

Normal is as Normal Does

As per normal is as normal does, I ran the usual series of ANSYS benchmarks: you know, the type that perform coupled-physics simulations and solve really huge matrices. So I ran ANSYS v14sp-5, the ANSYS FLUENT benchmarks, and some benchmarks for this customer, the types of runs they want to use the new machine for. I was talking these benchmark results over with Eric, and he thought that now is the perfect time to release the flood of benchmark data. Well, some, a smidge, of the benchmark data. I do admit the data gets overwhelming, so I have tried to trim the charts and graphs down to the bare minimum. So what makes this workstation recipe for the fastest ANSYS Mechanical workstation so special? What is truly exciting enough to tip me over in my overstuffed black leather chair?

The Fastest Ever? Yup we have been Changed Forever

Not only is it the fastest ANSYS Mechanical workstation running on CUBE HVPC hardware, using two 22-nanometer INTEL CPUs, it is also the first time we have had an INTEL dual-socket based workstation keep getting faster all the way up to its maximum core count when solving in ANSYS Mechanical APDL.

Previously, the fastest time was on the CUBE HVPC w16i-GPU workstation listed below, and it peaked at 14 cores.

Unfortunately, we only had time to gather two runs on the new machine before we shipped it off: 14 and 16 cores. But you can see how fast it was in the table below. It was close to the previous system at 14 cores, but blew past it at 16, whereas the older system actually got clogged up and slowed down:

Run Time (sec)

Cores Used | Config B | Config C | Config D
14         | 129.1    | 95.1     | 91.7
16         | 130.5    | 99.0     | 83.5

And here are the results as a bar graph for all the runs with this benchmark:

[Bar graph: ANSYS benchmark run times for all configurations and core counts]

We can’t wait to build one of these with more than one motherboard, maybe a 32-core system with InfiniBand connecting the two. That should allow some very fast run times on some very, very large problems.

ANSYS V14sp-5 ANSYS R14 Benchmark Details

  • Elements: SOLID187, CONTA174, TARGE170
  • Nodes: 715,008
  • Materials: linear elastic
  • Nonlinearities: standard contact
  • Loading: rotational velocity
  • Other: coupling, symmetric matrix, sparse solver
  • Total DOF: 2.123 million
  • ANSYS 14.5.7

Here are the details and the data of the March 8, 2013 workstation:

Configuration C = CUBE HVPC w16i-GPU

  • CPU: 2x INTEL XEON e5-2690 (2.9GHz 8 core)
  • GPU: NVIDIA TESLA K20 Companion Processor
  • GRAPHICS: NVIDIA QUADRO K5000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HD RAID Controller: SMC LSI 2208 6Gbps
  • HDD: (os and apps): 160GB SATA III SSD
  • HDD (working directory): 6x 600GB SAS2 15k RPM 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS R14.0.8 / ANSYS R14.5

Here are the details from the new, November 1, 2013 workstation:

Configuration D = CUBE HVPC w16i-k20x

  • CPU: 2x INTEL XEON e5-2687W V2 (3.4GHz)
  • GPU: NVIDIA TESLA K20X Companion Processor
  • GRAPHICS: NVIDIA QUADRO K4000
  • RAM: 128GB DDR3 1600Mhz ECC
  • HDD: (os and apps): 4 x 240GB Enterprise Class Samsung SSD 6Gbps
  • HD RAID CONTROLLER: SMC LSI 2208 6Gbps
  • OS: Windows 7 Professional 64-bit, Linux 64-bit
  • Other: ANSYS 14.5.7

You can view the output from the run on the newer box (Configuration D) here:

Here is a picture of the Configuration D machine with the info on its guts:

[Photos: the Configuration D machine and its spec label]

What is Inside that Chip:

The one (or two) CPU that rules them all: http://ark.intel.com/products/76161/

Intel® Xeon® Processor E5-2687W v2

  • Status: Launched
  • Launch Date: Q3’13
  • Processor Number: E5-2687WV2
  • # of Cores: 8
  • # of Threads: 16
  • Clock Speed: 3.4 GHz
  • Max Turbo Frequency: 4 GHz
  • Cache: 25 MB
  • Intel® QPI Speed: 8 GT/s
  • # of QPI Links: 2
  • Instruction Set: 64-bit
  • Instruction Set Extensions: Intel® AVX
  • Embedded Options Available: No
  • Lithography: 22 nm
  • Scalability: 2S Only
  • Max TDP: 150 W
  • VID Voltage Range: 0.65–1.30V
  • Recommended Customer Price: BOX: $2112.00, TRAY: $2108.00

The GPU’s that just keep getting better and better:

Features                                          | TESLA C2075 | TESLA K20X   | TESLA K20
Number and Type of GPU                            | FERMI       | Kepler GK110 | Kepler GK110
Peak double precision floating point performance  | 515 Gflops  | 1.31 Tflops  | 1.17 Tflops
Peak single precision floating point performance  | 1.03 Tflops | 3.95 Tflops  | 3.52 Tflops
Memory Bandwidth (ECC off)                        | 144 GB/sec  | 250 GB/sec   | 208 GB/sec
Memory Size (GDDR5)                               | 6GB         | 6GB          | 5GB
CUDA Cores                                        | 448         | 2688         | 2496


Ready to Try one Out?

If you are as impressed as we are, then it is time for you to try out this next iteration of the Intel chip, configured for simulation by PADT, on your problems. There is no reason for you to be using a CAD box or a bloated web server as your HPC workstation for running ANSYS Mechanical and solving in ANSYS Mechanical APDL. Give us a call; our team will take the time to understand the types of problems you run, the IT environment you run in, and custom configure the right system for you:

http://www.padtinc.com/products/hardware/cube-hvpc,
email: garrett.smith@padtinc.com,
or call 480.813.4884

Part 2: ANSYS FLUENT Performance Comparison: AMD Opteron vs. Intel XEON

AMD Opteron 6308, INTEL XEON e5-2690 & INTEL XEON e5-2667V2 Comparison using ANSYS FLUENT 14.5.7

Note: The information and data contained in this article were compiled and generated on September 12, 2013 by PADT, Inc. on CUBE HVPC hardware using FLUENT 14.5.7. Please remember that hardware and software change with new releases, and you should always try to run your own benchmarks, on your own typical problems, to understand how performance will impact you.

By David Mastel

Due to the response to the original article on this subject,  I thought it would be good to do a quick follow-up using one of our latest CUBE HVPC builds. Again, the ANSYS Fluent standard benchmarks were used in garnering the stats on this dual socket INTEL XEON e5-2667V2 configuration.

CUBE HVPC Test configurations (Same as in last comparison)

Server 1: CUBE HVPC c16
  • CPU: 4, AMD Opteron 6308 @ 3.5GHz (Quad Core)
  • Memory: 256GB (32x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • Hardware RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  •  OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC AOC-UIBQ-M2 – QDR Infiniband
    • The IB card was installed; however, solves were run distributed locally
  • Switch: MELLANOX IS5023 Non-Blocking 18-port switch

Server 2: CUBE HVPC c16i (Intel server from last comparison)

  • CPU: 2, INTEL XEON e5-2690 @ 2.9GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Windows 7 Professional 64-bit
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI

Server 3: CUBE HVPC c16ivy (New “Ivy” based Intel server)

  • CPU: 2, INTEL XEON e5-2667V2 @ 3.3GHz (Octa Core)
  • Memory: 128GB (16x8G) DDR3-1600 ECC Reg. RAM (1600MHz)
  • RAID Controller: Supermicro AOC-S2208L-H8iR 6Gbps, PCI-e x 8 Gen3
  • Hard Drives: Supermicro HDD-A0600-HUS156060VLS60 – Hitachi 600G SAS2.0 15K RPM 3.5″
  • OS: Linux 64-bit / Kernel 2.6.32-358.18.1.e16.x86_64
  • App: ANSYS FLUENT 14.5.7
  • MPI: Platform MPI
  • HCA: SMC – QDR Infiniband
    • The IB card was installed; however, solves were run distributed locally

ANSYS FLUENT 14.5.7 Performance using the ANSYS FLUENT Benchmark suite provided by ANSYS, Inc.

ANSYS FLUENT benchmark page: http://www.ansys.com/Support/Platform+Support/Benchmarks+Overview/ANSYS+Fluent+Benchmarks

ANSYS FLUENT 14.5.7 test cases (20 iterations each). A simple scripted driver for timing these cases is sketched after the list:

  • Reacting Flow with Eddy Dissipation Model (eddy_417k)
  • Single-stage Turbomachinery Flow (turbo_500k)
  • External Flow Over an Aircraft Wing (aircraft_2m)
  • External Flow Over a Passenger Sedan (sedan_4m)
  • External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m)
  • External Flow Over a Truck Body 14m (truck_14m)
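If you want to script a pass through these cases yourself, a driver as simple as the sketch below keeps the bookkeeping straight. It is only a sketch: the case names come from the list above, but the journal files and the exact fluent command line (solver version, core count, batch flags) are assumptions you would adapt to your own installation and queue.

```python
import subprocess
import time

CORES = 16
# Benchmark cases from the ANSYS-supplied suite listed above. Each is assumed
# to have a journal file of the same name that loads the case and runs the
# 20 iterations before exiting.
CASES = ["eddy_417k", "turbo_500k", "aircraft_2m",
         "sedan_4m", "truck_poly_14m", "truck_14m"]

results = {}
for case in CASES:
    # Assumed batch invocation; adjust the solver version and flags as needed.
    cmd = ["fluent", "3ddp", f"-t{CORES}", "-g", "-i", f"{case}.jou"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    results[case] = time.perf_counter() - start

for case, seconds in results.items():
    print(f"{case:>16}: {seconds:8.1f} s on {CORES} cores")
```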

Here are the results for all three machines, total and average time:

[Charts: total and average solve times for all three machines across the benchmark cases]

 

Summary: Are you sure? Part 2

So I didn’t have to have the “Are you sure?” conversation with Eric this time, and I didn’t bother triple-checking the results, because indeed the Ivy Bridge-EP Socket 2011 is one fast CPU! Combine that with a 22 nm manufacturing process and the data speaks for itself. For example, let’s re-dig into the data for the External Flow Over a Truck Body with a Polyhedral Mesh (truck_poly_14m) benchmark and see what we find:

[Image: truck_poly_14m benchmark details for the three systems]

[Image: truck_poly_14m benchmark summary]
Current Pricing of INTEL® and AMD® CPUs

Here is the up-to-the-minute pricing for each CPU. I took these prices off of Newegg’s and Ingram Micro’s websites; they were captured on October 4, 2013. (A price-per-performance sketch using these numbers follows the pricing list.)

Note that AMD’s price per CPU went up and the INTEL XEON e5-2690’s went down. Again, these prices are based on pricing as of October 4, 2013.

AMD Opteron 6308 Abu Dhabi 3.5GHz 4MB L2 Cache 16MB L3 Cache Socket G34 115W Quad-Core Server Processor OS6308WKT4GHKWOF

  •  $501 x 4 = $2004.00

Intel Xeon E5-2690 2.90 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 20 MB, 8 GT/s QPI

  • $1986.48 x 2 = $3972.96

Intel Xeon E5-2667V2 3.3 GHz Processor – Socket LGA-2011, L2 Cache 2MB, L3 Cache 25 MB, 8 GT/s QPI,

  • $1933.88 x 2 = $3867.76
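To fold these prices into benchmark results, a cost-per-unit-of-performance number puts the three options on equal footing. The sketch below uses the per-socket prices quoted above; the rating values are hypothetical placeholders you would replace with your own measured numbers (for the FLUENT benchmarks, a rating is essentially jobs per day, 86,400 divided by the solve time in seconds).

```python
# CPU list prices per socket from the list above (captured October 4, 2013).
cpu_cost = {
    "AMD Opteron 6308 (4 sockets)":     501.00 * 4,
    "Intel XEON e5-2690 (2 sockets)":   1986.48 * 2,
    "Intel XEON e5-2667V2 (2 sockets)": 1933.88 * 2,
}

# Hypothetical placeholder ratings: substitute your own measured values
# for the benchmark case you actually care about.
rating = {
    "AMD Opteron 6308 (4 sockets)":     100.0,
    "Intel XEON e5-2690 (2 sockets)":   160.0,
    "Intel XEON e5-2667V2 (2 sockets)": 200.0,
}

for name in cpu_cost:
    per_rating = cpu_cost[name] / rating[name]
    print(f"{name:32} ${cpu_cost[name]:8.2f}  ${per_rating:6.2f} per unit of rating")
```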

REFERENCES:
http://www.ingrammicro.com
http://www.newegg.com

INTEL XEON e5-2667V2
http://ark.intel.com/products/75273/Intel-Xeon-Processor-E5-2667-v2-25M-Cache-3_30-GHz

INTEL XEON e5-2690
http://ark.intel.com/products/64596/

AMD Opteron 6308
http://www.amd.com/us/Documents/Opteron_6300_QRG.pdf

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

http://en.wikipedia.org/wiki/Central_processing_unit#Integer_range

http://en.wikipedia.org/wiki/Floating_point

STEP OUT OF THE BOX, STEP INTO A CUBE

PADT offers a line of high performance computing (HPC) systems specifically designed for CFD and FEA number crunching aimed at a balance between cost and performance. We call this concept High Value Performance Computing, or HVPC. These systems have allowed PADT and our customers to carry out larger simulations, with greater accuracy, in less time, at a lower cost than name-brand solutions. This leaves you more cash to buy more hardware or software.

Let CUBE HVPC by PADT, Inc. quote you a configuration today!

 

Columbia: PADT’s Killer Kilo-Core CUBE Cluster is Online

In the back of PADT’s product development lab is a closet.  Yesterday afternoon PADT’s tireless IT team crammed themselves into the back of that closet and powered up our new cluster, bringing 1104 connected cores online.  It sounded like a jet taking off when we submitted a test FLUENT solve across all the cores.  Music to our ears.

We have recently become slammed with benchmarks for ANSYS and CUBE customers as well as our normal load of services work, so we decided it was time to pull the trigger and double the size of our cluster while adding a storage node.  And of course, we needed it yesterday.  So the IT team rolled up their sleeves, configured a design, ordered hardware, built it up, tested it all, and got it online, in less than two weeks.  This was while they did their normal IT work and dealt with a steady stream of CUBE sales inquiries.  But it was a labor of love. We have all dreamed about breaking that thousand-core barrier on one system, and this was our chance to make it happen.

If you need more horsepower and are looking for a solution that hits that sweet spot between cost and performance, visit our CUBE page at www.cube-hvpc.com and learn more about our workstations, servers, and clusters.  Our team (after they get a little rest) will be more than happy to work with you to configure the right system for your real world needs.

Now that the sales plug is done, let’s take a look at the stats on this bad boy (a quick check of the core-count math follows the list):

Name: Columbia
After the class of battlestars in Battlestar Galactica
Brand: CUBE High Value Performance Compute Cluster, by PADT
Nodes: 18
17 compute, 1 storage/control node, 4 CPU per Node
Cores: 1104
AMD Opteron: 4 x 6308 3.5 GHz, 32 x 6278 2.4 GHz, 36 x 6380 2.5 GHz
Interconnect: 18 port MELLANOX IB 4X QDR Infiniband switch
Memory: 4.864 Terabytes
Solve Disk: 43.5 TB RAID 0
Storage Disk: 64 TB RAID 50
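As a quick sanity check on the headline number, the core count falls straight out of the CPU mix above: the Opteron 6308 is a 4-core part, while the 6278 and 6380 are 16-core parts.

```python
# CPU mix from the stats above: (number of CPUs, cores per CPU).
cpus = {
    "Opteron 6308 @ 3.5 GHz": (4, 4),    # 4-core parts
    "Opteron 6278 @ 2.4 GHz": (32, 16),  # 16-core parts
    "Opteron 6380 @ 2.5 GHz": (36, 16),  # 16-core parts
}

total_cores = sum(count * cores for count, cores in cpus.values())
total_sockets = sum(count for count, _ in cpus.values())
print(f"{total_sockets} sockets across 18 nodes -> {total_cores} cores")
# 72 sockets across 18 nodes -> 1104 cores
```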

Here are some pictures of the build and the final product:

  • A huge delivery from our supplier, Supermicro, started the process. This was the first pallet.
  • The build included installing the largest power strip any of us had ever seen.
  • Building a cluster consists of doing the same thing, over and over and over again.
  • We took over PADT’s clean room because it turns out you need a lot of space to build something this big.
  • It is fun to get the chance to build the machine you always wanted to build.
  • 2AM selfie: still going strong!
  • Almost there. After blowing a breaker, we needed to wait for some more power to be routed to the closet.
  • Up and running! Ratchet and Clank providing cooling air containment.

David, Sam, and Manny deserve a big shout-out for doing such a great job getting this thing up and running so fast!

When I logged on to my first computer, a TRS-80, in my high-school computer lab, I never, ever thought I would be running on a machine this powerful.  And I would have told people they were crazy if they said a machine with this much throughput would cost less than $300,000.  It is a good time to be a simulation user!

Now I just need to find a bigger closet for when we double the size again…
