Parallel Performance: ANSYS FLUENT R13 with Service Pack 2

Recently, PADT conducted some parallel benchmarks on our Linux cluster. The model used for the tests is an ANSYS FLUENT *.cas file with 26.5 million cells and 6.5 million nodes. The physics in this model are fairly simple: it models external, steady airflow over an object. Simulations of 10 iterations were conducted using as many as 144 processors over three 48-core systems (the machine names are “cs0”, “cs1”, and “cube48”, and are summarized in Table 1). The first two machines in Table 1 (cs0 and cs1) are connected via Infiniband, so they effectively form a 96-core machine. Furthermore, the cs0 and cs1 machines are connected to cube48 via GigE network ports. Each machine has two GigE network ports, which connect to a Gigabit switch to allow for trunking.

Table 1 – Benchmark machine specifications

Machine    Processor Type                Processor Count
cs0        2.3 GHz AMD Opteron 6176SE    48
cs1        2.3 GHz AMD Opteron 6176SE    48
cube48     2.2 GHz AMD Opteron 6176SE    48
 
Two versions of FLUENT were tested: the original version 13 (R13), and the new Service Pack 2 for version 13 (R13 SP2). The results were scaled by the one-processor solve time to compute the “speedup” (defined as the solve time on 1 processor divided by the solve time on “N” processors), and are presented in Figure 1 alongside “Linear” or ideal speedup (ideal speedup means that the speedup value equals the number of processors).
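The speedup definition above can be sketched in a few lines of Python. This is only an illustration: the solve times in `solve_times` are hypothetical placeholders, not the measured benchmark data, and the helper name `speedup` is our own.

```python
def speedup(t1: float, tn: float) -> float:
    """Solve time on 1 processor divided by solve time on N processors."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("solve times must be positive")
    return t1 / tn


# Hypothetical solve times in seconds, keyed by processor count.
# These are placeholder values, NOT the benchmark results.
solve_times = {1: 4800.0, 12: 420.0, 24: 230.0, 48: 100.0}

t1 = solve_times[1]
speedups = {n: speedup(t1, t) for n, t in solve_times.items()}

# Print each speedup next to the Linear (ideal) value, which is simply N.
for n in sorted(speedups):
    print(f"N={n:3d}  speedup={speedups[n]:6.2f}  linear={n}")
```

Plotting `speedups` against the processor count, together with the line y = N, gives a chart of the same kind as Figure 1.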

Figure 1 – Speedup vs. Processors with 6.5 million node ANSYS FLUENT model

The first result in Figure 1 (blue curve) was obtained using as many as 48 processors on the cs0 machine before installing Service Pack 2 for FLUENT version 13. As illustrated in Figure 1, the speedup values of the blue result decrease as the number of processors increases. The next result (green curve with triangles) was calculated using Service Pack 2 for ANSYS FLUENT version 13 on the cube48 machine using as many as 48 processors. This result on cube48 tracks Linear speedup, and even displays some values which are “super-linear”. This behavior led us to suspect that the improvements in Service Pack 2 for ANSYS FLUENT version 13 were the primary cause of the increase in performance demonstrated by the cube48 speedup result.
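One way to label a point as “super-linear” is to compare its parallel efficiency (speedup divided by processor count) against 1. The sketch below assumes a tolerance of 2% around ideal, which is our own choice for illustration and was not part of the benchmark; the sample values are hypothetical as well.

```python
def classify(n_procs: int, speedup: float, tol: float = 0.02) -> str:
    """Label a speedup value relative to Linear (ideal) speedup.

    Efficiency is speedup / n_procs; Linear speedup gives exactly 1.0.
    tol is an assumed tolerance band around 1.0, not a benchmark setting.
    """
    if n_procs < 1:
        raise ValueError("processor count must be at least 1")
    efficiency = speedup / n_procs
    if efficiency > 1.0 + tol:
        return "super-linear"
    if efficiency < 1.0 - tol:
        return "sub-linear"
    return "linear"


# Hypothetical (processor count, speedup) pairs, not measured data.
for n, s in [(24, 24.0), (48, 50.0), (48, 30.0)]:
    print(n, s, classify(n, s))
```

Super-linear points usually reflect effects outside the solver itself, such as each partition fitting better in cache as the per-processor workload shrinks.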

However, further testing of the cs0 and cs1 machines with Service Pack 2 for ANSYS FLUENT seems to suggest otherwise. These data are represented by the red squares in Figure 1. Runs were conducted on as many as 144 processors, which involved a distributed run using 48 processors on cs0, 48 processors on cs1, and 48 processors on cube48. The general trend of this result is the same as that recorded on cs0 before installing Service Pack 2 for ANSYS FLUENT version 13, suggesting that some other effect (likely machine-related) is present. Our current hypothesis is that the socket 2 processor (which handles the Infiniband UIO card hardware) is causing the slowdown, primarily because the Infiniband switch is present only on the cs0 + cs1 machines, and not on the cube48 machine. This was suggested by the manufacturer of the Infiniband switch (SuperMicro). Testing to assess this problem is ongoing.