MSC.Nastran V70.7 Distributed Memory Parallel Test Results
This page contains the results of several jobs run on different hardware platforms. Each job was run by the respective hardware partner. For the best price/performance results based on Linux, please contact Brad Kindorf.
Below is a summary of the test problems.

| Name | Ndof | Description | SOL | MEM | SCR Disk Used | Total I/O | Comments |
| LGQDF | 93,375 | Cube w/ interior | 108 | 100 Mb | 600 Mb | 1.3 Tb | 76 Frequency Increments |
| XLEMF | 658,354 | Car Body | 111 | 400 Mb | 9000 Mb | 474 Gb | 449 + 39 Roots |
| XLRST | 739,651 | Engine | 101 | 80 Mw | 4000 Mb | 18 Gb | |
| XLTDF | 529,257 | Car Body | 108 | 450 Mb | 5000 Mb | 209 Gb | 32 Frequency Increments |
| XXCMD | 1,323,787 | Car Body | 103 | 800 Mb | 43 Gb | 2.4 Tb | 1088 Roots |

MEM and SCR are per task.
Below is a description of the hardware used. Detailed computer, disk, network, and OS configuration is listed at the bottom of this page.
| Vendor | Hardware | O/S |
| Compaq | HPC320 (ES40 EV6 500 MHz) | UNIX 4.0 |
| HP | HP 9000 N4000-55 550 MHz PA8600 | HP-UX 11.0 |
| IBM | RS/6000 SP 375 MHz POWER3 SMP | AIX 4.3 |
| NEC | SX-5 Server (4 GFlops) | SuperUX 9.2 |
| SGI | Origin 2000 / 400 MHz | IRIX64 6.5 |
| Sun | E6500 400 MHz | Solaris 7 |

Elapsed Times (sec) for LGQDF

| Vendor | serial | dmp=2 | dmp=4 | dmp=8 | dmp=16 |
| Compaq | 17106 | 8697 | 4610 | 2651 | 1633 |
| HP | 7519 | 4353 | 2110 | 1288 | 804 |
| IBM | 8924 | 4743 | 2495 | 1370 | 862 |
| NEC | 5543 | 3189 | 1765 | 958 | 615 |
| SGI | 16326 | 8518 | 4385 | 2326 | 1265 |
| Sun | 28674 | 18023 | 10250 | 5929 | 2818 |
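
The elapsed times above are easier to compare across vendors once they are converted to speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of tasks). The following is a minimal sketch of that arithmetic in Python. The numbers are copied from the LGQDF table; the names `LGQDF`, `speedup`, and `efficiency` are illustrative and are not part of MSC.Nastran or of the original test scripts.

```python
# Elapsed times (sec) for LGQDF, copied from the table above.
# Keys are the number of DMP tasks; 1 stands for the serial run.
LGQDF = {
    "Compaq": {1: 17106, 2: 8697, 4: 4610, 8: 2651, 16: 1633},
    "HP": {1: 7519, 2: 4353, 4: 2110, 8: 1288, 16: 804},
    "IBM": {1: 8924, 2: 4743, 4: 2495, 8: 1370, 16: 862},
    "NEC": {1: 5543, 2: 3189, 4: 1765, 8: 958, 16: 615},
    "SGI": {1: 16326, 2: 8518, 4: 4385, 8: 2326, 16: 1265},
    "Sun": {1: 28674, 2: 18023, 4: 10250, 8: 5929, 16: 2818},
}


def speedup(times, tasks):
    """Serial elapsed time divided by the elapsed time with `tasks` tasks."""
    return times[1] / times[tasks]


def efficiency(times, tasks):
    """Speedup divided by the number of tasks (1.0 means ideal scaling)."""
    return speedup(times, tasks) / tasks


if __name__ == "__main__":
    for vendor, times in LGQDF.items():
        cells = [
            f"dmp={n}: {speedup(times, n):5.2f}x ({efficiency(times, n):4.0%})"
            for n in sorted(times)
            if n > 1
        ]
        print(f"{vendor:<7}" + "  ".join(cells))
```

The same two functions apply to the other tables on this page if their rows are entered in the same form. Rows with empty cells simply omit the corresponding key.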

Elapsed Times (sec) for XLEMF

| Vendor | serial | dmp=2 | dmp=4 | dmp=8 |
| Compaq | 8946 | 8226 | 6139 | |
| HP | 7757 | 5776 | 4177 | |
| IBM | 7253 | 5381 | 4339 | 3924 |
| NEC | 3397 | 2755 | 2291 | 2010 |
| SGI | 10820 | 9146 | 6851 | 5864 |
| Sun | 26316 | 17848 | 14101 | 12712 |

Elapsed Times (sec) for XLRST

| Vendor | serial | dmp=2 | dmp=4 | dmp=8 |
| Compaq | 968 | 713 | 522 | 436 |
| HP | 541 | 445 | 363 | 326 |
| IBM | 670 | 523 | 365 | 273 |
| NEC | 1208 | 1203 | 939 | 780 |
| SGI | 953 | 725 | 530 | 411 |
| Sun | 2255 | 1706 | 965 | 792 |

Elapsed Times (sec) for XLTDF

| Vendor | serial | dmp=2 | dmp=4 | dmp=8 | dmp=16 |
| Compaq | 29941 | 14961 | 8426 | 6312 | 4247 |
| HP | 13934 | 7443 | 3833 | 2176 | 1604 |
| IBM | 15248 | 7920 | 4218 | 2372 | 1467 |
| NEC | 6694 | 3814 | 2499 | 1795 | 1606 |
| SGI | 23102 | 11907 | 6333 | 3726 | 2299 |
| Sun | 53834 | 28400 | 16857 | 13239 | |

Elapsed Times (sec) for XXCMD

| Vendor | serial | dmp=2 | dmp=4 |
| Compaq | 48207 | 31292 | 21488 |
| HP | 47399 | 28349 | 14578 |
| IBM | 42792 | 23510 | 16598 |
| NEC | 10519 | 7929 | 5718 |
| SGI | 55724 | 33453 | 22218 |

Additional Comments:
Disk Issues:
- SMP and NUMA systems run faster when each task uses its own file system. It is vitally important that these systems use a different scratch disk for each task. Serious performance problems may result if this recommendation is not followed. An example of this performance degradation can be seen below, in which the same disk was used for all processors:
Sun/Solaris Specific:
Additional Information:
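
The scratch-disk recommendation under Disk Issues can be checked before a DMP run is started. The following is a minimal sketch in Python and is not part of the original test setup: it groups candidate scratch directories by the device ID that the operating system reports, and flags any that share a file system. The function name `share_a_filesystem` and the `/scratch1` to `/scratch4` paths are hypothetical. Note the limitation: distinct device IDs show distinct file systems, not necessarily distinct physical disks, so the storage layout still needs to be confirmed separately.

```python
import os


def share_a_filesystem(scratch_dirs):
    """Return groups of directories that report the same device ID.

    An empty result means every directory is on its own file system.
    """
    by_device = {}
    for path in scratch_dirs:
        device = os.stat(path).st_dev  # ID of the file system holding `path`
        by_device.setdefault(device, []).append(path)
    return [paths for paths in by_device.values() if len(paths) > 1]


if __name__ == "__main__":
    # Hypothetical per-task scratch directories; replace with real mount points.
    candidates = ["/scratch1", "/scratch2", "/scratch3", "/scratch4"]
    existing = [p for p in candidates if os.path.isdir(p)]
    for group in share_a_filesystem(existing):
        print("same file system:", ", ".join(group))
```

If the script prints nothing for the directories assigned to the tasks, each task has its own file system in the sense described above.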
Hardware Details:
- Compaq
Model: HPC320 (ES40 EV6 500 MHz)
OS Level: UNIX 4.0
Number of CPUs: 4 per node. DMP jobs were spread across nodes with 1 CPU per node. The dmp=16 jobs used 2 CPUs per node.
Memory: One node had 8 Gb, two had 4 Gb, and five had 2 Gb. For XXCMD, 4 nodes each had 4 Gb.
Cluster Network: Memory Channel 2
Hardware Options: 4 Mb L2 cache, crossbar interconnect to CPU and memory.
- HP
Model: 9000/800/N4000 550 MHz PA8600, PA-RISC 2.0
OS Level: HP-UX B.11.0
Number of CPUs: 4 per node. All dmp=2 jobs ran on a single system; the 4-, 8-, and 16-CPU jobs ran on a cluster of 4 machines.
Memory: 16 Gb/node
Virtual Mem: swap was configured as approx. 1.5x real memory
Cluster network: Hyperfabric (Myrinet) high-speed interconnect
Disk: Fibre Channel striped array: 2 controllers, 10 disks (10,000 rpm, 18 Gb); each system had 1 of these.
- IBM
Model: RS/6000 SP with 375 MHz POWER3 SMP nodes
OS Level: AIX 4.3
Number of CPUs: 4 per node. DMP jobs were spread across nodes with 1 CPU per node.
Memory: 8 Gb/node
Disk: The scratch file system on each node was striped across eight 9.1 Gb SSA disks.
Notes: The following command was used to optimize I/O performance. It is recommended for AIX systems running MSC.Nastran.
/usr/samples/kernel/vmtune -p5 -P10 -r8 -R128 -f120 -F560 -W128
- NEC
Model: SX-5 Server (4 GFlops)
OS Level: SuperUX 9.2
Number of CPUs: 16. All DMP jobs were run on a single system.
Memory: 128 Gb (real memory)
- SGI
Model: Origin 2000 / 400 MHz
OS Level: IRIX64 6.5
Number of CPUs: 32 CPUs (16 nodes with 2 CPUs per node). All DMP jobs were run on a single system with 1 process per node.
Memory: 64 Gb
Notes: There is an error in mpirun if the number of tasks is greater than or equal to 8. Therefore, runs with 8 or more processors were handled manually. Please contact MSC or SGI support for a workaround.
- Sun
Model: E6500 400 MHz
OS Level: Solaris 7
Number of CPUs: All jobs except xltdf with 16 CPUs ran on a single system: a 30-CPU E6500 with 2 A5200 storage arrays. The 16-processor xltdf job ran on a cluster of 2 nodes, each a 30-CPU E6500 with 2 A5200 arrays. Additionally, for two runs (xltd[4,8]), a 12-CPU E4500 with 2 T310 storage arrays was used to illustrate the performance advantage of hardware RAID.
Memory: 1 Gb/processor
Virtual Mem: swap was configured as approx. 2x real memory
Cluster network: Gigabit Ethernet
MPI/HPC Level: HPC3.1 SunMPI libraries
Disk: Two types of external storage devices were used:
1) E6500 with 2 A5200 Fibre Channel striped arrays: 2 controllers, each A5200 with 11 disks (10,000 rpm, 9 Gb)
2) E4500 with 2 T310 Fibre Channel striped arrays: 2 controllers, each T310 with 9 disks (10,000 rpm, 18 Gb)