
Chapter 31.

Parallel Processing

The following sections describe the parallel-processing features of FLUENT.

• Section 31.1: Introduction to Parallel Processing
• Section 31.2: Starting Parallel FLUENT on a Windows System
• Section 31.3: Starting Parallel FLUENT on a Linux/UNIX System
• Section 31.4: Checking Network Connectivity
• Section 31.5: Partitioning the Grid
• Section 31.6: Checking and Improving Parallel Performance

31.1 Introduction to Parallel Processing


The FLUENT serial solver manages file input and output, data storage, and flow field
calculations using a single solver process on a single computer. FLUENT’s parallel solver
allows you to compute a solution by using multiple processes that may be executing on
the same computer, or on different computers in a network. Figures 31.1.1 and 31.1.2
illustrate the serial and parallel FLUENT architectures.
Parallel processing in FLUENT involves an interaction between FLUENT, a host process,
and a set of compute-node processes. FLUENT interacts with the host process and the
collection of compute nodes using a utility called cortex that manages FLUENT’s user
interface and basic graphical functions.
Parallel FLUENT splits up the grid and data into multiple partitions, then assigns each
grid partition to a different compute process (or node). The number of partitions is an
integral multiple of the number of compute nodes available to you (e.g., 8 partitions for 1,
2, 4, or 8 compute nodes). The compute-node processes can be executed on a massively-
parallel computer, a multiple-CPU workstation, or a network cluster of computers.

i In general, as the number of compute nodes increases, turnaround time
for the solution will decrease. However, parallel efficiency decreases as the
ratio of communication to computation increases, so you should be careful
to choose a large enough problem for the parallel machine.

FLUENT uses a host process that does not contain any grid data. Instead, the host
process only interprets commands from FLUENT’s graphics-related interface, cortex.



Figure 31.1.1: Serial FLUENT Architecture

Figure 31.1.2: Parallel FLUENT Architecture


The host distributes those commands to the other compute nodes via a socket inter-
connect to a single designated compute node called compute-node-0. This specialized
compute node distributes the host commands to the other compute nodes. Each compute
node simultaneously executes the same program on its own data set. Communication
from the compute nodes to the host is possible only through compute-node-0 and only
when all compute nodes have synchronized with each other.
Each compute node is virtually connected to every other compute node, and relies on
inter-process communication to perform such functions as sending and receiving arrays,
synchronizing, and performing global operations (such as summations over all cells).
Inter-process communication is managed by a message-passing library. For example,
the message-passing library could be a vendor implementation of the Message Passing
Interface (MPI) standard, as depicted in Figure 31.1.2.
All of the parallel FLUENT processes (as well as the serial process) are identified by
a unique integer ID. The host collects messages from compute-node-0 and performs
operations (such as printing, displaying messages, and writing to a file) on all of the
data, in the same way as the serial solver.

Recommended Usage of Parallel FLUENT


The recommended procedure for using parallel FLUENT is as follows:

1. Start up the parallel solver. See Section 31.2: Starting Parallel FLUENT on a
Windows System and Section 31.3: Starting Parallel FLUENT on a Linux/UNIX
System for details.

2. Read your case file and have FLUENT partition the grid automatically upon loading
it. It is best to partition after the problem is set up, since partitioning has some
model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and
shell-conduction encapsulation).
Note that there are other approaches for partitioning, including manual partitioning
in either the serial or the parallel solver. See Section 31.5: Partitioning the Grid
for details.

3. Review the partitions and perform partitioning again, if necessary.


See Section 31.5.6: Checking the Partitions for details on checking your partitions.

4. Calculate a solution. See Section 31.6: Checking and Improving Parallel Performance
for information on checking and improving the parallel performance.



31.2 Starting Parallel FLUENT on a Windows System


You can run FLUENT on a Windows system using either command line options or the
graphical user interface.
Information about starting FLUENT on a Windows system is provided in the following
sections:

• Section 31.2.1: Starting Parallel FLUENT on a Windows System Using Command
Line Options
• Section 31.2.2: Starting Parallel FLUENT on a Windows System Using the Graph-
ical User Interface
• Section 31.2.3: Starting Parallel FLUENT with the Fluent Launcher
• Section 31.2.4: Starting Parallel FLUENT with the Microsoft Job Scheduler (win64
Only)

i See the separate installation instructions for more information about in-
stalling parallel FLUENT for Windows. The startup instructions below
assume that you have properly set up the necessary software, based on the
appropriate installation instructions.
Additional information about installation issues can also be found in the Frequently
Asked Questions section of the Fluent Inc. User Services Center (www.fluentusers.com).

31.2.1 Starting Parallel FLUENT on a Windows System Using Command Line Options

To start the parallel version of FLUENT using command line options, you can use the
following syntax in a Command Prompt window:
fluent version -tnprocs [-pinterconnect] [-mpi=mpi_type] -cnf=hosts_file
-path\\computer_name\share_name
where

• version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp,
or 3ddp).
• -path\\computer_name\share_name specifies the computer name and the shared
network name for the Fluent.Inc directory in UNC form.
For example, if FLUENT has been installed on computer1 and shared as fluent.inc,
then you should replace share_name by the UNC name for the shared directory,
\\computer1\fluent.inc.


• -pinterconnect (optional) specifies the type of interconnect. The ethernet inter-
connect is used by default if the option is not explicitly specified. See Table 31.2.1,
Table 31.2.2, and Table 31.2.3 for more information.

• -mpi=mpi_type (optional) specifies the type of MPI. If the option is not specified,
the default MPI for the given interconnect will be used (the use of the default MPI
is recommended). The available MPIs for Windows are shown in Table 31.2.2.

• -cnf=hosts_file specifies the hosts file, which contains a list of the computers on
which you want to run the parallel job. If the hosts file is not located in the
directory where you are typing the startup command, you will need to supply the
full pathname to the file.
You can use a plain text editor such as Notepad to create the hosts file. The only
restriction on the filename is that there should be no spaces in it. For example,
hosts.txt is an acceptable hosts file name, but my hosts.txt is not.
Your hosts file (e.g., hosts.txt) might contain the following entries:

computer1
computer2

i The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once.
For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the
hosts.txt file should list computer1 twice:

computer1
computer1
computer2

• -tnprocs specifies the number of processes to use. When the -cnf option is present,
the hosts file argument is used to determine which computers to use for the parallel
job. For example, if there are 8 computers listed in the hosts file and you want to
run a job with 4 processes, set nprocs to 4 (i.e., -t4) and FLUENT will use the first
4 machines listed in the hosts file.

For example, the full command line to start a 3d parallel job on the first 4 computers
listed in a hosts file called hosts.txt is as follows:

fluent 3d -t4 -cnf=hosts.txt -path\\computer1\fluent.inc



The default interconnect (ethernet) and the default communication library (mpich2)
will be used since these options are not specified.
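
If you need a specific interconnect or MPI, you can add the corresponding flags to the
same command. The following is only an illustrative sketch, assuming a win64
installation on a cluster equipped with an Infiniband interconnect (the machine and
file names are placeholders):

fluent 3ddp -t8 -pinfiniband -mpi=ms -cnf=hosts.txt -path\\computer1\fluent.inc

This would start the double-precision 3D solver with 8 processes on the machines listed
in hosts.txt, using the Infiniband interconnect with the Microsoft MPI (see Table 31.2.3
for the interconnect/MPI combinations that are supported).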

i The first time that you try to run FLUENT in parallel, a separate Command
Prompt will open prompting you to verify the current Windows account
that you are logged into. Press the <Enter> key if the account is correct.
If you have a new account password, enter your password and press
the <Enter> key, then verify your password and press the <Enter> key.
Once the username and password have been verified and encrypted into
the Windows Registry, then FLUENT parallel will launch.
The supported interconnects for dedicated parallel ntx86 and win64 Windows machines,
the associated MPIs for them, and the corresponding syntax are listed in Tables 31.2.1-
31.2.3:

Table 31.2.1: Supported Interconnects for the Windows Platform

Platform   Processor   Architecture   Interconnects*

Windows    32-bit      ntx86          ethernet (default)
           64-bit      win64          ethernet (default),
                                      infiniband, myrinet

(*) Node processes on the same machine communicate by shared memory.

Table 31.2.2: Available MPIs for Windows Platforms

MPI      Syntax (flag)   Communication Library   Notes

mpich2   -mpi=mpich2     MPICH2 MPI              (1), (2)
ms       -mpi=ms         Microsoft MPI           (1), (2)
net      -mpi=net        socket                  (1), (2)

(1) Used with a Shared Memory Machine (SMM), where the memory is shared between the processors on
a single machine.
(2) Used with a Distributed Memory Machine (DMM), where each processor has its own memory associated
with it.


Table 31.2.3: Supported MPIs for Windows Architectures (Per Interconnect)

Architecture   Ethernet            Myrinet   Infiniband

ntx86          mpich2 (default),   -         -
               net
win64          mpich2 (default),   ms        ms
               ms, net

31.2.2 Starting Parallel FLUENT on a Windows System Using the Graphical User Interface

To run parallel FLUENT using the graphical user interface, type the usual startup com-
mand without a version (i.e., fluent -t2), and then use the Select Solver panel (Fig-
ure 31.2.1) to specify the parallel architecture and version information.
File −→Run...

Figure 31.2.1: The Select Solver Panel



Perform the following actions:

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning
the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the interconnect or system in the Interconnect drop-down list.
The Default setting is recommended, because it selects the interconnect that should
provide the best overall parallel performance for your dedicated parallel machine.
For a symmetric multi-processor (SMP) system, the Default setting uses shared
memory for communication.
If you prefer to select a specific interconnect, you can choose either Ethernet/Shared
Memory MPI, Myrinet, Infiniband, or Ethernet via sockets. For more information
about these interconnects, see Table 31.2.1, Table 31.2.2, and Table 31.2.3.

3. Set the number of CPUs in the Processes field.

4. (optional) Specify the name of a file containing a list of machines, one per line, in
the Hosts File field.

5. Click the Run button to start the parallel version. No additional setup is required
once the solver starts.

i The first time that you try to run FLUENT in parallel, a separate Command
Prompt will open prompting you to verify the current Windows account
that you are logged into. Press the <Enter> key if the account is correct.
If you have a new account password, enter your password and press
the <Enter> key, then verify your password and press the <Enter> key.
Once the username and password have been verified and encrypted into
the Windows Registry, then FLUENT parallel will launch.


31.2.3 Starting Parallel FLUENT with the Fluent Launcher


The Fluent Launcher (Figure 31.2.2) is a stand-alone Windows application that allows
you to launch FLUENT jobs from a computer with a Windows operating system to a
cluster of computers. Settings made in the Fluent Launcher panel (Figure 31.2.2) are
used to create a FLUENT parallel command. This command will then be distributed to
your network where typically another application may manage the session(s).
You can create a shortcut on your desktop pointing to the Fluent Launcher executable at

FLUENT_INC\fluent6.x\launcher\launcher.exe

where FLUENT_INC is the root path to where FLUENT is installed (i.e., usually the
value of the FLUENT_INC environment variable) and x indicates the release version of FLUENT.

Figure 31.2.2: The Fluent Launcher Panel



The Fluent Launcher allows you to perform the following:

• Set options for your FLUENT executable, such as indicating a specific release or a
version number.

• Set parallel options, such as indicating the number of parallel processes (or if you
want to run a serial process), and an MPI type to use for parallel computations.

• Set additional options such as specifying the name and location of the current
working folder or a journal file.

When you are ready to launch your serial or parallel application, you can check the valid-
ity of the settings using the Check button (messages are displayed in the Log Information
window). When you are satisfied with the settings, click the Launch button to start the
parallel processes.
To return to your default settings for the Fluent Launcher, based on your current FLUENT
installation, click the Default button. The fields in the Fluent Launcher panel will return
to their original settings.
When you are finished using the Fluent Launcher, click the Close button. Any settings
that you have made in the panel are preserved when you re-open the Fluent Launcher.


Setting the Path to FLUENT


You need to specify where FLUENT is installed on your system using the
Fluent.Inc Path field, or click ... to browse through your directory structure to locate the
installation folder (using the UNC path if applicable). Once set, various fields in
the Fluent Launcher (e.g., Release, MPI Types, etc.) are automatically populated with the
available options, depending on the FLUENT installations that are available.

Setting Executable Options with the Fluent Launcher


Under Executable Options, you can indicate the release number, as well as the version of
the FLUENT executable that you want to run.

Specifying a FLUENT Release

Depending on what FLUENT releases are available in the Fluent.Inc Path, you can specify
the number associated with a given release in the Release list. The list is populated with
the FLUENT release numbers that are available in the Fluent.Inc Path field.

Specifying the Version of FLUENT

You can specify the dimensionality and the precision of the FLUENT product using the
Version list. There are four possible choices: 2d, 2ddp, 3d, or 3ddp. The 2d and 3d
options provide single-precision results for two-dimensional or three-dimensional prob-
lems, respectively. The 2ddp and 3ddp options provide double-precision results for two-
dimensional or three-dimensional problems, respectively.

Setting Parallel Options with the Fluent Launcher


Under Parallel Options, you can indicate the number of FLUENT processes, the specific
computer architecture you want to run the processes on, the type of MPI, as well as a
listing of computer nodes that you want to use in the calculations.

Specifying the Number of FLUENT Processes

You can specify the number of FLUENT processes in the Number of Processes field. You
can use the drop-down list to select from pre-set values of serial, 1, 2, 4, 8, 16,
32, or 64, or you can manually enter the number into the field yourself (e.g., 3, 10, etc.).
The number of parallel processes can range from 1 to 1024. If Number of Processes is equal to
1, you might want to consider running the FLUENT job using the serial setting.



Specifying the Computer Architecture

You can specify the computer architecture using the Architecture drop-down list. De-
pending on the selected release, the available options are ntx86 and win64.

Specifying the MPI Type

You can specify the MPI to use for the parallel computations using the MPI Types field.
The list of MPI types varies depending on the selected release and the selected architec-
ture. There are several options, based on the operating system of the parallel cluster.
For more information about the available MPI types, see Tables 31.2.1-31.2.2.

Specifying the List of Machines to Run FLUENT

Specify the hosts file using the Machine List or File field. You can use the ... button to
browse for a hosts file, or you can enter the machine names directly into the text field.
Machine names can be separated either by a comma or a space.

Setting Additional Options with the Fluent Launcher


Under Additional Options, you can specify a working folder and/or a journal file. In
addition, you can specify whether to use the Microsoft Scheduler or whether to use
benchmarking options.

Specifying the Working Folder

You can specify the path of your current working directory using the Working Folder field
or click ... to browse through your directory structure. Note that a UNC path cannot be
set as a working folder.

Specifying a Journal File

You can specify the path and name of a journal file using the Journal File field or click
... to browse through your directory structure to locate the file. Using the journal file,
you can automatically load the case, compile any user-defined functions, iterate until the
solution converges, and write results to an output file.
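
As an illustration only, a minimal journal file might contain text-interface commands
such as the following (the case and data file names are hypothetical, and the commands
you actually need will depend on your model setup):

; read the case, initialize, iterate, and save the data
/file/read-case mycase.cas
/solve/initialize/initialize-flow
/solve/iterate 200
/file/write-data mycase.dat
exit
yes

The final yes simply answers the confirmation prompt that FLUENT may issue when
exiting.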


Specifying Whether or Not to Use the Microsoft Job Scheduler (win64 MS MPI Only)

For the Windows 64-bit MS MPI only, you can specify that you want to use the Mi-
crosoft Job Scheduler (see Section 31.2.4: Starting Parallel FLUENT with the Microsoft
Job Scheduler (win64 Only)) by selecting the Use Microsoft Scheduler check box. Once
selected, you can then enter a machine name in the with Head Node text field. If you
are running FLUENT on the head node, then you can keep the field empty. This op-
tion translates into the proper parallel command line syntax for using the Microsoft Job
Scheduler.

Specifying Whether or Not to Use the Fluent Launcher for Benchmarking

If you are creating benchmark cases using parallel FLUENT, you can enable the Benchmark
check box. This option involves having several benchmarking-related files available on
your machine. If you are missing any of the files, the Fluent Launcher informs you of
which files you need and how to locate them.

Fluent Launcher Example


The Fluent Launcher takes the options that you have specified and uses those settings to
create a FLUENT parallel command. This command (displayed in the Log Information
window) will then be distributed to your network where typically another application
may manage the session(s).
For example, suppose that, in the Fluent Launcher panel, you specified your Fluent.Inc Path to be
\\my_computer\Fluent.Inc and, under Executable Options, you selected 6.3.20 for the
Release and 3d for the Version. Then, under Parallel Options, you selected 2 for the
number of Processes, win64 for the Architecture, selected mpich2 in the MPI Types field, and
entered the location of a Z:\fluent.hosts file in the Machine List or File field. If
you click the Check button, the command is displayed in the Log Information window.
When you click the Launch button, the Fluent Launcher would then generate the following
parallel command:
\\my_computer\Fluent.Inc\ntbin\win64\fluent.exe 3d -r6.3.20 -t2 -mpi=mpich2
-cnf=Z:\fluent.hosts -awin64



31.2.4 Starting Parallel FLUENT with the Microsoft Job Scheduler (win64
Only)
The Microsoft Job Scheduler allows you to manage multiple jobs and tasks, allocate
computer resources, send tasks to compute nodes, and monitor jobs, tasks, and compute
nodes.
FLUENT currently supports Windows XP as well as the Windows Server operating sys-
tem (win64 only). The Windows Server operating system includes a “compute cluster
package” (CCP) that combines the Microsoft MPI type (msmpi) and Microsoft Job Sched-
uler. FLUENT provides a means of using the Microsoft Job Scheduler using the following
flag in the parallel command:
-ccp head-node-name
where -ccp indicates the use of the compute cluster package, and head-node-name indi-
cates the name of the head node of the computer cluster.
For example, if you want to use the Job Scheduler, the corresponding command syntax
would be:
fluent 3d -t2 -ccp head-node-name
Likewise, if you do not want to use the Job Scheduler, the following command syntax
can be used with msmpi:
fluent 3d -t2 -pmsmpi -cnf=host

i The first time that you try to run FLUENT in parallel, a separate Command
Prompt will open prompting you to verify the current Windows account
that you are logged into. If you have a new account password, enter your
password and press the <Enter> key. If you want FLUENT to remember
your password on this machine, press the Y key and press the <Enter> key.
Once the username and password have been verified and encrypted into
the Windows Registry, then FLUENT parallel will launch.

i If you do not want to use the Microsoft Job Scheduler, but you still want
to use msmpi, you will need to stop the Microsoft Compute Cluster MPI
Service through the Control Panel, and you need to start your own version
of SMPD (the process manager for msmpi on Windows) using the following
command on each host on which you want to run FLUENT:
start smpd -d 0


31.3 Starting Parallel FLUENT on a Linux/UNIX System


You can run FLUENT on a Linux/UNIX system using either command line options or
the graphical user interface.
Information about starting FLUENT on a Linux/UNIX system is provided in the following
sections:

• Section 31.3.1: Starting Parallel FLUENT on a Linux/UNIX System Using Com-
mand Line Options

• Section 31.3.2: Starting Parallel FLUENT on a Linux/UNIX System Using the
Graphical User Interface

• Section 31.3.3: Setting Up Your Remote Shell and Secure Shell Clients

31.3.1 Starting Parallel FLUENT on a Linux/UNIX System Using Command Line Options

To start the parallel version of FLUENT using command line options, you can use the
following syntax in a command prompt window:
fluent version -tnprocs [-pinterconnect] [-mpi=mpi_type] -cnf=hosts_file
where

• version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp,
or 3ddp).

• -pinterconnect (optional) specifies the type of interconnect. The ethernet inter-
connect is used by default if the option is not explicitly specified. See Table 31.3.1,
Table 31.3.2, and Table 31.3.3 for more information.

• -mpi=mpi_type (optional) specifies the type of MPI. If the option is not specified,
the default MPI for the given interconnect will be used (the use of the default MPI
is recommended). The available MPIs for Linux/UNIX are shown in Table 31.3.2.

• -cnf=hosts_file specifies the hosts file, which contains a list of the computers on
which you want to run the parallel job. If the hosts file is not located in the
directory where you are typing the startup command, you will need to supply the
full pathname to the file.
You can use a plain text editor to create the hosts file. The only restriction on
the filename is that there should be no spaces in it. For example, hosts.txt is an
acceptable hosts file name, but my hosts.txt is not.



Your hosts file (e.g., hosts.txt) might contain the following entries:

computer1
computer2

i The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once.
For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the
hosts.txt file should list computer1 twice:

computer1
computer1
computer2

• -tnprocs specifies the number of processes to use. When the -cnf option is present,
the hosts file argument is used to determine which computers to use for the parallel
job. For example, if there are 10 computers listed in the hosts file and you want
to run a job with 5 processes, set nprocs to 5 (i.e., -t5) and FLUENT will use the
first 5 machines listed in the hosts file.

For example, to use the Myrinet interconnect, and to start the 3D solver with 4 compute
nodes on the machines defined in the text file called fluent.hosts, you can enter the
following in the command prompt:
fluent 3d -t4 -pmyrinet -cnf=fluent.hosts
Note that if the optional -cnf=hosts_file is specified, a compute node will be spawned
on each machine listed in the file hosts_file. (If you enter this optional argument, do not
include the square brackets.)
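
As another hedged illustration, on an lnamd64 cluster with an Infiniband interconnect
(the hosts file name is a placeholder), you could start the double-precision 3D solver
with 8 compute nodes as follows, letting FLUENT pick the default MPI for that
interconnect:

fluent 3ddp -t8 -pinfiniband -cnf=fluent.hosts
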
The supported interconnects for parallel Linux/UNIX machines are listed below (Ta-
ble 31.3.1, Table 31.3.2, and Table 31.3.3), along with their associated communication
libraries, the corresponding syntax, and the supported architectures:


Table 31.3.1: Supported Interconnects for Linux/UNIX Platforms (Per Platform)

Platform   Processor        Architecture      Interconnects/Systems*

Linux      32-bit           lnx86             ethernet (default), infiniband,
                                              myrinet
           64-bit           lnamd64           ethernet (default), infiniband,
                                              myrinet, crayx
           64-bit Itanium   lnia64            ethernet (default), infiniband,
                                              myrinet, altix
Sun        32-bit           ultra             vendor** (default), ethernet
           64-bit           ultra_64          vendor** (default), ethernet
SGI        32-bit           irix65_mips4      vendor** (default), ethernet
           64-bit           irix65_mips4_64   vendor** (default), ethernet
HP         32-bit           hpux11            vendor** (default), ethernet
           64-bit PA-RISC   hpux11_64         vendor** (default), ethernet
           64-bit Itanium   hpux11_ia64       vendor** (default), ethernet
IBM        32-bit           aix51             vendor** (default), ethernet
           64-bit           aix51_64          vendor** (default), ethernet

(*) Node processes on the same machine communicate by shared memory.


(**) vendor indicates a proprietary vendor interconnect. The specific proprietary interconnects that are
supported are dictated by those that the vendor’s MPI supports.



Table 31.3.2: Available MPIs for Linux/UNIX Platforms

MPI       Syntax (flag)    Communication Library   Notes

hp        -mpi=hp          HP MPI                  General purpose for SMPs
                                                   and clusters
intel     -mpi=intel       Intel MPI               General purpose for SMPs
                                                   and clusters
mpich2    -mpi=mpich2      MPICH2                  MPI-2 implementation from
                                                   Argonne National
                                                   Laboratory. For both SMPs
                                                   and Ethernet clusters
mpich     -mpi=mpich       MPICH1                  Legacy MPI from Argonne
                                                   National Laboratory
mpichmx   -mpi=mpichmx     MPICH-MX                Only for Myrinet MX
                                                   clusters
mvapich   -mpi=mvapich     MVAPICH                 Only for Infiniband clusters
sgi       -mpi=sgi         SGI MPI for Altix       Only for SGI Altix systems
                                                   (SMP); must start FLUENT
                                                   on a system where parallel
                                                   node processes are to run
cray      -mpi=cray        Cray MPI for XD1        Only for Cray XD1 systems
vendor    -mpi=vendor      Vendor MPI
net       -mpi=net         socket


Table 31.3.3: Supported MPIs for Linux/UNIX Architectures (Per Interconnect)

Architecture      Ethernet            Myrinet         Infiniband      Proprietary Systems

lnx86             hp (default),       hp              hp              -
                  mpich2, net
lnamd64           hp (default),       hp (default),   hp (default),   cray [for -pcrayx]
                  intel, net          mpich-mx        intel,
                                                      mvapich
lnia64            hp (default),       hp              hp (default),   sgi [for -paltix]
                  intel, net                          intel
aix51_64          vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
hpux11_64         vendor (default),   -               -               vendor [for -pvendor]
                  net
hpux11_ia64       vendor (default),   -               -               vendor [for -pvendor]
                  net
irix65_mips4_64   vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
ultra_64          vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
aix51             vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
hpux11            vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
irix65_mips4      vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net
ultra             vendor (default),   -               -               vendor [for -pvendor]
                  mpich, net



31.3.2 Starting Parallel FLUENT on a Linux/UNIX System Using the Graphical User Interface

To run parallel FLUENT using the graphical user interface, type the usual startup com-
mand without a version (i.e., fluent), and then use the Select Solver panel (Figure 31.3.1)
to specify the parallel architecture and version information.
File −→Run...

Figure 31.3.1: The Select Solver Panel


Perform the following steps:

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning
the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the interconnect or system in the Interconnect drop-down list.
The Default setting is recommended, because it selects the interconnect that should
provide the best overall parallel performance for your dedicated parallel machine.
For a symmetric multi-processor (SMP) system, the Default setting uses shared
memory for communication.
If you prefer to select a specific interconnect, you can choose either Ethernet/Shared
Memory MPI, Myrinet, Infiniband, Altix, Cray, or Ethernet via sockets. For more infor-
mation about these interconnects, see Table 31.3.1, Table 31.3.2, and Table 31.3.3.

3. Set the number of CPUs in the Processes field.

4. (optional) Specify the name of a file containing a list of machines, one per line, in
the Hosts File field.

5. Click the Run button to start the parallel version. No additional setup is required
once the solver starts.

31.3.3 Setting Up Your Remote Shell and Secure Shell Clients


For cluster computing on Linux or UNIX systems, most parallel versions of FLUENT will
need the user account set up such that you can connect to all nodes on the cluster (using
either the remote shell (rsh) client or the secure shell (ssh) client) without having to
enter a password each time for each machine.
Provided that the appropriate server daemons (either rshd or sshd) are running, this
section briefly describes how you can configure your system in order to use FLUENT for
parallel computing.



Configuring the rsh Client


The remote shell client (rsh) is widely deployed and used. It is generally easy to con-
figure, and involves adding all the machine names, each on a separate line, to the .rhosts
file in your home directory.
If you refer to the machine you are currently logged on to as the ‘client’, and if you refer
to the remote machine to which you seek password-less login as the ‘server’, then on
the server, you can add the name of your client machine to the .rhosts file. The name
could be a local name or a fully qualified name with the domain suffix. Similarly, you can
add other clients from which you require similar access to this server. These machines
are then “trusted” and remote access is allowed without the further need for a password.
This setup assumes you have the same userid on all the machines. Otherwise, each line
in the .rhosts file would need to contain the machine name as well as the userid for the
client that you want access to. Please refer to your system documentation for further
usage options.
Note that for security purposes, the .rhosts file must be readable only by the user.
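
As a brief sketch (the machine names below are placeholders, and this assumes the same
userid on all machines), a .rhosts file on the server might contain the following entries,
with its permissions restricted to the user:

computer1
computer2

% chmod 600 ~/.rhosts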

Configuring the ssh Client


The secure shell client (ssh) is a more secure alternative to rsh and is also used widely.
Depending on the specific protocol and the version deployed, configuration involves a few
steps. SSH1 and SSH2 are two current protocols. OpenSSH is an open implementation of
the SSH2 protocol and is backwards compatible with the SSH1 protocol. To add a client
machine, with respect to user configuration, the following steps are involved:

1. Generate a public-private key pair using ssh-keygen (or using a graphical user
interface client). For example:

% ssh-keygen -t dsa

which creates a Digital Signature Algorithm (DSA) type key pair.

2. Place your public key on the remote host.


• For SSH1, insert the contents of the client (~/.ssh/identity.pub) into the
server (~/.ssh/authorized_keys).
• For SSH2, insert the contents of the client (~/.ssh/id_dsa.pub) into the server
(~/.ssh/authorized_keys2).
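
For example, one common way to append the client's public key to the server's autho-
rized keys file is shown below. This is only a sketch, assuming an OpenSSH client using
the SSH2 protocol; the user and host names are placeholders:

% cat ~/.ssh/id_dsa.pub | ssh myuserid@server "cat >> ~/.ssh/authorized_keys2"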

The client machine is now added to the access list and the user is no longer required to
type in a password each time. For additional information, consult your system adminis-
trator or refer to your system documentation.


31.4 Checking Network Connectivity


For any compute node, you can print network connectivity information that includes the
hostname, architecture, process ID, and ID of the selected compute node and all machines
connected to it. The ID of the selected compute node is marked with an asterisk.
The ID for the FLUENT host process is always host. The compute nodes are numbered
sequentially starting from node-0. All compute nodes are completely connected. In
addition, compute node 0 is connected to the host process.
To obtain connectivity information for a compute node, you can use the Parallel Connec-
tivity panel (Figure 31.4.1).
Parallel −→Show Connectivity...

Figure 31.4.1: The Parallel Connectivity Panel

Indicate the compute node ID for which connectivity information is desired in the Com-
pute Node field, and then click the Print button. Sample output for compute node 0 is
shown below:

------------------------------------------------------------------------------
ID Comm. Hostname O.S. PID Mach ID HW ID Name
------------------------------------------------------------------------------
host net balin Linux-32 17272 0 7 Fluent Host
n3 hp balin Linux-32 17307 1 10 Fluent Node
n2 hp filio Linux-32 17306 0 -1 Fluent Node
n1 hp bofur Linux-32 17305 0 1 Fluent Node
n0* hp balin Linux-32 17273 2 11 Fluent Node

O.S. is the architecture, Comm. is the communication library (i.e., MPI type), PID is the
process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific
to the interconnect used.



31.5 Partitioning the Grid


Information about grid partitioning is provided in the following sections:

• Section 31.5.1: Overview of Grid Partitioning

• Section 31.5.2: Preparing Hexcore Meshes for Partitioning

• Section 31.5.3: Partitioning the Grid Automatically

• Section 31.5.4: Partitioning the Grid Manually

• Section 31.5.5: Grid Partitioning Methods

• Section 31.5.6: Checking the Partitions

• Section 31.5.7: Load Distribution

31.5.1 Overview of Grid Partitioning


When you use the parallel solver in FLUENT, you need to partition or subdivide the grid
into groups of cells that can be solved on separate processors (see Figure 31.5.1). You
can either use the automatic partitioning algorithms when reading an unpartitioned grid
into the parallel solver (recommended approach, described in Section 31.5.3: Partitioning
the Grid Automatically), or perform the partitioning yourself in the serial solver or after
reading a mesh into the parallel solver (as described in Section 31.5.4: Partitioning the
Grid Manually). In either case, the available partitioning methods are those described
in Section 31.5.5: Grid Partitioning Methods. You can partition the grid before or after
you set up the problem (by defining models, boundary conditions, etc.), although it is
better to partition after the setup, due to some model dependencies (e.g., adaption on
non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

i If your case file contains sliding meshes, or non-conformal interfaces on
which you plan to perform adaption during the calculation, you will have
to partition it in the serial solver. See Sections 31.5.3 and 31.5.4 for more
information.

i If your case file contains a mesh generated by the GAMBIT Hex Core mesh-
ing scheme or the TGrid Mesh/Hexcore menu option (hexcore mesh), you
must filter the mesh using the tpoly utility or TGrid prior to partitioning
the grid. See Section 31.5.2: Preparing Hexcore Meshes for Partitioning
for more information.
Note that the relative distribution of cells among compute nodes will be maintained
during grid adaption, except if non-conformal interfaces are present, so repartitioning
after adaption is not required. See Section 31.5.7: Load Distribution for more information.


If you use the serial solver to set up the problem before partitioning, the machine on
which you perform this task must have enough memory to read in the grid. If your
grid is too large to be read into the serial solver, you can read the unpartitioned grid
directly into the parallel solver (using the memory available in all the defined hosts)
and have it automatically partitioned. In this case you will set up the problem after an
initial partition has been made. You will then be able to manually repartition the case
if necessary. See Sections 31.5.3 and 31.5.4 for additional details and limitations, and
Section 31.5.6: Checking the Partitions for details about checking the partitions.

Figure 31.5.1: Partitioning the Grid



31.5.2 Preparing Hexcore Meshes for Partitioning


If you generate meshes using the GAMBIT Hex Core meshing scheme or the TGrid Mesh/Hexcore
menu option (hexcore meshes), you often have features that can interfere with partition-
ing. Such features include hanging nodes and overlapping parent-child faces, and are
located at the transition between the core of hexahedral cells and the surrounding body-
fitted mesh. To remove these features before you partition your hexcore meshes, you
must convert the transitional hexahedral cells into polyhedra. The dimensions of each
of these transitional cells remain the same after conversion, but these transitional cells
will have more than the original 6 faces. The conversion to polyhedra must take place
prior to reading the mesh into FLUENT, and can be done using either the tpoly utility
or TGrid.
When you use the tpoly utility, you must specify an input case file that contains a
hexcore mesh. This file can be in either ASCII or binary format, and the file should be
unzipped. If the input file does not contain a hexcore mesh, then none of the cells are
converted to polyhedra. When you use the tpoly utility, you should specify an output
case file name. Once the input file has been processed by the tpoly filter, an ASCII
output file is generated.

i The output case file resulting from a tpoly conversion only contains mesh
information. None of the solver-related data of the input file is retained.
To convert a file using the tpoly filter, before starting FLUENT, type the following:

utility tpoly input_filename output_filename
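
For instance, assuming a hexcore case named hexcore.cas (a hypothetical file name),
the conversion command would be:

utility tpoly hexcore.cas hexcore_poly.cas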

You can also use TGrid to convert the transitional cells to polyhedra. You must either
read in or create the hexcore mesh in TGrid, and then save the mesh as a case file with
polyhedra. To do this, use the File/Write/Case... menu option, being sure to enable the
Write As Polyhedra option in the Select File dialog box.

Limitations
Converted hexcore meshes have the following limitations:

• The following grid manipulation tools are not available on polyhedral meshes:
– extrude-face-zone under the modify-zone option
– fuse
– skewness smoothing
– swapping (will not affect polyhedral cells)
• The polyhedral cells that result from the conversion are not eligible for adaption.
For more information about adaption, see Chapter 26: Adapting the Grid.


31.5.3 Partitioning the Grid Automatically


For automatic grid partitioning, you can select the bisection method and other options for
creating the grid partitions before reading a case file into the parallel version of the solver.
For some of the methods, you can perform pretesting to ensure that the best possible
partition is performed. See Section 31.5.5: Grid Partitioning Methods for information
about the partitioning methods available in FLUENT.
Note that if your case file contains sliding meshes, or non-conformal interfaces on which
you plan to perform adaption during the calculation, you will need to partition it in the
serial solver, and then read it into the parallel solver, with the Case File option turned
on in the Auto Partition Grid panel (the default setting).
The procedure for partitioning automatically in the parallel solver is as follows:

1. (optional) Set the partitioning parameters in the Auto Partition Grid panel (Fig-
ure 31.5.2).
Parallel −→Auto Partition...

Figure 31.5.2: The Auto Partition Grid Panel

If you are reading in a mesh file or a case file for which no partition information is
available, and you keep the Case File option turned on, FLUENT will partition the
grid using the method displayed in the Method drop-down list.
If you want to specify the partitioning method and associated options yourself, the
procedure is as follows:
(a) Turn off the Case File option. The other options in the panel will become
available.
(b) Select the bisection method in the Method drop-down list. The choices are
the techniques described in Section 31.5.5: Bisection Methods.
(c) You can choose to independently apply partitioning to each cell zone, or you
can allow partitions to cross zone boundaries using the Across Zones check
button. It is recommended that you not partition cell zones independently



(by turning off the Across Zones check button) unless cells in different zones
will require significantly different amounts of computation during the solution
phase (e.g., if the domain contains both solid and fluid zones).
(d) If you have chosen the Principal Axes or Cartesian Axes method, you can improve
the partitioning by enabling the automatic testing of the different bisection
directions before the actual partitioning occurs. To use pretesting, turn on
the Pre-Test option. Pretesting is described in Section 31.5.5: Pretesting.
(e) Click OK.
If you have a case file where you have already partitioned the grid, and the number
of partitions divides evenly into the number of compute nodes, you can keep the
default selection of Case File in the Auto Partition Grid panel. This instructs FLUENT
to use the partitions in the case file.

2. Read the case file.


File −→ Read −→Case...

Reporting During Auto Partitioning


As the grid is automatically partitioned, some information about the partitioning process
will be printed in the text (console) window. If you want additional information, you can
print a report from the Partition Grid panel after the partitioning is completed.
Parallel −→Partition...
When you click the Print Active Partitions or Print Stored Partitions button in the Partition
Grid panel, FLUENT will print the partition ID, number of cells, faces, and interfaces, and
the ratio of interfaces to faces for each active or stored partition in the console window.
In addition, it will print the minimum and maximum cell, face, interface, and face-
ratio variations. See Section 31.5.6: Interpreting Partition Statistics for details. You can
examine the partitions graphically by following the directions in Section 31.5.6: Checking
the Partitions.

31.5.4 Partitioning the Grid Manually


Automatic partitioning in the parallel solver (described in Section 31.5.3: Partitioning
the Grid Automatically) is the recommended approach to grid partitioning, but it is
also possible to partition the grid manually in either the serial solver or the parallel
solver. After automatic or manual partitioning, you will be able to inspect the partitions
created (see Section 31.5.6: Checking the Partitions) and optionally repartition the grid,
if necessary. Again, you can do so within the serial or the parallel solver, using the
Partition Grid panel. A partitioned grid may also be used in the serial solver without any
loss in performance.


Guidelines for Partitioning the Grid


The following steps are recommended for partitioning a grid manually:

1. Partition the grid using the default bisection method (Principal Axes) and optimiza-
tion (Smooth).

2. Examine the partition statistics, which are described in Section 31.5.6: Interpret-
ing Partition Statistics. Your aim is to achieve small values of Interface ratio
variation and Global interface ratio while maintaining a balanced load (Cell
variation). If the statistics are not acceptable, try one of the other bisection meth-
ods.

3. Once you determine the best bisection method for your problem, you can turn on
Pre-Test (see Section 31.5.5: Pretesting) to improve it further, if desired.

4. You can also improve the partitioning using the Merge optimization, if desired.

Instructions for manual partitioning are provided below.

Using the Partition Grid Panel


For grid partitioning, you need to select the bisection method for creating the grid par-
titions, set the number of partitions, select the zones and/or registers, and choose the
optimizations to be used. For some methods, you can also perform pretesting to ensure
that the best possible bisection is performed. Once you have set all the parameters in the
Partition Grid panel to your satisfaction, click the Partition button to subdivide the grid
into the selected number of partitions using the prescribed method and optimization(s).
See above for recommended partitioning strategies.
You can set the relevant inputs in the Partition Grid panel (Figure 31.5.3 in the parallel
solver, or Figure 31.5.4 in the serial solver) in the following manner:
Parallel −→Partition...

1. Select the bisection method in the Method drop-down list. The choices are the
techniques described in Section 31.5.5: Bisection Methods.

2. Set the desired number of grid partitions in the Number integer number field. You
can use the counter arrows to increase or decrease the value, instead of typing in
the box. The number of grid partitions must be an integral multiple of the number
of processors available for parallel computing.



Figure 31.5.3: The Partition Grid Panel in the Parallel Solver

Figure 31.5.4: The Partition Grid Panel in the Serial Solver


3. You can choose to independently apply partitioning to each cell zone, or you can
allow partitions to cross zone boundaries using the Across Zones check button. It is
recommended that you not partition cell zones independently (by turning off the
Across Zones check button) unless cells in different zones will require significantly
different amounts of computation during the solution phase (e.g., if the domain
contains both solid and fluid zones).

4. You can select Encapsulate Grid Interfaces if you would like the cells surrounding
all non-conformal grid interfaces in your mesh to reside in a single partition at all
times during the calculation. If your case file contains non-conformal interfaces
on which you plan to perform adaption during the calculation, you will have to
partition it in the serial solver, with the Encapsulate Grid Interfaces and Encapsulate
for Adaption options turned on.

5. If you have enabled the Encapsulate Grid Interfaces option in the serial solver, the
Encapsulate for Adaption option will also be available. When you select this op-
tion, additional layers of cells are encapsulated such that transfer of cells will be
unnecessary during parallel adaption.

6. You can activate and control the desired optimization methods (described in Sec-
tion 31.5.5: Optimizations) using the items under Optimizations. You can activate
the Merge and Smooth schemes by turning on the Do check button next to each
one. For each scheme, you can also set the number of Iterations. Each optimization
scheme will be applied until appropriate criteria are met, or the maximum number
of iterations has been executed. If the Iterations counter is set to 0, the optimization
scheme will be applied until completion, without limit on the maximum number of
iterations.

7. If you have chosen the Principal Axes or Cartesian Axes method, you can improve the
partitioning by enabling the automatic testing of the different bisection directions
before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option.
Pretesting is described in Section 31.5.5: Pretesting.

8. In the Zones and/or Registers lists, select the zone(s) and/or register(s) for which
you want to partition. For most cases, you will select all Zones (the default) to
partition the entire domain. See below for details.

9. You can assign selected Zones and/or Registers to a specific partition ID by entering
a value for the Set Selected Zones and Registers to Partition ID. For example, if the
Number of partitions for your grid is 2, then you can only use IDs of 0 or 1. If
you have three partitions, then you can enter IDs of 0, 1, or 2. This can be useful
in situations where the gradient at a region is known to be high. In such cases,
you can mark the region or zone and set the marked cells to one of the partition
IDs, thus preventing the partition from going through that region. This in turn
will facilitate convergence. This is also useful in cases where mesh manipulation



tools are not available in parallel. In this case, you can assign the related cells to
a particular ID so that the grid manipulation tools are now functional.
If you are running the parallel solver, and you have marked your region and assigned
an ID to the selected Zones and/or Registers, click the Use Stored Partitions button
to make the new partitions valid.
Refer to the example described later in this section for a demonstration of how
selected registers are assigned to a partition.

10. Click the Partition button to partition the grid.

11. If you decide that the new partitions are better than the previous ones (if the grid
was already partitioned), click the Use Stored Partitions button to make the newly
stored cell partitions the active cell partitions. The active cell partition is used for
the current calculation, while the stored cell partition (the last partition performed)
is used when you save a case file.

12. When using the dynamic mesh model in your parallel simulations, the Partition
panel includes an Auto Repartition option and a Repartition Interval setting. These
parallel partitioning options are provided because FLUENT migrates cells when
local remeshing and smoothing is performed. Therefore, the partition interface be-
comes very wrinkled and the load balance may deteriorate. By default, the Auto
Repartition option is selected, where a percentage of interface faces and loads are au-
tomatically traced. When this option is selected, FLUENT automatically determines
the most appropriate repartition interval based on various simulation parameters.
Sometimes, using the Auto Repartition option provides insufficient results; therefore,
the Repartition Interval setting can be used. The Repartition Interval setting lets you
specify the interval (in time steps or iterations, respectively) at which a repartition
is enforced. When repartitioning is not desired, you can set the Repartition
Interval to zero.

i Note that when dynamic meshes and local remeshing are utilized, updated
meshes may be slightly different in parallel FLUENT (when compared to
serial FLUENT or when compared to a parallel solution created with a
different number of compute nodes), resulting in very small differences in
the solutions.


Example of Setting Selected Registers to Specified Partition IDs


1. Start FLUENT in parallel. The case in this example was partitioned across two
nodes.

2. Read in your case.

3. Display the grid with the Partitions option enabled in the Display Grid panel (Fig-
ure 31.5.5).

Figure 31.5.5: The Partitioned Grid

4. Mark the cells in the region of interest for adaption (see Section 26.7.3: Performing Region
Adaption). This creates a register.



5. Open the Partition Grid panel.

6. Keep the Set Selected Zones and Registers to Partition ID set to 0 and click the
corresponding button. This prints the following output to the FLUENT console
window:

>> 2 Active Partitions:


----------------------------------------------------------------------
Collective Partition Statistics: Minimum Maximum Total
----------------------------------------------------------------------
Cell count 459 459 918
Mean cell count deviation 0.0% 0.0%
Partition boundary cell count 11 11 22
Partition boundary cell count ratio 2.4% 2.4% 2.4%

Face count 764 1714 2461


Mean face count deviation -38.3% 38.3%
Partition boundary face count 13 13 17
Partition boundary face count ratio 0.8% 1.7% 0.7%

Partition neighbor count 1 1


----------------------------------------------------------------------
Partition Method Principal Axes
Stored Partition Count 2
Done.


7. Click the Use Stored Partitions button to make the new partitions valid. This
migrates the partitions to the compute-nodes. The following output is then printed
to the FLUENT console window:

Migrating partitions to compute-nodes.


>> 2 Active Partitions:
P Cells I-Cells Cell Ratio Faces I-Faces Face Ratio Neighbors
0 672 24 0.036 2085 29 0.014 1
1 246 24 0.098 425 29 0.068 1

----------------------------------------------------------------------
Collective Partition Statistics: Minimum Maximum Total
----------------------------------------------------------------------
Cell count 246 672 918
Mean cell count deviation -46.4% 46.4%
Partition boundary cell count 24 24 48
Partition boundary cell count ratio 3.6% 9.8% 5.2%

Face count 425 2085 2461


Mean face count deviation -66.1% 66.1%
Partition boundary face count 29 29 49
Partition boundary face count ratio 1.4% 6.8% 2.0%

Partition neighbor count 1 1


----------------------------------------------------------------------
Partition Method Principal Axes
Stored Partition Count 2
Done.

8. Display the grid (Figure 31.5.6).

9. This time, set the Set Selected Zones and Registers to Partition ID to 1 and click the
corresponding button. This prints a report to the FLUENT console.

10. Click the Use Stored Partitions button to make the new partitions valid and to
migrate the partitions to the compute-nodes.

11. Display the grid (Figure 31.5.7). Notice now that the partition appears in a different
location as specified by your partition ID.

i Although this example demonstrates setting selected registers to specific
partition IDs in parallel, it can be similarly applied in serial.



Figure 31.5.6: The Partitioned ID Set to Zero

Figure 31.5.7: The Partitioned ID Set to 1


Partitioning Within Zones or Registers

The ability to restrict partitioning to cell zones or registers gives you the flexibility to
apply different partitioning strategies to subregions of a domain. For example, if your
geometry consists of a cylindrical plenum connected to a rectangular duct, you may
want to partition the plenum using the Cylindrical Axes method, and the duct using the
Cartesian Axes method.
If the plenum and the duct are contained in two different cell zones, you can select one
at a time and perform the desired partitioning, as described in Section 31.5.4: Using
the Partition Grid Panel. If they are not in two different cell zones, you can create a
cell register (basically a list of cells) for each region using the functions that are used
to mark cells for adaption. These functions allow you to mark cells based on physical
location, cell volume, gradient or isovalue of a particular variable, and other parameters.
See Chapter 26: Adapting the Grid for information about marking cells for adaption.
Section 26.11.1: Manipulating Adaption Registers provides information about manipu-
lating different registers to create new ones. Once you have created a register, you can
partition within it as described above.

i Note that partitioning within zones or registers is not available when Metis
is selected as the partition Method.
For dynamic mesh applications (see item 11 above), FLUENT stores the partition method
used to partition the respective zone. Therefore, if repartitioning is done, FLUENT uses
the same method that was used to partition the mesh.

Reporting During Partitioning

As the grid is partitioned, information about the partitioning process will be printed in
the text (console) window. By default, the solver will print the number of partitions
created, the number of bisections performed, the time required for the partitioning, and
the minimum and maximum cell, face, interface, and face-ratio variations. (See Sec-
tion 31.5.6: Interpreting Partition Statistics for details.) If you increase the Verbosity to
2 from the default value of 1, the partition method used, the partition ID, number of
cells, faces, and interfaces, and the ratio of interfaces to faces for each partition will also
be printed in the console window. If you decrease the Verbosity to 0, only the number of
partitions created and the time required for the partitioning will be reported.
You can request a portion of this report to be printed again after the partitioning is
completed. When you click the Print Active Partitions or Print Stored Partitions button
in the parallel solver, FLUENT will print the partition ID, number of cells, faces, and
interfaces, and the ratio of interfaces to faces for each active or stored partition in the
console window. In addition, it will print the minimum and maximum cell, face, interface,
and face-ratio variations. In the serial solver, you will obtain the same information about
the stored partition when you click Print Partitions. See Section 31.5.6: Interpreting
Partition Statistics for details.

i Recall that to make the stored cell partitions the active cell partitions you
must click the Use Stored Partitions button. The active cell partition is
used for the current calculation, while the stored cell partition (the last
partition performed) is used when you save a case file.

Resetting the Partition Parameters

If you change your mind about your partition parameter settings, you can easily return
to the default settings assigned by FLUENT by clicking on the Default button. When you
click the Default button, it will become the Reset button. The Reset button allows you
to return to the most recently saved settings (i.e., the values that were set before you
clicked on Default). After execution, the Reset button will become the Default button
again.

31.5.5 Grid Partitioning Methods


Partitioning the grid for parallel processing has three major goals:

• Create partitions with equal numbers of cells.

• Minimize the number of partition interfaces—i.e., decrease partition boundary surface area.

• Minimize the number of partition neighbors.

Balancing the partitions (equalizing the number of cells) ensures that each processor
has an equal load and that the partitions will be ready to communicate at about the
same time. Since communication between partitions can be a relatively time-consuming
process, minimizing the number of interfaces can reduce the time associated with this
data interchange. Minimizing the number of partition neighbors reduces the chances
for network and routing contentions. In addition, minimizing partition neighbors is
important on machines where the cost of initiating message passing is expensive compared
to the cost of sending longer messages. This is especially true for workstations connected
in a network.
The partitioning schemes in FLUENT use bisection algorithms to create the partitions, but
unlike other schemes which require the number of partitions to be a factor of two, these
schemes have no limitations on the number of partitions. For each available processor,
you will create the same number of partitions (i.e., the total number of partitions will be
an integral multiple of the number of processors).


Bisection Methods
The grid is partitioned using a bisection algorithm. The selected algorithm is applied to
the parent domain, and then recursively applied to the child subdomains. For example,
to divide the grid into four partitions, the solver will bisect the entire (parent) domain
into two child domains, and then repeat the bisection for each of the child domains,
yielding four partitions in total. To divide the grid into three partitions, the solver will
“bisect” the parent domain to create two partitions—one approximately twice as large
as the other—and then bisect the larger child domain again to create three partitions in
total.
The grid can be partitioned using one of the algorithms listed below. The most efficient
choice is problem-dependent, so you can try different methods until you find the one that
is best for your problem. See Section 31.5.4: Guidelines for Partitioning the Grid for
recommended partitioning strategies.

Cartesian Axes bisects the domain based on the Cartesian coordinates of the cells (see
Figure 31.5.8). It bisects the parent domain and all subsequent child subdomains
perpendicular to the coordinate direction with the longest extent of the active
domain. It is often referred to as coordinate bisection.

Cartesian Strip uses coordinate bisection but restricts all bisections to the Cartesian
direction of longest extent of the parent domain (see Figure 31.5.9). You can often
minimize the number of partition neighbors using this approach.

Cartesian X-, Y-, Z-Coordinate bisects the domain based on the selected Cartesian
coordinate. It bisects the parent domain and all subsequent child subdomains
perpendicular to the specified coordinate direction. (See Figure 31.5.9.)

Cartesian R Axes bisects the domain based on the shortest radial distance from the
cell centers to that Cartesian axis (x, y, or z) which produces the smallest interface
size. This method is available only in 3D.

Cartesian RX-, RY-, RZ-Coordinate bisects the domain based on the shortest ra-
dial distance from the cell centers to the selected Cartesian axis (x, y, or z). These
methods are available only in 3D.

Cylindrical Axes bisects the domain based on the cylindrical coordinates of the cells.
This method is available only in 3D.

Cylindrical R-, Theta-, Z-Coordinate bisects the domain based on the selected cylin-
drical coordinate. These methods are available only in 3D.

Metis uses the METIS software package for partitioning irregular graphs, developed by
Karypis and Kumar at the University of Minnesota and the Army HPC Research
Center. It uses a multilevel approach in which the vertices and edges on the fine


c Fluent Inc. September 29, 2006 31-39
Parallel Processing

graph are coalesced to form a coarse graph. The coarse graph is partitioned, and
then uncoarsened back to the original graph. During coarsening and uncoarsen-
ing, algorithms are applied to permit high-quality partitions. Detailed information
about METIS can be found in its manual [172].

i Note that when using the socket version (-pnet), the METIS partitioner
is not available. In this case, METIS partitioning can be obtained using
the partition filter, as described below.

i If you create non-conformal interfaces, and generate virtual polygonal faces,
your METIS partition can cross non-conformal interfaces by using the connectivity
of the virtual polygonal faces. This improves load balancing for the parallel
solver and minimizes communication by decreasing the number of partition
interface cells.

Polar Axes bisects the domain based on the polar coordinates of the cells (see Fig-
ure 31.5.12). This method is available only in 2D.

Polar R-Coordinate, Polar Theta-Coordinate bisects the domain based on the se-
lected polar coordinate (see Figure 31.5.12). These methods are available only in
2D.

Principal Axes bisects the domain based on a coordinate frame aligned with the prin-
cipal axes of the domain (see Figure 31.5.10). This reduces to Cartesian bisection
when the principal axes are aligned with the Cartesian axes. The algorithm is also
referred to as moment, inertial, or moment-of-inertia partitioning.
This is the default bisection method in FLUENT.

Principal Strip uses moment bisection but restricts all bisections to the principal axis
of longest extent of the parent domain (see Figure 31.5.11). You can often minimize
the number of partition neighbors using this approach.

Principal X-, Y-, Z-Coordinate bisects the domain based on the selected principal
coordinate (see Figure 31.5.11).

Spherical Axes bisects the domain based on the spherical coordinates of the cells. This
method is available only in 3D.

Spherical Rho-, Theta-, Phi-Coordinate bisects the domain based on the selected
spherical coordinate. These methods are available only in 3D.


Figure 31.5.8: Partitions Created with the Cartesian Axes Method

Figure 31.5.9: Partitions Created with the Cartesian Strip or Cartesian X-Coordinate Method



Figure 31.5.10: Partitions Created with the Principal Axes Method

Figure 31.5.11: Partitions Created with the Principal Strip or Principal X-Coordinate Method


Figure 31.5.12: Partitions Created with the Polar Axes or Polar Theta-Coordinate Method

Optimizations
Additional optimizations can be applied to improve the quality of the grid partitions.
The heuristic of bisecting perpendicular to the direction of longest domain extent is
not always the best choice for creating the smallest interface boundary. A “pre-testing”
operation (see Section 31.5.5: Pretesting) can be applied to automatically choose the best
direction before partitioning. In addition, the following iterative optimization schemes
exist:

Smooth attempts to minimize the number of partition interfaces by swapping cells
between partitions. The scheme traverses the partition boundary and gives cells to
the neighboring partition if the interface boundary surface area is decreased. (See
Figure 31.5.13.)

Merge attempts to eliminate orphan clusters from each partition. An orphan cluster is
a group of cells with the common feature that each cell within the group has at least
one face which coincides with an interface boundary. (See Figure 31.5.14.) Orphan
clusters can degrade multigrid performance and lead to large communication costs.

In general, the Smooth and Merge schemes are relatively inexpensive optimization tools.



Figure 31.5.13: The Smooth Optimization Scheme

Figure 31.5.14: The Merge Optimization Scheme


Pretesting
If you choose the Principal Axes or Cartesian Axes method, you can improve the bisection
by testing different directions before performing the actual bisection. If you choose not
to use pretesting (the default), FLUENT will perform the bisection perpendicular to the
direction of longest domain extent.
If pretesting is enabled, it will occur automatically when you click the Partition button
in the Partition Grid panel, or when you read in the grid if you are using automatic
partitioning. The bisection algorithm will test all coordinate directions and choose the
one which yields the fewest partition interfaces for the final bisection.
Note that using pretesting will increase the time required for partitioning. For 2D prob-
lems partitioning will take 3 times as long as without pretesting, and for 3D problems it
will take 4 times as long.
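For example, a 3D case whose partitioning takes 30 seconds without pretesting can be
expected to take roughly 2 minutes with pretesting enabled.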

Using the Partition Filter


As noted above, you can use the METIS partitioning method through a filter in ad-
dition to within the Auto Partition Grid and Partition Grid panels. To perform METIS
partitioning on an unpartitioned grid, use the File/Import/Partition/Metis... menu item.
File −→ Import −→ Partition −→Metis...
FLUENT will use the METIS partitioner to partition the grid, and then read the par-
titioned grid into the solver. The number of partitions will be equal to the number of
processes. You can then proceed with the model definition and solution.

i Direct import to the parallel solver through the partition filter requires
that the host machine has enough memory to run the filter for the specified
grid. If not, you will need to run the filter on a machine that does have
enough memory. You can either start the parallel solver on the machine
with enough memory and repeat the process described above, or run the
filter manually on the new machine and then read the partitioned grid into
the parallel solver on the host machine.
To manually partition a grid using the partition filter, enter the following command:

utility partition input filename partition count output filename

where input filename is the filename for the grid to be partitioned, partition count is
the number of partitions desired, and output filename is the filename for the parti-
tioned grid. You can then read the partitioned grid into the solver (using the standard
File/Read/Case... menu item) and proceed with the model definition and solution.
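For example, to split a hypothetical grid file named elbow.cas into 8 partitions (both
file names here are purely illustrative), the command would take the form:

utility partition elbow.cas 8 elbow-8part.cas

The resulting elbow-8part.cas file can then be read into the parallel solver as described
above.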



When the File/Import/Partition/Metis... menu item is used to import an unpartitioned
grid into the parallel solver, the METIS partitioner partitions the entire grid. You may
also partition each cell zone individually, using the File/Import/Partition/Metis Zone...
menu item.
File −→ Import −→ Partition −→Metis Zone...
This method can be useful for balancing the work load. For example, if a case has a
fluid zone and a solid zone, the computation in the fluid zone is more expensive than in
the solid zone, so partitioning each zone individually will result in a more balanced work
load.

31.5.6 Checking the Partitions


After partitioning a grid, you should check the partition information and examine the
partitions graphically.

Interpreting Partition Statistics


You can request a report to be printed after partitioning (either automatic or manual) is
completed. In the parallel solver, click the Print Active Partitions or Print Stored Partitions
button in the Partition Grid panel. In the serial solver, click the Print Partitions button.
FLUENT distinguishes between two cell partition schemes within a parallel problem: the
active cell partition and the stored cell partition. Initially, both are set to the cell partition
that was established upon reading the case file. If you re-partition the grid using the
Partition Grid panel, the new partition will be referred to as the stored cell partition. To
make it the active cell partition, you need to click the Use Stored Partitions button in the
Partition Grid panel. The active cell partition is used for the current calculation, while the
stored cell partition (the last partition performed) is used when you save a case file. This
distinction is made mainly to allow you to partition a case on one machine or network
of machines and solve it on a different one. Thanks to the two separate partitioning
schemes, you could use the parallel solver with a certain number of compute nodes to
subdivide a grid into an arbitrary number of partitions suitable for a different
parallel machine, save the case file, and then load it into the designated machine.
When you click Print Partitions in the serial solver, you will obtain information about the
stored partition.
The output generated by the partitioning process includes information about the recursive
subdivision and iterative optimization processes. This is followed by information about
the final partitioned grid, including the partition ID, number of cells, number of faces,
number of interface faces, ratio of interface faces to faces for each partition, number
of neighboring partitions, and cell, face, interface, neighbor, mean cell, face ratio, and
global face ratio variations. These variations are the minimum and maximum values
of the respective quantities over the present partitions. For example, in the sample
output below, partitions 0 and 3 have the minimum number of interface faces (10), and
partitions 1 and 2 have the maximum number of interface faces (19); hence the variation
is 10–19.
Your aim is to achieve small values of Interface ratio variation and Global interface
ratio while maintaining a balanced load (Cell variation).

>> Partitions:
P Cells I-Cells Cell Ratio Faces I-Faces Face Ratio Neighbors
0 134 10 0.075 217 10 0.046 1
1 137 19 0.139 222 19 0.086 2
2 134 19 0.142 218 19 0.087 2
3 137 10 0.073 223 10 0.045 1
------
Partition count = 4
Cell variation = (134 - 137)
Mean cell variation = ( -1.1% - 1.1%)
Intercell variation = (10 - 19)
Intercell ratio variation = ( 7.3% - 14.2%)
Global intercell ratio = 10.7%
Face variation = (217 - 223)
Interface variation = (10 - 19)
Interface ratio variation = ( 4.5% - 8.7%)
Global interface ratio = 3.4%
Neighbor variation = (1 - 2)

Computing connected regions; type ^C to interrupt.


Connected region count = 4

Note that partition IDs correspond directly to compute node IDs when a case file is read
into the parallel solver. When the number of partitions in a case file is larger than the
number of compute nodes, but is evenly divisible by the number of compute nodes, then
the distribution is such that partitions with IDs 0 to (M − 1) are mapped onto compute
node 0, partitions with IDs M to (2M − 1) onto compute node 1, etc., where M is equal
to the ratio of the number of partitions to the number of compute nodes.
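For example, if a case file containing 8 partitions is read into a parallel session with 2
compute nodes, then M = 4, so partitions 0 through 3 are assigned to compute node 0
and partitions 4 through 7 to compute node 1.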



Examining Partitions Graphically


To further aid interpretation of the partition information, you can draw contours of the
grid partitions, as illustrated in Figures 31.5.8–31.5.12.
Display −→Contours...
To display the active cell partition or the stored cell partition (which are described above),
select Active Cell Partition or Stored Cell Partition in the Cell Info... category of the Contours
Of drop-down list, and turn off the display of Node Values. (See Section 28.1.2: Displaying
Contours and Profiles for information about displaying contours.)

i If you have not already done so in the setup of your problem, you will need
to perform a solution initialization in order to use the Contours panel.

31.5.7 Load Distribution


If the speeds of the processors that will be used for a parallel calculation differ signifi-
cantly, you can specify a load distribution for partitioning, using the load-distribution
text command.
parallel −→ partition −→ set −→load-distribution
For example, if you will be solving on three compute nodes, and one machine is twice as
fast as the other two, then you may want to assign twice as many cells to the first machine
as to the others (i.e., a load vector of (2 1 1)). During subsequent grid partitioning,
partition 0 will end up with twice as many cells as partitions 1 and 2.
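As a rough sketch, the corresponding text-command entry for this load vector (the exact
prompt sequence may vary between FLUENT versions) would be:

> /parallel/partition/set/load-distribution (2 1 1)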
Note that for this example, you would then need to start up FLUENT such that compute
node 0 is the fast machine, since partition 0, with twice as many cells as the others, will
be mapped onto compute node 0. Alternatively, in this situation, you could enable the
load balancing feature (described in Section 31.6.2: Load Balancing) to have FLUENT
automatically attempt to discern any difference in load among the compute nodes.

i If you adapt a grid that contains non-conformal interfaces, and you want
to rebalance the load on the compute nodes, you will have to save your case
and data files after adaption, read the case and data files into the serial
solver, repartition using the Encapsulate Grid Interfaces and Encapsulate for
Adaption options in the Partition Grid panel, and save case and data files
again. You will then be able to read the manually repartitioned case and
data files into the parallel solver, and continue the solution from where you
left it.


31.6 Checking and Improving Parallel Performance


To determine how well the parallel solver is working, you can measure computation and
communication times, and the overall parallel efficiency, using the performance meter.
You can also control the amount of communication between compute nodes in order to
optimize the parallel solver, and take advantage of the automatic load balancing feature
of FLUENT.
Information about checking and improving parallel performance is provided in the fol-
lowing sections:

• Section 31.6.1: Checking Parallel Performance

• Section 31.6.2: Optimizing the Parallel Solver

31.6.1 Checking Parallel Performance


The performance meter allows you to report the wall clock time elapsed during a com-
putation, as well as message-passing statistics. Since the performance meter is always
activated, you can access the statistics by printing them after the computation is com-
pleted. To view the current statistics, use the Parallel/Timer/Usage menu item.
Parallel −→ Timer −→Usage
Performance statistics will be printed in the text window (console).
To clear the performance meter so that you can eliminate past statistics from the future
report, use the Parallel/Timer/Reset menu item.
Parallel −→ Timer −→Reset
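For example, a simple way to benchmark a fixed block of iterations is to clear the meter,
run the iterations, and then print the statistics:

Parallel −→ Timer −→Reset
Solve −→Iterate... (request the desired number of iterations)
Parallel −→ Timer −→Usage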



The following example demonstrates how the current parallel statistics are displayed in
the console window:

Performance Timer for 1 iterations on 4 compute nodes


Average wall-clock time per iteration: 4.901 sec
Global reductions per iteration: 408 ops
Global reductions time per iteration: 0.000 sec (0.0%)
Message count per iteration: 801 messages
Data transfer per iteration: 9.585 MB
LE solves per iteration: 12 solves
LE wall-clock time per iteration: 2.445 sec (49.9%)
LE global solves per iteration: 27 solves
LE global wall-clock time per iteration: 0.246 sec (5.0%)
AMG cycles per iteration: 64 cycles
Relaxation sweeps per iteration: 4160 sweeps
Relaxation exchanges per iteration: 920 exchanges

Total wall-clock time: 4.901 sec
Total CPU time: 17.030 sec

A description of the parallel statistics is as follows:

• Average wall-clock time per iteration describes the average real (wall clock)
time per iteration.
• Global reductions per iteration describes the number of global reduction op-
erations (such as variable summations over all processes). This requires communi-
cation among all processes.
A global reduction is a collective operation over all processes for the given job that
reduces a vector quantity (the length given by the number of processes or nodes) to
a scalar quantity (e.g., taking the sum or maximum of a particular quantity). The
number of global reductions cannot be calculated from any other readily known
quantities. The number is generally dependent on the algorithm being used and
the problem being solved.
• Global reductions time per iteration describes the time per iteration for the
global reduction operations.
• Message count per iteration describes the number of messages sent between all
processes per iteration. This is important with regard to communication latency,
especially on high-latency interconnects.
A message is defined as a single point-to-point, send-and-receive operation between
any two processes. (This excludes global, collective operations such as global re-
ductions.) In terms of domain decomposition, a message is passed from the process
governing one subdomain to a process governing another (usually adjacent) subdomain.
The message count per iteration is usually dependent on the algorithm being used
and the problem being solved. The message count and the number of messages
that are reported are totals for all processors.
The message count provides some insight into the impact of communication latency
on parallel performance. A higher message count indicates that the parallel perfor-
mance may be more adversely affected if a high-latency interconnect is being used.
Ethernet has a higher latency than Myrinet or Infiniband. Thus, a high message
count will more adversely affect performance with Ethernet than with Infiniband.

• Data transfer per iteration describes the amount of data communicated be-
tween processors per iteration. This is important with respect to interconnect
bandwidth.
Data transfer per iteration is usually dependent on the algorithm being used and the
problem being solved. This number generally increases with increases in problem
size, number of partitions, and physics complexity.
The data transfer per iteration may provide some insight into the impact of com-
munication bandwidth (speed) on parallel performance. The precise impact is often
difficult to quantify because it depends on many factors, including the ratio of data
transfer to computation and the ratio of communication bandwidth to CPU speed. The
unit of data transfer is a byte.

• LE solves per iteration describes the number of linear systems being solved
per iteration. This number is dependent on the physics (non-reacting versus react-
ing flow) and the algorithms (pressure-based versus density-based solver), but is
independent of mesh size. For the pressure-based solver, this is usually the number
of transport equations being solved (mass, momentum, energy, etc.).

• LE wall-clock time per iteration describes the time (wall-clock) spent doing
linear equation solvers (i.e., multigrid).

• LE global solves per iteration describes the number of solutions on the coarse
level of the AMG solver where the entire linear system has been pushed to a single
processor (n0). The system is pushed to a single processor to reduce the compu-
tation time during the solution on that level. Scaling generally is not adversely
affected because the number of unknowns is small on the coarser levels.

• LE global wall-clock time per iteration describes the time (wall-clock) per
iteration for the linear equation global solutions (see above).

• AMG cycles per iteration describes the average number of multigrid cycles (V,
W, flexible, etc.) per iteration.


c Fluent Inc. September 29, 2006 31-51
Parallel Processing

• Relaxation sweeps per iteration describes the number of relaxation sweeps (or
iterative solutions) on all levels for all equations per iteration. A relaxation sweep
is usually one iteration of Gauss-Seidel or ILU.

• Relaxation exchanges per iteration describes the number of solution communications
between processors during the relaxation process in AMG. This number
may be less than the number of sweeps because of shifting the linear system on
coarser levels to a single node/process.

• Time-step updates per iteration describes the number of sub-iterations on the
time step per iteration.

• Time-step wall-clock time per iteration describes the time per sub-iteration.

• Total wall-clock time describes the total wall-clock time.

• Total CPU time describes the total CPU time used by all processes. This does
not include any wait time for load imbalances or for communications (other than
packing and unpacking local buffers).

The most relevant quantity is the Total wall-clock time. This quantity can be used
to gauge the parallel performance (speedup and efficiency) by comparing it to the same
quantity from a serial analysis (the command line should contain -t1 in order to obtain
the statistics from a serial analysis). In lieu of a serial analysis, an approximation of the
parallel speedup may be found in the ratio of Total CPU time to Total wall-clock time.
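For example, using the sample statistics shown above, the estimated speedup on 4
compute nodes is approximately 17.030/4.901 ≈ 3.5, corresponding to a parallel efficiency
of roughly 3.5/4 ≈ 87%.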


31.6.2 Optimizing the Parallel Solver


Increasing the Report Interval
In FLUENT, you can reduce communication and improve parallel performance by increas-
ing the report interval for residual printing/plotting or other solution monitoring reports.
You can modify the value for Reporting Interval in the Iterate panel.
Solve −→Iterate...
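For example, increasing the Reporting Interval from 1 to 10 means that residuals are
collected and reported only once every 10 iterations rather than after every iteration,
reducing the associated communication.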

i Note that you will be unable to interrupt iterations until the end of each
report interval.

Load Balancing
A dynamic load balancing capability is available in FLUENT. The principal reason for
using parallel processing is to reduce the turnaround time of your simulation, ideally
by a factor proportional to the collective speed of the computing resources used. If, for
example, you were using four CPUs to solve your problem, then you would expect to
reduce the turnaround time by a factor of four. This is of course the ideal situation, and
assumes that there is very little communication needed among the CPUs, that the CPUs
are all of equal speed, and that the CPUs are dedicated to your job. In practice, this is
often not the case. For example, CPU speeds can vary if you are solving in parallel on
a cluster that includes nodes with different clock speeds, other jobs may be competing
for use of one or more of the CPUs, and network traffic either from within the parallel
solver or generated from external sources may delay some of the necessary communication
among the CPUs.
If you enable dynamic load balancing in FLUENT, the load across the computational and
networking resources will be monitored periodically. If the load balancer determines that
performance can be improved by redistributing the cells among the compute nodes, it
will automatically do so. There is a time penalty associated with load balancing itself,
and so it is disabled by default. If you will be using a dedicated homogeneous resource,
or if you are using a heterogeneous resource but have accounted for differences in CPU
speeds during partitioning by specifying a load distribution (see Section 31.5.7: Load
Distribution), then you may not need to use load balancing.

i Note that when the shell conduction model is used, you will not be able to
turn on load balancing.



To enable and control FLUENT’s automatic load balancing feature, use the Load Balance
panel (Figure 31.6.1). Load balancing will automatically detect and analyze parallel
performance, and redistribute cells between the existing compute nodes to optimize it.
Parallel −→Load Balance...

Figure 31.6.1: The Load Balance Panel

The procedure for using load balancing is as follows:

1. Turn on the Load Balancing option.

2. Select the bisection method to create new grid partitions in the Partition Method
drop-down list. The choices are the techniques described in Section 31.5.5: Bisection
Methods. As part of the automatic load balancing procedure, the grid will be
repartitioned into several small partitions using the specified method. The resulting
partitions will then be distributed among the compute nodes to achieve a more
balanced load.

3. Specify the desired Balance Interval. When a value of 0 is specified, FLUENT will
internally determine the best value to use, initially using an interval of 25 iterations.
You can override this behavior by specifying a non-zero value. FLUENT will then
attempt to perform load balancing after every N iterations, where N is the specified
Balance Interval. You should be careful to select an interval that is large enough to
outweigh the cost of performing the load balancing operations.
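For example, specifying a Balance Interval of 50 instructs FLUENT to evaluate the load
(and redistribute cells if this appears worthwhile) every 50 iterations.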


Note that you can interrupt the calculation at any time, turn the load balancing feature
off (or on), and then continue the calculation.

i If problems arise in your computations due to adaption, you can turn off
the automatic load balancing, which occurs any time that mesh adaption
is performed in parallel.
To instruct the solver to skip the load balancing step, issue the following Scheme com-
mand:

(disable-load-balance-after-adaption)

To return to the default behavior use the following Scheme command:

(enable-load-balance-after-adaption)

