
Introduction to Gompute

Table of Contents
Introduction
Commonly used commands
qstat
qrsh
qmon
qmod
qalter
qhold
qrls
Frequent Operations
Disable a node
Enable a node
Increase job priority
Decrease job priority
Suspend a job
Resume a suspended job
Additional information
Additional support provided by GomputeServer
gclicenses
Setup
Configuration
GCSGEFlexLMD
Introduction
Setup
Configuring the interface
Optional transformation commands
Developer Information
Modularized Environment
Where is GomputeServer located?
Standard components in GomputeServer
Sun Grid Engine
Modules
Java
Quick Tips
Configuration
Breakdown
Module
Definition
Constructor
Resource Calculations
Prolog Generation
Application Command Generation
Gridcore AB Orgnr: 556629-7866
Ashebergsgatan 46 Telephone: 0! !8 2! 60
4!! G"teborg
#$eden
Complete Script
Configuration Files
Root Module
Command Line
Application Mapping
Base Module
Application Module
Sections
Header
Grid Engine
Prolog
Command
Epilog
Complete script
Wrapper Script Methods
System Methods
Helper methods
mapping.cfg
gsub.cfg
Values
Zip
Tar
Tar/GZ
Precompilation
From MATLAB
Standalone
Using the Binary
Compile in Job
Plotting Crashes the VNC Session
Network error
Connection refused
Authentication error
Job running slow
Job not starting
Job in error state
Job crashing
Introduction
GomputeServer is Gridcore's pre-packaged application distribution. It ranges from basic mathematical libraries, CAD, meshing and visualization software to self-contained continuum mechanics solvers. All GomputeServer software components are bundled with the compliant environment necessary for proper execution of the software.
GomputeServer complements the usual operating systems supported by the software providers, for the time being SUSE Linux Enterprise Server and Red Hat Enterprise Linux. Some of the applications supported by GomputeServer are: all Ansys products (e.g. Fluent, Ansys FEM, CFX, etc.), Abaqus, LS-Dyna, OpenFOAM, MSC products, etc.
Queue System
Introduction
GomputeServer includes Grid Engine as the queue system for load management and optimal utilization of resources. Grid Engine itself is an open source product and provides a lot of flexibility in policy management. This section of the GomputeServer documentation highlights a few commonly used Grid Engine commands; however, users are strongly encouraged to read the man pages included with the GomputeServer installation to gain a better understanding of the queue system.
Commonly used commands
qstat
qstat can be used to display the status of jobs and queues.
qstat Displays your jobs
qstat -u "*" Displays all jobs
qstat -f Full listing of queues
man qstat Exhaustive information
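A plain qstat listing can be summarized per job state (for example r for running, qw for queued and waiting). The bash function below is a minimal sketch, assuming the default qstat layout of a header line, a dashed separator line and the state in the fifth column; count_job_states is a hypothetical helper, not part of GomputeServer.

```shell
#!/usr/bin/env bash
# Summarize `qstat` output read on stdin: print "<state> <count>" per state.
# Assumes the default layout: header line, dashed separator line, then one
# job per line with the state (r, qw, Eqw, ...) in the fifth column.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

On the cluster you would pipe into it, for example qstat -u "*" | count_job_states.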
qrsh
qrsh can be used to obtain an interactive shell scheduled through the queue system.
man qrsh Exhaustive information
qmon
Graphical administration tool for Grid Engine.
qmod
Modify queue states.
qmod -d all.q Disables all.q
qmod -e all.q Enables all.q
man qmod Exhaustive information
qalter
qalter can be used to alter job requests and priorities for jobs already in the queue. Note that ordinary users can only decrease the priority of their own jobs, while managers can increase or decrease priorities for all jobs.
man qalter Exhaustive information
qhold
qhold is used to put a queued job on hold. Holding a job in the queue prevents the scheduler from attempting to schedule the job.
This is useful when you have a lot of jobs queued and waiting to run on the cluster, and you want a job at the back of the queue to be scheduled before the jobs at the head of the queue: hold the jobs at the head and release them later with qrls. Every user can perform this operation on their own jobs, while the manager can do this for all jobs.
man qhold Exhaustive information
qrls
qrls can be used to release a job placed on hold.
man qrls Exhaustive information
Frequent Operations
Disable a node
To disable a particular node, in this case node01:
qmod -d '*.q@node01'
Enable a node
To enable a particular node, in this case node01:
qmod -e '*.q@node01'
Increase job priority
To increase the priority of a queued job with ID 146 (to be run as queue system manager):
qalter -p 100 146
Decrease job priority
To decrease the priority of a queued job with ID 146 (allowed for the job owner as well as the queue system manager):
qalter -p -100 146
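Because Grid Engine accepts job priorities only in the range -1023 to 1024, and ordinary users may only lower them, it can help to validate the value before calling qalter. This is a sketch under stated assumptions: set_job_priority and the QALTER variable are hypothetical, and by default the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `qalter -p`. By default it only prints the
# command (dry run); export QALTER=qalter to really change the job.
: "${QALTER:=echo qalter}"

set_job_priority() {
    local prio=$1 job=$2
    local int_re='^-?(0|[1-9][0-9]*)$'

    # Grid Engine priorities are integers from -1023 to 1024.
    if ! [[ $prio =~ $int_re ]] || (( prio < -1023 || prio > 1024 )); then
        echo "priority must be an integer between -1023 and 1024" >&2
        return 1
    fi
    # Job IDs are positive integers such as 146.
    if ! [[ $job =~ ^[1-9][0-9]*$ ]]; then
        echo "invalid job id: $job" >&2
        return 1
    fi
    # Unquoted on purpose so that "echo qalter" splits into two words.
    $QALTER -p "$prio" "$job"
}
```

With QALTER=qalter set, QALTER=qalter set_job_priority -100 146 runs the same command as the example above.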
Suspend a job
Suspend a job with ID 146:
qmod -s 146
Sends a suspend signal to the job, i.e. similar to kill -STOP. What happens to a suspended job depends strongly on the application, for example on how it copes with being stopped and later continued.
Resume a suspended job
Resume a job with ID 146:
qmod -us 146
Resumes a previously suspended job, i.e. sends kill -CONT to the job. As with job suspension, the behavior of the application depends on how it handles these signals.
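The signals behind qmod -s and qmod -us can be tried on an ordinary process outside the queue system. This bash sketch assumes Linux with the procps ps; it starts a throw-away sleep, stops it, continues it, and prints the process state letter after each step (T means stopped, S sleeping).

```shell
#!/usr/bin/env bash
# Print the one-letter process state (R, S, T, ...) for a PID.
proc_state() {
    ps -o stat= -p "$1" | cut -c1
}

# Stop and continue a throw-away process, as qmod -s / qmod -us would do
# to a job, and report the state after each signal.
demo_stop_cont() {
    sleep 60 >/dev/null &
    local pid=$!
    kill -STOP "$pid"
    sleep 0.5                      # let the state change settle
    echo "after STOP: $(proc_state "$pid")"
    kill -CONT "$pid"
    sleep 0.5
    echo "after CONT: $(proc_state "$pid")"
    kill "$pid"                    # clean up the demo process
    wait "$pid" 2>/dev/null || true
}
```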
Additional information
Additional information on Grid Engine can be found at
http://wikis.sun.com/display/GridEngine/Home
Additional support provided by GomputeServer
Even though the queue system sends the relevant kill signals to the jobs, it is up to each job to handle these signals efficiently during suspension and resumption. However, it is possible to invoke custom commands to trigger custom suspension, resumption and termination procedures. This is done by placing scripts named terminate, suspend or resume in the working directory of the job. These scripts, if present, are invoked with the job's PID passed as an argument.
The default procedures that are invoked are located in the directory /opt/gcdistro/gcportal/sbin.
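As an illustration of such hooks, the sketch below writes suspend, resume and terminate scripts into a directory. Everything here is hypothetical: the function name, the hooks.log file and the GRACE variable are assumptions for the example, and it assumes the hook is responsible for signalling the process itself. Real hooks would typically trigger an application-specific checkpoint or shutdown rather than only logging.

```shell
#!/usr/bin/env bash
# Hypothetical example: create suspend, resume and terminate hooks in a job's
# working directory. Each hook receives the job's PID as $1.
install_job_hooks() {
    local dir=$1

    cat > "$dir/suspend" <<'EOF'
#!/bin/sh
# Log the event, then stop the job's process.
echo "suspend $1" >> "$(dirname "$0")/hooks.log"
kill -STOP "$1"
EOF

    cat > "$dir/resume" <<'EOF'
#!/bin/sh
# Log the event, then let the job's process continue.
echo "resume $1" >> "$(dirname "$0")/hooks.log"
kill -CONT "$1"
EOF

    cat > "$dir/terminate" <<'EOF'
#!/bin/sh
# Log the event, ask the process to exit, and force it after a grace period.
# GRACE (seconds) is a hypothetical knob for this example.
echo "terminate $1" >> "$(dirname "$0")/hooks.log"
kill -CONT "$1" 2>/dev/null    # a stopped process cannot run its SIGTERM handler
kill -TERM "$1" 2>/dev/null
sleep "${GRACE:-30}"
kill -KILL "$1" 2>/dev/null
exit 0
EOF

    chmod +x "$dir/suspend" "$dir/resume" "$dir/terminate"
}
```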
License Integration
The GomputeServer license integration consists of two main parts:
gclicenses is responsible for starting the license servers.
GCSGEFlexLMD integrates the queue system with the license servers.
gclicenses
Setup
gclicenses is a generic init script provided with GomputeServer to start, stop and reload license servers. The init script is installed as /etc/init.d/gclicenses.
Configuration
The configuration for the license servers is placed directly in the init script, which provides separate sections for start/stop/restart/reload/status. Since each license server has its own method to perform these operations, only the FlexLM server details are covered in this documentation. As an example, we provide the configuration for the ANSYS license manager and the CD-adapco license manager on SLES systems. Multiple license servers can be configured, each in its own 'if' block, selected by the optional second argument to the script (for example /etc/init.d/gclicenses start ansys). Please note that the paths to the license server commands may have to be altered on your installation.
## Start section for ANSYS and CD-adapco license managers
start)
    echo "Starting gclicenses: "
    if [ -z "$2" -o "$2" == "ansys" ]; then
        echo -n " ansys "
        su - gcadmin -c '/opt/gcdistro/app/ansys/licensing/start_ansysli'
        # Remember status and be verbose
        rc_status -v
    fi
    if [ -z "$2" -o "$2" == "star" ]; then
        echo -n " star "
        su gcadmin -c '/opt/gcdistro/app/cdadapco/FLEXlm_11.0/bin/lmgrd \
            -c /opt/gcdistro/app/cdadapco/license.dat -l \
            /opt/gcdistro/app/cdadapco/license.log'
        # Remember status and be verbose
        rc_status -v
    fi
    ;;
## Stop section for ANSYS and CD-adapco license managers
stop)
    echo "Shutting down gclicenses: "
    if [ -z "$2" -o "$2" == "ansys" ]; then
        echo -n " ansys "
        su - gcadmin -c '/opt/gcdistro/app/ansys/licensing/stop_ansysli >/dev/null'
        # Remember status and be verbose
        rc_status -v
    fi
    if [ -z "$2" -o "$2" == "star" ]; then
        echo -n " star "
        su - gcadmin -c '/opt/gcdistro/app/cdadapco/FLEXlm_11.0/bin/lmutil \
            lmdown -q -c /opt/gcdistro/app/cdadapco/license.dat'
        # Remember status and be verbose
        rc_status -v
    fi
    ;;
## Reload section for ANSYS and CD-adapco license managers
reload)
    echo "Reloading gclicenses: "
    if [ -z "$2" -o "$2" == "ansys" ]; then
        echo -n " ansys "
        su - gcadmin -c '/opt/gcdistro/licensing/lmutil lmreread -c \
            /opt/gcdistro/app/ansys/licensing/license.dat'
        # Remember status and be verbose
        rc_status -v
    fi
    if [ -z "$2" -o "$2" == "star" ]; then
        echo -n " star "
        su - gcadmin -c '/opt/gcdistro/app/cdadapco/FLEXlm_11.0/bin/lmutil \
            lmreread -c /opt/gcdistro/login02/app/cdadapco/license.dat'
        # Remember status and be verbose
        rc_status -v
    fi
    ;;
## Status section for ANSYS and CD-adapco license managers
status)
    echo "Status of gclicenses: "
    if [ -z "$2" -o "$2" == "ansys" ]; then
        echo -n " ansys "
        /sbin/checkproc /opt/gcdistro/app/ansys/licensing/ansyslmd
        # NOTE: rc_status knows that we called this init script with
        # 'status' option and adapts its messages accordingly.
        rc_status -v
    fi
    if [ -z "$2" -o "$2" == "star" ]; then
        echo -n " star "
        /sbin/checkproc /opt/gcdistro/app/cdadapco/FLEXlm_11.0/bin/cdlmd
        # NOTE: rc_status knows that we called this init script with
        # 'status' option and adapts its messages accordingly.
        rc_status -v
    fi
    ;;
## Restart section for ANSYS and CD-adapco license managers
restart)
    ## Stop the service and regardless of whether it was
    ## running or not, start it again.
    "$0" stop "$2"
    sleep 5
    "$0" start "$2"
    # Remember status and be quiet
    rc_status
    ;;
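The case structure above can be exercised without any license server installed. The bash sketch below mirrors its dispatch logic, with echo standing in for the su - gcadmin -c ... commands and rc_status left out; the RESTART_DELAY variable is a hypothetical replacement for the fixed sleep. An empty second argument selects every server, just as the [ -z "$2" ... ] tests do.

```shell
#!/usr/bin/env bash
# Sketch of the gclicenses dispatch logic with echo standing in for the real
# license-server commands. Usage: gclicenses ACTION [ansys|star]
# RESTART_DELAY is a hypothetical knob replacing the fixed sleep.
gclicenses() {
    local action=$1 server=${2:-} s

    case "$action" in
        start|stop|reload|status)
            echo "$action gclicenses:"
            for s in ansys star; do
                # An empty second argument selects every configured server.
                if [ -z "$server" ] || [ "$server" = "$s" ]; then
                    echo "  $action $s"
                fi
            done
            ;;
        restart)
            gclicenses stop "$server"
            sleep "${RESTART_DELAY:-5}"
            gclicenses start "$server"
            ;;
        *)
            echo "Usage: gclicenses {start|stop|reload|status|restart} [ansys|star]" >&2
            return 1
            ;;
    esac
}
```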
GCSGEFlexLMD
Introduction
GCSGEFlexLMD is the interface between GomputeServer and FlexLM-based license servers. It makes sure that the count of available and free licenses is kept up to date within the queue system. The license features are configured automatically as consumables in the queue system. Users submitting jobs can request these consumables so that their jobs are started only when sufficient licenses are available. The consumables are requested with the standard Grid Engine resource request flag (-l); see man 5 complex for more details.
Setup
gcsgeflexlmd is located in the following places:
Executable /opt/gcdistro/gcportal/sbin/gcsgeflexlmd
Configuration /opt/gcdistro/etc/gcsgeflexlmd.conf
Start script /etc/init.d/gcsgeflexlmd [start|stop|status|restart]
Configuring the interface
gcsgeflexlmd can be configured to interface with multiple license servers. It is also possible to monitor only specific license features on each license server. These settings are made in the file /opt/gcdistro/etc/gcsgeflexlmd.conf.
A portion of the config file is presented below. These are the three lines required to configure an interface to a single license server.
FEATURES_1=""           # FlexLM features to monitor
LM_LICENSE_FILE_1=""    # details of license server
XFRM_1=""               # optional transformation commands
As an example, let us configure the interface to an ANSYS Fluent license server. Let us assume that ANSYS Fluent requires the acfd_fluent_solver and anshpc_pack license features to be available. Further, let us assume that the license server serving these features is installed on a machine named licserv and listens on port 1055.
To configure the interface, we would set up the gcsgeflexlmd config file in the following way.
FEATURES_1="acfd_fluent_solver anshpc_pack"   # FlexLM features to monitor
LM_LICENSE_FILE_1="1055@licserv"              # details of license server
XFRM_1=""                                     # optional transformation commands
Once the configuration is done, gcsgeflexlmd needs to be restarted using the init script:
/etc/init.d/gcsgeflexlmd restart
You should now see the count of available licenses automatically updated in the queue system. The qhost command can be used to check this:
qhost -F | grep acfd_fluent_solver | head -1
qhost -F | grep anshpc_pack | head -1
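To use the license count in a script rather than reading it by eye, the matching qhost -F line can be parsed. This sketch assumes global consumables are printed as lines of the form gc:NAME=VALUE (for example gc:acfd_fluent_solver=4.000000); license_count is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the value of one consumable from `qhost -F` output on stdin, as an
# integer. Assumes resource lines of the form "    gc:NAME=VALUE".
license_count() {
    awk -v want="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            n = split(line, part, "[:=]")     # prefix, name, value
            if (n >= 3 && part[2] == want) {
                printf "%d\n", part[3]
                exit
            }
        }'
}
```

Usage: qhost -F | license_count acfd_fluent_solver prints nothing if the feature is not reported.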
Optional transformation commands
For certain applications like Nastran, ANSYS Fluent, etc., different sets of license features can be used to run the application. As an example, ANSYS Fluent can use either the acfd_solver or the acfd_fluent_solver feature to start the solver. In these cases, it is easier to present both features under the same feature name in the queue system. The transformation options in the gcsgeflexlmd configuration are intended for this purpose.
To combine acfd_solver and acfd_fluent_solver and present them as acfd_fluent_solver in the queue system, an entry like the one below needs to be made in the gcsgeflexlmd configuration.
FEATURES_1="acfd_fluent_solver anshpc_pack"         # FlexLM features to monitor
LM_LICENSE_FILE_1="1055@licserv"                    # details of license server
XFRM_1="acfd_fluent:acfd_fluent_solver"             # optional transformation commands
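The effect of an oldname:newname transformation can be pictured with a small sketch: counts reported under the old name are added to the new one. The bash function below is an illustration only, not the gcsgeflexlmd implementation; it reads hypothetical "feature count" lines and applies a space-separated XFRM-style list.

```shell
#!/usr/bin/env bash
# Apply XFRM-style "old:new" renames to "feature count" lines on stdin
# and sum the counts per resulting feature name.
apply_xfrm() {
    awk -v xfrm="$1" '
        BEGIN {
            n = split(xfrm, pairs, " ")       # space separated list of old:new
            for (i = 1; i <= n; i++) {
                split(pairs[i], p, ":")
                rename[p[1]] = p[2]
            }
        }
        NF == 2 {
            name = ($1 in rename) ? rename[$1] : $1
            total[name] += $2
        }
        END { for (f in total) print f, total[f] }' | sort
}
```

With the acfd_solver/acfd_fluent_solver example from the text, 2 free acfd_solver and 3 free acfd_fluent_solver licenses would be presented as 5 acfd_fluent_solver licenses.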
Developer Information
FEATURES_1
A space-separated list of license features to track.
LM_LICENSE_FILE_1
A colon-separated list of FLEXlm servers to query.
XFRM_1
A space-separated list of feature name transformations of the form oldname:newname.
FEATURES_2, LM_LICENSE_FILE_2, XFRM_2
Same as FEATURES_1, LM_LICENSE_FILE_1 and XFRM_1. The second set of variables is used if the features reported are incompatible between some license servers.
LMSTAT
The lmstat command to use while querying servers.
PRODUCTS_1
A space-separated list of products to track.
RLM_LICENSE_1
A colon-separated list of RLM servers to query.
XFRM_P_1
A space-separated list of product name transformations of the form oldname:newname.
PRODUCTS_2, RLM_LICENSE_2, XFRM_P_2
Same as PRODUCTS_1, RLM_LICENSE_1 and XFRM_P_1. The second set of variables is used if the products reported are incompatible between some license servers.
RLMSTAT
The rlmstat command to use while querying servers.
SLEEP
The interval to sleep between updates of the feature status.
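The pairing of these variables can be checked before restarting gcsgeflexlmd. This bash sketch is hypothetical (neither the function nor its output format is part of GomputeServer); it sources a config file in a subshell and reports, for FlexLM sets 1 and 2, how many space-separated features and colon-separated servers each defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report, for FlexLM sets 1 and 2 of a
# gcsgeflexlmd-style config file, how many features and servers are defined.
summarize_flexlm_sets() (
    # The body runs in a subshell, so sourcing the config cannot leak
    # variables into the caller. Pass an absolute path.
    . "$1"
    for n in 1 2; do
        fvar="FEATURES_$n"
        svar="LM_LICENSE_FILE_$n"
        features=${!fvar}
        servers=${!svar}
        [ -n "$features" ] || continue                    # set not configured
        nfeat=$(awk '{ print NF }' <<< "$features")       # space separated
        nsrv=$(awk -F: '{ print NF }' <<< "$servers")     # colon separated
        echo "set $n: $nfeat feature(s) from $nsrv server(s)"
    done
)
```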
Setup and Configuration
Modularized Environment
GomputeServer's modularized environment is designed to have minimal impact on a production system when handling change requests.
Where is GomputeServer located?
GomputeServer is located in the directory /opt/gcdistro on the master machine of your cluster. This directory is exported to all the cluster nodes, i.e. /opt/gcdistro is the same on every node of your cluster.
Standard components in GomputeServer
The following components are standard in every installation of GomputeServer.
Sun Grid Engine
Sun Grid Engine is the default Distributed Resource Manager used in all GomputeServer installations on Linux.
Modules
Modules is an open source software package that provides for the dynamic modification of a user's environment via modulefiles. It is the system used for switching between different versions of installed software components in GomputeServer. It is important to understand how to create modulefiles for newly installed applications, so as not to corrupt the system. Please refer to "How to Install a new Application in GomputeServer" for instructions on installing new applications in GomputeServer.
Java
The Java Runtime Environment is installed under /opt/gcdistro/java. Usually the latest stable version of Java available at the time your GomputeServer was installed is placed here.
Directory Structure of GomputeServer
The directory structure of GomputeServer in a standard installation is as follows:
/opt/gcdistro
Base directory.
/opt/gcdistro/etc
Contains the configuration files for GomputeServer. Most importantly, the file /opt/gcdistro/etc/clustersettings.sh contains all the basic configuration settings required for your installation of GomputeServer.
/opt/gcdistro/app
The directory under which all the commercial applications to be used on GomputeServer are to be installed, for example Ansys Fluent, CFX, etc.
Non-commercial software can sometimes also be installed here, but always under its own subdirectory.
As an example, the Fluent installation on GomputeServer has /opt/gcdistro/app/fluent set as the FLUENT_INC directory.
/opt/gcdistro/sge
The directory where Sun Grid Engine is installed; this directory is set as $SGE_ROOT in the environment settings.
/opt/gcdistro/modules
The directory under which Modules is installed. The version installed is always the latest stable version available at the time your GomputeServer was installed.
/opt/gcdistro/modules/modulefiles
The directory that contains the modulefiles for each application installed in GomputeServer. Note that application modulefiles must always be kept in their own subdirectory, which makes it easier to add modulefiles for new versions of the same application. As an example, the modulefiles for Fluent could look something like:
/opt/gcdistro/modules/modulefiles/fluent/6.3.26
/opt/gcdistro/modules/modulefiles/fluent/6.2.16
/opt/gcdistro/modules/modulefiles/fluent/12
/opt/gcdistro/modules/modulefiles/fluent/.version
Each of the numbered files above contains the environment settings for the respective version of Fluent, and the default version is specified in the .version file.
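The layout above can be reproduced in a scratch directory to see how the pieces fit together. This bash sketch (the function name and the minimal modulefile contents are hypothetical) writes one modulefile per version and a .version file; in Environment Modules, set ModulesVersion inside .version names the version that module load picks when no version is given.

```shell
#!/usr/bin/env bash
# Create ROOT/APP/VERSION modulefiles plus a .version file selecting the
# default. Minimal content only; real modulefiles set PATH, licenses, etc.
# Usage: make_module_tree ROOT APP DEFAULT VERSION...
make_module_tree() {
    local root=$1 app=$2 default=$3 v
    shift 3
    mkdir -p "$root/$app"
    for v in "$@"; do
        printf '#%%Module1.0\nmodule-whatis "%s %s"\n' "$app" "$v" > "$root/$app/$v"
    done
    # "module load APP" without a version picks ModulesVersion from .version.
    printf '#%%Module1.0\nset ModulesVersion "%s"\n' "$default" > "$root/$app/.version"
}
```

After module use "$root", module load fluent would then resolve to the version named in .version.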
/opt/gcdistro/packages
The directory where all the software installation files for the different applications installed on GomputeServer are placed.
Using Modules
Information on Modules and on writing new modulefiles can be found at http://modules.sourceforge.net, and in man module and man 4 modulefile.
Some examples using modules:
module avail
Displays the list of all available modules and their versions.
module list
Displays the list of currently loaded modules.
module load <module_name>
Updates the shell environment with the values described in the modulefile called <module_name>.
Example:
module load fluent/12
Updates the environment with the settings required for Fluent version 12.
module unload <module_name>
Updates the shell environment and removes the values described in the modulefile called <module_name>.
Example:
module unload fluent/12
Removes the settings required for Fluent version 12 from the environment.
How to Install a new Application in GomputeServer
Follow the steps below to install a new application on GomputeServer.
As far as possible, perform these steps as the user gcadmin. If you have the root password for your cluster, you can become gcadmin by running
su - gcadmin
1. Create a subdirectory for the application you want to install under /opt/gcdistro/app. Name the directory so that it is easy to tell which application is installed there. For example, if you are installing Fluent, you would call the directory /opt/gcdistro/app/fluent.
2. Run the installer of the application and of any associated programs, and specify the directory created in the step above as the installation directory. For example, for a Fluent installation you would specify Fluent.Inc as /opt/gcdistro/app/fluent.
3. Never place any application-specific environment variables directly in your .bashrc, .profile or similar preference files. All application-related environment settings must be done in modulefiles. Each installed application must have a subdirectory to hold its modulefiles under /opt/gcdistro/modules/modulefiles. Under this directory you must have version-specific modulefiles for each installed version of the application. For samples, look at the modulefiles in /opt/gcdistro/modules/modulefiles/fluent, which contains the modulefiles for the Fluent application.
4. Any license server settings for the installed application must be made in the modulefile /opt/gcdistro/modules/modulefiles/<app_name>-licenses/GLOBAL. Settings such as pointing to a license server belong here.
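Steps 1, 3 and 4 can be scripted. The following bash function is a minimal sketch, not part of GomputeServer: scaffold_app and its arguments are hypothetical, the bin path in the version modulefile is a placeholder to adapt per application, and it assumes a FlexLM license served through the standard LM_LICENSE_FILE variable (the Fluent example below sets FLUENT_LICENSE_FILE instead).

```shell
#!/usr/bin/env bash
# Hypothetical scaffolding for steps 1, 3 and 4.
# Usage: scaffold_app BASE APP VERSION LICENSE_SERVER
#   BASE would be /opt/gcdistro on a real system (run as gcadmin);
#   LICENSE_SERVER is a FlexLM port@host string.
scaffold_app() {
    local base=$1 app=$2 version=$3 licserver=$4
    local mf="$base/modules/modulefiles"

    # Refuse to overwrite an existing modulefile.
    if [ -e "$mf/$app/$version" ]; then
        echo "modulefile $mf/$app/$version already exists" >&2
        return 1
    fi

    mkdir -p "$base/app/$app"                  # step 1: install prefix
    mkdir -p "$mf/$app" "$mf/$app-licenses"    # steps 3 and 4: modulefile dirs

    # Step 3: version-specific modulefile (the bin path is a placeholder).
    cat > "$mf/$app/$version" <<EOF
#%Module1.0
module-whatis "$app $version"
prepend-path PATH $base/app/$app/bin
EOF

    # Step 4: license settings live only in <app>-licenses/GLOBAL.
    cat > "$mf/$app-licenses/GLOBAL" <<EOF
#%Module1.0
module-whatis "License settings for $app"
setenv LM_LICENSE_FILE $licserver
EOF
}
```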
As an example, two modulefiles are presented here: one for Fluent version 6.3.26 and one for fluent-licenses. In this case the directory structure would look as follows:
/opt/gcdistro/modules/modulefiles/fluent/6.3.26
/opt/gcdistro/modules/modulefiles/fluent-licenses/GLOBAL
Fluent 6.3.26 modulefile
#%Module1.0#####################################################################
## modules modulefile
##
proc ModulesHelp { } {
        global version modroot
        puts stderr "\tFluent 6.3.26 "
        puts stderr "\n\tVersion $version\n"
}
module-whatis "Fluent 6.3.26."
# for Tcl script use only
set version 3.2.6
set modroot /opt/gcdistro/modules/
setenv FLUENT_ARCH lnamd64
prepend-path PATH /opt/gcdistro/app/fluent/Fluent.Inc/bin
module use /opt/gcdistro/modules/versions
Modulefile to point to the Fluent license server
#%Module1.0#####################################################################
## modules modulefile
##
proc ModulesHelp { } {
        global version modroot
        puts stderr "\tFluent GLOBAL licenses "
        puts stderr "\n\tVersion $version\n"
}
module-whatis "Loads environment for Fluent GLOBAL licenses"
# for Tcl script use only
set version 3.2.6
set modroot /opt/gcdistro/modules/
setenv FLUENT_LICENSE_FILE 7241@master
prepend-path PATH /opt/gcdistro/app/fluent/Fluent.Inc/bin
module use /opt/gcdistro/modules/versions
GSub
GSub is the Gompute Application Integration framework, aimed at providing a comprehensive and extensible framework for integrating applications with the queue system.
Quick Start
The command line structure for GSub is:
gsub [options] [application] [application options]
Start a single-core interactive Fluent job:
gsub -i fluent 3d
Start an 8-core batch Fluent job:
gsub -n 8 fluent 3d -i test.jou
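When several input files need the same kind of run, the quick-start command can be wrapped in a loop. This sketch assumes Fluent journal files ending in .jou and names each job after its journal with -N; submit_journals and the GSUB variable are hypothetical, and by default the loop only echoes the gsub commands (gsub --dev is another way to inspect what would be submitted).

```shell
#!/usr/bin/env bash
# Submit one 8-core batch Fluent job per journal file, naming each job after
# its journal. GSUB defaults to "echo gsub" (dry run); export GSUB=gsub to
# submit for real.
: "${GSUB:=echo gsub}"

submit_journals() {
    local jou
    for jou in "$@"; do
        # gsub [options] [application] [application options]
        $GSUB -n 8 -N "$(basename "$jou" .jou)" fluent 3d -i "$jou"
    done
}
```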
Options
--version
-h, --help
-r RELEASE, --release=RELEASE
-n PROCESSES, --processes=PROCESSES
-N NAME, --name=NAME
-i, --interactive
-b, --batch
-d, --debug
--dev, --develop
-g, --graphics-acceleration
--no-graphics-acceleration, --nga
--graphics-spoiling, --gs
--no-graphics-spoiling, --ngs
--vgl-version=VGL_VERSION
--list-applications
--list-versions
-p PROJECT_NAME, --project-name=PROJECT_NAME
--list-projects
-q QUEUE, --queue=QUEUE
--exclusive-scheduling
--wait-for-jobs=WAIT_FOR_JOBS
--smp
--logfile=LOGFILE
--post-operations=POST_OPERATIONS
--skip-post-operations
--high-priority
--config=CONFIG
--xml
--terse
--wd=WORKING_DIRECTORY
--scratch
--ar=RESERVATION
-t TIME, --time=TIME
-m MAIL, --mail=MAIL
--get-application-name
--get-input-file
--resources=RESOURCES
--extra-slot
--application-options
--desktop=DESKTOP
--version
Print the GSub version and exit.
gsub --version
-h, --help
Print the help and exit.
gsub -h
-r RELEASE, --release=RELEASE
Request a specific version of the application.
gsub -r 2011a matlab
-n PROCESSES, --processes=PROCESSES
Number of processes to request.
gsub -n 8 fluent 3d -i test.jou
-N NAME, --name=NAME
The job name as shown by qstat.
gsub -N test_job wrf.exe
-i, --interactive
Run the job in interactive mode. The current terminal is used for I/O. To the user, the job
behaves as if the application had been launched directly from the terminal.
gsub -i date
-b, --batch
Run the job in batch mode. Output is redirected to the file specified by --logfile. This is the default mode
for most integrations.
gsub -b date
-d, --debug
Run in debug mode. This enables debug messages and is intended for system administrators and
developers.
gsub -d matlab
--dev, --develop
Run gsub in develop mode. Instead of submitting the job to the queue system, gsub prints the submit command
and the wrapper script to the console. This is intended for system administrators and developers.
gsub --dev matlab
-g, --graphics-acceleration
Run with accelerated graphics. This improves the performance of applications that support accelerated
graphics. Accelerated graphics requires a display to be specified. You can do this indirectly by launching from a
VNC session, or explicitly with --display.
gsub -g glxgears
--no-graphics-acceleration, --nga
Disable accelerated graphics. This is useful for applications that enable graphics acceleration by default.
gsub --nga glxgears
--graphics-spoiling, --gs
When the graphics node has rendered a frame, the frame is put on a queue to be shown to the user. In many cases
the graphics node can render frames faster than the queue is processed, for example because of network limitations.
If frame spoiling is enabled, frames waiting on the queue are dropped instead of being processed, and
only the most recent frame is displayed. With frame spoiling enabled you might experience less input lag.
The drawback is that the graphics node spends time rendering frames that are thrown away, which wastes
computation time.
gsub --gs -g glxgears
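Conceptually, frame spoiling replaces an unbounded frame queue with one that keeps only the newest frame. The Python sketch below models that trade-off with collections.deque. It illustrates the behaviour described above; it is not how VirtualGL implements it, and pending_frames is a hypothetical name.

```python
from collections import deque


def pending_frames(rendered_frames, spoiling):
    """Frames still waiting to be displayed after the renderer has produced
    `rendered_frames` faster than the network could deliver them."""
    # With spoiling the queue holds only the most recent frame. Older frames
    # are discarded, so the work spent rendering them is wasted.
    queue = deque(maxlen=1) if spoiling else deque()
    for frame in rendered_frames:
        queue.append(frame)
    return list(queue)


print(pending_frames([1, 2, 3, 4, 5], spoiling=False))  # [1, 2, 3, 4, 5]
print(pending_frames([1, 2, 3, 4, 5], spoiling=True))   # [5]
```

Without spoiling, the viewer must work through frames 1 to 5 and falls behind the user's input. With spoiling, only frame 5 remains.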
--no-graphics-spoiling, --ngs
Disable graphics spoiling.
gsub --ngs -g glxgears
--vgl-version=VGL_VERSION
Select which version of VirtualGL to use. You can use this, for example, to test a newer version than the
system default.
gsub -g --vgl-version 2.2.1 glxgears
--list-applications
List the applications that the current GSub installation supports. The list only includes applications that have
a GSub module or a section in the configuration files. You might still be
able to use GSub to launch other applications.
gsub --list-applications
--list-versions
List the installed versions of the given application. This option uses the Modules system to determine which
versions are present on the system.
gsub --list-versions lsdyna
-p PROJECT_NAME, --project-name=PROJECT_NAME
Specify the project that the job should be associated with. This is used by the accounting engine in Gompute MD.
gsub -p TEST_PROJECT date
--list-projects
List the projects that you have access to.
gsub --list-projects
-q QUEUE, --queue=QUEUE
Specify the queue in which you want to run your job. The value can include wildcards and can target specific nodes.
The following example targets the queue called all.q on any node whose name starts with node.
gsub -q all.q@node* date
--exclusive-scheduling
Request exclusive scheduling. This ensures that your job runs alone on a machine and does not
share it with any other job.
gsub --exclusive-scheduling fluent
--wait-for-jobs=WAIT_FOR_JOBS
Specify jobs that this job depends on. If set, this job will not start until the specified jobs
have exited the queue. The jobs are identified either by ID or by name.
gsub --wait-for-jobs 12345 date
--smp
Request to run the job as an SMP job. The job is scheduled on one machine instead of being spread over
multiple machines.
gsub --smp -n 8 fluent
--logfile=LOGFILE
Specify the log file that output will be redirected to. The default is $JOB_ID.log.
gsub --logfile date.log date
--post-operations=POST_OPERATIONS
Specify commands that will run after the job command has completed. You can use this to
perform actions such as copying result files or removing temporary files created by the job.
gsub --post-operations "rm *.tmp" fluent
--skip-post-operations
Tell GSub to skip the default post operations. An application integration might define post
operations that are added to all jobs by default.
gsub --skip-post-operations lsdyna
--high-priority
Application integrations can choose to submit jobs with decreased priority by default. With this
option, the job is submitted with the system default priority instead.
gsub --high-priority gambit
--config=CONFIG
Specify a custom directory for the GSub configuration files. This is intended for system administrators
and developers.
gsub --config ~/gsub/cfg date
--xml
Format the GSub output as XML where applicable. This is intended for use when calling GSub from an
application or script.
gsub --xml -h
--terse
Enable terse output. This is intended for use when calling GSub from an application or script.
gsub --terse date
--wd=WORKING_DIRECTORY
Specify the working directory for the application. The default is the current directory.
gsub --wd /home/gdpt_gompute/shared/jobX date
--scratch
Use the local scratch disk where applicable. The default is the shared disk, which can be
slow for applications that perform a lot of scratch I/O.
gsub --scratch abaqus
--ar=RESERVATION
Specify an advance reservation to use. See man qrsub for details on advance reservations.
gsub --ar 123 date
-t TIME, --time=TIME
Set a time limit on the job. When the limit is reached, the queue system kills the job. The format is HH:mm:ss.
gsub -t 1:0:0 lsdyna
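If a script computes the limit in seconds, the value must be converted to this format before it is passed to -t. The minimal Python helper below is a sketch (to_gsub_time is a hypothetical name). It zero-pads minutes and seconds, although the example above shows that unpadded values such as 1:0:0 are accepted too.

```python
def to_gsub_time(seconds):
    """Format a number of seconds as HH:mm:ss for gsub -t."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


print("gsub -t %s lsdyna" % to_gsub_time(5400))  # gsub -t 1:30:00 lsdyna
```

Hours are not wrapped at 24, so a 25-hour limit comes out as 25:00:00.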
-m MAIL, --mail=MAIL
Specify the mail address(es) that the queue system sends status updates to. The default configuration sends
updates on job start, end, abort and suspend.
gsub -m user@server.com date
--get-application-name
Print the application name for the given command line. This is intended for use when calling GSub from
an application or script.
gsub --get-application-name lsdyna
--get-input-file
Print the application input file from the given command line. This is intended for use when calling GSub
from an application or script.
gsub --get-input-file lsdyna i=infile.in
--resources=RESOURCES
Comma-separated list of resources to request from the queue system, in addition to those added by the
configuration file and the application module.
gsub --resources infiniband mppdyna
--extra-slot
Request an additional slot from the queue system for the control process. This is useful for
applications where a simulation on, for example, 10 cores launches one control process and 10 workers.
Allocating one additional slot for the control process keeps the system from being overloaded, which would slow
down the simulation.
Not all modules support this option. Modules that do not support it use the extra slot as if the user had requested
it with -n PROCESSES, --processes=PROCESSES.
gsub -n 10 --extra-slot starccm+
--application-options
Print a subset of the options that the application supports. This is intended for use when calling GSub from
an application or script.
gsub --application-options fluent
--desktop=DESKTOP
Specify the X11 desktop to use for graphical output. The default is the desktop from which GSub is
launched.
gsub --desktop login01:13 xterm
Extending GSub
Introduction
GSub is designed to be extendable, so almost every part of the job submission process can be
modified. GSub is developed in Python 2.4. When you extend GSub, make sure, if possible,
that your additions work on Python 2.4. Extending GSub requires knowledge of Python development and
Grid Engine job submission.
The "Quick Start" page contains a walkthrough of the Fluent module. You can use it as a starting point for your
own module.
The "Developer Guide" page contains detailed information on how to extend GSub.
Quick Start
Introduction
This page walks through the Fluent GSub configuration and module. You should have basic knowledge
of Grid Engine and Python to follow it.
Quick Tips
GSub provides two options that are useful when developing a module: -d and --dev. The -d option enables debug
mode, which prints warnings and errors to the console for easy debugging. The --dev option enables developer
mode. The module runs as normal, but instead of submitting the wrapper script to Grid Engine, GSub
prints the submission command and the script to the console.
A typical test command line when developing the Fluent module in serial mode would be:
gsub -d --dev fluent 3ddp -i test.jou
Or, for parallel mode:
gsub -d --dev -n 8 fluent 3ddp -i test.jou
Configuration
This is the complete configuration for the Fluent module. The breakdown below goes through each line and
explains what it does.
[fluent]
peSMP: smp
peMPI: pe_gc_fluent
release: 12.1.2
jobName: FLUENT
logfile: fluent_$JOB_ID.out
appName: fluent
Breakdown
peSMP: smp
Configures GSub to use the parallel environment smp for single-machine jobs.
peMPI: pe_gc_fluent
Configures GSub to use the parallel environment pe_gc_fluent for distributed jobs.
release: 12.1.2
The default release, used if GSub fails to read the default release from the Modules system.
jobName: FLUENT
The default Grid Engine job name.
logfile: fluent_$JOB_ID.out
The log file that Grid Engine will use. Batch jobs redirect stdout and stderr here.
appName: fluent
The application name. This is used for accounting purposes.
Module
This section gives a step-by-step breakdown of the complete script.
Definition
import lib
base = lib.getGSub()
Retrieve the base class to extend. This step is needed because the base GSub module can be overridden
by site- and/or department-specific implementations. The method checks whether such modules exist and returns
the correct one.
class GSub(base.GSub):
    NET_MAP = {
        '6.1.22': '-pnmpi',
        '6.2.16': '',
        '6.3.26': '-pib.ofed',
        '6.3.35': '-pib.ofed',
        '12.0.0': '-pinfiniband',
        '12.0.16': '-pinfiniband',
        '12.1.2': '-pinfiniband'
    }
Define the class and a dictionary that maps each application version to the InfiniBand option to use.
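The constructor later looks the selected release up in NET_MAP. You can try the same logic in plain Python. This is a sketch, not part of GSub: interconnect_option is a hypothetical name, and dict.get with a fallback replaces has_key, which exists only in Python 2.

```python
NET_MAP = {
    '6.1.22': '-pnmpi',
    '6.2.16': '',
    '6.3.26': '-pib.ofed',
    '6.3.35': '-pib.ofed',
    '12.0.0': '-pinfiniband',
    '12.0.16': '-pinfiniband',
    '12.1.2': '-pinfiniband',
}


def interconnect_option(release):
    """Text appended to the Fluent command line for this release."""
    option = NET_MAP.get(release)
    if option is None:
        # Unknown release: try our luck with no network specified.
        return ""
    # Known release: prefix with a space, as the module does.
    return " " + option


print(repr(interconnect_option("12.1.2")))  # ' -pinfiniband'
```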
Constructor
    def __init__(self, options, args, config, section):
        base.GSub.__init__(self, options, args, config, section)
Start by calling the base constructor, which initializes GSub.
        if (1 < self.numProcesses and self.graphicsAcceleration):
            print "You cannot combine a parallel launch with a request for a graphics node"
            sys.exit(1)
        if (2048 < self.numProcesses):
            print "You cannot run on more than 2048 cores using ANSYS HPC Packs."
            sys.exit(1)
Perform some sanity checks so that we don't try to launch a job that will not run.
        self.fluentRelease = self.getRelease()
        # By default we will try our luck with no network specified.
        self.pversion = ""
        if self.NET_MAP.has_key(self.fluentRelease):
            self.pversion = " " + self.NET_MAP.get(self.fluentRelease)
Retrieve the application version and check whether we have an InfiniBand option for it.
Resource Calculations
    def resourceCalculation(self):
        base.GSub.resourceCalculation(self)
Start by calling the base implementation. It adds resource requests for graphics acceleration and similar features.
        # License resource calculation
        features = {}
        # We always need one solver license
        features["acfd_fluent_solver"] = 1.0
        # For parallel runs we also need HPC packs
        if self.numProcesses > 1:
            features["anshpc_pack"] = 1.0
        if self.numProcesses > 8:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 32:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 128:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 512:
            features["anshpc_pack"] += 1.0
Calculate the total number of licenses needed for the run.
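The chain of if statements encodes this module's ANSYS HPC Pack thresholds. A parallel run gets one pack, plus one more for each threshold of 8, 32, 128 and 512 processes that it exceeds. The constructor already rejects more than 2048 processes. The standalone version below is handy for checking the numbers; total_licenses is a hypothetical name, and the function mirrors the method above.

```python
def total_licenses(num_processes):
    """Total Fluent license features for a job, mirroring resourceCalculation."""
    features = {"acfd_fluent_solver": 1.0}  # always one solver license
    if num_processes > 1:
        packs = 1.0
        for threshold in (8, 32, 128, 512):
            if num_processes > threshold:
                packs += 1.0
        features["anshpc_pack"] = packs
    return features


print(total_licenses(64))  # {'acfd_fluent_solver': 1.0, 'anshpc_pack': 3.0}
```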
        # Normalise by the number of CPUs for SGE.
        for feature in features.keys():
            features[feature] /= self.numProcesses
Grid Engine allocates the number of resources requested multiplied by the number of cores requested. To
make it request the right number of licenses, we therefore divide the total number of licenses
needed by the number of cores requested.
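As an example, a 16-process job needs one solver license and two HPC Packs. Each slot therefore requests 1/16 and 2/16 of a license, and Grid Engine multiplies these values by 16 again. The sketch below reproduces the normalisation together with the "%s=%.8s" formatting that the module applies in the next step (per_slot_requests is a hypothetical name). Note that %.8s keeps at most eight characters of the printed value, so 1/3 becomes 0.333333, which is slightly below the exact share.

```python
def per_slot_requests(features, num_processes):
    """Divide each license total by the slot count and format name=value."""
    requests = []
    for feature in sorted(features):
        per_slot = features[feature] / num_processes
        # "%.8s" formats str(per_slot) and keeps at most 8 characters.
        requests.append("%s=%.8s" % (feature, per_slot))
    return requests


print(per_slot_requests({"acfd_fluent_solver": 1.0, "anshpc_pack": 2.0}, 16))
# ['acfd_fluent_solver=0.0625', 'anshpc_pack=0.125']
```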
        for feature in features.keys():
            self.resources.append("%s=%.8s" % (feature, features[feature]))
Add the needed resources to the resource list. The base implementation adds them to the wrapper script.
Prolog Generation
    def prologGeneration(self):
        base.GSub.prologGeneration(self)
Call the base implementation first.
        self.wrapperLines.append("export SGE_CKPT_DIR=$SGE_O_WORKDIR")
        self.wrapperLines.append("module load fluent/%s" % self.fluentRelease)
        self.wrapperLines.append("module load fluent-licenses/GLOBAL")
Set up the environment for the application.
        if 1 < self.numProcesses:
            self.wrapperLines.append("cp $TMPDIR/machines ./machines.$JOB_ID")
For a parallel job, copy the machine file to the working directory.
Application Command Generation
    def applicationCommandGeneration(self):
This method does not call the base implementation, because we want full control of the command line.
        command = self.vglCommand
Add the VirtualGL start command if applicable. This is usually something like vglrun -c proxy -sp.
        command += self.args[0]
Add the binary.
        command += " -r%s" % self.fluentRelease
Add the -r parameter with the application version.
        if self.graphicsAcceleration:
            command += " -driver opengl"
If accelerated graphics was requested, tell Fluent to use OpenGL.
        if not self.isInteractive:
            command += " -gu"
        if 1 < self.numProcesses:
            command += self.pversion
            command += " -t%i" % self.numProcesses
            command += " -cnf=./machines.$JOB_ID"
For batch jobs, add -gu so that Fluent runs without its GUI. For a parallel job, add the parameters for InfiniBand, cores and machines.
        command += " %s" % " ".join([self.escape(arg) for arg in self.args[1:]])
This line adds all remaining parameters given by the user. Parameters containing spaces or other
special characters must be escaped with the method self.escape so that they survive intact.
        self.commandLines.append(command)
Finally, add the command to self.commandLines. The base implementation adds these lines to the wrapper script.
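Putting these steps together, the sketch below rebuilds the command string outside the GSub class so that it can be tried in isolation. It makes three assumptions. The instance attributes become parameters. shlex.quote (Python 3.3+) stands in for self.escape, whose exact quoting rules are not documented here. The name build_fluent_command is hypothetical.

```python
import shlex


def build_fluent_command(args, release, num_processes, pversion="",
                         vgl_command="", graphics=False, interactive=False):
    """Mirror applicationCommandGeneration from the Fluent module."""
    command = vgl_command          # VirtualGL prefix, or "" when not used
    command += args[0]             # the binary
    command += " -r%s" % release
    if graphics:
        command += " -driver opengl"
    if not interactive:
        command += " -gu"          # batch job: no Fluent GUI
    if 1 < num_processes:
        command += pversion        # interconnect option, may be ""
        command += " -t%i" % num_processes
        command += " -cnf=./machines.$JOB_ID"
    # Remaining user arguments, quoted so spaces and special characters survive.
    command += " %s" % " ".join([shlex.quote(arg) for arg in args[1:]])
    return command


print(build_fluent_command(["fluent", "3ddp", "-i", "test.jou"], "12.1.2", 8,
                           pversion=" -pinfiniband"))
# fluent -r12.1.2 -gu -pinfiniband -t8 -cnf=./machines.$JOB_ID 3ddp -i test.jou
```

The module concatenates the VirtualGL prefix directly with the binary name, so any prefix passed here is assumed to end with a space.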
Complete Script
The complete script. Save it as /opt/gcdistro/gsub/lib/fluent.py.
# Copyright (c) 2006-2010 Gridcore AB. All rights reserved.
#
# The computer program(s) is the proprietary information of Gridcore AB,
# and provided under the relevant Agreement between yourself and
# Gridcore AB containing restrictions on use and disclosure, and
# are also protected by copyright, patent, and other intellectual and
# industrial property laws. No part of this program may be used/copied
# without the prior written consent of Gridcore AB.
#
# IN NO EVENT SHALL GRIDCORE AB BE LIABLE TO ANY PARTY FOR DIRECT,
# INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
# PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
# EVEN IF GRIDCORE AB HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# GRIDCORE AB SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY,
# PROVIDED HEREUNDER IS PROVIDED "AS IS".
#
# For more information visit www.gridcore.se.
__author__ = "filip"
__date__ = "$Dec 8, 2010 10:46:02 AM$"

import sys
import lib

base = lib.getGSub()


class GSub(base.GSub):
    NET_MAP = {
        '6.1.22': '-pnmpi',
        '6.2.16': '',
        '6.3.26': '-pib.ofed',
        '6.3.35': '-pib.ofed',
        '12.0.0': '-pinfiniband',
        '12.0.16': '-pinfiniband',
        '12.1.2': '-pinfiniband'
    }

    def __init__(self, options, args, config, section):
        base.GSub.__init__(self, options, args, config, section)
        if (1 < self.numProcesses and self.graphicsAcceleration):
            print "You cannot combine a parallel launch with a request for a graphics node"
            sys.exit(1)
        if (2048 < self.numProcesses):
            print "You cannot run on more than 2048 cores using ANSYS HPC Packs."
            sys.exit(1)
        self.fluentRelease = self.getRelease()
        # By default we will try our luck with no network specified.
        self.pversion = ""
        if self.NET_MAP.has_key(self.fluentRelease):
            self.pversion = " " + self.NET_MAP.get(self.fluentRelease)

    def resourceCalculation(self):
        base.GSub.resourceCalculation(self)
        # License resource calculation
        features = {}
        # We always need one solver license
        features["acfd_fluent_solver"] = 1.0
        # For parallel runs we also need HPC packs
        if self.numProcesses > 1:
            features["anshpc_pack"] = 1.0
        if self.numProcesses > 8:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 32:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 128:
            features["anshpc_pack"] += 1.0
        if self.numProcesses > 512:
            features["anshpc_pack"] += 1.0
        # Normalise by the number of CPUs for SGE.
        for feature in features.keys():
            features[feature] /= self.numProcesses
        for feature in features.keys():
            self.resources.append("%s=%.8s" % (feature, features[feature]))

    def prologGeneration(self):
        base.GSub.prologGeneration(self)
        self.wrapperLines.append("export SGE_CKPT_DIR=$SGE_O_WORKDIR")
        self.wrapperLines.append("module load fluent/%s" % self.fluentRelease)
        self.wrapperLines.append("module load fluent-licenses/GLOBAL")
        if 1 < self.numProcesses:
            self.wrapperLines.append("cp $TMPDIR/machines ./machines.$JOB_ID")

    def applicationCommandGeneration(self):
        command = self.vglCommand
        command += self.args[0]
        command += " -r%s" % self.fluentRelease
        if self.graphicsAcceleration:
            command += " -driver opengl"
        if not self.isInteractive:
            command += " -gu"
        if 1 < self.numProcesses:
            command += self.pversion
            command += " -t%i" % self.numProcesses
            command += " -cnf=./machines.$JOB_ID"
        command += " %s" % " ".join([self.escape(arg) for arg in self.args[1:]])
        self.commandLines.append(command)
Developer Guide
Advanced
This guide first explains the concepts used in GSub and then shows how to write a custom
application module.
Four main parts together define the behavior of GSub:
Configuration files
Boot module
Base module
Application module
The steps involved are:
1. Parse the command line. This is done by the boot module.
2. Find the module to load. This is read from the configuration files.
3. Generate a wrapper script and submit it to Grid Engine. This is done by the base and application modules.
In this guide, a module means a Python module containing a class named GSubBoot or GSub,
depending on the module type. Extending a module means that your GSub(Boot) class extends the class
in that module.
Configuration Files
By default, GSub loads its configuration files from the directory /opt/gcdistro/etc/gsub.
This folder contains two configuration files: gsub.cfg and mapping.cfg.
In addition, GSub tries to load files with the same names from a subdirectory named after the user's current Linux
group. For example, if a user in the group gdpt_gompute runs gsub, GSub loads the configuration files found in
<configDir>/gdpt_gompute.
These additional configuration files are merged with the default files. This lets the system
administrator provide group-specific configurations.
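Python's ConfigParser gives a simple picture of such a merge. If the default file is read first and the group file second, the group file overrides individual values and everything else is kept. This is a sketch of the idea under that assumption, not GSub's actual loading code. The file contents are invented for the example and are read from strings rather than from /opt/gcdistro/etc/gsub. read_string requires Python 3.2 or later.

```python
import configparser

DEFAULT_CFG = """
[DEFAULT]
queue: all.q
priority: 0
"""

# Hypothetical group file, e.g. <configDir>/gdpt_gompute/gsub.cfg
GROUP_CFG = """
[DEFAULT]
priority: -100
"""

config = configparser.ConfigParser()
config.optionxform = str          # keep key case (peSMP, jobName, ...)
config.read_string(DEFAULT_CFG)   # system defaults first
config.read_string(GROUP_CFG)     # group file second: its values win
print(config.defaults()["queue"], config.defaults()["priority"])  # all.q -100
```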
Boot Module
The boot module parses the command line and loads the correct application module.
The main reason to write a custom boot module is to add new options to GSub.
Command Line
Options are parsed with the optparse API. An OptionParser with all the built-in options is created in the boot
module's constructor.
To add your own options to GSub, create a new module that extends the built-in module. In its
constructor, first call the parent constructor and then add your options to the OptionParser object.
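A custom boot module therefore comes down to subclassing the boot module and adding options to the existing parser. The sketch below uses a stand-in base class, because the real GSubBoot constructor signature and the attribute that holds the OptionParser are not documented here. The attribute name self.parser, the class SiteBoot and the --site-tag option are all hypothetical. Stopping at the first positional argument is also an assumption, made so that application options stay with the application.

```python
from optparse import OptionParser


class GSubBoot(object):
    """Stand-in for the built-in boot module (assumed structure)."""

    def __init__(self):
        self.parser = OptionParser(
            usage="gsub [options] application [application options]")
        # Stop at the first positional argument so that application options
        # such as "-i test.jou" are left for the application.
        self.parser.disable_interspersed_args()
        self.parser.add_option("-n", "--processes", type="int", default=1)


class SiteBoot(GSubBoot):
    def __init__(self):
        GSubBoot.__init__(self)  # parent constructor first ...
        # ... then the site-specific options.
        self.parser.add_option("--site-tag", dest="site_tag", default=None,
                               help="hypothetical site-specific option")


boot = SiteBoot()
options, args = boot.parser.parse_args(
    ["--site-tag", "cfd", "-n", "8", "fluent", "3d", "-i", "test.jou"])
print(options.site_tag, options.processes, args)
# cfd 8 ['fluent', '3d', '-i', 'test.jou']
```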
Application Mapping
The first unknown command line argument is taken to be the binary the user wants to launch. Define mappings
between binary names and application modules in mapping.cfg when you want one application module to handle
several binaries, or when your module should have a name other than the binary name.
Base Module
If no application module is found, GSub loads the base module. This module generates a basic
wrapper script that launches the specified binary.
The base module consists of a number of methods, each responsible for a different part of the job
submission. These include methods for the different parts of the wrapper script, methods for sending the
script to Grid Engine, and utility methods used by the module itself.
Application Module
Many applications can run using only the base module and the configuration file. Applications that use a
start command such as mpirun, or that need a custom resource calculation formula, require an application
module to work properly.
Wrapper Script
The wrapper script contains all the information Grid Engine needs to run the user's command. This section
walks through the script that is generated when running the command gsub date.
Sections
The wrapper script consists of a number of distinct sections. Below is a short description of each
section and of what our sample script contains.
Header
The script header defines the interpreter the system should use. The default behavior of this section
should not be modified.
#!/bin/bash --login
Grid Engine
The Grid Engine section contains options that Grid Engine interprets. Custom Grid Engine
options should be added here.
#SGE parameters
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -o $JOB_ID.log
#$ -N gsub
#$ -q all.q
#$ -p 0
#$ -A GOMPAPP=unknown
Prolog
The prolog contains commands that run before the user's command. By default, it makes backups of
the wrapper script, the working directory and the machines used for the job.
Environment initialization belongs here, for example loading modules and setting environment
variables.
#Prolog
mv /home/gcadmin/gsub_cB4IwM /opt/gcdistro/gsub/backups/wrappers/$JOB_ID
echo $PWD > /opt/gcdistro/gsub/backups/cwds/$JOB_ID
chmod 600 /opt/gcdistro/gsub/backups/cwds/$JOB_ID
Command
This section contains the actual command.
#Command
date
Epilog
This section contains the commands that run after the user's command has finished. By default, it includes
any post operations specified by the user. Removing temporary files and other cleanup should also be done here.
#Epilog
Complete script
This is the complete script with all the sections combined.
#!/bin/bash --login
#SGE parameters
#$ -S /bin/sh
#$ -cwd
#$ -j y
#$ -o $JOB_ID.log
#$ -N gsub
#$ -q all.q
#$ -p 0
#$ -A GOMPAPP=unknown
#Prolog
mv /home/gcadmin/gsub_cB4IwM /opt/gcdistro/gsub/backups/wrappers/$JOB_ID
echo $PWD > /opt/gcdistro/gsub/backups/cwds/$JOB_ID
chmod 600 /opt/gcdistro/gsub/backups/cwds/$JOB_ID
#Command
date
#Epilog
Base Module Methods
The base class defines a number of methods that control job submission and script generation. They fall
into three groups: methods that generate the wrapper script, methods that control
submission, and helper methods.
Wrapper Script Methods
These methods create the wrapper script.
Instead of writing directly to the wrapper file, each method should add its lines to the list of strings
self.wrapperLines.
resourceCalculation: Calculate the resources that the command needs to run. This usually means the
number of CPU cores the job will use and, with license integration, the licenses the application
will request. The variable self.resources holds the resources to be requested, as a list of strings. The
resource strings use the same format as Grid Engine.
Example: request one test license per CPU core
self.resources.append("test_license=1")
headerGeneration: Adds the script header to self.wrapperLines.
sgeGeneration: Adds the Grid Engine section to self.wrapperLines. If self.resources is not empty, this method
also adds the corresponding Grid Engine resource requests.
prologGeneration: Adds the prolog to self.wrapperLines.
applicationCommandGeneration: Builds the command and adds the required lines to the list of strings
self.commandLines.
epilogGeneration: Adds the epilog to the script.
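This split into wrapper-script methods follows the template-method pattern. The base class calls each step in a fixed order, and an application module overrides only the steps it needs. The sketch below is a heavily simplified, hypothetical model of that flow. MiniGSub, TestApp and generateWrapper are invented names, and rendering resources as a single "#$ -l" line is an assumption about sgeGeneration. The real base class has more steps and more state.

```python
class MiniGSub(object):
    """Hypothetical, simplified model of the GSub base class."""

    def __init__(self, command, resources=None):
        self.command = command
        self.resources = list(resources or [])
        self.wrapperLines = []
        self.commandLines = []

    def resourceCalculation(self):
        pass  # application modules append to self.resources here

    def headerGeneration(self):
        self.wrapperLines.append("#!/bin/bash --login")

    def sgeGeneration(self):
        self.wrapperLines.append("#SGE parameters")
        self.wrapperLines.append("#$ -cwd")
        if self.resources:
            # Assumed rendering: one -l request with comma-separated resources.
            self.wrapperLines.append("#$ -l " + ",".join(self.resources))

    def prologGeneration(self):
        self.wrapperLines.append("#Prolog")

    def applicationCommandGeneration(self):
        self.commandLines.append(self.command)

    def epilogGeneration(self):
        self.wrapperLines.append("#Epilog")

    def generateWrapper(self):
        # Fixed order of steps; subclasses override individual steps only.
        self.resourceCalculation()
        self.headerGeneration()
        self.sgeGeneration()
        self.prologGeneration()
        self.applicationCommandGeneration()
        self.wrapperLines.append("#Command")
        self.wrapperLines.extend(self.commandLines)
        self.epilogGeneration()
        return "\n".join(self.wrapperLines)


class TestApp(MiniGSub):
    def resourceCalculation(self):
        MiniGSub.resourceCalculation(self)
        # One test license per CPU core, as in the example above.
        self.resources.append("test_license=1")


print(TestApp("date").generateWrapper())
```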
System Methods
These methods set up the system for submission and perform cleanup.
generateWrapperFile: Creates the wrapper file and stores its path in
self.wrapperPath.
This method only creates the file; preSubmissionOperations writes the content.
preSubmissionOperations: Called before the script is submitted to Grid Engine. This is where the
wrapper script is written to disk.
postSubmissionOperations: Called after the script is submitted to Grid Engine. Put cleanup and similar
tasks here.
Helper Methods
escape(string): Returns the string escaped for use in the wrapper script.
getRelease(): Returns the release that should be loaded, as a string. If several versions of an application are
installed, the user can select the release with the --release option.
Base Module Variables
The base module defines a number of variables that application modules can use.
appName (string): The name of the application. This is read from the configuration file.
args (string[]): The command line of the application the user wants to run.
backupDir (string): The directory where backups are stored.
commandLines (string[]): The command lines used to start the application.
config (ConfigParser): The config parser used to read the configuration file.
debug (boolean): Set to true when running in debug mode.
departmentId (string): The current user's department.
develop (boolean): Set to true when running in developer mode.
display (string): The display that is passed to the command environment. The base module reads it from the
environment. The user can specify a different value with --display.
extraSlot (boolean): Set to true if the application should use one extra slot, for example to make room for a
control process.
graphicsAcceleration (boolean): Set to true if the application should use accelerated graphics.
graphicsSpoiling (boolean): Set to true if graphics spoiling is enabled.
hostname (string): The hostname of the machine where GSub is started.
inputFile (string): The name of the input file. The application module should set this if possible. It is
used by --get-input-file.
isInteractive (boolean): Set to true if this is an interactive job.
jobName (string): The name of the job.
loadDefaultModule (boolean): Set to true if the default environment module should be loaded for the
application.
logfile (string): The log file that stdout and stderr are redirected to.
mailEvents (string): The events for which Grid Engine sends a mail.
module (string): The module GSub is currently running as.
numProcesses (int): The number of cores the job will request.
options (OptionValues): The parsed command line options.
priority (int): The priority the job will request from Grid Engine.
queue (string): The queue the job will run in.
resources (string[]): The resources to request from Grid Engine.
scratchDir (string): The path to a directory that can be used for scratch files.
useOldWD (boolean): Set to true on old Grid Engine installations.
userName (string): The current user's username.
vglCommand (string): The VirtualGL command used to start the application.
vglVersion (string): The version of VirtualGL that will be loaded.
wrapperLines (string[]): The lines that will be written to the wrapper script.
wrapperPath (string): The path of the wrapper script.
GSub Configuration
Configure GSub
The GSub configuration consists of two files, mapping.cfg and gsub.cfg. mapping.cfg maps commands to
GSub modules, and gsub.cfg contains the settings for each module.
GSub reads the name of the binary from the command line and uses it as the module name. It then checks
whether mapping.cfg has an entry for that name; if it does, the mapped value becomes the module name. Finally,
GSub uses the module name to decide which section of gsub.cfg to read.
mapping.cfg
The mapping file maps commands to GSub modules. This is useful when an
application has several commands, or when different versions have different names.
For Abaqus:
abq604: abaqus
abq683: abaqus
abq692: abaqus
abq69ef2: abaqus
For LS-DYNA:
lsdyna_s: lsdyna
lsdyna_d: lsdyna
mppdyna_s: lsdyna
mppdyna_d: lsdyna
gsub.cfg
gsub.cfg contains a section called DEFAULT with all the system default values. Each module-specific
section inherits its values from the DEFAULT section. Many of the settings can be overridden by command
line options. See Options for a list of the available options.
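The lookup described in this section can be sketched in Python. The mapping file is parsed as plain "command: module" lines; the listing above does not show whether the real mapping.cfg also has a section header. gsub.cfg is read with ConfigParser, whose built-in DEFAULT section gives exactly the inheritance described. The file contents, the helper names parse_mapping and module_settings, and the fallback to plain defaults when a module has no section are all illustrative assumptions.

```python
import configparser

MAPPING_CFG = """
abq692: abaqus
lsdyna_s: lsdyna
"""

GSUB_CFG = """
[DEFAULT]
queue: all.q
priority: 0

[fluent]
peSMP: smp
release: 12.1.2
"""


def parse_mapping(text):
    """Turn 'command: module' lines into a dictionary."""
    mapping = {}
    for line in text.splitlines():
        if ":" in line:
            command, module = line.split(":", 1)
            mapping[command.strip()] = module.strip()
    return mapping


def module_settings(binary):
    """Resolve the module name for a binary and return (module, settings)."""
    module = parse_mapping(MAPPING_CFG).get(binary, binary)
    config = configparser.ConfigParser()
    config.optionxform = str  # ConfigParser lowercases keys by default
    config.read_string(GSUB_CFG)
    if not config.has_section(module):
        # Assumption: modules without a section just get the defaults.
        return module, dict(config.defaults())
    # items() merges the module section with the DEFAULT section.
    return module, dict(config.items(module))


print(module_settings("fluent"))
print(module_settings("abq692"))  # ('abaqus', {'queue': 'all.q', 'priority': '0'})
```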
Values
backupDir: The path where backups are written. This should normally not be changed in a module section.
scratchDir: The directory to use for scratch files. This should normally not be changed in a module section.
projectDir: The directory containing the project files. This should normally not be changed in a module
section.
vglVersion: The version of VirtualGL to load.
graphicsAcceleration: Whether graphics acceleration is requested by default.
graphicsSpoiling: Whether graphics spoiling is used by default.
loadDefaultModule: If set to true, the following line is added to the wrapper prolog:
module load MODULE_NAME/RELEASE
release: The default release of the application, used if GSub cannot read it from the Modules system.
peMPI: The parallel environment to use for distributed jobs.
peSMP: The parallel environment to use for single-machine jobs.
queue: The queue to submit the job to.
priority: The priority given to Grid Engine for the submitted job.
resources: Resources to request from Grid Engine. Multiple resources can be given as a comma-separated
list.
mailEvents: The events for which Grid Engine sends mail.
interactive: Makes an application start in interactive mode by default.
forceSMP: Forces single-machine mode. Use this for applications that cannot run distributed over
multiple machines.
smpMaxSize: The largest number of cores a single-machine job can request.
extraSlot: Whether an extra slot should be allocated. See --extra-slot.
appName: The application name as it appears in accounting and similar reports.
jobName: The default job name as it appears in the job listing.
logfile: The name of the log file that stdout and stderr are written to.
postOperations: Commands that run after the application has finished.
useOldWD: Enables a workaround for older Grid Engine installations.
Linux tips
Links
Linux cheat sheet: http://files.fosswire.com/2007/08/fwunixref.pdf
A more comprehensive guide: http://cb.vu/unixtoolbox.xhtml
Working with archives
Zip
Create a compressed archive foo.zip containing the directory foo and its content:
zip -r foo.zip foo
Unpack foo.zip:
unzip foo.zip
Tar
Create an uncompressed archive foo.tar containing the directory foo and its content:
tar cf foo.tar foo
Unpack foo.tar:
tar xf foo.tar
Tar/GZ
Create a compressed archive foo.tgz containing the directory foo and its content (the extension .tar.gz works as well):
tar czf foo.tgz foo
Unpack foo.tgz:
tar xzf foo.tgz
$pplications
+$)$=
)EE "iles
7recompilation
From )!3(!5
Standalone
>sin% the 5inary
Compile in 0ob
Pno6n 7roblems
7lottin% Crashes the :?C Session
+EE %iles
.hen 6or2in% "rom a .indo6s 6or2station 6ith a (inu' cluster you need to ma2e sure your )EE "iles are
compiled to the correct binary "ormat. Files compiled on .indo6s 6ill not 6or2 on (inu' and #ice #ersa.
Gou can choose to either precompile the me' "iles or compile them "rom 6ithin a batch job.
Precompilation
When you precompile a MEX file, you copy the source to the cluster and compile it manually.
Compilation on the cluster is done the same way as on a Windows workstation: you either do it from
within MATLAB or use the standalone command mex.
From MATLAB
The following command will start MATLAB with the graphical interface.
gsub -i matlab -desktop
Compiling a C method:
mex yprime.c
Compiling a Fortran method:
mex yprimef.F yprimefg.F
Standalone
Using the standalone mex command from MATLAB R2011a:
module load matlab/2011a
mex yprime.c
Using the Binary
Once you have the binary, you can either copy it back to your workstation and include it in the jobs you send to
the cluster, or tell the jobs to read it from a folder on the cluster.
Include the binary as a part of the job:
job = batch('script', 'FileDependencies', {'yprime.mexa64'});
Use the binary on the cluster:
job = batch('script', 'PathDependencies', {'/FolderContainingMexBinaries/'});
Compile in Job
When you compile as part of a job, you need to add the source files as file dependencies. The dependent files
will not be in the working directory on the cluster, so you need to cd into the directory that holds them before you can access them.
A script that compiles a MEX file and calls the compiled method:
function compile_test()
% Directory where the job's file dependencies (yprime.c) were copied
tmp = getFileDependencyDir();
wd = cd(tmp);     % change into it, remembering the previous directory
mex yprime.c;     % compile for the cluster's platform
cd(wd);           % return to the previous working directory
yprime(1, 1:4)    % call the newly compiled MEX function
The script to submit the job:
job = batch('compile_test', 'FileDependencies', {'yprime.c'}, 'CurrentDirectory', '.');
wait(job);
diary(job)
destroy(job)
Known Problems
Plotting Crashes the VNC Session
The system Mesa library can cause MATLAB to crash the VNC server. A workaround is to tell
MATLAB to use its internal Mesa version. You can do this by including the following line in your scripts:
opengl software
More information can be found here:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/!58572
http://www.mathworks.com/help/techdoc/ref/opengl.html
Troubleshooting
Introduction
Here are some tips for troubleshooting problems with the cluster.
Check the connection
The first thing to check is that you can connect to the cluster. The easiest way is to use either
GomputeXplorer or SSH directly. You should either be able to connect or get an error message.
There are three different types of error messages:
Network error
If the error says that the host is unreachable, that the connection timed out, or similar, it is most likely a problem
with your network, and you should talk to your IT support.
Connection refused
If the message says that the connection was refused, there could be something wrong with the cluster, and you
should contact the Gridcore help desk.
Authentication error
If you get an error message about an authentication failure, the connection is working,
but you most likely have the wrong password. Try resetting your password. You can do this through
www.gompute.com if your cluster is connected to the Gompute authentication server. Otherwise, contact
the cluster administrator for further help.
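You can usually tell the three cases apart from the message the SSH client prints. This sketch assumes the usual OpenSSH wordings ("Connection timed out", "No route to host", "Connection refused", "Permission denied"). GomputeXplorer and other clients may phrase their errors differently. classify_ssh_error is a hypothetical helper name.

```shell
# classify_ssh_error MESSAGE
# Maps an OpenSSH client error message onto the three cases above and
# prints a short suggestion. Unrecognised messages print "unknown".
classify_ssh_error() {
    case "$1" in
        *"Connection refused"*)
            echo "refused: contact the Gridcore help desk" ;;
        *"Permission denied"*|*"uthentication fail"*)
            echo "authentication: check or reset your password" ;;
        *"timed out"*|*"No route to host"*|*"Network is unreachable"*|*"Could not resolve hostname"*)
            echo "network: contact your IT support" ;;
        *)
            echo "unknown" ;;
    esac
}
```

Usage: msg=$(ssh -o ConnectTimeout=10 user@cluster.example.com true 2>&1) || classify_ssh_error "$msg". Here user@cluster.example.com is a placeholder for your own login and cluster address.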
Check the job
When a user has a problem with a specific job, the first thing we usually do is check
the queue state. Running qstat will show you a list of all your jobs in the queue. If you are troubleshooting
another user's job, run either qstat -u <user> to show that user's jobs, or qstat -u '*' to show
all users' jobs.
If there are jobs in the error state (the state column contains an E) or in the queued state, you can run qstat -j
<jobID> to get information on the job, including its scheduling information and any errors.
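When many jobs are queued, you can pick out the job IDs to pass to qstat -j from the qstat listing automatically. This sketch assumes the default qstat output format, where the first two lines are headers and the state is the fifth column. The function names are hypothetical. Note that the pattern "E|qw" also matches held jobs (hqw).

```shell
# jobs_in_state REGEX
# Reads default-format qstat output on stdin and prints the IDs of jobs
# whose state column (5th field) matches the awk regular expression REGEX.
# The column-title line and the dashed rule below it are skipped.
jobs_in_state() {
    awk -v re="$1" 'NR > 2 && $5 ~ re { print $1 }'
}

# explain_stuck_jobs
# Prints the qstat -j details for every job, of any user, that is in an
# error state (E, Eqw, ...) or still waiting in the queue (qw).
explain_stuck_jobs() {
    for id in $(qstat -u '*' | jobs_in_state 'E|qw'); do
        qstat -j "$id"
    done
}
```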
If everything looks OK here, you can go to the job's working directory. For a running job, you can find it by
looking at the cwd line of the qstat -j printout. For finished jobs submitted with GSub, you can run cat
/opt/gcdistro/gsub/backups/cwds/<jobID>; this requires that you are either that user or root.
Once in the directory, look for log files: files ending with .log or similar.
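The directory lookup and log search can be combined into two small helpers. This is a sketch assuming that the qstat -j printout has a line of the form "cwd:   /some/path". The helper names job_cwd and list_job_logs are hypothetical. Only files whose names end in .log are found, so logs with other names still have to be located by hand.

```shell
# job_cwd
# Reads `qstat -j <jobID>` output on stdin and prints the value of the
# cwd line (the job's working directory), keeping any spaces in the path.
job_cwd() {
    awk '/^cwd:/ { sub(/^cwd:[ \t]*/, ""); print; exit }'
}

# list_job_logs DIR
# Lists files ending in .log anywhere below DIR, newest first.
list_job_logs() {
    find "$1" -type f -name '*.log' -exec ls -t {} +
}
```

For a running job: cd "$(qstat -j 12345 | job_cwd)" && list_job_logs . where 12345 stands for the real job ID.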
Check the system
You can check the status of the compute nodes by running qstat -f and looking at the usage, load and state
columns.
The load should not be higher than the number of used slots on a machine. If it is higher, you can run qstat -F
to see which jobs should be running on that machine. Log in to the machine with SSH/RSH and run
top to see whether there are zombie processes left over from old jobs, or other processes running that should not be
there.
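You can script the comparison between load and used slots over the qstat -f listing. This is a sketch that assumes the standard qstat -f layout. In that layout each queue instance line starts with queue@host, the third column holds the slot counts ("resv/used/tot." on newer grid engine versions, "used/tot." on older ones) and the fourth column holds the load average. list_zombies is meant to be run on the node itself and relies on the procps ps found on Linux. Both function names are hypothetical.

```shell
# overloaded_nodes
# Reads `qstat -f` output on stdin and prints every queue instance whose
# load average is higher than its number of used slots.
overloaded_nodes() {
    awk '$1 ~ /@/ {
        n = split($3, s, "/")
        used = (n == 3) ? s[2] : s[1]     # resv/used/tot or used/tot
        # load_avg is "-NA-" for unreachable hosts; skip non-numeric values.
        if ($4 ~ /^[0-9.]+$/ && $4 + 0 > used + 0)
            print $1, "load=" $4, "used=" used
    }'
}

# list_zombies
# Run on the node: prints PID, state and command of zombie (Z) processes.
list_zombies() {
    ps -eo pid=,stat=,comm= | awk '$2 ~ /^Z/'
}
```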
Common problems
Job running slow
A slow job is often caused by either system swapping or zombie processes. Use the Ganglia graphs
to look at the memory usage and load of the machines involved in the run.
Other possible causes are a slow file system or, for a distributed job, a
problem with InfiniBand.
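If the Ganglia graphs are not at hand, you can check swapping on the node directly. This minimal sketch assumes a Linux node, where /proc/meminfo reports SwapTotal and SwapFree in kB. The helper name swap_used_kb is hypothetical. A value that keeps growing while the job runs suggests the job is slowed down by swapping. The si/so columns of vmstat 5 show the same thing as a rate.

```shell
# swap_used_kb [FILE]
# Prints the amount of swap in use, in kB, computed as SwapTotal - SwapFree.
# Reads /proc/meminfo by default, or FILE if given (same format).
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```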
Job not starting
A job might stay in the qw state and never start. This can be caused by a lack of licenses, no free slots, etc. Run
qstat -j <jobID> to get detailed scheduling information.
Job in error state
Sometimes jobs end up in the error state. The reason should be printed by qstat -j <jobID>. This
can be caused by problems with the file system.
Job crashing
Look in the working directory and see if you can find logs explaining why the application exited. A
job that died without any log messages may in some cases have been killed by the system's out-of-memory killer. You can
often tell whether this is the case by looking at the Ganglia memory graphs.
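The kernel log on the node is another way to confirm an out-of-memory kill. This sketch filters kernel log lines for OOM-killer messages. It assumes the common kernel wordings ("Out of memory: Kill process ...", "Out of memory: Killed process ...", "oom-kill"), which vary between kernel versions. The helper name oom_kills is hypothetical. On the node that ran the job you would pipe dmesg into it (reading dmesg may require root on some systems).

```shell
# oom_kills
# Reads kernel log lines on stdin and prints those from the
# out-of-memory killer (case-insensitive match).
oom_kills() {
    grep -i -E 'out of memory|oom-kill|killed process'
}
```

Usage: dmesg | oom_kills. The function exits with status 1 when no matching line is found.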
How to Get Help
Reporting Problems and Feedback
Problems, bugs, improvement requests and general feedback can be sent to helpdesk@gridcore.se.