A Modern Batch Programming Tutorial (Win 2k/XP)

This page teaches you some modern NT-based batch programming and has
some fairly advanced and pretty useful batch scripts to help you get started.
Batch files are the simple and rather archaic, interpreted scripting language of
MS-DOS and its derivatives and successors such as Windows 9x and the NT line
(NT, 2k, XP). Basic MS-DOS knowledge is assumed, and it does help if you know some
programming language already, although this is not strictly necessary; I'll try to
explain most of the terminology on the way. I've written this in tutorial form, so
be sure to check out other options and commands not covered here extensively.
Although many of the commands and scripts presented here will work on
MS-DOS and later, many of the newer command options that are essential to
advanced batch programming do require Windows NT, 2000 or XP (of course
2003 will also do, as probably will Longhorn once it comes out).

Although I didn't know it when I started writing this tutorial, I later
realized that the batch language is equivalent to the old BASICs of the late 70s,
having commands roughly equivalent to input, print, let, if and goto. And because
repetition can be expressed with goto and if, as in most assembly languages,
you can build more elaborate programs out of these primitives.

You may wonder why one would write batch files these days as there are really
good, real programming languages around: Perl, C++ or Java, to
name a few. For full-blown programs I prefer Java or C++, true, and Perl needs
to be installed separately. Nowadays I tend to write all of these little programs
in Perl and use batch files just on systems that don't have Perl installed. But
back to batches: I do find it an intellectual challenge to accomplish as much
with batches as is practical. Unlike Perl and many other languages, batch files
are native to DOS and you can count on them on every machine without having
to download tens of megs of third-party software (well, only about a floppy in
the case of tiny Perl). Batch syntax is also relatively easy and the language is
interpreted. OK, there are Visual Basic Script and JavaScript, but I don't really
want to learn either of those for various reasons including platform dependence.

WARNING:

While I've been working with DOS since MS-DOS 5 and still do use the DOS BOX
occasionally, it is still possible that some information in this document is not 100
percent accurate, as I don't know the formal syntax of batch files exactly, nor
do I know everything about DOS or NT specific batch commands. So, not to be
used for mission critical stuff, hehe. Any comments, additions and corrections
would be welcome, though.

NOTE:

I tend to use the term NT to mean the set of operating systems based on the
NT kernel (NT, 2000, XP and 2003 currently). In most cases it means Windows
2000 or XP as most of the command-extensions were added in 2k, I believe.

Contents

• The Basics of Batch Programming
• Batch Syntax and Some Practical Notes
• Relative Paths
• Wildcards
• Redirection
• Piping
• Variables
• Batch-specific Commands
• Good Batch Programming Style
• NT Specific Command Extensions
• String Input, Integer Arithmetic and Looping
• The Else clause, String and Numeric Comparisons
• The Indispensable For Command
• More String Processing and Some Magic Variables
• Bigger Example Programs
• Emulating Gosub
• Random Lines
• Array Emulation and Sorting
• Epilog

The Basics of Batch Programming

This section introduces the basic MS-DOS batch commands and concepts like
relative paths, redirection, pipes and variables (but some Windows pitfalls are
also mentioned). If you are an experienced batch programmer, you might want
to skip this section. However, if you've done batches but your
batch knowledge is a bit rusty, for instance, acquired in the good old DOS days,
you may wish to quickly browse through this in case there's anything new. Let
me emphasize that I'm not trying to be complete and cover all the options and
idioms, just the ones you'll likely use most of the time.

Batch Syntax and Some Practical Notes

Batch files are series of MS-DOS commands typed in a file, one command per
line. The file uses the MS-DOS character set, has an extension of .bat and is
run automatically if you type its base name without the extension. If an exe or
com file with the same base name exists, you might want to explicitly specify the
.bat extension to guarantee that the batch file gets executed. If you need to use
high-ASCII characters in the file, such as umlauts and special graphics symbols,
you must save it as MS-DOS text, otherwise the characters might be different
from what you expected. Many of the better Windows text editors can save as
MS-DOS text (even Wordpad can), and there's good old Edit which still writes
out MS-DOS files.

The philosophy of batch programming is that nearly all of the batch constructs
are ordinary commands that can also be used outside batch scripts in MS-DOS.
Although some of these commands are virtually never used outside batches,
they are still there in DOS. So most of the commands you'll likely be using are
ordinary DOS commands and work just as you would expect them to.

By the way, if you run into problems in a batch file (e.g. it doing unwanted
things, getting stuck in a loop etc...) you can in most cases quit the execution of
a batch file by pressing ctrl+c and answering y when asked if you really want to
terminate the batch job.

To get a list of all MS-DOS commands, type help at the prompt. For help on an
individual command, type its name followed by a slash and a question mark
(e.g. copy /?). I recommend getting to know most of the commands that look
interesting, so you'll be familiar with the set of tools used in real world batch
scripting.

In order for help to work in Windows 9X, you need to download this set of old
DOS commands, extract it and run help in the current directory.

One excellent resource covering pretty much everything from ancient DOS
utilities like edlin to cmd extensions and little known commands such as findstr
is Microsoft Windows XP Command Reference. I definitely recommend it even
over the DOS command help pages.

Relative Paths

Although many people know the cd and md commands, relative path names are
not used that often in DOS. The path names are called
relative because they are specified relative to the current directory (.). Here are
some examples:

cd games\duke3d
Will change to .\games\duke3d where . is the current directory. It is not
necessary to type
cd games
cd duke3d
separately, but on the other hand this means less re-typing if you make a
mistake. I'd suggest using complete pathnames as much as possible to decrease
the number of lines in your batch scripts, though.

Notice that
md games\mygame
Doesn't work as you would expect, unless you have NT command extensions on
but that's another story.

Another way of using relative paths is to refer to directories that are closer to
the root in the directory hierarchy, using two periods (..) for the parent
directory. Say we are in \temp
copy ..\autoexec.bat .
Will copy autoexec.bat from the root directory to the subdir temp. This can also be
risky; don't ever type del . in temp like I once did. The two periods can be
chained like this ..\..\..\ to refer to levels even closer to the root. You
can also continue directory specifications after the periods. Supposing we are in
\games\dukebackup
xcopy /s ..\duke3d\* .
Will copy everything under \games\duke3d to \games\dukebackup including
sub-directories. Notice that \games\duke3d is actually also a relative path name
(it's relative to the current drive).

Changing drives can also be done relatively: d: will change to the directory
in which you last were on drive d, whereas d:\ switches to the root of drive
d. Finally, environment variables (see the section on variables) can be used as
path names. A common example is to refer to the Windows system32 directory
regardless of the actual Windows directory name with:
%windir%\system32
The variable named windir can be used in NT and later, where it's defined
to contain the location of your Windows directory.

Wildcards

Wildcards are an extremely useful tool for processing sets of files. Most, though
not all, DOS commands do support wildcards. The idea is that instead of giving
the name of a single file, you use some wildcards to make the name more
generic. All the file names that match the given wildcard expression will be
processed. The most common wildcard is the asterisk sign * which replaces any
number (including 0) of any characters. Examples:

*.txt
Will select all files that end in .txt for processing. *.* and * are strictly speaking
different. *.* selects all file names with extensions, that is, files that have a
period in the name, whereas * will also select files that don't have any
extension. Actually, DOS doesn't seem to make a difference between the two,
even if it should, although most Unix shells likely do.
foo*bar*
Will select everything starting with foo and also containing the text bar as well:
foobar.aaa
foobarb.ab
foo.bar and
fooblabar.txt
would all be selected.

The other wildcard character is the question mark which replaces 0 or 1
instances of any character.
*.st?
Would select all files whose extension starts with st; the third character may be
anything, and even files with extensions consisting of just .st would be selected.

Redirection

Redirection is the process of directing command output to a file or reading
keyboard input for a program from a file. It's a standard trick of Unix junkies
but not too well known in DOS. Here's how it works:
command <input >output
Will read lines of keyboard input from the file input and write the output of the
command to the file output. Either of the redirection symbols (< or >) can be
omitted at will. If the output file doesn't exist, it will be created. If it does exist,
however, it is overwritten. To be able to append the output of several
commands into one file, use the form command >>output instead.
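
As a minimal sketch of the forms described above, the following lines build a small file and read it back. The file name _log.txt is only an assumption made for this illustration:

```batch
@echo off
rem _log.txt is a hypothetical file name used only for this illustration.
rem A single redirection sign creates the file or overwrites an old one.
echo First line >_log.txt
rem The doubled sign appends to the end of the existing file.
echo Second line >>_log.txt
rem Input redirection feeds the file to sort as if it had been typed in.
sort <_log.txt
del _log.txt
```

Running it twice gives the same two lines both times, because the first echo starts the file from scratch on every run.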

By default both error messages and ordinary output, say after an echo command,
go to the same place, the screen. Sometimes it would be useful to
separate the two streams, for example if you want to append errors to some kind of a
log file while showing ordinary output to the user. I found some vague
instructions on this on the MS Web site but could not fully figure them out.
Here's the MS article about advanced redirection in case you want
to read up on it:
About Redirection by Microsoft

I got some help with this problem. The following example will redirect the
standard output and error streams to separate files:
command 1>stdout.txt 2>stderr.txt
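
The log file scenario mentioned above could look roughly like the sketch below, assuming cmd.exe on an NT-based system. The names nosuchdir, out.txt, errors.log and all.txt are hypothetical. The second command uses the form 2>&1, which sends the error stream to wherever the ordinary output currently goes:

```batch
@echo off
rem The directory and file names here are hypothetical.
rem Ordinary output goes to out.txt, error messages are appended to errors.log.
dir nosuchdir >out.txt 2>>errors.log
rem Both streams end up in the same file, all.txt.
dir nosuchdir >all.txt 2>&1
rem Redirecting to nul throws the output away altogether.
dir nosuchdir >nul 2>nul
```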

Piping

Pipes are used to pass the output of one program as the (standard) input of
another. This could be done with:
commandA >infile
commandB <infile
del infile

But pipes allow you to do the same on one line without user-managed
temporary files:
commandA | commandB
Practically any number of pipes are allowed
commandA | commandB | commandN

However, pipes and redirection cannot be combined the way you would expect
(neither in DOS nor in most Unix shells). That is, something like:
dir /b >unsorted.txt |sort >sorted.txt
will run but will not work as expected. One reason for this is that redirecting the
output to a file doesn't output the redirected content on the screen, so the next
program in the pipe has nothing to process. In Unix, there's a command called
tee, which will output its input both to a file and on the screen so the limitation
can be worked around. Although there are ports of this utility to DOS, there's no
Microsoft equivalent in DOS or Windows.
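
What does work is redirecting only the last command of a pipe. The sketch below shows that form, and a two-step alternative for when the unsorted list should be kept as well. The file names are the ones from the example above:

```batch
@echo off
rem Only the output of the last command in the pipe is redirected.
dir /b | sort >sorted.txt
rem Keeping the unsorted list too takes two separate steps.
dir /b >unsorted.txt
sort <unsorted.txt >sorted.txt
```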

Variables

Another important concept in batch programming is environment variables.
They are named placeholders for character data and are usually global in DOS
(will persist till reboot and be accessible in all batch files). When running in a DOS
Box, all user-created variables will be lost upon exiting the command interpreter
(in real DOS they persist till reboot). Environment variable names are not case
sensitive.

You use the set command to manage variables like this: set name=value. If
value is empty name is unset, that is it will be removed. Calling set with no
parameters lists all environment variables in the system.
Examples:
set name=here's my name
set prompt=
The values of variables can be used in batch files by enclosing the variable
name in percent signs. %name% will expand to value and it can be used almost
everywhere in a batch. A classic example of this is appending something to the
path:
set path=%path%;newDirectoryEntry
Variables are good for short term, local storage and particularly useful in NT-
based batch commands.

Each batch file has a special set of variables named %0 to %9. %1 through %9
correspond to the command-line parameters passed to the batch program and
%0 is the relative path to the batch file being run or simply the base name if
the batch is in the path. Command-line arguments (parameters) are separated
by spaces and get their values as follows:
programname arg1 arg2 arg3 ... arg9 arg10 ... argN
All arguments that are not explicitly specified are empty by default.

The memory (space) allocated to the variables is limited. You can set the size in
config.sys in DOS (the /e switch on the shell line), but in 9X and later Windows
won't usually run out of storage space (the virtual DOS machine is able to
allocate more memory on the fly). Usually you'll be using only a couple of
variables in batch scripts so memory is not a problem.
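
To see the numbered variables and set in action, here is a small sketch. The name showargs.bat and the variable _greeting are made up for this illustration:

```batch
@echo off
rem showargs.bat is a hypothetical name used for illustration.
echo This batch was started as: %0
echo First argument: %1
echo Second argument: %2
rem _greeting is a temporary variable. There are no spaces around the equals sign.
set _greeting=Hello
echo %_greeting%, %1
rem Setting a variable to nothing removes it again.
set _greeting=
```

Typing showargs foo bar should show foo and bar as the first and second argument. Arguments that are left out simply expand to nothing.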

Batch-specific Commands

Here's an overview of the most useful DOS commands, that are generally
speaking thought to be batch file specific. I'll call them batch commands, for
clarity. For more information I recommend checking the help for each of these
commands in turn.

Before we delve into the various batch commands, here's a pair of short batches
that don't require any batch specific syntax, yet manage to be fairly useful. Here
you'll see redirection and environment variables in action.

The following pair of programs is a poor man's substitute for the Unix locate
command. build builds an index of files that are found in the given path
containing wildcards, and locate will do searches in the file database that was
built with build.bat.

build.bat:

dir /s /b %1 >%temp%\dbindex.dat

locate.bat:

find "%1" %temp%\dbindex.dat

If the use of find and dir seems alien to you, find /? and dir /? |more should
shed some light on the matter.

Now on to the commands.

The cls command will clear the screen. It doesn't take any arguments and is
mainly used to clear the screen before displaying any longer output to the user.

Echo can be used to print messages on the screen:
echo this is a test
To echo an empty line, put a period right after the command name:
echo.
By default MS-DOS will echo each command in the batch script on the screen
before running it, which can be handy in debugging situations. If you want to
get rid of command-echoing, though, it's done with the following:
@echo off
The at sign is needed to keep the echo off command itself from being displayed.
Command-echoing can also be turned back on at any time by typing echo on.

The pause command prints a generic prompt and waits for some user keyboard
input (a bit like reading a single character in C and other languages). It can be
useful if you want to be interactive, but in most batch scripts that are meant to
run unattended, it's rarely used.

The more command is a bit like pause but it will read input and stop only when
a screenful is printed. Usually you'll be piping stuff to more like type text.txt |
more. More is rarely used in batch programs unless you need to display multiple
pages of output for the user.

The infamous goto command will jump to a specified label in the batch file and
continue execution from there. goto foo would jump to a label titled :foo. Label
names always start with a colon and the first eight characters of a label name
must be unique inside a batch file. Goto is mainly useful in breaking up batches
into clear, manageable parts and is of special interest when used with the if
command. It also provides the basis for proper, finite looping when we get to
NT-specific batch commands.

The if command does simple conditional processing. If the condition is true the
following command is executed. Only one command may be specified after if
and there's no easy way of specifying an else-clause, but if does support not (if
not condition). Normally one would use the goto command after an if to
work around the single-command limitation. If has four forms, which are: string
equivalence (if a==b), file existence (if exist a), directory existence (if exist
dir\nul) and error level check (if errorlevel number, which is true when the error
level is number or higher). Error levels are mainly useful with
the choice command, and getting the most out of string equivalence requires
the use of environment variables. To check if a string, like a command-line
parameter, is empty use: if stringx==x. It will evaluate to x==x (true) if the
string is empty. Including the keyword not in the condition reverses the
sense of the test, e.g. testing if something doesn't exist.
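
The forms of if can be combined with goto as in the following sketch. The name hasword.bat is hypothetical. It relies on find setting the error level to 1 when the text is not found:

```batch
@echo off
rem hasword.bat is a hypothetical name. Usage: hasword word file
if %1x==x goto usage
if %2x==x goto usage
if not exist %2 goto missing
find "%1" %2 >nul
rem find sets the error level to 1 when the text was not found.
if errorlevel 1 goto notfound
echo %1 occurs in %2.
goto end
:notfound
echo %1 does not occur in %2.
goto end
:missing
echo The file %2 does not exist.
goto end
:usage
echo Usage: hasword word file
:end
```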

The shift command shifts command-line variables %0 to %9 around, replacing
%n with %n+1. %9 is replaced by the 10th command-line argument (it can be
a real argument or the variable is empty if there's no argument). A classic
example of using shift is when you want to iterate through all command-line
arguments without knowing beforehand how many there will be:

argiter.bat:

@echo off
rem this is a comment
:start
if %1x==x goto end
rem doStuff to %1, only a single command
shift
goto start
:end
rem The above is a label.

Call will call another batch file inside the one you are currently in and return
execution when the called batch file terminates. If you use a batch file name
without call, execution will not return to the caller afterwards! Calling a batch
file is simple: call dostuff.bat, and it is also a great way to modularize and reuse
batch code. Note that user-specified command-line arguments and the scratch
variables in a for loop won't show up in called sub-batch scripts, only in the
main script. To work around this you have to pass the command-line arguments
or for variables explicitly to the batch script you are calling. Here's a simple
example passing the first command-line argument as %1 and the current value
of the scratch variable in for as %2.
for %%a in (*) do call dostuff.bat %1 %%a
The for command does seem cryptic and it is what we'll be covering next.

Calling another batch file to do part of the processing is very close to calling a
sub-routine or a function in a real programming language. Environment variables
behave as if they were passed by reference, meaning that if a called batch file
modifies a variable, the changes will show up in the caller
as well. You can even return values from batch files but the only way to do it is
through a (global) variable that both batches will see. One good name
candidate is %retval%, standing for return value.
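
Here is a sketch of the retval idea. To keep it in a single file, the batch calls itself with the marker argument sub and then acts as the called batch. The name retdemo.bat and the marker are assumptions made for this illustration, and normally the part after :sub would live in a separate auxiliary batch file:

```batch
@echo off
rem retdemo.bat is a hypothetical name. When started with the marker
rem argument sub it plays the part of the called batch file.
if %1x==subx goto sub
call %0 sub World
echo %retval%
set retval=
goto end
:sub
rem Hand the result back to the caller in the shared variable retval.
set retval=Hello, %2
:end
```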

The for command will iterate through a set of filenames executing the specified
single command for each file. It's a nice way of emulating wildcards (* and ?)
for programs that don't support them natively (e.g. type in DOS). The syntax is:

for var in (set) do command var

Where var is a single-letter, case sensitive variable from %a to %z (you must use
%%a to %%z inside batch files; single percent signs will only do on the command-line),
set is a set of files specified with wildcards (e.g. sta*.sq?) and command is the
command to be carried out. Note that you don't have to use the scratch variable
(e.g. %%a) as a parameter for the command after do, although you'd probably
want to in most cases. The single command limitation can be broken by calling
a batch file in the for loop. You can also nest for-commands inside each other
although you'll have to use a unique variable name for each loop. Variables of
the outer loops can be used in the inner ones.
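
As a sketch of nesting, assuming cmd.exe, the line below lists the text files of two directories. The directory names docs and src are hypothetical. The outer variable %%d is used inside the set of the inner loop:

```batch
@echo off
rem Nested loops need distinct variables, here d for the outer loop
rem and f for the inner one. The names docs and src are hypothetical.
for %%d in (docs src) do for %%f in (%%d\*.txt) do echo %%d contains %%f
```

If a directory has no matching files, the inner loop simply never runs for it.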

One classic example of using for is to make the MS-DOS type support wildcards:

for %%a in (%1) do type %%a |more


Notice the %1 which is the first command-line parameter passed to the batch
script (e.g. *.txt). Pay special attention to double percent signs which are
strictly necessary in batch files, as well as piping the result to more.

Some amazingly simple things can be done with just the call and for commands
alone. Here's a simple example that will sort the files in the current directory
into sub-directories by letter (a-z) and will put the rest in a subdir called 0-9.
Here's how you do it:

_indexby.bat:

@echo off
md %1
for %%f in (%1*) do move "%%f" %1

index.bat:

@echo off
call _indexby.bat a
call _indexby.bat b
call _indexby.bat c
call _indexby.bat d
call _indexby.bat e
call _indexby.bat f
call _indexby.bat g
call _indexby.bat h
call _indexby.bat i
call _indexby.bat j
call _indexby.bat k
call _indexby.bat l
call _indexby.bat m
call _indexby.bat n
call _indexby.bat o
call _indexby.bat p
call _indexby.bat q
call _indexby.bat r
call _indexby.bat s
call _indexby.bat t
call _indexby.bat u
call _indexby.bat v
call _indexby.bat w
call _indexby.bat x
call _indexby.bat y
call _indexby.bat z
md 0-9
move * 0-9

Here the auxiliary script _indexby.bat does all the work. It simply creates a
directory whose name is the first command-line argument and then moves all of
the files that start with the very same argument string to the directory which it
created. The main program calls this auxiliary batch with strings from a to z and
moves all the files still remaining in the current directory to the directory named
0-9. You could also move sets of files with wildcards in the auxiliary script, but move will
complain if it doesn't find the files, whereas if nothing matches the set in
for, the loop will never run, not even once.

You don't have to worry about case in Windows or DOS. The policy in at least
Windows is that case will be recorded but it's ignored as far as file names are
concerned in practice. Thus a?.txt would match both a.txt and A.TXT (in caps).

Now that Windows has long file names, it is often necessary to enclose file
names in double quotes to prevent mis-interpretation when filenames contain
spaces. This is what I also did in the previous example.

There's also a command called choice which is used to get single character
input from the user. It returns an error level depending on which key the user
pressed. Although the command can be handy, I won't go into the details here
for a number of reasons. Firstly, choice is not included in Windows NT, 2000
or XP, nor before DOS 6, so it won't be of use in modern batch programming.
Besides, the functionality of choice can be pretty much duplicated with the new
set options in NT. Still, choice can be a nice command so if you've got DOS 6 or
Win 9x type in choice /? to see the help screen. Choice is about the only
command with which the if errorlevel number goto label style idiom can be
useful.

Good Batch Programming Style

As the batch language is very simple, there aren't many coding style issues to
be considered in general. However, here are some guidelines to help you make
batch files easier to read and maintain. These apply to both DOS and Windows
in general.

• Use environment variables as quick scratch variables as much as possible.
This should always be preferred to hard-coding values because it's easier
to change an environment variable assignment in one place than it is to do a
find and replace operation. This also keeps the code more readable.

• Always make @echo off the first line of finished batch files. Normally the
user of the batch doesn't really want to know what's going on under the
hood.

• Initialize all of your temporary variables at the beginning of the batch file
and clear them at the end.
:Start
set _foo=0
set _bar=temp.dat

:end
set _foo=
set _bar=

This can be partially automated with the _init script demonstrated later
on.

• As far as naming conventions go, prefix your temporary batch-specific
variables with an underscore character to avoid name clashes. You might
want to extend this practice to temporary text files that are created by
batch files as well.

• Take full advantage of variables specified on the command-line and prefer
that approach to asking stuff interactively whenever you can. This is
handy in unattended batch sessions and when batch files call each other.

• Don't use short cryptic comments inside the file; rather document the
batch in a text file with the same base name if you feel like docs are
necessary. Also, if your batch has no docs with it, be sure to print some
usage screen if no parameters are given or if /? is specified.

• If it seems you are using the same couple of commands again and again
by copy-pasting, it's a good idea to move that part into another batch file
and call it in your main file. I prefer to prefix my auxiliary batches with an
underscore to signal that they are private. Not private in the OOP sense
but rather in the sense that it is something the batch file has to deal with
and the user shouldn't know or care about. Just like the variables that
start with _ are temporary variables that the user of the batch shouldn't
see.

• If you want even more control, you can use several evaluated labels inside
a batch to build small function libraries. See the example regarding gosub
for more information. It turned out there's an even cleaner alternative:
the call command can take a label name as the very first argument like
this:

CALL :LABEL ARGS

:LABEL
rem some code
goto :EOF

The label should be in the same file. The arguments themselves get stored
in the batch file's command-line arguments %1, %2 etc. Here goto :EOF,
instead of ending the file, merely returns from the GOSUB-style label. You
need another goto :EOF to exit the file.

• Prefer passing parameters to other batch files as command-line
parameters, instead of putting them in a global variable that both of the
batches will see (all environment variables apart from for-command
"temporaries" and command-line arguments are global). This approach
minimizes dependencies and encourages reuse.

• If you take advantage of the ability to put multiple statements after the if,
else or do clauses (available at least in Win XP), be sure to indent the
commands in parentheses with a fixed number of spaces; I prefer three.
Don't use tabs as their size depends on the text editor.

• For easy access to commonly used batch files, make a directory such as
c:\bats\ and include it in your path so that all of your batches can be used
in any directory. The procedure varies, in DOS and 9x, it's about adding a
directory entry in the path environment variable set in autoexec.bat. If
there are multiple paths, they must be separated by semicolons. In NT
and later the same can be achieved graphically through system properties
(right click on My Computer, choose properties etc...).

• If you want to distribute batch files to others, make sure you add sufficient
error checking to make the batch files more reliable (OK I didn't do this in
these samples to keep them brief, I admit it). A classic example is a plain
move batch that copies the source and then deletes it. If the copy fails the
source will still be deleted. Always make backups when you are processing
really important files, though, no matter how tough the error checking code.
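
To illustrate the call :LABEL guideline from the list above, here is a minimal sketch that requires cmd.exe with command extensions. The label :showarg is a hypothetical subroutine name:

```batch
@echo off
rem The label :showarg is a hypothetical subroutine name.
call :showarg first
call :showarg second
rem This goto ends the main part so that execution does not
rem fall through into the subroutine below.
goto :EOF

:showarg
rem Inside the subroutine the first argument of call is available as %1.
echo Argument: %1
rem Here the same goto returns to the line after the call.
goto :EOF
```

It prints one Argument line per call. The main batch's own command-line arguments are untouched once the subroutine returns.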

NT Specific Command Extensions

This section introduces the most useful additions to the MS-DOS commands that
are used in Windows 2000, XP and later. To get these new so-called command
extensions working, you should start cmd.exe instead of the old
command.com. Typing cmd in the run box does the trick. This section is
mostly about the new options for the if, for, set and findstr commands. For full
details on these, check their help screens. The help is fairly long and it even
has some examples in it.

String Input, Integer Arithmetic and Looping

The enhanced set command is an essential tool in modern batch programming
because it allows you to do some of the most important feats in script
programming, sadly missing in the batch scripts in the DOS days, at least
without auxiliary programs. These two important features are, namely, signed
integer arithmetic and string input.

Let's start with the input first. The /p switch for set allows you to read a line of
input from the user (or from a file):
set /p variableName=promptString
variableName is the name of the environment variable in which the input is
stored, and promptString, which may be omitted, specifies what text to display
for the user. Normally, white space at the end of the prompt string is eaten, so
if you want a space after the prompt you have to enclose the prompt in double
quotes (luckily the quotes don't show up in the output).

The syntax looks a bit quirky at first, notice that you are not, I stress, not
assigning variable name the string prompt, although it sure does look like it.

Now that we can do string input, I'll show you how the famous "hello"
programming example goes as a batch script. It will ask for the user's name and
print it on the screen.

hiname.bat:

@echo off
set /p _name=Enter your name:
echo Hello %_name%!.
set _name=

The above input statement would be like this if you don't want a prompt being
displayed:
set /p _name=
Note that the equals sign is mandatory. Again, as far as the
old set syntax is concerned, it looks as though you are unsetting _name,
whereas in reality you are reading a value from the standard input.
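
The quoted prompt mentioned earlier can be sketched as follows. The check for an empty answer is my own addition for illustration. It relies on set /p leaving the variable untouched when the user just presses enter, which is why the variable is cleared first:

```batch
@echo off
rem Clear the variable first so that an empty answer can be detected.
set _name=
rem The quotes keep the space after the colon in the prompt.
set /p _name="Enter your name: "
if "%_name%"=="" goto noname
echo Hello %_name%!
goto end
:noname
echo You did not enter a name.
:end
set _name=
```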

The other important feature of set is the ability to perform arithmetic with
signed integers. This feature is activated with the /a switch and the syntax is set
/a variable=expression. Naturally, it evaluates expression and assigns the result
to variable.

At first, this might not seem that revolutionary as you'd hardly want to code any
larger, complex math programs with batch scripts. However, set can be used to
increment counters and thus provides a way of getting out of a loop made with
goto by testing the value of a variable on each iteration and jumping out if the
variable matches the given string. Here's one of the simplest examples; it
will count from 0 to 9:

to9.bat:
@echo off
set _number=0
set _max=10
:Start
if %_number%==%_max% goto end
echo %_number%
set /a _number=_number + 1
goto start

:end
set _number=
set _max=

Variable names may generally speaking be specified without the percent signs in
math expressions. They will automatically be expanded to their value in math.
However, if a variable contains an expression you wish to evaluate, you must
still use the percent signs around the variable name.
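
The difference can be seen in the following sketch. All of the variable names are made up for this illustration:

```batch
@echo off
set _a=7
set _b=3
rem Plain variable names are looked up automatically in the expression.
set /a _sum=_a + _b
echo %_sum%
rem _expr holds an expression as text, so it needs percent signs
rem to be substituted before the expression is evaluated.
set _expr=_a * _b
set /a _product=%_expr%
echo %_product%
set _a=
set _b=
set _sum=
set _expr=
set _product=
```

This should print 10 and then 21.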

Instead of using an explicit end label and goto-ing to it, you can use:
goto :EOF
Notice the explicit colon in there, which is normally not a requirement in label
names after the goto command. You don't have to specify a label named EOF at
all. For those of you who are interested, EOF is programming jargon and very
common in languages like C. It stands for end of file. Ends of files are marked
by a special character. One popular option in DOS apps was ctrl+z (ASCII value
26), which is also the keypress you need to end input that is read from the
keyboard (standard input, that is STDIN).

NOTE ON STRINGS

OK I wonder why this took me so long to discover but I've noticed that there's a
serious bug or omission in the string handling of batch files. Consider the
equality comparison:

if %_a%==%_b%

Unlike in even the most primitive of proper programming languages, strings are
always embedded directly in expressions and as far as the parser (computer)
goes, are indistinguishable from the code itself. Let me demonstrate. Suppose
that the variable _a has the text not in it. One would expect the comparison to be
evaluated as:

"not"==%_b%

However, in batches the text not is just text so it gets interpreted differently as
not==%_b%. As the keyword not can't be put right here, it generates a syntax
error at runtime. The worst thing is that this exceptional situation cannot be
handled or trapped in any way. Thus it is impossible to handle quite a large set
of strings in a batch file. This would be a security problem if batch files were
secure to begin with.

But what if we just stick quotes around the variables when they are compared,
like this:

if "%_a%"=="%_b%"

Though this hack seems better now, there's still the problem of the double
quotes themselves. Merely inputting one double quote character breaks the
script and there are no escape constructs as in programming languages in
general. Lastly, delayed variable expansion might be one way around the issue
and I ought to include it to make this tutorial complete. However, by the time I
had found out there's such a thing, I had already lost most of my interest in
batches and so I've dropped it. You can read up on the subject in the set and
cmd help screens. Do let me know if you're able to devise a solution.
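
As a rough sketch of the delayed expansion idea mentioned above, the comparison can be written with exclamation marks so that the values are filled in only after the if line has been parsed. This assumes Windows XP or later, where setlocal enabledelayedexpansion is available, and the variable _same is made up for the example:

```batch
@echo off
rem Assumption: Windows XP or later, where setlocal can turn delayed
rem expansion on for the rest of the batch file.
setlocal enabledelayedexpansion
set _a=not
set _b=not
set _same=no
rem The variables between exclamation marks are filled in after the
rem if line has been parsed, so the word not is treated as data.
if "!_a!"=="!_b!" set _same=yes
echo Equal: %_same%
```

This is a starting point rather than a complete fix: values that themselves contain exclamation marks are a known weak spot of delayed expansion.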

But despite the flawed string handling, let's dive into integer arithmetic now.
The expression evaluator in set supports a number of operators including the
usual: + - ( ) * / % (integer remainder, written as two %% signs in batch
files; enclosing the expression in double quotes is a safe habit and is needed
for operators the shell would otherwise grab, such as & | and ^). In case the
modulo operator is new to you, here's how it works with some example input
(a minus sign on the left side of the percent sign negates the result but one
on the right has no effect):

left % right = modulo

1%2 = 1
7%9 = 7
8%8 = 0
10%3 = 1
7%4 = 3

You get the picture. The modulo operator is useful for getting a given range out
of random numbers among other things.
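
A short sketch of the operator in a batch file follows. The variable names _r1, _r2, _r3 and _die are invented for the example, and the die roll is just one possible use of the random number trick:

```batch
@echo off
rem Inside a batch file the modulo operator is written as two percent signs.
set /a _r1=7 %% 4
set /a _r2=-7 %% 4
set /a _r3=7 %% -4
echo %_r1% %_r2% %_r3%

rem Squeeze the RANDOM variable (0 to 32767) into the range 1 to 6.
set /a _die=%RANDOM% %% 6 + 1
echo %_die%
```

The first echo should print 3 -3 3, which shows that only a minus sign on the left changes the sign of the result.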

Other than these basic math operations, there are more exotic ones such as C-
language like bit shifts and bitwise logical operators. However, we won't be
going through them here as they are rarely needed as you don't usually work
with individual bits or binary data in batch programs.

Just like in C and Java, the following shorthands for + - * / and % are supported:
+= -= *= /= %=. They require only a single "value" at the right hand side and
have the following equivalents:

a += b is a = a + b
a *= b is a = a * b
and so on.

These constructions are nice if you know you'll be using the previous value of
the variable being assigned to in your calculations. Thus the counter in the
previous program could have been increased with:
set /a _number+=1

Again I strongly advise you to refer to the help screen of set for all the details on
the various operators.

Although the arithmetic in set can be tested on the command line by simply
typing set /a expression, here's a little calculator program that will evaluate user
specified expressions and show their results on the screen until the user types
in exit. Note that the user may refer to the previous result by typing _result as
part of the expression. Also, the percent signs around _string are still necessary
when the result is defined in the calculations, without them set doesn't seem to
evaluate _string as an expression.

calculate.bat:

@echo off
echo Type exit when done.
:start
set /p _string="> "
if /i "%_string%"=="exit" goto end
set /a _result=%_string%
echo %_result%
goto start

:end
echo Bye.
echo.

set _string=
set _result=

The Else clause, String and Numeric Comparisons

Although the changes are not as revolutionary as in set, the if command has
also acquired some new features in Windows 2000 and XP. One of the handiest
is the else-clause. Here's a simple example:

IF EXIST filename. (
del filename.
echo deleted
) ELSE (
echo filename. missing.
echo try again.
)

The parentheses are strictly necessary.

Because of DOS legacy the parser is really picky about the syntax. The format:
if condition (
do stuff
) else (
do something else
)
helps to minimize errors and avoids most of the common pitfalls.

Although I don't fully understand it, it seems that the DOS-like variable
expansion is stupid and tends to expand variables unexpectedly and
prematurely. It considers for and if blocks single statements and does variable
substitution all at once in one fell swoop, giving unexpected and erroneous
results. The help screen for set warns that the following won't work:
set LIST=
for %i in (*) do set LIST=%LIST% %i
echo %LIST%

One way to work around this is to enable delayed environment variable
expansion (see set /? for details). The catch is that this requires modifying
the registry, spawning another instance of the command interpreter with
delayed variable expansion on (cmd /v:on /c "batname or command") or, on
Windows XP at least, putting setlocal enabledelayedexpansion at the top of the
batch file. When you are using delayed variable expansion, the variables whose
contents you want to be evaluated dynamically every time in a block must be
enclosed between exclamation marks, that is: !variable! instead of %variable%.
The above list example could be written as:
set LIST=
for %i in (*) do set LIST=!LIST! %i
echo %LIST%

This works provided that delayed variable expansion has been enabled as
described above. Generally speaking, I've found that the default-style
immediate variable expansion works just fine in most cases. You ought to
consider using dynamic variable expansion if your batch scripts are giving
erroneous and counter-intuitive results or alternatively whenever you are
using more than one statement in an if or for-clause.
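
Here is a minimal batch file version of the list example, sketched under the assumption that setlocal enabledelayedexpansion is available (Windows XP or later). A fixed set of words stands in for the file names, and _list is a made-up variable name:

```batch
@echo off
rem Assumption: Windows XP or later (setlocal enabledelayedexpansion).
setlocal enabledelayedexpansion
set _list=
rem A fixed set of words stands in for the file names matched by (*).
rem The variable between exclamation marks is re-read on every round.
for %%i in (one two three) do set _list=!_list! %%i
echo Collected:%_list%
```

This should print Collected: one two three. With percent signs around _list inside the loop, only the last word would survive.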

Furthermore, you can now properly compare strings in textual form like this:
if left comparison right
where left and right are strings or environment variables and comparison is one
of the following:
EQU (equal), NEQ (not equal), LSS (less), GTR (greater), LEQ (less or equal) or
GEQ (greater or equal). Comparisons are case sensitive by default but you may
change this with the /i switch right after the command if.
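
The following sketch contrasts a numeric and a textual comparison of the same values. The variables _numeric and _text are invented for the illustration:

```batch
@echo off
rem Digits only on both sides: compared as numbers, so 10 is greater than 9.
if 10 GTR 9 (set _numeric=greater) else (set _numeric=not greater)

rem The quotes make the operands plain text, and text is compared
rem character by character, so "10" sorts before "9".
if "10" GTR "9" (set _text=greater) else (set _text=not greater)

echo As numbers, 10 is %_numeric% than 9.
echo As text, "10" is %_text% than "9".
```

So quoting the operands, which is otherwise a good habit, also switches off the numeric comparison.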

The following example program keeps asking for passwords until they are equal.
We'll be covering both string input and comparisons here, also notice the use of
parentheses:

password.bat:

@echo off
set _original=a
set _retry=b

:start
set /p _original="Type in your new password: "
set /p _retry="Retype the password, please: "
if "%_original%" equ "%_retry%" (
echo password changed successfully.
goto end
) else (
echo The passwords don't match, please try again.
echo.
goto start
)

:end
set _original=
set _retry=

Another useful property of the string comparison operators is that if the strings
are numbers, they are compared as such and not as ordinary strings. To
demonstrate this concept, here's a program that will print all integer powers of
the first command-line argument until the exponent equals the second
argument:

power.bat:

@echo off
rem checking input
if %1x==x goto end
if %2x==x goto end

rem variable initialization


set /a _base = %1
set /a _power = 0
set /a _destination = %2
set /a _result = 1

echo BASE POWER RESULT

rem the main loop


:Start
if %_power%==%_destination% goto end
if %_result% LSS 0 goto overflow
echo %_base% %_power% %_result%
set /a _result = _result * _base
set /a _power += 1
goto start

rem In case the result overflows


:overflow
echo.
echo Too large a value. Exiting....
rem cleaning up.

:end
set _base=
set _power=
set _result=
set _destination=

There are a number of key points I'd like to emphasize here. Firstly, pay
attention to the easy way of determining if a command-line argument is not
given by comparing it and x against x. If the argument is empty the condition is
x==x which is always true. This way is also portable, also working in Win 9x
and DOS.

Unlike in many of the other batches in which error checking was omitted
completely for brevity, here we do check that the user supplies two
command-line arguments to prevent an infinite loop.

In the strings passed to the echo statements a run of three spaces should be
replaced by a real tab character (press tab in your text editor) so DOS will print
the output in tabulated form.

We also use the LSS operator for arithmetic comparisons to prevent an
overflow. An overflow is a situation in which the maximum range for some
datatype is reached and the value wraps around to the minimum value of that
datatype. The reason for this is deep in how numbers are represented in binary
and we won't cover it in here. An extra bit: the opposite of overflow is called an
underflow.

The Indispensable For Command

The for command has got some extremely useful additions to it that make it
almost as worthwhile as set is, if not even better. Just like the if command does
nowadays, for will also accept more than one command if you use parentheses
after the keyword do. Although this capability comes in handy, it is sometimes
preferable to use the call command and break up the action in several batch
files. Remember the index builder example?

If you need to pass on all command-line arguments passed to a batch file, e.g.
in a for-loop, there exists a special shortcut notation. Instead of saying %1 %2
.. %n-1 %n where n is the number of command-line arguments passed to a
batch file, you can say %* which does exactly the same thing. The asterisk here
means every element (except %0, which is batch specific anyway).

It is now quite possible to process not only the files in the current directory but
also (recursively) all the other files in the sub-directories matching the criteria
you specified. You do this with the new /r switch, which may be followed by the
directory to start from; otherwise the syntax is unchanged.

for /r c:\ %%a in (*.bak *.tmp) do del "%%a"


Would delete all of your temporary files (ending in .bak or .tmp) in all of the
directories of drive c. Remember that if you wish to do the same thing on the
command-line, you must use only one percent sign. Similarly the switch /d can
be used to process directory names rather than file names as usual. Another
method to achieve the same thing would be to use the for command to process
the output of dir with the /a switch. The /a switch lets you list only the files
matching the given file attributes. The attribute d is a directory and
minus d means the inverse, i.e. a file.

In addition to iterating through a set of files or directories, for has several other
tricks up its sleeve. One of them is the /l switch which, instead of going through
a set of file names, goes through a set of values. The syntax is
for /l %%a in (begin step end) do command
It's syntactically close to the previous for statements apart from the begin step
end bit. Begin and end define an inclusive range of signed values (either or both
may be negative) and step, which may also be negative, specifies a value which
is added to begin in each iteration until end is reached. Perhaps a few examples
will make things more clear:
for /l %%a in (0 2 100) do echo %%a
Prints all even whole numbers from zero to one hundred. Similarly,
for /l %%a in (1 2 100) do echo %%a
Would print all odd ones (99 being last). Whereas:
for /l %%a in (10 -3 -10) do echo %%a
Would sweep through the range 10 to -10 in steps of -3, -8 being the last value.
for /l %a in (0 1 7) do @for /l %b in (0 1 7) do @echo %a%b
Is a nested loop. It prints the numbers 00 to 77 that is two-digit octal numbers.

For has enough sense to reject potential overflow or underflow cases. Things like:
for /l %%a in (1 -1 2) do echo %%a
will be rejected right off the bat so nothing is echoed by echo %%a.

In addition, I just discovered that you may give the range or the step size in
hexadecimal by prefixing the value with 0x. This is an undocumented feature.
Octal will also probably work, though I haven't tried.

I haven't found any real use for this ability to generate increasing or decreasing
sequences of numbers. However, here are some potential uses:

• The /l switch could be used like the conventional for-loop in other
programming languages. However, I find manually increasing counters
with set /a is both more flexible and easier in the long run. For
commands can be nested, though, as the octal example above shows: the
command part of for may contain another for command.

• Secondly, the counter could be used to generate a large number of
variables. Although it would eat lots of memory, and there's not that much
for environment variables, this could be used to generate arrays as follows:
for /l %%a in (0 1 100) do set _%%a=0
Creates 101 new variables titled _0 to _100 which can then be used like
array elements among other things. This array emulation does work in
principle but accessing the elements is harder than one might think. Not to
mention that the speed is terrible. For a working example program about
arrays, see the end of this tutorial.
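
As a small sketch of how such generated variables can be read back, the loop variable can be combined with delayed expansion. This assumes Windows XP or later for setlocal enabledelayedexpansion and is not the array program referred to above:

```batch
@echo off
rem Assumption: Windows XP or later (setlocal enabledelayedexpansion).
setlocal enabledelayedexpansion

rem Store the squares of 0 to 4 in the variables _0, _1 and so on.
for /l %%a in (0 1 4) do set /a _%%a=%%a * %%a

rem The loop variable is filled in first, which leaves a name such as
rem _3 between exclamation marks for delayed expansion to look up.
for /l %%a in (0 1 4) do echo element %%a = !_%%a!
```

The last line printed should be element 4 = 16.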

One of the most useful switches of the for command is the /f switch. It will
process a file or the output of a command breaking it up into lines. Then it will
break up each line into tokens as specified by the user and allocate the bits it
tokenized into scratch variables. In most cases the syntax is
for /f "options" %%a in (fileSet) do command %%a %%b %%c ...
The %%a and do command parts should already be familiar from other
examples. Options is a double-quoted string of options that will determine how
lines are tokenized and which tokens are put into scratch variables. The format
is:
"option[=attribute] option[=attribute] ..."
Here are some options you'll probably use very often:
eol=character If this character is encountered, ignore the rest of this line and
move onto the next one.
skip=number Number of lines to skip from the beginning.
delims=characters Each of these characters is taken as a token delimiter, in
other words, when any of these is encountered, it ends one token and starts
another. These are tab and space by default. I recommend specifying the
delimiters last as this avoids ambiguity if you want to include space in
the set of token delimiters (space is also the delimiter of the option=attribute
pairs in the option string).
tokens=numbers Specifies which of the found tokens, per line, are to be
allocated their own scratch variable names. Tokens 2,4 would only take the 2nd
and 4th token and 1-5 would take the first five. The last character may be a star
(*) and is taken as the rest (non-tokenized remains) of this line.
usebackq If this option is present, a string enclosed in back quotes (`), found
after the "in (" part, is executed as a command and the output of that command
is tokenized.

Notice that the string in back quotes must be a simple command; ordinary
piping, for example, is not allowed and is treated as a syntax error. However, if
you escape the pipe symbol with an up arrow (caret), "^|" instead of just '|', it
does work. You still cannot use other redirection characters in there like <
> and >>. Finally, piping stuff in the do part of a for loop is quite possible, too.

Don't be troubled if this /f switch seems cryptic at first. It's very hard to explain
clearly without resorting to some examples. So a number of them will follow.
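
As a small warm-up, here is a sketch of the escaped pipe inside back quotes. The echoed words and the variable _second are made up for the illustration:

```batch
@echo off
rem The caret hides the pipe from the parser when the for line is read;
rem the command between the back quotes then runs with a real pipe.
for /f "usebackq tokens=2" %%a in (`echo alpha beta gamma ^| find "beta"`) do set _second=%%a
echo Second word: %_second%
```

The line that find lets through is split at spaces and the second token, beta, ends up in _second.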

Firstly, here's a heavy one that will run the help command extracting all of the
command-names from the listing. command-names are strings that are at the
very beginning of the line and are delimited by spaces. Then it will run all of the
commands listed in help with the /? option, appending their help screens
together. In the process, lines that have only the word "command:" in them will
be added to make finding the start of the next command's help screen easier.
The listing generated by these two batch files is a very nice and clear command
reference. Certainly a lot more convenient than hunting around the MS Website
or browsing the help screens separately.

gethelp.bat:

@echo off
set _filename=help.dat
set _filename2=commands.dat
help |find /v "command-name" |findstr /r /v "^[^A-Z]" >>%_filename%
for /f "tokens=1 delims= " %%a in (%_filename%) do call _helper.bat %%a
del %_filename%
set _filename=
set _filename2=

_helper.bat:

@echo off
echo. >>%_filename2%
echo command: >>%_filename2%
%1 /? >>%_filename2%

Phew, that seemed cryptic, didn't it? Let's go through the hardest parts
separately:
help |find /v "command-name" |findstr /r /v "^[^A-Z]" >>%_filename%
Is a complex example of piping and redirection. It runs the help command,
passing the output to the find command which, in turn, passes it to findstr.
Finally, the output of findstr is appended in the file denoted by the variable
%_filename%. The /v option tells find not to show the lines that contain the
word command-name. This trick is necessary to get rid of the first line of the
help command which reads:
For more information on a specific command, type HELP command-name
This is a safe operation because this is the only line containing that word.

The findstr command is not part of the DOS heritage but rather added in
Windows 2000 I think. It's like find except much more powerful. The /r switch in
particular tells that instead of looking for a single string, look for any one of a
set of strings that are specified by a regular expression. Unless you are a
programmer or have been working with Unix, you probably won't know what a
regular expression is (basically a heavily boosted version of a DOS wildcard
expression). The topic is fairly complex and I've just gotten started in it, so this
is totally out of the scope of this tutorial. For more information, see this regular
expression tutorial.

To cut a long story short "^[^A-Z]" means a set of strings that have a non-
letter character at the very beginning of the line. The /v switch tells findstr to
display all but the lines that match so the end result is that only the lines having
a command-name at the very beginning of the line are printed. Without this
step, longer command descriptions that wrap to the next line and are indented
with spaces, would also be processed.

After exhaustive filtering,
for /f "tokens=1 delims= " %%a in (%_filename%) do call _helper.bat %%a
extracts the command-names from each line in
%_filename%. Only the first token is extracted and tokens are delimited by
spaces. In other words, only the first word on a line, that is the command
name, is extracted and assigned to the variable %%a. The variable %%a
is then passed to _helper.bat which will append the text:
command:
to the file commands.dat, then run commandname /? and append the output to
the same file. Although some of the boosted DOS commands for Windows
will normally pause between screenfuls of output, they are smart enough not to
pause if run in a batch file.

If other tokens are extracted by specifying a different set of tokens, then
additional variables like %%b %%c and so on are allocated for the selected
tokens, respectively. After %%z comes %%A I think and %%Z is the absolute
last scratch variable, that is token number 52.
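
To see the allocation of scratch variables in action, here is a sketch that tokenizes a literal string. It relies on the for /f feature that a double-quoted string in the parentheses (without usebackq) is processed directly rather than opened as a file; the sample text and the variable names are invented:

```batch
@echo off
rem Without usebackq, a double-quoted string in the parentheses is
rem tokenized directly instead of being treated as a file name.
for /f "tokens=1,3* delims=," %%a in ("alpha,beta,gamma,delta,epsilon") do (
    set _first=%%a
    set _third=%%b
    set _rest=%%c
)
echo First: %_first%
echo Third: %_third%
echo Rest:  %_rest%
```

Note that the third token lands in the second scratch variable: the letters follow the order of the selected tokens, not the token numbers. The star leaves delta,epsilon unsplit in the next variable.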

As another, actually a lot simpler, example of tokenization consider a general
purpose environment variable cleaner. Knowing that all of our temporary batch
file variables start with an underscore makes it relatively painless to devise a
general purpose temporary variable cleaner that can then be called at the end
of most batch files. Here's how the script looks:

_cleanup.bat:

@echo off
for /f "usebackq tokens=1-2 delims==" %%a in (`set _`) do set %%a=

We'll take the first two tokens delimited by the equals sign. The usebackq
option specifies in this case that we are dealing with command output, namely
the output of the set command.

One addition to the set command which wasn't mentioned earlier is that passing
it an ordinary string will list all environment variables that start with that string.
Thus we list everything starting with an underscore and for each line call the set
command again passing it the first, extracted token (the variable name before
the equals sign) and another equals sign. That's how we've effectively unset
all variables starting with an underscore. Notice that we didn't actually use the
second token anywhere so we might just as well have specified tokens=1 or
omitted the tokens part because the default is only the first token.

Finally, if you are processing file names, the for command does support new
variable substitution options. These options will only work for the for scratch
variables and the arguments from percent 1 to percent 9. They will not work for
ordinary environment variables. The syntax is
%%~lettersa
where a is the variable name and letters can be, among others,
any of the following file properties (see for /? for help):
f: the full path.
d: drive letter
p: path
n: basename
x: extension
s: shortname
a: attributes
t: date and time
z: size

Note that only including the tilde character (e.g. %%~a) without any of these
modifiers removes quotes around a file name string, which can come in quite
handy.
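
A quick sketch of a few modifiers at work follows. The path is invented for the example, and as far as the name-splitting modifiers go the file should not need to exist:

```batch
@echo off
rem The path is only an example. These modifiers just take the name
rem apart, so the file does not have to exist.
for %%a in ("C:\Some Folder\report.final.txt") do (
    set _drive=%%~da
    set _path=%%~pa
    set _name=%%~na
    set _ext=%%~xa
    set _plain=%%~a
)
echo %_drive% %_path% %_name% %_ext%
echo %_plain%
```

Only the part after the last period counts as the extension, so the base name here is report.final.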

Here's an example program that processes a set of files and adds sequential
numbering to each file's base name:

number.bat:

@echo off
set _counter=0
for %%a in (%1) do call _rencount.bat %%a
call _cleanup.bat

_rencount.bat:

@echo off
for %%a in (%1) do ren %%a %%~na%_counter%%%~xa
set /a _counter+=1

Number.bat simply iterates through the user-specified set of file names calling
_rencount.bat and passing each file name to it as an argument. Notice the call
to _cleanup at the end (see the previous example).

_rencount.bat takes the file's base name, adds the value of the _counter to it and
appends the extension. Then it simply increments the counter.

The tricky part is:
for %%a in (%1) do ren %%a %%~na%_counter%%%~xa
As %1 is actually a single file name, the "inner" for-loop is run only once per
file. Notice that environment variable names need percent signs around them
whereas the for scratch variables start with %% then a ~ and any of the
attribute letters (n for base name, x for extension) and finally the letter
identifier, in this case a. And to confuse matters more, command-line arguments
are prefixed by only a single percent sign followed by a number. Thus:
%%a: for scratch variable
%%~na: base name of scratch variable
%1: command-line argument (passed in call)
%_counter%: ordinary environment variable.

There are pitfalls when combining several file attribute tokens in printed
output. To print the name, extension and size in bytes of each file in the current
directory separated by spaces, you must use the following line (substitute %
with %% if using this in a batch file):
for %a in (*) do @echo %~na %~xa %~za
It is likely you would have initially tried something like this:
for %a in (*) do echo %~nxza

Here are the differences. Although %~nxza specifies explicitly the fields name,
extension and size in this order, the output is quite different. Regardless of the
order, a preset and undocumented field order is used. In the above example,
the order is first the size (and a tab), followed by the file name and extension
(the period is also taken to be part of the extension). In the first example we
separated the arguments for echo by spaces and repeated the argument name
%a for each field. This way it is possible to always guarantee a certain desired
order. Secondly, notice that, as we are running this in the command line, the
first form includes an at sign before the echo statement. The at can also be
used outside batches and without it all of the echo commands run by the for
loop would also be shown.

If all of the fields are specified without repeating the scratch variable for
each field, the full preset order and format is:

• Attributes (including NTFS-specific): a dash for an unset attribute and a
respective letter for each set attribute like in the attrib command (r for
read only, h for hidden etc.)

• Date and time: Most likely (if not always) in the format dd.mm.yyyy
hh:mm

• Drive letter: In upper case followed by a colon.

• The path: Without the file name and starting and ending with a back
slash. The full path option is different, it is equivalent to drive path base
extension. If both the full path and the path is specified or if other path
related fields are requested, the full path option will work exactly like the
path option (the path option is ~p whereas the full path is ~f).

• File size: In bytes, not rounded to the nearest cluster size and without the
thousand separator or any unit indicator.

• Base name: The file name without the last period and everything following it.

• Extension: The sub-string starting from the last period of the file name
till the end.

More String Processing and Some Magic Variables

There's yet another way of processing environment variables that, although
listed as part of set, is not specific to the set command. The form:
%varname:expression% will snip or replace parts of a variable's value.
%varname:first=second%
will replace the substring first, if found in the value of the variable, with
second. Similarly,
%varname:first=%
would delete all occurrences of first.
You can also extract sub-strings:
%varname:~2%
(notice the colon before the tilde) would include the third character of the
variable value and everything beyond that till the end of the value string
(character indices are counted from zero).
%varname:~2,4%
Would extract characters 3 to 6 (starting at index 2, length 4).
%varname:~0,-2%
Would extract all but the last two characters. Whereas:
%varname:~-2%
would extract only the last two.
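
The forms above can be tried out on a sample value. In this sketch the variable _s and its contents are made up:

```batch
@echo off
set _s=abcdefgh
rem From index 2 to the end: cdefgh
echo %_s:~2%
rem Four characters starting at index 2: cdef
echo %_s:~2,4%
rem Everything except the last two characters: abcdef
echo %_s:~0,-2%
rem Only the last two characters: gh
echo %_s:~-2%
rem Replace cd with XY, then delete cd altogether: abXYefgh and abefgh
echo %_s:cd=XY%
echo %_s:cd=%
```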

To show how string substitution works in practise, here's a batch file that will
convert spaces in file names to underscores (_). This might be handy if
preparing files for the WWW, for instance.

nospaces.bat:

@echo off
set _file=a
for %%a in ("* *") do call _killspace.bat "%%a"
set _file=

_killspace.bat:

@echo off
set _file=%1
ren %1 %_file: =_%

This is pretty simple in comparison to what you've gone through previously, and
hopefully won't need much explaining. Notice that replacing ren %1 %_file: =_%
with ren %1 %_file: =% alters the behavior so that instead of substituting
spaces with underscores, the spaces are removed.

The most apparent limitation regarding strings in batch files is the lack of an
indexing operator or command. There's no easy way of iterating through or
picking each character in a string and doing something to it, nor is there any
easy way of getting the length of a string, as far as I know. One way to
indirectly calculate the length of a string, though, is to copy the contents to a
file via output redirection and determine the length of the string by the file size.
Usually the size in characters is the size in bytes minus two (the carriage
return and line feed added by the echo command), and minus one more if there
is a space before the redirection sign. Each line in DOS and Windows ends in
two nonprintable ASCII characters as in a teletype, the carriage return for
getting at the beginning of the line and the line feed for scrolling down the
paper or screen if you will. Because of the overhead related to actually reading
and writing files on disk, this is many times slower than in programming
languages supporting strings properly.
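
A different approach, sketched here as an alternative to the file size trick, is to chop one character off a copy of the string in a goto loop and count the rounds. The names _string, _tmp and _len are made up, and the sketch assumes the text has no characters that are special to the shell (quotes, ampersands, percent signs and so on):

```batch
@echo off
set _string=hello world
set _tmp=%_string%
set _len=0

:count
rem Once the copy is empty the variable no longer exists.
if not defined _tmp goto counted
rem Drop the first character and count it.
set _tmp=%_tmp:~1%
set /a _len+=1
goto count

:counted
echo %_len%
```

For the sample text this should print 11. It avoids touching the disk but still needs one pass through the loop per character.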

Another serious limitation worthy of mention is that you cannot nest
environment variables inside of each other. This means that if attempting to
specify sub-string indices or replacements using an environment variable, the
batch interpreter will not do what you might expect. It would be really cool if
there was a workaround for this. To demonstrate with a snippet:

set _string=%1
set _index=%2
set _find=%3
set _replace=%4
set _result=%_string:~-%_index%%
echo %_result% Last %_index% characters of %_string%
set _result=%_string:%_find%=%_replace%%
echo %_result% with each %_find% replaced by %_replace% in %_string%
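
One commonly used workaround, sketched here with example values in place of the command-line arguments, is to let the call command parse the line a second time. The doubled percent signs survive the first pass as single ones, while the inner variables are already filled in. The names _tail and _swapped are invented:

```batch
@echo off
rem Example values stand in for the four command-line arguments.
set _string=batchfile
set _index=4
set _find=file
set _replace=job

rem The first pass fills in the index and halves the doubled percent
rem signs; call then parses the line again and expands the rest.
call set _tail=%%_string:~-%_index%%%
echo %_tail%

rem The same trick works for search and replace.
call set _swapped=%%_string:%_find%=%_replace%%%
echo %_swapped%
```

With these values the two lines printed should be file and batchjob.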

Finally, there's one additional set of magic variables originally documented as
part of set but actually available almost everywhere. These variables should
always be usable in batch files:
%CD%: current directory.
%DATE%: Date in the same format as given by the command.
%TIME%: Time in the same format as given by the command.
%RANDOM%: A pseudo-random, positive 15 bit integer.
%ERRORLEVEL%: The most current error level (return value) from an
application.
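
As a sketch of reading one of these variables, the following lines run a program that ends with a known return value and then pick it up. Here cmd /c exit 3 merely stands in for any application, and _code is a made-up name:

```batch
@echo off
rem cmd /c exit 3 stands in for any program that ends with return value 3.
cmd /c exit 3
rem Save the value straight away, before another command changes it.
set _code=%ERRORLEVEL%
echo The program returned %_code%
```

Copying the value into a variable of your own right after the command is a cautious habit, since later commands may overwrite the error level.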

It is possible to assign to some of these variables. You can assign to errorlevel
to signal a return value from a batch file being called, for instance. However,
there's only one errorlevel variable so you might get odd results if you aren't
careful enough: once you have set it yourself, %ERRORLEVEL% keeps showing
your value in place of the real error level until you unset it again with
set errorlevel=. It is also a good idea to reset the error level to zero in the last
bat-file, which usually signals successful program execution (no errors).

Bigger Example Programs

To wrap up this tutorial, here are some slightly larger example programs to get
you started. In addition to demonstrating batch programming, some new
concepts are covered on the way.

Emulating Gosub

Unless you are familiar with some dialect of BASIC, the term gosub may seem
alien. It is basically an enhanced version of goto which remembers the place
from which you started so you can go back to where you left. It can even
remember more than one such place. A gosub is not nearly as useful as true functions
in a programming language but it is the next step up from the goto command.
The example that follows is actually redundant. I was informed lately that
batches do support a gosub syntax. See the section about batch programming
style.
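
For reference, the built-in gosub-like syntax is call followed by a colon and a label. Here is a minimal sketch; the label names greet and main and the counter _calls are invented for the example:

```batch
@echo off
goto main

:greet
rem Inside the subroutine, the first argument given to call is number 1.
echo Hello, %1
set /a _calls+=1
rem Jumping to the end of the file returns to the line after the call.
goto :EOF

:main
set _calls=0
call :greet World
call :greet again
echo The subroutine ran %_calls% times.
```

Each call remembers where it came from, so the script should greet twice and then report that the subroutine ran 2 times.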

So to implement a gosub we need a data structure which lets you put in several
labels, pieces of text, and retrieve the labels in the reverse order. That is the
last label you put in is the first to come out, the second last is the second to
come out and so on. This is known as a last in first out order, LIFO, and the
name for such a data structure is a stack. The operations of putting in data and
retrieving it are called push and pop respectively.

From the implementation point of view, we cannot use an array as it is not
natively supported (but see the third example). Instead, we can put the pieces
of text in a single environment variable and delimit them with semicolons just
like the folder locations in the path environment variable (type in set path to see
it). In order to make the stack more generally useful, we can also put in several
related functions inside a single file. Here's how the stack code would look:

_stack.bat:
@echo off
goto %1

:push
set _stack=%2;%_stack%
goto :EOF

:pop
for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a
for /f "usebackq tokens=2* delims=;=" %%a in (`set _stack`) do set _stack=%%b
goto :EOF

:peek
for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a
goto :EOF

:clear
set _stack=
goto :EOF

The variable _stack holds the stack contents and persists as long as it has not
been unset. The file can hold several independent functions because the first
argument passed to it is the name of the function to call, that is the label to
which we go to. Each label in turn goes to the end of the file as soon as it has
done its job. This way the batch file name _stack serves as our little name
space or module and each label corresponds to a function or procedure. The
mechanism of passing back values is through a global variable called _retval
short for return value.

As to handling the stack, the push command puts the second argument in
front of the current stack contents using the set command:
set _stack=%2;%_stack%
So new values are entered on the left, next to the equals sign, and the older
ones move to the right. Each value also ends in a semicolon to mark where the
next one begins. This code will only work assuming our data has no semicolons
but it's a fair bet. And we could change the separator or even let the user
define it. Popping the data is about filtering the output of the set command so
that only the text between the first equals sign and the first semicolon is
assigned to _retval. Here's how that would look:
for /f "usebackq tokens=2 delims=;=" %%a in (`set _stack`) do set _retval=%%a
Then we'll need to remove the bit we just saved. The easiest option is to use
another for, using the asterisk token to grab the rest of the line and assign it as
the new stack contents as follows:
for /f "usebackq tokens=2* delims=;=" %%a in (`set _stack`) do set _stack=%%b
Note that as a special case when there's only one element left, there's nothing
after its semicolon, so the stack gets automatically unset as a side effect, neat.
Finally, the command peek is like pop without changing the stack and clear
simply unsets the _stack variable.

The order in which you push and pop stuff in the stack is important. That is,
whether you read the list of values from the left or the right side and whether
it expands to the left or right when you add to it. In the case of our stack we
add to and read the left side, so the most recently added thing is the one read
first. But merely adding on the right instead is enough to turn the data
structure into a queue. A queue is a data structure where the thing you put in
first is also read out first. The change you would need to make it a queue is:
set _stack=%_stack%;%2
rather than
set _stack=%2;%_stack%

In the real world the terminology varies. In Perl, for example, push and pop
work on the end of a list while unshift and shift work on its beginning, so a
queue pairs push with shift, but that's beside the point. By the way, if the
term shift rings a bell, you can think of the command-line parameters passed to
a batch file as a queue.
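
To see the first-in, first-out behaviour in isolation, here is a minimal
one-file sketch of a queue. It is not the _stack.bat approach: the names _queue,
enqueue and dequeue are made up for this example, the routines are reached with
call :label (available in the NT-based command interpreters), and dequeue splits
the variable's value directly with for /f rather than parsing the output of set.
As before, it assumes the data contains no semicolons.

```batch
@echo off
rem Queue sketch: values are appended on the right and read from the left.
set _queue=
call :enqueue 1
call :enqueue 3
call :enqueue 5
call :dequeue
echo First out: %_retval%
call :dequeue
echo Second out: %_retval%
rem Tidy up the remaining contents.
set _queue=
goto :EOF

:enqueue
rem Append the new value, followed by its terminating semicolon.
set _queue=%_queue%%1;
goto :EOF

:dequeue
rem Token 1 is the oldest value; the asterisk grabs the rest of the queue.
for /f "tokens=1* delims=;" %%a in ("%_queue%") do (
  set _retval=%%a
  set _queue=%%b
)
goto :EOF
```

The values come out in the order they went in, 1 and then 3, which is the
opposite of what the stack would give.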

As the stack is a relatively complex beast, it would be a good idea to have some
code for testing and understanding its inner-workings. Here's a test script which
you can run to see how the stack behaves:

test_stack.bat:
@echo off
call _stack clear
echo empty stack:
set _stack
call _stack push 1
call _stack push 3
call _stack push 5
call _stack push 6
echo Stack after pushing 4 values:
set _stack
call _stack peek
echo Top item is:
set _retval
call _stack pop
echo Popped the top which is:
set _retval
echo The stack is now:
set _stack
call _stack pop
echo After another pop the stack is:
set _stack
echo and the value popped was
set _retval
call _stack pop
call _stack pop
echo After two more pops the stack is:
set _stack
echo And the last value popped:
set _retval
echo Push yet another item:
call _stack push last
echo Which grows the stack to:
set _stack
echo But now we clear it:
call _stack clear
set _stack

There are several limitations in this stack. The most significant of these,
which I have not mentioned earlier, is that there can be only one stack in
existence, as the name of the stack variable is hard-coded. You might be able to
have the user specify it on the command line, too, but I've left it out for
simplicity. Not hard-coding the stack name would add the ability to create more
complex data structures, which would be groups of simple variables that the user
need not know about. The syntax of batchname, datastructureName, function,
parameters does bring object-oriented programming and abstract data types to
mind. Though truth be told, the similarity is mostly superficial; there's
nothing even remotely object-oriented in batch programming.

But what's all this got to do with gosub? Well, gosub is simply a stack of
labels or line numbers. When you go to something, you make a label at the point
where execution should continue afterwards, and push it on the stack. Then when
you need to go back, you pop the topmost label and go to it. Of course you might
equally well hard-code the same label name twice without using the stack, but it
would not scale as cleanly and you wouldn't have a generally usable stack then.
Still, it is apparent the batch syntax doesn't support gosub as smoothly as even
basic languages do. My understanding is that in basic a gosub is a stack of line
numbers and the interpreter does the bookkeeping, pushing and popping for you
without having to code manual labels like this. Popping a return label from the
stack is similar to how functions are implemented in many programming languages.
Instead of labels you have addresses of machine code in memory to which the
program counter is set, but other than that it mostly works the same, ignoring
local variables and pass by value here.
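
To make the idea concrete, here is a minimal sketch of a gosub emulated with the
_stack.bat file shown earlier. The file name gosub_demo.bat and the labels
greet, back1 and back2 are made up for illustration, and the script assumes that
_stack.bat is in the current directory or on the path.

```batch
@echo off
rem gosub_demo.bat (hypothetical name): emulate gosub with _stack.bat.
call _stack clear

rem First "gosub": remember where to come back to, then jump.
call _stack push back1
goto greet
:back1

rem Second "gosub" to the same routine from a different place.
call _stack push back2
goto greet
:back2

echo Done.
goto :EOF

:greet
echo Hello from the subroutine.
rem "return": pop the most recently pushed label and jump to it.
call _stack pop
goto %_retval%
```

The routine under greet is entered twice and returns to a different label each
time, because the label it jumps back to comes from the stack rather than being
written into the routine. For what it's worth, the NT-based command interpreters
also accept call :label, which does this return bookkeeping for you.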

Random Lines

This second script will use the random number generator to string together
randomly chosen lines, one from each of a set of user-specified files. It could
be used to build a funny random word generator (the funniest I've seen so far is
the random tech phrase generator). This batch script is stretching the limits of
find and set a bit but it's not that long after all.

randlines.bat:

@echo off
set _filename=index.dat
set _current=0
set _output=
set _linecount=1

:Start
if %1x==x goto end
call _rndline.bat %1
shift
goto start

:end
echo Output: %_output%
del %_filename%
call _cleanup.bat

_rndline.bat:

@echo off
for /f "usebackq tokens=2 delims=:" %%a in (`find /c /v "" %1`) do set _linecount=%%a
set /a _current="%random% %% %_linecount% + 1"
find /n /v "" %1 >%_filename%
for /f "usebackq delims=] tokens=2" %%a in (`findstr /r "\[%_current%]" %_filename%`) do set _output=%_output% %%a

The main program initializes some variables and then calls _rndline.bat once for
each file name given on the command line. Finally, it prints the concatenated
output generated by _rndline and cleans up when all the arguments have been
processed.

The second batch file, _rndline.bat, is worthy of some more detailed scrutiny,
though. The line
for /f "usebackq tokens=2 delims=:" %%a in (`find /c /v "" %1`) do set _linecount=%%a
is a clumsy way of getting the line count of a file. By looking for lines that
don't contain the empty string, we get all of the lines in the file. Then we
just tell find to count those lines and grab the 2nd token (the first that has a
colon on its left). This is because the format of find /c is:
---------- filename: linecount

Next comes set /a _current="%random% %% %_linecount% + 1", which is a tricky bit
of arithmetic. Notice the use of the modulo operator to limit the range of the
random output values: the remainder lies between zero and the line count minus
one, and adding one turns it into a valid line number. It seems that the percent
signs around random are mandatory, contrary to other, ordinary variables. The
modulo operator itself has to be written with two percent signs because, inside
a batch file, a single percent sign starts a variable or argument reference and
a doubled one stands for a literal percent. And the help for the set command
states that double quotes must be used around the expression if the modulo
operator is to be used. These unexpected limitations are unfortunate but can be
worked around.
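
As a small stand-alone illustration of the same arithmetic, the following sketch
rolls a six-sided die a few times. The variable name _roll is made up for this
example; the set /a line has the same shape as the one in _rndline.bat.

```batch
@echo off
rem %random% %% 6 gives a remainder from 0 to 5; adding 1 shifts it to 1..6.
rem The doubled percent sign is how a batch file spells the modulo operator.
set /a _roll="%random% %% 6 + 1"
echo First roll: %_roll%
set /a _roll="%random% %% 6 + 1"
echo Second roll: %_roll%
set /a _roll="%random% %% 6 + 1"
echo Third roll: %_roll%
rem Remove the helper variable again.
set _roll=
```

Because each set /a sits on its own line, %random% is expanded anew every time
and the rolls are independent of each other.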

Then, find /n /v "" %1 >%_filename% is used to make a numbered index of the
lines in this file. The /n option prepends bracketed line numbers, starting from
one, to the found lines. As there's only one greater-than sign, the file to
which the output is directed is overwritten on every invocation of this
auxiliary batch file.

Finally, the line
for /f "usebackq delims=] tokens=2" %%a in (`findstr /r "\[%_current%]" %_filename%`) do set _output=%_output% %%a
finds the line having the chosen random number in the index file. Again some
heavy use of findstr, although the same could likely be achieved with find
alone. We need to use the for command for tokenization to keep only the matching
line without its bracketed number prefix. Notice that we are concatenating the
result to the previous output to form a longer string.

Array Emulation and Sorting

As another, optional extra, consider dealing with a number of related variables
in the form of an array. Handling an array in a batch file is so slow and impractical
that this example is more like a proof of concept. You have been warned. Given
an array of random numbers, the goal is to sort and print those numbers from
smallest to largest. In programming an array is a sequential collection of similar
elements (of the same type e.g. integer). If we know the size of each element
and the number of the element we want, it is easy, in some programming
languages, to jump directly to the specified address in memory and get at the
element number, array index, desired. However, in batch files the best you can
do to my knowledge is to create a group of variables that have a common base
name followed by a number that can be used as an array index. I call this
technique array emulation as batches have no direct, built-in support for arrays.
To make this discussion more concrete, here's the sample code followed by a
rather longwinded explanation of what's going on:

_index.bat:

rem @echo off


for /f "usebackq tokens=2 delims==" %%a in (`set %1%2 ^|find "%1%2="`) do set %3=%%a

array.bat:

@echo off
set _i=0
set _max=%1
set _retlabel=swapped1

:randomize
if %_i%==%_max% goto listbuilt
set _item%_i%=%random%
set /a _i+=1
goto randomize
:listbuilt
echo The %_max% unsorted elements are:
@echo on
set _item
@echo off

set /a _last=%_max%-1

:shorten
if %_last%==0 goto sorted

set _i=0

:swapping
if %_i%==%_last% goto afterswap
set /a _elem0index=%_i%
set /a _elem1index=%_i%+1
call _index.bat _item %_elem0index% _elem0
call _index.bat _item %_elem1index% _elem1
echo Comparing %_elem0% and %_elem1% at index %_i%.
if %_elem0% gtr %_elem1% goto swap
:swapped1
set /a _i+=1
goto swapping

:afterswap
set /a _last-=1
echo %_last% partially sorted elements left.
goto shorten

:sorted
echo And after sorting:
set _item
@echo off

goto end

:swap
set _temp=%_elem1%
set _elem1=%_elem0%
set _elem0=%_temp%
set _item%_elem0index%=%_elem0%
set _item%_elem1index%=%_elem1%
echo Swapped. New order is %_elem0% and %_elem1%
goto %_retlabel%

:end
call _cleanup.bat

The first thing we'll consider is representing the array. In this example it is
a collection of environment variables from _item0 up to the maximum specified by
the first command-line argument minus one. So if you wanted an array of ten
elements you would have variables _item0 to _item9 respectively. For the
sorting we'd like to fill these variables with random values. The variable
random produces such numbers easily enough, but the obvious choice of using a
for /l loop and set for generating the numbers would fail. The reason has to do
with immediate variable expansion, namely that the variable random is evaluated
only at the very beginning of the for command and thus all items get the very
same random number. One fix is using another batch to generate the numbers, but
I've chosen a different approach relying on set, if and labels as expected. See
the code associated with the label randomize for details.
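
The difference is easy to see in a small sketch. The variable name _n and the
labels again and done are made up for this example; the goto loop has the same
shape as the code under the randomize label.

```batch
@echo off
rem %random% is expanded once, when the whole for command is read,
rem so all three lines printed by the loop show the same number.
for /l %%i in (1,1,3) do echo for loop: %random%

rem A goto loop reads the echo line again on every round,
rem so %random% is expanded anew each time.
set _n=0
:again
if %_n%==3 goto done
echo goto loop: %random%
set /a _n+=1
goto again
:done
rem Remove the loop counter again.
set _n=
```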

When it comes to printing the produced numbers, I took the lazy route and
merely used set to display all variables beginning with _item. Though easy, this
approach will fail if some other variables start with the same name and you
don't have too much control over the display format, either. The code is below
the label listbuilt.

Before we consider the sorting, there's the problem of accessing the newly
created array elements. Unfortunately, I have not found a way that would let you
directly get the element whose number is specified by a variable. Your first
attempt might be something like this:

echo %_item%%_i%

Instead of getting the value of the variable, it prints the expression that
would, if re-evaluated, produce the desired value. Even if you stored this
expression in yet another variable and tried running it as part of a batch file,
it would not, oddly enough, work as expected. Neither will the command-line
arguments nor for loop temporaries interpolate inside other environment
variables; I've tried that, too. I've been told delayed variable expansion,
which I don't use in this tutorial, may provide a solution:
!_array%_index%!
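
For the curious, here is a minimal sketch of that suggestion, assuming a Windows
2000 or XP command interpreter where delayed expansion can be switched on with
setlocal. It reuses the _item and _i names of this example but is not part of
array.bat, and the three values are made up.

```batch
@echo off
rem Delayed expansion is off by default and has to be enabled first.
setlocal enabledelayedexpansion
set _item0=10
set _item1=20
set _item2=30
set _i=1
rem %_i% is expanded first, leaving !_item1!, which is expanded afterwards.
echo Element %_i% is !_item%_i%!
rem endlocal also discards the variables set above.
endlocal
```

This prints "Element 1 is 20". The percent-sign part is resolved when the line
is read and the exclamation-mark part only when the line is executed, which is
what makes the two-step lookup possible.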

The best solution I can offer so far is to resort to an external batch file
again. Please let me know if you know of a better or faster way to index an
emulated array. The auxiliary batch _index.bat does the job given the base name
of an array, the index of the desired array element (starting from 0) and the
name of the variable to which the array value should be copied. The code is the
longest line in the example and it uses the output of set parsed through a for
command. Yet again set fetches a list of all variables starting with the
combination of the base name for the array and its index. I say starting with
because there can be many such variables. For example, if you called the batch
as:
call _index.bat _item 4 _fifth
you would be requesting the 5th element, remember we count from 0, of the array
named _item, and the result should appear in _fifth. However, if there are over
40 elements, _item40 also begins with _item4 and is listed in the set output. In
the script the output of set is parsed through for to get at the variable value
on the right side of the first equals sign. As all lines are processed, however,
the last line, and so the matching element with the largest index, sets the
final variable value. In conclusion, multiple lines in the set output must be
eliminated, and this is accomplished by piping the set output through find,
whose search string also includes the equals sign right after the variable name.
In the above example only the set line matching _item4 followed by equals will
be printed, and so we have guaranteed that the correct array element is assigned
to _fifth.

One common programming error with arrays is trying to get at array elements that
do not exist. This is referred to as a buffer overrun as you run over the end of
an array. Using my array indexing script this error is checked in set and you
get a warning about a non-existent variable. To make this a more fatal error,
you could check the error level of set, which is non-zero if something went
wrong. I've omitted this check for brevity.
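
A minimal sketch of such a check might look like the following. It is a separate
test run before indexing rather than a change to _index.bat, and the element
name _item99 is just an example that is assumed not to exist.

```batch
@echo off
rem set returns a non-zero error level when no variable begins with the name.
rem Both the listing and the warning are discarded; only the result matters.
set _item99 >nul 2>nul
if errorlevel 1 echo No variable begins with _item99.
```

Keep in mind that set matches by prefix, so the same test for _item9 would
succeed whenever _item90 exists, even if _item9 itself does not.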

Now that we can access the array elements, the last problem is sorting the
array. The algorithm used here is a Bubble Sort, which is one of the simplest
sorting methods out there. The basic idea is going through the array multiple
times comparing successive elements e.g. element numbers 0 and 1, 1 and 2, 2
and 3 and so on. In the script the variable _i is used for this purpose. If the
elements are in the wrong order, that is, the first (_elem0) is greater than the
second (_elem1) when sorting in ascending order, then the two elements are
swapped. Once the end of the array is reached, we know that the largest element
has bubbled up to the end of the array.

Say we find an element in the middle of the array which we know to be the
largest. We compare it with the next one, discover that our element is larger,
and so swap the two. After this we examine the next pair, so the item that was
the first one before swapping, now sitting in the second position, is the first
one in our next comparison. This is why a single element can move more than one
position during our pass through the list. This bubbling thing, if you will,
made bubble sort hard for me to understand initially.

Once we have gone through the whole list, the largest element is at the very
end and so we know we won't need to check that element again (the variable
_last tracks this in the code). But we do need to go through the list again to
make sure everything is in order (pun intended). After the second pass we have
the second biggest item in place. This multi-pass approach continues until we
have sorted the whole list. For more info and better explanations, Google or
check the Wiki article to which I linked earlier.

As far as the code itself is concerned, the swap procedure is worth checking
out. To exchange two values, regardless of language, you need to save away one
of the two (_elem1) in a temporary variable (_temp), overwrite the variable you
just saved with the other value (_elem0), and lastly assign the saved value
(_temp) to that other variable (_elem0). In code this is expressed as:

set _temp=%_elem1%
set _elem1=%_elem0%
set _elem0=%_temp%

Despite the index numbers, the two elements are not really treated as an array,
though conceptually you could think of them as a 2-element array. To further
confuse matters, the changes are not made in the array directly; the swapping
procedure, the label swap, works with copies of the elements, namely _elem0 and
_elem1, because it is easier syntactically. However, near the end of the label
the changes are committed to the array directly to save the caller from doing
that.

Though the program is not exactly modular, you could imagine a situation in
which one might need to go to the swap label from two different places. Thus, a
question arises: how do you get back to where you left off? If the swap label
had no goto, execution would fall through to end, which is obviously
undesirable. To remedy this we could add a goto to a fixed label as the last
thing under the swap label. This does work, but execution after the swap label
will always return to the same point no matter where you started from. As a more
general solution, I've used a variable called _retlabel which the caller can set
to tell the swap to which label it should return. This is analogous to pushing
the return address on the call stack when you call a function in a programming
language. Without that return address the computer would forget where the
function was called. This _retlabel is still a bit clumsy and a solution
including nested gotos soon gets very ugly, oh well.

Before I let you go, you might be wondering what the performance of this array
emulation and sorting might be. I ran a benchmark and was shocked to find out
that on a 1.8 GHz mobile Pentium with a gig of RAM sorting fifteen items took
twenty seconds. This is really depressing, and obviously with this mode of array
emulation, working with larger data sets makes little sense. I next tried
removing the debug prints (echo commands) but it had no effect on the
performance. I reckon having to call, find, set and for for each array indexing
operation takes its toll, as does managing numerous environment variables and
doing all the swaps. With fifteen items the script makes 14 + 13 + ... + 1 = 105
comparisons, and each comparison calls _index.bat twice, so the auxiliary batch
runs 210 times. The sorting method matters very little at this point, as a
bubble sort should be good up to a few thousand items. For those of you who are
interested, the time it takes to run a bubble sort, regardless of language, is
proportional to the square of the number of elements being sorted.

I started this tutorial by mentioning that I tend to use Perl for programs like
this. As a teaser, here's how you might write the above program in Perl, though
there the sorting algorithm, which I've been going on about for pages, is
pre-supplied:

Perl code:

use strict;
my $max = 100; # Maximum number.
my $count = shift; # How many items.
my @numbers; # The array of numbers.
push @numbers, int($max * rand) for(1 .. $count); # Generate the numbers.
print "$count unsorted elements: @numbers\n";
@numbers = sort { $a <=> $b } @numbers; # Sort numerically, not as strings.
print "And afterwards: @numbers\n";

Or if you are a lazy typer:

push @nums, int(100 * rand) for(1 .. $ARGV[0]);
print scalar(@nums) . " unsorted: @nums\n", "afterwards: ", join ' ', sort { $a <=> $b } @nums;

As far as performance goes, Perl can randomize and sort ten million items in
less than a minute, even though it is interpreted. Then again, the sorting
method here is merge sort, which beats bubble sort quite easily. See the end of
this page for a Perl tutorial.

Epilog

Now that this tutorial is drawing to a close, I'd like to wish you good luck with
your own batch files. I truly hope this information was enough to get you
started with both the more advanced DOS stuff and Windows NT specific
additions. Lastly, it would be nice to hear what you think of this tutorial. Drop
me a line in case of any questions, additions, corrections and so on.

Thanks to Petteri Järvinen for writing the most excellent DOS book in Finnish
called PC käsikirja DOS 6.22. Also thanks to MS for writing a relatively friendly
disk operating system and taking the time to improve its shell in recent years.

Related Links

A Perl Tutorial in a Similar Style to This One

If you have any questions, comments or suggestions
Drop Me a Line here

