Introduction to Computing at SIO: Notes on Scientific Programming

Introduction to Computing at SIO: Notes for Fall class, 2012
Peter Shearer Scripps Institution of Oceanography University of California, San Diego November 30, 2012
Contents

1 Introduction
  1.1 Scientific Computing at SIO
    1.1.1 Hardware
    1.1.2 Software
  1.2 Why you should learn a real language
  1.3 Learning to program
  1.4 FORTRAN vs. C
  1.5 Python

2 UNIX introduction
  2.1 Getting started
  2.2 Basic commands
  2.3 Files and editing
  2.4 Basic commands, continued
    2.4.1 Wildcards
    2.4.2 The .login and .cshrc files
  2.5 Scripts
  2.6 File transfer and compression
    2.6.1 FTP command
    2.6.2 File compression
    2.6.3 Using the tar command
    2.6.4 Remote logins and job control
  2.7 Miscellaneous commands
    2.7.1 Common sense
  2.8 Advanced UNIX
    2.8.1 Some sed and awk examples
  2.9 Example of UNIX script to process data
  2.10 Common UNIX command summary

3 Generic Mapping Tools
  3.1 Installation of Fink, GMT, and other useful tools

4 Fortran
  4.1 Fortran history
  4.2 Texts and manuals
  4.3 Compiling and running F90 programs
    4.3.1 The first program explained
    4.3.2 How to multiply two integers
  4.4 An important historical digression
    4.4.1 Alternate coding options
  4.5 Making a formatted trig table using a do loop
    4.5.1 Fortran mathematical functions
    4.5.2 Possible integer vs. real problems
    4.5.3 More about formats
  4.6 Input using the keyboard
  4.7 If statements
    4.7.1 If, then, else constructs
  4.8 Greatest common factor example
  4.9 User defined functions and subroutines
    4.9.1 Subroutines
    4.9.2 Linking to subroutines during compilation
  4.10 Internal procedures
  4.11 Extended precision
    4.11.1 Integer sizes
  4.12 Arrays
    4.12.1 Checking for problems with the -fcheck=bounds option
    4.12.2 More about random numbers
    4.12.3 Arrays as subroutine arguments
  4.13 Character strings
  4.14 I/O with files
  4.15 More about multi-dimensional arrays
    4.15.1 Arrays of strings
  4.16 A more complex example of data processing
  4.17 Example sorting routine from Numerical Recipes
  4.18 Example of saving values in a subroutine
  4.19 Complex numbers
  4.20 Array operations in F90
  4.21 Allocatable arrays
  4.22 Structures in F90
  4.23 Writing fast programs
    4.23.1 The -O option
  4.24 Fast I/O in Fortran
    4.24.1 Ascii versus binary files

5 Fun programs
  5.1 Tic-tac-toe
  5.2 Fractals
    5.2.1 Plotting using MatLab
    5.2.2 Plotting using Python
    5.2.3 Good targets and more about fractals

6 Lisa's Python Notes
  6.1 Introducing Python
  6.2 An Interactive Python session
  6.3 A first look at NumPy
  6.4 Variable types
  6.5 Data Structures
    6.5.1 Lists
    6.5.2 More about strings
    6.5.3 Data structures as objects
    6.5.4 Tuples
    6.5.5 Dictionaries!
    6.5.6 N-dimensional arrays
  6.6 Python Scripts
  6.7 A first look at code blocks
    6.7.1 The for loop
    6.7.2 If and while blocks
    6.7.3 Code blocks in interactive Python
  6.8 File I/O in Python
    6.8.1 Reading data in
    6.8.2 Command line switches
    6.8.3 Writing data out
  6.9 Functions
    6.9.1 Line by line analysis
    6.9.2 Main program as function
    6.9.3 Scope of variables
  6.10 Combining F90 code with Python
    6.10.1 Brute force f2py method:
    6.10.2 Signature files
    6.10.3 F90 Surgery
    6.10.4 A few more things you need to know:
  6.11 Classes
  6.12 Matplotlib
    6.12.1 A first plot
    6.12.2 Multiple figures and more customization
    6.12.3 Adding text
    6.12.4 Histograms
    6.12.5 Pie Charts
    6.12.6 Basemap
    6.12.7 Contour plots
  6.13 Deeper into NumPy and Scipy
    6.13.1 More on slicing with arrays
    6.13.2 Looping through arrays
    6.13.3 Random Numbers
    6.13.4 Normal Distribution
    6.13.5 Statistics in Python
  6.14 Graphical User Interfaces - GUIs
    6.14.1 Interacting with plots
    6.14.2 Event Handling in matplotlib
  6.15 3D plotting with Python
    6.15.1 Geoscience applications

7 Peter's Python Notes
  7.1 How to multiply two integers
    7.1.1 Declaring variables
    7.1.2 Alternate coding options
  7.2 Numpy
  7.3 Making a trig table using a for statement
    7.3.1 Numpy mathematical functions
    7.3.2 Possible integer/real problems
  7.4 More about Python for loops
    7.4.1 More about formats
  7.5 Input using keyboard
  7.6 If statements
  7.7 If, elif, else constructs
  7.8 Greatest common factor example
  7.9 User defined functions
    7.9.1 Python keywords
    7.9.2 Using functions in separate files
  7.10 Arrays
  7.11 Character strings
  7.12 I/O with files
  7.13 Using tuples and lists
  7.14 Plotting with Python
    7.14.1 Example of computing least-squares line

8 LaTeX
  8.1 A simple example
  8.2 Example with equations
  8.3 Changing the default parameters
    8.3.1 Font size
    8.3.2 Font attributes
    8.3.3 Line spacing
  8.4 Including graphics
  8.5 Want to know more?

9 Postscript plotting
  9.1 PSPLOT Fortran subroutines
Chapter 1
Introduction
This course is intended to help incoming students get up to speed on the various computing tools that will help them with their research and some of the homework assignments for other classes. The perspective is largely that of the Geoscience program at SIO, but we hope that the course is general enough to be useful for other students as well.

All students should have access to a Mac and an account on the IGPP Mac network. Please let me know if you do not already have an account. If you are using your own Mac, you will need to install the following:

    Fink
    gfortran
    GMT (from fink)
    XCode
    Python
    TexShop
    TextWrangler

There are download instructions on the class website. The planned class schedule is listed in Table 1.1. We will spend a few classes on UNIX, GMT and other topics, but the bulk of the class will be an introduction to the Fortran90 and Python programming languages. If you are already experienced in C or Fortran, you probably don't need to take this class (although you may find Python, some of the other material, and the geoscience examples of interest).
Table 1.1: Class Schedule

Monday                Wednesday             Friday
                                            09/28  Intro
10/01  UNIX           10/03  UNIX           10/05  UNIX
10/08  GMT            10/10  F90            10/12  F90
10/15  F90            10/17  F90            10/19  F90
10/22  F90            10/24  F90            10/26  F90
10/29  F90            10/31  no class       11/02  no class
11/05  F90            11/07  F90            11/09  F90
11/12  UC holiday     11/14  F90            11/16  F90
11/19  F90            11/21  F90            11/23  no class (Thanksgiving)
11/26  Python         11/28  Python         11/30  Python
12/03  AGU week       12/05  AGU week       12/07  AGU week
12/10  TTT playoff
1.1
Scientific Computing at SIO

1.1.1
Hardware

Before 2004 or so, computer hardware used at SIO was of two main types:

1. UNIX Workstations (e.g., Sun, HP, Silicon Graphics, etc.)
   (a) Scientific programming
   (b) Data storage
   (c) General research
   (d) (some word processing, graphics)

2. PCs (Windows machines, Macs)
   (a) Word Processing
   (b) Graphics
   (c) Presentations (talks and posters)
   (d) (some programming)
However, these boundaries are now blurred because many PCs run Linux [1] (an open source form of UNIX) and Apple has adopted UNIX for their operating system (starting with OS X). This permits these machines to be used both for serious scientific computations and for word processing and more traditional PC activities.

[1] Linux is pronounced similar to "linen" (see http://www.paul.sladen.org/pronunciation/)
Ten to twenty years ago, most geophysics departments had networks of Sun workstations. Today these have largely been replaced with networks of cheap PCs running Linux or with Apple machines. At IGPP we now use Macs for everything except for certain specialized or high-performance computers maintained by individual PIs. For example, David Sandwell's group maintains some Suns because some of their processing software runs only on that platform.
1.1.2
Software
Here is a list of programming languages and software used at IGPP (items with * will be discussed in this class).

1. Programming
   (a) *FORTRAN (most IGPP faculty use this, large library of existing code)
   (b) C (more widely taught and used by computer scientists)
   (c) C++ (extended version of C)
   (d) Java (used a lot for web programming, has portability advantages, but often not fast enough for serious computing)
   (e) *Python (lots of buzz, Lisa Tauxe likes it)
   (f) MatLab (widely used but commercial product, previously presented in this class, but our aim is to replace it with Python)
   (g) HTML for web sites (most people now don't program in raw HTML, but use web designer software, such as DreamWeaver)

2. Community Programs
   (a) Bob Parker programs (plotxy, color, contour)
   (b) *GMT for mapping (UNIX based)
   (c) SAC for seismic analysis
   (d) etc.

3. Commercial Programs
   (a) MS Word for word processing
   (b) MS Powerpoint or Apple Keynote for presentations
   (c) *TeX/LaTeX (no single vendor, different implementations available, files are compatible, best option for papers & thesis)
   (d) Adobe Illustrator (for graphics, preparing posters)
   (e) PhotoShop
   (f) Mathematica
   (g) etc.
1.2
Why you should learn a real language
Many students arrive at SIO without much experience in FORTRAN or C, the two main scientific programming languages in use today. While it is possible to get by for most class assignments by using Matlab, you will likely be handicapped in your research at some point if you don't learn FORTRAN or C. Matlab is very convenient for quick results but has limited flexibility. Often this means that a simple FORTRAN or C program can be written that will perform a task far more cleanly and efficiently than Matlab, even if a complicated Matlab script can be kludged together to do the same thing. In addition, Matlab is a commercial product that lacks the long-term stability of other languages, which benefit from large libraries of existing code that are freely shared among researchers.

Your research may involve processing data using a FORTRAN or C program. If you do not understand the program, you will not be able to modify it to do anything other than what it can already do. This will make it difficult to do anything original in your research. You may resort to elaborate kludges to get the program to do what you want, when a simple modification to the code would be much easier. Worse, you may drive your colleagues crazy by continually requesting that the original authors of the program make changes to accommodate your wishes.

Finally, you will be in a more competitive position to get a job after you graduate if you have real programming experience.
1.3
Learning to program
Learning to program can be intimidating, particularly if you have no previous experience. The worst thing to do is to buy a book on the language and try to read it. There is simply too much material and it seems overwhelming. DO NOT DO THIS! Instead, begin by writing the simplest possible program and get it to work. Then read just enough to add a single additional feature to your program and get that to work. You only really learn when you get your own programs to work, so the idea is to write lots of little programs that do various things and only gradually add in new concepts. There is no need to learn everything that the language will do all at once.

Part of learning to write more complicated programs is to figure out ways to logically divide the problem into smaller pieces. This will come with experience. The final project in this class will be to write a program that plays tic-tac-toe. This is a daunting task if one tries to write it all at once. But we will divide it into smaller parts for separate assignments over the weeks, and only put all the pieces together for the final program at the end.

Also, be aware that most languages have far more features than you actually need. Few of us except professional programmers learn them all. But we do not need to write professional programs; we only need to learn to write practical programs that serve our own needs. For example, I probably only know about 20 UNIX commands. A real computer nerd would find some of the ways that I do things pathetic. But it's been enough for me to get by and to do the science that I want to do. (Having said that, it wouldn't hurt for me to learn more, and I'm going to look over some of Duncan Agnew's notes from the 2009 class to find some new tricks.)
1.4
FORTRAN vs. C
Some years ago when I (Peter) first started teaching programming in this class, I had to decide whether to teach C or Fortran. Like most SIO faculty of my generation or older, I was experienced in FORTRAN but had little exposure to C. After talking to some people who know both FORTRAN and C, however, I decided to bite the bullet and learn enough C to teach this class. This was motivated in part by the fact that C is now one of the standard languages taught in computer science departments; many of our incoming students have C experience but few have Fortran experience. Here is my summary of the advantages and disadvantages of either choice:

FORTRAN

Advantages
  - Large amount of existing code
  - Preferred language of most SIO faculty (most faculty are old)
  - Complex numbers are built in
  - Choice of single or double precision math functions

Disadvantages
  - Column sensitive format in older versions
  - Dead language in computer science departments

C

Advantages
  - Large amount of existing code
  - Preferred language of incoming students, some younger faculty
  - Free format, not column sensitive
  - More efficient I/O
  - Easier to use pointers

Disadvantages
  - Less user-friendly than FORTRAN (I think so, but others may debate this)
  - Fewer built in math functions (but easy to fix)
  - No standard complex numbers (but easy to fix)
  - Easier to use pointers

With reasonable compilers, both languages are equally fast. Ultimately, there are fixes for both languages that permit them to assume the advantages of the other language. For example, you can use pointers in FORTRAN90 and you can define an add-on set of functions to do complex arithmetic in C. However, after learning enough C to teach the class, I concluded that C is not a very user-friendly language for those without programming experience. So I switched to Fortran90, an improved version of Fortran77 that combines the advantages of Fortran and C into a more user-friendly package.
1.5
Python
Python is an up-and-coming new programming language that is gaining popularity. It is very flexible and combines many of the advantages of traditional programming languages (e.g., C and Fortran) with object-oriented languages (e.g., Java) and with scripting languages (e.g., UNIX shell scripts). It also has more integrated graphics options in its standard library than C or Fortran (which require use of non-standard plotting packages). Indeed it has so many features that it can be daunting for beginning programmers. For this reason, we are introducing it after the F90 part of the class.

Python is free and runs on almost all computers. An interesting feature is that indentation is required, since it delimits the blocks used in if statements and loops. The idea is to enforce good programming style. Python was started in 1989 by Guido van Rossum from the Netherlands, who has been given the name "Benevolent Dictator for Life" (BDFL) by the Python community. The name Python comes from the old Monty Python TV series, and Monty Python references are common in example code.

Python is a dynamic programming language, which is not compiled before it is run (C and Fortran programs are first compiled). Other examples of dynamic languages include BASIC and Matlab. Dynamic languages generally run much slower than compiled languages, which can be a disadvantage for serious data processing and number crunching. However, it is possible to call C or Fortran subroutines from within Python and we will learn how to do this. This provides a way to combine the speed of Fortran with the graphical capabilities of Python.
Peter Shearer, pshearer@ucsd.edu, x42260, Munk Lab, IGPP
Class web site: http://mahi.ucsd.edu/class233/
Chapter 2
UNIX introduction
You will very likely do your scientific computing in a UNIX environment. UNIX is by far the most common operating system for the workstations that dominate today's scientific computing. There are many different versions of UNIX. In this class we will be using OS X on the Macs, which is an Apple version of UNIX. Some of us still do some work on the Suns, which use Solaris (also UNIX). There is also a free version of UNIX, called LINUX (pronounced "lynn exs"), that will run on most PCs.
2.1
Getting started
To use UNIX on the Macs, you will need to bring up a regular text-based window rather than the standard Mac windows. This can be done by running the Terminal program, which is normally in Applications/Utilities. We will assume in these notes that you are entering UNIX commands within such a window. In many cases you will need to have X11 (called XQuartz in the newest Mac OS) also running in order to display graphics, etc. Thus I recommend that you always run X11 first, and then Terminal to create the UNIX text window that you will use. (X11 also has a text window, but its scroll bar is not as nice as the one in the Terminal application.) X11 (XQuartz) can be downloaded for free from Apple if you do not already have it installed on your machine.

A note about UNIX shells: UNIX comes in different flavors. We are going to assume that you are running what is called the C shell (csh or tcsh). However, it is possible that your Mac is running something called bash, which many people think is better than csh. The default shell for Macs changed from tcsh to bash between Jaguar and Panther (OS X 10.2 to 10.3). However, because these notes are based on csh, you should make sure your Mac is running tcsh (if you think bash is better, then you probably are more of a UNIX expert than me and don't need to be taking this part of the course anyway!).

To find out what you are running, just look at the top of the Terminal window and it will say. Alternatively, type

    printenv

within your UNIX shell and it will tell you lots of interesting things, including what your SHELL is. To set things up so that you are always in tcsh, run the Terminal program and select Preferences from the pull-down menu under Terminal at the top of the screen. Select "Execute this command (specify complete path)" and enter /bin/tcsh in the little box. Then just close the Terminal Preferences box and restart Terminal. To learn more about UNIX shells on the Macs, check out:

    http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html

I am by no means an expert on UNIX; probably there are several of you here that know much more than I do. I have learned only enough to get by and could benefit from learning more. So I'm just going to outline the basics here for the benefit of those students who have not been exposed much to UNIX.

The UNIX operating system was originally designed to run on mainframe computers where security was a big issue. You don't want users to be able to delete other users' files or do other nasty things. So there are a lot of security features. The first of these is the login and the password. You will be assigned a login name and password. The first thing that you should do after you log in is change your password so that only you know what it is. This normally happens automatically upon your first login. If not, or if you later want to change your password on the Macs, go to System Preferences and click on Accounts. To change your password on more traditional UNIX machines, use the command:

    passwd

You should choose a password with numbers or special characters as well as letters. Do not choose a common word that can easily be guessed by outside hackers. The NetOps people here are very concerned about security issues and IGPP computers have been attacked several times already, although no damage has been done.

Do not write your password down (like on a note next to your computer!) where it can be found. Do not tell other students your password. You can easily give them access to read your files without giving them the password. Of course all this makes it very hard to remember your password if you don't use the computer very often....

Let us assume that you have successfully logged on. You will now be located in your home directory. UNIX uses a directory tree structure, similar to the folders used by PCs but without all the fancy icons. Normally you will have a cursor prompt that tells you what machine you are running on. In my case, this looks like:

    shearer@katmai 5>

because my laptop (my primary computer) is called katmai. Your default prompt is likely to be different from this. If you want to have one like mine, enter

    set prompt = "%n@%m %h> "

where %n will give your username, %m your machine name, and %h the line (command history) number. You will find it convenient to put this in your .cshrc file (see below). The notes that follow were written when I used a now defunct Sun computer named rock and so have rock as the standard prompt.
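As a shortcut to reading through the full printenv listing mentioned earlier in this section, the sketch below prints just the SHELL variable. It uses only the standard echo, printenv and wc commands, which should behave the same in tcsh and bash; the line count is there only to show how long the full listing is. The # comments are for reading: leave them out if you type the commands at an interactive tcsh prompt, which does not treat # as a comment.

```shell
# Print only the SHELL variable, e.g. /bin/tcsh or /bin/bash.
echo $SHELL

# Count how many environment variables the full printenv listing contains.
printenv | wc -l
```

If the first line prints something other than a path ending in tcsh or csh, the Terminal preference change described above is one way to switch.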
2.2
Basic commands
To find out where you are, enter

    pwd

which stands for Print Working Directory. In my case, this will give:

    rock> pwd
    /net/rock/shearer

This shows that I am located in my home directory (named shearer) on rock. You can list the contents of your current directory with the ls command:

    rock> ls

(I'm not showing you the output because it's too messy in my case!)

UNIX has an online set of manuals that may be accessed with the man command. For example, suppose you forgot what pwd does. You could type

    man pwd

and you would get a description of the pwd command. One annoying aspect of standard UNIX is that the man command output is shown through a pager that behaves like a form of the vi editor rather than as normal window output, which would permit scrolling up and down in the window. When you enter man pwd you will get a page of output on the screen with a colon (:) at the bottom. If you press the space bar, you will then get the next page of output, etc. But you can't scroll backwards using the window scroll arrows. To go backwards, enter b at the colon prompt. When you get to the final page, you will see END at the bottom. To leave the man pages and return to the normal terminal, hit the q key (for quit) at any point.

If you find this too annoying to deal with, you can always save the man pages to another file. To do this, enter:

    man pwd > man.pwd

The output will now be directed into a file named man.pwd rather than to the screen. To look at man.pwd, use the cat command:

    cat man.pwd

The cat command prints a text file to the screen. The advantage in this case is that you can use the scroll bar. Note that if you try to look at man.pwd using a text editor, you will likely see all kinds of weird control characters that are used to format the screen output. Of course, you risk cluttering up your directory with lots of files named man.whatever if you do this a lot. If you don't want to worry about deleting them, one solution is to use the name junk as a temporary place to store output. If you already have a file named junk, it will just overwrite the old file. This way, you only ever have one scratch file in your directory at one time. Another approach, if you have access to the web, is to simply Google "unix man pwd" to bring up the man pages, nicely formatted!

There usually are options to UNIX commands that can make them more useful for what you want. For example, ls simply lists the file names in your directory. If you want to find out how big they are or when they were last modified, then use

    ls -l

where the l stands for the long output option. In your home directory, you will have a very important file called .cshrc (see below). This file will not appear if you just enter ls. To make it appear, enter

    ls -a

where the a stands for the all-files option.
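The redirection and listing options described above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a scratch directory named scratch_demo and a dot file named .hidden_example (both hypothetical names chosen for this illustration); it uses pwd and ls as the commands whose output is redirected, since they are available everywhere. The # comments are for reading only and should be left out when typing at an interactive tcsh prompt.

```shell
# Make a scratch directory (the name scratch_demo is hypothetical) and move into it.
mkdir -p scratch_demo
cd scratch_demo

# Send the output of pwd into the file junk instead of the screen, then display it.
pwd > junk
cat junk

# Redirecting again overwrites junk, so only one scratch file ever exists.
ls -l > junk
cat junk

# Create an empty "dot" file; plain ls hides it, ls -a shows it.
touch .hidden_example
ls
ls -a
```

The same pattern works for any command, which is why a single scratch file such as junk is enough to hold output you only want to look at once.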
2.2. BASIC COMMANDS To change your directory, use: cd dirname
13
This will move you to the directory dirname. This directory name must be in your current directory. Or you can give the full path name for dirname, i.e. cd /net/rock/shearer will get me to my home directory no matter where I am on the system. Alternatively, one can go to ones home directory by entering cd You can also go directly to subdirectories by entering: cd ~/dir1/dir2 This will get you to directory dir2, located in dir1 in your home directory. Naturally this does the same thing as cd cd dir1 cd dir2 You can go back up one level by entering cd .. You can go back up and then down again into a dierent directory by entering: cd ../dir2 To create a new directory, use the make directory command: mkdir dir1 It is often convenient to use a dierent naming convention for your directories in order to distinguish them from your les. Some people put .dir after their directories. For awhile I put d. in front of the directory names, which has the advantage of grouping them together when ls is entered, since UNIX lists things in alphabetical
order. Most recently, I have been using all capital letters for the directory names. This makes them more visible, but has the disadvantage of slowing down typing their names (yes, UNIX is case sensitive!). If you don't want to use special names for directories, or if you find yourself in somebody else's directory where they don't do this, you can use the ls -F command:

    ls -F

This will add a / suffix to directory names and a * suffix to executable file names. I like this so much that I have set this up as my default ls command by putting an alias into my .cshrc file (more about this later).
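The directory commands above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a POSIX sh (not the csh used elsewhere in these notes) and a system that provides mktemp; the names dir1, dir2, and notes.txt are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in your real home directory is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir dir1            # make a directory in the current directory
mkdir dir1/dir2       # and a subdirectory inside it
touch notes.txt       # an ordinary (empty) file for comparison

cd dir1/dir2          # go down two levels at once
cd ..                 # back up one level; we are now in dir1
here=$(basename "$(pwd)")
echo "now in: $here"

cd ..                 # back up to the scratch directory
listing=$(ls -F)      # -F flags directories with a trailing /
echo "$listing"

cd / && rm -r "$scratch"   # clean up the scratch directory
```

The listing should show dir1/ with its trailing slash next to the plain notes.txt, which is the visual cue that ls -F provides.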
2.3 Files and editing
Files can be simple ascii (text) files or they can be binary files, often the executable versions of computer programs. One standard UNIX editor is called vi and is still used by some of the old time programmers at IGPP, who will insist that it remains the best editor. If you know all of its tricks it can be an extremely powerful editor. You can do things, like cut and paste columns of numbers, that most editors can't do. It also has the advantage of running on any terminal, so if you log in from home you can still edit files. If you know vi or decide that you want to learn it, more power to you. You will not lose any points around here. However, vi is not mouse or window friendly and is not favored by most students today. I have forgotten most of the vi that I once knew (I used it to do my thesis back in the late Cretaceous).

Editors on the Mac include:

TextEdit: You can find this in the Applications folder. Be sure to set the format (under Preferences) to Plain text and not Rich text, which will put in control characters that will screw things up.

nedit: This is supposed to run on all platforms and has more features than TextEdit.

emacs: This is a very powerful editor that is used by computing professionals. It can do just about anything but may be somewhat harder to learn at first than simpler editors.
Xcode: A special editor designed for editing programming code.
You can use any of these editors to create a new file or edit an existing file. Unix file names are case sensitive. By convention, the type of file is often indicated by .type, for example:

    testprog.f      Fortran77 program source code
    testprog.f90    Fortran90 source code
    testprog.c      C program source code
    testprog.m      MATLAB script
    testprog.o      object file
    figure1.ps      Postscript file
    figure1.gif     GIF file

You may wish to create your own naming convention to keep track of your files. When used with wildcards (see below) this will make it easy to list all files of a particular type.

WARNING: Do not use the dash character (-) in file names; this may cause all kinds of problems for you. Use . or _ to separate the words. NEVER USE BLANKS IN FILE OR DIRECTORY NAMES!!! (I know this is common on Macs and PCs, but you will eventually have big problems reading these files with your programs.)
2.4 Basic commands, continued
If you want to remove a file, use the rm command:

    rm filename

If you want to change the name of a file, use the mv (move) command:

    mv filename1 filename2

If filename2 already exists, this will have the possibly undesired consequence of deleting the original filename2. To guard against this, use the -i option:

    mv -i filename1 filename2

Now if filename2 already exists, the computer will first ask you if you want to overwrite this file. mv can also be used to move files between directories:

    mv filename dirname

will move filename into directory dirname (assuming dirname already exists!) where it will have the same name. Note that this does the same thing as

    mv filename dirname/filename

For convenience, you can leave off the /filename if the file is to keep the same name. A note of caution: In this shortened version, if dirname does not exist as a directory, then the name of filename will be changed to dirname.

The copy command works in a similar way:

    cp filename1 filename2

makes a copy of filename1 called filename2.

    cp -i filename1 filename2

will first ask if you really want to do this if filename2 already exists. You can copy files to different directories in the same way as the mv command works. Many people so prefer the -i option for mv and cp that they make it the default option by defining an alias in their .cshrc file (see below) so that mv and cp become mv -i and cp -i. I recommend that you do this; it is likely to save you some grief in the future.

To remove a directory, use the rmdir command:

    rmdir dirname

The directory must first be empty for this to work. To recursively remove a directory and its contents use:

    rm -r dirname

Use this with extreme caution to avoid accidentally deleting more than you intended!
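The caution about mv renaming a file when the target directory does not exist is easy to see for yourself. Here is a small sketch, assuming a POSIX sh and made-up file names (results.txt, archive, backup, oldruns); it runs entirely inside a scratch directory.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "some data" > results.txt
mkdir archive

mv results.txt archive        # archive exists, so the file moves into it
moved=$(ls archive)
echo "archive now holds: $moved"

echo "more data" > results2.txt
mv results2.txt backup        # backup does NOT exist, so the file is renamed
if [ -d backup ]; then kind=directory; else kind=file; fi
echo "backup is a $kind"

# One way to guard against the silent rename: test for the directory first.
echo "third file" > results3.txt
if [ -d oldruns ]; then
    mv results3.txt oldruns
else
    echo "oldruns is not a directory; leaving results3.txt alone"
fi
kept=$(ls results3.txt)

cd / && rm -r "$scratch"      # clean up
```

The second mv quietly turns results2.txt into an ordinary file called backup, which is exactly the surprise described above.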
2.4.1 Wildcards
UNIX commands become much more powerful when they are used with the wildcard character *, which can take on any ascii string. For example, suppose you wanted to list all files ending with .f, the suffix used to identify Fortran source code. Simply enter:

    ls *.f

You could move all of these programs into a subdirectory called source.dir by entering:

    mv *.f source.dir

You might have a bunch of plot files called mypost1, mypost2, etc. You could delete all of these at once by entering:

    rm mypost*

For obvious reasons, be very careful when using * with the rm command. For example, suppose you wanted to delete all files in your current directory that end in %, which some text editors use to store the original version of edited files. To do this, enter:

    rm *%

Suppose, however, that you are careless when you type this and enter instead:

    rm * %

This will delete everything in your current directory! So always look carefully at what you have typed before hitting the return key when you are deleting files using wildcards.
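One habit that helps with wildcard deletions is to preview the expansion before running rm. The sketch below assumes a POSIX sh and invented file names; putting echo in front of the command prints what the shell would actually run without deleting anything.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch prog.f prog.f% notes.txt notes.txt%

# Preview: echo shows the command after the shell has expanded the wildcard.
preview=$(echo rm *%)
echo "$preview"               # prints: rm notes.txt% prog.f%

# The list looks right, so run the real command.
rm *%
remaining=$(ls)
echo "$remaining"

cd / && rm -r "$scratch"      # clean up
```

Had the command been typed as rm * % by mistake, the preview would have listed every file in the directory, which is a clear signal to stop.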
2.4.2 The .login and .cshrc files
In your home directory, you can have files called .login and .cshrc that are executed whenever you log in. These files are used to define and customize your environment. You may have default .login and .cshrc files set up by the NetOps people when you first log on, but you can modify them to do what you want. For example, you can put aliases in your .cshrc file for the mv and cp commands so that you don't accidentally overwrite files:

    alias cp cp -i
    alias mv mv -i

One of the most important things in the .cshrc file is the set path command. This lists all the directories that the computer should look in when you try to run a program. For example, if you type matlab, the computer needs to know where to look to find the matlab program. If you get a "Command not found" message, then you don't have the right directories listed in your .cshrc file under set path. Here are example set path commands:

    set path=($path /usr/X11R6/bin )
    set path=($path /sw/bin )
    set path=($path /sw/sbin )
    set path=($path /sw/igpp/bin )
    set path=($path /Applications )
    set path=($path /Applications/Utilities )
    set path=($path /Applications/MATLAB6p5p1/bin)
    set path=($path /Developer/Applications )
In the past, I recommended that students include the current directory in their path, i.e.,

    set path=($path .)

This has the advantage that you can run programs in the current directory simply by typing their name, i.e., if you have a program called compspec, you can run it by typing compspec on the command line. However, NetOps has now convinced me that this is a security risk because bad people could put malevolent programs into your directories, with the same name as common UNIX commands (e.g., ls), which would be executed when you innocently typed them. This means that to execute compspec within the current directory, you must type ./compspec on the command line (which really is not that much more work).

You can always look at other people's .cshrc files if you have trouble with yours. If you make any changes to your .cshrc file (or your .login file), they won't take effect until your next login. Alternatively you can enter:

    source .cshrc

to make the changes immediately. But you will have to do this separately for each window that you have open.
2.5 Scripts
One of the most powerful ways to use UNIX is to write scripts to run your programs. This is an easy way to keep track of the input and output parameters and to make changes without having to enter everything in again. For example, suppose you have a program called mapfocal that asks you a bunch of questions like this:

    shearer@katmai 16> ./mapfocal
    reading cmt file...
    nev = 24584
    Enter output file name
    globalcmtmap.ps
    Enter fill level, line thickness for conts
    0.93 1
    Enter line thickness for plate boundaries
    5
    Enter line thickness for beachballs
    1
    Enter minimum magnitude
    0
    Enter circle radius for beach balls (neg. for M0 scale)
    0.08
    (1) complete plot or (2) outline for test
    1
    Creating psplot postscript file : globalcmtmap.ps
    nplot = 415
    shearer@katmai 17>

After running this program, you decide that you would like to change the continent line thickness to 2. This is tedious if you have to type in everything again. Instead you can write a script to run the program called do.mapfocal that looks like this:

    ./mapfocal << MAPFOCAL
    globalcmtmap.ps
    0.93 1    !fill level, line thickness for continents
    5         !line thickness for plate boundaries
    1         !line thickness for beach balls
    0         !minimum magnitude
    0.08      !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
    1         !1=complete plot or 2=outline for test
    MAPFOCAL

This is just an ascii file that you write with your favorite text editor. Often people will use ! instead of MAPFOCAL in scripts like this; they seem to work
in the same way. (Can anyone tell me if one convention has any advantages over the other?) The comments following the numbered input (e.g., !min,max quake magnitude) are convenient if the program that you are running is robust enough to ignore additional characters on the line that follow the numbers that are actually input into the program. To run the script, simply enter:

    ./do.mapfocal

You may get a message which says:

    do.mapfocal: Permission denied

This means that you don't have execute permission on this file. To see what the permissions are, use the ls -l command:

    shearer@katmai 18> ls -l do.mapfocal
    -rw-r--r--  1 shearer  669  421 Sep 30 09:20 do.mapfocal
The -rw-r--r-- shows what the permissions are for the file. Columns 2-4 (rw-) are your permissions as owner of the file. Columns 5-7 and 8-10 give the permissions for your group and all others, respectively. The r means read permission, the w means write permission, and x means execute permission. In this case our problem is that we have rw- instead of rwx. To fix this, use the chmod command:

    chmod 755 do.mapfocal

This will give you rwx permission on file do.mapfocal and will give everyone else r-x permission (read and execute, but not write). This is probably the most common set of permissions you will want to set, assuming you are OK with others looking at your code, as long as they are not allowed to make changes. If you don't want anyone else to be able to see your file, set

    chmod 700 do.mapfocal

For more details see the chmod manual.
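To see how the numeric modes map onto the permission string, the following sketch (POSIX sh assumed; do.test is a made-up script name) creates a small script and prints the first ten columns of the ls -l output after each chmod.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A tiny two-line script to experiment on.
printf '#!/bin/sh\necho "hello from do.test"\n' > do.test

chmod 644 do.test                       # rw- for owner, r-- for everyone else
before=$(ls -l do.test | cut -c1-10)
echo "644 gives $before"

chmod 755 do.test                       # rwx for owner, r-x for everyone else
after=$(ls -l do.test | cut -c1-10)
echo "755 gives $after"
./do.test                               # now allowed to run

chmod 700 do.test                       # rwx for owner, nothing for others
private=$(ls -l do.test | cut -c1-10)
echo "700 gives $private"

cd / && rm -r "$scratch"                # clean up
```

Each digit is the sum of 4 (read), 2 (write), and 1 (execute), given in the order owner, group, others, so 7 = rwx, 5 = r-x, and 0 = no access.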
The do.mapfocal script will run the mapfocal program and enter all of the required inputs. Notice that in this case I have added helpful comments to the numerical input lines. You can do this with most programs and it will not affect the input (at least for FORTRAN, I'm not so sure about C). You usually cannot, however, add comments to the character inputs (e.g., globalcmtmap.ps in this example) because it will consider them part of the name. Now it's easy to keep track of the inputs and to make changes without having to re-enter everything. You can also put scripts together to run the program many times:

    ./mapfocal << !
    globalcmtmap1.ps
    0.93 1    !fill level, line thickness for continents
    5         !line thickness for plate boundaries
    1         !line thickness for beach balls
    0         !minimum magnitude
    0.08      !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
    1         !1=complete plot or 2=outline for test
    !
    ./mapfocal << !
    globalcmtmap2.ps
    0.93 2    !fill level, line thickness for continents
    5         !line thickness for plate boundaries
    1         !line thickness for beach balls
    0         !minimum magnitude
    1.1       !circle radius for beach balls (neg for mom scale, 0.08 looks nice)
    1         !1=complete plot or 2=outline for test
    !

In this case, you can make two different plots from the same script.
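If you do not have a prompting program handy, the mechanism can be tried with a stand-in. The sketch below assumes a POSIX sh; askplot is a hypothetical shell function that only imitates an interactive program, and the file names are invented. The lines between << END and END are fed to the program exactly as if they had been typed at the prompts.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical stand-in for a program that prompts for its inputs.
askplot() {
    echo "Enter output file name"
    read -r outname
    echo "Enter line thickness"
    read -r thick comment        # anything after the number lands in "comment"
    echo "would write $outname with thickness $thick"
}

# Run the "program" twice with different inputs, saving the screen output.
for thick in 1 2; do
    askplot << END
map$thick.ps
$thick    !line thickness for continents
END
done > askplot.log

last=$(tail -n 1 askplot.log)
echo "$last"

cd / && rm -r "$scratch"         # clean up
```

Note how the trailing !comment on the numeric line is harmless here only because the stand-in reads the first word and discards the rest, which mirrors the behavior the notes describe for FORTRAN list-directed input.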
2.6 File transfer and compression
You often will want to get files from another computer. There are various ways to do this. Today files can often simply be accessed via the web through a web browser, or by directly mounting the disk of the other computer. An old fashioned method of file transfer is the ftp (file transfer protocol) command, which is still used quite a bit.
2.6.1 FTP command
One classic method of UNIX file transfer is the ftp command. For many public sites and data centers this is done with anonymous ftp. You simply enter:

    ftp othercomputer

where othercomputer is the name (or IP address) of the other computer. When you are asked to log in, just enter anonymous and then your e-mail address as a password. Of course, if you have login permission on the other computer then you can ftp to the machine even if it is not set up for anonymous ftp.

Once within ftp, you will get an ftp prompt. You then can use the cd command to get to the directory that you want and ls to see the file names on the remote computer. Finally use get to bring the desired file to your own computer and then quit to exit ftp. If you want to get all of the files in the directory, use the command

    mget *

and you will be prompted for each file name. If you don't want to be prompted, you can turn off the interactive mode by entering

    ftp -i othercomputer

when you first invoke ftp. If you are getting binary files (rather than simple text files), you should enter

    type binary

at the ftp prompt before getting the files. Because type binary will work with all types of files, including ascii, it is a good practice to always enter type binary as soon as you enter ftp. The default for the Suns is type ascii, which does not work with binary files.

If the remote computer objects to regular ftp for security reasons, you might try sftp, which is the secure ftp command. If you have just a few files to transfer, you may want to use the secure copy command:

    scp filename *.ps shearer@rock.ucsd.edu:./TRANSFER

will copy filename and all files ending with .ps from the current directory to the TRANSFER directory in shearer's account on rock.ucsd.edu (after first prompting
for the password). If you want to copy a directory, you need to use the -r (recursive) option, i.e.,

    scp -r dirname shearer@rock.ucsd.edu:./TRANSFER

Note that one can also copy files from the remote computer to the current directory, i.e.,

    scp -r shearer@rock.ucsd.edu:/net/moray2/scratch/dirname dirname_local

In this case the remote directory was on the /net/moray2 disk; notice how we specified the complete path name.
2.6.2 File compression
Often the files that you retrieve will be compressed. Files are compressed using the UNIX compress command:

    compress filename

This will change the name to filename.Z, which tells you that it is compressed. This is useful to save disk space when you will not be using the file for a while. To get back to the original file, use uncompress:

    uncompress filename

You can compress a whole bunch of files by using wildcards:

    compress *.ps

will compress all of your Postscript files, assuming you use the .ps suffix convention for these files. Because the compressed files are now named with a .Z suffix, the wildcard has to match the new names when you uncompress them:

    uncompress *.ps.Z

An alternative compression method (not standard UNIX but usually available) is invoked with the gzip command:

    gzip filename

This changes the name to filename.gz, with the reverse operation:

    gunzip filename

The gunzip command will also decompress .Z files (but the uncompress command will not decompress .gz files). Note: On Macs, most compressed file formats can be uncompressed by simply double-clicking on them. You may find it useful to use compression yourself by compressing files that you do not use very often in order to save space. I often do this when I want to get some disk space but am too nervous to delete the files and too lazy to write them to a backup tape.
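A quick round trip shows both the renaming and the space saving. This is a sketch only, assuming a POSIX sh with gzip installed; plot.dat is a made-up, deliberately repetitive file so that it compresses well.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Build a highly compressible test file: 1000 identical lines.
i=0
while [ "$i" -lt 1000 ]; do
    echo "0.93 1 5 1 0 0.08 1"
    i=$((i + 1))
done > plot.dat
cp plot.dat original.dat                     # keep a copy for comparison

before=$(wc -c < plot.dat | tr -d ' ')
gzip plot.dat                                # plot.dat is replaced by plot.dat.gz
after=$(wc -c < plot.dat.gz | tr -d ' ')
echo "size before: $before bytes, after: $after bytes"

gunzip plot.dat.gz                           # restores plot.dat
if cmp -s plot.dat original.dat; then same=yes; else same=no; fi
echo "restored file identical to original: $same"

cd / && rm -r "$scratch"                     # clean up
```

Real data files will not shrink nearly as much as this artificial one, so treat the ratio printed here as an upper bound rather than a typical result.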
2.6.3 Using the tar command
Often you may want to save or retrieve an entire directory of files. This is most easily done using the tar command. If you are within the directory containing the files that you wish to save, then enter:

    tar -cvf ../archive.tar .

The arguments are as follows:

    -c               create tar archive
    -v               verbose: print file names to screen as they are processed
    -f               output file name will follow
    ../archive.tar   name for tar file (../ to put in next level up)
    .                tar every file in current directory
Alternatively you can save the entire directory and its contents from the level above the directory:

    tar -cvf programs.tar programs.dir

The tar file can then be FTPed to another machine. For even more efficiency you may wish to compress the tar file first. The files can be retrieved as follows:

    tar -xvf archive.tar

This will put all of the files in the archive into the current directory. The tar command was originally written largely to write and retrieve backups, etc., onto tape, in which case the file name (e.g., ../archive.tar, programs.tar, etc.) in
the above examples is replaced with the name for the tape device. Today hardly anyone uses tape drives because large capacity hard drives have gotten so cheap. The Exabyte and DAT tape drives we used to have in the barnyard have been retired. If your advisor hands you an old data tape, you might check with NetOps to see if they can read it, as they still have a few old tape readers.
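The create and extract steps can be checked end to end in a scratch directory. The sketch below assumes a POSIX sh and a standard tar; the directory and file names are invented, and the -t option (list the archive's table of contents without extracting) is an addition not mentioned in the notes above.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir programs.dir
echo "      program test" > programs.dir/testprog.f
echo "plot data" > programs.dir/figure1.ps

tar -cf programs.tar programs.dir        # create the archive from one level up
contents=$(tar -tf programs.tar)         # list what is inside, without extracting
echo "$contents"

mkdir restore
cd restore || exit 1
tar -xf ../programs.tar                  # extract into the current directory
restored=$(ls programs.dir)
echo "$restored"

cd / && rm -r "$scratch"                 # clean up
```

Listing with -t before extracting is a cheap way to confirm whether an archive will unpack into its own subdirectory or scatter files into the current one.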
2.6.4 Remote logins and job control
Often you will want to run on a different machine than the one that you are sitting at. The other machine may be faster, have more memory, be connected to a tape drive, etc. To do this enter:

    ssh machinename

and you can log in and run remotely on this machine. Of course this will slow the machine down for anyone else using it, so use some courtesy in doing this. One way to do this is to start your job with the nice command:

    nice do.bigjob

where do.bigjob is the script that runs your program. The nice command lowers the priority of your job so that it will not interfere with others using the machine. You still may be unpopular, however, if you use a lot of memory on the machine. Niceness levels range from 1 (highest priority) to 19 (lowest priority). The default for the nice command is niceness 4 or 10 (depending upon which UNIX shell you are running). To set the niceness to a specific value, you can specify a number, e.g.,

    nice +15 do.bigjob

will run do.bigjob at niceness 15. As a word of caution, most people don't like to have other people run jobs on their personal machines (the ones in their offices) without permission, even if the jobs are set to run at large niceness values.

To find out what jobs are running on your machine, you can use the top command to list the most active jobs:

    top
This will take over your window and update the results continuously until you enter q for quit. The Process ID (PID) is listed, together with the username, the niceness, the fraction of the CPU being used, and other useful information. top is interactive and you can input various commands (? for help, u to see only one user, etc.). You can stop (kill) jobs from within the top program, or at the command line using the kill command (see below). If you did not originally use nice when you started a job, you can renice the job from within top by entering r. Alternatively, if you know the job number you can change the niceness at the command level, e.g.,

    rock> renice +15 1132

where 1132 is the PID number (get it from the top program). You can renice more than once, but only to raise the niceness level; you can never lower the niceness level once it is set.
2.7 Miscellaneous commands
Unix keeps track of your previous commands. To see them, enter

    history

and it will list your last 30 commands. To repeat a past command, enter ! followed by the line number in the history list or simply the first few letters of the command. To repeat the very last command, enter

    !!

Want to see what the beginning or end of a file looks like? Use the head or tail command:

    head filename     ---lists the first 10 lines of the file
    tail filename     ---lists the last 10 lines of the file
To see a file one page at a time, use the more command:

    more filename

and then hit the space bar to advance one page at a time. Want to see how many lines and words are in a text file? Use the word count (wc) command:

    wc filename

This will list the number of lines, words, and bytes in the file. Often I just want to know the number of lines, in which case it will run faster to use the -l option for line:

    wc -l filename

Lose track of where a file is? You can use the find command:

    find . -name filename -print

This will look in the current directory (that's what the . is for) and below for the file named filename and then print where it is. What if you only know some of the characters of the file name? You can use a wildcard:

    find . -name 'map*' -print

This will find all files that begin with map and print them on your screen. Note that you MUST enclose map* in apostrophes for this to work. Of course, most computers now have built in search programs (e.g., Mac Spotlight) to locate files, which may be more useful in many cases.

Unix has many powerful utility programs. One of these is the sort command to sort files in alphabetical or numerical order. Example:

    sort +4 -n -b -r file1 -o file2

This sorts file1 and outputs the results to file2. The following options are used in this example:

    +4    skip first 4 fields (leave out to use beginning of line)
    -n    numerical order (default is alphabetical order)
    -b    ignore leading blanks
    -r    reverse order (leave out for standard order)
NOTE: The +4 option no longer seems to work. Instead you set the field number (normally the column number) using the -k option, i.e.,

    sort -k 5 -n -b -r file1 -o file2

should do the same thing as +4, i.e., skip 4 columns and sort on the 5th column.

To find out how much disk space is available use df (disk free):

    df

This will not list all the disks on the system unless they have been mounted. Just go to the disk that you are interested in and retry df if it does not appear the first time. To see how much space you are using, the best command is

    du -ks *

This will list the disk usage of each of your subdirectories. It is a good idea to go through your directories once a month or so to delete unnecessary files and/or compress large files that you don't use very often.

To check for misspelled words in a file, use:

    spell filename

This will list all words not found in a dictionary (which one?). Use the -b option for the British spelling if you are submitting a paper to Nature. (NOTE: spell does not seem to work on the Macs!)

To reformat a text file to a uniform maximum line length of 72 characters, use:

    fmt file1 > file2

This assumes that blank lines separate paragraphs. Line feeds within a paragraph are removed and added as necessary to make the lines of approximately equal length. This command is a useful feature if you use regular unix mail and create your outgoing messages with a text editor. It is helpful when you want to change a line in the middle of a paragraph because you don't have to redo all the following lines in order to make things look nice.

To preview a postscript file on the screen use:

    gv filename.ps

which is an abbreviation for the old ghostview program.
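The -k form of sort is easy to check on a small file. The following sketch assumes a POSIX sh; the five-column file stations.txt and its values are invented purely for illustration.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Toy data: the 5th column is the one we want to sort on.
cat > stations.txt << 'END'
sta1 10 20 30 2.5
sta2 10 20 30 7.3
sta3 10 20 30 0.4
sta4 10 20 30 5.9
END

# Numeric (-n), reversed (-r) sort starting at field 5, output to sorted.txt.
sort -k 5 -n -b -r stations.txt -o sorted.txt

largest=$(head -n 1 sorted.txt)              # first line = largest value
smallest=$(tail -n 1 sorted.txt)             # last line = smallest value
nlines=$(wc -l < sorted.txt | tr -d ' ')     # line count only
echo "largest:  $largest"
echo "smallest: $smallest"
echo "lines:    $nlines"

cd / && rm -r "$scratch"                     # clean up
```

Without the -n option the same command would compare the fifth column as text, which happens to give the same order for this file but would place 10.0 before 2.5.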
2.7.1 Common sense
This should go without saying, but you never know. Do not download pornography, hate literature, etc., on UCSD computers. You may get into BIG trouble (worse, you could get your advisor into trouble!). You should not consider e-mail to be completely private. Even deleted e-mail is usually still on the computer system somewhere, often on daily and weekly backup tapes. When you reply to messages sent to a group of people, be very careful that you do not send your message to everyone on the list (unless that is your intention). Attempts at sarcasm and irony usually misfire in e-mails. It's best to err on the side of professionalism in your communications with your colleagues.
2.8 Advanced UNIX
This writeup so far contains pretty much all I've ever done with UNIX. You can go pretty far with only 20 commands or so. However, there are much more powerful things you can do. If you really want to be a UNIX guru, try reading up on pipes, grep, awk, sed, etc. Then you will be able to write custom scripts to do all kinds of neat things. Here are some examples:

    grep dziewonski filename

This will print every line from file filename that contains the string dziewonski (grep is case sensitive, so Dziewonski would not match).

    grep -v dziewonski filename

This will print every line from file filename that does NOT contain the string dziewonski.

    grep dziewonski filename > dz.lines

This works as above except the lines containing dziewonski are written to the file dz.lines rather than printed to the screen.

    grep dziewonski *

This lists lines containing dziewonski from ALL files within your current directory. However, you may really want just the file names of the files that contain the string dziewonski, in which case the following command will work better:

    grep -l dziewonski *

This lists the file names of the files that contain the string dziewonski.

    grep ezxplot `find . -name Makefile -print`

This is one that recently saved me when I could not find a program that I knew I had written, but I did not know its name or which directory it was in. I knew, however, that the program was compiled with a Makefile that would contain ezxplot because this is necessary to make the program work. The grep command searches for ezxplot in a list of file names returned by the find command. Note that the find command must be enclosed with backward apostrophes, not regular apostrophes.

    ps -eaf

This lists all processes that are running on your machine.

    ps -eaf | grep shearer

This lists all of the processes that contain the string shearer in the output lines of ps. In this case, the vertical line is a pipe that directs the output of the first command, ps, into the input of the second command, grep.

    ps -eaf | grep shearer > junk

As above, but writes the output into file junk rather than to the screen.

    kill PID

This kills a job with process ID number PID (obtained from the top or ps command). This is useful for runaway jobs. For stubborn jobs, use the -9 option:

    kill -9 PID

You can compare two files using the diff command:

    diff file1 file2

This lists all differences between file1 and file2. This is useful if you have made some changes to a file but cannot remember exactly what they are.
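The grep-inside-find trick can be rehearsed on a toy directory tree. This is a sketch under stated assumptions: a POSIX sh, made-up project directories proj1 and proj2, and invented Makefile contents. The backward apostrophes work the same way in sh as in csh.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Two toy projects; only the first Makefile mentions ezxplot.
mkdir proj1 proj2
echo "LIBS = -lezxplot" > proj1/Makefile
echo "LIBS = -lm" > proj2/Makefile

# find produces the list of Makefiles; grep -l names those containing the string.
hits=$(grep -l ezxplot `find . -name Makefile -print`)
echo "$hits"

# A pipe: count how many directory entries contain the string "proj".
count=$(ls | grep -c proj)
echo "$count"

cd / && rm -r "$scratch"     # clean up
```

The first command should name only ./proj1/Makefile, which tells you which directory holds the program you were looking for.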
2.8.1 Some sed and awk examples
    cat file1 | sed 's/Peter/Paul/' > file2

Copy file1 to file2, substituting "Paul" for the first "Peter" on each line.

    cat file1 | sed 's/Peter/Paul/g' > file2

Copy file1 to file2, substituting "Paul" for every "Peter" on each line (note the "g" flag for global substitution).

    cat file1 | sed 's/^/Paul says /' > file2

Insert the prefix "Paul says " at the beginning of each line. Note that "^" means start of line.

    cat file1 | awk '{print $5,$3,$1}' > file2

Assuming file1 has 5 columns of figures, this copies columns 5, 3, and 1 to file2, in that order, omitting the 2nd and 4th columns.

    cat file1 | awk '{print $2, $1*(-10)}' > file2

Switch columns 1 and 2 and multiply original 1st column numbers by -10.

    cat file1 | awk '{print substr($0,1,10) substr($0,31,10) substr($0,21,10) substr($0,11,10) substr($0,41,length($0)-40)}' > file2

This copies file1 to file2, swapping the contents of columns 11-20 and columns 31-40. $0 is the line, substr takes a chunk of it (starting column, then number of characters), and the last part prints the remainder of the line.

    cat file1 | awk '{print "Columns 11 to 20 are " substr($0,11,10)}' > file2

Start each line with "Columns 11 to 20 are " and then list the contents of those columns in the input file.

Variations on the last few examples can be used to reformat ascii data files, including those that do not have spaces between fields.
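A few of these one-liners can be checked on short test strings rather than whole files. The sketch below assumes a POSIX sh with standard sed and awk; the input strings are invented. The apostrophes around each sed and awk program keep the shell from interpreting the $ signs and spaces.

```shell
#!/bin/sh
# First match only versus every match on the line.
first=$(echo "Peter met Peter" | sed 's/Peter/Paul/')
all=$(echo "Peter met Peter" | sed 's/Peter/Paul/g')
echo "$first"      # prints: Paul met Peter
echo "$all"        # prints: Paul met Paul

# Pick out and reorder whitespace-separated columns.
cols=$(echo "a b c d e" | awk '{print $5,$3,$1}')
echo "$cols"       # prints: e c a

# Swap two columns and scale the original first one by -10.
scaled=$(echo "3 7" | awk '{print $2, $1*(-10)}')
echo "$scaled"     # prints: 7 -30

# substr(string, start, length): 10 characters beginning at column 11.
chunk=$(echo "0123456789ABCDEFGHIJKLMNOPQRST" | awk '{print substr($0,11,10)}')
echo "$chunk"      # prints: ABCDEFGHIJ
```

The last example is worth remembering because the third argument of substr is a length, not an ending column, which is an easy thing to get wrong when reformatting fixed-width data.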
2.9 Example of UNIX script to process data
By combining UNIX commands into a script, it is possible to create very powerful tools for processing data. Consider a simple example where we have a number of data files contained in a data directory:

    rock> cd data.dir
    rock> ls
    data1 data2 data3

We have written a program to process the data in these files and write new data files which we might want to call data1.proc, etc. The program is called procdata
and prompts the user for an input file name and an output file name, and, in this simple example, a multiplier factor to scale the data. If the program is one level up from the data directory, then:

    rock> ../procdata
    Enter input file name
    data1
    Enter output file name
    data1.proc
    Enter multiplier factor
    3
    rock>

To process all of the files in the directory, we could run the program for each file, manually entering the file names. However, clearly this would get very tedious if we had lots of files to process. We could modify the program to accept a list of file names, but perhaps it is a complicated program that someone else wrote that we don't want to mess with. Another approach is to write a UNIX script to run the program for all of the files in the directory. Here is one way to do this, using the command file do.proc, which looks like this:

    #! /bin/csh
    \rm procdata.log
    \rm data.dir/*.proc
    ls data.dir > filelist
    cd data.dir
    # Note the "backwards" apostrophes in next line, regular ones will not work!
    foreach filename (`cat ../filelist`)
       echo "processing file:" $filename
       ../procdata >>! ../procdata.log << !
    $filename
    $filename.proc
    2
    !
    end
    cd ..
    \rm filelist
This is designed to be located in the same directory as the program, one level up from the data directory. The screen output from the program (all of the "Enter input file name", etc., lines) is directed to a file called procdata.log, so the first thing we do is remove any old version of this file, if it exists. Note that within the script, we use backslash rm (\rm) instead of rm so that any aliases that might require interactive verification of the deletions are not performed. Otherwise the computer might prompt us to see if we really want to delete the files and the script would not be prepared to handle this.

Next, we remove any existing processed files in the data.dir directory, using a wildcard and assuming that the file names end in .proc. Next, we write a list of the data file names within data.dir into the file filelist. Then we go into data.dir, where we loop over the filenames contained in filelist, using the foreach command. This loop is terminated by the end command later in the script. The `cat ../filelist` (be sure to use backward apostrophes!) will return one line of filelist at a time and assign it to the filename variable. The backward apostrophes indicate that a UNIX command is to be executed.

We use the echo command to output to the screen each file that is being processed. We then run the procdata program (one level up, so we need the ../) and direct the normal screen output to the logfile. We use >>! rather than >> in case set noclobber is contained in our .cshrc file (set noclobber prevents overwriting an existing file with > or writing to a nonexistent file with >>, the latter case being our situation. Note that >! also overrides the noclobber setting). Within the foreach loop we refer to the contents of the filename variable as $filename. The input to the procdata program is terminated within the script with the ! symbol. Following the end statement that completes the loop over the files, we go back to the directory containing the program and delete filelist, as it is no longer needed.
The power of this script is that it can be run on a directory containing thousands of files, just as easily as for a smaller number of files. In this example, we really did not need to generate the file filelist because we eventually deleted it. Thus, we could have written the script as:

    #! /bin/csh
    \rm procdata.log
    cd data.dir
    \rm *.proc
    # Note the backwards apostrophes in the next line, regular ones will not work!
    foreach filename (`ls`)
       echo "processing file:" $filename
       ../procdata >>! ../procdata.log << !
    $filename
    $filename.proc
    2
    !
    end
    cd ..

Alternatively, the `ls` could be written as *, i.e.,

    foreach filename (*)

will also work. In this case the wildcard * will expand to the names of all of the files within the current directory.

We won't have time in this class to go into the details of all of the different things one can do in scripts like this. There are lots of books on UNIX that one can consult for this purpose (but who has time to read them?), but most of us just pick up stuff as we need it. The main point that I want to get across is that if you are spending lots of time running programs manually, then you are wasting your time. Spend some of that time learning how to write a UNIX script and you will be far better off in the long run. Your work will be better documented and it will be much easier for you (and others) to reproduce your work.
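For readers whose login shell is not csh, the same loop can be sketched in POSIX sh, where for ... in ... do ... done plays the role of foreach ... end. Everything here is an assumption for illustration: procdata is replaced by a hypothetical shell function that multiplies every number in a file, and the data files are invented.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir data.dir
echo "1 2 3" > data.dir/data1
echo "4 5 6" > data.dir/data2

# Hypothetical stand-in for procdata: prompts for an input file name,
# an output file name, and a multiplier, then scales every number.
procdata() {
    echo "Enter input file name";   read -r infile
    echo "Enter output file name";  read -r outfile
    echo "Enter multiplier factor"; read -r factor
    awk -v f="$factor" '{ for (i = 1; i <= NF; i++) $i = $i * f; print }' \
        "$infile" > "$outfile"
}

cd data.dir || exit 1
rm -f ../procdata.log *.proc        # -f: no complaint if nothing to remove
for filename in *; do               # * is expanded once, before the loop starts
    echo "processing file: $filename"
    procdata >> ../procdata.log << END
$filename
$filename.proc
2
END
done

first=$(cat data1.proc)
second=$(cat data2.proc)
loglines=$(wc -l < ../procdata.log | tr -d ' ')
echo "$first"
echo "$second"

cd / && rm -r "$scratch"            # clean up
```

Because the wildcard is expanded before the first pass through the loop, the .proc files created along the way are not themselves picked up and processed.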
2.10 Common UNIX command summary

    cat filename            print filename on your screen
    cd dirname              change directory to dirname
    cd ..                   go back up one level
    cd                      go to home directory
    cd ~/dirname            go to directory dirname in home directory
    cd ~otheruser           go to home directory of otheruser
    cp file1 file2          copy file1 to file2
    cp -i f1 f2             copy f1 to f2 but ask before overwriting f2
    df                      list disk space on the different disks
    du -ks *                list disk usage for files/dirs in current directory
    lpr -P silo filename    send filename to printer silo
    ls                      list files in directory
    ls -l                   list files in long format
    ls -a                   list all files including those starting with .
    ls -F                   flag directories by adding slash to their name
    ls *.f                  list all files ending with ".f"
    mkdir dirname           make directory dirname
    mv file1 file2          change name of file1 to file2
    mv -i f1 f2             change name of f1 to f2 but ask before overwriting existing f2
    mv *.f src              move all files ending in .f into existing directory src
    pwd                     print working directory
    rm filename             remove filename
    rmdir dirname           remove directory dirname (directory must be empty)
    wc filename             count lines, words and characters in file
    wc -l filename          count lines in file
Chapter 3
Generic Mapping Tools

Generic Mapping Tools, frequently referred to as GMT, is a set of UNIX tools for making maps and plots written by Paul Wessel and Walter Smith. This software is in the public domain and is still maintained by Wessel and Smith. The latest update (Version 4.5.7) was released on July 15, 2011. If you haven't yet installed GMT, I'm not going to lie, it is a pain. The best way is to use a utility called Fink, which itself must be installed and is also a pain. The good news is that once you have installed Fink, there are a LOT of other programs available to you for free and keeping everything up to date is pretty easy. The following instructions should help you through the installation process.
3.1 Installation of Fink, GMT and other useful tools
Download the Fink source code for your operating system via this link: http://www.finkproject.org/download/srcdist.php

Double click on the tar file that you just downloaded if it hasn't been unpacked automatically. Open a terminal window and change directories into this folder (in the test case I just did, ~/Downloads/fink-0.31.3).

    % cd ~/Downloads/fink-0.31.3

Start the installation operation with the supplied bootstrap script. Note that with all these scripts, you can just select the default options by hitting the return key.

    % ./bootstrap

Now do the following:
    % /sw/bin/pathsetup.sh      # this sets up your path (if it wasn't already)
    % fink selfupdate-rsync     # this updates your fink distribution
    % fink index -f             # this makes an index
    % fink scanpackages         # this figures out what packages are available

If you want a complete list of packages, try fink list. We are after GMT now, so install it and some other related packages using Fink:

    % fink install gmt
    % fink install gmt-coast
    % fink install gmt-doc

The high resolution GMT coastlines in the most recent Fink package may not be the most up to date. If desired, go to ftp://ftp.soest.hawaii.edu/gmt and download the following high resolution coastline archives:

    GSHHS2.0.2_coast.tar.bz2
    GSHHS2.0.2_full.tar.bz2
    GSHHS2.0.2_high.tar.bz2

In the Finder go to the download folder and double click on each of the three archives. They will be unpacked as folders. Now in Terminal run the following commands (please note that the version number of your downloaded packages may be newer than in the commands below, in which case the commands need to be modified):

    % cd ~/Downloads
    % sudo cp GSHHS2.0.2_coast/share/coast/*.cdf /sw/share/gmt/coast
    % sudo cp GSHHS2.0.2_high/share/coast/*.cdf /sw/share/gmt/coast
    % sudo cp GSHHS2.0.2_full/share/coast/*.cdf /sw/share/gmt/coast

You should also install ps2eps (PostScript to Encapsulated PostScript), gv (Ghostview) and ImageMagick (which includes the convert program to convert .eps to image files like .png):

    % fink install ps2eps
    % fink install gv
    % fink install imagemagick
Now that you have installed GMT, let's take it out for a spin. There is a GMT website at http://gmt.soest.hawaii.edu/ that contains a GMT tutorial and a number of examples. The best strategy for figuring things out is to look through the examples and find something close to what you want. Use that as a starting point. Some other sites that may be helpful are:

    http://geophysics.eas.gatech.edu/classes/Intro_GMT/ (tutorial)
    http://www.ruf.rice.edu/~ben/gmt.html (useful links to things to plot)

Once you have learned a little bit about how to use GMT, it's helpful to use the help documents as your primary reference. You can access these at the UNIX command level by simply entering man pscoast, or even just pscoast. But be warned, GMT is not very user friendly. If you are unfamiliar with some of the features of the UNIX environment like piping output, etc., you may find it fairly intimidating at first. However, it is extremely powerful in the number of things that it can do, and the time spent learning how it works will not be wasted as you will become familiar with many useful UNIX tools. GMT is capable of producing very nice maps and plots and is pretty much the industry standard for Earth Science map making.

We begin with an example using the tool pscoast. You will ALWAYS want to run GMT as a UNIX script (see Chapter 2). Here is one called do.gmt1:
    #!/bin/csh
    pscoast -R0/360/-90/90 -JQ180/6i -B60g30 -P -Dc -G200 >! map.ps
    ps2eps -f -g -l map.ps
    gv map.eps

The first line, #!/bin/csh, invokes the C-shell environment. The script may run without this, but it is safest to always go into the C-shell because all of the example GMT scripts do this. It can make a difference: for example, the >! redirection used here is C-shell syntax. Next we encounter pscoast, which is the GMT program that draws coastlines. This program has a lot of options that, in UNIX style, are invoked with a dash and a letter on the same line (e.g. -P, -Dc, etc.). The output of the program is directed to the Postscript file map.ps. Note that we use >! instead of > to avoid getting an error message if map.ps already exists; but beware, you will overwrite any existing map.ps file.
To get a full list of options, simply type the command name alone. But for now, let us examine the arguments in our example:

-R0/360/-90/90
    This sets the map limits of longitude (lon1=0, lon2=360) and latitude (lat1=-90, lat2=90). Note that we could have written this with a space between the R and the zero (0) immediately following. Most people leave this space out to more easily separate the different commands.

-JQ180/6i
    This specifies the map projection to be cylindrical equidistant (a simple linear scaling of lat/lon). The center meridian is set to 180 degrees; the plot width is set to 6 inches (the i means inches). A very large number of different map projections are available!

-B60g30
    This labels latitude and longitude at 60 degree intervals and draws grid lines at 30 degree intervals.

-P
    Specifies portrait mode so that the plot is not rotated by 90 degrees as in landscape mode (default).

-Dc
    Sets the resolution of the coastline to c for crude. This is all that is required for a small map of the whole globe. For larger maps or closeups, higher resolution will be required. The available options are: (f)ull, (h)igh, (i)ntermediate, (l)ow and (c)rude. Note that the full resolution files require over 55 MB of data and provide great detail. It also takes more time to generate the plot, so full resolution should generally be used only for extreme close ups. The default is (l)ow resolution.

-G200
    Specifies the grayshade level for the continent shading from 0 (black) to 255 (white). These nonintuitive units are a Postscript convention!

Finally, there are two more commands that let us view the plot without any fancy store bought applications. These are ps2eps and gv. The former converts the Postscript file format to an encapsulated Postscript (EPS) format and the second calls ghostview, a handy quick look program for EPS files. After running the script do.gmt1, you should get a popup window that looks like this:
If you open the file in PageView, you will notice that there is a lot of white space around the plot and it is not centered on the page. The first problem can be fixed by setting the page size to Letter using the command gmtset PAPER_MEDIA Letter. But if you generate a plot that is bigger than that, GMT will truncate it. If you want to be always safe from that problem, set the media type to ArchE, which is poster size. The second problem arises because the default position for the lower left corner is at x=1 inch, y=1 inch. We can change this by specifying the x and y positions directly:

    -X1.2i -Y4i

This sets the lower left corner to 1.2 inches from the left edge and 4 inches from the bottom.
You may also decide that you don't want grid lines drawn on the plot, so we remove the g30 from the -B command:

    -B60

This puts a label on the latitudes and longitudes every 60 degrees, but doesn't draw the grid. You may prefer a smaller font. We can do this with a different program, which we run before calling pscoast:

    gmtset ANOT_FONT_SIZE 10

Finally, you may decide that you want the continents to be a hideous green color. For this, use the -Gred/green/blue option, where red, green and blue are numbers between 0 and 255. The resulting script (do.gmt2) is:

    #!/bin/csh
    gmtset PAPER_MEDIA Letter
    gmtset ANOT_FONT_SIZE 10
    pscoast -R0/360/-90/90 -JQ180/6i -B60 -P \
            -Dc -X1.2i -Y4i -G0/255/0 >! map.ps
    ps2eps -f -g -l map.ps
    gv map.eps

Note that we used a backslash to indicate that the line continues to the next line. In this way we can avoid making our lines too long. This script produces something like this (is the green hideous enough for you?):

[Figure: map produced by do.gmt2, with green continents and longitude labels every 60 degrees]
Next, suppose you have a file containing the coordinates of some seismic stations which we wish to plot on this map. The file is called station.list and its first five lines are:

      9.02920   38.76560 2442 AAE
     42.63900   74.49400 1645 AAK
     37.93040   58.11890  678 ABKT
     51.88370 -176.68440  116 ADK
    -13.90930 -171.77730  706 AFI
Here is a script (do.gmt3) that will plot these points on our map:

    #!/bin/csh
    gmtset ANOT_FONT_SIZE 10
    pscoast -R0/360/-90/90 -JQ180/6i -B60 -P -Dc -G200 \
            -X1.2i -Y4i -K >! map.ps
    psxy -O -R -JQ180/6i -St0.06i -G0 -: station.list >> map.ps
    ps2eps -f -g -l map.ps
    gv map.eps

The first part of the script is the same as before, except that the -K is necessary to tell GMT to keep the Postscript file open (i.e., don't put a showpage at the end of the file) so that more can be added to the file. Next, we use psxy to read and plot the x-y points from the file station.list. We use >> to append the output onto the end of the map.ps file. Note that > or >! would not work here because it would overwrite the file instead of adding to it. The arguments of psxy are as follows:

-O
    Indicates that this is an overlay onto an existing Postscript file, which avoids repeating the initializations at the beginning of the file.

-R
    Sets the plot boundaries (defaults to those set by pscoast).

-JQ180/6i
    Sets the map projection (according to the manual, this should work without the 180/6i but I did not find this to be true).

-St0.06i
    Plots xy points as (t)riangles of 0.06 inch width. Other options are (c)ircle, (d)iamond, (s)quare, (i)nverted triangle, (x)cross, and (v)ector. In the case of the vector, the direction and length are also read from the file (see manual).

-G0
    Sets the fill parameter to 0 (black). This will fill in the triangles.

-:
    This tells the program that the data are to be read as y-x pairs. The default is x-y, or (lon,lat) in our case. For seismology this option is very useful because coordinates are usually given as lat, lon rather than the other way around.

Note that the program automatically converts the longitude convention of the data points (-180 to 180) to the longitude convention of the map (0 to 360). This is a nice feature of GMT.
[Figure: map produced by do.gmt3, with longitude labels every 60 degrees]
Now let's try a different map projection, plot the points in red, and add a title in a script called do.gmt4:

    #!/bin/csh
    gmtset ANOT_FONT_SIZE 10
    pscoast -R-180/180/-90/90 -JH0/6i -Bg0:."IRIS FARM stations": \
            -P -Dc -G150 -X1.2i -Y4i -K >! map.ps
    psxy -O -R -JH -St0.06i -G255/0/0 -: station.list >> map.ps

The changed commands are as follows:

-JH0/6i
    Invokes the equal-area Hammer projection with 6 inch width.

-Bg0:."IRIS FARM stations":
    The 0 results in no grid lines or labels. The title is set off with :. and : (weird!).

-G255/0/0
    Defines the fill for the xy plot as red=255, green=0, blue=0.

To my taste, the title is way too big. This can be changed using the gmtset HEADER_FONT_SIZE command (see below).

Next, let's look at a closeup of these stations in southern California with the script do.gmt5:

    #!/bin/csh
    gmtset HEADER_FONT_SIZE 20
    gmtset ANOT_FONT_SIZE 10
    pscoast -R-121/-114/32/37 -JM6i -B1g1:."IRIS FARM stations": \
            -P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
    psxy -O -R -JM -St0.06i -G0 -: station.list >> map.ps
Changes are:

gmtset HEADER_FONT_SIZE 20
    Set font size for title to 20.

-R-121/-114/32/37
    Set lon1,lon2,lat1,lat2 to S. California.

-JM6i
    Use Mercator projection, width = 6 inches.

-B1g1:."IRIS FARM stations":
    Label and draw grid lines every 1 degree. Use same title.

-I1
    Plot permanent major rivers.

-I2
    Plot additional major rivers.

-I3
    Plot additional rivers.

-N1
    Plot national boundaries.

-N2
    Plot state boundaries within the Americas.

-G255/200/200
    Fill land areas with red=255, green=200, blue=200, a ghastly shade of pink:
[Figure: map produced by do.gmt5, titled "IRIS FARM stations", with longitude labels from 239 to 246 and latitude labels from 32 to 37]
Next, let's add lines that show the traces of mapped faults in southern California. For this, we use a file called calif.flts that has the following format:

     370.0000   99.0000
    -115.5496   32.9312
    -115.5419   32.9142
    -115.5358   32.9029
    -115.5276   32.8890
     370.0000   99.0000
    -115.9218   32.9916
    -115.9096   32.9849
    -115.8936   32.9745
    -115.8729   32.9655
    -115.8398   32.9498
    -115.8216   32.9410
    -115.7983   32.9295
    -115.7796   32.9228
     370.0000   99.0000
    -115.8391   33.0127
    -115.8205   33.0069
    -115.8017   33.0015
    -115.7892   32.9973
    etc.
The faults are defined as (lon,lat) pairs. A value of (370,99) is used to separate the different faults because 370 is off the map (longitude only goes to 360!). To plot these faults on our map of the southern California stations, we can use the psxy command a second time in the script do.gmt6:

    #!/bin/csh
    gmtset HEADER_FONT_SIZE 20
    gmtset ANOT_FONT_SIZE 10
    pscoast -R-121/-114/32/37 -JM6i -B1:."IRIS FARM stations": \
            -P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
    psxy -O -R -JM -M370 -W8/255/0/0 calif.flts -K >> map.ps
    psxy -O -R -JM -St0.15i -G0/0/255 -: station.list >> map.ps

Changes are:

-B1:."IRIS FARM stations":
    We removed the g1 so we don't plot grid lines, which might get confused with the faults.

We plot the faults using:

    psxy -O -R -JM -M370 -W8/255/0/0 calif.flts -K >> map.ps

In this command, we use the -M option to flag the segment boundaries:

-M370
    Lines starting with 370 mark segment boundaries. Without this option the plot would look like a mess because lines would be drawn to the (370,99) points.

-W8/255/0/0
    This draws the line with linewidth=8 (thicker than normal) and color red=255, green=0, blue=0.

Note that we do not need the -: option because the points are already given as (lon,lat). We plot the stations last so that they will go on top of the faults. We change the symbol size and the color:

-St0.15i
    Plot triangles 0.15 inches high.

-G0/0/255
    Plot with red=0, green=0, blue=255.

Note that -K is needed on every command except the last; -O is needed on every command except the first. (***THIS IS KEY TO GETTING GMT SCRIPTS TO WORK PROPERLY!!! CHECK THIS FIRST WHEN YOU HAVE PROBLEMS.***) The script do.gmt6 results in the plot:
[Figure: map produced by do.gmt6, titled "IRIS FARM stations", with longitude labels from 239 to 246 and latitude labels from 32 to 37]
(Are there really no faults in eastern southern California?)

ASSIGNMENT GMT1

Install Fink and GMT if you haven't already. Write a script that plots your birthplace as a big (.2in) red star. Use an orthographic projection with the star at lon0/lat0. Draw all rivers, national and state boundaries. Make a title in 20pt font with your name. Look up how to annotate the point with text (HINT: pstext) and label the star with the name of the place you were born. Turn in the script and the .eps file produced by it (via e-mail to ltauxe@ucsd.edu).
Chapter 4
Fortran
Why learn Fortran? The answer for geophysics students at SIO is obvious: most IGPP faculty program in Fortran. However, let me make a broader pitch for its usefulness. For many years, Fortran was the language of choice for scientific programming. Recently, C has emerged as a competitor and seems to now be much more widely taught to students, perhaps because it is favored by computer science departments (who worry more about writing compilers than how to handle complex numbers). Why learn Fortran if you already know C? Several reasons come to mind:

1. You will communicate and exchange software more readily with most IGPP faculty, who are generally proficient in Fortran but rather challenged in C.
2. There is a huge library of existing Fortran subroutines to perform various algorithms, both at IGPP and in the wider scientific community.
3. Complex numbers and double precision are included.
4. It's fun!

Finally, some cranky advice: It is important to become familiar with at least one major programming language. Fortran and C are such languages; Matlab and other application programs are not. In the long run, you will handicap your ability to do science if you do not take the time to gain experience with a real programming language. In the worst case, you will be one of those annoying people who are forever asking others to write or modify programs for them.
4.1 Fortran history
The name Fortran is derived from IBM Mathematical FORmula TRANslation System. Originally it was spelled in all caps as FORTRAN, but the more modern usage is to only capitalize the first letter. Fortran is one of the oldest computer programming languages and was begun in 1954 by IBM programmers led by John Backus (no relation to George). It is updated and improved by some committee of computer scientists every ten years or so. Important versions include:

    Fortran IV (released in 1962)
    Fortran 77 (released in 1980), major revision
    Fortran 90 (released in 1991), major revision
    Fortran 95 (released in 1997), minor revision

Wikipedia also lists Fortran 2003 and 2008, which I don't know anything about. I would not recommend using these at this point in time. Most versions are designed to be fully backward compatible with previous versions. However, Fortran90 departed from the column sensitive format of older Fortran, resulting in a more modern approach that necessitated some slight incompatibilities with older Fortran. I will try to teach this class entirely in Fortran90, but will describe the differences with Fortran77 when they are important because you are likely to need to work with existing Fortran77 code at some point.
4.2 Texts and manuals
This class will be a tutorial on Fortran90 and will not be comprehensive. Thus, I recommend that you buy a textbook that will serve as a more complete reference. I just checked on Amazon and there are 4 or 5 books out there. One that I have used and recommend is Fortran 90/95 Explained (Metcalf and Reid, Oxford Univ. Press, 1999), which has some favorable reviews. It's only $47. There is an updated version that I have not seen for $53. You may want to check around to see which book you like best.

In class, we will use gfortran, which can be downloaded free for the Macs. To make sure you have it in your path, enter:

    which gfortran

and you should get something like

    /usr/local/gfortran/bin/gfortran

I recommend that you add the following to your .cshrc file:

    alias f90 gfortran

This will allow you to just type f90 instead of gfortran when you want to compile programs. The examples that follow assume that you have done this.
4.3 Compiling and running F90 programs
Fortran90 programs are written as ASCII files that end in .f90. This is called the source code. Programs in older versions of Fortran end in simply .f. Because of the slight incompatibilities between the versions, be sure to use the appropriate suffix so that both you and the compiler know which version to use. It is in fact possible to set things up so that your F90 programs end in .f rather than .f90. I do not recommend this because there is so much existing code around in Fortran77 that confusion is likely to result if you do not explicitly identify your code as F90. Before the program can be run, it must be compiled using the Fortran90 compiler on your computer, creating the executable file that you use to actually run the program. For example, suppose you have a program called printmess.f90 with contents:

    ! simple Fortran90 test program (printmess.f90)

    program printmess
       print *, "test message"
    end program printmess

To compile this program on my computer (rock in this case), I enter f90 printmess.f90 -o printmess and get the following response:

    rock% f90 printmess.f90 -o printmess
    rock%

No news is good news! If there was a syntax or other error detected during the compilation, we would get an error message at this point. This creates an executable version of the program called printmess (you should ALWAYS use the name of the program without the .f90 for the executable), which can then be run simply by entering ./printmess:

    rock% ./printmess
    test message
    rock%

Note that because . is not in our path, we need to type ./printmess and not just printmess.

If you are like me, you will quickly tire of typing in the line f90 printmess.f90 -o printmess. Thus, I recommend that you create a Makefile (by convention, the first character is capitalized) in the same directory as the program. Include in the Makefile the following lines:

    %: %.f90
    	gfortran $< -o $*

The tricky part of this is that the space before the gfortran MUST be entered as a tab, not as a series of spaces. Once you have set up this Makefile, then you can just enter:

    make printmess

in order to compile the program. Makefiles are very useful to keep track of compiler options and to link with subroutines (more about this later).

The difference between source code and executables is fundamental to languages like FORTRAN and C. It is why they generally run much faster than interpreted languages like BASIC or Matlab or Python scripts.
4.3.1 The first program explained
OK, now let's examine our simple F90 test program again:

    ! simple Fortran90 test program (printmess.f90)

    program printmess
       print *, "test message"
    end program printmess

The first line is a comment line. Anything following an exclamation mark (!) is a comment. The end of the line terminates the comment. There is no need (as in many languages) to terminate the comment with another flag. We can also use ! to add an inline comment following a statement, e.g.,

    print *, "test message"    !print message on screen

would be a valid line. The next line is blank. You can put in as many blank lines as you want and they will be ignored by the compiler. Blank lines provide a good way to improve the readability of longer programs by breaking them up into coherent blocks of code.

The next line is used to name the program. The name printmess is not used by the program at all. This line is optional but is considered good programming style. Good style also suggests that lines in the body of the program are indented, in this case by three spaces:

       print *, "test message"

print * will output to the screen. The desired output is enclosed in quotes (apostrophes would also work). Finally, all F90 programs must terminate with an end statement. The program printmess following end is optional, but is considered good programming practice. These style points don't matter much for a short program like this, which would work just as well if it were written as:

    print *, "test message"
    end

but are helpful for longer programs (hundreds to thousands of lines of code) where the label following the end would remind a reader of which program is actually ending.

ASSIGNMENT F1

Write a F90 program to print your favorite pithy phrase.
4.3.2 How to multiply two integers
Here is a simple F90 program to multiply 2 and 3:

    program multint
       integer :: a, b, c    !declare variables
       a = 2
       b = 3
       c = a * b
       print *, "Product = ", c
    end program multint

The program uses three variables, the letters a, b and c. Variable names can be from 1 to 31 characters long. The first character must be a letter. The remaining characters can be any combination of letters, numbers, and underscores (_). Variables in Fortran can be of many types, including real (floating point), integer, complex, double precision, character and logical. In our case, we want them to be integers so we define them using a type statement

    integer :: a, b, c    !declare variables

Older Fortran programs do not include the :: in these statements; this convention still works under F90 but is discouraged. For the Sun F90 compiler, these are 4-byte integers that can range from -2,147,483,648 to 2,147,483,647. Alternatively, they could have been defined as real numbers:

    real :: a, b, c

in which case they would be floating point (real) variables. (For the Sun compiler, real numbers range in magnitude from 1.175494e-38 to 3.402823e+38.) The lines that follow are pretty self explanatory:

    a = 2
    b = 3
    c = a * b
    print *, "Product = ", c

Note that * is used to indicate multiplication, as is true in almost all programming languages. Addition is +, subtraction is -, and division is /. In Fortran, a to the power of b is written as a**b, where a and b can both be real, another Fortran advantage over standard C.
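As a quick check on the points above, here is a minimal sketch (the program name arith and the test values are made up for illustration and are not part of the class examples). It uses the standard intrinsic functions huge and tiny to print the integer and real limits for whatever compiler you are using, and then exercises the arithmetic operators, including ** with two real arguments. On most machines with 4-byte default integers and reals the limits should match the Sun values quoted above, but that is an assumption about your compiler.

```fortran
! arith.f90: hypothetical example of variable ranges and arithmetic operators
program arith
   integer :: i, j
   real :: x, y

   ! huge() and tiny() are standard F90 intrinsic functions; they depend
   ! only on the type of their argument, not on its value
   print *, "largest integer        = ", huge(i)
   print *, "smallest positive real = ", tiny(x)
   print *, "largest real           = ", huge(x)

   i = 7
   j = 2
   x = 2.0
   y = 0.5

   print *, "i + j  = ", i + j
   print *, "i - j  = ", i - j
   print *, "i * j  = ", i * j
   print *, "i / j  = ", i / j      !integer division drops the fraction (gives 3)
   print *, "i ** j = ", i ** j     !integer to an integer power (gives 49)
   print *, "x ** y = ", x ** y     !real to a real power, here the square root of 2
end program arith
```

This can be compiled and run just like printmess (f90 arith.f90 -o arith, then ./arith). Note in particular that dividing one integer by another gives an integer result, which is easy to forget.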
4.4 An important historical digression
It is not required that variables be declared in Fortran. If they are not declared, then the compiler assigns them as integer or real, depending upon their first letter. Variable names beginning with i through n (INteger, get it?) are assumed to be integers; all others are assumed to be real. Many, if not most, older Fortran programs adopt this convention. Often they only declare variables when the rule is broken, for example when it is desired that the variable year be an integer. A prominent example of this type of Fortran programming may be found in the first edition (1986) of Numerical Recipes (but by the time of the second edition in 1992, the authors had been shamed into declaring all variables). Such undeclared variables are said to be declared implicitly based upon their first letter. An implicit statement can be used to modify the default rules, for example

    implicit real (a-z)

will make all undeclared variables real. However, modifications such as this will only lead to more confusing code. Modern programming practice is to explicitly declare ALL variables and to have the compiler warn us when variables are not declared. This can be done by including the statement

    implicit none

at the beginning of every Fortran program. I will admit that, as a long time Fortran programmer, I do not always follow this practice. I must concede, however, that it is a good idea and is likely to save more time in the long run (by eliminating program bugs that are often caused by undeclared variables) than is lost while writing the program.

Example 1: Suppose you should have the following line in your program:

    x2 = a1 + a2*sin(theta)*scale1/scale2

but you accidentally type:

    x2 = a1 + a2*sin(theta)*scalc1/scale2

If you don't follow the practice of declaring all of your variables, then the error will not be detected during compilation. The program will run, using whatever value happens to be in the otherwise unused variable scalc1 (often zero), and you will get wrong answers. On the other hand, if you use implicit none and declare your variables, the compiler will flag scalc1 as an undeclared variable and you can fix it before it causes any more trouble.

Example 2: Suppose you have the following lines in your code:

    kmdeg = 111.19
    dist = (delta2 - delta1)*kmdeg

but you have not declared kmdeg explicitly. Because the letter k is between i and n, the program will declare kmdeg as an integer. The value 111.19 will be truncated to 111 and you will get values for dist that are slightly, but not obviously, wrong (the worst kind of program bug to have).

To save you from these embarrassments and to encourage good programming habits, we will declare all variables for the programs in this class and I will dock you points if you fail to do so in assignments.
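To see Example 2 in action, here is a minimal sketch of a complete program built around those two lines (the program name kmbug and the values of delta1 and delta2 are made up for illustration). It deliberately leaves out implicit none and does not declare kmdeg, so the first-letter rule makes kmdeg an integer.

```fortran
! kmbug.f90: hypothetical demonstration of the implicit typing bug in Example 2
program kmbug
   ! NOTE: no "implicit none" here, on purpose, so that the bug can occur
   real :: delta1, delta2, dist

   delta1 = 10.0
   delta2 = 12.5

   kmdeg = 111.19                    !kmdeg is implicitly an integer, so it becomes 111
   dist = (delta2 - delta1)*kmdeg    !2.5 * 111 = 277.5, not 2.5 * 111.19 = 277.975

   print *, "kmdeg = ", kmdeg
   print *, "dist  = ", dist
end program kmbug
```

If you now add implicit none after the program statement, the compiler should refuse to compile the program and complain that kmdeg has no type. Declaring it with real :: kmdeg should then give the intended answer of 277.975 (to within roundoff).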
ASSIGNMENT F2

Copy the program multint but leave out the integer :: a, b, c statement. What happens when you run the program? Why?

ASSIGNMENT F3

Cut and paste the following defective program onto your computer:

    program longjump
       beamon_long = 8.90    !distance in meters
       powell_long = 8.95
       dif_inch = (powell_long - beamon_1ong) * 39.37
       print *,"Powell jumped ",dif_inch," inches more than Beamon"
    end program longjump

Compile and run the program. Then explain why it gives the wrong answer.
4.4.1 Alternate coding options
There are always lots of ways to write the same program. Here is another way to write the multint.f90 code:

    program multint2
       implicit none
       integer :: a=2, b=3, c    !declare variables
       c = a * b; print *, "Product = ", c
    end program multint2

First, notice that variables can be assigned values when they are declared (OK in F90, don't try this in F77). Second, notice that more than one command can be included on a line if the commands are separated by a semicolon (again, only OK in F90). Both of these changes make the code more similar to C. The variable assigning option is a reasonable convenience, but the multiple command option is certainly misused in this case because it makes the code much harder to read. Unless you have a really good reason to put more than one command on a line (saving space is NOT a good reason!), I suggest that you never use semicolons in this way.
4.5 Making a formatted trig table using a do loop
Any reasonable programming language must provide a way to loop over a series of values for a variable. In C, this is most naturally implemented with the for statement. In FORTRAN this is done with the do loop. Here is an example program that generates a table of trig functions:
    program trigtable
       implicit none
       integer :: itheta
       real :: theta, stheta, ctheta, ttheta, degrad
       degrad = 180./3.1415927
       do itheta = 0, 89, 1
          theta = real(itheta)
          ctheta = cos(theta/degrad)
          stheta = sin(theta/degrad)
          ttheta = tan(theta/degrad)
          print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta
       enddo
    end program trigtable

Fortran (like C and Matlab) uses radians (not degrees) as the arguments of the trig functions. Thus, following the definitions of the variables as real, we assign degrad to 180/pi so that we can easily make this conversion.

    do itheta = 0, 89, 1

This begins the do loop, which must eventually be closed with the enddo statement. For clarity, we indent the inside of the loop. This is a loop over values of itheta from a starting value of 0, incremented by 1 each time, until itheta is greater than 89. The 1 is actually optional as it is the default increment. Thus itheta will assume the values (0, 1, 2, ..., 88, 89) inside the loop. Notice that we use an integer for the do loop. Older versions of Fortran (e.g., F77) permitted the use of real variables in do loops. This is not recommended, and roundoff peculiarities mean that the do loop would need to be written in the form:

    do theta = 0.0, 89.1, 1.0    !****WON'T WORK IN F95!

Note that we use 89.1 rather than 89.0 as the ending value to avoid the possibility that roundoff error might cause the desired ending value (computed by successively adding 1.0 to theta) to slightly exceed 89 and thus be excluded from the loop. I have a lot of old code of this form, but I'm going to slowly try to get rid of it.

Inside the do loop we first convert from the integer itheta to the real theta using:

    theta = real(itheta)

We then compute the cosine, sine and tangent of theta, after converting from degrees to radians by dividing by degrad (anybody see how we could make the program slightly more efficient?). To make the output look nice, we do not use

    print *, theta, ctheta, stheta, ttheta

which would space the numbers irregularly among the columns. Instead, we explicitly specify the output format using a format specification:

    print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta

where f5.1 specifies that theta will be output as a real number into 5 total spaces, with 1 digit to the right of the decimal place. Similarly, f6.4 specifies that ctheta is output into 6 spaces with 4 digits to the right of the decimal place. The numbers will be right justified, with leading blanks used as necessary. The 1x specifies that one blank character will be output between each of the numbers. An alternative way to write this:

    print "(    f5.1,   f7.4,   f7.4,   f8.4)", &
               theta, ctheta, stheta, ttheta

Here we have used the continuation character & to split the statement into two lines. Blanks are ignored so we can space things out neatly to line up the variables with their formats. Notice that we have removed the need for the 1x between formats by adding an additional column to the appropriate formats (i.e., writing f7.4 rather than f6.4). Because the numbers are right justified, this will add an additional space to the left of each number. Aligning things this neatly is probably more trouble than it's worth, but it certainly makes the code easier to understand. Notice that in this case the f7.4 appears twice in a row. Often programmers will write this more compactly as:

    print "(f5.1, 2f7.4, f8.4)", theta, ctheta, stheta, ttheta

since Fortran allows this syntax. Older Fortran programs usually put the format specifier into a separate numbered line, i.e.

        print 117, theta, ctheta, stheta, ttheta
    117 format (f5.1, 2f7.4, f8.4)
This convention is still allowed in F90 but should not be used unless you want to use the format statement more than once, e.g.,

        print 117, theta1, ctheta1, stheta1, ttheta1
        print 117, theta2, ctheta2, stheta2, ttheta2
    117 format (f5.1, 2f7.4, f8.4)

117 is termed a line label and must consist of digits. It is best not to refer to it as a line number (the old usage) because it normally has nothing to do with the line numbers and the labels need not be sequential (more about this later). Note that the format line need not immediately follow the print line(s); it can come before. Many older programs put all of the format statements at the end of the code. An alternative way in F90 to use the same format specification more than once is to define it as a character variable, i.e.,

    character (len = 30) :: fmt
    fmt = "(f5.1, 2f7.4, f8.4)"
    print fmt, theta1, ctheta1, stheta1, ttheta1
    print fmt, theta2, ctheta2, stheta2, ttheta2

but we are getting ahead of ourselves because we have not yet shown how to use character variables.
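The roundoff concern raised earlier in this section for real loop variables can be seen directly. The following is only a sketch (the program name stepdemo and its variable names are made up for illustration): it builds the same sequence of values in two ways, once by repeatedly adding a step of 0.1 and once by computing each value from an integer loop counter.

```fortran
program stepdemo
   implicit none
   integer :: i
   real :: xsum, xmul
   xsum = 0.0
   do i = 1, 10
      xsum = xsum + 0.1          ! accumulate the step; roundoff can build up
      xmul = 0.1 * real(i)       ! compute the value from the integer counter
      print "(i3, 2f12.8)", i, xsum, xmul
   end do
   ! compare both end values with the intended value of 1.0
   print *, "difference from 1.0: ", xsum - 1.0, xmul - 1.0
end program stepdemo
```

On many systems the accumulated value will end up slightly different from 1.0, while the value computed from the counter will not. The details depend on the compiler and machine, which is one more reason to prefer integer do loop variables.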
4.5.1 Fortran mathematical functions

We used the Fortran sine, cosine and tangent functions in the trigtable program. Here is a complete list of math functions:

    acos(x)       arccosine
    asin(x)       arcsine
    atan(x)       arctangent
    atan2(y,x)    arctangent of y/x in correct quadrant (***very useful!)
    cos(x)        cosine
    cosh(x)       hyperbolic cosine
    exp(x)        exponential
    log(x)        natural logarithm
    log10(x)      base 10 log
    sin(x)        sine
    sinh(x)       hyperbolic sine
    sqrt(x)       square root
    tan(x)        tangent
    tanh(x)       hyperbolic tangent

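To see why atan2 is flagged as very useful in the list above, here is a small sketch (the program name atan2demo and the test point are chosen only for illustration) that compares atan and atan2 for a point in the third quadrant.

```fortran
program atan2demo
   implicit none
   real :: x, y, raddeg
   raddeg = 3.1415927/180.       ! radians per degree
   x = -1.0
   y = -1.0
   ! atan sees only the ratio y/x = 1, so it cannot tell (-1,-1) from (1,1)
   print "(a, f8.2)", "atan(y/x)   (deg) = ", atan(y/x)/raddeg
   ! atan2 uses the signs of both arguments to pick the quadrant
   print "(a, f8.2)", "atan2(y, x) (deg) = ", atan2(y, x)/raddeg
end program atan2demo
```

The first line should give 45 degrees and the second -135 degrees. Only the second corresponds to the actual direction of the point (-1,-1).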
4.5.2 Possible integer vs. real problems

As an aside, note that the trigtable program uses:

    degrad = 180./3.1415927

rather than simply

    degrad = 180/3.1415927

The reason is to make completely sure that the program will compute a real quotient and not an integer quotient. In fact, this caution is not needed in this case, as the following program demonstrates:

    program testfrac
       implicit none
       real c
       c = 2/3
       print *, '2/3 = ', c
       c = 2/3.
       print *, '2/3. = ', c
       c = 2./3
       print *, '2./3 = ', c
       c = 2./3.
       print *, '2./3. = ', c
    end program testfrac

Running the program yields:

    2/3 = 0.0E+0
    2/3. = 0.6666667
    2./3 = 0.6666667
    2./3. = 0.6666667

As long as one part of the fraction is real, the program will compute a real quotient. It is only when both numbers are written as integers that the result is truncated. However, I have gotten into the habit of always including the decimal point in real expressions to avoid someday accidentally writing something like:

    a = sin(phi/degrad) + (2/3) * cos(theta/degrad)**2

which will definitely produce the wrong answer!
4.5.3 More about formats

There are many different formats that can be specified. Here are some common examples:

    i5     = integer, 5 spaces, right justified
    i5.4   = as above but pad with zeros to 4 spaces (88 becomes 0088)
    f8.3   = real, 8 spaces, 3 to right of decimal place
    e12.4  = real output with exponent, 4 places to right of decimal
             e.g., b-0.2342E+02 where "b" is blank
             (useful for big or small numbers or when you are not
             sure what size they will be and want to be sure you
             have enough room to output them)
             (cranky aside: why always start with "0." which wastes
             a space? -2.342E+01 would be more compact)

If a number does not fit into the allocated spaces, it will appear as a series of asterisks (*****)

    a8     = character output, 8 spaces, right justified
             (if the length of the string is greater than 8, then
             the leftmost 8 characters will appear)
    2x     = output two blanks
    tn     = tab to position n
    tln    = tab left n spaces
    trn    = tab right n spaces (tr2 is the same as 2x)
    /      = start a new line

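The format descriptors listed above are easiest to understand by trying them. The following sketch (the program name fmtdemo and the sample values are arbitrary) prints each value between square brackets so that the width of every field, including its leading blanks, is visible.

```fortran
program fmtdemo
   implicit none
   integer :: n
   real :: x
   n = 88
   x = -23.42
   print "(a, i5, a)",    "[", n, "]  i5"
   print "(a, i5.4, a)",  "[", n, "]  i5.4"
   print "(a, f8.3, a)",  "[", x, "]  f8.3"
   print "(a, e12.4, a)", "[", x, "]  e12.4"
   ! 12345.6 needs 7 characters, so it cannot fit in f5.1
   print "(a, f5.1, a)",  "[", 12345.6, "]  f5.1, number too wide"
   print "(a, a8, a)",    "[", "abc", "]  a8"
   ! t10 moves to column 10 before the second string is written
   print "(a, t10, a)",   "col 1", "col 10"
   ! the slash starts a new line between the two integers
   print "(i3 / i3)", 1, 2
end program fmtdemo
```

Changing the widths in the format strings and rerunning the program is a quick way to check a format before using it in a longer program.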
ASSIGNMENT F4

Write a F90 program to print a table of x, sinh(x), and cosh(x) (the hyperbolic sine and cosine, these are built-in functions in Fortran) for values of x ranging from 0.0 to 6.0 at increments of 0.5 (use these x values directly in sinh(x) and cosh(x), do not convert them to radians). Use a suitable format to make nicely aligned columns of numbers.
4.6 Input using the keyboard

So far all of our example programs have run without prompting the user for any information. To expand our abilities, let's learn how to input data from the keyboard. In most programs, we will want to first prompt the user to input the data, so here is an example of how to input two numbers:

    print *, "Enter two integers"
    read *, a, b

Pretty easy, isn't it? (Compare this section with the corresponding input section in the C or Python notes and you will see why I think Fortran is more user friendly than C or Python.)
Here is an example of a complete program that multiplies two numbers:

    program usermult
       implicit none
       integer :: a, b, c
       print *, "Enter two integers"
       read *, a, b
       c = a * b
       print *, "Product = ", c
    end program usermult

Running this program, we have:

    rock% usermult
    Enter two integers
    2 5
    Product = 10

The program will also accept the input on two lines:

    rock% usermult
    Enter two integers
    2
    5
    Product = 10

Often in cases like this I will forget that I'm supposed to input more than one number. When the program just sits there, I then realize that I need to input more numbers (or I wonder why it's taking so long to finish!). What happens if we make a mistake and try entering a real number? Let's check:

    rock% usermult
    Enter two integers
    3.1 15
    ****** FORTRAN RUN-TIME SYSTEM ******
    Error 1083: unexpected character in integer value
    Location: the READ statement at line 4 of "usermult.f90"
    Unit: *
    File: standard input
    Input: 3.1
            ^
    Abort
    rock%

This is what happens on my old Sun computer, but most Fortran compilers will produce a similar message. The error message in this case is quite informative and tells us exactly what the problem is. This is better than performing the computation and returning the wrong answer (the default in C for this example unless you are quite careful).
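If you would rather have the program recover from a bad entry than abort, the read statement accepts an optional iostat specifier. This goes beyond what these notes have covered so far, so treat the following as a sketch only (the program name safemult and the messages are made up): iostat returns zero when the read succeeds, a positive value on an error, and a negative value at the end of the input.

```fortran
program safemult
   implicit none
   integer :: a, b, ios
   do
      print *, "Enter two integers"
      read (*, *, iostat=ios) a, b
      if (ios == 0) exit            ! both values were read successfully
      if (ios < 0) stop             ! end of input, so give up
      print *, "Could not read two integers, please try again"
   end do
   print *, "Product = ", a * b
end program safemult
```

The exact values returned in ios for the different error conditions are compiler dependent; only their signs should be relied upon.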
4.7 If statements

Next, let's modify this program so that it will allow the user to continue entering numbers until he/she wants to stop:

    program usermult2
       implicit none
       integer :: a, b, c
       do
          print *, "Enter two integers (zeros to stop)"
          read *, a, b
          if (a == 0 .and. b == 0) exit
          c = a * b
          print *, "Product = ", c
       enddo
    end program usermult2

Here the do loop has no arguments and the block of code inside the do loop will be repeatedly executed until an exit command is executed. Exit (a new feature in F90) means to leave the do loop entirely and go to the next line after the enddo statement. As in our previous example, we indent the block inside the do loop to make the code easier to read. The program will allow the user to continue entering numbers to be multiplied. When the user wishes to stop the program (in a more elegant way than hitting CNTRL-C), he/she enters zeros for both arguments. The if statement checks for this and exits the do loop in this case:

    if (a == 0 .and. b == 0) exit

A list of the relational (comparison) operators in different languages is as follows:

    FORTRAN 77   FORTRAN 90   C     PYTHON   MATLAB   meaning
    .eq.         ==           ==    ==       ==       equals
    .ne.         /=           !=    !=       ~=       does not equal
    .lt.         <            <     <        <        less than
    .le.         <=           <=    <=       <=       less than or equal to
    .gt.         >            >     >        >        greater than
    .ge.         >=           >=    >=       >=       greater than or equal to
    .and.        .and.        &&    and      &        and
    .or.         .or.         ||    or       |        or
    .not.        .not.        !     not      ~        not

The F77 syntax will still work under F90 and you are likely to see this in many of the older programs, e.g.,

    if (a .eq. 0 .and. b .eq. 0) exit

will also work. These operators can be combined to make complex tests, e.g.,

    if ( (a > b .and. c <= 0) .or. d == 0) z = a

There is, of course, an order of operations for these things which I can't remember very well. Look it up in a book if you are unsure or, better, just put in enough parentheses to make it completely clear to anyone reading your code. One nice aspect of Fortran compared to C is that if you make a mistake and type, for example,

    if (a = 0) exit

you will get an error message during compilation. In C this is a valid statement with a completely different meaning than is intended!
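For reference, the order of operations mentioned above is as follows in standard Fortran: the relational operators are evaluated first, then .not., then .and., and finally .or. The sketch below (the program name precdemo is made up, and it uses variables of type logical, which simply hold true or false values) shows a case where the placement of parentheses changes the answer.

```fortran
program precdemo
   implicit none
   logical :: p, q, r
   p = .true.
   q = .false.
   r = .false.
   ! .and. binds more tightly than .or., so this is p .or. (q .and. r)
   print *, "p .or. q .and. r   = ", p .or. q .and. r
   ! parentheses force the .or. to be evaluated first
   print *, "(p .or. q) .and. r = ", (p .or. q) .and. r
end program precdemo
```

The first expression is true and the second is false, so the advice above stands: when in doubt, add the parentheses.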
4.7.1 If, then, else constructs

In the above example, a single statement is executed when the if condition is true. A more versatile form is as follows:

    if (logical expression) then
       (block of code)
    else if (logical expression) then
       (block of code)
    else if (logical expression) then
       (block of code)
    .
    .
    else
       (block of code)
    end if

The blocks of code can contain many lines if desired. As many else if statements as required can be used. At most, one block of code will be executed (once one of the if tests is satisfied, it does not check the others). The final else will be executed if none of the preceding if statements is true. The final else is optional. Here is a demonstration program that repeatedly prompts the user for a positive real number. If it is negative, it asks the user to try again. If it is positive, it computes and displays the square root using the sqrt() function. If the user enters zero, the program stops.

    program usersqrt
       implicit none
       real :: a, b
       do
          print *, 'Enter positive real number (0 to stop)'
          read *, a
          if (a < 0) then
             print *, 'This number is negative!'
             cycle
          else if (a == 0) then
             exit
          else
             b = sqrt(a)
             print *, 'sqrt = ', b
          end if
       end do
    end program usersqrt

Notice the use of the cycle command (also new in F90), which directs the program to the next iteration of the do loop. In contrast, the exit command exits the do loop entirely. The cycle and exit commands permit code to be written that is free of the go to statements that would likely have been present in a F77 version of this program (go to statements are considered mortal sins by the programming style police).

ASSIGNMENT F5

Write a program to repeatedly ask the user for the constants a, b, and c in the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the program identify and compute any real roots. Output the number of real roots and their values. Stop the program if the user enters zeros for all three values. HINT: Test your program for some simple examples to make sure it is working correctly (a=1, b=2, c=-3 should return -3 and 1).
4.8 Greatest common factor example

Here is an example program that uses some of the concepts that we have just learned:

    ! compute the greatest common factor of two integers
    program gcf
       implicit none
       integer :: a, b, i, imax
       do
          print *, 'Enter two integers (zeros to stop)'
          read *, a, b
          if (a == 0 .and. b == 0) then
             exit
          else
             do i = 1, min(a, b)
                if (mod(a, i) == 0 .and. mod(b, i) == 0) imax=i
             end do
             print *, "Greatest common factor = ", imax
          end if
       enddo
    end program gcf

This is not a particularly efficient algorithm, but it runs plenty fast for small numbers. There are some new things here:

1. min(a,b) computes the minimum of a and b using the intrinsic Fortran min function. Naturally there is also a max function. min and max can have more than 2 arguments if desired.

2. mod(a,i) computes the remainder of a divided by i (this is called the modulus). If mod(a,i) is zero, then a is evenly divisible by i. If non-zero, then mod(a,i) has the sign of a.

Here are some more useful Fortran functions:

    abs(a)        absolute value
    sign(a, b)    abs(a) with sign of b
    real(a)       conversion to real (F77 float(a) still works)
    int(a)        conversion to integer
    nint(a)       nearest integer
    ceiling(a)    least integer greater than or equal to number
    floor(a)      greatest integer less than or equal to number

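The differences between the conversion functions listed above show up most clearly for negative numbers. Here is a short sketch (the program name intrdemo and the test value -2.7 are arbitrary) that prints what each function returns.

```fortran
program intrdemo
   implicit none
   real :: x
   x = -2.7
   print *, "int(x)      = ", int(x)        ! truncates toward zero: -2
   print *, "nint(x)     = ", nint(x)       ! nearest integer: -3
   print *, "floor(x)    = ", floor(x)      ! -3
   print *, "ceiling(x)  = ", ceiling(x)    ! -2
   print *, "abs(x)      = ", abs(x)
   print *, "sign(3., x) = ", sign(3., x)   ! magnitude of 3. with the sign of x
   print *, "mod(7, 3)   = ", mod(7, 3)     ! 1
   print *, "mod(-7, 3)  = ", mod(-7, 3)    ! -1, the sign follows the first argument
   print *, "min, max    = ", min(4, 9, 2), max(4, 9, 2)
end program intrdemo
```

Note in particular that int and floor disagree for negative arguments, which is a common source of off-by-one errors when binning data.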
ASSIGNMENT F6

Modify gcf.f90 to compute the least common multiple of two integers.
4.9 User defined functions and subroutines

As the length and complexity of a computer program grows, it is a good strategy to break the problem down into smaller pieces by defining functions or subroutines to perform smaller tasks. This provides several advantages:

1. You can test these pieces individually to see if they work before trying to get the complete program to work.

2. Your code is more modular and easier to understand.

3. It is easier to use parts of the program in a different program.

To illustrate how to define your own function, here again is the greatest common factor program:

    program gcf2
       implicit none
       integer :: a, b, gcf
       integer, external :: getgcf
       do
          print *, 'Enter two integers (zeros to stop)'
          read *, a, b
          if (a == 0 .and. b == 0) then
             exit
          else
             gcf = getgcf(a,b)
             print *, "Greatest common factor = ", gcf
          end if
       enddo
    end program gcf2

    integer function getgcf(x, y)
       implicit none
       integer :: x, y, i, z
       do i = 1, min(x,y)
          if (mod(x,i) == 0 .and. mod(y,i) == 0) getgcf = i
       end do
    end function getgcf

We now perform the gcf calculation in the function getgcf. The variables a and b in the main program are the function arguments. They are passed to the function in the statement:

    gcf = getgcf(a, b)

The function getgcf definition begins with:

    integer function getgcf(x, y)

The variables x and y in the function will assume the values passed to the function by the main program. These arguments must match in number and type (real vs. integer, etc.) with the variables in the main program. Note, however, that they do not need to have the same names. Notice that we must declare getgcf in the calling program:

    integer, external :: getgcf

This is the preferred F90 syntax, although the following will also work:

    integer :: getgcf

The name of the function is by default the value that will be passed back to the main program. In some cases involving recursive functions (a more complicated topic that we may cover later), it may be desirable to have the value returned to the main program be specified by a different variable name. To do this, we can use the optional result specifier in the function statement:

    function getgcf(x, y) result(z)
       implicit none
       integer :: x, y, i, z
       do i = 1, min(x, y)
          if (mod(x, i) == 0 .and. mod(y, i) == 0) z = i
       end do
    end function getgcf

The result(z) indicates that the result will be passed back to the calling program as the variable z. Thus, gcf in the calling program will assume the value of z in the function.

ASSIGNMENT F7

Modify your program from F6 to compute the least-common multiple as a user-defined function.
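One benefit of putting the calculation into a function is that the algorithm can later be replaced without touching the main program. As an illustration, here is a sketch of a faster alternative based on Euclid's algorithm, which is not the method used in these notes. The names gcf3 and euclid are made up for this example, and the function assumes that both arguments are positive.

```fortran
program gcf3
   implicit none
   integer, external :: euclid
   print *, "gcf of 12 and 18    = ", euclid(12, 18)
   print *, "gcf of 1071 and 462 = ", euclid(1071, 462)
end program gcf3

! Hypothetical helper (not from the notes): greatest common factor of two
! positive integers by Euclid's algorithm, returned in the result variable g
function euclid(x, y) result(g)
   implicit none
   integer :: x, y, g, r, other
   g = x                      ! work on copies so x and y are never changed
   other = y
   do
      if (other == 0) exit    ! when the remainder reaches zero, g is the answer
      r = mod(g, other)
      g = other
      other = r
   end do
end function euclid
```

The two test cases should give 6 and 21. The function copies its arguments into local variables before changing anything, which matters here because it is called with constants.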
4.9.1 Subroutines

Functions are limited in their usefulness because they are designed to pass only one value back to the calling program. A more general construct is the Fortran subroutine, which allows unlimited numbers of values to be passed to and from the calling program. Here is our first geophysically useful example, a subroutine to compute the distance and azimuth between any two points on the Earth's surface:

    program userdist
       implicit none
       real lat1, lon1, lat2, lon2, del, azi
       do
          print *, 'Enter 1st point lat, lon'
          read *, lat1, lon1
          print *, 'Enter 2nd point lat, lon'
          read *, lat2, lon2
          call SPH_AZI(lat1, lon1, lat2, lon2, del, azi)
          print *, 'del, azi = ', del, azi
       end do
    end program userdist

    ! SPH_AZI computes distance and azimuth between two points on sphere
    !
    ! Inputs:  flat1 = latitude of first point (degrees)
    !          flon1 = longitude of first point (degrees)
    !          flat2 = latitude of second point (degrees)
    !          flon2 = longitude of second point (degrees)
    ! Returns: del   = angular separation between points (degrees)
    !          azi   = azimuth at 1st point to 2nd point, from N (deg.)
    !
    ! Notes:
    !
    ! (1) applies to geocentric not geographic lat,lon on Earth
    !
    ! (2) This routine is inaccurate for del less than about 0.5 degrees.
    !     For greater accuracy, use double precision or perform a separate
    !     calculation for close ranges using Cartesian geometry.
    !
    subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
       implicit none
       real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
               phi1,phi2,stheta1,stheta2,ctheta1,ctheta2, &
               sang,cang,ang,caz,saz,az
       if ( (flat1 == flat2 .and. flon1 == flon2) .or. &
            (flat1 == 90. .and. flat2 == 90.) .or. &
            (flat1 == -90. .and. flat2 == -90.) ) then
          del=0.
          azi=0.
          return
       end if
       pi=3.141592654
       raddeg=pi/180.
       theta1=(90.-flat1)*raddeg
       theta2=(90.-flat2)*raddeg
       phi1=flon1*raddeg
       phi2=flon2*raddeg
       stheta1=sin(theta1)
       stheta2=sin(theta2)
       ctheta1=cos(theta1)
       ctheta2=cos(theta2)
       cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
       ang=acos(cang)
       del=ang/raddeg
       sang=sqrt(1.-cang*cang)
       caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
       saz=-stheta2*sin(phi1-phi2)/sang
       az=atan2(saz,caz)
       azi=az/raddeg
       if (azi.lt.0.) azi=azi+360.
    end subroutine SPH_AZI

The subroutine is called with the statement:

    call SPH_AZI(lat1, lon1, lat2, lon2, del, azi)

In this case, the lat/lon values are passed to the subroutine while del and azi are passed back to the main program. However, note that if flat1, etc., were changed in the subroutine, then the corresponding variable would also be changed in the main program. lat1 in the main program and flat1 in the subroutine point to the same memory location. If one is changed, the other automatically changes as well.

Fortran is not case-sensitive so sph_azi and SPH_AZI have the same meaning. I like to put subroutine names in all caps so they are more visible. Note that sph_azi does not have to be declared in the main program.

The subroutine is well-documented in this case, explaining exactly what is going into the subroutine and what is going out, as well as some of the limitations of the routine. This may seem like overkill, but documenting your subroutines as completely as possible is likely to save you considerable time later if you ever want to use the routine again. It is good to document the routine well enough that you, or someone else, can use it correctly without having to study the code itself. Clarity is important: I have seen versions of this routine that do not make clear whether the azimuth is measured at the first point to the second point, or vice versa. Listing the limits of the subroutine may help prevent future misuse of the routine; in this example it may prevent the naive user from assuming that the distance returned is accurate when used with geographic latitude and longitude on the Earth. The routine is also designed to be robust with respect to pathological inputs, such as when the two points have the same coordinates.

ASSIGNMENT F8

Write a single subroutine that computes the volume, surface area, and circumference of a sphere, given its radius, together with a main program that inputs different values for the radius from the keyboard and prints the results. Allow the user to terminate the program by entering zero for the radius. E-mail me the source code in a single file containing both the main program and the subroutine.
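The point made in this section, that a subroutine argument and the corresponding variable in the calling program share the same memory location, is easy to verify. In the following sketch (refdemo and DOUBLE_IT are hypothetical names, not routines from these notes) the subroutine returns a result in its second argument but also, carelessly, overwrites its first argument.

```fortran
program refdemo
   implicit none
   real :: a, b
   a = 3.0
   b = 0.0
   print *, "before call: a, b = ", a, b
   call DOUBLE_IT(a, b)
   print *, "after call:  a, b = ", a, b
end program refdemo

! Hypothetical routine (not from the notes): returns twice x in y, but
! also (carelessly) changes x itself
subroutine DOUBLE_IT(x, y)
   implicit none
   real :: x, y
   y = 2.*x
   x = 0.         ! this also resets a in the main program
end subroutine DOUBLE_IT
```

After the call, b is 6.0 as intended but a has been reset to zero. F90 also lets you declare an argument with the intent(in) attribute (e.g., real, intent(in) :: x), in which case the compiler will reject any attempt to change it inside the routine.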
4.9.2 Linking to subroutines during compilation

A powerful aspect of subroutines is that one can link to compiled versions of existing subroutines without having to recompile them. This means that you only have to maintain one version of a subroutine; it need not be listed along with the source code of the main program. For example, the SPH_AZI subroutine is also part of a F77 package of spherical geometry subroutines contained in:

    ~shearer/PROG/SUBS/sphere_subs.f

Our main program could simply consist of:

    program userdist2
       implicit none
       real flat1,flon1,flat2,flon2,del,azi
       do
          print *, 'Enter 1st point lat,lon'
          read *, flat1,flon1
          print *, 'Enter 2nd point lat,lon'
          read *, flat2,flon2
          call SPH_AZI(flat1,flon1,flat2,flon2,del,azi)
          print *, 'del, azi = ', del, azi
       end do
    end program userdist2

When we compile this program, we need to indicate where the SPH_AZI subroutine can be found. We want to link with what is called an object file for the subroutines. Object files end in .o and there is a sphere_subs_f90.o file in the same directory (~shearer/PROG/SUBS) as the source file sphere_subs.f. You must link with object files that have been compiled using the same Fortran compiler as you use for the main program. When I switched to using gfortran from g77 a year or two ago, I had to recompile my subroutines in order for them to link properly with gfortran compiled code. I still occasionally run into this problem when I link to a subroutine I have not used in a while. To create a F90 object file, enter, for example:

    gfortran -c someprogram.f90 -o someprogram.o

You can also create a F90 object file from F77 source code:

    gfortran -c anotherprogram.f -o anotherprogram.o

This will work on native F77 code that includes non-F90 compatible features (e.g., comment lines starting with C), because the F90 compiler sees that the source file ends in .f rather than .f90. To compile userdist2 and link to sphere_subs.o, we enter:

    gfortran userdist2.f90 /home/shearer/PROG/SUBS/sphere_subs.o -o userdist2

This quickly becomes cumbersome to write so you will find it convenient to set up a Makefile to keep track of all this for you. Here is part of a Makefile that does this for this program:

    OBJS1= /home/shearer/PROG/SUBS/sphere_subs_f90.o

    userdist2: Makefile userdist2.f90 $(OBJS1)
    	gfortran userdist2.f90 $(OBJS1) -o userdist2

Note that you MUST use a tab to generate the spaces before gfortran for the Makefile to work correctly! This is a leading source of confusion for novice Makefile users. You can list the full path names for any number of subroutine object files in this way. To compile the program, simply enter:

    make userdist2

ASSIGNMENT F9

Study the SPH_MID subroutine contained in sphere_subs.f (in ~shearer/PROG/SUBS). Write a program that uses this subroutine to compute the midpoint between two (lat,lon) points input by the user. Print out the latitude and longitude of this point. Do not attempt to copy the SPH_MID source code, just link to the sphere_subs_f90.o object file. Make sure that the argument list for SPH_MID in your program matches the subroutine arguments. E-mail me the source code and also tell me where the working executable version of the program is located. Be sure to give me execute permission so that I can try running your program. (Remember: ls -l shows the current permissions, chmod is how you change the permissions, man chmod will explain this.) If you are at sea (!), then you may have to copy the sphere_subs.f routines to your local machine. You can make an object file from them using

    gfortran -c sphere_subs.f -o sphere_subs.o

4.10 Internal procedures

The functions and subroutines that we discussed above are called external because they are located outside of the main program. With external procedures all of the values that are to be passed into and out of the procedure must be part of the argument list. All other variables are local to the procedure or to the main program, even if they have the same name as a variable in a different procedure. External procedures are the only kind of procedure allowed in F77 and are what you will find in books like Numerical Recipes or in the subroutine libraries maintained by various scientists at SIO. Their great advantage is their portability: everything you need to know about what they do is contained in their argument list.

However, in some cases it may be more convenient to use an internal subroutine or function, a method that is new to F90. Internal procedures are listed immediately BEFORE the end statement in the main program. All variable names are shared within internal procedures; thus argument lists are not necessarily required for them to work. Here is an example adapted from the Brainard text that illustrates how internal subroutines work:

    program sort3
       implicit none
       real :: a1, a2, a3
       call read_numbers
       call sort_numbers
       call print_numbers

    contains

       subroutine read_numbers
          print *, 'Enter three numbers'
          read *, a1, a2, a3
       end subroutine read_numbers

       subroutine sort_numbers
          if (a1 > a2) call swap(a1,a2)
          if (a1 > a3) call swap(a1,a3)
          if (a2 > a3) call swap(a2,a3)
       end subroutine sort_numbers

       subroutine print_numbers
          print *, 'The sorted numbers are:'
          print *, a1, a2, a3
       end subroutine print_numbers

       subroutine swap(x,y)
          real :: x, y, temp
          temp = x
          x = y
          y = temp
       end subroutine swap

    end program sort3

Internal procedures are listed following a contains statement and before the end statement for the main program. Variables already declared in the main program need not be declared in the internal procedure. Arguments are optional; they are used here in the swap routine to permit it to be used for different pairs from a1, a2 and a3. Note that the procedures are not nested (e.g., one might have tried to put swap internal to sort_numbers); internal procedures may not contain other internal procedures. Note also that variables declared within a subroutine are purely local to the subroutine. For example, the value for temp (in swap) is not available in the main program. Even if temp were declared in the main program, its value will not correspond to the value of temp in the swap subroutine (try it!).

The use of internal subroutines is rather pointless in this case because the code would probably be clearer without them. However, for longer programs they may well be useful for making the code more structured. If you write a program and notice that you are using the same block of code over and over again, this would be a situation where using a subroutine would make sense. The advantage of an internal subroutine is that you don't have to match the argument lists and declare all of the variables. External subroutines can become cumbersome when they involve a large number of variables. Often common blocks are used to avoid long argument lists in this case (more about this later). In many cases, internal subroutines may provide a neater way to handle this situation.
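The "try it!" suggestion above, about a variable that is declared both in the main program and inside an internal subroutine, can be carried out with a few lines. In this sketch (the names scopedemo, change_values and shared are made up for illustration), temp is declared in both places while shared is declared only in the main program.

```fortran
program scopedemo
   implicit none
   real :: temp, shared
   temp = 1.0
   shared = 1.0
   call change_values
   ! temp is still 1.0 here, but shared has been changed to 99.0
   print *, "temp, shared = ", temp, shared
contains
   subroutine change_values
      real :: temp      ! a new local variable that hides the main program's temp
      temp = 99.0       ! changes only the local copy
      shared = 99.0     ! not declared here, so this is the main program's variable
   end subroutine change_values
end program scopedemo
```

Removing the declaration of temp inside change_values makes the subroutine use the main program's temp instead, so the printed value changes to 99.0. This is convenient, but it also means that a forgotten local declaration can silently modify a variable in the main program.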
4.11 Extended precision

By default, Fortran stores real variables using 4 bytes, providing a precision of about 6 to 7 significant figures over a range from (on the Suns) 1.175494e-38 to 3.402823e+38. If this is insufficient precision, then variables can be defined in double precision. In F77, this was done by declaring them as double precision or real*8 variables. This syntax will still work, although F90 has introduced a new concept, the kind specifier for variables. All of these methods are demonstrated in this program:

    program testdouble
       implicit none
       character (len = 30) :: fmt = "(a5,f44.40)"
       real a4
       real*8 a8
       double precision aa
       real (kind=4) :: k4
       real (kind=8) :: k8
       real (kind=16) :: k16    !only include if using 64-bit machine + compiler
       a4 = 8.9
       print fmt, 'a4 = ', a4
       a8 = 8.9d0
       print fmt, 'a8 = ', a8
       aa = 8.9d0
       print fmt, 'aa = ', aa
       k4 = 8.9
       print fmt, 'k4 = ', k4
       k8 = 8.9_8
       print fmt, 'k8 = ', k8
       k16 = 8.9_16             !only include if using 64-bit machine + compiler
       print fmt, 'k16= ', k16  !only include if using 64-bit machine + compiler
    end program testdouble

which produced the following output on my old Sun computer:

    a4 = 8.8999996185302734375000000000000000000000
    a8 = 8.9000000000000003552713678800500929355621
    aa = 8.9000000000000003552713678800500929355621
    k4 = 8.8999996185302734375000000000000000000000
    k8 = 8.9000000000000003552713678800500929355621
    k16= 8.900000000000000000000000000000000308148

and which produces the following output on my Mac running 32-bit gfortran:

    a4 = 8.8999996185302734375000000000000000000000
    a8 = 8.9000000000000003552714000000000000000000
    aa = 8.9000000000000003552714000000000000000000
    k4 = 8.8999996185302734375000000000000000000000
    k8 = 8.9000000000000003552714000000000000000000

after I comment out the k16 lines. Note that a4 and k4 are regular real*4 variables and approximate 8.9 as 8.8999996. Much greater precision is obtained with a8, aa, and k8, which are all real*8 (double precision) variables. k16 is a real*16 variable (quadruple precision). NOTE: k16 does not currently work with 32-bit gfortran. Does it work on your computer?

On the Suns, kind=4 is for single precision (4-byte real), kind=8 is for double precision (real*8), and kind=16 is for 16-byte real. Note that the precision of numbers may be specified by appending them with an underscore and the kind value. Thus we write 8.9_8 to indicate that 8.9 is to be represented as an 8-byte real number. The use of kind is intended to give you greater control over the degree of precision in your programs and ultimately make codes that are less platform dependent in their behavior. I'm not sure if this has been achieved! It appears that F90 also allows complex variables to be declared as double precision, something not allowed in F77. We will check if this is actually true later when we consider complex variables.

IMPORTANT NOTE: You must define the extra-precision variables using numbers with the appropriate precision. If you write:

    a8 = 8.9          !***Not correct if a8 is real*8

you will get single precision accuracy. Even the dble operator (which works in F77, at least on the Suns) will not work in F90 in assigning variables:

    a8 = dble(8.9)    !***Not correct if a8 is real*8

will assign a8 only at single precision accuracy. The key is to write:

    a8 = 8.9d0        !correct way to set double precision variables

or

    k8 = 8.9_8

Note that the 0 following the d is the power of 10, thus 1.23d3 is 1230, 0.84d-2 is 0.0084, etc. To get the full 16-byte precision (only on the Suns or 64-bit Macs), you must say

    k16 = 8.9_16

rather than k16=8.9d0 or k16=8.9_8, which will assign k16 only to double precision accuracy.

Many of my existing F77 routines use dble( ) to define double precision numbers. These will not work correctly if they are changed to F90, unless all of the dble( ) commands are rewritten. They are fine, however, if they continue to be compiled using F77. I don't know if this is a bug in Sun F90, or if the use of dble( ) in this way is non-standard Fortran.

ASSIGNMENT F10

Examine the datetime.f source code in ~shearer/PROG/SUBS (also see class website) and write a F90 program to compute the number of seconds that have elapsed since noon on your birth date (or the exact time if you know it) until a user specified date and time. Have the program print out this number of seconds. You will want to use the DT_TIMEDIF subroutine. Make sure that all of the variables match and are of the same type (integer, real, and real*8 for timdif, the final argument). You should also be aware that & in column 6 of F77 code is how lines are continued, that is:

          subroutine DT_TIMEDIF(iyr1,imon1,idy1,ihr1,imn1,sc1,
         &                      iyr2,imon2,idy2,ihr2,imn2,sc2,timdif)

is actually all one line. E-mail me the source code of your program.

Extra credit: Determine the date and time when you will be (or were) exactly one billion seconds old. This will require using one of the other subroutines in datetime.f. Mark your calendar for a party!
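The kind values 4, 8 and 16 used in this section are what the Sun and gfortran compilers happen to use; the language does not fix them. F90 provides the intrinsic function selected_real_kind so that a program can request a precision without hard-wiring a kind number. The sketch below (kinddemo and dp are made-up names) uses it and also repeats the comparison of the three ways of assigning 8.9 discussed above. As far as the language rules go, a constant written as 8.9 is a default (single precision) real, so it has already been rounded before the assignment, or before dble( ) is applied.

```fortran
program kinddemo
   implicit none
   ! selected_real_kind(p) returns a kind with at least p significant digits
   integer, parameter :: dp = selected_real_kind(12)
   real (kind=dp) :: x1, x2, x3
   x1 = 8.9              ! single precision constant, converted afterwards
   x2 = dble(8.9)        ! same problem: 8.9 is rounded before dble sees it
   x3 = 8.9_dp           ! constant carries the full precision of kind dp
   print "(a, f22.18)", "x1 = ", x1
   print "(a, f22.18)", "x2 = ", x2
   print "(a, f22.18)", "x3 = ", x3
end program kinddemo
```

On typical current machines dp will correspond to an 8-byte real, and x1 and x2 should both show the single precision approximation of 8.9 while x3 is accurate to about 15 or 16 digits.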
4.11.1 Integer sizes
Just as you can set aside fixed numbers of bytes for real numbers, you can do the same to store integers of varying sizes. The standard is 4-byte integers, which have 32 bits and thus will go (approximately) from -2^31 to 2^31. However, 2-byte and 8-byte integers are also allowed. These are sometimes called short and long integers, respectively. The following example program shows how this works, using two different ways to define the integers:

    program testinteger
       implicit none
       integer*2 i2
       integer i4
       integer*8 i8
       integer (kind=2) :: k2
       integer (kind=4) :: k4
       integer (kind=8) :: k8
       i8 = 32000
       i4 = i8
       i2 = i8
       print *, i2, i4, i8
       i8 = 32000**2
       i4 = i8
       i2 = i8
       print *, i2, i4, i8
       i8 = 32000
       i8 = i8**4
       i4 = i8
       i2 = i8
       print *, i2, i4, i8
    end program testinteger

This will produce the output:

    32000       32000                32000
        0  1024000000           1024000000
        0           0  1048576000000000000

The zero fields in the output are incorrect and indicate that the number was too large to be stored in the given variable type. (The overflowed values happen to come out as zero here because 32000**2 and 32000**4 are exact multiples of 2^16 and 2^32; in general an overflowed integer can be any wrong value.) Note that k2, k4 and k8 (defined using the kind statement) will give the same results, even though they are not explicitly tested in the program.

Regular integers suffice for most purposes. Short integers are useful to save space for data sets that don't require bigger numbers (i.e., beyond about 32000).
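The limits discussed in this section can also be obtained directly from the F90 intrinsic function huge, which returns the largest value that a variable of a given type and kind can hold. The following is a minimal sketch; the program name intlimits is made up, and it assumes (as with gfortran) that kind values 2, 4 and 8 correspond to 2-, 4- and 8-byte integers.

```fortran
program intlimits
   implicit none
   integer (kind=2) :: k2
   integer (kind=4) :: k4
   integer (kind=8) :: k8
   ! huge(x) depends only on the type and kind of x, not on its value
   print *, huge(k2)      ! largest 2-byte integer
   print *, huge(k4)      ! largest 4-byte integer
   print *, huge(k8)      ! largest 8-byte integer
end program intlimits
```

Under these assumptions the three values printed are 32767, 2147483647 and 9223372036854775807, i.e., 2^15 - 1, 2^31 - 1 and 2^63 - 1.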
4.12 Arrays
Arrays are defined in F90 as in this example:

    real, dimension(100) :: a, b
    integer, dimension(50,2) :: index

which defines a and b as 100 element arrays (a(1) to a(100)) and index as a 50x2 element array (index(1,1) to index(50,2)). Note that the default starting array number is 1 (not 0 as in C). Also note that array elements are written using parentheses, not brackets as in C. Older Fortran programs would define the same arrays as follows:

    real a(100), b(100)
    integer index(50,2)

This syntax will still work and is easier to read for short programs. The new standard has the advantage, however, of being able to easily define many arrays with identical dimensions.

A nice feature of Fortran is that we can specify the lower and upper array boundaries explicitly in the declaration, e.g.,

    real, dimension(-100:100) :: a, b
    integer, dimension(0:50, 2) :: index

In this case there are 201 values for a and b (from a(-100) to a(100)) and 102 values for index (from index(0,1) to index(50,2)).

Here is an example program that uses an array to compute prime numbers less than 100:

    program prime
       implicit none
       integer, parameter :: maxnum=100
       integer :: i, j, prod(maxnum), max_i, max_j, nprime=0
       do i = 1, maxnum
          prod(i) = 0
       enddo
       max_i = floor(sqrt(real(maxnum)))
       do i = 2, max_i
          if (prod(i) == 0) then
             max_j = maxnum/i
             do j = 2, max_j
                prod(i*j) = 1
             enddo
          end if
       end do
       do i = 2, maxnum
          if (prod(i) == 0) then
             nprime = nprime + 1
             print "(i4)", i
          end if
       enddo
       print *, "Number of primes found = ", nprime
    end program prime
The method is sometimes called the sieve of Eratosthenes, named after a Greek mathematician from the 3rd century BC. We start with a list of numbers from 2 to 100 and consider them all possible primes. In the program, this list is the array prod. We initialize the array by setting its values to zero. The strategy is to then flag numbers that are not prime by setting the corresponding array values to one. We then start with the number 2 and eliminate all multiples of 2 up to the maximum value of 100. We then move to 3 and eliminate (i.e., set prod to 1) all multiples of 3. We can skip 4 because 4 and all its multiples were already eliminated. We need to check numbers, i, only up to sqrt(100) because larger factors would already have been eliminated. When we are finished, we simply print out all numbers that were not eliminated. In our case, this is all i such that prod(i) = 0. We count the number of primes using the counter variable nprime.

A new aspect of this program is the defining of maxnum as a parameter:

    integer, parameter :: maxnum=100

This tells the program to set maxnum to 100 and that this value will never be changed during the program. It also allows the array dimension for prod (in the next line) to be set by maxnum. This makes it easy for us to change the size of our prime search by changing the value of maxnum without having to change anything else in the program.

Note that in the statement

    max_i = floor(sqrt(real(maxnum)))

it is necessary to convert maxnum to a real number before taking the square root. This is because the argument to the Fortran sqrt function must be real; if we had written sqrt(maxnum) we would have gotten an error during compilation.

This program lists the prime numbers with one number per output line and thus will become cumbersome for larger values of maxnum. Let's modify the program to list 10 primes per line. To do this, we save the prime numbers in a separate array called pnum. Here is the code:

    program prime2
       implicit none
       integer, parameter :: maxnum=1000
       integer, dimension(maxnum) :: prod, pnum
       integer :: i, j, max_i, max_j, nprime=0
       do i = 1, maxnum
          prod(i) = 0
       enddo
       max_i = floor(sqrt(real(maxnum)))
       do i = 2, max_i
          if (prod(i) == 0) then
             max_j = maxnum/i
             do j = 2, max_j
                prod(i*j) = 1
             enddo
          end if
       end do
       do i = 2, maxnum
          if (prod(i) == 0) then
             nprime = nprime + 1
             pnum(nprime) = i
          end if
       enddo
       print *, "Number of primes found = ", nprime
       print "(10i5)", (pnum(i), i = 1, nprime)
    end program prime2

Note that we use the new F90 convention for setting up the arrays. The program is pretty self-explanatory, but we do introduce a new concept in the output line:

    print "(10i5)", (pnum(i), i = 1, nprime)

This is termed an implicit do loop and is very useful in input/output (I/O) statements. Note that the format specifier 10i5 applies to the first 10 numbers and then is reused for the next 10, etc. The parentheses around (pnum(i), i=1,nprime) are required. More complicated expressions are also possible, e.g.,

    print "(5(i4,i4,2x))", (i, pnum(i), i = 1, nprime)

will print the primes to the right of their count.

Array values can be assigned one element at a time, or can be assigned in single statements as in this F90 example:

    program testarray
       implicit none
       integer, dimension(3) :: x = (/ 1, 2, 3 /), y = 1
       print *, x
       x = (/ 15, 30, 40 /)
       print *, x
       print *, y
       y = 2
       print *, y
    end program testarray

and its output:

     1  2  3
    15 30 40
     1  1  1
     2  2  2

Note that when an array is set to a scalar, every array element is set to the value of the scalar, i.e., y = 2 sets all y values to 2. The individual elements can be specified if they are listed between (/ and /) (Warning: Don't put a space between
the / and the parenthesis!). The number of values must match the dimension of the vector. We will discuss this more later when we consider two-dimensional arrays in detail. So we see that we could have written:

    integer, dimension(maxnum) :: prod = 0, pnum

in program prime2 rather than wasting the three lines we used later to set all elements in prod to zero.

There are some cases when we would prefer not to advance to the next line following a print statement. Instead, we would like to have output continue on the same line. Here is an example of how this can be used to print 10 prime numbers per line without having to store the prime numbers in a separate array:

    program prime3
       implicit none
       integer, parameter :: maxnum=1000
       integer :: i, j, prod(maxnum)=0, max_i, max_j, nprime=0
       max_i = floor(sqrt(real(maxnum)))
       do i = 2, max_i
          if (prod(i) == 0) then
             max_j = maxnum/i
             do j = 2, max_j
                prod(i*j) = 1
             enddo
          end if
       end do
       do i = 2, maxnum
          if (prod(i) == 0) then
             nprime = nprime + 1
             if (mod(nprime,10) /= 0) then
                write (*,"(i4)", advance="no") i
             else
                write (*,"(i4)") i
             end if
          end if
       enddo
       print *
       print *, "Number of primes found = ", nprime
    end program prime3

The statement

    write (*, "(i4)") i

works exactly the same as the print statement we used previously. (print only goes to the screen; write can go to the screen or to a file. The * in the above write statement means standard output, not free format.) The statement

    write (*, "(i4)", advance="no") i

does the same thing, except that it does not advance to the next line. The program checks to see if the prime count (nprime) is divisible by 10; if not then the output does not advance. Following the enddo, a print * statement is needed to be sure that the final output (Number of primes found) is on a separate line.

NOTE: With the Suns, under both F77 and F90, the carriage return (advancement to next line) can also be suppressed simply by adding a $ to the format statement. Thus the above line could be replaced with

    print "(i4,$)", i

and the result would be the same. However, I don't think this is standard Fortran so you may get into trouble at some point with this convention.
4.12.1 Checking for problems with the -fcheck=bounds option
Suppose when writing the prime number program, we made the mistake of writing:

    max_j = maxnum

rather than

    max_j = maxnum/i

When the program is run, at some point the product i*j in the line

    prod(i*j) = 1

will exceed the array dimensions of prod (maxnum). The program will then start overwriting adjacent memory locations and disaster will result. Most commonly the program will crash with the following kind of message:

    Bus error

or

    Segmentation Fault

These types of errors result from exceeding array boundaries or mismatched subroutine arguments. These are not very helpful error messages because they do not tell you where in the program the problem occurred. However, there is usually a compiler option you can set that will provide more useful diagnostics. For gfortran, it is the -fcheck=bounds option and can be invoked either by compiling prime4 with:

    gfortran -fcheck=bounds prime4.f90 -o prime4

or changing the Makefile to:

    %: %.f90
    	gfortran -fcheck=bounds $< -o $*

(remember the tab before gfortran!) If you have compiled your program with this option, then the computer will check to see if your array indices exceed their boundaries and tell you where the problem occurs:

    shearer@katmai 324> ./prime4
    At line 11 of file prime4.f90
    Fortran runtime error: Array reference out of bounds for array prod, upper bound of dimension 1 exceeded (1002 > 1000)

This is so helpful, and will save you so much time in debugging your programs, that I recommend that you ALWAYS use this option when you are first getting your programs to work. It does slow the code somewhat, so for programs where run time is a factor, you should then recompile without -fcheck=bounds for the final working version. (In these cases, you may also want to experiment with the -O compiler option to make your code run faster, see below.)

ASSIGNMENT F11

Fortran 90 includes a built in random number subroutine (called random_number). Here is an example program that prints out 20 random values from between 0 and 1.
    program listrand
       integer :: i
       real :: xran
       do i = 1, 20
          call random_number(xran)
          print *, xran
       enddo
    end program listrand

Write a simple F90 program to generate 10000 random numbers between 0 and 1. Test the randomness by counting the number of these that fall in each of 10 evenly spaced bins between 0 and 1 (i.e., 0 to 0.1, 0.1 to 0.2, etc.). Print these counts to the screen. Here is an example of what your output might look like:

    0  1009
    1  1048
    2  1001
    3  1038
    4  1008
    5   959
    6   993
    7   925
    8  1017
    9  1002

HINT: Set up an integer array with 10 elements and initialize all the values to zero. For each random number generated, then add one to the appropriate array element to keep track of how many of the numbers are in that bin.
4.12.2 More about random numbers
To learn more about random number generators, please consult the book Numerical Recipes, which has a discussion about good algorithms and bad algorithms. I have not tested the built-in F90 random number generator to see how well it measures up, but it seems pretty good for most purposes.

You may notice that if you use random_number in a program, you will get the same numbers every time you run the program. This is desirable in many cases. For example, if you are debugging a program, you want any problems to be completely reproducible. But in other cases, you will want to get different results each time. One simple way to achieve this would be to ask the user to input an integer and then compute that many random numbers before continuing to the main program. But this requires the user to think of different input numbers. To avoid this, we can randomize the start of the random number generator itself by using some number that will always be different, for example from the system clock on the computer. This turns out to be annoyingly hard to do in F90. Here is an example program:

    program listrandseed
       implicit none
       integer :: i, count, count_rate, count_max, narg
       integer, dimension(8) :: idate
       integer, dimension(12) :: seed
       real :: xran

    ! randomize start of random numbers
       call date_and_time(VALUES=idate)
       print *, "idate = ", idate
       call system_clock(count, count_rate, count_max)
       seed(1) = count
       seed(2:9) = idate(8:1:-1)
       seed(10:12) = idate(6:8)
       print *, "seed = ", seed
       call random_seed(SIZE=narg)
       print *, "narg = ", narg
       call random_seed(PUT=seed)

       do i = 1, 20
          call random_number(xran)
          print *, xran
       enddo
    end program listrandseed

Once you have this working, you may want to remove the print statements in the random_seed part. But before doing so, make sure that seed is dimensioned to narg. When I recently changed computers, narg changed from 8 to 12 and I had to rewrite this little program.
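One possible way to avoid hard-wiring the size of seed is sketched below. It asks random_seed for the required size first and then allocates seed to match, using an F90 allocatable array (a feature not yet introduced in these notes). The program name seedsize and the way the clock count is mixed into each element (an exclusive-or with a multiple of 7919) are illustrative assumptions only, not a recommended seeding scheme.

```fortran
program seedsize
   implicit none
   integer :: i, narg, count
   integer, allocatable :: seed(:)
   real :: xran
   call random_seed(SIZE=narg)      ! ask how many integers the seed needs
   allocate (seed(narg))            ! size the seed array to match
   call system_clock(count)         ! clock count differs from run to run
   do i = 1, narg
      seed(i) = ieor(count, 7919*i) ! arbitrary mixing, for illustration only
   enddo
   call random_seed(PUT=seed)
   call random_number(xran)
   print *, "narg = ", narg
   print *, xran
end program seedsize
```

Because seed is sized at run time, the program should not need to be rewritten if narg differs on another computer or compiler.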
4.12.3 Arrays as subroutine arguments
Suppose we have defined an array x as:

    integer :: x(100)

If we use x in the argument list for a subroutine, e.g.,

    call SUMTOT(x,n,y)

we include x without any parentheses. The subroutine argument list must also include a matching array of the same size. All array values are passed to and from the subroutine. We will discuss later how to write subroutines that will work for arrays of differing size in the calling program.
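As an illustration of the call above, here is a minimal sketch of what such a subroutine might look like. SUMTOT is not listed in these notes, so the version below (which sums the first n elements of x into y) and the test program testsum are assumptions made only for the purpose of the example.

```fortran
program testsum
   implicit none
   integer :: x(100), n, y, i
   n = 100
   do i = 1, n
      x(i) = i
   enddo
   call SUMTOT(x, n, y)          ! x is passed without parentheses
   print *, "Sum = ", y
contains
   subroutine SUMTOT(x, n, y)    ! hypothetical routine, for illustration
      integer :: x(100), n, y, i ! array declared with the same size as in the caller
      y = 0
      do i = 1, n
         y = y + x(i)
      enddo
   end subroutine SUMTOT
end program testsum
```

With x set to the integers 1 to 100, the program should print a sum of 5050.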
4.13 Character strings
A character string may be declared in F90 as follows:

    character (len = 20) :: name

declares the variable name to be a character string of length 20. One can also define an array of character strings:

    character (len = 20), dimension(100) :: name_array

The old F77 ways to define these strings will still work in F90:

    character name*20, name_array(100)*20

or

    character*20 name, name_array(100)

A string variable can be assigned as, e.g.,

    name = "Bill Clinton"

The quotes (apostrophes will also work) are required. If name in this case was declared as 20 characters long, then trailing blanks are added so name actually is

    "Bill Clinton        "

If, on the other hand, name was declared as 10 characters long, then name would contain

    "Bill Clint"

as the extra letters would be cut off (no error results).

The length of a character string may be obtained with the len function:

    len("Bill")    will give 4
    len("  ")      will give 2
    len(name)      will give 20 if name was declared as 20 characters long,
                   regardless of any trailing blanks

Substrings may be obtained as follows for name = "Bill Clinton":

    name(1:4)  is "Bill"
    name(6:12) is "Clinton"
    name(6:6)  is "C"

Thus, we could change the last name as follows:

    name = "Bill Clinton"
    name(6:12) = "Jones  "

Note that the trailing blanks overwrite the "on". (Fortran pads a shorter string with blanks on assignment, so name(6:12) = "Jones" gives the same result, but writing the blanks makes the intent explicit.) The following moves the string one character to the right:

    name(2:13) = name(1:12)

This was not allowed in F77 (because of the overlap) but is OK in F90.
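The behavior of len, substrings, and blank padding described above can be tried out with a short test program. This is only a sketch; the program name testsubstr is made up, and the square brackets are added to the output simply to make the trailing blanks visible.

```fortran
program testsubstr
   implicit none
   character (len = 20) :: name
   name = "Bill Clinton"
   print *, len(name)                 ! declared length, not the number of letters
   print *, "[" // name // "]"        ! brackets show the trailing blanks
   print *, "[" // name(1:4) // "]"   ! substring containing the first name
   name(6:12) = "Jones"               ! shorter string is padded with blanks
   print *, "[" // name // "]"
end program testsubstr
```

The first line of output should be 20, and the last line should show Bill Jones followed by blanks, with no leftover letters from Clinton.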
Often one wants to know the length of a string without any trailing blanks. This can be done with the built in function len_trim. Another built in function is called trim which gives the string without any trailing blanks.

To find the position of a substring within a string, the built in function index can be used:

    index("Clinton", "in")   returns 3
    index("Clinton", "on")   returns 6
    index("Clinton", "no")   returns 0 (indicates string not found)

To append one string onto the end of another (concatenation), the // operator can be used:

    text1 = "Bill Clinton" // " was elected in 1992."

sets text1 to the complete sentence (text1 must have been already declared of sufficient length). Note the blank before "was" is necessary for there to be a space between the words. When trailing blanks are present, the trim function is very useful. For example, if name is declared as 20 characters long, then

    name = "Bill Clinton"
    text1 = name // " was elected in 1992"

sets text1 to:

    Bill Clinton         was elected in 1992

while

    text1 = trim(name) // " was elected in 1992"

will produce the correct result.

Here is an example program that illustrates how to input a string:

    program vote
       implicit none
       character (len = 80) :: name
       print *, "Who did you vote for in 1992?"
       read "(a)", name
       if (index(name,"ill") /= 0 .or. index(name,"lin") /= 0) then
          print *, "Then you are likely a Democrat."
       else if (index(name,"eor") /= 0 .or. index(name,"ush") /= 0) then
          print *, "Then you are likely a Republican."
       else
          print *, "Then you are likely an independent voter."
       end if
    end program vote

Note the format for the read statement:

    read "(a)", name

does not need to use a80 although that would also work. The free format read statement:

    read *, name

is not recommended in this case because it will only read until the first blank. Thus "Bill Clinton" would be read in as "Bill" (unless the full name were enclosed in quotes, but who wants to make the user do that?).

ASSIGNMENT F12

Write a simple version of the famous 1960s program, Eliza, that simulates a psychologist talking to a patient. A humorous fictional account of such a program is contained in Small World, the David Lodge satire of academia. There is an online version of the original program at:

    http://chayden.net/eliza/Eliza.html

Programs of this type are now called "bots" and examples can be found at alicebot.org. Begin your program by asking the patient:
    "How are you?"

Then input a line from the user/patient and use the index function to search this line for various key words that you can then use to guide the computer's response. Design the program so that it will continue this conversation between doctor and patient. Some suggestions:

    If the input string contains a "?", then respond:
       "Let me ask the questions. Why is it important to you?"
    If the string contains "!", then respond:
       "You seem pretty upset. What is really bothering you?"
    If the string contains "mother" then respond:
       "Tell me more about your mother."
    If the string contains any swear words, respond:
       "There is no need for that kind of language."
    If you identify no key words on your list, then respond with something generic like:
       "Go on." or "Tell me more about it."

Be creative but don't spend an infinite amount of time on this assignment. Your program will be more realistic if you use random numbers (see above) to randomly select from a series of possible responses so that the computer does not keep saying the same thing.
4.14 I/O with files
So far, all of our examples have involved input from the keyboard and output to the screen. Now let us see how to open files, read from them, and write data to them. Here is a program that reads pairs of numbers from an input file, computes their product, and outputs the original numbers and their product to an output file.

    program fileinout
       implicit none
       character (len=100) :: infile, outfile
       integer :: i, ios
       real :: x, y, z

       print *, "Enter input file name"
       read "(a)", infile
       open (11, file=infile, status="old")

       print *, "Enter output file name"
       read "(a)", outfile
       open (12, file=outfile)

       do i = 1, 999999999     !more lines than any likely input file
          read (11,*, iostat = ios) x, y
          if (ios < 0) then
             exit
          else if (ios > 0) then
             print *, "***Warning, read error on line: ", i
             cycle
          endif
          z = x * y
          write (12,*) x, y, z
       enddo

       close (11)
       close (12)
    end program fileinout

We open the input file with the statement:

    open (11, file=infile, status="old")

Files must be assigned unit numbers for I/O. This statement opens the file with unit 11. On most systems, unit 5 is defined as the default standard (keyboard) input and unit 6 is defined as the default standard (monitor) output, so these numbers should always be avoided as unit numbers. I have gotten into the habit of using units 11, 12 and 13 for most of my I/O.

file=infile is what provides the filename. Note that we could hardwire a file name here by writing:

    open (11, file="file1", status="old")

The status="old" is optional, but I recommend always using it for opening files which should already exist. When you say status="old" and the file does not exist, then the program will terminate and print an error condition. If you have not set status="old" then the program will create a new file and when you try to read from it, you will reach the end of the file immediately and the program will terminate without any obvious errors (except that you will now have zero length files in your directory!). Another value for status is:
    status="new"     file must not already exist (useful to
                     avoid overwriting existing files)

However, in this example we choose to simply write:

    open (12,file=outfile)

so that we can overwrite existing files. This assigns unit 12 to the output file (the input and output file unit numbers must be different!).

We read data from the input file with the statement:

    read (11, *, iostat = ios) x, y

This does a free-format read from unit 11 (for a fixed format, the * is replaced with a format specifier such as "(i2, 2f6.2)" etc.). Because we do not know, a priori, how long the input file is, we need some way to recognize when we have reached the end of the file (EOF). The iostat = ios specifier does this by assigning to the local variable ios (which we must declare as an integer) a value that is zero for a normal read, positive if there is an error during the read, and negative if the end of the file is reached. Note that we are free to choose any name for ios that we want as long as it is declared as an integer. Once ios has been set, we check its value to see if there is an EOF or error condition:

    if (ios < 0) then
       exit
    else if (ios > 0) then
       print *, "***Warning, read error on line: ", i
       cycle
    endif

For ios < 0, we exit the do loop because there are no more lines to read. For ios > 0, we print a warning message and specify the line number. Being able to print the line number is useful because otherwise we would not know where to look for the problem in a big input file. This is why we use i in the do loop. Following the warning message, we use cycle to skip this line and read the next line. Without the cycle, the program would write an output line using the same values of x and y used for the previous line, the result being an erroneous duplicate line in the output file.

We should note at this point that older Fortran programs do not use this method of testing for the end of the file. Rather, they will say something like:
    read (11, *, end = 123) x, y

where 123 is a statement label that the program will branch to when the end of file is reached. This method still works in F90 but you lose style points for having to use a statement label. However, despite this there is one aspect in which the old way of doing things is better than the new way. Suppose there is an error in the input file, for example a stray character in one of the input lines instead of two numbers. In the old way of doing things, the program will crash on the faulty input. In the new approach, using the iostat= convention, any read errors are ignored unless you test for them. You never want to ignore errors without at least being aware of them. So if you use iostat= then you always should explicitly look for errors. In other words, you should never write code like this:

    ! example of what NOT to do!
    do
       read (11,*, iostat = ios) x, y
       if (ios < 0) exit
       z = x * y
       write (12,*) x, y, z
    enddo

In this case, any input error would be flagged by assigning a positive number to ios. But this code does not check for this and will therefore continue running, retaining the previous values for x and y. The result will be that the output file will repeat the results from the previous line. This is not good, because we have introduced an error into the output that might not be immediately obvious. In the old way of doing things, however, the program crashes on the faulty input.

Continuing to examine the program fileinout.f90, we write the original numbers and their product using a free format write to unit 12:

    write (12,*) x, y, z

Note that x and y will probably not be in the same format as in the input file. Finally, after the EOF but before we end the program it is good practice to close the files:

    close (11)
    close (12)

although the files will be closed automatically anyway when the program ends.

ASSIGNMENT F13
Write a program that reads from a text file containing 5 numbers per line. Compute the mean of the 5 numbers and then subtract the mean from each of the 5 original numbers. Output the 5 demeaned numbers to an output file. Allow the user to specify the input and output file names. For example, if the input file contains:

    10  20  30  40  50
     1   2   3   4   5
     2   4   6   8  10
     5   5   5   5   5

your program should write to the output file:

    -20.000  -10.000    0.000   10.000   20.000
     -2.000   -1.000    0.000    1.000    2.000
     -4.000   -2.000    0.000    2.000    4.000
      0.000    0.000    0.000    0.000    0.000

Hint: Use the fileinout.f90 program listed above as a template for the file handling part of your code.
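The open statements in fileinout.f90 rely on the program terminating if the input file does not exist. The iostat= specifier discussed in section 4.14 for read can also be used in the open statement, so that the program itself can report the problem. The following is a minimal sketch of this idea; the program name openchk and the file name no_such_file.dat are hypothetical, and the file is assumed not to exist in the current directory.

```fortran
program openchk
   implicit none
   character (len=100) :: infile
   integer :: ios
   infile = "no_such_file.dat"      ! hypothetical name, assumed not to exist
   open (11, file=infile, status="old", iostat=ios)
   if (ios /= 0) then               ! nonzero ios means the open failed
      print *, "***Cannot open ", trim(infile)
   else
      print *, "Opened ", trim(infile)
      close (11)
   end if
end program openchk
```

In a real program the failure branch would typically ask the user for another file name or stop, rather than just printing a message.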
4.15 More about multi-dimensional arrays
Two-dimensional arrays in Fortran may be defined as follows:

    integer a(2,3)                      !F77 convention
    integer, dimension(2,3) :: a        !F90 convention

This defines a matrix in which the first index will vary from 1 to 2, and the second from 1 to 3 (recall that the default starting array index is 1 in Fortran). Here is a test program that illustrates how the numbers in a can be specified and how they are stored in memory:

    program testmatrix
       implicit none
       integer, dimension(2,3) :: a
       a(1, 1:3) = (/ 1, 2, 3 /)
       a(2, 1:3) = (/ 4, 5, 6 /)
       print *, a
    end program testmatrix

The output of this program is:

    1 4 2 5 3 6
First, note how the array values are specified. (/ 1, 2, 3 /) is an example of what is called an array constructor and is a sequence of scalar values along one array dimension only. We equate this to the desired part of the a matrix by writing, for example:

    a(1, 1:3) = (/ 1, 2, 3 /)

thus setting a(1,1)=1, a(1,2)=2, and a(1,3)=3.

Finally note how the array is printed. The print *, a statement will dump the values of a in the same order that they are stored in memory. In doing this, Fortran varies the first array index first, then the second, etc. In this case, the result is:

    a(1,1), a(2,1), a(1,2), a(2,2), a(1,3), a(2,3)

Similarly a three-dimensional array b(2,2,2) would be stored as:

    b(1,1,1) b(2,1,1) b(1,2,1) b(2,2,1) b(1,1,2) b(2,1,2) b(1,2,2) b(2,2,2)

This is important to remember when passing arrays to and from subroutines for cases when the array in the main program has more dimensions than the array in the subroutine. For example, suppose we wished to store 100 seismograms of 1000 points each in the main program. We have a filtering subroutine that will act on a single time series vector. If we dimension the array in the main program as b(1000,100) then we can call the subroutine using a statement like:

    call FILTER(b(1,i),npts)

where i is the seismogram number. The filter can then act on b(1:npts, i) and would be of the form:

    subroutine FILTER(c, n)
       real, dimension(n) :: c
       .
       .

This is a very common type of construction. Note that this would not work if the array was dimensioned as b(100,1000) because then each 1000-point time series would not be contiguous in memory.

Here is an example program to multiply two matrices:

    program matmult
       implicit none
       real, dimension(3,3) :: a, b, c
       integer :: i, j, k
       a(1,1:3) = (/ -5.1,  3.8,  4.2 /)
       a(2,1:3) = (/  9.7,  1.3, -1.3 /)
       a(3,1:3) = (/ -8.0, -7.3,  2.2 /)
       b(1,1:3) = (/  9.4, -6.2,  0.5 /)
       b(2,1:3) = (/ -5.1,  3.3, -2.2 /)
       b(3,1:3) = (/ -1.1, -1.8,  3.0 /)
       do i = 1, 3
          do j = 1, 3
             c(i,j) = 0.0
             do k = 1, 3
                c(i,j) = c(i,j) + a(i,k) * b(k,j)
             enddo
          enddo
       enddo
       print *, "Matrix a follows"
       call PRINTMAT(a)
       print *, "Matrix b follows"
       call PRINTMAT(b)
       print *, "Matrix c = a*b follows"
       call PRINTMAT(c)
    contains
       subroutine PRINTMAT(x)
          real, dimension(3,3) :: x
          do i = 1, 3
             print "(3f8.3)", (x(i,j), j=1,3)
          enddo
       end subroutine PRINTMAT
    end program matmult
Note the use of the internal subroutine PRINTMAT to display the results. This example performs the matrix multiplication exactly as one would in F77; it does not take advantage of some of the new array capabilities of F90. In fact, there is a built in function, MATMUL, in F90 that performs matrix multiplication. My plan is to have more to say about these special F90 capabilities later in the class.

ASSIGNMENT F14

Modify the matmult program so that the matrix multiplication is done in a subroutine. Add another subroutine that computes the transpose of a square matrix. Compute the transpose of c and add this to the output listing.
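As a brief illustration of the built in MATMUL function mentioned in section 4.15, here is a minimal sketch using small 2x2 matrices whose product is easy to check by hand. The program name testmatmul and the matrix values are made up for this example.

```fortran
program testmatmul
   implicit none
   real, dimension(2,2) :: a, b, c
   a(1,1:2) = (/ 1.0, 2.0 /)
   a(2,1:2) = (/ 3.0, 4.0 /)
   b(1,1:2) = (/ 5.0, 6.0 /)
   b(2,1:2) = (/ 7.0, 8.0 /)
   c = matmul(a, b)                  ! replaces the triple do loop
   print "(2f8.3)", c(1,1), c(1,2)   ! first row of the product
   print "(2f8.3)", c(2,1), c(2,2)   ! second row of the product
end program testmatmul
```

The two rows printed should be 19.000 22.000 and 43.000 50.000, the same result that the triple loop in matmult would give for these inputs.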
4.15.1 Arrays of strings
As we have discussed, a string is just an array of characters. Thus, a 1-D array of strings will be a 2-D character array. Here is an example of this:

    program stringarray
       implicit none
       character (len = 80), dimension(10) :: name
       name(1) = "Bill Clinton"
       name(2) = "George Bush"
       print *, "Our last president was ", trim(name(1))
       print *, "Our current president is ", trim(name(2))
    end program stringarray

The program can store up to 10 names of 80 characters each. Note the use of the trim function when printing the names. This is very handy to cut off the trailing blanks and limit unnecessary extra spaces (and sometimes lines) in the output.
4.16 A more complex example of data processing
Suppose we have a data file (e.g., datafile1) that contains a series of measurements that has the following form:

    980204 22:03:34.42
     10.8
     10.2
     9.8
     11.6
    980205 08:45:22.20
     5.5
     7.2
     6.6
    980205 20:13:42.88
     15.0
     13.8
     12.9
     14.2
    etc.

The structure is that there are date/time identifiers and then a series of measurements made at that time. The problem in reading this file is that we don't know how many measurement lines will follow each date/time line so we can't simply perform a formatted read on each line from the file. Let us assume that we want to compute the mean of these measurements and write a new file which stores this information on the same line as the date/time. In this case, our desired output file (e.g., datafile2) will look like:

    980204 22:03:34.42    10.60    4
    980205 08:45:22.20     6.43    3
    980205 20:13:42.88    13.97    4
where the final column gives the number of measurements. Here is how such a program could be written:

    program procfile
       implicit none
       character (len=100) :: infile, outfile
       character (len=20) :: linebuf, event
       integer :: ios, nevent=0, ndata=0
       real :: x, sum=0.0, xavg

       print *, "Enter input file name"
       read "(a)", infile
       open (11, file=infile, status="old")

       print *, "Enter output file name"
       read "(a)", outfile
       open (12, file=outfile)

       do
          read (11,"(a)", iostat = ios) linebuf
          if (ios < 0) exit
          if (linebuf(1:1) /= " ") then    ! no init blank, must be event
             nevent = nevent + 1
             if (nevent /= 1) then         !need to output previous event info
                call WRITE_EVENT
                ndata = 0
                sum = 0
             end if
             event = linebuf
          else                             ! has starting blank, must be measurement
             read (linebuf,*) x
             sum = sum + x
             ndata = ndata + 1
          end if
       enddo
       call WRITE_EVENT

       close (11)
       close (12)

    contains

       subroutine WRITE_EVENT
          if (ndata /= 0) then
             xavg = sum/real(ndata)
          else
             xavg = 0.
          end if
          write (12,"(a20, f7.2, i5)") event, xavg, ndata
       end subroutine WRITE_EVENT

    end program procfile

The beginning of the program where the files are opened is the same as in fileinout.f90 (our earlier example of file I/O). Using an infinite do loop, we read lines from the input file using:

    read (11,"(a)", iostat = ios) linebuf

Because we don't know the format of each line before we read it, we read into a character string (linebuf) that will serve to temporarily store the contents of the line (a "line buffer" as they say in the computer world). We use iostat=ios to set ios to a flag that can be used to identify when the end of the file occurs so that we can exit the do loop. As discussed previously, we really should also check for an error condition (i.e., ios > 0), but we don't bother here because errors are unlikely when reading in a string (yes, we are being a bit sloppy!).

Next, we check the contents of the first character in linebuf. If it is not a blank, then we know we have an event line. In this case we increment the nevent counter variable by one. If we have not just read the first event, then we have read a new event, thus terminating the data input from the previous event. In this case we need to write the processed event info to the output file (we can't do this until this
point because we don't know how many data points there will be) and reset sum and ndata to zero. Finally we set event = linebuf to store the event line until we need to output it. If, on the other hand, the first character in linebuf is blank, then we have a data line. We then read the value of x out of linebuf:

    read (linebuf,*) x

This is called an "internal read" and is the first time that we have shown this. It allows the program to do a formatted read from a string, rather than from an external file. We do a free format read (*), but one could also use a format specifier. After reading the measurement value into x, we suitably increment a running sum of x and the number of measurements.

When we reach the end of the input file, we have to output the results of the final set of measurements. To avoid repeating a block of code for this operation, we use an F90 internal subroutine (WRITE_EVENT). This is used twice: inside the loop, where we output information from the previous event before going on to the next one, and outside the loop, where we output the information from the last event. Note that we must check to see if there are no measurements for an event line, in which case we do not divide by zero but simply set xavg to zero. There should be no possibility of mistaking this for a real measurement in the output file because the number of measurements in this case will also be zero.

ASSIGNMENT F15

Obtain the data file, scsn.phase.dat, from the class web site. This is a list of earthquakes and arrival time data from the Southern California Seismic Network (SCSN) for the first two days of August 2000. (Files like this are readily obtainable from the SCEC Data Center at http://www.scecdc.scec.org if you ever want to obtain more of these data. The format is called the SCEC DC phase format.)
Here is what part of the file looks like:

    2000/08/01 14:44:12.6 L 1.1 c  35.962 -117.693   5.2 A 9159024    8   51  0  0
            WRC VHZ  911 P IU1     4.41   1.26
            WCS VHZ  885 P E 3     9.98   1.85
            WVP VHZ  925 P IU1    11.54   2.25
            TOW VHZ  813 P E 2    18.28   3.78
            CLC VHZ  202 P E 2    18.49   3.37
            WMF VHZ  906 P E 3    22.86   3.94
            WMF VHZ  906 S E 2    22.86   7.08
            WNM VHZ  908 P E 3    23.74   4.23
    2000/08/01 14:45:36.8 L 1.3 h  34.806 -116.271   7.3 A 9159025    7   19
            CDY VHZ  184 P IU0     6.66   1.73
            RAG VHZ 1034 P IU1    17.58   3.19
            RAG VHZ 1034 S E 2    17.58   5.77
            RMM VHZ  647 P ID1    37.23   6.33
            TPM VHZ 1033 P IU1    52.29   8.88
            GRP VHZ  355 P IU1    61.21  10.14
            0299 C 65535 P IU1 11395.79   7.39
    2000/08/01 15:04:17.9 L 1.4 h  34.810 -116.268   7.1 A 9159030    9   83
            CDY VHZ  184 P IU0     6.76   1.73
            RAG VHZ 1034 P IU1    17.76   3.16
    etc.

There is a line for each earthquake with the date and time (UT, not local time). For the first line in the above example, the additional bits include:

    L        = local event
    1.1      = magnitude
    c        = how magnitude was computed (coda in this case)
    35.962   = lat
    -117.693 = lon
    5.2      = depth (km)
    A        = SCSN assigned quality (A=best)
    9159024  = cusp id number
    8        = number of lines of phase arrival time info

The station info lines include:

    WRC  = station name
    VHZ  = station channel
    911  = ?
    P    = phase name (normally P or S)
    IU1  = phase pick info
    4.41 = station-event distance (km)           (=X for assignment)
    1.26 = travel time (s from event origin time) (=T for assignment)
Your task is to write an F90 program to read this file and write to an output file a series of (X,T) points (one X-T pair per line) for all P arrivals for all events between 3 and 8 km depth. Then make an X-Y plot of the T-X points using whatever plotting method you are most familiar with. That is, plot time on the y-axis and distance on the x-axis and plot the coordinates of each point. E-mail me a copy of the F90 program source code and a PDF file of the X-Y plot.

The good news about this data format is that it does include a number in the event line that lists the number of phase lines that will follow. The bad news is that sometimes the last 1 or 2 phase lines are garbage in that they have numbers instead of station names and are some kind of calibration info. You will have to figure out how to recognize and discard these lines in your program. Also the phase lines begin with a tab character rather than a long series of blanks; this may complicate things somewhat.

Hint: Don't try to write your entire program all at once. First, get a version working that simply reads the input file and prints out the parts of each line that you will need (i.e., the depth and number of phase lines for the event line, the phase name and the X and T values for the phase lines). Once you have this working correctly, then go on to add the part to output the (X,T) points.
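The internal read used in procfile, together with the tab character mentioned in the assignment, can be sketched in isolation. This is only an illustration under stated assumptions: PARSE_VALUE and its arguments are hypothetical names that do not appear in the class code, and the routine handles only the simple case of one number at the start of a line. It converts a leading tab (character code 9) to a blank before reading, because not every compiler is guaranteed to treat a tab as a blank, and it uses iostat so that a line that does not start with a number is reported rather than crashing the program.

```fortran
! Hypothetical helper (not part of procfile): read one real value from
! the start of a text line using an internal read.
subroutine PARSE_VALUE(line, x, ok)
   implicit none
   character (len=*) :: line           ! input text, e.g. one line of a data file
   real :: x                           ! returned value (0.0 if the read fails)
   logical :: ok                       ! .true. if a number was read
   character (len=len(line)) :: buf    ! working copy, so line is not changed
   integer :: ios
   buf = line
   ! a leading tab is not a blank, so replace it before reading
   if (len(buf) > 0) then
      if (buf(1:1) == achar(9)) buf(1:1) = " "
   end if
   read (buf, *, iostat=ios) x         ! internal read: the "unit" is a string
   ok = (ios == 0)
   if (.not. ok) x = 0.0
end subroutine PARSE_VALUE
```

A caller would use something like call PARSE_VALUE(linebuf, x, ok) and then test ok before using x.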
4.17 Example sorting routine from Numerical Recipes
The book Numerical Recipes contains a large number of useful subroutines (there are both Fortran and C versions) that are fully explained in the text. You can find F77 source code (1st edition of NR) for these routines in:

    ~shearer/PROG/SUBS/NUMRECIP

To illustrate the use of the Numerical Recipes routines, here is an example program that generates a list of random numbers, sorts them using the NR subroutine piksrt, and then prints out the sorted list.

    program sortrand
       implicit none
       integer, parameter :: NPTS=100
       integer :: i
       real, dimension(npts) :: xran
       do i = 1, NPTS
          call random_number(xran(i))
       enddo
       call PIKSRT(NPTS,xran)
       print "(10f7.4)", (xran(i), i=1,npts)
    end program sortrand

The program sets the xran array to random numbers. It then calls the NR subroutine PIKSRT to sort the numbers before printing them out. We must either include the PIKSRT source code or link with a PIKSRT object code when we compile this program. Here is the PIKSRT source code:

          SUBROUTINE PIKSRT(N,ARR)
          DIMENSION ARR(N)
          DO 12 J=2,N
            A=ARR(J)
            DO 11 I=J-1,1,-1
              IF(ARR(I).LE.A)GO TO 10
              ARR(I+1)=ARR(I)
    11      CONTINUE
            I=0
    10      ARR(I+1)=A
    12    CONTINUE
          RETURN
          END
This ugly code is typical of old Fortran programs. It has all capital letters and has line numbers, go to statements, and uses continue statements in the do loops. Actually, it's better than many examples, because the do loops are indented. The important thing for us is that the code works; we don't want to bother rewriting it and possibly breaking it.

In general, F77 and older code will not compile as free-form F90 source because of old-style comment lines (C in column 1) and old-style line continuation flags (non-& character in column 6). This example, however, does not have these problems and is fully F90 compatible. Thus, we could just append the source code onto the end of our program. Alternatively, we could compile the subroutine under F90 as follows:

    f90 -c piksrt.f -o piksrt_f90.o

which will create a piksrt_f90.o file (replace f90 with gfortran if you are using gfortran). The _f90 is my own convention to label f77 source code that has been compiled under f90 so that I don't try to link piksrt_f90.o with an f77 program. Unfortunately, f77 and f90 object files are not always compatible (they are in this case, but it is dangerous to assume that they always will be, so it is safest to separately compile the source code in each case). We can then compile the main program and link it with the object file using:

    f90 sortrand.f90 piksrt_f90.o -o sortrand

or by using a Makefile as discussed earlier. This would be our best choice if the subroutine source code is lengthy and/or is non-F90 compatible.

ASSIGNMENT F16
Modify the sortrand program to return the median of a user-specified number of random numbers. Do the median computation in the form of a subroutine that returns the median value, but does NOT re-sort the input array in the process. The random number generation should be in your main program, not within the MEDIAN subroutine, which should be a general purpose routine that simply inputs an array of numbers and returns the median value. Then whenever you need to compute a median of some numbers within a program, you can simply call the same subroutine, e.g.,

    call MEDIAN(x, n, xmed)    !x=array, n=# of points, xmed=median

Hint: Within your median-computing subroutine, copy the input array to another array before calling PIKSRT. Make sure that your program gives the correct result for both even and odd numbers of points.
4.18 Example of saving values in a subroutine
A common convention in geophysics is to name an instrument site with a short ascii string. In seismology, for example, station names are usually designated with 3 to 4 character names. Southern California GPS sites are usually identified with 4 character names. Often data products are labeled by the station name, without including information about the station, such as its location. Data processing programs will often require this information. Thus, there is frequently a need for a computer routine that will retrieve information about a station, given the station name.

One could hardwire this information into the program with a series of if statements, but this would become very cumbersome for large numbers of stations and would limit the flexibility and portability of the code. A better approach is to save the station information in a file and have the computer read from the file to access the information when necessary. However, file I/O is relatively slow so we won't want to open, read, and close the file every time we need to determine a station location. It would be much faster to read the station information file once and then save the information during the time that the program is running. For portability, ideally all of this overhead would be performed in a function that we could use in many programs without having to worry about the format of the station file or the size of the arrays that are required to store it.
In order to do this, we need to retain the values of certain variables within the function for subsequent calls. Here is an example that will return station coordinates for some of the GSN (Global Seismic Network) stations.

    program testgetstat
       implicit none
       character (len = 4) :: stname
       real :: slat, slon, selev
       do
          print *, "Enter station name (stop to stop)"
          read (*, "(a)") stname
          if (stname == "stop") exit
          call GET_STAT(stname,slat,slon,selev)
          print *, "slat,slon,selev = ", slat,slon,selev
       enddo
    end program testgetstat

    ! subroutine GET_STAT gets the lat/lon/elev of a station
    ! with a given name from the GSN station list
    !
    !  Inputs:  snam  = station name (a4)
    !  Returns: flat  = station latitude
    !           flon  = station longitude
    !           felev = station elevation (m)
    !  NOTES: Station list must be alphabetized
    !         If station name is not found, then returned
    !         variables are set to -999.
    !         Program reads and saves station info on first call
    !
    subroutine GET_STAT(snam,flat,flon,felev)
       implicit none
       integer, parameter :: NMAX=5000
       character (len = 4) :: snam
       character (len = 4), dimension(NMAX) :: stname
       real, dimension(NMAX) :: slat, slon, selev
       real :: flat, flon, felev
       integer :: i, i1, i2, nsta, it
       logical :: firstcall = .true.
       save firstcall, stname, slat, slon, selev, nsta
       if (firstcall) then
          firstcall = .false.
          open (11, file="stlist.gsn", status="old")
          print *, "Reading station file: stlist.gsn"
          do i=1,NMAX
             read (11,7,end=12) stname(i), slat(i), slon(i), selev(i)
    7        format (a4, f10.5, f11.5, f5.0)
          enddo
          print *,"***Warning: number of stations in file may exceed ",NMAX
          i = NMAX + 1
    12    nsta = i-1
          close (11)
          print *,"Number of stations read = ",nsta
       end if
       i1 = 1
       i2 = nsta
       do it = 1, 15
          if (i1 > i2) exit     ! bracket is empty: station is not in the list
          i = (i1+i2)/2
          if (snam == stname(i)) then
             flat = slat(i)
             flon = slon(i)
             felev = selev(i)
             return
          else if (snam < stname(i)) then
             i2 = i-1
          else
             i1 = i+1
          end if
       enddo
       print *, "***station not found ", snam
       flat = -999.
       flon = -999.
       felev = -999.
    end subroutine GET_STAT

The main program is a short driver program that enables us to test that the GET_STAT subroutine is working properly. It is always a good idea to use little programs like this to test functions that are being developed, BEFORE using the functions in larger programs.

For efficiency, the subroutine reads the station file only the first time the subroutine is called. It then saves the information in memory to use for subsequent calls. Normally, the values of variables in a subroutine are not retained between calls. The save command is used to specify those variables that are to be saved between calls. This is our first example of a logical variable:

    logical :: firstcall = .true.

firstcall can have two possible values: .true. or .false. Here we set it to .true. the first time (and only the first time) that the subroutine is called. The save statement specifies which variable values are to be saved between subroutine calls:
    save firstcall, stname, slat, slon, selev, nsta

To read the station file upon the first call to the subroutine, we write

    if (firstcall) then

Notice that a logical variable can be the sole argument for an if test. We then set firstcall = .false. so that this block of code will not be executed upon subsequent calls to the subroutine. The station file has the form:

    AAE    9.02920   38.76560 2442
    AAK   42.63900   74.49400 1645
    ABKT  37.93040   58.11890  678
    ADK   51.88370 -176.68440  116
    AFI  -13.90930 -171.77730  706
    ALE   82.50330  -62.35000   60
    ALQ   34.94620 -106.45670 1840
    ANMO  34.94620 -106.45670 1840
    ANTO  39.86890   32.79359  883
    AQU   42.35389   13.40500  720
      .
      .

In routines like this, I like to print out information about the file that is being read during the first call. This reminds the user of the program about what is being done and is helpful in case there is a problem or error later in the program. For example, perhaps we have a faulty station file that has only 1 station. Or perhaps the firstcall flag is not working properly, in which case these output lines will result each time we call the routine, not just the first time.

For speed, the subroutine does not simply loop through the station list until it finds a match. Instead, it exploits the fact that the stations are in alphabetical order. i1 and i2 are indices that are designed to bracket the target station in the list. Initially, they are set to 1 and nsta, the first and last station indices. Next, i is set to a point halfway between i1 and i2. The station in the station list at i is compared to the target station. If they are the same, then we can set the return variables. If the stlist station is greater than the target station (later in the alphabet), then i2 is set to i-1. If the stlist station is less than the target station, then i1 is set to i+1. We iterate using this procedure up to 15 times to narrow in on the station name (2^15 = 32768 must be greater than NMAX to make sure we do enough iterations). Note that

    else if (snam < stname(i)) then

is a valid check to see if string variables are in alphabetical order.
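The bracketing search described above can also be written as a small stand-alone routine. The sketch below is one possible variant, not the version used in GET_STAT: FIND_NAME, names, wanted and idx are hypothetical names, and instead of a fixed count of 15 iterations the loop runs until the bracket i1..i2 is empty, so no assumption about the list length is needed. It returns idx = 0 when the name is not in the list.

```fortran
! Hypothetical stand-alone binary search over an alphabetized name list.
subroutine FIND_NAME(names, n, wanted, idx)
   implicit none
   integer :: n, idx                  ! n = list length, idx = returned index
   character (len=*) :: names(n)      ! names must be in alphabetical order
   character (len=*) :: wanted        ! name to look for
   integer :: i, i1, i2
   idx = 0                            ! 0 means "not found"
   i1 = 1
   i2 = n
   do while (i1 <= i2)                ! stop when the bracket is empty
      i = (i1 + i2)/2                 ! midpoint of the current bracket
      if (wanted == names(i)) then
         idx = i
         return
      else if (wanted < names(i)) then
         i2 = i - 1                   ! target is earlier in the alphabet
      else
         i1 = i + 1                   ! target is later in the alphabet
      end if
   enddo
end subroutine FIND_NAME
```

Because each pass halves the bracket, a list of 5000 names needs at most 13 passes.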
If we don't find the target stname in the station list, then we print out a warning message. We also set the station coordinates to numbers that are unlikely to be confused with real station locations in case the user of the function does not notice the warning message (or chooses to ignore it!).

ASSIGNMENT F17

Modify GET_STAT to find the nearest station to a given (lat, lon) point and return the station name and its coordinates. You can get the stlist.gsn file from http://igppweb.ucsd.edu/~shearer/SIO233/. Name your new function NEAR_STAT and also include a testnearstat driver program to test the operation of the function. To find the nearest station, you will need to be able to compute the distance between two points on a sphere. Use the SPH_AZI from the notes or the SPH_DIST subroutine from ~shearer/SUBS/sphere_subs.o. Don't bother to do the separate calculation for small values of del. The NEAR_STAT subroutine should be of the form:

    subroutine NEAR_STAT(plat, plon, stname, slat, slon, sdep)

where plat and plon are the coordinates going into the subroutine and stname, slat, slon and sdep are passed back from the function to the main program.
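Returning to the save mechanism used in GET_STAT, here is a minimal sketch that isolates it from the file reading. COUNT_CALLS, ncalls and counter are hypothetical names chosen for illustration. The routine does its one-time setup on the first call and then relies on save to remember the counter between calls. Note that a variable initialized in its declaration, like firstcall here, is automatically saved in F90; listing it in the save statement as well, as GET_STAT does, just makes the intent explicit.

```fortran
! Hypothetical example of a subroutine that remembers values between calls.
subroutine COUNT_CALLS(ncalls)
   implicit none
   integer :: ncalls                 ! returned: number of calls so far
   integer :: counter                ! kept between calls by the save statement
   logical :: firstcall = .true.     ! initialized once, when the program starts
   save firstcall, counter
   if (firstcall) then
      firstcall = .false.
      counter = 0                    ! one-time setup (GET_STAT reads its file here)
   end if
   counter = counter + 1
   ncalls = counter
end subroutine COUNT_CALLS
```

Calling COUNT_CALLS three times in a row returns 1, 2 and 3. If counter were left out of the save statement, its value on later calls would be undefined.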
4.19 Complex numbers
A nice feature of Fortran is that complex numbers are a standard feature. Here is a program to show how they can be declared and used:

    program testcomplex
       implicit none
       complex :: a, b, c
       print *, "Enter first complex number"
       read *, a
       print *, "Enter second complex number"
       read *, b
       c = a*b
       print *, "Product = ", c
       print *, "abs of product = ", abs(c)
       print *, "sqrt of product = ", sqrt(c)
    end program testcomplex

Here is an example of running the program:

    rock% testcomplex
     Enter first complex number
    (1, 1)
     Enter second complex number
    (-.5, .1)
     Product = (-0.6,-0.4)
     abs of product = 0.7211103
     sqrt of product = (0.2460795,-0.81274545)

Complex numbers are represented as a pair of numbers in parentheses separated by a comma (first number is the real part, second number is the imaginary part). This is the only format that will work for the free format read. If you try to enter

    1, 1

or

    1 1

or even

    (1 1)
you will get an error. Notice that in F90 you don't need to use special forms for the functions abs and sqrt (e.g., cabs and csqrt as you will see sometimes in older code). There are conversion functions to build complex numbers from two real numbers and vice versa, as shown in this example code:

    program testcomplex2
       implicit none
       complex :: a
       real :: ar, ai
       print *, "Enter real part"
       read *, ar
       print *, "Enter imaginary part"
       read *, ai
       a = cmplx(ar, ai)
       print *, "a = ", a
       a = exp(a)
       print *, "Exp(a) = ", a
       ar = real(a)
       ai = aimag(a)
       print *, "Real part = ", ar
       print *, "Imag part = ", ai
    end program testcomplex2

and its output

    rock% testcomplex2
     Enter real part
    1
     Enter imaginary part
    2
     a = (1.0,2.0)
     Exp(a) = (-1.1312044,2.4717266)
     Real part = -1.1312044
     Imag part = 2.4717266
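To make concrete what the statement c = a*b does for complex variables, the product can be spelled out with the conversion functions shown above. This is just an illustrative sketch: CMUL is a hypothetical name, and in real code you would simply write a*b. It uses the usual rule (ar + i ai)(br + i bi) = (ar br - ai bi) + i (ar bi + ai br).

```fortran
! Hypothetical routine: complex product written out with real arithmetic.
subroutine CMUL(a, b, c)
   implicit none
   complex :: a, b, c               ! c is returned as the product a*b
   real :: ar, ai, br, bi
   ar = real(a)                     ! real and imaginary parts of a
   ai = aimag(a)
   br = real(b)                     ! real and imaginary parts of b
   bi = aimag(b)
   c = cmplx(ar*br - ai*bi, ar*bi + ai*br)
end subroutine CMUL
```

For the inputs used in the testcomplex run, (1, 1) and (-.5, .1), this gives (-0.6,-0.4), the same product that the program printed.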
ASSIGNMENT F18

Write a program to test the built-in complex exponential function against the formula

    e^z = e^(x+iy) = e^x e^(iy) = (e^x cos y) + i (e^x sin y)

which you will have to adapt for your program. Include a driver program to test your function for selected input values. Make sure that your code gets the correct answer for these examples:

    exp( 1.1 + 2.3i) = -2.002 + 2.240i
    exp( 0.0 + 1.2i) =  0.362 + 0.932i
    exp(-0.5 + 0.0i) =  0.607 + 0.000i

Can you find any differences between the built-in exp function and the results obtained with the formula?
4.20 Array operations in F90
F90 allows many operations on vectors and matrices to be performed in single statements without the necessity of writing do loops over the array indices (as was necessary in F77). Here is an example program that demonstrates some of these operations on vectors.

    program vectormath
       implicit none
       real, dimension(5) :: a = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /), b = 2.0, c
       real :: x
       print *, "a = ", a
       c = a + 1
       print *, "a + 1 = ", c
       c = 2 * a
       print *, "2 * a = ", c
       c = a * a
       print *, "a * a = ", c
       c = sqrt(a)
       print *, "sqrt(a) = ", c
       c = sin(a)
       print *, "sin(a) = ", c
       c = exp(a)
       print *, "exp(a) = ", c
       print *, "b = ", b
       c = a + b
       print *, "a + b = ", c
       c = a * b
       print *, "a * b = ", c
       x = sum(a)
       print *, "sum(a) = ", x
       c = a
       c(4:5) = 0.0
       print *, "a with two zeros on end = ", c
       x = dot_product(a, b)
       print *, "a dot b = ", x
       x = sum( (a - sum(a)/5. )**2)
       print *, "sum of squares of difference from mean = ", x
    end program vectormath

Note that even fairly complicated expressions are possible provided the programmer keeps track of what is a vector and what is a scalar. Operations are also possible on matrices. Here are some examples:

    program matrixmath
       implicit none
       real, dimension(2, 3) :: a23, b23, c23
       real, dimension(3, 2) :: a32, b32, c32
       real, dimension(2,2) :: a22, b22, c22
       integer, dimension(2) :: loc2
       integer :: i, j, k
       a23(1,1:3) = (/ -5.1, 3.8, 4.2 /)
       a23(2,1:3) = (/ 9.7, 1.3, -1.3 /)
       print *, "Matrix a23 follows"
       call PRINTMAT(a23, 2, 3)
       b32(1:3, 1) = (/ 9.4, -6.2, 0.5 /)
       b32(1:3, 2) = (/ -5.1, 3.3, -2.2 /)
       print *, "Matrix b32 follows"
       call PRINTMAT(b32, 3, 2)
       c22 = matmul(a23, b32)   !this works but matmul(b32, a23) does not
       print *, "Matrix c22 = matmul(a,b) follows"
       call PRINTMAT(c22, 2, 2)
       print *, "maxval(a23) = ", maxval(a23)
       print *, "maxloc(a23) = ", maxloc(a23)
       loc2 = maxloc(a23)
       print *, "loc2 = ", loc2
       print *, "a23(loc2(1), loc2(2)) = ", a23(loc2(1), loc2(2))
       b23 = a23 + transpose(b32)
       print *, "Matrix b23 = a23 + transpose(b32) follows"
       call PRINTMAT(b23, 2, 3)
       print *, "size(a23) = ", size(a23)
       print *, "size(a23(1,:)) = ", size(a23(1,:))
    contains
       subroutine PRINTMAT(x, m, n)
          integer :: m, n
          real, dimension(m, n) :: x
          do i = 1, m
             print "(10f8.3)", (x(i,j), j=1,n)
          enddo
       end subroutine PRINTMAT
    end program matrixmath
This demonstrates the intrinsic functions matmul, maxval, maxloc, transpose, and size. Note that matmul(a,b) is true matrix multiplication, not the multiplication of the individual array elements as occurs from writing simply a*b. The maxloc function returns the location of the maximum value in the array. Note that the statement

    loc2 = maxloc(a23)

works only if loc2 is defined as an integer array with dimension(2).

NOTE: If maxloc is applied to a one-dimensional array, for example to find the index of the maximum value of a vector bvec:

    loc = maxloc(bvec)

then loc must be defined as an integer array with dimension(1). You will get an error message during compilation if loc is defined simply as an integer. There are corresponding functions, minval and minloc, that determine the minimum value and location within an array.
It would be interesting to test whether programs using the intrinsic array functions in F90 run any faster than those written using do loops.
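The point about maxloc returning an array even for a vector can be handled once inside a small wrapper. The sketch below is an illustration only: IMAX_OF, imax and vmax are hypothetical names, and the routine assumes n is at least 1. It stores the maxloc result in an integer array of dimension(1) and copies the single element into an ordinary integer.

```fortran
! Hypothetical wrapper: index and value of the largest element of a vector.
subroutine IMAX_OF(v, n, imax, vmax)
   implicit none
   integer :: n, imax               ! n = vector length (assumed >= 1)
   real :: v(n), vmax
   integer, dimension(1) :: loc     ! maxloc of a 1-D array has one element
   loc = maxloc(v)
   imax = loc(1)                    ! copy the single element into a scalar
   vmax = v(imax)
end subroutine IMAX_OF
```

If the maximum occurs more than once, maxloc returns the first location. In Fortran 95 and later, maxloc(v, dim=1) returns a scalar directly, which avoids the temporary array.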
4.21 Allocatable arrays
An awkward aspect of older Fortran programs is the need to declare arrays to the maximum possible size that could ever be needed when running the program. This memory must be set aside even when it is not needed, a wasteful practice. F90 avoids this by allowing memory to be allocated and deallocated on the fly. Here is an example program:

    program setarray
       implicit none
       real, dimension(:), allocatable :: x
       real, dimension(:, :), allocatable :: y, yy
       real :: xsum
       integer :: n, i, j, m
       print *, "Enter number of points"
       read *, n
       allocate (x(n))
       do i = 1, n
          call random_number(x(i))
       enddo
       xsum = sum(x)
       print *, "sum = ", xsum
       deallocate(x)        !this frees up memory
       print *, "Enter m, n (# rows, # columns)"
       read *, m, n
       allocate (y(m,n))
       allocate (yy(m,m))   ! y * transpose(y) is m by m
       do i = 1, m
          print *, "Enter row # ", i
          read *, (y(i, j), j = 1, n)
       enddo
       yy = matmul(y, transpose(y))
       print *, "y * yt = "
       call PRINTMAT(yy, m, m)
       deallocate(y)
       deallocate(yy)
    contains
       subroutine PRINTMAT(x, m, n)
          integer :: m, n
          real, dimension(m, n) :: x
          do i = 1, m
             print "(10f8.3)", (x(i,j), j=1,n)
          enddo
       end subroutine PRINTMAT
    end program setarray
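The setarray program assumes that every allocate succeeds. The allocate statement also accepts an optional stat= argument that returns zero on success and a nonzero value on failure, so a program can report the problem instead of simply crashing when it asks for more memory than is available. Here is a minimal sketch of that pattern; SUM_SQUARES, work and ierr are hypothetical names used only for illustration.

```fortran
! Hypothetical example: temporary work array with an allocation check.
subroutine SUM_SQUARES(n, total, ierr)
   implicit none
   integer :: n, ierr, i            ! ierr = 0 if the allocation worked
   real :: total                    ! returned: 1**2 + 2**2 + ... + n**2
   real, dimension(:), allocatable :: work
   total = 0.0
   allocate (work(n), stat=ierr)    ! ierr is nonzero if memory is unavailable
   if (ierr /= 0) return            ! give up gracefully, total stays 0.0
   do i = 1, n
      work(i) = real(i)**2
   enddo
   total = sum(work)
   deallocate (work)                ! free the memory before returning
end subroutine SUM_SQUARES
```

A caller would check ierr after the call, for example printing a message and stopping if it is not zero.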
4.22 Structures in F90
It is often convenient to have a user-defined array of data elements that may, or may not, be of the same data type. This is called a "Structure" in C and a "Derived Data Type" in F90. This is a new feature that was not included in F77. Suppose, for example, we wanted to set up a data base containing information (name, age, height, weight) about 3 different people. Here is a program that uses a structure to read in this information and print the names of the lightest and oldest people:

    program namebase
       implicit none
       integer, parameter :: nmax=3
       type person
          character (len=20) :: name
          integer :: age
          real :: height, weight
       end type person
       type(person), dimension(nmax) :: student
       integer :: i
       real :: weightmin
       do i = 1, nmax
          print *, "Enter first name, age, height, weight (e.g., Bob 27 69 183)"
          read *, student(i)%name,   &
                  student(i)%age,    &
                  student(i)%height, &
                  student(i)%weight
       enddo
       print *, "Lightest student = ", student(minloc(student%weight))%name
       print *, "Oldest student = ", student(maxloc(student%age))%name
    end program namebase

The statements:

    type person
       character (len=20) :: name
       integer :: age
       real :: height, weight
    end type person

create a defined type that can be used to name a variable later in the program. The contents are a character string (of length 20), an integer for the age, and real numbers for the height and weight. These contents are termed the members or components of the structure. In this example, the defined type is given the name person, which is NOT a variable name. Rather it is a name that can later be used to define the actual variable name(s) that will have this structure. The next line creates the array student that has this structure:

    type(person), dimension(nmax) :: student

In this line, type(person) acts just like the integer and real statements that we are used to. It can be used to define more than one variable name and these variables do not need to be arrays. Here is another example of this:

    type(person), dimension(nmax) :: grads, undergrads
    type(person) :: teacher

which sets up structure arrays for grads and undergrads and a single structure for teacher.

The members of the structure are referenced by appending %member to the name, where member refers to the specific member defined in the structure. So, for example, student(1)%age is the age of student(1). Note that the array index goes BEFORE the %member, not after. The % operator is also sometimes called the component selector. Note the nifty use of the minloc and maxloc functions, avoiding the need to write do loops.

This example does not really make obvious the advantages of a structure because we could easily have maintained separate arrays for name, age, height and weight. A clearer advantage of the structure approach is that we can reassign all of the members with a single statement, i.e., we can say:

    student(3) = student(2)

rather than having to say:

    student(3)%name = student(2)%name
    student(3)%age = student(2)%age
    etc.

However, we cannot compare structures as in the statement:

    if (student(3) == student(2) ) then    ! ILLEGAL!!
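The comparison is illegal only because the compiler has no built-in meaning for == between two structures. F90 lets a program supply that meaning itself through an interface operator block. The sketch below shows one way this might be done, under the assumption that two person records count as equal when all four members match. The names person_mod and same_person are hypothetical, and the type is placed in a module so that the main program and the comparison function share the same definition.

```fortran
! Hypothetical module: the person type plus a user-defined == operator.
module person_mod
   implicit none
   type person
      character (len=20) :: name
      integer :: age
      real :: height, weight
   end type person
   interface operator (==)          ! tells the compiler what == means here
      module procedure same_person
   end interface
contains
   function same_person(p, q)
      type(person), intent(in) :: p, q   ! operator arguments must be intent(in)
      logical :: same_person
      ! equal only if every member matches
      same_person = (p%name == q%name) .and. (p%age == q%age) .and. &
                    (p%height == q%height) .and. (p%weight == q%weight)
   end function same_person
end module person_mod
```

A program that contains use person_mod can then write if (student(3) == student(2)) then, and the compiler will call same_person behind the scenes.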
4.23 Writing fast programs
Modern computers are amazingly fast compared to those from 10 to 20 years ago and thus even inefficient code will often run fast enough. But any code that takes more than a few seconds to run is potentially more useful if it could run faster. So it's always good to think about ways to make code more efficient. In most cases, there is one key part of the program that takes most of the time. The first step is to identify that part. If it's not obvious by inspection, then one can add print statements that keep track of how long the different parts take to run. In F90, this can be done using the built-in subroutine system_clock, as illustrated in this example program that tests how long different types of math take:

    program testspeed
       implicit none
       integer :: n, i, count1, count2, count_rate, k=2, k2
       real :: dt, x=1.2, x2
       print *, "Enter number of operations"
       read *, n

    ! test integer math
       call system_clock(count1, count_rate)
       k2 = 5
       do i = 1, n
          k = k + k2
          k = k - k2
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "integer +-, dt = ", dt

       call system_clock(count1, count_rate)
       k2 = 5
       do i = 1, n
          k = k*k2
          k = k/k2
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "integer */, dt = ", dt

    ! test real math
       call system_clock(count1, count_rate)
       x2 = 3.14159
       do i = 1, n
          x = x + x2
          x = x - x2
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "real +-, dt = ", dt

       call system_clock(count1, count_rate)
       x2 = 3.14159
       do i = 1, n
          x = x*x2
          x = x/x2
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "real */, dt = ", dt

    ! test trig functions
       call system_clock(count1, count_rate)
       x2 = 5.14159
       do i = 1, n
          x = sin(x2)
          x = cos(x2)
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "sin/cos, dt = ", dt

    ! test if statement
       call system_clock(count1, count_rate)
       k2 = 5
       do i = 1, n
          k = k + k2
          k = k - k2
          if (k > 2) k = k - k2
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "integer +- with if statement, dt = ", dt
    end program testspeed

On my Macbook Air (2 GHz Intel Core i7), this produces:

    shearer@khan 84> ./testspeed
     Enter number of operations
    99999999
     integer +-, dt =   0.44499999
     integer */, dt =    1.1700000
     real +-, dt =   0.57099998
     real */, dt =   0.95999998
     sin/cos, dt =    1.7980000
     integer +- with if statement, dt =   0.44499999
Before we continue, we should stop for a minute to be amazed. My little laptop is doing 200 million operations in about a second or less! Somewhat surprisingly, real arithmetic is comparable to integer arithmetic and sometimes even faster. Even more surprisingly, the trig functions only take about a factor of two longer. And it appears that if statements take hardly any time at all! Thus much of the advice that I was prepared to give you (try to use integers instead of reals, avoid computing trig functions and using if statements inside do loops) is not accurate. Some really smart people must be optimizing computer chips and the F90 compiler. But it's still worth knowing how to identify the slow part of your code, so that you can devise ways to make it run faster.
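One detail that testspeed ignores is that the system_clock counter eventually reaches a maximum value and starts again from zero, so count2 can occasionally be smaller than count1. The maximum is available through the optional third argument, as in call system_clock(count, count_rate, count_max). A hedged sketch of a timing helper that allows for one such wrap is given below; ELAPSED, nticks and dt are hypothetical names, and the routine cannot detect a counter that has wrapped more than once.

```fortran
! Hypothetical helper: elapsed seconds between two system_clock counts.
subroutine ELAPSED(count1, count2, count_rate, count_max, dt)
   implicit none
   integer :: count1, count2        ! counts at the start and end of the timing
   integer :: count_rate, count_max ! both as returned by system_clock
   real :: dt                       ! returned: elapsed time in seconds
   integer :: nticks
   if (count2 >= count1) then
      nticks = count2 - count1
   else
      ! the counter passed count_max and restarted at zero
      nticks = (count_max - count1) + count2 + 1
   end if
   dt = real(nticks)/real(count_rate)
end subroutine ELAPSED
```

In testspeed, each pair of lines that computes and prints dt could then be replaced by a call to ELAPSED followed by the print statement.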
4.23.1 The -O option
But there is another way to make your code run faster and it's always surprising to me how many students don't know about it. Most modern compilers have an option to optimize code to make it run faster. In gfortran, this is done using the -O option, i.e.,

    gfortran -O testspeed.f90 -o testspeed

Here is what I get for this program after optimization:

    shearer@khan 86> ./testspeed
     Enter number of operations
    99999999
     integer +-, dt =   3.50000001E-02
     integer */, dt =   0.22499999
     real +-, dt =   0.19100000
     real */, dt =   0.63099998
     sin/cos, dt =   3.20000015E-02
     integer +- with if statement, dt =   0.12400000
The integer add/subtract is 13 times faster, the integer mult/divide is 5 times faster, the real add/subtract is 3 times faster, the real mult/divide is 1.5 times faster, and the sin/cos is an astonishing 56 times faster [1].

Since -O makes code run faster, why not use it all the time? It takes longer to compile programs using -O and the executables are not as stable, i.e., your program is more likely to crash or give the wrong answer (!). Thus, I recommend that you debug your programs without using -O, but then once they are stable and working, use -O to improve their speed (if necessary).

ASSIGNMENT F19

Compile and run the program testspeed.f90, compiled both with and without the -O option. Send me the output results for 99,999,999 operations, along with the details of your computer (i.e., name, type, chip, clock speed). For example, on Macs go to "About This Mac" under the Apple menu and send me the processor details.
[1] Does it recognize that we are computing the same thing? I suspect so!

4.24 Fast I/O in Fortran

Unless you are a serious number cruncher you are likely to find that input/output operations (reading and writing to file) take more time than any arithmetic operations. FORTRAN has generally not been considered to be as flexible as C in the way that it handles i/o operations. However, with a little deviousness, it is possible to write FORTRAN programs that are capable of extremely fast i/o. The key to making FORTRAN do fast i/o is to use unformatted binary read/write operations. Suppose we have a 100000 x 10 real array that we wish to store on disk. We could output this simply as an ascii file:

    program testio1
       implicit none
       real, dimension(100000,10) :: a = 1.1
       integer :: i, j, count1, count2, count_rate
       real :: dt
       open (12, file="out.testio1")
       call system_clock(count1, count_rate)
       do i = 1, 100000
          write (12, "(10e12.4)") (a(i,j), j=1,10)
       enddo
       call system_clock(count2, count_rate)
       dt = real(count2-count1)/real(count_rate)
       print *, "dt = ",dt
       close (12)
    end program testio1
Notice the use of the intrinsic subroutine, system_clock, to monitor how long the output takes. On my laptop (Macbook Air, 2 GHz Intel Core i7), this takes about 0.7 s and creates a 12.1 MB file. Next, we could output this as a binary file using similar loops over the indices:

    open (12, file='out.testio2', form='unformatted')
    do i = 1, 100000
       write (12) (a(i,j), j=1,10)
    enddo
    close (12)

On my laptop, this takes about 0.067 s (10 times faster) and generates a 4.8 MB file. A further improvement occurs when we store the entire array directly:

    open (12, file='out.testio3', form='unformatted')
    write (12) a
    close (12)

On my laptop, this generates a 4.1 MB file in about 0.006 s. The total speed improvement compared to ASCII is about a factor of 120. Similar speed improvements can be noted upon reading these files. The message is that for speedy I/O of large data sets in FORTRAN, we should read/write large arrays in blocks. This is how Guy's GFS programs work, as do my routines to store travel time data.

Of course, it is not always convenient to refer to data within a big array. To make it easier to keep track of things, it is often a good idea to define a data type (structure). For example, suppose our array actually consists of 10000 lines of data with 10 values for each line, consisting, for example, of a 4 character station name, coordinates for the station, and date/time information. In this case, we can define a suitable data type and then create an array of these data types.

    type station_info
       character (len=4) :: stname
       real :: slat, slon, selev, sec
       integer :: iyr, imon, iday, ihr, imn
    end type station_info
    type(station_info), dimension(10000) :: stlist

We can now perform I/O very simply using the stlist array, while preserving the ability to access its individual parts. For example, stlist(5633)%stname is the station name of the 5633rd line of data, stlist(232)%slat is the latitude of the 232nd line, etc.

F77 does not allow for structures, but a kludge is possible using the equivalence statement. For example, we could define a small array aa with 10 elements and set these elements to be equal to the individual variable names that we want:

          real aa(10)
          common/com1/stname,slat,slon,selev,iyr,
         &            imon,iday,ihr,imn,sec
          equivalence (aa(1),stname)

The equivalence statement places the elements of aa at the same locations in memory as the 10 variables defined in the common block. The beauty of this is that these variables can be of mixed data types. In order to examine record 523, we just set aa to the appropriate part of a:

          i=523
          do 50 j=1,10
             aa(j)=a(i,j)
    50    continue

The stname, slat, etc., are now set. This trick is used in Guy's GFS routines and in many of my older I/O routines.
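The idea of one 40-byte record holding mixed data types can also be sketched outside of Fortran. Here is a minimal illustration using Python's standard struct module. The format string, the field order (copied from the station_info type above), the helper names, and the station values are all assumptions made for this sketch; it is not how the GFS routines are written.

```python
import struct

# Assumed layout, mirroring station_info: a 4-character name, four 4-byte
# reals, then five 4-byte integers ("<" = little-endian, no padding).
RECORD_FORMAT = "<4s4f5i"

def pack_station(stname, slat, slon, selev, sec, iyr, imon, iday, ihr, imn):
    """Return one station record as 40 raw bytes."""
    name = stname.encode("ascii").ljust(4)      # pad short names with blanks
    return struct.pack(RECORD_FORMAT, name, slat, slon, selev, sec,
                       iyr, imon, iday, ihr, imn)

def unpack_station(raw):
    """Split 40 raw bytes back into the ten named fields."""
    fields = struct.unpack(RECORD_FORMAT, raw)
    return (fields[0].decode("ascii"),) + fields[1:]

# Hypothetical record, for illustration only.
raw = pack_station("PFO", 33.5, -116.5, 1280.0, 12.5, 2012, 11, 30, 8, 15)
record = unpack_station(raw)
```

Each record occupies exactly the space of ten 4-byte values, which is why the equivalence trick with a 10-element real array works.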
4.24.1 Ascii versus binary files
There is a tradeoff between speed and convenience for ascii versus binary files. Ascii is easier to work with and to understand and is often the preferred format when file size or speed is not a problem. However, for large data sets and data processing, binary is often better because it permits much faster I/O and the files will generally be smaller than their ascii counterparts. You are also assured of retaining the full machine precision for numerical values (you don't risk deciding later that you really should have stored 4 places to the right of the decimal, not 3). However, binary formats are somewhat less portable than ascii and require more user knowledge about their format.
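The size and precision differences are easy to check for a single line of ten values. The following is only a sketch in Python (not the testio programs): the value 1/3 and the variable names are made up, and "%12.4e" stands in for the Fortran e12.4 edit descriptor, which has the same width but a slightly different appearance.

```python
import struct

values = [1.0 / 3.0] * 10            # one "line" of ten reals

# ASCII: ten 12-character fields, plus a newline.
ascii_line = "".join("%12.4e" % v for v in values) + "\n"

# Binary: the same ten values as 4-byte reals, with no formatting at all.
binary_line = struct.pack("<10f", *values)

# Read the first value back from each form.
ascii_back = float(ascii_line[:12])
binary_back = struct.unpack("<10f", binary_line)[0]
```

The ascii line is 121 bytes long, consistent with a 12.1 MB file for 100000 lines, while the binary line is 40 bytes. The value recovered from the binary form is also closer to 1/3, because the ascii form kept only five significant digits.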
ASSIGNMENT F20
Compile and run the program testio.f90 on your computer and send me the timing results for the three different ways to perform the I/O, along with the details of your computer (name, type, chip, clock speed). Then try it again after compiling with the -O option. Does -O help with I/O speed?
Chapter 5
Fun programs
Let's be honest: the examples in most programming books are pretty boring. Devising efficient sorting algorithms may be of academic interest, but these programs are pretty far from the science that most of us do. But the only way to get good at programming is to write lots of programs. It doesn't really matter what they do, as long as they do something. So to help us stay motivated, this chapter contains examples of programs that are fun to write and fun to run.
5.1 Tic-tac-toe
"...nearly all the people I take down there have precisely the same response to the prospect of playing ticktacktoe with a chicken. After looking the situation over, they say, 'The chicken gets to go first!' 'But she's a chicken,' I say. 'You're a human being. Surely there should be some advantage in that.' Some of my guests, I always report with some embarrassment, don't stop there. Some of them say, 'The chicken plays every day. I haven't played in years.'"
Calvin Trillin, The New Yorker, Feb. 8, 1999
As a goal for some of our F90 programming exercises, we are going to write a program to play tic-tac-toe. Although tic-tac-toe is a fully solved game in the sense that it is well known that a draw always results if both players play optimally (indeed even chickens have been trained to play the game), it is of sufficient complexity that writing a working program may seem a daunting task to the beginning programmer.
As is usually the case, however, the problem can be made more tractable by splitting the task up into smaller pieces. Let us consider how we are going to approach this problem. Clearly we will need a way to store the current state of the board (who has moved where), some kind of flag for who is x and who is oh, some kind of flag for who goes first, etc. We will also need a way to display the board and to input moves from the human player.

There are obviously many, many different ways to write this program. However, to make sure that our programs will readily be able to play each other, it will be good to agree on some standardization of the different components. First, let us assume that the spaces on the board are defined by the numbers 1-9, e.g.,

     1 | 2 | 3
    ---+---+---
     4 | 5 | 6
    ---+---+---
     7 | 8 | 9

Thus, when the program asks the human for a move, the human will enter a number from 1 to 9. For convenience, let's use the same convention for storing the moves within the program. Define an array called board that will store the moves that have been made:

    integer, dimension(9) :: board

Now, let us specify that board will have the value 0 if the space is still open, 1 if the human has moved in the space, and 2 if the computer has moved in the space. Obviously there are other choices that we could have made for this (1 for x, 2 for oh; 1 for 1st player, 2 for 2nd player), but this makes the most sense to me because the human vs. computer difference is fundamental to the program. x vs. oh is simply a naming convention; 1st vs. 2nd will simply determine where the program starts.

Let us begin by writing an external subroutine that will print to the screen the current state of the board. We will need to input to the subroutine the board variable and a variable to define which player is x in the game. In other words, this subroutine might have the form:

    subroutine printboard(board, playerx)
We assume that playerx = 1 if the human is x, playerx = 2 if the computer is x.

ASSIGNMENT TTT1
Write the external subroutine printboard, together with a test main program to check if it is working properly. Here are some examples:

    board = (/ 1, 2, 1, &
               0, 2, 0, &
               0, 0, 0 /)
    playerx = 1

printboard should produce:

     x | o | x
    ---+---+---
       | o |
    ---+---+---
       |   |

    board = (/ 0, 0, 0, &
               0, 2, 1, &
               0, 0, 2 /)
    playerx = 2

printboard should produce:

       |   |
    ---+---+---
       | x | o
    ---+---+---
       |   | x

Note that in these examples board is a 1-D array with nine elements. The & line continuation characters are used only to make the mapping to the 3x3 board more obvious. Make sure that you do not change any of the values in the board array or that of playerx. You only want to print out the current state of the board. Hint: Define a character array cboard(9) within your subroutine, the elements of which you set to ' ', 'x', or 'o', depending upon the contents of board and playerx. Then write a series of print statements to do the actual output.
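To make the board and playerx conventions concrete, here is a minimal sketch of the same logic in Python (the assignment itself is to be written in F90). The helper name board_lines and the decision to build the text as a list of strings are choices made for this sketch only.

```python
def board_lines(board, playerx):
    """Return the three rows and two separators as a list of strings."""
    # board uses the convention above: 0 = open, 1 = human, 2 = computer.
    # playerx says which of players 1 and 2 is displayed as 'x'.
    symbols = []
    for value in board:
        if value == 0:
            symbols.append(" ")
        elif value == playerx:
            symbols.append("x")
        else:
            symbols.append("o")
    lines = []
    for row in range(3):
        a, b, c = symbols[3 * row: 3 * row + 3]
        lines.append(" %s | %s | %s" % (a, b, c))
        if row < 2:
            lines.append("---+---+---")
    return lines

def printboard(board, playerx):
    """Print the board without modifying board or playerx."""
    print("\n".join(board_lines(board, playerx)))
```

Separating the construction of the text from the printing makes the routine easy to test, and neither function changes its arguments.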
Next, our program will need some way to determine if a proposed move is a valid move, i.e., if there is currently a blank space in the move location. To implement this, let us write a function of the form:
    integer function checkmove(board, move)

which will return 1 if the move is valid, 0 otherwise. Note that board is the array defined above and move is the proposed move.

ASSIGNMENT TTT2
Write the function checkmove, together with a test main program to check if it is working properly. To make the function robust, return zero if move is outside of the range 1 to 9. Make sure you do not change any of the values in the board array or of the move variable. Also, do NOT print out anything within this function to warn of an invalid move. We may want to use this function for multiple purposes in the program and we don't necessarily want it to print anything out. If a warning is necessary, the calling program can print out the message.
Next, we need a function that will prompt the human for a move. It should check to see if the move is valid and prompt the human again in the cases of invalid moves. Let's call this function readmove:

    integer function readmove(board)

which will return a valid move as entered by the keyboard.

ASSIGNMENT TTT3
Write the external function readmove, together with a test main program to check if it is working properly. The function should ask the user to enter a move, then use the checkmove function to check whether it is valid. If it is valid, return the move to the main program. If it is invalid, then alert the user and prompt again for a valid move. Repeat until a valid move is entered. IMPORTANT: Do not change the board array within the readmove function. That should be done later in the main program.
Next, we need a function to determine if there is a current winner (3 in a row) or if the game is drawn (all spaces filled, no winner). This function will have the form:

    integer function checkwinner(board)

and will return 1 if player 1 has won, 2 if player 2 has won, 3 if the game is drawn, and 0 otherwise.

ASSIGNMENT TTT4
Write the external function checkwinner, together with a test program to check if it is working properly. Make sure you do not change the board array. Hint: Define a 2-D integer array that stores the locations of the eight possible ways to win, e.g.,

    integer, dimension(8,3) :: win
    win(1,1:3) = (/ 1, 2, 3 /)
    win(2,1:3) = (/ 4, 5, 6 /)
    win(3,1:3) = (/ 7, 8, 9 /)
    win(4,1:3) = (/ 1, 4, 7 /)
       .
       .
       etc.

Then loop over these 8 possibilities and check to see if the 3 board values are all ones or all twos. If there is no winner, then check to see if all board locations are full (the draw criterion).
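The hint above translates almost line for line into other languages. Here is a sketch in Python rather than F90; the name WIN_LINES and the use of tuples are choices made for this sketch, and the last four winning lines are simply the remaining columns and diagonals that the "etc." stands for.

```python
WIN_LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),    # rows
             (1, 4, 7), (2, 5, 8), (3, 6, 9),    # columns
             (1, 5, 9), (3, 5, 7)]               # diagonals

def checkwinner(board):
    """Return 1 or 2 for a win, 3 for a draw, 0 if the game is still open."""
    for i, j, k in WIN_LINES:
        # Squares are numbered 1-9; Python lists start at 0, hence the "- 1".
        first = board[i - 1]
        if first != 0 and first == board[j - 1] == board[k - 1]:
            return first
    if 0 not in board:
        return 3          # board is full and nobody has three in a row
    return 0
```

Checking for a winner before checking for a full board matters: the ninth move can complete three in a row, and that should count as a win rather than a draw.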
The most challenging part of your TTT program will be determining the best move for the computer to make. We will implement this as a function:

    integer function findmove(board)

which will return the best move for the computer (player 2) to make. Testing this function will be easiest once the complete program is finished. To complete the program, however, one needs a working version of findmove. To get around this problem, let's write findmove in two stages. First, let's write a very stupid version that will simply return a valid move. Then, after we have the complete program working, we can make improvements to findmove.

ASSIGNMENT TTT5
Write a working version of findmove, together with a test main program to check if it is working properly. Important: Do NOT change the board array within the function; that should be done only within the main program. Your function should not use any arguments other than board; that is, playerx is not needed and should not be used, nor should your findmove function need to know who is going first. Of course, you can figure out if you are making the first move within findmove simply by checking to see if the board is empty. Hint: Just loop through the moves 1 to 9 until you find an empty space (board(i) = 0). For a slightly more interesting stupid program, pick a random valid move.
Now we are ready to put these pieces together into a main program that will actually play the game. This program should ask the user if he/she wants to go first and if he/she wants x or oh (the latter is the playerx flag in the printboard subroutine). The board array should be initialized to zero. For each move, the program will have to determine whose move it is (a function of the move number and who went first) and then either prompt the user or use the findmove function to pick a move. Following the move, the program should check for a win or draw. If there is a win or draw, a suitable message should be printed and the program stopped. If there is not a winner/draw, then the program should continue with the next move.

ASSIGNMENT TTT6
Construct a working version of the complete tic-tac-toe program.
Finally, we will want to improve on our initial findmove function to make the program smarter. An obvious strategy for this is to first check to see if any winning moves are possible. If not, then next check to see if there is a winning move for the human that should be blocked. Beyond this, you're on your own!

ASSIGNMENT TTT7
Refine your program to incorporate a smart findmove function. Send me the complete program, including all functions, in an e-mail. I will check the operation of your program and also attempt to have the programs play each other. To facilitate the latter goal, please give your findmove function the name:

    function findmove_lastname(board)

where lastname is your last name. This will make it easier for me to test your routines all at once. Please only include board in the argument list to findmove; this function should not need to know who is X or who went first. Please also make sure that you do not modify the board array within findmove.
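The win-then-block strategy described above can be sketched as follows, again in Python rather than F90. The helper completing_move and the fallback of taking the first open square are assumptions of this sketch; what to do beyond blocking is deliberately left open.

```python
WIN_LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
             (1, 4, 7), (2, 5, 8), (3, 6, 9),
             (1, 5, 9), (3, 5, 7)]

def completing_move(board, player):
    """Return a square that gives player three in a row, or 0 if none."""
    for line in WIN_LINES:
        values = [board[square - 1] for square in line]
        if values.count(player) == 2 and values.count(0) == 1:
            return line[values.index(0)]
    return 0

def findmove(board):
    """Pick a move for the computer (player 2) without changing board."""
    move = completing_move(board, 2)          # 1. win if possible
    if move == 0:
        move = completing_move(board, 1)      # 2. otherwise block the human
    if move == 0:
        for square in range(1, 10):           # 3. otherwise first open square
            if board[square - 1] == 0:
                return square
    return move                               # 0 only if the board is full
```

Trying for a win before blocking is the important ordering: when both players have two in a row, the computer should finish the game rather than defend.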
5.2 Fractals
Generating beautiful images of fractals, such as the Mandelbrot set, is surprisingly easy on the computer and provides good practice in using complex numbers. The basic idea behind most fractal calculations is that there are certain complex functions that, when computed over and over again, will either diverge or remain bounded. Whether or not they diverge turns out to be extremely sensitive to small changes in the initial complex value that starts the calculation, at least in certain regions of the complex plane.

The most famous of all fractal images is the Mandelbrot set. To generate the Mandelbrot set, we begin by considering a complex number, c. We then apply the following algorithm:

    set z = 0 to start
    then repeatedly compute z = z*z + c
    until |z| > 2 OR the number of iterations exceeds some threshold
    then output the number of iterations

For example, if c = 0.3 + 0.3i then

    1st iteration:  z =  0.30 + 0.30i   |z| = 0.42
    2nd iteration:  z =  0.30 + 0.48i   |z| = 0.57
    3rd iteration:  z =  0.16 + 0.59i   |z| = 0.61
    4th iteration:  z = -0.02 + 0.49i   |z| = 0.49

In this case, z will remain bounded even after an infinite number of iterations. However, if c = 0.5 + 1.0i then

    1st iteration:  z =   0.50 +  1.00i   |z| =   1.1
    2nd iteration:  z =  -0.25 +  2.00i   |z| =   2.0
    3rd iteration:  z =  -3.44 +  0.00i   |z| =   3.4
    4th iteration:  z =  12.32 +  1.00i   |z| =  12.4
    5th iteration:  z = 151.19 + 25.63i   |z| = 153.4

and the size of z explodes toward infinite values. By the 10th iteration it will exceed the floating point range of most computers. However, we can stop computing z once its absolute value exceeds 2, because it can be shown that divergence is guaranteed at this point. (The complex absolute value or modulus is defined as the distance of a point in the complex plane from the origin, i.e., |z| = abs(a+bi) = sqrt(a*a+b*b).)
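The two tables above are easy to reproduce, since Python has a built-in complex type (the imaginary unit is written j). The function names and the convention of returning maxit + 1 for a point that never escapes are choices made for this sketch.

```python
def iterates(c, n):
    """Return the first n values of z, starting from z = 0."""
    z = 0j
    history = []
    for _ in range(n):
        z = z * z + c
        history.append(z)
    return history

def escape_count(c, maxit=1000):
    """Iteration at which |z| first exceeds 2, or maxit + 1 if it never does."""
    z = 0j
    for it in range(1, maxit + 1):
        z = z * z + c
        if abs(z) > 2:
            return it
    return maxit + 1

bounded = iterates(0.3 + 0.3j, 4)      # 4th value is about -0.02 + 0.49i
diverging = iterates(0.5 + 1.0j, 5)    # 5th value is about 151.19 + 25.63i
```

Note that for c = 0.5 + 1.0i the escape test already succeeds at the 2nd iteration, where |z| is about 2.016, so a program using the |z| > 2 test never needs to compute the enormous values further down the table.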
We repeat this calculation for a series of different values of c and plot the resulting output as a function of the location of c on the complex plane. The Mandelbrot set is the set of all complex numbers c for which the size of z^2 + c is finite even after an infinite number of iterations. We may obtain a good approximation, however, by checking after some large number of iterations (1000 is a common choice). We also may take advantage of the fact that it is known that z^2 + c will diverge whenever |z| > 2. Here is an example F90 program that performs this calculation for user-specified parts of the complex plane and outputs a binary file that can be read and plotted using MatLab or Python:

    program mandel
       implicit none
       real, dimension(:), allocatable :: dat
       real :: x1, x2, y1, y2, dx, dy, cr, ci
       integer :: nx, ny, ix, iy, it, idat, ndat, nbytes, status
       integer, dimension(2) :: nxny
       complex :: c, z, z2
       real, dimension(4) :: wind

       print *, 'Enter x1, x2, y1, y2, nx, ny'
       read *, x1, x2, y1, y2, nx, ny
       dx = (x2 - x1) / real(nx)
       dy = (y2 - y1) / real(ny)
       allocate(dat(nx*ny))

       idat=0
       do ix = 1, nx
          do iy = 1, ny
             cr = x1 + dx/2. + dx*real(ix-1)
             ci = y1 + dy/2. + dy*real(iy-1)
             c = cmplx(cr, ci)
             z = cmplx(0.0, 0.0)
             do it = 1, 1000
                z = c + z*z
                if (cabs(z) > 2) exit
             enddo
             idat = idat + 1
             dat(idat) = it
          enddo
       enddo
       print *, 'Finished calculation'
       ndat = idat
       print *, 'ndat = ', ndat

       open (12, file='fractal.bin', form='unformatted')
       nxny(1) = nx
       nxny(2) = ny
       write (12) nxny
       wind(1) = x1
       wind(2) = x2
       wind(3) = y1
       wind(4) = y2
       write (12) wind
       write (12) dat
    end program mandel
The program computes the result for a grid of points in the complex plane, defined by the user-defined limits x1 and x2 for the real part and y1 and y2 for the imaginary part. The number of points computed in x and y are input as nx and ny. This will determine the resolution of the plot. For example, nx=50 and ny=50 will result in an image 50 pixels wide by 50 pixels tall. Although the image is really two-dimensional, I store the results in the 1-D array, dat, because this simplifies the output of the results. Notice how allocate is used to set dat to dimension(nx*ny) on the fly. The spacing in x and y are defined by:

    dx = (x2 - x1) / real(nx)
    dy = (y2 - y1) / real(ny)

Note the conversion of the integer variables nx and ny to floats prior to performing the division. Here are the main loops that actually do the calculation:

    idat=0
    do ix = 1, nx
       do iy = 1, ny
          cr = x1 + dx/2. + dx*real(ix-1)
          ci = y1 + dy/2. + dy*real(iy-1)
          c = cmplx(cr, ci)
          z = cmplx(0.0, 0.0)
          do it = 1, 1000
             z = c + z*z
             if (cabs(z) > 2) exit
          enddo
          idat = idat + 1
          dat(idat) = it
       enddo
    enddo
The real and imaginary parts of c are set to the center of the pixel with sides dx and dy. The calculation is done for a maximum of 1000 iterations. The calculation is halted if |z| > 2 by exiting the do loop. The iteration number is saved in the dat array, using idat as a counter.

The next step is to save this information to a file. For speed in the case of a large dat array, we use a binary file. We create and open the file with

    open (12, file='fractal.bin', form='unformatted')

We use form='unformatted' to specify that this is a binary file, not an ascii file. Before writing the dat array to the file, we write some header information that will help MatLab or Python to know the size of the array and the values of x1, x2, y1, and y2. Writing to a binary file in Fortran is done using the write statement without a format. We set up the 2-element integer array, nxny, to handle the output of nx and ny:

    nxny(1) = nx
    nxny(2) = ny
    write (12) nxny

and we set up a 4-element float array, wind, to handle the output of x1, etc.

    wind(1) = x1
    wind(2) = x2
    wind(3) = y1
    wind(4) = y2
    write (12) wind

Finally, we write the output array to the file

    write (12) dat

For this program, it is a good idea to compile with the -O compiler option to make it run faster:

    gfortran -O mandel.f90 -o mandel

The Mandelbrot set is approximated in our program by places on the complex plane where the z function has not diverged even after 1000 iterations. For these points the output value (contained in the dat array) is 1001, because a Fortran do loop that runs to completion leaves its index one past the upper limit. Output values of 1000 or less provide a measure of how quickly the function diverges (small values indicate rapid divergence, larger values indicate slower divergence). Plots of the output usually use colors to indicate the different output values.
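The pixel-center formula and the order in which values land in dat can be checked with a few lines of Python. This is a sketch only: the function names are made up, the indices here start at 0 (so ix here equals ix-1 in the Fortran loops), and the window -2 to 0.5 with 5 pixels is chosen just because it gives round numbers.

```python
def pixel_centers(lo, hi, n):
    """Centers of n equal-width pixels spanning lo..hi."""
    step = (hi - lo) / float(n)      # float() avoids integer division
    return [lo + step / 2.0 + step * i for i in range(n)]

def flat_index(ix, iy, ny):
    """0-based position in dat of pixel (ix, iy), with iy varying fastest."""
    return ix * ny + iy

xs = pixel_centers(-2.0, 0.5, 5)     # dx = 0.5
```

Because the iy loop is the inner one, all ny values for the first x position are stored before any value for the second x position. Any program that reads dat back into a 2-D array has to use this same order.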
5.2.1 Plotting using MatLab
Here is how to plot our output using Matlab:

    clf reset
    clear
    [fid, message] = fopen('fractal.bin')
    % Fortran I/O has extra four bytes at beginning and end of writes
    [dum, count] = fread(fid, [1], 'int');
    [size, count] = fread(fid, [2], 'int');
    [dum, count] = fread(fid, [1], 'int');
    nx = size(1)
    ny = size(2)
    [dum, count] = fread(fid, [1], 'int');
    [wind, count] = fread(fid, [4], 'float')
    [dum, count] = fread(fid, [1], 'int');
    x1 = wind(1)
    x2 = wind(2)
    y1 = wind(3)
    y2 = wind(4)
    [dum, count] = fread(fid, [1], 'int');
    % dat is declared real in mandel.f90, so it is read as float
    [z, count] = fread(fid, [nx,ny], 'float');
    fclose(fid);
    z2 = log10(z);
    axis( [x1, x2, y1, y2] );
    axis('square');
    hold on;
    imagesc( [x1, x2], [y1, y2], z2 );
    colorbar;

One detail is that Fortran (unlike C) typically writes an extra 4 bytes before and after each of the unformatted binary writes. So we need to read these extra bytes with our Matlab script into a dummy variable or we will not be properly aligned with the data fields. We set the output array to z and use imagesc to plot the result using the default color scheme. We plot log10(z) rather than z because it shows some of the details a little better.
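To see where the extra bytes sit, here is a small sketch in Python that builds a stand-in for fractal.bin in memory and reads it back one record at a time. It assumes that each marker is a 4-byte little-endian integer holding the length of the record in bytes, which is my understanding of gfortran's default on Intel machines; other compilers or options may differ. The helper names and the 2 x 3 grid of values are made up.

```python
import io
import struct

def write_record(stream, payload):
    """Write payload framed by its byte count, before and after."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)

def read_record(stream):
    """Read one framed record and return its payload bytes."""
    (nbytes,) = struct.unpack("<i", stream.read(4))
    payload = stream.read(nbytes)
    (check,) = struct.unpack("<i", stream.read(4))
    if check != nbytes:
        raise ValueError("record markers disagree")
    return payload

# Build a small stand-in for fractal.bin in memory.
stream = io.BytesIO()
write_record(stream, struct.pack("<2i", 2, 3))                      # nx, ny
write_record(stream, struct.pack("<4f", -2.0, 0.5, -1.25, 1.25))    # window
write_record(stream, struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 1001.0))
stream.seek(0)

nx, ny = struct.unpack("<2i", read_record(stream))
x1, x2, y1, y2 = struct.unpack("<4f", read_record(stream))
dat = struct.unpack("<%df" % (nx * ny), read_record(stream))
```

Reading the leading marker first has the advantage that the reader learns the record length from the file itself, and comparing it with the trailing marker catches misaligned reads early.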
5.2.2 Plotting using Python
Here is one way to plot our output using Python, a script called plotfractal.py:

    #! /usr/bin/env python
    import numpy
    import array
    import matplotlib
    matplotlib.use("TkAgg")                #set "backend" so plot appears on screen
    import pylab

    nxny = array.array('l')
    wind = array.array('f')
    dat = array.array('f')

    file = open('fractal.bin', mode='rb')  #open in binary mode

    b1 = nxny.fromfile(file, 4)            #deprecated .read does same thing as .fromfile
    nx = nxny[1]                           #note that nxny goes from 0 to 3, we exclude ends
    ny = nxny[2]
    print 'nx, ny = ', nx, ny

    b2 = wind.fromfile(file, 6)
    x1 = wind[1]
    x2 = wind[2]
    y1 = wind[3]
    y2 = wind[4]
    print 'x1,x2,y1,y2 =', x1, x2, y1, y2

    b3 = dat.fromfile(file, 1 + nxny[1]*nxny[2])
    print 'dat[1] to dat[4] = ', dat[1], dat[2], dat[3], dat[4]

    dx = (x2 - x1)/nx
    dy = (y2 - y1)/ny
    x = numpy.arange(x1 + dx/2., x2, dx)
    y = numpy.arange(y1 + dy/2., y2, dy)
    X,Y = pylab.meshgrid(x,y)
    z = X + Y

    k = 0
    for iy in range(0, ny):                #only goes to ny-1!
        for ix in range(0, nx):            #only goes to nx-1!
            k = k + 1
            z[ix, iy] = dat[k]

    zlog = numpy.log10(z)                  #plots look better if we take log

    pylab.imshow(zlog,interpolation='bilinear',\
                 extent=(x1,x2,y1,y2), \
                 cmap=pylab.cm.Spectral)   # cm is color map (nice choices: Accent, Spectral, Reds)
    pylab.colorbar()
    pylab.axis('equal')
    pylab.show()

Because I'm not a Python expert, there are probably better ways to do this, but this is what I have figured out so far.
We follow Lisa's example for plotting the gravity anomaly and import matplotlib. We also import array, which helps in setting up the binary input format. The following statements set up the format for the three arrays in the file, where 'l' is for reading long (4-byte) integers and 'f' is for reading reals. (The 'l' type is 4 bytes on the 32-bit Python used here; on many 64-bit systems it is 8 bytes and 'i' is the 4-byte integer type.)

    nxny = array.array('l')
    wind = array.array('f')
    dat = array.array('f')

The following statement opens up the binary file fractal.bin for reading in binary mode:

    file = open('fractal.bin', mode='rb')  #open in binary mode

Next, we read into the nxny array:

    b1 = nxny.fromfile(file, 4)

This array has 4 values, but the first and last are discarded because Fortran writes an extra 4 bytes before and after every binary write. Because the array index goes from 0 to 3, we assign the middle two values as:

    nx = nxny[1]
    ny = nxny[2]

In a similar manner, we read in and assign the wind array and the x1, x2, y1, y2 values:

    b2 = wind.fromfile(file, 6)
    x1 = wind[1]
    x2 = wind[2]
    y1 = wind[3]
    y2 = wind[4]

Knowing the values of nx and ny, we read in the appropriate sized dat array:

    b3 = dat.fromfile(file, 1 + nxny[1]*nxny[2])

Note that b1, b2, and b3 are not used. This seems awkward to me, but as a Python novice I'm not sure how else to do this. Next, we compute dx and dy and set up x,y and X,Y in the same way that Lisa did in her gravity example:
    dx = (x2 - x1)/nx
    dy = (y2 - y1)/ny
    x = numpy.arange(x1 + dx/2., x2, dx)
    y = numpy.arange(y1 + dy/2., y2, dy)
    X,Y = pylab.meshgrid(x,y)

Note that the last value of x will be x2 - dx/2.

Next, we need to assign the values in the dat array to a 2-D array aligned with X,Y. Following Lisa's example, I first assign z to dummy values in a way that ensures that it has the correct dimensions:

    z = X + Y

Next, we explicitly copy the values from dat to the z array:

    k = 0
    for iy in range(0, ny):                #only goes to ny-1!
        for ix in range(0, nx):            #only goes to nx-1!
            k = k + 1
            z[ix, iy] = dat[k]
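One possible alternative to the explicit copy loops is to let NumPy rearrange the values. The sketch below is not the script above: it assumes the leading 4-byte marker has already been stripped from dat, it uses made-up values on a tiny 3 x 2 grid, and it relies on the storage order of mandel.f90 (ix in the outer loop, iy in the inner loop).

```python
import numpy

nx, ny = 3, 2
# Made-up values in the order mandel.f90 produces them: ix outer, iy inner.
dat = numpy.array([10.0, 11.0,    # ix = 0: iy = 0, 1
                   20.0, 21.0,    # ix = 1
                   30.0, 31.0])   # ix = 2

# reshape gives an array indexed [ix, iy]; the transpose is indexed [iy, ix],
# so rows run along y and columns along x, matching the shape of X and Y
# returned by meshgrid(x, y).
z = dat.reshape(nx, ny).T
```

This version is intended to work when nx and ny differ, since the shape of z comes directly from nx and ny rather than from a dummy array.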
The plot will look nicer if we take the log before plotting:

    zlog = numpy.log10(z)

Finally we make the plot, again following Lisa's example:

    pylab.imshow(zlog,interpolation='bilinear',\
                 extent=(x1,x2,y1,y2), \
                 cmap=pylab.cm.Spectral)   # cm is color map (nice choices: Accent, Spectral, Reds)
    pylab.colorbar()
    pylab.axis('equal')
    pylab.show()

Note that we have added the extent option to properly define the axes labels. It's fun to explore the effect of different color maps in Python. Here is a site that shows lots of them: http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps
5.2.3 Good targets and more about fractals
The entire Mandelbrot set may be imaged in the window real -2 to 0.5, imag -1.25 to 1.25 (i.e., -2, 0.5, -1.25, 1.25)
It is also fun to zoom in on details of the image. Here are some suggestions for places to look (each line gives x1, x2, y1, y2):

    -0.95   -0.90   0.25    0.30
    -0.75   -0.65   0.25    0.35
    -0.73   -0.72   0.282   0.292
    -0.703  -0.693  0.26    0.27

Further details about generating fractals may be found in a series of articles by A.K. Dewdney in the Computer Recreations section that used to appear in Scientific American. The August 1985 article describes the Mandelbrot set, the November 1987 article talks about the related Julia sets, and the July 1989 article is about fractals called biomorphs that look like little creatures. There are also lots of books on fractals that have many beautiful images. The study of fractals is related to chaos theory and the fact that physical systems (weather is one example) are inherently unpredictable on long time scales because small perturbations to the starting conditions will cause large changes over time.

ASSIGNMENT FRAC1
Rewrite the mandel.f90 and plotfractal.m (or plotfractal.py) programs to plot examples of the Julia sets and the biomorphs. You will want to first read the appropriate Scientific American articles.
1. Produce PDF files of 3 especially nice images of Julia sets.
2. Reproduce and make images of as many of the biomorphs as you can that are shown on p. 110 of the biomorph article.
Part (2) is harder because you will have to generate more complicated complex functions than those that are used for the Mandelbrot and Julia sets. You may have to consult a book on complex numbers to find suitable formulae to compute sin(z), z^z, etc. Please e-mail me the PDF files and copies of your F90 and Matlab or Python source code. Have fun!
Chapter 6
Lisa's Python Notes

6.1 Introducing Python
You may be wondering why, after spending over a month learning Fortran, we are now switching to Python. The answer is, once you've done all your fancy calculations in Fortran, how were you planning on looking at your data? Now that you have experienced GMT, you know it is possible to do a lot of different plots with, say, psxy, but getting your data into a suitable format will require some manipulation and the plots are rather clunky. We at SIO often use Bob Parker's Plotxy program, which produces lovely X-Y plots. Other options are Excel, Kaleidograph, or Matlab, all of which are expensive (especially Matlab), and Excel plots are downright ugly. But the modern standard is to be able to plot in 3D and to make plots interactive.

So, why Python? It is flexible, freely available and cross platform. It is easy to learn and well documented. There are numerous online tutorials, for example at: http://docs.python.org/tutorial/ or this new (and very excellent) one: http://www.python-course.eu/index.php. There are lots of numerical, statistical and visualization packages. Finally, it is easy to install. Really. Go to: http://www.enthought.com/products/edudownload.php, supply your academic e-mail address and follow the instructions. The only potentially tricky bit is setting your path properly. To do this, just put the following line at the end of your .cshrc file:

    set path = (/Library/Frameworks/Python.framework/Versions/Current/bin/ $path)

To make sure you have installed Python properly, simply type:

    % python

You should get something that looks like this:

    Enthought Python Distribution -- www.enthought.com
    Version: 7.1-2 (32-bit)

    Python 2.7.2 |EPD 7.1-2 (32-bit)| (default, Jul 27 2011, 13:29:32)
    [GCC 4.0.1 (Apple Inc. build 5493)] on darwin
    Type "packages", "demo" or "enthought" for more information.
    >>>

If you don't get something about Enthought Python, you are probably using the standard Mac OS version, which has none of the whistles and bells we want (plotting, numerical packages, etc.), so start over with the Introduction. Otherwise, we are ready to begin!
6.2 An Interactive Python session
As you have just fired up Python, you are in an interactive mode with the prompt >>>. Like Matlab, you can just start typing commands. After each command, press the return key and see what happens:

    >>> a=2
    >>> a
    2
    >>> b=2
    >>> c=a+b
    >>> c
    4
    >>> c+=1
    >>> c
    5
    >>> a=2; b=2; c=a+b; c
    4
    >>> d,e,f=4,5,6   # note syntax! d=4;e=5;f=6
    >>> # note also that the symbol # means the rest of the line is a comment!
To get out of Python interactive mode and back to your beloved command line, type the control key (hereafter ^) and D at the same time. PC users may have to type ^C instead.

From your Fortran programming experience, you will recognize a, b, and c in the above session as variables and + as an operation. You may have been surprised that we didn't declare variables up front (C programmers always have to). And, the variables above are obviously behaving as integers, not floating point variables (no decimals), but they are not letters between i and n. And there were funny lines that seemed to set three numbers at once:

    >>> a=2; b=2; c=a+b; c

and

    >>> d,e,f=4,5,6   # note syntax! d=4;e=5;f=6

Here are some rules governing variables and operations in Python:
• Variable names are composed of alphanumeric characters and the underscore (_); they cannot contain a hyphen or start with a digit.
• They are case sensitive: 'a' is not the same as 'A'.
• They do NOT have to be specified in advance (unlike C).
• In Fortran, integers are i-n and floating points are all else. This is not the case in Python. You can make them whatever you want.
• + adds, - subtracts, * multiplies, / divides, % gives the remainder, ** raises to the power.
• These two are fun: += and -=. They increment and decrement respectively.
• Parentheses determine order of operation (as in Fortran).
• For math functions, we can use various modules that either come standard with Python (the math module) or are additions that come with the Enthought Python Edition we are using (the NumPy module). A module is just a collection of functions.

NB: There is online help for any Python function or method: just type help(FUNC).
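A few of these rules are easiest to absorb by trying them. The following lines are a small sketch with made-up variable names; the values in the comments are what each name holds after its line has run.

```python
d, e, f = 4, 5, 6        # three assignments at once
d, e = e, d              # the same syntax swaps two values: d is 5, e is 4

count = 10
count += 3               # same as count = count + 3, giving 13
count -= 1               # same as count = count - 1, giving 12

remainder = 17 % 5       # 2, what is left after dividing 17 by 5
power = 2 ** 10          # 1024
grouped = (1 + 2) * 3    # 9: parentheses first
ungrouped = 1 + 2 * 3    # 7: multiplication before addition
```

The swap on the second line works because the whole right-hand side is evaluated before anything is assigned, so no temporary variable is needed.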
6.3 A first look at NumPy
O.K. First of all: how do you pronounce NumPy? According to Important People at Enthought (e.g., Robert Kern), it should be pronounced "Num" as in Number and "Pie" as in, well, pie, or Python. Oops. It is way more fun to say Numpee! So, that out of the way, what can NumPy do for us? Turns out, a whole heck of a lot! But for now we will just scratch the surface. It can, for example, give us the value of π as numpy.pi. Note how the module name comes first, then the name of the function (or, as here, the constant) we wish to use. In this case, numpy.pi just returns the value of π. To use NumPy functions, we must first import the module with the command import. The first time you do this after installing Python, it may take a while, but after that it should be very quick.
There are four styles of the import command which all do pretty much the same thing but differ in how you have to call the function after importing:

    >>> import numpy
    >>> numpy.pi
    3.1415926535897931

This makes all the functions in NumPy available to you, but you have to call them with the numpy.FUNC syntax.

    >>> import numpy as np   # or anything else! e.g., some use: import numpy as N
    >>> np.pi                # or N.pi in the second case.
    3.1415926535897931

This does the same as the first, but allows you to call NumPy anything you want, to save typing. To import all the functions from NumPy and not have to type numpy at all:

    >>> from numpy import *
    >>> pi
    3.1415926535897931

This imports all the umpty-ump functions, which is a heavy load on your memory, but you can also just import a few, like pi or sqrt:

    >>> from numpy import pi, sqrt   # pi and square root
    >>> pi
    3.1415926535897931
    >>> sqrt(4)
    2.0

Did you notice how sqrt(4), where 4 was an integer, returned a floating point variable (2.0)?

Good housekeeping Tip #1: I tend to import modules using the first option above. That way I know what module the functions I'm using are coming from, especially because we don't know off-hand ALL the functions available in any given module and there might be conflicts with my own function names, or two different modules could have the same function (like math and numpy).

Here is a (partial) list of some useful NumPy functions:
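Here is a small sketch of why the module prefix is worth keeping. It uses two modules that ship with every Python, math and cmath, instead of math and numpy; they serve only as a stand-in for the same kind of name clash. Both define a function called sqrt, but the two do not behave the same way.

```python
import math
import cmath      # standard module with complex-number versions of the same names

real_root = math.sqrt(4)          # 2.0
complex_root = cmath.sqrt(-4)     # 2j

# math.sqrt refuses negative arguments; cmath.sqrt does not.
try:
    math.sqrt(-4)
    math_failed = False
except ValueError:
    math_failed = True
```

With "from math import *" followed by "from cmath import *", the name sqrt would silently refer to whichever module was imported last, which is exactly the kind of conflict the tip above warns about.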
    absolute(x)     absolute value
    arccos(x)       arccosine
    arcsin(x)       arcsine
    arctan(x)       arctangent
    arctan2(y,x)    arctangent of y/x in correct quadrant (***very useful!)
    cos(x)          cosine
    cosh(x)         hyperbolic cosine
    exp(x)          exponential
    log(x)          natural logarithm
    log10(x)        base 10 log
    sin(x)          sine
    sinh(x)         hyperbolic sine
    sqrt(x)         square root
    tan(x)          tangent
    tanh(x)         hyperbolic tangent

Note that in the trigonometric functions, the argument is in RADIANS! You can convert from degrees to radians by multiplying by numpy.pi/180. Also notice how these functions have parentheses, as opposed to numpy.pi which has none. The difference is that these take arguments, while numpy.pi just returns the value of π.

If you are desperate, you can always use your Python interpreter as a calculator:

    >>> import numpy
    >>> a=2
    >>> b=-12
    >>> c=16
    >>> quad1 = (-b + numpy.sqrt(b**2-4.*a*c))/(2.*a)
    >>> quad1
    4.0
    >>> y=numpy.sin(numpy.pi/6.)
    >>> y
    0.49999999999999994

The last result is 0.5 to within floating point rounding.
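The calculator session can be extended a little to show both roots of the quadratic, the degrees-to-radians conversion, and why arctan2 is "very useful". This sketch uses the standard math module so that it runs without NumPy; there the two-argument arctangent is spelled atan2 rather than arctan2. The function name quadratic_roots is made up.

```python
import math

def quadratic_roots(a, b, c):
    """Both real roots of a*x**2 + b*x + c = 0 (assumes b**2 >= 4*a*c)."""
    root = math.sqrt(b ** 2 - 4.0 * a * c)
    return (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)

roots = quadratic_roots(2, -12, 16)            # (4.0, 2.0)

angle = math.radians(30.0)                     # same as 30 * pi / 180
half = math.sin(angle)                         # close to 0.5, not exactly

# atan only sees the ratio y/x; atan2 also uses the signs of y and x.
plain = math.degrees(math.atan(-1.0 / -1.0))   # 45: wrong quadrant for (-1, -1)
full = math.degrees(math.atan2(-1.0, -1.0))    # -135: third quadrant
```

The point (-1, -1) lies in the third quadrant, but the ratio y/x is +1, the same as for (1, 1), so the one-argument arctangent cannot tell the two apart.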
6.4
Variable types
The time has come to talk about variable types. We've been very relaxed up to now, because we don't have to declare them up front and we can often even change them from one type to another on the fly. But variable types matter, so here goes. Like Fortran, Python has integer (both plain and long), floating point, string and complex variable types. It is pretty clever about figuring out what is required. Here are some examples:

    >>> number=1 # an integer
    >>> Number=1.0 # a floating point
    >>> NUMBER='1' # a string
    >>> cnum=1j # a complex number with imaginary part 1
    >>> complex(3,1) # the complex number 3+1j

Try doing math with these!

    >>> number+number
    2        [an integer]
    >>> number+Number
    2.0      [a float]
    >>> NUMBER+NUMBER
    '11'     [a string]
    >>> number+NUMBER
    [Gives you an angry error message]

Lesson learned: you can't add a number and a string, and string addition is different! But you really have to be careful with this. Multiplying a float by an integer gives you a float, but if you divide one integer by another, Python 2 throws away the remainder (7/2 gives 3, not 3.5) when you really wanted all those numbers after the decimal! So, if you want a float, use a float. You can convert from one type to another (if appropriate) with:

    int(Number); str(number); float(NUMBER); long(Number); complex(real,imag)
long() converts to a long (unlimited size) integer and complex() converts the two parts to a complex number. There is another kind of variable called boolean. It has just two values, True and False, which can be combined with the operators and, or, and not. For the record, the integer 1 is true and 0 is false. These can be used to control the flow of the program as we shall learn later.
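The conversions above can be tried out in a short script. This is only a sketch: the names as_int, as_str, as_float, truncated and total are invented for the example, and long() is left out because it exists only in Python 2.

```python
number = 1        # an integer
Number = 1.0      # a floating point
NUMBER = '1'      # a string

as_int = int(Number)       # float -> integer
truncated = int(2.9)       # the fractional part is simply dropped
as_str = str(number)       # integer -> string
as_float = float(NUMBER)   # string -> float
as_complex = complex(3, 1) # real and imaginary parts -> complex number

# adding a number and a string fails until one of them is converted
try:
    total = number + NUMBER
except TypeError:
    total = number + int(NUMBER)
print(total)

# the booleans True and False behave like the integers 1 and 0
print(True == 1 and False == 0)
```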
6.5
Data Structures
In Fortran, you encountered arrays, which were a nice way to group a sequence of numbers that belonged together. In Python of course we also have arrays, but we also have more complicated data structures, like lists, tuples, and dictionaries, that group arbitrary variables together, like strings and integers and floats - whatever you want really. We'll go through some attributes of the various data structures, starting with lists.
6.5.1
Lists
Lists are denoted with [] and can contain any arbitrary set of items, including other lists!
Items in the list are referred to by an index number, starting with 0. Note that this is different from Fortran, which starts counting in arrays with the number 1. You can count from the end to the beginning by starting with -1 (the last item in the list), -2 (second to last), etc. Items can be sorted, deleted, inserted, sliced, counted, concatenated, replaced, added on, etc. Examples:

    >>> mylist=['a',2.0,400,'spam',42,[24,2]] # defines a list
    >>> mylist[2] # refers to the third item
    400
    >>> mylist[-1] # refers to the last item
    [24, 2]
    >>> mylist[1]=26.3 # replaces the second item
    >>> del mylist[3] # deletes the fourth element

To slice out a chunk of the middle of a list:

    >>> newlist=mylist[1:3]

This copies items 2 and 3 (indices 1 and 2) into newlist (note it takes up to but not including the last item number - don't ask me why). Or, we can slice it this way:

    >>> newlist=mylist[3:]

which takes from the fourth item (index 3, because counting starts from 0!) to the end.

To copy a list: BEWARE! You can make a copy - but it isn't a copy like in Fortran; it is just another name for the SAME OBJECT, so:

    >>> mycopy=mylist
    >>> mylist[2]='new'
    >>> mycopy[2]
    'new'

See how mycopy got changed when we changed mylist? To spawn a new list that is a copy, but an independent entity:

    >>> mycopy=mylist[:]

Now try:

    >>> mylist[2]=1003
    >>> mycopy[2]
    'new'
So now mycopy stayed the way it was, even as mylist changed.

Unlike Fortran, Python is object oriented, a popular concept in coding circles. We'll learn more about what that means later, but for right now you can walk around feeling smug that you are learning an object oriented programming language. O.K., what is an object? Well, mylist is an object. Cool. What do objects have that might be handy? Objects have methods which allow you to do things to them. Methods have the form:

    object.method()

Here are two examples (starting again from the original mylist):

    >>> mylist.append('me too') # appends a string to mylist
    >>> mylist.sort() # sorts the list alphanumerically (numbers first, then lists, then strings)
    >>> mylist
    [2.0, 42, 400, [24, 2], 'a', 'me too', 'spam']

For a complete list of methods for lists, see: http://docs.python.org/tutorial/datastructures.html#more-on-lists
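The difference between a second name and a true copy is easy to check in a script. In this sketch the names alias, independent and words are made up for the illustration; everything else follows the examples above.

```python
mylist = ['a', 2.0, 400, 'spam', 42, [24, 2]]

alias = mylist            # a second name for the SAME list object
independent = mylist[:]   # slicing everything makes a new, separate list

mylist[2] = 'new'         # change the original

print(alias[2])           # the alias sees the change
print(independent[2])     # the copy still holds the old value

# methods act on the object they are attached to
words = ['spam', 'a', 'me too']
words.append('ocelot')    # add an item to the end
words.sort()              # sort in place (all strings, so alphabetical order)
print(words)
```

The second list here holds only strings, which keeps the sort unambiguous; sorting a list that mixes numbers, lists and strings depends on rules that are best not relied upon.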
6.5.2
More about strings
Numbers are numbers. While there are more kinds of numbers (complex, etc.), strings can be more interesting. Unlike in Fortran, they can be denoted with single, double or triple quotes: e.g., 'spam', "Sam's spam", or

    """
    Hi there
    I can type as
    many lines as I want
    """

Strings can be added together (newstring = 'spam' + 'alot'). They can be sliced (newerstring = newstring[0:3]), but they CANNOT be changed in place - you can't do this: newstring[0]='b'. To find more of the things you can and cannot do to strings, see: http://docs.python.org/tutorial/introduction.html#strings
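Here is a small sketch of what "cannot be changed in place" means in practice. The names changed_in_place and bstring are invented for the example; the workaround shown (building a new string from pieces of the old one) is one common approach, not the only one.

```python
newstring = 'spam' + 'alot'     # strings can be added together
newerstring = newstring[0:3]    # slicing: characters 0, 1 and 2

# strings cannot be changed in place ...
try:
    newstring[0] = 'b'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# ... but a new string can be built from pieces of the old one
bstring = 'b' + newstring[1:]

print(newstring)
print(newerstring)
print(changed_in_place)
print(bstring)
```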
6.5.3
Data structures as objects
6.5.4
Tuples
What? Tuples? Tuples consist of a number of values separated by commas. They are denoted with parentheses.

    >>> t = 1234, 2.0, 'hello'
    >>> t
    (1234, 2.0, 'hello')
    >>> t[0]
    1234
Tuples are sort of like lists, but like strings, their elements cannot be changed. However, you can slice, concatenate, etc. For more see: http://docs.python.org/tutorial/datastructures.html#tuples-and-sequences
6.5.5
Dictionaries!
Dictionaries are denoted by {}. They are also somewhat like lists, but instead of integer indices, they use alphanumeric keys. I love dictionaries. So here is a bit more about them. To define one:

    >>> Telnos={'lisa':46084,'lab':46531,'jeff':44707} # defines a dictionary

To return the value associated with a specific key:

    >>> Telnos['lisa']
    46084

To change the value for a key:

    >>> Telnos['lisa']=46048

To add a new key and its value:

    >>> Telnos['newguy']=48888

Dictionaries also have some methods. One useful one is:

    >>> Telnos.keys()
    ['lisa', 'lab', 'jeff', 'newguy']

which returns a list of all the keys (in no particular order). For a more complete accounting of dictionaries, see: http://docs.python.org/tutorial/datastructures.html#dictionaries
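As a quick sketch of how a dictionary is typically used once it is defined, the script below updates the phone list, checks for a missing key, and steps through the keys. The key 'susie' and the use of sorted() to get a predictable order are additions for this illustration.

```python
Telnos = {'lisa': 46084, 'lab': 46531, 'jeff': 44707}

Telnos['lisa'] = 46048        # change the value stored under an existing key
Telnos['newguy'] = 48888      # add a new key and value

# test for a key before using it, to avoid an error for a missing key
if 'susie' not in Telnos:
    print('no number for susie')

# step through the keys in sorted order and look up each value
for name in sorted(Telnos.keys()):
    print('%-8s %5d' % (name, Telnos[name]))
```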
6.5.6
N-dimensional arrays
Arrays in Python are more similar to arrays in Fortran than they are to lists. Unlike lists, arrays have to be all of the same data type (dtype), usually numbers (integers or floats), although there appears to be something called a character array. Also, the size and shape of an array must be known a priori and not determined on the fly like lists. For example we can define a list with L=[], then append to it as desired, but not so arrays - they are much pickier and we'll see how to set them up later. Why use arrays when you can use lists? They are far more efficient than lists, particularly for things like matrix math. But just to make things a little confusing,
there are several different data objects that are loosely called arrays, e.g., arrays, character arrays and matrices. These are all subclasses of ndarray. I'm just going to briefly introduce arrays and matrices here. Here are a few ways of making arrays:

    >>> import numpy
    >>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]])
    >>> A
    array([[1, 2, 3],
           [4, 2, 0],
           [1, 1, 2]])
    >>> B=numpy.arange(0,10,1).reshape(2,5)
    >>> B
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    >>> C=numpy.array([[1,2,3],[4,5,6]],numpy.int32)
    >>> C
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> D=numpy.zeros((2,3)) # Notice the zeros and the size is specified by a tuple.
    >>> D
    array([[ 0.,  0.,  0.],
           [ 0.,  0.,  0.]])
    >>> E=numpy.ones((2,4))
    >>> E
    array([[ 1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.]])
    >>> F=numpy.linspace(0,10,14)
    >>> F
    array([  0.        ,   0.76923077,   1.53846154,   2.30769231,
             3.07692308,   3.84615385,   4.61538462,   5.38461538,
             6.15384615,   6.92307692,   7.69230769,   8.46153846,
             9.23076923,  10.        ])
    >>> G=numpy.ndarray(shape=(2,2), dtype=float)
    >>> G # note that this is NOT initialized: it holds whatever was in memory (here really small numbers, but not zeros).
    array([[  1.90979621e-313,   2.75303490e-308],
           [  1.08539798e-071,   3.05363949e-309]])

Note the difference between linspace(start,stop,N) and arange(start,stop,step). The function linspace creates an array with N (here 14) evenly spaced elements from the start value to the stop value inclusive, while arange creates an array with elements at step intervals from the start value up to, but not including, the stop value. In some of the online examples you will find the short-cuts for linspace() and arange() written as r_[-5:5:20j] and r_[-5:5:1.] respectively. Python arrays have attributes and methods like dtype, ndim, shape, size, reshape(), ravel(), transpose() etc. Did you notice how some of these require parentheses and some
don't? The answer is that the ones without parentheses are attributes (values stored with the array) and the ones with parentheses are methods (functions attached to the array), both of which we will get to later. Let's see what they can do. First, arrays made in the above example are of different data types. To find out what data type an array is, just use the attribute dtype as in:

    >>> D.dtype
    dtype('float64')

And of course arrays, unlike lists, have dimensions and shape. Dimensions tell us how many axes there are, with axes defined as in this illustration:
[Figure: a) the A array, [[1, 2, 3], [4, 2, 0], [1, 1, 2]], with its two axes labeled axis = 0 and axis = 1; b) a second panel, also with labeled axes.]
As shown above, our A array has two dimensions (axis 0 and 1). To get Python to tell us this, we use the ndim attribute:

    >>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]]) # just to remind you
    >>> A.ndim
    2

Notice how zeros, ones and ndarray used a shape tuple in order to define the arrays in the examples above. The shape of an array is how many elements are along each axis. So, naturally we see that the C array is a 2x3 array. Python returns a tuple with the shape information using the shape attribute:

    >>> C.shape
    (2, 3)

Let's say we don't want a 2x3 array for the sequence in the array C, but we want a 3x2 array. Python can reshape an array with a different shape tuple like this:

    >>> C.reshape((3,2))
    array([[1, 2],
           [3, 4],
           [5, 6]])
And sometimes we just want all the elements lined up along one axis. We could do that with reshape of course, using a tuple with the size of the array (the total number of elements). You can see that this is 6 here. We could even get Python to tell us what the size is (C.size) and use that in the reshape size tuple. Alternatively we can use the ravel() method, which doesn't require us to know the size in advance:

    >>> C.ravel()
    array([1, 2, 3, 4, 5, 6])

There are other ways to reshape, slice and dice arrays. The syntax for slicing of arrays is similar to that for lists:

    >>> B=A[0:2] # carve the top two lines off of matrix A from above
    >>> B
    array([[1, 2, 3],
           [4, 2, 0]])

Lots of applications in Earth Science require the transpose of an array:

    >>> A.transpose() # this is the same as A.T
    array([[1, 4, 1],
           [2, 2, 1],
           [3, 0, 2]])

Also, we can concatenate two arrays together with the - you guessed it - numpy.concatenate() function. For a lot more tricks with arrays, go to the NumPy Reference website here: http://docs.scipy.org/doc/numpy/reference/.

We promised to tell you about matrix objects, so here goes. A matrix is another subclass of ndarray with special advantages for particular applications. While arrays are intended to be general purpose n-dimensional arrays for numerical computing, matrices are better for linear algebra type problems: you can take the inverse and find the Hermitian (conjugate transpose) more easily (the .I and .H attributes), and matrix multiplication works like it does in Matlab for matrix objects, whereas arrays do element-wise computations (we'll learn more about this later). Check out this website for more differences between the two: http://www.scipy.org/NumPy_for_Matlab_Users.

To convert the A array to a list: L=A.tolist(); from a list or tuple to an array: A=numpy.array(L); or from a list, a tuple or an array to a NumPy array: a=numpy.asarray(L).
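To see the element-wise behavior of arrays next to a true matrix product, and to try concatenate, here is a minimal sketch using the A and C arrays from above. The names elementwise, matprod and stacked are made up for the example, and numpy.dot is used for the matrix product as one option that works on plain arrays.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])
C = numpy.array([[1, 2, 3], [4, 5, 6]])

print(A.ndim)                 # number of axes
print(C.shape)                # number of elements along each axis
print(C.ravel())              # all elements lined up along one axis

elementwise = A * A           # arrays multiply element by element
matprod = numpy.dot(A, A)     # numpy.dot gives the matrix product instead
print(elementwise)
print(matprod)

# concatenate joins arrays along an existing axis (axis=0 adds rows),
# so the arrays must agree in the other dimension (3 columns here)
stacked = numpy.concatenate((A, C), axis=0)
print(stacked.shape)
```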
6.6
Python Scripts
Are you tired of typing yet? Like UNIX shell scripts, Python scripts are programs that can be run and re-run from the command line. You can type in the same
stuff you've been doing interactively into a script file (ending in .py). You can edit scripts with: vi, TextWrangler, Xcode, emacs, etc. NOT Word! And then you can run them like this:

    % python < myscript.py

Or you can put in a header line identifying the script as python (#!/usr/bin/env python), make it executable (chmod a+x), and run it like this:

    % myscript.py

Here is a familiar example that creates a script using the UNIX cat command, makes it executable and then runs it:

    % cat > printmess.py
    #!/usr/bin/env python
    # simple Python test program (printmess.py)
    print 'test message'
    ^D
    % chmod a+x printmess.py
    % ./printmess.py
    test message

In a Python script that you want to run directly like this, the first line MUST be:

    #!/usr/bin/env python

so that the file is interpreted as Python. Unlike Fortran or C, you CANNOT start with a comment line (try switching lines 1 and 2 and see what happens). The second line is a comment line. Anything to the right of # is assumed to be a comment (remember that in Fortran ! serves the same function). Notice that print goes by default to your screen. Because the message is a string, you can use single or double quotes for the test message. You can get an apostrophe in your output by using double quotes and quote marks by using single quotes, i.e.,

    #!/usr/bin/env python
    # simple Python test program 2 (printmess2.py)
    print "The pump don't work cuz the vandals took the handles"
    print 'She said "I know what it\'s like to be dead"'

produces:

    % ./printmess2.py
    The pump don't work cuz the vandals took the handles
    She said "I know what it's like to be dead"
    %
In the second print statement, the \ is necessary to prevent an error (try it). This is an example of a Python escape code. These are used to escape some special meaning, as in an end-quote for a string in this example. We use the backslash to say that we really really want a quote mark here. Other escape codes are listed here: http://www.python-course.eu/variables.php

Here's another example of a program - this one has a typo in line 4:

    #!/usr/bin/env python
    abeg = 2.1
    aend = 3.9
    adif = aend - abge
    print 'adif = ', adif

You had intended to type abeg but typed abge instead. When you run the program, you get an error message:

    Traceback (most recent call last):
      File "./undeclared.py", line 4, in <module>
        adif = aend - abge
    NameError: name 'abge' is not defined

Error messages are a desirable feature of Python. You don't want the program to run by assigning some arbitrary value to abge and giving you a wrong answer. Yet many languages will do exactly that, including Fortran (we can avoid this potential problem in Fortran by using the implicit none statement at the beginning of our programs).

ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase. E-mail your script to ltauxe@ucsd.edu
6.7
A rst look at code blocks
Any reasonable programming language must provide a way to group blocks of code together, to be executed under certain conditions. In Fortran, for example, there are if statements and do loops which are bounded by the statements if, endif and do, enddo respectively. Many of these languages encourage the use of indentation to make the code more readable, but do not require it. In Python, indentation is the way that code blocks are defined - there are no terminating statements. Also, the initiating statement terminates in a colon. The trick is that all code indented
the same number of spaces (or tabs) to the right belongs together. The code block terminates when the next line is less indented. A typical Python program looks like this:

    program statement
    block 1 top statement:
        block 1 statement
        block 1 statement \
                  ha-ha i can break the indentation convention!
        block 1 statement
        block 2 top statement:
            block 2 statement
            block 2 statement
            block 3 top statement:
                block 3 statement
                block 3 statement
                block 4 top statement: block 4 single line of code
            block 2 statement
            block 2 statement
        block 1 statement
        block 1 statement
    program statement

Exceptions to the code indentation rules are:

- Any statement can be continued on the next line with the continuation character \ and the indentation of the following line is arbitrary.
- If a code block consists of a single statement, then that may be placed on the same line as the colon.
- The command break breaks you out of the enclosing for or while loop. Use with caution!

There is a cheat that comes in handy when you are writing a complicated program and want to put in the code blocks but don't want them to DO anything yet: the command pass does nothing and can be used to stand in for a code block.

Good housekeeping Tip #2: Always use only spaces or only tabs in your code indentation. I use only spaces because I use vi to write my code. Others use Xcode, the Python IDLE program, or TextWrangler to write their code and some of these things use tabs by default. Whatever you do, BE CONSISTENT, because tabs are not the same as spaces in Python even if you can't tell the difference just by looking at it.
In the following, I'll show you how Python uses code blocks to create for loops (the equivalent of Fortran's do loops), while loops, and if statements.
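Before going through them one at a time, here is a small sketch of nested code blocks that also shows pass and break at work. The list values and the variable total are invented for the illustration.

```python
values = [3, 8, -1, 5]

total = 0
for v in values:          # top statement ends in a colon
    if v < 0:             # a nested block is indented one level further
        break             # leave the for loop at the first negative value
    elif v > 6:
        pass              # placeholder: nothing special happens here yet
    total += v            # back in the for block (less indented than pass)
print(total)              # no indentation: this runs after the loop is over
```

The loop adds 3 and 8, then stops at -1 without ever reaching the 5, so the script prints 11.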
6.7.1
The for loop
Here is an example of a for loop that is similar to the way you would do it in Fortran:

    #!/usr/bin/env python
    mylist=[42,'spam','ocelot']
    for i in range(0,len(mylist),1): # range with explicit start, stop and step
        print mylist[i]
    print 'All done'

This script creates the list mylist with the line mylist=[42,'spam','ocelot']. The length of mylist is an integer value returned by len(mylist). The script uses this integer as the stop value in the range() function, which returns a list of integers from 0 to the stop value MINUS ONE at intervals of one. [The minus one convention is hard to get used to for Fortran programmers, but it is typical of Python syntax (and also of C) so just deal with it.] Anyway, range(start,stop,step) is just like numpy.arange(start,stop,step) but returns a list of integers instead of an array. Also, like numpy.arange(), there is a short hand form when the minimum is zero and the interval is one, so we could (and will) just use the command range(stop). Python makes i step through the list of numbers from 0 to 2, printing the ith element of mylist. Note how the print command is indented - this is the program block that is executed for each i. Note also that the line could have been on the previous line after the colon, because there is only one line in the program block. But never mind, this way works too and is more Fortran like. When i finishes its business, the program block terminates. At that point, the program prints out the 'All done' string. There is no enddo statement or equivalent in Python.

But Python is far more fun than the Fortran-like for i in syntax in the above code snippet. In Python we can just step through a list directly. Here is another script which does just that (why not?):

    #!/usr/bin/env python
    mylist=[42,'spam','ocelot']
    for item in mylist: # note absence of range statement
        print item
    print 'All done'
Note that of course we could have used any variable name instead of item, but it makes sense to use variable names that mean what they do. It is easier to understand what item stands for than just the Fortran style of i.

Here is an example with a little more heft to it. It creates a table of trigonometry functions, spitting them out with a formatted print statement:

    #!/usr/bin/env python
    import numpy as np
    deg2rad = np.pi/180. # remember conversion to radians
    for theta in range(90): # short form of range, returns [0,1,2...89]
        ctheta = np.cos(theta*deg2rad) # define ctheta as cosine of theta
        stheta = np.sin(theta*deg2rad) # define stheta as sine of theta
        ttheta = np.tan(theta*deg2rad) # define ttheta as tangent of theta
        print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Let's pick this one apart a bit. First, notice the use of the variable deg2rad to convert from degrees to radians. Also notice how deg2rad is defined: deg2rad = np.pi/180. using the NumPy constant for π and the decimal point after 180. While in this case it makes absolutely no difference (try it!), it is a good practice to use real numbers if you want your variable to stay real. In fact:

Good housekeeping Tip #3: Always use a decimal if you want your variable to be a floating point variable.

The expression ctheta = np.cos(theta*deg2rad) uses the numpy cosine function. Ideally theta should be a real variable while in fact it is an integer in this expression, but fortunately Python figures that out and converts it to a real. Note that we could have also converted theta to a float first with the command float(theta).

    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)

To make the output look nice, we do not use print theta, ctheta, stheta, ttheta, which would space the numbers irregularly among the columns and put out really long numbers. Instead, we explicitly specify the output format. The output format is given in the quotes. The format for each number follows the %; 5.1f is for 5 spaces of floating point output, with 1 space to the right of the decimal point (in Fortran this is f5.1). The single blank space between %5.1f and %8.4f is included in the output; in fact any text there is reproduced exactly in the output, thus to put commas between the output numbers, write:
    print '%5.1f, %8.4f, %8.4f, %8.4f' %(theta, ctheta, stheta, ttheta)

Tabs (\t) would be formatted like this:

    print '%5.1f \t %8.4f\t %8.4f\t %8.4f' %(theta, ctheta, stheta, ttheta)
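The format codes can be tried out on a shorter table. The sketch below builds each output line as a string first, which makes the field widths easy to inspect. It uses the standard math module instead of NumPy purely to stay self-contained, and the names rows and rad are made up for the example.

```python
import math

rows = []
for theta in range(0, 91, 30):        # 0, 30, 60 and 90 degrees
    rad = math.radians(theta)         # degrees -> radians
    # %5.1f: 5 characters wide, 1 decimal; %8.4f: 8 wide, 4 decimals
    rows.append('%5.1f, %8.4f, %8.4f' % (theta, math.cos(rad), math.sin(rad)))

for row in rows:
    print(row)
```

Every line comes out the same length (25 characters here), which is what keeps the columns lined up.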
6.7.2
If and while blocks
The for loop is just one way of controlling flow in Python. There are also if and while code blocks. These execute code blocks the same way as for loops (colon terminated top statements, indented text, etc.). For both of these, the code block is executed if the top statement is TRUE. For the if block, the code is executed once but in a while block, the code keeps executing as long as the statement remains TRUE. The key to flow control therefore is in the top statement of each code block; if it is TRUE, then execute, otherwise skip it. To decide if something is TRUE or not (in the boolean sense), we need to evaluate a statement using comparisons. You know all about comparisons from Fortran. Python of course also has comparisons and they work in similar ways with a few differences. Here's a handy table with comparisons (relational operators) in different languages:

    Comparisons
    F77     F90     C      MATLAB   PYTHON   meaning
    .eq.    ==      ==     ==       ==       equals
    .ne.    /=      !=     ~=       !=       does not equal
    .lt.    <       <      <        <        less than
    .le.    <=      <=     <=       <=       less than or equal to
    .gt.    >       >      >        >        greater than
    .ge.    >=      >=     >=       >=       greater than or equal to
    .and.   .and.   &&     &        and      and
    .or.    .or.    ||     |        or       or

These operators can be combined to make complex tests. Here is a juicy complicated statement:

    if ( (a > b and c <= 0) or d == 0):
        code block
There are rules for the order of operations for these things, just as multiplication gets done before addition. But these are easy to forget. You can look it up in the
documentation if you are unsure or, better, just put in enough parentheses to make it completely clear to anyone reading your code.

Good housekeeping Tip #4: Use parentheses liberally - make the order of operation completely unambiguous even if you could get away with fewer.

One nice aspect of Python compared to C is that if you make a mistake and type, for example,

    if (a = 0):
you will get a syntax error message before the program even starts to run. In C this is a valid statement with a completely different meaning than is intended!

Finer points of if blocks

The simplest if block works just like we have described:

    if (2+2)==4: # note the use of == and parentheses in comparison statement
        print 'I can put two and two together!'

However, as in Fortran (and C and any other reasonable programming language), there are whistles and bells to the if code blocks. In Python these are: elif and else. As in the Fortran equivalent else if, the elif code block gets executed if the top if statement is FALSE and the elif statement is TRUE. If both the top if and the elif statements are FALSE, then Python will execute the block following the else (which has no test of its own). Consider these examples:

    #!/usr/bin/env python
    mylist=['jane','doug','denise']
    if 'susie' in mylist:
        pass # don't do anything
    if 'susie' not in mylist:
        print 'call susie and apologize!'
        mylist.append('susie')
    elif 'george' in mylist: # if first statement is false, try this one
        print 'susie and george both in list'
    else: # if both statements are false, do this:
        print "susie in list but george isn't"
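To see all three branches fire, the same if/elif/else chain can be run over several lists. This is only a sketch: the list guest_lists and the collected messages are additions for the illustration.

```python
guest_lists = [['jane', 'doug', 'denise'],
               ['susie', 'george'],
               ['susie', 'jane']]

messages = []
for mylist in guest_lists:
    if 'susie' not in mylist:
        messages.append('call susie and apologize!')
    elif 'george' in mylist:      # only tested when the if test was false
        messages.append('susie and george both in list')
    else:                         # runs when every test above was false
        messages.append("susie in list but george isn't")

for message in messages:
    print(message)
```

Exactly one branch runs for each list, in the order if, elif, else.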
While loops

As already mentioned, the while block continues executing as long as the while top statement is TRUE. In other words, the if block is only executed once, while
the while block keeps looping until the statement turns FALSE. Here is an example:

    #!/usr/bin/env python
    a=1
    while a < 10:
        print a
        a+=1
    print "I'm done counting!"
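A while loop is most useful when you do not know in advance how many passes are needed. Here is a minimal sketch that halves a number until it drops to 1 or below and counts the passes; the starting value and the name steps are made up for the example.

```python
a = 100.0
steps = 0
while a > 1.0:        # keep looping as long as this test is True
    a = a / 2.0       # something in the block must eventually make the test False
    steps += 1
print(steps)
print(a)
```

If nothing inside the block ever changes the test, the loop never ends, so it is worth checking that before running a while loop.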
6.7.3
Code blocks in interactive Python
All of these program blocks can also be done in an interactive session, also using indentation. The interactive shell responds with ... instead of >>> once you type a statement it recognizes as a top statement. To signal that you are done with the program block, simply hit return:

    >>> a=1
    >>> while a<10:
    ...     print a
    ...     a+=1
    ... [return to execute block]

ASSIGNMENT P2
Rewrite your Fortran Assignment F4 in Python. E-mail it to ltauxe@ucsd.edu
6.8
File I/O in Python
Python would be no better than a rather awkward graphing calculator (and we haven't even gotten to the graphing part yet) if we couldn't read data in and spit data out. You learned a rudimentary way of spitting stuff out already using the print statement, but there is a lot more to file I/O in Python. We would like to be able to read in a variety of file formats (not just X,Y in the first two columns as in GMT) and output the data any way we want. In the following we will explore some of the more useful I/O options in Python.
6.8.1
Reading data in
From a file

If you are using Python interactively or want interactivity in a script, use the command raw_input(). It acts as a prompt and reads in whatever is supplied prior to a return as a string.

    X=[] # make a list to put the data in
    ans=float(raw_input("Input numeric value for X: "))
    X.append(ans) # append the value to X
    print X[-1] # print the last item in the list
In this example, the input will be read in as a string, converted to a float (the variable ans) and appended to the list, X. raw_input() is a simple but rather annoying way to enter things into a program. Another (less annoying) way is to put the data in a file (e.g., myfile.txt) with cat, paste, Excel (saved as a text file), or whatever and read it into Python. The approach to this is similar to Fortran: we must first open the file, then read it in and parse lines into the desired variables. To open a file we use the command open(), one of Python's built-in functions. For a complete list of these, see: http://docs.python.org/library/functions.html

The open() function returns an object, complete with methods, like readlines() which, yes, reads all the lines. Here is a script (ReadStations.py) that will open the file station.list from the chapter on GMT, read in the data and print it out line by line.

    #!/usr/bin/env python
    f=open('station.list')
    StationNFO=f.readlines()
    for line in StationNFO:
        print line

If you run this script, you will get this behavior:

    % ReadStations.py
    9.02920 38.76560 2442 AAE

    42.63900 74.49400 1645 AAK

    37.93040 58.11890 678 ABKT

    51.88370 -176.68440 116 ADK

    etc.
The function open() has some bells and whistles to it and has the form open(name[, mode[, buffering]]) where the stuff in square brackets is optional. The name argument is the file name to open and mode is the way in which it should be opened, most commonly for reading 'r', writing 'w' or appending 'a'. I use the form 'rU' (universal newline mode) for reading because I often want to read in files that were saved with Dos, Mac OR Unix line endings and 'rU' figures all that out for you. Just in case you are
curious, Unix lines end in \n, Mac files in \r and Dos (and Windows) lines end in \r\n. I never use the buffering argument and don't know what it does. If you are curious about the line endings, try typing out the representation of the line, repr(line), in the above script and you get all the stuff that is normally invisible like the apostrophes and the line terminations:

    % ReadStations.py
    '9.02920 38.76560 2442 AAE \n'
    '42.63900 74.49400 1645 AAK \n'
    '37.93040 58.11890 678 ABKT\n'
    '51.88370 -176.68440 116 ADK \n'
    '-13.90930 -171.77730 706 AFI \n'
    etc.

Notice how in our first version, printing the line also printed the line feed (\n) as an extra line. To clean this off of each line, we can use the string strip() function:

    print line.strip('\n')

Putting this into the code results in this behavior:

    % ReadStations.py
    9.02920 38.76560 2442 AAE
    42.63900 74.49400 1645 AAK
    37.93040 58.11890 678 ABKT

Let's say you want to read the data table into lists called Lats, Lons, and StaIDs (the first three columns). You need to split each line into its columns and append the correct column into the appropriate list. Fortran automatically splits on the spaces so you probably didn't have to worry about this sort of thing yet, but Python reads in the entire line as a single string and does nothing special with the spaces or other possible delimiters (commas, semi-colons, tabs, etc.). To split the line, we use the string function split([sep]) where [sep] is an optional separator. If no separator is specified (e.g., line.split()), it will split on whitespace. Anything could be a separator, but the most common ones are ',', ';', and '\t'. The latter is how a tab appears if you were to, say, print out the representation of the line, which shows all the invisibles.
Here is a slightly modified version of ReadStations.py, ParseStations.py, which parses out the lines and puts numbers (floats or integers) in the right lists:

    #!/usr/bin/env python
    Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
    StationNFO=open('station.list').readlines() # combines the open and readlines methods!
    for line in StationNFO:
        nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
        Lats.append(float(nfo[0])) # puts float of 1st column into Lats
        Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
        StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
        StaName.append(nfo[3]) # puts the ID string into StaName
        print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended
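The parsing step can be tested without the file by putting a few lines of the same layout into a list. This is a sketch under that assumption: the three lines are copied from the station.list output shown earlier, and the check on the number of columns is an extra safeguard that is not part of ParseStations.py.

```python
# three lines in the same layout as station.list, held in memory for the example
StationNFO = ['9.02920 38.76560 2442 AAE \n',
              '42.63900 74.49400 1645 AAK \n',
              '37.93040 58.11890 678 ABKT\n']

Lats, Lons, StaIDs, StaName = [], [], [], []
for line in StationNFO:
    nfo = line.split()              # no argument: split on any whitespace
    if len(nfo) >= 4:               # ignore blank or incomplete lines
        Lats.append(float(nfo[0]))  # 1st column as a float
        Lons.append(float(nfo[1]))  # 2nd column as a float
        StaIDs.append(int(nfo[2]))  # 3rd column as an integer
        StaName.append(nfo[3])      # 4th column stays a string

print(StaName)
print(Lats)
```

Because split() with no argument discards all whitespace, including the line ending, the separate strip('\n') is not strictly needed in this case.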
From standard input

As in Fortran, Python can also read from standard input. To do this, we need a system specific module, called sys, which among other things has a stdin object. So, instead of specifying a file name in the open command, we could substitute the following line:

    #!/usr/bin/env python
    import sys
    Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
    StationNFO=sys.stdin.readlines() # reads from standard input
    for line in StationNFO:
        nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
        Lats.append(float(nfo[0])) # puts float of 1st column into Lats
        Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
        StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
        StaName.append(nfo[3]) # puts the ID string into StaName
        print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended

The program can be invoked with:

    % ReadStations.py < station.list

We could also use command line switches by reading in arguments from the command line. In the following example, we use the switch -f, with the following argument being the file name:
6.8.2
Command line switches
    #!/usr/bin/env python
    import sys
    Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
    if '-f' in sys.argv: # look in list of command line arguments
        file=sys.argv[sys.argv.index('-f')+1] # find index of -f and increment by one
    StationNFO=open(file,'rU').readlines() # open file
    for line in StationNFO:
        nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
        Lats.append(float(nfo[0])) # puts float of 1st column into Lats
        Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
        StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
        StaName.append(nfo[3]) # puts the ID string into StaName
        print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended
This version can be invoked with: % ReadStations.py -f station.list
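The switch handling can also be wrapped in a small helper so that a missing switch falls back to a default instead of stopping the script. What follows is only a minimal sketch and not part of ReadStations.py: the function name get_switch and the fallback file name are made up for illustration, and it is written so that it should run under both Python 2 and Python 3.

```python
import sys

def get_switch(argv, switch, default=None):
    """
    returns the argument that follows switch in argv, or default if absent
    """
    if switch in argv:
        ind = argv.index(switch) + 1  # position of the value after the switch
        if ind < len(argv):           # guard against a switch with nothing after it
            return argv[ind]
    return default

# hypothetical usage: fall back to 'station.list' when no -f switch is given
filename = get_switch(sys.argv, '-f', default='station.list')
print('reading from %s' % filename)
```

Passing the argument list in explicitly (rather than reading sys.argv inside the function) makes the helper easy to try out interactively with a hand-made list.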
Reading numeric files

In the special case where the data in a file are entirely numeric, you can read in the file with the NumPy function loadtxt(). This reads the data into a NumPy array; for a file with several columns, each row of the array holds the numbers from one line of the file.
6.8.3
Writing data out
Let's say I have a Python module that will convert latitudes and longitudes to UTM coordinates. O.K., I really do have one that I downloaded from here:
http://code.google.com/p/pyproj/issues/attachmentText?id=27&aid=-80884174771817564&name=UTM.py&token=46ab62caa041c3f240ca0e55b7b25ad6
I wrote a script (ConvertStations.py) to convert each of the stations in my list to their UTM equivalents (assuming the coordinates refer to the WGS-84 ellipsoid). It would be nice if, after having done this to the data, I could then write it out somehow, preferably to a file. Of course I could use the print command like this:

    #!/usr/bin/env python
    import UTM # imports the UTM module
    Ellipsoid=23-1 # UTM's code for WGS-84
    StationNFO=open('station.list').readlines()
    for line in StationNFO:
        nfo=line.strip('\n').split()
        lat=float(nfo[0])
        lon=float(nfo[1])
        StaName=nfo[3]
        Zone,Easting,Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
        print StaName, ':', Easting, Northing, Zone
which spits out something like this:

    % ConvertStations.py
    AAE : 474238.170087 998088.469113 37P
    AAK : 458516.115522 4720850.45385 43T
    ABKT : 598330.712671 4198681.92944 40S
    ADK : 521722.179764 5748148.625 1U
    AFI : 416023.683618 8462168.07766 2L
    etc.

I could save the output with a UNIX re-direct:

    % ConvertStations.py > mynewfile
But we yearn for more. So, more elegantly, I can open an output file [for appending ('a') or (over)writing ('w')] and write a formatted string using the write method on the output file object:

    #!/usr/bin/env python
    import UTM # imports the UTM module
    outfile=open('mynewfile','w') # creates outfile object
    Ellipsoid=23-1 # UTM's code for WGS-84
    StationNFO=open('station.list').readlines()
    for line in StationNFO:
        nfo=line.strip('\n').split()
        lat=float(nfo[0])
        lon=float(nfo[1])
        StaName=nfo[3]
        Zone,Easting,Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
        outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))

The only significant changes are 1) the object outfile is opened for writing (note that this will clobber anything in a pre-existing file by that name) and 2) the output file gets written to in the statement with a write method on the output file object:

    outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))
The write statement uses the syntax: 'format string'%(tuple of variables). Format strings have these rules:

    For each variable in the tuple you need a format:
        %s for string, %i for integer, %f for float, %e for exponential notation
    You can also specify further, e.g.:
        %7.1f for 7 characters with 1 after the decimal
        %10.3e for 10 characters with 3 after the decimal
    where the number of characters includes the decimal point and any padding spaces.

As noted before, the format string can include punctuation:

    x,y=4.82,2.3e3
    print '%7.1f,%s\t%10.3e'%(x,'hi there',y)
        4.8,hi there     2.300e+03
In the ConvertStations2.py script, the '\n' string puts a UNIX line ending on each line. Without that, the whole file is but a single line (very annoying). A session using the script (ConvertStations2.py) and a peek at the resulting file could look like this:

    % ConvertStations2.py
    % head mynewfile
    AAE 474238.170087 998088.469113 37P
    AAK 458516.115522 4720850.45385 43T
    ABKT 598330.712671 4198681.92944 40S
    ADK 521722.179764 5748148.625 1U
    AFI 416023.683618 8462168.07766 2L
    ALE 509467.666259 9161062.29194 20X
    ALQ 366981.843985 3868044.56906 13S
    ANMO 366981.843985 3868044.56906 13S
    ANTO 482347.254856 4413225.7807 36S
    AQU 368638.770654 4690300.1797 33T

I'm assuming you know what the UNIX head command does, or at least how to find out!
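The width and precision specifiers described above are what you need to make the columns of an output file line up. The following is a minimal sketch rather than a replacement for ConvertStations2.py: it skips the UTM conversion and simply reuses two rows of the output shown earlier, the column widths are arbitrary choices, and the file is written to a temporary scratch directory so nothing of yours gets clobbered. It is written so that it should run under both Python 2 and Python 3.

```python
import os
import tempfile

# two rows copied from the ConvertStations.py output shown earlier
stations = [('AAE', 474238.170087, 998088.469113, '37P'),
            ('ABKT', 598330.712671, 4198681.92944, '40S')]

outname = os.path.join(tempfile.mkdtemp(), 'mynewfile')  # scratch location
outfile = open(outname, 'w')
for name, easting, northing, zone in stations:
    # %-5s left-justifies the name in 5 characters;
    # %12.2f uses 12 characters with 2 after the decimal
    outfile.write('%-5s %12.2f %12.2f %s\n' % (name, easting, northing, zone))
outfile.close()  # make sure everything is on disk before reading it back

lines = open(outname).readlines()
for line in lines:
    print(line.strip('\n'))
```

Because every field has a fixed width, the numbers in the two lines end in the same columns even though the station names have different lengths.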
6.9
Functions
So far you have learned how to use functions from program modules like NumPy. You can imagine that there are many bits of code that you might write that you will want to use again and again, say converting between degrees and radians and back, or finding the great circle distance between two points on Earth, or converting between UTM and latitude/longitude coordinates (as in UTM.py, my new favorite package). In Fortran, you learned about subroutines and functions which do this sort of work for you. In Python, we also have a way to do this, of course. The basic structure of a program with a Python function is:

    #!/usr/bin/env python
    def FUNCNAME(in_args):
        """
        DOC STRING
        """
        some code that does something
        return out_args
    FUNCNAME(in_args) # this calls the function
6.9.1
Line by line analysis
def FUNCNAME(in_args):

The first line must have def as the first three letters, must have a function name with parentheses and a terminal colon. If you want to pass some variables to the function, they go where in_args sits, separated by commas. Unlike in Fortran, there are no output variables here. There are four different ways to handle argument passing.

1) You could have a function that doesn't need any arguments at all:

    #!/usr/bin/env python
    def gimmepi():
        """
        returns pi
        """
        return 3.141592653589793
    print gimmepi()

2) You could use a Fortran-like style, where there is a set list of what are called formal variables that must be passed:

    #!/usr/bin/env python
    def deg2rad(degrees):
        """
        converts degrees to radians
        """
        return degrees*3.141592653589793/180.
    print '42 degrees in radians is: ',deg2rad(42.)

3) You could have a more flexible need for variables. You signal this by putting *args in the in_args list (along with any formal variables you want):

    #!/usr/bin/env python
    def print_args(*args):
        """
        prints argument list
        """
        print 'You sent me these arguments: '
        for arg in args:
            print arg
    print_args(1,4,'hi there')
    print_args(42)

4) You can use a keyworded, variable-length list by putting **kwargs in for in_args:

    #!/usr/bin/env python
    def print_kwargs(**kwargs):
        """
        prints keyworded argument list
        """
        for key in kwargs:
            print '%s %s' %(key, kwargs[key])
    print_kwargs(arg1='one',arg2=42,arg3='ocelot')
Doc String

Although you can certainly write functional code without a document string, make a habit of always including one. Trust me - you'll be glad you did. Years later, it can remind you of what you thought you were doing. It can be used to print out a help message by the calling program and it also lets others know what you intended. Notice the use of the triple quotes before and after the documentation string - that means that you can write as many lines as you want.

Function body

This part of the code must be indented, just like in a for loop, or other block of code.

Return statement

You don't need this unless you want to pass back information to the calling body (print_kwargs() above, for example, has none). But unlike in Fortran, where variables are passed back the same way they get in, through the first line, Python separates the entrance and the exit. See how it can be done in the gimmepi() example above.
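The argument-passing styles and the return statement described above can be combined. As a minimal sketch (the function names minmax and describe are made up for illustration, and the code is written so that it should run under both Python 2 and Python 3), here is one function that takes a formal variable plus *args and returns two values, and another that takes a formal variable plus **kwargs:

```python
def minmax(first, *rest):
    """
    returns the smallest and largest of one or more numbers
    """
    lo, hi = first, first
    for value in rest:
        if value < lo:
            lo = value
        if value > hi:
            hi = value
    return lo, hi  # two values go back to the caller as a tuple

def describe(name, **kwargs):
    """
    returns a string with name followed by the keyword arguments, sorted by key
    """
    parts = [name]
    for key in sorted(kwargs):
        parts.append('%s=%s' % (key, kwargs[key]))
    return ' '.join(parts)

smallest, largest = minmax(3, -1, 42, 7)  # unpack the returned tuple
print('%s %s' % (smallest, largest))
print(describe('station', zone='37P', code='AAE'))
```

Returning a tuple and unpacking it on the calling side is the usual Python stand-in for a Fortran subroutine with several output arguments.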
6.9.2
Main program as function
It is considered good Python style to treat your main program block as a function too. (This helps with using the document string as a help function and building program documentation in general.) In any case, I recommend that you just start doing it that way too. In this case, we have to call the main program with the final (not indented) line main():

    #!/usr/bin/env python
    def print_kwargs(**kwargs):
        """
        prints keyworded argument list
        """
        for key in kwargs:
            print '%s %s' %(key, kwargs[key])
    def main():
        """
        calls function print_kwargs
        """
        print_kwargs(arg1='one',arg2=42,arg3='ocelot')
    main() # runs the main program
Notice how in the above examples, all the function definitions preceded the call to the main function. This is because Python is interpreted rather than compiled: it goes through the script line by line, so a function must already have been defined by the time the line that calls it is executed. On the other hand, we've been running lots of functions that were not in the program we used to call them. The trick here is that you can put a bunch of functions in a separate file (in your path) and import it, just like we did with NumPy. Your functions can then be called from within your program in the same way as for NumPy. So let's say I put all the above functions in a file called myfuncs.py:

    def gimmepi():
        """
        returns pi
        """
        return 3.141592653589793
    def deg2rad(degrees):
        """
        converts degrees to radians
        """
        return degrees*3.141592653589793/180.
    def print_args(*args):
        """
        prints argument list
        """
        print 'You sent me these arguments: '
        for arg in args:
            print arg

I could then just import the module myfuncs from within another program, or just interactively. I can use the functions, or just call for help:

    % python
    >>> import myfuncs
    >>> print myfuncs.gimmepi()
    3.14159265359
    >>> print myfuncs.print_args.__doc__

        prints argument list

    >>>
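One common idiom, not used in the examples above, lets the same file serve both as a script with a main() and as an importable module: Python sets the variable __name__ to '__main__' only when the file is run directly. The following is a minimal sketch under that assumption (it should run under both Python 2 and Python 3); the deg2rad function is the one from myfuncs.py and the check in main() is made up for illustration.

```python
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees * 3.141592653589793 / 180.

def main():
    """
    prints a quick check of deg2rad
    """
    print('180 degrees in radians is: %s' % deg2rad(180.))

if __name__ == '__main__':
    main()  # runs when the file is executed, but not when it is imported
```

With the bare call main() on the last line instead, the check would also be printed every time another program imported the file.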
6.9.3
Scope of variables
As in Fortran, inside a function, variable names have their own meaning, which in many cases will be different from inside the calling function. So, variable names declared inside a function stay in the function. This is true unless you declare them to be global. Here is an example in which the main program knows about the function's variable V:

    def myfunc():
        global V
        V=123
    def main():
        myfunc()
        print V
    main()

In addition to being able to write your own functions, of course Python has LOTS of modules and a gazillion functions. The Enthought distribution that you are using includes plotting, numerical recipes, trig functions, image manipulation, animation, and many more. We will explore some of these in the rest of the class.

ASSIGNMENT P3: Write a subroutine module that has these functions:

- a function that returns the bulk parameters relating to the Earth from this website: http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html Also include the average radius of 6,371 km
- a function that converts degrees to radians
- one that converts radians to degrees
- one that converts longitude and latitude to cartesian coordinates [hint: x = cos(az) cos(pl), y = sin(az) cos(pl), z = sin(pl)], assuming a radius of unity
- one that converts cartesian coordinates back to longitude and latitude
- and one that calculates the great circle distance between two points using the numpy.dot() function to get the angular separation and the radius of the Earth to get the arc length. Assume the average radius of the Earth (gotten from the first function).

Then write a program that takes keyboard entry for a longitude and latitude pair and prints out the X,Y,Z, converts back and prints the new longitude and latitude out (as a check) and gives you the great circle distance in km. [HINT: Take a look at the function SPH_AZI in the Fortran chapter.] E-mail your code to ltauxe@ucsd.edu
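As a footnote to the scope rules of Section 6.9.3, it can help to see a local and a global assignment side by side. This is only a minimal sketch with made-up function names, written so that it should run under both Python 2 and Python 3:

```python
V = 1  # a variable defined at the top level of the script

def local_change():
    """
    assigns to a local V; the top-level V is untouched
    """
    V = 2
    return V

def global_change():
    """
    declares V global, so the assignment changes the top-level V
    """
    global V
    V = 3
    return V

print(local_change())   # the function's own V
print(V)                # the top-level V has not changed
print(global_change())  # this time the top-level V is reassigned
print(V)
```

Without the global statement, an assignment inside a function always creates a new local name, even if a variable with the same name already exists outside.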
6.10
Combining F90 code with Python
Now that you have the very basics of Python programming under your belt, we can try to pull together the different threads you have been learning by investigating how to combine F90 with Python. You might ask "why not just use Python?" or Fortran? The answer is that there is a lot of Fortran code lying around that you might want to use, but no way to visualize the data, so you want Python for that. Or you are solving an extremely computationally intensive problem (global climate model, or convection in the Earth's core or mantle, or ....) and you need code that is as fast as possible. Although NumPy is compiled and therefore very fast, Fortran is still the fastest thing around. For whatever reason, you are taking a class in both Fortran and Python so it makes pedagogical sense to try to tie the two halves together.

The next question is "How?". There are several different approaches to this, ranging from the simple to the sophisticated. The simple approach would be to create output files with Fortran and then read them in and plot them or whatever with Python. A more sophisticated approach would be to call subroutines and functions from a Fortran module which is imported like any other module into Python. This is more tricky and involves a Python package called f2py which came with your Python distribution. I have tested f2py using the gfortran and python packages that were recommended for this class and it worked fine. I also had to get rid of my beloved antique /usr/local/bin/g77 compiler, but you probably don't have one; this is just a troubleshooting tip. For reference, the URLs for these are: http://gcc.gnu.org/wiki/GFortranBinaries#MacOS and http://www.enthought.com/products/getepd.php
Because these websites change quickly, I also put the .dmg files on the class website: http://mahi.ucsd.edu/class233/gfortran-and-gcc-4-6-2-RC20111019-Sno-x86-64.dmb and http://mahi.ucsd.edu/class233/epd-7.1-2-macosx-i386.dmb

Assuming you have installed everything properly, we will cover three ways to use f2py in the following. These are:

- Brute force: Try to get f2py to create a compiled module (ending in .so) from a standard F90 program, which can then be imported like any other module. Because Fortran subroutines have arguments that both go into the subroutine and come out, and Python doesn't, f2py has to make guesses as to the use of variables and does its best. This works in simple cases, but can fail badly at times.
- Signature file: Ask f2py to read through the code, picking out all the variables, and create what is called a signature file (ending in .pyf). This has f2py's guesses as to what the variables are supposed to do. The signature file can be edited to supply the correct intent of variables. Then f2py can create the compiled module based on what is in the signature file, which functions with fewer errors (from bad guesses).
- F90 surgery: The Fortran code itself can be modified to help f2py in interpreting variables.
6.10.1
Brute force f2py method:
Let's start with an example using the program gcf2.f90 from the chapter on Fortran. This has a function to calculate greatest common factors. In fact we don't need all the fiddly bits at the top - just the function getgcf. Let's save it in a file called gcf.f90:

    function getgcf(x, y)
    implicit none
    integer :: getgcf, x, y, i, z
    do i = 1, min(x,y)
       if (mod(x, i) == 0 .and. mod(y, i) == 0) getgcf = i
    end do
    end function getgcf

To compile the function (or the whole program!), use this syntax:

    f2py -c gcf.f90 -m gcf
The -c switch tells f2py to compile the listed file and build a module from it, and the -m switch specifies the name of the module, which is also the stem of the output file. f2py will create something called that stem with .so appended to it, so if all is well you will get a file called gcf.so. The output file gcf.so is a callable module from within python:

    >>> import gcf # note that no .so is needed
    >>> gcf.getgcf(8,4) # call the function in the usual way
    4

The brute force method is not without potential problems. For example, we could use the function without checking for variable type and send the F90 function something with the wrong type, getting back the wrong answer with no error message. Also, f2py assumes that all variables are going IN to the subroutine and doesn't know that some are intended to come out. So we need a way to teach f2py which variables are intended to go in and which are intended to come out. This can be done either with a signature file, if you can't modify the Fortran code itself, or by inserting a few python hints into the Fortran code itself. To illustrate the problem, let's consider another F90 subroutine, sph_azi, from the Fortran chapter:

    subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
    implicit none
    real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
            phi1,phi2,stheta1,stheta2,ctheta1,ctheta2, &
            sang,cang,ang,caz,saz,az
    if ( (flat1 == flat2 .and. flon1 == flon2) .or. &
         (flat1 == 90. .and. flat2 == 90.) .or. &
         (flat1 == -90. .and. flat2 == -90.) ) then
       del=0.
       azi=0.
       return
    end if
    pi=3.141592654
    raddeg=pi/180.
    theta1=(90.-flat1)*raddeg
    theta2=(90.-flat2)*raddeg
    phi1=flon1*raddeg
    phi2=flon2*raddeg
    stheta1=sin(theta1)
    stheta2=sin(theta2)
    ctheta1=cos(theta1)
    ctheta2=cos(theta2)
    cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
    ang=acos(cang)
    del=ang/raddeg
    sang=sqrt(1.-cang*cang)
    caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
    saz=-stheta2*sin(phi1-phi2)/sang
    az=atan2(saz,caz)
    azi=az/raddeg
    if (azi.lt.0.) azi=azi+360.
    end subroutine SPH_AZI

Let's try the brute force f2py way:

    f2py -c sph_azi.f90 -m sph_azi

and try sph_azi out in Python:

    % python
    >>> import sph_azi
    >>> delta,azi=0.,0. # have to declare these..
    >>> sph_azi.sph_azi(33,-117,41,-72,delta,azi) # note module/function name are same.
    >>> print delta, azi
    0.0 0.0

Well, that didn't work. The problem is that Python can't get the variables delta, azi back out from the subroutine. The Fortran style of using the entrance as the exit makes it tough to figure out which variables are supposed to go in and which are supposed to come out, so f2py doesn't even try. The solution is to use the signature file method.
6.10.2
Signature files
We can use f2py to create a signature file sph_azi.pyf with the command:

    f2py sph_azi.f90 -m sph_azi -h sph_azi.pyf

The syntax is a little different from the brute force method in that we are not compiling sph_azi yet; we are just creating the .pyf file (specified by the -h switch) for use in the eventual sph_azi module (specified by the -m switch). If we look inside the sph_azi.pyf file, we find:

    !    -*- f90 -*-
    ! Note: the context of this file is case sensitive.
    python module sph_azi ! in
        interface ! in :sph_azi
            subroutine sph_azi(flat1,flon1,flat2,flon2,del,azi) ! in :sph_azi:sph_azi.f90
                real :: flat1
                real :: flon1
                real :: flat2
                real :: flon2
                real :: del
                real :: azi
            end subroutine sph_azi
        end interface
    end python module sph_azi
    ! This file was auto-generated with f2py (version:2).
    ! See http://cens.ioc.ee/projects/f2py2e/

At this point, all we have is a list of all the variables f2py found. Note that while Fortran doesn't care about case, Python does, so the variable names within the .f90 code are converted to lower case by default. (You can suppress this with the --no-lower switch.) The default assumption is that all of the variables found are headed IN to the subroutine and not intended to come out. Because in the case of sph_azi this is not true (del and azi are coming out), we must edit the .pyf file to explicitly say which variables are coming in and which are coming out. This is done by inserting the attribute intent(in) or intent(out) after the variable type:

    python module sph_azi ! in
        interface ! in :sph_azi
            subroutine sph_azi(flat1,flon1,flat2,flon2,del,azi)
                real intent(in) :: flat1
                real intent(in) :: flon1
                real intent(in) :: flat2
                real intent(in) :: flon2
                real intent(out) :: del
                real intent(out) :: azi
            end subroutine sph_azi
        end interface
    end python module sph_azi

If we save the modified .pyf file as sph_azi1.pyf, we can compile sph_azi with the command:

    % f2py -c sph_azi1.pyf sph_azi.f90

which creates a new file sph_azi.so. Let's see if that one works better:

    % python
    >>> import sph_azi
    >>> delta,azi=sph_azi.sph_azi(33,-117,41,-72)
    >>> print delta, azi
    36.4012794495 64.0623321533

And it does!
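When wrapping Fortran it is reassuring to have an independent check on the numbers that come back. One way, sketched here under the assumption that a line-by-line translation is acceptable for testing purposes, is to port the formulas of SPH_AZI to plain Python with the standard math module. This is not how f2py works and is not a substitute for the compiled module; it only uses the same equations, in double rather than single precision, so the results should agree with the session above to about five significant figures. It is written so that it should run under both Python 2 and Python 3.

```python
import math

def sph_azi(flat1, flon1, flat2, flon2):
    """
    returns (del, azi) in degrees for point 2 as seen from point 1,
    using the same formulas as the Fortran subroutine SPH_AZI
    """
    if ((flat1 == flat2 and flon1 == flon2) or
            (flat1 == 90. and flat2 == 90.) or
            (flat1 == -90. and flat2 == -90.)):
        return 0., 0.
    raddeg = math.pi / 180.
    theta1 = (90. - flat1) * raddeg  # colatitudes in radians
    theta2 = (90. - flat2) * raddeg
    phi1 = flon1 * raddeg            # longitudes in radians
    phi2 = flon2 * raddeg
    cang = (math.sin(theta1) * math.sin(theta2) * math.cos(phi2 - phi1)
            + math.cos(theta1) * math.cos(theta2))
    delta = math.acos(cang) / raddeg
    sang = math.sqrt(1. - cang * cang)
    caz = (math.cos(theta2) - math.cos(theta1) * cang) / (sang * math.sin(theta1))
    saz = -math.sin(theta2) * math.sin(phi1 - phi2) / sang
    azi = math.atan2(saz, caz) / raddeg
    if azi < 0.:
        azi = azi + 360.
    return delta, azi  # the outputs come back through return, not the argument list

delta, azi = sph_azi(33, -117, 41, -72)
print('%.2f %.2f' % (delta, azi))
```

Note that the Python version has the same calling pattern as the f2py module built from the edited signature file: four arguments in, two values out.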
6.10.3
F90 Surgery
In the last section we saw how to use F90 code, without touching it, just by clarifying a few things for poor old f2py. If you can modify the F90 source, you can skip the signature file by inserting special commands, called f2py directives, that the Fortran compiler will ignore (because they start with a !), but f2py will recognize (sneaky, huh?). Here is an example of a duly modified code, saved as sph_azi2.f90:

    subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
    implicit none
    real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
            phi1,phi2,stheta1,stheta2,ctheta1,ctheta2, &
            sang,cang,ang,caz,saz,az
    !f2py intent(in) flat1
    !f2py intent(in) flon1
    !f2py intent(in) flat2
    !f2py intent(in) flon2
    !f2py intent(out) del
    !f2py intent(out) azi
    if ( (flat1 == flat2 .and. flon1 == flon2) .or. &
         (flat1 == 90. .and. flat2 == 90.) .or. &
         (flat1 == -90. .and. flat2 == -90.) ) then
       del=0.
       azi=0.
       return
    end if
    pi=3.141592654
    etc.

We can then compile the modified code with the command:

    f2py -c -m sph_azi2 sph_azi2.f90

This works just like the signature file example but using the module name sph_azi2 instead of sph_azi.
6.10.4
A few more things you need to know:
- NB: If you are modifying fixed-form Fortran 77 source, where comment lines start with a C rather than a !, write the directives as Cf2py instead of !f2py. I have never used this form, so you are on your own here.
- For variables that are both in and out, use: intent(inout)
- For variables that are changed in place, use: intent(inplace)
- For allocatable variables, use, e.g., intent(out,allocatable)
- Watch out with arrays. Fortran stores arrays in column-major order while NumPy by default uses row-major order, so Fortran and default NumPy arrays behave like the transpose of one another! There are ways spelled out in the NumPy documentation for making your Python arrays behave the same as the Fortran ones - so go read that before you get fancy ideas.
- f2py converts all the Fortran subroutine names to lower case. You can suppress this with the --no-lower option.
- For more detailed information on what f2py is really doing, see the documentation available here: http://www.scipy.org/F2py There is also a helpful, but somewhat dated, reference manual available here: http://cens.ioc.ee/projects/f2py2e/usersguide/ Note that in the latter, there are many references to a module named Numeric, which is a predecessor of NumPy - so don't try to call it because you don't have it installed.

ASSIGNMENT P4: Modify your ASSIGNMENT F8 such that the main program is in Python and calls your F90 subroutine. Use the F90 surgery method above to modify your subroutine such that it compiles without the use of signature files. E-mail both files plus the script you would use to compile them to ltauxe@ucsd.edu
6.11
Classes
Before we go any further, we need to learn some basic concepts about classes. These are the basis of object oriented programming, OOP (that again!). Class objects lie behind plotting, for example, and a rudimentary understanding of what they are and how they work will come in handy when we start doing anything but the simplest plotting exercises. A class object is created by a call to a class definition, which can be thought of as a blueprint for the class object. Here is a simple example of a class definition:
    class Circle:
        """
        This is a simple example of a class
        """
        pi=3.141592653589793
        def __init__(self,r):
            self.r=r
        def area(self):
            return self.pi*self.r**2
        def circumference(self):
            return 2.*self.pi*self.r

Saving this class in a file called Shapes.py, we can use it in a Python session in a manner similar to function modules:

    >>> import Shapes # import the module with the class
    >>> r=4.0
    >>> C=Shapes.Circle(r) # create a class instance with r=4.
    >>> C.pi # retrieve an attribute
    3.141592653589793
    >>> C.area() # call a method
    50.26548245743669
    >>> C.r=2.0 # change the value of r
    >>> C.area() # get a new area
    12.566370614359172

In spite of superficial similarities, classes are not the same as functions. Although the Shapes module is imported just the same as any other, to use it, we first have to create a class instance (C=Shapes.Circle(r)). C is an object with attributes (variables) and methods. All methods (parts that start with def) have an argument list. The first argument has to be a reference to the class instance itself, or self, followed by any variables you want to pass into the method. So the __init__ method initializes the instance attributes of an object. In the above case, it defined the attribute r, which gets passed in when the class is first called. Asking for any attribute (note the lack of parentheses) retrieves the current value of that attribute. Attributes can be changed (as in C.r=2.0). The other methods (area and circumference) are defined like any function, except note the use of self as the first argument. This is required in all class method definitions. Inside a method, attributes must also be reached through self (self.pi, self.r). In our case, no other parameters are passed in because the only one used is r, so the argument list consists of only self. Calling these methods returns values computed from the current values of the attributes.
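To see a method that takes an argument in addition to self, and the difference between an attribute shared by the whole class and one belonging to a single instance, here is a minimal sketch. The Thermometer class and everything in it are made up for illustration (it is deliberately unrelated to the Shapes assignment), and the code is written so that it should run under both Python 2 and Python 3.

```python
class Thermometer:
    """
    a hypothetical class holding a temperature in degrees Celsius
    """
    freezing = 0.0  # class attribute, shared by all instances

    def __init__(self, celsius):
        self.celsius = celsius  # instance attribute, one per object

    def fahrenheit(self):
        return self.celsius * 9. / 5. + 32.

    def warm_by(self, degrees):
        # a method with one argument in addition to self
        self.celsius = self.celsius + degrees

    def is_frozen(self):
        return self.celsius <= self.freezing

T = Thermometer(-5.0)
print(T.fahrenheit())
T.warm_by(10.0)  # self is supplied automatically; we pass only degrees
print(T.celsius)
print(T.is_frozen())
```

When a method is called on an instance, Python fills in self for you, which is why warm_by is defined with two arguments but called with one.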
You can make a subclass (child) of the parent class which has all the attributes and methods of the parent, but may have a few attributes and methods of its own. You do this by writing another class definition with the name of the parent class in parentheses after the name of the child. So, the bottom line about classes is that they are in the same category of things as variables, lists, dictionaries, etc. That is, they are data structures - they hold data, and the methods to process that data. If you are curious, there's lots more to know about classes that we don't have time to get into, but you can find useful tutorials online (e.g., http://www.sthurlow.com/python/lesson08/).

ASSIGNMENT P5: Write a module called Shapes.

- Shapes should have classes for: circle, sphere, cylinder, rectangle, and cube.
- These classes should have methods that return things like volume, circumference, area, mass (pass the argument density), where appropriate.
- Write a class called Earth that has attributes using the bulk parameters from assignment P3.
- Write a program that uses the Earth density and radius from the Earth class, passes these to the sphere class and calculates the mass. How does this compare with the mass given by the Earth class?
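Since subclasses were only described in words above, here is a minimal sketch of a parent and a child class. The class names Station and UTMStation are made up for illustration, the coordinates are approximate values used only as sample input, and the code is written so that it should run under both Python 2 and Python 3.

```python
class Station:
    """
    a hypothetical parent class: a named point on the Earth
    """
    def __init__(self, name, lat, lon):
        self.name = name
        self.lat = lat
        self.lon = lon

    def hemisphere(self):
        if self.lat >= 0:
            return 'N'
        return 'S'

class UTMStation(Station):  # the parent class goes in the parentheses
    """
    a child class that adds a UTM zone to Station
    """
    def __init__(self, name, lat, lon, zone):
        Station.__init__(self, name, lat, lon)  # let the parent set its attributes
        self.zone = zone                        # an attribute only the child has

    def label(self):
        # hemisphere() is inherited from Station
        return '%s (%s) zone %s' % (self.name, self.hemisphere(), self.zone)

S = UTMStation('AAE', 9.0, 38.8, '37P')  # approximate coordinates, for illustration
print(S.label())
```

The child did not have to define hemisphere() itself; any method it does not override is looked up in the parent.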
6.12
Matplotlib
So far you have learned the basics of Python, NumPy and how to link F90 code with Python. But Python was sold as a way of visualizing data and we haven't yet seen a single plot! There are many plotting options within the Python umbrella. The most mature, and the one I am most familiar with, is matplotlib, a popular graphics module of Python. Actually matplotlib is a collection of a bunch of other modules, toolkits, methods and classes. For a fairly complete and readable tour of matplotlib, check out these links: http://matplotlib.sourceforge.net/Matplotlib.pdf and here: http://matplotlib.sourceforge.net/
6.12.1
A first plot
Let's start with a simple plot script (matplotlib1.py):

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg") # my favorite backend
    import pylab # module with matplotlib
    pylab.plot([1,2,3]) # plot some numbers
    pylab.ylabel('Y') # label the y-axis
    pylab.show() # reveal the plot

The first step should be obvious by now: it imports matplotlib. Figures are rendered on backends so they appear on screen. There are a lot of different backends with slightly different looks. Some work better on different operating systems. I use the very old school backend called TkAgg because it works. So step 2 sets the backend: matplotlib.use("TkAgg"). The module matplotlib itself contains a lot of other modules. One of these, pylab, is the business end that has a lot of plotting methods and classes. It must be loaded alongside matplotlib, so step 3 is: import pylab.

After that the fun starts. In the above example, we call the plot method with a list as an argument. As I mentioned, matplotlib uses the concept of classes to make plots and this has just happened behind the scenes. We could have created a named figure instance with the figure() method (e.g., fig=pylab.figure()), added a set of axes to it (e.g., ax=fig.add_subplot(111)) and then plotted on those with the command ax.plot([1,2,3]), but we don't have to in this simple case - the class instance is implied and is the current plot. You can tell this if you do the above example in interactive mode:

    >>> import matplotlib
    >>> matplotlib.use("TkAgg")
    >>> import pylab
    >>> pylab.plot([1,2,3])
    [<matplotlib.lines.Line2D object at 0x4bd6eb0>]

The bit about [<matplotlib.lines.Line2D object at 0x4bd6eb0>] is Python's way of telling you that you just created an object and something about it. In any case, when you give plot() a single sequence of values (as above), it assumes they are y values and supplies the x values for you. Attributes of the current plot, such as the Y axis label, can be changed with the ylabel method. As you can imagine, there are LOTS of methods, including, surprise, an xlabel method.
When we are done customizing the plot instance, we can view it with the show method. When that gets executed, we will get a plot something like this:
[Figure: the plot window produced by matplotlib1.py, annotated "Use the red button to close the plot window" and "Use these buttons to save and modify your plot".]
Once that happens, we won't be able to change the plot any more and in fact, we won't get our terminal back until the little plot window is closed. You can save your plot with the little disk icon in a variety of formats. Adobe Illustrator likes .svg or .eps, while Microsoft products like the .png file format. If you find it annoying to always have to close figures with the little red button, or save them with the disk icon, you can tweak the program like this:

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    pylab.ion() # turn on interactivity
    pylab.plot([1,2,3])
    pylab.ylabel('Y')
    pylab.draw() # draw the current plot
    ans=raw_input('press [s] to save figure, any other key to quit: ')
    if ans=='s': pylab.savefig('myfig.eps')

The method pylab.savefig('FILENAME.FMT') saves the current figure to a file. The .FMT can be one of several (e.g., .eps, .svg, .ps, .pdf, .png, .gif, .jpg, etc.). Some of these (the vector graphics ones like pdf, ps, eps and svg) can be opened in Adobe Illustrator for modification.

As mentioned earlier, if you give plot() a single sequence of values, it assumes they are y values and supplies the x values for you. Garbage in, garbage out. But plot() takes an arbitrary number of arguments of the form: (X1, Y1, line style 1, X2, Y2, line style 2, etc.), where line style is a string that specifies the line style, as illustrated in this script called matplotlib2.py:

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,numpy
    x=numpy.arange(0,360,10)
    r=x*numpy.pi/180.
    c=numpy.cos(r)
    s=numpy.sin(r)
    pylab.plot(x,c,'r--',x,s,'g^')
    pylab.show()

which produces the plot:
[Figure: output of matplotlib2.py, with x running from 0 to 350 and y from -1.0 to 1.0.]

From the code, you can probably figure out that a line style of 'r--' is a red dashed line, and 'g^' gives green triangles. There are many other attributes that can be controlled: linewidth, dash style, etc., and I invite you to check out the matplotlib documentation. By now, you should understand enough about classes, keyword argument passing and other pythonalia to be able to figure things out on your own. But don't panic, I'm going to lead you through a few more examples, which I hope will speed you on your plotting way.
6.12.2
Multiple figures and more customization
As already mentioned, pylab has the concept of a current figure which subsequent commands refer to. In the preceding examples, we only had one figure, so we didn't have to name it, but for fancier figures with several plots, we can create named figure objects by invoking a figure instance: fig = pylab.figure(num=1,figsize=(5,7)). Notice the syntax whereby figsize gives the width and height (in inches) as a tuple and num is the figure number. Notice that these are keyword arguments, and that there are many more: consult the list of **kwargs in the online documentation located here: http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.figure

Once we have a figure instance (sometimes called a container), we can do all kinds of things, including adding subplots. To do this, we can use the syntax:

    fig.add_subplot(211)

Here the argument 211 means 2 rows, one column and this is the first plot. To make plots side by side, you would use fig.add_subplot(121) for 1 row, two columns, etc. After each add_subplot command, that subplot becomes the current subplot for plotting on. If you want more freedom, say, you want to make a subplot at an arbitrary place, use the add_axes([left, bottom, width, height]) method, e.g., add_axes([0.1,0.1,0.7,0.3]). The values are 0-1 in relative figure coordinates. To illustrate these new concepts, consider the example code, matplotlib3.py:

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab, numpy
    def f(t):
        return numpy.exp(-t)*numpy.cos(2.*numpy.pi*t)
    t1=numpy.arange(0.,5.,0.1)
    t2=numpy.arange(0.,5.,0.02)
    fig=pylab.figure(num=1,figsize=(7,5))
    fig.add_subplot(211)
    pylab.plot(t1,f(t1),'bo')
    pylab.plot(t1,f(t1),'k-')
    fig.add_subplot(212)
    pylab.plot(t2,numpy.cos(2*numpy.pi*t2),'r--')
    pylab.xlabel('Time (ms)')
    fig.add_axes([.6,.75,.25,.10])
    pylab.plot([0,1],[0,1],'r-',[0,1],[1,0],'r-')
    pylab.ylabel('Inset')
    pylab.show()

which produces:
[Figure: output of matplotlib3.py: two stacked subplots (the lower one with the x-axis label "Time (ms)") and a small inset plot with the y-axis label "Inset".]
By now, you should be able to figure out what everything in that script does by yourself!
6.12.3 Adding text
We already met xlabel and ylabel. But text can be added in other ways, e.g., using the title, text, legend and annotate methods. Let's decorate one of our early examples to show how some of these things work:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,numpy
    x=numpy.arange(0,360,10)
    r=x*numpy.pi/180.
    c=numpy.cos(r)
    s=numpy.sin(r)
    s2=numpy.sin(r)**2
    pylab.plot(x,c,'r--',x,s,'g^',x,s2,'k-')
    pylab.title('Fun with trig')
    pylab.text(250,-.5,'pithy note')
    pylab.legend(['cos(x)','sin(x)',r'$\sin^2(x)$'],loc='lower left')
    pylab.xlabel(r'$\theta$')
    pylab.annotate('triangles!',\
        xy=(175,0),xytext=(110,-.25),\
        arrowprops=dict(facecolor='black',\
        shrink=0.05))
    pylab.show()
which produces this plot:
[Figure: the "Fun with trig" plot, with the text "pithy note", the annotation "triangles!" with its arrow, a legend in the lower left corner, and the x-axis label theta.]
The title appears at the top of the plot. Text labels get placed at the x and y coordinates on the plot and the legend will appear in the upper/lower right/left corner as specified in the string. The pylab.text(x,y,string,**kwargs) method also has optional keyword arguments, specifying font, size, color and the like. The legend label list is a list of labels for each plot element. So, for every line or point style that you want in your legend, append a label to the label list, in the same order as the elements in the plot command. Also note that the legend and xlabel methods use a special format for strings (r'LaTeX string') which allows embedded LaTeX equation syntax (between dollar signs) to make scientific equations look right - so now you have to learn LaTeX! Finally, the arrow gets drawn with the annotate method, which has a lot of other attributes as well.
Check the matplotlib documentation for details. There are lots of graphing styles possible with matplotlib, e.g., histograms, pie charts, contour plots, whisker plots, etc. I'm just going to show you a few examples. The best thing to do is to look through the online documentation for a plot that looks like what you need, then modify it. This is ALWAYS a good approach - start with something that works and fiddle with it until it suits your own particular needs.
ASSIGNMENT P6
The file SPECMAP.dat has δ18O data versus age (in ka). For those of you who don't know, this is the classic record of Imbrie et al., 1984: variations in the oxygen isotopic ratio of seawater controlled by changes in global ice volume, which in turn was forced by the Earth's orbit. Write a Python program that:
- reads in the data and makes a simple X,Y plot with symbols connected by lines
- puts Age on the X axis and δ18O on the Y axis (the LaTeX code for δ18O is: $\delta^{18}O$)
Send the script to me at ltauxe@ucsd.edu
6.12.4 Histograms
I downloaded a week's worth of earthquake locations, magnitudes, etc. from the website: http://earthquake.usgs.gov/earthquakes/catalogs/index.php by clicking on the "XML merged catalog, past 7 days" link. This is a compressed (gnu zip) XML file. After unzipping it (by clicking on it), the file (called merged_catalog.xml) looked something like this:
    <?xml version="1.0" encoding="UTF-8"?>
    <merge>
    <event id="71672091" network-code="NC"
    time-stamp="2011/11/03_23:03:32 " version="2">
    <param name="year" value="2011"/>
    <param name="month" value="10"/>
    <param name="day" value="28"/>
    <param name="hour" value="21"/>
    <param name="minute" value="21"/>
    <param name="second" value="52.7"/>
    <param name="latitude" value="38.8147"/>
    <param name="longitude" value="-122.7862"/>
    <param name="depth" value="1.7"/>
    <param name="magnitude" value="0.2"/>
    <param name="num-phases" value="8"/>
    <param name="dist-first-station" value="0.0"/>
    <param name="rms-error" value="0.00"/>
    <param name="hor-error" value="0.8"/>
    <param name="ver-error" value="0.6"/>
    <param name="azimuthal-gap" value="42"/>
    <param name="magnitude-type" value="D"/>
    <param name="magnitude-type-ext" value="Mcd = coda duration magnitude"/>
    <param name="num-stations-mag" value="3"/>
    <param name="stand-mag-error" value="0.1"/>
    <param name="location-method" value="h"/>
    <param name="location-method-ext" value="Hypoinverse (Confirmed by human review)"/>
    </event>
    <event id="71672096" network-code="NC"
    time-stamp="2011/10/28_21:30:19 " version="0">
    <param name="year" value="2011"/>
    <param name="month" value="10"/>
    etc.
Reading in all the data, I can plot them various ways. In this example, I plot a histogram of the magnitudes:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    def readEQs(infile):
        input=open(infile,'rU').readlines()
        EQs=[] # list to put EQ dictionaries in
        linenum=0
        while linenum<len(input):
            if 'event id' in input[linenum]: # new event
                EQ={} # define a dictionary
                linenum+=2 # increment past time-stamp
                while 'param' in input[linenum]:
                    record=input[linenum].split('=')
                    datakey=record[1].split()[0].strip('"')
                    EQ[datakey]=record[2].strip('\n').strip('/>').strip('"')
                    linenum+=1 # keep going until </event>
                if '</event>' in input[linenum]: # done with event
                    EQs.append(EQ)
            linenum+=1 # look for next event id
        return EQs
    EQs=readEQs('merged_catalog.xml')
    Magnitudes=[] # set up container
    for eq in EQs: # step through earthquake dictionaries
        Magnitudes.append(float(eq['magnitude'])) # collect magnitudes
    pylab.hist(Magnitudes,bins=50,normed=True) # plot 'em
    pylab.xlabel('Richter Magnitude')
    pylab.ylabel('Frequency')
    pylab.show()
[Figure: histogram of the earthquake magnitudes; x axis: Richter Magnitude, y axis: Frequency.]
In the example, please notice the clever way in which we parse the XML code. First, we look for the new event marker, then split each param record on its '='. This gives a three-element list with what we need: the second element contains the key and the third element the value. To get the key name, we split the second element on the space, which puts the key name (enclosed in quotes) in the first element of a list; we select that element and strip off the quotes. This we use as the key in the dictionary, EQ. To get the value, we use the third element in the record list (record[2]) and strip off the end of line character, the closing /> and the quotes. (A value that itself contains an '=', like the one for magnitude-type-ext, gets cut off at that point, which does not matter for the numerical fields we plot.) We pair the value with the key in the EQ dictionary and continue, incrementing linenum until we have all the keys picked off and hit the </event> line. When we are done with a dictionary, we append it to a list of all the dictionaries, increment linenum and press on. We keep going until we have read all the data. Because every event is kept as a dictionary, any particular key can easily be fished out later for plotting. After we have parsed the data file into a list of dictionaries, we make a list for the thing we want to plot (Magnitudes), hunt through the list of dictionaries, pick out the magnitude and, after turning it into a float, append it to the list. We plot the histogram using the pylab.hist method. We can label the plot as per usual and display it with show().
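The line-by-line parsing described above depends on the exact layout of the file. As an alternative sketch (not the method used in these notes), the standard library module xml.etree.ElementTree can read the same structure. The function name read_eqs_etree and the small inline sample are made up for illustration; the sample only mimics the merge/event/param layout shown above.

```python
from __future__ import print_function
import xml.etree.ElementTree as ET

# Hypothetical stand-in for merged_catalog.xml, same layout as the real file
SAMPLE = """<merge>
<event id="71672091" network-code="NC" time-stamp="2011/11/03_23:03:32" version="2">
<param name="latitude" value="38.8147"/>
<param name="magnitude" value="0.2"/>
<param name="magnitude-type-ext" value="Mcd = coda duration magnitude"/>
</event>
<event id="71672096" network-code="NC" time-stamp="2011/10/28_21:30:19" version="0">
<param name="latitude" value="38.8200"/>
<param name="magnitude" value="1.1"/>
</event>
</merge>
"""

def read_eqs_etree(xml_text):
    """Return a list of dictionaries, one per <event>, keyed by param name."""
    root = ET.fromstring(xml_text)
    EQs = []
    for event in root.iter('event'):      # every <event> element
        EQ = {}
        for param in event.iter('param'): # every <param> inside it
            EQ[param.get('name')] = param.get('value')
        EQs.append(EQ)
    return EQs

EQs = read_eqs_etree(SAMPLE)
Magnitudes = [float(eq['magnitude']) for eq in EQs]  # values are still strings
print(Magnitudes)
```

For a file on disk, ET.parse(filename).getroot() would take the place of ET.fromstring. One practical difference from the split-based approach is that attribute values containing '=' come through whole.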
6.12.5 Pie Charts
I mostly think pie charts are silly, but some people love them. So if you want to see data as a pie chart, say the fraction of earthquakes in each magnitude bin from the last example, we can modify the script thusly:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    from readEQs import *
    EQs=readEQs('merged_catalog.xml')
    Fracs,Labels=[],[]
    bin0=0
    for m in range(1,8): # assume no magnitudes bigger than 7 last week!
        num=0 # initialize count
        for eq in EQs:
            eqm=float(eq['magnitude'])
            if bin0<=eqm<m: num+=1 # count all magnitudes in this bin (lower edge included)
        Fracs.append(float(num))
        Labels.append(str(bin0)+'-'+str(m))
        bin0=m # increment to next bin
    pylab.pie(Fracs, labels=Labels)
    pylab.axis('equal') # make the pie round!
    pylab.title('Silly Pie Chart')
    pylab.show()
[Figure: "Silly Pie Chart", with wedges labeled 0-1, 1-2, 2-3, 3-4, 4-5, 5-6 and 6-7.]
Notice how the function readEQs from the histogram example has been put into a module by itself and then called within this program.
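The counting loop in the pie chart script can also be handed to NumPy. The following is a minimal sketch, assuming a made-up list of magnitudes in place of the values read from the catalog; numpy.histogram does the binning and the labels are built from the bin edges.

```python
from __future__ import print_function
import numpy

# Made-up magnitudes standing in for the values read from the catalog
magnitudes = [0.2, 0.9, 1.0, 1.7, 2.3, 2.9, 3.0, 4.6]

edges = numpy.arange(0, 8)   # bin edges 0,1,...,7
counts, edges = numpy.histogram(magnitudes, bins=edges)

# Build labels such as '0-1' from neighboring pairs of edges
labels = ['%d-%d' % (lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
print(counts)
print(labels)
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both. The counts and labels could then be passed to pylab.pie(counts, labels=labels).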
6.12.6 Basemap
Most of your maps can be made with GMT, but Python can make maps too. Once you know Python, it can be much easier to use than GMT because the same principles apply and all of the power of matplotlib is available for enhancing your plots. We generate maps with a special toolkit of matplotlib called Basemap. Here is a simple example, using our earthquake data from the histogram example to make a Mollweide projection with the earthquake locations as red dots.
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,numpy
    from readEQs import *
    from mpl_toolkits.basemap import Basemap
    EQs=readEQs('merged_catalog.xml') # read in data (see histogram example)
    Lats,Lons=[],[] # set up lists for location data
    for eq in EQs: # step through the list of earthquakes
        Lats.append(float(eq['latitude'])) # collect the latitudes (as floats)
        Lons.append(float(eq['longitude']))
    map=Basemap(projection='moll',lon_0=0,resolution='c') # create a map instance
    map.drawcoastlines()
    map.drawmapboundary()
    map.drawmeridians(numpy.arange(0,360,30)) # draws longitudes from list
    map.drawparallels(numpy.arange(-60,90,30)) # draws latitudes from list
    X,Y=map(Lons,Lats) # calculates the projection of the X,Y
    pylab.plot(X,Y,'ro') # uses pylab's plot to plot these arrays
    pylab.savefig('map.eps') # save the figure in EPS format
In this example, the list of earthquake dictionaries, EQs, is read in with the same file reading function from the histogram example. Then we fish out the latitude and longitude from each earthquake dictionary (eq) and append the floating point equivalent (remember they are strings in the dictionary!) to the relevant lists (Lats, Lons). The basic concept of Basemap is that you create a map class instance with a call to Basemap. The attributes of your map (e.g., resolution, projection, central longitudes, map boundaries, map center, etc.), i.e., all the things you set with switches in pscoast, are set with keyword arguments in the call to Basemap. The details of these keyword arguments depend on the projection you choose. After you create the map instance (called map in the above example), you can modify it using the methods available to this class. To draw the coastlines, use the method drawcoastlines(). To draw the lines of longitude we use the method drawmeridians() with an array (or list) containing the longitudes you want to plot. In this case we generated the array with numpy.arange, but we could have used range(0,360,30) or any arbitrary integer or floating point list. Drawing the lines of latitude works the same way (with the drawparallels() method). Now we want to plot a bunch of points on the map. By sending in the longitudes and latitudes of the earthquake locations as arguments to the map instance, it chews on them and spits out x,y values based on the projection we specified when creating the map instance with Basemap. Now we can plot the returned X, Y lists using the regular pylab plotting functions. We can go on and decorate the map with anything available in pylab. The advantage of using Basemap over, say, GMT is you can build on your knowledge of matplotlib, instead of struggling with two completely different and incompatible plotting packages.
For more examples, check the documentation available here: http://matplotlib.github.com/basemap/users/examples.html
ASSIGNMENT P7
Redo your GMT1 assignment using the Basemap toolkit! Add a point labelled "Now I'm here!" at the location of SIO.
6.12.7 Contour plots
Of the many plotting methods available in matplotlib, one of the more useful for Earth Scientists is the contour plot. Here is an example of the gravity anomaly created by a buried sphere:
    #!/usr/bin/env python
    import numpy
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
    R=2. # radius in meters
    z=3. # depth of burial
    drho=500 # density contrast in kg/m^3
    x=numpy.arange(-2.*z,2.*z,0.1)
    y=numpy.arange(-2.*z,2.*z,0.1)
    X,Y=pylab.meshgrid(x,y)
    h=numpy.sqrt(X**2+Y**2)
    g=(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
    pylab.imshow(g,interpolation='bilinear',cmap=pylab.cm.Spectral)
    pylab.colorbar()
    pylab.axis('equal')
    pylab.show()
[Figure: color image of the gravity anomaly g over the grid, with a colorbar scaled in units of 1e-7.]
The first new function for us in the above example is meshgrid. This makes 2D arrays X and Y for 3D mesh/surface plots. Both arrays have one row for every element in y and one column for every element in x. Each row of X is a copy of x and each column of Y is a copy of y. The line h=numpy.sqrt(X**2+Y**2) calculates the horizontal distance h from the origin for every point on the X,Y grid. Then we calculate the gravity anomaly g for a spherical mass with radius R at depth z with the command:
    g=(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
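To see what meshgrid returns, it helps to try it on arrays small enough to print. This is a minimal sketch with made-up values for x and y; it calls numpy.meshgrid, which is the same function that pylab exposes as pylab.meshgrid.

```python
from __future__ import print_function
import numpy

x = numpy.array([-1., 0., 1.])   # three points along x
y = numpy.array([10., 20.])      # two points along y

X, Y = numpy.meshgrid(x, y)
print(X.shape)   # one row per element of y, one column per element of x
print(X)         # every row is a copy of x
print(Y)         # every column is a copy of y

# Any expression in X and Y is evaluated at every grid point at once
h = numpy.sqrt(X**2 + Y**2)
print(h)
```

Because X and Y have the same shape, element-by-element arithmetic on them gives one value per grid point, which is exactly how h and g are built in the gravity example.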
You can find the formula for this in any geophysics text book. Anyway, now we have g which, if you print it out, looks like this:
    [[ 1.37971509e-08 1.40028722e-08 1.42112058e-08 ..., 1.44221090e-08 1.42112058e-08 1.40028722e-08]
     [ 1.40028722e-08 1.42148210e-08 1.44295575e-08 ..., 1.46470410e-08 1.44295575e-08 1.42148210e-08]
     [ 1.42112058e-08 1.44295575e-08 1.46508813e-08 ..., 1.48751394e-08 1.46508813e-08 1.44295575e-08]
     ...,
     [ 1.44221090e-08 1.46470410e-08 1.48751394e-08 ..., 1.51063696e-08 1.48751394e-08 1.46470410e-08]
     [ 1.42112058e-08 1.44295575e-08 1.46508813e-08 ..., 1.48751394e-08 1.46508813e-08 1.44295575e-08]
     [ 1.40028722e-08 1.42148210e-08 1.44295575e-08 ..., 1.46470410e-08 1.44295575e-08 1.42148210e-08]]
It is a 2D array in which every element is the value of g at that grid point. There are a bunch of ways to visualize this, but here we plot it as a color image with pylab.imshow (pylab.contour or pylab.contourf would draw contour lines or filled contours from the same array). To do this we interpolate between all the grid points and choose a color map translating the value of g into a color from blue to orange. When choosing color maps, be aware that a lot of people are red-green color blind and appreciate some other color contrast, like the blue-orange one chosen here.
6.13 Deeper into NumPy and Scipy
Now that we have some plotting skills under our belt, we can take advantage of the computational tools available in NumPy and another scientific package called Scipy. We have already met these: cos(), sin(), pi, arctan2(), arange(), among others. Now meet polyfit(x,y,order) and polyval(coeffs,x). The former fits an n-order polynomial to input data and returns the coefficients, and the latter evaluates the polynomial at x using coefficients returned from polyfit. To find a best fit line (y = mx + b where m is the slope and b is the intercept) we can use polyfit() by setting order to one. Order is two for a quadratic polynomial (y = ax^2 + bx + c) and three for a cubic (y = ax^3 + bx^2 + cx + d). See how these work in the following example:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import numpy,pylab
    from numpy import random
    x,y=[1.,2.,3.,4.],[1.,1.8,3.4,4.2] # defines two lists
    X=numpy.arange(x[0],x[-1]+.5,.1)
    pylab.plot(x,y,'ro')
    coeffs=numpy.polyfit(x,y,1)
    Y=numpy.polyval(coeffs,X)
    pylab.plot(X,Y,'r-')
    y2=[3.42, 11.24, 19.86, 34.87]
    pylab.plot(x,y2,'bs')
    coeffs2=numpy.polyfit(x,y2,2)
    Y2=numpy.polyval(coeffs2,X)
    pylab.plot(X,Y2,'b-')
    pylab.show()
[Figure: the data points (red dots and blue squares) with the best-fit line in red and the best-fit quadratic in blue.]
In the first call to the function polyfit(), it returns coeffs=[ 1.12 -0.2 ], with the coefficients m and b. It prints with no commas because it is a NumPy array rather than a list, so be careful. This array can be passed directly into polyval(), which returns the y values evaluated at the positions in the X list (or array) that is passed in. We plot these values as the red line, compared to the original data points (x, y) which are red dots. In the second call to the function polyfit(), we sent it new values for y (y2) and asked for a second order polynomial fit. coeffs2 is [ 1.7975 1.3095 0.5925], which are a, b, c respectively. This in turn gets churned through polyval() and plotted as the blue curve (as compared to y2, which are the blue squares).
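The coefficients come back highest power first, so for a line they unpack directly into slope and intercept. The short sketch below reuses the x, y data from the example; the variable names m, b and residuals are only for illustration.

```python
from __future__ import print_function
import numpy

x = [1., 2., 3., 4.]
y = [1., 1.8, 3.4, 4.2]

coeffs = numpy.polyfit(x, y, 1)   # NumPy array, highest power first
m, b = coeffs                     # slope and intercept of the best-fit line
print(type(coeffs))
print(m, b)

# Misfit between the data and the fitted line at the data positions
residuals = numpy.array(y) - numpy.polyval(coeffs, x)
print(residuals)
```

For a least-squares line that includes an intercept, the residuals sum to zero (to within rounding), which makes a quick sanity check on the fit.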
6.13.1 More on slicing with arrays
We zoomed by the finer points of list and array indexing earlier in the chapter, eager to get to plotting. Now we delve deeper. First a review of lists. You will recall that in Python, indexing starts with 0, so for the list L=[0,2,4,6,8], L[1] is 2. The index of the last item is -1, so L[-1]=8. To find out what the index for the number 4 is, for example, we have the index() method: L.index(4), which will return the number 2. We actually already used this method when we implemented command line arguments, but it wasn't really explained. We know that to reassign a given index a new value we use the syntax L[1]=2.5. And to use a part of a list (a slice) we use, e.g., B=L[2:4], which defines B as a list with L's elements 2 and 3 (4 and 6). And you also know that B=L[2:] takes all the elements from 2 to the end. From these examples, you can infer that the basic syntax for slicing is [start:stop:step]; if the step is omitted it is assumed to be 1. Arrays (and matrices) work in a similar fashion to lists, but these are multidimensional objects, so things get hairy fast. The basic syntax is the same, [start:stop:step], but with arrays each dimension gets its own slice, separated by commas: rows first, then columns. This is best shown with examples:
    >>> import numpy
    >>> A=numpy.linspace(0,29,30)
    >>> B=A.reshape(5,6)
    >>> B
    array([[  0.,   1.,   2.,   3.,   4.,   5.],
           [  6.,   7.,   8.,   9.,  10.,  11.],
           [ 12.,  13.,  14.,  15.,  16.,  17.],
           [ 18.,  19.,  20.,  21.,  22.,  23.],
           [ 24.,  25.,  26.,  27.,  28.,  29.]])
    >>> B[1:3,:-1:2]
    array([[  6.,   8.,  10.],
           [ 12.,  14.,  16.]])
Let's pick apart the statement B[1:3,:-1:2] to see if we can understand what it does. The first part alone returns the second and third rows (indices 1 and 2):
    >>> B[1:3]
    array([[  6.,   7.,   8.,   9.,  10.,  11.],
           [ 12.,  13.,  14.,  15.,  16.,  17.]])
The second part, [:-1], selects the columns; in other words, we take all but the last element of each row:
    >>> B[1:3,:-1]
    array([[  6.,   7.,   8.,   9.,  10.],
           [ 12.,  13.,  14.,  15.,  16.]])
And finally, we have the step of 2, which takes every other element:
    >>> B[1:3,:-1:2]
    array([[  6.,   8.,  10.],
           [ 12.,  14.,  16.]])
For more on array slicing (indexing), see: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
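A few more slices of the same kind of array, as a small sketch to experiment with. It also shows one difference from lists that is easy to trip over: a slice of a list is a copy, while a slice of a NumPy array is a view onto the same data. The names col, corner, rev, view, L and S are only for illustration.

```python
from __future__ import print_function
import numpy

B = numpy.arange(30.).reshape(5, 6)   # same values as B in the text

col = B[:, 2]          # third column: every row, column index 2
corner = B[-2:, -2:]   # last two rows, last two columns
rev = B[0, ::-1]       # first row, reversed with a step of -1
print(col, corner, rev)

# List slices are copies: changing S leaves L alone
L = [0, 2, 4, 6, 8]
S = L[0:2]
S[0] = 99

# Array slices are views: changing the slice changes the original
B2 = B.copy()          # work on a copy so B itself stays intact
view = B2[1:3]
view[:] = 0            # zeros out rows 1 and 2 of B2
print(L, B2)
```

If an independent piece of an array is needed, calling .copy() on the slice avoids surprises.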
6.13.2 Looping through arrays
Earlier in the course, we learned that for loops with lists just step through item by item. In n-dimensional arrays, they step through row by row (like in slicing). For example,
    >>> for r in B:
    ...     print r
    ...
    [ 0.  1.  2.  3.  4.  5.]
    [  6.   7.   8.   9.  10.  11.]
    [ 12.  13.  14.  15.  16.  17.]
    [ 18.  19.  20.  21.  22.  23.]
    [ 24.  25.  26.  27.  28.  29.]
If you really want to step through element by element, you can use the ravel() method, which flattens an N-dimensional array to a single dimension:
    >>> for e in B.ravel():
    ...     print e
    ...
    0.0
    1.0
    2.0
    3.0
    etc.
For more on looping (or iterating), see: http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
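Here is a compact sketch of the looping styles just described, on a small made-up array. It adds numpy.ndenumerate, which hands back the index along with each element; that function is not used elsewhere in these notes.

```python
from __future__ import print_function
import numpy

B = numpy.arange(6.).reshape(2, 3)   # small made-up array: 2 rows, 3 columns

# Looping over the array itself steps through the rows
row_sums = [r.sum() for r in B]

# ravel() flattens the array, so the loop visits single elements
flat = [e for e in B.ravel()]

# ndenumerate gives ((row, column), value) pairs
pairs = [(idx, val) for idx, val in numpy.ndenumerate(B)]

print(row_sums)
print(flat)
print(pairs[4])
```

For simple arithmetic, whole-array expressions such as B.sum(axis=1) are usually shorter and faster than any explicit loop.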
6.13.3 Random Numbers
Many scientific investigations use techniques such as Monte Carlo simulations, or bootstrap statistics, which you (hopefully) will learn about in future classes. These applications require that you be able to generate random numbers from a variety of distributions. Python does this with the numpy.random module. To learn more, read the documentation at: http://docs.scipy.org/doc/numpy/reference/routines.random.html Let's look at a few of these:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    from numpy import random
    N,min,max=500,10,20
    bins=(max-min)*2
    Nums=[]
    for n in range(N):
        Nums.append(random.uniform(min,max))
    pylab.hist(Nums,bins=bins,facecolor='orange')
    pylab.title('Uniform distribution')
    pylab.show()
[Figure: histogram titled "Uniform distribution", with values between 10 and 20.]
The new twist here is the call to the uniform() function of the random module of NumPy. This returns uniformly distributed numbers between the min and max values specified. The other embellishments were to the hist() function in matplotlib: increasing the number of bins (the default is too few in my opinion) and changing the color of the columns to a pretty shade of orange.
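The loop in the script draws one number at a time. As a sketch of an alternative, uniform() can also return the whole sample in one call through its size argument, and seeding the generator makes a run repeatable. The seed value 2012 is arbitrary.

```python
from __future__ import print_function
from numpy import random

N, lo, hi = 500, 10, 20

random.seed(2012)                        # arbitrary seed, for repeatable draws
Nums = random.uniform(lo, hi, size=N)    # array of N numbers in [lo, hi)

random.seed(2012)                        # same seed again ...
Again = random.uniform(lo, hi, size=N)   # ... gives the same numbers

print(Nums.shape, Nums.min(), Nums.max())
```

The result is a NumPy array rather than a list, and it can be passed to pylab.hist() just as the list was. Without the seed() call, every run gives a different sample.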
6.13.4 Normal Distribution
There are dozens of other distributions available in the random module, but the all time favorite is the normal, or Gaussian, distribution. Here we have an example of how to use it to retrieve numbers from a normal distribution with a mean of 10 and a standard deviation of 2:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    from numpy import random
    N,mu,sigma=500,10,2
    Nums=[]
    for i in range(N):
        Nums.append(random.normal(mu,sigma))
    pylab.hist(Nums,bins=20,facecolor='orange')
    pylab.title('Normal distribution')
    pylab.show()
[Figure: histogram titled "Normal distribution", centered near 10.]
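One way to convince yourself that the numbers really follow a normal distribution is to count how many fall within one standard deviation of the mean; for a Gaussian this should be about 68%. A minimal sketch, using a larger sample than the script above and an arbitrary seed:

```python
from __future__ import print_function
import numpy
from numpy import random

random.seed(1)                      # arbitrary seed, for repeatable draws
mu, sigma = 10., 2.
Nums = random.normal(mu, sigma, size=100000)

within = numpy.abs(Nums - mu) < sigma   # True where a value lies within one sigma
frac = within.mean()                    # the mean of True/False values is the fraction True
print(frac)
```

The comparison produces an array of True and False values, and taking its mean is a handy idiom for "what fraction of the elements satisfy this condition".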
6.13.5 Statistics in Python
All this talk about normal distributions makes me hungry for some statistical analysis. There are a large number of statistical packages available in the Enthought Python Distribution we are using for this class. Numpy has a number of useful functions and there are many more hidden in the stats module of the Scipy package.
The documentation is still a bit thin, but to get you started, see: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html and http://docs.scipy.org/doc/scipy/reference/stats.html Here we will consider a few useful functions to give you a feel for how they work. Let's start with 1D arrays and find the mean, standard deviation and sum. For fun, we can use some of the fake data generated by the random.normal() function and see how the actual means and standard deviations of a data set drawn from a normal distribution compare with the true mean μ and standard deviation σ. From the normal distribution example, we generated a list of numbers drawn from a distribution with μ = 10 and σ = 2:
    #!/usr/bin/env python
    import numpy
    from numpy import random
    N,mu,sigma=500,10,2
    Nums=[]
    for i in range(N):
        Nums.append(random.normal(mu,sigma))
    ANums=numpy.array(Nums)
    print numpy.mean(ANums), numpy.std(ANums),numpy.sum(ANums)
We wanted to calculate the mean, standard deviation and sum of our sample (Nums) using the functions numpy.mean(), numpy.std() and numpy.sum(). These are array functions (they would convert a list for us, but it is cleaner to be explicit), so we first convert to an array: ANums=numpy.array(Nums). When we run this code, we get:
    10.0428129458 2.0027306608 5021.40647292
Note that you will get a different answer every time you run this because the random sample really is pretty random. The mean and standard deviation here are 10.042... and 2.0027..., which are close to the true values of 10 and 2 respectively.
ASSIGNMENT P8:
Modify the statistics example above to calculate 1000 versions of Nums, with an N of 10. Calculate the mean and standard error (standard deviation/√N) values for each sample. Plot a histogram of the means. The standard error times 1.96 is the 95% confidence bound for the mean, i.e., the mean ± these bounds should contain the true mean (10) 95% of the time. For what fraction of the 1000 samples is this true?
We can also use the statistics methods on n-dimensional arrays. Here, the argument can specify the axis along which we want to do the calculation. Recalling the arrays from before, we can illustrate the use of the numpy.sum() function as follows:
    >>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]])
    >>> A.sum(axis=0)
    array([6, 5, 5])
    >>> A.sum(axis=1)
    array([6, 6, 4])
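The axis argument works the same way for the other statistics functions. The sketch below applies mean() to the same array A, and also shows the ddof argument of numpy.std(), which switches between dividing by N (the default) and by N-1. The array called data holds made-up numbers chosen so the answer comes out round.

```python
from __future__ import print_function
import numpy

A = numpy.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])
col_means = A.mean(axis=0)   # one value per column (averaging down the rows)
row_means = A.mean(axis=1)   # one value per row (averaging across the columns)
total = A.sum()              # no axis given: everything is summed
print(col_means, row_means, total)

# Made-up data with mean 5 and summed squared deviations of 32
data = numpy.array([2., 4., 4., 4., 5., 5., 7., 9.])
pop_std = numpy.std(data)            # divides by N: sqrt(32/8) = 2
samp_std = numpy.std(data, ddof=1)   # divides by N-1: sqrt(32/7)
print(pop_std, samp_std)
```

Which of the two standard deviations is appropriate depends on the statistical question; the point here is only that numpy.std() defaults to dividing by N.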
6.14 Graphical User Interfaces - GUIs
Having introduced you to the joys of command lines, why now a lecture on GUIs? GUIs work differently than all the other scripts we have been writing in that they don't just start at one end and proceed through to the other - the program flow can be controlled by the user. GUIs let command-line-phobic people use your software. They allow a greater degree of interactivity with your data visualization. They can streamline data analysis in an intuitive way. And, they are fun. Okay, how do I make a GUI? GUIs are composed of a number of things like text boxes, radio buttons, sliders, etc. called widgets. When the program starts, it first builds the GUI then waits for something to happen (a menu to be selected or a mouse click or some data to be entered...) - a state called an event loop. There are a number of ways of creating GUIs in Python. The oldest and most standard way is called Tkinter, which is based on the even older Tk toolkit. The newer wxPython has now reached some maturity and comes standard with the Enthought Python Distribution. See: http://wiki.wxpython.org/Getting%20Started http://zetcode.com/wxpython/ and http://wiki.wxpython.org/wxPython%20by%20Example It seems to be the way things are going, so we'll use wxPython in this class.
There are six essential parts to a wxPython GUI:
- import wx: imports the necessary wxPython package
- app = App(): App() is a subclass of wx.App; app is an instance thereof
- OnInit() or __init__() function: specifies what happens on initialization
- frame = wx.Frame(): Frame is a class for the main windows; frame is a particular window instance
- frame.Show(): reveals the window - like pylab.show()
- app.MainLoop(): starts the program
Here is a simple example that makes a window with an image in it:

[Code listing not recovered. Its margin annotations read:
- Did you know this was called the shebang line?
- wxPython package
- describes the application subclass
- initializes the App subclass
- makes an image object
- makes the main window (frame in wxPython)
- describes the frame in this application
- prepares image for display
- initializes the frame and displays the image
- resizes the window around the image
- makes a status bar
- STARTS THE MAIN EVENT LOOP]
which makes this window:
[Figure: the resulting window.]
Elements of a GUI:
- Widgets: Things you put on your GUI like text boxes, buttons, radio buttons, check boxes, pull down menus, etc. (see figure below).
- Events: Things that trigger actions like mouse movements and clicks, menu selections, button clicks, etc.
- Dialogs and Messages: Pop up windows with information or action items.
- Layout: Setting up your GUI with a nice layout.
- Input and Output: Getting data in and out.
Here are some common GUI widgets:
[Figure: Common GUI widgets: check boxes, buttons, menus, list boxes, text entry, radio buttons.]
From wxPython in Action, by Rappin - a great intro to GUIs.
The example is for a simple editor illustrating widgets, events, dialogs, and layout.
[Code listing not recovered. Its margin annotations read:
- Make a container to put things in
- Make a box for layout management
- Make a text editing window & add to box
- Make a button, bind it to OnQuit & add to box
- Fix up layout a bit
- Dialog pop up returns button id as result, dies
- Kills program, if result is OK button]
which looks like this:
[Figure: the simple editor window.]
Now let's add some file I/O:
[Code listing not recovered. Its margin annotations read:
- Define some ID numbers for future use
- Make a Menu object, append two options with a separator
- Make a menuBar object, append the Menu object under the label File and stick it on the top of the frame
- Bind actions to each menu item (e.g., OnOpen to ID_OPEN)
- Gets the file name, opens it and reads it into the text editor (named control)]
which creates a text box and a file menu. You can open a file for editing (but saving it costs extra.)
[Figure: the editor window with a File menu. Caption: You can edit this, but you can't save it.]
6.14.1 Interacting with plots
In the section on statistics, I demonstrated a program to calculate gaussian data and plot them as a histogram. The mean and standard deviation were hardwired into the code. There are a bunch of ways we could make this program more customizable, some of which use GUIs:
1. read in the mean and standard deviation using raw_input() queries (annoying!)
2. read them in from a file using a redirect <
3. read them in as command line switches
4. read them in from Entry boxes using a GUI, construct a command and run the program automatically through the command line as in the above ideas
5. read them in from Entry boxes using a GUI, and call a method from within the GUI
You should already know how to do the first three options from what you have already learned. But here are a few examples for fun.
Option 1) Program queries with raw_input():
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,sys
    import exceptions # new module that allows error trapping!
    from numpy import random
    """random_2_i"""
    N,mu,sigma=500,10,2
    pylab.ion()
    fig=pylab.figure(1,figsize=(5,4))
    while 1:
        Nums=[]
        try: # executes unless error!
            mu=float(raw_input('Enter mean: '))
            sigma=float(raw_input('Enter sigma: '))
            for i in range(N):
                Nums.append(random.normal(mu,sigma))
            pylab.hist(Nums,bins=20,facecolor='orange')
            pylab.title('Normal distribution')
            pylab.draw()
            ans=raw_input("Press return to continue, q to quit: ")
            if ans=='q': break
            pylab.clf() # clears figure so we can plot a new one
        except ValueError: # executes if an entry cannot be turned into a float
            print 'Invalid entries, try again'
    print 'Quitting'
which produces this:
The program will keep updating the figure until you type q. There are two new things in this code snippet, which could come in handy: 1) the pylab.clf() command, which clears the figure and allows us to plot a new one and 2) error trapping with the module exceptions. In this example, when you enter an invalid number, the error trap syntax:
    try:
        execute some code if no error happens
    except:
        do this if you get an error - note you can specify the error type.
will give you a warning and then go back up to the top of the while loop.
Option 2) Read from standard input
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,sys
    from numpy import random
    """random_2_file.py"""
    fig=pylab.figure(1,figsize=(5,4))
    pylab.title('Normal distribution')
    line=sys.stdin.readline()
    N,Nums=500,[]
    params=line.split()
    mu,sigma=float(params[0]),float(params[1])
    for i in range(N):
        Nums.append(random.normal(mu,sigma))
    pylab.hist(Nums,bins=20,facecolor='orange')
    pylab.show()
We can execute this program with a command like:
    % random_2_file.py < random_file.dat
The file random_file.dat contains the desired parameters.
Option 3) Command line switch
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,sys
    from numpy import random
    """random_2_switch.py"""
    N,mu,sigma=500,10,2
    pylab.ion()
    if '-mu' in sys.argv:
        mu=float(sys.argv[sys.argv.index('-mu')+1])
    if '-sig' in sys.argv:
        sigma=float(sys.argv[sys.argv.index('-sig')+1])
    Nums=[]
    for i in range(N):
        Nums.append(random.normal(mu,sigma))
    pylab.hist(Nums,bins=20,facecolor='orange')
    pylab.title('Normal distribution')
    pylab.draw()
    raw_input("Any key to quit")
This uses the already familiar command line switches and can be run with a command like:
    % random_2_switch.py -mu 10 -sig 2
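Hunting through sys.argv by hand works, but the standard library also has a module for this job. The following is a sketch of the same two switches handled with argparse (available from Python 2.7); it is an alternative, not what random_2_switch.py does, and the function name parse_switches is made up for illustration.

```python
from __future__ import print_function
import argparse

def parse_switches(argv):
    """Read -mu and -sig from a list of command line words."""
    parser = argparse.ArgumentParser(description='Histogram of normal deviates')
    parser.add_argument('-mu', type=float, default=10., help='mean')
    parser.add_argument('-sig', type=float, default=2., help='standard deviation')
    return parser.parse_args(argv)

# In a script this would be parse_switches(sys.argv[1:])
args = parse_switches(['-mu', '5', '-sig', '1.5'])
print(args.mu, args.sig)

defaults = parse_switches([])   # no switches given: fall back on the defaults
print(defaults.mu, defaults.sig)
```

With type=float, argparse does the conversion itself and stops with a usage message if a value cannot be turned into a number. It also provides a -h switch for free.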
Option 4) Construct command line and run from within GUI:
    #!/usr/bin/env python
    import os,wx
    """ gauss_GUI_cmd.py"""
    ID_MEAN,ID_STD=101,102
    class App(wx.App):
        def OnInit(self):
            self.frame=MyFrame(None,-1,'Gauss command line')
            self.frame.Center()
            self.frame.Show()
            return True
    class MyFrame(wx.Frame):
        def __init__(self, parent,id,title):
            wx.Frame.__init__(self,parent,id,title,(-1,-1),wx.Size(300,300))
            panel=wx.Panel(self,-1)
            self.mean_label=wx.StaticText(panel,-1," Mean",wx.DLG_PNT(panel,15,5))
            self.mean_entry=wx.TextCtrl(panel,ID_MEAN,"",wx.DLG_PNT(panel,40,5),\
                wx.DLG_SZE(panel,80,12))
            self.std_label=wx.StaticText(panel,-1," STD",wx.DLG_PNT(panel,15,20))
            self.std_entry=wx.TextCtrl(panel,ID_STD,"",wx.DLG_PNT(panel,40,20),
                wx.DLG_SZE(panel,80,12))
            button=wx.Button(panel,-1,"Plot",wx.DLG_PNT(panel,40,45),wx.DLG_SZE(panel,25,12))
            button.Bind(wx.EVT_BUTTON,self.PlotGauss)
        def PlotGauss(self,event):
            commandstring='random_2_switch.py -mu '+self.mean_entry.GetValue()+' -sig '+self.std_entry.GetValue()
            print 'running: ',commandstring
            os.system(commandstring)
    app=App(False)
    app.MainLoop()
And we get:
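One weakness of Option 4 is that whatever is typed into the text boxes goes straight into a shell command. A hedged sketch of a safer variant follows: the helper name build_command is hypothetical, it checks that both entries are numbers, and it returns an argument list rather than a single string. It uses Python 3 style print().

```python
def build_command(mean_text, std_text, script='random_2_switch.py'):
    """Build an argument list for the plotting script from two text-box strings."""
    mean = float(mean_text)        # raises ValueError if the entry is not a number
    std = float(std_text)
    if std <= 0:
        raise ValueError('standard deviation must be positive')
    return [script, '-mu', repr(mean), '-sig', repr(std)]

command = build_command('10', '2')
print('running:', ' '.join(command))
# subprocess.call(command) from the standard library would then run the
# script without passing the text through a shell
```

Inside PlotGauss the two arguments would come from self.mean_entry.GetValue() and self.std_entry.GetValue().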
Option 5) Embedding plot in GUI

This is the most elegant option of all and makes us real GUI experts.

Basic Structure:

    #!/usr/bin/env python
    import matplotlib; matplotlib.use('TkAgg')
    from matplotlib.backends.backend_wx import FigureCanvasWx,\
        FigureManager, NavigationToolbar2Wx # import some matplotlib tools for wxPython
    import numpy,pylab,wx
    ID_MEAN,ID_STD,ID_PLOT=wx.NewId(),wx.NewId(),wx.NewId() # makes wxPython assign ids
    class PlotFigure(wx.Frame): # initializes plot window
        def __init__(self):
            wx.Frame.__init__(self, None, -1, "matplotlib in wxFrame")
            HERE WE INITIALIZE SOME STUFF LIKE BUTTONS AND THE PLOT WINDOW
        def PlotGauss(self, event):
            HERE WE DO THE PLOTTING
        def OnQuit(self,event): # quits nicely
            self.Destroy()
    app = wx.PySimpleApp() # a quick start for simple apps.
    frame = PlotFigure()
    frame.Show()
    app.MainLoop()
Let's take a closer look at the PlotFigure class. This is where I design the basic GUI. I want a plot window in which to put my matplotlib figure. I want a toolbar under my figure. I want some text entry boxes for putting in the mean and standard deviation. I want a button to trigger a new plot, and I want a button to quit the program nicely. These all need to be laid out in a reasonable fashion. So, first, I need to make some boxes to put things in. Actually, I need two boxes for this: one to put all the text entry and plot widgets in and one to assemble everything nicely in. To create the entry and plot widgets and put them in a box, I can use the following code:

    box=wx.BoxSizer(wx.HORIZONTAL) # makes a box to put things in - horizontally
    self.mean_label=wx.StaticText(self,-1," Mean") # makes a label
    self.mean_entry=wx.TextCtrl(self,ID_MEAN,"") # makes a text entry box
    self.mean_entry.SetValue(10.) # initializes the value to 10.
    self.std_label=wx.StaticText(self,-1," STD")
    self.std_entry=wx.TextCtrl(self,ID_STD,"")
    self.std_entry.SetValue(2.)
    self.plotbutton=wx.Button(self,-1,"Plot") # makes a button labeled Plot
    self.plotbutton.Bind(wx.EVT_BUTTON,self.PlotGauss) # binds button to PlotGauss
    self.quitbutton=wx.Button(self,-1,"Quit") # makes a button labeled Quit
    self.quitbutton.Bind(wx.EVT_BUTTON,self.OnQuit) # binds button to OnQuit
    box.Add(self.mean_label, 0, wx.GROW) # puts the mean label into the box
    box.Add(self.mean_entry, 0, wx.GROW) # puts in the mean text entry box
    box.Add(self.std_label, 0, wx.GROW) # puts in the std label
    box.Add(self.std_entry, 0, wx.GROW) # puts in the std text entry box
    box.Add(self.plotbutton, 0, wx.GROW) # puts in the plot button

Now let's work on the main box - here called sizer:

    self.fig = pylab.figure(num=1,figsize=(5,4)) # make a figure instance
    self.fig.add_subplot(111) # put a starter plot in it
    self.PlotGauss(self) # make PlotGauss a method of this class
    self.canvas = FigureCanvasWx(self, -1, self.fig) # make a canvas object, attach the figure
    self.toolbar = NavigationToolbar2Wx(self.canvas) # make a toolbar
    self.toolbar.Realize() # attach it to the canvas
    tw, th = self.toolbar.GetSizeTuple() # get the size of the toolbar
    fw, fh = self.canvas.GetSizeTuple() # get the size of the canvas
    self.toolbar.SetSize(wx.Size(fw, th)) # set the width of the toolbar to the canvas
    sizer = wx.BoxSizer(wx.VERTICAL) # make a box called sizer, things will add vertically
    sizer.Add(self.canvas, 1, wx.LEFT|wx.TOP|wx.GROW) # put in the canvas
    sizer.Add(box, 0, wx.GROW) # put in the widget box (with text entry boxes, etc.)
    sizer.Add(self.quitbutton, 0, wx.GROW) # put in the quit button
    sizer.Add(self.toolbar, 0, wx.GROW) # put in the toolbar
    self.SetSizer(sizer) # resize things nicely
    self.Fit() # make things fit in properly

And finally the plot function bit:

    def PlotGauss(self, event):
        pylab.figure(num=1) # set the current figure to #1
        pylab.clf() # clear the current figure
        mu=float(self.mean_entry.GetValue()) # retrieve mu, sigma from their boxes
        sigma=float(self.std_entry.GetValue()) # and make them floats
        N,Nums=500,[]
        for i in range(N):
            Nums.append(random.normal(mu,sigma))
        pylab.hist(Nums,bins=20,facecolor='orange')
        self.canvas.draw() # redraw the plot window
        self.canvas.gui_repaint() # refresh the canvas frame with the new plot

Putting it all together we have:

    #!/usr/bin/env python
    import matplotlib; matplotlib.use('TkAgg')
    from matplotlib.backends.backend_wx import FigureCanvasWx,\
        FigureManager, NavigationToolbar2Wx
    import numpy,pylab,wx
    from numpy import random
    """Program gauss_GUI_embed.py"""
    ID_MEAN,ID_STD,ID_PLOT=wx.NewId(),wx.NewId(),wx.NewId()
    class PlotFigure(wx.Frame):
        def __init__(self):
            wx.Frame.__init__(self, None, -1, "matplotlib in wxFrame")
            self.fig = pylab.figure(num=1,figsize=(5,4))
            self.canvas = FigureCanvasWx(self, -1, self.fig)
            self.toolbar = NavigationToolbar2Wx(self.canvas)
            self.toolbar.Realize()
            tw, th = self.toolbar.GetSizeTuple()
            fw, fh = self.canvas.GetSizeTuple()
            self.toolbar.SetSize(wx.Size(fw, th))
            sizer = wx.BoxSizer(wx.VERTICAL)
            sizer.Add(self.canvas, 1, wx.LEFT|wx.TOP|wx.GROW)
            box=wx.BoxSizer(wx.HORIZONTAL)
            self.mean_label=wx.StaticText(self,-1," Mean")
            self.mean_entry=wx.TextCtrl(self,ID_MEAN,"")
            self.mean_entry.SetValue(10.)
            self.std_label=wx.StaticText(self,-1," STD")
            self.std_entry=wx.TextCtrl(self,ID_STD,"")
            self.std_entry.SetValue(2.)
            self.plotbutton=wx.Button(self,-1,"Plot")
            self.plotbutton.Bind(wx.EVT_BUTTON,self.PlotGauss)
            self.quitbutton=wx.Button(self,-1,"Quit")
            self.quitbutton.Bind(wx.EVT_BUTTON,self.OnQuit)
            box.Add(self.mean_label, 0, wx.GROW)
            box.Add(self.mean_entry, 0, wx.GROW)
            box.Add(self.std_label, 0, wx.GROW)
            box.Add(self.std_entry, 0, wx.GROW)
            box.Add(self.plotbutton, 0, wx.GROW)
            sizer.Add(box, 0, wx.GROW)
            sizer.Add(self.quitbutton, 0, wx.GROW)
            sizer.Add(self.toolbar, 0, wx.GROW)
            self.SetSizer(sizer)
            self.Fit()
            self.fig.add_subplot(111)
            self.PlotGauss(self)
        def PlotGauss(self, event):
            pylab.figure(num=1)
            pylab.clf()
            mu,sigma=float(self.mean_entry.GetValue()),float(self.std_entry.GetValue())
            N,Nums=500,[]
            for i in range(N):
                Nums.append(random.normal(mu,sigma))
            pylab.hist(Nums,bins=20,facecolor='orange')
            self.canvas.draw()
            self.canvas.gui_repaint()
        def OnQuit(self,event):
            self.Destroy()
    app = wx.PySimpleApp()
    frame = PlotFigure()
    frame.Show()
    app.MainLoop()

This gives a completed GUI that responds to the text entry and replots the histogram when the plot button is clicked. It also quits nicely when you click on quit.
6.14.2 Event Handling in matplotlib
In the section on matplotlib, we learned about the pylab module. Behind the scenes, pylab is an interface to three layers in matplotlib: the FigureCanvas (the area onto which the figure gets drawn), the Renderer (the thing that does the drawing) and the Artist (controls the Renderer to paint on the FigureCanvas). Up to now, we have let pylab handle the details for us. To take control of the plots (placement of figures, fonts, tickmarks, axes....) you need to know more about Artist containers (Figure, Axis, and Axes) and the things that get drawn in them, called primitives (lines, rectangles, text, images). For a nice tutorial look in the matplotlib documentation:
http://matplotlib.sourceforge.net/users/artists.html#artist-tutorial

Artist

Artist objects (like lines, tick marks, axes, text) are all configurable. When you use a command like add_axes or add_subplot you create an Axes instance. Remember how we made an Axes instance and called it ax?

    fig=pylab.figure()
    ax=fig.add_subplot(111)

Every time you plot something on fig with, e.g., the plot command, you create an object of the Line2D class (yes, even points). pylab keeps a record of each of these plot instances by adding to a list associated with the Axes instance and retrieved by the attribute lines (e.g., ax.lines is the list of plotting calls on the Axes instance ax). BTW: This is how the legend command works, if you remember. So by identifying the line you want in the list, you can change its attributes (e.g., color, linewidth, linestyle or marker). I realize that was way more than you wanted to know right now, but we will need some idea of what FigureCanvas does in the following.

Line color

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab, numpy
    """ Program linecolor.py"""
    pylab.ion() # makes plot interactive
    fig=pylab.figure() # makes a figure instance
    ax=fig.add_subplot(111) # Axes instance
    t=numpy.arange(0,1,.01)
    s=numpy.sin(2*numpy.pi*t)
    c=numpy.cos(2*numpy.pi*t)
    ax.plot(t,s,color='blue',lw=2) #Line2D instance
    ax.plot(t,c,color='magenta',lw=2)
    pylab.draw()
    print ax.lines # prints all your plot instances
    print 'last line color: ',ax.lines[-1].get_color()
    raw_input("Any key to change last line to red ")
    ax.lines[-1].set_color('red') # sets last line to red
    pylab.draw()
    raw_input()
Mouse events
Earlier, we learned how to make GUIs using wxPython and even managed to embed a matplotlib figure into a wxPython Frame. But we couldn't interact with the plot directly, say by clicking on an individual box in the tic-tac-toe example and having the program place an X or an O there. To do this, we need the program to recognize key or mouse events and return these to the program. In wxPython we had events (EVT_BUTTON) which were connected to callback functions (like OnQuit) using the method Bind. In matplotlib we have a similar event, button_press_event, which can be connected to a function (e.g., onclick) using the FigureCanvas method mpl_connect, e.g.:

    fig.canvas.mpl_connect('button_press_event', \
        onclick)
Mouse events are the most common type of interaction with plots. We can use them to identify data points, say in a digitizer, or to flag them as bad, or pick them as special (e.g., P wave arrival, stratigraphic tie point, start or end point for a calculation). matplotlib supports several mouse events:

    Event Name              Description
    button_press_event      mouse button is pressed
    button_release_event    mouse button is released
    motion_notify_event     mouse motion
    scroll_event            mouse scroll wheel is rolled
    pick_event              an Artist object is selected

Here is an example of a button press event:

    #!/usr/bin/env python
    """ Program onclick.py"""
    import matplotlib, numpy
    matplotlib.use("TkAgg")
    import pylab
    from matplotlib.backend_bases import FigureCanvasBase # imports canvas tools
    pylab.ion()
    fig=pylab.figure()
    ax=fig.add_subplot(111)
    data=numpy.random.rand(10) # get 10 random numbers
    ax.plot(data)
    ax.plot(data,'ro')
    pylab.draw()
    def onclick(event):
        print 'button=%d, x=%d, y=%d, xdata=%f, ydata=%f'%(event.button, event.x, event.y, \
            event.xdata, event.ydata)
    cid=fig.canvas.mpl_connect('button_press_event',\
        onclick) # connect the button press to the function onclick
    raw_input() #pauses the program

You could combine this with the editor we wrote before to make a digitizer!
In the last example, we just identified the location of the mouse click, but didn't identify any particular plot object. In principle, each object (line, text, rectangle, axes) could be picked. There is a catch however. When you create the object, you have to do these things too:

1. set the picker to True (and usually some floating point tolerance).
2. connect the pick event to some action
3. define the action.

Consider the following program:

    #!/usr/bin/env python
    import matplotlib; matplotlib.use("TkAgg")
    import pylab,numpy
    class LineColor:
        """connects the picker to an Artist Line2D object and changes line color"""
        def __init__(self,line):
            self.line=line
            self.connect=\
                self.line.figure.canvas.mpl_connect(\
                'pick_event',self.on_pick)
        def on_pick(self,event): # finds right line
            if event.artist!=self.line: return
            self.line.set_color('red') # makes red
            self.line.set_linewidth(2) # makes fatter
            self.line.figure.canvas.draw() # redraws line
    def main():
        """
        Program clickme.py
        Plots some lines that are clickable, connects them to the LineColor action.
        """
        fig=pylab.figure()
        ax=fig.add_subplot(111)
        t=numpy.arange(0,1,.01)
        s=numpy.sin(2*numpy.pi*t)
        ax.plot(t,s,color='blue',picker=True)
        ax.plot(t+.25*numpy.pi,s,color='magenta',picker=True)
        ax.plot(t+.5*numpy.pi,s,color='cyan',picker=True)
        lines=[] # makes a list to store clickable line objects
        for line in ax.lines: # steps through list of plot objects
            ln=LineColor(line) # make the line a LineColor object
            lines.append(ln) # stores in the list
        pylab.show()
    main()

The class LineColor is the action that happens when a plot object gets clicked on. On initialization, it connects the click action to the function on_pick, which turns the object red and fattens it up a bit. The main program draws some lines. Each plot instance gets stored in the list ax.lines. Then the objects in ax.lines get turned into clickable LineColor objects. When you click on a line, it turns red and fattens up a bit.

Your final is to do a tic-tac-toe program, and it would be fun to make it work inside a GUI. To get you most of the way there, I wrote a really silly Tic-Tac-Toe program. It is not a GUI, just a clickable matplotlib window:

    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    import numpy,sys,exceptions
    from numpy import random
    def finish(who,xline,yline):
        """ Check who won"""
        if len(xline)>0:
            if who==0: # player wins
                pylab.plot(xline,yline,'r-')
                print 'you have won!'
            else: # computer wins
                pylab.plot(xline,yline,'g-')
                raw_input('you have lost to a stupid computer!')
            pylab.draw()
            cid = fig.canvas.mpl_connect('button_press_event', quit) # quit on mouse click
            print 'click anywhere on plot to quit'
    def quit(event): # graceful exit
        sys.exit()
    def winner(who,myboxes):
        """ checks to see if boxes selected have won """
        # check for diagonals first
        if '11' in myboxes and '22' in myboxes and '33' in myboxes: \
            finish(who, [.5,2.5],[.5,2.5])
        if '13' in myboxes and '22' in myboxes and '31' in myboxes: \
            finish(who, [.5,2.5],[2.5,.5])
        # check for rows
        if '11' in myboxes and '21' in myboxes and '31' in myboxes: \
            finish(who, [.5,2.5],[.5,.5])
        if '12' in myboxes and '22' in myboxes and '32' in myboxes: \
            finish(who, [.5,2.5],[1.5,1.5])
        if '13' in myboxes and '23' in myboxes and '33' in myboxes: \
            finish(who, [.5,2.5],[2.5,2.5])
        # check for columns
        if '11' in myboxes and '12' in myboxes and '13' in myboxes: \
            finish(who, [.5,.5],[.5,2.5])
        if '21' in myboxes and '22' in myboxes and '23' in myboxes: \
            finish(who, [1.5,1.5],[.5,2.5])
        if '31' in myboxes and '32' in myboxes and '33' in myboxes: \
            finish(who, [2.5,2.5],[.5,2.5])
    def onclick(event): # if someone clicks in a square, return x,y
        """ what to do for mouse clicks"""
        x,y=event.xdata,event.ydata # assign mouse click to x and y
        if y<1: # first row
            if x< 1: # box 1,1
                xtext,ytext= 1,1
            elif x<2: # box 2,1
                xtext,ytext= 2,1
            else: # box 3,1
                xtext,ytext= 3,1
        elif y<2: # second row
            if x< 1: # box 1,2
                xtext,ytext= 1,2
            elif x<2: # box 2,2
                xtext,ytext= 2,2
            else: # box 3,2
                xtext,ytext= 3,2
        else: # third row
            if x< 1: # box 1,3
                xtext,ytext= 1,3
            elif x<2: # box 2,3
                xtext,ytext= 2,3
            else: # box 3,3
                xtext,ytext= 3,3
        ## check if legal move (if already taken!)
        box=str(xtext)+str(ytext) # make the box name
        if box not in Xs and box not in Os: # box not yet taken
            pylab.text(xtext-.5,ytext-.5,'x',fontsize=24,color='red') # put a red x in box
            print '\n your move: box id ', box, '\n'
            Xs.append(box)
            del boxes[boxes.index(box)] # delete box from boxes
            winner(0,Xs) # check if winner, 0 is player
            ## pick a box at random for the computer's move!
            if len(boxes)>0: # only move if a box is left
                ind = random.randint(0,len(boxes)) # pick a box at random (numpy excludes the upper bound)
                pylab.text(int(boxes[ind][0])-.5,int(boxes[ind][1])-.5,'o',\
                    fontsize=24,color='green') # put green o in box
                Os.append(boxes[ind])
                winner(1,Os) # check if winner, 1 is computer
                del boxes[ind] # delete box from boxes
        else: # box already taken
            print " \n box ",box, 'is taken!, choose another one\n'
    def main():
        """ tictactoe2.py program"""
        global fig,Xs,Os,boxes # makes these have global scope
        pylab.ion()
        boxes=['11','12','13','21','22','23','31','32','33'] # some box labels
        Xs=[] # list of all x moves
        Os=[] # list of all o moves
        fig = pylab.figure() # create figure instance
        ax = fig.add_subplot(111) # make a subplot
        ax.plot([0,3,3,0,0],[0,0,3,3,0],'k-') # draws a box
        ax.plot([0,3],[1,1],'k-') # make black lines
        ax.plot([0,3],[2,2],'k-')
        ax.plot([1,1],[0,3],'k-')
        ax.plot([2,2],[0,3],'k-')
        ax.text(0.1,0.1,'11') # labels boxes with their ids
        ax.text(0.1,1.1,'12')
        ax.text(0.1,2.1,'13')
        ax.text(1.1,0.1,'21')
        ax.text(1.1,1.1,'22')
        ax.text(1.1,2.1,'23')
        ax.text(2.1,0.1,'31')
        ax.text(2.1,1.1,'32')
        ax.text(2.1,2.1,'33')
        pylab.draw() # draw the canvas
        cid = fig.canvas.mpl_connect('button_press_event', \
            onclick) # tell what to do for mouse clicks
        raw_input() # make program wait for response
    main() # run main program

ASSIGNMENT P9: You have been writing Fortran code to play tic-tac-toe. For this (final) assignment, write a program that calls on your F90 functions to calculate the computer's moves instead of randomly choosing an empty box. Tie these in to the Tic-Tac-Toe program (in the onclick() function) so you can play a more challenging game! Now put the whole thing inside a GUI.
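The two long if-chains in the Tic-Tac-Toe program (finding the clicked box and checking for a win) can be written more compactly. The sketch below is one possible way, not the program's own code: the names box_from_click, WINNING_LINES and winning_line are made up here, the box ids follow the same column-then-row convention, and it uses Python 3 style print().

```python
def box_from_click(x, y):
    """Map click coordinates on the 3x3 board to a box id such as '21'."""
    if x is None or y is None:          # the click landed outside the axes
        return None
    if not (0 <= x < 3 and 0 <= y < 3):
        return None
    return str(int(x) + 1) + str(int(y) + 1)

# the eight winning triples: two diagonals, three rows, three columns
WINNING_LINES = [('11', '22', '33'), ('13', '22', '31'),
                 ('11', '21', '31'), ('12', '22', '32'), ('13', '23', '33'),
                 ('11', '12', '13'), ('21', '22', '23'), ('31', '32', '33')]

def winning_line(myboxes):
    """Return the first completed triple found in myboxes, or None."""
    for triple in WINNING_LINES:
        if all(box in myboxes for box in triple):
            return triple
    return None

print(box_from_click(1.4, 0.2))
print(winning_line(['11', '32', '22', '33']))
```

Unlike the original onclick(), this version returns None for clicks outside the board instead of assigning them to an edge box, which is a design choice you may or may not want.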
6.15 3D plotting with Python
Contour plots are really just a way to visualize something that is inherently 3D on a 2D surface. Think about a topographic map - the contour intervals are elevations and our brains can reconstruct the 3D world by looking at the contours on the map.
But with computers we can visualize the 3D world in a more realistic manner. There are lots of 3D plotting packages, and even within Python there are several different approaches, one using a 3D toolkit of matplotlib that uses the same logic as regular matplotlib. For more on this module, see:
http://matplotlib.sourceforge.net/mpl_toolkits/mplot3d/index.html
But for more 3D horsepower, there is a module called mlab, which is part of the enthought.mayavi module. See:
http://github.enthought.com/mayavi/mayavi/mlab.html
And then there is Mayavi itself, which comes with the Enthought Python Distribution. This is way beyond what I know, but if you are curious, check out this website:
http://github.enthought.com/mayavi/mayavi/examples.html
Let's start with a 3D version of the gravity anomaly of a buried sphere:
    #!/usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab,numpy
    from mpl_toolkits.mplot3d import axes3d
    G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
    R=2. # radius in meters
    z=3. # depth of burial
    drho=500 # density contrast in kg/m^3
    x=numpy.arange(-2.*z,2.*z,0.1)
    y=numpy.arange(-2.*z,2.*z,0.1)
    X,Y=pylab.meshgrid(x,y)
    h=numpy.sqrt(X**2+Y**2)
    g= (G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
    fig=pylab.figure()
    ax=axes3d.Axes3D(fig)
    ax.plot_surface(X,Y,g)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    pylab.show()
[Figure: surface plot of g over the X, Y grid, with axes labeled X, Y and Z.]
In this example, we import the axes3d module from the mplot3d toolkit of matplotlib. We create an Axes3D instance called ax from the figure object, fig. Axes3D objects have lots of methods, one of which is plot_surface, which plots a wireframe surface on the meshgrid X,Y of the data in g. Other methods are set_xlabel() and so on. Note that when you try this example on your own computer, you can twirl the plot around to see it from various perspectives. You can save any of these with the little disk icon or with pylab.savefig() as before.

To give you a flavor of what mlab can do, here is essentially the same code, but using functions from mayavi.mlab:

    #!/usr/bin/env python
    import numpy
    from enthought.mayavi import mlab
    def g(X,Y):
        G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
        R=2. # radius in meters
        drho=500 # density contrast in kg/m^3
        h=numpy.sqrt(X**2+Y**2)
        return 1e8*(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
    z=3. # depth of burial
    X,Y=numpy.mgrid[-2.*z:2.*z:0.1,-2.*z:2.*z:0.1]
    mlab.figure(bgcolor=(1,1,1))
    s=mlab.surf(X,Y,g)
    mlab.show()
The call to mlab.figure creates a figure instance with the background color (bgcolor) set to white. In mlab, color gets set with the familiar r,g,b but here the colors run from 0 (black) to 1 (full strength), so a color of (1,1,1) is white, (1,0,0) is red, and so on. The default is for a black background. Then the surface gets drawn with a call to mlab.surf().
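If you are used to colors given as integers from 0 to 255 (an assumption about what "familiar" means here), converting them to the 0 to 1 tuples that mlab expects is a one-liner. The helper name rgb255_to_unit below is made up for this sketch, which uses Python 3 style print().

```python
def rgb255_to_unit(red, green, blue):
    """Convert 0-255 colour components to a tuple of floats between 0 and 1."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError('components must lie between 0 and 255')
    return (red / 255.0, green / 255.0, blue / 255.0)

print(rgb255_to_unit(255, 255, 255))   # white
print(rgb255_to_unit(255, 0, 0))       # red
```

The resulting tuple could be passed as bgcolor or color in the mlab calls shown in this section.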
There are lots more 3D plotting functions available in the two packages described here. To whet your appetite, I've picked out a few:
Examples from mplot3d.Axes3D examples
http://matplotlib.sourceforge.net/examples/mplot3d/

    plot()
    scatter3d()
    plot_wireframe()
    PolyCollection()
    contour()
    bar3d()
    plot_surface()

Examples from the Mlab gallery
http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/examples.html

    plot3d()            magnetic field lines?
    imshow()            color contour plots
    points3d()          pts in space
    surf()              topography, anomalies...
    contour_surf()      contour plots
    mesh()              spherical harmonics
    flow()              flow fields
    barchart()          no idea....
    triangular_mesh()   tessellation grids
    quiver3d()          currents, winds, magnetic moments
    contour3d()         isopycnals or isothermals
Here are some considerations to help you decide which way you want to go:

mplot3d versus mlab

    mplot3d (matplotlib)
        Pros:
            mplot3d is a natural extension of pylab
            it is easier to learn
            pylab functions work for mplot3d too
        Cons:
            Limited plotting styles
            slower

    mlab (mayavi)
        Pros:
            Prettier
            Interactivity
            More functions
            Can be animated :)
        Cons:
            no svg output
            ps and eps are buggy
            harder to learn for pylab masters
6.15.1 Geoscience applications
There wouldn't be much point in learning how to program in a geoscience class if there were no practical applications. In this section I will very briefly point out a few ways beyond the obvious X,Y plots and maps that you have already encountered.

Lines of flux

Professor R.L. Parker (our hero and professor emeritus of the SIO department) wrote a Fortran (f77) program called force.f. This was a slight modification of his magmap program that calculates magnetic field vectors given a geomagnetic reference model (see Bob's software website http://igppweb.ucsd.edu/parker/Software/). The program force.f has disappeared from Professor Parker's website in the meantime but is available on the class website at:
http://mahi.ucsd.edu/class233/force.zip
You run it with a session like the one below; it draws the magnetic lines of flux from the core outward and saves the data to a file:

    % force
    =================
    Enter commands for spherical harmonic coefficients (? for help)
    igrf 2005
    lines 200
    radius 0.547
    output lines.f05
    exec
    ===================
    igrf 2005
    lines 200
    radius 0.547
    output lines.f05
    ===================
    Field evaluated for IGRF 2005
    Equatorial radius (km) 6378.170
    Polar radius (km) 6356.910
    Reference radius (km) 6371.200
    Evaluation r/a 0.547
    Maximum degree of accepted coefficients: 13
    Coefficients normalized to radius: 0.54700
    New field-line file opened in: lines.f05
    Initial and terminal points
      0.047 -0.049  0.998  0.054 -0.040  0.998
     -0.071  0.066 -0.995  0.532  0.164 -0.831
      0.226  0.217 -0.950  0.132  0.266 -0.955
     -0.272  0.686 -0.674 -0.160  0.516  0.842
      0.083  0.987 -0.139  0.123  0.945  0.302
      0.487 -0.385 -0.784  0.479 -0.382 -0.790
    etc.
      0.016  0.856 -0.516  0.105  0.632  0.767
      0.642  0.316 -0.699  0.662  0.306 -0.684
    Field line coordinates written to lines.f05
    =================
    Enter commands for spherical harmonic coefficients (? for help)
    quit
    Magmap run complete
The red commands are typed by the user and the blue stuff is the program response. This will create an output file lines.f05 which looks something like this:

      0.047 -0.049  0.998  1.000
      0.054 -0.040  0.998  1.000
      2 50.00000 50.00000 50.00000
     -0.071  0.066 -0.995  1.000
     -0.054  0.070 -1.005  1.009
The first three columns are x, y, z on a magnetic flux line and the fourth is R in units of core mantle boundary radii. Each field line is separated from the rest by an entry with the number of points in the previous line and three 50.00000s. Some of the field lines go WAY out in space (100 CMB radii); we'll come back to this later. Parker also provided a script (look) which chops off the parts of the lines that are more than 4 radii away, projects the 3D lines onto a plane specified by the user and saves the data in a new file. It also invokes Parker's most famous program plotxy (which IS available on his website). Plotxy produces a postscript file mypost. When run with the command look 45 75 lines.f05, we get:
The thought occurs, wouldn't this look cool in 3D? Here is a little script inspired by the plot3d() example from the Mlab gallery.
    #!/usr/bin/env python
    import numpy as np
    from enthought.mayavi import mlab
    lines=np.fromstring(open('lines.f05').read(),dtype=float,sep=' ')
    Xs,Ys,Zs,Rs=lines[0:-4:4],lines[1:-3:4],lines[2:-2:4],lines[3:-1:4]
    line=0
    lx,ly,lz,lr=[],[],[],[]
    mlab.figure(bgcolor=(1,1,1)) # sets the background to white
    while line<len(Xs):
        if Rs[line]<5: # truncates far away field lines
            lx.append(Xs[line])
            ly.append(Ys[line])
            lz.append(Zs[line])
            lr.append(Rs[line])
        elif Rs[line]>=5 and Ys[line]!=50.: # detects the 50s
            while Rs[line]>5 and Rs[line]!=50: line+=1
            x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
            mlab.plot3d(x,y,z,r,colormap='Spectral')
            lx,ly,lz,lr=[],[],[],[]
        if Ys[line]==50. and line<len(Rs):
            x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
            mlab.plot3d(x,y,z,r,colormap='Spectral')
            lx,ly,lz,lr=[],[],[],[]
        line+=1
    mlab.show()
which produces something like this (but in 3D you can wiggle it around - much more fun!):
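The bookkeeping in that script (four numbers per record, separator records full of 50s, and dropping points far from the core) can be tried out without mayavi. The following is a simplified sketch, not Parker's or the script's exact logic: the function name split_field_lines is made up, the last data point in the sample is invented to show the cutoff, and far-away points are simply skipped rather than ending the line. It uses Python 3 style print().

```python
def split_field_lines(text, max_radius=5.0):
    """Split whitespace-separated x y z R records into a list of field lines."""
    values = [float(token) for token in text.split()]
    lines, current = [], []
    for i in range(0, len(values) - 3, 4):
        x, y, z, r = values[i:i + 4]
        if y == 50.0 and z == 50.0 and r == 50.0:   # separator record
            if current:
                lines.append(current)
            current = []
        elif r < max_radius:                        # keep nearby points only
            current.append((x, y, z, r))
    if current:                                     # line without a closing separator
        lines.append(current)
    return lines

sample = """
  0.047 -0.049  0.998  1.000
  0.054 -0.040  0.998  1.000
  2 50.00000 50.00000 50.00000
 -0.071  0.066 -0.995  1.000
 -0.054  0.070 -1.005  1.009
 -0.500  0.700 -6.000  6.062
  3 50.00000 50.00000 50.00000
"""
for field_line in split_field_lines(sample):
    print(len(field_line), 'points, starting at', field_line[0][:3])
```

Each returned field line is a list of (x, y, z, R) tuples, which could be unpacked into arrays and handed to mlab.plot3d one line at a time.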
Eigenvectors
Linear algebra has a lot of applications in the geosciences. One of the most useful tricks is to calculate what are called eigenparameters. Say you have a bunch of points and want to calculate a best-fit line through them - but they are in three dimensions. Or you want a best-fit plane, say the fault surface through a bunch of earthquakes. Or you want to know the principal axis of the moment of inertia tensor. Or the orientations of stress or strain tensors. Or the preferred orientation of mineral grains or clasts in a sedimentary deposit. Or the directions of the anisotropy of just about anything. And the list goes on. Here is an example that finds the eigenvectors and eigenvalues of what is called the covariance matrix of a bunch of 3D points. These could be the end points of unit vectors (directions), or point masses in space, for example.
    #!/usr/bin/env python
    import numpy
    from numpy import linalg
    from enthought.mayavi import mlab
    dat=open('points.xyz','rU').readlines()
    x,y,z=[],[],[]
    for line in dat:
        rec=line.strip('\n').split()
        x.append(float(rec[0]))
        y.append(float(rec[1]))
        z.append(float(rec[2]))
    X,Y,Z=numpy.array(x),numpy.array(y),numpy.array(z)
    T=numpy.array([[numpy.sum(X*X),numpy.sum(X*Y),numpy.sum(X*Z)],\
        [numpy.sum(Y*X),numpy.sum(Y*Y),numpy.sum(Y*Z)],\
        [numpy.sum(Z*X),numpy.sum(Z*Y),numpy.sum(Z*Z)]])
    evals,evects=linalg.eig(T)
    print 'principal axis: ',evects.transpose()[0], ' with variance of ',evals[0]
    print 'major axis: ',evects.transpose()[1], ' with variance of ',evals[1]
    print 'minor axis: ',evects.transpose()[2], ' with variance of ',evals[2]
    pv=evects.transpose()[0]*3.
    mlab.figure(bgcolor=(1,1,1))
    mlab.points3d(X,Y,Z,color=(0,0,0),scale_factor=0.25,opacity=.5)
    mlab.outline(color=(.7,0,0))
    mlab.plot3d([pv[0],-pv[0]],[pv[1],-pv[1]],\
        [pv[2],-pv[2]],tube_radius=0.1,color=(0,1,0))
    mlab.show()
This script opens a file called points.xyz which looks like this:

    1.4844e+00  5.9620e-02  2.2025e+00
    1.6928e+00  4.7284e-02  1.0015e+00
    9.7893e-01 -2.7332e-01  6.6763e-01
    1.1998e+00 -1.9863e-01  5.1122e-01
    7.6767e-01  2.1443e-02  7.9248e-01
    1.5916e+00  9.7349e-02  1.8151e+00
    1.4537e+00  1.3741e-01  2.1087e+00
    etc.
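It then parses the data into lists of floating point variables. These get turned into arrays. Then the sums of the products and squares get put into the covariance matrix T, which has the form:

$$
T = \begin{pmatrix}
\sum x^2 & \sum xy & \sum xz \\
\sum xy & \sum y^2 & \sum yz \\
\sum xz & \sum yz & \sum z^2
\end{pmatrix}.
$$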
The function linalg.eig() returns the eigenvalues and eigenvectors of this T matrix. The largest (principal) eigenvalue corresponds to the principal eigenvector, which is the axis along which the variance (spread) is the greatest. The minor eigenvector corresponds to the axis along which the variance is least. One feature of this function is that the coordinates of each eigenvector run along axis 0, so the principal eigenvector is the first column of the evects array:

    evects= [[ 0.70464008 -0.70917141  0.02362748]
             [-0.001072    0.03223454  0.99947976]
             [ 0.70956409  0.70429883 -0.02195351]]

That is why, to print out the vectors, I take the transpose:

    % points.py
    principal axis: [ 0.70464008 -0.001072 0.70956409] with variance of 733.172618519
    major axis: [-0.70917141 0.03223454 0.70429883] with variance of 30.501077006
    minor axis: [ 0.02362748 0.99947976 -0.02195351] with variance of 8.55431560739

The principal eigenvector gets assigned to the array pv. The function points3d plots the points a nice shade of grey (opacity low). mlab.outline(color=(.7,0,0)) puts a red box around the figure and mlab.plot3d plots the principal eigenvector as a green tube. Nice.
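One caveat: numpy does not promise that the eigenvalues come back in any particular order, so taking column 0 as the principal axis works for this data set but is not guaranteed in general. A hedged sketch of a safer pattern follows. It swaps in linalg.eigh (meant for symmetric matrices like T) in place of the script's linalg.eig, sorts explicitly, and returns the eigenvectors as rows. The function name sorted_eigen and the four test points are invented for illustration, and it uses Python 3 style print().

```python
import numpy

def sorted_eigen(points):
    """Eigenvalues (largest first) and matching eigenvectors (as rows) of T."""
    P = numpy.asarray(points, dtype=float)   # one x, y, z point per row
    T = numpy.dot(P.T, P)                    # 3x3 matrix of summed squares and products
    evals, evects = numpy.linalg.eigh(T)     # eigenvectors are the columns of evects
    order = numpy.argsort(evals)[::-1]       # indices from largest to smallest eigenvalue
    return evals[order], evects[:, order].T  # transpose so that each row is one axis

# hypothetical points scattered about the direction (1, 0, 1)
pts = [(1.0, 0.0, 1.0), (2.0, 0.1, 2.0), (-1.0, 0.0, -1.0), (0.5, -0.1, 0.5)]
evals, axes = sorted_eigen(pts)
print('principal axis:', axes[0], 'with variance of', evals[0])
```

The sign of an eigenvector is arbitrary, so the principal axis may come out pointing along (1, 0, 1) or along (-1, 0, -1).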
Chapter 7
Peter's Python Notes

These notes will reflect my attempt to learn Python from Lisa's lecture notes. My strategy will be to write a series of programs that mimic the programs in my Fortran90 notes. This assumes that Enthought Python is installed. To verify this, enter:

    % which python

which should return:

    /Library/Frameworks/Python.framework/Versions/Current/bin/python

It is possible to run Python from the command line (like Matlab), in which case it functions like a fancy calculator. However, it quickly becomes tiresome to enter everything manually, so we will start immediately with Python scripts, which execute a series of Python commands. These are text files that end in .py and can be thought of as Python source code. However, unlike F90 source code they do not need to be compiled before they are run. Python is similar to Matlab in this way. However, both Python and Matlab are much slower to run than compiled languages like Fortran and C. My first program is called printmess.py:

    #! /usr/bin/env python
    # simple Python test program (printmess.py)
    print 'test message'

Provided one has execute permission on this file (chmod 755 printmess.py), this can be run by entering:

    % ./printmess.py
    test message
    %
Notice that we need to start with ./ because . is not in our path (for security reasons). The first line MUST be:

    #! /usr/bin/env python

so that the file is interpreted as Python. Unlike Fortran or C, you CANNOT start with a comment line (try switching lines 1 and 2 and see what happens). The second line is a comment line. Anything to the right of # is assumed to be a comment (in Fortran ! serves the same function). Notice that print goes by default to your screen. You can use single or double quotes for the test message. You can get an apostrophe in your output by using double quotes and quote marks by using single quotes, i.e.,

    #! /usr/bin/env python
    # simple Python test program 2 (printmess2.py)
    print "The pump don't work cuz the vandals took the handles"
    print 'She said "I know what it\'s like to be dead"'

produces:

    % ./printmess2.py
    The pump don't work cuz the vandals took the handles
    She said "I know what it's like to be dead"
    %

In the second print statement, the backslash in front of the apostrophe is necessary to prevent an error (try it).

ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase.
7.1 How to multiply two integers
Here is a simple Python program to multiply 2 and 3 (multint.py):

    #! /usr/bin/env python
    a = 2
    b = 3
    c = a*b
    print 'product = ', c
The program uses three variables, the letters a, b and c. Variable names should always start with a letter. The remaining characters can be any combination of letters, numbers, and underscores (_); dashes (-) are not allowed because Python reads them as minus signs. Unlike Fortran, variables are CASE SENSITIVE (x is different from X). Variables in Python can be of many types, including real (floating point), integer, complex, string, and logical. However, unlike C or Fortran their type is not explicitly defined. Instead it is determined by the syntax upon first use, i.e.,

    x = 2      defines x as integer
    x = 2.     defines x as real
    x = '2'    defines x as a string

Basic Python has a very limited number of math operations, which include +, -, * (multiply), / (divide), ** (to power of), and % (remainder). To get more advanced math operations, use the numpy module (see below).
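You can ask Python what type it has decided on with the built-in type() function. A quick sketch, written with the print() function so that it also runs under Python 3 (the variable name kinds is just for illustration):

```python
# collect the type name Python assigns to each literal
kinds = []
for value in (2, 2., '2', 2 + 0j, True):
    kinds.append(type(value).__name__)
    print(repr(value), 'is of type', type(value).__name__)

# % gives the remainder and ** raises to a power
remainder, power = 7 % 3, 2 ** 10
print(remainder, power)
```

Be aware that / applied to two integers behaves differently in Python 2 (the result is truncated to an integer) and Python 3 (the result is a float), so it pays to write 2. rather than 2 when you mean a real number.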
7.1.1
Declaring variables
Consider the following program with a typo in line 4:

    #! /usr/bin/env python
    abeg = 2.1
    aend = 3.9
    adif = aend - abge
    print 'adif = ', adif

You had intended to type abeg but typed abge instead. When you run the program, you get an error message:

    Traceback (most recent call last):
      File "./undeclared.py", line 4, in <module>
        adif = aend - abge
    NameError: name 'abge' is not defined

This is a desirable feature of Python. You don't want the program to run by assigning some arbitrary value to abge and giving you a wrong answer. Yet many languages will do exactly that, including Fortran (we can avoid this potential problem in Fortran by using the implicit none statement at the beginning of our programs).
7.1.2
Alternate coding options
There are always lots of ways to write the same program. Here is another way to write the multint.py code (multint2.py):

    #! /usr/bin/env python
    a = 2; b = 3; print 'product = ', a*b

Notice that more than one command can be included on a line if the commands are separated by a semicolon (also OK in F90). This makes the code more like C. This option is rarely a good idea because it makes the code harder to read. Unless you have a really good reason to put more than one command on a line (saving space is NOT a good reason!), I suggest that you never use semicolons in this way.
7.2
Numpy
Standard Python only includes basic arithmetic operators (+, -, *, /, ** (power), and % (modulus operator, see gcf example below)). For sqrt, trig and other higher-order math functions, we need to import the numpy module and then call the individual functions through it. For example (testnumpy.py):

    #! /usr/bin/env python
    import numpy
    print 'pi = ', numpy.pi
    print 'sin(pi/6) = ', numpy.sin(numpy.pi/6.)

produces:

    % ./testnumpy.py
    pi = 3.14159265359
    sin(pi/6) = 0.5

Notice that, like almost all languages, the trig arguments are in radians. It can be annoying to have to type numpy. lots of times. We can save typing by importing numpy as a name that we assign, i.e.,

    #! /usr/bin/env python
    import numpy as np       #or anything else
    print 'pi = ', np.pi
    print 'sin(pi/6) = ', np.sin(np.pi/6.)

You can avoid having to type even the np. by importing everything:

    from numpy import *

or just a few functions:

    from numpy import pi, sqrt

in which case one can use pi and sqrt with no prefixes. However, this is NOT recommended because one can lose track of where things come from. Your code will be clearer and less error prone if you include the prefix to show which functions and variables are coming from numpy.
7.3
Making a trig table using a for statement
Any advanced programming language must provide a way to loop over a series of values for a variable. In C, this is most naturally implemented with the for statement. In FORTRAN this is done with the do loop. Python uses its own version of the for loop. Here is an example program that generates a table of trig functions:

    #! /usr/bin/env python
    import numpy as np
    degrad = 180./3.1415927
    for theta in range(0, 90, 1):
        ctheta = np.cos(theta/degrad)
        stheta = np.sin(theta/degrad)
        ttheta = np.tan(theta/degrad)
        print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)

Notice the use of the variable degrad to convert from degrees to radians. Next comes

    for theta in range(0, 90, 1):

This begins the for loop and contains a lot of interesting syntax. Unlike most languages (e.g., C or Fortran), the lines that follow that are inside of the for loop MUST be indented. Statements that expect a subsequent indentation level end in a colon (:). For clarity, most people indent loops in other languages, but the indentation is optional. Here the indentation is how Python knows what is inside the loop and where the loop ends, because Python has NO STATEMENT TO END THE LOOP (e.g., enddo in Fortran). The key point is that the 4 lines following the for statement must ALL BE INDENTED EXACTLY THE SAME. Tabs and blanks are treated differently in Python, so two lines may appear to have the same indentation but actually be different. Thus, it is safest to NEVER USE TABS so that one can always see exactly what the true indentation is.
Within the for loop, theta will assume a series of specified values, in this case given by range(0, 90, 1). Range is an integer function that in this case will assume the values from 0 to 89 (!), step 1. Why 89? Because it is the last number less than 90. You're right to think this makes no sense but it is typical of Python syntax; the upper limit is always one more than the last value used. The 1 for step is optional because 1 is the default step size, i.e., range(0, 90) would give the same numbers. Note that because range is an integer function, you cannot enter real values as limits to define theta as real in the for loop. This is consistent with F90, which does not allow real do loops, although they were OK in older versions of Fortran. Loops with real increments are not considered good programming practice because of the precision problems that repeated sums can cause. Next comes

    ctheta = np.cos(theta/degrad)

This uses the numpy cosine function. Ideally theta should be a real variable in this expression, but fortunately Python figures out that int/real should be real (see more discussion of this later). Better style is to write

    ctheta = np.cos(float(theta)/degrad)

where float converts from integer to real (int converts from real to integer), or for maximum efficiency we could write

    rtheta_deg = float(theta)/degrad
    ctheta = np.cos(rtheta_deg)
    stheta = np.sin(rtheta_deg)

etc. To make the output look nice, we do not use

    print theta, ctheta, stheta, ttheta

which would space the numbers irregularly among the columns. Instead, we explicitly specify the output format using a format specification:

    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)

Here the output format is given in the single quotes. The format for each number follows a % sign: for example, %5.1f reserves 5 characters for the number, with 1 digit to the right of the decimal point (in Fortran this is f5.1). The single blank space between the formats is reproduced exactly in the output, thus to put commas between the output numbers, write:
    print '%5.1f, %8.4f, %8.4f, %8.4f' %(theta, ctheta, stheta, ttheta)

Another way to write the print line is:

    print '%5.1f %8.4f %8.4f %8.4f' \
    %(theta, ctheta, stheta, ttheta)

Here we have used the continuation character (the backslash) to split the statement into two lines. Notice that in this case the 2nd line does not have to be indented to match the other lines, because this is all considered to be one line by the Python interpreter. This example highlights a (minor) disadvantage of Python. Because it cares about the spaces between the different formats, we cannot add spaces in the top line so that the formats and their variables line up perfectly. Aligning things this neatly is probably more trouble than it's worth, but it certainly makes the code easier to understand.
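As a check on the range limits and the format string, here is a minimal sketch. It is written to run under Python 3 as well (print with parentheses) and, purely as a convenience assumption, uses the standard math module instead of numpy and a 15 degree step to keep the table short.

```python
import math

# range(0, 90) stops one short of its upper limit.
angles = list(range(0, 90))
print("first = %d, last = %d, count = %d" % (angles[0], angles[-1], len(angles)))

# Build one formatted table row per angle (every 15 degrees here).
rows = []
for theta in range(0, 90, 15):
    rad = math.radians(theta)      # convert degrees to radians
    rows.append('%5.1f %8.4f %8.4f %8.4f'
                % (theta, math.cos(rad), math.sin(rad), math.tan(rad)))

for row in rows:
    print(row)
```

Building each row as a string first, and printing it afterwards, makes it easy to inspect exactly which characters the format produces.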
7.3.1
Numpy mathematical functions
We used the numpy sine, cosine and tangent functions in the trigtable program. Here are more numpy math functions:

    absolute(x)     absolute value
    arccos(x)       arccosine
    arcsin(x)       arcsine
    arctan(x)       arctangent
    arctan2(y,x)    arctangent of y/x in correct quadrant (***very useful!)
    cos(x)          cosine
    cosh(x)         hyperbolic cosine
    exp(x)          exponential
    log(x)          natural logarithm
    log10(x)        base 10 log
    sin(x)          sine
    sinh(x)         hyperbolic sine
    sqrt(x)         square root
    tan(x)          tangent
    tanh(x)         hyperbolic tangent
Note that the arcsin, etc., functions have different names than in Fortran and C (which use asin, acos, etc.).
7.3.2
Possible integer/real problems
As an aside, note that the trigtable program uses:

    degrad = 180./3.1415927

rather than simply

    degrad = 180/3.1415927

The reason is to make completely sure that the program will compute a real quotient and not an integer quotient. In fact, this caution is not needed in this case, as the following program demonstrates:

    #! /usr/bin/env python
    print '2/3 = ', 2/3
    print '2./3 = ', 2./3
    print '2/3. = ', 2/3.
    print '2./3. = ', 2./3.

Running the program yields:

    2/3 =  0
    2./3 =  0.666666666667
    2/3. =  0.666666666667
    2./3. =  0.666666666667

As long as one part of the fraction is real, the program will compute a real quotient. It is only when both numbers are written as integers that the result is truncated. However, I have gotten into the habit of always including the decimal point in real expressions to avoid someday accidentally writing something like:

    a = np.sin(phi/degrad) + (2/3) * np.cos(theta/degrad)**2

which will definitely produce the wrong answer!
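The truncation described here is the Python 2 behavior used throughout these notes. In Python 3 the / operator always returns a real quotient, and the separate // operator does the truncating (floor) division. A minimal sketch that gives the same answers in both versions, because it never relies on int/int with a single slash:

```python
a = 2
b = 3

print("a // b         = %s" % (a // b))          # floor division: 0
print("float(a) / b   = %s" % (float(a) / b))    # real quotient
print("a / float(b)   = %s" % (a / float(b)))    # real quotient
print("7 // 2         = %s" % (7 // 2))          # 3, not 3.5
```

Writing // when you really want an integer quotient, and a decimal point or float() when you want a real one, keeps a script correct no matter which version of Python runs it.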
7.4
More about Python for loops
The for loop in the trigtable program,

    for theta in range(0, 90, 1):

mimics the do loop syntax in Fortran. However, the Python for loop is a much more versatile operator than the Fortran do loop. Here are some examples:

    for x in [1, 10, 100, 1000]:           # x will assume these 4 values in the loop

    for name in ["Sue", "Dave", "Mary"]:   # name will assume these 3 names

    names = ("Sue", "Dave", "Mary")        # example of tuple (see below)
    for name in names:                     # name will assume these 3 names
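The loop headers above need a body before they will run. Here is a small runnable sketch (the variable names total and greetings are only for illustration), again written so that it also works under Python 3:

```python
# Loop over an explicit list of numbers.
total = 0
for x in [1, 10, 100, 1000]:
    total = total + x
print("sum = %d" % total)

# Loop over a tuple of strings.
names = ("Sue", "Dave", "Mary")
greetings = []
for name in names:
    greetings.append("Hello, " + name)
for g in greetings:
    print(g)
```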
7.4.1
More about formats
There are many different formats that can be specified. Here are some common examples:

    %8i       8 characters for integer, right justified
    %10.4f    10 character field width, 4 to right of decimal
    %10.3e    10 characters in scientific notation, 3 to right of decimal
    %10.4g    use either f or e as appropriate
    %7s       7 characters of string output
Flags can go immediately after the % to customize these:

    -    left justify (e.g., %-8i)
    +    include + sign (e.g., %+8i)
    0    zero fill on left (e.g., %09i)
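A quick way to learn these formats is to print a few values between brackets, so that the padding is visible. This is only a sketch with made-up values (n and x), written so that it also runs under Python 3:

```python
n = 42
x = 3.14159265

examples = [
    '%8i' % n,        # right justified in 8 characters
    '%-8i' % n,       # left justified
    '%+8i' % n,       # explicit + sign
    '%09i' % n,       # zero filled to 9 characters
    '%10.4f' % x,     # fixed point
    '%10.3e' % x,     # scientific notation
    '%7s' % 'abc',    # string, right justified
]
for s in examples:
    print('[' + s + ']')
```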
ASSIGNMENT P2
Write a Python program to print a table of x, sinh(x), and cosh(x) (the hyperbolic sine and cosine) for values of x ranging in radians from 0.0 to 6.0 at increments of 0.5. Use a suitable format to make nicely aligned columns of numbers. NOTE: sinh and cosh are available in numpy (np.sinh and np.cosh, see the list of numpy math functions above) and also in the standard math module. To use the math versions, include the line import math and then write math.sinh, etc., to get the functions.
7.5
Input using keyboard
So far all of our example programs have run without prompting the user for any information. To expand our abilities, let's learn how to input data from the keyboard. In most programs, we will want to first prompt the user to input the data, so here is an example of how to input a number:

    a = float(raw_input("Enter number here: "))

raw_input is preferred over input because, in Python 2, input evaluates whatever the user types as a Python expression, which is a security risk. The prompt "Enter number here: " will print on the screen and the user types the number to the right of this and hits return. The input is ALWAYS a string variable, so we convert it to a real number using the float function. We can write a program to multiply two numbers as follows (usermult.py):
    #! /usr/bin/env python
    a = float(raw_input("Enter first number: "))
    b = float(raw_input("Enter second number: "))
    c = a*b
    print 'Product = ', c

This is a little clunky because ideally we might want to input the numbers on the same line. This is surprisingly complicated to do in Python (at least as far as I could tell, more experienced users can correct me). Here is one way to do this (usermult2.py):

    #! /usr/bin/env python
    a, b = [float(x) for x in raw_input("Enter two numbers: ").split()]
    c = a*b
    print 'Product = ', c

The logic is as follows: The raw input will be a string, such as '5 23', which we then need to split into two parts. This is done using the split operator by appending .split() to the string. We now have the two strings '5' and '23', and we use a for loop to assign x to each in turn and convert to the real variables a and b using the float function. Note that a and b can be assigned together, i.e., in Python you can write

    a, b = 1, 2

to assign two numbers in one line. This program will work correctly as long as a blank is used to separate the numbers being input. If the user enters a comma following the first number, it will crash because it will try to convert something like '2,' to a number. That is, the .split operator looks for blanks, not commas. I am a long-time Fortran programmer, and you will have a hard time convincing me that this syntax is easier to learn than:

    print *, "Enter two numbers"
    read *, a, b

which is how you input two numbers in Fortran. In addition, the Fortran version will accept a comma between the numbers without giving an error.
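One possible way to make the Python version tolerate commas, sketched under the assumption that a comma should simply count as a blank, is to replace the commas before splitting. The helper name parse_numbers is made up for this example:

```python
def parse_numbers(text):
    """Return the numbers in text as a list of reals.

    Commas are treated like blanks, so '5 23', '5,23' and '5, 23'
    all give [5.0, 23.0].  (parse_numbers is a hypothetical helper.)
    """
    return [float(x) for x in text.replace(',', ' ').split()]

a, b = parse_numbers("5, 23")
print("Product = %s" % (a*b))
```

In a script, the string would come from the keyboard, e.g., a, b = parse_numbers(raw_input("Enter two numbers: ")) in Python 2 (in Python 3 the function is called input rather than raw_input).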
7.6
If statements
Next, let's modify this program so that it will allow the user to continue entering numbers until he/she wants to stop (usermult3.py):
    #! /usr/bin/env python
    while (1 < 2):
        a, b = [float(x) for x in raw_input \
            ("Enter two numbers: ").split()]
        if (a==0 and b==0): break
        c = a*b
        print 'Product = ', c
In this case we use a while loop, which continues until the following statement is no longer true. Since 1 is always less than 2, the loop will continue forever, unless we explicitly break out of it. But of course we could include a variable in the expression as in:

    x = 1
    while (x < 11):
        ...
        ...
        x = x + 1

in which case x will assume the values from 1 to 10 (easy to also do with a for loop). The program will allow the user to continue entering numbers to be multiplied. When the user wishes to stop the program (in a more elegant way than hitting [CNTRL] C!), he/she enters zeros for both arguments. The if statement checks for this and exits the loop in this case:

    if (a==0 and b==0): break

The break statement (exit in Fortran) breaks out of the while loop. In this case, we just execute a single command, but we could also execute a block of code, e.g. (usermult4.py):

    #! /usr/bin/env python
    while (1 < 2):
        a, b = [float(x) for x in raw_input \
            ("Enter two numbers (zeros to stop): ").split()]
        if (a==0 and b==0):
            print 'You entered two zeros'
            print 'so the program will now end'
            break
        c = a*b
        print 'Product = ', c

Notice that this block must be indented relative to the if statement line, and that there is no end if line (just like there is no end do line for the while loop). The end of the if block is shown simply by the end of the indentation.
The parentheses in

    while (1 < 2):

are optional, i.e.,

    while 1 < 2:

will work the same. But I think it looks nicer with them, and you will get less confused when you switch between computer languages if you always leave them in. Getting the while loop to go forever by using 1 < 2 is a bit of a kludge. It's probably better to write

    while (True):

which uses the boolean expression True (similarly False is also permitted). Note that we could use a variable for this purpose, i.e.,

    b = True
    while (b):

or even the bizarre

    b = 1 < 2
    while (b):

A list of relational operators (e.g., used in while and if statements) in different languages is as follows:

    FORTRAN 90   FORTRAN 77   C     MATLAB   PYTHON   meaning
    ==           .eq.         ==    ==       ==       equals
    /=           .ne.         !=    ~=       !=       does not equal
    <            .lt.         <     <        <        less than
    <=           .le.         <=    <=       <=       less than or equal to
    >            .gt.         >     >        >        greater than
    >=           .ge.         >=    >=       >=       greater than or equal to
    .and.        .and.        &&    &        and      logical and
    .or.         .or.         ||    |        or       logical or
These operators can be combined to make complex tests, e.g., if ( (a > b and c <= 0) or d == 0):
There is very likely a specific order of operations for these things which I can't remember very well. Look it up in a book if you are unsure or, better, just put in enough parentheses to make it completely clear to anyone reading your code. One nice aspect of Python compared to C is that if you make a mistake and type, for example,

    if (a = 0):

you will get a syntax error message as soon as you try to run the script. In C this is a valid statement with a completely different meaning than is intended!
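For reference, Python applies the comparisons (<, ==, etc.) first, then not, then and, and finally or. The sketch below, with made-up values for a, b, c and d, shows that the unparenthesized test groups the and before the or, and that a different grouping can change the answer:

```python
a, b, c, d = 1, 2, 5, 0

# Without parentheses: "and" is applied before "or".
test1 = a > b and c <= 0 or d == 0
# The same test with the default grouping written out.
test2 = ((a > b) and (c <= 0)) or (d == 0)
# A different grouping gives a different answer.
test3 = (a > b) and ((c <= 0) or (d == 0))

print("%s %s %s" % (test1, test2, test3))
```

Since test1 and test2 agree but test3 does not, explicit parentheses are still the safest way to show a reader which grouping you intended.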
7.7
If, elif, else constructs
A more versatile form of the if statement is as follows:

    if (logical expression):
        (block of code)
    elif (logical expression):
        (block of code)
    elif (logical expression):
        (block of code)
    .
    .
    else:
        (block of code)

Note that elif is the Python version of the F90 else if and that the blocks of code can contain many lines. As many elif statements as required can be used. At most, one block of code will be executed (once one of the if tests is satisfied, it does not check the others). The final else will be executed if none of the preceding if statements is true. The final else is optional. Here is a demonstration program (usersqrt.py) that repeatedly prompts the user for a positive real number. If it is negative, it asks the user to try again. If it is positive, it computes and displays the square root using the numpy.sqrt() function. If the user enters zero, the program stops.

    #! /usr/bin/env python
    import numpy as np
    while (True):
        a = float(raw_input("Enter positive real number (0 to stop) "))
        if (a < 0):
            print 'This number is negative!'
            continue
        elif (a == 0):
            break
        else:
            b = np.sqrt(a)
            print 'sqrt = ', b
This program is very similar to the F90 version we saw earlier. The continue line (optional in this case) continues the while loop and serves the same function as the F90 cycle command. The break line leaves the while loop and thus ends the program. Recall that exit is the F90 equivalent.

ASSIGNMENT P3
Write a Python program to repeatedly ask the user for the constants a, b, and c in the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the program identify and compute any real roots. Output the number of real roots and their values. Stop the program if the user enters zeros for all three values. HINTS: If you have trouble getting Python to read in all three numbers on one line, feel free to enter them on three separate lines. Test your program for some simple examples to make sure it is working correctly (a=1, b=2, c=-3 should return -3 and 1).
7.8
Greatest common factor example
Here is a program that illustrates some of the concepts that we just learned:

    #! /usr/bin/env python
    # compute the greatest common factor of two integers
    while (True):
        a, b = [int(x) for x in raw_input \
            ("Enter two numbers (zeros to stop): ").split()]
        if (a == 0 and b == 0):
            break
        else:
            for i in range(1, min(a, b) + 1):
                if (a % i == 0 and b % i == 0):
                    imax = i
            print 'Greatest common factor = ', imax

The structure is similar to the corresponding F90 gcf program, but it has fewer lines due to the lack of enddo and endif statements. Note that a % i computes the modulus (remainder) of a/i. Note that a and b must be integers, so we changed the user input line to int(x) from the float(x) that we used for the earlier programs. Finally, note that we add one to min(a,b) because the Python upper limit is one less than the number in the range argument (this is hard to remember for F90 programmers!).

ASSIGNMENT P4
Modify gcf.py to compute the least common multiple of two integers.
7.9
User defined functions
(To motivate this, let's repeat the F90 notes here:) As the length and complexity of a computer program grows, it is a good strategy to break the problem down into smaller pieces by defining functions or subroutines to perform smaller tasks. This provides several advantages:

1. You can test these pieces individually to see if they work before trying to get the complete program to work.
2. Your code is more modular and easier to understand.
3. It is easier to use parts of the program in a different program.

To illustrate how to define your own function in Python, here again is the greatest common factor program:

    #! /usr/bin/env python
    # use a function to compute the greatest common factor of two integers

    def GETGCF(a, b):
        for i in range(1, min(a, b) + 1):
            if (a % i == 0 and b % i == 0):
                imax = i
        return imax

    while (True):
        a, b = [int(x) for x in raw_input \
            ("Enter two numbers (zeros to stop): ").split()]
        if (a == 0 and b == 0):
            break
        else:
            imax = GETGCF(a, b)
            print 'Greatest common factor = ', imax

Because Python is not compiled (it just interprets your script line by line), you define your functions BEFORE the main program. This is opposite to the convention in Fortran, where functions and subroutines normally follow the main program. To define a function, you must start with def and then the name of the function and any arguments:

    def GETGCF(a, b):

Notice the syntax: a and b are passed into the function from the calling program. imax is returned to the calling program with the return statement. The contents of GETGCF are indented; we know the function ends when the indentation ends. Of course, we don't really need the imax in the main program; we could have simply written:

        else:
            print 'Greatest common factor = ', GETGCF(a, b)

In the F90 notes, we introduce subroutines at this point. I'm not sure if there is an exact equivalent in Python, that is, something you call with variables in an argument list that can be used for both input and output from the subroutine. However, Python functions are more versatile than Fortran functions because they can return more than one value. Thus, we will do the SPH_AZI example using a function in userdist.py:

    #! /usr/bin/env python

    # SPH_AZI computes distance and azimuth between two points on sphere
    #
    # Inputs:  flat1 = latitude of first point (degrees)
    #          flon1 = longitude of first point (degrees)
    #          flat2 = latitude of second point (degrees)
    #          flon2 = longitude of second point (degrees)
    # Returns: delta = angular separation between points (degrees)
    #          azi   = azimuth at 1st point to 2nd point, from N (deg.)
    #
    # Notes:
    # (1) applies to geocentric not geographic lat,lon on Earth
    #
    # (2) This routine is inaccurate for delta less than about 0.5 degrees.
    #     For greater accuracy, use double precision or perform a separate
    #     calculation for close ranges using Cartesian geometry.
    #
    def SPH_AZI(flat1, flon1, flat2, flon2):
        import numpy as np
        if ( (flat1 == flat2 and flon1 == flon2) or \
             (flat1 == 90. and flat2 == 90.) or \
             (flat1 == -90. and flat2 == -90.) ):
            delta = 0.
            azi = 0.
            return delta, azi
        raddeg = np.pi/180.
        theta1 = (90. - flat1)*raddeg
        theta2 = (90. - flat2)*raddeg
        phi1 = flon1*raddeg
        phi2 = flon2*raddeg
        stheta1 = np.sin(theta1)
        stheta2 = np.sin(theta2)
        ctheta1 = np.cos(theta1)
        ctheta2 = np.cos(theta2)
        cang = stheta1*stheta2*np.cos(phi2 - phi1)+ctheta1*ctheta2
        ang = np.arccos(cang)
        delta = ang/raddeg
        sang = np.sqrt(1. - cang*cang)
        caz = (ctheta2 - ctheta1*cang)/(sang*stheta1)
        saz = -stheta2*np.sin(phi1 - phi2)/sang
        az = np.arctan2(saz, caz)
        azi = az/raddeg
        if (azi < 0.): azi = azi + 360.
        return delta, azi

    while (True):
        lat1, lon1 = [float(x) for x in raw_input \
            ("Enter first point lat, lon: ").split()]
        lat2, lon2 = [float(x) for x in raw_input \
            ("Enter second point lat, lon: ").split()]
        delta, azi = SPH_AZI(lat1, lon1, lat2, lon2)
        print 'delta, azi = ', delta, azi
Notice that delta and azi are not part of the argument list, but both are returned from SPH_AZI when we write:

    delta, azi = SPH_AZI(lat1, lon1, lat2, lon2)
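Two related points can be sketched with small made-up functions (minmax and scale_in_place are hypothetical names, not part of the notes' programs). First, a function that returns several values really returns one tuple, which can be kept whole or unpacked. Second, a list (or numpy array) passed as an argument can be changed in place by the function, which is the closest analogue of an output argument in a Fortran subroutine.

```python
def minmax(values):
    """Return the smallest and largest entries of a list."""
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi              # packed into one tuple

def scale_in_place(values, factor):
    """Multiply every entry of the list by factor; nothing is returned."""
    for i in range(len(values)):
        values[i] = values[i] * factor

result = minmax([3., -1., 7., 2.])       # one name receives the whole tuple
small, big = minmax([3., -1., 7., 2.])   # or unpack into two names
print("result = %s" % (result,))
print("small = %s, big = %s" % (small, big))

data = [1., 2.]
scale_in_place(data, 2.)                 # data itself is modified
print("data = %s" % data)
```

Note that this in-place behavior applies only to objects that can be modified, such as lists and arrays; assigning a new value to a plain number inside a function does not change the caller's variable.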
7.9.1
Python keywords
When I first tried translating this program from the F90 version, I had trouble because it turns out that del is a special word in Python and is used to delete items from lists (we have not talked about this yet). So I had to change the variable name del to delta to get it to work. If you use Xcode to do your editing, then the keywords show up in different colors, so you have a clue that you should not use them for variable names. Here are the keywords in basic Python:
    and       del       from      not       while
    as        elif      global    or        with
    assert    else      if        pass      yield
    break     except    import    print
    class     exec      in        raise
    continue  finally   is        return
    def       for       lambda    try
DO NOT USE THESE AS VARIABLE NAMES! Most of these words kind of sound like things that make sense at some level, although we aren't going to discuss them until we need to. But the one that really bugs me is lambda, as one would like to think that a Greek letter would always be OK to use in mathematical expressions (see footnote 1).

ANOTHER WARNING: Another potential pitfall is using certain special names for your Python script. I ran into this recently when I called one of my programs string.py. The program worked fine, but created a module that prevented numpy from working in my other Python scripts! I believe the problem is that there is a module called string but am not sure about this. Note that string is not one of the Python keywords. It would be good to find a complete list of program names to avoid in Python, but I have not found this yet. Can any of you find such a list or a way to generate it?
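Both lists can be generated from within Python itself. The standard keyword module holds the reserved words of whichever Python version is running, and sys and pkgutil can list the module names that Python is able to import. The likely cause of the string.py problem is that Python searches the script's own directory first, so a file called string.py hides the standard module of the same name from every script in that directory. The following is a sketch rather than a complete answer; the list of names tested is made up, and the exact set of importable modules depends on your installation.

```python
import keyword
import pkgutil
import sys

# Reserved words of the Python version that runs this script.
print(keyword.kwlist)

for name in ['del', 'lambda', 'delta', 'string']:
    print("%-8s keyword? %s" % (name, keyword.iskeyword(name)))

# Module names already taken: those compiled into the interpreter,
# plus every module that can currently be found on the search path.
taken = set(sys.builtin_module_names)
importable = set(m[1] for m in pkgutil.iter_modules())
taken.update(importable)

# A script called <name>.py is risky if <name> is in this set.
for name in ['string', 'sys', 'userdist']:
    print("%-8s already a module? %s" % (name, name in taken))
```

Running this before naming a new script gives a quick check, e.g., it reports that del and lambda are keywords while delta is safe to use.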
Footnote 1: Incidentally, it is not easy to understand what lambda actually does, at least for a Python newbie like me. One website says that it is "best described as a tool for building callback handlers." Thanks, that really clears things up!

7.9.2
Using functions in separate files
If we want to use SPH_AZI in multiple programs, it's annoying and inefficient to have to include it in each script. To avoid this, we can simply include the function in a separate script, named, for example, sph_subs.py:

    # SPH_AZI computes distance and azimuth between two points on sphere
    #
        .
        .
        .
        az = np.arctan2(saz, caz)
        azi = az/raddeg
        if (azi < 0.): azi = azi + 360.
        return delta, azi

and then our userdist2.py script can have the form:

    #! /usr/bin/env python
    import sph_subs as ss

    while (True):
        lat1, lon1 = [float(x) for x in raw_input \
            ("Enter first point lat, lon: ").split()]
        lat2, lon2 = [float(x) for x in raw_input \
            ("Enter second point lat, lon: ").split()]
        delta, azi = ss.SPH_AZI(lat1, lon1, lat2, lon2)
        print 'delta, azi = ', delta, azi
Note that we drop the .py suffix when we import sph_subs and that we choose ss as the prefix to use before the function names (analogous to using np as the prefix for the numpy functions). The sph_subs.py script must be in the same directory as userdist2.py. When you run the userdist2.py script, a file called sph_subs.pyc is automatically created. This is a binary file (don't try to edit it), which helps future scripts to load faster.
7.10
Arrays
Here are some examples of how to define arrays in Python using numpy:

    #! /usr/bin/env python
    import numpy as np
    a = np.ndarray(shape=(100), dtype=float)
    ii = np.ndarray(shape=(50,2), dtype=int)

In this case, a is defined as a vector (1-D array) with 100 real elements and ii is defined as a 50 x 2 matrix of integers. Note that np.ndarray only reserves the space; the elements hold arbitrary values until you assign them. Annoyingly, like C, array indices start with zero, not one. Thus, in the above example the a array has values from a[0] to a[99], and the ii array includes ii[0,0] but not ii[50,2]. Notice that in Python, unlike Fortran, we use brackets, not parentheses, to refer to specific array values. Here is an example program that uses an array to compute prime numbers less than 100:

    #! /usr/bin/env python
    import numpy as np
    maxnum = 100
    prod = np.ndarray(shape=(maxnum+1), dtype=int)
    for i in range(1, maxnum+1):
        prod[i] = 0
    max_i = int(np.sqrt(maxnum))
    for i in range(2, max_i+1):
        if (prod[i] == 0):
            max_j = maxnum/i
            for j in range(2, max_j+1):
                prod[i*j] = 1
    nprime = 0
    for i in range(2, maxnum+1):
        if (prod[i] == 0):
            nprime = nprime + 1
            print i
    print 'Number of primes found = ', nprime

Note that we set the upper limit of the prod array to maxnum+1, so that the actual array goes from prod[0] to prod[maxnum] (recall that array indices start with zero). We don't use prod[0] for anything. Actually, we also don't use prod[1] for anything, just like in the F90 version. Now let's change the code to print out many numbers per line by saving the prime numbers in a separate array called pnum. Here is the code:

    #! /usr/bin/env python
    import numpy as np
    maxnum = 1000
    prod = np.ndarray(shape=(maxnum+1), dtype=int)
    pnum = np.ndarray(shape=(maxnum+1), dtype=int)
    for i in range(1, maxnum+1):
        prod[i] = 0
    max_i = int(np.sqrt(maxnum))
    for i in range(2, max_i+1):
        if (prod[i] == 0):
            max_j = maxnum/i
            for j in range(2, max_j+1):
                prod[i*j] = 1
    nprime = 0
    for i in range(2, maxnum+1):
        if (prod[i] == 0):
            nprime = nprime + 1
            pnum[nprime] = i
    print 'Number of primes found = ', nprime
    print pnum[1:nprime+1]

Notice that we define the second array, pnum, with its own call to np.ndarray, the same size as the first array. It is tempting to simply write pnum = prod, but that does NOT create a second array; it only makes pnum another name for the same array, so anything stored in pnum would also overwrite prod. The output of this code is:

    Number of primes found =  168
    [  2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
      67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
     157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
     257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
     367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
     467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
     599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
     709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
     829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
     967 971 977 983 991 997]
Notice that, like in F90, we can specify a range of array indices using pnum[1:nprime+1] and how Python adds an opening and closing bracket around the array contents upon output. Why is the upper limit nprime+1 and not nprime? Because of the non-intuitive way Python sets the upper limits. Recall that we ran into this before with range and shape (in np.ndarray). It's confusing because of the difference from the F90 convention. The output looks pretty nice, but for completeness, at some point it would be nice to know how to make an explicitly formatted output table of the primes like we did in the F90 notes. I was not able to easily find an example of this by Googling around.
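One possible way to get an explicitly formatted table is to take the primes a fixed number at a time and join the formatted numbers into one string per output line. The sketch below is an assumption-laden variant, not a translation of the F90 code: it stores the primes in a plain Python list rather than a numpy array, it uses 10 primes per line and a %5i format (both arbitrary choices), and it is written to run under Python 3 as well.

```python
maxnum = 1000

# Sieve with a plain Python list: composite[i] is True if i has a factor.
composite = [False] * (maxnum + 1)
primes = []
for i in range(2, maxnum + 1):
    if not composite[i]:
        primes.append(i)
        for j in range(i * i, maxnum + 1, i):
            composite[j] = True

# Format 10 primes per line, each right justified in 5 characters.
per_line = 10
lines = []
for start in range(0, len(primes), per_line):
    chunk = primes[start:start + per_line]
    lines.append(''.join(['%5i' % p for p in chunk]))

print('Number of primes found = %d' % len(primes))
for line in lines:
    print(line)
```

The slice primes[start:start + per_line] follows the same upper limit convention as range, so the last line is simply shorter when the number of primes is not a multiple of 10.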
7.11
Character strings
You can define a string variable in Python very simply:

    s1 = 'Peter'

and combine strings like this:

    s1 = 'Peter'
    s2 = 'Shearer'
    s3 = s1 + ' ' + s2

There are lots of built in functions to manipulate strings in Python. Here are some examples:
    #! /usr/bin/env python
    s1 = 'Peter Shearer'
    print s1[0:5]
    print len(s1)
    print s1.find('Sh')
    print s1.split()
    name1, name2 = s1.split()
    print name1
    print name2

and its output:

    Peter
    13
    6
    ['Peter', 'Shearer']
    Peter
    Shearer
Here is the voter program from the F90 notes, translated into Python:

    #! /usr/bin/env python
    name = raw_input("Who did you vote for in 1992? ")

    if (name.find('ill') != -1 or name.find('lin') != -1):
        print 'Then you are likely a Democrat.'
    elif (name.find('eor') != -1 or name.find('ush') != -1):
        print 'Then you are likely a Republican.'
    else:
        print 'Then you are likely an independent voter.'
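The find method returns -1 when the substring is absent, which is why the tests above compare against -1. Python also has the in operator, which gives True or False directly and is often easier to read. Here is a sketch of the same tests written that way; the function name guess_party and the sample answers are made up for this example.

```python
def guess_party(name):
    """Same tests as the voter program, written with 'in'."""
    if 'ill' in name or 'lin' in name:
        return 'Democrat'
    elif 'eor' in name or 'ush' in name:
        return 'Republican'
    else:
        return 'independent'

for answer in ['Bill Clinton', 'George Bush', 'Ross Perot']:
    print('%-14s -> %s' % (answer, guess_party(answer)))
```

Putting the tests in a function also makes them easy to try on several names without typing at the prompt each time.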
7.12
I/O with files
Here is the Python version of fileinout.f90, a program that reads pairs of numbers from an input file, computes their product, and outputs the original numbers and their product to an output file:

    #! /usr/bin/env python
    infile = raw_input("Enter input file name: ")
    filein = open(infile, 'r')
    outfile = raw_input("Enter output file name: ")
    fileout = open(outfile, 'w')
    while (True):
        line = filein.readline()
        if not line: break
        x, y = [float(a) for a in line.split()]
        z = x * y
        line2 = "%10.3f %10.3f %10.3f \n" % (x, y, z)
        fileout.write(line2)
    filein.close()
    fileout.close()

Let's examine how it works. First we read and open the input file:

    infile = raw_input("Enter input file name: ")
    filein = open(infile, 'r')
The first line here is similar to what we used before to input numbers except this time all we need is the string returned by raw_input. Next we open the file and assign it to the object filein (this is like a unit number in F90). The optional 2nd argument 'r' opens the file for read only. This is good to specify because it will prevent our accidentally writing on top of it. It also means the file must already exist. Similarly we read in and open the output file:

    outfile = raw_input("Enter output file name: ")
    fileout = open(outfile, 'w')

This is almost identical to the input version, except we specify 'w' for write. Next, we use a while loop to read the input file one line at a time:

    while (True):
        line = filein.readline()
        if not line: break

We use the readline method to read line (a string) from the object filein. If there is nothing more to read, then not line will be true and we exit the while loop. Next we split line into two parts (this assumes the numbers are separated with blanks and no commas), convert them to two real numbers (we saw this before when we input two numbers on one line), and then compute their product:

        x, y = [float(a) for a in line.split()]
        z = x * y
Next we perform a formatted write of the three numbers to line2 (another string variable) and then write this to the object fileout:

    line2 = "%10.3f %10.3f %10.3f \n" % (x, y, z)
    fileout.write(line2)

When we are done (having left the while loop with the break), we close the two files. Note the parentheses: close is a method and does nothing unless it is called:

    filein.close()
    fileout.close()

There are typically many ways to write the same program. This is even more true in Python than it was in Fortran. Here is a more compact version of fileinout.py:

    #! /usr/bin/env python
    filein = open(raw_input("Enter input file name: "), 'r')
    fileout = open(raw_input("Enter output file name: "), 'w')
    for line in filein:
        x, y = [float(a) for a in line.split()]
        z = x * y
        fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, z))

This should be self-explanatory, except for the line

    for line in filein:

which is a new syntax and shows how Python will automatically grab one line at a time from the input file object. Alternatively, we could read the entire input file in at once:

    #! /usr/bin/env python
    filein = open(raw_input("Enter input file name: "), 'r')
    fileout = open(raw_input("Enter output file name: "), 'w')
    lines = filein.readlines()
    for line in lines:
        x, y = [float(a) for a in line.split()]
        z = x * y
        fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, z))
    filein.close()
    fileout.close()
where we use readlines rather than readline in order to input the entire file. Then notice how we can pull out one line at a time by writing:

    for line in lines:

There is probably even a way to do this where we don't loop over the lines, but somehow directly read x and y as vectors, compute z as a vector, and then output the vectors directly to the output file. Any Python experts want to try to find this approach?
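One possible answer to that question, offered as a minimal sketch rather than the definitive approach: NumPy's loadtxt and savetxt read and write whole columns at once, so no explicit loop over lines is needed. The sketch is written for Python 3 (the scripts above are Python 2), and the function name multiply_columns is just an illustrative assumption.

```python
import numpy as np

def multiply_columns(infile, outfile):
    """Read x, y columns from infile and write x, y, x*y to outfile."""
    # ndmin=2 keeps a 2-D array even if the file holds a single line
    data = np.loadtxt(infile, ndmin=2)
    x, y = data[:, 0], data[:, 1]
    z = x * y  # element-by-element product of the two vectors
    # the single "%10.3f" format is applied to each of the three columns
    np.savetxt(outfile, np.column_stack((x, y, z)), fmt="%10.3f")
    return z
```

A call such as multiply_columns("pairs.txt", "products.txt") (both file names hypothetical) then replaces the whole read loop. Unlike the loop versions, this holds the entire file in memory, which only matters for very large inputs.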
7.13 Using tuples and lists
This is a placeholder. For now, look at Lisa's notes.
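Until this section is written, here is a very small sketch of the basic distinction, written for Python 3. It is not taken from Lisa's notes, and the station name and coordinates are made up for illustration.

```python
# A list is written with square brackets and can be changed in place.
depths = [10.0, 20.0, 30.0]
depths.append(40.0)        # grow the list by one element
depths[0] = 5.0            # replace an existing element

# A tuple is written with parentheses and cannot be changed once created.
station = ("STA1", 33.0, -117.0)    # hypothetical (name, lat, lon) record
name, lat, lon = station            # "unpack" the tuple into three variables

try:
    station[1] = 0.0       # tuples do not support item assignment
except TypeError:
    tuple_is_fixed = True
```

Lists are the natural choice for data that grows or changes (such as the x and y values used for plotting in the next section), while tuples suit small fixed records.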
7.14 Plotting with Python
A big advantage of Python over Fortran is the ability to make plots directly from Python scripts. There are a variety of plot packages available for Python. We are going to use matplotlib because this is what Lisa uses. Please consult her notes for more details. Let's start with an example program (xyplot.py):

    #! /usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    x = [1., 2., 3., 4.]
    y = [1., 1.9, 3.2, 4.]
    pylab.plot(x, y)
    pylab.show()

which produces Figure 7.1.

Figure 7.1: The pop-up window that results from xyplot.py.

The first three lines are necessary to import the plotting packages and define some of their attributes. Next, we define two lists of 4 numbers each:

    x = [1., 2., 3., 4.]
    y = [1., 1.9, 3.2, 4.]

These are not the same as the arrays that we used earlier. In fact they could be a mixture of different types of variables (e.g., int, float, string), but in this example they need to be of the same type and size. Next, we plot the points and display the result:

    pylab.plot(x, y)
    pylab.show()

which brings up the pop-up window. From the window, you can save the plot by clicking on the little disk icon. You can save it in a different format (e.g., .eps, .png) by using the appropriate suffix on the file name. You must close the window (click on the red button in the far upper left) to get back to your terminal window. Now let's improve the appearance of this plot and add some labels (xyplot2.py):

    #! /usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    x = [1., 2., 3., 4.]
    y = [1., 1.9, 3.2, 4.]
    pylab.xlim(0.5, 4.5)
    pylab.ylim(0.5, 4.5)
    pylab.plot(x, y, 'bs')
    pylab.xlabel('X axis')
    pylab.ylabel('Y axis')
    pylab.title('My x-y points')
    pylab.show()

which produces Figure 7.2.

Figure 7.2: The pop-up window that results from xyplot2.py.

In this case we define the plot limits so that the endpoints do not sit on the plot axes:

    pylab.xlim(0.5, 4.5)
    pylab.ylim(0.5, 4.5)

and we plot the points with blue squares:

    pylab.plot(x, y, 'bs')

where the optional 3rd argument of pylab.plot can indicate various colors, symbols, and line types (r=red, b=blue, g=green, c=cyan, m=magenta, y=yellow, k=black, w=white; +=plus, .=dot, o=circle, *=star, p=pentagon, s=square, x=x, D=diamond, h=hexagon, ^=triangle; -=solid line, --=dashed line, :=dotted line, -.=dashdotted line, None=no connecting lines).
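To make the format-string convention concrete, here is a small sketch (Python 3) of a hypothetical helper, describe_format, that splits a string such as 'bs' or 'r-' into its color, symbol, and line-type parts using only the codes listed above. It is an illustration of the convention, not matplotlib's own parser.

```python
COLORS = {"r": "red", "b": "blue", "g": "green", "c": "cyan",
          "m": "magenta", "y": "yellow", "k": "black", "w": "white"}
MARKERS = {"+": "plus", ".": "dot", "o": "circle", "*": "star",
           "p": "pentagon", "s": "square", "x": "x", "D": "diamond",
           "h": "hexagon", "^": "triangle"}
# longest codes first so that "--" and "-." are not mistaken for "-"
LINES = [("--", "dashed"), ("-.", "dashdotted"), ("-", "solid"), (":", "dotted")]

def describe_format(fmt):
    """Return (color, marker, line) names for a format string such as 'bs'."""
    color = marker = line = None
    i = 0
    while i < len(fmt):
        for code, name in LINES:
            if fmt.startswith(code, i):
                line = name
                i += len(code)
                break
        else:
            # no line-type code at this position, so try color, then symbol
            ch = fmt[i]
            if ch in COLORS:
                color = COLORS[ch]
            elif ch in MARKERS:
                marker = MARKERS[ch]
            else:
                raise ValueError("unknown format code: %r" % ch)
            i += 1
    return color, marker, line
```

For example, describe_format('bs') gives ('blue', 'square', None): blue squares with no connecting line, as in xyplot2.py. The order of the codes does not matter, so 'sb' gives the same result.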
7.14.1 Example of computing least-squares line
As an example to illustrate both Python matrix math and plotting capabilities, let's compute and plot the best-fitting line to the points from the last section, i.e., (1, 1), (2, 1.9), (3, 3.2), and (4, 4). Let the equation of the line be y = a + bx, where a is the y-intercept and b is the slope. Then we can express the desired fit to the data as the matrix equation:

\[
\left[ \begin{array}{c} 1 \\ 1.9 \\ 3.2 \\ 4 \end{array} \right] =
\left[ \begin{array}{cc} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{array} \right]
\left[ \begin{array}{c} a \\ b \end{array} \right]
\qquad (7.1)
\]
where the goal is to adjust a and b so that the r.h.s. best fits the l.h.s. Note that the 4x2 matrix in this case consists of a column vector of ones (that are multiplied by a) and a column vector containing the x values of the data points (which are multiplied by b, the slope). This is a standard form for linear problems:

    d = Gm

where d is a data vector with the y-values of the data, G is the linear operator for predicting the data from the model, and m is the desired model. In this case the model is simply the coefficients a and b of the line. Our program will need to import numpy to do basic matrix arithmetic. To do more advanced matrix operations, such as computing inverses and eigenvalues, we also import the linear algebra package numpy.linalg (see, e.g., http://docs.scipy.org/doc/numpy/reference/routines.linalg.html):

    import numpy as np
    import numpy.linalg as lin

In the last section, we defined the x and y vectors as lists, but we now need to make them matrices, using the numpy mat method:

    x = np.mat([1, 2, 3, 4])
    y = np.mat([1.0, 1.9, 3.2, 4.0])

Similarly, we set up the 4x2 G matrix:

    G = np.mat([[1,1], [1,2], [1, 3], [1, 4]])

and the 4x1 column vector d, given by the transpose of y:

    d = y.T
where .T is the transpose operator. In our case, m will be a 2-vector that contains the y-intercept and slope of the best-fitting line. To solve for m, we apply the standard formula for the least-squares solution [2] to d = Gm:

\[
m = (G^T G)^{-1} G^T d \qquad (7.2)
\]

Translated to Python, this becomes

    m = lin.inv(G.T * G) * G.T * d
    print 'm = \n', m

where lin.inv is the linalg matrix inverse function. The print statement shows the coefficients a and b of the best-fitting line, i.e.,

    m =
    [[-0.05]
     [ 1.03]]

The best-fitting line is thus y = -0.05 + 1.03 x. To plot this line, we compute the y values for x values of 0.8 and 4.2:

    x2 = np.mat([0.8, 4.2])
    d2 = np.mat([[1, 0.8], [1, 4.2]]) * m

Putting it all together, here is the complete program (xyplot3.py):

    #! /usr/bin/env python
    import matplotlib
    matplotlib.use("TkAgg")
    import pylab
    import numpy as np
    import numpy.linalg as lin    #matrix ops, such as inverse and determinant

    x = np.mat([1, 2, 3, 4])
    y = np.mat([1.0, 1.9, 3.2, 4.0])
    G = np.mat([[1,1], [1,2], [1, 3], [1, 4]])
    d = y.T    #data vector

    #model vector for least-squares fit
    m = lin.inv(G.T * G) * G.T * d
    print 'm = \n', m
    x2 = np.mat([0.8, 4.2])
    d2 = np.mat([[1, 0.8], [1, 4.2]]) * m

    pylab.xlim(0.5, 4.5)
    pylab.ylim(0.5, 4.5)
    pylab.plot(x, y, 'sb')
    pylab.plot(x2.T, d2, 'r-')    #(x2, d2.T) fails for lines, but works for symbols (bug?)
    pylab.xlabel('X axis')
    pylab.ylabel('Y axis')
    pylab.title('My best-fitting line')
    pylab.show()

which produces the plot shown in Figure 7.3. I had trouble getting the red line to plot using the matrix input to pylab.plot. For some weird reason (bug?), pylab.plot(x2, d2, 'sr') works, but pylab.plot(x2, d2, 'r-') fails. In addition, pylab.plot(x2, d2.T, 'sr') works and pylab.plot(x2, d2.T, 'r-') fails. The only way I succeeded in plotting the line is pylab.plot(x2.T, d2, 'r-'). This makes no sense to me!

[2] Note to Geophysics graduate students: The formula for m given here is the sort of thing that you are expected to know how to derive before you take your departmental exam. See a linear algebra or statistics text for details.
Figure 7.3: The pop-up window that results from xyplot3.py.

Note that this problem can also be solved using the numpy.linalg least-squares solution function lstsq:

    m, residues, rank, s = lin.lstsq(G, d)
    print 'm = \n', m
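A likely explanation for the plotting trouble with matrix inputs is that matplotlib treats each column of a 2-D input as a separate data set. A 1x2 matrix is then two "lines" of one point each, so symbols appear but no line segment can be drawn. The following sketch (Python 3, where @ is the matrix product) sidesteps this by using ordinary 1-D numpy arrays instead of np.mat. The variable names m_normal and m_lstsq are only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.9, 3.2, 4.0])

# G has a column of ones (multiplies a) and a column of x values (multiplies b)
G = np.column_stack((np.ones_like(x), x))

# normal equations, m = (G^T G)^-1 G^T d, written with the @ matrix product
m_normal = np.linalg.inv(G.T @ G) @ G.T @ y

# the same fit from the library least-squares routine
m_lstsq, residues, rank, s = np.linalg.lstsq(G, y, rcond=None)

# predicted y values at the two ends of the line; both arrays are 1-D,
# so pylab.plot(x2, d2, 'r-') would see a single two-point line
x2 = np.array([0.8, 4.2])
d2 = m_lstsq[0] + m_lstsq[1] * x2
```

Both m_normal and m_lstsq hold the intercept and slope (about -0.05 and 1.03). For larger or poorly conditioned problems lstsq is generally the safer choice, because it avoids forming the explicit inverse.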
Chapter 8
LaTeX
Someday you will want to publish the results of your research. IGPP scientists typically use either Word or LaTeX for this purpose (some old timers still use troff, a UNIX precursor to LaTeX). Most of you probably already know how to use Word. It is very easy to use (WYSIWYG) and is the standard word processor of choice in many institutions. It is not, however, ideal for writing scientific articles. It has particular difficulty handling equations and specialized typesetting requirements. A far better choice is TeX or LaTeX. Advantages of LaTeX:

1. It is freely available so you are not supporting the Evil Empire.
2. You can run it on Macs, PCs or UNIX machines.
3. By using macros, it is a far more powerful and versatile system than all that pull down menu crap you find in most commercial word processors.
4. AGU uses LaTeX for its electronic submission of articles and abstracts. They provide macros to match the style of their journals so that preparing camera-ready copy is trivial.
5. You can produce beautiful looking equations much more easily than with programs like Word.

LaTeX takes a little while to learn at first, but you will find it well worth your effort. I will describe running LaTeX from the UNIX command line, although I am actually most familiar with using it on my Mac using a fancy implementation (TeXShop), which you can download from:

    http://pages.uoregon.edu/koch/texshop/
8.1 A simple example
Use your favorite text editor to create the file samp1.tex (an example from the LaTeX book, Kopka and Daly, A Guide to LaTeX2e):

    \documentclass{article}
    \begin{document}
    Today (\today) the rate of exchange between the British
    pound and the American dollar is \pounds 1 = \$1.63, an
    increase of 1\% over yesterday.
    \end{document}

which will produce:

    Today (November 30, 2012) the rate of exchange between the British pound and the American dollar is £1 = $1.63, an increase of 1% over yesterday.

The document must first define a document class. Options are book, report, article, or letter. The guts of the document are then enclosed between the \begin{document} and \end{document} commands. This example highlights some of the special characters and macros in LaTeX. The following characters are normally interpreted as LaTeX commands:

    $ & % # _ { }
To print these out in your text, you must precede them with a backslash, i.e., \$ \& \% \# \_ \{ \}
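If you ever generate LaTeX text from a script (for example, a table caption built from file names), the same rule can be applied automatically. Here is a minimal Python 3 sketch; the helper name latex_escape is hypothetical, and it handles only the seven characters listed above (not the backslash, tilde, or caret).

```python
# characters from the list above that need a leading backslash
SPECIAL = "$&%#_{}"

def latex_escape(text):
    """Return text with each LaTeX special character preceded by a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

line = latex_escape("an increase of 1% costs $1.63 & up")
```

Here line becomes the string an increase of 1\% costs \$1.63 \& up, which LaTeX will typeset with the literal %, $, and & characters.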
In this example, \today invokes a macro to print today's date and \pounds will print the British pound symbol. From this example you can see that LaTeX is not WYSIWYG! To typeset this document, enter:

    latex samp1

Assuming you have no errors in the input file, this will generate the file samp1.dvi. To preview this file, enter:

    xdvi samp1

To convert the file to a Postscript file, enter:

    dvips samp1 -o samp1.ps
Notice that LaTeX automatically indented the first line of the paragraph. To avoid this, use the \noindent command for the target paragraph:

    \documentclass{article}
    \begin{document}
    \noindent Today (\today) the rate of exchange between the British
    pound and the American dollar is \pounds 1 = \$1.63, an
    increase of 1\% over yesterday.
    \end{document}

To globally change the paragraph indenting, you can change the default for this directly:

    \documentclass{article}
    \begin{document}
    \setlength{\parindent}{0.5in}
    Today (\today) the rate of exchange between the British
    pound and the American dollar is \pounds 1 = \$1.63, an
    increase of 1\% over yesterday.
    \end{document}

This will now indent all paragraphs by 0.5 inches. To remove paragraph indenting, just set this parameter to zero. LaTeX ignores the carriage returns at the ends of each line in the input file. It also ignores extra blanks; it only considers the first blank. New paragraphs are defined by adding a blank line between blocks of text.
8.2 Example with equations
    \documentclass{article}
    \begin{document}
    \setlength{\parindent}{0.0in}
    The function $X(p)$ is more nicely behaved than $T(X)$ since it
    does not cross itself (there is a single value of $X$ for each
    value of $p$), but the inverse function $p(x)$ is multi valued.
    An even nicer function is the combination
    \begin{equation}
    \tau(p) = T(p) - pX(p),
    \end{equation}
    where $\tau$ is called the {\it delay time}. It can be
    calculated very simply:
    \begin{equation}
    \tau(p) = 2 \int_0^{z_p} \left[ {u^2 \over {(u^2-p^2)^{1/2}}}
    - {p^2 \over {(u^2-p^2)^{1/2}}} \right] \, dz
    \end{equation}
    or
    \begin{eqnarray}
    \tau(p) &=& 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz \\
    &=& 2 \int_0^{z_p} \eta(z) \, dz
    \end{eqnarray}
    where $\eta$ is the vertical slowness.
    \end{document}
which typesets as:

The function X(p) is more nicely behaved than T(X) since it does not cross itself (there is a single value of X for each value of p), but the inverse function p(x) is multi valued. An even nicer function is the combination

\[ \tau(p) = T(p) - pX(p), \qquad (8.1) \]

where τ is called the delay time. It can be calculated very simply:

\[ \tau(p) = 2 \int_0^{z_p} \left[ {u^2 \over (u^2-p^2)^{1/2}} - {p^2 \over (u^2-p^2)^{1/2}} \right] dz \qquad (8.2) \]

or

\[ \tau(p) = 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz \qquad (8.3) \]
\[ \phantom{\tau(p)} = 2 \int_0^{z_p} \eta(z) \, dz \qquad (8.4) \]

where η is the vertical slowness.

Note that equations within the text are enclosed with $ signs. \begin{equation} and \end{equation} are used to put the equation on a separate line. Variables within the equations are automatically put into italics. Greek letters are defined as \tau, \eta, etc. Equations are automatically numbered (this can be changed if desired). Subscripts are defined with the underscore (_), superscripts with the caret (^). Fractions are written as, for example, {x \over y}. \int is for the integral symbol; note how the limits are written. Curly brackets are used to separate things; they do not appear in the typeset version.
The \begin{eqnarray} section is used to align the = signs in the two lines of equations. Note that &=& is used to define what it is that is being aligned. Every line except the last line ends with a line break (\\). Although TeX generally does an excellent job of spacing equations, sometimes some fine tuning will help. In this case \, is used to place a tiny amount of space before the dz.
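As a worked illustration of both the algebra and the alignment, the step between the bracketed form of the delay time and the simpler integrand can be written out line by line. The sketch below uses the align environment and \frac rather than eqnarray and \over; this assumes \usepackage{amsmath} in the preamble, which the example above does not load.

```latex
% assumes \usepackage{amsmath} in the preamble (not used in the example above)
\begin{align}
\tau(p) &= 2 \int_0^{z_p} \left[ \frac{u^2}{(u^2-p^2)^{1/2}}
           - \frac{p^2}{(u^2-p^2)^{1/2}} \right] dz \\
        &= 2 \int_0^{z_p} \frac{u^2-p^2}{(u^2-p^2)^{1/2}} \, dz \\
        &= 2 \int_0^{z_p} (u^2-p^2)^{1/2} \, dz
\end{align}
```

With align, a single & marks the alignment point on each line, and the spacing around the = signs is usually more consistent than with eqnarray.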
8.3 Changing the default parameters

8.3.1 Font size
There is a standard font size declared for each document class. In most cases this is Roman 10 pt. One easy way to change the size is with the following commands, arranged in increasing order of size: \tiny \scriptsize \footnotesize \small \normalsize \large \Large \LARGE \huge \Huge
8.3.2 Font attributes

    \itshape   = switch to italics
    \slshape   = switch to slanted font
    \upshape   = switch to upright (normal) font
    \bfseries  = switch to bold
    \mdseries  = switch (back) to regular weight
These commands can be invoked in several different ways, for example to put "in situ" in italics and then return to normal font:

    \itshape in situ \upshape
OR
    {\itshape in situ}
OR
    \begin{itshape} in situ \end{itshape}
These different options also apply to the font sizes listed above. Here is an example of how these options can be used:

    \documentclass{article}
    \begin{document}
    Let's test different ways to say \itshape in situ \upshape in italics.
    There are several different ways to say {\itshape in situ} in italics;
    indeed \begin{itshape} in situ \end{itshape} can be written in italics
    in at least three different ways.
    Next let's experiment with {\large large}, {\Large Large}, and
    {\LARGE LARGE} type, as well as {\small small},
    {\footnotesize footnotesize}, and {\scriptsize scriptsize} type.
    \end{document}

which produces:

    Let's test different ways to say in situ in italics. There are several different ways to say in situ in italics; indeed in situ can be written in italics in at least three different ways. Next let's experiment with large, Large, and LARGE type, as well as small, footnotesize, and scriptsize type.
8.3.3 Line spacing
There is a default line spacing created whenever a new font is selected. Here are examples of two different ways that this can be changed:

    \setlength{\baselineskip}{10pt}
        Change line spacing to 10 point (single spacing for 10pt font).

    \renewcommand{\baselinestretch}{2}
        Change normal (single) line spacing by a factor of 2. This will result in double spacing. This must be called before \begin{document} or before a new font declaration.
Here is an example:

    \documentclass{article}
    \begin{document}
    \setlength{\baselineskip}{20pt}
    Today (\today) the rate of exchange between the British
    pound and the American dollar is \pounds 1 = \$1.63, an
    increase of 1\% over yesterday. Let's write one more line
    here so that we get up to three lines and can see the
    spacing better.
    \end{document}

which produces:

    Today (November 30, 2012) the rate of exchange between the British pound and the American dollar is £1 = $1.63, an increase of 1% over yesterday. Let's write one more line here so that we get up to three lines and can see the spacing better.
8.4 Including graphics
Postscript, EPS, and PDF files can easily be embedded as figures in a LaTeX document by including LaTeX extension packages such as graphics or graphicx. Here is an example that uses graphicx:

    \documentclass{article}
    \usepackage{graphicx}
    \begin{document}
    \setlength{\baselineskip}{20pt}
    We are now going to show how to embed a Postscript file into
    a LaTex document using the includegraphics command.
    \begin{figure}[h]
    \begin{center}
    \includegraphics[scale=0.7]{plot_1.pdf}
    \end{center}
    \caption{Here is the caption for this plot. This will be automatically
    positioned below the plot.}
    \end{figure}

    And then here is some more text to show where the next block of
    text will appear. Blah, blah, blah...
    \end{document}

which will produce:

    We are now going to show how to embed a Postscript file into a LaTex document using the includegraphics command.
    Figure 8.1: Here is the caption for this plot. This will be automatically positioned below the plot.

    And then here is some more text to show where the next block of text will appear. Blah, blah, blah...

Note that the graphicx package is loaded with the \usepackage{graphicx} command at the start of the file. The \begin{figure} environment has various options for where the figure will be positioned, including at the present location in the text [h], or at the top [t] or bottom of the page [b]. These options can be combined, i.e., [tb] will position the figure at either the top or bottom of the page. In this example, the figure is scaled to 70% of its original size. Depending upon how the figure is positioned in the PDF file, you also may need to apply a bounding box to remove the surrounding white space using the bb= option, e.g.,

    \includegraphics[bb=2in 4in 7in 9in, scale=0.7]{plot_1.pdf}

which specifies exactly what part of the page will be windowed and displayed. This is necessary for PDF figures that appear in only part of an entire page, and it can be tedious to find the right bounding box. In my experience, an easier option is to use the Mac Preview program to open Postscript or EPS files (from Adobe Illustrator or other graphics programs) and then save them within Preview as PDF files. In this case, they are tightly windowed and the bb option is not needed. Figures are automatically numbered; users have control over the starting figure number.
8.5 Want to know more?
There is a huge amount of material about LaTeX on the web. Check out David McMillan's great LaTeX example file, ex.tex, which you can find in shearer/CLASS/COMP/LATEX. The accompanying files psfig.tex, hobbes.ps, and gnufig.tex are also included. A good LaTeX reference book is: Kopka, H., and P.W. Daly, A Guide to LaTeX2e, Addison-Wesley, New York, 1995.
Chapter 9
Postscript plotting
PostScript is a page description language that is used by most modern printers for both text and graphics. You can generate Postscript files automatically from programs. For example on the Macs, you can print to a file rather than to the printer and it will save whatever you were going to print out as a Postscript file (most people save as a PDF file, but there is still an option to save in Postscript). However, you can also generate your own Postscript files and that is the subject of these notes. Postscript is ideally suited for user generated graphics because it is in ascii (plain text), is well documented, and is generally easy to understand. Let's start with an example Postscript file (mypost1):

    newpath
    144 72 moveto
    288 432 lineto
    stroke
    showpage

We start with the newpath operator. This empties the current path and declares we are starting a new path (because we are at the beginning of the file, this command is not actually necessary here but it's a good idea to start with this). Next we move to the point (144, 72):

    144 72 moveto

The default coordinate system for Postscript files is in units of 1/72 of an inch, measured from the lower left corner of the page. The arguments for moveto are the x and y coordinates. These numbers move onto a stack and are then read by the moveto command. Thus, this command moves us to a point 2 inches to the right of the left page edge and 1 inch above the page bottom. This is the current point and we can think of this as a pen location. Next we add a line segment between the current point and a new point at (288, 432) (4 inches, 6 inches):

    288 432 lineto

You can think of this as a draw command except the draw is not actually executed yet; it is just added to the currently defined path. To draw the path on the page, we use the stroke command:

    stroke

Finally, we display the current page:

    showpage

We can preview this file with pageview or ghostview. We can also send it to a Postscript printer. However, to be sure that the printer recognizes that it is a Postscript file, we should always start the file with "%! PostScript file", i.e. (mypost2),

    %! PostScript file
    newpath
    144 72 moveto
    288 432 lineto
    stroke
    showpage

If you don't start with this, then the printer will just print out the ascii text. This is no big deal with a small file like this but is a big problem for a large Postscript file. You don't want 200 pages of text to come out of the printer when you really wanted a single page of graphics! If you ever do make this mistake, you will need to cancel the print job on the printer and/or kill the job on your computer. If you are like me, you will find the 1/72 inch coordinate system an annoyance. Both Bob Parker and I prefer to use 1/1000 inch coordinates. This can be done by adding the following line at the beginning of your file:

    0.072 0.072 scale    % Coords are 1/1000 th inch
This means that objects will be drawn 0.072 times as large as they would have been drawn with the old coordinate system. The coordinate 1000 thus now indicates 1000*(1/72)*0.072 = 1 inch. Note that % is used to add comments. Anything to the right of the % is assumed to be a comment (the first line is a special comment that the printer recognizes). Why use 1/1000 inch as the coordinate instead of inches directly? The nice thing about 1/1000 inch is that this is roughly the limit of what the human eye can resolve on the page so it is not likely that one will need to divide things smaller than this. Thus, one can avoid decimal places in the numbers and can save a little bit of space in the file. We can save more space if we shorten the moveto and lineto commands since they will be used many times in a complicated file. We can shorten them by defining our own shorter operators:

    /m {moveto} def
    /d {lineto} def

This tells the Postscript interpreter to replace m with moveto and d with lineto. Our revised Postscript file thus becomes (mypost3):

    %! PostScript file
    /m {moveto} def
    /d {lineto} def
    0.072 0.072 scale    % Coords are 1/1000 th inch
    newpath
    2000 1000 m
    4000 6000 d
    stroke
    showpage

Now let's draw a box with thicker lines than our starting examples (mypost4):

    %! PostScript file
    /m {moveto} def
    /d {lineto} def
    0.072 0.072 scale
    newpath
    4000 4000 m
    4000 5000 d
    5000 5000 d
    5000 4000 d
    4000 4000 d
    30 setlinewidth
    stroke
    showpage
We draw the four sides with four d (lineto) commands. Before actually drawing the box, we set the line width to 30/1000 inch. The resulting plot will have a slight notch in the starting corner. To avoid this, we can use the closepath command (mypost5):

    %! PostScript file
    /m {moveto} def
    /d {lineto} def
    0.072 0.072 scale
    newpath
    4000 4000 m
    4000 5000 d
    5000 5000 d
    5000 4000 d
    closepath
    30 setlinewidth
    stroke
    showpage

closepath adds a line segment from the current point to the initial point in the path, thus we don't need the fourth lineto command. It also closes the path with a mitered join so we don't have the notch as in the previous example. Once a closed path is defined, we can fill in the box using the fill command (mypost6):

    %! PostScript file
    /m {moveto} def
    /d {lineto} def
    0.072 0.072 scale
    newpath
    4000 4000 m
    4000 5000 d
    5000 5000 d
    5000 4000 d
    closepath
    0.5 setgray
    fill
    showpage

Before doing the fill, we set the gray level for printing at 0.5. The gray levels range from zero (black) to one (white).
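Because Postscript is plain text, any language that can write a text file can produce it. As a rough sketch of the idea (Python 3; the function name box_postscript and the output file name mybox.ps are made up for illustration), the following builds the same commands as the filled-box example, taking the box corners in inches and converting them to 1/1000 inch units:

```python
def box_postscript(x1, y1, x2, y2, gray=0.5):
    """Return PostScript text for a filled box; corners are in inches."""
    def t(inches):
        # convert inches to integer units of 1/1000 inch
        return int(round(inches * 1000))
    lines = [
        "%! PostScript file",
        "/m {moveto} def",
        "/d {lineto} def",
        "0.072 0.072 scale   % Coords are 1/1000 th inch",
        "newpath",
        "%d %d m" % (t(x1), t(y1)),
        "%d %d d" % (t(x1), t(y2)),
        "%d %d d" % (t(x2), t(y2)),
        "%d %d d" % (t(x2), t(y1)),
        "closepath",
        "%g setgray" % gray,
        "fill",
        "showpage",
    ]
    return "\n".join(lines) + "\n"

ps_text = box_postscript(4.0, 4.0, 5.0, 5.0)
with open("mybox.ps", "w") as f:   # hypothetical output file name
    f.write(ps_text)
```

The resulting file can be previewed or printed like any of the hand-written examples. Wrapping the bookkeeping in a function like this is the same idea as the Fortran plotting subroutines described later in this chapter.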
Of course we can also print text of various sizes and types. Here is an example (mypost7):

    %! PostScript file
    /m {moveto} def
    /d {lineto} def
    0.072 0.072 scale

    /Times-Roman findfont
    300 scalefont
    setfont
    2000 3000 moveto
    (Example text) show
    showpage

Here we set the type of font with

    /Times-Roman findfont

and then set the font size with:

    300 scalefont

This sets the font height to 0.3 inches. The scaled font must then be set to the current font with the setfont command. We then move to the point (2000, 3000). Finally, we print the string "Example text" at the current location using the show command:

    (Example text) show

We must enclose the text in parentheses to denote it as a string. There are lots of different choices for the fonts. The main choices for science graphics are Helvetica and Times and their bold and italic variations. Well, I could go on with these examples but it would be better to just consult a Postscript book to learn more about all the things one can do. Two good ones are: Postscript Language, Tutorial and Cookbook and Postscript, Language Reference Manual, both by Adobe Systems and printed by Addison-Wesley.
9.1 PSPLOT Fortran subroutines
It is useful to be able to generate Postscript files directly from within a Fortran program. Given knowledge of the Postscript language, one could write the appropriate commands to a file. For convenience, I have written a set of Fortran subroutines for making Postscript files that handle a lot of the bookkeeping details and allow the user to concentrate on the graphics. These routines are contained in:

    /users/shearer/PROG/PLOT/psplot.f

To call these routines from a F90 program, they should be linked with:

    /users/shearer/PROG/PLOT/psplotlib.a

Documentation for these routines is contained in:

    /users/shearer/PROG/PLOT/psplot.man

Here is an example program that uses some of these routines:

    program testpsplot
       implicit none
       call PSFILE('mypost')
       call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
       call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')
       call PSLAX('X label', 0.5, 'Y label', 0.5)
       call PSMOVE(2., 2.)
       call PSDRAW(5., 8.)
       call PSEND
    end program testpsplot

The resulting plot is shown in Figure 9.1. Let us consider the commands one at a time:

    call PSFILE('mypost')

This opens the Postscript file with name mypost (assigning it to Fortran unit number 17; units 17 and 18 should not be used in any program that uses the psplot routines). We could of course give the file any name we want at this stage. The argument could also be a string variable, allowing the user to input the name.

    call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)

This defines a user coordinate system that will be used for subsequent psplot commands. The first four numbers (x1,x2,y1,y2) define the location of a coordinate box on the page in inches. In this case, (1.5, 3.0) and (7.5, 7.0) define the (x,y) coordinates of the lower left and upper right corners of this rectangle in inches.
Figure 9.1: The result of the testpsplot program.

The next four numbers (x3,x4,y3,y4) define the corresponding values at these points for the user scale. Thus in this case, the lower left point in user coordinates is (0, 0) while the upper right point is (10, 20). All other points will be linearly interpolated (or extrapolated) using these values. Often we will want to plot the frame defined by this imaginary box (as in the PSAXES command below) but this is not required to define the coordinate transformation.

    call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')

This draws a frame with tics and numbered tic labels as specified. The position of the frame is assumed to be defined by the corners set in PSWIND. Here is the documentation for the PSAXES command:

    PSAXES(xtic,xlab,xticl,xfrmt,ytic,ylab,yticl,yfrmt)
    makes a frame with tics and tic labels as specified. The position of
    the frame is assumed to be defined by the corners set in WINDOW.
       xtic  = small tic increment
       xlab  = numbered tic increment (length is 2*nxtic)
       xticl = small tic length (inches)
       xfrmt = format for x-axis numbers (see PSNUMB for format details)
       etc. for y
    Note: Don't make tic increments negative even for reversed scale
    plots. PSAXES should figure it out correctly. If you don't want tic
    labels on a particular axis, set xfrmt or yfrmt to a blank
    character, i.e. to ' '.
Next we draw axes labels with the PSLAX command:

    call PSLAX('X label', 0.5, 'Y label', 0.5)

The labels are offset by 0.5 inches from the frame. Next we draw a line between user-coordinate points at (2,2) and (5,8):

    call PSMOVE(2., 2.)
    call PSDRAW(5., 8.)

Finally, we must close the plot file:

    call PSEND

If you leave out this step, the program will not put a showpage command at the end of the file and your Postscript plot will not work!

WARNING: Most of the numerical inputs to the PSPLOT routines must be real numbers! Do not use 2 instead of 2. because the integer and real binary representations of 2 are different (the subroutines have no way of knowing that you used 2 rather than 2.). Note, however, that you can use 'i3' for the format for the axes numbers if you know that there won't be decimal places in the numbers.

Here is a more complicated example that demonstrates many of the PSPLOT subroutines. See the psplot.doc file for details.

    program testpsplot2
       implicit none
       real :: x, y
       integer :: i, icol
       character (len=10) :: text

       call PSFILE('mypost')
       call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
       call PSAXES(1., 2., 0.05, 'i3', 5., 10., 0.05, 'i3')
       call PSLAX('X label', 0.3, 'Y label', 0.4)
       call PSTIT('Example title',0.2)
       call PSNOTE('note that you might want to put down here')

       call PSFRAM(1., 1.5, 2., 3.)
       call PSBOX(1., 1.5, 4., 5., 0.5)
       call PSCBOX(1., 1.5, 6., 7., 0., 1., 1.)
       call PSGRAY(0.)
       call PSMOVE(0.5, 9.)
       call PSDRAW(1., 10.)
       call PSDRAW(1., 15.)
       call PSTIC(0.1, 0., 1)
       x = 3.1415926
       call PSNUMB(x,'f6.3')

       do icol=1,15
          x=2.5
          y=float(icol)+2.
          call PSMOVE(x,y)
          call PSCOL(icol)
          call PSSYMB(-3,0.15)
          call PSMOVE(x,y)
          call PSTIC(0.2, 0., 0)
          call PSNUMB(float(icol),'i3')
       enddo

       call PSGRAY(0.)
       do i=1,9
          call PSLORG(i)
          x=8.5
          y=float(i)*2
          call PSMOVE(x, y)
          call PSSYMB(1, 0.1)
          write (text,'(a4,i1)') 'lorg', i
          call PSLAB(text)
       enddo

       call PSMOVE(5., 2.)
       call PSSYMB(1, 0.1)
       call PSANG(45.)
       call PSCOL(2)
       call PSLORG(1)
       call PSLAB('Red text')
       call PSMOVE(5., 7.)
       call PSSYMB(2, 0.05)
       call PSANG(90.)
       call PSCOL(3)
       call PSLAB('blue text')
       call PSMOVE(5., 12.)
       call PSFONT('hb14')
       call PSSYMB(3, 0.05)
       call PSANG(0.)
       call PSCOL(1)
       call PSLAB('bold text')
       call PSMOVE(5., 15.)
       call PSFONT('h8')
       call PSSYMB(3, 0.05)
       call PSANG(0.)
       call PSCOL(1)
       call PSLAB('small font')

       call PSEND
    end program testpsplot2

and the resulting plot is shown in Figure 9.2.

Figure 9.2: The result of the testpsplot2 program.

The PSIMAG command allows the display of a bitmap in either black and white or color. This can be rather tricky to use but is powerful once you understand how it works. Here is an example program:
    program testpsimag
    implicit none
    integer :: i, j
    integer*2, dimension(50, 50) :: ibuf
    integer*2, dimension(3, 50, 50) :: ibuf3
    integer :: red, green, blue
    real :: gray

    call PSFILE('mypost')

    ! First do black and white image
    call PSWIND(1.5, 7.5, 6.0, 9.0, 0., 50., 0., 50.)
    do i = 1, 50
       do j = 1, 50
          gray = sqrt(real(i*j))/50.
          ibuf(i,j) = nint(gray*255.)      !must be 0 to 255
       enddo
    enddo
    call PSIMAG(ibuf, 50, 50, 0)
    call PSAXES(5., 10., 0.05, 'i3', 5., 10., 0.05, 'i3')
    call PSLAX('X label', 0.25, 'Y label', 0.4)
    call PSTIT('Square root of x*y ',0.15)

    ! Then do color image
    call PSWIND(1.5, 7.5, 2., 5., 0., 1., 0., 1.)
    do i = 1, 50
       do j = 1, 50
          red = nint(real(i)*(255./50.))
          green = nint(real(j)*(255./50.))
          blue = 0
          ibuf3(1,i,j) = red               !must be 0 to 255
          ibuf3(2,i,j) = green
          ibuf3(3,i,j) = blue
       enddo
    enddo
    call PSIMAG(ibuf3, 50, 50, 1)
    call PSAXES(0.1, 0.2, 0.05, 'f4.1', 0.1, 0.2, 0.05, 'f4.1')
    call PSLAX('Postscript Red', 0.3, 'Postscript Green', 0.45)
    call PSTIT('RGB colors, Green = 0', 0.15)

    call PSEND
    end program testpsimag

and the resulting plot is shown in Figure 9.3.
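The comments in the example note that the image values must lie between 0 and 255. The following is a minimal sketch, not part of PSPLOT, of one way to check a buffer before passing it to PSIMAG. The program name checkimagebuf and the parameter n are made up for illustration, and the explicit clamp with max and min is an added safeguard rather than something the original example does.

```fortran
program checkimagebuf
   implicit none
   integer, parameter :: n = 50          ! image size (assumed, as in the example)
   integer :: i, j
   integer*2, dimension(n, n) :: ibuf    ! same buffer type as in testpsimag
   real :: gray

   do i = 1, n
      do j = 1, n
         gray = sqrt(real(i*j))/real(n)
         ! Clamp to the 0-255 range so a badly scaled function
         ! cannot produce out-of-range image values.
         ibuf(i,j) = max(0, min(255, nint(gray*255.)))
      enddo
   enddo

   ! Report the range actually stored in the buffer.
   print *, 'min = ', minval(ibuf), '  max = ', maxval(ibuf)
   if (minval(ibuf) < 0 .or. maxval(ibuf) > 255) then
      print *, 'WARNING: image values outside 0 to 255'
   endif
end program checkimagebuf
```

Printing the minimum and maximum before plotting is a cheap way to catch a scaling mistake, which is otherwise hard to diagnose from the Postscript output alone.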
Figure 9.3: The result of the testpsimag program.
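Returning to the earlier warning that numerical inputs to the PSPLOT routines must be real numbers: the short program below is a sketch, independent of PSPLOT, that prints the bit patterns of the integer 2 and the real 2. to show why one cannot stand in for the other. The names realargs and showreal are hypothetical, and the hexadecimal values in the comments assume 32-bit default integers and IEEE single-precision reals, which is typical but not guaranteed.

```fortran
program realargs
   implicit none
   integer :: n, ibits
   real :: r

   n = 2
   r = 2.

   ! transfer() copies the bit pattern of r into an integer without
   ! converting the value, so it can be printed in hexadecimal.
   ibits = transfer(r, ibits)
   print '(a, z8.8)', 'integer 2 : ', n       ! typically 00000002
   print '(a, z8.8)', 'real    2.: ', ibits   ! typically 40000000

   ! Safe ways to hand a whole-number value to a routine expecting a real:
   call showreal(2.)          ! write the literal with a decimal point
   call showreal(real(n))     ! convert an integer variable explicitly

contains

   ! Hypothetical stand-in for a routine that expects a real argument.
   subroutine showreal(x)
      real, intent(in) :: x
      print '(a, f6.2)', 'received ', x
   end subroutine showreal

end program realargs
```

Because showreal is an internal procedure here, the compiler can see its interface and would reject call showreal(2). Routines compiled separately and called without an explicit interface generally get no such check, which is the situation the warning describes.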

