
Structure of Programming Languages

COURSE OUTLINE

I. Basic Concepts of Programming Languages
   A. Brief PL History
   B. Reasons for Studying Concepts of PL
   C. Programming Domains
II. Language Development Issues
   A. Readability
   B. Writability
   C. Reliability
   D. Factors Influencing Language Design
   E. Language Design Trade-offs
III. Language Implementation Issues
   A. Implementation Methods
      1. Compilation
      2. Interpreting
      3. Hybrid
IV. Programming Environments
   A. Programming Paradigms
V. Identifiers, Reserved Words and Keywords
   A. Identifiers/Names
   B. Reserved Words
   C. Keywords
   D. Predefined Identifiers
VI. Variables
   A. Variable Attributes
   B. Name
   C. Address
   D. Type
   E. Value
   F. Lifetime and Scope
VII. Binding
   A. Name
   B. Binding Attributes to Variables
   C. Address and Lifetime
VIII. Type Checking
   A. Strong Typing
   B. Structural Compatibility
IX. Scope, Lifetime and Referencing Environments
X. Data Types
   A. Primitive Data Types
   B. Structured Data Types
   C. User-Defined Data Types
   D. Multi-Dimensional Arrays
   E. Records
   F. Pointers
XI. Program Syntax
   A. Language
      1. Lexemes and Tokens
      2. Recognition
      3. Generation
XII. Language Generators
   A. Grammars
   B. Regular Grammars/Expressions
   C. BNF (Backus-Naur Form) Grammars
XIII. Semantics
   A. Static Semantics
   B. Dynamic Semantics
   C. Attributes
XIV. Expressions
   A. Precedence and Associativity
   B. Operator Overloading
   C. Coercion and Conversion
      1. Explicit and Implicit
   D. Boolean Expressions
   E. Short-Circuit Evaluation
   F. Assignment
XV. Control Statements
   A. Selection
   B. Iteration
XVI. Subprograms
   A. Procedures
   B. Functions
   C. Local Variables
   D. Parameters

Chapter I Basic Concepts of Programming Languages

Topics
1. Brief PL History
2. Reasons for Studying Concepts of PL
3. Programming Domains

Brief PL History

A programming language is a set of words, codes, and symbols that allows a programmer to give instructions to the computer. Many programming languages exist, each with its own rules, or syntax, for writing these instructions. Programming languages can be classified as low-level and high-level languages.

Low-level programming languages include machine language and assembly language. Machine language, which is referred to as a first generation programming language, can be used to communicate directly with the computer. However, it is difficult to program in machine language because the language consists of 0s and 1s to represent the status of a switch (0 for off and 1 for on). Assembly language uses the same instructions and structure as machine language, but the programmer is able to use meaningful names or abbreviations instead of numbers. Assembly language is referred to as a second generation programming language.

High-level programming languages, which are often referred to as third generation programming languages (3GL), were first developed in the late 1950s. High-level programming languages have English-like instructions and are easier to use than machine language. High-level programming languages include Fortran, C, BASIC, COBOL, and Pascal. In order for the computer to understand a program written in a high-level language, programmers convert the source code into machine language using a compiler or an interpreter. A compiler is a program that converts an entire program into machine code before the program is executed. An interpreter translates and executes one instruction before moving on to the next instruction in the program.

In the 1980s, object-oriented programming (OOP) evolved out of the need to develop complex programs in a more systematic, organized approach. The OOP approach allows programmers to create modules that can be used over and over again in a variety of programs. These modules contain code called classes, which group related data and actions. Properly designed classes encapsulate data to hide the implementation details, are versatile enough to be extended through inheritance, and give the programmer options through polymorphism. Object-oriented languages include Java, C++, and Visual Basic.

Grace Murray Hopper (1906-1992)

The first compiler, A-0, was created in 1951 by Grace Murray Hopper, a U.S. Navy reserve officer who later rose to the rank of commodore. She is also credited with developing COBOL with the United States Department of Defense in 1959, and with popularizing the term "debug" after a moth was removed from the circuitry of a Mark II computer.

In 1957, John Backus and a team of researchers developed Fortran. In the 1960s, John Kemeny and Thomas Kurtz developed BASIC at Dartmouth College. In the late 1970s, the United States Department of Defense developed Ada, a high-level language that supports real-time applications. Also in the 1970s, the C programming language was created by Dennis Ritchie at Bell Laboratories.

Fourth and Fifth Generation Languages
1. Fourth generation languages (4GL), such as SQL, have instructions that are closer to English than those of most high-level languages and are typically used to access databases.
2. Fifth generation languages are used for artificial intelligence.

Reasons for Studying Concepts of PL

1. To be more expressive
2. Improved ability to choose appropriate languages
3. Improved ability to learn new languages
4. Better understanding of the significance of implementation
5. Increased ability to design new languages
6. Overall advancement of computing

Increased ability to express ideas
It is believed that the depth at which we think is influenced by the expressive power of the language in which we communicate our thoughts. It is difficult for people to conceptualize structures they can't describe, verbally or in writing. The language in which programmers develop software places limits on the kinds of control structures, data structures, and abstractions they can use. Awareness of a wider variety of programming language features can reduce such limitations in software development. Can language constructs be simulated in other languages that do not support those constructs directly?

Reasons for Studying Concepts of PL

Improved background for choosing appropriate languages
Many programmers, when given a choice of languages for a new project, continue to use the language with which they are most familiar, even if it is poorly suited to the new project. If these programmers were familiar with the other languages available, they would be in a better position to make informed language choices.

Greater ability to learn new languages
Programming languages are still in a state of continuous evolution, which means continuous learning is essential. Programmers who understand the concepts of OO programming will have an easier time learning Java. Once a thorough understanding of the fundamental concepts of languages is acquired, it becomes easier to see how those concepts are incorporated into the design of the language being learned.

Understanding the significance of implementation
An understanding of implementation issues leads to an understanding of why languages are designed the way they are. This in turn leads to the ability to use a language more intelligently, as it was designed to be used.

Ability to design new languages
The more languages you know, the better your understanding of programming language concepts.

Overall advancement of computing
In some cases, a language became widely used, at least in part, because those in positions to choose languages were not sufficiently familiar with programming language concepts. Many believe that ALGOL 60 was a better language than Fortran; however, Fortran was most widely used. This is attributed to the fact that programmers and managers didn't understand the conceptual design of ALGOL 60. Do you think IBM had something to do with it?

Programming Domains

1. Scientific applications
2. Business applications
3. Artificial intelligence
4. Systems programming
5. Scripting languages
6. Web-based applications
7. Special-purpose languages

1. Scientific applications
In the early 1940s, computers were invented for scientific applications, which require large numbers of floating-point computations. Fortran was the first language developed for scientific applications; ALGOL 60 was intended for the same use. Characteristics: simple data structures, large amounts of floating-point arithmetic, efficiency. e.g.: Fortran, C/C++, ALGOL 60


2. Business applications
The first successful language for business was COBOL. Business applications produce reports and use decimal numbers and character data. The arrival of PCs opened new ways for businesses to use computers; spreadsheets and database systems were developed for business. Characteristics: elaborate input and output facilities, decimal data types, spreadsheets, databases. e.g.: COBOL, SQL


3. Artificial intelligence Symbolic rather than numeric computations are manipulated. Symbolic computation is more suitably done with linked lists than arrays.

LISP was the first widely used AI programming language. Characteristics: symbolic processing, lists as the primary data structure. e.g.: Lisp, Prolog, OPS-5


4. Systems programming
The operating system and all of the programming support tools are collectively known as a computer's system software. System software needs to be efficient because of continuous use. Characteristics: efficiency, low-level features, portability. e.g.: PL/S (IBM), C/C++, Assembly


5. Scripting languages
A list of commands, called a script, is put in a file to be executed. PHP is a scripting language used on Web server systems; its code is embedded in HTML documents and interpreted on the server before the document is sent to a requesting browser. Characteristics: string processing, pattern matching, close integration with file systems. e.g.: Tcl, awk, bash, sh


6. Web-based applications
Characteristics: platform independence. e.g.: HTML, XML, VBScript, JavaScript, Java servlets/JSP


7. Special-purpose languages
RPG is an example of these languages. e.g.: RPG, used to produce business reports; GPSS, used for system simulation

Chapter II Language Development Issues

Topics
1. Readability
2. Orthogonality
3. Writability
4. Reliability
5. Factors Influencing Language Design
6. Language Design Trade-offs

Readability

Overall Simplicity
Strongly affects readability in terms of the number of elementary features.
1. Too many features make the language difficult to learn.
2. Feature multiplicity: having more than one way to accomplish a particular operation, e.g.:
c = c + 1; c++; ++c; c += 1;

Operator Overloading
A single operator symbol has more than one meaning, so sensible overloading of operator symbols is important. Although this is a useful feature, it can lead to reduced readability if users are allowed to create their own overloadings and do not do so sensibly. How easy is it to understand what the program does? Readability is the decisive factor for software maintenance.

Orthogonality
Orthogonality makes a language easier to learn and read: meaning is context-independent. Pointers should be able to point to any type of variable or data structure. A lack of orthogonality leads to exceptions to the rules of the language. In an orthogonal language, a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language, and every possible combination is legal and meaningful.

The more orthogonal the design of a language, the fewer exceptions the language rules require. The most orthogonal programming language is ALGOL 68: every language construct has a type, and there are no restrictions on those types. Taken to this extreme, however, orthogonality leads to unnecessary complexity.

Writability
Writability is a measure of how easily a language can be used to create programs for a chosen problem domain. Most of the language characteristics that affect readability also affect writability.

Simplicity and orthogonality
A smaller number of primitive constructs and a consistent set of rules for combining them is much better than simply having a large number of primitives.

Support for abstraction
Abstraction means the ability to define and then use complicated structures or operations in ways that allow many of the details to be ignored. A process abstraction is the use of a subprogram to implement a sort algorithm that is required several times in a program, instead of replicating it in all the places where it is needed.

Expressivity
A language is expressive if it has relatively convenient, rather than cumbersome, ways of specifying computations. Ex: ++count is more convenient and shorter than count = count + 1.

How easy is it to write a program? To formulate and express the problem and its solution? Writing programs is the basic motivation for developing programming languages. A language can have different writability for different application areas. There is no writability without readability.

Reliability
A program is said to be reliable if it performs to its specifications under all conditions.

Type checking
Type checking is simply testing for type errors in a given program, either by the compiler or during program execution. The earlier errors are detected, the less expensive it is to make the required repairs. Java requires type checking of nearly all variables and expressions at compile time.

Exception handling
The ability to intercept run-time errors, take corrective measures, and then continue is a great aid to reliability.

Aliasing
Aliasing is having two or more distinct referencing methods, or names, for the same memory cell. It is now widely accepted that aliasing is a dangerous feature in a language.

Readability and writability
Both readability and writability influence reliability. Are programs doing what they are supposed to do? There is no reliability without readability and writability. Reliability is the most important criterion for some applications.

Factors Influencing Language Design

1. Computer Architecture
2. Programming Methodologies

Computer Architecture
Von Neumann architecture:
- Data and programs are stored in the same memory
- The CPU is separate from memory
- Data and instructions must be piped from memory to the CPU
- Results of operations are stored back to memory

Related programming constructs:
- Variables model memory cells
- Assignment is based on the piping operation
- The iterative form of repetition is most efficient, because instructions are stored in adjacent memory cells
- The recursive form of repetition is discouraged

Non-Von Neumann architectures
Data-flow computers support functional languages.

Programming Methodologies
Cost of software and hardware; 1970s language deficiencies:
- incompleteness of type checking
- inadequacy of control statements, leading to extensive use of goto statements
- lack of facilities for exception handling

Shift from operation abstraction to data abstraction:
- Inheritance: reuse
- Dynamic type checking: generic types
- Concurrent programming

Language Design Trade-offs

1. Low-level vs. High-level
2. Imperative vs. Declarative
3. Basic Operators vs. Advanced Operators
4. Interpreters vs. Compilers

Low-level vs. High-level
Low-level interfaces, such as assembly language, often perform more efficiently than high-level interfaces. However, high-level programming filters increase usability, since they are easier to program. High-level filters are also more portable, since they are less dependent upon hardware and the underlying internal filter representation.

Imperative vs. Declarative
The declarative approach makes filter programs more concise and easier to write than the imperative approach. Thus, the declarative approach increases the usability, extensibility, and maintainability of monitoring programs. However, declarative languages require programming within the language framework. As a result, the declarative approach imposes some limitations that may decrease the expressiveness of event filter programming.

For example, declarative filtering interfaces such as Path Finder are customized to work for specific applications. In contrast, the imperative approach overcomes this limitation through the use of language constructors that permit more expressive filtering programs. An example of an imperative filter language interface is the Interpretive Pseudo Machine (IPM). Designing a declarative and expressive monitoring language is a challenging issue that this section attempts to address.

Basic Operators vs. Advanced Operators
Some filter programming interfaces provide basic operators (such as AND, OR, and NOT), while others provide more advanced operators such as Before, After, and Sequence. Advanced operators increase the expressive power of event filtering expressions. There are two disadvantages of using advanced operators:
1. the increased performance overhead at run-time, due to interpretation and processing of these operators;
2. the increased complexity of use.
Conversely, basic operators are usually simple, as they represent the core instructions of the filter expression. Therefore, basic operators typically incur less run-time overhead and are easier to use.

Interpreters vs. Compilers
Interpreters are normally used when filters are implemented in the OS kernel. In this type of implementation, interpreters provide better system protection and robustness than compilers. Compilers are more convenient when the filtering mechanism is implemented in user-level applications. Compilers permit dynamic linking and run-time optimization, thereby increasing the run-time efficiency of event filtering mechanisms.

Interpreters continuously re-examine program code, increasing execution overhead and causing significant degradation in monitoring performance. Interpreters may also have greater core storage requirements than compilers: the interpreter and its supporting routines usually must be kept in memory simultaneously, using larger amounts of core storage. In contrast, compilers dynamically link to target routines at run-time, which minimizes space utilization.

Chapter III Language Implementation Issues

Topics
1. Compilation
2. Interpreting
3. Hybrid

Programming Language Implementation
A system for executing programs written in a programming language. There are two general approaches to programming language implementation:
1. Interpretation
2. Compilation

Interpretation
An interpreter takes as input a program in some language, and performs the actions written in that language on some machine.
- The program is executed in software, by an interpreter, without going through any form of translation.
- Source-level instructions are executed by a virtual machine.
- This allows for robust run-time error checking and debugging.
- The penalty is speed of execution.
- Example: Web server scripts

Compilation
A compiler takes as input a program in some language, and translates that program into some other language, which may serve as input to another interpreter or another compiler. Source code is converted by the compiler into binary code that is directly executable by the computer. The compilation process can be broken into four separate steps:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis/Intermediate Code Generation
4. Code Generation

Lexical Analysis
- Breaks the code up into lexical units, or tokens.
- A token is a string of characters, categorized according to the rules of the language as a symbol. Examples of tokens: reserved words, identifiers, punctuation.
- Feeds the tokens to the syntax analyzer.

Syntax Analysis Tokens are parsed and examined for correct syntactic structure, based on the rules for the language Programmer syntax errors are detected in this phase

Semantic Analysis/Intermediate Code Generation
- Declaration and type errors are checked here.
- The intermediate code generated is similar to assembly code.
- Optimizations can be done here as well, for example:
  unnecessary statements are eliminated; statements are moved out of loops if possible; recursion is removed if possible.

Code Generation
- Intermediate code is converted into executable code.
- The code is also linked with libraries if necessary.

Notice that a compiler does not directly execute the program. Ultimately, in order to execute a program via compilation, it must be translated into a form that can serve as input to an interpreter. When a piece of computer hardware can interpret a programming language directly, that language is called machine code. A so-called native code compiler is one that compiles a program into machine code. Actual compilation is often separated into multiple passes, such as code generation (often to assembly language), assembling (generating native code), linking, loading, and execution.

If a compiler for a given high-level language produces another high-level language, it is called a translator (source-to-source translation). This is often useful for adding extensions to existing languages, or for exploiting a good and portable implementation of another language (for example C), simplifying development.

Many combinations of interpretation and compilation are possible, and many modern programming language implementations include elements of both. For example, the Smalltalk programming language is conventionally implemented by compilation into bytecode, which is then either interpreted or compiled by a virtual machine (most commonly using a JIT or AOT compiler). This implementation strategy has been copied by many languages since Smalltalk pioneered it in the 1970s and 1980s.

Chapter IV Programming Environments

Topics
1. Programming Paradigms/Patterns

Programming Paradigms
Here we will look at the meaning of the word 'paradigm' as it appears in The American Heritage Dictionary of the English Language, Third Edition: "An example that serves as pattern or model."

Another and slightly more complicated explanation stems from Merriam-Webster's Collegiate Dictionary:

"A philosophical and theoretical framework of a scientific school or discipline within which theories, laws, and generalizations and the experiments performed in support of them are formulated."

Programming paradigm (in this course)
A pattern that serves as a school of thought for the programming of computers.

Programming technique
Related to an algorithmic idea for solving a particular class of problems. Examples: 'divide and conquer' and 'program development by stepwise refinement'.

Programming style
The way we express ourselves in a computer program; related to elegance or lack of elegance.

Programming culture The totality of programming behavior, which often is tightly related to a family of programming languages The sum of a main paradigm, programming styles, and certain programming techniques.

The main programming paradigms In the concept definition below, we characterize a main programming paradigm in terms of an idea and a basic discipline.

A main programming paradigm stems from an idea within some basic discipline which is relevant for performing computations.

Main programming paradigms:
- The imperative paradigm
- The functional paradigm
- The logical paradigm
- The object-oriented paradigm

Other possible programming paradigms:
- The visual paradigm
- One of the parallel paradigms
- The constraint-based paradigm

The Four Main Programming Paradigms
1. Imperative paradigm
2. Functional paradigm
3. Logic paradigm
4. Object-oriented paradigm

Imperative Paradigm
The word 'imperative' can be used both as an adjective and as a noun. As an adjective it means 'expressing a command or plea'; in other words, asking for something to be done. As a noun, an imperative is a command or an order. Some programming languages, such as the object-oriented language Beta, use the word 'imperative' for commands in the language.

"First do this, and next do that."

'First do this, next do that' is a short phrase which in a nutshell describes the spirit of the imperative paradigm. The basic idea is the command, which has a measurable effect on the program state. The phrase also reflects that the order of the commands is important: 'first do that, then do this' would be different from 'first do this, then do that'.

In the itemized list below we describe the main properties of the imperative paradigm.

Characteristics:
- Discipline and idea: digital hardware technology and the ideas of Von Neumann
- Incremental change of the program state as a function of time
- Execution of computational steps in an order governed by control structures; we call the steps commands
- Straightforward abstractions of the way a traditional Von Neumann computer works
- Similar to descriptions of everyday routines, such as food recipes and car repair
- Typical commands offered by imperative languages: assignment, I/O, procedure calls
- Language representatives: Fortran, Algol, Pascal, Basic, C
- The natural abstraction is the procedure: abstracts one or more actions into a procedure, which can be called as a single command ("procedural programming")

Functional Paradigm
Functional programming is in many respects a simpler and cleaner programming paradigm than the imperative one. The reason is that the paradigm originates from a purely mathematical discipline: the theory of functions. As described in Section 2.1, the imperative paradigm is rooted in the key technological ideas of the digital computer, which are more complicated, and less 'clean', than mathematical function theory.

Below we characterize the most important, overall properties of the functional programming paradigm. Needless to say, we will come back to most of them in the remaining chapters of this material.

"Evaluate an expression and use the resulting value for something."

Characteristics:
- Discipline and idea: mathematics and the theory of functions
- The values produced are non-mutable: it is impossible to change any constituent of a composite value; as a remedy, it is possible to make a revised copy of a composite value
- Atemporal: abstracts a single expression to a function which can be evaluated as an expression
- Functions are first-class values: functions are full-fledged data, just like numbers, lists, ...
- Fits well with computations driven by needs; opens a new world of possibilities

Logic Paradigm
The logic paradigm is dramatically different from the other three main programming paradigms. The logic paradigm fits extremely well when applied in problem domains that deal with the extraction of knowledge from basic facts and relations. The logical paradigm seems less natural in the more general areas of computation.

"Answer a question via search for a solution."

Below we briefly characterize the main properties of the logic programming paradigm.

Characteristics:
- Discipline and idea: automatic proofs within artificial intelligence
- Based on axioms, inference rules, and queries
- Program execution becomes a systematic search in a set of facts, making use of a set of inference rules

Object-Oriented Paradigm The object-oriented paradigm has gained great popularity in the recent decade. The primary and most direct reason is undoubtedly the strong support of encapsulation and the logical grouping of program aspects. These properties are very important when programs become larger and larger.

The underlying, and somewhat deeper, reason for the success of the object-oriented paradigm is probably the conceptual anchoring of the paradigm. An object-oriented program is constructed around concepts which are important in the problem domain of interest. In that way, all the necessary technicalities of programming come second.

"Send messages between objects to simulate the temporal evolution of a set of real-world phenomena."

As for the other main programming paradigms, we will now describe the most important properties of object-oriented programming, seen as a school of thought in the area of computer programming.

Characteristics:
- Discipline and idea: the theory of concepts, and models of human interaction with real-world phenomena
- Data as well as operations are encapsulated in objects; information hiding is used to protect the internal properties of an object
- Objects interact by means of message passing: a metaphor for applying an operation on an object
- In most object-oriented languages objects are grouped in classes; objects in classes are similar enough to allow programming of the classes, as opposed to programming of the individual objects; classes represent concepts whereas objects represent phenomena
- Classes are organized in inheritance hierarchies, providing for class extension or specialization

Object-Oriented Paradigm This ends the overview of the four main programming paradigms. From now on the main focus will be functional programming in Scheme, with special emphasis on examples drawn from the domain of web program development.

Chapter V Identifiers, Reserved Words and Keywords

Topics
1. Identifiers/Names
2. Reserved Words
3. Keywords
4. Predefined Identifiers

Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical object, or physical substance. The abbreviation ID often refers to identity, identification, or an identifier. An identifier may be a word, number, letter, symbol, or any combination of those. In a program, an identifier is a string of characters used to name an entity within the program.

The words, numbers, letters, or symbols may follow an encoding system (wherein letters, digits, words, or symbols stand for, i.e. represent, ideas or longer names), or they may simply be arbitrary. When an identifier follows an encoding system, it is often referred to as a code or ID code. Identifiers that do not follow any encoding scheme are often said to be arbitrary IDs; they are arbitrarily assigned and have no greater meaning. (Sometimes identifiers are called "codes" even when they are actually arbitrary, whether because the speaker believes that they have deeper meaning or simply because he is speaking casually and imprecisely.)

Identifiers in programs:
- A string of characters used to name an entity within a program
- Associated with variables, tables, subprograms, and formal parameters
- Most languages have similar rules for identifiers, but not always:
  C++ and Java are case-sensitive, while Ada is not;
  C++, Ada, and Java allow underscores, while standard Pascal does not;
  FORTRAN originally allowed only 6 characters.

Reserved Words
Reserved words (occasionally called keywords) are one type of grammatical construct in programming languages. These words have special meaning within the language and are predefined in the language's formal specifications. Typically, reserved words include labels for primitive data types in languages that support a type system, and identify programming constructs such as loops, blocks, conditionals, and branches.

A reserved word is a name whose definition is part of the syntax of the language; it cannot be used by the programmer in any other way.

Reserved Words The list of reserved words in a language are defined when a language is developed. Occasionally, depending on the flexibility of the language specification, vendors implementing a compiler may extend the specification by including non-standard features. Also, as a language matures, standards bodies governing a language may choose to extend the language to include additional features such as objectoriented capabilities in a traditionally procedural language. Sometimes the specification for a programming language will have reserved words that are intended for possible use in future versions. Reserved Word Name whose definition is part of the syntax of the language Cannot be used by programmer in any other way

Most newer languages have reserved words. They make parsing easier, since each reserved word will be a different token.

Example: if, else

Keyword
In computer programming, a keyword is a word or identifier that has a particular meaning to the programming language. The meaning of keywords, and indeed the meaning of the notion of keyword, differs widely from language to language. In many languages, such as C and similar environments like C++, a keyword is a reserved word that identifies a syntactic form. Words used in control flow constructs, such as if, then, and else, are keywords. In these languages, keywords cannot also be used as the names of variables or functions.

Some languages, such as PostScript, are extremely liberal in this approach, allowing core keywords to be redefined for specific purposes. In Common Lisp, the term "keyword" (or "keyword symbol") is used for a special sort of symbol, or identifier. Unlike other symbols, which usually stand for variables or functions, keywords are self-quoting and evaluate to themselves. Keywords are usually used to label named arguments to functions and to represent symbolic values.

To some, keyword = reserved word (e.g., C++, Java). To others, there is a difference:

Keywords are only special in certain contexts and can be redefined in other contexts (e.g., FORTRAN keywords may be redefined).

Predefined Identifiers
Identifiers defined by the language implementers, which may be redefined:
cin, cout in C++
real, integer in Pascal
predefined classes in Java

The programmer may wish to redefine a predefined identifier for a specific application (e.g., change a Java interface to include an extra method).

Chapter VI
Variables

Simple (naïve) definition: a name for a memory location
A variable is an abstraction of a computer memory cell or collection of cells.

Six attributes:
1. Name
2. Address
3. Type
4. Value
5. Lifetime
6. Scope

The scope of a variable can be either:
1. global, or
2. local

A global variable's scope includes all the statements in a program.

1. Name

The scope of a local variable includes only the statements inside the function in which it is declared. The same identifier can be reused inside different functions to name different variables. A name is local if it is declared in the current scope, and global if it is declared in an outer scope.

A name is an identifier. In most languages the same name may be used for different variables, as long as there is no ambiguity.

2. Address
The variable's location in memory. Some possible situations:
Different variables with the same name have different addresses when they are declared in different blocks of the program

Same variable has different addresses at different points in time

Different variables share the same address: aliasing (C++ unions, pointer variables, reference parameters). Aliasing adds to the flexibility of a language, especially with pointers and reference parameters.

Can also save memory in some situations Many references to a single copy rather than having multiple copies

Aliases can be quite problematic if the programmer does not handle them correctly. Examples: the copy constructor and = operator for classes with dynamic components in C++; shallow copies of arrays in Java.
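A minimal C++ sketch of aliasing (demo_alias is a hypothetical function for this example): two extra names, one created with a reference and one with a pointer, both denote the same address as x, so a write through either alias is visible through x.

```cpp
// Two names, one address: ref and p are aliases for x.
int demo_alias() {
    int x = 10;
    int& ref = x;   // alias via a reference
    int* p  = &x;   // alias via a pointer
    ref = 20;       // x is now 20
    *p += 5;        // x is now 25
    return x;
}
```

A reader who forgets that ref and p name the same cell as x will misread this code, which is exactly the readability hazard described above.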

3. Type


The type determines the range of values the variable can have, the set of operations defined on it, and how those operations can be used in the given programming environment. Modern data types include both the structure of the data and the operations that go with it.

4. Value
The contents of the memory location(s) allocated to the variable.


5. Lifetime
The time during which the variable is bound to a specific memory location; the period of time during which that location is allocated to the program. The lifetime of a variable is the interval of time in which storage is bound to the variable.

6. Scope
The section of the program in which a variable is visible, i.e., accessible to the programmer/code in that section.

The scope is the region of program text where a declaration is visible. An inner declaration of x hides an outer one: a hole in scope.
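The hole in scope can be shown in a short C++ sketch (scope_demo is a made-up name for this example): inside the inner block, the name x binds to the inner declaration, and the outer x is temporarily invisible.

```cpp
// The inner declaration of x hides the outer one, creating a
// "hole" in the outer x's scope.
int scope_demo() {
    int x = 1;           // outer x
    int seen_inner = 0;
    {
        int x = 2;       // inner x hides the outer x
        seen_inner = x;  // refers to the inner x (2)
    }
    return x * 10 + seen_inner;  // outer x is visible again
}
```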

Binding
A binding is the association of variable attributes with actual values; an association such as between an attribute and an entity, or between an operation and a symbol. The time at which such an association occurs in a program is the binding time of the attribute.
Static binding: occurs before run time and does not change during program execution
Dynamic binding: occurs during run time or changes during run time

Chapter VII
Binding

Topics
1. Name
2. Binding Attributes to Variables
3. Address and Lifetime

Possible Binding Times
Language design time -- bind operator symbols to operations
Language implementation time -- bind floating point type to a representation
Compile time -- bind a variable to a type in C or Java
Load time -- bind a FORTRAN 77 variable to a memory cell (or a C static variable)
Runtime -- bind a non-static local variable to a memory cell

Name
An identifier. In most languages the same name may be used for different variables, as long as there is no ambiguity.

Name binding occurs when the program is written (the name is chosen by the programmer) and for most languages will not change: it is static.

Binding Variable Attributes: Names

Special characters
1. PHP: all variable names must begin with a dollar sign
2. Perl: all variable names begin with special characters, which specify the variable's type
3. Ruby: variable names that begin with @ are instance variables; those that begin with @@ are class variables

Case sensitivity
1. Disadvantage: readability (names that look alike are different)
2. Names in the C-based languages are case sensitive
3. Convention: variable names do not include uppercase letters

The problem is worse in C++, Java, and C# because predefined names are mixed case (e.g., IndexOutOfBoundsException, parseInt).

Binding Variable Attributes: Address
Address - the memory address with which the variable is associated
1. A variable may have different addresses at different times during execution
2. A variable may have different addresses at different places in a program
3. If two variable names can be used to access the same memory location, they are called aliases
4. Aliases are created via pointers, reference variables, and C and C++ unions
5. Aliases are harmful to readability (program readers must remember all of them)

Binding Variable Attributes: Type
Type - determines the range of values of a variable and the set of operations that are defined for values of that type; in the case of floating point, type also determines the precision.
Dynamic binding
The type associated with a variable is determined at run time; a single variable could have many different types at different points in a program.

Static binding
The type associated with a variable is determined at compile time (based on the variable declaration); once declared, the type of a variable does not change.

Binding Variable Attributes: Type Advantage of dynamic binding 1. More flexibility in programming 2. Can use same variable for different types

Disadvantage of dynamic binding Type-checking is limited and must be done at run-time

Allocation
The process by which the memory cell to which a variable is bound is taken from a pool of available memory.
Deallocation
The process of placing a memory cell that has been unbound from a variable back into the pool of available memory.

Binding Variable Attributes: Value
Value - the contents of the location with which the variable is associated
1. The l-value of a variable is its address
2. The r-value of a variable is its value
3. The value is dynamic by nature: a variable's value can change during run time

Binding Variable Attributes: Type Binding
A binding is an association, such as between an attribute and an entity, or between an operation and a symbol.
Static type binding
1. Occurs at compile time and remains unchanged throughout program execution
2. Explicit/implicit declarations
3. May increase reliability

Dynamic type binding
1. First occurs during execution, or can change during execution of the program
2. Reduces the production cycle, since changes result in less checking to perform and less code to revisit

Static and Dynamic Binding
A binding is static if it first occurs before run time and remains unchanged throughout program execution. A binding is dynamic if it first occurs during execution or can change during execution of the program.

Languages with static type binding:
1. Ada
2. Java
3. C, C++, C#, F#
4. Fortran
5. Haskell
6. ML
7. Objective-C
8. Perl (built-in types)

Explicit/Implicit Declaration

An explicit declaration is a program statement used for declaring the types of variables. An implicit declaration is a default mechanism for specifying the types of variables (through the first appearance of the variable in the program).
1. FORTRAN, JavaScript, Ruby, and Perl provide implicit declarations (Fortran has both explicit and implicit)
2. Advantage: writability
3. Disadvantage: reliability (less trouble in Perl: $ - scalar; @ - array; % - hash)

A language can be statically typed without requiring type declarations (e.g. Haskell, F#).

Dynamic Type Binding
Dynamic type binding (JavaScript and PHP) is specified through an assignment statement, e.g., in JavaScript:

list = [2, 4.33, 6, 8];
list = 17.3;

Advantage: flexibility (generic program units); typically makes meta-programming more effective and easier to use

Disadvantages: High cost (dynamic type checking and interpretation) Reliability: Type error detection by the compiler is difficult

Meta-Programming
A meta-program is a program (in a meta-language) that writes or manipulates another program (the object language) as its data. If the meta-language is the same as the object language, the language is reflective.

Reflection is the process by which a computer program can observe and modify its own structure and behavior at runtime

(e.g. Objective-C, Python)

Type Inference

Type Inference in ML, Miranda, and Haskell Rather than by assignment statement, types are determined (by the compiler) from the context of the reference

ML examples

fun circumf(s) = 3.14 * s * s;
fun square(x) = x * x;
fun square(x) : real = x * x;
fun square(x) = x * (x : real);

Variable Attributes: Lifetime
Allocation - getting a cell from some pool of available cells
Deallocation - putting a cell back into the pool
Lifetime of a variable - the time during which it is bound to a particular memory cell

Categories of variables by lifetime:
Static
Stack-dynamic
Explicit heap-dynamic
Implicit heap-dynamic

Static variables
Bound to memory cells before execution begins and remain bound to the same memory cells throughout execution (e.g., C and C++ static variables).
Advantages: efficiency (direct addressing), history-sensitive subprogram support

Disadvantages: lack of flexibility (no recursion if all variables are static); storage cannot be shared
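A C++ static local variable illustrates history-sensitive subprogram support (next_ticket is a made-up name for this sketch): the counter keeps its cell, and therefore its value, across calls.

```cpp
// The static local counter is bound to one memory cell for the
// whole run, so each call sees the value left by the previous call.
int next_ticket() {
    static int counter = 0;  // initialized once, not on every call
    return ++counter;
}
```

Successive calls return 1, 2, 3, ...; an ordinary (stack-dynamic) local would return 1 every time.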

Stack-dynamic Variables

Storage bindings are created for variables when their declaration statements are elaborated. A declaration is elaborated when the executable code associated with it is executed. For example, in Java, C++, and C#, variables defined in methods are by default stack-dynamic: variable declarations that appear at the beginning of a method are elaborated when the method is called.

Advantage: allows recursion; conserves storage

Disadvantages: Subprograms cannot be history sensitive Inefficient references (indirect addressing)
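The connection between stack-dynamic variables and recursion can be sketched in C++: each activation of the function gets its own fresh n on the stack, allocated when the call elaborates the parameter and freed on return.

```cpp
// Each recursive activation has its own stack-dynamic n; this
// per-call storage is what makes recursion work at all.
long fact(long n) {
    if (n <= 1) return 1;    // base case: this activation's n
    return n * fact(n - 1);  // recursive call gets a fresh n
}
```

With purely static storage (as in pre-recursion FORTRAN), all activations would share one cell for n and the recursion would not work.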

Explicit heap-dynamic
Allocated and deallocated by explicit run-time instructions, specified by the programmer, which take effect during execution. Referenced only through pointers or reference variables. Dynamic objects in C++ (via new and delete):

int *intnode;
intnode = new int;
delete intnode;

Java objects are accessed through reference variables. There is no way to explicitly destroy a heap-dynamic variable; instead, implicit garbage collection is used.

Advantage: provides for dynamic storage management
Disadvantages: inefficient and unreliable (complexity of pointers and reference variables; cost of references to variables)

C# has both explicit heap-dynamic and stack-dynamic objects. The header of any method that defines a pointer must be marked unsafe.

Implicit heap-dynamic Bound to heap storage only when they are assigned values e.g. all variables in APL

e.g. all strings and arrays in Perl, JavaScript, and PHP Array2=[10, 15, 20, 25]; (JavaScript)

Advantage: flexibility (generic code)

Disadvantages: Inefficient, because all attributes are dynamic

Loss of error detection

Variable Attributes: Scope
The scope of a variable is the range of statements over which it is visible. The nonlocal variables of a program unit are those that are visible but not declared there. The scope rules of a language determine how references to names are associated with variables.

Static Scope
The scope of a variable is determined prior to execution (at compile time). To connect a name reference to a variable, the compiler must find the declaration. Search process: search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name.

Enclosing static scopes (of a specific scope) are called its static ancestors; the nearest static ancestor is called its static parent.

Evaluation of Static Scoping
Works well in many situations
Allows the compiler to hard-code information about the variable into the executable code
Allows the compiler to perform optimizations based on its knowledge of the variable

Problems: In most cases, too much access is possible e.g. all variables declared in the main program are visible to all the procedures whether or not that is desired

As a program evolves, the initial structure is destroyed and local variables often become global; subprograms also gravitate toward becoming global, rather than nested.

Dynamic Scope

Based on calling sequences of program units, not their textual layout e.g. Perl and COMMON LISP allow both dynamic and static scope

References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point.
Advantage: convenience. A variable's scope can change during the course of execution, or remain undetermined, which is very flexible. Information about the variable is usually stored with it.

Disadvantages:
While a subprogram is executing, its variables are visible to all subprograms it calls
Impossible to statically type-check
Poor readability: it is not possible to statically determine the type of a variable

Chapter VIII
Type-checking

Topics
1. Strong-Typing
2. Structurally Compatible

Type system
In computer science, a type system may be defined as "a tractable syntactic framework for classifying phrases according to the kinds of values they compute". A type system associates types with each computed value. By examining the flow of these values, a type system attempts to prove that no type errors can occur. The type system in question determines what constitutes a type error, but a type system generally seeks to guarantee that operations expecting a certain kind of value are not used with values for which that operation does not make sense.

A compiler may use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many C compilers the float data type, for example, is represented in 32 bits, in accord with the IEEE specification for single-precision floating-point numbers. They will thus use floating-point-specific microprocessor operations on those values (floating-point addition, multiplication, etc.).

The depth of type constraints and the manner of their evaluation affect the typing of the language. A programming language may further associate an operation with varying concrete algorithms on each type in the case of type polymorphism. Type theory is the study of type systems, although the concrete type systems of programming languages originate from practical issues of computer architecture, compiler implementation, and language design.

Fundamentals
Assigning data types (typing) gives meaning to sequences of bits. Types usually have associations either with values in memory or with objects such as variables. Because any value simply consists of a sequence of bits in a computer, the hardware makes no intrinsic distinction even between memory addresses, instruction code, characters, integers, and floating-point numbers, being unable to discriminate between them based on bit pattern alone.

Advantages provided by type systems include:
1. Abstraction (or modularity)
2. Documentation
3. Optimization
4. Safety

Abstraction (or modularity)
Types allow programmers to think about programs at a higher level than the bit or byte, not bothering with low-level implementation. For example, programmers can think of a string as a collection of character values instead of as a mere array of bytes. Types can also allow programmers to express the interface between two subsystems.

Documentation
In more expressive type systems, types can serve as a form of documentation, since they can illustrate the intent of the programmer. For instance, timestamps may be represented as integers, but if a programmer declares a function as returning a timestamp type rather than merely an integer type, this documents part of the meaning of the function.

Optimization
Static type-checking may provide useful compile-time information. For example, if a type requires that a value must be aligned in memory at a multiple of four bytes, the compiler may be able to use more efficient machine instructions.
Safety
Use of types may allow a compiler to detect meaningless or probably invalid code. For example, we can identify an expression 3 / "Hello, World" as invalid, because the rules of arithmetic do not specify how to divide an integer by a string. As discussed below, strong typing offers more safety, but generally does not guarantee complete safety (see type safety for more information).

Type system
A program typically associates each value with one particular type (although a type may have more than one subtype). Other entities, such as objects, modules, communication channels, dependencies, or even types themselves, can become associated with a type. Some implementations might make the following identifications (though these are technically different concepts):
class - a type of an object
data type - a type of a value
kind - a type of a type

Type system
A type system, specified for each programming language, controls the ways typed programs may behave, and makes behavior outside these rules illegal. An effect system typically provides more fine-grained control than does a type system.

Type checking
The process of verifying and enforcing the constraints of types. Type checking may occur either at compile time (a static check) or at run time (a dynamic check). If a language specification enforces its typing rules strongly (i.e., more or less allowing only those automatic type conversions that do not lose information), the language is referred to as strongly typed; if not, as weakly typed. The terms are not used in a strict sense.

Static typing
A programming language is said to use static typing when type checking is performed during compile time as opposed to run time. Statically typed languages include ActionScript 3, Ada, C, D, Eiffel, F#, Fortran, Go, Haskell, JADE, Java, ML, Objective-C, OCaml, Pascal, and Scala. C++ is statically typed, aside from its run-time type information system. The C# type system performs static-like compile-time type checking, but also includes full run-time type checking. Perl is statically typed with respect to distinguishing arrays, hashes, scalars, and subroutines.

Static typing is a limited form of program verification (see type safety): accordingly, it allows many type errors to be caught early in the development cycle. Static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed. Program execution may also be made more efficient (i.e., faster or taking less memory) by omitting run-time type checks and enabling other optimizations.

Dynamic typing

A programming language is said to be dynamically typed when the majority of its type checking is performed at run time as opposed to at compile time. In dynamic typing, values have types, but variables do not; that is, a variable can refer to a value of any type. Dynamically typed languages include APL, Clojure, Erlang, Groovy, JavaScript, Jython, Lisp, Lua, MATLAB/GNU Octave, Perl (for user-defined types, but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk, and Tcl.

Implementations of dynamically typed languages generally associate run-time objects with "tags" containing their type information. This run-time classification is then used to implement type checks and dispatch overloaded functions, but can also enable pervasive uses of dynamic dispatch, late binding, and similar idioms that would be cumbersome at best in a statically typed language, requiring the use of variant types or similar features.

Combinations of dynamic and static typing
The presence of static typing in a programming language does not necessarily imply the absence of all dynamic typing mechanisms. For example, Java and some other ostensibly statically typed languages support downcasting and other type operations that depend on run-time type checks, a form of dynamic typing. More generally, most programming languages include mechanisms for dispatching over different 'kinds' of data, such as disjoint unions, polymorphic objects, and variant types. Even when not interacting with type annotations or type checking, such mechanisms are materially similar to dynamic typing implementations. Certain languages, for example Clojure or Cython, are dynamically typed by default, but allow this behaviour to be overridden through the use of explicit type hints that result in static typing.
One reason to use such hints would be to achieve the performance benefits of static typing in performance-sensitive parts of code.

Static and dynamic type checking in practice
The choice between static and dynamic typing requires trade-offs. Static typing can find type errors reliably at compile time, which should increase the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, and thus disagree over the proportion of coded bugs that would be caught by appropriately representing the designed types in code. Static typing advocates believe programs are more reliable when they have been well type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases.

Static and dynamic type checking in practice Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce optimized machine code.

Statically typed languages that lack type inference (such as C and Java) require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or permit to drift out of synchronization.

Static and dynamic type checking in practice Dynamic typing allows constructs that some static type checking would reject as illegal. For example, eval functions, which execute arbitrary data as code, become possible. An eval function is possible with static typing, but requires advanced uses of algebraic data types. Dynamic typing is used in duck typing that can support easier code reuse. Dynamic typing typically makes metaprogramming more effective and easier to use. For example, C++ templates are typically more cumbersome to write than the equivalent Ruby or Python code. More advanced run-time constructs such as metaclasses and introspection are often more difficult to use in statically typed languages.

Strong and weak typing: Liskov definition
In 1974, Jones and Liskov described a strongly typed language as one in which each type defines a set of primitive operations that are the only direct means for manipulating objects of that type. Jackson wrote,

"In a strongly typed language each data area will have a distinct type and each process will state its communication requirements in terms of these types."

Strong and weak typing
A type system is said to feature strong typing when it specifies one or more restrictions on how operations involving values of different data types can be intermixed. A computer language that implements strong typing will prevent the successful execution of an operation on arguments that have the wrong type. Weak typing means that a language implicitly converts (or casts) types when they are used. Consider the following example:

var x := 5;    // (1) (x is an integer)
var y := "37"; // (2) (y is a string)
x + y;         // (3) (?)

In a weakly typed language, the result of this operation is unclear. Some languages, such as Visual Basic, would produce runnable code yielding the result 42: the system would convert the string "37" into the number 37 to forcibly make sense of the operation. Other languages, like JavaScript, would produce the result "537": the system would convert the number 5 to the string "5" and then concatenate the two. In both Visual Basic and JavaScript, the resulting type is determined by rules that take both operands into consideration. In JavaScript, the order of the operands is not significant (y + x would be "375").
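In a more strongly typed language the same expression is simply rejected: C++, for instance, will not compile 5 + std::string("37"), so the programmer must state which reading is intended. A minimal sketch (as_concat and as_sum are made-up names):

```cpp
#include <string>

// Both readings of the weakly typed example, spelled out explicitly:
std::string as_concat() { return std::to_string(5) + "37"; } // "537"
int as_sum()            { return 5 + std::stoi("37"); }      // 42
```

The ambiguity disappears because every conversion is written by the programmer rather than chosen by the language.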

Strong and weak typing
Since the use of the "+" operator on a String and a Number (a JavaScript type) always results in a String, the question of whether the value of y is a String that cannot be converted to a Number (e.g. "Hello World") is moot. In some languages, such as AppleScript, the type of the resulting value is determined by the type of the left-most operand only.

Safely and unsafely typed systems
A third way of categorizing the type system of a programming language uses the safety of typed operations and conversions. Computer scientists consider a language "type-safe" if it does not allow operations or conversions that lead to erroneous conditions. Some observers use the term memory-safe language (or just safe language) to describe languages that do not allow undefined operations to occur. For example, a memory-safe language will check array bounds, or else statically guarantee (i.e., at compile time, before execution) that array accesses outside the array boundaries will cause compile-time and perhaps run-time errors. Consider the following example:

var x := 5;     // (1)
var y := "37";  // (2)
var z := x + y; // (3)

In languages like Visual Basic, variable z in the example acquires the value 42. While the programmer may or may not have intended this, the language defines the result specifically, and the program does not crash or assign an ill-defined value to z. In this respect, such languages are type-safe; however, in some languages, if the value of y was a string that could not be converted to a number (e.g. "Hello World"), the results would be undefined. Such languages are type-safe (in that they will not crash), but can easily produce undesirable results. In other languages like JavaScript, the numeric operand would be converted to a string, and then concatenation performed. In this case the result is not undefined and is predictable.

Now let us look at the same example in C:

int x = 5;
char y[] = "37";
char *z = x + y;

In this example z will point to a memory address five characters beyond y, equivalent to three characters after the terminating zero character of the string pointed to by y. The content of that location is undefined, and might lie outside addressable memory. The mere computation of such a pointer may result in undefined behavior (including the program crashing) according to the C standard, and in typical systems dereferencing z at this point could cause the program to crash. We have a well-typed but not memory-safe program, a condition that cannot occur in a type-safe language.

Variable levels of type checking
Some languages allow different levels of checking to apply to different regions of code. Examples include:
The use strict directive in Perl applies stronger checking
The @ operator in PHP suppresses some error messages

Additional tools such as lint and IBM Rational Purify can also be used to achieve a higher level of strictness.

Optional type systems
It has been proposed, chiefly by Gilad Bracha, that the choice of type system be made independent of the choice of language; that a type system should be a module that can be "plugged" into a language as required. He believes this is advantageous, because what he calls mandatory type systems make languages less expressive and code more fragile. The requirement that types do not affect the semantics of the language is difficult to fulfil: for instance, class-based inheritance becomes impossible.

Polymorphism and types
The term "polymorphism" refers to the ability of code (in particular, methods or classes) to act on values of multiple types, or to the ability of different instances of the same data structure to contain elements of different types. Type systems that allow polymorphism generally do so in order to improve the potential for code reuse: in a language with polymorphism, programmers need only implement a data structure such as a list or an associative array once, rather than once for each type of element with which they plan to use it. For this reason computer scientists sometimes call the use of certain forms of polymorphism generic programming. The type-theoretic foundations of polymorphism are closely related to those of abstraction, modularity, and (in some cases) subtyping.

Duck typing
In "duck typing", a statement calling a method m on an object does not rely on the declared type of the object; it requires only that the object, of whatever type, supply an implementation of the method called, when it is called, at run time. Duck typing differs from structural typing in that, if the part (of the whole module structure) needed for a given local computation is present at runtime, the duck type system is satisfied in its type identity analysis.
On the other hand, a structural type system would require the analysis of the whole module structure at compile time to determine type identity or type dependence.

Duck typing differs from a nominative type system in a number of aspects. The most prominent ones are that for duck typing, type information is determined at runtime (as contrasted with compile time), and the name of the type is irrelevant to determining type identity or type dependence; only partial structure information is required, for a given point in the program execution. Duck typing uses the premise that (referring to a value) "if it walks like a duck, and quacks like a duck, then it is a duck" (a reference to the duck test attributed to James Whitcomb Riley). The term may have been coined by Alex Martelli in a 2000 message to the comp.lang.python newsgroup.

Specialized type systems
Many type systems have been created that are specialized for use in certain environments with certain types of data, or for out-of-band static program analysis. Frequently, these are based on ideas from formal type theory and are only available as part of prototype research systems.

Dependent types
Dependent types are based on the idea of using scalars or values to more precisely describe the type of some other value. For example, matrix(3,3) might be the type of a 3x3 matrix. We can then define typing rules such as the following rule for matrix multiplication:

matrix_multiply : matrix(k,m) * matrix(m,n) -> matrix(k,n)

where k, m, n are arbitrary positive integer values. A variant of ML called Dependent ML has been created based on this type system, but because type checking for conventional dependent types is undecidable, not all programs using them can be type-checked without some kind of limits. Dependent ML limits the sort of equality it can decide to Presburger arithmetic. Other languages, such as Epigram, make the value of all expressions in the language decidable so that type checking can be decidable. It is also possible to make the language Turing-complete at the price of undecidable type checking, as in Cayenne.

Linear types
Linear types, based on the theory of linear logic and closely related to uniqueness types, are types assigned to values having the property that they have one and only one reference to them at all times.
These are valuable for describing large immutable values such as files, strings, and so on, because any operation that simultaneously destroys a linear object and creates a similar object (such as str = str + "a") can be optimized "under the hood" into an in-place mutation. Normally this is not possible, as such mutations could cause side effects on parts of the program holding other references to the object, violating referential transparency.

Linear types are used in the prototype operating system Singularity for interprocess communication, statically ensuring that processes cannot share objects in shared memory, in order to prevent race conditions. The Clean language (a Haskell-like language) uses this type system to gain a great deal of speed while remaining safe.

Intersection types
Intersection types are types describing values that belong to both of two other given types with overlapping value sets. For example, in most implementations of C the signed char has range -128 to 127 and the unsigned char has range 0 to 255, so the intersection type of these two types would have range 0 to 127. Such an intersection type could be safely passed into functions expecting either signed or unsigned chars, because it is compatible with both types.

Intersection types are useful for describing overloaded function types. For example, if int -> int is the type of functions taking an integer argument and returning an integer, and float -> float is the type of functions taking a float argument and returning a float, then the intersection of these two types can be used to describe functions that do one or the other, based on what type of input they are given. Such a function could be passed safely into another function expecting an int -> int function; it simply would not use the float -> float functionality.

Union types
Union types are types describing values that belong to either of two types. For example, in C, the signed char has range -128 to 127, and the unsigned char has range 0 to 255, so the union of these two types would have range -128 to 255. Any function handling this union type would have to deal with integers in this complete range. More generally, the only valid operations on a union type are operations that are valid on both types being unioned. C's "union" concept is similar to union types, but is not typesafe, as it permits operations that are valid on either type, rather than on both.

Union types are important in program analysis, where they are used to represent symbolic values whose exact nature (e.g., value or type) is not known. In a subclassing hierarchy, the union of a type and an ancestor type (such as its parent) is the ancestor type. The union of sibling types is a subtype of their common ancestor (that is, all operations permitted on their common ancestor are permitted on the union type, but they may also have other valid operations in common).
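The union-type rule above (only operations valid on both member types are safe) can be sketched with Python's typing.Union annotation; the describe function and its messages are hypothetical illustrations, not taken from any real library.

```python
from typing import Union

# A value of type Union[int, str] may hold either an int or a str.
# Type-specific operations require a runtime check first, mirroring the
# rule that only operations valid on *both* types are safe on the union.
def describe(value: Union[int, str]) -> str:
    if isinstance(value, int):
        return f"int in range: {value}"
    return f"string of length {len(value)}"

print(describe(200))     # handled as an int
print(describe("abc"))   # handled as a str
```

Note that Python's annotation is advisory (checked by external tools such as mypy), so the isinstance test carries the actual runtime discrimination.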
Existential types
Existential types are frequently used in connection with record types to represent modules and abstract data types, due to their ability to separate implementation from interface. For example, the type T = EX. { a: X; f: (X -> int); } describes a module interface that has a data member of type X and a function that takes a parameter of the same type X and returns an integer. This could be implemented in different ways; for example:

intT = { a: int; f: (int -> int); }
floatT = { a: float; f: (float -> int); }

These types are both subtypes of the more general existential type T and correspond to concrete implementation types, so any value of one of these types is a value of type T. Given a value t of type T, we know that t.f(t.a) is well-typed, regardless of what the abstract type X is. This gives flexibility for choosing types suited to a particular implementation, while clients that use only values of the interface type (the existential type) are isolated from these choices.

Explicit or implicit declaration and inference
Many static type systems, such as those of C and Java, require type declarations: the programmer must explicitly associate each variable with a particular type. Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply a type of floating-point, while [1, 2, 3] might imply a list of integers, typically an array.

Type inference is in general possible if it is decidable in the type theory in question. Moreover, even if inference is undecidable in general for a given type theory, inference is often possible for a large subset of real-world programs. Haskell's type system, a version of Hindley-Milner, is a restriction of System F to so-called rank-1 polymorphic types, in which type inference is decidable. Most Haskell compilers allow arbitrary-rank polymorphism as an extension, but this makes type inference undecidable. (Type checking is decidable, however, and rank-1 programs still have type inference; higher-rank polymorphic programs are rejected unless given explicit type annotations.)

Types of types
A type of types is a kind. Kinds appear explicitly in typeful programming, such as a type constructor in the Haskell language.

Types fall into several broad categories:

1. Primitive types: the simplest kind of type; e.g., integer and floating-point number
   - Boolean
   - Integral types: types of whole numbers; e.g., integers and natural numbers
   - Floating-point types: types of numbers in floating-point representation
2. Reference types
3. Option types (nullable types)
4. Composite types: types composed of basic types; e.g., arrays or records
5. Abstract data types
6. Algebraic types
7. Subtypes
8. Derived types
9. Object types; e.g., type variables
10. Partial types
11. Recursive types
12. Function types; e.g., binary functions
13. Universally quantified types, such as parameterized types
14. Existentially quantified types, such as modules
15. Refinement types: types that identify subsets of other types
16. Dependent types: types that depend on terms (values)
17. Ownership types: types that describe or constrain the structure of object-oriented systems
18. Pre-defined types provided for convenience in real-world applications, such as date, time, and money

Compatibility: equivalence and subtyping
A type-checker for a statically typed language must verify that the type of any expression is consistent with the type expected by the context in which that expression appears. For instance, in an assignment statement of the form x := e, the inferred type of the expression e must be consistent with the declared or inferred type of the variable x. This notion of consistency, called compatibility, is specific to each programming language.

Programming style
Some programmers prefer statically typed languages; others prefer dynamically typed languages. Statically typed languages alert programmers to type errors during compilation, and they may perform better at runtime. Advocates of dynamically typed languages claim they better support rapid prototyping and that type errors are only a small subset of errors in a program. Likewise, there is often no need to manually declare all types in statically typed languages with type inference; thus the need for the programmer to explicitly specify types of variables is automatically lowered in such languages. Some dynamic languages have run-time optimisers that can generate fast code approaching the speed of static-language compilers, often by using partial type inference.

Strongly-typed
In computer science and computer programming, a type system is said to feature strong typing when it specifies one or more restrictions on how operations involving values of different data types can be intermixed.
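Python can illustrate the strong-typing behavior just described: mixing an integer with a string is rejected at runtime rather than silently coerced. A minimal sketch (the variable names are illustrative):

```python
# Python is strongly typed: adding a str and an int is not defined,
# so the language raises a TypeError instead of coercing the operands
# (contrast with weakly typed languages, where "1" + 2 may succeed).
try:
    result = "the answer is " + 42        # invalid mixing of types
except TypeError:
    result = "the answer is " + str(42)   # explicit conversion required

print(result)
```

Note that strong typing is independent of static typing: here the restriction is enforced at runtime, not by a compiler.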
The opposite of strong typing is weak typing. Most generally, "strong typing" implies that the programming language places severe restrictions on the intermixing that is permitted to occur, preventing the compiling or running of source code which uses data in what is considered to be an invalid way. For instance, an addition operation may not be used with an integer and a string value; a procedure which operates upon linked lists may not be used upon numbers. However, the nature and strength of these restrictions is highly variable.

A strongly-typed programming language is one in which each type of data (such as integer, character, hexadecimal, packed decimal, and so forth) is predefined as part of the programming language, and all constants or variables defined for a given program must be described with one of the data types. Certain operations may be allowable only with certain data types. The language compiler enforces the data typing and use compliance. An advantage of strong data typing is that it imposes a rigorous set of rules on a programmer and thus guarantees a certain consistency of results. A disadvantage is that it prevents the programmer from inventing a data type not anticipated by the developers of the programming language, and it limits how "creative" one can be in using a given data type. In short, strong typing is a programming language characteristic that provides strict adherence to the rules of typing: data of one type (integer, string, etc.) cannot be passed to a variable expecting data of a different type. Contrast with weak typing.

Structurally Compatible

A structural type system (or property-based type system) is a major class of type system, in which type compatibility and equivalence are determined by the type's structure, and not by other characteristics such as its name or place of declaration. Structural systems are used to determine if types are equivalent and whether a type is a subtype of another. This contrasts with nominative systems, where comparisons are based on explicit declarations or the names of the types, and with duck typing, in which only the part of the structure accessed at runtime is checked for compatibility.

In structural typing, an object or term is considered to be compatible with another type if, for each feature within the second type, there is a corresponding and identical feature in the first type. Some languages may differ on the details (such as whether the features must match in name). This definition is not symmetric, and it includes subtype compatibility. Two such types are considered identical if each is compatible with the other.

As an example, OCaml uses structural typing on methods for compatibility of object types. Go uses structural typing on methods to determine compatibility of a type with an interface. C++ template functions exhibit structural typing on type arguments. Haxe uses structural typing, although classes are not structurally subtyped.
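As a small illustration of structural compatibility, Python's typing.Protocol checks method structure rather than declared names or inheritance; the Quacker, Duck, and Robot names below are invented for this sketch (they also echo the duck-typing premise discussed earlier):

```python
from typing import Protocol, runtime_checkable

# Structural compatibility: any object with a matching quack() method
# satisfies Quacker, regardless of its declared base classes.
@runtime_checkable
class Quacker(Protocol):
    def quack(self) -> str: ...

class Duck:              # does not inherit from Quacker
    def quack(self) -> str:
        return "quack"

class Robot:             # unrelated by name, compatible by structure
    def quack(self) -> str:
        return "beep-quack"

def make_noise(q: Quacker) -> str:
    return q.quack()

print(isinstance(Duck(), Quacker))   # structure, not the name, decides
print(make_noise(Robot()))
```

Static checkers such as mypy apply the same structural test at analysis time, while runtime_checkable allows the (shallow) isinstance check shown here.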

Chapter IX Scope, Lifetime and Referencing Environments

Topics
1. Scope
2. Lifetime
3. Referencing Environments

Scope
- The range of statements in which a variable is visible (can be referenced)
- Scope rules are determined by the language
- Local variables, nonlocal variables, global variables
- Static or dynamic scope?

Static Scope
- Based on program text
- To connect a name reference to a variable, you (or the compiler) must find the declaration
- Search process: search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name
- Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called a static parent
- Variables can be hidden from a unit by having a "closer" variable with the same name; C++ and Ada allow access to these "hidden" variables

Dynamic Scope
- Based on calling sequences of program units, not their textual layout (temporal versus spatial)
- References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point

Binding
- Scope: spatial extent of a variable
- Lifetime: temporal extent of a variable

- Allocation: find available memory to bind
- Deallocation: release used memory to be allocated elsewhere
- Lifetime: time during execution between allocation and deallocation

Referencing Environment
- The collection of all variables (or other names) that are visible in the statement
- For statically scoped languages: all variables declared in the local scope, plus all visible variables in ancestor scopes
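The static-scoping search process and variable hiding described in this chapter can be illustrated in Python, which is lexically scoped; the names below are illustrative:

```python
# Static (lexical) scoping: the reference to `x` inside inner() is
# resolved by searching enclosing *textual* scopes (inner -> outer ->
# global), not the chain of calls as dynamic scoping would.
x = "global"

def outer():
    x = "enclosing"          # hides the global x (a "closer" variable)
    def inner():
        return x             # found in the static parent, outer()
    return inner()

print(outer())   # the enclosing binding wins, not the global one
print(x)         # the global x is untouched
```

Under dynamic scoping the lookup would instead walk the call chain at runtime, so the same program text could resolve `x` differently on different calls.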

Chapter X Data Types

Topics
1. Primitive Data Types
2. Structured Data Types
3. User-Defined Data Types
4. Multi Dimension Arrays
5. Records
6. Pointers

Data Type
A data type is a classification identifying one of the various types of data, such as floating-point, integer, or Boolean. It states the possible values for that type, the operations that can be done on that type, and the way values of that type are stored.

Primitive Data Types
A primitive data type is either of the following:
1. Basic type: a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types.
2. Built-in type: a data type for which the programming language provides built-in support.

All basic data types are built-in.

In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is not basic should be considered "primitive".

Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are. Integer addition, for example, can be performed as a single machine instruction, and some processors offer specific instructions to process sequences of characters with a single instruction. In particular, the C standard mentions that "a 'plain' int object has the natural size suggested by the architecture of the execution environment". This means that int is likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost always value types.

Specific primitive data types
1. Integer numbers
2. Booleans
3. Floating-point numbers
4. Fixed-point numbers
5. Characters and strings
6. Numeric data type ranges

Integer numbers
Integers are the most common primitive data type. An integer data type can hold a whole number, but no fraction. Integers may be either signed (allowing negative values) or unsigned (nonnegative values only), and are represented in the computer by a string of bits, with one of the bits representing the sign. Typical sizes of integers are:

Literals for integers
Literals for integers consist of a sequence of digits. Most programming languages disallow use of commas for digit grouping, although Fortran (77, 90, and above; fixed-form source but not free-form source) allows embedded spaces, and Perl, Ruby, and D allow embedded underscores. Negation is indicated by a minus sign before the value.


Examples of integer literals are: 42, 10000, 233000
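As a small illustration of the digit-grouping rules above, Python (like Perl, Ruby, and D) accepts embedded underscores in integer literals; the values below are arbitrary:

```python
# Underscores act purely as visual digit separators in Python literals;
# they do not change the value.
population = 233_000
negative = -42            # negation: a minus sign before the value

print(population)         # same value as the plain literal 233000
print(negative)
```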

Booleans
A boolean type, typically denoted "bool" or "boolean", is a logical type that can be either "true" or "false". Although only one bit is necessary to accommodate the value set {true, false}, programming languages typically implement boolean types as one or more bytes. Commonly, 0 is interpreted as false and 1 as true.

Most languages (e.g., Java, Pascal, and Ada) implement booleans adhering to the concept of boolean as a distinct logical type. Some languages, though, may implicitly convert booleans to numeric types at times, to give extended semantics to booleans and boolean expressions or to achieve backwards compatibility with earlier versions of the language. In C++, e.g., boolean values may be implicitly converted to integers, according to the mapping false -> 0 and true -> 1 (for example, true + true would be a valid expression evaluating to 2). The boolean type bool in C++ is considered an integral type and is a cross between a numeric type and a logical type.

Floating-point numbers
A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a format equivalent to scientific notation, typically in binary but sometimes in decimal. Because floating-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Many languages have both a single-precision type (often called "float") and a double-precision type.

Literals for floating-point numbers include a decimal point, and typically use "e" or "E" to denote scientific notation. Examples of floating-point literals are: 20.0005, 99.9, 5000.12, 6.02e23

Some languages (e.g., Fortran, Python, D) also have a complex number type comprising two floating-point numbers: a real part and an imaginary part.
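A short Python sketch of the numeric literal forms discussed here: scientific notation, a built-in complex type (Python is one of the languages named above), and boolean-to-integer coercion analogous to the C++ true + true example. The values are illustrative:

```python
# Floating-point literal in scientific notation ("e" marks the exponent).
avogadro = 6.02e23

# Complex literal: a real part plus an imaginary part (suffix j).
z = 3.0 + 4.0j
print(abs(z))        # magnitude of the complex number

# Booleans participate in integer arithmetic, much like C++'s
# implicit false -> 0 / true -> 1 mapping.
print(True + True)   # evaluates to 2
```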

Fixed-point numbers
A fixed-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in a scaled-integer form, typically in binary but sometimes in decimal. Because fixed-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Fixed-point numbers also tend to have a more limited range of values than floating point, and so the programmer must be careful to avoid overflow in intermediate calculations as well as in the final results. In short, fixed-point types store rational numbers with a fixed number of decimal places.

Characters and strings
A character type (typically called "char") may contain a single letter, digit, punctuation mark, symbol, formatting code, control code, or some other specialized code (e.g., a byte order mark). Some languages have two or more character types, for example a single-byte type for ASCII characters and a multi-byte type for Unicode characters. The term "character type" is normally used even for types whose values more precisely represent code units, for example a UTF-16 code unit as in Java and JavaScript.

Characters may be combined into strings. The string data can include numbers and other numerical symbols but will be treated as text. In most languages, a string is equivalent to an array of characters or code units, but Java treats them as distinct types (java.lang.String and char[]). Other languages (such as Python, and many dialects of BASIC) have no separate character type; strings with a length of one are normally used to represent (single code unit) characters. Literals for characters and strings are usually surrounded by quotation marks: sometimes, single quotes (') are used for characters and double quotes (") are used for strings.
Examples of character literals in C syntax are: 'A', '4', '$', '\t' (tab character)

Examples of string literals in C syntax are: "A"

"Hello World"

Numeric data type ranges
Each numeric data type has a maximum and minimum value known as its range. Attempting to store a number outside the range may lead to compiler or runtime errors, or to incorrect calculations (due to truncation), depending on the language being used. The range of a variable is based on the number of bytes used to save the value; an integer data type is usually able to store 2^n values (where n is the number of bits that contribute to the value). For other data types (e.g., floating-point values) the range is more complicated and will vary depending on the method used to store the value.

There are also some types that do not use entire bytes, e.g., a boolean that requires a single bit and represents a binary value (although in practice a byte is often used, with the remaining 7 bits being redundant). Some programming languages (such as Ada and Pascal) also allow the opposite direction; that is, the programmer defines the range and precision needed to solve a given problem, and the compiler chooses the most appropriate integer or floating-point type automatically.

Structured Data Types

Structured data types are mechanisms for creating complex data objects: array, file, record, set, and string types.

Largely known as Record, Struct or Structure in most programming languages, it is the GeneXus object which allows defining complex data structures. An SDT represents data whose structure is made up of several elements like a Customer struct. The SDT makes it easy to transfer parameters (more specifically, they allow providing/using structured information when using web services), it simplifies XML automatic reading and writing and makes it possible to manage variable-length lists of elements. User-Defined Data Types

Holds data in a format you define. The Structure statement defines the format. Previous versions of Visual Basic support the user-defined type (UDT). The current version expands the UDT to a structure. A structure is a concatenation of one or more members of various data types. Visual Basic treats a structure as a single unit, although you can also access its members individually. Define and use a structure data type when you need to combine various data types into a single unit, or when none of the elementary data types serve your needs. The default value of a structure data type consists of the combination of the default values of each of its members. User-Defined Data Types

A user-defined type must always have input and output functions. These functions determine how the type appears in strings (for input by the user and output to the user) and how the type is organized in memory. The input function takes a null-terminated character string as its argument and returns the internal (in memory) representation of the type. The output function takes the internal representation of the type as argument and returns a null-terminated character string. If we want to do anything more with the type than merely store it, we must provide additional functions to implement whatever operations we'd like to have for the type. Declaration Format A structure declaration starts with the Structure Statement and ends with the End Structure statement. The Structure statement supplies the name of the structure, which is also the identifier of the data type the structure is defining. Other parts of the code can use this identifier to declare variables, parameters, and function return values to be of this structure's data type. The declarations between the Structure and End Structure statements define the members of the structure. Multi Dimension Arrays

In computer science, an array data structure (or simply array) is a data structure consisting of a collection of elements (values or variables), each identified by at least one index. An array is stored so that the position of each element can be computed from its index tuple by a mathematical formula. For example, an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, ..., 2036, so that the element with index i has the address 2000 + 4 x i.

Arrays are analogous to the mathematical concepts of the vector, the matrix, and the tensor. Indeed, arrays with one or two indices are often called vectors or matrices, respectively. Arrays are often used to implement tables, especially lookup tables; the word table is sometimes used as a synonym of array. Arrays are among the oldest and most important data structures; they are used by almost every program and are used to implement many other data structures, such as lists and strings.

Arrays effectively exploit the addressing logic of computers. In most modern computers and many external storage devices, the memory is a one-dimensional array of words, whose indices are their addresses. Processors, especially vector processors, are often optimized for array operations. Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation. The set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually, but not always, fixed while the array is in use.

The term array is often used to mean array data type, a kind of data type provided by most high-level programming languages that consists of a collection of values or variables that can be selected by one or more indices computed at run-time. Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures. Multi Dimension Arrays The term is also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays. One-dimensional arrays A one-dimensional array (or single dimension array) is a type of linear array. Accessing its elements involves a single subscript which can either represent a row or column index.

As an example, consider the C declaration auto int new[10]; Here the declaration starts with the auto storage class and declares an integer array named new that can contain 10 elements, indexed 0-9. It is not necessary to declare the storage class, as the compiler initializes the auto storage class by default for every data type. After the storage class, the data type is declared, followed by the name, new, which can contain 10 entities. For a vector with linear addressing, the element with index i is located at the address B + c x i, where B is a fixed base address and c a fixed constant, sometimes called the address increment or stride.

Records
In computer science, a record is an instance of a product of primitive data types, called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members. For example, a date may be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. As another example, a Personnel record might contain a name, a salary, and a rank; a Circle record might contain a center and a radius, where the center itself might be represented as a Point record containing x and y coordinates. Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and each field may have a different type.

A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed.
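The date record described above can be sketched as a Python dataclass, one plausible realization of a record type (the field names follow the example in the text):

```python
from dataclasses import dataclass

# A record: a fixed number of named fields, each possibly of a
# different type, per the date example in the text.
@dataclass
class Date:
    year: int      # numeric year field
    month: str     # month field represented as a string
    day: int       # numeric day-of-month field

d = Date(year=2024, month="June", day=15)
print(d.month)     # fields are accessed by name, not by position
```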
In type theory, product types (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function-typed fields in addition to data, they can express many features of object-oriented programming.

Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency.

Pointers
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. For high-level programming languages, pointers effectively take the place of the general-purpose registers used in low-level languages such as assembly language or machine code, but they refer to locations in available memory. A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer.

A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others. As an analogy, a page number in a book could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number.

Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables. Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables, and tree structures.
In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.

Chapter XI Program Syntax

Topics
1. Language
   - Lexemes and tokens
   - Recognition
   - Generation

Syntax: the form of a language's expressions, statements, and program units
Language: a set of strings of characters from some alphabet
Syntax rules: specify which strings of characters from the language's alphabet are in the language
Sentence: a string of a language

Programming language syntax describes how programs look: their form and structure. Syntax is defined using a kind of formal grammar.

Programming language semantics: what programs do, their behavior and meaning.

In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form. Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).

The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category. Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.

Lexeme
A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and, in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. Lexemes are categorized into different tokens and processed by the lexical analyzer.

Ex:

if (width < height)

{ cout << width << endl; }

Lexemes: if, (, width, <, height, ), {, cout, <<, width, <<, endl, ;, }
Tokens: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace

Note that some tokens correspond to single lexemes (e.g., iftok) whereas some correspond to many (e.g., idtok).

Tokens
Tokens are the basic building blocks of a programming language: keywords, identifiers, numbers, punctuation. The first compiler phase (scanning) splits up the character stream into tokens.
- Free-format language: the program is a sequence of tokens, and the position of tokens on the page is unimportant
- Fixed-format language: indentation and/or position of tokens on the page is significant (early Basic, Fortran, Haskell)
- Case-sensitive language: upper- and lowercase are distinct (C, C++, Java)
- Case-insensitive language: upper- and lowercase are identical (Ada, Fortran, Pascal)
Tokens are described by regular expressions.

Language Recognizer
An algorithm or mechanism, R, will process any given string, S, of lexemes and correctly determine if S is within L or not. Language recognizers accept a language. Recognizers are machines: the machine takes a string as input and accepts the input if, when run, the machine stops at an accept state; otherwise the input is rejected. If a machine M recognizes all strings in language L and accepts the input provided by a given string S, M is said to accept S; otherwise M is said to reject S. S is in L if and only if M accepts S.

Generator
In computer science, a generator is a special routine that can be used to control the iteration behaviour of a loop. A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator. Generators can be implemented in terms of more expressive control-flow constructs, such as coroutines or first-class continuations.

Generation
- Produces valid sentences of L
- Not as useful as recognition for compilation, since the sentences produced could be arbitrary
- More useful in understanding language syntax, since it shows how the sentences are formed
- A recognizer only says whether a sentence is valid or not; using one to explore syntax is more of a trial-and-error technique

So recognizers are what compilers need, but generators are what programmers need in order to understand a language.
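The scanning phase described above can be sketched with a small regex-based scanner. This is a minimal illustration, not a real compiler front end: the token names (iftok, idtok, llt, and so on) follow the example above, and the regular expressions are assumptions chosen to reproduce that token stream.

```python
import re

# A minimal scanner sketch: split a character stream into (token, lexeme)
# pairs, using the token names from the example above (iftok, idtok, ...).
TOKEN_SPEC = [
    ("iftok",  r"\bif\b"),
    ("idtok",  r"[A-Za-z_]\w*"),   # identifiers such as width, cout, endl
    ("llt",    r"<<"),             # must be tried before "<"
    ("lt",     r"<"),
    ("lpar",   r"\("),
    ("rpar",   r"\)"),
    ("lbrace", r"\{"),
    ("rbrace", r"\}"),
    ("semi",   r";"),
    ("skip",   r"\s+"),            # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return (token, lexeme) pairs for the given source string."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(source)
            if m.lastgroup != "skip"]

tokens = scan("if (width < height) { cout << width << endl; }")
print([t for t, _ in tokens])
```

Run on the C++ snippet above, the scanner yields exactly the token sequence listed there: iftok, lpar, idtok, lt, idtok, rpar, lbrace, idtok, llt, idtok, llt, idtok, semi, rbrace.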

Chapter XII Language Generators

Topics
1. Grammar
2. Regular Grammars/Expressions
3. BNF (Backus-Naur Form) grammars

Grammar
A mechanism (or set of rules) by which a language is generated. Grammars are formal descriptions of which strings over a given character set are in a particular language. Language designers write the grammar; language implementers use the grammar to know what programs to accept; language users use the grammar to know how to write legitimate programs.

Context-Free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of one or more nonterminal and terminal symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet. Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.

The Lexical Grammar
This grammar has as its terminal symbols the characters of the Unicode character set. It defines a set of productions, starting from the goal symbol Input, that describe how sequences of Unicode characters are translated into a sequence of input elements. These input elements, with white space and comments discarded, form the terminal symbols for the syntactic grammar for the Java programming language and are called tokens. These tokens are the identifiers, keywords, literals, separators, and operators of the Java programming language.

The Syntactic Grammar
This grammar has tokens defined by the lexical grammar as its terminal symbols. It defines a set of productions, starting from the goal symbol CompilationUnit, that describe how sequences of tokens can form syntactically correct programs.

Grammar Notation
Terminal symbols are shown in fixed width font in the productions of the lexical and syntactic grammars, and throughout this specification whenever the text is directly referring to such a terminal symbol. These are to appear in a program exactly as written. Nonterminal symbols are shown in italic type. The definition of a nonterminal is introduced by the name of the nonterminal being defined followed by a colon. One or more alternative right-hand sides for the nonterminal then follow on succeeding lines.

Grammar Notation
For example, the syntactic definition:

IfThenStatement:
    if ( Expression ) Statement

states that the nonterminal IfThenStatement represents the token if, followed by a left parenthesis token, followed by an Expression, followed by a right parenthesis token, followed by a Statement.

Grammar Notation
As another example, the syntactic definition:

ArgumentList:
    Argument
    ArgumentList , Argument

states that an ArgumentList may represent either a single Argument or an ArgumentList, followed by a comma, followed by an Argument. This definition of ArgumentList is recursive; that is to say, it is defined in terms of itself. The result is that an ArgumentList may contain any positive number of arguments. Such recursive definitions of nonterminals are common.

Regular Grammar
A regular grammar is a context-free grammar where every production is of one of the two forms:
X → aY
X → a

Backus-Naur Form (BNF) Grammar
Backus-Naur Form (BNF) is a way of writing a grammar to define a language. A BNF grammar uses some symbols, specifically ::=, ⟨ and ⟩. These are metasymbols. It is crucial that you realise that these are part of the metalanguage; they are not part of the object language.
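A recursive BNF rule such as the ArgumentList definition above can be recognized with a short hand-written routine. This is an illustrative sketch (the token-list representation and the identifier-only Argument rule are assumptions made here), showing how the left-recursive production is handled with a loop:

```python
# A sketch of recognizing the recursive BNF rule
#   ArgumentList ::= Argument | ArgumentList , Argument
# The left recursion is rewritten as a loop: one Argument followed by
# zero or more ", Argument" pairs. Arguments are single identifiers
# here, purely for illustration.
def is_argument_list(tokens):
    """Return True iff `tokens` (a list of strings) forms an ArgumentList."""
    if not tokens or not tokens[0].isidentifier():
        return False
    i = 1
    while i < len(tokens):          # expect repeated ", Argument"
        if tokens[i] != "," or i + 1 >= len(tokens) or not tokens[i + 1].isidentifier():
            return False
        i += 2
    return True

print(is_argument_list(["x"]))                      # single Argument
print(is_argument_list(["x", ",", "y", ",", "z"]))  # three Arguments
print(is_argument_list(["x", ","]))                 # trailing comma is invalid
```

As the rule promises, any positive number of arguments is accepted, while an empty list or a dangling comma is rejected.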

Chapter XIII Semantics

Topics
1. Static Semantics (typing rules)
2. Dynamic Semantics (execution rules)
3. Attributes

Semantics

Semantics (from Greek sēmantiká, neuter plural of sēmantikós) is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata. The word "semantics" itself denotes a range of ideas, from the popular to the highly technical. It is often used in ordinary language to denote a problem of understanding that comes down to word selection or connotation. Semantics indicate the meaning of a program. There are two different kinds of semantics.

Static Semantics
1. Almost an extension of program syntax
2. Deals with structure more than meaning, but at a meta level
3. Handles structural details that are difficult or impossible to handle with the parser

Static Semantics
The static semantics, or type system, imposes context-sensitive restrictions on the formation of expressions. For example, plus(x; num[n]) is sensible exactly if x has type num in the surrounding context; in fact, this is the only relevant kind of contextual information for static semantics. It distinguishes well-typed from ill-typed expressions.

Dynamic Semantics (often just called semantics)
1. What does the syntax mean?
2. Ex: Control statements
3. Ex: Parameter passing
4. The programmer needs to know the meaning of statements before he or she can use the language effectively
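As a sketch of what a static-semantics (type) checker does, the toy checker below assigns each expression a type before any evaluation happens and rejects an ill-typed plus, mirroring the plus(x; num[n]) example above. The expression encoding and the two types here are invented purely for illustration:

```python
# A toy static-semantics check: every subexpression gets a type, and an
# operation such as "plus" is sensible only when both operands have type
# num. The tuple form ("plus", e1, e2) and the 'num'/'str' types are a
# made-up encoding for this sketch.
def type_of(expr, context):
    """Return the type of `expr` ('num' or 'str') or raise TypeError."""
    if isinstance(expr, (int, float)):
        return "num"
    if isinstance(expr, str) and expr.startswith('"'):
        return "str"                              # a quoted string literal
    if isinstance(expr, str):
        return context[expr]                      # a variable: look it up
    op, left, right = expr                        # e.g. ("plus", "x", 3)
    if op == "plus":
        if type_of(left, context) == type_of(right, context) == "num":
            return "num"
        raise TypeError("plus applied to non-numeric operand")

print(type_of(("plus", "x", 3), {"x": "num"}))    # well-typed: num
try:
    type_of(("plus", "x", 3), {"x": "str"})       # ill-typed
except TypeError as e:
    print("rejected:", e)
```

Note that the check uses only contextual type information; no expression is ever evaluated, which is exactly what makes it static.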

Attributes
In computing, an attribute is a specification that defines a property of an object, element, or file. It may also refer to or set the specific value for a given instance of such. However, in actual usage, the term attribute can be, and often is, treated as equivalent to a property, depending on the technology being discussed. For clarity, attributes should more correctly be considered metadata. An attribute is frequently and generally a property of a property.

Chapter XIV Expressions

Topics
1. Precedence and Associativity
2. Operator Overloading
3. Coercion and conversion (Explicit and Implicit)
4. Boolean expressions
5. Short-Circuit Evaluation
6. Assignment

Expressions
An expression in a programming language is a combination of explicit values, constants, variables, operators, and functions that are interpreted according to the particular rules of precedence and of association for a particular programming language, and that computes and then produces (returns, in a stateful environment) another value. This process, as for mathematical expressions, is called evaluation. The value can be of various types, such as numerical, string, and logical.

Expressions
For example, 2+3 is an arithmetic and programming expression which evaluates to 5. A variable is an expression because it is a pointer to a value in memory, so y+6 is an expression. An example of a relational expression is 4==4, which evaluates to true. Expressions are vital to programs: they allow the programmer to specify the calculations that the computer is to perform, so it is important that the programmer understand how a language evaluates expressions.

Precedence and Associativity
The data type and the value of an expression depend on the data types of the operands and on the order of evaluation of operators, which is determined by the precedence and associativity of operators. Consider first the order of evaluation. When an expression contains more than one operator, the order in which the operators are evaluated depends on their precedence levels: a higher precedence operator is evaluated before a lower precedence operator. If the precedence levels of operators are the same, then the order of evaluation depends on their associativity (or grouping).

Operator Overloading
In object-oriented computer programming, operator overloading, less commonly known as operator ad hoc polymorphism, is a specific case of polymorphism, where different operators have different implementations depending on their arguments. Operator overloading is generally defined by the language, the programmer, or both.

Operator overloading is claimed to be useful because it allows the developer to program using notation "closer to the target domain"[1] and allows user-defined types a similar level of syntactic support as types built into the language. It can easily be emulated using function calls.

Operator Overloading
For an example, consider the integers a, b, c:

a + b * c

In a language that supports operator overloading, and assuming the '*' operator has higher precedence than '+', this is effectively a more concise way of writing:

add(a, multiply(b, c))
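In Python, for instance, operator overloading is done by defining special methods; the Vec class below (a hypothetical example, not from the text) gives + and * their own implementations so vector code reads closer to the target domain than nested add/multiply calls would:

```python
# User-defined operator overloading: Vec is an invented example type
# whose __add__ and __mul__ methods supply the implementations that
# '+' and '*' dispatch to.
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):            # overloads v + w
        return Vec(self.x + other.x, self.y + other.y)

    def __mul__(self, k):                # overloads v * scalar
        return Vec(self.x * k, self.y * k)

    def __repr__(self):
        return f"Vec({self.x}, {self.y})"

a, b = Vec(1, 2), Vec(3, 4)
print(a + b * 2)   # '*' binds tighter than '+', so this is a + (b * 2)
```

The usual precedence still applies to the overloaded operators, so the expression groups exactly as the a + b * c example above describes.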

Explicit conversion
In this case the language allows little or no mixing of types in expressions. To mix data types, the programmer must convert explicitly through an operator or function call. Ex: Ada does not even allow mixing of floats and integers.

Good:
Everything is clear; there is no uncertainty or ambiguity. The programmer can more easily verify the correctness of programs, and it is easier to avoid logic errors.

Bad:
Makes the language very wordy. Can be annoying, especially when the types are similar (ex. addition of integers and floats).

Explicit Parallelism
In computer programming, explicit parallelism is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls. Most parallel primitives are related to process synchronization, communication, or task partitioning. As they seldom contribute to actually carrying out the intended computation of the program, their computational cost is often counted as parallelization overhead. The advantage of explicit parallel programming is the absolute programmer control over the parallel execution: a skilled parallel programmer takes advantage of explicit parallelism to produce very efficient code.

Implicit conversion (coercion)

In this case mixed expressions are allowed, and the language coerces types where needed to allow types to match. Usually a language has some rules by which the coercions are performed.

Good:
Less wordy; makes programs shorter and sometimes easier to write.

Bad:
Programs are harder to verify for correctness. It is not always clear which coercion is being done, especially when programmer-defined coercions are allowed. Coercion can lead to logic errors in programs. Ex: In C++, expressions are always coerced if they can be. The standard rules of promotion for predefined types can be easily remembered; however, the programmer can also define functions that will be used for coercion. Constructors for classes and conversion functions are both implicitly called if necessary. Now the rules are less clear and can lead to ambiguity and logic errors.
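Both policies can be seen side by side in Python, which coerces between numeric types implicitly but refuses to mix strings and numbers, forcing an explicit conversion instead. A small illustration:

```python
# Implicit conversion: int is coerced to float so the addition can
# proceed without any annotation from the programmer.
n = 2 + 3.5
print(n, type(n).__name__)       # 5.5 float

# No implicit coercion between str and int: the expression is rejected.
try:
    s = "total: " + 7
except TypeError as e:
    print("rejected:", e)

# Explicit conversion makes the programmer's intent unambiguous.
s = "total: " + str(7)
print(s)                         # total: 7
```

The numeric case shows the convenience of coercion; the string case shows the explicit style's advantage, since the rejected expression could plausibly have meant either "total: 7" or an error.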

Implicit Parallelism
In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's constructs. A pure implicitly parallel language does not need special directives, operators or functions to enable parallel execution. If a particular problem involves performing the same operation on a group of numbers (such as taking the sine or logarithm of each in turn), a language that provides implicit parallelism might allow the programmer to write the instruction thus:

numbers = [0 1 2 3 4 5 6 7];
result = sin(numbers);

Implicit Parallelism Advantages
A programmer who writes implicitly parallel code does not need to worry about task division or process communication, focusing instead on the problem that his or her program is intended to solve. Implicit parallelism generally facilitates the design of parallel programs and therefore results in a substantial improvement of programmer productivity.

Implicit Parallelism Disadvantages
Languages with implicit parallelism reduce the control that the programmer has over the parallel execution of the program, sometimes resulting in less-than-optimal parallel efficiency. The makers of the Oz programming language also note that their early experiments with implicit parallelism showed that implicit parallelism made debugging difficult and object models unnecessarily awkward.

Boolean Expression
In computer science, a Boolean expression is an expression in a programming language that produces a Boolean value when evaluated, i.e. one of true or false. A Boolean expression may be composed of a combination of the Boolean constants true or false, Boolean-typed variables, Boolean-valued operators, and Boolean-valued functions.

Boolean operators
Programmers will often use a pipe symbol (|) for OR, an ampersand (&) for AND, and a tilde (~) for NOT. In many programming languages, these symbols stand for bitwise operations; "||", "&&", and "!" are used for variants of these operations. Other symbols are a + for OR, a · for AND, and an overscore for NOT. A Boolean expression is an expression that evaluates to a value of the Boolean data type: True or False.

Examples
The value of (5 > 3) is evaluated as true. (5>=3) and (3<=5) are equivalent Boolean expressions (both of which would be evaluated as true). Of course, most Boolean expressions will contain at least one variable (X > 3), and often more (X > Y).

Short Circuit Evaluation
The relational operations AND and OR are evaluated from left to right. However, as soon as the value is known, evaluation of the expression stops and the value is returned. As a result, not all operands of the expression need to be evaluated: for operation AND, if the first operand is false, then the second operand is not evaluated; likewise, for operation OR, if the first operand is true, the second operand is not evaluated.

Short Circuit Evaluation
Short circuit evaluation refers to the condition where an expression is no longer evaluated because further evaluation cannot change the value of the expression. For example, consider this code:

if ( (1 == 0) && foobar() ) { /* do something */ }

In this scenario, foobar() is never called. Why? It does not matter whether or not foobar() returns nonzero, since any expression "false AND x" always evaluates to false, no matter what the value of x. This can be a problem in cases where foobar() is expected to change some variables.
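Short-circuit evaluation can be observed directly in Python, whose `and`/`or` behave like C's `&&`/`||`. The foobar stand-in below (an invented example) records a side effect so the skipped call is visible:

```python
# Observing short-circuit evaluation: `called` records whether foobar
# (a stand-in for the function in the example above) ever ran.
called = []

def foobar():
    called.append(True)   # visible side effect
    return True

result = (1 == 0) and foobar()   # left side is false: foobar is skipped
print(result, called)            # False []

result = (1 == 1) or foobar()    # left side is true: foobar is skipped
print(result, called)            # True []

result = (1 == 1) and foobar()   # now the right side must be evaluated
print(result, called)            # True [True]
```

Only the last expression actually calls foobar, which is exactly the pitfall the text warns about when the skipped operand was expected to change some variables.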

Short Circuit Evaluation
Also consider another example:

if ( (1 == 1) || foobar() ) { /* do something */ }

In this scenario, foobar() is never called: since "1 == 1" is always true, the other side of the OR expression need not be evaluated for the if body to be executed. You can think of short circuit evaluation as both an optimization made by the compiler and a documented side effect.

Assignment Statement
In computer programming, an assignment statement sets or re-sets the value stored in the storage location(s) denoted by a variable name. In most imperative computer programming languages, assignment statements are one of the basic statements. Common notations for the assignment operator are = and :=. Assignment statements typically allow the same variable name to contain different values at different times during program execution. Thus a language with assignments does not have referential transparency, which requires a procedure to return the same results for a given set of inputs at any point in time.

Single assignment
In pure functional programming, destructive assignment is not allowed because of side effects. Any assignment that changes an existing value (e.g. x := x + 1) is disallowed in purely functional languages. In functional programming, assignment is discouraged in favor of single assignment, also called name binding or initialization. Single assignment differs from assignment as described in this article in that it can only be made once, usually when the variable is created; no subsequent reassignment is allowed. Once created by single assignment, named values are not variables but immutable objects.

Value of an assignment
In some programming languages, an assignment statement returns a value, while in others it does not. In most expression-oriented programming languages (for example, C), the assignment statement returns the assigned value, allowing such idioms as x = y = a, in which the assignment statement y = a returns the value of a, which is then assigned to x.
In a statement such as while (f = read()) { }, the return value of a function is used to control a loop while assigning that same value to a variable.

In other programming languages, Scheme for example, the return value of an assignment is undefined and such idioms are invalid.

Chained assignment
A statement like w = x = y = z is called a chained assignment, in which the value of z is assigned to multiple variables w, x, and y. Chained assignments are often used to initialize multiple variables, as in

a = b = c = d = f = 0

Not all programming languages support chained assignment. In some programming languages (C for example), chained assignments are supported because assignments return values.
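Chained assignment of this kind can be tried directly in Python, where the chain is a single statement that binds every left-hand name to the same value (Python evaluates the right-hand side once, then assigns left to right):

```python
# Chained assignment as described above: every name on the left ends up
# bound to the value of z.
z = 42
w = x = y = z
print(w, x, y)           # 42 42 42

# The multiple-initialization idiom from the text.
a = b = c = d = f = 0
print(a + b + c + d + f) # 0
```

Note the contrast with C: in Python the chain is a dedicated statement form, whereas in C it works because each assignment is an expression returning the assigned value.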

Chapter XV Control Statements

Topics
1. Selection
2. Iteration

Sixteen Control Statements:
1. Unconditional GO TO
2. Computed GO TO
3. Assigned GO TO
4. Arithmetic IF
5. Logical IF
6. Block IF
7. ELSE IF
8. ELSE

Unconditional GO TO Statement
The form of an unconditional GO TO statement is:

GO TO s

where s is the statement label of an executable statement that appears in the same program unit as the unconditional GO TO statement. Execution of an unconditional GO TO statement causes a transfer of control so that the statement identified by the statement label is executed next.

Computed GO TO Statement
The form of a computed GO TO statement is:

GO TO (s [,s]...) i

where:
i is an integer expression
s is the statement label of an executable statement that appears in the same program unit as the computed GO TO statement. The same statement label may appear more than once in the same computed GO TO statement.

Computed GO TO Statement
Execution of a computed GO TO statement causes evaluation of the expression i. The evaluation of i is followed by a transfer of control so that the statement identified by the ith statement label in the list of statement labels is executed next, provided that 1 <= i <= n, where n is the number of statement labels in the list of statement labels. If i < 1 or i > n, the execution sequence continues as though a CONTINUE statement were executed.

Assigned GO TO Statement
The form of an assigned GO TO statement is:

GO TO i [ (s [,s]...) ]

where:
i is an integer variable name
s is the statement label of an executable statement that appears in the same program unit as the assigned GO TO statement. The same statement label may appear more than once in the same assigned GO TO statement.

Assigned GO TO Statement
At the time of execution of an assigned GO TO statement, the variable i must be defined with the value of a statement label of an executable statement that appears in the same program unit. Note that the variable may be defined with a statement label value only by an ASSIGN statement in the same program unit as the assigned GO TO statement. The execution of the assigned GO TO statement causes a transfer of control so that the statement identified by that statement label is executed next. If the parenthesized list is present, the statement label assigned to i must be one of the statement labels in the list.

Arithmetic IF Statement
The form of an arithmetic IF statement is:

IF (e) s1, s2, s3

where:
e is an integer, real, or double precision expression
s1, s2, and s3 are each the statement label of an executable statement that appears in the same program unit as the arithmetic IF statement. The same statement label may appear more than once in the same arithmetic IF statement.

Arithmetic IF Statement
Execution of an arithmetic IF statement causes evaluation of the expression e followed by a transfer of control. The statement identified by s1, s2, or s3 is executed next as the value of e is less than zero, equal to zero, or greater than zero, respectively.

Logical IF Statement
The form of a logical IF statement is:

IF (e) st

where:
e is a logical expression
st is any executable statement except a DO, block IF, ELSE IF, ELSE, END IF, END, or another logical IF statement

Logical IF Statement
Execution of a logical IF statement causes evaluation of the expression e. If the value of e is true, statement st is executed. If the value of e is false, statement st is not executed and the execution sequence continues as though a CONTINUE statement were executed. Note that the execution of a function reference in the expression e of a logical IF statement is permitted to affect entities in the statement st.

Block IF Statement
The block IF statement is used with the END IF statement and, optionally, the ELSE IF and ELSE statements to control the execution sequence. The form of a block IF statement is:

IF (e) THEN

where e is a logical expression.

Block IF Statement

IF-Level
The IF-level of a statement s is n1 - n2, where n1 is the number of block IF statements from the beginning of the program unit up to and including s, and n2 is the number of END IF statements in the program unit up to but not including s. The IF-level of every statement must be zero or positive. The IF-level of each block IF, ELSE IF, ELSE, and END IF statement must be positive. The IF-level of the END statement of each program unit must be zero.

Block IF Statement
IF-Block
An IF-block consists of all of the executable statements that appear following the block IF statement up to, but not including, the next ELSE IF, ELSE, or END IF statement that has the same IF-level as the block IF statement. An IF-block may be empty.

Block IF Statement

Execution of a Block IF Statement
Execution of a block IF statement causes evaluation of the expression e. If the value of e is true, the normal execution sequence continues with the first statement of the IF-block. If the value of e is true and the IF-block is empty, control is transferred to the next END IF statement that has the same IF-level as the block IF statement. If the value of e is false, control is transferred to the next ELSE IF, ELSE, or END IF statement that has the same IF-level as the block IF statement. Transfer of control into an IF-block from outside the IF-block is prohibited. If the execution of the last statement in the IF-block does not result in a transfer of control, control is transferred to the next END IF statement that has the same IF-level as the block IF statement that precedes the IF-block.

ELSE IF Statement
The form of an ELSE IF statement is:

ELSE IF (e) THEN

where e is a logical expression.

ELSE IF Statement
ELSE IF-Block
An ELSE IF-block consists of all of the executable statements that appear following the ELSE IF statement up to, but not including, the next ELSE IF, ELSE, or END IF statement that has the same IF-level as the ELSE IF statement. An ELSE IF-block may be empty.

ELSE IF Statement
Execution of an ELSE IF Statement
Execution of an ELSE IF statement causes evaluation of the expression e. If the value of e is true, the normal execution sequence continues with the first statement of the ELSE IF-block. If the value of e is true and the ELSE IF-block is empty, control is transferred to the next END IF statement that has the same IF-level as the ELSE IF statement. If the value of e is false, control is transferred to the next ELSE IF, ELSE, or END IF statement that has the same IF-level as the ELSE IF statement.

ELSE Statement
The form of an ELSE statement is:

ELSE

ELSE-Block
An ELSE-block consists of all of the executable statements that appear following the ELSE statement up to, but not including, the next END IF statement that has the same IF-level as the ELSE statement. An ELSE-block may be empty. An END IF statement of the same IF-level as the ELSE statement must appear before the appearance of an ELSE IF or ELSE statement of the same IF-level.

ELSE Statement
Execution of an ELSE Statement
Execution of an ELSE statement has no effect. Transfer of control into an ELSE-block from outside the ELSE-block is prohibited. The statement label, if any, of an ELSE statement must not be referenced by any statement.

END IF Statement
The form of an END IF statement is:

END IF

Execution of an END IF statement has no effect. For each block IF statement there must be a corresponding END IF statement in the same program unit. A corresponding END IF statement is the next END IF statement that has the same IF-level as the block IF statement.

DO Statement
A DO statement is used to specify a loop, called a DO-loop. The form of a DO statement is:

DO s i = e1, e2 [,e3]

where:
s is the statement label of an executable statement. The statement identified by s, called the terminal statement of the DO-loop, must follow the DO statement in the sequence of statements within the same program unit as the DO statement.
i is the name of an integer, real, or double precision variable, called the DO-variable
e1, e2, and e3 are each an integer, real, or double precision expression

DO Statement
The terminal statement of a DO-loop must not be an unconditional GO TO, assigned GO TO, arithmetic IF, block IF, ELSE IF, ELSE, END IF, RETURN, STOP, END, or DO statement. If the terminal statement of a DO-loop is a logical IF statement, it may contain any executable statement except a DO, block IF, ELSE IF, ELSE, END IF, END, or another logical IF statement.

DO Statement
Range of a DO-Loop
The range of a DO-loop consists of all of the executable statements that appear following the DO statement that specifies the DO-loop, up to and including the terminal statement of the DO-loop. If a DO statement appears within the range of a DO-loop, the range of the DO-loop specified by that DO statement must be contained entirely within the range of the outer DO-loop. More than one DO-loop may have the same terminal statement.

DO Statement
Range of a DO-Loop
If a DO statement appears within an IF-block, ELSE IF-block, or ELSE-block, the range of that DO-loop must be contained entirely within that IF-block, ELSE IF-block, or ELSE-block, respectively. If a block IF statement appears within the range of a DO-loop, the corresponding END IF statement must also appear within the range of that DO-loop.

DO Statement
Active and Inactive DO-Loops
A DO-loop is either active or inactive. Initially inactive, a DO-loop becomes active only when its DO statement is executed. Once active, the DO-loop becomes inactive only when:
its iteration count is tested and determined to be zero,
a RETURN statement is executed within its range,
control is transferred to a statement that is in the same program unit and is outside the range of the DO-loop, or
any STOP statement in the executable program is executed, or execution is terminated for any other reason.

DO Statement
Active and Inactive DO-Loops
Execution of a function reference or CALL statement that appears in the range of a DO-loop does not cause the DO-loop to become inactive, except when control is returned by means of an alternate return specifier in a CALL statement to a statement that is not in the range of the DO-loop. When a DO-loop becomes inactive, the DO-variable of the DO-loop retains its last defined value.

DO Statement
Executing a DO Statement
The effect of executing a DO statement is to perform the following steps in sequence:
The initial parameter m1, the terminal parameter m2, and the incrementation parameter m3 are established by evaluating e1, e2, and e3, respectively, including, if necessary, conversion to the type of the DO-variable according to the rules for arithmetic conversion (Table 4). If e3 does not appear, m3 has a value of one. m3 must not have a value of zero.
The DO-variable becomes defined with the value of the initial parameter m1.

DO Statement
The iteration count is established and is the value of the expression

MAX( INT( (m2 - m1 + m3)/m3 ), 0)

Note that the iteration count is zero whenever:
m1 > m2 and m3 > 0, or
m1 < m2 and m3 < 0.
At the completion of execution of the DO statement, loop control processing begins.

DO Statement
Loop Control Processing
Loop control processing determines if further execution of the range of the DO-loop is required. The iteration count is tested. If it is not zero, execution of the first statement in the range of the DO-loop begins. If the iteration count is zero, the DO-loop becomes inactive. If, as a result, all of the DO-loops sharing the terminal statement of this DO-loop are inactive, normal execution continues with execution of the next executable statement following the terminal statement. However, if some of the DO-loops sharing the terminal statement are active, execution continues with incrementation processing.

DO Statement
Execution of the Range
Statements in the range of a DO-loop are executed until the terminal statement is reached. The DO-variable of the DO-loop may neither be redefined nor become undefined during execution of the range of the DO-loop.

DO Statement
Terminal Statement Execution
Execution of the terminal statement occurs as a result of the normal execution sequence or as a result of transfer of control, subject to the restrictions. Unless execution of the terminal statement results in a transfer of control, execution then continues with incrementation processing.

DO Statement
Incrementation Processing
Incrementation processing has the effect of the following steps performed in sequence:
The DO-variable, the iteration count, and the incrementation parameter of the active DO-loop whose DO statement was most recently executed are selected for processing.
The value of the DO-variable is incremented by the value of the incrementation parameter m3.
The iteration count is decremented by one.
Execution continues with loop control processing of the same DO-loop whose iteration count was decremented.

CONTINUE Statement
The form of a CONTINUE statement is:

CONTINUE

Execution of a CONTINUE statement has no effect. If the CONTINUE statement is the terminal statement of a DO-loop, the next statement executed depends on the result of the DO-loop incrementation processing.

STOP Statement
The form of a STOP statement is:

STOP [n]

where n is a string of not more than five digits, or is a character constant. Execution of a STOP statement causes termination of execution of the executable program. At the time of termination, the digit string or character constant is accessible.

PAUSE Statement
The form of a PAUSE statement is:

PAUSE [n]

where n is a string of not more than five digits, or is a character constant. Execution of a PAUSE statement causes a cessation of execution of the executable program. Execution must be resumable. At the time of cessation of execution, the digit string or character constant is accessible. Resumption of execution is not under control of the program. If execution is resumed, the execution sequence continues as though a CONTINUE statement were executed.

END Statement
The END statement indicates the end of the sequence of statements and comment lines of a program unit. If executed in a function or subroutine subprogram, it has the effect of a RETURN statement. If executed in a main program, it terminates the execution of the executable program. The form of an END statement is:

END
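Returning to the DO statement: its iteration-count rule, MAX(INT((m2 - m1 + m3)/m3), 0), together with the increment/decrement steps, can be modeled in a few lines. This is a behavioral sketch of the semantics described above, not Fortran itself:

```python
# A model of the DO-loop semantics: the iteration count is
# MAX(INT((m2 - m1 + m3)/m3), 0), the DO-variable starts at m1, and each
# trip adds m3 and decrements the count. int() truncates toward zero,
# matching Fortran's INT.
def do_loop_values(e1, e2, e3=1):
    """Return the sequence of DO-variable values for DO s i = e1, e2, e3."""
    m1, m2, m3 = e1, e2, e3
    count = max(int((m2 - m1 + m3) / m3), 0)   # iteration count
    values, i = [], m1
    while count > 0:                            # loop control processing
        values.append(i)                        # execute the range
        i += m3                                 # incrementation processing
        count -= 1
    return values

print(do_loop_values(1, 10))        # 1 through 10
print(do_loop_values(10, 1, -2))    # 10, 8, 6, 4, 2
print(do_loop_values(5, 1))         # iteration count zero: no trips
```

The last call illustrates the zero-trip rule above: with m1 > m2 and m3 > 0, the iteration count is zero and the range is never executed.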

An END statement is written only in columns 7 through 72 of an initial line. An END statement must not be continued. No other statement in a program unit may have an initial line that appears to be an END statement. The last line of every program unit must be an END statement.

Programming Language Selection
Why language selection?
There is no universally superior language. Language selection is the fitting of a language's strengths and weaknesses to a context. Language selection often has long-term implications, including those of business capability, cost and technology lock-in. It is therefore a technology + management decision.

Dimensions of Selection
Capability: what the language can and cannot do.
Productivity: how efficiently one can write programs using the language.

Ramp Up: how easily you can get up and running.
Extraneous Factors
Costs: what are the costs of using the language?

Questions to be answered
Can the language deliver on expectations?
What is the cost of delivering on expectations?
How long does it take to write and debug code?
If I don't already have the skill sets, what is the cost and time required to build them?
What is the support structure available from community and corporate groups?
What are the hardware and deployment costs?

Capability
Style
Object Orientation
Function Orientation / Higher Order Functions

Typing
Static
Dynamic

Reflection

Object Orientation
Encapsulation / Information Hiding
Inheritance
Polymorphism
Are all types objects?
Are all operations performed by sending messages to objects?
Are all user-defined types objects?

Functional Programming Elements
Higher Order Functions
Code Blocks

Generators (potentially infinite data, lazy evaluation)
List operations, e.g. map / reduce
Closures
Traditional: Haskell, Erlang
Upcoming: Scala, Clojure, F#

Iteration
Iteration means the act of repeating a process, usually with the aim of approaching a desired goal, target or result. Each repetition of the process is also called an "iteration," and the results of one iteration are used as the starting point for the next iteration. Iteration in computing is the repetition of a process within a computer program. It can be used both as a general term, synonymous with repetition, and to describe a specific form of repetition with a mutable state.

Iteration
When used in the first sense, recursion is an example of iteration, but one typically expressed with a recursive notation, which is typically not the case for iteration. However, when used in the second (more restricted) sense, iteration describes the style of programming used in imperative programming languages. This contrasts with recursion, which has a more declarative approach.
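The functional elements listed above — higher-order functions, closures, generators, and map/reduce-style list operations — can be sketched in a few lines of Python (function and variable names here are illustrative, not from any particular library):

```python
from functools import reduce
from itertools import count, islice

# Higher-order function + closure: make_adder returns a function that
# "closes over" the local variable n, which outlives the enclosing call.
def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)

# Generator: a potentially infinite, lazily evaluated sequence of squares.
squares = (i * i for i in count(1))
first_four = list(islice(squares, 4))  # only four values are ever computed

# List operations: map transforms each element, reduce folds them together.
total = reduce(lambda acc, x: acc + x, map(add5, [1, 2, 3]))
```

Laziness is what makes the infinite `squares` sequence safe: values are produced only on demand.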

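The two senses of repetition described above can be contrasted directly; a minimal sketch, computing the same sum both ways:

```python
# Iterative sum: repetition with mutable state (an accumulator variable).
def sum_iterative(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

# Recursive sum: the same repetition expressed declaratively, with no
# mutable state; each call works on a smaller instance of the problem.
def sum_recursive(n):
    return 0 if n == 0 else n + sum_recursive(n - 1)
```

Both compute the same result; they differ only in whether the repetition is driven by mutating state or by self-application.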
Chapter XV Subprograms

Topics
1. Procedures
2. Functions
3. Local variables
4. Parameters

Procedural Programming
"Procedural programming" is used as a synonym for imperative programming (specifying the steps the program must take to reach the desired state), but can also refer, as here, to a programming paradigm derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, methods, or functions (not to be confused with mathematical functions, but similar to those used in functional programming) simply contain a series of computational steps to be carried out.

Any given procedure might be called at any point during a program's execution, including by other procedures or by itself.

Functional Programming
In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state. Functional programming has its roots in lambda calculus, a formal system developed in the 1930s to investigate function definition, function application, and recursion. Many functional programming languages can be viewed as elaborations on the lambda calculus.

Functional Programming
Functional programming languages are a class of languages designed to reflect the way people think mathematically, rather than reflecting the underlying machine. Functional languages are based on the lambda calculus, a simple model of computation, and have a solid theoretical foundation that allows one to reason formally about the programs written in them. The most commonly used functional languages are Standard ML, Haskell, and pure Scheme (a dialect of LISP), which, although they differ in many ways, share most of the properties described here.

Functional Programming
In practice, the difference between a mathematical function and the notion of a "function" used in imperative programming is that imperative functions can have side effects, changing the value of program state. Because of this they lack referential transparency, i.e. the same language expression can result in different values at different times depending on the state of the executing program. Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times.
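The loss of referential transparency described above can be shown with a small sketch (the function names are illustrative):

```python
# Impure: the result depends on hidden, mutable state, so the same call
# can return different values at different times — the expression
# impure_next(10) is not referentially transparent.
counter = 0
def impure_next(x):
    global counter
    counter += 1
    return x + counter

# Pure: the result depends only on the argument, so f(x) is always the
# same for the same x, and any call can be replaced by its value.
def pure_double(x):
    return 2 * x

a = impure_next(10)  # 11 on the first call
b = impure_next(10)  # 12 on the second call: same argument, different result
c = pure_double(10)  # always 20
```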
Eliminating side effects can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.

Local Variable
In computer science, a local variable is a variable that is given local scope. Such a variable is accessible only from the function or block in which it is declared. In programming languages with only two levels of visibility, local variables are contrasted with global variables. On the other hand, many ALGOL-derived languages allow any number of levels of nested functions with private variables, functions, constants and types hidden within them.

Local Variable
In most languages, local variables are automatic variables stored directly on the call stack. This means that when a recursive function calls itself, the local variables in each instance of the function are given separate memory address space. Hence variables of this scope can be declared, written to, and read without any risk of side effects on processes outside of the block in which they are declared.

Local Variable

Programming languages that employ call-by-value semantics provide a called subroutine with its own local copy of the arguments passed to it. In most languages, these local parameters are treated the same as other local variables within the subroutine. In contrast, call-by-reference and call-by-name semantics allow the parameters to act as aliases of the values passed as arguments, allowing the subroutine to modify variables outside its own scope. Variables of local scope are used to avoid issues with side effects that can occur with global variables.

Parameter
In computer programming, a parameter is a special kind of variable, used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. These pieces of data are called arguments. An ordered list of parameters is usually included in the definition of a subroutine, so that, each time the subroutine is called, its arguments for that call can be assigned to the corresponding parameters.

Parameter
In the most common case, call-by-value, a parameter acts within the subroutine as a local (isolated) copy of the argument, but in other cases, e.g. call-by-reference, the argument supplied by the caller can be affected by actions within the called subroutine. The semantics for how parameters can be declared and how the arguments get passed to the parameters of subroutines are defined by the language, but the details of how this is represented in any particular computer system depend on the calling conventions of that system.
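A minimal sketch of the two behaviors described above. Python itself passes object references ("call by object sharing"), which is enough to illustrate both the isolated-copy effect and the aliasing effect:

```python
def rebind(x):
    # Rebinding the parameter only changes the local name; the caller's
    # variable is unaffected — the behavior call-by-value guarantees.
    x = x + 1
    return x

def mutate(lst):
    # Mutating the object the parameter refers to is visible to the
    # caller — the aliasing behavior of call-by-reference.
    lst.append(99)

n = 10
rebind(n)      # n is still 10 afterwards
items = [1, 2]
mutate(items)  # items is now [1, 2, 99]
```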

Chapter XVI Terms

Terms
Terms that are defined to aid your understanding of our SPL topics:
Language generator is a device that can be used to generate the sentences of a language.
Grammars are used to describe the syntax of programming languages.
Metalanguage is a language used to describe another language.
Arithmetic expressions consist of operators, operands, parentheses, and function calls.
Operator precedence rules for expression evaluation define the order in which the operators of different precedence levels are evaluated.

Terms
Operator associativity rules for expression evaluation define the order in which operators with the same precedence level are evaluated.
Functional side effects occur when a function changes either one of its parameters or a nonlocal variable.

Operator overloading is the multiple use of an operator.
Narrowing conversion is one that converts an object to a type that cannot include all of the values of the original type.
Widening conversion is one in which an object is converted to a type that can include at least approximations to all of the values of the original type.

Terms
Mixed-mode expression is one that has operands of different types.
Relational operator is an operator that compares the values of its two operands.
Boolean expressions consist of Boolean variables, Boolean constants, relational expressions, and Boolean operators.
Control structure is a control statement and the statements whose execution it controls.
Block (code block) is a compound statement that can define a new scope.
Subprogram definition is a description of the actions of the subprogram abstraction.

Terms
Subprogram header is the first line of the definition, including the name, the kind of subprogram, and the formal parameters.
Formal parameter is a dummy variable listed in the subprogram header and used in the subprogram.
Actual parameter represents a value or address used in the subprogram call statement.
Overloaded subprogram is one that has the same name as another subprogram in the same referencing environment.
Stack-dynamic variables are bound to storage when execution reaches the code to which the declaration is attached.
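Several of the terms above — precedence, associativity, and widening vs. narrowing conversion — can be seen concretely in a short sketch (Python operators are used here purely for illustration):

```python
# Precedence: * binds tighter than +, so this is 2 + (3 * 4).
p = 2 + 3 * 4       # 14

# Associativity: - is left-associative, ** is right-associative.
left = 10 - 4 - 3   # (10 - 4) - 3 = 3
right = 2 ** 3 ** 2 # 2 ** (3 ** 2) = 512

# Widening conversion: float can hold at least an approximation of any int.
w = float(3)        # 3.0

# Narrowing conversion: int cannot include all float values, so the
# fractional part is lost.
n = int(3.7)        # 3
```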
