What are Programming Languages?
Abby S. Paculdo
Concept of Programming Language
Increased capacity to express ideas. Awareness of a wider variety of programming language features can reduce limitations on thought processes in software development, because learning new language constructs gives programmers more ways to express a solution.
Improved background for choosing an appropriate language. Without this background, programmers tend to continue to use the language they already know, even when it is poorly suited to the task.
Increased ability to learn new languages. Computer programming is a young discipline, and design methodologies, software development tools, and programming languages are still in a state of continuous evolution.
Better understanding of the significance of implementation.
Increased ability to design new languages. The form of that interface is designed by the system developer.
Programming Domain
Scientific applications: use simple data structures but require large numbers of floating-point arithmetic computations. The most common data structures are arrays, and the most common control structures are counting loops.
Artificial intelligence: a broad area of computer applications characterized by the use of symbolic rather than numeric computations.
System programming: also known as system software.
The wrong question...
What can this language do?
Implicitly we are comparing the new
language with other languages.
The answer is very simple: all languages
can do exactly the same computations!
(Influence on Language Design)Computer
Architecture (Result of operation)
If they can do all the same
computations... There must be other
reasons for the existence of hundreds of
programming languages.
Definition
A program is a sequence of symbols that
specifies a computation.
A programming language is a set of rules that
specify which sequence of symbols constitute a
program, and what computation the program
describes.

Programs and languages can be defined as purely formal mathematical objects.
People are more interested in programs than in other mathematical objects because it is possible to use the program (the sequence of symbols) to control the execution of a computer.
Von Neumann Machine
The stored-program computer.
The original so-called von Neumann
machine was designed in the late 1940s at
the Institute for Advanced Study in
Princeton.
Modern computers have enough in common
with the original that they are said to have a
von Neumann architecture.
The native language of a computer, its
machine language, is the notation to which
the computer responds directly (originally
referred to as code).
The von Neumann Computer Architecture
[Figure: Memory (stores both instructions and data); Arithmetic and Logic Unit; Control Unit; Input and Output devices. Arrows labeled "Results of operations" and "Instructions and data".]
Implementation Method
Two of the primary components of a computer are:
a) Internal memory, which is used to store programs and data.
b) The processor, a collection of circuits that provides a realization of a set of primitive operations, or machine instructions, such as those for arithmetic and logic operations.
Hierarchical Layers of Software and Hardware
[Figure: hierarchical layers, from top to bottom: word processor, image processing, pattern classification, data analysis, database access, ...; operating systems, compilers, and assemblers.]
(Source: Liu@NMT, 2007, "Computer Abstraction, Technology, and Logic Design", 08/22 & 24.)
Compilation
Programming languages can be implemented by any of three general methods. At one extreme, programs can be translated to machine language, which can be executed directly on the computer. This is called a compiler implementation.
This method has the advantage of very fast program execution once the translation process is complete.
Stages in translating a program
Lexical analysis (Scanner): Breaking a program into primitive components, called tokens (identifiers, numbers, keywords, ...). We will see that regular grammars and finite state automata are formal models of this.
Syntactic analysis (Parsing): Creating a syntax tree of the program. We will see that context-free grammars and pushdown automata are formal models of this.
Symbol table: Storing information about declared objects (identifiers, procedure names, ...).
Semantic analysis: Understanding the relationship among the tokens in the program.
Optimization: Rewriting the syntax tree to create a more efficient program.
Code generation: Converting the parsed program into an executable form.
Stages:
a. The language that a compiler translates is called the source language.
b. The lexical analyzer gathers the characters of the source code into lexical units. The lexical units of a program are identifiers, special words, operators, and punctuation symbols.
c. The syntax analyzer takes the lexical units from the lexical analyzer and uses them to construct hierarchical structures called parse trees. These represent the syntactic structure of the program.
d. The intermediate code generator produces a program in a different language, at an intermediate level between the source program and the final output of the compiler, the machine language program.
e. Optimization, which improves programs by making them smaller or faster or both, is often an optional part of compilation.
f. The code generator translates the optimized intermediate code version of the program into an equivalent machine language program.
g. The symbol table serves as a database for the compilation process. The primary contents of the symbol table are the type and attribute information of each user-defined name in the program.
Differentiate the Compiler and Interpreter
A compiler is a program that translates source code into object code. The compiler derives its name from the way it works, looking at the entire piece of source code and collecting and reorganizing the instructions. Thus, a compiler differs from an interpreter, which analyzes and executes each line of source code in succession, without looking at the entire program.
The advantage of interpreters is that they can execute a program immediately. Compilers require some time before an executable program emerges. However, programs produced by compilers run much faster than the same programs executed by an interpreter.
Every high-level programming language (except strictly interpretive languages) comes with a compiler. In effect, the compiler is the language, because it defines which instructions are acceptable.
Advantages & Disadvantages of Compiled Program
Advantages:
1. Machine code is created, so there is no need for the compiler to be present after the program has been compiled.
2. No dependency on the compiler after the program is compiled.
3. Compilers also optimise the program, often giving it better performance.
Disadvantages:
1. The code is highly machine dependent.
Advantages & Disadvantages of Interpreted Language
Advantages:
1. Programs written in interpreted languages are platform independent as long as an interpreter is available for the other platform. E.g., a program written in Perl can be executed on Linux, Solaris, and Windows, provided that a Perl interpreter is installed on them. Due to this the cross platform
Programming Environment
A programming environment is the collection of tools used in the development of software. This collection may consist of only a file system, a text editor, a linker, and a compiler. Or it may include a large collection of integrated tools, each accessed through a uniform user interface.
Evolution of the Major Programming Languages
Describing Syntax and Semantics
Programming Language Implementation
The study of programming languages, like the study of natural languages, can be divided into examinations of syntax and semantics.
The syntax of a programming language is the form of its expressions, statements, and program units.
Its semantics is the meaning of those expressions, statements, and program units.
Ex.: the syntax of an if statement in C is if (<expr>) <statement>
Understanding of frameworks and plug-ins
Describing Syntax
Languages, whether natural or artificial, are sets of strings of characters from some alphabet.
The strings of a language are called sentences or statements.
The lowest-level syntactic units are called lexemes.
The description of lexemes can be given by a lexical specification, which can be separate from the syntactic description of the language.
The lexemes of a programming language include its identifiers, literals, operators, and special words.
A token of a language is a category of its lexemes. For example, identifier is a token that can have lexemes, or instances, such as sum and total.
Ex.: index = 2 * count + 17;
The lexemes and tokens of this statement are:
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
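The lexeme-to-token mapping above can be sketched as a small scanner. This is only an illustrative sketch: the token names are taken from the table, while the regular expressions, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for the example, not part of any particular compiler.

```python
import re

# Hypothetical token categories, named after the table above.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_][A-Za-z_0-9]*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":          # whitespace separates lexemes but is not a token
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

pairs = tokenize("index = 2 * count + 17;")
```

Each pair holds one lexeme and the token (category) it belongs to, so both index and count come out as identifier.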
Backus-Naur Form and Context
Free Grammars
A landmark paper describing ALGOL 58
was presented by John Backus, a
prominent member of the ACM-GAMM
group, at an international conference in
1959. This paper introduced a new
notation for specifying programming
language syntax. The revised method
of syntax description became known as
Backus-Naur Form or simply BNF.
BNF is a very natural notation for
describing syntax.
Fundamentals
A metalanguage is a language that is used to describe another language. BNF is a metalanguage for programming languages.
The actual definition of <assign> may be given by:
<assign> -> <var> = <expression>
The symbol on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined. The text to the right of the arrow is the definition of the LHS. It is called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions. Altogether, the definition is called a rule or production.
Fundamentals
This particular rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>. One example sentence whose syntactic structure is described by the rule is:
total = sub1 + sub2;
Fundamentals
Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language. Multiple definitions can be written as a single rule, with the different definitions separated by the symbol |, meaning logical OR.
The Pascal if statement can be described with the rules:
<if_stmt> -> if <logic_expr> then <stmt>
<if_stmt> -> if <logic_expr> then <stmt> else <stmt>
Or with the rule:
<if_stmt> -> if <logic_expr> then <stmt>
           | if <logic_expr> then <stmt> else <stmt>
Describing Lists
Variable-length lists in mathematics are often written using an ellipsis (...); 1, 2, ... is an example. BNF does not include the ellipsis, so an alternative method is required for describing lists of syntactic elements in programming languages. The common alternative is recursion. A rule is recursive if its LHS appears in its RHS. The following rules illustrate how recursion is used to describe lists:
<ident_list> -> identifier
              | identifier, <ident_list>
This defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instance of <ident_list>. Recursion is used to describe lists in many of the example grammars.
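The recursive list rule maps directly onto a recursive function. The following is a minimal sketch, assuming the input has already been split into a list of lexemes; the function name `is_ident_list` is made up for the example, and Python's `str.isidentifier` stands in for the identifier token.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True                      # first alternative: a single identifier
    # second alternative: identifier, then a comma, then another <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])

ok = is_ident_list(["a", ",", "b", ",", "c"])
dangling = is_ident_list(["a", ","])     # a trailing comma has no list after it
```

The function calls itself exactly where the rule refers to itself, so lists of any length are handled without an ellipsis.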
Grammar & Derivations
The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. A sentence generation is called a derivation. In a grammar for a complete language, the start symbol represents a complete program and is usually named <program>. A simple grammar is shown below.
Grammar & Derivations
A grammar for a small language:
<program> -> begin <stmt_list> end
<stmt_list> -> <stmt>
             | <stmt> ; <stmt_list>
<stmt> -> <var> := <expression>
<var> -> A | B | C
<expression> -> <var> + <var>
              | <var> - <var>
              | <var>
Given: <program> => begin <stmt_list> end
Output: begin A := B + C; B := C end
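A leftmost derivation of this sentence can be traced mechanically. The sketch below encodes the small grammar as a Python dictionary and, at each step, replaces the leftmost nonterminal with one of its definitions. The names `GRAMMAR` and `leftmost_derivation` and the list of rule choices are assumptions made for the illustration.

```python
# Each nonterminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", ":=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(choices, start="<program>"):
    """Apply one rule per choice, always to the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))      # every intermediate string is a sentential form
    return steps

# Index of the alternative chosen at each step (hypothetical, picked by hand).
steps = leftmost_derivation([0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2])
for step in steps:
    print("=>", step)
```

The last string printed contains no nonterminals, so it is a sentence of the language.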
Grammar & Derivations
The derivation, like all derivations, begins with the start symbol, in this case <program>. The symbol => is read "derives". Each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal's definitions. Each of the strings in the derivation, including <program>, is called a sentential form. In this derivation, the replaced nonterminal is always the leftmost. Derivations that use this order of replacement are called leftmost derivations. The derivation continues until
Grammar & Derivations
The following grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses. For example, the statement
A := B * (A + C)
A grammar for simple assignment statements:
<assign> -> <id> := <expr>
<id> -> A | B | C
<expr> -> <id> + <expr>
        | <id> * <expr>
        | ( <expr> )
        | <id>
Given: <assign> => <id> := <expr>
Output: => A := B * (A + C)
Assignment: B := A + B * (C * B)
Parse Tree
One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define. These hierarchical structures are called parse trees. Every internal node of a parse tree is labeled with a nonterminal symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse tree describes one instance of an abstraction in the statement.
Ambiguity
A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous.
An ambiguous grammar for simple assignment statements:
<assign> -> <id> := <expr>
<id> -> A | B | C
<expr> -> <expr> + <expr>
        | <expr> * <expr>
        | ( <expr> )
        | <id>
Given: <assign> => <id> := <expr>
Output: A := B + C * A
Ambiguity
The ambiguity occurs because this grammar specifies slightly less syntactic structure than does the previous grammar. Rather than allowing the parse tree of an expression to grow only on the right, this grammar allows growth on both the left and the right.
Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely.
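To see why two parse trees matter, the sketch below writes the two possible trees for B + C * A as nested tuples and evaluates each one. The tuple encoding, the `evaluate` helper, and the variable values are assumptions chosen only for illustration.

```python
# Two parse trees for B + C * A, encoded as (operator, left, right).
tree_mul_lower = ("+", "B", ("*", "C", "A"))   # multiplication sits lower in the tree
tree_add_lower = ("*", ("+", "B", "C"), "A")   # addition sits lower in the tree

def evaluate(tree, env):
    """Evaluate a tree bottom-up, so lower operators are applied first."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l * r

env = {"A": 2, "B": 3, "C": 4}                 # hypothetical values
value_mul_lower = evaluate(tree_mul_lower, env)
value_add_lower = evaluate(tree_add_lower, env)
```

With these values the first tree gives 3 + (4 * 2) = 11 and the second gives (3 + 4) * 2 = 14, so the same sentence has two different meanings.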
Unambiguous Grammar
This grammar indicates the usual precedence order of the multiplication and addition operators. Unambiguous grammar and parse tree for the sentence
A := B + C * A
Rules:
<assign> -> <id> := <expr>
<id> -> A | B | C
<expr> -> <expr> + <term>
        | <expr> - <term>
        | <term>
<term> -> <term> * <factor>
        | <factor>
<factor> -> ( <expr> )
          | <id>
Grammar (simple derivation)
Derive the following sentences from the rules, without a parse tree:
<S> -> a <S> c <B> | <A> | b
<A> -> c <A> | c
<B> -> d | <A>
Output:
1. abcd
2. accc
Usual grammar for expressions
E -> E + T | T
T -> T * P | P
P -> i | ( E )
Natural value of expression is 26:
Multiply 2 * 3 = 6
Multiply 4 * 5 = 20
Add 6 + 20 = 26
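The evaluation order above follows from the grammar itself. The sketch below is one possible recursive-descent evaluator for the E, T, P rules. It assumes that the expression being evaluated is 2 * 3 + 4 * 5 (inferred from the three steps, since the expression itself is not shown) and that the terminal i is an integer literal. The left-recursive rules are implemented as loops, a common rewriting, and all function names are made up for the example.

```python
import re

def evaluate(text):
    """Evaluate an expression using E -> E + T | T, T -> T * P | P, P -> i | ( E )."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_e():                 # E: a T followed by zero or more "+ T"
        value = parse_t()
        while peek() == "+":
            advance()
            value += parse_t()
        return value

    def parse_t():                 # T: a P followed by zero or more "* P"
        value = parse_p()
        while peek() == "*":
            advance()
            value *= parse_p()
        return value

    def parse_p():                 # P: an integer literal or a parenthesized E
        token = advance()
        if token == "(":
            value = parse_e()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(token)

    result = parse_e()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result

natural_value = evaluate("2 * 3 + 4 * 5")
```

Because T is parsed inside E, both multiplications are finished before the addition is applied, which is the order listed above.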
Syntax Graphs
A graph is a collection of nodes, some of which are connected by lines, called edges.
A directed graph is one in which the edges are directional; that is, they have arrowheads on one end to indicate a direction.
The information in BNF and EBNF rules can be represented in a directed graph. Such graphs are called syntax graphs, syntax diagrams, or syntax charts.
A separate graph is used for each syntactic unit, in the same way a nonterminal symbol in a grammar represents such a unit.
Syntax Graphs
Syntax graphs use different kinds of nodes to represent the terminal and nonterminal symbols of the right side of grammar rules. Rectangle nodes contain the names of syntactic units (nonterminals). Circles or ellipses contain terminal symbols.
<if_stmt> -> if <condition> then <stmt> {<else_if>} [else <stmt>] end if
<else_if> -> elseif <condition> then
Attribute Grammars
An attribute grammar is a device used to describe more of the structure of a programming language than is possible with a context-free grammar. An attribute grammar is an extension to a context-free grammar. The extension allows certain language rules to be described, such as type compatibility.
Chapter 4

Name, Binding,
Type Checking,
And Scopes
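Data Structure and Algorithm
Algorithm: a process or set of rules used for calculation or problem solving.
Data structure: a way of organizing and storing data so that it can be accessed and modified efficiently. (A series of coded instructions to control the operation of a computer or other machine is a program.)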
Name Forms
A name/identifier is a string of
characters used to identify some
entity in a program.
In some languages, notably C, C++, and Java, uppercase and lowercase letters in names are distinct; that is, names in these languages are case sensitive.
Java vs C++/C
Names are used for class names, method names, and variable names. Every name is made from the following characters, starting with a letter:
Letters: a-z, A-Z, and other alphabetic characters from other languages. Digits: 0-9. Special: _ (underscore).
No name can be the same as a Java keyword (e.g., import, if, ...).
Class and interface names: start with an uppercase letter and continue in lowercase. For multiple words, use camelcase. E.g., Direction, LogicalLayout, DebugGapSpacer.
Variable and method names: lowercase is used for variable and method names. If a name has multiple words, use camelcase.
Constants: all uppercase, with _ to separate words. The names of constants (typically declared static final) should be in all uppercase. For example, BorderLayout.NORTH.
Java vs C++/C
Name Forms
A keyword is a word of a programming language that is special only in certain contexts.
Example: int apple;
int = 3;
(If int were a keyword rather than a reserved word, the first line would declare apple and the second would assign 3 to a variable named int.)
A reserved word is a special word of a programming language that cannot be used as a name. As a language design choice, reserved words are better than keywords because the ability to redefine keywords can lead to readability problems.
Java vs. C++
Name Forms
A program variable is an abstraction of a computer memory cell or collection of cells. A variable can be characterized as a sextuple of attributes: (name, address, value, type, lifetime, scope).
a. Name: variable names are the most common names in programs.
b. Address: the address of a variable is the memory address with which it is associated.
c. Type: determines the range of values the variable can have and the set of operations that are defined for values of the type.
d. Value: the contents of the memory cell or cells associated with the variable. It is convenient to think of computer memory in terms of abstract cells rather than physical cells. The cells, or individually addressable units, of most contemporary computer memories are byte sized, with a byte usually being eight bits in length.
Variables in Java
The variable is the basic unit of storage in a Java program. A variable is defined by the combination of an identifier, a type, and an optional initializer. It is a named memory location that may be assigned a value by the program.
type identifier [= value][, identifier [= value] ...];
type is one of Java's atomic types, or the name of a class or interface.
identifier is the name of the variable.
Variables in C++
A variable is a symbol that represents a storage location in the computer's memory. The information that is stored in that location is called the value of the variable. One common way for a variable to obtain a value is by an assignment. Syntax:
variable = expression;
First the expression is evaluated, and then the resulting value is assigned to the variable. The equals sign = is the assignment operator in C++.
The Concept of Binding
In a general sense, a binding is an
association, such as between an attribute
and an entity or between an operation and a
symbol. The time at which a binding takes
place is called binding time. Binding and
binding time are prominent concepts in the
semantics of programming language.
Binding can take place at:
a. Language design time
b. Language Implementation time
c. Compile time
d. Link time
e. Load time / or run time
The Concept of Binding
The asterisk symbol (*) is usually bound to the multiplication operation at language design time. A data type, such as Integer, is bound to a range of possible values at language implementation time. At compile time, a variable in a C program is bound to a particular data type. A call to a library subprogram is bound to the subprogram code at link time.
The Concept of Binding
int count;
...
count = count + 5;
Some of the bindings and their binding times for the parts of this assignment are as follows:
Set of possible types for count: bound at language design time.
Type of count: bound at compile time.
Set of possible values of count: bound at compiler design time.
Value of count: bound at execution time with this statement.
Set of possible meanings for the operator symbol +: bound at language definition time.
Meaning of the operator symbol + in this statement: bound at compile time.
Internal representation of the literal 5: bound at compiler design time.
Binding of Attributes to
Variables
A binding is Static if it occurs before run
time and remains unchanged throughout
program execution. If it occurs during run
time or can change in the course of
program execution, it is called dynamic.
The physical binding of a variable to a storage cell in a virtual memory environment is complex, because the page or segment of the address space in which the cell resides may be moved in and out of memory many times during program execution.
Type Binding
a. Variable Declarations
An explicit declaration is a
statement in a program that lists
variable names and specifies that
they are a particular type.
An implicit declaration is a means
of associating variables with types
through default conventions
instead of declaration statement.
Type Binding
b.) Dynamic Type Binding
The type is not specified by a
declaration statement. Instead, the
variable is bound to a type when it
is assigned a value in an
assignment. When the assignment
statement is executed, the variable
being assigned is bound to the type
of the value, variable or expression
on the right side of the assignment.
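Python binds types dynamically, so it can illustrate this behavior directly. A minimal sketch follows; the variable names are arbitrary.

```python
value = [2, 4.33, 6, 8]              # the assignment binds value to the type list
first_type = type(value).__name__

value = 17.3                         # a later assignment rebinds the same name to float
second_type = type(value).__name__

print(first_type, second_type)
```

No declaration appears anywhere; the type of value at any moment is simply the type of whatever was last assigned to it.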
Storage Binding and Lifetime
Storage binding:
Allocation = getting a memory cell from some pool of available cells, in order to bind it to a variable.
Deallocation = putting a memory cell (unbound from a variable) back into the pool.
The lifetime of a variable is the time during which it is bound to a particular memory cell. Depending on their lifetime, there are 4 categories of variables:
a. Static.
b. Stack-dynamic.
c. Explicit heap-dynamic.
d. Implicit heap-dynamic.
a. Static Variables

A static variable is bound to a memory cell


before execution begins and remains bound to
the same memory cell throughout execution.
Example: C and C++ static variables.

int myFunction() {
static int count =0;

count++;
return count;
}
Static Variables
Advantages:
efficiency: direct addressing, no run-time
overhead for allocation & deallocation.
history-sensitive : maintain values
between successive function calls.

Disadvantages:
lack of flexibility (no recursion).
storage cannot be shared among
variables.
b. Stack-Dynamic Variables
Stack-dynamic = storage bindings are created for variables when their declaration statements are elaborated.
- A declaration is elaborated when the executable code associated with it is executed (with some exceptions).
- Storage is allocated from the run-time stack.
- If scalar, all attributes except address are statically bound.
Example: local variables in C subprograms and Java methods.
int factorial(int n) {
    int result = 1;               // stack-dynamic: allocated on each call
    for (int i = 2; i <= n; i++)  // i is also stack-dynamic
        result *= i;              // accumulate the product 2 * 3 * ... * n
    return result;
}
Stack-Dynamic Variables
Advantages:
Allows recursion;
Conserves storage.

Disadvantages:
Overhead of allocation and deallocation.
Subprograms cannot be history
sensitive.
Inefficient references (indirect
addressing).
c. Explicit Heap-Dynamic
Variables
Explicit heap-dynamic variables are allocated and
deallocated by explicit directives, specified by the
programmer, which take effect during execution:
Storage is allocated from the heap.
The actual variables are nameless.
Referenced only through pointers or references,
e.g. dynamic objects in C++ (via new and delete),
all objects in Java.

int *intNode;        // create the pointer, stack-dynamic.
intNode = new int;   // create the heap-dynamic variable.
delete intNode;      // deallocate the heap-dynamic variable.
Explicit Heap-Dynamic
Variables
Advantages:
Enable the specification and
construction of dynamic structures
(linked lists & trees) that grow and
shrink during the execution.
Disadvantages:
Unreliable: difficult to use pointers & references correctly.
Inefficient: heap management is costly and complicated.
d. Implicit Heap-Dynamic Variables
Implicit heap-dynamic variables: allocation and deallocation are caused by assignment statements:
    All their attributes (e.g. type) are bound every time they are assigned.
    Examples: strings and arrays in Perl, variables in JavaScript & PHP.
list = [2, 4.33, 6, 8];
list = 17.3;
Advantages: flexibility (generic code).
Disadvantages:
    Inefficient, because all attributes are dynamic.
    Loss of error detection by compiler.
Type Checking
Preliminary step: generalize the concept of operands and operators to include:
    subprograms as operators, and parameters as operands;
    assignments as operators, and LHS & RHS as operands.
Type checking is the activity of ensuring that the operands of an operator are of compatible types.
A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted to a legal type:
    This automatic conversion, by compiler-generated code, is called a coercion.
Type Checking
A type error results from the application of an
operator to an operand of an inappropriate type.
Static type checking: if all type bindings are
static, nearly all type checking can be done
statically (Ada, C/C++, Java).
Dynamic type checking: if type bindings are
dynamic, type checking must be dynamic
(Javascript, PHP).
Strong typing: a programming language is
strongly typed if type errors are always detected.
Done either at compile time or run time.
Advantages: allows the detection of the misuses of
variables that result in type errors.
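Dynamic type checking and implicit conversion can both be observed in Python, used here only as a convenient dynamically typed language. In this sketch the helper `add` and the variable names are made up; the point is that a mixed int and float addition is converted implicitly, while a str and int addition is detected as a type error at run time.

```python
def add(a, b):
    return a + b

mixed = add(1, 2.5)              # the int operand is implicitly converted to float

try:
    add("1", 2)                  # no implicit conversion exists between str and int
    outcome = "no error"
except TypeError:
    outcome = "type error detected at run time"

print(mixed, outcome)
```

The error is caught only when the offending call executes, which is the cost of checking types dynamically rather than at compile time.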
Strong Typing: Language Examples
C and C++ are not strongly typed:
    parameter type checking can be avoided;
    unions are not type checked.
Ada is nearly strongly typed:
    only exception: the UNCHECKED_CONVERSION generic function extracts the value of a variable of one type and uses it as if it were of a different type.
Java and C# are strongly typed in the same sense as Ada:
    types can be explicitly cast, which may result in type errors at run time.
Strong Typing & Type Coercion
Coercion rules can weaken strong typing considerably, i.e. cause a loss in error detection capability:
    C++'s strong typing is less effective than Ada's.
Although Java has just half the assignment coercions of C++:
    its strong typing is more effective than that of C++;
    its strong typing is still far less effective than that of Ada.
Variable Attributes: Scope
The scope of a variable is the range of statements over which it is visible:
    Variable v is visible in statement s if v can be referenced in s.
The scope rules of a language determine how occurrences of names are associated with variables:
    static scoping.
    dynamic scoping.
Two types of variables:
    local variables: declared inside the program unit/block.
    nonlocal variables: visible, but declared outside the program unit.
Static Scope
Introduced in ALGOL 60 as a method of binding names to nonlocal variables:
To connect a name reference to a variable, you (or the
compiler) must find the declaration.
Search process: search declarations, first locally, then in
increasingly larger enclosing scopes, until one is found for
the given name.

Two ways of creating nested static scopes:


Nested subprogram definitions (e.g., Ada, JavaScript, and
PHP).
Nested blocks.

Given a specific scope:


Enclosing static scopes are called its static ancestors;
The nearest static ancestor is called its static parent.
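The search through enclosing scopes can be sketched with nested function definitions in Python, which uses static scoping. The function and variable names here are made up for the example.

```python
x = "global"

def outer():
    x = "outer"                  # declared in outer, the static parent of inner

    def inner():
        # No local x here, so the search moves outward to the static parent.
        return x

    return inner()

def sibling():
    # sibling's only static ancestor is the module, so this x is the global one.
    return x

from_inner = outer()
from_sibling = sibling()
```

The declaration that is found depends only on where each function is written in the program text, not on who calls it.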
Static Scope Example
Static Scope
Variables can be hidden from a
unit by having a "closer" variable
with the same name.
C++ and Ada allow access to
these "hidden" variables
In Ada: unit.name
In C++: class_name::name
Static Scope: Blocks
Blocks: a method of creating (nested) static scopes inside program units (introduced in ALGOL 60).
Examples:
C-based languages:
while (...) {
int index;
...
}
- Ada:
declare Temp : Float;
begin
...
end
Static Scope: Evaluation
Suppose the specification is changed so that
D must now access some data in B.
Solutions:
Put D in B (but then C can no longer call it and D
cannot access A's variables).
Move the data from B that D needs to MAIN (but
then all procedures can access them).

Same problem for procedure access as for


data access.
Overall: static scoping often encourages
many globals.
Dynamic Scope
Static Scope: names are associated
to variables based on their textual
layout (spatial).
Dynamic Scope: names are
associated to variables based on
calling sequences of program units
(temporal).
References to variables are connected to
declarations by searching back through
the chain of subprogram calls that forced
execution to this point.
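The difference between the two rules can be sketched in Python. Python itself is statically scoped, so dynamic scoping is only simulated here with an explicit stack of frames, one per active subprogram; the `frames` list, the `lookup` helper, and all function names are assumptions made for the illustration.

```python
x = "global"

def static_show():
    return x                          # resolved by textual layout: the global x

def static_caller():
    x = "caller"                      # local to static_caller, invisible to static_show
    return static_show()

# Simulated dynamic scoping: search back through the chain of active calls.
frames = [{"x": "global"}]

def lookup(name):
    for frame in reversed(frames):    # most recently called subprogram first
        if name in frame:
            return frame[name]
    raise NameError(name)

def dynamic_show():
    frames.append({})                 # dynamic_show declares no variables of its own
    try:
        return lookup("x")
    finally:
        frames.pop()

def dynamic_caller():
    frames.append({"x": "caller"})    # this declaration is visible to called subprograms
    try:
        return dynamic_show()
    finally:
        frames.pop()

static_result = static_caller()
dynamic_result = dynamic_caller()
```

Under static scoping the reference finds the global x; under the simulated dynamic scoping the same reference finds the caller's x, because the caller is the most recent active subprogram that declares it.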
Dynamic Scope: Example
Dynamic Scope: Evaluation
Advantages:
    convenience: the called subprogram is executed in the context of the caller, so there is no need to pass variables in the caller as parameters.
Disadvantages:
    poor readability: it is virtually impossible for a human reader to determine the meaning of references to nonlocal variables.
    less reliable programs than with static scoping.
    execution is slower than with static scoping.
Scope vs. Lifetime
Sometimes scope and lifetime appear to be related:
Ex: a local variable in a Java method without method
calls.
scope: from declaration to the end of the method
(spatial).
lifetime: begins when the method is entered, and
ends when the execution of the method terminates
(temporal).

Sometimes scope and lifetime are unrelated:


Ex: a static variable inside a C/C++ function:
scope: the scope of that function (statically bound)
lifetime: the entire execution of the program
(statically bound).

Ex: a local variable inside a C++ function containing


Referencing Environment
The referencing environment of a statement
is the collection of all names that are visible
in the statement
In a static-scoped language, it is the local variables plus
all of the visible variables in all of the enclosing scopes.
In a dynamic-scoped language, the referencing
environment is the local variables plus all visible
variables in all active subprograms:
A subprogram is active if its execution has begun but has not
yet terminated.

Variables in enclosing scopes/active subprograms can


be hidden by variables with same name.
Named Constants
A named constant is a variable that is bound to a value only when it is bound to storage.
Advantages:
    Readability and modifiability.
    Used to parameterize programs.
The binding of values to named constants can be either:
    Static (constant-valued expressions): FORTRAN 95, C# const named constants.
    Dynamic (expressions of any kind): Ada, C++, and Java; C# readonly named constants.
Variable Initialization
Initialization = the binding of a variable to a value at the time it is bound to storage.
Static storage binding:
    initialization occurs before run time;
    initial value must be a constant expression (combination of constant literals and named constants).
Dynamic storage binding:
    initialization occurs at run time;
    initial value can be an expression of any kind.
Chapter 5

DATA TYPES
Data Types
A data type defines a collection of data objects and
a set of predefined operations on those objects.
Primitive data types are those not defined in
terms of other data types:
Some primitive data types are merely reflections of the
hardware.
Others require only a little non-hardware support for their
implementation.

User defined types are created with flexible


structure defining operators (ALGOL 68).
Abstract data types separate the interface of a
type (visible) from the representation of that type
(hidden).
Primitive Data Types
Integers: almost always an exact reflection of
the hardware.
Java's signed integers: byte, short, int, long.
Floating point: model real numbers, but only
as approximations.
Support for two types: float and double.
Complex: two floats, the real and the
imaginary parts.
Supported in Fortran and Python.
Boolean: two elements, true and false.
Implemented as bits or bytes.
Character: stored as numeric codings.
ASCII (7-bit code, usually stored in 8 bits); Unicode (originally a 16-bit encoding).
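The point that floating-point values only approximate real numbers can be checked directly. The following is a minimal Python sketch for illustration only; the variable names are hypothetical, and the comparison result assumes the usual IEEE 754 double representation.

```python
import math

# Floating-point values approximate real numbers, so decimal fractions
# such as 0.1 are usually not stored exactly.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False with IEEE 754 doubles
close_enough = math.isclose(total, 0.3)   # compare with a tolerance instead

# Complex: a pair of floats, the real and the imaginary part.
z = complex(3.0, 4.0)
magnitude = abs(z)

# Python integers are arbitrary precision, unlike Java's fixed-width types.
big = 2 ** 64
```

Comparing floats with a tolerance, as math.isclose does, is the usual way to cope with the approximation.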
Character String Types
Character string values are sequences of
characters.
Typical operations:
Assignment.
Comparison.
Concatenation.
Substring reference.
Pattern matching.

Design issues:
Is it a primitive type or just a special kind of array?
Should the length of strings be static or dynamic?
String In Programming
Language
C and C++:
Implemented as null terminated char arrays.
A library of functions in string.h that provide string
operations.
Many operations are inherently unsafe (ex: strcpy).
C++ string class from the standard library is safer.

Java (C# and Ruby):
Primitive via the String class (immutable).
Arrays via the StringBuffer class (mutable, with
subscripting).
Fortran:
Primitive type.
String In Programming
Language
Python:
Primitive type that behaves like an array of
characters:
indexing, searching, replacement, character
membership.
Immutable

Pattern Matching:
built-in for Perl, JavaScript, Ruby, and PHP, using
regular expressions.
class libraries for C++, Java, Python, C#.
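The Python behaviour listed above (array-like access, immutability, and pattern matching through a class library) might be sketched as follows. The sample strings and variable names are hypothetical; re is Python's standard regular-expression module.

```python
import re

name = "language"

# Array-like behaviour: indexing, searching, membership, replacement.
first = name[0]
position = name.find("gua")
has_g = "g" in name
replaced = name.replace("a", "A")   # builds a new string, name is unchanged

# Immutability: item assignment is rejected with a TypeError.
try:
    name[0] = "L"
    mutated = True
except TypeError:
    mutated = False

# Pattern matching through the re class library.
match = re.search(r"(\d+)-(\d+)", "pages 12-34")
page_range = (int(match.group(1)), int(match.group(2)))
```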
String Length
Static length: set when the string is created:
Java String, C++ STL, Ruby String, C# .NET.

Limited dynamic length: length can vary
between 0 and a maximum set when the
string is defined:
C/C++ null-terminated strings.

Dynamic length: varying length with no
maximum:
JavaScript and Perl (overhead of dynamic
allocation/deallocation).

Ada supports all three types.
Character String
Implementation
Static length:
compile-time descriptor storing the length
and the address.

Limited dynamic length:
may need a run-time descriptor for length
(but not in C and C++).

Dynamic length:
need run-time descriptor;
allocation/de-allocation is the biggest
implementation problem.
Array Types
An array is an aggregate of homogeneous
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
Indexing is a mapping from indices to
elements:
array_name[index_value_list] -> an element
Index range checking:
C, C++, Perl, and Fortran do not specify range
checking.
Java, ML, C# specify range checking.
In Ada, the default is to require range checking,
but it can be turned off.
Array Categories
Static: subscript ranges are statically bound and
storage allocation is static (before run time).
Advantage: efficiency (no dynamic
allocation/deallocation).
Example: arrays declared as static in C/C++
functions.

Fixed stack-dynamic: subscript ranges are
statically bound, but the allocation is done at
declaration time (at run time).
Advantage: space efficiency (stack space is reused).
Example: arrays declared in C/C++ functions without
the static modifier.
Array Categories
Stack-Dynamic: subscript ranges are
dynamically bound and the storage
allocation is dynamic (at run-time):
Advantage: flexibility (the size of an array need not
be known until the array is to be used).
Example: Ada arrays.
Get(List_Len);
declare
   List : array (1 .. List_Len) of Integer;
begin
   ...
end;
Array Categories
Fixed heap-dynamic: similar to fixed stack-dynamic,
i.e. subscript range and storage
binding are fixed after allocation:
Binding is done when requested by the program.
Storage is allocated from the heap.
Examples:
o C/C++ using malloc/free or new/delete.
o Fortran 95.
o In Java all arrays are fixed heap-dynamic.
o C#.
Array Categories
Heap-dynamic: binding of subscript ranges and
storage allocation is dynamic and can change any
number of times:
Advantage: flexibility, as arrays can grow or shrink during
program execution.
Examples:
C#:
ArrayList intList = new ArrayList();
intList.Add(nextOne);
Java has a similar class, but no subscripting (use
methods get()/set() instead).
Perl, JavaScript, Python, Ruby
Array Initialization
Some languages allow initialization at the time of
storage allocation:
C, C++, Java, C# example:
int list [] = {4, 5, 7, 83};
Arrays of strings in C and C++:
char *names [] = {"Bob", "Jake", "Joe"};
Java initialization of String objects:
String[] names = {"Bob", "Jake", "Joe"};
Ada initialization using arrow operator:
Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
Heterogeneous Arrays

A heterogeneous array is one in
which the elements need not be
of the same type.
Supported by:
Perl: any mixture of scalar types (numbers,
strings, and references).
JavaScript: dynamically typed language, so elements can be of
any type.
Python and Ruby: references to objects of
any type.
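As a small illustration of the Python case, the sketch below stores references to objects of several types in one list. The element values and names are hypothetical.

```python
# One list holding references to objects of different types.
mixed = [42, "forty-two", 3.14, [1, 2], None]

# Each element carries its own type, which is only known at run time.
type_names = [type(item).__name__ for item in mixed]
```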
Rectangular and Jagged
Arrays
A rectangular array is a multi-dimensioned array in
which all of the rows have the same number of
elements and all columns have the same number of
elements. Fortran, Ada, and C# support rectangular
arrays.
myArray[3,7]
A jagged matrix has rows with varying number of
elements:
Possible when multi-dimensioned arrays actually appear
as arrays of arrays
C, C++, C# and Java support jagged arrays.
myArray[3][7]
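The array-of-arrays view can be sketched in Python, where nested lists play the role of rows. This is only an illustration under that assumption; the names and sizes are hypothetical.

```python
# Rectangular shape: every row has the same number of elements.
rectangular = [[0] * 4 for _ in range(3)]   # 3 rows x 4 columns

# Jagged: an array of arrays whose rows differ in length.
jagged = [[1], [2, 3], [4, 5, 6]]

row_lengths_rect = [len(row) for row in rectangular]
row_lengths_jag = [len(row) for row in jagged]

# Element access uses one subscript per level, as in myArray[3][7].
rectangular[2][3] = 9
```

Building each row with its own list (rather than repeating one row object) keeps the rows independent of each other.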
Slice
A slice is some substructure of an
array: nothing more than a referencing
mechanism.
Only useful in languages that have array
operations.

Fortran 95 (also Perl, Python, Ruby, restricted
in Ada):
Integer, Dimension (10) :: Vector
Integer, Dimension (3, 3) :: Mat
Integer, Dimension (3, 3, 4) :: Cube
Vector (3:6) is a four-element array
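For comparison, a rough Python counterpart of the Fortran slice is sketched below. Note two differences that are properties of Python, not of the Fortran example: subscripts start at 0 with an exclusive upper bound, and slicing a list copies the elements instead of merely referencing them. The data values are hypothetical.

```python
# Fortran's Vector(3:6) uses 1-based, inclusive bounds; the matching
# Python slice is 0-based with an exclusive upper bound.
vector = list(range(1, 11))      # ten elements: 1..10
part = vector[2:6]               # the 3rd through 6th elements

# A column of a 3 x 3 matrix stored as a list of rows.
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
second_column = [row[1] for row in mat]
```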
Slices
Implementation of Arrays
Access function maps subscript expressions
to the address of an element in the array.

Single-Dimensional Arrays:
implemented as a block of adjacent memory
cells.
access function for single-dimensioned arrays:

address(list[k])
= address(list[lower_bound]) +
((k - lower_bound) * element_size)
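The single-dimension access function can be written out as a short calculation. This is a sketch only: the function name, the base address 1000, the lower bound 1, and the 4-byte element size are all assumed values for illustration.

```python
def element_address(base_address, k, lower_bound, element_size):
    """Address of list[k], given the address of list[lower_bound]."""
    return base_address + (k - lower_bound) * element_size

# Hypothetical layout: the array starts at address 1000, subscripts
# begin at 1, and each element occupies 4 bytes.
addr_first = element_address(1000, 1, 1, 4)
addr_fifth = element_address(1000, 5, 1, 4)
```

With these values the fifth element sits four element sizes past the first, at address 1016.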
Access Function for a Multi-
Dimensioned Array
Compile Time
Descriptors
Record types
A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by
names.
A record type in Ada:
type Emp_Rec_Type is record
First: String (1..20);
Mid: String (1..10);
Last: String (1..20);
Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;
Record Type

C, C++, C#: supported with the struct data
type. In C++ structures are minor variations
on classes.
In C# structures are related to classes, but
also quite different.
In C++ and C# structures are also used for
encapsulation.

Python, Ruby: implemented as hashes.


Implementation of Record
Types
Records vs. Arrays
Arrays are mostly used when:
the collection of data values is homogeneous.
values are processed in the same way.
order is important.

Records are used when:
the collection of data values is heterogeneous.
values are not processed in the same way.
order is not important.

Access to array elements can be slower than
access to record fields:
array subscripts are dynamic (the element address is computed at run time).
record field names are static (field offsets are known at compile time).
Pointer Types
A pointer type variable has a range of values
that consists of memory addresses and a
special value nil.
Provide the power of indirect addressing.
Provide a way to manage dynamic memory:
a pointer can be used to access a location in the area where
storage is dynamically created, i.e. the heap.
variables that are dynamically allocated on the heap are
heap-dynamic variables.

Pointer types are defined using a type operator:
C++: int *ptr = new int;
Pointer Operations
Two fundamental operations:
assignment.
dereferencing.

Assignment is used to set a pointer variable's value to
some useful address:
int *ptr = &counter; // indirect addressing.
int *ptr = new int; // heap-dynamic variable.

Dereferencing yields the value stored at the location
represented by the pointer's value. C++ uses an explicit
operation via the unary operator *:

j = *ptr; // sets j to the value located at ptr


Pointer Dereferencing
Problems with Pointers
Dangling pointers:
A pointer points to a heap-dynamic variable that has been
deallocated.
Dangerous: the location may be assigned to other
variables.

Lost heap-dynamic variable:
An allocated heap-dynamic variable that is no longer
accessible to the user program (often called garbage or
a memory leak):
Pointer p1 is set to point to a newly created heap-dynamic
variable.
Pointer p1 is later set to point to another newly created
heap-dynamic variable, without deallocating the first one.
Pointer in C/C++
Extremely flexible but must be used
with care:
Pointers can point at any variable regardless of
when or where it was allocated.
Used for dynamic storage management and
addressing.
Explicit dereferencing (*) and address-of (&)
operators.
Domain type need not be fixed:
void * can point to any type and can be type checked.
void * cannot be dereferenced.

Pointer arithmetic is possible.


Pointer Arithmetic in C/C++
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and
p[5]
*(p+i) is equivalent to stuff[i] and
p[i]
Reference Types
C++ includes a special kind of pointer type called a
reference type that is used primarily for formal
parameters:
Advantages of both pass-by-reference and pass-by-value.
No arithmetic on references.

Java extends C++'s reference variables and allows
them to replace pointers entirely:
References are handles to objects, rather than being
addresses.

C# includes both the references of Java and the
pointers of C++.
Evaluation of Pointers &
References
Problems due to dangling pointers and memory leaks.
Heap management can be complex and costly.
Pointers are analogous to gotos:
gotos widen the range of statements that can be executed
next.
pointers widen the range of cells that can be accessed by a
variable.

Pointers or references are necessary for dynamic data
structures, so we can't design a language without
them:
pointers are essential for writing device drivers.
references in Java and C# provide some of the capabilities of
pointers, without the hazards.
Chapter 6

Control Flow
Structure
Control Flow
Control flow = the flow of control,
or execution sequence, in a
program.
Levels of control flow:
1. Within expressions.
2. Among program statements.
3. Among program units.
Expressions
Expressions are the fundamental
means of specifying computations in
a programming language:
1. Arithmetic expressions.
2. Relational expressions.
3. Boolean expressions.
The control flow in expression
evaluation is determined by:
1. The order of operator evaluation:
Associativity;
Precedence.
2. The order of operand evaluation
Arithmetic Expressions
Arithmetic evaluation was one of
the motivations for the
development of the first
programming languages.
Arithmetic expressions consist of:
operators;
unary, binary, ternary.
operands;
parentheses;
function calls;
Arithmetic Expression: Design
Issues
Operator precedence rules?
Operator associativity rules?
Operator overloading?
Order of operand evaluation?
Operand evaluation side effects?
Type mixing in expressions?
Operator Precedence
Rules
The operator precedence rules for
expression evaluation define the order
in which adjacent operators of
different precedence levels are
evaluated.
Typical precedence levels:
parentheses;
unary operators;
** (where supported by the language);
*, /
+, -
Operator Associativity
Rules
The operator associativity rules for
expression evaluation define the order in
which adjacent operators with the same
precedence level are evaluated.
Typical associativity rules:
Left to right, except **, which is right to left.
Sometimes unary operators associate right
to left (e.g., FORTRAN).

Precedence and associativity rules can
be overridden with parentheses.
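The precedence and associativity rules can be tried out in any language that has a ** operator. The lines below are a small Python sketch; the rules shown are Python's, which follow the typical pattern described here, and the variable names are hypothetical.

```python
# Precedence: * binds tighter than +.
a = 2 + 3 * 4            # 2 + (3 * 4)

# Associativity: - is left to right, ** is right to left.
b = 10 - 4 - 3           # (10 - 4) - 3
c = 2 ** 3 ** 2          # 2 ** (3 ** 2)

# In Python, ** binds tighter than a unary minus on its left.
d = -2 ** 2              # -(2 ** 2)

# Parentheses override both rules.
e = (2 + 3) * 4
f = (2 ** 3) ** 2
```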
Operator Overloading
Operator overloading = the use of an
operator for more than one purpose.
Some are common (e.g., + for int and float).
Some are potential trouble (e.g., * and & in C and
C++):
Loss of readability.
Loss of compiler error detection:
omission of an operand should be a
detectable error.
Can be avoided by introduction of new
symbols: e.g., Pascal's div for integer division.
Operands Evaluation &
Evaluation Order
1. Variables:
fetch the value from memory.
2. Constants:
sometimes a fetch from memory;
sometimes the constant is in the machine
language instruction.
3. Parenthesized expressions:
evaluate all operands and operators first.
4. Function calls:
potential for side effects, so operand evaluation
order is relevant.
Functional Side Effects
Functional side effects: when a function
changes a two-way parameter or a non-
local variable.
Problem with functional side effects:
When a function referenced in an expression
alters another operand of the expression:

a = 10;
/* assume that fun changes its parameter */
b = a + fun(a);
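The effect of operand evaluation order on b = a + fun(a) can be imitated in Python, which evaluates operands left to right. In this sketch a one-element list stands in for a two-way parameter, and fun is a hypothetical function that doubles its argument in place and returns 5.

```python
def fun(cell):
    """Hypothetical function with a side effect: doubles cell[0] in place."""
    cell[0] *= 2
    return 5

# Variable operand read first: 10 + 5.
a = [10]
left_first = a[0] + fun(a)

# Function call first: the other operand now sees the changed value, 5 + 20.
a = [10]
call_first = fun(a) + a[0]
```

The two orders give different sums, which is exactly why a language definition has to say something about operand evaluation order.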
Functional Side Effects: Possible
Solutions
1. Write the language definition to disallow
functional side effects:
No two-way parameters in functions
No non-local references in functions
Advantage: it works!
Disadvantage: inflexibility of one-way
parameters and lack of non-local references
2. Write the language definition to demand
that operand evaluation order be fixed
Disadvantage: limits some compiler
optimizations
Java requires that operands appear to be
evaluated in left-to-right order
Referential Transparency
Referential transparency: an expression can be
substituted with its value, without changing the
effects of the program.
Functional side effects violate referential transparency.

Advantages of referential transparency:
Program semantics is much easier to understand.

Programs written in pure functional programming
languages are referentially transparent:
no variables, so functions cannot have state.
value of function depends only on its parameters and
global constants.
Type Conversions
A narrowing conversion is one that converts an
object to a type that cannot include all of the
values of the original type e.g., float to int.
A widening conversion is one in which an
object is converted to a type that can include
at least approximations to all of the values of
the original type, e.g., int to float.
Implicit type conversions, i.e. coercions.
Explicit type conversions, i.e. casts in C/C++/Java:
C: (int)angle
Ada: Float (Sum)
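Narrowing, widening, and coercion can be observed with a few conversions. This is a Python sketch for illustration; the precision loss shown assumes Python's float is a 64-bit IEEE 754 double, and the variable names are hypothetical.

```python
# Narrowing: float to int discards the fractional part.
narrowed = int(3.99)

# Widening: int to float keeps at least an approximation, but large
# integers may not be representable exactly in a 64-bit double.
big = 2 ** 53 + 1
widened = float(big)
round_trip_exact = (int(widened) == big)

# Coercion in a mixed-mode expression: the int operand is widened.
mixed = 1 + 2.5
```

The round trip fails for big, which is why a widening conversion only promises approximations.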
Mixed-Mode Expressions
A mixed-mode expression is one that has
operators with operands of different types.
Type coercions are used in mixed-mode expressions to
convert all operands to the same type.

Disadvantage of coercions:
They decrease the type error detection ability of the
compiler.

Scenarios:
All numeric types are coerced in expressions, using
widening conversions (most languages).
In Ada, there are virtually no coercions in expressions.
Relational Expression
Relational Expressions
Use relational operators and operands of various types.
Evaluate to some Boolean representation.
Always lower precedence than the arithmetic
operators.
Operator symbols used vary somewhat among
languages (!=, /=, .NE., <>, #).

JavaScript and PHP have two additional relational
operators, === and !==:
Similar to their cousins, == and !=, except that they do
not coerce their operands.
Ex: "7" == 7 is true, but "7" === 7 is false.
Boolean Expression
Boolean Expressions in
C/C++
C versions prior to C99 have no Boolean type:
use int type with 0 for false and nonzero for true.

Odd characteristic of C/C++ Boolean expressions:
arithmetic expressions can be used for Boolean expressions.
a < b < c is a legal expression, but the result is not what you
might expect:
Left operator is evaluated, producing 0 or 1.
The evaluation result is then compared with the third operand.

Disadvantages:
loss in readability.
loss in type error detection
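The surprising result of a < b < c in C can be reproduced step by step. The sketch below is written in Python and only mimics the C evaluation described above; c_style_chain is a hypothetical helper, not a C library function. Python itself treats the chained form differently, which is shown for contrast.

```python
def c_style_chain(a, b, c):
    """Mimic C's evaluation of a < b < c: compare the 0/1 result with c."""
    left = 1 if a < b else 0       # left operator yields 0 or 1
    return 1 if left < c else 0    # that result is compared with c

# 3 < 2 is false (0), and 0 < 1 is true, so the C-style result is 1.
c_result = c_style_chain(3, 2, 1)

# Python instead treats a < b < c as (a < b) and (b < c).
python_result = 3 < 2 < 1
```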
Simple Assignment
Statements
The general syntax:
<target_var> <assign_operator> <expression>
The assignment operator:
= in FORTRAN, BASIC, the C-based languages
:= in ALGOLs, Pascal, Ada

Operator sign = can be bad
when it is overloaded for the
relational operator for equality
(that's why the C-based languages
use == as the relational operator)
Statement-Level
Control Structures
Structured Control Flow
A program is called structured if the flow of control is
evident from the syntactic/static structure of the program.
Structured programming allows the programmer to be
able to reason about the behaviour of a program by just
analyzing the program text:
Eliminates some of the complexity that arises when programs
become large.
Common patterns of control flow that are used over and over by the
programmers are integrated in special control statements in the
language:
Selection statements.
Iteration statements.
Statements that support repetition and conditional execution are
called control statements or control structures.
Selection Statements

A selection statement provides
the means of choosing between
two or more paths of execution.
Two general categories:
Two-way selectors (if-then-else)
Multiple-way selectors (switch or
case).
If-statement
Syntax: if (Expression) Statement

Semantics: The expression must be of type boolean.
If it evaluates to true, the given statement is executed, otherwise not.

Note that there is only one statement. To execute more than one
statement conditionally, a block statement is to be used.

The Code Conventions by Sun (which belong to our coding standards)
require that a block statement always be used, regardless of the
number of statements that are to be executed conditionally.

In addition, our coding standard expects the following layout:

if (Expression) {
    Statement1
    Statement2
    ...
}
If-else Statement
Syntax: if (Expression) Statement1 else Statement2
Semantics: The expression must be of type boolean.
If it evaluates to true, Statement1 is executed, otherwise Statement2.

Again, blocks are to be used to compound several statements
into Statement1 or Statement2.

Our coding standard requires the following layout:

if (Expression) {
    Statement1a
    Statement1b
    ...
} else {
    Statement2a
    Statement2b
    ...
}


If-else Statement (cont.)
if-else statements can be chained in the following form:

if (Expression1) {
    Statement1a
    Statement1b
    ...
} else if (Expression2) {
    Statement2a
    Statement2b
    ...
} else if (Expression3) {
    Statement3a
    Statement3b
    ...
} else {
    Statement4a
    Statement4b
    ...
}

The conditional expressions are evaluated in order, beginning with Expression1, until the
first expression that evaluates to true is found. Then the corresponding statement
sequence is executed, or, if none of the expressions evaluates to true, the statement
sequence of the final else part.
Do-Statement
Syntax: do Statement while (Expression);
Semantics: The statement is executed first and then the expression,
which must be of boolean type, is evaluated.
As long as the expression remains true, the statement is executed
repeatedly.

Note: The while statement evaluates the condition first and then, as
long as it remains true, executes the statement repeatedly. In the case of
the do statement, the condition is not evaluated before the first
iteration is performed.

The following layout conforms to our coding standard:

do {
    Statement1
    Statement2
    ...
} while (Expression);
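The test-after-body behaviour of the do statement can be imitated in a language without one. The sketch below uses Python, which has no do statement, so the loop is emulated with while True and a break; count_digits is a hypothetical example function.

```python
def count_digits(n):
    """Count decimal digits of a non-negative integer, do-while style."""
    digits = 0
    while True:            # do {
        digits += 1        #     body runs before any test
        n //= 10
        if not n > 0:      # } while (n > 0);
            break
    return digits
```

Because the body always runs once, count_digits(0) reports one digit, which a plain while loop testing n > 0 first would miss.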
For Statement
Syntax:
for (InitStatement; Expression1; Expression2) Statement
InitStatement, Expression1, and Expression2 are optional.
Semantics:
The for loop comes close to the following while construct:

{
    InitStatement;
    while (Expression1) {
        Statement
        Expression2;
    }
}

Layout according to our coding standard:

for (InitStatement; Expression1; Expression2) {
    Statement1
    Statement2
    ...
}
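The correspondence between the for loop and its while form can be checked on a small counting loop. This Python sketch spells out the while form with the four parts labelled and compares it with Python's own counting loop; the function names are hypothetical. One reason the equivalence is only approximate in C-like languages is that a continue in the body would skip Expression2 in the while form.

```python
def sum_with_while(limit):
    """The loop for (i = 0; i < limit; i++) total += i; as a while loop."""
    total = 0
    i = 0                  # InitStatement
    while i < limit:       # Expression1
        total += i         # Statement
        i += 1             # Expression2
    return total

def sum_with_for(limit):
    """The same computation with Python's counting loop."""
    total = 0
    for i in range(limit):
        total += i
    return total
```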
While Statement
Syntax: while (Expression) Statement
Semantics: The expression must be of type boolean.
If it evaluates to false, the given statement is skipped.
Otherwise it is executed and afterwards the expression is
evaluated again.
If it is still true, the statement is executed again. This is
continued until the expression evaluates to false.

The following layout conforms to our coding standard:

while (Expression) {
    Statement1
    Statement2
    ...
}
Switch-case Statement
Syntax: switch (Expression) Statement
Semantics: The expression is computed and must be of integer type (byte, short, or int)
or of type char. If a matching case label is found within Statement, the execution
jumps to that point. If not, the execution continues at the default label, if
provided. Otherwise, Statement is skipped.
Syntax of case labels:
case ConstantExpression:
default:
The labels come in front of a statement and multiple labels are permitted.
The break statement allows leaving a switch statement. Syntax: break;

Layout according to our coding standard:

switch (Expression) {
case Constant1:
    StatementSequence1a
    break;

case Constant2:
case Constant3:
    StatementSequence2
    break;

default:
    StatementSequence3
    break;
}
Two-Way Selection
Statements
General form:
if control_expression then
clause
else
clause
Nested selectors: which if is paired with the
else?
if (sum == 0)
    if (count == 0)
        result = 0;
else
    result = 1;
Nested Selector
Static semantics rule (C/C++/Java/C#):
else matches with the nearest if.

To force an alternative semantics, compound
statements may be used:
if (sum == 0) {
    if (count == 0)
        result = 0;
}
else result = 1;
Perl requires that all then and else clauses be
compound.
Nested Selector

Statement sequences as clauses: Ruby

if sum == 0 then
  if count == 0 then
    result = 0
  else
    result = 1
  end
end
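The two possible pairings of the else can be compared side by side. In the Python sketch below indentation makes each pairing explicit; the function names are hypothetical, and result starts as None to show the cases where no branch assigns it.

```python
def nearest_if(total, count):
    """else paired with the inner if (the C/C++/Java/C# rule)."""
    result = None
    if total == 0:
        if count == 0:
            result = 0
        else:
            result = 1
    return result

def outer_if(total, count):
    """else paired with the outer if (the compound-statement version)."""
    result = None
    if total == 0:
        if count == 0:
            result = 0
    else:
        result = 1
    return result
```

For total == 0 and count == 5 the first version assigns 1 while the second assigns nothing, so the pairing rule changes program behaviour.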
