
Computational linguistics with the Zen attitude

Gérard Huet

Department of Sanskrit Studies


University of Hyderabad, November 2013

Purpose of the course

- Crash course on functional programming
- Introduction to the Objective Caml language
- Hands-on experience with the Zen library
- Using Zen for Lexicon representation
- Using Zen for Morpho-phonetics computation
- Using Zen for Segmentation and Parsing
- Understanding the Sanskrit Heritage Platform
- A glimpse of research in relational programming

Prerequisites

- Familiarity with a Unix/Linux/OSX workstation
- Open-mindedness
- Eagerness
- No mathematics prerequisite
- No linguistics prerequisite
- No programming prerequisite
- Actually, forget your programming skills!
- Warning. Objective Caml is addictive.

Various notions of computation

- Sequential computation
- Parallel computation
- Deterministic computation
- Non-determinism
- Distributed computation, networking
- Client-server, RPC/CGI, Web services

Various notions of (sequential) programming

- Imperative programming
- Applicative programming
- Functional programming
- Modularity
- Object-oriented programming
- Logic programming
- Relational programming

Arithmetic computation

    5+7
    3*(5+7)
    3*(5+7)+((5+7)/2)

- shopkeeper calculator
- inside-out (bottom-up)
- no sharing
- deterministic (unique value)
- left-to-right or right-to-left

Algebraic computation of expressions


    0*(5678*56576777765)
    0*x=0

- outside-in (top-down)
- pattern-matching
- left-to-right vs right-to-left
- order matters for efficiency

    let x=5+7 in 3*x+x/2

- sharing the computation of 5+7
- x is a bound or local variable

    3*x+x/2

- x has no meaning outside its scope
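The effect of sharing can be sketched as follows, in standard OCaml syntax rather than the syntax variant used on the slides. The names `unshared` and `shared` are illustrative and not part of the course material.

```ocaml
(* Without sharing, the subexpression 5 + 7 is written and computed twice. *)
let unshared = 3 * (5 + 7) + ((5 + 7) / 2)

(* With a local binding, 5 + 7 is computed once and named x;
   x is bound only inside the expression after "in". *)
let shared = let x = 5 + 7 in 3 * x + x / 2

let () = Printf.printf "%d %d\n" unshared shared
```

Both expressions denote the same value; only the amount of work differs.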

Environment of variable declarations

    value x=5+7;

- x is a free or global variable
- x has meaning in the new environment
- similar to anuvṛtti!

    3*x+x/2;

Algorithms
    let f x = 3*x+x/2 in f(5+7);
    let f = fun x -> 3*x+x/2 in f(5+7);
    f(5+7);
    value f = fun x -> 3*x+x/2;
    f(5+7);
    value f x = 3*x+x/2;
    value g y = 3*y+y/2;
    f(x); f x; g y; x f;

Functional calculus
It is very important to understand the notion of static scoping. LISP got it wrong.

Perlis' law: Someone's free variables are someone else's bound variables.

The use of concrete strings for names of variables gives rise to possible hiding of global variable bindings by homophonic bound variables (hole in the scope - maṇḍūkapluti). This can be avoided in the abstract calculus by the mechanism of de Bruijn's indexes.

These notions have been thoroughly investigated by logicians since 1940 (under the name of λ-calculus) and by theoretical computer scientists since 1970. We now understand that λ-calculus is the basic algebra for functional computation. It is the basis of computability theory, the kernel of programming languages, but also of proof theory, and thus is the core of Informatics as the science of Constructive Mathematics. In its untyped version, it is Turing-complete. In its typed versions, it gives constructive alternatives to Set theory, called collectively Type theory, an active research area.
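Static scoping and the hiding of a global binding can be sketched as follows, in standard OCaml syntax. The names `add_x` and `result` are illustrative only.

```ocaml
let x = 10

(* The free variable x in the body refers to the binding above,
   fixed once and for all when add_x is defined. *)
let add_x y = x + y

(* A new global binding with the same name hides the old one from here on,
   but it does not change what add_x sees. *)
let x = 100

(* Under static scoping this is 10 + 1; a dynamically scoped language
   would look up the most recent x and compute 100 + 1 instead. *)
let result = add_x 1

let () = Printf.printf "%d %d\n" result x
```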

Algorithms vs functions
- mathematical functions are input-output pairs
- algorithms tell how to compute output from input
- algorithms are constructive functions
- possibly many algorithms for a given function
- recipe vs ingredients for a dish
- maths says: you can get khīr from cow and bādāma
- but you need to milk the cow to get kṣīra, and to grind the bādāma to get piṣṭa, then mixing, heating, lots of turning the ladle...
- Functional programming is designing algorithms
- Mathematics is useless for cooking as well as programming
- Programming is more precise and relevant than classical maths
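As a small illustration of "many algorithms for a given function", here is a sketch in standard OCaml syntax of two ways to compute the sum 0 + 1 + ... + n. The function names are invented for this example.

```ocaml
(* Algorithm 1: follow the inductive definition, one addition per step. *)
let rec sum_rec n = if n <= 0 then 0 else n + sum_rec (n - 1)

(* Algorithm 2: use the closed formula, a constant number of operations. *)
let sum_formula n = if n <= 0 then 0 else n * (n + 1) / 2

let () = Printf.printf "%d %d\n" (sum_rec 100) (sum_formula 100)
```

Both define the same input-output pairs, but they are different recipes with different costs.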

Basic data types


- We saw that type int is predefined, with arithmetic functions +, *, /, etc.
- type bool has two values, True and False
- type unit has a unique value, denoted ()
- type float provides floating point values such as 0.5 and functions +. etc.

    value pi = 4.0 *. atan 1.0;

- type char has the 256 values of a byte such as 'a' (8-bit ASCII)
- type string provides vectors of chars such as "Hello world"

    "Hello " ^ "world";

- equality is defined polymorphically on all base data types
- NB. We shall use neither floats nor strings in Zen!

Inductive notions
Defining equations for the factorial function:

    fact 0 = 1
    fact (n+1) = (n+1) * (fact n)

We may now compute:

    fact 5 = 5 * (fact 4)
    fact 4 = 4 * (fact 3)
    fact 3 = 3 * (fact 2)
    fact 2 = 2 * (fact 1)
    fact 1 = 1 * (fact 0) = 1
    fact 2 = 2
    fact 3 = 6
    ...
    fact 5 = 120

Recursive definitions

    let rec fact n =
      if n = 0 then 1 else n * fact(n-1)
    in fact 5;

This expression gives rise to the recursive definition:

    value rec fact n = if n = 0 then 1 else n * fact(n-1);

- Note that the scope of fact covers its body, because of rec
- Evaluation rule of if e then e1 else e2: e is evaluated first, then only the selected branch
- Thus the operator if can't be defined as the function below, since applying a function evaluates all of its arguments

    fun x y z -> if x then y else z

    value rec loop x = loop x;

- recursive algorithms may not terminate!
- but functional values are evaluated lazily...
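The difference between the built-in conditional and a function that imitates it can be sketched as follows, in standard OCaml syntax. The names `my_if`, `lazy_if`, `tick` and `evaluated` are invented for this example; wrapping the branches in `fun () -> ...` is one common way to delay their evaluation, not something prescribed by the slides.

```ocaml
let rec loop x = loop x

(* A function version of if: both a and b are evaluated before the call. *)
let my_if c a b = if c then a else b

(* Count how many branch expressions actually get evaluated. *)
let evaluated = ref 0
let tick v = incr evaluated; v

let strict_result = my_if true (tick 1) (tick 2)
let after_strict = !evaluated          (* both branches were evaluated *)

(* Passing functions delays the work: a function body runs only when applied. *)
let lazy_if c a b = if c then a () else b ()

(* The second branch would loop forever, but it is never applied. *)
let lazy_result = lazy_if true (fun () -> tick 1) (fun () -> loop 0)
let after_lazy = !evaluated            (* only one more branch was evaluated *)
```

Replacing `lazy_if` by `my_if true 1 (loop 0)` would never return, which is why the conditional has to be a primitive with its own evaluation rule.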

Programming by top-down pattern matching


    match bool_exp with [ True -> e1 | False -> e2 ]

Now this generalizes the if expression, thus:

    value rec fact n = match n = 0 with
      [ True -> 1 | False -> n * fact(n-1) ];

alternatively, pattern-matching on integers:

    value rec fact n = match n with [ 0 -> 1 | n -> n * fact(n-1) ];

or still better, generalizing variables with patterns:

    value rec fact = fun [ 0 -> 1 | n -> n * fact(n-1) ];

This gives a specially elegant style of recursive programming over data structures, directly corresponding to the inductive definition of fact.
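For readers trying these definitions in a plain OCaml toplevel without the Zen prelude, the last version might be written as follows in standard syntax, where `let` replaces `value`, `function` replaces `fun [ ... ]`, and the enclosing brackets disappear.

```ocaml
(* Pattern-matching definition of factorial, standard OCaml syntax.
   As on the slides, a negative argument is not handled and would recurse
   until the stack is exhausted. *)
let rec fact = function
  | 0 -> 1
  | n -> n * fact (n - 1)

let () = Printf.printf "fact 5 = %d\n" (fact 5)
```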

Pairing
If a and b are types, one can form the product type (a*b) of pairs of elements (x,y) with x:a and y:b. The two projections are named fst and snd.
    (0,True);
    value fst (x,y) = x
    and snd (x,y) = y;
    value apply (f,x) = f x;
    value foo (x,((y,z),u)) = u (x z) (z (y,y u));
    value bar (x,((y,z),u)) = u (x z) (z (y,y x));

Exercise. Give the interpretation of * as a propositional connective.

Problem. Ponder the paradoxical type of loop above.

The list datatype


In OCaml, the list datatype is predefined with two constructors, [] for the empty list, and infix :: for the list constructor. The list separator is ;, appending is infix @. Thus:

    value l1 = [ 2; 7; 32; 1; 0 ];
    value l2 = [ 79 :: l1 ];

lists are polymorphic:

    value l3 = [ "University"; "of"; "Hyderabad" ];

but must be homogeneous:

    [ True; 0; "foo" ];

Pattern-matching on lists (with _ as catch-all):

    value rec length = fun
      [ [] -> 0
      | [ _ :: l ] -> 1+length l
      ];
    length l1;

An example of functional argument:

    value rec map f = fun
      [ [] -> []
      | [ x :: l ] -> [ f x :: map f l ]
      ];
    map (fun n -> 2*n+1) l2;
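The same list definitions can be sketched in standard OCaml syntax, where list patterns are written without the enclosing brackets. The name `odds` is illustrative only.

```ocaml
let l1 = [2; 7; 32; 1; 0]
let l2 = 79 :: l1

(* Length by recursion on the structure of the list. *)
let rec length = function
  | [] -> 0
  | _ :: l -> 1 + length l

(* map takes a function as argument and applies it to every element. *)
let rec map f = function
  | [] -> []
  | x :: l -> f x :: map f l

let odds = map (fun n -> 2 * n + 1) l2

let () = Printf.printf "%d elements\n" (length odds)
```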

An example of mutual recursion: insertion sort


    value rec insert elt lst = match lst with
      [ [] -> [ elt ]
      | [ head :: tail ] ->
          if elt <= head then [ elt :: lst ]
          else [ head :: insert elt tail ]
      ]
    and sort lst = match lst with
      [ [] -> []
      | [ head :: tail ] -> insert head (sort tail)
      ];
    sort l1;
    sort l3;
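A standard-syntax sketch of the same insertion sort follows. It relies on OCaml's polymorphic comparison `<=`, so the same code sorts integers and strings; strings are compared byte by byte, which places capital letters before lower-case ones.

```ocaml
(* Insert elt at its place in an already sorted list. *)
let rec insert elt lst =
  match lst with
  | [] -> [elt]
  | head :: tail ->
      if elt <= head then elt :: lst else head :: insert elt tail

(* Sort the tail first, then insert the head into the result. *)
let rec sort = function
  | [] -> []
  | head :: tail -> insert head (sort tail)

let sorted_ints = sort [2; 7; 32; 1; 0]
let sorted_words = sort ["University"; "of"; "Hyderabad"]
```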

Defined recursive types

enumerated types:

    type sex = [ Male | Female | Kliba ];

variant types:

    type number = [ Int of int | Float of float | Error ];

polymorphic types:

    type option 'a = [ Some of 'a | None ];
    Some 0;
    Some "Panini";

    value optional f = fun [ None -> () | Some x -> f x ];
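A typical use of the option type is to make a partial operation total. The sketch below uses standard OCaml syntax, where `option` is already predefined; `safe_div` is a hypothetical helper invented for this example.

```ocaml
(* Division that returns None instead of failing on a zero divisor. *)
let safe_div a b = if b = 0 then None else Some (a / b)

(* Run f on the content of an option, do nothing on None. *)
let optional f = function
  | None -> ()
  | Some x -> f x

(* Only the first call prints anything. *)
let () = optional (Printf.printf "quotient: %d\n") (safe_div 7 2)
let () = optional (Printf.printf "quotient: %d\n") (safe_div 7 0)
```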

Binary trees

    type bintree = [ Null | Bin of (bintree * bintree) ];
    value rec height = fun
      [ Null -> 0
      | Bin (left,right) -> 1 + max (height left) (height right)
      ];
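A standard-syntax sketch of recursion over binary trees follows, counting one level for each Bin node so that the empty tree has height 0. The function `size` and the sample tree `t` are additions made for illustration.

```ocaml
type bintree = Null | Bin of bintree * bintree

(* Length of the longest path from the root down to a Null. *)
let rec height = function
  | Null -> 0
  | Bin (left, right) -> 1 + max (height left) (height right)

(* Number of Bin nodes in the tree. *)
let rec size = function
  | Null -> 0
  | Bin (left, right) -> 1 + size left + size right

(* A root whose left child is a single node and whose right child is empty. *)
let t = Bin (Bin (Null, Null), Null)

let () = Printf.printf "height %d, size %d\n" (height t) (size t)
```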

Binary contexts
    type binzip =
      [ Top
      | Left of (binzip * bintree)
      | Right of (bintree * binzip)
      ];

    (* [zip_up : binzip -> bintree -> bintree] *)
    value rec zip_up z bt = match z with
      [ Top -> bt
      | Left (up,br) -> zip_up up (Bin (bt,br))
      | Right (bl,up) -> zip_up up (Bin (bl,bt))
      ];
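To see what a context is for, here is a sketch in standard OCaml syntax. A context describes a tree with a hole, and `zip_up` plugs a subtree into that hole and rebuilds the whole tree. The sample values `ctx`, `rebuilt` and `edited` are invented for this example.

```ocaml
type bintree = Null | Bin of bintree * bintree

type binzip =
  | Top
  | Left of binzip * bintree    (* hole is a left child; right sibling stored *)
  | Right of bintree * binzip   (* hole is a right child; left sibling stored *)

(* Plug bt into the hole of z, rebuilding the tree up to the root. *)
let rec zip_up z bt =
  match z with
  | Top -> bt
  | Left (up, br) -> zip_up up (Bin (bt, br))
  | Right (bl, up) -> zip_up up (Bin (bl, bt))

(* The hole is the right child of the left child of the root. *)
let ctx = Right (Null, Left (Top, Null))

(* Plugging Null back gives the original tree ... *)
let rebuilt = zip_up ctx Null

(* ... and plugging another subtree gives a locally edited copy. *)
let edited = zip_up ctx (Bin (Null, Null))
```

The edit touches only the path from the hole to the root; the stored siblings are reused as they are.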

More data structures

- records:

    type id_record = { name: string; age: int; sex: sex };
    value namo = { name="Modi"; age=63; sex=Male };
    namo.name;

- extensible records, giving some object-oriented features
- variant fields, reference types (effects, imperative programming)
- vectors, arrays, streams
- non-local control structure (exception, raise, try ... with ...)
- threads
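Three of these features (records, references and exceptions) can be sketched together in standard OCaml syntax. The exception `Unknown_person` and the functions `next`, `age_of` and `lookup` are hypothetical names invented for this example.

```ocaml
type sex = Male | Female | Kliba
type id_record = { name : string; age : int; sex : sex }

let namo = { name = "Modi"; age = 63; sex = Male }

(* A reference is a mutable cell: this is where imperative effects come in. *)
let counter = ref 0
let next () = incr counter; !counter
let first = next ()
let second = next ()

(* An exception interrupts the normal flow of control ... *)
exception Unknown_person of string

let age_of name people =
  match List.find_opt (fun p -> p.name = name) people with
  | Some p -> p.age
  | None -> raise (Unknown_person name)

(* ... until a handler catches it. *)
let lookup name =
  try age_of name [namo] with Unknown_person _ -> -1
```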

Fundamental properties of ML languages


- Memory allocation/deallocation is implicit/invisible, due to the garbage collector. No explicit pointer manipulation, no aliasing headaches.
- Polymorphic type-checking is ensured by the type-checker incorporated in the compiler, which generates for you the most general polymorphic type, relieving the programmer of the burden of giving explicit type specifications, while ensuring at static time the integrity of data structures (no nil pointer, no illegal operation, no broken pipe, no core dump, no indexing out of bounds).
- The top-level interpreter allows easy prototyping and debugging.
- Don't you wish C had all these features?
- Other modern programming languages have similar advantages, such as Haskell

Modules and functors

- module specifications
- module declarations
- parametric modules (functors)
- separate compilation
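A minimal sketch of a specification, a functor and its application follows, in standard OCaml syntax. The names `ORDERED`, `Sort`, `IntDesc` and `DescSort` are all invented for this example; the functor simply parameterizes insertion sort by an ordering.

```ocaml
(* Module specification: a type together with an ordering on it. *)
module type ORDERED = sig
  type t
  val leq : t -> t -> bool
end

(* Parametric module: insertion sort for any ORDERED module. *)
module Sort (O : ORDERED) = struct
  let rec insert elt = function
    | [] -> [elt]
    | head :: tail as lst ->
        if O.leq elt head then elt :: lst else head :: insert elt tail

  let rec sort = function
    | [] -> []
    | head :: tail -> insert head (sort tail)
end

(* Module declaration: integers in decreasing order. *)
module IntDesc = struct
  type t = int
  let leq a b = a >= b
end

(* Functor application yields a sorting module specialized to that order. *)
module DescSort = Sort (IntDesc)

let sorted = DescSort.sort [2; 7; 32; 1; 0]
```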

Syntax facilities
It is possible to describe formal grammars in OCaml, and compute along the corresponding abstract syntax, in the manner of yacc. This facility leads to type-safe macro-processing and meta-programming. It may be applied as an OCaml preprocessor (camlp4), leading to syntactic variations of the language.

Beware. Zen uses such a syntax variant. This is triggered by the execution of a prelude to the OCaml parser, obtained at invocation time by loading an init file in the directory from which ocaml is invoked. See ZEN/ML/.ocamlinit.

The online reference manual is at http://caml.inria.fr/. You will have to understand a few minor syntactic discrepancies:

- let vs value
- true vs True
- hd :: tl vs [hd :: tl]
- pat1 | pat2 vs [pat1 | pat2]

I/O, system calls and libraries

- I/O on channels follows Unix (Posix)
- full library Unix of system calls
- C linking easy
- automatic marshalling of values
- compact, architecture-independent, fast-loading, sharing-preserving persistent data structures
- optimizing compiler with C-like performance of native code
- byte-code version allowing top-level loop and debugging
- many libraries and a rich user community
- OCaml is definitely the programming language of choice for discriminating hackers
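Marshalling can be tried out with the standard `Marshal` module. The sketch below, in standard OCaml syntax, serializes a pair whose two components are the same list and reads it back; the value names are illustrative. Note that the type annotation on the result is the programmer's responsibility, since unmarshalling is not checked by the type-checker.

```ocaml
(* One list, referenced twice inside a pair. *)
let numbers = List.init 3 (fun i -> i + 1)
let pair = (numbers, numbers)

(* Serialize to a string; the empty flag list keeps sharing (the default). *)
let data = Marshal.to_string pair []

(* Read the value back, starting at offset 0. The expected type must be
   given by hand and must match what was written. *)
let (a, b) : int list * int list = Marshal.from_string data 0

(* Structural equality holds, and the two components are still one
   physical list after the round trip. *)
let () = Printf.printf "%b %b\n" (a = numbers) (a == b)
```

Writing `data` to a channel instead of keeping it in memory gives a simple form of persistent storage.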

Homework for Friday

- install ocaml from http://caml.inria.fr or from Madam
- install ZEN from http://yquem.inria.fr/~huet/ZEN/ or from Madam
- read UoH1.pdf from http://yquem.inria.fr/~huet/UoH/ or from Madam
- cwd ZEN/ML
- ocaml
- at the interactive loop, type in the examples from the course slides
- read first 5 pages of ZEN/DOC/zen.pdf
