
C → Haskell,
or Yet Another Interfacing Tool

Manuel M. T. Chakravarty
School of Computer Science and Engineering
University of New South Wales, Sydney
chak@cse.unsw.edu.au
www.cse.unsw.edu.au/~chak/

Abstract. This paper discusses a new method for typed functional languages to access libraries written in a lower-level language. More specifically, it introduces an interfacing tool that eases Haskell access to C libraries. The tool obtains information about the C data type definitions and function signatures by analysing the C header files of the library. It uses this information to compute the missing details in the template of a Haskell module that implements a Haskell binding to the C library. Hooks embedded in the binding file signal where, which, and how C objects are accessed from Haskell. The Haskell code in the binding file determines Haskell type signatures and marshaling details. The approach is lightweight and does not require an extra interface description language.

1 Introduction

The complexity of modern software environments frequently requires interoperability between software components that are coded in different programming languages. Thus, the lack of access to libraries and components implemented in another language severely restricts the scope of a programming language. It is, hence, not surprising that a number of methods have been proposed for integrating foreign code into functional programs, and in particular, for using imperative code with Haskell [13]. These approaches range from simple inline calls of C routines [12] to sophisticated COM support [6,5].
Interface generators in these previous approaches relied on an annotated interface in one of the two languages or used a dedicated language for the interface specification. Another interesting, yet unexplored approach is the combined use of interface specifications in both languages for expressing complementary information. This is especially attractive for the most frequent application of a foreign language interface (FLI): access to existing libraries implemented in the language C. Interfaces to operating system services, as well as to system-level and higher-level libraries, are usually available from C, which, due to its simplicity, makes adapting the interface to another language relatively easy. The method presented here uses existing C header files in conjunction with additional high-level interface information, expressed in Haskell, plus a small set of binding hooks to generate a Haskell binding to a C library.

* This work was conducted, in large part, while working at the Institute of Information Sciences and Electronics, University of Tsukuba, Japan.
Experience with the interfacing tool c→hs shows that the outlined approach is practically feasible. By reading verbatim headers of a C library, c→hs has exactly the same information as the C compiler when translating C code using the same library. Therefore, it has sufficient knowledge to call the library's C functions and to read from or write to components of data structures; it can even access C structures while doing all of the address arithmetic in Haskell.
In summary, the original features of the presented approach are the following:
• Tool support based on the simultaneous and complementary use of an interface specification in the foreign and the host language.
• c→hs was the first tool using pristine C header files for Haskell access to C libraries.
• Supports consistency checking of the Haskell interface specification against existing headers of the C library.
• Simple binding hooks and lightweight tool support: no complex interface language is required.

The tool c→hs is implemented and available for public use.
The remainder of the paper is structured as follows. Section 2 introduces the concepts underlying the present approach and discusses related work. Section 3 details the binding hooks supported by c→hs. Section 4 describes the marshaling support provided by the library C2HS. Section 5 outlines the implementation and current experience with c→hs. Finally, Section 6 concludes.

2 Concepts and Related Work

Symmetric approaches to language interoperability like IDL/Corba [10] define interfaces in a special-purpose interface language and treat the interfaced languages as peers. In contrast, the present work is optimised for an asymmetric situation where libraries implemented in a lower-level language (called the foreign language) are accessed from a typed, high-level functional language (called the host language). In this asymmetric setting, the interface information provided by the foreign language library is generally insufficient for determining a corresponding host language interface. Assertions and invariants that are only informally specified in the foreign interface will have to be formalised for the host interface, a task that clearly requires user intervention. A second aspect of the asymmetry of this situation is that the foreign library, including its interface specification, usually exists before the host language interface is implemented; furthermore, the foreign interface is usually developed further independently of, and concurrently with, the development of the host interface. This situation calls for an approach where (a) the existing foreign interface is reused as far as possible and (b) the consistency between the two interfaces is checked automatically.

¹ Apart from an early version of Green Card that was never released.
² c→hs web page: http://www.cse.unsw.edu.au/~chak/haskell/c2hs/
We achieve this by employing a tool that uses a foreign and a host language interface in concert, an approach that, to my knowledge, was not tried before. Let us call it the dual language approach (as opposed to the use of one special-purpose interface language or a restriction to the host language).
In comparison to symmetric approaches, a dual language approach trades generality for simplicity. In particular, a symmetric approach requires an extra language to describe interfaces (such as OMG IDL). From such a generic interface, a tool generates a host language interface, which then often requires another layer of code in the host language to provide a convenient interface for the library user. In contrast, in a dual language approach where the host interface is directly expressed in the host language, the interface designer has more freedom to directly produce an interface suitable for the library user.
It is interesting to see how a dual language approach fits into the taxonomy given in [6, Section 2]. That paper makes an argument for adopting a third language, namely IDL, on the grounds that neither a specification that is exclusively in the host language (Haskell) nor one that is exclusively in the foreign language (C) is sufficient to determine the complete interface. It neglects, however, the possibility of using a host language specification in concert with a foreign language specification, which is particularly appealing if the foreign language specification already exists and is maintained by the author of the library, as is usually the case when interfacing C libraries.
2.1 A Dual Language Tool for C and Haskell

In the following, we shall concentrate on the specific case of using C libraries from Haskell with the tool c→hs. This focus is justified as, by its very nature, language interoperability has to handle a significant amount of language-specific technical detail, which makes a language-independent presentation tedious. In addition, C is currently the most popular low-level language; hence, most interesting libraries have a C interface. Despite our focus on Haskell, the discussed approach is appropriate for any typed functional language with basic FFI support.
c→hs generates Haskell code that makes use of the foreign function interface (FFI) [15,5] currently provided by the Glasgow Haskell Compiler (GHC) and soon to be supported by Hugs. The FFI provides the basic functionality of calling C from Haskell, and vice versa, as well as the ability to pass primitive data types, such as integers, characters, floats, and raw addresses. Building on these facilities, c→hs automates the recurring tasks of defining FFI signatures of C functions and marshaling user-defined data types between the two languages. In other words, the full FLI consists of two layers: (a) basic runtime support for simple inter-language calls by the Haskell system's FFI and (b) tool support for the more complex aspects of data marshaling and representation of user-defined data structures by c→hs. The architecture of the latter is displayed in Figure 1.
³ In my opinion, GHC's FFI is a good candidate for a standard Haskell FFI.

[Figure 1. c→hs tool architecture. Files and tools shown: lib.h, lib.c, lib.a, cpp, lib.i, c2hs, Lib.chs, Lib.hs, C2HS.hs, ghc, Lib.a. Code fragment shown in the binding module Lib.chs:]

    newtype Window = Window Addr

    {#enum GtkWindowType as WindowType {underscoreToCase}#}

    windowNew    :: WindowType -> IO Window
    windowNew wt =
      liftM Window $ {#call gtk_window_new#} (cFromEnum wt)

The C library source (files lib.h and lib.c) in the grey box usually exists before the Haskell binding is implemented and will in most cases be further developed concurrently and independently. The header, lib.h, of the C library contains all C-level interface information. It is complemented by a c→hs binding module, Lib.chs, which contains the Haskell-side interface and marshaling instructions; the binding module specifies how the two interfaces inter-operate. The latter is achieved by virtue of binding hooks, which are expressions enclosed in {# #} pairs, and marshaling instructions, which are denoted in plain Haskell. The figure contains a fragment of binding code including binding hooks that reference objects defined in the C header file, in this case GtkWindowType and gtk_window_new. In the binding module, everything except the binding hooks is plain Haskell that either defines the Haskell interface (as, for example, the type signature of windowNew) or details marshaling procedures (as, for example, cFromEnum). The latter mostly consist of the use of marshaling routines that are predefined in the library C2HS.hs.
The interface generator, denoted by c2hs, reads the binding module together with the C header file, which it first pipes through the C pre-processor cpp. By exploiting the cross references from binding hooks to C objects and the corresponding definitions in the C header file, the interface generator replaces all binding hooks by plain Haskell code that exercises the FFI of the underlying Haskell system. In the figure, the resulting Haskell module is Lib.hs; it makes use of the marshaling library C2HS.hs, which comes with c→hs.

Overall, we expect the following functionality from a tool like c→hs:
• Conversion of C enumeration types into Haskell
• Conversion of basic C types into corresponding Haskell types
• Generation of all required FFI declarations from C function prototypes
• Direct access to members of C structures from Haskell
• Library support for parameter and result marshaling
• Consistency check between the C and the Haskell interface
In contrast, we do not expect the following two features:
1. Generation of Haskell function signatures from C prototypes
2. Marshaling of compound C structures to Haskell values
At first sight, it may seem surprising that these two features are not included, but a closer look reveals that they are of secondary importance. Although the first feature seems very convenient for a couple of examples, we generally cannot derive a Haskell signature from a C prototype (a C int may be an Int or a Bool in Haskell). The second feature is more generally useful; however, often we do not really want to marshal entire C structures to Haskell, but merely maintain a pointer to the C structure in Haskell. Evaluating the usefulness of the second feature is an interesting candidate for future work.
In summary, the use of pristine C header files for defining the low-level details of data representations simplifies interface specification significantly: no new interface language needs to be learned, and the C header files are always up to date with the latest version of the C code, which allows the tool to check the consistency of the C header and Haskell interface. This, together with marshaling specified in plain Haskell and utilising a Haskell marshaling library, keeps the tool simple and flexible. The cost is a restriction of the foreign language to C.
2.2 Related Work

The absolute minimum requirement for an FLI is the ability to call out to C and pass arguments of primitive type (such as integers and characters). A more sophisticated interface allows callbacks from C to Haskell and contains some additional inter-language support from the storage manager [8]. As already stated, throughout this paper, we call such basic support a foreign function interface (FFI): it does not constitute a full language binding, but merely allows basic inter-language function calls. The first proposal for this kind of functionality in the Haskell compiler GHC was ccall [12], which made use of GHC's ability to compile Haskell to C. Recently, ccall was superseded by a new FFI [15, Section 3] that fits better into the base language and is more powerful, as it allows callbacks and functions to be exported and imported dynamically [5].
Green Card [9] is a tool that implements a full FLI on top of the basic FFI of some Haskell systems (GHC, Hugs, and NHC). Its input is a specification of Haskell signatures for foreign procedures augmented with declarations specifying various low-level details (data layout, storage allocation policies, etc.) that cannot be inferred from the signature. Its main disadvantage is the conceptual complexity of its interface specification, which arises from the need to invent a language that specifies all the low-level information that is not naturally present in a Haskell signature. As a result, part of the information that, when interfacing to C, is already contained in the C header file has to be re-coded in Green Card's own language and, of course, it has to be updated for new versions of the accessed C library. The goal behind Green Card and c→hs is the same; the essential difference in method is that c→hs reads the C header file to obtain the C-side interface and that it uses plain Haskell to express marshaling, instead of specialised Data Interface Schemes. Interestingly, the initial version of Green Card analysed C header files instead of using its own language (much like the tool SWIG discussed below), but this approach was later abandoned.
H/Direct [6] is a compiler for Interface Definition Languages (IDLs) that generates Haskell bindings from IDL interfaces. H/Direct can be used to generate a Haskell binding for a COM component or for a conventional C library. In addition, H/Direct supports the implementation of COM components in Haskell [5]. The special appeal of this symmetric approach is the use of a standardised interface language and the ability to mix Haskell code with code written in any other language that has COM IDL support. Due to the proprietary nature of COM, the latter is currently restricted to the Windows platform; this restriction could be lifted by extending H/Direct to cover the open CORBA standard. Together with the generality, H/Direct also inherits the disadvantages of a symmetric approach: the programmer has to re-code and regularly update information already contained in existing C header files of a library; furthermore, there is the additional overhead of learning a dedicated interface language.
Two methods were suggested for alleviating these disadvantages: first, automatic generation of an IDL specification from C headers, and second, direct processing of existing C headers by H/Direct. In both cases, the programmer has to manually supply additional information. In the first case, the programmer post-processes the generated IDL specification, and in the second case, the programmer supplies an additional file that contains annotations to the plain C declarations. The main conceptual difference between these methods and c→hs is that H/Direct generates a fixed Haskell interface from the input, whereas c→hs allows the programmer to determine the Haskell interface. For simple interfaces, the fixed output may be sufficient, but for more complicated interfaces (like GTK+ [7,3]; see the marshaling in Section 4.1), H/Direct's approach requires another layer of Haskell code to provide a suitable interface for the user.
A second difference is that H/Direct marshals entire C structures to Haskell, whereas c→hs allows access to individual fields of C structures without marshaling the whole structure. Again, for simple structures like time descriptors or geometric figures it is usually more convenient to marshal them in their entirety, but in the case of complicated structures like widget representations, individual access to structure members is preferable (see Section 3.5).
Finally, SWIG [1] should be mentioned, although there is no Haskell support at the time of writing. SWIG generates a binding to C or C++ code from an annotated C header file (or C++ class definition). SWIG works well for untyped scripting languages, such as Tcl, Python, Perl, and Scheme, or C-like languages, such as Java, but the problem with typed functional languages is that the information in the C header file is usually not sufficient for determining the interface on the functional-language side. As a result, additional information has to be included in the C header file, which leads to maintenance overhead when new versions of an interfaced C library appear. This is in contrast to the use of pristine C header files complemented by a separate high-level interface specification, as favoured in c→hs.
The above discussion largely centres around the various FLIs available for the Glasgow Haskell Compiler. This is not to say that other Haskell implementations do not have good FLI support, but GHC seems to have enjoyed the largest variety of FLIs. More importantly, there does not seem to be any approach to FLI design in any of the other systems that was not also tried in the context of GHC. The same holds for other functional languages (whose inclusion was prevented by space considerations).
3 Interface Specification

As discussed in Section 2.1, binding hooks establish the link between objects defined in the C header and definitions in the Haskell binding module (i.e., the .chs file). The tool c→hs replaces these binding hooks by interfacing code computed from the C declarations. Binding hooks are enclosed between {# and #} and start with a keyword determining the type of hook, of which there are the following six:
• context: Specifies binding information used throughout the module
• type: Computes the Haskell representation of a C type
• enum: Maps a C enumeration to a corresponding Haskell data type definition
• call: Calls out to a C function
• get and set: Allow reading and writing components of C structures
A context hook, if present, has to be the first hook occurring in a module. The following subsections discuss the structure and usage of binding hooks; Appendix A contains a formal definition of their grammar.
3.1 Context hooks

A context hook may define the name of a dynamic library that has to be loaded before any of the external C functions may be invoked, and it may define a prefix to use on all C identifiers. For example,

    {#context lib="gtk" prefix="gtk"#}

states that the dynamic library called gtk (e.g., libgtk.so on ELF-based Linux systems) should be loaded and that the prefix gtk may be safely omitted from C identifiers. All identifiers in the GTK+ library start with gtk or Gtk, in a kind of poor man's attempt at module name spaces in C. The above prefix declaration allows us to refer to these identifiers while omitting the prefix; so, we can write WindowType instead of GtkWindowType. Matching the prefix is case insensitive, and any underscore characters between the prefix and the stem of the identifier are also removed, so that we can also use new_window instead of gtk_new_window. Where this leads to ambiguity, the full C name can still be used and has priority. To simplify the presentation, the following examples do not make use of the prefix feature.
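The prefix rule just described can be sketched as a small Haskell function. This is an illustration under stated assumptions: the name stripPrefixCI is hypothetical (it is not part of c→hs), and the sketch does not model the priority of full C names, which a real lookup would presumably handle by trying the identifier as written first.

```haskell
import Data.Char (toLower)

-- Hypothetical helper: remove a context-hook prefix from a C identifier.
-- The prefix is matched case-insensitively; underscores that separate
-- the prefix from the stem are dropped as well.
stripPrefixCI :: String -> String -> Maybe String
stripPrefixCI prefix ident
  | map toLower pre == map toLower prefix
  , not (null stem) = Just stem    -- never reduce a name to nothing
  | otherwise       = Nothing
  where
    (pre, rest) = splitAt (length prefix) ident
    stem        = dropWhile (== '_') rest
```

With the prefix gtk, both GtkWindowType and gtk_new_window lose their prefix, while an identifier such as g_list_append is left alone (the result is Nothing).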
3.2 Type hooks

c→hs's marshaling library defines Haskell counterparts for the various primitive C types. A type hook, given a C type name, is expanded by c→hs into the appropriate Haskell type. For example, in

    type GInt   = {#type gint#}
    type GFloat = {#type gfloat#}

the type gint may be defined to represent int or long int in the C header file; the hook {#type gint#} is then replaced by CInt or CLInt, respectively, which will have the same representation as a value expected or returned by a C function using the corresponding C type.
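To make the expansion concrete, the following sketch shows what the two type hooks above amount to, assuming the header defines gint as int and gfloat as float. It uses the CInt and CFloat types of today's Foreign.C.Types module, which may differ from the types provided by the C2HS library described here.

```haskell
import Foreign.C.Types (CFloat, CInt)
import Foreign.Storable (sizeOf)

-- What the type hooks would expand to, assuming the C header
-- contains "typedef int gint;" and "typedef float gfloat;".
type GInt   = CInt
type GFloat = CFloat

-- The Haskell types occupy exactly as many bytes as the C types,
-- so values can be passed to and from C functions unchanged.
gintBytes, gfloatBytes :: Int
gintBytes   = sizeOf (0 :: GInt)
gfloatBytes = sizeOf (0 :: GFloat)
```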
3.3 Enumeration hooks

An enumeration hook converts a C enum declaration into a Haskell data type. It contains the name of the declaration in C and, optionally, a different name for the Haskell declaration. Furthermore, a translation table for mapping the names of constructors may be defined. For example, given the C declaration

    typedef enum
    {
      GTK_WINDOW_TOPLEVEL,
      GTK_WINDOW_DIALOG,
      GTK_WINDOW_POPUP
    } GtkWindowType;

and the hook

    {#enum GtkWindowType as WindowType {underscoreToCase}#}

c→hs generates

    data WindowType = GtkWindowToplevel
                    | GtkWindowDialog
                    | GtkWindowPopup
                    deriving (Enum)

The C identifier mentioned in the hook can be a type name referring to the enumeration, as in the example, or it can be the tag of the enum declaration itself. Optionally, it is possible to give the Haskell definition a different name than the C type (in the example, WindowType). The last argument of the hook, enclosed in braces, is called a translation table. When it contains the item underscoreToCase, it specifies that C's common this_is_an_identifier (or A_MACRO) notation is to be replaced in Haskell by ThisIsAnIdentifier (or AMacro). Whether or not underscoreToCase is used, explicit translations from C into Haskell names can be specified in the form name as HSName and always take priority over the underscoreToCase translation rule.
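A minimal sketch of the underscoreToCase rule, with explicit translations taking priority, might look as follows. The function names are hypothetical, and the rule is inferred from the examples above: each underscore-separated component is capitalised, the rest of it is lower-cased, and the underscores are dropped.

```haskell
import Data.Char (toLower, toUpper)
import Data.Maybe (fromMaybe)

-- Split a C identifier at underscores, dropping empty components
-- (so leading, trailing, or doubled underscores are ignored).
splitUnderscores :: String -> [String]
splitUnderscores s = case break (== '_') s of
  (w, [])       -> [w | not (null w)]
  (w, _ : rest) -> [w | not (null w)] ++ splitUnderscores rest

-- Upper-case the first letter of a component, lower-case the rest.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : map toLower cs

-- Sketch of the underscoreToCase rule:
-- "this_is_an_identifier" -> "ThisIsAnIdentifier", "A_MACRO" -> "AMacro".
underscoreToCase :: String -> String
underscoreToCase = concatMap capitalise . splitUnderscores

-- Explicit "name as HSName" entries take priority over the rule.
translateName :: [(String, String)] -> String -> String
translateName explicit cname =
  fromMaybe (underscoreToCase cname) (lookup cname explicit)
```

Applied to the GtkWindowType enumerators, this yields GtkWindowToplevel, GtkWindowDialog, and GtkWindowPopup, matching the generated data type shown above.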
In the above example, the values assigned by C and Haskell to the corresponding enumerators are the same. As C allows us to explicitly define the values of an enumerator, whenever any of the values is explicitly given, c→hs generates a customised Enum instance for the new data type, instead of using a derived instance. This guarantees that, whenever the C library is updated, re-generating the binding with c→hs will pick up all changes in enumerations.
C libraries occasionally define enumerations by a set of macro pre-processor #define statements, instead of using an enum declaration. c→hs also provides support for such enumerations. For example, given

    #define GL_CLAMP         0x2900
    #define GL_REPEAT        0x2901
    #define GL_CLAMP_TO_EDGE 0x812F

from the OpenGL header, we can use a hook like

    {#enum define Wrapping {GL_CLAMP         as Clamp,
                            GL_CLAMP_TO_EDGE as ClampToEdge,
                            GL_REPEAT        as Repeat}#}

to generate a corresponding Haskell data type definition including an explicit Enum class instance, which associates the specified enumeration values. c→hs implements this variant of the enumeration hooks by generating a C enum declaration of the form

    enum Wrapping {
      Clamp       = GL_CLAMP,
      ClampToEdge = GL_CLAMP_TO_EDGE,
      Repeat      = GL_REPEAT
    };

and then processing it like any other enumeration hook, including pre-processing the definition with the C pre-processor to resolve the macro definitions.
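For the OpenGL example, the Enum instance that has to be generated differs from a derived one because the enumerators carry explicit values. The following hand-written instance is a sketch of what such generated code could look like; the error message and the Eq/Show deriving clause are illustrative choices, not actual c→hs output.

```haskell
-- Constructors in the order given in the hook above.
data Wrapping = Clamp | ClampToEdge | Repeat
  deriving (Eq, Show)

-- A derived instance would number the constructors 0, 1, 2; here each
-- constructor is tied to the value of the corresponding C macro instead.
instance Enum Wrapping where
  fromEnum Clamp       = 0x2900   -- GL_CLAMP
  fromEnum ClampToEdge = 0x812F   -- GL_CLAMP_TO_EDGE
  fromEnum Repeat      = 0x2901   -- GL_REPEAT

  toEnum 0x2900 = Clamp
  toEnum 0x812F = ClampToEdge
  toEnum 0x2901 = Repeat
  toEnum n      = error ("Wrapping.toEnum: unknown value " ++ show n)
```

Re-running the generator after a header change would simply rewrite the literals in such an instance.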
⁴ We understand the value of an enumerator in Haskell to be the integer value associated with it by virtue of the Enum class's fromEnum method.

3.4 Call hooks

Call hooks specify calls to C functions. For example,

    {#call gtk_radio_button_new#}

calls the function gtk_radio_button_new. GHC's FFI requires that each external function has a foreign declaration. This declaration is automatically added to the module by c→hs. If the function is defined as

    GtkWidget* gtk_radio_button_new (GSList *group);

in the C header file, c→hs produces the following declaration:

    foreign import ccall "libgtk.so" "gtk_radio_button_new"
      gtk_radio_button_new :: Addr -> IO Addr

We assume here that the call is in the same module as our previous example of a context hook; therefore, the dynamic library libgtk.so (assuming that we are compiling for an ELF-based Linux system or Solaris) is added to the declaration. In this declaration, the quoted identifier following the library name specifies the name of the C function, and the identifier after it is the bound Haskell name. By default they are equal, but optionally, an alternative name can be given for the Haskell object (this is necessary when, e.g., the C name would not constitute a legal function identifier in Haskell). Moreover, c→hs infers the type used in the foreign declaration from the function prototype in the C header file. As the argument and result of gtk_radio_button_new are pointers, Addr is used on the Haskell side. So, there is clearly some more marshaling required; we shall come back to this point in Section 4.
By default, the result is returned in the IO monad, as the C function may have side effects. If this is not the case, the attribute fun can be given in the call hook. For example, using {#call fun sin#}, we get the following declaration:

    foreign import ccall sin :: Float -> Float

Furthermore, the attribute unsafe can be added to C routines that cannot re-enter the Haskell runtime system via a callback; this corresponds to the same flag in GHC's FFI.
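For comparison, the following sketch shows a pure, unsafe foreign import as it is written with today's Haskell 2010 FFI, which names a header in the entity string and uses Foreign.C.Types rather than the declaration syntax shown above. Since C's sin takes and returns double, the sketch uses CDouble; the names c_sin and sinC are hypothetical.

```haskell
import Foreign.C.Types (CDouble (..))

-- No IO in the result type: sin has no side effects.
-- "unsafe": sin never calls back into the Haskell runtime.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- A thin wrapper converting between Double and CDouble.
sinC :: Double -> Double
sinC = realToFrac . c_sin . realToFrac
```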
⁵ The attribute is called unsafe, as the call does not have to play safe. This naming scheme is taken from GHC's FFI.

3.5 Get and set hooks

Get and set hooks are related and have, apart from the different keyword, the same syntax. They allow reading from and writing to the members of C structures. The accessed member is selected by the same access path expression that would be used in C: it consists of the name of a structure followed by a series of structure reference (s.m or s->m) and indirection (*e) operations. Given a C header containing

    typedef struct _GtkAdjustment GtkAdjustment;

    struct _GtkAdjustment {
      GtkData data;
      gfloat  lower;
      ...     /* rest omitted */
    };

the binding file might, for example, contain a hook

    {#get GtkAdjustment.lower#}

By reading the C header, c→hs has complete information about the layout of structure fields, and thus, there is no need to make a foreign function call to access structure fields: the necessary address arithmetic can be performed in Haskell. This can significantly speed up access compared to FLIs where the information from the C header file is not available. As with enumeration hooks, it is sufficient to re-run c→hs to adjust the address arithmetic of get and set hooks when the C library is updated.
Get and set hooks expand to functions of type Addr -> IO res and Addr -> res -> IO (), respectively, where res is the primitive type computed by c→hs for the accessed structure member from its definition in the C header file. Marshaling between this primitive type and other Haskell types is discussed in Section 4.
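The address arithmetic behind get and set hooks can be sketched with the Storable class of today's FFI, in which typed Ptr values replace Addr. The byte offset 16 below is a made-up stand-in for the offset c→hs would compute from the header, and a scratch buffer plays the role of a real _GtkAdjustment structure, so that the sketch runs without GTK+.

```haskell
import Foreign.C.Types (CFloat)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical offset of the 'lower' field in struct _GtkAdjustment.
lowerOffset :: Int
lowerOffset = 16

-- Shape of a get hook's expansion: structure address in, field out.
getLower :: Ptr a -> IO CFloat
getLower p = peekByteOff p lowerOffset

-- Shape of a set hook's expansion: structure address and new value in.
setLower :: Ptr a -> CFloat -> IO ()
setLower p = pokeByteOff p lowerOffset

-- Write and read back a field in a scratch buffer standing in for the struct.
roundTrip :: CFloat -> IO CFloat
roundTrip x = allocaBytes 64 $ \p -> do
  setLower p x
  getLower p
```

No foreign call is involved: both accessors reduce to a load or store at a fixed offset from the structure's address.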
c→hs allows for some flexibility in the way a hook may refer to a structure definition. Above, we used a type name associated with the structure via a C typedef declaration, but we could also have used the name of the tag of the structure declaration, as in {#get _GtkAdjustment.lower#}. Finally, if there had been a type name for a pointer to the _GtkAdjustment structure, we could also have used that. This flexibility is important, as C libraries adopt varying conventions as to how they define structures, and we want to avoid editing the C header to include a definition that can be used by c→hs.
As already mentioned in Section 2.2, for complex data structures (like GTK+'s widgets), it is often preferable to access individual structure members instead of marshaling complete structures. For example, in the case of GtkAdjustment, only a couple of scalar members (such as lower) are of interest to the application program, but the structure itself is rather large and part of an even larger linked widget tree.
4 The Marshaling Library

When using call, set, and get hooks, the argument and result types are those primitive types that are directly supported by the FFI. For example, in the case where we called gtk_radio_button_new, the result of the call was Addr, although, according to the C header (repeated here),

    GtkWidget* gtk_radio_button_new (GSList *group);

the function returns a GtkWidget*. There is obviously a gap that has to be filled. Filling it is the task of the library C2HS, which contains routines that handle storage allocation, convert Haskell lists into C arrays, handle in-out parameters, and so on. However, the library provides only the basic blocks, which have to be composed by the programmer to match both the requirements specified in the C API and the necessities of the Haskell interface. The library covers the standard cases; when marshaling gets more complex, the programmer may have to define some additional routines. This is not unlike the pre-defined and the user-defined data interface schemes of Green Card [9], but entirely coded in plain Haskell.
4.1 Library-specific Marshaling

In the case of the GTK+ library, a radio button has the C type GtkRadioButton, which in GTK+'s widget hierarchy is an (indirect) instance of GtkWidget. Nevertheless, the C header file says that gtk_radio_button_new returns a pointer to GtkWidget, not GtkRadioButton. This is perfectly fine, as GTK+ implements widget instances by means of C structures whose initial fields are identical (i.e., the C pointer is the same; it is only a matter of how many fields are accessible). There is, however, no way to represent this directly in Haskell. Therefore, the Haskell interface defines

    newtype RadioButton = RadioButton Addr

and uses type classes to represent the widget hierarchy. As a result, the marshaling of the result of gtk_radio_button_new has to be explicitly specified. Moreover (for reasons rooted in GTK+'s specification), the argument of type GSList *group is of no use in the Haskell interface. Overall, we use the following definition in the binding file:

    radioButtonNew :: IO RadioButton
    radioButtonNew =
      liftM RadioButton $ {#call gtk_radio_button_new#} nullAddr

The function liftM is part of Haskell's standard library; it applies a function to the result in a monad. The constant nullAddr is part of the FFI library Addr and contains a null pointer.
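The use of type classes for the widget hierarchy can be sketched as follows. The class WidgetClass, its method toWidget, and the placeholder type CWidget are hypothetical names chosen for illustration, and Ptr stands in for Addr; the point is that an upcast from RadioButton to Widget merely rewraps the same address, mirroring the shared structure prefix in C.

```haskell
import Foreign.Ptr (Ptr, nullPtr)

-- Placeholder for the opaque C widget structure.
data CWidget

newtype Widget      = Widget (Ptr CWidget)
newtype RadioButton = RadioButton (Ptr CWidget)

-- Every member of the hierarchy can be viewed as a plain Widget.
class WidgetClass w where
  toWidget :: w -> Widget

instance WidgetClass Widget where
  toWidget = id

-- Sound because a GtkRadioButton begins with the fields of a GtkWidget,
-- so both share one address.
instance WidgetClass RadioButton where
  toWidget (RadioButton p) = Widget p

-- A function accepting any widget type, e.g. to pass it to a C routine.
widgetAddr :: WidgetClass w => w -> Ptr CWidget
widgetAddr w = case toWidget w of Widget p -> p
```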
The important point to notice here is that complex libraries are built around conventions that usually are only informally specified in the API documentation and that are not at all reflected in the formal C interface specification. No tool can free the interface programmer from the burden of designing appropriate marshaling routines in these cases; moreover, an elegant mapping of these API constraints into the completely different type system of Haskell can be the most challenging part of the whole implementation of a Haskell binding. The design decision made for c→hs at this point is to denote all marshaling in Haskell, so that the programmer has the full expressive power and abstraction facilities of Haskell at hand to solve this task.

4.2 Standard Marshaling

The library C2HS, which comes with c→hs, provides a range of routines, which cover the common marshaling requirements, such as bit-size adjustment of primitive types, marshaling of lists, handling of in/out parameters, and common exception handling. Unfortunately, a complete discussion of the library is out of the scope of this paper; thus, we will only have a look at two typical examples.
Conversion of primitive types. For each primitive Haskell type (like Int, Bool, etc.), C2HS provides a conversion class (IntConv, BoolConv, etc.), which maps the Haskell representation to one of possibly many C representations and vice versa.
For example, in the case of the get hook applied to the struct GtkAdjustment in Section 3.5, we have to provide a pointer to a GtkAdjustment widget structure as an argument to the get hook and marshal the resulting value of C type gfloat to Haskell. We implement the latter using the member function cToFloat from the class FloatConv.

    newtype Adjustment = Adjustment Addr

    adjustmentGetLower :: Adjustment -> IO Float
    adjustmentGetLower (Adjustment adj) =
      liftM cToFloat $ {#get GtkAdjustment.lower#} adj

The interaction between the interface generator and Haskell's overloading
mechanism is crucial here. As explained in Subsection 3.5, the get hook will
expand to a function of type Addr -> IO res, where res is the Haskell type
corresponding to the concrete type of the C typedef gfloat, as computed by the
interface generator from the C header. For the overloaded function cToFloat, the
Haskell compiler will select the instance matching res -> Float. In other words,
every instance of FloatConv, for a type τ, provides marshaling routines between τ
and Float. This allows us to write generic marshaling code without exact knowledge
of the types inferred by C→Haskell from the C header files. This is of particular
importance for integer types, which come in flavours of varying bit size.
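The effect of this overloading can be sketched in plain Haskell, assuming modern Foreign.C.Types and Foreign.Ptr in place of Addr. FloatConv', cToFloat', getLowerWith, fakeGetFloat, and fakeGetDouble are hypothetical names; the two fake getters merely stand in for what a get hook might expand to if gfloat were float or double, and they return a constant instead of reading memory.

```haskell
import Control.Monad (liftM)
import Foreign.C.Types (CDouble, CFloat)
import Foreign.Ptr (Ptr, nullPtr)

-- Hypothetical FloatConv-style class (not C2HS's definition).
class FloatConv' c where
  cToFloat'   :: c -> Float
  cFromFloat' :: Float -> c

instance FloatConv' CFloat where
  cToFloat'   = realToFrac
  cFromFloat' = realToFrac

instance FloatConv' CDouble where
  cToFloat'   = realToFrac
  cFromFloat' = realToFrac

-- Generic marshaling code: it does not care which C type the getter
-- returns, as long as a FloatConv' instance exists for that type.
getLowerWith :: FloatConv' res => (Ptr () -> IO res) -> Ptr () -> IO Float
getLowerWith getHook adj = liftM cToFloat' (getHook adj)

-- Stand-ins for an expanded get hook, for two possible definitions of
-- gfloat. A real expansion would read the member through the pointer.
fakeGetFloat :: Ptr () -> IO CFloat
fakeGetFloat _ = return 0.25

fakeGetDouble :: Ptr () -> IO CDouble
fakeGetDouble _ = return 0.25

main :: IO ()
main = do
  getLowerWith fakeGetFloat nullPtr >>= print   -- prints 0.25
  getLowerWith fakeGetDouble nullPtr >>= print  -- prints 0.25
```

The body of getLowerWith is the same in both calls; the compiler resolves the instance from the getter's result type, which is exactly the property that lets binding code stay unchanged when a typedef changes.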
Compound structures. GtkEntry is again a widget (a one-line text field that
can be edited), and the routine

void gtk_entry_set_text (GtkEntry    *entry,
                         const gchar *text);

requires in its second argument marshaling of a String from Haskell to C (there
is no direct support for passing lists in GHC's FFI). C2HS helps here by providing
support for storage allocation and representation conversion for passing lists
of values between Haskell and C. The classes ToAddr and FromAddr contain
methods to convert Haskell structures to and from addresses referencing a C
representation of the given structure. In particular, stdAddr converts each type
for which there is an instance of ToAddr into the type's C representation.
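A minimal sketch of such conversion classes follows, assuming GHC's Foreign.* modules and using Ptr () as a stand-in for the untyped Addr. ToAddr', stdAddr', FromAddr', and addrStd' are hypothetical names and do not reproduce C2HS's definitions; the point is only that one overloaded function allocates C storage for several Haskell types.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Foreign.C.String (newCString, peekCString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (newArray, peekArray)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical ToAddr-style class: allocate C storage holding the C
-- representation of a value. The caller must release it with `free`.
class ToAddr' a where
  stdAddr' :: a -> IO (Ptr ())

instance ToAddr' [Char] where
  stdAddr' s = castPtr <$> newCString s   -- NUL-terminated char array

instance ToAddr' [CInt] where
  stdAddr' xs = castPtr <$> newArray xs   -- contiguous array of ints

-- Hypothetical FromAddr-style class for the reverse direction.
class FromAddr' a where
  addrStd' :: Ptr () -> IO a

instance FromAddr' [Char] where
  addrStd' p = peekCString (castPtr p)

main :: IO ()
main = do
  p <- stdAddr' "hello"
  s <- addrStd' p :: IO String
  putStrLn s                                  -- prints hello
  free p
  q <- stdAddr' [1, 2, 3 :: CInt]
  xs <- peekArray 3 (castPtr q) :: IO [CInt]
  print xs                                    -- prints [1,2,3]
  free q
```

Both newCString and newArray allocate with malloc, so free is the matching deallocation in either case.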

newtype Entry = Entry Addr

entrySetText :: Entry -> String -> IO ()
entrySetText (Entry ety) text =
  {#call gtk_entry_set_text unsafe#} ety `marsh1_`
    (stdAddr text :> free)

Each member of the family of functions marshN marshals N arguments from
Haskell to C and back. The conversion to C is specified to the left of :> and the
reverse direction to its right. The routine free simply deallocates the memory
area used for marshaling. The marshN_ variants of these functions discard the
values returned by the C routines. In addition to marshaling strings to and from
C, these routines can generally be used to handle in/out arguments.
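The following sketch re-creates the marsh1_ pattern under stated assumptions: Marsh and this definition of marsh1_ are hypothetical rather than C2HS's code, newCString plays the role of stdAddr, and the C library's strlen stands in for gtk_entry_set_text so that the example runs without GTK+. Using bracket, so that the clean-up function also runs if the foreign call throws, is a choice made for this sketch, not something the paper specifies.

```haskell
import Control.Exception (bracket)
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.C.String (CString, newCString)
import Foreign.C.Types (CSize)
import Foreign.Marshal.Alloc (free)

-- A real libc function, standing in for gtk_entry_set_text.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Left of :> produces the C representation; right of :> runs afterwards.
data Marsh c = IO c :> (c -> IO ())

-- Hypothetical marsh1_: marshal one argument, call the C function, run
-- the clean-up action, and discard the C function's result.
marsh1_ :: (c -> IO r) -> Marsh c -> IO ()
marsh1_ f (to :> back) = bracket to back (\x -> f x >> return ())

main :: IO ()
main = do
  lenRef <- newIORef 0
  let setText s = c_strlen s >>= writeIORef lenRef  -- records strlen's result
  setText `marsh1_` (newCString "hello" :> free)
  readIORef lenRef >>= print                         -- prints 5
```

The call site has the same shape as entrySetText above: a partially applied C function on the left of marsh1_ and a conversion/clean-up pair on the right.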

5 Implementation and Application of C→Haskell

The interface generator C→Haskell is already implemented and publicly available
(the link was given in Section 1). The following provides a rough overview of
the current implementation and reports on first experiences with the approach
to interfacing described in this paper.
5.1 Implementation

The interface generator is entirely implemented in Haskell and based on the
Compiler Toolkit [2]. It makes heavy use of the toolkit's self-optimising parser
and lexer libraries [14,4]; in particular, a full lexer and parser for C header
files is included. The Haskell binding modules are, however, not fully analysed.
The lexer makes use of the lexer library's meta actions to distinguish whether it
reads characters belonging to standard Haskell code or to a binding hook. Haskell
code is simply collected for subsequent copying into the generated plain Haskell
module, whereas binding hooks are fully decomposed and parsed according to
the rules given in Appendix A.
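The switch between the two lexing modes can be illustrated with a small hand-written splitter. It is only a sketch of the idea, assuming that hooks never nest; the actual lexer relies on the Compiler Toolkit's meta actions, and Chunk and splitBinding are hypothetical names.

```haskell
import Data.List (isPrefixOf)

-- A binding module is a sequence of plain Haskell text and hook bodies.
data Chunk = HaskellCode String | Hook String
  deriving (Eq, Show)

-- Hypothetical two-mode splitter, mirroring the lexer's switch between
-- ordinary Haskell code and the inside of a {# ... #} hook.
splitBinding :: String -> [Chunk]
splitBinding = inCode ""
  where
    -- Code mode: copy characters until a hook opens.
    inCode acc [] = emit HaskellCode acc []
    inCode acc s@(c:cs)
      | "{#" `isPrefixOf` s = emit HaskellCode acc (inHook "" (drop 2 s))
      | otherwise           = inCode (c : acc) cs
    -- Hook mode: collect the hook body until the closing #}.
    inHook _ [] = error "unterminated binding hook"
    inHook acc s@(c:cs)
      | "#}" `isPrefixOf` s = Hook (reverse acc) : inCode "" (drop 2 s)
      | otherwise           = inHook (c : acc) cs
    -- Accumulators are built in reverse; empty code chunks are dropped.
    emit _ [] rest = rest
    emit k acc rest = k (reverse acc) : rest

main :: IO ()
main = mapM_ print (splitBinding "x = {#get GtkAdjustment.lower#} adj")
```

Only the Hook chunks would be handed on to a parser for the grammar of Appendix A; the HaskellCode chunks are copied through unchanged.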
After the header and the binding module have been read, C→Haskell converts all
binding hooks into plain Haskell code and, finally, outputs the resulting Haskell
module. During expansion of the hooks, the definitions in the C header file
referenced by binding hooks are analysed as far as this is required to produce
binding code. In general, however, the tool does not recognise all errors in C
definitions and does not analyse definitions that are not directly referred to in
some binding hook; thus, the header file should already have been checked for
errors by compiling the C library with a standard C compiler (if, however, errors
are detected by the binding tool, they are properly reported). This lazy strategy
of analysing the C definitions makes a lot of sense when considering that a
preprocessed C header file includes the definitions of all headers that it directly
or indirectly includes; in the case of the main header gtk.h of the GTK+ library,
the C preprocessor generates a file of over 230kB (this, however, contains a
significant amount of white space).
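Haskell's own laziness gives a compact way to sketch this demand-driven strategy: every declaration is paired with an unevaluated analysis result, and only the entries that a hook looks up are ever computed. Decl, analyse, analysisTable, and header are hypothetical, and the "analysis" (counting words) is merely a placeholder for real semantic analysis.

```haskell
-- A declaration as it might come out of a preprocessed header (simplified).
data Decl = Decl { declName :: String, declText :: String }

-- Placeholder for semantic analysis: it fails loudly on a malformed
-- declaration and otherwise returns a dummy result (the word count).
analyse :: Decl -> Int
analyse (Decl name text)
  | null text = error ("malformed declaration: " ++ name)
  | otherwise = length (words text)

-- Each entry pairs a name with an unevaluated analysis result; lookup
-- compares keys only, so only the entries that hooks request get analysed.
analysisTable :: [Decl] -> [(String, Int)]
analysisTable decls = [ (declName d, analyse d) | d <- decls ]

header :: [Decl]
header =
  [ Decl "gfloat" "typedef float gfloat;"
  , Decl "broken" ""  -- malformed, but never referenced by a hook
  , Decl "GtkAdjustment" "typedef struct _GtkAdjustment GtkAdjustment;"
  ]

main :: IO ()
main = print (lookup "gfloat" (analysisTable header))  -- prints Just 3
```

The malformed "broken" entry never raises its error, which mirrors the trade-off described above: unreferenced definitions cost nothing, but their errors also go unreported.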

The analysis performed on C declarations is standard in the sense that it
is a subset of the semantic analysis performed in a normal C compiler. Hence,
a detailed discussion would not lead to any new insights. Details of how this
information is used to expand the various forms of binding hooks, while
interesting, would exceed the space limitations placed on this paper. However,
C→Haskell's source code and documentation are freely available and constitute
the ultimate reference for all questions about the implementation.
5.2 Application

The idea for C→Haskell arose in the context of the implementation of a Haskell
binding [3] for the GTK+ graphical user interface toolkit [7,11]. Naturally, the
GUI toolkit is an important application of the binding generator. The Haskell
binding of GTK+ was originally coded directly on top of GHC's new FFI and is
currently being rewritten to use C→Haskell. The resulting code is more compact,
and C→Haskell significantly improves cross-checking of consistency with the C
headers.
The libraries of the Gnome [11] desktop project include a C library implementing
the HTTP 1.1 protocol, called ghttp. A Haskell binding for ghttp was
implemented as a first application of C→Haskell to a library that is structured
differently from GTK+. The library is relatively small, with four enumeration
types, one structure declaration, and 24 functions that have to be interfaced.
The Haskell binding module Ghttp is 153 lines (this excludes empty lines and
lines containing only comments) and is expanded by C→Haskell to a 276-line plain
Haskell module. The latter is almost exactly the code that a programmer would
have written manually using GHC's FFI. Thus, the use of C→Haskell reduced
the coding effort, in terms of lines of code, by 45% (assuming that even if
the binding had been coded manually, the marshaling library C2HS would have
been available). Judging from the experience with GTK+, the amount of saved
work is, however, smaller when the library and its interface are more complex,
because more library-specific marshaling is required.
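The 45% figure follows directly from the two line counts for the ghttp binding given above:

```latex
\frac{276 - 153}{276} \;=\; \frac{123}{276} \;\approx\; 0.446 \;\approx\; 45\%
```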
6 Conclusions

In many respects, C→Haskell builds on the lessons learned from Green Card. It
avoids the complexity of a new interface specification language by re-using
existing C interface specifications and by replacing data interface schemata with
marshaling coded in plain Haskell. The latter is simplified by providing a
comprehensive marshaling library that covers common marshaling situations. Green
Card pioneered many of the basic concepts of C-with-Haskell interfacing, and
C→Haskell definitely profited from this.
C→Haskell demonstrates the viability of dual-language tools, i.e., it demonstrates
that interface specifications in the two languages concerned can be jointly used
to bridge the gap between languages as different as C and Haskell. The advantages
of this approach are that the binding hooks necessary to cross-reference
complementary definitions in the two interfaces are significantly simpler than
dedicated interface languages, and that existing library interfaces can be reused
in their pristine form. The latter saves work and allows consistency checks between
the two interfaces; this is particularly important when the interfaced library
already exists and is independently developed further. H/Direct's recent support
for C headers is another indication of the attractiveness of this approach.
C→Haskell has so far proved valuable in developing a Haskell binding to the
GTK+/Gnome libraries [11,3]. More experience is, however, required for a thorough
evaluation.
In my experience, GHC's new FFI provides a very nice basic interface to
foreign functions in Haskell. Thus, I would highly recommend its inclusion in
the next Haskell standard. After all, Haskell's value as a general-purpose language
is severely limited without good foreign language support; such an important
aspect of the language should definitely be standardised!
Future Work. The functionality of C→Haskell was largely motivated by the
requirements of GTK+. As the latter is a large and complex system, it is to be
expected that most of the interesting problems in a binding are encountered during
the implementation of a Haskell binding for GTK+. However, the conventions
used in different C libraries can vary significantly, so further extensions may
become attractive with added experience; furthermore, C→Haskell allows the
programmer direct access to the FFI of the Haskell system, where this seems more
appropriate or where additional functionality is required. In fact, there are
already a couple of areas in which extensions seem desirable: (1) support for
accessing global C variables is needed; (2) the tool should help generate the
signatures for callback routines; (3) sometimes the marshaling code for functions
might be generated automatically; (4) better type safety for address arguments
and results; and (5) marshaling of complete structures, as in H/Direct, is
sometimes convenient and currently has to be done in a mixture of set/get hooks
and dedicated Haskell code.
Regarding Point (3), for functions with simple signatures, the marshaling
code is often obvious and could be generated automatically. This would make
the code a bit more concise and easier to maintain. Regarding Point (4), all
pointer arguments of C functions are mapped to type Addr in Haskell, which
makes it impossible for the Haskell compiler to recognise errors such as
exchanged arguments. It would be interesting to use a variant of Addr that takes
an additional type argument, namely, the name of the type referred to by the
address. Even for abstract types, a type tag can be generated using a Haskell
newtype declaration. This would allow C→Haskell to generate different instances
of the parametrised Addr type for different C types, which would probably
significantly enhance the consistency checks between the C and the Haskell
interface.
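A minimal sketch of the typed-address idea, with hypothetical names throughout: Addr carries a phantom type parameter naming the pointed-to C type. Empty data declarations are used as the tags here instead of the newtype declarations suggested above; either works, because the tag never appears in a value. Modern GHC's Ptr a follows the same principle.

```haskell
import Foreign.Ptr (Ptr, nullPtr)

-- An address tagged with the C type it points to; the tag is a phantom
-- parameter that exists only at compile time.
newtype Addr tag = Addr (Ptr ())

-- Tag types, one per C type; they have no values.
data GtkEntryTag
data GtkAdjustmentTag

untag :: Addr tag -> Ptr ()
untag (Addr p) = p

-- Stand-in for a binding to a C function taking (GtkEntry *, GtkAdjustment *).
attach :: Addr GtkEntryTag -> Addr GtkAdjustmentTag -> (Ptr (), Ptr ())
attach entry adj = (untag entry, untag adj)

main :: IO ()
main = do
  let entry = Addr nullPtr :: Addr GtkEntryTag
      adj   = Addr nullPtr :: Addr GtkAdjustmentTag
  print (attach entry adj == (nullPtr, nullPtr))  -- prints True
  -- print (attach adj entry)  -- rejected by the type checker: tags differ
```

Uncommenting the last line produces a type error, which is precisely the class of exchanged-argument mistake that an untyped Addr cannot catch.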
Acknowledgements. I am grateful to Michael Hobbs for our discussions about
the GTK+ binding; they were part of the motivation for starting to think about
C→Haskell. Furthermore, I would like to thank Gabriele Keller, Sven Panne,
Alastair Reid, Michael Weber, and the anonymous referees for their helpful
comments and suggestions.

References
1. David M. Beazley. SWIG and automated C/C++ scripting. Dr. Dobb's Journal,
February 1998.
2. Manuel M. T. Chakravarty. A compiler toolkit in Haskell.
http://www.cse.unsw.edu.au/~chak/haskell/ctk/, 1999.
3. Manuel M. T. Chakravarty. A GTK+ binding for Haskell.
http://www.cse.unsw.edu.au/~chak/haskell/gtk/, 1999.
4. Manuel M. T. Chakravarty. Lazy lexing is fast. In Aart Middeldorp and Taisuke
Sato, editors, Proceedings of the 4th Fuji International Symposium on Functional
and Logic Programming, Lecture Notes in Computer Science. Springer-Verlag,
1999.
5. Sigbjorn Finne, Daan Leijen, Erik Meijer, and Simon Peyton Jones. Calling hell
from heaven and heaven from hell. In Proceedings of the ACM SIGPLAN
International Conference on Functional Programming. ACM Press, 1999.
6. Sigbjorn Finne, Daan Leijen, Erik Meijer, and Simon L. Peyton Jones. H/Direct: A
binary foreign language interface for Haskell. In Proceedings of the ACM SIGPLAN
International Conference on Functional Programming (ICFP'98), pages 153–162.
ACM Press, 1998.
7. Eric Harlow. Developing Linux Applications with GTK+ and GDK. New Riders
Publishing, 1999.
8. Simon Peyton Jones, Simon Marlow, and Conal Elliott. Stretching the storage
manager: Weak pointers and stable names in Haskell. In Proceedings of the
International Conference on Functional Programming, 1999.
9. T. Nordin, Simon L. Peyton Jones, and Alastair Reid. Green Card: a
foreign-language interface for Haskell. In Proceedings of the Haskell Workshop, 1997.
10. The common object request broker: Architecture and specification, rev. 2.2.
Technical report, Object Management Group, Framingham, MA, 1998.
11. Havoc Pennington. GTK+/Gnome Application Development. New Riders
Publishing, 1999.
12. Simon L. Peyton Jones and Philip Wadler. Imperative functional programming.
In ACM Symposium on Principles of Programming Languages, pages 71–84. ACM
Press, 1993.
13. Haskell 98: A non-strict, purely functional language.
http://haskell.org/definition/, February 1999.
14. S. D. Swierstra and L. Duponcheel. Deterministic, error-correcting combinator
parsers. In John Launchbury, Erik Meijer, and Tim Sheard, editors, Advanced
Functional Programming, volume 1129 of Lecture Notes in Computer Science, pages
184–207. Springer-Verlag, 1996.
15. The Haskell FFI Team. A primitive foreign function interface.
http://www.dcs.gla.ac.uk/fp/software/hdirect/ffi.html, 1998.

A The Grammar of Binding Hooks

The grammar of binding hooks appearing in Haskell binding modules is formally
defined in Figure 2. Here string denotes a string literal and ident a Haskell-style
variable or constructor identifier (this lexically includes C identifiers). If
underscoreToCase occurs in a translation table, it must be the first entry.

    hook    → {# inner #}                          (binding hook)
    inner   → context ctxopts                      (set context)
            | type ident                           (type name)
            | enum idalias trans                   (map enumeration)
            | call [fun] [unsafe] idalias          (call a C function)
            | get apath                            (read a structure member)
            | set apath                            (write to a structure member)
    ctxopts → [lib = string] [prefix = string]     (context options)
    idalias → ident [as ident]                     (possibly renamed identifier)
    apath   → ident                                (access path identifier)
            | * apath                              (dereferencing)
            | apath . ident                        (member selection)
            | apath -> ident                       (indirect member selection)
    trans   → { alias_1 , ... , alias_n }          (translation table, n ≥ 0)
    alias   → underscoreToCase                     (standard mapping)
            | ident as ident                       (associate two identifiers)

Figure 2. Grammar of binding hooks.
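For readers who prefer abstract syntax, the grammar can be transcribed into a Haskell data type. This is a hypothetical rendering: the constructor names are invented, and C→Haskell's internal representation may look quite different; the sketch only mirrors the structure of Figure 2.

```haskell
-- Hypothetical abstract syntax mirroring Figure 2 (names are invented).
data Hook
  = HContext (Maybe String) (Maybe String)  -- context [lib = s] [prefix = s]
  | HType String                            -- type ident
  | HEnum IdAlias [Alias]                   -- enum idalias trans
  | HCall Bool Bool IdAlias                 -- call [fun] [unsafe] idalias
  | HGet APath                              -- get apath
  | HSet APath                              -- set apath
  deriving (Eq, Show)

-- ident [as ident]
data IdAlias = IdAlias String (Maybe String)
  deriving (Eq, Show)

data APath
  = PIdent String         -- ident
  | PDeref APath          -- * apath
  | PMember APath String  -- apath . ident
  | PArrow APath String   -- apath -> ident
  deriving (Eq, Show)

data Alias
  = UnderscoreToCase      -- standard mapping
  | Assoc String String   -- ident as ident
  deriving (Eq, Show)

-- Print an access path back in hook syntax (no parenthesisation).
showAPath :: APath -> String
showAPath (PIdent i)    = i
showAPath (PDeref p)    = "*" ++ showAPath p
showAPath (PMember p i) = showAPath p ++ "." ++ i
showAPath (PArrow p i)  = showAPath p ++ "->" ++ i

-- {#get GtkAdjustment.lower#}, the hook used in Section 4.2.
lowerHook :: Hook
lowerHook = HGet (PMember (PIdent "GtkAdjustment") "lower")

main :: IO ()
main = case lowerHook of
  HGet p -> putStrLn ("get " ++ showAPath p)  -- prints get GtkAdjustment.lower
  _      -> return ()
```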

Generally, it should be noted that in the case of an enumeration hook, the
referenced C object may either be an enum tag or a type name associated with
an enumeration type using a typedef declaration. Similarly, in the case of a
set/get hook, the name of the C object that is first in the access path may be a
struct or union tag or a type name associated with a structure type via a typedef
declaration; a pointer to a structure type is also admitted. All other identifiers
in an access path need to be a member of the structure accessed at that level.
A type hook always references a C type name.
