Notes on the Foreign Function Interface (ffi) - 22 June 2002

N.B. The Hugs FFI implementation has changed significantly since the
December 2001 release.

Known limitations:

o Only the ccall calling convention is supported.
  All others are flagged as errors.

o foreign export is not implemented.

o foreign import wrappers are only implemented for the x86, PowerPC and
  Sparc architectures and have been most thoroughly tested on Windows
  and Linux, and with gcc.

  It should be easy for any experienced assembly language programmer to
  port them, especially if they first look at rts/Adjustor.c in the GHC
  source tree.

The following information is intended for those brave souls who try to
port the implementation to other architectures and can be safely
ignored by everyone else.

To make foreign import wrappers work for other architectures, you have
to modify the function mkThunk in hugs98/src/builtin.c to generate a
short sequence of machine code (and then send your fix to
hugs-bugs@haskell.org for inclusion in the next release).

The goal of the code is (more or less) to implement this C function

  rty f(ty1 a1, ... tym am) {
    return (*app)(s,a1, ... am);
  }

where rty, ty1, ... tym are C types, app is an "apply" function
generated by running "ffihugs +G" and "s" is a "stable pointer" to the
Haskell function being wrapped.

The reason the function is written in machine code is:

o For foreign import wrappers the function has to be generated
  dynamically and neither ANSI C nor any extensions we know of let you
  generate C functions at runtime.

  The alternative of invoking the C compiler and loader at runtime is
  not attractive.

o The code has to be placed next to a data structure in memory.
  The data structure has this type:

    struct thunk_data {
        struct thunk_data* next;
        struct thunk_data* prev;
        HugsStablePtr      stable;
        char               code[16];
    };

  The next and prev pointers are used to implement a doubly-linked
  list used by the garbage collector to keep track of all wrapped
  functions.
  The stable field stores a stable pointer to the Haskell function
  being wrapped. This is used by the garbage collector.

  The code field stores the machine code. It is expected that the
  size will have to be changed for other architectures.

o By writing in assembly/machine code, it is possible to use the same
  code sequence no matter what the function type is.

  This works because the C calling convention on most machines has the
  stack looking something like this (the stack grows downwards in this
  picture)

    |  ...   |
    +--------+
    |  argm  |
    +--------+
       ...
    +--------+
    |  arg2  |
    +--------+
    |  arg1  |
    +--------+
    |ret_addr|
    +--------+

  This calling convention is more or less imposed by the need to
  support vararg functions in C.

  To implement the above function, all we need to do is adjust the
  stack to look like this:

    |  ...   |
    +--------+
    |  argm  |
    +--------+
       ...
    +--------+
    |  arg2  |
    +--------+
    |  arg1  |
    +--------+
    |   s    |
    +--------+
    |ret_addr|
    +--------+

  and jump to (tailcall) the start of app.

  On the x86, you can do this with the following code sequence:

    pushl (%esp)        ; move the return address "up"
    movl  s,4(%esp)     ; stick the stable pointer "under" it
    jmp   app           ; tail call app

  On the Sparc, alignment restrictions require that we add a
  doubleword.

  On machines with very different architectures, you can (hopefully)
  get things working by passing the stable pointer in a global
  variable or, perhaps, a callee-saves register and tweaking the "app"
  function (which is generated by implementForeignImportWrapper in
  ffi.c) to expect "s" in that variable or register instead of on the
  stack.

o It is machine code instead of assembly code because we don't want to
  invoke an assembler and linker/loader at runtime.

Having determined which assembly code sequence to use, use "as -a" (or
equivalent) to view the corresponding machine code and then write C
code which will insert that code into the code field of a thunk.
For the x86, the code looks like this.

  #if defined(__i386__)
      /* 3 bytes: pushl (%esp) */
      *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;

      /* 8 bytes: movl s,4(%esp) */
      *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04;
      *((HugsStablePtr*)pc)++ = s;

      /* 5 bytes: jmp app */
      *pc++ = 0xe9;
      *((int*)pc)++ = (char*)app - ((char*)&(thunk->code[16]));
  #else
      ...
  #endif

This code contains a copy of the stable pointer because it is
convenient to do this on the x86. On architectures such as the Sparc
where 32-bit immediate loads are more painful, it may be easier to load
the copy of the stable pointer stored in the thunk - this is stored at
a fixed offset from the code. Likewise, it may be convenient to add a
copy of "app" to the thunk struct.
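
The x86 excerpt above stores its 32-bit operands through a cast used as
an lvalue ("*((int*)pc)++ = ..."), a gcc extension that newer compilers
reject. The following is only a sketch of how the same 16 bytes might be
produced in portable C. It is not the code in builtin.c. The names
THUNK_CODE_SIZE, put_le32 and encode_x86_thunk are hypothetical, and the
sketch assumes that the stable pointer and both addresses fit in 32
bits. It only computes the bytes; it does not execute them or make any
memory executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the code field in struct thunk_data (16 bytes on the x86). */
enum { THUNK_CODE_SIZE = 16 };

/* Hypothetical helper: store a 32-bit value in little-endian byte
 * order, which is what the x86 expects for immediates and offsets. */
static uint8_t *put_le32(uint8_t *pc, uint32_t v)
{
    pc[0] = (uint8_t)(v & 0xff);
    pc[1] = (uint8_t)((v >> 8) & 0xff);
    pc[2] = (uint8_t)((v >> 16) & 0xff);
    pc[3] = (uint8_t)((v >> 24) & 0xff);
    return pc + 4;
}

/* Hypothetical helper: fill code[] with the x86 sequence.
 *   stable    - the stable pointer (assumed to fit in 32 bits)
 *   app_addr  - address of the "app" function
 *   code_addr - address at which code[0] will live at run time
 * Returns the number of bytes written. */
static size_t encode_x86_thunk(uint8_t code[THUNK_CODE_SIZE],
                               uint32_t stable,
                               uint32_t app_addr,
                               uint32_t code_addr)
{
    uint8_t *pc = code;
    assert(code != NULL);

    /* 3 bytes: pushl (%esp) */
    *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;

    /* 8 bytes: movl stable,4(%esp) with a 32-bit immediate */
    *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04;
    pc = put_le32(pc, stable);

    /* 5 bytes: jmp app. The offset is relative to the address just
     * past the jmp, i.e. to &code[16]. Unsigned arithmetic wraps, so
     * backward jumps come out as two's-complement offsets. */
    *pc++ = 0xe9;
    pc = put_le32(pc, app_addr - (code_addr + THUNK_CODE_SIZE));

    return (size_t)(pc - code);
}
```

With app at 0x2000 and the code field at 0x1000, for example, the jmp
offset is 0x2000 - 0x1010 = 0x0ff0, so the last four bytes are
f0 0f 00 00. Passing the addresses as plain integers keeps the sketch
independent of the host machine; a real port would take them from app
and from the thunk being filled in.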