[erlang-questions] Erlang FFI: 1st discussion summary

Thu Sep 13 16:46:48 CEST 2007

Hello,

in order to make it easier to follow the Erlang FFI (Foreign
Function Interface) proposal, here's a summary of the discussion so
far, and a set of possible solutions for the issues being raised.  I
could send other summaries like this, if/when needed.

The initial Erlang FFI proposal is still available in [0].  It was
accepted as EEP-0007 [1], but it has been frozen until the
discussion on this mailing list settles.  A link to the first thread
of the discussion itself is in [2].

Here's an index of the issues and suggestions that have appeared so
far (I have reordered them so that they are easier to answer):

    1. Raimo Niskanen proposed a type-tagged interface for the FFI
       calls: call arguments and return values could be tuples in
       the form {type_tag(), Value} [3];

    2. Claes Wikström and Vlad Dumitrescu wondered how C pointers
       (to buffers or strings) returned by FFI calls could be turned
       into Erlang binaries;

    3. when C buffers are turned into Erlang binaries, binary
       matching could be used e.g. to extract C struct fields.  But
       this could require knowing the sizes of the C types on the
       system in use.

Here are the proposals for addressing these issues (based on the
ideas gathered so far).

========================
1. Type-tagged FFI calls
========================

Type tags are useful and can increase call safety, but they can
also hurt FFI performance badly (see the final note in [4]).  A
good compromise would be to leave the lower-level FFI BIFs without
tags, and implement type-tag handling in Erlang (e.g. in an ffi.erl
module).  Raimo agreed with this idea.

The old untagged ffi:call/3 and ffi:call/2 BIFs could be kept with a
different name (proposal: ffi:raw_call/3 and ffi:raw_call/2).  The
higher-level, type-tagged interface for FFI calls could be:

    ffi:call(Port, {ReturnType, Function}, [TaggedVal]) ->
            {ReturnType, term()}
        ReturnType = type_tag()
        Function   = string() | atom()
        TaggedVal  = {type_tag(), Val}
        Val        = term()
        type_tag() = uchar|schar|...|pointer|size_t|ssize_t

ffi:call/3 first checks whether the requested C function was
preloaded with erl_ddll:load_library/3.  Two cases arise:

    a. if the C function was preloaded, its signature is compared
       with the type tags.  If they match, a raw FFI call is
       performed (with ffi:raw_call/2); otherwise, a badarg
       exception is raised;

    b. if the C function was *not* preloaded, the type tags will be
       ignored and a raw FFI call will be performed (with
       ffi:raw_call/3).

In both cases, the raw FFI call return value will be returned as a
{ReturnType, RawReturnValue} tuple.
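
The dispatch described above could be sketched in plain Erlang as
follows.  This is only an illustration under stated assumptions: the
Port argument is left out, the list of preloads and its
{Function, {ReturnType, [ArgType]}} shape are hypothetical, and the
raw FFI call is replaced by a fun passed in by the caller, so that
the module runs without any FFI support.

```erlang
-module(ffi_tagged_sketch).
-export([call/4, demo/0]).

%% Preloads :: [{Function, {ReturnType, [ArgType]}}]  (hypothetical shape)
%% RawCall  :: fun((Function, [term()]) -> term())    (stand-in for the raw BIFs)
call(Preloads, RawCall, {ReturnType, Function}, TaggedArgs) ->
    {ArgTypes, RawArgs} = lists:unzip(TaggedArgs),
    case lists:keyfind(Function, 1, Preloads) of
        {Function, {ReturnType, ArgTypes}} ->
            %% Case a: preloaded, and the tags match the signature.
            {ReturnType, RawCall(Function, RawArgs)};
        {Function, _OtherSignature} ->
            %% Case a: preloaded, but the tags disagree with the signature.
            erlang:error(badarg);
        false ->
            %% Case b: not preloaded, so the tags are not checked.
            {ReturnType, RawCall(Function, RawArgs)}
    end.

demo() ->
    Preloads = [{strlen, {size_t, [pointer]}}],
    %% Fake "C functions": strlen always answers 5, anything else 0.
    RawCall = fun(strlen, [_Ptr]) -> 5;
                 (_Other, _Args) -> 0
              end,
    {size_t, 5} =
        call(Preloads, RawCall, {size_t, strlen}, [{pointer, 4096}]),
    {ssize_t, 0} =
        call(Preloads, RawCall, {ssize_t, not_preloaded}, [{pointer, 4096}]),
    badarg =
        try call(Preloads, RawCall, {ssize_t, strlen}, [{pointer, 4096}])
        catch error:badarg -> badarg
        end,
    ok.
```

The third call in demo/0 uses a return tag that does not match the
preloaded signature of strlen, so it ends in a badarg exception
instead of reaching the raw call.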

    --------------------------------------------------
    1.1. Getting information about preloaded functions
    --------------------------------------------------

    The proposed high-level ffi:call/3 would need information about
    the functions and FFI signatures preloaded with
    erl_ddll:load_library/3.  This information could be useful for
    developers, too (e.g. for debugging purposes).  For these
    reasons, the erl_ddll:info/2 BIF could be extended with a
    'preloads' argument, which would return a list of preloaded
    functions, signatures etc.  This information could be obtained
    via erl_ddll:info/1 and erl_ddll:info/0 as well.

======================================================
2. Creating Erlang binaries from C strings and buffers
======================================================

The first proposal on this issue [5] can be revised considering type
tagging.  A new 'cstring' type atom/tag can be introduced, in order
to distinguish NULL-terminated C strings from generic 'pointer's to
byte buffers.  Two functions could be used for turning them into
Erlang binaries:

    ffi:cstring_to_binary(TaggedCString) -> binary()
        TaggedCString = {cstring, CStringPtr}
        CStringPtr = integer()

        Return a new binary with a copy of the given NULL-terminated
        C string (including the trailing \0);

    ffi:buffer_to_binary(TaggedPointer, Size) -> binary()
        TaggedPointer = {pointer, Ptr}
        Ptr = integer()

        Return a new binary filled with a copy of Size bytes read
        from the given C pointer.

As with ffi:call/3 in the previous section, these two functions
would have untagged equivalents: ffi:raw_cstring_to_binary/1 and
ffi:raw_buffer_to_binary/2.
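
To make the intended results concrete, here is a small simulation
of the two conversions.  It does not touch real C memory: an Erlang
binary stands in for the address space and the "pointer" is a byte
offset into it, which is purely an assumption made for this sketch.
The extra Memory argument is therefore not part of the proposed
signatures.

```erlang
-module(ffi_copy_sketch).
-export([cstring_to_binary/2, buffer_to_binary/3, demo/0]).

%% Memory is a binary standing in for C memory; Ptr is an offset into it.
cstring_to_binary(Memory, {cstring, Ptr}) ->
    <<_:Ptr/binary, Rest/binary>> = Memory,
    %% Find the first NUL byte after Ptr and keep it in the result.
    {NulPos, 1} = binary:match(Rest, <<0>>),
    binary:part(Rest, 0, NulPos + 1).

buffer_to_binary(Memory, {pointer, Ptr}, Size) ->
    %% Exactly Size bytes, whatever their contents.
    binary:part(Memory, Ptr, Size).

demo() ->
    Memory = <<"xxhello", 0, "world", 0>>,
    <<"hello", 0>> = cstring_to_binary(Memory, {cstring, 2}),
    <<"world">> = buffer_to_binary(Memory, {pointer, 8}, 5),
    ok.
```

Note that the C string result is one byte longer than the text it
holds, because the trailing \0 is copied too.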

==================================
3. Determining the size of C types
==================================

The sizes of C types could be determined at run time with a new
ffi:sizeof/1 BIF (initially proposed in [6]):

    ffi:sizeof(CType) -> integer()
        CType = type_tag()

        Return the number of bytes used by CType on the current
        platform.

Type size information should, in general, *not* be hardcoded,
because it may change when running the same BEAM files on different
architectures.  The BIF above is the recommended way for getting
type sizes when writing portable code.
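
A portable decoder built on run-time sizes might look like the
sketch below.  The local sizeof/1 is only a stand-in for the
proposed BIF: it answers for three tags, using
erlang:system_info({wordsize, external}) for the pointer size and
the fact that a C char is one byte.

```erlang
-module(ffi_sizeof_sketch).
-export([sizeof/1, read_pointer/1, demo/0]).

%% Stand-in for the proposed ffi:sizeof/1 (assumption, not the real BIF).
sizeof(pointer) -> erlang:system_info({wordsize, external});
sizeof(uchar)   -> 1;
sizeof(schar)   -> 1.

%% Read one pointer-sized field from the front of a buffer.
read_pointer(Buffer) ->
    Bits = sizeof(pointer) * 8,   % computed outside the pattern
    <<Ptr:Bits/native-unsigned-integer, Rest/binary>> = Buffer,
    {{pointer, Ptr}, Rest}.

demo() ->
    Bits = sizeof(pointer) * 8,
    Buffer = <<16#1000:Bits/native-unsigned-integer, 7>>,
    {{pointer, 16#1000}, <<7>>} = read_pointer(Buffer),
    ok.
```

The size is multiplied by 8 and bound to a variable before the
match, so the same BEAM file works whatever the pointer width is.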

However, when the FFI-based code is *not* expected to be portable
without recompilation, the size of C types remains constant and
could be determined when the Erlang/OTP sources are compiled.  Thus,
this information could be stored in a .hrl file.  Developers could
-include_lib("kernel/include/ffi_hardcodes.hrl") [7] and obtain a
set of faster and easier-to-use macros, for each supported FFI type:

    FFI_HARDCODED_SIZEOF_<TYPE>
        The type size in bytes

    FFI_HARDCODED_<TYPE>_BITS
        The type size in bits

The size in bits is precomputed in order to simplify binary
matching, since expressions like (?FFI_HARDCODED_SIZEOF_LONG * 8)
are not allowed in patterns.
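
As an illustration, the macros could be used directly as segment
sizes.  In the sketch below the macros are defined locally with
made-up values (a 64-bit long), since ffi_hardcodes.hrl does not
exist yet; the buffer layout (a long followed by an unsigned char,
with no padding) is also just an assumption for the example.

```erlang
-module(ffi_hardcodes_sketch).
-export([decode/1, demo/0]).

%% Hypothetical values; the real header would be generated per platform.
-define(FFI_HARDCODED_LONG_BITS, 64).
-define(FFI_HARDCODED_UCHAR_BITS, 8).

%% The macros expand to integer literals, so they are valid sizes.
decode(<<Id:?FFI_HARDCODED_LONG_BITS/native-signed-integer,
         Flag:?FFI_HARDCODED_UCHAR_BITS/unsigned-integer,
         _/binary>>) ->
    {Id, Flag}.

demo() ->
    Id = -3,
    Buffer = <<Id:?FFI_HARDCODED_LONG_BITS/native-signed-integer, 1, 99>>,
    {-3, 1} = decode(Buffer),
    ok.
```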

===========================
4. Other minor enhancements
===========================

The following enhancements have not been discussed so far, but
they are small and trivial to implement:

    * a new erl_ddll:load_library/2 function could be added, that
      can be used instead of calling erl_ddll:load_library/3 with an
      empty list of options (i.e. when no preloads are requested);

    * when used with a library instead of a linked-in driver,
      erlang:open_port/2 calls are quite noisy (the 'spawn' and the
      list of options are redundant).  A new erlang:open_port/1
      function could be added:

          erlang:open_port(Library) -> port()
              Library = string()

      Under the hood, it could just call something like the
      existing erlang:open_port({spawn, Library}, [binary]).

=============
5. That's all
=============

Please share your opinion about the proposals above, and complain if
something is missing.  What gets implemented will depend on your
feedback.  Thanks!

=====
Notes
=====

[0] Home page of the FFI for Erlang/OTP
http://muvara.org/crs4/erlang/ffi

[1] EEP-0007: Foreign Function Interface (FFI)
http://www.erlang.org/eeps/eep-0007.html

[2] Thread about the Erlang FFI on the erlang-questions mailing list
http://erlang.org/pipermail/erlang-questions/2007-September/029121.html

[3] Proposal for type-tagged FFI calls
http://erlang.org/pipermail/erlang-questions/2007-September/029174.html

[4] FFI tagging vs. call performance
http://erlang.org/pipermail/erlang-questions/2007-September/029179.html

[5] First proposal for turning C strings/buffers into binaries
http://erlang.org/pipermail/erlang-questions/2007-September/029141.html

[6] First proposal for an ffi:sizeof/1 BIF
http://erlang.org/pipermail/erlang-questions/2007-September/029146.html

[7] From the implementation point of view, the ffi_hardcodes.hrl
    header file could be autogenerated by GNU Autoconf from
    ffi_hardcodes.hrl.in.

Regards,

alceste
-- 
Alceste Scalas <alceste@REDACTED>
CRS4 - http://www.crs4.it/