[erlang-questions] binary_to_term and leaking atoms

Mon Jan 4 11:23:38 CET 2010

This was exactly what I was thinking.  As for naming the option, perhaps you should use "only_existing_atoms" just to make it look similar to list_to_existing_atom/1.  Also, their docs should probably refer to each other, just for context.  I would also want prohibit_fun and prohibit_pid options.  Both of these would be necessary to safely use binary_to_term with untrusted binaries.  Anything else anyone can think of?

As an alternative to adding options, you could also just have safe_binary_to_term/1, which prohibits funs, pids, and new atoms.  Unless there are other compelling options, an option list might be overkill.  I'd imagine that this will get used in a tight loop, so processing the option list on each call might also be an unnecessary performance hit (although I find this unlikely).

So, specifically, either:

%% @spec safe_binary_to_term(binary()) -> term()
%% @doc Limited form of binary_to_term which won't create funs, pids, or new atoms.
%%   To be used to limit danger of decoding untrusted external binaries.
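
A rough sketch of how this first variant could behave. It assumes an OTP release in which binary_to_term/2 accepts a `safe` option that refuses to create new atoms (not available when this thread started); the module name safe_decode and the helper is_plain/1 are hypothetical. Funs and pids are rejected only after decoding, so this is weaker than having the decoder refuse them outright.

```erlang
-module(safe_decode).
-export([safe_binary_to_term/1]).

%% Hypothetical helper, not part of OTP.
%% Decodes Bin without creating new atoms (assumes the `safe` option of
%% binary_to_term/2) and fails with badarg if the result contains a
%% pid or a fun anywhere inside it.
-spec safe_binary_to_term(binary()) -> term().
safe_binary_to_term(Bin) when is_binary(Bin) ->
    Term = binary_to_term(Bin, [safe]),
    case is_plain(Term) of
        true  -> Term;
        false -> erlang:error(badarg, [Bin])
    end.

%% True if the term contains no pids and no funs.
is_plain(T) when is_pid(T); is_function(T) -> false;
is_plain(T) when is_list(T)  -> is_plain_list(T);
is_plain(T) when is_tuple(T) -> is_plain_list(tuple_to_list(T));
is_plain(T) when is_map(T)   -> is_plain_list(maps:to_list(T));
is_plain(_) -> true.

%% Walks proper and improper lists element by element.
is_plain_list([H | T]) -> is_plain(H) andalso is_plain_list(T);
is_plain_list([])      -> true;
is_plain_list(Tail)    -> is_plain(Tail).
```

A caller would wrap this in try/catch (or let the connection handler crash) and treat badarg as "drop this packet".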

Or:

%% @spec binary_to_term(binary(), [ only_existing_atoms | prohibit_pid | prohibit_fun | safe ]) -> term()
%% @doc Same as binary_to_term/1, but with special decoding options.
%%   only_existing_atoms: prohibit creation of new atoms
%%   prohibit_pid: prohibits creation of pids
%%   prohibit_fun: prohibits creation of funs
%%   safe: same as above three options, useful when decoding binaries from untrusted sources
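
To make the second variant concrete, here is a sketch of a wrapper that accepts the option names proposed above. The module decode_opts and the function binary_to_term_opts/2 are hypothetical. It assumes an OTP release where binary_to_term/2 has a `safe` option, and uses that as a stand-in for only_existing_atoms; that option may reject more than new atoms, so this is an approximation. The pid and fun checks again run after decoding.

```erlang
-module(decode_opts).
-export([binary_to_term_opts/2]).

%% Hypothetical wrapper; the option names follow the proposal above.
binary_to_term_opts(Bin, Opts0) when is_binary(Bin), is_list(Opts0) ->
    Opts = expand(Opts0),
    %% Assumption: the `safe` decoder option approximates only_existing_atoms.
    DecodeOpts = case lists:member(only_existing_atoms, Opts) of
                     true  -> [safe];
                     false -> []
                 end,
    Term = binary_to_term(Bin, DecodeOpts),
    NoPid = lists:member(prohibit_pid, Opts),
    NoFun = lists:member(prohibit_fun, Opts),
    case allowed(Term, NoPid, NoFun) of
        true  -> Term;
        false -> erlang:error(badarg, [Bin, Opts0])
    end.

%% `safe` is shorthand for all three restrictions.
expand(Opts) ->
    case lists:member(safe, Opts) of
        true  -> [only_existing_atoms, prohibit_pid, prohibit_fun];
        false -> Opts
    end.

%% Recursively checks the decoded term against the enabled restrictions.
allowed(T, true, _) when is_pid(T)      -> false;
allowed(T, _, true) when is_function(T) -> false;
allowed(T, P, F) when is_list(T)  -> allowed_list(T, P, F);
allowed(T, P, F) when is_tuple(T) -> allowed_list(tuple_to_list(T), P, F);
allowed(T, P, F) when is_map(T)   -> allowed_list(maps:to_list(T), P, F);
allowed(_, _, _) -> true.

%% Handles improper lists as well as proper ones.
allowed_list([H | T], P, F) -> allowed(H, P, F) andalso allowed_list(T, P, F);
allowed_list([], _, _)      -> true;
allowed_list(Tail, P, F)    -> allowed(Tail, P, F).
```

With this shape, a protocol that legitimately ships pids could pass [only_existing_atoms, prohibit_fun], while a public-facing listener would pass [safe].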

Take your pick, but either one would make my world much easier.

On Jan 4, 2010, at 1:54 AM, Kenneth Lundin wrote:

> I think we will implement atom GC one day. We have discussed this
> several times and there
> are solutions with only small performance decrease (10% or maybe less).
> It is a major thing to implement, and we cannot prioritize it for now;
> it is unknown when we can.
> 
> Extended functionality in binary_to_term can be a good compromise in
> the meantime.
> The question is how it should work.
> 
> Assume we introduce:
> 
> binary_to_term(Bin,Options)
> 
> What options should we have:
> 
> 'no_new_atoms' could make the function crash with reason badarg if the
> binary contains
> encoded "new" non existing atoms
> 
> or
> 
> 'new_atoms_as_binaries' could translate "new" non-existing atoms to binaries
> 
> etc.
> 
> I think the first step is to define how the new function should work,
> what suggestions do you have here?
> 
> /Kenneth Erlang/OTP Ericsson
> 
> On Mon, Jan 4, 2010 at 9:28 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
>> On the up side, short atoms sound like a fantastic idea.  On the down side, atom GC sounds like it could easily kill performance.  I'd imagine that everything would have to stop when atoms were GC'd.  Even if not, I'd also imagine it would be a large and invasive change.
>> 
>> I don't suppose I could get a safe binary_to_term in the meantime?  What if I submitted a patch?
>> 
>> On Jan 4, 2010, at 12:17 AM, Joe Armstrong wrote:
>> 
>>> The real problem is that we don't have garbage collection of atoms.
>>> list_to_existing_atom is a hack to try and
>>> get around this.
>>> 
>>> Erlang was designed for an environment of trusted nodes so we never
>>> worried about atom garbage collection.
>>> The case for adding atom GC has never been compelling enough to do it.
>>> 
>>> On a 64 bit machine the case for atoms seems weak - you could make a
>>> new data type "short atoms" and
>>> store them in 64 bits; long atoms could be on the local process heap
>>> and not in the atom table. You could use some
>>> smart pointer scheme to avoid unnecessary string comparisons when
>>> comparing atoms ...
>>> 
>>> /Joe
>>> 
>>> 
>>> 
>>> On Mon, Jan 4, 2010 at 3:58 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
>>>> I've been writing a lot of Erlang lately, and I feel like I'm missing something.
>>>> 
>>>> Specifically, list_to_existing_atom is awesome for preventing atom leak; binary_to_term is great for easily building flexible network protocols; and {packet,N} makes framing the protocol a breeze.
>>>> 
>>>> That said, I can't get the safety of list_to_existing_atom with binary_to_term.  binary_to_term will automatically create any atoms (as well as funs) that a remote sender wants.  This has necessitated writing custom protocol encoders / decoders, and makes Erlang's external binary term format incredibly useless.  It would be very nice to add a version of binary_to_term that has an extra argument which contains options.  This would be generally useful to allow prohibiting creation of new atoms, prohibiting creation of funs / pids, and maybe even to specify backwards-compatible binary formats (making it easier to interoperate with older versions of Erlang).
>>>> 
>>>> --
>>>> Jayson Vantuyl
>>>> kagato@REDACTED
>>>> 
>>>> ________________________________________________________________
>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>>> erlang-questions (at) erlang.org
>> 
>> --
>> Jayson Vantuyl
>> kagato@REDACTED