[erlang-questions] binary_to_term and leaking atoms

Mon Jan 4 11:35:40 CET 2010

Just to see this clearly:
How can a pid or fun be dangerous? If you don't trust them, don't call
them or send messages to them.
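
To make the point concrete, here is a minimal sketch (the module name
inert_demo and the function roundtrip/0 are made up for illustration).
It round-trips a pid and a fun through the external term format and only
inspects the results; nothing is called and no message is sent, so
decoding by itself runs no code.

```erlang
-module(inert_demo).
-export([roundtrip/0]).

%% Hypothetical demo: encode a pid and a fun, decode them again, and
%% only look at their types. The decoded fun is never applied and the
%% decoded pid is never sent a message.
roundtrip() ->
    Fun = fun() -> side_effect end,
    Bin = term_to_binary({self(), Fun}),
    {Pid, Decoded} = binary_to_term(Bin),
    {is_pid(Pid), is_function(Decoded, 0)}.
```

Whether holding such values is still a risk depends on what the rest of
the program later does with them.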

> This was exactly what I was thinking.  As for naming the option, perhaps
> you should use "only_existing_atoms" just to make it look similar to
> list_to_existing_atom/1.  Also, their docs should probably refer to each
> other, just for context.  I would also want options to prohibit_funs and
> prohibit_pids.  Both of these would be necessary to safely use
> binary_to_term with unsafe binaries.  Anything else anyone can think of?
>
> As an alternative to adding options, you could also just have
> safe_binary_to_term/1 which prohibited funs, pids, and new atoms.  Unless
> there are other compelling options, an option list might be overkill.  I'd
> imagine that this will get used in a tight loop, so handling the lists for
> each call might also be a bit of an unnecessary performance hit (although,
> I find this unlikely).
>
> So, specifically, either:
>
> %% @spec safe_binary_to_term(binary()) -> term()
> %% @doc Limited form of binary_to_term which won't create funs, pids, or
> %%   new atoms.
> %%   To be used to limit danger of decoding untrusted external binaries.
>
> Or:
>
> %% @spec binary_to_term(binary(), [ only_existing_atoms | prohibit_pid |
> %%   prohibit_fun | safe ]) -> term()
> %% @doc Same as binary_to_term/1, but with special decoding options.
> %%   only_existing_atoms: prohibit creation of new atoms
> %%   prohibit_pid: prohibits creation of pids
> %%   prohibit_fun: prohibits creation of funs
> %%   safe: same as above three options, useful when decoding binaries from
> %%     untrusted sources
>
> Take your pick, but either one would make my world much easier.
>
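
For readers on later Erlang/OTP releases: binary_to_term/2 with the
`safe` option exists there and fails with badarg instead of creating new
atoms. It does not by itself reject pids or funs, so the first proposal
above could be approximated roughly as below. This is only a sketch; the
module name safe_decode and the helper check/1 are made up, and
safe_binary_to_term/1 is the name proposed in this thread, not an OTP
function.

```erlang
-module(safe_decode).
-export([safe_binary_to_term/1]).

%% Hypothetical helper, not part of OTP. Assumes a release in which
%% binary_to_term/2 accepts the `safe` option and maps exist.
safe_binary_to_term(Bin) when is_binary(Bin) ->
    %% `safe` makes decoding fail with badarg rather than create atoms.
    Term = binary_to_term(Bin, [safe]),
    ok = check(Term),
    Term.

%% Walk the decoded term and raise badarg on any pid or fun.
check(T) when is_pid(T); is_function(T) ->
    erlang:error(badarg);
check([H | T]) ->
    ok = check(H),
    check(T);
check(T) when is_tuple(T) ->
    check(tuple_to_list(T));
check(T) when is_map(T) ->
    check(maps:to_list(T));
check(_) ->
    ok.
```

Note that the term is fully decoded before it is inspected, so this
rejects pids and funs after the fact rather than preventing their
creation.
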
> On Jan 4, 2010, at 1:54 AM, Kenneth Lundin wrote:
>
>> I think we will implement atom GC one day. We have discussed this
>> several times and there
>> are solutions with only a small performance decrease (10% or maybe less).
>> It is a major thing to implement, we cannot prioritize it for now,
>> and it is unknown when we can.
>>
>> Extended functionality in binary_to_term can be a good compromise in
>> the meantime.
>> The question is how it should work.
>>
>> Assume we introduce:
>>
>> binary_to_term(Bin,Options)
>>
>> What options should we have:
>>
>> 'no_new_atoms' could make the function crash with reason badarg if the
>> binary contains
>> encoded "new" (non-existing) atoms
>>
>> or
>>
>> 'new_atoms_as_binaries' could translate "new" (non-existing) atoms to binaries
>>
>> etc.
>>
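
The difference between the two suggested behaviours can be illustrated
for a single atom name with list_to_existing_atom/1. This is only a
sketch: the module atom_policy and the function to_atom/2 are made up,
and the option names are the ones suggested above, not options of any
existing function.

```erlang
-module(atom_policy).
-export([to_atom/2]).

%% Hypothetical illustration of the two policies for one name given as
%% a string. Neither clause ever creates a new atom.
to_atom(Name, no_new_atoms) when is_list(Name) ->
    %% Raises badarg if no atom with this name exists yet.
    list_to_existing_atom(Name);
to_atom(Name, new_atoms_as_binaries) when is_list(Name) ->
    %% Falls back to a binary when the atom is unknown.
    try
        list_to_existing_atom(Name)
    catch
        error:badarg -> unicode:characters_to_binary(Name)
    end.
```

With the second policy the caller has to be prepared to receive either
an atom or a binary in the same position.
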
>> I think the first step is to define how the new function should work,
>> what suggestions do you have here?
>>
>> /Kenneth Erlang/OTP Ericsson
>>
>> On Mon, Jan 4, 2010 at 9:28 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
>>> On the up side, short atoms sound like a fantastic idea.  On the down
>>> side, atom GC sounds like it could easily kill performance.  I'd
>>> imagine that everything would have to stop when atoms were GC'd.  Even
>>> if not, I'd also imagine it would be a large and invasive change.
>>>
>>> I don't suppose I could get a safe binary_to_term in the meantime?
>>> What if I submitted a patch?
>>>
>>> On Jan 4, 2010, at 12:17 AM, Joe Armstrong wrote:
>>>
>>>> The real problem is that we don't have garbage collection of atoms.
>>>> list_to_existing_atom is a hack to try and
>>>> get around this.
>>>>
>>>> Erlang was designed for an environment of trusted nodes so we never
>>>> worried about atom garbage collection.
>>>> The case for adding atom GC has never been compelling enough to do it.
>>>>
>>>> On a 64-bit machine the case for atoms seems weak - you could make a
>>>> new data type "short atoms" and
>>>> store them in 64 bits; long atoms could be on the local process heap
>>>> and not in the atom table. You could use some
>>>> smart pointer scheme to avoid unnecessary string comparisons when
>>>> comparing atoms ...
>>>>
>>>> /Joe
>>>>
>>>>
>>>>
>>>> On Mon, Jan 4, 2010 at 3:58 AM, Jayson Vantuyl <kagato@REDACTED>
>>>> wrote:
>>>>> I've been writing a lot of Erlang lately, and I feel like I'm missing
>>>>> something.
>>>>>
>>>>> Specifically, list_to_existing_atom is awesome for preventing atom
>>>>> leak; binary_to_term is great for easily building flexible network
>>>>> protocols; and {packet,N} makes framing the protocol a breeze.
>>>>>
>>>>> That said, I can't get the safety of list_to_existing_atom with
>>>>> binary_to_term.  binary_to_term will automatically create any atoms
>>>>> (as well as funs) that a remote sender wants.  This has
>>>>> necessitated writing custom protocol encoders / decoders, and makes
>>>>> Erlang's external binary term format incredibly useless.  It would be
>>>>> very nice to add a version of binary_to_term that has an extra
>>>>> argument which contains options.  This would be generally useful to
>>>>> allow prohibiting creation of new atoms, prohibiting creation of funs
>>>>> / pids, and maybe even to specify backwards-compatible binary formats
>>>>> (making it easier to interoperate with older versions of Erlang).
>>>>>
>>>>> --
>>>>> Jayson Vantuyl
>>>>> kagato@REDACTED
>>>>>
>>>>> ________________________________________________________________
>>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>>>> erlang-questions (at) erlang.org
>>>
>>> --
>>> Jayson Vantuyl
>>> kagato@REDACTED