[erlang-questions] binary_to_term and leaking atoms

Jayson Vantuyl kagato@REDACTED
Mon Jan 4 12:01:57 CET 2010


Yeah, I thought of that after sending it (and followed up cryptically).

When data gets passed around, I generally don't expect it to be properly validated, so I was being a bit paranoid.
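
To make that paranoia concrete, here is a minimal sketch of the kind of check I would otherwise run on a term after decoding it. The module name term_check and the function is_plain/1 are made up for illustration, and the map clause assumes a release that has maps. It only rejects pids, funs, ports, and references after the fact; it does nothing about atoms, which are created inside binary_to_term itself.

```erlang
-module(term_check).
-export([is_plain/1]).

%% Hypothetical helper: returns true if Term contains no pids, funs,
%% ports, or references anywhere inside it, and false otherwise.
%% This runs after decoding, so it cannot prevent atom creation.
is_plain(T) when is_pid(T); is_function(T); is_port(T); is_reference(T) ->
    false;
is_plain([H | T]) ->
    %% T may be a non-list for improper lists; the clauses below cover it.
    is_plain(H) andalso is_plain(T);
is_plain(T) when is_tuple(T) ->
    is_plain(tuple_to_list(T));
is_plain(M) when is_map(M) ->
    %% Check both keys and values by walking the {Key, Value} pairs.
    is_plain(maps:to_list(M));
is_plain(_) ->
    %% Atoms, numbers, binaries, bitstrings, and [] end up here.
    true.
```

With something like this, term_check:is_plain(binary_to_term(Bin)) would let me drop a message carrying a pid or fun before any other code sees it, which is roughly what a prohibit_pid / prohibit_fun option would do at decode time.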

On Jan 4, 2010, at 2:35 AM, Zoltan Lajos Kis wrote:

> Just to see this clearly:
> How can a pid or fun be dangerous? If you don't trust them, don't call
> them or send messages to them.
> 
>> This was exactly what I was thinking.  As for naming the option, perhaps
>> you should use "only_existing_atoms" just to make it look similar to
>> list_to_existing_atom/1.  Also, their docs should probably refer to each
>> other, just for context.  I would also want options to prohibit_funs and
>> prohibit_pids.  Both of these would be necessary to safely use
>> binary_to_term with unsafe binaries.  Anything else anyone can think of?
>> 
>> As an alternative to adding options, you could also just have
>> safe_binary_to_term/1 which prohibited funs, pids, and new atoms.  Unless
>> there are other compelling options, an option list might be overkill.  I'd
>> imagine that this will get used in a tight loop, so handling the lists for
>> each call might also be a bit of an unnecessary performance hit (although,
>> I find this unlikely).
>> 
>> So, specifically, either:
>> 
>> %% @spec safe_binary_to_term(binary()) -> term()
>> %% @doc Limited form of binary_to_term which won't create funs, pids, or
>> %%   new atoms.
>> %%   To be used to limit danger of decoding untrusted external binaries.
>> 
>> Or:
>> 
>> %% @spec binary_to_term(binary(), [ only_existing_atoms | prohibit_pid |
>> %%   prohibit_fun | safe ]) -> term()
>> %% @doc Same as binary_to_term/1, but with special decoding options.
>> %%   only_existing_atoms: prohibit creation of new atoms
>> %%   prohibit_pid: prohibits creation of pids
>> %%   prohibit_fun: prohibits creation of funs
>> %%   safe: same as the above three options, useful when decoding binaries
>> %%     from untrusted sources
>> 
>> Take your pick, but either one would make my world much easier.
>> 
>> On Jan 4, 2010, at 1:54 AM, Kenneth Lundin wrote:
>> 
>>> I think we will implement atom GC one day. We have discussed this
>>> several times and there
>>> are solutions with only a small performance decrease (10% or maybe less).
>>> It is a major thing to implement, and we cannot prioritize it for now;
>>> it is unknown when we can.
>>> 
>>> Extended functionality in binary_to_term can be a good compromise in
>>> the meantime.
>>> The question is how it should work.
>>> 
>>> Assume we introduce:
>>> 
>>> binary_to_term(Bin,Options)
>>> 
>>> What options should we have:
>>> 
>>> 'no_new_atoms' could make the function crash with reason badarg if the
>>> binary contains encoded "new" non-existing atoms
>>> 
>>> or
>>> 
>>> 'new_atoms_as_binaries' could translate "new" non-existing atoms to binaries
>>> 
>>> etc.
>>> 
>>> I think the first step is to define how the new function should work,
>>> what suggestions do you have here?
>>> 
>>> /Kenneth Erlang/OTP Ericsson
>>> 
>>> On Mon, Jan 4, 2010 at 9:28 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
>>>> On the up side, short atoms sound like a fantastic idea.  On the down
>>>> side, atom GC sounds like it could easily kill performance.  I'd
>>>> imagine that everything would have to stop when atoms were GC'd.  Even
>>>> if not, I'd also imagine it would be a large and invasive change.
>>>> 
>>>> I don't suppose I could get a safe binary_to_term in the meantime?
>>>> What if I submitted a patch?
>>>> 
>>>> On Jan 4, 2010, at 12:17 AM, Joe Armstrong wrote:
>>>> 
>>>>> The real problem is that we don't have garbage collection of atoms.
>>>>> list_to_existing_atom is a hack to try to get around this.
>>>>> 
>>>>> Erlang was designed for an environment of trusted nodes so we never
>>>>> worried about atom garbage collection.
>>>>> The case for adding atom GC has never been compelling enough to do it.
>>>>> 
>>>>> On a 64 bit machine the case for atoms seems weak - you could make a
>>>>> new data type "short atoms" and
>>>>> store them in 64 bits; long atoms could be on the local process heap
>>>>> and not in the atom table. You could use some
>>>>> smart pointer scheme to avoid unnecessary string comparisons when
>>>>> comparing atoms ...
>>>>> 
>>>>> /Joe
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Jan 4, 2010 at 3:58 AM, Jayson Vantuyl <kagato@REDACTED>
>>>>> wrote:
>>>>>> I've been writing a lot of Erlang lately, and I feel like I'm missing
>>>>>> something.
>>>>>> 
>>>>>> Specifically, list_to_existing_atom is awesome for preventing atom
>>>>>> leak; binary_to_term is great for easily building flexible network
>>>>>> protocols; and {packet,N} makes framing the protocol a breeze.
>>>>>> 
>>>>>> That said, I can't get the safety of list_to_existing_atom with
>>>>>> binary_to_term.  binary_to_term will automatically create any atoms
>>>>>> (as well as funs) that a remote sender wants.  This has
>>>>>> necessitated writing custom protocol encoders / decoders, and makes
>>>>>> Erlang's external binary term format incredibly useless.  It would be
>>>>>> very nice to add a version of binary_to_term that has an extra
>>>>>> argument which contains options.  This would be generally useful to
>>>>>> allow prohibiting creation of new atoms, prohibiting creation of funs
>>>>>> / pids, and maybe even to specify backwards-compatible binary formats
>>>>>> (making it easier to interoperate with older versions of Erlang).
>>>>>> 
>>>>>> --
>>>>>> Jayson Vantuyl
>>>>>> kagato@REDACTED
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ________________________________________________________________
>>>>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>>>>> erlang-questions (at) erlang.org
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jayson Vantuyl
>>>> kagato@REDACTED
>> 
> 
> 


