[erlang-questions] binary_to_term and leaking atoms

Mon Jan 4 10:54:37 CET 2010

I think we will implement atom GC one day. We have discussed this
several times, and there are solutions with only a small performance
decrease (10% or maybe less). It is a major thing to implement, though,
and we cannot prioritize it for now; it is unknown when we can.

Extended functionality in binary_to_term can be a good compromise in
the meantime.
The question is how it should work.

Assume we introduce:

binary_to_term(Bin,Options)

What options should we have:

'no_new_atoms' could make the function crash with reason badarg if the
binary contains encoded "new" (non-existing) atoms

or

'new_atoms_as_binaries' could translate "new" (non-existing) atoms to binaries

etc.
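
As a rough sketch of the first option: later OTP releases ship
binary_to_term/2 with a `safe` option that fails with badarg when the
binary names an atom that is not already in the atom table, which is
close to what 'no_new_atoms' describes. The module and function names
below (safe_decode_sketch, decode_no_new_atoms/1) are hypothetical and
only for illustration; 'new_atoms_as_binaries' is not covered.

```erlang
-module(safe_decode_sketch).
-export([decode_no_new_atoms/1, demo/0]).

%% Hypothetical wrapper approximating the proposed 'no_new_atoms' option.
%% Assumes an OTP release where binary_to_term/2 accepts [safe].
%% Note: a malformed binary also raises badarg, so both cases end up
%% as {error, badarg} here.
decode_no_new_atoms(Bin) when is_binary(Bin) ->
    try
        {ok, binary_to_term(Bin, [safe])}
    catch
        error:badarg ->
            {error, badarg}
    end.

demo() ->
    %% 'hello' is a literal in this module, so the atom already exists
    %% once the module is loaded and decoding succeeds.
    Known = term_to_binary({hello, 42}),
    {ok, {hello, 42}} = decode_no_new_atoms(Known),
    %% Hand-built external term: version byte 131, SMALL_ATOM_UTF8_EXT
    %% (tag 119), a one-byte length, then the atom name. The name is
    %% kept as a binary so that no atom is created on this node.
    Name = <<"atom_never_seen_before_sketch_1">>,
    Unknown = <<131, 119, (byte_size(Name)):8, Name/binary>>,
    {error, badarg} = decode_no_new_atoms(Unknown),
    ok.
```

Calling safe_decode_sketch:demo() exercises both paths: a term built
from atoms the node already knows, and one naming an atom it has never
seen.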

I think the first step is to define how the new function should work.
What suggestions do you have here?

/Kenneth Erlang/OTP Ericsson

On Mon, Jan 4, 2010 at 9:28 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
> On the up side, short atoms sound like a fantastic idea.  On the down side, atom GC sounds like it could easily kill performance.  I'd imagine that everything would have to stop when atoms were GC'd.  Even if not, I'd also imagine it would be a large and invasive change.
>
> I don't suppose I could get a safe binary_to_term in the meantime?  What if I submitted a patch?
>
> On Jan 4, 2010, at 12:17 AM, Joe Armstrong wrote:
>
>> The real problem is that we don't have garbage collection of atoms.
>> list_to_existing_atom is a hack to try and
>> get around this.
>>
>> Erlang was designed for an environment of trusted nodes so we never
>> worried about atom garbage collection.
>> The case for adding atom GC has never been compelling enough to do it.
>>
>> On a 64 bit machine the case for atoms seems weak - you could make a
>> new data type "short atoms" and
>> store them in 64 bits; long atoms could be on the local process heap
>> and not in the atom table. You could use some
>> smart pointer scheme to avoid unnecessary string comparisons when
>> comparing atoms ...
>>
>> /Joe
>>
>>
>>
>> On Mon, Jan 4, 2010 at 3:58 AM, Jayson Vantuyl <kagato@REDACTED> wrote:
>>> I've been writing a lot of Erlang lately, and I feel like I'm missing something.
>>>
>>> Specifically, list_to_existing_atom is awesome for preventing atom leak; binary_to_term is great for easily building flexible network protocols; and {packet,N} makes framing the protocol a breeze.
>>>
>>> That said, I can't get the safety of list_to_existing_atom with binary_to_term.  binary_to_term will automatically create any atoms (as well as funs) that a remote sender wants.  This has necessitated writing custom protocol encoders / decoders, and makes Erlang's external binary term format incredibly useless.  It would be very nice to add a version of binary_to_term that has an extra argument which contains options.  This would be generally useful to allow prohibiting creation of new atoms, prohibiting creation of funs / pids, and maybe even to specify backwards-compatible binary formats (making it easier to interoperate with older versions of Erlang).
>>>
>>> --
>>> Jayson Vantuyl
>>> kagato@REDACTED
>>>
>>> ________________________________________________________________
>>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>>> erlang-questions (at) erlang.org
>>>
>>>
>>
>
>
>
> --
> Jayson Vantuyl
> kagato@REDACTED