[erlang-questions] Hipe and Binary - Bitstring

Oliver Bollmann oliver.bollmann@REDACTED
Thu Mar 28 17:55:23 CET 2019


Hi,

> I'm glad it worked out!

> However, you're still going to copy those ~2GB of live data when a full
> GC finally happens, and I think you should consider reducing that
> figure. Do you really need all that data in one process?


The problem I solved with this process is resolving nested groups, using digraph:in_neighbours/2.

I have 1M groups, each of which has at least 100,000 members. The nesting level is at least 100. Loops (cycles) are allowed!

Question: which groups have which members, where member =/= group?
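
In rough outline, the resolution looks like this (the module name, vertex tagging, and edge direction are illustrative assumptions, not my actual code; edges point member -> group, so in_neighbours/2 of a group vertex yields its direct members):

    %% Sketch: collect all transitive non-group members of a group,
    %% tolerating cycles by tracking visited vertices.
    -module(groups_sketch).
    -export([members/2]).

    members(G, Group) ->
        walk(G, [Group], sets:new(), sets:new()).

    walk(_G, [], _Seen, Acc) ->
        sets:to_list(Acc);
    walk(G, [V | Rest], Seen, Acc) ->
        case sets:is_element(V, Seen) of
            true ->
                walk(G, Rest, Seen, Acc);          % cycle: already visited
            false ->
                Ms = digraph:in_neighbours(G, V),  % direct members of V
                {Groups, Leaves} = lists:partition(fun is_group/1, Ms),
                walk(G, Groups ++ Rest,
                     sets:add_element(V, Seen),
                     lists:foldl(fun sets:add_element/2, Acc, Leaves))
        end.

    %% Assumption: group vertices are tagged {group, Id}.
    is_group({group, _}) -> true;
    is_group(_)          -> false.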

I started with ETS, but ETS copies the value on each access, so I got a lot of memory peaks; not good.
I tried lists, maps, and so on.
I ended up with the process dictionary, which is perfect: if the process dies, the memory is gone, and since I use only binaries, there is no copy of the data on get.
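
For big refc binaries the process dictionary really is just a handle store; something like this (the key naming is mine):

    %% Sketch: large (>64 byte) binaries live off-heap, so get/1
    %% returns the stored term without copying the binary's bytes.
    store_row(GroupId, Bitmap) when is_binary(Bitmap) ->
        put({row, GroupId}, Bitmap).

    fetch_row(GroupId) ->
        get({row, GroupId}).   % no copy of the payload on read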

Now I use a bitmap, a 1M x 1M grid in which each bit represents a nested group, and use union/intersection to resolve the nested groups.
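
Row operations can be done directly on the bitstrings, e.g. (a simple sketch treating each row as one big unsigned integer; not tuned for speed):

    %% Sketch: union/intersection of two equally sized bitmap rows.
    row_union(A, B) when bit_size(A) =:= bit_size(B) ->
        S = bit_size(A),
        <<X:S>> = A,
        <<Y:S>> = B,
        <<(X bor Y):S>>.

    row_intersection(A, B) when bit_size(A) =:= bit_size(B) ->
        S = bit_size(A),
        <<X:S>> = A,
        <<Y:S>> = B,
        <<(X band Y):S>>.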

The process now runs for about 10 minutes, saves the result in Mnesia (about 5 GB), and dies.
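
The one-shot worker pattern is roughly (the table and function names are hypothetical):

    %% Sketch: spawn, resolve, persist, die; everything the process
    %% accumulated (dictionary included) is reclaimed on exit.
    run() ->
        spawn(fun() ->
            Result = resolve_all_groups(),   % hypothetical
            ok = mnesia:activity(transaction,
                                 fun() ->
                                     mnesia:write({group_result, all, Result})
                                 end)
        end).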

BTW, persistent_term looks good, since the grid is a one-time grid that could be split across more than one process, but what I need for the next step is 10M terms with about 1 TB of binaries :-)
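
Something along these lines (the key is made up; persistent_term is intended for data that is written once and rarely, if ever, erased):

    %% Sketch: a write-once grid shared by all processes; get/1 is
    %% constant-time and does not copy off-heap binaries.
    init_grid(Grid) when is_binary(Grid) ->
        persistent_term:put(group_grid, Grid).

    lookup_grid() ->
        persistent_term:get(group_grid).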


Oliver






On 28.03.19 15:05, John Högberg wrote:
> On Wed, 2019-03-27 at 21:30 +0100, Kostis Sagonas wrote:
>> On the other hand, I would not call the performance difference between
>> BEAM and HiPE that you observed "modest".  Four times faster execution
>> is IMO something that deserves a better adjective.
>>
>> Kostis
> Yes, it's a very impressive improvement. "Modest" was in relation to
> that 400x number, and I should've been clearer about that; "reasonable
> difference" would have been better wording.
>
> On Thu, 2019-03-28 at 08:34 +0100, Oliver Bollmann wrote:
>> Hi John,
>> problem solved!
>> The secret is:
>> process_flag(min_heap_size, 1024*1024*10),
>> process_flag(min_bin_vheap_size, 1024*1024*10*10),
>> With this I get, without native code, for 1,000,000 steps:
>> #{gc_major_end => 8, gc_major_start => 8, gc_max_heap_size => 0,
>>   gc_minor_end => 85, gc_minor_start => 85}
>>
>> Performance is 100 times faster; the missing factor of 4 comes from
>> HiPE itself!
>>
>> Very nice!
>>
>> Oliver
> I'm glad it worked out!
>
> However, you're still going to copy those ~2GB of live data when a full
> GC finally happens, and I think you should consider reducing that
> figure. Do you really need all that data in one process?
>
> On Thu, 2019-03-28 at 08:55 +0100, Frank Muller wrote:
>> Can someone shed some light on the difference between min_heap_size
>> and min_bin_vheap_size, and on how to tweak them per process to tune
>> the VM's performance?
>>
>>
>> Thanks
> On the process heap, off-heap binaries are essentially just a small
> chunk with a pointer and size, so if we decided to GC based on the
> process heap alone we would keep an unreachable 1GB binary alive for
> just as long as a 1KB one (all else equal), which is a bit suboptimal.
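
(As an aside, the off-heap binaries a process holds can be inspected; a small sketch, with Pid standing in for the process of interest:)

    %% Each entry is {BinaryId, ByteSize, RefCount}.
    {binary, Bins} = erlang:process_info(Pid, binary),
    TotalBytes = lists:sum([Sz || {_Id, Sz, _RefC} <- Bins]).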
>
> We therefore track the combined size of all our off-heap data and GC
> when they exceed the "virtual binary heap size," even if the process
> heap is nowhere near full. This "virtual binary heap" grows and shrinks
> much like the ordinary process heap, and the min_bin_vheap_size option
> is analogous to min_heap_size.
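
(For reference, a minimal sketch of setting both minimums at spawn time instead of via process_flag/2; the sizes are in machine words and the worker function is hypothetical:)

    Pid = erlang:spawn_opt(fun() -> worker() end,
                           [{min_heap_size,      1024 * 1024 * 10},
                            {min_bin_vheap_size, 1024 * 1024 * 100}]).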
>
> In general you shouldn't need to play around with these settings, but
> if you have a process that you know will grow really fast then there
> may be something to gain by bumping its minimum heap size. I don't
> recommend doing this without careful consideration though.
>
> http://erlang.org/doc/efficiency_guide/processes.html#initial-heap-size
>
> /John

-- 
Regards
Oliver Bollmann
