[erlang-questions] Problem with pattern matching in large binaries

Sverker Eriksson sverker@REDACTED
Tue May 6 11:09:28 CEST 2008


Edwin Fine wrote:
> I have a text file that is 1,037,563,663 bytes in length. In the
> shell, I read it all into memory as follows:
>
> > {ok,B} = file:read_file("/tmp/data").
[...]

> > <<_Skip:Offset/binary,Last100/binary>> = B.
> > byte_size(Last100).
> 100
> > byte_size(_Skip).
> 500692651
>
> WTF??? Checking Last100 showed that it was indeed the data from offset
> 500692651, not the last 100 bytes. Where did the other 500MB-odd go?
>   
[...]

Per Gustafsson wrote:
> I took a look at this, and it seems that the BYTE_OFFSET macro on line
> 153 of erl_bits.h contains a cast to unsigned that really should be a
> cast to Uint, which probably causes this problem. I do not have a good
> machine for testing this change, though. (Our amd64 does not have
> enough memory to build a 1 GB binary.)
>
> Per
>   

I will take a look at this on my Intel Quad (64-bit, 4 GB).

If Per is right, a fix will be released in R12B-3.

/Sverker, Erlang/OTP, Ericsson



