Patch for strstr_binary

Tue Dec 13 12:12:27 CET 2005

(**) on Mikael's (*): Careful now. Yes, it's true, ptr1 - ptr2 is a signed
integer (ptrdiff_t in standard C), but it might not have the value you'd
expect. There's a pretty good reason the pointers must be of the same type:
the value of ptr1 - ptr2 is supposed to be the offset (in memory) between
the two objects being pointed at. Unless they're byte pointers (char* etc.),
the value of ptr1 - ptr2 is not given in bytes, but in "element size units".
Subtracting two pointers to two adjacent elements in an array of 32-byte
structures results in a value of 1, not 32! The same goes for
place = ptr1 + diff: if ptr1 points to an 8-byte double and diff is 1, then
place is ptr1 + 8 bytes.
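
To make the "element size units" point concrete, here is a minimal sketch.
The struct record type and the function names are made up for illustration,
and the 8-byte double is simply whatever sizeof(double) is on your platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 32-byte record, used only for illustration. */
struct record {
    char payload[32];
};

/* Typed subtraction: the result counts elements, not bytes. */
ptrdiff_t element_distance(const struct record *hi, const struct record *lo)
{
    return hi - lo;
}

/* The same two pointers viewed as char pointers: the result counts bytes. */
ptrdiff_t byte_distance(const struct record *hi, const struct record *lo)
{
    return (const char *)hi - (const char *)lo;
}

/* Small self-check of the claims in the text. */
void pointer_arithmetic_demo(void)
{
    struct record recs[2];
    double values[2];
    double *place;

    /* Adjacent elements are 1 element apart, sizeof(struct record) bytes apart. */
    assert(element_distance(&recs[1], &recs[0]) == 1);
    assert(byte_distance(&recs[1], &recs[0]) == (ptrdiff_t)sizeof(struct record));

    /* ptr + 1 on a double pointer advances by sizeof(double) bytes. */
    place = values + 1;
    assert((char *)place - (char *)values == (ptrdiff_t)sizeof(double));
}
```

Note that both pointers passed to either function should point into the same
array, otherwise the subtraction is not defined by the standard.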

This, though correct behaviour for strongly typed pointer arithmetic, has
led numerous confused programmers to chase all sorts of bugs. Hence,
seeking certainty, some programmers force all pointer arithmetic into char*
arithmetic and manually address the element size issues where they arise.
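
A rough sketch of that char* style, assuming a hypothetical helper named
element_at (not from any library): the pointer is treated as a byte pointer
and the element size is multiplied in by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of element `index` in an array whose
 * elements are `elem_size` bytes each. */
void *element_at(void *base, size_t elem_size, size_t index)
{
    char *bytes;

    assert(base != NULL);
    bytes = (char *)base;              /* char* arithmetic is in bytes */
    return bytes + index * elem_size;  /* scale by element size by hand */
}
```

For an int array arr, element_at(arr, sizeof(int), 2) gives the same address
as arr + 2; the difference is only in who does the scaling.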

C is my "mother tongue" but I've been living in a completely different world
for the last 10 years, so who knows how modern compilers and standards
handle this. 

But wait, this is not a C mailing list, is it?

/Marthin

> -----Original Message-----
> From: owner-erlang-questions@REDACTED [mailto:owner-erlang-
> questions@REDACTED] On Behalf Of Mikael Pettersson
> Sent: 13 December 2005 04:54
> To: dizzyd@REDACTED; erlang-questions@REDACTED
> Subject: Re: Patch for strstr_binary
> 
> > OSX and smoke tested on Linux (32-bit). I doubt it will work on
> > win32, as it uses a memcmp() function...but it should be easy to
> 
> memcmp() is a standard ANSI/ISO-C feature. I would be very
> surprised if Win32 didn't have it.
> 
> > +    if (is_binary(big) && is_binary(little)) {
> > +
> > +        char* big_ptr;
> > +        int big_len = binary_size(big);
> > +        GET_BINARY_BYTES(big, big_ptr);
> > +
> > +        char* little_ptr;
> > +        int little_len = binary_size(little);
> > +        GET_BINARY_BYTES(little, little_ptr);
> > +
> > +        char* cur;
> > +        char* max_big = (big_ptr + big_len) - little_len ;
> > +        char  first_little = *little_ptr;
> > +        int   index = -1;
> 
> This isn't ANSI-C and won't compile with e.g. gcc-2.95.3.
> You should make a single block of the variable declarations
> at the beginning of this scope, and do the initialisations
> as assignments.
> 
> > +        for (cur = big_ptr; cur <= max_big; cur++) {
> > +            if (*cur == first_little && memcmp(cur, little_ptr, little_len) == 0) {
> > +                index = (int)cur - (int)big_ptr;
> 
> These (int) casts are broken on 64-bit machines. Replace them
> with (long) casts. Better yet, "index = cur - big_ptr;"
> should have the same effect and be 100% kosher.
> 
> > +bif erlang:strstr_binary/2
> > +
> > +
> > +
> 
> No stray empty lines please.
> 
> /Mikael
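
For reference, Mikael's two review points (declarations in a single block at
the top of the scope, and plain pointer subtraction instead of (int) casts)
could be combined roughly as below. This is a standalone sketch, not the
patch itself: find_bytes is a hypothetical name, the BIF macros (is_binary,
GET_BINARY_BYTES) are left out, and returning -1 for an empty needle and
stopping at the first match are arbitrary choices made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone helper, not the actual BIF code. Returns the
 * byte offset of the first occurrence of `little` in `big`, or -1. */
long find_bytes(const char *big, size_t big_len,
                const char *little, size_t little_len)
{
    /* C89 style: all declarations first, initialisation by assignment. */
    const char *cur;
    const char *max_big;
    long index;

    assert(big != NULL && little != NULL);

    index = -1;
    if (little_len == 0 || little_len > big_len) {
        return index;
    }

    /* Last position at which `little` could still fit inside `big`. */
    max_big = big + (big_len - little_len);
    for (cur = big; cur <= max_big; cur++) {
        if (*cur == *little && memcmp(cur, little, little_len) == 0) {
            /* char* subtraction already yields a byte offset; no
             * pointer-to-int casts are needed. */
            index = (long)(cur - big);
            break;
        }
    }
    return index;
}
```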