Binary matching

Per Bergqvist per@REDACTED
Mon Sep 30 10:01:25 CEST 2002


Hi Jay,                                                               
                                                                      
you should specify the size of the binary segment in units, not in bits.
For a binary the unit size is 8 bits, so the size is a byte count.
I.e. use:
<<W1:4/binary, Rest/binary>> = Bin.                                   
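Here is a minimal sketch of the difference. It uses the sample binary from the question below, and the module name bin_match_demo is hypothetical. The size of a /binary segment counts 8-bit units, while the size of a plain integer segment counts bits. So 4/binary and a bare 32 both consume the same four bytes, but they return different types.

```erlang
-module(bin_match_demo).
-export([demo/0]).

demo() ->
    Bin = <<"just a test">>,
    %% 4/binary: size 4 in units of 8 bits, i.e. the first 4 bytes as a binary.
    <<W1:4/binary, Rest/binary>> = Bin,
    %% 32 (integer segment, unit 1): the same 4 bytes read as a
    %% big-endian unsigned integer.
    <<N:32, _/binary>> = Bin,
    {W1, N, Rest}.
```

Calling bin_match_demo:demo() should give {<<"just">>, 1786082164, <<" a test">>}. This matches the W1 value seen in the shell session below.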
                                                                      
/Per                                                                  
                                                                      
> I am trying to use port communications and do as much               
> of the parsing / matching as possible on the binary data            
> to keep the speed up.  The problem is that I either don't           
> understand it, have some missing syntax, or it just doesn't         
> work as documented.                                                 
>                                                                     
> My initial testing was to use a web browser to contact the          
> active port, receive the GET request and split it into separate     
> lines (breaking on cr / lf) and printing it out to see the result.  
> This would be an easy way to see how to receive and parse           
> binary data.                                                        
>                                                                     
> Of course, the first thing I noticed was that I can only go         
> from a binary to a list of ints or a list of ints to a binary.      
> There is no documented way to print a binary as a string
> which makes things a bit difficult to review.                       
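You can view a binary as text while experimenting. One option is to convert the bytes with binary_to_list/1 and print the result with the ~s directive. The sketch below assumes the hypothetical names bin_print and print_bin/1. Current Erlang/OTP releases also accept a binary directly as the argument to ~s.

```erlang
-module(bin_print).
-export([print_bin/1]).

%% binary_to_list/1 turns the bytes into a list of integers,
%% which the ~s directive prints as characters.
print_bin(Bin) when is_binary(Bin) ->
    io:format("~s~n", [binary_to_list(Bin)]).
```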
>                                                                     
> Ignoring that, I forged ahead with a little practice matching       
> in the interpreter:                                                 
>                                                                     
> Eshell V5.1.2  (abort with ^G)                                      
> 1> Bin = <<"just a test">>.                                         
> <<106,117,115,116,32,97,32,116,101,115,116>>                        
> 2> <<W1:32/binary, Rest/binary>> = Bin.                             
> ** exited: {{badmatch,<<106,117,115,116,32,97,32,116,101,115,116>>},
>              [{erl_eval,expr,3}]} **                                
> 3> <<W1:32, Rest/binary>> = Bin.                                    
> <<106,117,115,116,32,97,32,116,101,115,116>>                        
> 4> b().                                                             
> Bin = <<106,117,115,116,32,97,32,116,101,115,116>>                  
> Rest = <<32,97,32,116,101,115,116>>                                 
> W1 = 1786082164                                                     
> ok                                                                  
>                                                                     
> OK, I can stick a string in a binary but I can't split it in two    
> by specifying a binary length and then getting the rest, but        
> I can pull off a 32-bit int and get the rest as a binary.
>                                                                     
> 5> <<Beginning/binary, 32, End/binary>> = Bin.                      
> ** exited: {{badmatch,<<106,117,115,116,32,97,32,116,101,115,116>>},
>              [{erl_eval,expr,3}]} **                                
> 6> <<106, 117,115, 116, 32, End/binary>> = Bin.                     
> <<106,117,115,116,32,97,32,116,101,115,116>>                        
> 7> b().                                                             
> Bin = <<106,117,115,116,32,97,32,116,101,115,116>>                  
> End = <<97,32,116,101,115,116>>                                     
> Rest = <<32,97,32,116,101,115,116>>                                 
> W1 = 1786082164                                                     
> 10> os:version().                                                   
> {4,10,67766222}                                                     
> 11> os:type().                                                      
> {win32,windows}                                                     
>                                                                     
> and so on...   Various tests basically conclude that I can only     
> match with the last term as a binary, and that the beginning must   
> match exactly byte for byte or else I get badmatch.  I cannot even  
> specify an exact binary length for the leading segments, although   
> I can specify a length and strip it off as a big integer            
> (e.g., <<Bignum:80, Rest/binary>> = Data).                          
>                                                                     
> Do I have the wrong version or is this the intended functionality?  
> (R8B2 on both Windows 98 and Red Hat 7.3 compiled from scratch      
> both running Erlang 5.1.2).  I read the documentation a little more 
> closely as I was writing this and it didn't say that the sizes had to
> be bound, but if they are specified they must be bound.  It implied
> that <<B1:32/binary, B2:32/binary>> = EightBytes was allowed.       
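Only a /binary segment without a size has to be the last segment. A sized binary segment can appear anywhere, provided its size is an integer or a variable that is already bound. Because the size counts bytes, <<B1:32/binary, B2:32/binary>> would need 64 bytes, not 8. The following is a sketch with hypothetical names (bin_sizes, split_eight/1, head_tail/2).

```erlang
-module(bin_sizes).
-export([split_eight/1, head_tail/2]).

%% Each 4/binary segment is 4 units of 8 bits, so two of them cover 8 bytes.
split_eight(<<B1:4/binary, B2:4/binary>>) ->
    {B1, B2}.

%% A size may be a variable, as long as it is bound before the match.
head_tail(N, Bin) ->
    <<Head:N/binary, Tail/binary>> = Bin,
    {Head, Tail}.
```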
>                                                                     
> I want to write:                                                    
>                                                                     
> breakLines(Binary) -> lines(Binary, []).                            
>                                                                     
> lines(<<>>, Acc) -> lists:reverse(Acc).                             
> lines(<<Line/binary, 13, 10, Rest/binary>>, Acc) ->                 
> 	lines(Rest, [Line | Acc]).                                         
>                                                                     
> Seems straightforward and not too difficult for the pattern-matcher,
> but then I'm no compiler writer.                                    
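The compiler rejects the lines/2 clause above because Line/binary has no size and is not the last segment. The matcher does not search for where a segment ends. One workaround keeps a byte counter, so the size is bound when the match runs, and tries successively longer prefixes. This is a minimal sketch, assuming the hypothetical module crlf_lines. It also assumes that text after the last CRLF is returned as a final line. Newer Erlang/OTP releases also provide binary:split(Bin, <<"\r\n">>, [global]) for this.

```erlang
-module(crlf_lines).
-export([break_lines/1]).

%% Split Bin on CRLF, returning the lines without their CRLF terminators.
break_lines(Bin) ->
    break_lines(Bin, 0, []).

%% Pos is the number of bytes of the current line examined so far.
%% Each case match is O(1): sub-binaries share the original data.
break_lines(Bin, Pos, Acc) ->
    case Bin of
        <<Line:Pos/binary, 13, 10, Rest/binary>> ->
            break_lines(Rest, 0, [Line | Acc]);
        <<>> ->
            lists:reverse(Acc);
        <<_:Pos/binary>> ->
            %% No CRLF left: the remainder is the last line.
            lists:reverse([Bin | Acc]);
        _ ->
            break_lines(Bin, Pos + 1, Acc)
    end.
```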
>                                                                     
> Instead I am left with converting to a list of ints and writing a few
> helper functions to loop over the list pulling out chars using two
> accumulators (one for the current line and one for the list of lines).
>                                                                     
> Am I too worried about efficiency?  Should I forget about binaries? 
> Am I doing something wrong?                                         
>                                                                     
> Is it more efficient to make a pass                                 
> across the binary looking for the location of all <<13, 10>> pairs, returning
> a list of the number of bytes between them and then doing binary
> matching now that I know how many bytes to specify on the initial   
> patterns?                                                           
>                                                                     
> <little coding break for a couple hours>                            
>                                                                     
> I tried this and ended up with the same problem:                    
> -------------------------- cut here --------------------------      
>                                                                     
> -module(bin_utils).                                                 
> -export([breakLines/1,extract/3,scan/6]).                           
>                                                                     
> breakLines(Binary) ->                                               
>      StartStop = scan(Binary, <<13,10>>, 16, 0, 0, []),             
>      extract(Binary, StartStop, []).                                
>                                                                     
> extract(<<>>, _Locs, Acc) -> lists:reverse(Acc);                    
> extract(Data, [Start, Stop | Rest], Acc) ->                         
>      Len = Stop - Start,                                            
>      %%%%%%%%%  Here is the problem %%%%%%%%%%%%                    
>      %% It does little good to get Line as an int %%                
>      <<Front:Start, Line:Len/binary, Back/binary>> = Data,          
>      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%                    
>      extract(Back, Rest, [Line | Acc]).                             
>                                                                     
> scan(<<>>, _Pattern, _Len, 0, 0, Acc) -> lists:reverse(Acc);        
> scan(<<>>, _Pattern, _Len, Start, End, Acc) -> lists:reverse([End | [Start | Acc]]);
> scan(<<Pattern/binary, Rest/binary>>, Pattern, Len, Start, End, Acc) ->
>      scan(Rest, Pattern, Len, End + Len, End + Len, [End | [Start | Acc]]);
> scan(Data, Pattern, Len, Start, End, Acc) ->                        
>      <<Nomatch, Rest/binary>> = Data,                               
>      scan(Rest, Pattern, Len, Start, End + 8, Acc).                 
>                                                                     
> ---------------------- end cut ------------------------------       
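The code above mixes units. scan/6 counts positions in bits (8 per byte, Len = 16), while Line:Len/binary counts bytes. extract/3 also passes Back on, but the offsets still refer to the original binary. Separately, <<Pattern/binary, Rest/binary>> puts an unsized binary segment before the last position, which is not a valid pattern in current Erlang. Below is a sketch of the same two-pass idea that counts in bytes throughout and always cuts the original binary. The module bin_utils2 and its functions are hypothetical names.

```erlang
-module(bin_utils2).
-export([break_lines/1, crlf_offsets/1]).

%% Pass 1: byte offsets at which each CRLF starts.
crlf_offsets(Bin) -> crlf_offsets(Bin, 0, []).

crlf_offsets(<<13, 10, Rest/binary>>, Pos, Acc) ->
    crlf_offsets(Rest, Pos + 2, [Pos | Acc]);
crlf_offsets(<<_, Rest/binary>>, Pos, Acc) ->
    crlf_offsets(Rest, Pos + 1, Acc);
crlf_offsets(<<>>, _Pos, Acc) ->
    lists:reverse(Acc).

%% Pass 2: cut the original binary at those offsets. Start and Len are
%% bound byte counts, so the leading /binary segments are legal.
break_lines(Bin) ->
    extract(Bin, 0, crlf_offsets(Bin), []).

extract(Bin, Start, [Stop | Stops], Acc) ->
    Len = Stop - Start,
    <<_:Start/binary, Line:Len/binary, _/binary>> = Bin,
    extract(Bin, Stop + 2, Stops, [Line | Acc]);
extract(Bin, Start, [], Acc) ->
    case Bin of
        <<_:Start/binary>> -> lists:reverse(Acc);               % ended with CRLF
        <<_:Start/binary, Last/binary>> -> lists:reverse([Last | Acc])
    end.
```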
>                                                                     
>                                                                     
> Does this avoid copying the binary (except for inside the function  
> extract)?  Is looping over the binary more efficient than looping over
> a list of integers?                                                 
>                                                                     
> NOTE: There is a bug in my scan function because the following      
> doesn't work:                                                       
>                                                                     
> 136> CRLF = <<13, 10>>.                                             
> <<13, 10>>                                                          
>                                                                     
> 137> Test = <<67, 13, 10>>.                                         
> <<67, 13, 10>>                                                      
>                                                                     
> %%%%%%%%%%% This looks good!                                        
> 138> bin_utils:scan(Test, CRLF, 16, 0, 0, []).                      
> [0, 8, 24, 24]                                                      
>                                                                     
> 139> L1 = <<"how are you">>.                                        
> <<104,111,119,32,97,114,101,32,121,111,117>>                        
>                                                                     
> 140> L2 = <<"just fine.">>.                                         
> <<106,117,115,116,32,102,105,110,101,46>>                           
>                                                                     
> 141> L3 = <<"how about you?">>.                                     
> <<104,111,119,32,97,98,111,117,116,32,121,111,117,63>>              
>                                                                     
> 142> L4 = <<L1/binary, CRLF/binary, L2/binary, CRLF/binary, L3/binary>>.
> <<104,111,119,32,97,114,101,32,121,111,117,13,10,106,117,115,116,32,102,105,
> 110,101,46,13,10,104,111,119,32,...>>
>                                                                     
> %%%%%%%%%%  Ooops.                                                  
> 143> bin_utils:scan(L4, CRLF, 16, 0, 0, []).                        
> [0,312]                                                             
>                                                                     
>                                                                     
>                                                                     
>                                                                     
> I can't even figure out how to determine the length of              
> the binary without converting it to a list.                         
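The size/1 BIF returns the number of bytes in a binary without building a list. Newer Erlang/OTP releases also provide byte_size/1 and bit_size/1. This is a small sketch with the hypothetical module name bin_len.

```erlang
-module(bin_len).
-export([lengths/1]).

%% None of these BIFs converts the binary to a list.
lengths(Bin) when is_binary(Bin) ->
    {size(Bin), byte_size(Bin), bit_size(Bin)}.
```

For <<"just a test">> this should return {11, 11, 88}.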
>                                                                     
> jay                                                                 
>                                                                     
>                                                                     
> ---------------------------------------------------                 
> DuoMark International, Inc.                                         
> 6523 Colgate Avenue, Suite 325                                      
> Los Angeles, CA  90048-4410 / USA                                   
> Voice: +1 323 381-0001                                              
> FAX: +1 323 549 0172                                                
> Email: jay@REDACTED                                              
> WWW: http://www.duomark.com/                                        
>                                                                     
=========================================================             
Per Bergqvist                                                         
Synapse Systems AB                                                    
Phone: +46 709 686 685                                                
Email: per@REDACTED                                                   



More information about the erlang-questions mailing list