<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 2011-11-20 07:53, Bob Gustafson wrote:
<blockquote cite="mid:1321772035.9052.223.camel@hoho6.chidig.com"
type="cite">
<pre wrap="">$ cat b_read.rb
#!/usr/env ruby
fin = File.open("buffer.out","r")
count = {}
fin.each_line do |line|
  line.each_char do |c|
    if count[c] == nil
      count[c] = 1
    else
      count[c] += 1
    end
  end
end
</pre>
</blockquote>
<br>
Not tested, but here is the general idea. Define a module and export
the relevant function.<br>
<br>
-module(foo).<br>
<br>
-export([go/1]).<br>
<br>
Now, go/1 opens the file in raw mode with read_ahead, so reads are
buffered and we get a little bit of speedup. Opening the file in
binary mode is even faster, but for now this will definitely do.<br>
<br>
go(FN) -><br>
&nbsp;&nbsp;&nbsp;&nbsp;%% fin = File.open("buffer.out","r")<br>
&nbsp;&nbsp;&nbsp;&nbsp;{ok, Fd} = file:open(FN, [read, read_ahead, raw]),<br>
&nbsp;&nbsp;&nbsp;&nbsp;R = frequency_count(Fd),<br>
&nbsp;&nbsp;&nbsp;&nbsp;file:close(Fd),<br>
&nbsp;&nbsp;&nbsp;&nbsp;{ok, R}.<br>
<br>
To count frequencies, we read in the first line and carry a
dictionary along with us to update. The dictionary starts out
empty.<br>
<br>
frequency_count(IODev) -><br>
&nbsp;&nbsp;&nbsp;&nbsp;%% count = {}<br>
&nbsp;&nbsp;&nbsp;&nbsp;frequency_count(IODev, file:read_line(IODev), dict:new()).<br>
<br>
Two patterns. Either there is a line, in which case we process it,
or there are no more lines, in which case we return the dictionary.<br>
<br>
%% fin.each_line do |line|<br>
frequency_count(IODev, {ok, Line}, Dict) -><br>
&nbsp;&nbsp;&nbsp;&nbsp;frequency_count(IODev, file:read_line(IODev),<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;update_line(Line, Dict));<br>
frequency_count(_IODev, eof, Dict) -> Dict.<br>
<br>
Same game: either there is another character or there isn't. When we
are done with the line, we return the dict. When there is a
character, the nil check in the Ruby code is handled by
dict:update_counter/3, which stores the increment as the initial
value for a missing key and adds it to the existing count otherwise.
We simply process the characters one at a time.<br>
<br>
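%% line.each_char do |c|<br>
update_line([Char | Rest], Dict) -><br>
&nbsp;&nbsp;&nbsp;&nbsp;%% if count[c] == nil<br>
&nbsp;&nbsp;&nbsp;&nbsp;%% &nbsp;&nbsp;count[c] = 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;%% else<br>
&nbsp;&nbsp;&nbsp;&nbsp;%% &nbsp;&nbsp;count[c] += 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;%% end<br>
&nbsp;&nbsp;&nbsp;&nbsp;NewDict = dict:update_counter(Char, 1, Dict),<br>
&nbsp;&nbsp;&nbsp;&nbsp;update_line(Rest, NewDict);<br>
update_line([], Dict) -> Dict.<br>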
<br>
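To look at the result, something like the following might work. It is only a sketch and not part of the module above: the module name freq_sketch and the functions count_chars/1 and report/1 are made up for illustration. count_chars/1 applies the same dict:update_counter/3 step to a string held in memory, and report/1 turns the dictionary into a list of {Char, Count} pairs with the largest count first.<br>
<br>
```erlang
%% Hypothetical helper module, for illustration only.
-module(freq_sketch).

-export([count_chars/1, report/1]).

%% Fold over the characters of a string, bumping the counter for each
%% one. dict:update_counter/3 starts a missing key at the increment.
count_chars(String) ->
    lists:foldl(fun(Char, Dict) -> dict:update_counter(Char, 1, Dict) end,
                dict:new(),
                String).

%% Convert the dictionary to {Char, Count} pairs, highest count first.
%% Characters with equal counts come out in no particular order.
report(Dict) ->
    lists:reverse(lists:keysort(2, dict:to_list(Dict))).
```
<br>
Assuming both modules are compiled, {ok, D} = foo:go("buffer.out"), freq_sketch:report(D). in the shell should list the characters of the file by frequency.<br>
<br>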
<pre class="moz-signature" cols="72">--
Jesper Louis Andersen
Erlang Solutions, Copenhagen, DK</pre>
</body>
</html>