<div dir="ltr">I made a simple spam filter in Erlang. It takes two files of previously seen spam and good emails, finds the most frequent words in each, counts how often those words occur in the file you want to test, and then calculates the ratio spam/(spam+good), which gives a number between 0 and 1.<br>
It could easily be improved in numerous ways, but the main point for me was to learn Erlang. This isn't exactly what Erlang is for, but it's a way to get started.<br>I'd be happy to receive comments on the Erlang-ness of the code and suggestions for improvements.<br>
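To make the scoring step concrete, here is a minimal sketch of the ratio on its own, separate from the counting. The module name `score_sketch` and the function `score/2` are made up for illustration; the 0.5 fallback for the case where none of the listed words occur mirrors what `classify/0` below does.<br>

```erlang
-module(score_sketch).
-export([score/2]).

%% Hypothetical helper: fraction of matched words that came from the spam list.
%% Returns 0.5 (undecided) when neither list matched anything.
score(BadCount, GoodCount) when BadCount + GoodCount > 0 ->
    BadCount / (BadCount + GoodCount);
score(_BadCount, _GoodCount) ->
    0.5.
```

For example, `score_sketch:score(3, 1)` evaluates to 0.75, and `score_sketch:score(0, 0)` falls back to 0.5.<br>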
File I/O seems slow; is there a better way? In Haskell the same thing is almost instant.<br><br><br>
-module(antispam).<br>
-export([take/2, count/2, count_all/1, most_common/2, count_set_in_list/2,<br>
         classify/0, readfile/1]).<br>
<br>
%% First N elements of List, or all of List if it is shorter than N.<br>
take(N, List) when is_integer(N), N >= 0 -> i_take(N, List, []).<br>
<br>
%% Build the result in reverse and flip it once at the end.<br>
i_take(0, _List, Acc) -> lists:reverse(Acc);<br>
i_take(_N, [], Acc) -> lists:reverse(Acc);<br>
i_take(N, [H|T], Acc) -> i_take(N - 1, T, [H|Acc]).<br>
<br>
%% Number of times Tok occurs in List.<br>
count(Tok, List) -> i_count(Tok, List, 0).<br>
<br>
i_count(_Tok, [], Acc) -> Acc;<br>
i_count(Tok, [Tok|T], Acc) -> i_count(Tok, T, Acc + 1);<br>
i_count(Tok, [_|T], Acc) -> i_count(Tok, T, Acc).<br>
<br>
%% [{Word, Count}] for every distinct element of List.<br>
count_all(List) -><br>
    Unique = lists:usort(List),<br>
    [{U, count(U, List)} || U <- Unique].<br>
<br>
%% Total number of occurrences in List of all the words in Set.<br>
count_set_in_list(Set, List) -><br>
    lists:sum([count(S, List) || S <- Set]).<br>
<br>
%% The Xmost most frequent words that are longer than 4 characters.<br>
most_common(Stringlist, Xmost) -><br>
    No_preps = lists:filter(fun(X) -> length(X) > 4 end, Stringlist),<br>
    Sorted_by_count = lists:keysort(2, count_all(No_preps)),<br>
    TakeX = take(Xmost, lists:reverse(Sorted_by_count)),<br>
    lists:map(fun({H, _}) -> H end, TakeX).<br>
<br>
%% Read a whole file and split it into words on whitespace.<br>
readfile(FileName) -><br>
    {ok, Binary} = file:read_file(FileName),<br>
    string:tokens(binary_to_list(Binary), " \t\r\n").<br>
<br>
%% Score test.txt: 0 means good, 1 means spam, 0.5 means no evidence.<br>
classify() -><br>
    GoodWords = most_common(readfile("C:/Users/saftarn/Desktop/emails/okemails.txt"), 20),<br>
    BadWords = most_common(readfile("C:/Users/saftarn/Desktop/emails/spam.txt"), 20),<br>
    Test = readfile("C:/Users/saftarn/Desktop/emails/test.txt"),<br>
    GoodCount = count_set_in_list(GoodWords, Test),<br>
    BadCount = count_set_in_list(BadWords, Test),<br>
    T = BadCount + GoodCount,<br>
    if T /= 0 -><br>
        BadCount / T;<br>
    true -><br>
        0.5<br>
    end.<br><br></div>
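On the file I/O question: `file:read_file/1` already reads the whole file in one call, so the time may well be going elsewhere. One candidate is `count_all/1`, which rescans the entire word list once per distinct word. The following is a minimal sketch of counting in a single pass with a map. The module name `count_sketch` and the function `count_all_fast/1` are hypothetical, and this is a guess about where the time goes, not a measurement.<br>

```erlang
-module(count_sketch).
-export([count_all_fast/1]).

%% Hypothetical replacement for count_all/1: one pass over the list,
%% keeping a running count per word in a map.
%% Returns [{Word, Count}] sorted by Word, like the usort-based version.
count_all_fast(List) ->
    Counts = lists:foldl(
               fun(Word, Acc) ->
                       %% Increment an existing count, or start it at 1.
                       maps:update_with(Word, fun(C) -> C + 1 end, 1, Acc)
               end,
               #{}, List),
    lists:keysort(1, maps:to_list(Counts)).
```

For example, `count_sketch:count_all_fast(["a", "b", "a"])` returns `[{"a",2},{"b",1}]`. This assumes an Erlang/OTP release that provides `maps:update_with/4`.<br>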