[erlang-questions] Erlang Exercise 19-3
Wed Oct 31 04:24:23 CET 2018
I'm a little confused with an Exercise, and I cannot understand the
Here is the question.
Write a program to detect plagiarisms in text. To do this, use a two-pass
algorithm. In pass 1, break the text into 40-character blocks and compute a
checksum for each 40-character block. Store the checksum and filename in an
ETS table. In pass 2, compute the checksum of each 40-character block in
the data and compare with the checksums in the ETS table.
Hint: You will need to compute a "rolling checksum" to do this. For
example, if C1 = B1 + B2 + ... + B40 and C2 = B2 + B3 + ... + B41, then C2
can be quickly computed by observing that C2 = C1 + B41 - B1.
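
To make the hint concrete, here is a minimal sketch of a rolling additive
checksum over a binary. The module name `rolling` and the function
`checksums/2` are hypothetical names chosen for illustration; they do not come
from the book. The block size is a parameter so the idea can be tried on small
inputs.

```erlang
-module(rolling).
-export([checksums/2]).

%% Hypothetical helper: returns the additive checksum of every N-byte
%% window of Bin, in order, sliding one byte at a time.
checksums(Bin, N) when is_binary(Bin), N > 0, byte_size(Bin) >= N ->
    <<First:N/binary, _/binary>> = Bin,
    %% C1 = B1 + B2 + ... + BN, computed once in full
    C1 = lists:sum(binary_to_list(First)),
    roll(Bin, N, 0, C1, [C1]);
checksums(Bin, N) when is_binary(Bin), N > 0 ->
    [].  % input is shorter than one block

%% Pos is the zero-based offset of the window whose checksum is C.
roll(Bin, N, Pos, C, Acc) when Pos + N < byte_size(Bin) ->
    Out = binary:at(Bin, Pos),       % byte leaving the window (B1 in the hint)
    In = binary:at(Bin, Pos + N),    % byte entering the window (B41 in the hint)
    Next = C + In - Out,             % C2 = C1 + B41 - B1
    roll(Bin, N, Pos + 1, Next, [Next | Acc]);
roll(_Bin, _N, _Pos, _C, Acc) ->
    lists:reverse(Acc).
```

For example, `rolling:checksums(<<1,2,3,4,5>>, 3)` sums the windows
[1,2,3], [2,3,4] and [3,4,5]; only the first sum is computed from scratch, and
each later one costs one addition and one subtraction.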
What I've done is take two files, say file1.txt and file2.txt. I want to
check whether file2 plagiarizes file1. So I take the first 40 characters of
file1 and calculate their checksum. Then I move the block one
character to the right and calculate the checksum again. In the end, I get a
list of checksums for file1, and I store them in an ETS table.
In pass two, I do the same thing with file2 and get a list of checksums [C1,
C2, ..., Cn]. For each element, I check whether it exists in the ETS table.
If it does, it means that there is a block with the same content in
file1, so it is plagiarism. Finally, I count the number of these blocks
and output it.
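
The approach described above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the module name `plag` and all
function names are hypothetical, both files are assumed to fit in memory, and
the table stores the bare checksum only, exactly as in the description.

```erlang
-module(plag).
-export([count_matches/2, count_matches_bin/2]).

-define(BLOCK, 40).  % block size taken from the exercise text

%% Hypothetical entry point: reads both files fully into memory.
count_matches(File1, File2) ->
    {ok, Bin1} = file:read_file(File1),
    {ok, Bin2} = file:read_file(File2),
    count_matches_bin(Bin1, Bin2).

count_matches_bin(Bin1, Bin2) ->
    Tab = ets:new(checksums, [set]),
    %% Pass 1: store the checksum of every 40-byte window of the first text.
    lists:foreach(fun(C) -> ets:insert(Tab, {C}) end,
                  checksums(Bin1, ?BLOCK)),
    %% Pass 2: count the windows of the second text whose checksum is stored.
    Hits = length([C || C <- checksums(Bin2, ?BLOCK), ets:member(Tab, C)]),
    ets:delete(Tab),
    Hits.

%% Rolling additive checksum of every N-byte window, sliding by one byte.
checksums(Bin, N) when byte_size(Bin) >= N ->
    <<First:N/binary, _/binary>> = Bin,
    C1 = lists:sum(binary_to_list(First)),
    roll(Bin, N, 0, C1, [C1]);
checksums(_Bin, _N) ->
    [].  % shorter than one block: no windows

roll(Bin, N, Pos, C, Acc) when Pos + N < byte_size(Bin) ->
    Next = C + binary:at(Bin, Pos + N) - binary:at(Bin, Pos),
    roll(Bin, N, Pos + 1, Next, [Next | Acc]);
roll(_Bin, _N, _Pos, _C, Acc) ->
    lists:reverse(Acc).
```

Note that with a plain additive checksum, equal checksums do not guarantee
equal blocks (two blocks containing the same characters in a different order
sum to the same value), so each hit is better treated as a candidate to be
confirmed by comparing the actual text.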
However, it seems that I haven't used the filename. So I am wondering whether
there is something wrong with my approach.