[erlang-questions] Erlang Exercise 19-3

カカキ heturing@REDACTED
Wed Oct 31 04:24:23 CET 2018


Hello, everyone.

I'm a little confused by an exercise, and I don't fully understand what the
question is asking.

Here is the question.

Write a program to detect plagiarisms in text. To do this, use a two-pass
algorithm. In pass 1, break the text into 40-character blocks and compute a
checksum for each 40-character block. Store the checksum and filename in an
ETS table. In pass 2, compute the checksum of each 40-character block in
the data and compare with the checksums in the ETS table.

Hint: You will need to compute a "rolling checksum" to do this. For
example, if C1 = B1 + B2 + ... + B40 and C2 = B2 + B3 + ... + B41, then C2
can be quickly computed by observing that C2 = C1 + B41 - B1.
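
A minimal sketch of the hint, assuming the checksum is the plain byte sum it describes and that the text is already in a binary. The module name `rolling` and the function `checksums/2` are made up for illustration and are not from the book.

```erlang
-module(rolling).
-export([checksums/2]).

%% Return the additive checksum of every N-byte window of Bin,
%% left to right. Returns [] when Bin is shorter than N.
checksums(Bin, N) when is_binary(Bin), N > 0, byte_size(Bin) >= N ->
    <<First:N/binary, _/binary>> = Bin,
    C1 = lists:sum(binary_to_list(First)),   % only the first window is summed in full
    roll(Bin, N, 0, C1, [C1]);
checksums(Bin, N) when is_binary(Bin), N > 0 ->
    [].

%% Start is the 0-based offset of the window whose checksum is C.
roll(Bin, N, Start, C, Acc) when Start + N < byte_size(Bin) ->
    Out = binary:at(Bin, Start),        % byte leaving the window (B1 in the hint)
    In = binary:at(Bin, Start + N),     % byte entering the window (B41 in the hint)
    Next = C + In - Out,                % C2 = C1 + B41 - B1
    roll(Bin, N, Start + 1, Next, [Next | Acc]);
roll(_Bin, _N, _Start, _C, Acc) ->
    lists:reverse(Acc).
```

For example, `rolling:checksums(<<1,2,3,4,5>>, 3)` gives the sums of the windows 1,2,3 and 2,3,4 and 3,4,5. With N = 40 each step costs one addition and one subtraction instead of 40 additions.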

What I've done is take two files, say file1.txt and file2.txt. I want to
check whether file2 plagiarizes file1. So I take the first 40 characters of
file1 and calculate their checksum. Then I move the block one character to
the right and calculate the checksum again. In the end, I get a list of
checksums for file1, which I store in an ETS table.
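
A rough sketch of pass 1, assuming the table holds {Checksum, Filename} tuples as the exercise text says. One possible reading is that the filename matters once more than one source file is indexed, since a lookup can then say which file a block came from. The module `plag_index`, the functions `new/0` and `index/4`, and the choice of a `bag` table are assumptions made for illustration.

```erlang
-module(plag_index).
-export([new/0, index/4]).

%% A bag table lets one checksum map to several filenames.
new() ->
    ets:new(checksums, [bag]).

%% Pass 1: insert {Checksum, Filename} for every N-byte window of Bin.
index(Tab, Filename, Bin, N) when byte_size(Bin) >= N ->
    <<First:N/binary, _/binary>> = Bin,
    C = lists:sum(binary_to_list(First)),
    slide(Tab, Filename, Bin, N, 0, C);
index(_Tab, _Filename, _Bin, _N) ->
    ok.   % text shorter than one window: nothing to store

%% Start is the 0-based offset of the window whose checksum is C.
slide(Tab, Filename, Bin, N, Start, C) ->
    ets:insert(Tab, {C, Filename}),
    case Start + N < byte_size(Bin) of
        true ->
            %% rolling update: add the entering byte, drop the leaving one
            Next = C + binary:at(Bin, Start + N) - binary:at(Bin, Start),
            slide(Tab, Filename, Bin, N, Start + 1, Next);
        false ->
            ok
    end.
```

Calling `index/4` once per source file on the same table would fill it with checksums from all of them. In real use N would be 40 and Bin would come from something like `file:read_file/1`.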

In pass two, I do the same thing with file2 and get a list of checksums [C1,
C2, ..., Cn]. For each element, I check whether it exists in the ETS table.
If it does, I take that to mean that there is a block with the same content
in file1, so it is plagiarism. Finally, I count the number of these blocks
and output it.
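
Pass 2 as described above could be sketched like this, assuming an ETS table of type `bag` that already holds {Checksum, Filename} tuples from pass 1. The module `plag_check` and the function `matches/3` are hypothetical names. Note that a plain byte sum collides easily (the blocks 1,2,3 and 3,2,1 have the same sum), so each hit is only a candidate until the actual bytes are compared.

```erlang
-module(plag_check).
-export([matches/3]).

%% Pass 2: slide an N-byte window over Bin and return {Offset, Filename}
%% for every window whose checksum is present in Tab.
matches(Tab, Bin, N) when byte_size(Bin) >= N ->
    <<First:N/binary, _/binary>> = Bin,
    C = lists:sum(binary_to_list(First)),
    scan(Tab, Bin, N, 0, C, []);
matches(_Tab, _Bin, _N) ->
    [].

%% Start is the 0-based offset of the window whose checksum is C.
scan(Tab, Bin, N, Start, C, Acc0) ->
    Hits = [{Start, File} || {_, File} <- ets:lookup(Tab, C)],
    Acc = lists:reverse(Hits, Acc0),     % keep Acc in reverse order
    case Start + N < byte_size(Bin) of
        true ->
            %% rolling update: add the entering byte, drop the leaving one
            Next = C + binary:at(Bin, Start + N) - binary:at(Bin, Start),
            scan(Tab, Bin, N, Start + 1, Next, Acc);
        false ->
            lists:reverse(Acc)
    end.
```

The length of the returned list is the count of suspicious blocks, and the filename in each hit says which indexed file the block may have come from.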

However, it seems that I haven't used the filename, so I am wondering whether
there is something wrong with my understanding.

Thanks.