Erlang binary longest common substring

Mon Mar 30 15:08:05 CEST 2020

Hi,

I think the implementation on rosetta code is slightly broken from a
performance point of view. Usually, if you are memoising in this
problem you would use the offsets for where you are in the string. So
if you were writing in an imperative language you would just have a N
x M array. Obviously, you can't do this in Erlang so they are storing
stuff in a dictionary. However, for the keys it looks like they are
using the string/list for the key instead of the offset. This means
the dict code will have to iterate over the remaining bit of the
string for every lookup. So instead of being O(NxM) you end up with an
algorithm that is O(NxMx(N + M)). So instead of being quadratic it
gives you cubic performance. This might explain why it is so slow.
Note: I haven't included the log() term that is introduced by the
dictionary lookup that can't be avoided in Erlang.

On Mon, Mar 30, 2020 at 11:18 AM I Gusti Ngurah Oka Prinarjaya
<okaprinarjaya@REDACTED> wrote:
>
> Hi,
>
> I need longest common substring erlang implementation. Please help me
>
> I've tried https://rosettacode.org/wiki/Longest_common_subsequence#Erlang but very very slow T_T
>
> and i've tried binary:longest_common_prefix([<<"i love you 2020-03-28">>,<<"i love you 2020-03-01">>]) not effective / not valid
>
> Thank you
>