[erlang-questions] Packets deduplication

Felix Gallo felixgallo@REDACTED
Thu Feb 18 18:09:06 CET 2016


Alexander -- what's the application supposed to do if it receives some
packets in a non-monotonic order?  If the answer is "ignore", then you
could just check that the packet ID is greater than the last packet ID and
everything else just falls out naturally.


If you still need to process out-of-order packets, then consider each
packet's "has been seen" value as a single bit.  There is some packet count
window C beyond which duplicates are not possible.  Maintain a queue of
bytes totalling at least C bits, as well as the id number at which the byte
queue starts.  When a new packet comes in, if it would require using one or
more new bytes, push those to the front and drop the equivalent number off
the back, and update the low byte queue id.  Create a 'test byte' which has
only the new bit set in the appropriate position for that byte.  use
binary-and ("band") to check to see if that bit has been seen in the
appropriate byte in the byte queue; if the resulting byte is greater than
zero, the bit has already been seen; do whatever processing is necessary.
Otherwise, the bit has not been seen; set the bit in the byte queue,
process, and continue.

This is probably fastest in C/C++ but Erlang should be fine as long as the
sparsity factor between packet ids is not generally gigantic (i.e. it's
usually 1, rarely 100).  If it's gigantic, then I'd probably use a queue of
small pooled bloom filters in the same way so that I wouldn't have to do as
much allocation.  As always, measure and compare.

F.

On Thu, Feb 18, 2016 at 6:58 AM, Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

> Keep a map, M :: #{ integer() => bool() } and ask if there is_key/2 on the
> ID's. Once in a while, you GC the map through maps:with(lists:seq(ID1,
> ID2), M)  and make it small again.
>
> You would have to measure if this is actually faster/slower and by how
> much it is a slower solution. It isn't a priori given it would be either.
> jlouis/eministat will help determine if it is the case.
>
>
> On Thu, Feb 18, 2016 at 2:43 PM, Alexander Petrovsky <askjuise@REDACTED>
> wrote:
>
>> Yep, right now I try to do something similar with bitmap and sliding
>> window, that will shift. Could you explain about maps?
>>
>> Thanks!
>>
>> 2016-02-18 16:37 GMT+03:00 Jesper Louis Andersen <
>> jesper.louis.andersen@REDACTED>:
>>
>>> A simple solution that may work in the end is to keep track of a window
>>> of bits:
>>>
>>> * shift the window as the ID increases
>>> * check older packets against the window
>>>
>>> This gives you "SACK" support, is fast to check against and fast to
>>> expire old data due to the shifting.
>>>
>>> Simply using an ordered list or a map could also work if you clean it up
>>> eventually. It depends on how fast your CPU core is.
>>>
>>>
>>> On Thu, Feb 18, 2016 at 2:24 PM, Alexander Petrovsky <askjuise@REDACTED
>>> > wrote:
>>>
>>>> Ouch, I forgot to say, the IDs can be sparse, so they not really
>>>> monotonically, they just grow, and can be reordered.
>>>>
>>>> 2016-02-18 16:20 GMT+03:00 Danil Zagoskin <z@REDACTED>:
>>>>
>>>>> Hi!
>>>>>
>>>>> If ID grows monotinically and you have no plans of recovering after
>>>>> packet reordering, then you can just keep the previous ID.
>>>>>   - If CurID > PrevID, CurID is unique;
>>>>>   - If CurID == PrevID, it is not unique;
>>>>>   - If CurID < PrevID, it is a bug or reordering, let it crash.
>>>>>
>>>>>
>>>>> On Thu, Feb 18, 2016 at 3:01 PM, Alexander Petrovsky <
>>>>> askjuise@REDACTED> wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I have the stream of packets with ID (int), and I need to check is
>>>>>> the packet is uniq (by ID) or not?
>>>>>>
>>>>>> Incoming rate is about 20k pps and ID is monotonically grows. What's
>>>>>> the best way and data structure fit for this problem?
>>>>>>
>>>>>> --
>>>>>> Петровский Александр / Alexander Petrovsky,
>>>>>>
>>>>>> Skype: askjuise
>>>>>> Phone: +7 914 8 820 815
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> erlang-questions mailing list
>>>>>> erlang-questions@REDACTED
>>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Danil Zagoskin | z@REDACTED
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Петровский Александр / Alexander Petrovsky,
>>>>
>>>> Skype: askjuise
>>>> Phone: +7 914 8 820 815
>>>>
>>>>
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>
>>>>
>>>
>>>
>>> --
>>> J.
>>>
>>
>>
>>
>> --
>> Петровский Александр / Alexander Petrovsky,
>>
>> Skype: askjuise
>> Phone: +7 914 8 820 815
>>
>>
>
>
> --
> J.
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160218/177eda78/attachment.htm>


More information about the erlang-questions mailing list