<div dir="ltr">Hi,<div><br></div><div>What form is your bulk data in? Is it a CSV file containing millions of lines (rows)?<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 4, 2019 at 2:59 AM Roberto Ostinelli <<a href="mailto:ostinelli@gmail.com">ostinelli@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks for the tips, Max and Jesper.<br><div>In those solutions, though, how do you guarantee the order of the calls? My main concern is to prevent the slow process from overwriting more recent data chunks that were processed faster. Do you pile them up in a queue in the order received and process them after that?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Dec 2, 2019 at 3:57 PM Jesper Louis Andersen <<a href="mailto:jesper.louis.andersen@gmail.com" target="_blank">jesper.louis.andersen@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Another path is to make the bulk write cooperative within the process. Write in small chunks and go back into the gen_server loop in between those chunks being written. You now have progress, but no separate process.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Another useful variant is to have two processes, but with the split skewed. You prepare iodata() in the main process, and then send that to the other process as a message. This message will be fairly small since large binaries will be transferred by reference. The queue in the other process acts as a linearizing write buffer so ordering doesn't get messed up. 
You have now moved the bulk write call out of the main process, so it is free to do other processing in between. You might even want a protocol between the two processes to exert some kind of flow control on the system. However, you don't have an even balance between the processes. One is the intelligent orchestrator. The other is the worker, taking the block on the bulk operation.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Another thing is to improve the observability of the system. Start measuring the lag time of the gen_server and plot it in a histogram. Measure the amount of data written in the bulk message. This gives you some real data to work with. The thing is: if you experience blocking in some part of your system, it is likely there is some kind of traffic/request pattern which triggers it. Understand that pattern. It often points to some important user behavior you didn't think about. Anticipating future uses of the system allows you to be proactive about latency problems.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">It is sometimes better to gate the problem by limiting what a user/caller/request is allowed to do. As an example, you can reject large requests to the system and demand they happen cooperatively between a client and a server. This slows down the client because they have to wait for a server response before they can issue the next request. 
If the internet is in between, you have just injected an artificial RTT plus server processing time between calls, implicitly slowing the client down.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 29, 2019 at 11:47 PM Roberto Ostinelli <<a href="mailto:ostinelli@gmail.com" target="_blank">ostinelli@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="padding:20px 0px 0px;font-size:0.875rem;font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif"><span style="font-family:Arial,Helvetica,sans-serif;font-size:small">All,</span><br></div><div style="font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif;font-size:medium"><div id="gmail-m_7104257868664812288gmail-m_-7896808937731452598gmail-m_-3044689990346731253gmail-:1cr" style="font-size:0.875rem;direction:ltr;margin:8px 0px 0px;padding:0px"><div id="gmail-m_7104257868664812288gmail-m_-7896808937731452598gmail-m_-3044689990346731253gmail-:1cs" style="overflow:hidden;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:small;line-height:1.5;font-family:Arial,Helvetica,sans-serif"><div dir="ltr"><div>I have a gen_server that at periodic intervals becomes busy, sometimes for over 10 seconds, while writing bulk incoming data. 
This gen_server also receives smaller individual data updates.</div><div><br></div><div>I could offload the bulk writing routine to separate processes, but the smaller individual data updates would then be processed before the bulk processing is over, producing an incorrect result where smaller, more recent data gets overwritten by the bulk processing.</div><div><br></div><div>I'm trying to work out how to avoid having all the gen_server calls made during the bulk update time out.</div><div><br></div><div>Any ideas or best practices?</div><div><br></div><div>Thank you,</div><div>r.</div></div></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr">J.</div>
</blockquote></div>
</blockquote></div>
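<div dir="ltr"><br></div>
<div dir="ltr">For reference, here is a minimal sketch of the two-process variant discussed above, where the writer's mailbox acts as the ordering buffer. The module name <code>ordered_writer</code>, its API, and <code>WriteFun</code> are hypothetical names made up for illustration and do not come from the messages in this thread. The sketch also assumes the orchestrator and the writer run on the same node, where messages from one sender to one receiver arrive in send order.</div>
<pre>
```erlang
-module(ordered_writer).
-behaviour(gen_server).

%% Hypothetical API, for illustration only.
-export([start_link/1, write/2, sync/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

%% WriteFun is an assumed fun((iodata()) -> ok) that performs the
%% blocking write (file, socket, database, ...).
start_link(WriteFun) when is_function(WriteFun, 1) ->
    gen_server:start_link(?MODULE, WriteFun, []).

%% Asynchronous: the orchestrator returns immediately and stays
%% responsive while the writer blocks on the actual write.
write(Writer, IoData) ->
    gen_server:cast(Writer, {write, IoData}).

%% Returns once every write sent earlier by this caller has been
%% applied. This can serve as a crude form of flow control.
sync(Writer) ->
    gen_server:call(Writer, sync, infinity).

init(WriteFun) ->
    {ok, WriteFun}.

%% Writes are applied one at a time, in mailbox order, so a small
%% update sent after a bulk write can never be applied before it.
%% The writer crashes if WriteFun returns anything other than ok.
handle_cast({write, IoData}, WriteFun) ->
    ok = WriteFun(IoData),
    {noreply, WriteFun}.

handle_call(sync, _From, WriteFun) ->
    {reply, ok, WriteFun}.
```
</pre>
<div dir="ltr">A possible usage from the main gen_server: start the writer with <code>{ok, W} = ordered_writer:start_link(fun(Data) -> io:put_chars(Data) end)</code>, then call <code>ordered_writer:write(W, BulkIoData)</code> followed by <code>ordered_writer:write(W, SmallUpdate)</code>. Both calls return at once, and the small update is applied only after the bulk write has finished. Calling <code>ordered_writer:sync(W)</code> now and then keeps the writer's queue from growing without bound.</div>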