[erlang-questions] code_change in the large

Mon Jun 20 09:46:42 CEST 2011

When you have tens of thousands of processes doing time-critical work, you have to think hard about how to manage code change. There are several alternatives, and which one suits you depends on how you trade upgrade speed against complexity.

1. Simply stop new jobs from being spawned before the upgrade, wait until all running jobs have finished, upgrade, then allow new jobs again.

2. Bootstrap state changes on demand, e.g. by checking the data version at some strategic point and calling conversion routines when needed. This obviously requires some foresight.

3. A particularly dirty trick, if the upgrade involves new message formats, is to use hibernation; when waking from hibernation, a version tag could be checked and the appropriate code change functions called. This has the advantage that messages can be converted before they are "received" by the normal application code. This type of code change is implemented in plain_fsm.

4. Redundancy upgrade. This is tricky to do if you're using mnesia, but there are some ways around that too. Redundancy upgrade works best if you have version control in the communication between nodes. An old library of mine, http://svn.ulf.wiger.net/vcc, has a plugin system for message transform functions, chaining them together to form a transform path between two versions.
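
To make the version-check idea in (2) and the chained transforms in (4) concrete, here is a minimal sketch. It is not the code from vcc or plain_fsm. The module name vsn_upgrade, the {Vsn, Data} tagging, the version numbers and the data formats are all assumptions made up for illustration.

```erlang
-module(vsn_upgrade).
-export([ensure_current/1, upgrade/3, transforms/0]).

%% Hypothetical "current" data version for this sketch.
-define(CURRENT_VSN, 3).

%% State is tagged as {Vsn, Data}. Each entry lifts Data exactly one
%% version step (key N converts version N to version N+1).
%% The formats here are invented for illustration only.
transforms() ->
    #{1 => fun({Name, Count}) -> #{name => Name, count => Count} end,
      2 => fun(Map) -> Map#{updated => undefined} end}.

%% Call this at some strategic point, e.g. before handling a message.
%% Up-to-date state passes through untouched; old state is converted.
ensure_current({?CURRENT_VSN, _} = State) ->
    State;
ensure_current({Vsn, Data}) when is_integer(Vsn), Vsn < ?CURRENT_VSN ->
    {?CURRENT_VSN, upgrade(Vsn, ?CURRENT_VSN, Data)}.

%% Chain the single-step transforms into a path From -> To.
upgrade(To, To, Data) ->
    Data;
upgrade(From, To, Data) when From < To ->
    Step = maps:get(From, transforms()),
    upgrade(From + 1, To, Step(Data)).
```

With this shape, each process pays the conversion cost only when it next touches its state, rather than all processes being converted at once. The same chaining could in principle be applied to messages exchanged between nodes running different versions.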

A particular problem with soft upgrade in this regard is that you tend to be forced to take shortcuts, relaxing the tests ensuring that the upgrade in fact works. This is one reason why I tend to favour redundancy upgrade, at least for very complex systems. It gives you time to carry out thorough consistency tests.

BR,
Ulf W

On 20 Jun 2011, at 00:21, Mojito Sorbet wrote:

> As I understand how code_change works, all processes using a module must
> be suspended before the switch can happen, the code is purged and
> reloaded, then they are all resumed.  The application upgrade support
> takes care of the mechanics for this, but I wonder about the timing
> implications when I have possibly tens of thousands of processes using
> the same module.  Are there any numbers on this?

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com