[erlang-patches] Win64 memory corruption fix
Tue Feb 12 20:32:17 CET 2013
When the Win64 emulator was released we switched to it as a solution to our memory problem. We had been bumping into the 3 GB limit, and our workaround was to run multiple VMs on the same machine. However, once we started using the Win64 emulator we noticed that it was stable as long as memory usage was low, but when usage approached or passed the 4 GB line odd problems started showing up, which turned out to be memory corruption. We were getting a number of function_clause and bad_match exceptions, and when we reviewed the code and the data we found cases that should be impossible.
Code-wise, I looked at all uses of the type long regardless of the code section. I agree that the majority of the patch just fixes formatting code with bad casts, which should only result in incorrect memory reads and not writes (however, I believe the issue in erl_misc_utils.c could result in a write).
After the changes, the memory corruption issues went away. Unfortunately I don't really know which specific casts caused memory corruption.
As mentioned in my original email, applying the top down memory allocation registry setting can help track down any of these potential problems, as the Erlang VM will GPF. With the top down memory allocation setting in place, an unpatched Win64 Erlang will not fully build. Erl.exe will be built, but the first attempt to compile a .erl file will result in a GPF where the stack trace shows the source as one of the printf functions. With the patch applied, the Erlang VM runs fine with the top down memory allocation setting applied.
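To illustrate the kind of formatting cast discussed above, here is a minimal C sketch. It is not taken from the patch; format_pointer and pointer_text_roundtrips are hypothetical helper names. It assumes the problematic pattern is a pointer cast to unsigned long and printed with %lx, and shows one portable alternative using uintptr_t with the PRIxPTR format macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parsing back with strtoull only works if uintptr_t fits in
 * unsigned long long, so check that at compile time. */
static_assert(sizeof(uintptr_t) <= sizeof(unsigned long long),
              "uintptr_t must fit in unsigned long long");

/* Hypothetical helper: format a pointer value as hex without narrowing it.
 *
 * The risky pattern on Win64 would be:
 *     snprintf(buf, size, "0x%lx", (unsigned long)p);
 * because unsigned long is 32 bits there and the cast drops the upper half.
 */
static int format_pointer(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)p);
}

/* Returns 1 if the printed text parses back to the same pointer value. */
static int pointer_text_roundtrips(const void *p)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1]; /* "0x" + hex digits + NUL */
    int written = format_pointer(buf, sizeof buf, p);

    if (written < 0 || (size_t)written >= sizeof buf)
        return 0;
    return (uintptr_t)strtoull(buf, NULL, 16) == (uintptr_t)p;
}
```

Calling pointer_text_roundtrips on the address of any object should return 1 on every platform, because uintptr_t is defined to be wide enough to hold a pointer; the same is not true of unsigned long on Win64.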
From: Sverker Eriksson [mailto:]
Sent: Tuesday, February 12, 2013 10:27 AM
To: Blaine Whittle
Subject: Re: [erlang-patches] Win64 memory corruption fix
Does this patch solve a real issue that you have seen, or is it more of a potential problem stemming from code analysis?
We really appreciate patches like this, but we need to decide whether to prioritize it for inclusion in R16B. And it seems to me the patch mostly deals with type casts for diagnostic printing of internal states, where a wrong type cast just means we get a wrong value printed.
Blaine Whittle wrote:
> git fetch git://github.com/bwhittle/otp.git win-64-pointer-fix
> This patch should fix a number of memory corruption issues and / or crashes on Win64 that can potentially occur when the Erlang VM exceeds 4 GB of RAM. The problem stems from casting pointers to unsigned long and assuming that long is a type that is always large enough to hold a pointer. This assumption holds for all platforms except Windows.
> Nix 32 (unsigned long) -> 32 bit (pointer size = unsigned long)
> Nix 64 (unsigned long) -> 64 bit (pointer size = unsigned long)
> Win 32 (unsigned long) -> 32 bit (pointer size = unsigned long)
> Win 64 (unsigned long) -> 32 bit (pointer size != unsigned long)
> To compound the problem, these casts can appear to be fine on Win64, as only those pointers that reference memory above the 32 bit address space will lead to issues. This means you need Erlang to allocate ~ 4 GB of memory before you even have a chance of running into problems. The issue is that if you have a pointer that references memory above the 32 bit address space on Win64, type cast it to a long (i.e. 32 bits), and then turn around and use that type cast value as a pointer, you'll be referencing a different memory location. Most of the time the incorrect pointer will still reference a valid location, as memory is allocated bottom up, which can lead to memory corruption.
> This patch has been tested heavily and has been used on production
> systems. I made the changes a year ago when the Win64 Erlang VM was
> released (just didn't mean to wait so long to submit it.)
> The patch submission page recommends that I create new test cases, which I have not done. However, I have a small registry change that should be applied on any systems that execute Erlang Win64 smoke and / or unit tests.
> The registry change instructs Windows to allocate memory from the top down, meaning that any valid memory pointer will require a 64 bit value and any attempt to cast it to a 32 bit value followed by a dereference will produce an access violation.
> To apply the registry setting, copy and paste the following section into a <some name>.reg file and import it on each test machine, followed by a reboot.
> Windows Registry Editor Version 5.00
> Manager\Memory Management] "AllocationPreference"=dword:00100000.
> With this registry change, Erlang's existing unit tests should be able to catch any incorrect pointer casts by causing the VM to crash. Every pointer will reference memory above the 32 bit range, so type casting it to a 32 bit value and then dereferencing it causes an access violation.
> One potential issue with using this registry setting is that if your
> test machines rely on 3rd party Win64 apps it's possible they may
> crash on startup (that is if they contain similar type casting bugs.)
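The truncation described in the quoted message can be sketched in C as follows. This is an illustration under stated assumptions, not code from the patch: win64_ulong is a hypothetical typedef that stands in for Win64's 32 bit unsigned long so the example behaves the same on any platform, addresses are modelled as plain 64 bit integers, and the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: model Win64's 32 bit unsigned long with a fixed-width type,
 * so the truncation is visible even where long is 64 bits. */
typedef uint32_t win64_ulong;

/* uintptr_t is the integer type intended to hold a pointer value. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Lossy round trip: pointer value -> 32 bit "long" -> pointer value.
 * Addresses are modelled as 64 bit integers for the demonstration. */
static uint64_t roundtrip_through_long(uint64_t address)
{
    win64_ulong truncated = (win64_ulong)address; /* upper 32 bits dropped */
    return (uint64_t)truncated;
}

/* Safe round trip: a real pointer survives a cast through uintptr_t. */
static int survives_uintptr(const void *p)
{
    uintptr_t value = (uintptr_t)p;
    return (const void *)value == p;
}
```

An address below 4 GB, such as 0x7fff0000, comes back unchanged from roundtrip_through_long, which is why the bug stays hidden at low memory usage. An address above 4 GB, such as 0x100002000, comes back as 0x2000, a different location that may well be mapped when memory is allocated bottom up.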