Improve pad character handling in base64 MIME decoding functions

Tue Dec 7 03:40:33 CET 2010

Hi all,

This is my first patch to OTP. Hopefully, I've ticked all the boxes.

Here is the git fetch request:
  git fetch git://github.com/tpodowd/otp.git base64_mime_decoding

This patch fixes some of the problems we've run into when decoding
base64 encoded text generated by various email clients. Most of the
issues are related to too many padding characters at the end of the text.
A typical example is "abcd====", where all of the trailing padding is
extraneous.

After reading RFC4648, I decided that we should implement the MAY
clauses for MIME base64 decoding.

This patch breaks one existing unit test, which I have changed as follows:

- %% One pad, followed by ignored text
- <<"Hello World">> = base64:mime_decode(<<"SGVsb)(G8gV29ybGQ=apa">>),
+ %% One pad to ignore, followed by more text
+ <<"Hello World!!">> = base64:mime_decode(<<"SGVsb)(G8gV29ybGQ=h IQ= =">>),

In the old test, the trailing "apa" text is ignored because it follows a pad
character. RFC4648 states that we MAY ignore pad characters that are followed
by further text, treating them the same as illegal characters. As such, I
have left the pad as-is and replaced "apa" with the same number of valid
base64 characters, chosen so that they decode to something a bit friendlier,
and added the proper "==" padding to correctly terminate the encoded stream.

Given the new test, the old implementation would still return <<"Hello World">>
without the trailing "!!".
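
For illustration only, the lenient behaviour described above can be sketched
as follows. This is not the code in the patch: the module name
mime_pad_sketch and the function lenient_decode/1 are hypothetical, and the
sketch is more permissive than the patch, since it drops every pad character
and every other non-alphabet byte, re-pads, and then hands the result to the
strict base64:decode/1.

```erlang
-module(mime_pad_sketch).
-export([lenient_decode/1]).

%% Hypothetical helper, not part of OTP and not the patch itself.
%% Keep only bytes from the base64 alphabet (so "=" and any illegal
%% characters are dropped), restore the padding implied by the length,
%% then decode strictly.
lenient_decode(Bin) when is_binary(Bin) ->
    Clean = << <<C>> || <<C>> <= Bin, is_b64(C) >>,
    Pad = case byte_size(Clean) rem 4 of
              0 -> <<>>;
              2 -> <<"==">>;
              3 -> <<"=">>
              %% A remainder of 1 is not a valid base64 length; this
              %% sketch simply crashes with case_clause in that case.
          end,
    base64:decode(<<Clean/binary, Pad/binary>>).

%% True for the 64 characters of the standard base64 alphabet.
is_b64(C) when C >= $A, C =< $Z -> true;
is_b64(C) when C >= $a, C =< $z -> true;
is_b64(C) when C >= $0, C =< $9 -> true;
is_b64($+) -> true;
is_b64($/) -> true;
is_b64(_) -> false.
```

With this sketch, mime_pad_sketch:lenient_decode(<<"abcd====">>) decodes the
same bytes as base64:decode(<<"abcd">>), and the input from the new test,
<<"SGVsb)(G8gV29ybGQ=h IQ= =">>, yields <<"Hello World!!">>.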

I took the liberty of adding more tests, including tests that cover the two
separate implementations, mime_decode/1 and mime_decode_to_string/1. All
base64 tests currently pass.

Best regards,

Tom.