[erlang-questions] Question/Alternative on Frames Proposal [Warning: Long]

Fri May 18 02:21:45 CEST 2012

Richard,

Impressive proposal on Frames, in the explained issues, benefits, and especially the background for the choices.  I've read through it and I'm in agreement that Erlang should plug the issue with updates to records and being able to pass formatted name-keyed information between systems.  Having said that... and I'm certainly not trying to paint any bike sheds here (nor have I read pre-2012 conversations)... but it leaves me wondering one thing... but first some background on the question/alternative.

(Bear with me for a moment, while I diverge into some Java analogies).  The Java folks have been very careful recently to distinguish two things: The Java language and the Java platform.  We have a similar situation here, and if you consider the platform composed of the OTP and Virtual Machine (BEAM for example) as two separate entities (like Java has the JDK and VM), we should think about the impact to all 3 of those items in a proposal.  Another thought: Java made huge distortions around injecting types onto lists (using generics) and having to make the compromise that those types never make it into the class files.  This was done - perhaps among other reasons - to maintain compatibility from old code to the large quantity of class libraries available for Java.  I'm not saying we should compromise on capabilities - especially reliability (not sure that is necessary), but we should consider the impact to the libraries (and cost to rewrite) on any changes.

Where I think we would vehemently agree: I expect Erlang to be robust.  That's why I'm even here.  An issue with records where you can compile two files (using the same .hrl) and end up with a result that (a) compiles, (b) doesn't produce an error and (c) produces the wrong answer... is a serious issue.  It distinctly shows the shortcomings of records.   
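
To make that failure mode concrete, here is a minimal sketch using today's records.  It assumes an older header once declared -record(point, {y, x}); the function stale_get_x/1 is a hypothetical stand-in for code compiled against that older header, written as the positional tuple match that record expansion would have left behind.  The module name is also made up for illustration.

```erlang
-module(stale_record_demo).
-export([run/0]).

%% Current definition, as a freshly compiled module sees it.
-record(point, {x, y}).

%% ASSUMPTION: stands in for a module compiled against an older header,
%% -record(point, {y, x}).  After record expansion only this positional
%% tuple pattern remains; the field names are gone.
stale_get_x({point, _Y, X}) -> X.

run() ->
    P = #point{x = 4, y = 5},            % really the tuple {point, 4, 5}
    Fresh = P#point.x,                   % reads position 2
    Stale = stale_get_x(P),              % reads position 3, no error raised
    {Fresh, Stale}.
```

Calling stale_record_demo:run() compiles cleanly, raises nothing, and returns {4, 5}: the "stale" code reports y's value as x.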

However, in the 3 pillars (Language, Libraries, VM), while all need enhancement, this is primarily a limitation *of BEAM format and the virtual machine* (because this problem would occur EVEN IF we compiled another language to BEAM and used tuples).  As you've noted, the VM needs to comprehend the mapping from keys (Atoms) to values.  I believe you have defined the VM changes in a way that is robust and requires little further discussion.  That leaves how to communicate that information to the VM.  You've defined some BEAM file changes (new commands, and if I'm reading correctly, using the Literal Table to store frames), which doesn't have a major impact on BEAM file structure (but mainly relies on frame_to_binary, etc.).  I believe that works as defined as well.

So the next question then becomes what of the Erlang language.  Your proposal covers a new syntax that uses <{ ... }> to identify a frame.  You end up using ~ since that is a character unused in other parts of Erlang, so that you have a new operator to avoid ambiguity.

But here is where I get stuck.  

Why not use the record format?  ... (Bear with me, I believe the issues in that reuse can be addressed.)

Issue #1 with the ~ syntax) Today's Erlang developer, if using records, has already provided a form of that information.  It already has a key->value mapping.  One could easily assert the preprocessor was forced to destroy information for lack of an ability for the compiler and BEAM format to handle the information.  BUT, we know that the record format is NOT ambiguous in ability to parse.  (It is a bit verbose, however).  We cannot force records to be formal as a native format (since backwards compatibility forces tuple comparisons to pass), but that doesn't mean we can't leverage the syntax...

Issue #2 with the ~ syntax) The format proposed for extraction from a frame, with a tilde, is ~name(Frame).  (okay, a little bike shed paint :) ) This reverses the standard ordering of items from Major to minor.  For functions, we have module:function/arity, and module is more important than function.  Same goes for the arguments (less important than the function name).  The ~field fun with a target of the frame reverses that relationship, ~minor(Major), which may be uncomfortable, and there is an alternative (use the existing record format).

We are all already familiar with the record format being a key:value mapping, and that's what a frame would look like:
P = #point{x=4,y=5}.
or as part of the frame proposal, if you prefer the anonymous definition vs. a declared format, we can use (from erlson):
P = #{x = 4, y = 5}.

The big question:
Q: How do I know this #point is a frame and not a record? 
A: Well, we have 2 ways of knowing:
(1) A language requirement would be imposed that #{ is always an "anonymous frame" since it is not a valid record.
(2) The namespace defined for -record(ATOM, {...}) and -frame(ATOM,{...}) would be shared (compiler - well, technically the preprocessor - would halt with an error if a record and a frame with the same ATOM identifier were included in the module to be compiled.)  Since the preprocessor destroys records, the only #atom{ the compiler would ever see would be frames - it is never ambiguous to the compiler.

Comments on your questions from section 4 of your document in comparing solutions:
Q1: Frames answer changes to Yes, this actually forces an increase in intelligence of the preprocessor.
Q4: This would alleviate the impact on Dialyzer as it already understands record formats (should make it a less challenging change)
Q9: If the frame is a declared frame, then the frames answer is enhanced to be able to catch this issue at compile time, otherwise as before
Q10: If the frame is a declared frame, then the frames answer is enhanced to be able to catch this issue at compile time, otherwise as before
Q11: Frames now take on the unnecessary recompile situation of records (if the frame is a declared frame)
Q20: I believe this would change to make detection easier, but am not totally sure.
I believe all of the other answers stay the same since this is not changing the BEAM or VM structure of your proposal.

Additional Q&A:
Q: So this forces declaration of a frame if I want an atom between # and {?
A: That is my thought.  It would look like:
-frame(point, {x, y}).
If a frame is to be used throughout a file, the declaration helps to avoid capitalization and other typo errors in the name of the frame.  This DOES give us the "overhead" (and the benefits of forcing a recordkeeping and naming) of .hrl files IF we are using defined frames, but since the frame information is passed into the VM, we do not inherit the "compiled at different times so I used the preprocessor to trick myself" problems of records.  This also provides the language (the libraries, really) the ability to publish standard definitions of frames that could be reliably re-used.  Without that publication, errors would be found at runtime.  With publication, some can be found at compile time.  Consider:
get_area(#rectangle{x1 = 4, y1 = 4, x2 = 6, y2 = 8})
...could be caught at compile if there was (in an hrl)
-frame(rectangle, {x, y, w, h}).
The ~frame proposal does not have a declaration and would catch the error at run time.

Q: Can this use defaults?
A: Sure.  Unlike the <{}> syntax, which has no ability to declare, this form allows (but does not require) a declaration.  That means hand-written defaults, which could get exponentially large in the number of required functions, would not be needed.  The preprocessor could continue to handle that capability of inserting the defaults into the correct locations before the compiler has to take over.
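
For reference, this is how declared defaults behave with today's records, which is the behaviour I am suggesting a declared frame could inherit.  It is only a sketch of existing record semantics, not of the frame proposal itself; the module name is made up for illustration.

```erlang
-module(record_defaults_demo).
-export([run/0]).

%% Defaults live in the declaration, not in extra constructor functions.
-record(point, {x = 0, y = 0}).

run() ->
    P = #point{x = 4},        % y is filled in from the declaration
    {P, P#point.y}.
```

Calling record_defaults_demo:run() returns {{point, 4, 0}, 0}: one declaration covers every combination of omitted fields.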

Q: How does extraction change?
A: Extraction with the record-style format ... and is major to minor:
Frame#format.member

Q: What about the erlson ambiguity with "."?
A: This does not have that issue.  Since it is using a more verbose syntax, there is no confusion between "x.y" the atom and "x.y" the field access.  The "x.y" atom is "x.y" and the "y member of Frame X" would be: X#point.y

Q: Should this change the stance on extension vs. update?
A: It could, and perhaps only for defined frames.  Since the definition at the top of the file would exist it would decrease chance of typographical errors (but admittedly not eliminate them across files).  An alternative here that would require one additional byte of storage (or another type of frame perhaps) would be to have a "strict" definition on a frame.  So the definition might be:
-frame(point, {x, y}, [strict]).
This would prohibit any frame constructed in that module from being extended (runtime check).  Not sure if that is valuable, but should be considered. 
Another possibility is to have a file-by-file setting:
-compile(strict_frames).     (The option to default to strict and have an allow_extend_frames is also possible)
My concern with losing the ability to extend has to do with extending the use of frames.  Consider a #rectangle{x, y, w, h}.  If it was extendable, we could add "color" and continue to hand the #rectangle frame into a "get_area" function.  Otherwise, we have to decompose either another frame or a tuple to extract the rectangle field in order to pass the rectangle into the get_area function.  Additionally, if we have a transform(x, y, #rectangle) function, an extendable frame would allow just 2 field updates in a constructed frame, while the non-extended version would require the result of the transform be repackaged into a new frame or tuple (increases code size & execution at the expense of the security of extension vs. update).  With declarations and compiler flags, it seems we can get the best of both worlds.  (Although you refer to some optimizations from not allowing updates that I don't think I quite followed)
NOTE: Extension might have a side effect of requiring additional virtual machine instructions.  Richard's proposal only allows the "create_frame" to define the tuple of allowable Atoms, while any extension would require a "merge atom list" function that could cause additional VM instructions?
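
Frames as described here cannot be run, so as a rough analogue only, the sketch below uses Erlang maps (available since OTP 17), which also match loosely on keys.  It is not the frame proposal and says nothing about its VM cost; it only illustrates the extension vs. update distinction with the rectangle/color example above.  The module name is made up for illustration.

```erlang
-module(frame_extension_sketch).
-export([get_area/1, run/0]).

%% Loose match: only w and h are required, extra keys are ignored.
get_area(#{w := W, h := H}) -> W * H.

run() ->
    R0 = #{x => 0, y => 0, w => 3, h => 4},
    R1 = R0#{color => red},   % extension: adds a key that was not there
    R2 = R1#{w := 5},         % update: := requires the key to exist already
    {get_area(R0), get_area(R1), get_area(R2)}.
```

Calling frame_extension_sketch:run() returns {12, 12, 20}: the extended value still passes straight into get_area/1 without repackaging.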

Q: What happens if I have:
-frame(point, {x, y}) in one file and 
-frame(point, {x, y, z}) in another?
A: The first module will happily accept frames from the second file.  This is because of Richard's definition that the pattern matching is loose, and I agree with that logic.  The second may or may not fail depending on whether z is ever dereferenced.  If it is, usage would fail at runtime just as with the ~field(frame) syntax or pattern matches with ~.

Q: What happens if I have:
-frame(chromosome, {x, y}) in one file and
-frame(chromosome, {w, z}) in another file?
A: It depends on the answer to extension vs. update.  
If extension is prohibited, then either (a) Dialyzer should catch this for you or (b) Attempts to modify or pattern match the frame from the other module will fail at runtime just like it would in Richard's ~frame proposal.
If extension is allowed, then the extension in a given module is based on the definition IN THAT MODULE.  So things would happily go along unless you dereference a key that does not exist (which in this example is likely, but in an {a, b, c}, {b, c, d} scenario might be avoided).

Q: This breaks erlson
A: Yes.  I believe that is an acceptable side effect given the benefits.

This provides one more additional advantage:
The library, aka the rest of OTP

The tilde frame proposal basically causes a rewrite of a section of OTP (I'm unsure of the total side effect size) to use frames.  Every function might need a change if it were to use frames instead of records/tuples.  In the case of using the record syntax, all that would need to change is -record to -frame (unless someone used the tuple form in a match or construction somewhere! - Dialyzer should probably be taught to catch that if it does not already).

Even better, a few compiler options could even help out:
1) -record_as_frame would have the system automatically assume -record should be interpreted as -frame.  Unsafe? perhaps, but it provides a quick way of testing for compatibility with frames vs. records in a non-destructive way.
2) -fatrecords.  Think of this as the same as the compatibility that Apple provided in their transitions between architectures.  This could automatically produce extra functions that would emulate BOTH frames and records.  For example:

-record(rectangle, {x, y, w, h}).
get_area(#rectangle{w = W, h = H}) ->
   W * H.

would pass through the preprocessor (when provided -fatrecords) and become:
-frame(rectangle, {x, y, w, h}).
% From the -record use, as collapsed by the pre-processor
get_area({rectangle, _, _, W, H}) ->
   W * H;
% from the -frame use
get_area(#rectangle{w = W, h = H}) ->
   W * H.

This provides - at the expense of a bulkier BEAM file and a bit more pattern matching at runtime - the ability to do "fatrecords" in a library (like OTP) and gradually move code that leverages those libraries from records to frames, not forcing a full all-at-once rewrite.  This provides a smooth transition for legacy code, while providing benefits of the security of frames to those that would like it carried through the standard (and other) libraries.

Thoughts?

TP.

--
Tom Parker
thpr@REDACTED