[erlang-questions] Type inference in Erlang [WAS: Re: The compiler "eats" structures which are not separated by commas]

Mon Apr 23 17:05:42 CEST 2012

On 23 Apr 2012, at 14:49, Kostis Sagonas wrote:

> Finally, to give more evidence for this claim Robert mentioned "attempts in the 90's by knowledgeable people". If I took this last statement literally, I would probably find it a bit insulting -- I don't because I am sure that Robert did not mean it this way.

So am I.

I'm sure he referred to people like Wadler, Marlow, Arts and Armstrong. I guess you could see this as the first wave of type analyzers for Erlang. Dialyzer was introduced in 2004, right? Sven-Olof Nyström talked about his soft-typing system for Erlang at the 2003 Erlang workshop...

At the start of the AXD 301 project, in 1996, we were promised that there would be a type checker with the first Erlang/OTP release - something derived from the Wadler/Marlow work, so we were doing our best to prepare, by inserting comments that at least approximated the syntax we'd seen examples of. When some type checker prototypes became ready for some serious testing (I recall trying to integrate them into our design flow), we had too much legacy code, and an established workflow that didn't play well with the type checkers. They were not very good at doing partial analysis, and going through our > 1 million lines of Erlang code and annotating everything was not an option. In other words, it was type *checking* - not inference.

In short, they were never useful to us. No blame should fall on them (or on the original designers of Erlang). We certainly could have approached our design differently, if our primary goal had been to stay compatible with a future static type checker, but as it was, we had different requirements, and adapted to an Erlang without type inference, and without static type checking.

When we first tested Dialyzer in 2004, it was very quickly well received, since it didn't clash with the established culture, even though we were then 8 years into our erlang-based product development. This was quite a feat. Given that a useful approach eventually emerged, surely the work that went before it was not a waste of time, and was therefore useful in a different context. I'm being deliberately hand-wavy on the connections, since I don't know to what extent the different projects informed each other, but negative results are also valuable results.

Obviously, there is a huge difference between "commercially useful" and "academically useful", and even between "practically useful" and "theoretically useful".

As to whether this means Robert was justified in saying that "type inferencing doesn't work", I will let him try to defend. :) 

But the *context*, once more, was that he commented on the statement "... which a language with type inference doesn't need", even more specifically, suggesting that certain parts of the Erlang syntax could be eliminated with the help of type inferencing. That Robert's claim *in this very narrow context* is plainly false is at least not obvious to me.

> Anyway, I will repeat myself: *there is no fundamental reason for type inference in Erlang not to work*; there are only engineering and still (erroneous) community perception reasons.
> 
>> Obviously dialyzer is (extremely) useful, but there is a reason why it's off-line and doesn't actually give compilation errors, but merely advice.
> 
> Well, no. This is also a wrong statement. The reason why dialyzer is off-line is mainly historical (it started as a separate tool) but also related to the fact that it is typically more effective when analyzing a complete application rather than a single module. But fundamentally there is actually no reason why dialyzer should remain an off-line tool, especially if the BEAM compiler gets extended to take a set of module rather than a single module as input.

Hmm, if we are nit-picking, did you just tell me that it was a wrong statement that "there is a reason why it's off-line…", and then proceed to give me the reason why Dialyzer is an off-line tool?  ;-)

Given that my reply was directed at the human equivalent of Dialyzer, I took some care to stay within the bounds of what I knew was correct (and as a consequence, didn't end up saying much). I merely observed that dialyzer is currently an off-line tool, and "merely" gives advice. I did *not* claim that there was a fundamental reason why it couldn't be more tightly integrated into the compilation flow. On the contrary, I have many times expressed the hope that we are slowly moving in that direction, including gradual changes of how the community views (and works with) its erlang code.

Actually, the current state of things seems to be that the compiler borrows ideas from dialyzer and implements them selectively. This doesn't seem like an ideal setup.

Since Vlad in a different thread mentioned that Dialyzer's was too slow to be incorporated into the compilation flow, I took some time to do some measurements to prove him wrong. Unfortunately, calling dialyzer_cl:run(…) with only one file (gproc.beam) given an up-to-date PLC took 3.5 seconds, of which most of the time seemed to be spent inside the actual analysis in dialyzer_analysis_callgraph. I gave up at that point, since my initial assumption was not so easily proven as I had hoped.

Since the entire compilation of gproc.erl takes ca 320 ms on my machine, even reducing the runtime of Dialyzer by 10x would mean a doubling of the compilation time. Granted, my investigation was shallow to say the least, and on this point, I sincerely hope I am wrong. Also, as long as it's an option, even 3.5 secs might actually be quite reasonable sometimes.

Bob Ippolito gave a talk at the London Erlang Factory last year, where he mentioned some practical issues they had had with Dialyzer (http://www.erlang-factory.com/conference/London2011/speakers/BobIppolito, 29:30 into the presentation - I recall that you were there). I know some of his complaints have been addressed to some extent (and so did Bob at the time), but you do often need to clear a few hurdles before you have dialyzer well adapted to your design flow (or the other way around).

These are engineering issues, I agree, but there is really a fairly large number of tools and techniques that are discarded as "not useful" on mostly fairly subjective grounds, because the cost-benefit analysis is not clear to enough developers.

Having said this, Dialyzer *is* one of the most popular tools in the Erlang community - in other words empirically "useful". ;-)

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com