[erlang-questions] Automated Stripping of otp libraries / modules
Matthias Lang
matthias@REDACTED
Thu Jun 23 16:49:07 CEST 2011
Hi,
Dale Harvey asked about how to figure out what modules are actually
needed in a system. I promised to try out some different approaches on
the control system for the hardware I work on.
Executive summary:
- Using my code:all_loaded() approach nets me 153 modules
- Xref looks like a dead end. Xref gives me 475 modules in a simple analysis
- Dialyzer might be able to do better. I don't know.
The rest of this post is long. Sorry. I should probably get a personal
blog, but then nobody would read it.
Hacky approach:
I previously described what we actually do: run our test suite and
then call code:all_loaded(). Simple, but only pulls in modules the
test suite touches. Seems unbeatable for my purposes.
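A minimal sketch of that step, assuming it is called in the same VM right after the test suite finishes. The module name loaded_mods is made up for illustration; code:all_loaded/0 is the only OTP call used.

```erlang
-module(loaded_mods).
-export([names/0]).

%% Sorted names of every module currently loaded in this VM.
%% code:all_loaded/0 returns {Module, FileOrTag} pairs; only the
%% module names are kept here.
names() ->
    lists:sort([M || {M, _FileOrTag} <- code:all_loaded()]).
```

length(loaded_mods:names()) then gives the module count. The list also contains whatever the shell and the test framework themselves loaded, so it is an upper bound on what the test suite exercised.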
Use Xref to find module dependencies: 475 modules
Xref is OTP's cross-referencing tool.
There's an analyzer in xref which reads .beam files and produces a
call graph. A call graph is just a (big) list of which function
calls which other function. So we could use that to see which modules
are actually needed in a system.
The other part of xref is a query language which lets you determine
things about the call graph.
Anyway, let's just dive in. The call
{ok, Modules} = xref:analyse(Xref, {module_call, [gth_mop]})
returns a list of modules used by gth_mop. gth_mop is the 'entry point'
for my system. So if I keep calling
{ok, More} = xref:analyse(Xref, {module_call, Modules})
and merging More into Modules until nothing new turns up, then I've
got every module the system needs.
The list produced this way is huge, 475 modules, and includes a
bunch of things which are obviously _not_ needed, e.g. 'wx' on
a system which can't possibly run 'wx'. Not so good.
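The repeated module_call analysis can be sketched as a fixpoint loop. This is a sketch only: the module name mod_closure is made up, and it assumes an xref server that is already running and has had the system's .beam files added (for example with xref:add_directory/2).

```erlang
-module(mod_closure).
-export([closure/2]).

%% Xref:  name or pid of a running xref server with the .beam files added.
%% Roots: list of entry-point modules, e.g. [gth_mop].
%% Returns an ordset of every module found by following module_call.
closure(Xref, Roots) ->
    grow(Xref, ordsets:from_list(Roots)).

grow(Xref, Seen) ->
    %% Assumption: xref knows every module in Seen. If it returns an
    %% error tuple instead, this match crashes on purpose.
    {ok, Called} = xref:analyse(Xref, {module_call, ordsets:to_list(Seen)}),
    case ordsets:union(Seen, ordsets:from_list(Called)) of
        Seen   -> Seen;               % nothing new: fixpoint reached
        Bigger -> grow(Xref, Bigger)
    end.
```

The result is merged into the set on each pass rather than compared for equality, since an entry point that nothing else calls may not show up in the analysis result.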
Use reltool (Håkan Mattsson's suggestion)
It looks like reltool just uses 'xref'. Instead of using
xref:analyse/2, it uses a query, xref:q(Pid, "UM"). I'm not sure
if it does that recursively or not, but I can't see how it can
solve this problem better than xref. But I'm no reltool expert.
Can Xref do a better job?
My first approach with xref is crap. Imagine this system with three modules:
system entry point is m:f
m:f calls n:f
n:f calls nothing
n:g calls o:f
o:f can't be reached from m:f, but xref's module_call analysis will
include 'o' in the results. So the module_call analysis is not
the right way to go for this problem. We need to use the call graph.
There's probably an xref query to do what I want, but thinking in
terms of xref's query language is beyond me*. So I just get xref to
give me the call graph edges, like this (E stands for edge):
> xref:q(Xref, "E").
{ok,[{{m,f,0},{n,f,0}}, {{n,g,0},{o,f,0}}]}
You can see from the call graph that o:f/0 isn't reachable from m:f/0.
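Checking reachability on such an edge list takes only a few lines. A sketch, with a made-up module name (reach), assuming the edges have the {Caller, Callee} shape shown above:

```erlang
-module(reach).
-export([reachable/2, modules/1]).

%% Edges: [{Caller, Callee}], the list inside {ok, Edges} from xref:q(Xref, "E").
%% Returns the ordset of MFAs reachable from Entry, Entry included.
reachable(Edges, Entry) ->
    walk([Entry], Edges, ordsets:new()).

walk([], _Edges, Seen) ->
    Seen;
walk([F | Rest], Edges, Seen) ->
    case ordsets:is_element(F, Seen) of
        true ->
            walk(Rest, Edges, Seen);
        false ->
            %% Linear scan of the edge list per function: fine for a
            %% sketch, slow for a large call graph.
            Callees = [To || {From, To} <- Edges, From =:= F],
            walk(Callees ++ Rest, Edges, ordsets:add_element(F, Seen))
    end.

%% Module names appearing in a list of MFAs.
modules(MFAs) ->
    lists:usort([M || {M, _F, _A} <- MFAs]).
```

With the two edges above, reach:modules(reach:reachable(Edges, {m,f,0})) gives [m,n], so 'o' is left out.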
But this falls in a heap as soon as you use 'spawn' or M:F in
even slightly tricky ways, e.g.
   go() ->
      B = b,
      spawn(fun() -> B:f() end).
call graph: [{{mml,go,0},{'$M_EXPR',f,0}}, {{mml,go,0},{erlang,spawn,1}}]
'$M_EXPR' is xref-speak for "I don't know which module this is".
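One way to see how much of a system is affected is to list the functions that have such an edge. A sketch only: the module name unresolved is made up, and it looks just for the '$M_EXPR' marker shown above, not for any other placeholder xref may emit.

```erlang
-module(unresolved).
-export([callers/1]).

%% Edges: [{Caller, Callee}] in the same shape as the call graph above.
%% Returns the callers that have at least one call whose target
%% module xref could not determine.
callers(Edges) ->
    lists:usort([From || {From, {'$M_EXPR', _F, _A}} <- Edges]).
```

For the call graph above this returns [{mml,go,0}].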
Can Dialyzer do this better?
Dialyzer is remarkably good at finding dead code, so I wonder if
it can produce a call graph better than xref does. But I've already
spent the better part of a day on xref so poking around in dialyzer
will have to wait.
Matt
* The xref manpage says that xref has a "simple query language". The
language has more than 20 predefined variables, a bunch of
operators including |, || and |||, regular expressions and a cast
syntax. I don't think that qualifies as "simple", unless your hobby
is designing query languages.