[erlang-questions] Best way to check if a given directory is empty (or not)?
Richard O'Keefe
ok@REDACTED
Tue Mar 13 22:25:30 CET 2012
On 13/03/2012, at 9:03 PM, Steve Strong wrote:
> Yeah, if there are other processes (erlang or os) then that would be a problem :) - obviously easy to recreate the directory if it did get deleted but you would have that time when it wasn't there.
Recently I was doing some multithreaded programming in an imperative language.
There must be a proper technical term, but the one I've been using is
'evanescent property'. An evanescent property is one that quickly becomes
out of date. Let's take an example.
    while (!work_pool.is_empty()) {
        /* A */
        process(work_pool.remove_most_urgent_task());
    }
    terminate();
The code was originally written for a single-cpu system using co-operative
multithreading. So it could rely on nothing happening at point A, so that
when it came to call work_pool.remove_most_urgent_task() it was *certain*
that work_pool.is_empty() was still false.
If you have pre-emptive scheduling, some other threads might be scheduled
at point A. By the time you get to work_pool.remove_most_urgent_task()
the work pool might have been emptied and refilled any number of times.
That isn't _likely_, but it can happen.
If you have multiple cores, some other thread might be executing at the very
same time as this one, and again, by the time work_pool.remove_most_urgent_task()
is called, you have NO IDEA whether work_pool.is_empty() is still false or not.
This is the kind of thing that keeps on working when your environment is upgraded,
right up to the point where it breaks. And then it's a nightmare to reproduce.
You can use locking, of course.
    while (work_pool.lock(), !work_pool.is_empty()) {
        task = work_pool.remove_most_urgent_task();
        work_pool.unlock();
        process(task);
    }
    work_pool.unlock();
    terminate();
Or better still, you can redesign:
    while (work_pool.remove_most_urgent_task(&task)) {
        process(task);
    }
    terminate();
although that one is tricky in languages that take a rigidly, blindly, and
stupidly doctrinaire approach to what the designers (mis)took to be
"object orientation" and ban multi-result methods. (I could name a language
like that, but I shan't. Guess, while drinking coffee and listening to a
gamelan...)
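For concreteness, here is a minimal Python sketch of that redesign. The WorkPool class, its add() method, and the drain() helper are hypothetical names invented for illustration; only the shape of remove_most_urgent_task() comes from the text above. The point is that the emptiness test and the removal happen under one lock, so there is no point A where the answer can go stale.

```python
import heapq
import threading
from typing import Any, Callable, Optional, Tuple


class WorkPool:
    """Hypothetical priority work pool; a lower number means more urgent."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._heap: list = []
        self._counter = 0  # tie-breaker so tasks themselves are never compared

    def add(self, priority: int, task: Any) -> None:
        with self._lock:
            heapq.heappush(self._heap, (priority, self._counter, task))
            self._counter += 1

    def remove_most_urgent_task(self) -> Tuple[bool, Optional[Any]]:
        """Test for emptiness and remove in one step, under one lock."""
        with self._lock:
            if not self._heap:
                return False, None
            _, _, task = heapq.heappop(self._heap)
            return True, task


def drain(pool: WorkPool, process: Callable[[Any], None]) -> None:
    """Process tasks until the pool reports that it had nothing to give."""
    while True:
        found, task = pool.remove_most_urgent_task()
        if not found:
            break
        process(task)  # runs outside the lock
```

Returning a (found, task) pair is the multi-result style mentioned above; a language without it would need an out-parameter or a sentinel value instead.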
I hope the relevance is clear. A directory is just such a shared mutable
data structure, and if there are other processes using the file system you
have worse problems than just a potentially missing directory.
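The same redesign can be applied to a directory, at least in one narrow case. The sketch below assumes the goal is "remove this directory if it is empty", which may or may not be what the original poster wanted; remove_if_empty() is a hypothetical helper. Instead of asking whether the directory is empty and then acting on a stale answer, it simply attempts the removal and lets the operating system report the outcome.

```python
import errno
import os


def remove_if_empty(path: str) -> bool:
    """Try to remove `path`.

    Returns True if the directory was removed and False if it was not
    empty. Any other failure (missing directory, permissions, ...) is
    re-raised for the caller to deal with.
    """
    try:
        os.rmdir(path)  # fails if the directory still has entries
        return True
    except OSError as exc:
        # POSIX allows either ENOTEMPTY or EEXIST for a non-empty directory.
        if exc.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise
```

This only covers the case where the check and the action can be merged into one system call; it does not help if something else has to happen while the directory is known to be empty.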
I don't know any way to lock a directory in UNIX (and I'm pretty confident
that I _would_ know; there could be a dozen ways to do it in Windows and I
wouldn't have a clue). One approach would be to create a '.lock' file in
the directory and lock _that_; you just have to make sure that other
OS-level processes also lock the lockfile around modifications to the directory.
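A rough sketch of the lockfile idea, assuming a UNIX-like system where Python's fcntl module is available. The directory_lock() helper is a hypothetical name, and flock() is only one of several locking calls that could be used here. The lock is advisory: it protects nothing unless every cooperating process takes it before touching the directory.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def directory_lock(dirpath: str, name: str = ".lock"):
    """Hold an exclusive advisory lock on dirpath/name for the with-block."""
    fd = os.open(os.path.join(dirpath, name), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is available
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Usage would look like `with directory_lock(d): names = os.listdir(d)`. Note that the lock file is itself an entry in the directory, so any emptiness test has to ignore it.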
Frankly, I have been worried about this ever since I first learned UNIX.
I've worried about "What happens if someone adds or removes a file while I
am scanning a directory?" The manual page
    If a file is removed from or added to the directory after the most
    recent call to opendir(3C) or rewinddir(3C), whether a subsequent
    call to readdir() returns an entry for that file is unspecified.
is not reassuring. In fact I suspect it's worse than that: in systems with
hashed directories I have a nasty feeling that any change to a directory
might result in an open directory scan seeing a name a second time and
missing some names that are still there. Solaris has the notion of
"name-locking" a file system, preventing any changes to directories while
the lock is held, but that's a bit _too_ broad, and you have to own the
file system to do that.
So when scanning a directory, I religiously do NOTHING but read names and
store them, in the hope that getting it over quickly will reduce the odds
of craziness.
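In Python terms, that habit might look like the sketch below. snapshot_names() and file_sizes() are hypothetical helpers; the scan does nothing but collect names, and the slower per-file work happens afterwards, tolerating names that have vanished in the meantime. This narrows the window for trouble; it does not close it.

```python
import os
from typing import Dict, List


def snapshot_names(dirpath: str) -> List[str]:
    """Read every entry name as quickly as possible and do nothing else."""
    with os.scandir(dirpath) as entries:
        return [entry.name for entry in entries]


def file_sizes(dirpath: str) -> Dict[str, int]:
    """Per-file work, done only after the directory scan has finished."""
    sizes = {}
    for name in snapshot_names(dirpath):
        try:
            sizes[name] = os.stat(os.path.join(dirpath, name)).st_size
        except FileNotFoundError:
            pass  # the file vanished between the scan and the stat
    return sizes
```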