Architectural Suggestions for Job Queuing

Wed Apr 14 17:21:35 CEST 1999

Hi Joe,

Thanks for the response.  It looks like there does not exist a
gen_batch_queue or gen_software_license module which would immediately
solve my problem.  :)

Before getting into more detailed specifications, let me first state
the following.  I have no intention of selling the programs or
running them outside of the company where I work.  If they turn out to
be generally useful I'd gladly make them available to the public
community.

I'm not sure how much work is involved in developing a robust
(crash-resistant) and reasonably efficient application.  I am curious
about your remarks:

>     You have to decide:
> 
>     1) Server in  Erlang + clients in Erlang  + Distributed Erlang for
>        communication, or,
> 
>     2) Server in Erlang +  clients  in Erlang + socket communication,
>        or,
> 
>     3) Server in Erlang +  clients in (C, C++, ...) + socket communication
>        + API for (C, C++, ..) client side applications
> 
> These are in order of complexity (simplest first).
> 
> 	1) would be OK for a quick prototype to get the protocols ok
> 	3) would be for a commercial product that you could sell
>   	2) is a half way house to implementing 3)

I figured that implementation #1 would minimize the amount of my work.
Why is option #2 "better" than #1?  I presume you are saying that #3
is good for a commercial product as the end-user could interface to it
using C and not Erlang.  When you mention option #2 are you suggesting
that there is some good reason for using socket communication in
preference to Distributed Erlang communication?

>     I  think the key  architectural/design  problem is one of deciding
> what you want to do  in the event  of failure. If the server  crashes,
> then all the clients are blocked. If there is a communication failure
> then you may lose licenses etc.

I agree.  This may sound selfish but I would like to implement a
robust distributed application because it sounds like fun and I have
never written one before.

>     Here you can virtually design whatever you want.

Darn, this is where I was hoping for a gen_software_license module :)

>     In a small (closed) system - then maybe there should be no server -
> the clients could alternatively take on the roles of client or server and 
> use a broadcast/lock strategy to negotiate licenses.

Interesting.

>     In a  commercial system all nodes might  not be  equal. One server
> (placed on a  reliable node) might service  hundreds of clients, etc. -
> you  still might want to  have some hot-standby/fail-over behaviour for
> the server,...
> 
>     This kind   of stuff soon  gets complicated  (but  that's OK - our
> telecomms stuff is like this :-)

Since this is not a commercial system I have a lot of flexibility
here.  I am not familiar with the trade-offs.  I presume that the more
robust the system, the more work it will take to get it written.

>     Firstly I'd, like   to see a  simple "ball  park"  analysis of the
> problem, in terms of;
> 
> 	a) How many clients

We have around 100 workstations at our company.  This sounds like a
nice number.

> 	b) How many servers

Other than for reliability or performance I see no reason why the
answer could not be 1.  Furthermore, I see no reason why the server
could not run on all of the machines -- all 100 of them.

> 	c) Holding times (how long does a client use a license)

Rarely less than 10 seconds.  Normally a minute or two.  With
interactive use or large batch jobs the license could be held for
hours at a time.

> 	d) Reliability requirements.
> 	   - Is it acceptable that clients block if the server crashes?
>          - Do you want hot standby for server crashes?

Yes and yes.  I'm not familiar with the coding and performance impact
of having a hot standby.  I have never implemented anything like this
before.

>       e) is this a LAN/WAN application?

LAN only.

> 	f) required response times for obtaining/freeing a license 

Less than 1 second.

> 	g) security levels (none, ... full) How much effort do you
> 	   want to put into making sure that you cannot forge a license
> 	   (you can have anything up to a full public/private key system)

None.  Environment is completely secure. (Ha!)

> 	h) Maintenance levels (do you want a remote management system)
>            If so what ...

I'm not sure what remote means but yes, I do want the ability to
dynamically increase or decrease license counts, add new queues, shut
down queues, etc.  This is one of the reasons Erlang intrigued me, as
it seems that you guys are familiar with these kinds of
applications.

>     Once you have some idea of the answers to questions like these you
> can *begin* to think about an architecture.
>
>     It may be that you have a very specific set of answers - fine then
> we can talk architectures. Or, you might want to "grow" a solution for
> a very simple idealized system.

I'm in no rush.  I like the idea of having an architecture in mind,
coding a pilot version, getting a feel for how well it will work,
making changes and then, when happy, adding all the bells and whistles.

By the way, here is a more detailed description of how my license
client and server work.  This works well enough in the environment
which I have described above.

My present implementation of the license client in OCaml is quite
small.  When the user wants to execute a command on a named license
queue, the user enters

	license_client name command

The license_client program communicates with the server by sending it
the license name.  The license_client program then does a read and
blocks.  When the server decides that the named license is available,
it writes to the client.  The pending license_client read now
succeeds and the license_client executes the desired user command.
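
The client behaviour described above can be sketched roughly as
follows.  This is an illustration in Python, not the OCaml program
itself.  The wire format (a newline-terminated license name, a single
byte written back as the grant) and the name run_with_license are
assumptions made only for this sketch.

```python
import socket
import subprocess


def run_with_license(host, port, name, command):
    """Hold the license `name` while `command` (an argv list) runs.

    Assumed protocol: send the name followed by a newline, then block
    until the server writes at least one byte to signal the grant.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(name.encode() + b"\n")   # ask for the named license
        grant = conn.recv(1)                  # blocks until the server writes
        if not grant:
            # An empty read means the server closed the connection.
            raise ConnectionError("server closed before granting license")
        # The license is held for as long as the connection stays open.
        return subprocess.call(command)
    # Leaving the `with` block closes the connection.
```

Closing the socket on exit from the `with` block, including when the
command raises, is what the server would interpret as the license
being handed back.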

Once the user command finishes the license_client closes the
connection with the server.  The server now knows that the license is
available.  This approach works well as licenses are rarely lost even
if the client is killed or crashes.  If the server goes down I force
the clients to wait.  If the server goes down in the middle of a
transaction then there is a problem.  This is an area for improvement.

Since the present implementation of the server uses threads it is
convenient to only remember the currently running processes.  Adding
a queue of pending processes would complicate the code quite a bit.
This is however a desirable feature and so I would like to add it in
the next implementation.
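
As a rough sketch of the bookkeeping such a pending queue would need,
here is one possible shape for a single license queue, written in
Python for illustration only.  The class and method names are
hypothetical, and the first-come-first-served policy is an assumption;
nothing here reflects the existing server.

```python
from collections import deque


class LicenseQueue:
    """State for one named license: current holders plus FIFO waiters."""

    def __init__(self, capacity):
        self.capacity = capacity   # number of licenses available
        self.running = set()       # clients currently holding a license
        self.pending = deque()     # clients waiting, oldest first

    def request(self, client):
        """Return True if granted immediately, False if queued."""
        if len(self.running) < self.capacity and not self.pending:
            self.running.add(client)
            return True
        self.pending.append(client)
        return False

    def release(self, client):
        """Call when a client's connection closes, for any reason.

        Returns the list of waiting clients that can now be granted.
        """
        if client in self.running:
            self.running.discard(client)
        elif client in self.pending:
            self.pending.remove(client)   # gave up while still waiting
        return self._promote()

    def set_capacity(self, capacity):
        """Change the license count at run time.

        Lowering it never revokes a held license; it only delays grants.
        """
        self.capacity = capacity
        return self._promote()

    def _promote(self):
        granted = []
        while self.pending and len(self.running) < self.capacity:
            client = self.pending.popleft()
            self.running.add(client)
            granted.append(client)
        return granted
```

The server would write the grant to each client returned by release
or set_capacity.  Keeping this state in one place, separate from the
per-connection threads, is one way to keep the queueing logic from
spreading through the rest of the code.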