[erlang-questions] Building a DAW in Erlang

Thu Aug 31 21:06:52 CEST 2017

Hello Joe,

On 31.08.2017 13:11, Joe Armstrong wrote:
> Hello,
> 
> I want to build a "proof of concept" DAW (Digital Audio Workstation)
> 
> Why? - just for fun.
> 
> DAWs involve complex GUIs, complex audio processing, and complex
> man-machine interactions - I'd like to make the DAW from small well
> defined isolated communicating components.
> 
> I was wondering about audio - has anybody ideas about this.

A big challenge with a (distributed) setting consisting of isolated
components is to get synchronisation right, especially with an
asynchronous architecture. If you think of hardware components
connected via e.g. AES/EBU unidirectional digital links, each of these
components usually gets a clock source (either audio or wordclock input)
unless it is the clock master.

When doing audio with a computer, the clock master is in general the
audio sound card. If you want to have a distributed system with
_several_ sound cards (even on the same host), you will need a hardware
wordclock sync between them. Otherwise you would introduce new clock
domains.

If the system clock is used to generate audio samples (let's say to
generate 48000 samples per second) and the samples are just fed to the
sound card, evil things will happen (which things exactly depends mainly
on the hardware and on drivers, but they will be audible in general
which I consider evil). The cause: the system clock is in a different
clock domain from the audio card. And system oscillators are in general
cheap ones with relatively bad stability and temperature coefficients.
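
To put a number on it, here is a back-of-the-envelope sketch in Python
(only for brevity, nothing Erlang specific). The 50 ppm offset and the
1024 sample headroom are assumed figures for illustration, not
measurements, and the function names are made up.

```python
# Drift between two free-running clocks, e.g. system clock vs. sound card.
# All figures used with these helpers are assumptions for illustration.

def drift_samples(sample_rate_hz, offset_ppm, seconds):
    """Samples produced in excess (or deficit) relative to the sound card."""
    return sample_rate_hz * offset_ppm * 1e-6 * seconds


def seconds_until_xrun(sample_rate_hz, offset_ppm, headroom_samples):
    """Time until a buffer with the given headroom over- or underruns."""
    return headroom_samples / (sample_rate_hz * abs(offset_ppm) * 1e-6)


if __name__ == "__main__":
    rate = 48000   # nominal sample rate in Hz
    ppm = 50.0     # assumed relative frequency offset between the clocks
    print("drift per hour [samples]:", drift_samples(rate, ppm, 3600))
    print("time to xrun [s]:", seconds_until_xrun(rate, ppm, 1024))
```

With these assumed numbers the generator runs a few thousand samples
ahead of (or behind) the card every hour, so any fixed-size buffer in
between will over- or underrun within minutes.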

When different clock domains are involved, sample-rate conversion and
DLL/PLL and stuff like that are needed, which in turn may introduce
audible pitch changes (DVB for instance relies on a reference clock of
27MHz with a max change (1st derivative) of 75mHz/s to avoid this, see
ISO/IEC 13818-1). Nevertheless adaptive sample rate conversion is done
e.g. by vlc and other players to play broadcast radio/television e.g.
via RTP. I'd rather avoid doing that myself (I did it once and it's no fun).
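
For illustration only, a toy version of the conversion step might look
like the Python sketch below. It uses plain linear interpolation, which
is a stand-in I picked for brevity (real converters use proper filters,
and this is not how vlc does it). The adaptive part, i.e. steering
`ratio` from a DLL/PLL, is left out; `pitch_shift_cents` only shows how
a deviation of the ratio translates into pitch. All names are
hypothetical.

```python
import math


def resample_linear(samples, ratio):
    """Resample by linear interpolation; ratio = output rate / input rate."""
    if len(samples) < 2:
        return list(samples)
    # Number of output samples that fit between the first and last input.
    n_out = int((len(samples) - 1) * ratio) + 1
    out = []
    for i in range(n_out):
        pos = i / ratio                       # position on the input axis
        k = min(int(pos), len(samples) - 2)   # left neighbour, clamped
        frac = pos - k
        out.append(samples[k] * (1.0 - frac) + samples[k + 1] * frac)
    return out


def pitch_shift_cents(ratio):
    """Pitch change (in cents) heard when the rate is off by `ratio`."""
    return 1200.0 * math.log2(ratio)


if __name__ == "__main__":
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 2.0))
    print(pitch_shift_cents(1.00005))  # 50 ppm off, an assumed figure
```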

So when thinking about a system open for professional use, I'd design it
to use a single clock domain. Two approaches to achieve this on a
computer based system are
  1. broadcasting or distributing a reference clock (e.g. with a sample
index)
  2. using backpressure
I think the former is a good fit for an asynchronous (Erlang) system.
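
A minimal sketch of approach 1, written in Python for brevity. In
Erlang the callbacks would be messages to subscribed processes. The
class and method names are hypothetical, and `tick` is assumed to be
driven by whatever owns the real clock (ideally the sound card).

```python
import math


class SampleClock:
    """Single clock master: announces which sample index range is due."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.next_index = 0
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def tick(self):
        """Announce one block; returns the start index of that block."""
        start = self.next_index
        self.next_index += self.block_size
        for callback in self.subscribers:
            callback(start, self.block_size)
        return start


class SineSource:
    """Renders audio purely from the announced sample index."""

    def __init__(self, freq_hz, sample_rate_hz):
        self.freq_hz = freq_hz
        self.sample_rate_hz = sample_rate_hz
        self.blocks = {}   # start index -> list of samples

    def on_tick(self, start, count):
        step = 2.0 * math.pi * self.freq_hz / self.sample_rate_hz
        self.blocks[start] = [math.sin(step * (start + n))
                              for n in range(count)]


if __name__ == "__main__":
    clock = SampleClock(block_size=4)
    source = SineSource(freq_hz=1000.0, sample_rate_hz=48000.0)
    clock.subscribe(source.on_tick)
    clock.tick()
    clock.tick()
    print(sorted(source.blocks))
```

Since every component derives its output from the sample index and not
from its own notion of time, a late or reordered message costs latency
but does not cause drift.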

On the other hand, one could start with an RTP sender (e.g. payload type
10 or 11, see https://en.wikipedia.org/wiki/RTP_audio_video_profile )
and rely on the system clock as a quick start (both encapsulated in a
single process, accepting timestamped audio and distributing timestamped
clock references). Perhaps vlc can be convinced to send RTP/RTCP based
on the audio input of the same audio interface, that would be a better
time reference (same clock domain).
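
As a rough sketch of the sender side (Python for brevity, the function
name is made up): an RTP packet for payload type 10 or 11 is just a
fixed 12 byte header followed by 16 bit big-endian PCM. I assume here
that the RTP timestamp simply carries the sample index, which is what
makes it usable as a clock reference. A real sender would start from a
random timestamp and sequence number and would also send RTCP.

```python
import struct

RTP_VERSION = 2
PT_L16_STEREO = 10   # L16, 2 channels, 44100 Hz
PT_L16_MONO = 11     # L16, 1 channel, 44100 Hz


def rtp_packet(payload_type, seq, timestamp, ssrc, pcm_samples,
               marker=False):
    """Build one RTP packet carrying signed 16 bit PCM samples.

    For stereo, pcm_samples is expected to be interleaved (left, right)
    and the timestamp counts sample frames, not individual samples.
    """
    byte0 = RTP_VERSION << 6   # no padding, no extension, no CSRC entries
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    payload = struct.pack("!%dh" % len(pcm_samples), *pcm_samples)
    return header + payload


if __name__ == "__main__":
    packet = rtp_packet(PT_L16_MONO, seq=1, timestamp=0,
                        ssrc=0x1234, pcm_samples=[0, 1, -1])
    print(len(packet), packet.hex())
```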

The more pro audio open source variant would probably be to use jackd,
e.g. by writing an Erlang netjack1 client
(https://github.com/jackaudio/jackaudio.github.com/wiki/WalkThrough_User_NetJack).

> 
> All idea are very welcome

That's just a dump of what came to my mind about your idea. I hope it's
useful.

Jacob