Interviewed by Christian Schaller
The PulseAudio sound server: An interview with Arun Raghavan
Published on 25th of January 2012
Dealing with audio involves a complex set of challenges, ranging from latency issues to migrating audio streams between different hardware, such as your internal speakers and an HDMI output. Over the last few years PulseAudio has established itself as the standard audio server for Linux systems, and it is still a thriving project with new features being added on a regular basis. To give you some insight into what is happening around PulseAudio, this time we are talking with Arun Raghavan, who is the leading PulseAudio expert at Collabora.
This is the fourth in a series of interviews we are posting about some of the technologies we work on at Collabora. You can find the previous interviews in this series on our developer interviews page.
The goal of these interviews is to help inform our customers and partners about what is happening with some of the technologies for which Collabora offers support and open source consulting services, and at the same time to let everyone get to know some of the talented people working at Collabora a little better.
Thanks for taking the time to talk with us, Arun. Can you please tell us a little about yourself as an introduction?
Hello! I'm Collabora's first (and currently only) employee from India. I work out of Bangalore and have been with Collabora for 2 years now. Most of my hacking has been around PulseAudio and GStreamer. I also moonlight as a developer on Gentoo Linux, the -Osofast distribution.
And how did you originally get involved with open source software development?
My first Linux installation was around 1998 -- RedHat 5.0 from a magazine. I've been hooked pretty much since, becoming more passionate about the open source way of doing things as time passed. I made a number of minor contributions after switching to Gentoo, but my first "real" open source hacking began in 2007 during the Google Summer of Code, where I wrote a Xesam adapter for the Beagle desktop search tool.
You are Collabora's leading expert on PulseAudio. How did you get involved in PulseAudio development?
For several years, a bunch of people including myself organised a conference called FOSS.IN. While being an organiser meant I could never actually attend talks, I happened to attend a hackfest-of-sorts conducted by Lennart Poettering (the original developer behind PulseAudio), and got very interested in the project. I love tinkering in the lower layers of the stack and as a result ended up getting more involved. This included many small fixes and improvements, and over time some larger features as well.
You did a status report on PulseAudio at the GStreamer Conference in Prague. What were the main items you talked about?
There was some perception from the community that PulseAudio development had reached a sort of stasis, and the thrust of my report at GstConf was to put that concern to rest. We got PulseAudio 1.0 out after a long gap, and it includes a number of interesting new features and improvements -- a DBus control interface, record stream volumes, an echo cancellation module (which Collabora's own Wim Taymans wrote), ports to other OSes, an equalizer plugin, passthrough support (which was written by me at Collabora, for Intel), and various improvements and optimisations. Contributions have been coming in from individuals as well as companies such as Nokia, Intel, Canonical, and others. I myself am given some time from work at Collabora to contribute to PulseAudio, which I think is quite nice.
So all-in-all, the PulseAudio project is very much alive and thriving!
You mention adding passthrough support in PulseAudio. Does that mean that we now have full and easy support for things like AC3 and DTS over S/PDIF or HDMI audio, and MP3 for stereo Bluetooth?
We're almost there. All the core infrastructure for this is there. Support has been added to GStreamer and VLC for using this infrastructure, and some preliminary UI to deal with this is also available. Once we resolve some UI niggles, we will have, in my opinion, the smoothest passthrough experience currently available across all desktop operating systems.
In the early days of the project there was quite a bit of noise and controversy around PulseAudio, but it seems to have mostly died down. Is this a sign that PulseAudio has matured?
The animosity towards PulseAudio in its early days was an unfortunate part of the growing pains of the Linux audio stack, along with some poor choices in distributions. PulseAudio ended up pushing ALSA drivers in ways that other applications had not -- a vocal minority saw this as "OMG you broke my audio!", while others saw this as "Oh, this driver is broken -- it needs to be fixed". As time progressed and drivers were fixed, the first camp shrank considerably. Distributions are also now more clued-in, and the packaging mistakes of the early days are less common. That's not to say PulseAudio didn't have its own set of bugs, but a lot of the initial pain was due to the reasons I mentioned.
While we still do have more than our fair share of haters these days, most people now see the value that PulseAudio provides. For example, we are seeing increasingly greater integration of PulseAudio into GNOME -- it is now a core GNOME 3 dependency, and some more recent features such as echo cancellation and per-record-stream volumes are being used directly by applications such as Empathy.
For a lot of people it is probably not clear what the division of work is between ALSA, PulseAudio and GStreamer. How do you map the field, and what criteria do you use to decide whether something belongs in one or the other?
When you're using PulseAudio with ALSA (specifically alsa-lib, which is the userspace interface), it's most helpful to look at ALSA as the layer that exposes the hardware and that's it. The library can do a lot more (it has its own plugin system), but in our context, it is the interface used to talk to the sound card. In the ideal world, ALSA exposes all the capabilities of the hardware in a standardised way, and PulseAudio handles all the features and policy that are built on top of these. This is not how things always work now, but we are heading towards this sort of split.
The distinction between what belongs to GStreamer and what to PulseAudio is a bit fuzzier, since a lot of features can be implemented in either layer. In most cases, it helps to evaluate where a feature belongs based on how close to the hardware it needs to be. For example, echo cancellation could conceivably be implemented in either GStreamer or PulseAudio, but we chose to implement it in PulseAudio because being closer to the hardware gives us some advantages with regard to accounting for playback and capture delays, which can help make echo cancellation more robust. Moreover, keeping this in PulseAudio also means that clients that do not use GStreamer still get echo cancellation out of the box. On the flip side, and this is my personal view, features that rely on being aware of the codecs belong outside PulseAudio as much as possible.
One of the major challenges in open source is always communication between individual projects, and I guess for PulseAudio close collaboration with projects such as ALSA, GStreamer, BlueZ and so on is critical for success. What would you say about the general level of collaboration between these projects?
I think the cross-project collaboration in PulseAudio is something that works really well. The ALSA folks, particularly Takashi Iwai who has been doing the bulk of the work on ALSA infrastructure, have been very supportive of the requests from the PulseAudio developers. We have a great relationship with the BlueZ hackers, who have been very proactive in shepherding the PulseAudio Bluetooth module. The GStreamer hackers who work on the PulseAudio plugin either work closely with the PulseAudio developers, or are a PulseAudio developer (that's me ;)).
Our community, especially Colin Guthrie who is the PulseAudio maintainer, has spent a significant effort in helping other projects improve their PulseAudio support when required. This includes open-source projects like VLC and proprietary ones such as Skype.
Another interesting thing about some of our contributors, like David Henningson and Pierre-Louis Bossart, is that they work across the stack to deal with problems at a platform level rather than just within the project. This is valuable since it makes sure we solve problems holistically rather than by working around limitations of other layers.
So what are you personally currently working on?
For the last few months, I've been working on improving the echo cancellation module. Most notably, I integrated the excellent echo cancellation engine that was released as part of Google's WebRTC project. More recently, I've been porting PulseAudio to Android and comparing this with AudioFlinger.
Echo cancellation is a killer feature for functional voice chat, and it is one of the things Skype got right. You have mentioned the work done by Wim Taymans and now yourself in this area. What is the exact status and quality of echo cancellation these days?
PulseAudio ships with a couple of echo cancellers today. One is the Speex echo canceller, which also includes some noise suppression and gain control. The recent addition of the WebRTC canceller has significantly improved upon this. The learning time for the canceller is extremely low now, it does analog gain control as well (so we can now adjust your microphone volume based on the input level), and some early support for drift compensation has also been added (allowing you to perform echo cancellation when performing playback and recording through different devices). There's always room for improvement, but in the last year or so, we have laid the foundations for a much better VoIP experience on the Linux desktop.
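To make the idea of echo cancellation concrete, here is a minimal sketch of a textbook normalised least-mean-squares (NLMS) adaptive filter. It is not the Speex or WebRTC algorithm discussed above, and it leaves out noise suppression, gain control and drift compensation entirely. The function names, the tap count and the step size are all assumptions chosen for illustration. The filter learns how the far-end (playback) signal leaks into the microphone and subtracts that estimate.

```python
import random


def simulate_echo(far_end, path):
    """Hypothetical helper: convolve the playback signal with a short echo path."""
    return [
        sum(p * far_end[n - k] for k, p in enumerate(path) if n - k >= 0)
        for n in range(len(far_end))
    ]


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return the mic signal with an adaptive estimate of the echo removed."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        err = d - echo_estimate                      # residual after cancellation
        norm = sum(h * h for h in history) + eps     # normalise by input power
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(far, [0.0, 0.6, 0.3, 0.1])   # echo only, no near-end speech
    cleaned = nlms_echo_cancel(far, mic)
    print("echo energy (last 500 samples):    ", energy(mic[-500:]))
    print("residual energy (last 500 samples):", energy(cleaned[-500:]))
```

The number of samples the filter needs before the residual becomes small is a toy analogue of the "learning time" mentioned above. A real canceller also has to cope with near-end speech, noise and clock drift between devices, which this sketch ignores.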
What can you tell us about your work comparing PulseAudio to AudioFlinger on the Android platform? How do the two compare?
Interestingly enough, for all the flak that PulseAudio gets for being "heavy", we are quite comparable to AudioFlinger in terms of CPU and memory usage.
Where PulseAudio really shines, though, is latency. Our ability to dynamically adjust buffering and arbitrarily rewind streams (i.e. rewrite unconsumed data in the hardware buffer), together with timer-based scheduling, lets us provide fairly low buffering (greater responsiveness) or very high buffering (better power savings) as applications demand.
This means we can currently outdo AudioFlinger in low latency (which a lot of application developers want) as well as in maximising buffering (which will save power).
For those interested I recently wrote a longer blog entry about PulseAudio versus AudioFlinger.
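The trade-off between low latency and power savings described above comes down to simple arithmetic, sketched below. The function names and the example buffer sizes are assumptions for illustration, and the model assumes one wakeup per buffer refill, which is a simplification rather than a description of PulseAudio's actual scheduler.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Time in milliseconds that a buffer of `frames` frames takes to play out."""
    return 1000.0 * frames / sample_rate_hz


def wakeups_per_second(frames, sample_rate_hz):
    """How often the buffer must be refilled, assuming one wakeup per buffer."""
    return sample_rate_hz / frames


if __name__ == "__main__":
    rate = 48000
    for frames in (480, 96000):   # a small and a large buffer (illustrative values)
        print(
            frames, "frames:",
            buffer_latency_ms(frames, rate), "ms of buffering,",
            wakeups_per_second(frames, rate), "wakeups per second",
        )
```

In this model a small buffer gives low latency but wakes the CPU often, while a large buffer lets the CPU sleep longer at the cost of responsiveness. Being able to rewind already-queued data is what makes a large buffer tolerable when something has to change mid-stream.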
What is the roadmap for PulseAudio going forward?
World domination, of course!
We've got a number of features coming from various contributors in the pipeline. Most notably, jack detection should land in the coming months, and hopefully support for ALSA's Use Case Manager too, which should make embedded folks' lives much easier. There are various areas where we can improve, including more love for the RTP parts in our network support and doing better with lower latencies, among other things.
We're pretty much the de-facto standard on desktops. I would love to see more embedded vendors look at what we bring to the table -- major power savings and an extremely flexible framework.
Jack detection? What will that feature do?
Jack detection is a feature that allows us to know when a user plugs something into a jack or unplugs it. For example, PulseAudio currently knows that you have analog output paths from your audio hardware to a 3.5mm port and to the built-in speakers on your laptop, but it doesn't know whether there is anything plugged into the 3.5mm port or not. The decision about which of these outputs audio should be routed to is handled by the ALSA drivers in the kernel.
Passing on this information to PulseAudio and letting it deal with the associated policy is useful. It allows us to maintain separate volumes for your headphones and laptop speakers. It gives us control over things such as jack retasking for hardware where, for example, you can use your jacks for stereo input/output or for surround output over S/PDIF. Such things have been pretty hard to configure so far, but with the work that David Henningson has been doing at Canonical, they should become a lot simpler soon.
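The policy described above, with separate volumes per output and routing that follows the jack state, can be sketched in a few lines. The class, the port names and the default volumes are all hypothetical and exist only for illustration. This is not PulseAudio's implementation or API.

```python
class OutputRouter:
    """Toy routing policy: follow the headphone jack, remember a volume per port."""

    def __init__(self):
        # Hypothetical default volumes in the range 0.0 to 1.0, one per output port.
        self.volumes = {"speakers": 0.8, "headphones": 0.4}
        self.headphones_plugged = False

    @property
    def active_port(self):
        """Headphones take priority whenever something is plugged in."""
        return "headphones" if self.headphones_plugged else "speakers"

    @property
    def volume(self):
        """Volume of whichever port is currently active."""
        return self.volumes[self.active_port]

    def on_jack_event(self, plugged):
        """Called when the jack state changes; routing follows automatically."""
        self.headphones_plugged = plugged

    def set_volume(self, value):
        """Change only the active port's volume, clamped to the range 0.0 to 1.0."""
        self.volumes[self.active_port] = max(0.0, min(1.0, value))


if __name__ == "__main__":
    router = OutputRouter()
    router.set_volume(0.9)          # turn the speakers up
    router.on_jack_event(True)      # plug in headphones
    print(router.active_port, router.volume)
    router.on_jack_event(False)     # unplug again
    print(router.active_port, router.volume)
```

Because the volume is stored per port, plugging in headphones does not inherit the louder speaker setting, and unplugging restores it.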
Which reminds me, there is of course the JACK sound server system that a lot of pro-audio people use. What does the future hold in terms of JACK and PulseAudio? Will they happily co-exist indefinitely, or do you see a time when PulseAudio would be able to meet the requirements that JACK currently meets for its users?
The relationship between JACK and PulseAudio is now pretty much that they serve mutually exclusive feature sets and can coexist on the system. We have a DBus protocol that allows either to reserve the audio hardware, allowing the other to gracefully handle this. So applications with very low latency requirements target JACK, and more conventional applications target PulseAudio.
We do likely have room for improvement with regard to the latency that can be achieved with PulseAudio, but being able to maintain our low-power features is diametrically opposed to achieving very low latencies, so the two sound servers will likely continue to exist side-by-side for a while.
Thank you for your time, Arun. It has been a pleasure talking to you.
No problem at all, it was my pleasure.
Arun works on PulseAudio at Collabora. He is a Gentoo developer and GNOME Foundation member, and can be found in all the usual IRC haunts as Ford_Prefect.