Saturday, May 28, 2011

Start of actual GSoC time and TODO

I got my welcome package (pretty pathetic but nice notebook, pen and sticker again) from Google and GSoC officially started this Monday. This means I should write some kind of list for myself of what I am going to do in the near future and in which order.

The most important missing feature is the channel mask, and adding it will require quite a big commit. There are also some things that work but look a bit strange. For example, the pulse output plugin doesn't set any XMMS_STREAM_TYPE_FMT_CHANNELS in its goal types (a quite recent change), and xmms_stream_type_coerce skips all the output goals that don't have XMMS_STREAM_TYPE_FMT_CHANNELS. Looking at the code, nothing should be selected as the best option, but I'm missing something, because a goal type gets selected anyway. I'll run it through the debugger to find out what I am missing here.

After I've figured out how the output goal type actually gets selected, I should add XMMS_STREAM_TYPE_FMT_CHANNELMASK to the headers and streamtype.c. It should be noted that some places use XMMS_STREAM_TYPE_FMT_CHANNELS but don't have to care about XMMS_STREAM_TYPE_FMT_CHANNELMASK, because they only want to know the number of channels. The two should probably be kept in sync somehow, though.
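One way the two properties could be kept in sync is to treat the mask as authoritative and derive the channel count from it, since every set bit stands for one channel. The sketch below is only an illustration of that idea: `channels_from_mask` and `format_is_consistent` are made-up helper names, not existing XMMS2 functions, and treating a mask of 0 as "no mask given" is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: count the channels described by a channel mask.
 * Each set bit stands for one speaker position. */
static int
channels_from_mask (uint32_t mask)
{
	int count = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		count++;
	}
	return count;
}

/* Hypothetical check: a mask of 0 is assumed to mean "no mask given",
 * which is fine for code that only cares about the channel count. */
static int
format_is_consistent (int channels, uint32_t mask)
{
	return mask == 0 || channels_from_mask (mask) == channels;
}
```

With this kind of check, a format that claims 6 channels but carries a mask with only four bits set would be rejected, while code that never sets a mask keeps working as before.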

I think this should be enough work for a while, and I would really like to have a usable patch pushed upstream before starting to work on any of the other stuff (multichannel wav files, converter). This is the basis of most of the multichannel support anyway.

Sunday, May 15, 2011

Multichannel decoder madness

I think I'm doing something like 12 hour days on this project and multichannel already showed up in my dreams last night, but I can have a rest after a while. Today I worked on AC-3 and DTS decoders, since those are common formats for multichannel audio. Adding AC-3 was pretty straightforward using the avcodec plugin, but there was a small problem: the decoder required full frames while the avcodec plugin occasionally provided partial frames. This was not a big deal, and the avcodec plugin needed cleaning up anyway.

Adding DTS should be quite easy using the avcodec plugin as well, but another problem came up in addition to the partial frame issue. The ffmpeg DTS decoder was buggy and didn't return the number of consumed input bytes correctly. Instead it always returned the input buffer length. The correct frame size value is always at a fixed offset in the DTS header, so parsing it in the avcodec plugin wasn't that big of a deal, although it is hacky.
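For illustration, here is roughly how the frame size can be read from a DTS core header. This is a sketch under assumptions, not the code from the plugin: it only handles the 16-bit big-endian stream variant (sync word 0x7FFE8001), where the 14-bit frame size field starts at bit 46 of the header and stores the size minus one. `dts_frame_size` is a made-up name for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the frame size in bytes, or -1 if buf is too short or does
 * not start with the 16-bit big-endian DTS core sync word. */
static int
dts_frame_size (const uint8_t *buf, size_t len)
{
	if (len < 8)
		return -1;
	if (buf[0] != 0x7F || buf[1] != 0xFE || buf[2] != 0x80 || buf[3] != 0x01)
		return -1;

	/* FSIZE: low 2 bits of byte 5, all of byte 6, high 4 bits of byte 7.
	 * The field stores the frame size minus one. */
	return (((buf[5] & 0x03) << 12) | (buf[6] << 4) | (buf[7] >> 4)) + 1;
}
```

The decoder can then be fed exactly that many bytes, and the same number can be used as the consumed byte count instead of trusting the decoder's return value.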

I also noticed there was a new bug 2437 in Mantis saying that the avcodec plugin doesn't compile with the latest libavcodec version. I found out that libavcodec version 53 has changed the API again and CODEC_TYPE_AUDIO is now AVMEDIA_TYPE_AUDIO. A slightly bigger change was that the function avcodec_decode_audio2 was replaced by avcodec_decode_audio3 with different parameters. This is already the second time avcodec_decode_audio has been replaced during the lifetime of the avcodec plugin, which is pretty good for a function that only takes 5 parameters. Make up your mind already ffmpeg guys... But I wrote the backwards compatibility macros and now the avcodec plugin should compile cleanly with all libavcodec versions again, yay!

The problem when playing the files was that the channel mapping in PulseAudio was wrong. I patched the plugin to mostly use the channel mapping introduced in the last blog post, including some sensible defaults for each channel count. I will write more about those defaults later, but they follow the channel assignment of FLAC quite closely.

So long story short, the XMMS2 avcodec plugin now compiles with all libavcodec versions and supports AC-3 and DTS, and the pulse output plugin handles multichannel audio nicely. Hopefully available upstream soonish.

Saturday, May 14, 2011

Beginning of the GSoC 2011 multichannel project

It's time to start the project again, and since I'm afraid there might be some personal things keeping me busy in the summer, I'm getting a head start. The idea of the project is to get multichannel working WELL in the XMMS2 music player. There are several patches around, but they're partly conflicting and need cleaning up. I will start with the multichannel support. The higher quality resampling that is also part of the project will be handled later.

One of the most difficult issues in multichannel handling is the channel order. As weird as it sounds, there is no standard for channel order and everyone basically uses what they wish. After looking through a lot of options I've come to the conclusion that we should definitely be using the channel order described in the Microsoft documentation, combined with a channel mask. As the Microsoft website already says, "Several external standards define parts of the following master channel layout". The motivation is further explained below.

First let's take a look at most common multichannel audio formats and their 5.1 channel assignments (FL=front left, FR=front right, FC=front center, LFE=low frequency effects, SL=surround/back left, SR=surround/back right):
  1. WAV/MP3/FLAC: FL, FR, FC, LFE, SL, SR
  2. DTS/AAC: FC, FL, FR, SL, SR, LFE
  3. AC3/Vorbis: FL, FC, FR, SL, SR, LFE
  4. AIFF: FL, SL, FC, FR, SR, LFE
The most popular formats for surround files in my own experience are FLAC, AC3 and DTS. The suggested channel layout will work with FLAC 5.1 files without any changes, but for FLAC 4.0 files, for example, the center and LFE channels need to be masked out with a channel mask. The DTS and AC3 channel assignments are not used by any of the outputs I know of, so they require remapping in any case. This should be easy to do in the decoder anyway.
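The remapping in the decoder can be as simple as a per-format lookup table. The following is a minimal sketch for interleaved 16-bit samples, not actual XMMS2 code: the names `dts_to_wav` and `remap_frames` are invented for the example, and the table is derived from the channel assignments listed above.

```c
#include <stddef.h>
#include <stdint.h>

#define CHANNELS 6

/* dts_to_wav[i] is the index in a DTS/AAC-ordered frame
 * (FC, FL, FR, SL, SR, LFE) of the channel that belongs at position i
 * in the WAV/FLAC order (FL, FR, FC, LFE, SL, SR). */
static const int dts_to_wav[CHANNELS] = { 1, 2, 0, 5, 3, 4 };

/* Reorder interleaved samples frame by frame; in and out must not
 * overlap and must each hold frames * CHANNELS samples. */
static void
remap_frames (const int16_t *in, int16_t *out, size_t frames,
              const int map[CHANNELS])
{
	size_t f;
	int c;

	for (f = 0; f < frames; f++) {
		for (c = 0; c < CHANNELS; c++) {
			out[f * CHANNELS + c] = in[f * CHANNELS + map[c]];
		}
	}
}
```

The same function would handle AC3/Vorbis input with the table { 0, 2, 1, 5, 3, 4 }, so supporting another source order only means adding another table.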

Then we need to have a look at the known audio systems. Their default channel assignments for 5.1 channel audio are as follows:
  1. ALSA: FL, FR, SL, SR, FC, LFE
  2. OSS4: FL, FR, FC, LFE, SL, SR
  3. MacOSX: FL, FR, FC, LFE, SL, SR
  4. Win32: FL, FR, FC, LFE, SL, SR
PulseAudio can easily handle all of these configurations and has channel remapping included, so we can leave it out of the comparison. It is quite clear that ALSA is different from all the others, and there are quite good reasons for that. First of all, ALSA, like most outputs, supports a user-configured channel order, so the order can be changed in the ALSA configuration. More importantly, with the ALSA channel mapping it is possible to output 4.0 quadraphonic audio without any channel masks, because the first four channels are the same in 4.0 and 5.1. However, going with the rather non-standard (although de facto standard on Linux) ALSA channel mapping is not worth it if we want to do things cleanly.

If we compare the two lists, audio format channel order and audio output channel order, it is quite easy to notice that the suggested channel order has the most matches in both, and for those we can do without channel masking. It should be noted, though, that this is only true for 5.1 audio. If we want to include the mentioned 4.0 audio, we need support for a channel mask. That means that some channels can be masked out of the audio stream and therefore ignored.
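To make the masking idea concrete, here is a small sketch using the bit values from the Microsoft master channel layout (front left 0x1, front right 0x2, front center 0x4, LFE 0x8, back left 0x10, back right 0x20). The constant and function names are made up for this example and are not XMMS2 API. The function finds where a given speaker sits in the interleaved stream: channels appear in master order, and those whose bit is not set in the mask are simply skipped.

```c
#include <stdint.h>

/* Bit values follow the Microsoft master channel layout;
 * the names are hypothetical. */
enum {
	CH_FRONT_LEFT   = 0x01,
	CH_FRONT_RIGHT  = 0x02,
	CH_FRONT_CENTER = 0x04,
	CH_LFE          = 0x08,
	CH_BACK_LEFT    = 0x10,
	CH_BACK_RIGHT   = 0x20
};

#define MASK_5_1  (CH_FRONT_LEFT | CH_FRONT_RIGHT | CH_FRONT_CENTER | \
                   CH_LFE | CH_BACK_LEFT | CH_BACK_RIGHT)
#define MASK_QUAD (CH_FRONT_LEFT | CH_FRONT_RIGHT | \
                   CH_BACK_LEFT | CH_BACK_RIGHT)

/* Returns the position of speaker inside a stream described by mask,
 * or -1 if speaker is not a single bit or is masked out. */
static int
channel_index (uint32_t mask, uint32_t speaker)
{
	int index = 0;
	uint32_t bit;

	if (speaker == 0 || (speaker & (speaker - 1)) || !(mask & speaker))
		return -1;

	/* Count the channels that come before speaker in master order. */
	for (bit = 1; bit != speaker; bit <<= 1) {
		if (mask & bit)
			index++;
	}
	return index;
}
```

With the 5.1 mask the back left channel is at index 4, while with the quadraphonic mask it moves to index 2 because center and LFE are masked out.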

I hope this is a good enough introduction to multichannel audio. I will get back to it later when I explain how it affects the converter and what should be done there.