From audio file to PCM data - Text-to-speech module

3. Realisation

3.3. Text-to-speech module

3.3.4. From audio file to PCM data

In this section, the information in the audio files is put into an audio structure that can be used with TTSPlay. The component that does this is called TTSDecode.

3.3.4.1. Analysis

One of the simplest audio file formats is the wave format. It contains the uncompressed PCM samples¹² along with some other information. The files generated with Flite on the server use this format. It is easy to put into an audio structure, as all the needed information is already in the wave file. It is, however, not very efficient with regard to space in the file system and bandwidth over the internet. Compressed formats are better in this respect, but they are much more complex and time-consuming to decode, and they are not really needed in this project. Support for other file formats is therefore left as an expansion possibility.

Codecs that might be considered in an expansion include MP3 and Vorbis because of their popularity. Both can compress the audio a lot, but they are not the best choice. Considering that all the audio is speech, a codec optimized for speech would be better. Examples of such codecs are the widely used ITU-T G.711 and G.722 standards. There is also an open source project working on a speech codec called Speex. It supports different sample rates, so the trade-off between quality and bandwidth can be adjusted as needed. Fixed-point ports of it exist, which is good because the PXA 270 processor has no floating-point unit.

Use case 1.3 Decode wave file

Actors

This is step 4 in use case 1.

Pre-conditions

File data is loaded into memory.

Post-conditions

PCM data and the information needed to play it are returned.

Basic Flow

1. File data is received

2. Format and PCM data are returned.

12 The wave file format can also contain compressed audio but here whenever a wave file is mentioned it is assumed to be uncompressed.


Alternative flows

None

Special Requirements

None

Use case relationships

Sub use case of use case 1

3.3.4.2. Design

The wave file format is actually a sub-format of the RIFF format. A RIFF file has a header, and the rest is made of chunks. Each chunk starts with a four-byte id followed by a four-byte size, which gives the size of the chunk in bytes (excluding the bytes used for the id and size). The first chunk is called the header. It has the id “RIFF”. It contains a string that tells what kind of RIFF file it is, followed by the rest of the chunks. The minimum chunks needed in a wave file are the format chunk and the data chunk. They can be recognized by their ids, “fmt ” and “data”. The format chunk contains the sample rate, the number of channels and other information. The data chunk contains the actual samples. There is a rule that the format chunk must come before the data chunk. The chunks must start at an even byte offset. Other than that, there are no rules regarding the layout of the chunks.
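The chunk layout described above can be sketched as a small chunk walker. This is only an illustration, not the TTSDecode code: the names rd_le32 and find_chunk are hypothetical, and the sketch assumes the whole file is already in memory as a byte array. It steps from chunk to chunk using the size field and skips one pad byte after a chunk with an odd size, so that the next chunk starts at an even offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value one byte at a time (no alignment needed). */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find a chunk by its four-byte id in a RIFF file image.
 * Returns the offset of the chunk payload, or -1 if the chunk is not found.
 * *size_out receives the payload size (id and size fields excluded).
 */
static long find_chunk(const uint8_t *file, size_t len,
                       const char id[4], uint32_t *size_out)
{
    assert(file != NULL && size_out != NULL);
    size_t pos = 12;  /* skip "RIFF", the RIFF size and the form type */
    while (pos + 8 <= len) {
        uint32_t size = rd_le32(file + pos + 4);
        if (size > len - pos - 8)
            return -1;  /* chunk claims more bytes than the file holds */
        if (memcmp(file + pos, id, 4) == 0) {
            *size_out = size;
            return (long)(pos + 8);
        }
        /* Advance past id, size and payload; pad odd sizes to an even offset. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return -1;
}
```

Walking the chunks in this way avoids false matches on an id that happens to occur inside another chunk's payload, at the cost of trusting the size fields in the file.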

Illustration 27. Wave file structure

To decode a wave file and validate that it is a wave file the following steps can be taken:

1. Check that the file starts with the string “RIFF”.

2. Search for the string “WAVE”.

3. Search for the string “fmt ”.

4. Read the needed format information.

5. Search for the string “data”.

6. Read the PCM data.
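The six steps can be sketched in C as follows. This is a minimal sketch, not the actual TTSDecode implementation: the wave_info structure, the function names and the negative error codes are assumptions made for illustration, since the layout of the audio structure is not shown here. The field offsets inside the format chunk (format tag at 0, channels at 2, sample rate at 4, bits per sample at 14) follow the standard PCM wave layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; the real audio structure is not shown in the text. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    const uint8_t *pcm;    /* points into the file image, no copy is made */
    uint32_t pcm_size;     /* size of the PCM data in bytes */
} wave_info;

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offset of the first occurrence of a 4-byte tag at or after 'from', or -1. */
static long find_tag(const uint8_t *d, size_t len, size_t from, const char *tag)
{
    for (size_t i = from; i + 4 <= len; i++)
        if (memcmp(d + i, tag, 4) == 0)
            return (long)i;
    return -1;
}

/* Returns 0 on success, or the negated number of the step that failed. */
static int decode_wave(const uint8_t *d, size_t len, wave_info *out)
{
    assert(d != NULL && out != NULL);
    if (len < 12 || memcmp(d, "RIFF", 4) != 0) return -1;        /* step 1 */
    long wave = find_tag(d, len, 4, "WAVE");                     /* step 2 */
    if (wave < 0) return -2;
    long fmt = find_tag(d, len, (size_t)wave + 4, "fmt ");       /* step 3 */
    if (fmt < 0 || (size_t)fmt + 8 + 16 > len) return -3;
    uint32_t fmt_size = le32(d + fmt + 4);
    if (fmt_size < 16 || fmt_size > len - ((size_t)fmt + 8)) return -3;
    const uint8_t *f = d + fmt + 8;                              /* step 4 */
    if (le16(f) != 1) return -4;           /* 1 means uncompressed PCM */
    out->channels = le16(f + 2);
    out->sample_rate = le32(f + 4);
    out->bits_per_sample = le16(f + 14);
    /* Start after the format payload so its bytes cannot match "data". */
    long data = find_tag(d, len, (size_t)fmt + 8 + fmt_size, "data"); /* step 5 */
    if (data < 0 || (size_t)data + 8 > len) return -5;
    out->pcm_size = le32(d + data + 4);                          /* step 6 */
    if (out->pcm_size > len - ((size_t)data + 8)) return -6;
    out->pcm = d + data + 8;
    return 0;
}
```

Returning a different code for each step makes it easier to see in a test which part of the file was rejected.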

3.3.4.3. Implementation

Using the steps from the design and information about the exact structure of the format and data chunks, it is straightforward to decode the file. A few helper functions have been made. One compares a byte pattern with the start of another byte pattern. This can be used to see if a string starts with another string. Another function then uses the first to search for a pattern inside another pattern. Again, this can be used to search for a substring in another string. It is used to find the “RIFF”, “WAVE”, “fmt ” and “data” strings in the file. Prototypes for similar functions already exist in the string.h header file. The problem with the string functions there, such as strstr, is that they only work on null-terminated strings. The file is likely to contain a zero byte somewhere, which makes those functions useless for this purpose.
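The two helper functions could look roughly like the sketch below. The names starts_with and find_pattern and their signatures are assumptions, not the ones used in TTSDecode. The sketch passes explicit lengths instead of relying on a terminating zero, and it uses memcmp for the byte comparison, which is a choice made here for brevity and may differ from the original helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if buf begins with the pat_len bytes of pat. Zero bytes are allowed. */
static int starts_with(const uint8_t *buf, size_t buf_len,
                       const uint8_t *pat, size_t pat_len)
{
    return pat_len <= buf_len && memcmp(buf, pat, pat_len) == 0;
}

/* Offset of the first occurrence of pat in buf, or -1 if it does not occur. */
static long find_pattern(const uint8_t *buf, size_t buf_len,
                         const uint8_t *pat, size_t pat_len)
{
    assert(buf != NULL && pat != NULL);
    for (size_t i = 0; i + pat_len <= buf_len; i++)
        if (starts_with(buf + i, buf_len - i, pat, pat_len))
            return (long)i;
    return -1;
}
```

For a file that begins with the bytes 'R','I','F','F', 0x24, 0, 0, 0, 'W','A','V','E', a strstr search for "WAVE" stops at the first zero byte and finds nothing, while find_pattern keeps going and reports the match.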

Another tricky thing is the alignment of the information. The start of each chunk is guaranteed to be aligned to an even offset. This is, however, not good enough, because the PXA 270 processor can only read 32-bit words that are aligned to 4 bytes. This means that if a 32-bit value is placed with two bytes in one word and two bytes in the next, it cannot be read correctly with a single word read. In this case it is only a problem for 32-bit values. 16-bit values cannot be split across words because of the alignment requirements in the RIFF file format.

To deal with this, a special function has been made that reads the individual bytes of the 32-bit value and puts them together into a 32-bit value. Here it is important to notice that the data in RIFF files is little-endian. This is the same byte order as the PXA 270 processor uses.

int32_T TTSDecodeDecodeAudioFile( bufferT * fileP, audioT * audioP );

The prototype for the function that decodes a file and fills an audio structure.

3.3.4.4. Test

For this test, an array with the bytes from a wave file is hard-coded into the test program. It is the same array that was used when TTSFetch was tested. Another array with the PCM data extracted¹³ from the wave file is also hard-coded into the test code. The array with the PCM data is then compared to the PCM data returned in an audio structure from TTSDecodeDecodeAudioFile. The other information in the audio structure is also validated¹⁴. When the test is run, zero is returned. This means the wave file has been correctly decoded.

13 The data is extracted with a program called wav_fmt_reader. The source code and binary file are available on the attached CD.

14 The other information is also read with wav_fmt_reader.

In document Washing machine user interface for visually impaired (pages 62-66)