
ETSI TS 126 243 V11.0.0 (2012-10)

Technical Specification

Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; ANSI-C code for the fixed-point distributed speech recognition extended advanced front-end (3GPP TS 26.243 version 11.0.0 Release 11)

3GPP TS 26.243 version 11.0.0 Release 11

ETSI TS 126 243 V11.0.0 (2012-10)

Reference
RTS/TSGS-0426243vb00

Keywords
GSM, LTE, UMTS

ETSI
650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret No. 348 623 562 00017 - NAF 742 C. Non-profit association registered at the Sous-Préfecture de Grasse (06) No. 7803/88

Important notice
Individual copies of the present document can be downloaded from: http://www.etsi.org The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media. © European Telecommunications Standards Institute 2012. All rights reserved. DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members. 3GPP™ and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners. GSM and the GSM logo are Trade Marks registered and owned by the GSM Association.


Intellectual Property Rights


IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://ipr.etsi.org). Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become, essential to the present document.

Foreword
This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP). The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables. The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under http://webapp.etsi.org/key/queryform.asp.


Contents
Intellectual Property Rights
Foreword
Foreword
1 Scope
2 References
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 C code structure
4.1 Contents of the C source code
4.2 Program execution
4.3 Code hierarchy
4.5 Variables, constants and tables
4.5.1 Description of constants used in the C-code
4.5.2 Description of fixed tables used in the C-code
4.5.3 Static variables used in the C-code
5 File formats
5.1 Speech file
Annex A (informative): Change history
History


Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:
  x  the first digit:
     1  presented to TSG for information;
     2  presented to TSG for approval;
     3  or greater indicates TSG approved document under change control.
  y  the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.
  z  the third digit is incremented when editorial only changes have been incorporated in the document.


1 Scope

The present document contains an electronic copy of the ANSI-C code for DSR Extended Advanced Front-end. The ANSI-C code is necessary for a bit exact implementation of DSR Extended Advanced Front-end.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

[1] ETSI ES 202 050: "Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithm", Oct 2002.
[2] ETSI ES 202 212: "Distributed Speech Recognition; Extended Advanced Front-end Feature Extraction Algorithm; Compression Algorithm, Back-end Speech Reconstruction Algorithm", Nov 2003.
[3] 3GPP TS 26.177: "Speech Enabled Services (SES); Distributed Speech Recognition (DSR) extended advanced front-end test sequences".

3 Definitions and abbreviations

3.1 Definitions

Definitions of terms used in the present document can be found in [1] and [2].

3.2 Abbreviations

For the purpose of the present document, the following abbreviations apply:

  ANSI    American National Standards Institute
  I/O     Input/Output
  RAM     Random Access Memory
  ROM     Read Only Memory
  AFE     Advanced Front-end
  X-AFE   eXtended Advanced Front-end
  DSR     Distributed Speech Recognition

4 C code structure

This clause gives an overview of the structure of the bit-exact C code and of the contents and organization of the C code attached to this document.

The C code has been verified on the following systems:
- Sun Microsystems workstations and GNU gcc compiler;
- IBM PC compatible computers with Linux operating system and GNU gcc compiler.

ANSI-C was selected as the programming language because portability was desirable.

4.1 Contents of the C source code

The distributed files with suffix "c" contain the source code and the files with suffix "h" are the header files.


Makefiles are provided for the platforms on which the C code has been verified (listed above).

4.2 Program execution

There are separate executables for the FrontEnd and Vector Quantization, with and without Extensions. The command line options are described below. In the option lists, <> encloses the permitted values of an option and () encloses its default value.

FrontEnd w/ Extension:

  USAGE: bin/ExtAdvFrontEnd infile HTK_outfile pitch_outfile class_outfile [options]

  OPTIONS:
    -q                     Quiet Mode (FALSE)
    -F format              Input file format <NIST,HTK,RAW> (NIST)
    -fs freq               Sampling frequency in kHz <8,16> (8)
    -swap                  Change input byte ordering (Native)
    -noh                   No HTK header to output file (FALSE)
    -noc0                  No c0 coefficient to output feature vector (FALSE)
    -nologE                No logE component to output feature vector (FALSE)
    -skip_header_bytes n   Skip header, first n bytes (only for -F RAW)

  The options -noh, -noc0, -nologE and -skip_header_bytes are not used and should not be changed.

FrontEnd w/o Extension:

  USAGE: bin/AdvFrontEnd infile HTK_outfile [options]

  OPTIONS: Same as FrontEnd w/ Extension

Vector Quantization w/ Extension:

  Usage: extcoder htk_file_in pitch_file_in class_file_in bitstream_file_out pitch_file_out txt_file_out -freq x -VAD/No_VAD

    htk_file_in          Input mel-frequency cepstral coefficient file in HTK MFCC format.
    pitch_file_in        Input pitch period file.
    class_file_in        Input classification file.
    bitstream_file_out   Output binary bitstream.
    pitch_file_out       Output quantised pitch period file.
    txt_file_out         Vector quantiser output in text format.
    -freq x              Sampling frequency in kHz (8 or 16).
    -VAD                 Use voice activity detector data. The voice activity input file must have the same name as htk_file_in, but with extension .vad.
    -No_VAD              Do not incorporate voice activity detector information in the output bitstream.

Vector Quantization w/o Extension:

  Usage: coder htk_file_in bitstream_file_out txt_file_out -freq x -VAD/No_VAD

    htk_file_in          Input mel-frequency cepstral coefficient file in HTK MFCC format.
    bitstream_file_out   Binary output bitstream.
    txt_file_out         Vector quantiser output in text format.
    -freq x              Sampling frequency in kHz (8 or 16).
    -VAD                 Use voice activity detector data. The voice activity input file must have the same name as htk_file_in, but with extension .vad.
    -No_VAD              Do not incorporate voice activity detector information in the output bitstream.

File extension descriptions as generated by the sample script:

  .cep     Binary file containing cepstral features in HTK format. Output from the FrontEnd, input to the vector quantizer.
  .pitch   Binary file containing pitch information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
  .class   ASCII file containing class information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
  .bs      Binary file containing the bitstream. Output from the vector quantizer.
  .log     Log files from the different executables.
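The .cep files are described as being in HTK format. As an illustration only, the following sketch parses the 12-byte header that HTK parameter files normally begin with (number of vectors, sample period in 100 ns units, bytes per vector, parameter kind, stored big-endian). The type `HtkHeader` and the function `parse_htk_header()` are hypothetical names introduced here and are not part of the attached C code; whether a given .cep file carries this header also depends on the -noh option.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the 12-byte HTK parameter-file header. */
typedef struct {
    int32_t nSamples;   /* number of feature vectors in the file */
    int32_t sampPeriod; /* frame period in units of 100 ns */
    int16_t sampSize;   /* number of bytes per feature vector */
    int16_t parmKind;   /* HTK parameter kind code */
} HtkHeader;

/* Assemble a big-endian 32-bit value from four bytes. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assemble a big-endian 16-bit value from two bytes. */
static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Returns 0 on success, -1 if an argument is NULL or fewer than 12 bytes are given. */
int parse_htk_header(const unsigned char *buf, size_t len, HtkHeader *out)
{
    if (buf == NULL || out == NULL || len < 12) {
        return -1;
    }
    out->nSamples = (int32_t)be32(buf);
    out->sampPeriod = (int32_t)be32(buf + 4);
    out->sampSize = (int16_t)be16(buf + 8);
    out->parmKind = (int16_t)be16(buf + 10);
    return 0;
}
```

Parsing from a byte buffer rather than reading the struct directly keeps the sketch independent of the host byte order, which is the concern the -swap option addresses for input files.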


4.3 Code hierarchy

Tables 1 to 3 are call graphs that show the functions used for AFE (table 1), VQ (table 2), and Extension (table 3). Each column represents a call level and each cell a function. The functions contain calls to the functions in rightwards neighboring cells. The time order in the call graphs is from the top downwards as the processing of a frame advances. All standard C functions: printf(), fwrite(), etc. have been omitted. Also, no basic operations (add(), L_add(), mac(), etc.) or double precision extended operations (e.g. L_Extract()) appear in the graphs. The basic operations are not counted as extending the depth, therefore the deepest level in this software is level 7.
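The basic operations omitted from the call graphs (add(), L_add(), mac(), etc.) are fixed-point primitives; in the ETSI/ITU-T basic-operator style such additions saturate instead of wrapping. The sketch below shows the idea under that assumption. The names `sat_add16()` and `sat_add32()` are hypothetical stand-ins, and the fixed-width types from `<stdint.h>` are used for brevity; the attached code may define these operations differently.

```c
#include <stdint.h>

typedef int16_t Word16; /* assumed mapping for this sketch */
typedef int32_t Word32; /* assumed mapping for this sketch */

/* Hypothetical stand-in for a saturating 16-bit addition. */
Word16 sat_add16(Word16 a, Word16 b)
{
    Word32 sum = (Word32)a + (Word32)b; /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) {
        return INT16_MAX;
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;
    }
    return (Word16)sum;
}

/* Hypothetical stand-in for a saturating 32-bit addition. */
Word32 sat_add32(Word32 a, Word32 b)
{
    int64_t sum = (int64_t)a + (int64_t)b; /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (Word32)sum;
}
```

Computing the sum in a wider type and clamping afterwards avoids signed overflow, which is undefined behaviour in C.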


Table 1: AFE call structure


main() AdvProcessInit_B() DoNoiseSupInit_B() DoWaveProcInit_B() DoCompCepsInit_B() DoPostProcInit_B() DoVADInit_F() Do16kProcInit_B() QMF_FIR_Init_B() fir_initialization_B() DP_HP_filters_B() BufIn32Alloc() AdvProcessAlloc_B() DoNoiseSupAlloc_B() DoWaveProcAlloc_B() DoCompCepsAlloc_B() DoPostProcAlloc_B() DoVADAlloc_F() Do16kProcAlloc_B() FlushAdvProcess_B() DoVADFlush_F() CvFeatInt2Float() AdvProcessDelete_B() DoNoiseSupDelete_B() DoWaveProcDelete_B() DoCompCepsDelete_B() DoPostProcDelete_B() DoVADDelete_B() BufIn32Free() DoAdvProcess_B() Do16kProcessing_B() DoNoiseSup_B() Get16k_p_bufferData16k_B() Get16k_bufData16kSize_B() Get16k_p_BandsForCoding16k_B() Get16k_p_CodeForBands16k_B() Get16k_dataHP_B() VAD_F() Log_2() DoSigWindowing16_F1() DoSigWindowing16_F2() ff4NRFix32_B() GetL15() GetH15() Mult16x32() Add_Mult16x16_16() Sub_Mult16x16_16() Permut() FFTtoPSD_F() Square24d2_B() Square24_B() Get16k_BFC_dec_B() GetBandsForCoding16k_B() PSDMean_F() NoiseEstimation_F1() Sqrt_2() Sqrt16_2() NoiseEstimation_F2() Sqrt_2() Sqrt16_2() FilterCalc_F() SpeechQVar() FilterBank16() SpeechQSpec() SpeechQMel() DoGainFact_F1() Log_2() DoGainFact_F2() Log_2() DoMelIDCT_F16() ApplyWF() Get16k_dec1() Get16k_dec2() Get16k_dec3() DoSigWindowing16_F3() ff4NRFix32_B() GetL15() GetH15() Mult16x32() Add_Mult16x16_16() Sub_Mult16x16_16() Permut() FFTtoPSD_F() Square24d2_B() Square24_B() DoMelFB_B() CodeBands16k_B() DoSpecSub16k_B() Log_2() UpDateDecal() ApplyDecal() DCOffsetFil_F() Get16k_hpBandsSize_B()


Get16k_p_hpBands_B() Get16k_p_bufferCodeForBands16k_B() Get16k_p_CodeForBands16k_B() Get16k_p_bufferCodeWeights_B() Get16k_p_codeWeights_B() Set16k_hpBands_dec_B() DoWaveProc_B() TeagerEng() GetTeagerFilter() GetMaximaPositions() DoCompCeps_B() CepsCompute() Get16k_p_bufferCodeWeights_B() Get16k_p_bufferCodeForBands16k_B() PreEmphHamm() ff4NB16_B() GetBandsForDecoding16k_B() DecodeBands16k_B() FilterBank() Get16k_hpBands_dec_B() Get16k_p_hpBands_B() MergeSSandCoded_B() CorrectEnergy_B() CosInv16Khz() cosInv() (only for 8kHz) DoPostProc_B() DoVADProc_F() focalpoint()

Table 2: VQ call structure


main() quantize_and_print() get_best_dataframe() best_centroid() quant_pitch_abs() get_class_bit() quant_pitch_diff() get_class_bit() mfcc_crc_encode() pc_crc_encode()


Table 3: Extension call structure


main() RVC_ConstructPitchRom_be() RVC_ConstructPitchMeter_be() Allocate_InterpolatedDft_be() RVC_ResetPitchMeter_be() RVC_DestructPitchRom_be() RVC_DestructPitchMeter_be() Deallocate_InterpolatedDft_be() DoAdvProcess_B() DoPitchExtract() FilterBank() dsr_afe_vad() get_vm() fnLog2() IsLowBandNoise() get_zcm() pre_process() iir_d() iir_s() RVC_MeasurePitch_be() ClearPitch_be() DirichletInterpolation_be() IsLowLevelInput_be() Finalize_be() IsContinuousPitch_be() Mpy_lw_sw() Mpy_lw_sw() PrepareSpectralPeaks_be() CalcSpectrum_be() Mpy_lw_sw() Mpy_lw_sw_Add() FindPeaks_be() Prelim_ScaleDownAmpsOfHighFreqPeaks_be() qsort_be()* swap() CompareIpointAmp_be() RefineSpectralPeaks_be() sqrt_l_fix() Final_ScaleDownAmpsOfHighFreqPeaks_be() Mpy_lw_sw() FindPitchCandidates_be() NormalizeAmplitudes_be() CalcUtilityFunction_be() CreatePieceWiseConstantFunction_be() L_Extract() Mpy_32_16() qsort_be()* swap() Compare_ARRAY_OF_XPOINTS_be() LinkArrayOfPoints_be() AddSortedArrayOfPoints_be() LinkArrayOfPoints_be() ConvertLinkedListOfDiffPointsToUtilFunc_be() FindDominantLocalMaximaInUtilityFunction_be() Mpy_lw_sw() UtilityFunctionAtGivenPitchFreq_be() qsort_be()* swap() ComparePitchFreqAscending_be() SelectTopPitchCandidates_be() Mpy_lw_sw() compute_pcorr_be() interpolate_be() Mpy_lw_sw() Mpy_lw_lw() sqrt_l_fix() find_most_energetic_window_be() accumulate_be() find_most_energetic_window2_be() Mpy_lw_sw() SelectFinalPitch_be() qsort_be()* swap() ComparePitchFreqDescending_be() ClearPitch_be() GOOD_ENOUGH_be() CLOSELY_LOCATED_be() Mpy_lw_sw() BETTER_be() IsContinuousPitch_be() Mpy_lw_sw() CalculateDoubleWindowDft_be() classify_frame()

* qsort_be() is a recursive function
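In Table 3, qsort_be() always appears together with swap() and a comparison function. A minimal sketch of that pattern, a recursive quicksort driven by a caller-supplied comparator, is given below. It is not the qsort_be() of the attached code: the names `quicksort_sketch()`, `swap_w16()` and `compare_ascending()`, the element type and the pivot choice are all assumptions made for illustration.

```c
typedef short Word16; /* assumed 16-bit element type for this sketch */

/* Comparator: negative, zero or positive, as for the C library qsort(). */
typedef int (*CompareFn)(const Word16 *a, const Word16 *b);

static void swap_w16(Word16 *a, Word16 *b)
{
    Word16 t = *a;
    *a = *b;
    *b = t;
}

/* Recursive quicksort over v[lo..hi], both bounds inclusive. */
void quicksort_sketch(Word16 *v, int lo, int hi, CompareFn cmp)
{
    int i, last;

    if (lo >= hi) {
        return; /* zero or one element: already sorted */
    }
    swap_w16(&v[lo], &v[lo + (hi - lo) / 2]); /* move pivot to v[lo] */
    last = lo;
    for (i = lo + 1; i <= hi; i++) {
        if (cmp(&v[i], &v[lo]) < 0) {
            last++;
            swap_w16(&v[last], &v[i]); /* grow the "less than pivot" part */
        }
    }
    swap_w16(&v[lo], &v[last]); /* put pivot in its final position */
    quicksort_sketch(v, lo, last - 1, cmp);
    quicksort_sketch(v, last + 1, hi, cmp);
}

/* Example comparator giving ascending order. */
int compare_ascending(const Word16 *a, const Word16 *b)
{
    return (*a > *b) - (*a < *b);
}
```

Passing the comparator as a parameter is what lets one sorting routine serve several orderings, as the different Compare...() functions listed next to qsort_be() suggest.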


4.5 Variables, constants and tables

The data types of variables and tables used in the fixed point implementation are signed integers in 2's complement representation, defined by:

  Word16   16 bit variable;
  Word32   32 bit variable.
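One way to express these two types in ANSI-C is sketched below. The mapping to `short` and `int` is an assumption that holds on common 32-bit and 64-bit platforms; the attached code may choose other base types. The two array typedefs act as compile-time size checks, and the hypothetical helper `from_bits16()` illustrates the 2's complement interpretation of a 16-bit pattern.

```c
#include <limits.h>

/* Assumed mapping; the attached code may pick different base types. */
typedef short Word16;
typedef int Word32;

/* ANSI-C compile-time checks: the array size is -1 (an error) if a width is wrong. */
typedef char word16_is_16_bits[(sizeof(Word16) * CHAR_BIT == 16) ? 1 : -1];
typedef char word32_is_32_bits[(sizeof(Word32) * CHAR_BIT == 32) ? 1 : -1];

/* Interpret the low 16 bits of 'bits' as a 2's complement signed value. */
Word16 from_bits16(unsigned long bits)
{
    long v = (long)(bits & 0xFFFFUL);
    if (v >= 32768L) {
        v -= 65536L; /* sign bit set: subtract 2^16 */
    }
    return (Word16)v;
}
```

For example, the pattern 0xFFFF corresponds to -1 and 0x8000 to -32768, the most negative Word16 value.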


4.5.1 Description of constants used in the C-code

Table 5a: Global constants for AFE

Constant                          Value        Description
NS_SPEC_ORDER_16K                 64           Noise suppression array length
NS_HANGOVER_16K                   15           Noise suppression hangover count
NS_MIN_SPEECH_FRAME_HANGOVER_16K  4            Noise suppression minimum speech frame hangover count
NS_ANALYSIS_WINDOW_16K            80           Noise suppression analysis window
PERC_CODED                        0.7          Lambda merge (empirically set constant)
LAMBDA_NSE16k                     0.99         Noise estimation lambda
NS_NB_FRAME_THRESHOLD_NSE         100          Noise suppression number of frame threshold used for NSE
LENGTH_QMF                        118          QMF filter length
f24                               1            Multiplier for QMF filter coefficients
SHFF_H                            8            Shift to get higher value
L_H                               16           Shift to get lower value
HP16k_MEL_USED                    3            Higher frequency band Mel used
NB_LP_BANDS_CODING                3            Lower frequency band used in coding
NE16k_FRAMES_THRESH               100          Noise estimation frames threshold
NB_TOPOSTPROC                     12           Number of coefficients to postprocess
CEP_FRAME_LENGTH                  200          Frame length for cepstral coefficients
CEP_NB_COEF                       13           Number of cepstral coefficients (including c0)
CEP_NB_CHANNELS                   23           Number of filters used for cepstral coefficients
CEP_FFT_LENGTH                    256          FFT length for cepstral coefficients
FRAME_BUF_SIZE                    241          Denoised output buffer size
FRAME_SHIFT                       80           WaveProcessing input frame shift
FRAME_LENGTH                      200          WaveProcessing frame size
NS_SPEC_ORDER                     65           Noise suppression array length (8 kHz)
NS_BUFFER_SIZE                    180          Noise suppression past frame size
NS_FRAME_SHIFT                    80           Noise suppression input frame shift
NS_HALF_FILTER_LENGTH             8            Noise suppression filter half size
NS_NB_FRAME_THRESHOLD_LTE         10           Noise suppression long term energy forgetting factor threshold (in frames)
NS_NB_FRAME_THRESHOLD_NSE         100          Noise suppression spectrum estimate forgetting factor threshold (in frames)
NS_MIN_FRAME                      10           Number of frame threshold to update average energy for noise suppression VAD
NS_FFT_LENGTH                     256          FFT length for noise suppression
WF_MEL_ORDER                      25           Noise suppression Wiener filter order
SHFT_NOISE                        14           Shift applied to noise spectrum estimate
SHFT_FACT_MUL                     14           Shift applied to gain coefficient (noise suppression gain factorization)
IDCT_ORDER                        25           Noise suppression IDCT order
NS_BETA                           0.98         Noiseless signal suppression factor
NS_RSB_MIN                        0.079432823  Minimum a priori SNR
NS_LAMBDA_NSE                     0.99         Forgetting factor for noise spectrum estimate
NS_LOG_SPEC_FLOOR                 -10.0        Average energy minimum threshold
NS_SNR_THRESHOLD_VAD              15           SNR threshold for noise suppression VAD
NS_SNR_THRESHOLD_UPD_LTE          20           Long term energy update threshold for noise suppression VAD
NS_ENERGY_FLOOR                   80           Energy minimum threshold for noise suppression VAD
MaxPos                            10           Maximum number of maxima in waveprocessing
WP_EPS                            0.2          Weighting value added or subtracted for waveprocessing

Table 5b: Global constants for VQ

Constant             Value     Description
MIN_PERIOD           1245184   Minimum pitch period allowed
MAX_PERIOD           9175040   Maximum pitch period allowed
NUM_MULTI_LEVELS_1   26        Number of levels in pitch quantization
NUM_MULTI_LEVELS_2   24        Number of levels in pitch quantization
UNVOICED_CODE        0         Init value for Qpindex

Table 5c: Global constants for Extension

Constant                            Value        Description
HISTORY_LEN                         100          History length - past samples for pitch extraction
DOWN_SAMP_FACTOR                    4            Down-sampling factor - used in computing correlation
NO_OF_DFT_POINTS                    128          Number of DFT points
BREAK_POINT                         12           Break point - marks the end of low frequency band
LBN_HIST_WEIGHT                     32440        Low band noise history weight
LBN_CURR_WEIGHT                     328          Low band noise current weight (32768 - LBN_HIST_WEIGHT)
LBN_MAX_THR                         124518       Low band noise maximum threshold
LBN_LOW_ENR_LEVEL_MANT              32000        Low band noise low energy level mantissa
LBN_LOW_ENR_LEVEL_SHFT              22           Low band noise low energy level shift
RVC_OK                              0            Return code for success
RVC_ERR                             -1           Return code for unspecified error
RVC_ERR_NOT_ENOUGH_MEMORY           -2           Return code for not enough memory
RVC_ERR_ILLEGAL_ARGUMENT            -3           Return code for an illegal input / output argument
RVC_ERR_IO_FAILED                   -4           Return code for failed input / output to a file
RVC_ERR_BAD_FILE_FORMAT             -5           Return code for a bad file header
RVC_ERR_NOT_INITIALIZED             -6           Return code for failure due to improper initialization
RVC_ERR_ILLEGAL_USAGE               -7           Return code for illegal usage of a function
RVC_ERR_NOT_ENOUGH_SAMPLES          -8           Return code for insufficient number of samples
RVC_ERR_NOT_IMPLEMENTED             -9           Return code for an unimplemented function
RVC_ERR_FAIL_OPEN_FILE              -10          Return code for failure to open a file
UB_ENRG_FRAC                        59           Upper band energy fraction
ZCM_THLD                            87           Zero crossing measure threshold
SQRT_ONE_HALF                       0x5A82       Square root of 0.5 (0.707)
FRAME_LEN_DS                        50           Frame length downsampled (200/4)
FRAME_LEN_DS_BY_2                   25           Frame length downsampled divided by 2
HISTORY_LEN_DS                      25           History length downsampled (100/4)
WINDOW_LENGTH                       18           Window length used in computing correlation
INV_WINDOW_LENGTH                   1820         Inverse of window length (1/18 = 0.05556)
NUM_CHAN                            23           Number of channels or Mel-frequency bands
MIN_CH_ENRG_MANTISSA                20000        Minimum channel energy mantissa
MIN_CH_ENRG_SHIFT                   25           Minimum channel energy shift
INIT_SIG_ENRG_MANTISSA              30518        Initial signal energy mantissa
INIT_SIG_ENRG_SHIFT                 8            Initial signal energy shift
CE_SM_FAC                           18022        Channel energy smoothing factor
CE_SM_FAC_COMPL                     14746        Channel energy smoothing factor complement
CNE_SM_FAC                          3277         Channel noise energy smoothing factor
CNE_SM_FAC_COMPL                    29491        Channel noise energy smoothing factor complement
LO_GAMMA                            22938        Low gamma value
LO_GAMMA_COMPL                      9830         Low gamma value complement
HI_GAMMA                            29491        High gamma value
HI_GAMMA_COMPL                      3277         High gamma value complement
LO_BETA                             31130        Low beta value
HI_BETA                             32702        High beta value
INIT_FRAMES                         10           Initial number of frames (considered to be noise frames)
SINE_START_CHAN                     4            Sine start channel (for sine wave detection)
PEAK_TO_AVE_THLD                    10           Peak to average threshold
DEV_THLD                            1523942      Deviation threshold
HYSTER_CNT_THLD                     9            Hysteresis count threshold
F_UPDATE_CNT_THLD                   500          Forced update count threshold
NON_SPEECH_THLD                     32           Non-speech threshold
FIX_34                              24576        (short) (32768.0 * 3.0/4.0)
FIX_18                              4096         (short) (32768.0 * 1.0/8.0)
FIX_INVSQRT2                        -23170       1 / sqrt(2)
swTHIRD_REF_BANDWIDTH               85           One third of the reference bandwidth
swTWO_THIRDS_REF_BANDWIDTH          171          Two thirds of the reference bandwidth
MIN_ENERGY_MANTISSA                 25600        Minimum energy mantissa
MIN_ENERGY_SHIFT                    18           Minimum energy shift
swREF_SAMPLE_RATE_Q0                0x1F40       Reference sampling rate in Q0 format
swCLOSE_FACTOR_Q14                  0x4CCD       Closeness factor in Q14 format
swFD_SCORE_THLD1_Q15                0x63D7       Frequency domain score threshold 1 in Q15 format
swFD_SCORE_THLD2_Q15                0x570A       Frequency domain score threshold 2 in Q15 format
swCORR_THLD_Q15                     0x651F       Correlation threshold in Q15 format
swSUM_THLD_Q14                      0x6667       Sum threshold in Q14 format
lwCRIT0_OFFSET_Q15                  0x0000170A   Offset for finding a better pitch candidate in Q15 format
swCANDCORR_THLD1_Q15                0x799A       Pitch candidate correlation threshold 1 in Q15 format
swCANDCORR_THLD2_Q15                0x599A       Pitch candidate correlation threshold 2 in Q15 format
swCANDCORR_THLD3_Q15                0x6CCD       Pitch candidate correlation threshold 3 in Q15 format
swCANDAMP_THLD3_Q15                 0x68F6       Pitch candidate amplitude threshold 3 in Q15 format
swSTARTFREQ_COEFF                   0x553F       Start frequency coefficient (for candidate search)
swENDFREQ_COEFF                     0x4666       End frequency coefficient (for candidate search)
DIRICHLET_KERNEL_SPAN               8            Dirichlet kernel span (for interpolation)
REF_SAMPLE_RATE                     8000         Reference sampling rate
REF_BANDWIDTH                       4000         Reference bandwidth
lwTHIRD_REF_BANDWIDTH               87381333     One third of the reference bandwidth
lwTWO_THIRDS_REF_BANDWIDTH          174762667    Two thirds of the reference bandwidth
swCENTER_WEIGHT                     0x5000       Center weight
swSIDE_WEIGHT                       0x1800       Side weight
swAMP_SCALE_DOWN1                   0x5333       Amplitude scale down factor 1
swAMP_SCALE_DOWN2                   0x399A       Amplitude scale down factor 2
swAMP_SCALE_DOWN2b                  0x7333       Amplitude scale down factor 2b
swUDIST1                            -4160        Utility function distance 1
swUDIST2                            -6400        Utility function distance 2
swUSTEP                             -16384       Utility function step
swFREQ_MARGIN1                      0x4AE1       Frequency margin 1
swAMP_MARGIN1                       0x07AE       Amplitude margin 1
swAMP_MARGIN2                       0x07AE       Amplitude margin 2
MIN_STABLE_FRAMES                   6            Minimum number of stable frames
MAX_TRACK_GAP_FRAMES                2            Maximum pitch track gap frames
swSTABLE_FREQ_UPPER_MARGIN          0x4E14       Stable frequency upper margin
swSTABLE_FREQ_LOWER_MARGIN          0x68EB       Stable frequency lower margin
UNVOICED                            0            Pitch frequency of an unvoiced frame
lwMAX_PITCH_FREQ                    0x01A40000L  Maximum pitch frequency
lwMIN_PITCH_FREQ                    0x00340000L  Minimum pitch frequency
MAX_PITCH_FREQ                      420          Maximum pitch frequency in Hz
MIN_PITCH_FREQ                      52           Minimum pitch frequency in Hz
HIGHPASS_CUTOFF_FREQ                300          Highpass cut-off frequency in Hz
NO_OF_FRACS                         77           Number of fractions in the fractions table
lwSHORT_WIN_START_FREQ              0x00C80000L  Short window start frequency
lwSHORT_WIN_END_FREQ                0x01A40000   Short window end frequency
lwSINGLE_WIN_START_FREQ             0x00640000L  Single window start frequency
lwSINGLE_WIN_END_FREQ               0x00D20000L  Single window end frequency
lwDOUBLE_WIN_START_FREQ             0x00340000   Double window start frequency
lwDOUBLE_WIN_END_FREQ               0x00780000L  Double window end frequency
MAX_LOCAL_MAXIMA_ON_SPECTRUM        70           Maximum number of local maxima on the spectrum
MAX_PEAKS_FOR_SORT                  30           Maximum number of peaks for sorting
MAX_PEAKS_PRELIM                    7            Maximum number of peaks (preliminary)
MIN_PEAKS                           7            Minimum number of peaks
MAX_PEAKS_FINAL                     20           Maximum number of peaks (final)
MAX_PRELIM_CANDS                    4            Maximum number of preliminary candidates (pitch)
CREATE_PIECEWISE_FUNC_LOOP_LIM_SH   20           Create Piecewise function loop limit for short window
CREATE_PIECEWISE_FUNC_LOOP_LIM_SNG  30           Create Piecewise function loop limit for single window
CREATE_PIECEWISE_FUNC_LOOP_LIM_DBL  60           Create Piecewise function loop limit for double window
swSUM_FRACTION                      0x799A       Sum fraction
swAMP_FRACTION                      0x33F8       Amplitude fraction
MAX_BEST_CANDS                      2            Maximum number of best candidates (pitch)
N_OF_BEST_CANDS_SHORT               2            Number of best candidates for short window
N_OF_BEST_CANDS_SINGLE              2            Number of best candidates for single window
N_OF_BEST_CANDS_DOUBLE              2            Number of best candidates for double window
N_OF_BEST_CANDS                     6            Number of best candidates for all windows
SIZE_SCRATCH_DOPITCH                1090         Scratch memory size for DoPitch() function (This is the actual size required. The declared size in C simulation is 1632)
SIZE_SCRATCH_ADVPROCESS             825          Scratch memory size for DoAdvProcess() function (This is the actual size required. The declared size in C simulation is 1100)
RVC_PITCH_ROM_SIG                   11031        Signature for RVC_PITCH_ROM structure
RVC_PITCH_METER_SIG                 21053        Signature for RVC_PITCH_METER structure
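Several Table 5c entries are fractions stored as scaled integers. The sketch below converts a few of them back to real numbers so the descriptions can be checked against the values. The Q15 reading (value / 2^15) follows the table's own notes such as "(short) (32768.0 * 3.0/4.0)"; the Q16 reading of the lw... frequencies is an inference from 0x01A40000 being 420 * 65536 and is not stated in the table. The helper names are hypothetical.

```c
#include <stdint.h>

/* Values copied from Table 5c. */
#define SQRT_ONE_HALF     0x5A82
#define FIX_34            24576
#define INV_WINDOW_LENGTH 1820
#define lwMAX_PITCH_FREQ  0x01A40000L

/* Interpret a 16-bit integer as Q15: value / 2^15. */
double q15_to_double(int16_t x)
{
    return (double)x / 32768.0;
}

/* Interpret a 32-bit integer as Q16: value / 2^16 (assumed format, see lead-in). */
double q16_to_double(int32_t x)
{
    return (double)x / 65536.0;
}
```

With these helpers, FIX_34 maps to exactly 0.75, SQRT_ONE_HALF to about 0.7071, INV_WINDOW_LENGTH to about 0.0555, and lwMAX_PITCH_FREQ to 420.0, which matches MAX_PITCH_FREQ in Hz.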


4.5.2 Description of fixed tables used in the C-code

This section contains a listing of all fixed tables sorted by source file name and table name. All table data is declared as Word16.

Table 6a: Fixed tables for AFE

Source files covered by this table: 16kHzProcessing_B.c, PostProc_B.c, ComCeps_B.c, ff4nrFix16_B.c, MathFunc.c, ExtNoiseSup_B.c

Table name          Length  Description
table_pow2          33      Table for square root
LambdaNSEx2         100     Table used to compute first 100 LambdaNSE
dp02_h              59      MSB of QMF filter coefficients
dp02_l              43      LSB of QMF filter coefficients
targetLMS16         12      Target for blind equalization
HalfHamming16       100     Hamming window coefficients
CosMatrix16         144     Inverse cosine coefficients at 8 kHz (not used at 16 kHz)
CosMatrix16_16khz   156     Inverse cosine coefficients at 16 kHz
pondMelFilter       309     Mel bank coefficients
tabSin              64      Sine table
tabCos              64      Cosine table
tbInt0              48      Coefficients for computation of square root
lambda_1divX        20      Computation of 1/N
Hann_sh32_hi        100     MSB of Hanning window coefficients (32 bits)
Hann_sh32_lo        100     LSB of Hanning window coefficients (32 bits)
Hann_sh24_hi        100     MSB of Hanning window coefficients (24 bits)
Hann_sh24_lo        100     LSB of Hanning window coefficients (24 bits)
pondMelFilterNoise  157     Mel-frequency scale coefficients (applied to the Wiener filter)
idctMel16           234     Mel-warped inverse DCT coefficients
pondMelFilter16k    134     Filter bank coefficients at 16 kHz
M1_LamdaLTE         8       Computation of 1/N
M1_LambdaNSEx2      100     Computation of 2/N
M1_LamdaNSE         9       Computation of 1/N
mInvLambda16        10      Computation of 2/N
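Because all table data is declared as Word16, wider coefficients are listed as separate MSB and LSB tables (for example dp02_h and dp02_l). The sketch below shows one common way to split a 32-bit value into two 16-bit parts and recombine it, following the double-precision layout used by the widely distributed ETSI routines of the L_Extract() kind (value = hi * 2^16 + lo * 2). Whether the tables in Table 6a use exactly this layout is not stated in the present document, so treat it as an assumption; `split_hi_lo()` and `combine_hi_lo()` are hypothetical names.

```c
#include <stdint.h>

typedef int16_t Word16; /* assumed mapping for this sketch */
typedef int32_t Word32; /* assumed mapping for this sketch */

/*
 * Split x so that x is approximately hi * 65536 + lo * 2, with lo in 0..32767.
 * Relies on arithmetic right shift of negative values, which mainstream
 * compilers provide but the C standard leaves implementation-defined.
 */
void split_hi_lo(Word32 x, Word16 *hi, Word16 *lo)
{
    *hi = (Word16)(x >> 16);
    *lo = (Word16)((x >> 1) - (Word32)(*hi) * 32768);
}

/* Recombine the two parts; bit 0 of the original value is not preserved. */
Word32 combine_hi_lo(Word16 hi, Word16 lo)
{
    return (Word32)hi * 65536 + (Word32)lo * 2;
}
```

The low part is kept non-negative and holds 15 bits, so it fits a signed 16-bit word without disturbing the sign carried by the high part.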

Table 6b: Fixed tables for VQ

File: coder_VAD.c

Table name                 Length  Description
quantizer16kHz_0_1         128     vq table
quantizer16kHz_2_3         128     vq table
quantizer16kHz_4_5         128     vq table
quantizer16kHz_6_7         128     vq table
quantizer16kHz_8_9         128     vq table
quantizer16kHz_10_11       64      vq table
quantizer16kHz_12_13       512     vq table
quantizer8kHz_0_1          128     vq table
quantizer8kHz_2_3          128     vq table
quantizer8kHz_4_5          128     vq table
quantizer8kHz_6_7          128     vq table
quantizer8kHz_8_9          128     vq table
quantizer8kHz_10_11        64      vq table
quantizer8kHz_12_13        512     vq table
weight16kHz_c0_shift       1       vq weights
weight16kHz_c0_norm        1       vq weights
weight16kHz_logE           1       vq weights
weight8kHz_c0_shift        1       vq weights
weight8kHz_c0_norm         1       vq weights
weight8kHz_logE            1       vq weights
plwQuantLevels[127]        127*2   vq tables for pitch/class quantization
ppplwQuantSections[8][3]   24*2    vq tables for pitch/class quantization
plwQuantLevels[31]         31*2    vq tables for pitch/class quantization
pplwQuantSections[4][3]    12*2    vq tables for pitch/class quantization
pswRatioThld_1[4][6]       24      vq tables for pitch/class quantization
piMultiLevelIndex[4]       4       vq tables for pitch/class quantization
pswRatioThld_2[4][8]       32      vq tables for pitch/class quantization
piMultiLevelIndex_2[4]     4       vq tables for pitch/class quantization
swAlpha1                   1       pitch/class constants
swAlpha2                   1       pitch/class constants


Table 6c: Fixed tables for Extension

File                Table name               Length  Description
ExtNoiseSup_B.c     pswPePower               129     Coefficients to compute the pre-emphasis power spectrum
preProc_B.c         pswHpfCoef               15      High pass filter coefficients
preProc_B.c         pswLpfCoef               15      Low pass filter coefficients
preProc_B.c         pswLfeCoef               3       Low frequency emphasis filter coefficients
dsrAfeVad_B.c       piBurstConst             20      Burst length constants for different SNR's
dsrAfeVad_B.c       piHangConst              20      Hang length constants for different SNR's
dsrAfeVad_B.c       piVADThld                20      VAD voice metric thresholds for different SNR's
dsrAfeVad_B.c       piVMTable                90      Voice metric table as a function of SNR index
dsrAfeVad_B.c       piSigThld                20      Signal threshold table as a function of SNR
dsrAfeVad_B.c       piUpdateThld             20      Update threshold table as a function of SNR
dsrAfeVad_B.c       pswShapeTable            23      Spectral shape correction table
fix_mathlib.c       coeff_sqrt5_58           5       Coefficients for computation of square root
fix_mathlib.c       coeff_sqrt5_78           5       Coefficients for computation of square root
rvc_pitch_init_B.h  ROM_astFrac              312     Fractions table
rvc_pitch_init_B.h  ROM_pstWindowshiftTable  514     Complex exponents table for time shifting in frequency domain
rvc_pitch_init_B.h  ROM_aswDirichletImag     8       Imaginary part of the Dirichlet kernel

4.5.3 Static variables used in the C-code

This section presents three tables that specify the static variables for the AFE, VQ, and Extension, respectively.


Table 7a: AFE static variables

Struct Name       Variable                        Type[Length]  Description
QMF_FIR           lengthQMF                       Word32        QMF filter length
                  *dp_l                           Word16        QMF filter low frequency coefficients
                  *dp_h                           Word16        QMF filter high frequency coefficients
                  *T                              Word16        Temporary QMF filter buffer
                  T_dec                           Word16        Multiplier for T
DataFor16kProc_B  FrameLength                     Word32        Input frame length
                  FrameShift                      Word32        Shift value for the frame
                  numFramesInBuffer               Word32        Number of frames in buffer
                  SamplingFrequency               Word32        Sampling frequency (8/16)
                  Do16kHzProc                     BOOLEAN       Flag to enable 16 kHz processing
                  *hpBands_B                      Word32        Buffer for HP bands
                  hpBandsSize                     Word32        hpBands_B buffer size
                  CodeForBands16k_B               Word32[9]     HP coding buffer
                  bufferCodeForBands16k_B         Word32[27]    Buffer used for HP coding
                  codeWeights_B                   Word16[3]     Code weights buffer
                  bufferCodeWeights_B             Word16[9]     Buffer used for code weights
                  *pQMF_Fir                       QMF_FIR       Pointer to QMF_FIR structure
                  *bufferData16k_B                Word32        Temporary buffer to carry QMF LP data
                  bufData16kSize                  Word32        16k data buffer size
                  *FirstWindow16k                 MelFB_Window  Pointer to MelFB_Window structure
                  noiseSE16k_B                    Word32[3]     Noise spectral energy variable
                  noise_dec                       Word16        Multiplier for noiseSE16k_B
                  BandsForCoding16k_B             Word32[9]     Buffer for storing bands for coding
                  vadCounter16k                   Word32        VAD flag counter
                  vad16k                          Word32        VAD flag
                  nbSpeechFrames16k               Word32        Number of speech frames counter
                  hangOver16k                     Word32        Hangover used for VAD
                  meanEn16k                       Word32        Mean energy variable
                  nb_frame_threshold_nse          Word32        Threshold NSE for frame
                  lambda_nse                      Word16        Lambda NSE variable
                  *dataHP_B                       Word32        Buffer that stores QMF HP value
                  dec_16k                         Word16[5]     Multiplier for dataHP_B buffer
                  BFC_dec                         Word16[1]     Multiplier for computing bands for coding
                  fb16k_dec                       Word16[3]     Buffer used to store multiplier for current and previous two frames
PostProcStructX   weightLMS                       Word32[12]    Current LMS weight
CompCepsStructX   FFTLength                       Word32        FFT size
                  Do16khzProc                     Word16        Flag to enable 16 kHz processing
                  *pData16k                       Word32        Pointer to data for 16 kHz processing
WaveProcStructX   *TeagerFilter16                 Word32        Pointer to Teager filter
                  *TeagerWindow32                 Word32        Pointer to Teager window
                  TeagerOnset                     Word32        Unused
                  FrameLength                     Word32        Input frame length
ns_var_F          SampFreq                        Word16        Sampling frequency (8/16)
                  Do16khzProc                     Word16        Flag to enable 16 kHz processing
                  buffers.nbFramesInFirstStage    Word32        Number of frames in first stage
                  buffers.nbFramesInFirstStage    Word32        Number of frames in second stage
                  buffers.nbFramesOutSecondStage  Word32        Number of frames out of second stage
                  buffers.FirstStageIn16Buffer    Word16[180]   First stage buffer
                  buffers.SecondStageInBuffer32   Word32[180]   Second stage buffer
                  buffers.SecondDecalSig          Word16[4]     Shift factor for each sub-frame of second stage buffer
                  prevSamples32.lastSampleIn32    Word32        Last input sample of DC offset compensation
                  prevSamples32.lastDCOut32       Word32        Last output sample of DC offset compensation
                  prevSamples32.oldShift          Word16        Previous window shift factor of DC offset compensation
                  spectrum.indexBuffer1           Word16        Where to enter new PSD for first stage, alternately 0 and 1
                  spectrum.indexBuffer2           Word16        Where to enter new PSD for second stage, alternately 0 and 1
                  spectrum.noiseSE1_32            Word32[65]    Noise spectrum estimate for first stage
                  spectrum.noiseSE1_dec           Word16[65]    Shift factor for noise spectrum estimate (first stage)
                  spectrum.noiseSE2_32            Word32[65]    Noise spectrum estimate for second stage
                  spectrum.noiseSE2_dec           Word16[65]    Shift factor for noise spectrum estimate (second stage)
                  spectrum.PSDMeanAntBuffer1      Word32[65]    1st stage PSD mean buffer for previous frame
                  spectrum.nSigSE1Ant_dec         Word16[65]    Shift factor for PSD mean buffer for previous frame (1st stage)
                  spectrum.PSDMeanAntBuffer2      Word32[65]    2nd stage PSD mean buffer for previous frame
                  spectrum.nSigSE2Ant_dec         Word16[65]    Shift factor for PSD mean buffer for previous frame (2nd stage)
                  spectrum.denSigSE1_32           Word32[65]    1st stage PSD mean buffer
                  spectrum.nSigSE1Cur_dec         Word16[65]    Shift factor for PSD mean buffer (1st stage)
                  spectrum.denSigSE2_32           Word32[65]    2nd stage PSD mean buffer
                  spectrum.nSigSE2Cur_dec         Word16[65]    Shift factor for PSD mean buffer (2nd stage)
                  vad_data_ns_F.nbFrame           Word16[2]     Number of frames (for the 2 stages)
                  vad_data_ns_F.flagVAD           Word16        VAD flag (1 = SPEECH, 0 = NON SPEECH)
                  vad_data_ns_F.hangOver          Word16        Hangover
                  vad_data_ns_F.nbSpeechFrames    Word16        Number of speech frames (used to set hangover)
                  vad_data_ns_F.meanEn32          Word32        Mean energy for VAD
                  vad_data_ca.flagVAD             Word16        VAD flag (1 = SPEECH, 0 = NON SPEECH)
                  vad_data_ca.hangOver            Word16        Hangover
                  vad_data_ca.nbSpeechFrames      Word16        Number of speech frames (used to set hangover)
                  vad_data_ca.meanEn32            Word32        Mean energy for VAD
                  vad_data_fd.MelMean             Word16        SpeechQMel (for frame dropping)
                  vad_data_fd.VarMean             Word32        SpeechQVar (for frame dropping)


vad_data_fd.AccTest vad_data_fd.AccTest2 vad_data_fd.SpecMean vad_data_fd.MelValues vad_data_fd.SpecValues vad_data_fd.SpeechInVADQ vad_data_fd.SpeechInVADQ2 gainFact.logDenEn1_32 gainFact.lowSNRtrack32 gainFact.alfaGF16 VADStructX_F Focus HangOver FlushFocus H_CountDown V_CountDown **OutBuffer *OutBuffer OutBuffer
Word16 Word16 Word16 Word16 Word16 Word32 Word32[7] Word16[7x15] Word32 Word32 Word32 Word16[2] Word32 Word16 Word16 Word32[3] Word32 Word16



SpeechQSpec (for frame dropping) SpecMean (for frame dropping) SpeechQMel (for frame dropping) SpeechQSpec (for frame dropping) Flag (for frame dropping) Flag (for frame dropping) Denoise frame energy for gain factorization Low SNR level for gain factorization Wiener filter gain factorization coefficient Position of circular buffer Hangover length Position in circular buffer when emptying at end Main hangover countdown Short hangover countdown outBuffer pointer pointer outBuffer pointer outBuffer

Table 7b: VQ static variables

Struct Name  Variable         Type[Length]  Description
coder_VAD.c  four_frames[27]  Word16[27]    Previous frames used to build multiframe
             plwQPHistory[3]  Word32[3]     History of pitch
             IReliableFlag    Word16        Pitch reliability flag


Table 7c: Extension static variables

Struct Name     Variable                  Type[Length]  Description
                iFirstFrameFlag           Word16        First frame flag
                pswUBSpeech               Word16[200]   Upper band speech
                pswDownSampledProcSpeech  Word16[75]    Down-sampled processed speech
                lwCritMax                 Word32        Maximum power ratio
                iOldPitchPeriod           Word16        Old pitch period value
                iOldFrameNo               Word16        Old frame number
PCORR_STATE_be  s_be
                lwX1_X1                   Word32        X1*X1
                lwZ1_Z1                   Word32        Z1*Z1
                lwZ2_Z2                   Word32        Z2*Z2
                lwX1_Z1                   Word32        X1*Z1
                lwX1_Z2                   Word32        X1*Z2
                lwZ1_Z2                   Word32        Z1*Z2
                swX1_Sum                  Word16        Sum of X1
                swZ1_Sum                  Word16        Sum of Z1
                swZ2_Sum                  Word16        Sum of Z2
                iBurstConst               Word16        Burst constant
                iBurstCount               Word16        Burst count
                iHangConst                Word16        Hang constant
                iHangCount                Word16        Hang count
                iVADThld                  Word16        VAD threshold
                iFrameCount               Word16        Frame count
                iFUpdateFlag              Word16        Forced update flag
                iHysterCount              Word16        Hysteresis count
                iLastUpdateCount          Word16        Last update count
                iSigThld                  Word16        Signal threshold
                iUpdateCount              Word16        Update count
                iChanEnrgShift            Word16        Channel energy shift
                iChanNoiseEnrgShift       Word16        Channel noise energy shift
                pswChanEnrg               Word16[23]    Channel energy
                pswChanNoiseEnrg          Word16[23]    Channel noise energy
                swBeta                    Word16        Beta value
                swSnr                     Word16        SNR value
NormSw          pnsLogSpecEnrgLong
                swMantissa                Word16[23]    Mantissa
                iShift                    Word16[23]    Shift
                swC0                      Word16        C0 value
                swC1                      Word16        C1 value
                swC2                      Word16        C2 value
                pswHpfXState              Word16[6]     High pass filter input state
                pswHpfYState              Word16[12]    High pass filter output state
                pswLpfXState              Word16[6]     Low pass filter input state
                pswLpfYState              Word16[12]    Low pass filter output state
                pswLfeXState              Word16        Low frequency emphasis filter input state
                pswLfeYState              Word16[2]     Low frequency emphasis filter output state
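Several static variables in this clause are stored as a value together with a shift factor, for example swMantissa and iShift under NormSw. The sketch below shows one possible mantissa/shift representation of a positive 32-bit value. The typedefs for Word16 and Word32, the field layout of the struct, the normalization convention, and the helper names norm_pack and norm_unpack are all assumptions made for illustration; the reference code may use a different convention.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16; /* assumed mapping of the Word16 type */
typedef int32_t Word32; /* assumed mapping of the Word32 type */

/* Hypothetical layout: one mantissa and one shift count per value. */
typedef struct {
    Word16 swMantissa;
    Word16 iShift;
} NormSw;

/* Normalize x so that bit 30 is set, keep the upper 16 bits as mantissa
 * and record the number of left shifts applied.
 * ASSUMED convention: x is approximately (swMantissa << 16) >> iShift.
 * This sketch handles positive inputs only; others map to {0, 0}. */
static NormSw norm_pack(Word32 x)
{
    NormSw n = {0, 0};

    if (x <= 0) {
        return n;
    }
    while (x < 0x40000000) {
        x <<= 1;
        n.iShift++;
    }
    n.swMantissa = (Word16)(x >> 16);
    return n;
}

/* Inverse of norm_pack, up to the precision lost in the low 16 bits. */
static Word32 norm_unpack(NormSw n)
{
    assert(n.swMantissa >= 0 && n.iShift >= 0 && n.iShift < 31);
    return ((Word32)n.swMantissa << 16) >> n.iShift;
}
```

Keeping the shift separately lets a 16-bit mantissa retain its most significant bits over a wide dynamic range, at the cost of the low-order bits that are discarded during packing.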


5 File formats

This section describes the file formats used by the AFE, VQ & Extension programs.

5.1 Speech file

Speech files read by the X-AFE and written by the Extension consist of 16-bit words. The byte order depends on the host architecture (e.g. MSByte first on SUN workstations, LSByte first on PCs, etc.).
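Because the byte order is host dependent, a reader that has to process files produced on a different architecture can assemble each sample from its two bytes explicitly. The following is a minimal sketch of that idea and is not part of the reference code; the function names decode_sample and read_speech and the msb_first parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build one signed 16-bit sample from two raw bytes.
 * msb_first != 0: most significant byte is stored first;
 * msb_first == 0: least significant byte is stored first. */
static int16_t decode_sample(const unsigned char *b, int msb_first)
{
    unsigned int u;

    assert(b != NULL);
    u = msb_first ? (((unsigned int)b[0] << 8) | b[1])
                  : (((unsigned int)b[1] << 8) | b[0]);

    /* Convert the 16-bit pattern to a signed value without relying on
     * implementation-defined narrowing conversions. */
    return (u & 0x8000u) ? (int16_t)((int)u - 65536) : (int16_t)u;
}

/* Read up to max_samples samples from fp into out.
 * Returns the number of complete samples read. */
static size_t read_speech(FILE *fp, int16_t *out, size_t max_samples,
                          int msb_first)
{
    unsigned char raw[2];
    size_t n = 0;

    assert(fp != NULL && out != NULL);
    while (n < max_samples && fread(raw, 1, 2, fp) == 2) {
        out[n++] = decode_sample(raw, msb_first);
    }
    return n;
}
```

A caller would open the speech file in binary mode and pass the byte order in which the file was written, independent of the byte order of the machine doing the reading.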


Annex A (informative): Change history


Change history

Date     TSG #  TSG Doc.   CR   Rev  Subject/Comment                                                                            Old     New
2004-06  24     SP-040343            Version 6.0.0 approved at 3GPP TSG SA#24                                                   2.0.0   6.0.0
2004-12  26     SP-040837  001  1    Software bug correction: Removal of Basicops simulation of "C" shift operator              6.0.0   6.1.0
2004-12  26     SP-040837  002  1    Software bug correction: Initialization of the variables lwc and i2aScale                  6.0.0   6.1.0
2004-12  26     SP-040837  003  1    Software bug correction: Wrong assignment of the variables *piReliableFlag and *pcQPIndex  6.0.0   6.1.0
2004-12  26     SP-040837  004  2    Software bug correction: Use of incorrect variable fRefPeriod instead of iRefPeriod        6.0.0   6.1.0
2004-12  26     SP-040837  005       Add reference to test sequences document                                                   6.0.0   6.1.0
2007-06  26                          Version for Release 7                                                                      6.1.0   7.0.0
2008-12  42                          Version for Release 8                                                                      7.0.0   8.0.0
2009-12  46                          Version for Release 9                                                                      8.0.0   9.0.0
2011-03  51                          Version for Release 10                                                                     9.0.0   10.0.0
2012-09  57                          Version for Release 11                                                                     10.0.0  11.0.0


History
Document history
V11.0.0 October 2012 Publication
