Feature Extraction

March 23, 2019

Julius extracts features from waveform input. The feature extraction module is designed to be compatible with HTK. Supported base types are "FBANK" (log-scale filterbank parameters), "MELSPEC" (mel-scale filterbank parameters) and "MFCC" (mel-frequency cepstral coefficients). Supported qualifiers are _E, _N, _D, _A, _Z and _0.
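As an illustration of how a parameter kind string combines a base type with qualifiers, the following minimal sketch splits a string such as "MFCC_E_D_N_Z" and checks it against the lists given above. The function name `parse_parameter_kind` is hypothetical and is not part of Julius or HTK; it only mirrors the naming scheme described in the text.

```python
# Base types and qualifiers exactly as listed in the paragraph above.
BASE_TYPES = {"FBANK", "MELSPEC", "MFCC"}
QUALIFIERS = {"E", "N", "D", "A", "Z", "0"}


def parse_parameter_kind(kind):
    """Split a string such as 'MFCC_E_D_N_Z' into (base, [qualifiers]).

    Hypothetical helper for illustration only, not Julius code.
    """
    base, *quals = kind.upper().split("_")
    if base not in BASE_TYPES:
        raise ValueError(f"unsupported base type: {base}")
    unknown = [q for q in quals if q not in QUALIFIERS]
    if unknown:
        raise ValueError(f"unsupported qualifier(s): {unknown}")
    return base, quals


print(parse_parameter_kind("MFCC_E_D_N_Z"))  # ('MFCC', ['E', 'D', 'N', 'Z'])
```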

Configurations

The feature extraction parameters must match the conditions under which the acoustic model was trained. The following table lists all the feature extraction parameters, with their default values and the option used to set each one. Note that HTK and Julius have different default values for some parameters, so set them carefully to match the feature extraction conditions. Also note that the "parameter kind" (TARGETKIND in HTK) and the "number of cepstral parameters" (NUMCEPS in HTK) are detected from the header information of the acoustic HMM at run time, so unlike in HTK, you do not need to set them manually.

| HTK Option | Description | HTK default | Julius default | Option |
|---|---|---|---|---|
| TARGETKIND | Parameter kind | ANON | MFCC | - (auto-set from HMM header) |
| NUMCEPS | Number of cepstral parameters | 12 | - | - (auto-set from HMM header) |
| SOURCERATE | Sample rate of source waveform in 100ns units | 0.0 | 625 | "-smpPeriod value" |
| TARGETRATE | Sample rate of target vector (= window shift) in 100ns units | 0.0 | 160 | "-fshift samples" (*) |
| WINDOWSIZE | Analysis window size in 100ns units | 256000.0 | 400 | "-fsize samples" (*) |
| ZMEANSOURCE | Zero mean source waveform before analysis (frame-wise) | F | F | "-zmeanframe" to enable, "-nozmeanframe" to disable |
| PREEMCOEF | Set pre-emphasis coefficient | 0.97 | 0.97 | "-preemph value" |
| USEHAMMING | Use a Hamming window | T | T | - (Fixed) |
| NUMCHANS | Number of filterbank channels | 20 | 24 | "-fbank value" |
| CEPLIFTER | Cepstral liftering coefficient | 22 | 22 | "-ceplif value" |
| DELTAWINDOW | Delta window size in frames | 2 | 2 | "-delwin value" |
| ACCWINDOW | Acceleration window size in frames | 2 | 2 | "-accwin value" |
| LOFREQ | Low frequency cut-off in fbank analysis | -1.0 | -1.0 | "-lofreq value", or -1 to disable |
| HIFREQ | High frequency cut-off in fbank analysis | -1.0 | -1.0 | "-hifreq value", or -1 to disable |
| RAWENERGY | Use raw energy | T | F | "-rawe" / "-norawe" |
| ENORMALISE | Normalise log energy | T | F | "-enormal" / "-noenormal" (**) |
| ESCALE | Scale log energy | 0.1 | 1.0 | "-escale value" |
| SILFLOOR | Energy silence floor in dB | 50.0 | 50.0 | "-silfloor value" |

(*) samples = HTK value (in 100ns units) / smpPeriod

(**) Log energy normalisation should not be used with live input, in either training or recognition (see sec. 5.9 "Direct Audio Input/Output" in HTKBook).
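The conversion in footnote (*) can be worked through as a small sketch. The example values 100000 (a 10 ms frame shift) and 250000 (a 25 ms window) are assumptions chosen for illustration and do not come from the table; the function name is hypothetical. How Julius treats a quotient that is not a whole number is not stated here, so the sketch simply returns the raw quotient.

```python
def htk_to_samples(htk_value_100ns, smp_period_100ns=625):
    """Convert an HTK duration (in 100 ns units) to a number of samples.

    smp_period_100ns is the sample period in 100 ns units; 625
    corresponds to 16 kHz audio (one sample lasts 62.5 microseconds).
    Illustrative helper only, not part of Julius.
    """
    return htk_value_100ns / smp_period_100ns


# Assumed example values: 10 ms shift and 25 ms window at 16 kHz.
print(htk_to_samples(100000))  # 160.0 -> use "-fshift 160"
print(htk_to_samples(250000))  # 400.0 -> use "-fsize 400"
```

With these assumed inputs the results coincide with the Julius defaults for frame shift and window size shown in the table.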

Reading HTK Config file

Instead of setting options directly, Julius can read an HTK format config file given with the option -htkconf. When it is specified, the parameters in the given HTK config file are translated to the corresponding Julius values as the file is read. During this translation, any value not explicitly specified in the HTK config file is assumed to take HTK's default value.
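The "fill in HTK defaults for anything the file omits" behaviour can be sketched as follows. This is not Julius's actual parser: the function `read_htk_config` is hypothetical, the config syntax handled (`NAME = VALUE` lines, `#` comments, an optional `MODULE:` prefix) is an assumption about typical HTK config files, and the sample file contents are invented for the example. The defaults are copied from the "HTK default" column of the table above.

```python
# HTK defaults from the table above, kept as strings for simplicity.
HTK_DEFAULTS = {
    "TARGETKIND": "ANON", "NUMCEPS": "12", "SOURCERATE": "0.0",
    "TARGETRATE": "0.0", "WINDOWSIZE": "256000.0", "ZMEANSOURCE": "F",
    "PREEMCOEF": "0.97", "USEHAMMING": "T", "NUMCHANS": "20",
    "CEPLIFTER": "22", "DELTAWINDOW": "2", "ACCWINDOW": "2",
    "LOFREQ": "-1.0", "HIFREQ": "-1.0", "RAWENERGY": "T",
    "ENORMALISE": "T", "ESCALE": "0.1", "SILFLOOR": "50.0",
}


def read_htk_config(text):
    """Return HTK parameters, using HTK defaults for anything not set.

    Hypothetical sketch; assumes 'NAME = VALUE' lines with '#' comments.
    """
    params = dict(HTK_DEFAULTS)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.split(":")[-1].strip().upper()  # drop "MODULE:" prefix
        params[name] = value.strip()
    return params


# Invented example file contents.
config_text = """
# training-time settings (example only)
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
NUMCHANS   = 24
"""
params = read_htk_config(config_text)
print(params["NUMCHANS"])    # 24, taken from the file
print(params["ENORMALISE"])  # T, HTK default because the file omits it
```

The point of the sketch is the second lookup: a parameter missing from the config file ends up with the HTK default (T for ENORMALISE), not the Julius default (F).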

Parameter embedding

Struggling to set the exact feature parameters for an acoustic model? Since feature extraction parameters are determined entirely by the acoustic model, it is natural for the acoustic model to carry full information about the feature types and parameters it requires. For this reason the tool mkbinhmm, which converts an HTK definition file in ASCII format to a binary HMM, can embed feature extraction parameters into the header of the output binary HMM file.

Specify feature parameters to mkbinhmm in the same way as for Julius, using direct options or an HTK config file as described above, and it will embed the resulting parameters in the header of the output binary HMM. When the binary HMM file is used in Julius, the parameter settings in its header are read into Julius.

When Value Conflicts

There may be cases where an HTK config file and direct options specify different values, or where the values embedded in the given HMM differ from the specified ones. The rule is that "explicit values supersede implicit values". More precisely, the priority used to resolve a conflict is as follows, highest first:

  1. Direct option values
  2. HTK Config values given by -htkconf
  3. Embedded values inside binary HMM

The values of direct options always supersede the others, and HTK config values supersede the embedded values. Note that this rule is applied regardless of option order. This differs from other Julius parameters, for which a later option always supersedes an earlier one.
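The priority rule can be sketched as a layered lookup. This is a minimal illustration under stated assumptions, not Julius's implementation: the function `resolve`, the dictionary keys and the example values are all hypothetical. Each dictionary holds only the values that source supplies; an HTK config file may also contribute HTK defaults for values it omits, in which case those would belong in the `htkconf` mapping too.

```python
from collections import ChainMap


def resolve(direct, htkconf, embedded):
    """Merge parameter sources; earlier mappings take priority.

    Priority: direct options > -htkconf values > values embedded in the
    binary HMM. The order in which sources were given is irrelevant.
    """
    return dict(ChainMap(direct, htkconf, embedded))


# Hypothetical example values for three parameters.
embedded = {"fbank": 24, "preemph": 0.97, "escale": 1.0}
htkconf = {"fbank": 20, "escale": 0.1}
direct = {"fbank": 26}

resolved = resolve(direct, htkconf, embedded)
print(resolved["fbank"])    # 26, the direct option wins
print(resolved["escale"])   # 0.1, htkconf beats the embedded value
print(resolved["preemph"])  # 0.97, only the embedded value exists
```

`ChainMap` returns the first match while searching its mappings from left to right, which is why the argument order encodes the priority.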