JavaScript (NodeJS / ElectronJS)
Model
-
class Model(aModelPath, aBeamWidth)
An object providing an interface to a trained DeepSpeech model.
- Arguments
aModelPath (string) – The path to the frozen model graph.
aBeamWidth (number) – The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
- Throws
on error
-
Model.createStream()
Create a new streaming inference state. The streaming state returned by this function can then be passed to Model.feedAudioContent() and Model.finishStream().
- Throws
on error
- Returns
object – an opaque object that represents the streaming state.
-
Model.enableDecoderWithLM(aLMPath, aTriePath, aLMAlpha, aLMBeta)
Enable decoding using beam scoring with a KenLM language model.
- Arguments
aLMPath (string) – The path to the language model binary file.
aTriePath (string) – The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
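Because enableDecoderWithLM() reports failure through its return value rather than by throwing, a caller may want to turn a non-zero status into an exception. The sketch below is one way to do that. It assumes `model` is an instance created with `new Model(aModelPath, aBeamWidth)`. The helper name `enableLanguageModel` and the field names of `options` are hypothetical and not part of the library.

```javascript
// Hypothetical helper (not part of the library): enable the KenLM-backed
// decoder and fail loudly when the documented status code is non-zero.
function enableLanguageModel(model, options) {
  const { lmPath, triePath, lmAlpha, lmBeta } = options;
  const status = model.enableDecoderWithLM(lmPath, triePath, lmAlpha, lmBeta);
  if (status !== 0) {
    // The reference only says "non-zero on failure (invalid arguments)".
    throw new Error(`enableDecoderWithLM failed with status ${status}`);
  }
  return model;
}
```

Returning the model lets the call be chained directly after construction.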
-
Model.feedAudioContent(aSctx, aBuffer, aBufferSize)
Feed audio samples to an ongoing streaming inference.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
aBuffer (buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in aBuffer.
-
Model.finishStream(aSctx)
Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT result. This method frees the streaming state (aSctx).
-
Model.finishStreamWithMetadata(aSctx)
Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
- Arguments
aSctx (object) – A streaming state pointer returned by Model.createStream().
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). This method frees the state pointer (aSctx).
-
Model.intermediateDecode(aSctx)
Compute the intermediate decoding of an ongoing streaming inference.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT intermediate result.
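The streaming methods above are meant to be used together: create a state, feed it audio in chunks, optionally peek at the partial transcript, then finish. The following is a minimal sketch of that flow. It assumes `model` exposes the methods with the argument lists documented here (check them against your installed binding) and that each chunk is a Node.js Buffer of 16-bit mono samples. The helper name `transcribeChunks` and the `onPartial` callback are hypothetical.

```javascript
// Hypothetical helper (not part of the library): stream PCM chunks through
// a model and report partial transcripts along the way.
function transcribeChunks(model, chunks, onPartial) {
  const stream = model.createStream(); // documented to throw on error
  for (const chunk of chunks) {
    // 16-bit samples occupy 2 bytes each, so samples = bytes / 2.
    model.feedAudioContent(stream, chunk, chunk.length / 2);
    if (onPartial) {
      onPartial(model.intermediateDecode(stream));
    }
  }
  // finishStream() returns the final text and frees the streaming state,
  // so `stream` must not be reused after this call.
  return model.finishStream(stream);
}
```

Calling intermediateDecode() after every chunk is only for illustration; in practice it may be worth decoding less often, since each call performs a decode.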
-
Model.sampleRate()
Return the sample rate expected by the model.
- Returns
number – Sample rate.
-
Model.stt(aBuffer, aBufferSize)
Use the DeepSpeech model to perform Speech-To-Text.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in the audio signal.
- Returns
string – The STT result. Returns undefined on error.
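Since aBufferSize counts samples rather than bytes, and stt() signals failure by returning undefined, a thin wrapper can guard against both pitfalls. The sketch below assumes the audio is held in a Node.js Buffer of 16-bit mono samples and that `model` follows the two-argument signature documented here. The helper names `transcribe` and `durationSeconds` are hypothetical.

```javascript
// Hypothetical helper (not part of the library): one-shot transcription of
// a Buffer holding 16-bit mono PCM samples.
function transcribe(model, pcm) {
  if (pcm.length % 2 !== 0) {
    throw new RangeError('expected whole 16-bit samples (even byte length)');
  }
  const sampleCount = pcm.length / 2; // 2 bytes per 16-bit sample
  const text = model.stt(pcm, sampleCount);
  if (text === undefined) {
    throw new Error('stt() returned undefined (inference failed)');
  }
  return text;
}

// Length of the audio in seconds at the sample rate the model expects.
function durationSeconds(model, pcm) {
  return pcm.length / 2 / model.sampleRate();
}
```

The duration helper is only meaningful if the audio was actually recorded at the rate reported by sampleRate().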
-
Model.sttWithMetadata(aBuffer, aBufferSize)
Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in the audio signal.
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
Module exported methods
-
FreeModel(model)
Free the resources associated with a model and destroy the model object.
- Arguments
model (object) – A model pointer returned by Model()
-
FreeStream(stream)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Arguments
stream (Object) – A streaming state pointer returned by
Model.createStream().
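A typical use of FreeStream() is abandoning a transcription part-way through, for example when a user cancels a recording. The sketch below illustrates the idea under a few assumptions: `ds` stands for the module object that exports FreeStream(), `model` follows the streaming signatures documented above, and `shouldCancel` is a caller-supplied predicate. The helper name `transcribeUnlessCancelled` is hypothetical.

```javascript
// Hypothetical helper (not part of the library): feed chunks until the
// caller asks to stop, discarding the stream without decoding on cancel.
function transcribeUnlessCancelled(ds, model, chunks, shouldCancel) {
  const stream = model.createStream();
  for (const chunk of chunks) {
    if (shouldCancel()) {
      ds.FreeStream(stream); // skip the costly decode, just release the state
      return null;
    }
    model.feedAudioContent(stream, chunk, chunk.length / 2);
  }
  // Normal completion: finishStream() decodes and frees the state itself.
  return model.finishStream(stream);
}
```

Either exit path releases the streaming state exactly once, which is the property to preserve if this is adapted.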
-
FreeMetadata(metadata)
Free memory allocated for metadata information.
- Arguments
metadata (object) – Object containing metadata as returned by Model.sttWithMetadata() or Model.finishStreamWithMetadata()
-
printVersions()
Print the version of this library and of the linked TensorFlow library to standard output.
Metadata
-
class Metadata()
Stores the entire CTC output as an array of character metadata objects.
-
Metadata.confidence()
Approximate confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
- Returns
float – Confidence value
-
Metadata.items()
List of items
- Returns
array – List of MetadataItem() objects
-
Metadata.num_items()
Size of the list of items
- Returns
int – Number of items
-
MetadataItem
-
class MetadataItem()
Stores each individual character, along with its timing information.
-
MetadataItem.character()
The character generated for transcription
- Returns
string – The character generated
-
MetadataItem.start_time()
Position of the character in seconds
- Returns
float – The position of the character
-
MetadataItem.timestep()
Position of the character in units of 20 ms
- Returns
int – The position of the character
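Per-character metadata is often more useful once grouped into words with a start time each. The sketch below shows one way to do that. It assumes the items are iterable in order and that a space character separates words. This reference lists `items`, `character` and `start_time` as methods, but a binding may expose them as plain properties, so the hypothetical `read` helper accepts both. The helper name `metadataToWords` is also hypothetical.

```javascript
// Read a field that may be exposed either as a method or as a plain property.
function read(obj, name) {
  const value = obj[name];
  return typeof value === 'function' ? value.call(obj) : value;
}

// Hypothetical helper (not part of the library): group per-character
// metadata into words, keeping the start time of each word's first character.
function metadataToWords(metadata) {
  const words = [];
  let current = null;
  for (const item of read(metadata, 'items')) {
    const ch = read(item, 'character');
    if (ch === ' ') {
      current = null; // a space closes the word in progress
      continue;
    }
    if (current === null) {
      current = { word: '', startTime: read(item, 'start_time') };
      words.push(current);
    }
    current.word += ch;
  }
  return words;
}
```

The helper only reads the metadata; the caller is still responsible for passing the object to FreeMetadata() afterwards.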
-