JavaScript (NodeJS / ElectronJS)¶
Model¶
-
class
Model(aModelPath)¶ An object providing an interface to a trained DeepSpeech model.
- Arguments
aModelPath (string) – The path to the frozen model graph.
- Throws
on error
-
Model.beamWidth()¶ Get the beam width value used by the model. If Model.setBeamWidth() has not been called, returns the default value loaded from the model file.
- Returns
number – Beam width value used by the model.
-
Model.createStream()¶ Create a new streaming inference state. One can then call
Stream.feedAudioContent() and Stream.finishStream() on the returned stream object.
- Throws
on error
- Returns
object – a Stream() object that represents the streaming state.
-
Model.disableExternalScorer()¶ Disable decoding using an external scorer.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
-
Model.enableExternalScorer(aScorerPath)¶ Enable decoding using an external scorer.
- Arguments
aScorerPath (string) – The path to the external scorer file.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
-
Model.sampleRate()¶ Return the sample rate expected by the model.
- Returns
number – Sample rate.
-
Model.setBeamWidth(aBeamWidth)¶ Set the beam width value used by the model.
- Arguments
aBeamWidth (number) – The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.
- Returns
number – Zero on success, non-zero on failure.
-
Model.setScorerAlphaBeta(aLMAlpha, aLMBeta)¶ Set hyperparameters alpha and beta of the external scorer.
- Arguments
aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
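The status codes returned by the configuration methods above are easy to ignore by accident. The following is a minimal sketch of checking them, assuming the zero-on-success convention described in this reference. The helpers `checkStatus` and `configureScorer`, the option names, the example values, and the stub object are all hypothetical and exist only so the sketch runs without the native library.

```javascript
// Hypothetical helper: throw if a status code is non-zero.
function checkStatus(code, what) {
  if (code !== 0) {
    throw new Error(what + ' failed with status ' + code);
  }
}

// Hypothetical helper: apply scorer settings to any object that exposes
// the Model methods documented above.
function configureScorer(model, opts) {
  checkStatus(model.enableExternalScorer(opts.scorerPath), 'enableExternalScorer');
  checkStatus(model.setScorerAlphaBeta(opts.alpha, opts.beta), 'setScorerAlphaBeta');
  if (opts.beamWidth !== undefined) {
    checkStatus(model.setBeamWidth(opts.beamWidth), 'setBeamWidth');
  }
}

// Stand-in for a real Model (assumption: it mimics the documented return codes).
const scorerStub = {
  calls: [],
  enableExternalScorer(path) { this.calls.push(['scorer', path]); return path ? 0 : 1; },
  setScorerAlphaBeta(alpha, beta) { this.calls.push(['alphaBeta', alpha, beta]); return 0; },
  setBeamWidth(width) { this.calls.push(['beamWidth', width]); return 0; },
};

// Example values only; they are not recommended settings.
configureScorer(scorerStub, { scorerPath: 'example.scorer', alpha: 0.75, beta: 1.85, beamWidth: 500 });
console.log(scorerStub.calls.length); // 3
```

With a real model, the same `configureScorer` call would take the object returned by the Model constructor in place of `scorerStub`.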
-
Model.stt(aBuffer)¶ Use the DeepSpeech model to perform Speech-To-Text.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
- Returns
string – The STT result. Returns undefined on error.
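Because the buffer holds 16-bit mono samples, each sample occupies two bytes, so the duration of a buffer follows from its byte length and the sample rate. A small sketch of that arithmetic is shown here; `audioDurationSeconds` is a hypothetical helper, and 16000 is only an example value standing in for whatever Model.sampleRate() returns.

```javascript
// Hypothetical helper: duration of a raw 16-bit mono buffer, in seconds.
function audioDurationSeconds(buffer, sampleRate) {
  const BYTES_PER_SAMPLE = 2; // 16-bit samples
  if (buffer.length % BYTES_PER_SAMPLE !== 0) {
    throw new RangeError('buffer length must be a multiple of 2 bytes for 16-bit audio');
  }
  return buffer.length / BYTES_PER_SAMPLE / sampleRate;
}

// 16000 samples of silence at an assumed 16000 Hz sample rate.
const oneSecond = Buffer.alloc(16000 * 2);
console.log(audioDurationSeconds(oneSecond, 16000)); // 1
```

A check like this can catch a truncated or mis-sized buffer before it is passed to Model.stt().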
-
Model.sttWithMetadata(aBuffer, aNumResults)¶ Use the DeepSpeech model to perform Speech-To-Text and output results including metadata.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.
- Returns
object – Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
Stream¶
-
class
Stream(nativeStream)¶ Provides an interface to a DeepSpeech stream. The constructor cannot be called directly; use
Model.createStream().
-
Stream.feedAudioContent(aBuffer)¶ Feed audio samples to an ongoing streaming inference.
- Arguments
aBuffer (buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
-
Stream.finishStream()¶ Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
- Returns
string – The STT result. This method frees the stream; the stream must not be used after this method is called.
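The streaming calls above can be combined into one loop: create a stream, feed the audio in chunks, then finish. The sketch below assumes only the method names documented here. `transcribeInChunks`, the chunk size, and the stub model are hypothetical and are there so the example runs without the native library.

```javascript
// Hypothetical helper: feed `audio` to a stream in fixed-size chunks, then decode.
// `model` is any object exposing createStream() as documented above.
function transcribeInChunks(model, audio, chunkBytes) {
  if (chunkBytes <= 0 || chunkBytes % 2 !== 0) {
    throw new RangeError('chunkBytes must be a positive multiple of 2 (16-bit samples)');
  }
  const stream = model.createStream();
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    // subarray() clamps the end index, so the last chunk may be shorter.
    stream.feedAudioContent(audio.subarray(offset, offset + chunkBytes));
  }
  // finishStream() frees the stream, so it is not used again after this call.
  return stream.finishStream();
}

// Stand-in model (assumption): counts bytes instead of running inference.
const streamStub = {
  createStream() {
    let bytes = 0;
    return {
      feedAudioContent(chunk) { bytes += chunk.length; },
      finishStream() { return 'received ' + bytes + ' bytes'; },
    };
  },
};

console.log(transcribeInChunks(streamStub, Buffer.alloc(10000), 4096)); // received 10000 bytes
```

Keeping the chunk size a multiple of two bytes avoids splitting a 16-bit sample across two calls.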
-
Stream.finishStreamWithMetadata(aNumResults)¶ Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.
- Arguments
aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.
- Returns
object – Outputs a Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). This method frees the stream; the stream must not be used after this method is called.
-
Stream.intermediateDecode()¶ Compute the intermediate decoding of an ongoing streaming inference.
- Returns
string – The STT intermediate result.
-
Stream.intermediateDecodeWithMetadata(aNumResults)¶ Compute the intermediate decoding of an ongoing streaming inference, return results including metadata.
- Arguments
aNumResults (number) – Maximum number of candidate transcripts to return. Returned list might be smaller than this. Default value is 1 if not specified.
- Returns
object – Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
-
Module exported methods¶
-
FreeModel(model)¶ Frees associated resources and destroys model object.
- Arguments
model (object) – A model pointer returned by Model().
-
FreeStream(stream)¶ Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Arguments
stream (object) – A stream object returned by Model.createStream().
-
FreeMetadata(metadata)¶ Free memory allocated for metadata information.
- Arguments
metadata (object) – Object containing metadata as returned by Model.sttWithMetadata() or Stream.finishStreamWithMetadata().
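Since the caller is responsible for freeing metadata, a try/finally wrapper is one way to make sure the release happens even when processing throws. This is a sketch under stated assumptions: `withMetadata` is a hypothetical helper, and the `freeMetadata` argument stands in for the module's FreeMetadata export. The fake metadata object and counter exist only so the example runs without the native library.

```javascript
// Hypothetical helper: run `use(metadata)` and always release the metadata afterwards.
// In real code, `freeMetadata` would be the FreeMetadata function documented above.
function withMetadata(metadata, freeMetadata, use) {
  if (metadata === undefined) {
    // The WithMetadata calls are documented to return undefined on error.
    throw new Error('inference returned no metadata');
  }
  try {
    return use(metadata);
  } finally {
    freeMetadata(metadata);
  }
}

// Stand-ins (assumptions) for a real Metadata object and FreeMetadata.
let freedCount = 0;
const fakeMetadata = { label: 'fake' };
const result = withMetadata(fakeMetadata, () => { freedCount += 1; }, (m) => m.label);
console.log(result, freedCount); // fake 1
```

Any value needed later should be copied out inside `use`, because the metadata is released as soon as the wrapper returns.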
-
Version()¶ Print version of this library and of the linked TensorFlow library on standard output.
Metadata¶
-
class
Metadata()¶ An array of CandidateTranscript objects computed by the model.
-
Metadata.transcripts()¶ Array of transcripts
- Returns
array – Array of CandidateTranscript() objects
-
CandidateTranscript¶
-
class
CandidateTranscript()¶ A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.
-
CandidateTranscript.confidence()¶ Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/token that contributed to the creation of this transcription.
- Returns
float – Confidence value
-
CandidateTranscript.tokens()¶ Array of tokens
- Returns
array – Array of TokenMetadata() objects
-
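When several candidate transcripts are requested, one may want to pick the candidate with the highest confidence. The sketch below assumes that a larger confidence value indicates a more likely transcript. `readField` and `bestCandidate` are hypothetical helpers; `readField` accepts a field exposed either as a method, as listed in this reference, or as a plain property, since the exact shape in a given binding version is an assumption here. The candidate objects are made-up stand-ins.

```javascript
// Hypothetical helper: read a field that may be a method or a plain property.
function readField(obj, name) {
  const value = obj[name];
  return typeof value === 'function' ? value.call(obj) : value;
}

// Hypothetical helper: return the candidate with the highest confidence,
// or undefined when the list is empty.
function bestCandidate(transcripts) {
  let best;
  for (const candidate of transcripts) {
    if (best === undefined ||
        readField(candidate, 'confidence') > readField(best, 'confidence')) {
      best = candidate;
    }
  }
  return best;
}

// Made-up candidates mixing both access styles.
const candidates = [
  { confidence: () => -42.5, tokens: () => [] },
  { confidence: () => -17.25, tokens: () => [] },
  { confidence: -30, tokens: [] },
];
console.log(readField(bestCandidate(candidates), 'confidence')); // -17.25
```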
TokenMetadata¶
-
class
TokenMetadata()¶ Stores text of an individual token, along with its timing information
-
TokenMetadata.start_time()¶ Position of the token in seconds
- Returns
float – The position of the token
-
TokenMetadata.text()¶ The text corresponding to this token
- Returns
string – The text generated
-
TokenMetadata.timestep()¶ Position of the token in units of 20ms
- Returns
int – The position of the token
-
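Per-token metadata can be folded into word-level timings. The following sketch assumes that each token carries a single character and that a token whose text is a space separates words; neither assumption is guaranteed by this reference. `readField` and `tokensToWords` are hypothetical helpers, and the sample tokens are made-up plain objects rather than real TokenMetadata instances.

```javascript
// Hypothetical helper: read a field that may be a method or a plain property.
function readField(obj, name) {
  const value = obj[name];
  return typeof value === 'function' ? value.call(obj) : value;
}

// Hypothetical helper: group character tokens into words, keeping the
// start time (in seconds) of each word's first token.
function tokensToWords(tokens) {
  const words = [];
  let current = null;
  for (const token of tokens) {
    const text = readField(token, 'text');
    if (text === ' ') {
      current = null; // a space token ends the current word
      continue;
    }
    if (current === null) {
      current = { word: '', start: readField(token, 'start_time') };
      words.push(current);
    }
    current.word += text;
  }
  return words;
}

// Made-up tokens for illustration.
const sampleTokens = [
  { text: 'h', start_time: 0.2 },
  { text: 'i', start_time: 0.24 },
  { text: ' ', start_time: 0.28 },
  { text: 'y', start_time: 0.4 },
  { text: 'o', start_time: 0.44 },
];
console.log(tokensToWords(sampleTokens));
// [ { word: 'hi', start: 0.2 }, { word: 'yo', start: 0.4 } ]
```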