Python
Model
-
class
Model(*args, **kwargs) Class holding a DeepSpeech model
-
createStream() Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Returns
Object holding the stream
- Throws
RuntimeError on error
-
enableDecoderWithLM(*args, **kwargs) Enable decoding using beam scoring with a KenLM language model.
- Parameters
aLMPath (str) – The path to the language model binary file.
aTriePath (str) – The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.
- Returns
Zero on success, non-zero on failure (invalid arguments).
- Type
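As a rough illustration of the return-value convention described above, the following sketch wraps the call and raises when a non-zero status comes back. The helper name `enable_lm`, the file names, and the alpha/beta values in the usage comment are hypothetical placeholders, and `model` is assumed to be an already constructed Model instance.

```python
def enable_lm(model, lm_path, trie_path, lm_alpha, lm_beta):
    """Enable the KenLM-backed decoder and fail loudly on a non-zero status.

    `model` is assumed to be an existing Model instance. This helper is
    illustrative only and is not part of the DeepSpeech API.
    """
    status = model.enableDecoderWithLM(lm_path, trie_path, lm_alpha, lm_beta)
    if status != 0:
        # The reference states that non-zero means failure (invalid arguments).
        raise ValueError(f"enableDecoderWithLM failed with status {status}")
    return status


# Hypothetical paths and weights, shown only to illustrate the argument order:
# enable_lm(model, "lm.binary", "trie", 0.75, 1.85)
```

Turning the status code into an exception is a design choice of this sketch; callers who prefer to inspect the integer can call enableDecoderWithLM() directly.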
-
feedAudioContent(*args, **kwargs) Feed audio samples to an ongoing streaming inference.
- Parameters
aSctx (object) – A streaming state pointer returned by createStream().
aBuffer (int array) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize (int) – The number of samples in aBuffer.
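The buffer requirement above (16-bit, mono samples) can be checked before any audio reaches the model. The sketch below uses only the standard library `wave` and `array` modules. The helper name `read_pcm16_mono` is hypothetical, and converting the result to whatever array type the installed binding expects (for example a NumPy `int16` array) is left to the caller as an assumption about that package.

```python
import array
import sys
import wave


def read_pcm16_mono(path):
    """Return (sample_rate, samples) for a 16-bit mono WAV file.

    Raises ValueError if the file is not 16-bit mono, since the reference
    asks for 16-bit, mono raw audio samples.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, samples
```

The returned sample rate still has to match what the model was trained on; this sketch reports it but does not resample.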
-
finishStream(*args, **kwargs) Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
- Parameters
aSctx (object) – A streaming state pointer returned by createStream().
- Returns
The STT result.
- Type
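Taken together, createStream(), feedAudioContent() and finishStream() describe a create, feed, finish sequence. A minimal sketch of that sequence follows. It assumes `model` is an already constructed Model instance, that `samples` is a sliceable sequence of 16-bit mono samples, and that the Python binding derives the buffer size from the array it receives, so no explicit size is passed. The function name and the default chunk size are arbitrary choices for illustration.

```python
def transcribe_in_chunks(model, samples, chunk_size=320):
    """Feed `samples` to a streaming inference chunk by chunk.

    Assumptions: `model` is an existing Model instance, `samples` is a
    sliceable sequence of 16-bit mono samples, and the binding infers the
    buffer size from the array it is given.
    """
    stream = model.createStream()  # documented to raise RuntimeError on error
    for start in range(0, len(samples), chunk_size):
        model.feedAudioContent(stream, samples[start:start + chunk_size])
    # finishStream ends the stream and returns the result for all fed audio.
    return model.finishStream(stream)
```

In a live setting the loop would typically consume buffers from a microphone or socket instead of slicing a complete array; the call order stays the same.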
-
finishStreamWithMetadata(*args, **kwargs) Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
- Parameters
aSctx (object) – A streaming state pointer returned by createStream().
- Returns
Outputs a struct of individual letters along with their timing information.
- Type
-
intermediateDecode(*args, **kwargs) Compute the intermediate decoding of an ongoing streaming inference.
- Parameters
aSctx (object) – A streaming state pointer returned by createStream().
- Returns
The STT intermediate result.
- Type
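One way intermediateDecode() might be used is to surface partial transcriptions while audio is still arriving. The generator below is a sketch under the same assumptions as any streaming use of this class: `model` is an existing Model instance, `chunks` is an iterable of 16-bit mono sample buffers, and the binding infers each buffer's size. The name `stream_with_partials` is hypothetical.

```python
def stream_with_partials(model, chunks):
    """Yield the intermediate result after each chunk, then the final one.

    `model` is assumed to be an existing Model instance and `chunks` an
    iterable of 16-bit mono sample buffers.
    """
    stream = model.createStream()
    for chunk in chunks:
        model.feedAudioContent(stream, chunk)
        # Decode what has been fed so far without ending the stream.
        yield model.intermediateDecode(stream)
    # The last value comes from finishStream, which closes the stream.
    yield model.finishStream(stream)
```

Decoding after every chunk is done here for clarity; decoding less often may be cheaper, depending on the model and chunk size.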
-
sttWithMetadata(*args, **kwargs) Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Parameters
aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize (int) – The number of samples in the audio signal.
- Returns
Outputs a struct of individual letters along with their timing information.
- Type
-
Metadata
-
class
Metadata Stores the entire CTC output as an array of character metadata objects
-
confidence() Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
-
items() List of items
- Returns
A list of MetadataItem() elements
- Type
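To show how a Metadata object might be consumed, the sketch below joins the per-character items into a string and returns it with the confidence value. Two assumptions are made loudly here: each MetadataItem is assumed to expose a `character` member (a name not given in this section), and, because the reference lists items() and confidence() with call syntax while some releases of the binding may expose them as plain attributes, the helper accepts either form. Both function names are hypothetical.

```python
def _value(obj, name):
    """Read `name` from `obj`, calling it if it turns out to be a method."""
    attr = getattr(obj, name)
    return attr() if callable(attr) else attr


def summarize_metadata(metadata):
    """Return (text, confidence) for a Metadata-like object.

    Assumption: each item exposes a `character` member; that name is not
    taken from this section of the reference.
    """
    items = _value(metadata, "items")
    text = "".join(_value(item, "character") for item in items)
    return text, _value(metadata, "confidence")
```

A typical call would pass the object returned by sttWithMetadata() or finishStreamWithMetadata().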
-