.NET Framework

DeepSpeech Interface

interface IDeepSpeech

Client interface for Mozilla’s DeepSpeech implementation.

Subclassed by DeepSpeechClient.DeepSpeech

Public Functions

void DeepSpeechClient.Interfaces.IDeepSpeech.PrintVersions(): Prints the versions of TensorFlow and DeepSpeech.

unsafe int DeepSpeechClient.Interfaces.IDeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return: Sample rate.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)

Enable decoding using beam scoring with a KenLM language model.

Parameters

aLMPath: The path to the language model binary file.
aTriePath: The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha: The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.

Exceptions

ArgumentException: Thrown when the native binary failed to enable decoding with a language model.
FileNotFoundException: Thrown when the language model or trie file cannot be found.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters

aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
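Audio often arrives as raw bytes rather than as the short[] buffer described above. The following is a minimal sketch of one way to convert it, assuming headerless little-endian 16-bit mono PCM input. PcmConverter and ToSamples are hypothetical names introduced for illustration and are not part of DeepSpeechClient.

```csharp
using System;

public static class PcmConverter
{
    // Converts little-endian 16-bit PCM bytes into 16-bit samples.
    // The bytes are combined explicitly, so the result does not depend
    // on the byte order of the host machine.
    public static short[] ToSamples(byte[] pcmBytes)
    {
        if (pcmBytes == null)
            throw new ArgumentNullException(nameof(pcmBytes));
        if (pcmBytes.Length % 2 != 0)
            throw new ArgumentException("16-bit PCM data must have an even byte count.", nameof(pcmBytes));

        var samples = new short[pcmBytes.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            int low = pcmBytes[2 * i];       // least significant byte
            int high = pcmBytes[2 * i + 1];  // most significant byte
            samples[i] = unchecked((short)(low | (high << 8)));
        }
        return samples;
    }
}
```

The resulting array could then be passed as aBuffer, with (uint)samples.Length as aBufferSize, since that parameter counts samples rather than bytes.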

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text and return the result with extended metadata.

Return

The extended metadata. Returns NULL on error.

Parameters

aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeStream(DeepSpeechStream stream): Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe DeepSpeechStream DeepSpeechClient.Interfaces.IDeepSpeech.CreateStream(): Creates a new streaming inference state.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters

stream: Instance of the stream to feed the data to.
aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in aBuffer.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

stream: Instance of the stream to decode.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters

stream: Instance of the stream to finish.

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including extended metadata.

Return

The extended metadata result.

Parameters

stream: Instance of the stream to finish.
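The streaming functions above are meant to be called repeatedly as audio becomes available. The sketch below shows only the chunking step, under the assumption that the full signal is already in memory. StreamFeeder and FeedInChunks are hypothetical helpers, and the feed delegate stands in for a call to FeedAudioContent so that the block compiles without the DeepSpeechClient assembly.

```csharp
using System;

public static class StreamFeeder
{
    // Splits `samples` into chunks of at most `chunkSize` samples and passes
    // each chunk, together with its sample count, to `feed`.
    // Returns the number of chunks delivered.
    public static int FeedInChunks(short[] samples, int chunkSize, Action<short[], uint> feed)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        if (feed == null) throw new ArgumentNullException(nameof(feed));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int chunks = 0;
        for (int offset = 0; offset < samples.Length; offset += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int count = Math.Min(chunkSize, samples.Length - offset);
            var chunk = new short[count];
            Array.Copy(samples, offset, chunk, 0, count);
            feed(chunk, (uint)count);
            chunks++;
        }
        return chunks;
    }
}
```

With a client and a stream obtained from CreateStream(), the delegate would typically be (buf, n) => client.FeedAudioContent(stream, buf, n), optionally interleaved with IntermediateDecode(stream) and followed by FinishStream(stream), or by FreeStream(stream) if the result is no longer needed.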

DeepSpeech Class

class DeepSpeech

Client for Mozilla’s DeepSpeech implementation.

Public Functions

DeepSpeechClient.DeepSpeech.DeepSpeech(string aModelPath, uint aBeamWidth)

Initializes a new instance of DeepSpeech class and creates a new acoustic model.

Parameters

aModelPath: The path to the frozen model graph.
aBeamWidth: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

Exceptions

ArgumentException: Thrown when the native binary failed to create the model.

unsafe int DeepSpeechClient.DeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return: Sample rate.

unsafe void DeepSpeechClient.DeepSpeech.Dispose(): Frees associated resources and destroys model objects.

unsafe void DeepSpeechClient.DeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)

Enable decoding using beam scoring with a KenLM language model.

Parameters

aLMPath: The path to the language model binary file.
aTriePath: The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha: The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.

Exceptions

ArgumentException: Thrown when the native binary failed to enable decoding with a language model.
FileNotFoundException: Thrown when the language model or trie file cannot be found.

unsafe void DeepSpeechClient.DeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters

stream: Instance of the stream to feed the data to.
aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in aBuffer.

unsafe string DeepSpeechClient.DeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters

stream: Instance of the stream to finish.

unsafe Metadata DeepSpeechClient.DeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including extended metadata.

Return

The extended metadata result.

Parameters

stream: Instance of the stream to finish.

unsafe string DeepSpeechClient.DeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

stream: Instance of the stream to decode.

unsafe void DeepSpeechClient.DeepSpeech.PrintVersions(): Prints the versions of TensorFlow and DeepSpeech.

unsafe DeepSpeechStream DeepSpeechClient.DeepSpeech.CreateStream(): Creates a new streaming inference state.

unsafe void DeepSpeechClient.DeepSpeech.FreeStream(DeepSpeechStream stream): Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe string DeepSpeechClient.DeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters

aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.

unsafe Metadata DeepSpeechClient.DeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text and return the result with extended metadata.

Return

The extended metadata. Returns NULL on error.

Parameters

aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize: The number of samples in the audio signal.
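The members documented above can be combined into a simple one-shot transcription. The following is a sketch rather than a definitive implementation: it assumes the DeepSpeechClient assembly is referenced, and the file names, beam width, and alpha/beta values are placeholders chosen for illustration. It also assumes headerless little-endian 16-bit mono PCM input on a little-endian host.

```csharp
using System;
using System.IO;
using DeepSpeechClient; // assumes the DeepSpeechClient assembly is referenced

public static class TranscribeExample
{
    public static void Main(string[] args)
    {
        // Hypothetical file names; substitute the files that belong to your model.
        const string modelPath = "output_graph.pbmm";
        const string lmPath = "lm.binary";
        const string triePath = "trie";
        const string rawAudioPath = "audio.raw";

        // Illustrative values only; tune them for your model.
        const uint beamWidth = 500;
        const float lmAlpha = 0.75f;
        const float lmBeta = 1.85f;

        DeepSpeech client = null;
        try
        {
            client = new DeepSpeech(modelPath, beamWidth);
            client.EnableDecoderWithLM(lmPath, triePath, lmAlpha, lmBeta);

            // Reinterpret the raw bytes as 16-bit samples (little-endian host assumed).
            byte[] bytes = File.ReadAllBytes(rawAudioPath);
            short[] samples = new short[bytes.Length / 2];
            Buffer.BlockCopy(bytes, 0, samples, 0, samples.Length * 2);

            // aBufferSize is the number of samples, not the number of bytes.
            string text = client.SpeechToText(samples, (uint)samples.Length);
            Console.WriteLine(text ?? "(no result)");
        }
        catch (FileNotFoundException ex)
        {
            Console.Error.WriteLine("Missing file: " + ex.Message);
        }
        catch (ArgumentException ex)
        {
            Console.Error.WriteLine("Native call failed: " + ex.Message);
        }
        finally
        {
            // Free native resources even if an exception was thrown.
            client?.Dispose();
        }
    }
}
```

Dispose() is called explicitly in a finally block because this page documents it only as a public method of the class. The audio should be sampled at the rate reported by GetModelSampleRate().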

DeepSpeechStream Class

Represents a streaming inference state. Instances are created by CreateStream() and passed to FeedAudioContent(), IntermediateDecode(), FinishStream(), FinishStreamWithMetadata(), and FreeStream().

ErrorCodes

enum DeepSpeechClient.Enums.ErrorCodes

Error codes from the native DeepSpeech binary.

Values:

DS_ERR_OK = 0x0000

DS_ERR_NO_MODEL = 0x1000

DS_ERR_INVALID_ALPHABET = 0x2000

DS_ERR_INVALID_SHAPE = 0x2001

DS_ERR_INVALID_LM = 0x2002

DS_ERR_MODEL_INCOMPATIBLE = 0x2003

DS_ERR_FAIL_INIT_MMAP = 0x3000

DS_ERR_FAIL_INIT_SESS = 0x3001

DS_ERR_FAIL_INTERPRETER = 0x3002

DS_ERR_FAIL_RUN_SESS = 0x3003

DS_ERR_FAIL_CREATE_STREAM = 0x3004

DS_ERR_FAIL_READ_PROTOBUF = 0x3005

DS_ERR_FAIL_CREATE_SESS = 0x3006
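When logging a numeric code returned by the native binary, it can help to turn it back into its symbolic name. The sketch below mirrors the values listed above in a local enum so that it compiles on its own. Real code would presumably use DeepSpeechClient.Enums.ErrorCodes instead, and ErrorCodeHelper is a hypothetical name.

```csharp
using System;

// Local mirror of the values listed above, declared here only so the
// sketch is self-contained.
public enum ErrorCodes
{
    DS_ERR_OK = 0x0000,
    DS_ERR_NO_MODEL = 0x1000,
    DS_ERR_INVALID_ALPHABET = 0x2000,
    DS_ERR_INVALID_SHAPE = 0x2001,
    DS_ERR_INVALID_LM = 0x2002,
    DS_ERR_MODEL_INCOMPATIBLE = 0x2003,
    DS_ERR_FAIL_INIT_MMAP = 0x3000,
    DS_ERR_FAIL_INIT_SESS = 0x3001,
    DS_ERR_FAIL_INTERPRETER = 0x3002,
    DS_ERR_FAIL_RUN_SESS = 0x3003,
    DS_ERR_FAIL_CREATE_STREAM = 0x3004,
    DS_ERR_FAIL_READ_PROTOBUF = 0x3005,
    DS_ERR_FAIL_CREATE_SESS = 0x3006
}

public static class ErrorCodeHelper
{
    // Returns the symbolic name for a known code, or a hexadecimal
    // fallback string for a value that is not in the enum.
    public static string Describe(int code)
    {
        return Enum.IsDefined(typeof(ErrorCodes), code)
            ? ((ErrorCodes)code).ToString()
            : "Unknown error 0x" + code.ToString("X4");
    }
}
```

The fallback branch keeps the helper usable if a newer native binary returns a code that this list does not include.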

Metadata

struct Metadata

Package Attributes

unsafe IntPtr DeepSpeechClient.Structs.Metadata.items: Native list of items.

unsafe int DeepSpeechClient.Structs.Metadata.num_items: Count of items from the native side.

unsafe double DeepSpeechClient.Structs.Metadata.confidence: Approximated confidence value for this transcription.

MetadataItem

struct MetadataItem

Package Attributes

unsafe IntPtr DeepSpeechClient.Structs.MetadataItem.character: Native character.

unsafe int DeepSpeechClient.Structs.MetadataItem.timestep: Position of the character in units of 20 ms.

unsafe float DeepSpeechClient.Structs.MetadataItem.start_time: Position of the character in seconds.
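Per-character timing is often more useful once grouped into words. The sketch below converts timesteps to seconds using the 20 ms unit stated above and pairs each word with the start time of its first character. It assumes the native characters have already been marshaled into managed strings, which this page does not cover. MetadataTiming, TimestepToSeconds, and ToWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MetadataTiming
{
    // One timestep corresponds to 20 ms, per the timestep field description.
    public const double SecondsPerTimestep = 0.020;

    public static double TimestepToSeconds(int timestep) => timestep * SecondsPerTimestep;

    // Groups characters into space-separated words. Each entry pairs a word
    // with the start time, in seconds, of its first character.
    public static List<KeyValuePair<string, double>> ToWords(IList<string> characters, IList<int> timesteps)
    {
        if (characters == null) throw new ArgumentNullException(nameof(characters));
        if (timesteps == null) throw new ArgumentNullException(nameof(timesteps));
        if (characters.Count != timesteps.Count)
            throw new ArgumentException("characters and timesteps must have the same length.");

        var words = new List<KeyValuePair<string, double>>();
        var current = new StringBuilder();
        double start = 0.0;

        for (int i = 0; i < characters.Count; i++)
        {
            if (characters[i] == " ")
            {
                // A space closes the current word, if there is one.
                if (current.Length > 0)
                {
                    words.Add(new KeyValuePair<string, double>(current.ToString(), start));
                    current.Clear();
                }
                continue;
            }
            if (current.Length == 0)
                start = TimestepToSeconds(timesteps[i]);
            current.Append(characters[i]);
        }

        // Flush the trailing word, which has no space after it.
        if (current.Length > 0)
            words.Add(new KeyValuePair<string, double>(current.ToString(), start));
        return words;
    }
}
```

Where start_time is available it can be used directly instead of converting timestep, since both fields describe the same position in different units.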