.NET Framework

DeepSpeech Interface

interface IDeepSpeech

Client interface for Mozilla’s DeepSpeech implementation.

Subclassed by DeepSpeechClient.DeepSpeech

Public Functions

void DeepSpeechClient.Interfaces.IDeepSpeech.PrintVersions()

Prints the versions of TensorFlow and DeepSpeech.

unsafe int DeepSpeechClient.Interfaces.IDeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return

Sample rate.
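
Because the audio passed to the transcription functions has to match the rate the model expects, a caller may want to compare the two before running inference. The following is a minimal sketch of such a check, not part of the DeepSpeech API: the SampleRateCheck class and its EnsureMatches method are hypothetical helpers, and the error message assumes the rate is expressed in Hz.

```csharp
using System;

// Hypothetical helper, not part of DeepSpeechClient.
static class SampleRateCheck
{
    // Throws if the audio sample rate differs from the rate the model expects.
    // modelSampleRate is assumed to come from GetModelSampleRate().
    public static void EnsureMatches(int audioSampleRate, int modelSampleRate)
    {
        if (audioSampleRate != modelSampleRate)
        {
            throw new ArgumentException(
                $"Audio is {audioSampleRate} Hz but the model expects {modelSampleRate} Hz; " +
                "resample the audio before transcribing.");
        }
    }
}
```

With an IDeepSpeech instance named client, this could be called as SampleRateCheck.EnsureMatches(audioRate, client.GetModelSampleRate()), where audioRate is read from the audio source.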

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)

Enable decoding using beam scoring with a KenLM language model.

Parameters
  • aLMPath: The path to the language model binary file.

  • aTriePath: The path to the trie file built from the same vocabulary as the language model binary.

  • aLMAlpha: The alpha hyperparameter of the CTC decoder. Language Model weight.

  • aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.

Exceptions
  • ArgumentException: Thrown when the native binary fails to enable decoding with a language model.

  • FileNotFoundException: Thrown when the language model or trie file cannot be found.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.
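
Audio is often read from disk as bytes, while aBuffer expects 16-bit samples. The sketch below shows one way to build the sample array and derive aBufferSize from it. It assumes the input is raw little-endian 16-bit mono PCM with no WAV header; PcmBuffer and ToSamples are hypothetical names, not part of the DeepSpeech API.

```csharp
using System;

// Hypothetical helper, not part of DeepSpeechClient.
static class PcmBuffer
{
    // Converts raw 16-bit little-endian mono PCM bytes (no header) into samples.
    public static short[] ToSamples(byte[] pcmBytes)
    {
        if (pcmBytes.Length % 2 != 0)
        {
            throw new ArgumentException(
                "16-bit PCM must contain an even number of bytes.", nameof(pcmBytes));
        }

        var samples = new short[pcmBytes.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Assemble each sample byte by byte so the result does not
            // depend on the endianness of the host.
            samples[i] = (short)(pcmBytes[2 * i] | (pcmBytes[2 * i + 1] << 8));
        }
        return samples;
    }
}
```

Given an IDeepSpeech instance named client, the result can then be passed as client.SpeechToText(samples, (uint)samples.Length), so that aBufferSize counts samples rather than bytes.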

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text and return extended metadata about the result.

Return

The extended metadata. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeStream(DeepSpeechStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe DeepSpeechStream DeepSpeechClient.Interfaces.IDeepSpeech.CreateStream()

Creates a new streaming inference state.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters
  • stream: Instance of the stream to feed the data.

  • aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in aBuffer.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters
  • stream: Instance of the stream to decode.

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters
  • stream: Instance of the stream to finish.
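
The streaming functions above are meant to be used together: create a stream, feed it audio in pieces, then finish it. The sketch below illustrates only the chunking step under stated assumptions. StreamFeeder and FeedInChunks are hypothetical names, and the feed delegate stands in for a call to FeedAudioContent so that the helper has no dependency on the native library.

```csharp
using System;

// Hypothetical helper, not part of DeepSpeechClient.
static class StreamFeeder
{
    // Splits samples into chunks of at most chunkSize samples and passes each
    // chunk, together with its sample count, to feed. Returns the chunk count.
    public static int FeedInChunks(short[] samples, int chunkSize, Action<short[], uint> feed)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        int chunks = 0;
        for (int offset = 0; offset < samples.Length; offset += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int count = Math.Min(chunkSize, samples.Length - offset);
            var chunk = new short[count];
            Array.Copy(samples, offset, chunk, 0, count);
            feed(chunk, (uint)count);
            chunks++;
        }
        return chunks;
    }
}

// Intended use with an IDeepSpeech instance named client (needs the library):
//   var stream = client.CreateStream();
//   StreamFeeder.FeedInChunks(samples, 320,
//       (buf, n) => client.FeedAudioContent(stream, buf, n));
//   string text = client.FinishStream(stream);
```

The chunk size of 320 in the comment is an arbitrary illustrative value. Calling IntermediateDecode between chunks would give partial results, and FreeStream can replace FinishStream when the result is no longer needed.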

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result, with extended metadata, over the whole audio signal.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to finish.

DeepSpeech Class

class DeepSpeech

Client for Mozilla’s DeepSpeech implementation.

Public Functions

DeepSpeechClient.DeepSpeech.DeepSpeech(string aModelPath, uint aBeamWidth)

Initializes a new instance of the DeepSpeech class and creates a new acoustic model.

Parameters
  • aModelPath: The path to the frozen model graph.

  • aBeamWidth: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

Exceptions
  • ArgumentException: Thrown when the native binary failed to create the model.

unsafe int DeepSpeechClient.DeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return

Sample rate.

unsafe void DeepSpeechClient.DeepSpeech.Dispose()

Frees associated resources and destroys model objects.

unsafe void DeepSpeechClient.DeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)

Enable decoding using beam scoring with a KenLM language model.

Parameters
  • aLMPath: The path to the language model binary file.

  • aTriePath: The path to the trie file built from the same vocabulary as the language model binary.

  • aLMAlpha: The alpha hyperparameter of the CTC decoder. Language Model weight.

  • aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.

Exceptions
  • ArgumentException: Thrown when the native binary fails to enable decoding with a language model.

  • FileNotFoundException: Thrown when the language model or trie file cannot be found.

unsafe void DeepSpeechClient.DeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters
  • stream: Instance of the stream to feed the data.

  • aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in aBuffer.

unsafe string DeepSpeechClient.DeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters
  • stream: Instance of the stream to finish.

unsafe Metadata DeepSpeechClient.DeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result, with extended metadata, over the whole audio signal.

Return

The extended metadata result.

Parameters
  • stream: Instance of the stream to finish.

unsafe string DeepSpeechClient.DeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters
  • stream: Instance of the stream to decode.

unsafe void DeepSpeechClient.DeepSpeech.PrintVersions()

Prints the versions of TensorFlow and DeepSpeech.

unsafe DeepSpeechStream DeepSpeechClient.DeepSpeech.CreateStream()

Creates a new streaming inference state.

unsafe void DeepSpeechClient.DeepSpeech.FreeStream(DeepSpeechStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe string DeepSpeechClient.DeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

unsafe Metadata DeepSpeechClient.DeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text and return extended metadata about the result.

Return

The extended metadata. Returns NULL on error.

Parameters
  • aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize: The number of samples in the audio signal.

DeepSpeechStream Class

Represents a streaming inference state. Instances are returned by CreateStream(), passed to FeedAudioContent() and IntermediateDecode(), and are closed by FinishStream() or FinishStreamWithMetadata(), or destroyed by FreeStream().

ErrorCodes

enum DeepSpeechClient::Enums::ErrorCodes

Error codes from the native DeepSpeech binary.

Values:

DS_ERR_OK = 0x0000
DS_ERR_NO_MODEL = 0x1000
DS_ERR_INVALID_ALPHABET = 0x2000
DS_ERR_INVALID_SHAPE = 0x2001
DS_ERR_INVALID_LM = 0x2002
DS_ERR_MODEL_INCOMPATIBLE = 0x2003
DS_ERR_FAIL_INIT_MMAP = 0x3000
DS_ERR_FAIL_INIT_SESS = 0x3001
DS_ERR_FAIL_INTERPRETER = 0x3002
DS_ERR_FAIL_RUN_SESS = 0x3003
DS_ERR_FAIL_CREATE_STREAM = 0x3004
DS_ERR_FAIL_READ_PROTOBUF = 0x3005
DS_ERR_FAIL_CREATE_SESS = 0x3006
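
When logging a failure it can help to show both the hexadecimal value and the symbolic name of a code. The sketch below is one way to do that. ErrorCodesSketch is a local, partial copy of the values listed above, defined only so the example is self-contained; real code would presumably use DeepSpeechClient.Enums.ErrorCodes. ErrorCodeFormatter and Describe are hypothetical names.

```csharp
using System;

// Local, partial copy of the values listed above, for illustration only.
enum ErrorCodesSketch
{
    DS_ERR_OK = 0x0000,
    DS_ERR_NO_MODEL = 0x1000,
    DS_ERR_INVALID_ALPHABET = 0x2000,
    DS_ERR_FAIL_INIT_MMAP = 0x3000,
}

// Hypothetical helper, not part of DeepSpeechClient.
static class ErrorCodeFormatter
{
    // Formats a numeric code as four hex digits followed by its name,
    // or "unknown error" if the value is not defined in the enum.
    public static string Describe(int code)
    {
        string name = Enum.IsDefined(typeof(ErrorCodesSketch), code)
            ? ((ErrorCodesSketch)code).ToString()
            : "unknown error";
        return $"0x{code:X4} ({name})";
    }
}
```

For example, ErrorCodeFormatter.Describe(0x2000) yields "0x2000 (DS_ERR_INVALID_ALPHABET)".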

Metadata

struct Metadata

Package Attributes

unsafe IntPtr DeepSpeechClient.Structs.Metadata.items

Native list of items.

unsafe int DeepSpeechClient.Structs.Metadata.num_items

Count of items from the native side.

unsafe double DeepSpeechClient.Structs.Metadata.confidence

Approximated confidence value for this transcription.

MetadataItem

struct MetadataItem

Package Attributes

unsafe IntPtr DeepSpeechClient.Structs.MetadataItem.character

Native character.

unsafe int DeepSpeechClient.Structs.MetadataItem.timestep

Position of the character in units of 20 ms.

unsafe float DeepSpeechClient.Structs.MetadataItem.start_time

Position of the character in seconds.
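
Since timestep counts units of 20 ms, it can be converted to seconds by multiplying by 0.020. The sketch below shows that arithmetic. MetadataTiming and TimestepToSeconds are hypothetical names, and whether the result always equals start_time for a given model is an assumption rather than something stated above.

```csharp
using System;

// Hypothetical helper, not part of DeepSpeechClient.
static class MetadataTiming
{
    // Duration of one timestep in seconds, from the "units of 20 ms" description.
    public const double SecondsPerTimestep = 0.020;

    // Converts a timestep index into a position in seconds.
    public static double TimestepToSeconds(int timestep)
    {
        return timestep * SecondsPerTimestep;
    }
}
```

For instance, a timestep of 50 corresponds to 50 × 0.020 = 1 second into the audio.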