Java

DeepSpeechModel

class DeepSpeechModel

Exposes a DeepSpeech model in Java.

Public Functions

org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath)

An object providing an interface to a trained DeepSpeech model.

Parameters

modelPath: The path to the frozen model graph.

long org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.beamWidth()

Get the beam width value used by the model. If setBeamWidth() has not been called, returns the default value loaded from the model file.

Return: Beam width value used by the model.

int org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.setBeamWidth(long beamWidth)

Set beam width value used by the model.

Return

Zero on success, non-zero on failure.

Parameters

beamWidth: The beam width used by the model. A larger beam width generally produces better results at the cost of longer decoding time.
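
Because setBeamWidth() reports failure through its return value, callers may want to check that value explicitly. The following is a minimal sketch: `StatusCodes` and `requireSuccess` are hypothetical helper names that are not part of the DeepSpeech bindings, and the choice of `IllegalStateException` is an assumption made for illustration.

```java
/** Hypothetical helper, not part of the DeepSpeech bindings. */
final class StatusCodes {
    private StatusCodes() {}

    /**
     * Throws if a status code signals failure.
     * The reference only states "zero on success, non-zero on failure",
     * so no meaning is attached to specific non-zero values here.
     */
    static void requireSuccess(int status, String operation) {
        if (status != 0) {
            throw new IllegalStateException(operation + " failed with status " + status);
        }
    }
}
```

With a loaded model, a call could look like `StatusCodes.requireSuccess(model.setBeamWidth(500L), "setBeamWidth");`, where 500 is only an illustrative value.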

int org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sampleRate()

Return the sample rate expected by the model.

Return: Sample rate.
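
One possible use of sampleRate() is to verify that an input file matches the model before running inference. The sketch below assumes a canonical WAV header in which the `fmt ` chunk comes first, so the sample rate is a little-endian 32-bit integer at byte offset 24. `WavHeader` and its methods are hypothetical names, not part of the bindings, and files with a different chunk layout would need a real WAV parser.

```java
/** Hypothetical helper, not part of the DeepSpeech bindings. */
final class WavHeader {
    private WavHeader() {}

    /** Reads the little-endian 32-bit sample rate at byte offset 24 of a canonical WAV header. */
    static int sampleRate(byte[] header) {
        if (header.length < 28) {
            throw new IllegalArgumentException("header too short to contain a sample rate");
        }
        return (header[24] & 0xFF)
                | ((header[25] & 0xFF) << 8)
                | ((header[26] & 0xFF) << 16)
                | ((header[27] & 0xFF) << 24);
    }

    /** Returns true if the file's sample rate equals the rate the model expects. */
    static boolean matchesModel(byte[] header, int modelSampleRate) {
        return sampleRate(header) == modelSampleRate;
    }
}
```

In practice the second argument would come from `model.sampleRate()`.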

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.freeModel(): Frees associated resources and destroys model object.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.enableExternalScorer(String scorer)

Enable decoding using an external scorer.

Return

Zero on success, non-zero on failure (invalid arguments).

Parameters

scorer: The path to the external scorer file.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.disableExternalScorer()

Disable decoding using an external scorer.

Return: Zero on success, non-zero on failure (invalid arguments).

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.setScorerAlphaBeta(float alpha, float beta)

Set the alpha and beta hyperparameters of the external scorer.

Return

Zero on success, non-zero on failure (invalid arguments).

Parameters

alpha: The alpha hyperparameter of the decoder. Language model weight.
beta: The beta hyperparameter of the decoder. Word insertion weight.

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size, int num_results)

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.

Parameters

buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in the audio signal.
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
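
The buffer parameter expects 16-bit samples as a short array, while audio read from a file or socket usually arrives as bytes. The sketch below shows one way to convert, assuming the raw data is little-endian signed 16-bit mono PCM with any container header already removed. `PcmSamples` is a hypothetical helper name, not part of the bindings.

```java
/** Hypothetical helper, not part of the DeepSpeech bindings. */
final class PcmSamples {
    private PcmSamples() {}

    /** Converts little-endian signed 16-bit PCM bytes into an array of samples. */
    static short[] fromLittleEndianBytes(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int low = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int high = pcm[2 * i + 1];     // high byte, sign-extended
            samples[i] = (short) ((high << 8) | low);
        }
        return samples;
    }
}
```

The resulting array and its length can then be passed as buffer and buffer_size, for example `model.sttWithMetadata(samples, samples.length, 3)`, where 3 is an arbitrary number of candidate transcripts.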

DeepSpeechStreamingState org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.createStream()

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Return: An opaque object that represents the streaming state.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size)

Feed audio samples to an ongoing streaming inference.

Parameters

ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in buffer.
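
Streaming inference is typically driven by feeding audio in small chunks. The sketch below illustrates the chunking logic only: `ChunkedFeeder` and `ChunkSink` are hypothetical names, and the callback stands in for a call to feedAudioContent() so that the example runs without the native library.

```java
/** Hypothetical helper, not part of the DeepSpeech bindings. */
final class ChunkedFeeder {
    private ChunkedFeeder() {}

    /** Stand-in for feedAudioContent(ctx, buffer, buffer_size). */
    interface ChunkSink {
        void accept(short[] buffer, int bufferSize);
    }

    /**
     * Splits audio into chunks of at most chunkSize samples, hands each
     * chunk to the sink in order, and returns the number of chunks delivered.
     */
    static int feed(short[] audio, int chunkSize, ChunkSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int n = Math.min(chunkSize, audio.length - offset);
            short[] chunk = new short[n];
            System.arraycopy(audio, offset, chunk, 0, n);
            sink.accept(chunk, n);
            chunks++;
        }
        return chunks;
    }
}
```

With a real model the sink could be `(buf, n) -> model.feedAudioContent(ctx, buf, n)`, with `ctx` obtained from createStream() and released by finishStream().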

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx)

Compute the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

ctx: A streaming state pointer returned by createStream().

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecodeWithMetadata(DeepSpeechStreamingState ctx, int num_results)

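Compute the intermediate decoding of an ongoing streaming inference and return the results including metadata.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.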

Parameters

ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx)

Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.

Return

The STT result.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx, int num_results)

Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.

Return

Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.

Note

This method will free the state pointer (ctx).

Parameters

ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.

Metadata

class Metadata: An array of CandidateTranscript objects computed by the model.

MetadataItem

class MetadataItem

Stores each individual character, along with its timing information.

Public Functions

String org.mozilla.deepspeech.libdeepspeech.MetadataItem.getCharacter(): The character generated for transcription

int org.mozilla.deepspeech.libdeepspeech.MetadataItem.getTimestep(): Position of the character in units of 20ms

float org.mozilla.deepspeech.libdeepspeech.MetadataItem.getStart_time(): Position of the character in seconds
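
Per-character timing is often more useful once the characters are grouped into words. The sketch below shows one way to do this from values such as those returned by getCharacter() and getStart_time(). It takes plain arrays so that it runs without the native library, and it assumes that words are separated by a single space character, which depends on the model's alphabet. `WordTimings` and `Word` are hypothetical names, not part of the bindings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the DeepSpeech bindings. */
final class WordTimings {
    private WordTimings() {}

    /** A word and the start time, in seconds, of its first character. */
    static final class Word {
        final String text;
        final float startTime;

        Word(String text, float startTime) {
            this.text = text;
            this.startTime = startTime;
        }
    }

    /** Groups characters into words, splitting on " " (an assumption about the alphabet). */
    static List<Word> group(String[] characters, float[] startTimes) {
        if (characters.length != startTimes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float wordStart = 0f;
        for (int i = 0; i < characters.length; i++) {
            if (characters[i].equals(" ")) {
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), wordStart));
                    current.setLength(0);
                }
            } else {
                if (current.length() == 0) {
                    wordStart = startTimes[i]; // first character of a new word
                }
                current.append(characters[i]);
            }
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), wordStart));
        }
        return words;
    }
}
```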