API Reference

class SimpleAudioIndexer.SimpleAudioIndexer(src_dir, mode, username_ibm=None, password_ibm=None, ibm_api_limit_bytes=100000000, verbose=False, needed_directories=set(['filtered', 'staging']))

Indexes audio and searches for a string within it or matches a regex pattern.

Audio files that are intended to be indexed should be in wav format and placed in the same directory; the absolute path to that directory should be passed as src_dir upon initialization.

Call the method index_audio (which results in calling index_audio_ibm or index_audio_cmu based on the given mode) prior to searching or accessing timestamps, unless you have saved the data of your previously indexed audio (in that case, the load_indexed_audio method must be used).

You may see timestamps of the words that have been indexed so far, sorted by audio file and time of occurrence, by calling the method get_timestamps.

You may save the indexed audio data (which is basically just the time-regularized timestamps) via the save_indexed_audio method and load it back via load_indexed_audio.

Do an exhaustive search with the method search_all, an iterative search with the method search_gen, or a regex-based search with the method search_regexp.

For more information see the docs and read the usage guide.
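The typical workflow (initialize, index, save, search) can be sketched as follows. This is a hypothetical sketch: the source directory path and the query are placeholders, and the import is guarded so the snippet degrades gracefully when the package is not installed.

```python
# Hypothetical end-to-end sketch; src_dir and the query are placeholders.
try:
    from SimpleAudioIndexer import SimpleAudioIndexer
except ImportError:          # library not installed; skip the demo
    SimpleAudioIndexer = None

def demo(src_dir="/abs/path/to/wav_files"):
    if SimpleAudioIndexer is None:
        return None
    # "cmu" mode needs no credentials; "ibm" would need username/password.
    indexer = SimpleAudioIndexer(src_dir=src_dir, mode="cmu")
    indexer.index_audio()                        # index every wav in src_dir
    indexer.save_indexed_audio(src_dir + "/indexed")
    return list(indexer.search_gen(query="hello"))

results = demo()
```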

Attributes:
mode : {“ibm”, “cmu”}

Specifies whether the speech-to-text engine is IBM’s Watson or CMU Pocketsphinx.

src_dir : str

Absolute path to the source directory of the audio files, such that the absolute path of an audio file to be indexed would be src_dir/audio_file.wav

verbose : bool, optional

True if progress needs to be printed. Default is False.

ibm_api_limit_bytes : int, optional

It holds the limit of Watson’s sessionless HTTP speech-to-text API, which is 100 MB. Default is 100000000.

Methods

get_mode()  
get_username_ibm()  
set_username_ibm()  
get_password_ibm()  
set_password_ibm()  
get_verbosity()  
set_verbosity()  
get_timestamps() Returns a corrected dictionary whose key is the original file name and whose value is a list of words and their beginning and ending time. It accounts for large files and does the timing calculations to return the correct result.
get_errors() Returns a dictionary of all the errors that have occurred while processing the audio files. The dictionary contains the time of each error, the file that had the error and the actual error.
_index_audio_ibm(name=None, continuous=True, model=”en-US_BroadbandModel”, word_confidence=True, word_alternatives_threshold=0.9, profanity_filter_for_US_results=False) Implements a searching-suitable interface for the Watson API
_index_audio_cmu(name=None) Implements an experimental interface for CMU Pocketsphinx
index_audio(*args, **kwargs) Calls the correct indexer function (_index_audio_ibm or _index_audio_cmu) based on the mode.
save_indexed_audio(indexed_audio_file_abs_path)  
load_indexed_audio(indexed_audio_file_abs_path)  
search_gen(query, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0) A generator which returns a valid search result at each iteration.
search_all(queries, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0) Returns a dictionary of all results of all of the queries for either all of the audio files or the audio_basename.
search_regexp(pattern, audio_basename=None) Returns a dictionary of all results which matched pattern for either all of the audio files or the audio_basename.
get_mode(self)

Returns whether the instance is initialized with ibm or cmu mode.

Returns:
str
get_username_ibm(self)
Returns:
str, None

Returns str if mode is ibm, else None

set_username_ibm(self, username_ibm)
Parameters:
username_ibm : str
Raises:
Exception

If mode is not ibm

get_password_ibm(self)
Returns:
str, None

Returns str if mode is ibm, else None

set_password_ibm(self, password_ibm)
Parameters:
password_ibm : str
Raises:
Exception

If mode is not ibm

get_verbosity(self)

Returns whether the instance is initialized to be quiet or loud while processing audio files.

Returns:
bool

True for being verbose.

set_verbosity(self, pred)
Parameters:
pred : bool
get_timestamps(self)

Returns a dictionary whose keys are audio file basenames and whose values are lists of word blocks. In case an audio file was large enough to be split, it adds seconds to correct the timing; in case the timestamps were manually loaded, it leaves them alone.

Returns:
{str: [[str, float, float]]}
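The return shape can be illustrated with a hand-made dictionary of the same {str: [[str, float, float]]} form (the file name and word blocks below are invented, not real output):

```python
# Invented timestamps in the documented {str: [[str, float, float]]} shape.
timestamps = {
    "fruits.wav": [["apple", 1.1, 1.12], ["pie", 1.2, 1.5]],
}

# Each word block is [word, starting_second, ending_second], so the
# duration of every indexed word can be computed directly:
durations = {
    word: round(end - start, 2)
    for blocks in timestamps.values()
    for word, start, end in blocks
}
```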
get_errors(self)

Returns a dictionary containing any errors encountered while processing the audio files. Works for either mode.

Returns:
{(float, str): any}

The return is a dictionary whose keys are tuples: the first element of each tuple is the time of the error and the second is the audio file’s name. The values of the dictionary are the actual errors.
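A dictionary of this shape can be consumed like so (the error times, file names and messages below are invented sample data):

```python
# Invented sample data in the documented {(float, str): any} shape.
errors = {
    (12.5, "fruits.wav"): "network timeout",
    (80.1, "talk.wav"): "bad chunk",
}

# Group (time, error) pairs by audio file name.
by_file = {}
for (when, audio_file), error in errors.items():
    by_file.setdefault(audio_file, []).append((when, error))
```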

index_audio(self, *args, **kwargs)

Calls the correct indexer function based on the mode.

If mode is ibm, _index_audio_ibm is called, which is an interface for Watson. Note that some of the explanation of _index_audio_ibm’s arguments is from [1].

If mode is cmu, _index_audio_cmu is called, which is an interface for PocketSphinx. Beware that the output would not be sufficiently accurate. Use this only if you don’t want to upload your files to IBM.

Parameters:
mode : {“ibm”, “cmu”}
basename : str, optional

A specific basename to be indexed, placed in src_dir, e.g. audio.wav.

If None is selected, all the valid audio files would be indexed. Default is None.

replace_already_indexed : bool

True to reindex an audio file that’s already in the timestamps.

Default is False.

continuous : bool

Valid only if mode is ibm

Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.

Default is True.

model : {

‘ar-AR_BroadbandModel’, ‘en-UK_BroadbandModel’, ‘en-UK_NarrowbandModel’, ‘en-US_BroadbandModel’ (the default), ‘en-US_NarrowbandModel’, ‘es-ES_BroadbandModel’, ‘es-ES_NarrowbandModel’, ‘fr-FR_BroadbandModel’, ‘ja-JP_BroadbandModel’, ‘ja-JP_NarrowbandModel’, ‘pt-BR_BroadbandModel’, ‘pt-BR_NarrowbandModel’, ‘zh-CN_BroadbandModel’, ‘zh-CN_NarrowbandModel’

}

Valid only if mode is ibm

The identifier of the model to be used for the recognition

Default is ‘en-US_BroadbandModel’

word_confidence : bool

Valid only if mode is ibm

Indicates whether a confidence measure in the range of 0 to 1 is returned for each word.

The default is True. (It’s False in the original)

word_alternatives_threshold : numeric

Valid only if mode is ibm

A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive.

Default is 0.9.

profanity_filter_for_US_results : bool

Valid only if mode is ibm

Indicates whether profanity filtering is performed on the transcript. If true, the service filters profanity from all output by replacing inappropriate words with a series of asterisks.

If false, the service returns results with no censoring. Applies to US English transcription only.

Default is False.

Raises:
OSError

Valid only if mode is cmu.

If the output of the pocketsphinx command results in an error.

References

[1]: https://ibm.com/watson/developercloud/speech-to-text/api/v1/


save_indexed_audio(self, indexed_audio_file_abs_path)

Writes the corrected timestamps to a file. Timestamps are a Python dictionary.

Parameters:
indexed_audio_file_abs_path : str
load_indexed_audio(self, indexed_audio_file_abs_path)
Parameters:
indexed_audio_file_abs_path : str
search_gen(self, query, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)

A generator that searches for the query within the audio files of the src_dir.

Parameters:
query : str

A string that’ll be searched. It’ll be split on spaces and then each word gets sequentially searched.

audio_basename : str, optional

Search only within the given audio_basename.

Default is None

case_sensitive : bool, optional

Default is False

subsequence : bool, optional

True if the exact word doesn’t need to be detected and larger strings that contain the given one are acceptable.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

supersequence : bool, optional

True if the exact word doesn’t need to be detected and smaller strings that are contained within the given one are acceptable.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

anagram : bool, optional

True if a complete permutation of the word is also acceptable. e.g. “abcde” would be acceptable for “edbac”.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

timing_error : None or float, optional

Sometimes other words (almost always very short ones) are detected between the words of the query. This parameter defines the timing tolerance of the search.

Default is 0.0 i.e. No timing error is tolerated.

missing_word_tolerance : int, optional

The number of words that may be missing from the result. For example, if the query is “Some random text” and the tolerance value is 1, then “Some text” would be a valid result. Note that the first and last words cannot be missed. Also, an error is raised if the value is more than the number of removable words; for the example above, any value more than 1 would raise an error (since there’s only one word, i.e. “random”, that can be missed).

Default is 0.

Yields:
{“File Name”: str, “Query”: query, “Result”: (float, float)}

The result of the search is returned as a tuple which is the value of the “Result” key. The first element of the tuple is the starting second of the query and the second element is the ending second.

Raises:
AssertionError

If the missing_word_tolerance value is more than the total number of words in the query minus 2 (since the first and the last word cannot be removed).
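Since each yielded item is a dictionary of the documented shape, results can be filtered on the fly. The generator below is a stand-in for a real search_gen call; its file names and seconds are invented sample data, not real output:

```python
# Stand-in for search_gen output; file names and seconds are invented.
def fake_search_gen():
    yield {"File Name": "a.wav", "Query": "hello", "Result": (1.0, 1.4)}
    yield {"File Name": "b.wav", "Query": "hello", "Result": (7.2, 7.3)}

# Keep only the matches that last longer than 0.2 seconds.
long_matches = [
    hit["File Name"]
    for hit in fake_search_gen()
    if hit["Result"][1] - hit["Result"][0] > 0.2
]
```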

search_all(self, queries, audio_basename=None, case_sensitive=False, subsequence=False, supersequence=False, timing_error=0.0, anagram=False, missing_word_tolerance=0)

Returns a dictionary of all results of all of the queries for all of the audio files. All the specified parameters work per query.

Parameters:
queries : [str] or str

A list of the strings that’ll be searched. If the type of queries is str, it’ll be inserted into a list within the body of the method.

audio_basename : str, optional

Search only within the given audio_basename.

Default is None.

case_sensitive : bool

Default is False

subsequence : bool, optional

True if the exact word doesn’t need to be detected and larger strings that contain the given one are acceptable.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

supersequence : bool, optional

True if the exact word doesn’t need to be detected and smaller strings that are contained within the given one are acceptable.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

anagram : bool, optional

True if a complete permutation of the word is also acceptable. e.g. “abcde” would be acceptable for “edbac”.

If the query is a sentence with multiple words, this applies to each word individually, not the whole sentence.

Default is False.

timing_error : None or float, optional

Sometimes other words (almost always very short ones) are detected between the words of the query. This parameter defines the timing tolerance of the search.

Default is 0.0 i.e. No timing error is tolerated.

missing_word_tolerance : int, optional

The number of words that may be missing from the result. For example, if the query is “Some random text” and the tolerance value is 1, then “Some text” would be a valid result. Note that the first and last words cannot be missed. Also, an error is raised if the value is more than the number of removable words; for the example above, any value more than 1 would raise an error (since there’s only one word, i.e. “random”, that can be missed).

Default is 0.

Returns:
search_results : {str: {str: [(float, float)]}}

A dictionary whose keys are the queries. Each value is a dictionary whose keys are all the audio files in which the query is present and whose values are lists of 2-tuples, where the first element of each tuple is the starting second of the query and the second element is the ending second. e.g. {“apple”: {“fruits.wav” : [(1.1, 1.12)]}}

Raises:
TypeError

If queries is neither a list nor a str
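The nested return value can be walked as follows, reusing the documented example output {“apple”: {“fruits.wav”: [(1.1, 1.12)]}} as stand-in data:

```python
# The documented example output of search_all, used as stand-in data.
search_results = {"apple": {"fruits.wav": [(1.1, 1.12)]}}

# Flatten into (query, audio_file, start, end) rows.
rows = [
    (query, audio_file, start, end)
    for query, per_file in search_results.items()
    for audio_file, intervals in per_file.items()
    for start, end in intervals
]
```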

search_regexp(self, pattern, audio_basename=None)

First joins the words of the word_blocks of timestamps with spaces, per audio_basename. Then matches pattern and calculates the indices of the word_blocks in which the first and last words of the matched result appear. Then presents the output like the search_all method.

Note that leading and trailing spaces of the matched results are removed while determining which word_block they belong to.
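The described procedure (join the words with spaces, match the pattern against the joined string, then map the match’s character span back onto word_block indices) can be sketched roughly as below. The word blocks are invented sample data and this is a simplified illustration, not the library’s actual implementation:

```python
import re

# Invented word blocks in the documented [word, start_sec, end_sec] shape.
word_blocks = [["an", 0.0, 0.2], ["apple", 0.3, 0.7], ["pie", 0.8, 1.1]]

def regexp_span(pattern, blocks):
    """Seconds spanned by the first regex match over the joined words."""
    words = [b[0] for b in blocks]
    joined = " ".join(words)
    match = re.search(pattern, joined)
    if match is None:
        return None
    # Strip leading/trailing spaces from the matched span, as described.
    text = match.group()
    start = match.start() + (len(text) - len(text.lstrip()))
    end = match.end() - (len(text) - len(text.rstrip()))
    # Character range occupied by each word inside the joined string.
    offsets, pos = [], 0
    for word in words:
        offsets.append((pos, pos + len(word)))
        pos += len(word) + 1                 # +1 for the joining space
    first = next(i for i, (lo, hi) in enumerate(offsets) if lo <= start < hi)
    last = next(i for i, (lo, hi) in enumerate(offsets) if lo < end <= hi)
    return (blocks[first][1], blocks[last][2])

span = regexp_span(r"apple\s+\w+", word_blocks)
```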

Parameters:
pattern : str

A regex pattern.

audio_basename : str, optional

Search only within the given audio_basename.

Default is None.

Returns:
search_results : {str: {str: [(float, float)]}}

A dictionary whose keys are the queries. Each value is a dictionary whose keys are all the audio files in which the query is present and whose values are lists of 2-tuples, where the first element of each tuple is the starting second of the query and the second element is the ending second. e.g. {“apple”: {“fruits.wav” : [(1.1, 1.12)]}}