A comparison of cloud-based speech recognition engines

Andrey L. Herchonvicz; Cristiano R. Franco; Marcio G. Jasinski

Publication date: 29/05/2019

Human-machine interaction is part of our daily routines and has become increasingly natural. Devices can record a person’s speech, transcribe it into text, and execute tasks accordingly. This kind of interaction increases productivity in several operations, since it frees the user's hands through a more natural interface. To be useful, speech recognition engines must also be reliable and fast. However, the maturity of speech recognition systems varies across providers and, most importantly, across languages. For instance, Brazilian Portuguese makes frequent use of foreign terms, especially in corporate environments.
In this paper, we conducted an experiment to evaluate three speech recognition engines with respect to accuracy and performance: Bing Speech API, Google Cloud Speech, and IBM Watson Speech to Text. To obtain the accuracy value, we used a well-known string similarity algorithm. The results showed a high level of accuracy for Google Cloud Speech and Bing Speech API. However, the best accuracy, provided by Google's service, came at a cost in performance: it required additional time to return the speech-to-text transcription.
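
As an illustration of how a string similarity score can serve as an accuracy value, the sketch below compares a reference sentence with an engine's transcription. The abstract does not name the algorithm used, so the normalized Levenshtein (edit) distance here is an assumption, and the function names `levenshtein` and `similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(reference: str, hypothesis: str) -> float:
    """Score in [0, 1]; 1.0 means the transcription matches the reference.

    Assumption: case and repeated whitespace are ignored before comparing.
    """
    ref = " ".join(reference.lower().split())
    hyp = " ".join(hypothesis.lower().split())
    longest = max(len(ref), len(hyp))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / longest
```

A score computed this way penalizes every wrong, missing, or extra character, so a mistranscribed foreign term lowers the accuracy value in proportion to how far the output is from the expected text.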

Anais do Computer on the Beach

Computer on the Beach is a technical-scientific event that aims to bring together professionals, researchers, and academics in the field of Computing to discuss research and market trends in computing across its many areas.
