Customised large-vocabulary speech recognition

Highly accurate speech recognition is one of the most exciting results of the Deep-Learning revolution. Until recently, such accuracy could only be achieved within severely restricted domains, by limiting the ‘vocabulary’ of words to be recognized. Through the magic of Deep Learning, we can now recognize speech in virtually unrestricted domains with sufficient accuracy to support commercial applications.

At Saigen, we develop such large-vocabulary recognizers that are optimized for our customers’ specific needs. In a call centre, for example, the requirement may be for highly accurate recognition of certain key elements (related to legal compliance, for example) while maintaining good coverage of a broad domain of other topics that may occur in a telephone conversation. Or a media monitoring company may need adaptable keyword recognition in several languages, for which Deep-Learning based speech recognition is not available.

Using the most advanced open-source software in conjunction with our own tools, resources and expertise, we work with customers to ensure that speech recognition meets their business needs, with cost efficiency and user-friendliness as important characteristics.


Saigen is part of the Alphawave group, which employs 165+ people, 100+ of whom are engineers.

We develop customised large-vocabulary speech-recognition systems for commercial applications. In addition to being well-published academics, we have collaborated with international partners to solve interesting and challenging problems. We were part of the consortium that won the recent IARPA-sponsored spoken term detection Babel program, we worked with Google on Voice Search for the South African languages, and we were the first to build speech recognition systems in all 11 of South Africa’s official languages.

Our Team


Etienne Barnard

CEO & Founder

Etienne obtained his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University in 1989, writing his thesis on “Neural Networks for Scene Analysis”. Since then, he has been active in research and development in pattern recognition and speech processing. He has held a number of academic positions, and has also worked in industry. Etienne has co-authored more than 250 refereed scientific publications, on topics including pattern recognition, neural networks, speech recognition and human-computer interaction. He holds a number of international patents in speech processing and is a past Associate Editor of the IEEE Transactions on Neural Networks. Etienne’s research contributions have been recognized in various ways, including Google Research Awards in 2009, 2011 and 2014.


Charl van Heerden

CTO & Founder

Charl completed his B.Eng at the University of Pretoria, and started working with Etienne (then his M.Eng supervisor) at the CSIR in 2005. He interned with Google in 2007 (language modeling for GOOG-411), 2008 (language modeling for Voice Search) and 2010 (Voice Search for Afrikaans, English and isiZulu), before returning to South Africa to complete his Ph.D. in Computer Engineering (thesis: “Efficient training of Support Vector Machines and their hyperparameters”). He has co-authored 40+ peer-reviewed papers on speech recognition and general pattern recognition. More recently, he was part of the NWU MuST team, a member of the consortium that won the international IARPA-sponsored Babel competition.

Application examples

Speech analytics in the call centre

In a large call centre, thousands of hours of speech are recorded each day. That speech is a potential treasure trove of information on topics such as the following:

  • Are operators complying with the legal and other requirements of their respective tasks?
  • Are there identifiable operator behaviours that correlate with successful call outcomes?
  • Are customers raising common issues that are not known within the rest of the business?

In many call centres such issues are partially addressed by quality-control staff who listen to a small sample of the recorded calls. However, such QC is both expensive and limited in scope, since it is difficult for each QC auditor to keep track of the statistics of subtle patterns that occur in highly variable telephone conversations.

We therefore offer speech analytics based on speech recognition that is optimized for the dialogues that occur in a particular call centre. This solution, which can be cloud-based or deployed on-premise, is surprisingly cost effective and can discover patterns in both customer and operator speech turns that have significant business impact.
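To make the idea concrete, here is a minimal sketch of one analytics step applied to recognizer output: checking whether operators said the phrases a compliance policy requires. It is an illustration only, not a description of Saigen's product. The phrase list, the `(speaker, text)` transcript layout and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical compliance phrases; a real call centre would define its own.
REQUIRED_PHRASES = {
    "recording_notice": "this call is recorded",
    "identity_check": "confirm your id number",
}


def normalise(text):
    """Lower-case the text and drop punctuation so matching is tolerant."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def missing_phrases(turns, required=REQUIRED_PHRASES):
    """Return sorted labels of required phrases absent from operator turns.

    `turns` is a list of (speaker, text) pairs, an assumed layout for a
    transcript with speaker labels.
    """
    operator_text = " ".join(
        normalise(text) for speaker, text in turns if speaker == "operator"
    )
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = f" {operator_text} "
    return sorted(
        label
        for label, phrase in required.items()
        if f" {normalise(phrase)} " not in padded
    )


def compliance_summary(calls):
    """Count, per required phrase, how many calls are missing it."""
    counts = Counter()
    for turns in calls.values():
        counts.update(missing_phrases(turns))
    return dict(counts)


# Invented example transcripts, for illustration only.
calls = {
    "call-001": [
        ("operator", "Good morning, this call is recorded."),
        ("customer", "Okay."),
        ("operator", "Please confirm your ID number."),
    ],
    "call-002": [
        ("operator", "Hello, how can I help?"),
        ("customer", "This call is recorded, right?"),
    ],
}

summary = compliance_summary(calls)
```

Exact phrase matching is the simplest possible choice here. Because real transcripts contain recognition errors and paraphrases, a practical system would likely need fuzzier matching than this sketch provides.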

Monitoring web-based and broadcast media

Podcasts, radio and TV broadcasts, user-generated content in social media … there is a massive universe of spoken content that is available to the public but hard to utilize for business purposes. However, by tailoring a speech-recognition platform to transcribe such content, it is possible to develop insights into public perceptions and media coverage on a large scale.

Our ability to monitor spoken media content in a variety of languages is useful for purposes such as verifying advertisement transmissions, analyzing editorial content and understanding social trends.


Speech technology in the era of Big Data

Public interest in speech technology increased dramatically when Google Duplex was recently demonstrated, eliciting a wide range of reactions. Most observers were stunned at the apparent sophistication and power of the application, which can book appointments over the telephone by “conducting natural conversations” (to quote the Google blog that accompanied the public release of …