What is voice analytics?


In market research, consumer feedback is mostly collected through rating scales and written responses. In qualitative market research, this feedback is also predominantly analyzed using traditional content analysis.


In this traditional analysis, however, the consumer's tone of voice usually plays no role, or only a subordinate one.


That should change! The way consumers speak carries important information about their emotional state when they talk about brands, products, concepts, motivations for their behavior or, for example, product or packaging designs.


Voice analytics is an implicit method that analyzes the voice of speakers to provide an objective assessment of the affective state of consumers.


If we understand the emotional impact of a stimulus, we understand not only what consumers are saying, but also whether they really mean it.

The voice analytics technology we use is based on state-of-the-art machine learning that continuously learns acoustic patterns of emotional expression in speech. The diversity of the training data is an important factor in the evaluation of audio recordings: the training data covers speakers of different genders, numerous languages and dialects, and different ethnic groups. It comprises many thousands of hours of recordings and is continuously being expanded.


The accuracy of the analysis is continuously validated in series of tests against emotion labels assigned by a team of human experts.
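As an illustration of what such a validation can look like, the following minimal sketch compares model labels with expert labels for the same audio segments. The label names and the sample data are hypothetical, and the two metrics (plain accuracy and Cohen's kappa) are our choice for the example, not necessarily the ones used in the actual test series.

```python
from collections import Counter

def accuracy(predicted, expert):
    """Share of segments where the model label equals the expert label."""
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)

def cohens_kappa(predicted, expert):
    """Agreement corrected for the agreement expected by chance.

    Undefined (division by zero) if both raters use one single label throughout.
    """
    n = len(expert)
    observed = accuracy(predicted, expert)
    pred_counts = Counter(predicted)
    expert_counts = Counter(expert)
    labels = set(pred_counts) | set(expert_counts)
    expected = sum(pred_counts[l] * expert_counts[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels for six audio segments (illustrative data only).
expert_labels = ["joy", "neutral", "anger", "joy", "neutral", "joy"]
model_labels = ["joy", "neutral", "joy", "joy", "neutral", "anger"]

acc = accuracy(model_labels, expert_labels)
kappa = cohens_kappa(model_labels, expert_labels)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa is usually the more informative figure when one emotion class dominates the data, because a model that always predicts the most frequent class can still reach a high accuracy.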

How does voice analytics work?


Setup & Execution

We integrate voice recording into quantitative questionnaires for all relevant open-ended questions, and we also analyze recordings of focus groups and in-depth interviews. Recording spoken answers increases the motivation and willingness of the test subjects and leads to higher data quality.


For proper personal calibration, a practice recording is made of each subject prior to the actual interview.



With the help of the emotion algorithm, the affective state of speakers is analyzed based on the fundamental frequency of the voice, voice quality, loudness, spectral characteristics such as timbre, and speech rhythm.
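To make two of these acoustic features more tangible, here is a minimal sketch that estimates the fundamental frequency with a simple autocorrelation search and measures loudness as root-mean-square amplitude. It is a textbook illustration and not the emotion algorithm itself; the function names, the search range of 75 to 400 Hz, and the synthetic test tone are assumptions made for the example.

```python
import math

def rms_loudness(frame):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) via autocorrelation.

    The lag with the highest autocorrelation inside the assumed
    pitch range is taken as the period of the voice.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic 200 Hz tone standing in for a voiced speech frame (0.1 s).
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

f0 = estimate_f0(frame, SAMPLE_RATE)
loudness = rms_loudness(frame)
print(f"f0={f0:.1f} Hz, rms={loudness:.3f}")
```

Real speech is noisier than a pure tone, so production systems typically use more robust pitch trackers and many more features; the sketch only shows the kind of measurement involved.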


Here, the voice calibration defines each speaker's individual neutral emotional state, which serves as the reference point for the analysis.
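One common way to use such a neutral reference is to express every later measurement relative to the speaker's own baseline. The sketch below does this with a z-score on pitch values; the numbers are hypothetical, and the z-score is an assumption chosen for illustration rather than a description of the calibration actually implemented.

```python
from statistics import mean, stdev

def fit_baseline(calibration_values):
    """Summarize a speaker's neutral recording as (mean, standard deviation)."""
    return mean(calibration_values), stdev(calibration_values)

def relative_to_baseline(value, baseline):
    """Express a measurement in standard deviations from the neutral level."""
    base_mean, base_std = baseline
    return (value - base_mean) / base_std

# Hypothetical per-frame pitch values (Hz) from one calibration recording.
calibration_f0 = [118.0, 122.0, 120.0, 119.0, 121.0]
baseline = fit_baseline(calibration_f0)

# A later utterance at 126 Hz, judged against this speaker's own baseline.
shift = relative_to_baseline(126.0, baseline)
print(f"pitch shift: {shift:.2f} standard deviations above neutral")
```

The point of the per-speaker baseline is that the same absolute pitch can be neutral for one person and clearly raised for another.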


In addition, all recordings are automatically transcribed. The analysis of the transcript identifies whether the meaning of the spoken text is positive or negative and marks the utterances accordingly.



The audio data is then segmented using voice activity detection, segments with poor audio quality are excluded, and each remaining segment is tagged with the emotion class identified for it.
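The segmentation step can be pictured with a very simple energy-based voice activity detector, sketched below. The frame length, the energy threshold, and the synthetic signal are assumptions for the example; the detector in the actual software is not documented here and is likely more sophisticated, and the quality filter is left out.

```python
def detect_speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_s, end_s) spans whose frame energy exceeds the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    segments, start = [], None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        is_speech = energy > threshold
        t = i * frame_len / sample_rate
        if is_speech and start is None:
            start = t                      # speech begins
        elif not is_speech and start is not None:
            segments.append((start, t))    # speech ends
            start = None
    if start is not None:                  # signal ended while speaking
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments

# Synthetic signal: 0.5 s silence, 1.0 s "speech" (square wave), 0.5 s silence.
SAMPLE_RATE = 8000
silence = [0.0] * (SAMPLE_RATE // 2)
speech = [0.3 if (n // 20) % 2 == 0 else -0.3 for n in range(SAMPLE_RATE)]
signal = silence + speech + silence

segments = detect_speech_segments(signal, SAMPLE_RATE)
print(segments)
```

Each returned span could then be passed to the emotion classifier and stored together with its label.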


  • An emotional profile of each test person is created while they talk about a test stimulus. The stimulus can be a brand, a concept, a package design, or another form of communication.
  • In addition to the various emotions, their activation level and valence are also determined, i.e. how strongly the speaker is activated and whether the emotional tone is positive or negative. In this way, it can be measured whether a product triggers serenity or excitement, disinterest or enthusiasm, thoughtfulness or irritation, and to what extent.
  • The software automatically creates emotion-related audio summaries and organizes them in a database. This enables quick retrieval of all relevant recordings, e.g. moments of enthusiasm. The corresponding quotes can then be exported and processed further as audio or text files.
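The combination of valence, activation, and retrieval described above can be sketched as follows. The record layout, the value ranges, the cut-off points, and the mapping to four coarse labels are all assumptions made for illustration; the real software may use different scales and more emotion classes.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str
    start_s: float
    text: str
    valence: float     # assumed scale: -1 (negative) to +1 (positive)
    activation: float  # assumed scale: 0 (calm) to 1 (highly activated)

def describe(segment):
    """Map valence and activation to one of four coarse labels."""
    positive = segment.valence >= 0
    active = segment.activation >= 0.5
    if positive:
        return "enthusiasm" if active else "serenity"
    return "irritation" if active else "disinterest"

def find_moments(segments, label):
    """Return all stored segments that carry the requested label."""
    return [s for s in segments if describe(s) == label]

# Hypothetical records standing in for entries in the quote database.
segments = [
    Segment("P01", 12.4, "I love the new bottle shape.", 0.8, 0.9),
    Segment("P01", 31.0, "The label is fine, I guess.", 0.1, 0.2),
    Segment("P02", 8.7, "I can't get the cap open.", -0.7, 0.8),
]

enthusiastic = find_moments(segments, "enthusiasm")
quotes = [s.text for s in enthusiastic]
print(quotes)
```

Because each record keeps the speaker and start time, a matching quote can be traced back to the corresponding audio clip as well as to its transcript.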


Improve product or communication testing with emotional feedback! Our method analyzes the emotional state of your customers during the evaluation. This gives you comprehensive insight into the emotional user experience.