Elements of Multimodal Design

Frederik Goossens, MBA
5 min read · Dec 18, 2020


Multimodal design is the art and science of creating a user interface that spans multiple touchpoints across different modalities. The goal is to produce an interface in which these modalities fit together organically, such as using voice as the input mechanism and a graphical user interface as the output. Rather than relying on a single modality, multimodal design excludes no type of input or output.

Input Versus Output

Let’s start with the basics: input and output modalities. Think of an input modality as any way of interacting with a system. This could be through your mouse, keyboard, or touchpad. However, there are many other options, such as speech, gestures, and physiological events. In summary, the user is providing information to a system, ideally in the most efficient and effortless way possible.

With a Voice User Interface (VUI), it's fairly straightforward from the user's perspective. The user wants something and simply asks for it; this request is commonly referred to as the intent. The user then expects the system to come back with a relevant response, either as an action or as information.
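To make that concrete, here is a minimal sketch in Python of how a voice request could be modelled: the utterance is resolved to an intent with a few slots, and the system replies with an action, information, or both. The names (Intent, handle_intent) and the intents themselves are purely illustrative and not taken from any particular VUI platform.

```python
# Minimal sketch of intent handling in a VUI (illustrative names only).
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str                                   # e.g. "play_music"
    slots: dict = field(default_factory=dict)   # e.g. {"genre": "jazz"}

def handle_intent(intent: Intent) -> dict:
    """Map a recognised intent to a multimodal response (action + speech + display)."""
    if intent.name == "play_music":
        return {"action": "start_playback",
                "speech": f"Playing {intent.slots.get('genre', 'your')} music."}
    if intent.name == "weather_report":
        return {"speech": "It is 18 degrees and sunny.", "display": "weather_card"}
    return {"speech": "Sorry, I didn't catch that."}

print(handle_intent(Intent("play_music", {"genre": "jazz"})))
```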

Photo by Nicolas Lafargue on Unsplash

This is nothing new, especially when you think about the Interactive Voice Response (IVR) systems that have been in use for years. However, conversation design is improving, and AI engines are becoming better at understanding the user through machine learning and Natural Language Processing (NLP).

On the other hand, the output modality could be a graphical user interface (GUI), audio playback, or even AI-generated speech.

A Simple Multimodal Design Example — Smart Living

There is a lot of input coming from the user, whether explicitly or implicitly, that influences his or her experience. A simple example is location. When the user comes home, the smart home system picks up that the owner has arrived and starts a sequence of automated actions, such as turning on the lights and the radio. This is the output.

Next, the user could either use a remote control or ask the AI-enabled assistant to adjust the air-conditioning or thermostat (input). Again, a fairly straightforward experience.

Photo by Dan LeFebvre on Unsplash

As designers, we should remember that we can make life easier by incorporating and automating these actions across different modalities.

In this scenario, we have the following input modalities:

  • The user’s location
  • The user’s voice
  • A physical control element

And the following output modalities could be a part of the experience:

  • Audio playback
  • Lighting
  • A Graphical User Interface

All in all, a pretty simple but relaxing homecoming experience.
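Here is a rough sketch of how that homecoming sequence could be wired together. All device names, triggers, and the event format are hypothetical; the point is simply that one implicit input (location) fans out to several output modalities, while explicit inputs (voice or a remote) adjust the state afterwards.

```python
# Hypothetical smart-home sketch: implicit and explicit inputs driving outputs.
def on_location_event(event: dict, home: dict) -> None:
    """Implicit input: the user's phone reports arrival at home."""
    if event.get("type") == "geofence" and event.get("state") == "home":
        home["lights"] = "on"         # output modality: lighting
        home["radio"] = "playing"     # output modality: audio playback

def on_user_command(command: dict, home: dict) -> None:
    """Explicit input: voice command or a physical remote control."""
    if command.get("intent") == "set_temperature":
        home["thermostat_c"] = command.get("value", 21)

home_state = {"lights": "off", "radio": "stopped", "thermostat_c": 19}
on_location_event({"type": "geofence", "state": "home"}, home_state)
on_user_command({"source": "voice", "intent": "set_temperature", "value": 22}, home_state)
print(home_state)  # {'lights': 'on', 'radio': 'playing', 'thermostat_c': 22}
```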

A Complex Multimodal Design Experience — Health Management

Coming home, driving a car, and commuting to work are typical examples where multimodal design is gaining ground. Yet many other industries could benefit from focusing more on designing multimodal experiences.

Nowadays, however, we're capable of capturing much more input from the user. We can measure stress levels, heart rate, sleep cycles, water intake, and the list goes on. APIs can be used to connect health-related data, and really any type of input a human can produce, with a system capable of synthesising that input.

Photo by Artur Łuczka on Unsplash

With this input available, a system could give the right advice and keep track of what you're doing. First Aid by Red Cross and Fitbit are examples of apps that help you keep an overview of what's happening with your body and mental state. A health app could warn the user through a vibration that his or her heart rate is increasing too quickly.

In this scenario, we have the following input modalities:

  • The user’s location
  • The user’s heart rate
  • External input: weather, pollution, …

And the following output modalities:

  • Speech (motivational prompts from the system)
  • A Graphical User Interface
  • Vibration or sound from the device

Even though the modalities in this example are fairly limited, there is far more complexity and data to be analysed in these scenarios. It becomes even more complex when a patient starts using home remedies and medication for which guidance may be missing or unclear.
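As a simplified illustration of the heart-rate warning described above, the sketch below flags a rapid rise and returns the output modalities to trigger. The threshold and function names are invented for illustration; a real health app would rely on personalised baselines and clinically validated rules.

```python
# Toy sketch: heart-rate samples (input) mapped to output modalities.
def check_heart_rate(samples_bpm: list[int], rise_threshold: int = 25) -> list[str]:
    """Return the output modalities to trigger if the heart rate rises too quickly."""
    if len(samples_bpm) < 2:
        return []
    rise = samples_bpm[-1] - samples_bpm[0]
    if rise > rise_threshold:
        # Combine output modalities: haptic feedback plus an on-screen notice.
        return ["vibrate", "show_notification"]
    return []

print(check_heart_rate([72, 80, 104]))  # ['vibrate', 'show_notification']
print(check_heart_rate([72, 75, 78]))   # []
```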

Conversation Design

Conversational interaction is arguably the most human way of interacting with a computer. We grew up communicating with other people, usually through spoken language.

Conversation design primarily focuses on text and speech interactions, but it should not be limited to text and speech alone. For example, a chatbot should be capable of showing dynamic graphics to provide the best possible response.
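As an illustration, a multimodal chatbot reply could bundle text, speech, and a dynamic graphic into a single response payload. The structure below is invented for this article and is not tied to any specific chatbot platform.

```python
# Illustrative multimodal chatbot reply: text, speech, and a dynamic graphic together.
def build_reply(question: str) -> dict:
    if "balance" in question.lower():
        return {
            "text": "Here is your spending for the last 30 days.",
            "speech": "You spent 840 euros in the last thirty days.",
            "graphic": {"type": "bar_chart", "series": [120, 90, 310, 320]},
        }
    return {"text": "Could you rephrase that?", "speech": "Could you rephrase that?"}

print(build_reply("What's my balance looking like?"))
```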

Nevertheless, even though most UX work to date has focused on visuals, the impact that conversation design will have should not be underestimated. Talking to a human, or to a capable AI-assisted agent, could help the user jump from point A to point B much more quickly, without having to understand a GUI or hunt for an answer through a system's information architecture. You simply ask.

Amazon’s latest Echo Show

Conversation design will pioneer a new way for humans to interact with systems. And, in an ideal world, you wouldn't even have to ask. Rather, the ideal user experience would be shaped by the input the user has actively and passively provided.

Prediction Based on History

So far, we’ve been discussing explicit input modalities, such as speech and keyboard input. However, equally important is analysing a user’s past behaviour.

In banking, we can identify which profiles carry a higher risk than others. This information isn't classified as a modality, but it is equally important in providing the right experience. It gives an idea of what's going on with a customer's financial situation and creates the opportunity to support customers and offer them the right services to help them reach their financial objectives.
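As a toy illustration, a few signals from a customer's history could feed a simple risk flag that shapes what the interface surfaces next. The rules and field names below are invented purely for illustration.

```python
# Toy sketch: past behaviour as an implicit signal shaping the experience.
def risk_profile(history: dict) -> str:
    """Derive a coarse risk flag from a customer's transaction history."""
    score = 0
    if history.get("overdrafts_last_year", 0) > 2:
        score += 2
    if history.get("savings_rate", 0.0) < 0.05:
        score += 1
    return "high" if score >= 2 else "low"

customer = {"overdrafts_last_year": 3, "savings_rate": 0.02}
if risk_profile(customer) == "high":
    # The "right experience" here: surface budgeting tools and a savings nudge.
    print("Show budgeting tools and a savings nudge in the app.")
```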

Even though this often plays more of an educational role, it fits the concept of multimodal design, where we use all possible channels to provide the right direction and experience.

What’s Next

Chatbots and VUIs will greatly impact the next generation of user experiences. And so will other related technologies. A user’s intent can be interpreted as well as fulfilled in many ways.

In the near future, systems will recognise the user’s intent without having to explicitly receive that intent from the user. Obviously, we will still maintain control over these systems. At least, let’s hope we will.

We offer multimodal product design and strategy to deliver experiences that your users will love.
