Modality Switching Online


I hate it when my dad leaves me a voicemail. Whenever I open my phone and see the pending voicemail, I roll my eyes. He tends to meander. My dad’s messages can range from 40 seconds to 2 minutes. He typically wants to inform me of something, like an upcoming family event or an important-looking envelope addressed to me. But the voicemail is typically filled with uhm’s and uhh’s, distractions, and moments at a loss for words… as I impatiently wait for the key information! I try to scan through the voicemail transcript, but transcription is awful on the iPhone. Why doesn’t he just text me? Why doesn’t he understand that text is much easier to read, search, and reformat?

Okay, let me turn the tables and put myself in my dad’s shoes. My dad calls me because real-time communication is ideal, which is totally fair. But when I don’t pick up, he leaves a voicemail so he doesn’t have to switch to a second sense or skill, from speaking to typing. To my dad, it’s the most efficient modality. Why should he switch channels?

Because we have different preferences as a sender and a receiver, my dad and I face the modality problem. If he leaves a voicemail, he wins on convenience, while I lose structure and searchability. If he texts, the opposite happens. This mismatch in preferred modality shows up in the workplace, too. Receivers are frustrated with senders who prefer Zoom calls over emails (the classic “this could’ve been an email” meme). Conversely, senders are frustrated by receivers who don’t offer quality, up-to-date video content.

This mismatch feels like a natural law of internet communication. However, Loom, a video recording tool for teams, challenged me on this. If you know Loom, you know it as an easy way to record your screen and camera, edit videos, and share them. If you are reporting a bug or building a training series, Loom is your go-to. Over the past year, Loom has released a slew of AI features; this new one really takes the cake:

Loom’s new AI workflows let you turn your video into any document format. With a couple of clicks, Loom takes a transcript and generates an SOP, bug report, documentation, or any template created by your company. With AI, the video-preferred sender and text-preferred receiver can both win.

Here’s a common example: you’re working on your company’s product, and the screen goes blank after clicking a button. How do you report this? Since an interaction caused the bug, a video recording is best. You make a 30-second video and spend another 90 seconds finding the bug intake form and uploading the video. But the form has more required fields: 1) what type of issue is this, 2) explain the issue, 3) provide the steps to re-create the issue. What was once a two-minute task now becomes five minutes of paperwork! Still, this documentation is critical for developers to triage the issue. Your company’s Head of Product wonders whether senders have stopped sharing bugs due to time constraints and whether receivers are failing to solve bugs due to poor documentation.

Loom was recently acquired by Atlassian, the creator of the largest issue tracking tool for developers. If anyone is motivated to improve the quality and timeliness of data in tracking tools, it’s Atlassian. This acquisition exemplifies how tech companies will spend serious money to solve the modality problem over the next few years. Atlassian said the same in its announcement post:

By integrating Atlassian’s and Loom’s investments in AI, customers will be able to seamlessly transition between video, transcripts, summaries, documents, and the workflows derived from them.

At Kibu, a tool for taking and reviewing notes, the modality problem is top of mind. Those who take notes (the senders) are typically service workers who spend a lot of time on their feet, constantly addressing the needs of individuals with special needs. They don’t have much time for detailed forms. Conversely, those who review the notes (the receivers) are administrators who must report daily, high-quality notes to government entities for essential funding. I wonder how Kibu’s product can combine the convenience of casual note-taking with the necessity of formal progress reviews. How would senders like taking notes? Talking into a microphone? Taking a handwritten note? Posting on a Facebook-like interface? I’m not sure, but as a tech company, we have an immense opportunity to use automation, AI, and UX to solve the modality problem for our customers.