Sam Liang, chief executive and co-founder of artificial intelligence transcription start-up Otter.ai, has a plan to save us all from endless, boring meetings. His company is working on personalised AI avatars that may one day be able to attend online meetings on their owner’s behalf.
Founded in 2016 and based in Mountain View, California, Otter.ai has evolved from a simple voice-to-text transcription service to offer automatic recordings of live events, meeting summaries and content searches. Liang says he envisages Otter as a productivity tool that can improve attention and save everyone time. The company built its speech recognition and summary service in-house and uses third-party large language model partners to provide an AI chatbot.
AI Exchange
This spin-off from our popular Tech Exchange series of dialogues will examine the benefits, risks and ethics of using artificial intelligence, by talking to those at the centre of its development.
The start-up, which last raised $50mn in 2021, claims that it is approaching 20mn users but has not provided information on how many pay for its service. In 2022, it imposed new limits on free users, offering a maximum of 600 minutes of transcription per 30 days. Paying customers receive much more. Competition in the sector is growing. Big Tech companies, such as Google, offer their own audio transcription services. Google is also working on a project to create avatars in video conferencing.
Liang was born in China and moved to the US in 1991. He received a PhD from Stanford University before joining Google, where he led the search giant’s location services. His first start-up was acquired by Chinese ecommerce company Alibaba.
In this conversation with Elaine Moore, the FT’s tech comment editor, Liang describes access to audio data as a new way to break down the silos in any business.
Elaine Moore: Can we start by talking about your plan to create AI avatars for meetings? How’s that going to work? What kind of information would be required, and does it mean that eventually we won’t be needed for meetings at all?
Sam Liang: The first step is to gather a large amount of data from the user. The data can come in many different forms . . . the most important is meeting data.
(Take) the meetings I have had in the last seven years. I talked to venture capitalists; I talked to customers; I obviously have tons of internal meetings with our own teams: sales team, marketing team, recruiting team, engineering team. So that’s an enormous amount of data we can use. We want to use some other data as well. For me, we could share Google documents I wrote, or other memos, some of the emails, some of the Slack messages.
The more you learn about the user, the better the avatar will be. Then we feed all this into the training system and build a model that emulates them.
Of course, we need to test this and evaluate this system, so we have asked our colleagues to test drive the avatar. They can ask it various questions, or we just send the avatar to a regular meeting and see how it performs. We have our prototype we’re testing. It is far from perfect, so there’s still a long way to go. But it’s very promising.
EM: Is the idea that avatars will be able to speak as well as record what’s happening?
SL: Oh, yeah, absolutely, absolutely. The simplest type of meeting is a one-on-one meeting. So we can start with that. Another one we’re working on is what we call a sales agent. We train a sales agent that can talk to a customer, explain the product and answer customers’ questions. That’s another form. An avatar tries to emulate a specific person, but an agent can either emulate an individual or use the knowledge of multiple people collectively.
EM: You’ve said in the past you can envisage a world in which anyone is recording their entire day-to-day existence on Otter. Were you serious about that?
SL: Longer term, it’s a goal. Short term, we’re focusing on business and meetings. But we see that (a) valuable conversation can happen at any time: it can happen in a hallway when you meet someone; it can happen at Starbucks.
I find that a lot of valuable (data in) conversations are missed. I’d like to have Otter be present at any time and capture everything. So, although, again, we’re focusing on the business use cases, this can be used in personal life as well.
Actually, I’m using Otter when I’m having a conversation with my sons. We are empty nesters: one of my boys is in college; the other one is working in New York City. It’s really hard to get hold of them now. I have to beg them to have a call with me! So, whenever I have a call, I see that as very precious and I use Otter to capture it.
EM: Are you using Otter as a memory device to help you search for things said in past meetings? Or for something else?
SL: It’s mostly memory. We created Otter AI chat. So, I can use Otter AI chat to query all my past meetings. Actually, you and I had a conversation on August 15 and, in order to prepare for this meeting, I reviewed our call to refresh my memory.
That was a meeting I was a part of. But, in our company, there are hundreds of meetings every week. Obviously, I cannot go to every one, but there’s a lot of valuable information that I would like to have. So I use Otter AI chat to query our company meeting database.
One good example is the calls our sales team have with our customers. I query the sales meetings every week to understand better what our customers are looking for, what their pain points are, what their problems are, and what their workflows are.
EM: On the subject of being able to see notes from meetings you didn’t take part in, there was a report that one Otter user accidentally received a transcript of a conversation that took place after he’d left a meeting. How do you think about user data security and privacy?
SL: We definitely take security very seriously. We totally understand that voice conversations are extremely sensitive and security is paramount, so we provide a lot of measures to protect user privacy. All the data is encrypted and we have a strict access control system. This system is actually not much different to Google Docs: the user controls who has access. If you accidentally share it with people you didn’t want to share it with, you can always remove their access. And there are different levels.
The incident you mentioned, I wouldn’t say it’s AI specific. It’s actually a hot mic situation that can happen to anyone. In this particular situation, as far as we know, after the meeting had finished some participants dropped off but other participants continued to talk without being aware that (the meeting) was still being captured on Otter and the notes were being shared with all the participants. So that’s how it happened.
In the sharing mechanism, we warn the user in advance that ‘Hey, this note is being shared. So only discuss things you’re willing to share.’
We’ll definitely improve the product to make it more prominent and more intuitive. But the user does have to take some responsibility to use the tool appropriately.
EM: You worked at Google in the early 2000s and I read that you were the designer of the blue dot that shows where we are on Google Maps. Is that where you got the idea to create a company that could organise recorded information? Because, in Google Search, it’s still quite hard to search for information in audio or video clips.
SL: I worked on the Google Maps and location platform for four years between 2006 and 2010. I left Google in 2010 to build a start-up in Palo Alto that would track mobile location and then analyse the data to provide personalised mobile services. After we sold that company, I realised that voice data is very similar, in the sense that nearly all of voice data has never been captured.
I forget a lot of things and it’s really hard to search and recall information that has been heard. So we decided to work on this problem, to gather as much audio data as possible, and help people to solve their memory problem.
It is a sharing problem. If you think about enterprise, so many meetings are happening in each department but many of the meetings aren’t shared with people in other departments. So that creates a lot of information silos that make the enterprise less efficient and less productive.
EM: Otter was founded in 2016. What’s the fundraising environment like right now? How does it compare to a few years ago?
SL: We raised our last round in 2021 . . . it’s been more than three and a half years. We have been super efficient. Because our users grow organically, we didn’t have to spend too much money acquiring more. And revenue is growing very rapidly.
So we haven’t had the urgency to raise a new round. But we’re seeing that the venture capital community is getting more active now, especially after the Federal Reserve cut the interest rate. I see the sentiment is much more enthusiastic. You saw that with OpenAI doing a new round valuing them at more than $150bn.
There are a lot of other start-ups getting a lot of new funding. Many of them are really good AI companies. But the market is a little bit frothy at this moment.
It somewhat resembles the internet bubble era. Many of those companies will die and only those that have core AI technologies, that build a unique business model, can survive. Many young start-ups don’t have their own core AI technologies. They just call some third-party APIs (application programming interfaces) and build a very thin wrapper over them. Unless they build some strong user or data model, they can easily be replicated by other companies.
We build our own speech recognition technology. We build a lot of proprietary AI technologies. And, you know, we have processed over a billion meetings, so we have a tremendous amount of meeting data that can help fine-tune and enhance the AI models we build.
So we have built an AI flywheel that we can leverage to continue to grow rapidly. For AI start-ups to survive or thrive, they have to build their own AI system, and they have to have huge amounts of data they can leverage.
EM: Are you concerned about the competition?
SL: There are already a lot of competitors. Obviously, we see competition from two directions. One is Big Tech: Microsoft, Zoom, Google and others. They control the video conferencing system. However, against them, we have a lot of advantages. We are much more nimble. We’re much more agile . . . we’re platform agnostic. We don’t only support one video conference (platform), we support all of them. And we also have a very strong mobile app that people use for in-person meetings. None of the Big Tech (companies) actually focus on mobile, in-person meetings.
And the other direction is, of course, there are a lot of other small start-ups. There are at least a dozen meeting assistant start-ups out there. But none of them is as big as us. (We) have a much bigger user base and a much bigger data set than all the other start-ups.
Of course, new start-ups are being born every day. We are watching the market and seeing what other start-ups are doing. We just have to move super fast.
EM: How do you think you can preserve your niche?
SL: Our price is very competitive. But that’s not the most important (thing). The most important is product quality, product features and the user experience.
(Take) Google for example: they have an enormous amount of money. They have 100 times more of certain kinds of engineers than us. But, if you look at Google in the last few years, there’s no new interesting product coming out. They just (don’t have) the right product mindset. So that is why we’re not afraid of Big Tech. Our product is much more user friendly . . . the AI chat we’re providing allows you to query all the meetings in your system. We still haven’t seen that from Google, Microsoft or Zoom, so we’re way ahead of them already.
In terms of pricing . . . many other start-ups that don’t own their own AI model, (and) that have to call third-party APIs to do speech recognition and other AI algorithms, have to pay a much higher price to use that API. That really hurts their profit margin. So for us, we do have an advantage because we own a lot of models ourselves, and can keep our price low.
4hrs: Average time saving per week claimed by users of Otter.ai
EM: Are you focused on enterprise customers right now? Or is the focus on expanding the total number of users?
SL: We support both. We have our freemium model that allows individual users to use Otter on their own. Most of those users are professional workers. And we leverage this huge user base to get into enterprises. This bottom-up system is very similar to that of other successful SaaS (software-as-a-service) companies, like Dropbox or Slack. They have a lot of organic users who penetrated large enterprises. Then, later, they use that user base to aggregate them and create enterprise contracts.
EM: You had a very rapid increase in users during the pandemic. Has the pace of growth slowed since then?
SL: It continued to grow rapidly, especially this year. Actually, it’s a little bit slow in the summer, when people are taking vacations. But it seems that in late August and September, so far, we’ve seen record growth. It’s both user growth and revenue growth. So, awareness of AI and overall AI adoption is getting stronger and stronger. More people are realising AI can really help them.
EM: Finally, what do you say to potential business customers who might be concerned about hallucinations or accuracy when it comes to using a transcription AI service for sensitive meetings?
SL: We can build our model and manage our model parameters to minimise hallucinations. It happens less and less often now. Of course, people do have to double-check important numbers and important facts themselves. But the pros definitely outweigh the cons.
We recently did a survey of more than 600 professional users of Otter. They say they save four hours every week, on average. So people can use those four hours to relax and perhaps have more family time. Or to do a lot more work. I think that’s more valuable and, perhaps, they can tolerate a little bit of hallucination.