
Video Editing App Captions Just Raised $25 Million To Bring AI To Creators


The video editing software, designed for “talking videos,” has been used by 3 million creators to date and stands at a $250 million valuation.


In a short video, a man explains how to make fajitas in an air fryer. As the video plays in the AI-based video editing app Captions, subtitles are automatically generated in a bold font. Gaurav Misra, CEO and cofounder of Captions, then shows how the app’s translation tool can be used to dub the entire video into another language, Hindi. Through a series of taps and toggles, he demos features that can automatically adjust the audio volume and background color, remove certain words and add transitions.

This demo, says Misra, highlights how his company makes it easier for video creators to reach a wider audience. In addition, the video-editing startup announced Thursday that it has raised $25 million in a Series B round led by Silicon Valley VC firm Kleiner Perkins with participation from Sequoia Capital, Andreessen Horowitz and SV Angel. The fresh injection of cash brings the startup's valuation to $250 million and total funds raised to $40 million. Kleiner Perkins has been “bullish” on the video communication space, says Everett Randle, a partner at the firm, which has previously led funding rounds in AI video startup Synthesia and video recording platform Loom. “Gaurav came to us with what he thought was a fair valuation for the business incorporating its traction, profitability, and vision while maintaining lots of upside for investors, and we agreed with him,” Randle says.

Captions has its origins in Misra’s time leading the design engineering team at Snap Inc. from 2016 to 2021. During this time, he witnessed an evolution of social media videos, from TikTok-style dance videos to Instagram Reels to YouTube Shorts. He also saw the rise of a new category, “talking videos,” in which creators address the camera directly. In 2020, Misra left Snap and joined his former colleague Dwight Churchill, who left Goldman Sachs, to cofound Captions.

Since then, about 3 million creators have used the app to automatically caption and edit videos, in categories as varied as golf, real estate and aviation, Misra says. The app has about 100,000 daily active users, and about a million videos are created on the platform each month.

It is not, however, alone in the market. The New York-based startup is up against more established companies like ByteDance-owned editing app CapCut, which reportedly reached 200 million active users, and Adobe, which has rolled out its own generative AI features under the umbrella of Firefly. Other AI-based video and audio editing startups like Descript have emerged in recent years, garnering millions of dollars in funding from VCs.

Misra says Captions’ approach to video editing software is different because its tools are designed specifically for editing talking videos. “Most video production editing is focused more on aesthetics like filters and colors, whereas our focus became more about conveying an idea or experience,” Misra told Forbes.

For $10 a month, the app offers a cluster of generative AI-based features spanning the different stages of video production, such as recording, editing and distribution. While most features are built on open source models, some are built by Captions’ team of 16, Misra says. The app’s AI script writer feature lets creators use ChatGPT to write a script for their video and OpenAI’s speech-to-text tool Whisper to caption their audio. It offers an in-house voice cloning tool trained on licensed audio recordings to translate users’ audio into 28 other languages or use an AI voiceover to narrate the content from scratch. To reduce the chances of misuse, creators can only change the language of the audio rather than insert or create a new audio recording for an imported video, Misra says, acknowledging the risk that users could use the software to create deepfakes.

Other features let users automatically zoom in and out, detect and remove filler words and offensive words, and adjust the sound level of a video's background audio. Captions also uses an AI eye correction tool, originally developed by Nvidia for potential application in Zoom, to adjust users' eyes so that they appear to be looking at the camera.

With the new infusion of capital, the startup plans to expand its team and develop existing features like its AI music feature, which creates background instrumental music by automatically rearranging pre-recorded musical instruments. Adding more features, says Misra, will make it even easier for content creators to compete on a level playing field against better-resourced competitors.

“Our goal is to bring these technologies to everyday people,” Misra says. “Half the battle is the technology.”
