OpenAI’s Sora: The devil is in the ‘details of the data’

Mar 14, 2024 | Technology

For OpenAI CTO Mira Murati, an exclusive Wall Street Journal interview with personal tech columnist Joanna Stern yesterday seemed like a slam-dunk. The clips of OpenAI’s Sora text-to-video model, which was shown off in a demo last month and which Murati said could be publicly available in a few months, were “good enough to freak us out” but also adorable or benign enough to make us smile. That bull in a china shop that didn’t break anything! Awww.

But the interview hit the rim and bounced wildly at about 4:24, when Stern asked Murati what data was used to train Sora. Murati’s answer: “We used publicly available and licensed data.” But while she later confirmed that OpenAI used Shutterstock content (as part of the two companies’ six-year training data agreement announced in July 2023), she struggled with Stern’s pointed questions about whether Sora was trained on YouTube, Facebook or Instagram videos.

‘I’m not going to go into the details of the data’

When asked about YouTube, Murati scrunched up her face and said “I’m actually not sure about that.” As for Facebook and Instagram? She rambled at first, saying that if the videos were publicly available, there “might be” but she was “not sure, not confident,” about it, finally shutting it down by saying “I’m just not going to go into the details of the data that was used — but it was publicly available or licensed data.”

I’m pretty sure many public relations folks did not consider the interview to be a PR masterpiece. And there was no chance that Murati would have provided details anyway — not with the copyright-related lawsuits, including the biggest filed by the New York Times, facing OpenAI right now. But whether or not you believe OpenAI used YouTube videos to train Sora (keep in mind, …
