{"id":44222,"date":"2024-02-19T20:36:45","date_gmt":"2024-02-20T01:36:45","guid":{"rendered":"https:\/\/www.technewsday.com\/?p=44222"},"modified":"2024-02-19T20:36:45","modified_gmt":"2024-02-20T01:36:45","slug":"openai-sora-launch-leads-to-industry-debate","status":"publish","type":"post","link":"https:\/\/technewsday.com\/staging\/openai-sora-launch-leads-to-industry-debate\/","title":{"rendered":"OpenAI Sora launch leads to industry debate"},"content":{"rendered":"<p>OpenAI&#8217;s introduction of Sora, its first video-generation model, lauched last week with a a series of one minute text-to-video samples that were generally regarded as simply astonishing. Not only were they naturalistic, they didn&#8217;t have any of the flaws that have limited even the best video production done to date using AI.<\/p>\n<p>Despite the public acclaim, the underly architecture and appraoch has sparked a significant debate among AI experts and researchers, particularly from competing companies like Meta and Google. The critique centers around Sora&#8217;s understanding of physical laws and its comparison with other AI models designed for video synthesis and analysis. Here are the key points from the discussion:<\/p>\n<p>Competitors have critiqued Sora for its perceived lack of understanding of the physical world. Yann LeCun of Meta emphasized that generating realistic-looking videos does not equate to understanding physical reality, highlighting the distinction between generation and causal prediction.<\/p>\n<p>The debate also contrasts Sora with Meta&#8217;s V-JEPA (Video Joint Embedding Predictive Architecture), which focuses on analyzing interactions between objects in videos. 
This comparison aims to showcase V-JEPA&#8217;s superiority in making predictions based on object interactions over Sora&#8217;s generative approach.<\/p>\n<p>Elon Musk and other experts have expressed skepticism about Sora&#8217;s ability to predict accurate physics, suggesting that models like Tesla\u2019s video-generation capabilities might be more advanced in this regard.<\/p>\n<p>Despite the criticism, OpenAI and researchers like NVIDIA&#8217;s Jim Fan defend Sora&#8217;s approach, arguing that the model learns an implicit physics engine through extensive video data analysis. This approach is likened to a data-driven physics engine or learnable simulator, challenging the reductionist critique that the model merely manipulates pixels without understanding physics.<\/p>\n<p>OpenAI acknowledges Sora&#8217;s limitations in accurately simulating complex physical interactions and spatial details. However, the model is seen as a significant step towards more advanced video generation capabilities, likened to the &#8220;GPT-3 moment&#8221; for video. The acquisition of Global Illumination and the release of Sora highlight the potential to revolutionize video generation and simulation-model platforms, with promising implications for the video game industry and beyond.<\/p>\n<p>This debate underscores the complex challenges in developing AI models that not only generate realistic content but also grasp the underlying physical principles, marking a critical juncture in the evolution of generative AI and its applications.<\/p>\n<p>Sources include: Analytics India<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI introduced Sora, its first video-generation model, last week with a series of one-minute text-to-video samples that were generally regarded as simply astonishing. Not only were they naturalistic, they showed none of the flaws that have limited even the best AI video production to date. 
Despite the public [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":44223,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[34,21,9,215],"tags":[772,1300,275],"class_list":["post-44222","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-emerging-tech","category-todays-news","category-top-stories","tag-openai","tag-sora","tag-top-story"],"acf":[],"_links":{"self":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/44222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/comments?post=44222"}],"version-history":[{"count":1,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/44222\/revisions"}],"predecessor-version":[{"id":44224,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/44222\/revisions\/44224"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/media\/44223"}],"wp:attachment":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/media?parent=44222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/categories?post=44222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/tags?post=44222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}