{"id":40974,"date":"2023-08-30T08:43:35","date_gmt":"2023-08-30T12:43:35","guid":{"rendered":"https:\/\/www.technewsday.com\/?p=40974"},"modified":"2023-08-31T09:20:16","modified_gmt":"2023-08-31T13:20:16","slug":"open-source-models-race-to-beat-gpt-4-on-coding-tasks","status":"publish","type":"post","link":"https:\/\/technewsday.com\/staging\/open-source-models-race-to-beat-gpt-4-on-coding-tasks\/","title":{"rendered":"Open source models race to beat GPT-4 on coding tasks"},"content":{"rendered":"<p data-ar-index=\"1\">Two open source models, WizardCoder 34B by Wizard LM and CodeLlama-34B by Phind, have been released in the last few days. Both models are based on Code Llama, a large language model (LLM) developed by Meta.<\/p>\n<p data-ar-index=\"2\">Wizard LM claims that WizardCoder 34B outperformed GPT-4, ChatGPT-3.5, and Claude-2 on HumanEval, a benchmark for evaluating the coding abilities of LLMs. However, it appears that Wizard LM compared WizardCoder 34B&#8217;s score with the HumanEval score of GPT-4&#8217;s March version rather than the August version, which scored 82%.<\/p>\n<p data-ar-index=\"3\">Phind also claims that its fine-tuned versions, CodeLlama-34B and CodeLlama-34B-Python, achieved pass rates of 67.6% and 69.5% on HumanEval, respectively. These pass rates are roughly on par with GPT-4&#8217;s.<\/p>\n<p data-ar-index=\"4\">The open source community is said to be obsessed with beating GPT-4, which is considered to be the ultimate benchmark for LLMs. Meta, for its part, is building task-specific models that aim to surpass GPT-4 on those tasks.<\/p>\n<p data-ar-index=\"5\">The HumanEval benchmark may not be a perfect measure of the coding abilities of LLMs. 
Skills such as code explanation, docstring generation, code infilling, answering Stack Overflow questions, and writing tests are not captured by HumanEval.<\/p>\n<p data-ar-index=\"6\">OpenAI, for its part, has not released any details about the training data or evaluation metrics used for GPT-4. This has led some to speculate that OpenAI is withholding trade secrets to maintain its lead in the LLM market.<\/p>\n<p data-ar-index=\"7\">The sources for this piece include an article in <a href=\"https:\/\/analyticsindiamag.com\/code-llamas-fight-over-gpt-4\/\" target=\"_blank\" rel=\"noopener\">AnalyticsIndiaMag<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Two open source models, WizardCoder 34B by Wizard LM and CodeLlama-34B by Phind, have been released in the last few days. Both models are based on Code Llama, a large language model (LLM) developed by Meta. Wizard LM claims that WizardCoder 34B outperformed GPT-4, ChatGPT-3.5, and Claude-2 on HumanEval, a benchmark for evaluating the coding 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[34],"tags":[577],"class_list":["post-40974","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","tag-development"],"acf":[],"_links":{"self":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/40974","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/comments?post=40974"}],"version-history":[{"count":2,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/40974\/revisions"}],"predecessor-version":[{"id":40976,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/posts\/40974\/revisions\/40976"}],"wp:attachment":[{"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/media?parent=40974"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/categories?post=40974"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technewsday.com\/staging\/wp-json\/wp\/v2\/tags?post=40974"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}