Open source models race to beat GPT-4 on coding tasks

August 30, 2023

Two open source models, WizardCoder 34B by Wizard LM and a fine-tuned CodeLlama-34B by Phind, have been released in the last few days. Both models are based on Code Llama, a large language model (LLM) developed by Meta.

Wizard LM claims that WizardCoder 34B outperformed GPT-4, ChatGPT-3.5, and Claude-2 on HumanEval, a benchmark for evaluating the coding abilities of LLMs. However, it appears that Wizard LM compared WizardCoder 34B’s score to the HumanEval score of GPT-4’s March version rather than the August version, which scored 82%.

Phind also claims that its fine-tuned versions of CodeLlama-34B and CodeLlama-34B-Python achieved pass rates of 67.6% and 69.5% on HumanEval, respectively. These numbers are close to the score reported for GPT-4’s March version, though well short of the 82% attributed to the August version.
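
For readers unfamiliar with how a HumanEval pass rate is derived, the following is a minimal sketch, assuming the figures above are pass@1 scores (the usual convention for this benchmark). It uses the standard unbiased pass@k estimator, computed per problem and averaged over the benchmark. The function names and the toy results are hypothetical and are not taken from Phind's or Wizard LM's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: number of attempts the metric allows
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_rate(results, k=1):
    """Average pass@k over all problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical toy data: four problems, one sample each, three solved.
toy_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
rate = benchmark_pass_rate(toy_results, k=1)
print(f"{rate:.1%}")  # prints 75.0%
```

With one sample per problem and k=1, the score reduces to the fraction of problems solved. On HumanEval's 164 problems, each additional solved problem moves the score by about 0.6 percentage points.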

The open source community is said to be obsessed with beating GPT-4, which is widely treated as the yardstick for LLMs. Meta, for its part, is building models for specific tasks and trying to surpass GPT-4 on those tasks.

The HumanEval benchmark may not be a perfect measure of the coding abilities of LLMs. It does not capture skills such as code explanation, docstring generation, code infilling, answering Stack Overflow questions, and writing tests.

OpenAI, for its part, has not released any details about the training data or evaluation metrics used for GPT-4. This has led some to speculate that OpenAI is withholding these details as trade secrets in order to maintain its lead in the LLM market.

The sources for this piece include an article in AnalyticsIndiaMag.

Jim Love

Jim is an author and podcast host with over 40 years in technology.
