Hugging Face partners with ServiceNow on StarCoder and StarCoderBase

May 15, 2023

Hugging Face and ServiceNow’s BigCode partnership is making great progress in developing large language models (LLMs) for code with the release of StarCoder and StarCoderBase, which were built with an emphasis on ethical principles.

StarCoder and StarCoderBase were trained on permissively licensed data from GitHub, covering more than 80 programming languages as well as Git commits, GitHub issues, and Jupyter notebooks.

StarCoderBase was trained on 1 trillion tokens, and StarCoder is a version of StarCoderBase further fine-tuned on Python code. Both models have an 8,192-token context window. StarCoder generates realistic code and works across a variety of programming languages. It is distributed under the OpenRAIL-M license, which places use-based restrictions on how the model may be used and modified. Like other LLMs, StarCoder can generate inaccurate or biased output, and it is critical to recognize these limitations and work toward overcoming them.

StarCoderBase outperforms other open Code LLMs on numerous prominent programming benchmarks and matches or exceeds closed models such as OpenAI’s code-cushman-001. Its context length of more than 8,000 tokens lets it process more input than any other open LLM currently available.

The researchers also released the model weights, including intermediate checkpoints, under an OpenRAIL license, while all training and preprocessing code is released under the Apache 2.0 license. Additional materials made available include a comprehensive framework for evaluating code models, a new dataset for training and evaluating PII-removal methods, and a tool for tracing generated code back to its sources in the training dataset.

The sources for this piece include an article in MarkTechPost.

Jim Love

Jim is an author and podcast host with over 40 years in technology.