Today, Mistral, the Paris-based AI startup that raised the biggest seed round in European history roughly a year ago and has since become a rising star in the global AI space, marked its entry into the programming and development space with the launch of Codestral, its first-ever code-centric large language model (LLM).
Available today under a non-commercial license, Codestral is a 22-billion-parameter, open-weight generative AI model specialized in coding tasks, from generation to completion.
According to Mistral, the model covers more than 80 programming languages, making it an ideal tool for software developers looking to build advanced AI applications.
The company claims that Codestral already outperforms previous models designed for coding tasks, such as CodeLlama 70B and Deepseek Coder 33B, and is already used by several industry partners, including JetBrains, Sourcegraph, and LlamaIndex.
A powerful model for everything related to coding
Codestral 22B has a 32K-token context length and gives developers the ability to write and interact with code across a variety of coding environments and projects.
The model was trained on a dataset spanning more than 80 programming languages, making it suitable for a wide range of programming tasks, including generating code from scratch, completing coding functions, writing tests, and filling in partial code using a fill-in-the-middle mechanism. The languages it covers include popular ones such as SQL, Python, Java, C, and C++, as well as more specialized ones such as Swift and Fortran.
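As a concrete illustration of the fill-in-the-middle mechanism, here is a minimal sketch of calling Codestral's hosted FIM route, where the model generates the code that belongs between a given prefix and suffix. The endpoint path, payload fields, and response shape follow Mistral's published API at the time of writing and may have changed; treat this as an assumption-laden sketch, not official usage.

```python
# Sketch: fill-in-the-middle (FIM) completion against Mistral's hosted
# Codestral endpoint. Endpoint URL and field names are assumptions based
# on Mistral's published API docs.
import json
import urllib.request

FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"

def build_fim_payload(prefix: str, suffix: str,
                      model: str = "codestral-latest") -> dict:
    """Assemble the JSON body for a FIM request: the model is asked to
    generate the code that fits between `prefix` and `suffix`."""
    return {
        "model": model,
        "prompt": prefix,    # code before the gap
        "suffix": suffix,    # code after the gap
        "max_tokens": 64,
        "temperature": 0.0,
    }

def complete_middle(prefix: str, suffix: str, api_key: str) -> str:
    """POST the FIM request and return the generated middle section."""
    req = urllib.request.Request(
        FIM_URL,
        data=json.dumps(build_fim_payload(prefix, suffix)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Ask the model to fill in the body of a Python function.
    print(complete_middle(
        prefix="def fibonacci(n: int) -> int:\n",
        suffix="\n\nprint(fibonacci(10))",
        api_key="YOUR_API_KEY",
    ))
```

In an IDE integration, the prefix and suffix would simply be the code before and after the user's cursor.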
Mistral says Codestral can help developers improve their programming skills, speed up workflows, and save significant time and effort when building applications. It can also help reduce the risk of errors and bugs.
Although the model has just been released and has yet to be publicly tested, Mistral claims it already performs better than existing code-centric models on most programming languages, including CodeLlama 70B, Deepseek Coder 33B, and Llama 3 70B.
On RepoBench, designed to evaluate long-range repository-level Python code completion, Codestral outperformed all three models with an accuracy score of 34%. The model also beat the competition on HumanEval, which evaluates Python code generation, and CruxEval, which tests Python output prediction, with scores of 81.1% and 51.3%, respectively. It even outperformed the models on HumanEval for Bash, Java, and PHP.
Notably, the model's performance on HumanEval for C++, C, and TypeScript was not the best, but its average score across all tests was the highest at 61.5%, just ahead of Llama 3 70B at 61.2%. It ranked second in the Spider evaluation of SQL performance with a score of 63.5%.
Several popular developer productivity and AI application development tools have already begun testing Codestral, including big names such as LlamaIndex, LangChain, Continue.dev, Tabnine, and JetBrains.
“From our initial testing, it's a great option for code-generation workflows because it's fast, has a favorable context window, and the Instruct version supports tool use. We tested self-correcting code generation with LangGraph, using Codestral Instruct's tool use for output, and it worked very well out of the box,” said Harrison Chase, CEO and co-founder of LangChain, in a statement.
How do I start with Codestral?
Mistral offers Codestral 22B on Hugging Face under its own non-production license, which allows developers to use the technology for non-commercial purposes, testing, and research.
The company also makes the model available through two API endpoints: codestral.mistral.ai and api.mistral.ai.
The former is for users who want to use Codestral's Instruct or Fill-In-the-Middle routes inside their IDE. It comes with an API key managed at the personal level, without the usual organizational rate limits, and is free to use during an eight-week beta period. The latter is the usual endpoint for broader research, batch queries, or third-party application development, with queries billed per token.
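For the pay-per-token api.mistral.ai route, a chat-style Instruct request might look like the following minimal sketch. The endpoint URL, model identifier, and response shape are assumptions based on Mistral's published chat completions API and may differ in practice.

```python
# Sketch: one-shot Instruct (chat) request to Codestral on the general
# api.mistral.ai endpoint, billed per token. URL and field names are
# assumptions from Mistral's published API docs.
import json
import urllib.request

CHAT_URL = "https://api.mistral.ai/v1/chat/completions"

def build_chat_payload(question: str,
                       model: str = "codestral-latest") -> dict:
    """Assemble the JSON body for a single-turn chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

def ask_codestral(question: str, api_key: str) -> str:
    """POST the chat request and return the model's reply text."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_payload(question)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_codestral(
        "Write a Python one-liner that reverses a string.",
        api_key="YOUR_API_KEY",
    ))
```

The codestral.mistral.ai beta endpoint accepts an analogous request with a personal API key instead of an organizational one.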
In addition, interested developers can test Codestral's capabilities by chatting with an instructed version of the model on Le Chat, Mistral's free conversational interface.
By introducing Codestral, Mistral offers enterprise researchers another notable option for accelerating software development. However, it remains to be seen how the model compares to other code-centric models on the market, including the recently launched StarCoder2 as well as offerings from OpenAI and Amazon.
The former offers Codex, which powers GitHub's Copilot service, while the latter has the CodeWhisperer tool. ChatGPT has also been used by programmers as a coding tool, and the company's GPT-4 Turbo model powers Devin, Cognition's semi-autonomous coding agent.
There is also strong competition from Replit, which has a couple of small AI coding models on Hugging Face, and Codeium, which recently secured $65 million in Series B funding at a $500 million valuation.