Falcon 40B Soars to the Top of the Open LLM Leaderboard 🚀

In a remarkable recent development, Falcon 40B, a 40-billion-parameter large language model (LLM), has clinched the top spot on the Open LLM Leaderboard. The accomplishment is all the more notable because this is the first open model developed by the team behind Falcon 40B, and it makes a significant statement in the field of AI.

The Open LLM Leaderboard, hosted by 🤗 Hugging Face, ranks LLMs and chatbots by their performance on a set of evaluation tasks. It gives the open-source community a way to test models on a unified evaluation framework developed by EleutherAI, running on Hugging Face GPU clusters. The leaderboard is widely treated as a reference point for comparing open-source LLMs and chatbots.

This achievement is a watershed moment, not just for Falcon 40B's creators but for the entire AI community. The model, developed by the UAE-based Technology Innovation Institute (TII), has been released under the open-source Apache 2.0 license. This move signals the UAE’s commitment to advancing open-source development and fostering global collaboration in the field of AI.

The Falcon 40B model was evaluated against four diverse benchmarks designed to test reasoning and general knowledge across multiple fields: the AI2 Reasoning Challenge (ARC), HellaSwag, Massive Multitask Language Understanding (MMLU), and TruthfulQA.

The Falcon 40B variant tiiuae/falcon-40b-instruct performed impressively across all benchmarks, scoring an average of 63.2%. It demonstrated commendable reasoning abilities by scoring 61.6% on the ARC, a test consisting of grade-school science questions. Its performance was even more impressive on the HellaSwag benchmark, a measure of commonsense inference, where it achieved an exceptional 84.4%.

Furthermore, Falcon 40B scored 54.1% on MMLU, a rigorous multitask accuracy test that spans 57 tasks from diverse fields, including elementary mathematics, US history, computer science, and law. On the TruthfulQA benchmark, which tests how truthful a model's generated answers are, it secured a noteworthy 52.5%.
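As a quick sanity check on the figures above, the short Python sketch below recomputes the headline average from the four quoted benchmark scores. It assumes the leaderboard average is the plain unweighted mean of the four benchmarks; the function name `leaderboard_average` is made up for this illustration and is not part of any Hugging Face tooling.

```python
# Scores for tiiuae/falcon-40b-instruct as quoted in this article (percent).
scores = {
    "ARC": 61.6,
    "HellaSwag": 84.4,
    "MMLU": 54.1,
    "TruthfulQA": 52.5,
}


def leaderboard_average(benchmark_scores):
    """Unweighted mean of the benchmark scores (assumed aggregation rule)."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)


average = leaderboard_average(scores)
print(f"Average across {len(scores)} benchmarks: {average:.2f}%")
```

Under that assumption the mean comes out at about 63.15, which is consistent with the reported 63.2% once rounded to one decimal place.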

By comparison, another variant of Falcon 40B scored slightly lower, with an average of 60.4%, yet still held its own, especially on the HellaSwag benchmark, where it scored 85.3%.

Falcon 40B’s triumph underscores the potential of open-source development in AI, with the model outperforming other highly regarded models such as ausboss/llama-30b-supercot and llama-65b.

Falcon 40B's achievement raises the bar for future language models and sets a precedent for open-source development in AI. It stands as a testament to the strength and capabilities of open-source models and marks a key milestone in language modeling.

Hugging Face Open LLM Leaderboard

Falcon 40B on huggingface.co
