This is a UC news story, published by TechCrunch, that relates primarily to Chatbot Arena news.
Chatbot Arena (TechCrunch)
89% Informative
Chatbot Arena is a crowdsourced AI benchmarking tool that lets users test AI models.
It lets anyone on the web ask a question (or questions) of two randomly-selected, anonymous models.
LMSYS is a non-profit run by students and faculty at Carnegie Mellon, UC Berkeley’s SkyLab and UC San Diego.
In March, the organization released a data set containing a million conversations between users and models.
Chatbot Arena is framed as an empirical test, but it amounts to a relative rating of models.
The voting does not account for users’ ability to spot hallucinations from models, nor for differences in their preferences.
LMSYS is trying to balance out these biases with two automated systems, MT-Bench and Arena-Hard-Auto, that use models themselves to rank the quality of responses from other models.
LMSYS and Chatbot Arena provide real-time insights into how different AI models perform outside the lab.
The platform is sponsored in part by organizations, one of them a VC firm, that have horses in the AI race.
Google’s Gemini and Mistral AI models are among those being tested on the platform.
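To illustrate why head-to-head voting yields a relative rating rather than an absolute score, here is a minimal sketch that turns pairwise votes into ratings with a standard Elo update. This is an assumption chosen for illustration: the summary above does not say which scoring method the platform uses, and the function name, the K-factor, the starting rating, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from pairwise votes.

    votes: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". Every model starts at `base` (an assumed value).
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins, given the current rating gap.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Whatever model_a gains, model_b loses: only the gap carries meaning.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes between two anonymous models.
    sample_votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-y", "tie"),
        ("model-y", "model-x", "a"),
    ]
    for name, rating in sorted(elo_ratings(sample_votes).items()):
        print(f"{name}: {rating:.1f}")
```

Because each vote only moves points from one model to the other, the numbers say how models compare with one another under the voters' preferences. They do not say whether any single answer was correct.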
VR Score: 91
Informative language: 91
Neutral language: 68
Article tone: formal
Language: English
Language complexity: 59
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
External references: 18
Source diversity: 14
Affiliate links: no affiliate links