News Score: Score the News, Sort the News, Rewrite the Headlines

Task-free intelligence testing of LLMs (Part 1)

Introduction

I recently wrote about the apparently narrow focus of LLM evaluation on "task-based" testing. The typical eval has a set of tasks, questions, problems, etc. that need to be solved or answered, and a model is scored on how many it answers correctly. Such tests are geared towards measuring an input/output system, or a "function approximator", which is great for confirming that LLMs can learn any task but limited in probing the nature of intelligence. I'm interested in interactions...

Read more at marble.onl
