MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step …