On benchmarks cited by Anthropic, Sonnet 4.5 scored highest on SWE-bench Verified (an evaluation of real-world ... including “Agent Mode” in Excel and Word and an “Office Agent” ...