Passing code execution as a tool is more powerful than granular tools. You combine multiple tools and tool calls into one. You move code to the data rather than the other way around. Mostly, you need bash, Python (or Jupyter), file manager, web browser.
UI: Go where the user is, instead of bringing them to you.
A remote runtime is a critical component.
Claude 3.5 Sonnet (20241022) and Claude 3.5 Haiku (20241022) perform best on SWE Bench, followed by Deepseek V3, then O1 2024-12-17. X