Auto-generated PR for: eval include runner link
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added support for passing the GitHub workflow run URL through the
evaluation pipeline for better tracking and visibility.
- **New Features**
  - The workflow URL is now constructed and passed as a command-line
    argument.
  - The evaluation service accepts and stores the workflow URL for each
    run.
<!-- End of auto-generated description by cubic. -->
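A minimal sketch of how the run URL could be built and handed to the evaluation script. It assumes the default GitHub Actions environment variables `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID`. The function names and the `--github-workflow-url` flag are hypothetical and not taken from this PR.

```python
import argparse
import os


def build_workflow_url(env=None):
    """Return the workflow run URL, or None outside GitHub Actions."""
    env = os.environ if env is None else env
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY")
    run_id = env.get("GITHUB_RUN_ID")
    if not repo or not run_id:
        return None
    return f"{server}/{repo}/actions/runs/{run_id}"


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the real argument may be spelled differently.
    parser.add_argument("--github-workflow-url", default=None)
    return parser.parse_args(argv)
```

Returning `None` when the variables are missing keeps local runs working without a workflow link.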
- Update DOM element highlighting from `*[index]*` to `<new>[index]</new>`
- Update system prompts to reflect the new `<new>` tag format
- Provide clearer semantic meaning for new elements in browser state
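The marker change can be illustrated with a small sketch. The helper name `format_element` and the way the element text follows the marker are assumptions for illustration, not the project's actual serializer.

```python
def format_element(index: int, text: str, is_new: bool) -> str:
    """Prefix an element with its index, wrapping new elements in <new> tags."""
    # Old style would have been f"*[{index}]*" for new elements.
    marker = f"<new>[{index}]</new>" if is_new else f"[{index}]"
    return f"{marker}{text}"
```

For example, `format_element(3, "<button>Submit</button>", True)` yields `<new>[3]</new><button>Submit</button>`.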
Clarify system prompt description of `<new>` tags to specify new
clickable elements.
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated the system prompt to clarify that elements tagged with <new> are
clickable elements that appeared since the last step, if the URL has not
changed.
<!-- End of auto-generated description by cubic. -->
Auto-generated PR for: remove git function helpers
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Removed the git-functions.sh script, which included helper functions for
automating branch creation, commits, pushes, and pull requests. This
cleans up unused shell helpers from the codebase.
<!-- End of auto-generated description by cubic. -->
…ce.py
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated the model names for gemini-2.5-pro and gemini-2.5-flash to
remove preview suffixes and use the latest stable names.
<!-- End of auto-generated description by cubic. -->
- Introduce a `max_history_items` parameter to limit the model's memory
- Change system messages to use the `<sys>` tag instead of `<s>` to avoid confusion with HTML
- Remove `MessageMetadata` and `SupportedMessageTypes` in favor of a cleaner `MessageManagerState`
- Add a `HistoryItem` class to cleanly reconstruct the agent history description
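A rough sketch of how a history cap like this might work. Only the names `HistoryItem` and `max_history_items` come from the description above. The fields, the `describe_history` helper, the step tag format, and the keep-the-most-recent policy are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryItem:
    step_number: int
    summary: str

    def to_string(self) -> str:
        tag = f"step_{self.step_number}"
        return f"<{tag}>\n{self.summary}\n</{tag}>"


def describe_history(items: List[HistoryItem],
                     max_history_items: Optional[int] = None) -> str:
    """Render history, keeping only the most recent items when capped."""
    if max_history_items is not None and max_history_items < 1:
        raise ValueError("max_history_items must be at least 1")
    header: List[str] = []
    kept = list(items)
    if max_history_items is not None and len(items) > max_history_items:
        omitted = len(items) - max_history_items
        kept = items[-max_history_items:]
        # Note the gap inside a <sys> tag so it is not mistaken for page HTML.
        header = [f"<sys>[... {omitted} earlier steps omitted ...]</sys>"]
    return "\n".join(header + [item.to_string() for item in kept])
```

Leaving `max_history_items` as `None` keeps the full history, which matches the unbounded behavior implied before this change.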
Auto-generated PR for branch: eval-4-core-runners
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Updated the evaluation workflow to use the new 'eval-4-core-500' runner
label instead of 'ubuntu-latest-16-core'.
<!-- End of auto-generated description by cubic. -->
Stopped initializing the file system with results.md and updated prompts
and tests to only use todo.md by default. Now, results.md is created
only for long tasks.
Also added an `extract_links` parameter to `extract_content` so that the agent can find URLs or links.
Auto-generated PR for branch: eval-test-new-runners
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Added a 3-minute timeout to the comprehensive judge evaluation to
prevent long-running tasks.
<!-- End of auto-generated description by cubic. -->
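One way to enforce such a timeout is `asyncio.wait_for`. The sketch below assumes the judge evaluation is an awaitable. The wrapper name `run_with_timeout` and the shape of the error result are hypothetical.

```python
import asyncio

JUDGE_TIMEOUT_SECONDS = 180  # 3 minutes, per the summary above


async def run_with_timeout(coro, timeout=JUDGE_TIMEOUT_SECONDS):
    """Await coro, returning an error dict instead of hanging past timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        return {"error": f"judge evaluation timed out after {timeout} seconds"}
```

Returning an error value rather than re-raising lets the rest of the evaluation run continue and record the timeout.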