Ask a structural question about how an agent worked, like "did it edit five times with no test?" or "did it submit without testing?", and get an exact answer instantly, over real trajectory datasets. No model call, deterministic, free. Why not just ask an LLM? →
When you shrink a trajectory set to a fixed budget, the common move is to keep the shortest runs. But short traces are the least diverse: you throw away procedural variety. procgrep selects for diversity instead, preserving far more of the dataset's distinct procedures at the same budget.
The same dataset, broken down by the model that produced each trajectory: how long its runs are, how much it repeats itself, and the mix of actions it favours.
Every public trajectory dataset procgrep discovered on Hugging Face, and what it can parse today. Most non-parseable entries aren't failures. They're benchmarks (task definitions, no trajectories), corpora, or out-of-scope domains.
| dataset | status | downloads | likes | updated |
|---|