Baseten and Harvey Push Open Legal Agents with Post-Training Breakthrough
Harvey AI, in collaboration with Baseten Research, unveiled promising results from their efforts to enhance open-weight AI models for legal applications through post-training optimization. Using Harvey’s Legal Agent Benchmark (LAB) as a foundation, the research team demonstrated how tailored pipelines can narrow the performance gap between open-weight and closed-source AI models in high-stakes legal work.
LAB, a public framework for evaluating AI performance across 1,200 legal tasks spanning 24 practice areas, revealed that even top closed-source models fail to complete more than 10% of tasks end-to-end. Open-weight models typically lag further behind due to challenges in domain expertise, high computational costs, and governance limitations. Harvey and Baseten's approach aimed to address these issues by integrating post-training methods with LAB-driven feedback loops, raising the bar for open-weight model capabilities.
Key Challenges in Legal AI
Legal AI applications face unique hurdles. First, tasks demand deep domain expertise, including the ability to retrieve, analyze, and draft complex legal documents within a "closed universe" of client-specific data. Second, computational costs are prohibitive: leading models on LAB cost roughly $50 per task with up to 20 minutes of latency, making them impractical for broad deployment. Finally, governance is critical in the legal field, where sensitive data requires secure and auditable AI processes. By post-training open-weight models on LAB datasets, Harvey and Baseten sought to reduce costs, improve interpretability, and enhance task performance.
Breakthrough Results
The team post-trained a 27-billion-parameter open-weight model using LAB metrics and a custom harness designed for long-horizon legal work. The results were striking: the model achieved performance on par with some closed-source leaders, significantly improving task completion rates. In one example, a reinforcement learning pass on Qwen3.5-9B led to a 20% increase in the task pass rate. The model also adopted behavioral strategies seen in top performers, such as reading documents in full rather than relying on shortcuts like basic keyword searches.
Another innovation was the introduction of a natural-language compaction harness, which allows models to summarize and compress document context without losing critical information. This method boosted "all-pass" rates (where every criterion for a task is met) by 3.7x for GPT-5.5 and 2.6x for Claude Sonnet 4.6. However, the compaction strategy proved less effective for smaller open-weight models, which required additional post-training to optimize their use of the harness.
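To make the "all-pass" metric concrete, the short Python sketch below computes it alongside a looser per-criterion pass rate. It is only an illustration of the definition given above (a task counts only if every one of its criteria is met). The function names, the data layout, and the sample results are hypothetical and are not taken from LAB or from Harvey's evaluation code.

```python
from typing import Dict, List

# Hypothetical layout: task id -> list of per-criterion pass/fail results.
Results = Dict[str, List[bool]]


def all_pass_rate(results: Results) -> float:
    """Fraction of tasks in which every criterion passed."""
    if not results:
        return 0.0
    # A task with no criteria is treated as not passed (an assumption).
    passed = sum(1 for criteria in results.values() if criteria and all(criteria))
    return passed / len(results)


def criterion_pass_rate(results: Results) -> float:
    """Fraction of individual criteria that passed, pooled across tasks."""
    total = sum(len(criteria) for criteria in results.values())
    if total == 0:
        return 0.0
    return sum(sum(criteria) for criteria in results.values()) / total


# Made-up results for four tasks, for illustration only.
example: Results = {
    "task-a": [True, True, True],
    "task-b": [True, False, True],
    "task-c": [True, True],
    "task-d": [False, False, True, True],
}

if __name__ == "__main__":
    print(f"all-pass rate:       {all_pass_rate(example):.2f}")
    print(f"criterion pass rate: {criterion_pass_rate(example):.2f}")
```

In this made-up sample, most individual criteria are met, yet only half of the tasks clear every criterion. That gap is why an all-pass rate is a much stricter measure than a pooled criterion score.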
Baseten's Expanding Role
Baseten, an AI infrastructure company valued at $5 billion following a $300 million Series E funding round in January 2026, played a critical role in scaling Harvey’s research. Known for its expertise in AI inference infrastructure, Baseten provided the tools to optimize GPU usage and manage large-scale computations required for training and inference. Its growing influence in the enterprise AI sector was further highlighted by a recent partnership with Benchling to support biotech R&D workflows.
Baseten's support aligns with its broader strategy to enable organizations to deploy both proprietary and open models without infrastructure lock-in, a key differentiator in the high-growth AI inference market. The company's revenue has surged, reportedly reaching $600 million annually as of March 2026, driven by enterprise demand for scalable AI solutions.
Future Directions
Harvey and Baseten’s collaboration underscores the potential of combining open-weight models with domain-specific post-training and infrastructure support. Future research will focus on less lossy compaction techniques, such as compressing model key-value caches, and refining reinforcement learning approaches to close reasoning gaps in long-horizon tasks. As legal AI evolves, these advancements could make open-weight models viable alternatives to costly proprietary systems, democratizing access to advanced legal automation tools.
The implications extend beyond legal tech. By proving the viability of post-training pipelines in one of the most complex knowledge domains, Harvey and Baseten are opening doors for broader applications in finance, healthcare, and beyond.