GLM-5.1 Trained Into Top Legal AI Model, Outperforms GPT-5.5
Harvey.ai, in collaboration with Applied Compute, has post-trained Z.ai’s GLM-5.1 foundation model into the most capable legal AI to date by rubric pass rate, surpassing GPT-5.5 xhigh and Opus 4.8 Max on Harvey’s proprietary Legal Agent Benchmark (LAB). The fine-tuned model achieves a 91.3% rubric pass rate, up from 85.3% for base GLM-5.1.
The project builds on GLM-5.1’s already strong technical foundation. Released in April 2026, GLM-5.1 is a 754B-parameter Mixture-of-Experts model designed for long-context reasoning and agentic execution, with a standard context window of 200,000 tokens and up to 1M tokens in specialized deployments. While initially a general-purpose model, its flexibility has proven critical for legal-domain adaptation.
Post-Training Drives Performance Gains
Harvey.ai and Applied Compute optimized GLM-5.1 across several dimensions to achieve its new legal performance milestone:
- Grader Alignment: Applied Compute refined grading systems to ensure reliable feedback during training, using GPT-5 Mini as a cost-effective yet accurate grader aligned with frontier models like GPT-5.5 xhigh and Opus 4.8 Max.
- Harness Optimization: By improving the model’s tools and environment—such as restricting inefficient tool calls, enhancing prompts, and introducing token compaction—researchers enabled more precise and efficient workflows.
- Reinforcement Learning: Full-parameter training on Applied Compute’s cloud platform allowed GLM-5.1 not only to exceed its initial capabilities but also to outperform GPT-5.5 xhigh and Opus 4.8 Max on rubric pass rate.
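To make the grader-alignment step concrete, here is a minimal sketch of how agreement between a cheaper grader and a reference grader could be measured. The article does not say which agreement metric Applied Compute used. The function names, the choice of raw agreement plus Cohen's kappa, and the sample verdicts are all illustrative assumptions.

```python
from typing import Sequence


def agreement_rate(cheap: Sequence[bool], reference: Sequence[bool]) -> float:
    """Fraction of rubric criteria on which both graders give the same verdict."""
    if len(cheap) != len(reference) or not cheap:
        raise ValueError("verdict lists must be non-empty and of equal length")
    return sum(c == r for c, r in zip(cheap, reference)) / len(cheap)


def cohens_kappa(cheap: Sequence[bool], reference: Sequence[bool]) -> float:
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(cheap)
    observed = agreement_rate(cheap, reference)
    p_cheap = sum(cheap) / n          # how often the cheap grader says "pass"
    p_ref = sum(reference) / n        # how often the reference grader says "pass"
    expected = p_cheap * p_ref + (1 - p_cheap) * (1 - p_ref)
    if expected == 1.0:               # both graders always give the same single verdict
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail verdicts on the same ten rubric criteria (not real data).
reference_verdicts = [True, True, True, False, True, False, True, True, False, True]
cheap_verdicts = [True, True, True, False, True, True, True, True, False, True]

print(f"agreement: {agreement_rate(cheap_verdicts, reference_verdicts):.2f}")
print(f"kappa:     {cohens_kappa(cheap_verdicts, reference_verdicts):.2f}")
```

Kappa is reported alongside raw agreement because a grader that passes nearly everything can score high raw agreement on a benchmark where most criteria pass anyway.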
The gains showed up on both metrics. Rubric pass rate rose from 85.3% to 91.3%, while the all-pass rate, a more stringent metric that requires success across multiple consecutive criteria, more than doubled from 5.9% to 12.6%, approaching Opus 4.8 Max’s 13.2%.
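The gap between the two metrics is easier to see with a small worked example. The sketch below assumes that rubric pass rate pools individual criteria across all tasks and that all-pass rate counts only tasks where every criterion passes. The article does not spell out LAB's exact definitions, so the data layout, function names, and sample grades are hypothetical.

```python
from typing import Dict, List


def rubric_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of individual rubric criteria passed, pooled over all tasks."""
    total = sum(len(criteria) for criteria in results.values())
    passed = sum(sum(criteria) for criteria in results.values())
    return passed / total


def all_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks in which every rubric criterion passed (assumed definition)."""
    return sum(all(criteria) for criteria in results.values()) / len(results)


# Hypothetical per-task grades, one boolean per rubric criterion (not real LAB data).
sample_results = {
    "task_a": [True, True, True, True],
    "task_b": [True, True, False, True],
    "task_c": [True, True, True, False],
    "task_d": [True, True, True, True],
    "task_e": [True, False, True, True],
}

print(f"rubric pass rate: {rubric_pass_rate(sample_results):.0%}")  # 17 of 20 criteria
print(f"all-pass rate:    {all_pass_rate(sample_results):.0%}")     # 2 of 5 tasks
```

In this toy data a single failed criterion is enough to fail a task on the all-pass metric, which is why a model can pass most criteria overall while still completing only a minority of tasks perfectly.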
Implications for LegalTech
GLM-5.1’s performance marks a breakthrough, yet the underlying model remains a general-purpose LLM. Historically, domain-specific models like LegalBERT or Lawformer have held advantages in legal NLP tasks due to their tailored vocabularies and domain-specific pretraining corpora. However, GLM-5.1’s extended context capabilities and tool integration make it particularly suited for complex legal workflows, such as contract review, multi-document analysis, and long-horizon research.
Harvey.ai’s results suggest that general-purpose models, when post-trained against domain-specific benchmarks, can rival and in some cases surpass specialized systems. This raises the bar for both general and domain-specific legal AI providers as firms increasingly demand high-context, scalable solutions.
Looking Ahead
Harvey.ai believes there is still room for improvement. Future enhancements may include relevance-masked self-distillation to reduce hallucination rates and agentic router training for optimizing cost-to-quality ratios. These innovations could push GLM-5.1 closer to Opus 4.8 Max on all-pass rate while further refining its legal reasoning capabilities.
As legal AI adoption accelerates, the success of GLM-5.1 highlights the growing potential of large-context foundation models in specialized fields. The next frontier may be defined not by raw parameter count but by strategic fine-tuning and domain alignment, as this collaboration demonstrates.