Modelserve: Golem Network's New AI Inference Service
Golem Network has unveiled Modelserve, a new service for scalable and affordable AI model inference, according to a recent announcement by the Golem Project. The service lets users deploy AI models and run inference through scalable endpoints, with the aim of making AI applications more efficient and cost-effective.
What Is Modelserve?
Modelserve was developed by Golem Factory in collaboration with an external team and is integrated into the Golem Network ecosystem. It aims to support the open-source AI community and to attract AI application developers, whose workloads create demand for the network's GPU providers. Models are deployed and served through scalable endpoints, which is intended to keep AI applications efficient and cost-effective to operate.
Why Is Golem Network Introducing Modelserve?
Modelserve is intended to meet the growing demand for computing power in the AI industry. Consumer-grade GPUs offer sufficient compute power and memory to run AI models such as diffusion models, automatic speech recognition, and small to medium language models, and using them is more cost-effective than traditional methods. The decentralized architecture of the Golem Network serves as a marketplace that matches supply and demand for these resources, giving users access to computing power well suited to AI applications.
The addition of Modelserve to the Golem ecosystem plays a key role in bringing AI use cases to the network, driving demand for providers and contributing to the broader adoption of the Golem Network.
Target Audience
Modelserve is designed for a diverse range of users including service and product developers, startups, and companies operating in both Web 2.0 and Web 3.0 environments. These users typically:
- Utilize small and medium-sized open-source models or create their own models from scratch
- Require scalable AI model inference capabilities
- Seek an environment to test and experiment with AI models
Technical Implementation
Modelserve comprises three key components:
- Website: Allows users to create and manage endpoints
- Backend: Manages the GPU resources that handle inference, with a load balancer and auto-scaling. It sources GPUs both from Golem's open, decentralized marketplace and from other platforms offering GPU instances
- API: Enables the running of AI model inferences and management of endpoints
The service uses USD payments for user transactions, while settlements with Golem GPU providers are conducted using GLM, the native token of the Golem Network.
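To make the backend's load balancing and auto-scaling more concrete, here is a minimal Python sketch. It is not Modelserve's implementation, which the announcement does not describe. The `Worker` class, the function names, the least-loaded routing rule, and the queue-based scaling rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical GPU worker serving one endpoint."""
    name: str
    in_flight: int = 0  # requests currently being processed


def desired_workers(queued: int, per_worker: int,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Assumed scaling rule: enough workers to cover the queue, within limits."""
    needed = math.ceil(queued / per_worker)
    return max(min_workers, min(max_workers, needed))


def pick_worker(workers: list[Worker]) -> Worker:
    """Assumed routing rule: send the request to the least-loaded worker."""
    return min(workers, key=lambda w: w.in_flight)


if __name__ == "__main__":
    pool = [Worker("gpu-a", in_flight=3), Worker("gpu-b", in_flight=1)]
    print(pick_worker(pool).name)                  # least-loaded worker
    print(desired_workers(queued=10, per_worker=4))  # target pool size
```

A real scheduler would also have to account for model load times, provider availability on the marketplace, and scale-down delays, none of which are modeled here.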
Benefits for Users
- Maintenance-Free AI Infrastructure (AI IaaS): Users do not need to manage model deployment, inference, or GPU clusters as Modelserve handles these tasks
- Affordable Autoscaling: The system automatically scales GPU resources to meet application demands without requiring user intervention
- Cost-Effective Pricing: Users are charged based on the actual processing time of their requests, avoiding the costs associated with hourly GPU rentals or maintaining their own clusters
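The difference between paying for processing time and renting a GPU by the hour can be illustrated with a short calculation. The rates and workload below are invented for the example and are not Modelserve's prices. The function names are likewise hypothetical.

```python
def per_request_cost(seconds_per_request: float, requests: int,
                     usd_per_gpu_second: float) -> float:
    """Cost when billed only for the GPU time each request actually uses."""
    return seconds_per_request * requests * usd_per_gpu_second


def hourly_rental_cost(hours_reserved: float, usd_per_gpu_hour: float) -> float:
    """Cost when a GPU is rented by the hour, whether busy or idle."""
    return hours_reserved * usd_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests per day at 2 seconds each.
    usage = per_request_cost(2.0, 1000, usd_per_gpu_second=0.0002)
    # Hypothetical rental: one GPU reserved for 24 hours at the same
    # underlying rate (0.0002 USD/s is 0.72 USD/h).
    rental = hourly_rental_cost(24, usd_per_gpu_hour=0.72)
    print(f"pay-per-use: {usage:.2f} USD, hourly rental: {rental:.2f} USD")
```

Under these assumed numbers the GPU is busy for 2,000 of the 86,400 seconds in a day, so most of the rental cost pays for idle time. The gap narrows as utilization rises.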
Synergy with Other AI/GPU Projects
Modelserve integrates with GamerHash AI, a GPU and AI provider, which is currently in the proof-of-concept stage. Additionally, the first version of Golem-Workers was created as part of Modelserve; Golem-Workers will be developed as a separate project in the future.
Milestones and Next Steps
- Beta tests have been conducted with several AI-based startups and companies
- The Golem Community Tests are scheduled for July
- Commercialization of the service is set to begin in August
For more detailed information, visit the Golem Project blog.