At Cypher 2024, Vivek Raghavan, Co-Founder of Sarvam AI, described how voice-based AI agents can transform engagement at scale. Raghavan discussed Sarvam AI’s mission to make generative AI accessible to billions of users, particularly in India, by focusing on small models, voice-led interactions, and organizational sovereignty. The session emphasized the potential of voice-based AI in diverse applications, including customer service, commercial transactions, and multilingual support.
Core Concepts
Small Models for Scalability
- While large models like GPT are powerful, Raghavan emphasized that small models are more efficient for tasks that are repeated frequently and at scale.
- Small models offer lower latency, consume less energy, and can handle high-frequency tasks, making them ideal for daily interactions at a population scale.
Voice-Led AI for India
- India is a country where voice-based communication is integral, especially across diverse regional languages.
- Sarvam AI has developed voice agents capable of understanding and responding in 10 Indian languages, allowing businesses to reach a wider audience with seamless voice interactions.
- These agents are designed to work across channels, including phone calls, WhatsApp, and in-app interfaces.
Organizational Sovereignty
- Raghavan highlighted the importance of organizational sovereignty, where businesses can maintain control over their data, compute, and AI models.
- By using small models, organizations can deploy AI systems within their own infrastructure, ensuring data privacy and reducing dependency on third-party AI services.
Challenges and Solutions
Creating Scalable Voice Agents
- Traditional generative AI solutions often focus on text-based interactions. Sarvam AI is building solutions that use voice as the primary mode of interaction.
- These voice agents are multilingual by default, supporting seamless transitions between Indian languages in a single conversation.
Reducing Costs with Small Models
- The cost of deploying large AI models for voice-based interactions can be prohibitive, especially for high-frequency tasks.
- Sarvam AI’s focus on small, task-specific models allows businesses to deploy voice-based agents at a cost as low as ₹1 per minute, making AI-powered interactions affordable at scale.
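To make the per-minute figure concrete, the arithmetic can be sketched as below. Only the ₹1 per minute rate comes from the talk, and it is quoted there as a lower bound ("as low as"). The call volume, call length, and the function name are hypothetical values chosen for illustration, not figures from Sarvam AI.

```python
from decimal import Decimal

# Rate quoted in the talk as "as low as ₹1 per minute"; treat it as a floor.
RATE_INR_PER_MINUTE = Decimal("1")


def monthly_voice_cost(
    calls_per_day: int,
    avg_minutes_per_call: Decimal,
    days: int = 30,
    rate_inr_per_minute: Decimal = RATE_INR_PER_MINUTE,
) -> Decimal:
    """Estimate the monthly cost in rupees of running a voice agent.

    All inputs except the default rate are hypothetical planning numbers.
    """
    total_minutes = Decimal(calls_per_day) * avg_minutes_per_call * days
    return total_minutes * rate_inr_per_minute


# Hypothetical workload: 10,000 calls a day, 3 minutes each, over 30 days.
estimate = monthly_voice_cost(10_000, Decimal("3"))
print(f"Estimated monthly cost: ₹{estimate:,}")
```

Because the cost scales linearly with total minutes, the per-minute rate is the number that determines whether a high-frequency use case is affordable.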
Multimodal AI Interactions
- Sarvam AI’s voice agents support multimodal interactions, combining voice, text, images, and even UI elements within platforms like WhatsApp.
- This allows businesses to offer complex, interactive experiences, such as product exploration or transaction completion, using voice commands augmented with visual and text-based content.
Implementation Insights
Voice AI Across Multiple Channels
- Sarvam AI’s voice agents can be deployed across phone calls, WhatsApp, and apps, ensuring that businesses can engage with users through their preferred channels.
- These agents can conduct tasks ranging from product recommendations to booking appointments, handling voice and text inputs interchangeably.
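One way to picture handling voice and text interchangeably across channels is to normalize every inbound message into text before the agent logic runs. The sketch below is illustrative only: `Channel`, `InboundMessage`, `to_text`, and the `transcribe` callback are hypothetical names, not part of any Sarvam AI API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Channel(Enum):
    # The three channels named in the talk.
    PHONE = "phone"
    WHATSAPP = "whatsapp"
    APP = "app"


@dataclass
class InboundMessage:
    """Hypothetical channel-independent message: carries text, audio, or both."""
    channel: Channel
    text: Optional[str] = None
    audio: Optional[bytes] = None


def to_text(message: InboundMessage, transcribe: Callable[[bytes], str]) -> str:
    """Return the user's input as text, transcribing audio only when needed."""
    if message.text is not None:
        return message.text
    if message.audio is not None:
        return transcribe(message.audio)
    raise ValueError("message carries neither text nor audio")


# Stand-in transcriber for the example; a real one would call a speech model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


typed = to_text(InboundMessage(Channel.WHATSAPP, text="book an appointment"), fake_transcribe)
spoken = to_text(InboundMessage(Channel.PHONE, audio=b"recommend a product"), fake_transcribe)
```

With this shape, the downstream agent sees the same kind of input regardless of whether the user typed in WhatsApp or spoke on a phone call.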
Orchestrating Multiple AI Models
- Sarvam AI’s architecture orchestrates multiple AI models that handle speech recognition, translation, and response generation.
- This orchestration ensures that voice-based interactions remain fast and contextually accurate, delivering real-time responses without the need for large, slow models.
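The orchestration described above can be sketched as a chain of small, swappable stages. This is a minimal illustration, not Sarvam AI’s implementation: the class and function names are hypothetical, the stages are stubs, and routing through a single working language is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; each would wrap a small task-specific model.
Recognizer = Callable[[bytes, str], str]       # (audio, language) -> transcript
Translator = Callable[[str, str, str], str]    # (text, source, target) -> text
Generator = Callable[[str], str]               # (query) -> answer


@dataclass
class Turn:
    audio: bytes
    user_language: str  # e.g. "hi" for Hindi


@dataclass
class VoicePipeline:
    recognize: Recognizer
    translate: Translator
    generate: Generator
    working_language: str = "en"  # assumption: one pivot language for generation

    def respond(self, turn: Turn) -> str:
        # 1. Speech recognition in the user's language.
        transcript = self.recognize(turn.audio, turn.user_language)
        # 2. Translate into the language the generator works in.
        query = self.translate(transcript, turn.user_language, self.working_language)
        # 3. Generate a response.
        answer = self.generate(query)
        # 4. Translate the response back into the user's language.
        return self.translate(answer, self.working_language, turn.user_language)


# Stub stages so the sketch runs without any models.
def fake_recognize(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, source: str, target: str) -> str:
    return text if source == target else f"[{source}->{target}] {text}"


def fake_generate(query: str) -> str:
    return f"reply to: {query}"


pipeline = VoicePipeline(fake_recognize, fake_translate, fake_generate)
reply = pipeline.respond(Turn(audio=b"namaste", user_language="hi"))
```

Keeping each stage behind a plain function signature means one model can be replaced or tuned without touching the others, which is one plausible reason to prefer several small models over a single large one.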
Low-Code Agent Deployment
- Sarvam AI provides a low-code platform that allows businesses to create and deploy voice agents with minimal technical expertise.
- Businesses can create fully functional voice agents in just 15 minutes, with the option to deploy them in 10 languages, reducing the time and complexity of AI integration.
Industry Impact
Widespread Adoption of Voice-Based AI
- Voice agents are making it easier for businesses to engage with customers who prefer verbal interactions, especially in regions where literacy rates are lower or where limited digital literacy is a barrier.
- The ability to deploy AI agents that speak local languages opens up new opportunities for businesses to connect with rural and regional populations.
Commercial Transactions Through Voice
- Sarvam AI’s agents can facilitate commercial transactions, including payments, through voice interfaces, transforming how businesses conduct sales and customer service.
- This capability is particularly useful in sectors like e-commerce, healthcare, and customer support, where fast, frictionless interactions are critical.
Future of Sovereign AI
- As businesses become more concerned about data privacy, the concept of sovereign AI, in which organizations maintain full control over their data and AI models, is gaining traction.
- Sarvam AI’s solutions align with this trend, offering businesses the ability to own and control their AI infrastructure while maintaining high efficiency.
Conclusion
- Vivek Raghavan’s session at Cypher 2024 highlighted the transformative potential of voice-based AI agents in India and beyond.
- By focusing on small, efficient models, multilingual capabilities, and organizational sovereignty, Sarvam AI is enabling businesses to deploy scalable, affordable AI solutions that meet the needs of diverse populations.
- Raghavan’s case is that the future of AI in India lies in voice-led interactions, and that Sarvam AI’s approach can make generative AI accessible to billions.