Designing Scalable Data Science Systems for Global Enterprises

Explore fundamental design principles for scalable data science systems in global enterprises, focusing on efficiency and innovation.

Published on June 13, 2024

At the Data Engineering Summit 2024, held in Bengaluru, Rahul Prakash, Director of Sales & Distribution Analytics at AB InBev, spoke about the advancements and future of analytics in sales and distribution. Rahul has over 17 years of experience spanning consumer packaged goods (CPG), consulting, and startups. His talk covered the fundamental design principles for building scalable data science systems in large enterprises, focusing on the importance of business context, innovative architecture, and efficient engineering practices.

Setting the Business Context

In his session, Rahul Prakash began by providing a comprehensive overview of Anheuser-Busch InBev’s global operations. With over 500 beer brands like Corona, Hoegaarden, Michelob Ultra, and Modelo, AB InBev operates in about 50 countries, selling through various channels including individual stores, wholesalers, retail chains, and online platforms. This vast and complex data landscape includes sales data, sensor data, streaming data, macroeconomic indicators, weather data, social demographics, and social media analytics. Such diversity and volume of data necessitate a robust and scalable data science infrastructure.

The CatExpert.ai Product

Rahul introduced CatExpert.ai, a business-facing product designed to assist retail chains in maximizing their sales through optimal assortment and planogram analytics. The tool leverages best-in-class technology and artificial intelligence to provide real-time, data-driven recommendations for product placements on shelves, taking into account demographic, consumption, and geographic data. This tool exemplifies AB InBev’s commitment to driving innovation in sales and distribution through advanced analytics.

Deep Dive into System Architecture

Rahul provided a detailed look at the architecture behind CatExpert.ai, highlighting the critical components that ensure its scalability and efficiency. The architecture employs a comprehensive active directory for role-based access, an application gateway for load balancing, and mesh applications to route requests to appropriate microservices. These microservices are hosted on an Azure Kubernetes cluster, allowing for seamless scaling and management of compute-intensive tasks.

The architecture also includes an API management gateway that handles incoming API traffic and helps the various microservices, whether for data fetching or compute tasks, operate efficiently. This setup enables business users to run multiple simulations and experiments, supporting their decisions with reliable and timely data.
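The request path described above (role check, then routing to a microservice) can be sketched in a few lines. This is a minimal illustration only: the route paths, role names, and handler functions are hypothetical and do not come from the talk, and a real deployment would delegate these steps to the directory service and gateway rather than application code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Handler = Callable[[dict], dict]


@dataclass(frozen=True)
class Route:
    handler: Handler
    allowed_roles: FrozenSet[str]


def fetch_sales(payload: dict) -> dict:
    """Stand-in for a data-fetching microservice (hypothetical)."""
    return {"service": "data-fetch", "store": payload.get("store")}


def run_simulation(payload: dict) -> dict:
    """Stand-in for a compute-intensive microservice (hypothetical)."""
    return {"service": "compute", "scenario": payload.get("scenario")}


# Hypothetical routing table: path -> handler plus the roles allowed to call it.
ROUTES: Dict[str, Route] = {
    "/data/sales": Route(fetch_sales, frozenset({"analyst", "admin"})),
    "/simulate": Route(run_simulation, frozenset({"admin"})),
}


def dispatch(path: str, role: str, payload: dict) -> dict:
    """Check the caller's role, then forward the request to the matching service."""
    route = ROUTES.get(path)
    if route is None:
        return {"status": 404, "error": "unknown route"}
    if role not in route.allowed_roles:
        return {"status": 403, "error": "role not permitted"}
    return {"status": 200, "body": route.handler(payload)}
```

Keeping the access rules in the routing table, rather than inside each handler, means a new microservice can be added without touching the authorization logic.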

Key Design Considerations

Rahul emphasized three fundamental design considerations crucial for developing scalable data science products: scalability, adaptability, and observability.

Scalability: The ability to handle increasing volumes of data and user requests without compromising performance is vital. The architecture uses auto-scaling mechanisms, especially for compute-intensive tasks, to ensure there is no downtime and that performance remains consistent.
Adaptability: Given the global scale of AB InBev’s operations, the system must be adaptable to different regional requirements and seamlessly integrate with various local systems. The use of a hub-and-spoke model, with a center of excellence in Bengaluru and zonal headquarters in several countries, facilitates this adaptability.
Observability: Continuous monitoring and logging are essential to maintain system health and performance. The architecture includes comprehensive logging services that track performance metrics and error rates, enabling proactive identification and resolution of issues.
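The talk does not say which auto-scaling policy the cluster uses. As one common reference point, the Kubernetes Horizontal Pod Autoscaler documents the rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that rule with assumed replica bounds; the function name and default limits are illustrative, and the real autoscaler also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound
    max_replicas: int = 10,  # assumed upper bound
) -> int:
    """Scale the replica count in proportion to metric pressure, then clamp it."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four replicas running at 90% CPU against a 60% target scale out to six.
print(desired_replicas(4, 90.0, 60.0))
```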

Data Strategy and Microservices

Rahul detailed the data strategy employed in CatExpert.ai, which involves integrating multiple data sources into a data lakehouse. The sources include data warehouses like Snowflake and Vertica, as well as flat files and other data formats. The data is organized in a Delta Lake structure with bronze, silver, and gold tiers, so that each tier is more refined than the last and the most refined data is available for analysis.
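To make the tiering concrete, here is a minimal sketch of the bronze, silver, and gold progression using plain Python lists in place of Delta tables. The field names and cleaning rules are assumptions chosen for illustration, not the rules used in CatExpert.ai.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("store", "sku", "date", "units")


def to_silver(bronze_rows):
    """Clean raw rows: drop incomplete records, normalize values, de-duplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue  # incomplete record stays behind in bronze
        clean = {
            "store": str(row["store"]).strip().upper(),
            "sku": str(row["sku"]).strip(),
            "date": row["date"],
            "units": int(row["units"]),
        }
        key = (clean["store"], clean["sku"], clean["date"])
        if key in seen:
            continue  # duplicate once values are normalized
        seen.add(key)
        silver.append(clean)
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into an analysis-ready view: units sold per store."""
    totals = defaultdict(int)
    for row in silver_rows:
        totals[row["store"]] += row["units"]
    return dict(totals)


# Hypothetical raw input as it might land in the bronze tier.
bronze = [
    {"store": " s1 ", "sku": "A", "date": "2024-06-01", "units": "5"},
    {"store": "S1", "sku": "A", "date": "2024-06-01", "units": "5"},
    {"store": "S2", "sku": "B", "date": "2024-06-01", "units": None},
    {"store": "S1", "sku": "B", "date": "2024-06-01", "units": "3"},
]
gold = to_gold(to_silver(bronze))
```

Bronze keeps the data exactly as received, so the cleaning rules in the silver step can be changed and replayed later without going back to the source systems.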

Microservices play a crucial role in the system, handling both data retrieval and compute-intensive tasks. The system uses queuing technology and message brokers to manage and distribute these tasks efficiently. Each microservice is designed to be payload-agnostic, allowing for easy configuration and scalability.
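One way to read "payload-agnostic" is that the worker only looks at a small envelope (a task type plus an opaque payload) and leaves the payload's contents to the registered handler. The sketch below shows that idea with Python's standard `queue` module standing in for a message broker; the task types, handler names, and envelope format are all hypothetical.

```python
import json
import queue

# Registry mapping a task type to the function that understands its payload.
HANDLERS = {}


def handles(task_type):
    """Decorator that registers a handler for one task type."""
    def register(fn):
        HANDLERS[task_type] = fn
        return fn
    return register


@handles("fetch")
def fetch(payload):
    # Hypothetical data-retrieval task.
    return {"rows_requested": payload["limit"]}


@handles("simulate")
def simulate(payload):
    # Hypothetical compute task.
    return {"projected_units": payload["base"] * payload["factor"]}


def drain(task_queue):
    """Process every queued message without inspecting the payload itself."""
    results = []
    while True:
        try:
            message = task_queue.get_nowait()
        except queue.Empty:
            break
        envelope = json.loads(message)
        handler = HANDLERS[envelope["type"]]
        results.append(handler(envelope["payload"]))
        task_queue.task_done()
    return results


tasks = queue.Queue()
tasks.put(json.dumps({"type": "fetch", "payload": {"limit": 10}}))
tasks.put(json.dumps({"type": "simulate", "payload": {"base": 100, "factor": 1.5}}))
results = drain(tasks)
```

Because the worker loop never reads the payload, supporting a new task type only requires registering another handler; the queueing code stays unchanged.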

Full Stack Deployment and Observability

The technology stack for CatExpert.ai includes front-end design patterns, a Python back end built on frameworks like FastAPI, and various queuing and message-brokering technologies. The system’s observability framework monitors application performance, infrastructure health, user data, and security metrics. This comprehensive monitoring setup enables AB InBev to predict and preempt potential failures, moving towards a self-healing infrastructure.
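As a small illustration of how error-rate monitoring can feed proactive alerting, the sketch below tracks request outcomes over a rolling window and flags when the error rate crosses a threshold. The class name, window size, and threshold are assumptions for the example; the talk does not describe the actual monitoring tooling.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the share of failed requests over the most recent N outcomes."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # oldest outcomes fall off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for ok in [True] * 8 + [False] * 2:
    monitor.record(ok)
alert_before = monitor.should_alert()  # rate is exactly at the threshold
monitor.record(False)                  # pushes out the oldest success
alert_after = monitor.should_alert()
```

A rolling window reacts to recent behavior rather than lifetime averages, which is what makes it useful as an early signal for the kind of preemptive action described above.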

Collaborative Development and Algo-Vault

Rahul concluded by discussing the importance of collaborative development across global teams. AB InBev has developed Algo-Vault, a repository of standardized algorithms and methods that can be reused and enhanced by teams worldwide. This repository includes components for data fetching, preprocessing, exploratory data analysis, feature engineering, modeling, and reporting. By modularizing these components, AB InBev ensures that development efforts are not duplicated and that innovations can be quickly integrated into the system.
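The talk does not show how Algo-Vault is implemented. One plausible shape for such a repository is a registry in which teams publish components under a pipeline stage and a name, and pipelines are assembled from those entries. The sketch below is an assumption-laden illustration of that pattern; `ComponentRegistry`, the stage names, and the two example components are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class ComponentRegistry:
    """A tiny registry of reusable pipeline components, keyed by stage and name."""

    def __init__(self) -> None:
        self._components: Dict[str, Dict[str, Callable]] = {}

    def register(self, stage: str, name: str):
        """Decorator that publishes a function under a stage and name."""
        def decorator(fn: Callable) -> Callable:
            self._components.setdefault(stage, {})[name] = fn
            return fn
        return decorator

    def get(self, stage: str, name: str) -> Callable:
        return self._components[stage][name]

    def build_pipeline(self, steps: List[Tuple[str, str]]) -> Callable:
        """Compose registered components into a single callable, in order."""
        functions = [self.get(stage, name) for stage, name in steps]

        def run(data):
            for fn in functions:
                data = fn(data)
            return data

        return run


vault = ComponentRegistry()


@vault.register("preprocessing", "drop_missing")
def drop_missing(values):
    return [v for v in values if v is not None]


@vault.register("feature_engineering", "min_max_scale")
def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


pipeline = vault.build_pipeline([
    ("preprocessing", "drop_missing"),
    ("feature_engineering", "min_max_scale"),
])
scaled = pipeline([2, None, 4, 6])
```

Addressing components by stage and name lets one team improve a component in place while every pipeline that references it picks up the change, which is the reuse benefit the section describes.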

Conclusion

Rahul Prakash’s talk at the Data Engineering Summit 2024 highlighted the sophisticated architecture and innovative strategies employed by AB InBev to enhance sales and distribution analytics. Through products like CatExpert.ai and a robust, scalable infrastructure, AB InBev is leading the way in leveraging data science to drive business success. The insights shared by Rahul provide a valuable blueprint for organizations looking to harness the power of data engineering and artificial intelligence in their operations.
