Designing Scalable Data Science Systems for Global Enterprises

Explore fundamental design principles for scalable data science systems in global enterprises, focusing on efficiency and innovation.

At the Data Engineering Summit 2024, held in Bengaluru, Rahul Prakash, Director of Sales & Distribution Analytics at AB InBev, delivered a talk on the advancements and future of analytics in sales and distribution. Rahul has over 17 years of experience spanning roles in consumer packaged goods (CPG), consulting, and startups. In his talk, he covered the fundamental design principles for building scalable data science systems in large enterprises, focusing on business context, architecture, and efficient engineering practices.

Setting the Business Context

Rahul Prakash began his session with an overview of Anheuser-Busch InBev’s global operations. With over 500 beer brands, including Corona, Hoegaarden, Michelob Ultra, and Modelo, AB InBev operates in about 50 countries and sells through individual stores, wholesalers, retail chains, and online platforms. The resulting data landscape is vast and complex: it includes sales data, sensor data, streaming data, macroeconomic indicators, weather data, social demographics, and social media analytics. Data of this diversity and volume requires a robust and scalable data science infrastructure.

The CatExpert.ai Product

Rahul introduced CatExpert.ai, a business-facing product designed to assist retail chains in maximizing their sales through optimal assortment and planogram analytics. The tool leverages best-in-class technology and artificial intelligence to provide real-time, data-driven recommendations for product placements on shelves, taking into account demographic, consumption, and geographic data. This tool exemplifies AB InBev’s commitment to driving innovation in sales and distribution through advanced analytics.

Deep Dive into System Architecture

Rahul provided a detailed look at the architecture behind CatExpert.ai, highlighting the critical components that ensure its scalability and efficiency. The architecture employs a comprehensive active directory for role-based access, an application gateway for load balancing, and mesh applications to route requests to appropriate microservices. These microservices are hosted on an Azure Kubernetes cluster, allowing for seamless scaling and management of compute-intensive tasks.

The architecture also includes an API management gateway that handles incoming API traffic and helps the various microservices, whether they fetch data or run compute tasks, operate efficiently. This setup lets business users run multiple simulations and experiments and base their decisions on reliable and timely data.
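The request path described above (a role check followed by routing to a microservice) can be illustrated with a minimal Python sketch. It is not the CatExpert.ai implementation: the role names, permissions, path prefixes, and service names below are all hypothetical, since the talk did not list them.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping; the talk does not list actual roles.
ROLE_PERMISSIONS = {
    "analyst": {"data"},
    "planner": {"data", "compute"},
}

# Hypothetical route table: path prefix -> microservice name.
ROUTES = {
    "/data": "data-fetch-service",
    "/compute": "simulation-service",
}


@dataclass
class Request:
    user_role: str
    path: str


def route(request: Request) -> str:
    """Return the microservice that should handle the request.

    Raises PermissionError if the role lacks access to the matched route,
    and LookupError if no route matches the path.
    """
    for prefix, service in ROUTES.items():
        if request.path.startswith(prefix):
            required = prefix.lstrip("/")
            allowed = ROLE_PERMISSIONS.get(request.user_role, set())
            if required not in allowed:
                raise PermissionError(
                    f"role {request.user_role!r} may not call {prefix}"
                )
            return service
    raise LookupError(f"no service registered for {request.path}")
```

In this sketch, a "planner" request to /compute/simulate is routed to the simulation service, while the same request from an "analyst" is rejected before it reaches any microservice.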

Key Design Considerations

Rahul emphasized three fundamental design considerations crucial for developing scalable data science products: scalability, adaptability, and observability.

  1. Scalability: The ability to handle increasing volumes of data and user requests without compromising performance is vital. The architecture uses auto-scaling mechanisms, especially for compute-intensive tasks, to ensure there is no downtime and that performance remains consistent.
  2. Adaptability: Given the global scale of AB InBev’s operations, the system must adapt to different regional requirements and integrate with various local systems. A hub-and-spoke model, with a center of excellence in Bengaluru and zonal headquarters in several countries, supports this adaptability.
  3. Observability: Continuous monitoring and logging are essential to maintain system health and performance. The architecture includes comprehensive logging services that track performance metrics and error rates, enabling proactive identification and resolution of issues.
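The talk did not say which auto-scaling mechanism is used. One common option on Kubernetes is the Horizontal Pod Autoscaler, whose documented rule is desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule with simple bounds; the default bounds are assumptions, and the tolerance band that the real autoscaler applies is omitted.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound, for illustration
    max_replicas: int = 10,  # assumed upper bound, for illustration
) -> int:
    """Replica count suggested by the HPA-style proportional rule."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas running at 90% average CPU against a 60% target would scale out to six, and the same four replicas at 30% would scale in to two.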

Data Strategy and Microservices

Rahul detailed the data strategy employed in CatExpert.ai, which involves integrating multiple data sources into a data lakehouse. This includes data warehouses like Snowflake and Vertica, as well as flat files and other data formats. The data is organized into a Delta Lake structure with bronze, silver, and gold tiers, ensuring that the most refined data is available for analysis.
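The bronze, silver, and gold tiers can be illustrated without any lakehouse tooling. The following sketch uses plain Python lists in place of Delta tables, and the store, SKU, and unit values are invented for illustration. Bronze holds records as ingested, silver holds cleaned and de-duplicated records, and gold holds an aggregate ready for analysis.

```python
from collections import defaultdict

# Bronze: raw records as ingested (hypothetical fields and values).
bronze = [
    {"store": "S1", "sku": "A", "units": "12"},
    {"store": "S1", "sku": "A", "units": "12"},  # exact duplicate
    {"store": "S2", "sku": "A", "units": None},  # missing value
    {"store": "S2", "sku": "B", "units": "7"},
]


def to_silver(rows):
    """Clean: drop rows with missing units, remove duplicates, cast types."""
    seen, cleaned = set(), []
    for row in rows:
        if row["units"] is None:
            continue
        key = (row["store"], row["sku"], row["units"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(
            {"store": row["store"], "sku": row["sku"], "units": int(row["units"])}
        )
    return cleaned


def to_gold(rows):
    """Aggregate: total units sold per SKU."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["sku"]] += row["units"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping each tier as a separate step means the raw data stays available for reprocessing if the cleaning rules change.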

Microservices play a crucial role in the system, handling both data retrieval and compute-intensive tasks. The system uses queuing technology and message brokers to manage and distribute these tasks efficiently. Each microservice is designed to be payload-agnostic, allowing for easy configuration and scalability.
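One way to read "payload-agnostic" is that a worker only inspects a small envelope (a task type) and hands the payload, unopened, to whichever handler is registered for that type. The sketch below shows this idea under that assumption. Python's in-process `queue.Queue` stands in for a real message broker, and the task types and payload fields are hypothetical.

```python
import json
import queue

# Registry of task handlers, keyed by task type.
HANDLERS = {}


def handler(task_type):
    """Decorator that registers a function as the handler for a task type."""
    def register(fn):
        HANDLERS[task_type] = fn
        return fn
    return register


@handler("fetch")
def fetch(payload):
    return f"fetched {payload['table']}"


@handler("simulate")
def simulate(payload):
    return sum(payload["values"])


def worker(task_queue):
    """Drain the queue, dispatching each message by its 'type' field only."""
    results = []
    while True:
        try:
            message = task_queue.get_nowait()
        except queue.Empty:
            break
        envelope = json.loads(message)
        results.append(HANDLERS[envelope["type"]](envelope["payload"]))
        task_queue.task_done()
    return results


tasks = queue.Queue()
tasks.put(json.dumps({"type": "fetch", "payload": {"table": "sales"}}))
tasks.put(json.dumps({"type": "simulate", "payload": {"values": [1, 2, 3]}}))
results = worker(tasks)
```

Supporting a new kind of task then means registering one more handler; the worker loop itself does not change.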

Full Stack Deployment and Observability

The technology stack for CatExpert.ai includes front-end design patterns, a Python back end built on frameworks such as FastAPI, and various queuing and message-brokering technologies. The system’s observability framework monitors application performance, infrastructure health, user data, and security metrics. This monitoring setup helps AB InBev predict and preempt potential failures, moving towards a self-healing infrastructure.
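As a rough illustration of application-level monitoring, the sketch below wraps a function so that every call logs its latency and updates call and error counters. It uses only the Python standard library; the metric names and the `run_simulation` function are hypothetical, and a production system would export these values to a monitoring service rather than keep them in memory.

```python
import functools
import logging
import time
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("observability-sketch")

# In-memory counters; a real system would ship these to a metrics backend.
calls = Counter()
errors = Counter()


def observed(fn):
    """Record call count, error count, and latency for the wrapped function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        calls[fn.__name__] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            errors[fn.__name__] += 1
            log.exception("%s failed", fn.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s took %.2f ms", fn.__name__, elapsed_ms)
    return wrapper


def error_rate(name):
    """Fraction of recorded calls to `name` that raised an exception."""
    return errors[name] / calls[name] if calls[name] else 0.0


@observed
def run_simulation(units):
    # Hypothetical workload standing in for a compute task.
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * 2
```

An error rate that climbs past a chosen threshold is the kind of signal an alerting or self-healing routine could act on.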

Collaborative Development and Algo-Vault

Rahul concluded by discussing the importance of collaborative development across global teams. AB InBev has developed Algo-Vault, a repository of standardized algorithms and methods that can be reused and enhanced by teams worldwide. This repository includes components for data fetching, preprocessing, exploratory data analysis, feature engineering, modeling, and reporting. By modularizing these components, AB InBev ensures that development efforts are not duplicated and that innovations can be quickly integrated into the system.
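The internals of Algo-Vault were not described in the talk. The sketch below shows one possible shape for such a repository: each stage is registered under a name, and a pipeline is assembled by listing stage names. All component names and the toy data are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Shared registry of reusable components, keyed by name.
REGISTRY: Dict[str, Callable] = {}


def component(name: str):
    """Decorator that adds a function to the shared registry."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register


@component("fetch")
def fetch(_):
    # Toy data standing in for a real data source.
    return [3.0, 5.0, None, 4.0]


@component("preprocess")
def preprocess(values):
    return [v for v in values if v is not None]


@component("model")
def model(values):
    # Mean baseline standing in for a real model.
    return sum(values) / len(values)


@component("report")
def report(prediction):
    return f"forecast: {prediction:.1f}"


def run_pipeline(steps: List[str], data=None):
    """Run the named components in order, feeding each output to the next."""
    for step in steps:
        data = REGISTRY[step](data)
    return data


summary = run_pipeline(["fetch", "preprocess", "model", "report"])
```

Because each stage is looked up by name, a team could swap in its own "model" component without touching the other stages.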

Conclusion

Rahul Prakash’s talk at the Data Engineering Summit 2024 highlighted the sophisticated architecture and innovative strategies employed by AB InBev to enhance sales and distribution analytics. Through products like CatExpert.ai and a robust, scalable infrastructure, AB InBev is leading the way in leveraging data science to drive business success. The insights shared by Rahul provide a valuable blueprint for organizations looking to harness the power of data engineering and artificial intelligence in their operations.
