Measuring What Works in AI: Kearney’s Business-First Approach to LLM Leaderboards (Insights from Cypher 2024)

Explore Maksim Khaitovich's framework for selecting enterprise-ready LLMs, prioritizing integration and business value.

At Cypher 2024, Maksim Khaitovich, AI Lab Head at Kearney in Dubai, delivered a groundbreaking presentation on the critical challenge of selecting the right Large Language Model (LLM) for enterprise applications. His talk addressed a fundamental problem facing organizations investing in generative AI: how to choose an LLM that not only performs well in prototypes but can truly scale across an entire enterprise ecosystem.


Core Concepts of LLM Selection

The traditional approach to evaluating Large Language Models has been predominantly technical, focusing on synthetic metrics and narrow performance indicators. Khaitovich highlighted a crucial gap in existing methodologies: most leaderboards fail to account for business integration factors. His team developed a comprehensive evaluation framework that goes beyond standard performance metrics to assess enterprise readiness and practical applicability.

The proposed leaderboard introduces two primary dimensions of evaluation:

  1. Enterprise Readiness
  2. Business Performance

Enterprise Readiness encompasses several key subdimensions (a scoring sketch follows the list):

  • Total model functionality
  • Integration capabilities with RAG and agent pipelines
  • Cloud infrastructure compatibility
  • Training data diversity
  • Ease of model usage
  • Development framework integration
  • Licensing considerations
  • Accessibility across cloud platforms
  • Computational speed and response time
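
To make the framework concrete, here is a minimal sketch of how a custom, weighted leaderboard might roll such criteria into a single score per model. The criterion names, weights, and per-model scores below are hypothetical illustrations, not values from Kearney’s actual framework.

```python
# A minimal sketch of a custom, weighted leaderboard score.
# All weights and scores are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class ModelScorecard:
    name: str
    # Scores on a 0-10 scale per evaluation criterion.
    scores: dict[str, float] = field(default_factory=dict)

# Hypothetical weights reflecting one organization's priorities.
WEIGHTS = {
    "rag_agent_integration": 0.25,
    "cloud_compatibility": 0.15,
    "ease_of_use": 0.15,
    "licensing": 0.15,
    "latency": 0.15,
    "business_task_accuracy": 0.15,
}

def weighted_score(card: ModelScorecard) -> float:
    """Combine per-criterion scores into a single leaderboard value."""
    return sum(WEIGHTS[c] * card.scores.get(c, 0.0) for c in WEIGHTS)

candidates = [
    ModelScorecard("model_a", {"rag_agent_integration": 8, "cloud_compatibility": 9,
                               "ease_of_use": 7, "licensing": 9, "latency": 6,
                               "business_task_accuracy": 7}),
    ModelScorecard("model_b", {"rag_agent_integration": 9, "cloud_compatibility": 6,
                               "ease_of_use": 8, "licensing": 5, "latency": 8,
                               "business_task_accuracy": 8}),
]

for card in sorted(candidates, key=weighted_score, reverse=True):
    print(f"{card.name}: {weighted_score(card):.2f}")
```

Adjusting the weights to reflect organizational priorities is precisely what makes such a leaderboard custom rather than generic.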


Challenges and Implementation Insights

Organizations face significant challenges when scaling generative AI solutions. Khaitovich noted that many companies have invested billions of dollars in AI technologies with minimal returns. The primary obstacles include:

  • Difficulty integrating LLMs into existing IT ecosystems
  • High computational costs
  • Inconsistent performance across different business domains
  • Limited model accessibility

The solution lies in a tailored, business-first approach to LLM selection. Khaitovich recommended:

  • Creating a custom leaderboard specific to organizational needs
  • Focusing on actual business value extraction
  • Conducting quick proof-of-concept tests (see the sketch after this list)
  • Considering long-term model viability
  • Evaluating models against specific business use cases
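
A quick proof-of-concept test can be as simple as running each shortlisted model over a handful of business-specific prompts and comparing outputs side by side. The sketch below assumes a generic `call_model` callable per provider; the `POC_PROMPTS` use cases and stub responses are hypothetical, and in practice you would wire the callables to your provider SDKs.

```python
# A minimal sketch of a quick proof-of-concept run: each candidate model
# answers a handful of business-specific prompts, and the outputs are
# collected for side-by-side review. Prompts and stubs are hypothetical.

from typing import Callable

# Business use cases expressed as representative prompts.
POC_PROMPTS = {
    "contract_summary": "Summarize the key obligations in this supplier contract.",
    "support_triage": "Classify this customer email as billing, technical, or other.",
}

def run_poc(models: dict[str, Callable[[str], str]]) -> dict[str, dict[str, str]]:
    """Return {model_name: {use_case: output}} for manual or automated scoring."""
    results: dict[str, dict[str, str]] = {}
    for name, call_model in models.items():
        results[name] = {case: call_model(prompt) for case, prompt in POC_PROMPTS.items()}
    return results

if __name__ == "__main__":
    # Stub models stand in for real API clients during a dry run.
    stubs = {
        "model_a": lambda prompt: f"[model_a response to: {prompt[:30]}...]",
        "model_b": lambda prompt: f"[model_b response to: {prompt[:30]}...]",
    }
    for model, outputs in run_poc(stubs).items():
        for case, output in outputs.items():
            print(f"{model} / {case}: {output}")
```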


Implementation Recommendations

When selecting an LLM, organizations should:

  • Assess models against their specific business context
  • Prioritize evaluation criteria based on unique requirements
  • Consider factors beyond pure performance
  • Test top candidate models through practical experiments
  • Continuously update and refresh model evaluations, as sketched below
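
Keeping evaluations fresh can start with something as simple as tracking when each model was last scored and flagging stale entries for re-testing. This self-contained sketch uses a hypothetical 30-day cadence and placeholder scores; a real refresh would re-run the proof-of-concept suite and update scores from its results.

```python
# A minimal sketch of flagging model evaluations that are due for a
# refresh. The cadence and scores are hypothetical placeholders.

from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = timedelta(days=30)  # hypothetical review cadence

evaluations = {
    # model name -> (last evaluated, overall leaderboard score)
    "model_a": (datetime(2024, 9, 1, tzinfo=timezone.utc), 7.4),
    "model_b": (datetime(2024, 10, 20, tzinfo=timezone.utc), 7.9),
}

def stale_models(now: datetime) -> list[str]:
    """Models whose evaluations are older than the refresh interval."""
    return [name for name, (evaluated, _) in evaluations.items()
            if now - evaluated > REFRESH_INTERVAL]

if __name__ == "__main__":
    print("Needs re-evaluation:", stale_models(datetime.now(timezone.utc)))
```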


Industry Impact

The approach represents a significant shift from purely technical model assessment to a more holistic, business-oriented evaluation. By focusing on practical integration, cost-effectiveness, and domain-specific performance, organizations can make more informed decisions about AI technology adoption.


Conclusion

Khaitovich’s framework offers a pragmatic solution to the complex challenge of LLM selection. As he emphasized, the key to successful generative AI implementation is not just choosing a high-performing model, but selecting the right model for your specific business context. The future of enterprise AI lies in nuanced, context-aware model selection that prioritizes business value over pure technical performance.
