Unlocking Image-Match Data to Drive Growth

Explore innovative uses of image-match data to drive business growth and unlock new insights.

The Data Engineering Summit 2024, held in Bengaluru, was a landmark event for India’s burgeoning data engineering community, focusing on the latest innovations in generative AI. Among the notable speakers was Padma Chitturi, Senior Engineering Manager of the Machine Learning Platform at Indeed.com. Padma brought her extensive experience and expertise to the stage, sharing insights on how her team leverages image-match data to derive valuable business insights and drive growth.

The Evolution of Analytical Approaches in Uncertain Times

Padma began her talk with a brief overview of traditional analytical approaches and their limitations, especially in volatile economic conditions. She emphasized that while descriptive, diagnostic, predictive, and prescriptive analytics have been fundamental in driving business insights, they all rely on known data patterns. This reliance becomes a problem in unprecedented situations like the COVID-19 pandemic, when historical data no longer reflects current behavior.

Traditional Analytical Approaches

  1. Descriptive Analytics: Provides reports on what has happened, slicing and dicing market data to understand trends.
  2. Diagnostic Analytics: Analyzes factors influencing trends through correlation and statistical analysis.
  3. Predictive Analytics: Forecasts future trends based on historical data.
  4. Prescriptive Analytics: Suggests optimal actions by exploring permutations and combinations of various strategies.

While powerful, Padma highlighted that these methods often fall short when dealing with unexpected market disruptions. The key challenge is identifying and adapting to new, hidden patterns that emerge in such situations.

Image-Matching Case Study

Padma then delved into a compelling case study from her tenure at Expedia, where her team tackled the challenge of matching properties across different platforms. Expedia, being a conglomerate of multiple brands like Hotels.com and Brand Expedia, faced significant issues with duplicate property listings and insufficient inventory to meet customer demand.

The Matching Challenge

Expedia’s goal was to identify and resolve duplicate property listings across its platforms and public real estate sites. This required sophisticated matching techniques to ensure the accuracy and completeness of the inventory data.

Techniques Used

  1. Image Matching: Utilized perceptual hash algorithms to match property images.
  2. Address Matching: Compared property addresses to identify duplicates.
  3. Registration Number Matching: Verified property registration details across platforms.

These techniques enabled Expedia to match properties across its own platforms and against public real estate sites. The matching surfaced a significant number of duplicate entries, and consolidating them improved the overall quality of the inventory data.
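
To make the image-matching idea concrete, here is a minimal sketch of one of the simplest perceptual hashes, the average hash. The talk does not say which perceptual hash algorithm the team used, so the choice of average hash, the function names, the tiny 4x4 grids, and the distance threshold are all assumptions for illustration. The sketch also assumes each image has already been downscaled to a small grayscale grid.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values (0-255), already downscaled


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_match(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as a match when their hashes differ in few bits.

    max_distance is a hypothetical threshold, not a value from the talk.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical sample data: a checkerboard photo, a brighter copy, and a
# structurally different image.
photo = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
brighter = [[p + 20 for p in row] for row in photo]
other = [[200, 200, 10, 10] for _ in range(4)]

print(is_probable_match(photo, brighter))  # same structure, different exposure
print(is_probable_match(photo, other))     # different structure
```

Because each bit records only whether a pixel is above the mean, a uniform brightness change leaves the hash untouched, which is what makes this family of hashes useful for spotting the same photo re-uploaded on another site.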

Leveraging Graph Technology

One of the most innovative aspects of Padma’s approach was the use of graph technology to manage and analyze the matched data. By creating an in-memory graph of properties and their connections, her team could easily identify and visualize relationships between different listings.

Applying Spark GraphFrames

The graph-based approach simplified the process of extracting insights from the matched data. Spark GraphFrames were used to create vertices and edges representing properties and their matches. This method enabled the team to generate connected components, revealing clusters of related properties.
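
The connected-components step can be sketched without a Spark cluster. The example below is a plain-Python stand-in that uses union-find rather than GraphFrames itself (where the equivalent operation is exposed as connectedComponents()). The listing IDs and the "source:id" naming scheme are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def connected_components(
    vertices: List[str], edges: List[Tuple[str, str]]
) -> List[Set[str]]:
    """Group vertices into clusters linked directly or indirectly by edges.

    Every endpoint in edges must also appear in vertices.
    """
    parent: Dict[str, str] = {v: v for v in vertices}

    def find(v: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for v in vertices:
        clusters[find(v)].add(v)
    return list(clusters.values())


# Hypothetical listings (vertices) and pairwise matches (edges).
listings = ["expedia:1", "hotels:7", "realestate:3", "realestate:9"]
matches = [("expedia:1", "hotels:7"), ("hotels:7", "realestate:3")]

clusters = connected_components(listings, matches)
for cluster in clusters:
    print(sorted(cluster))
```

Note that "expedia:1" and "realestate:3" land in the same cluster even though no direct match links them. That transitive grouping is what a graph adds over a flat table of pairwise matches.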

By analyzing these clusters, Expedia could answer critical business questions, such as identifying properties listed on real estate sites but not on Expedia’s platforms or understanding the distribution of duplicate listings.
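
As an illustration of how such questions reduce to simple checks over clusters, here is a small sketch. It assumes each cluster is a set of listing IDs of the form "source:id". That naming scheme, the source labels, and the function names are hypothetical and not taken from the talk.

```python
from collections import Counter
from typing import List, Set

# Hypothetical labels for sources that count as Expedia platforms.
EXPEDIA_SOURCES = {"expedia", "hotels"}


def source_of(listing_id: str) -> str:
    """Extract the source label from a 'source:id' listing ID."""
    return listing_id.split(":", 1)[0]


def missing_from_expedia(clusters: List[Set[str]]) -> List[Set[str]]:
    """Clusters with no listing on any Expedia platform (inventory gaps)."""
    return [
        c for c in clusters
        if not any(source_of(listing) in EXPEDIA_SOURCES for listing in c)
    ]


def duplicate_counts(clusters: List[Set[str]]) -> Counter:
    """Per source, count listings beyond the first within each cluster."""
    counts: Counter = Counter()
    for c in clusters:
        per_source = Counter(source_of(listing) for listing in c)
        for source, n in per_source.items():
            if n > 1:
                counts[source] += n - 1
    return counts


# Hypothetical clusters, e.g. the output of a connected-components step.
clusters = [
    {"expedia:1", "expedia:4", "realestate:3"},
    {"realestate:9"},
    {"hotels:7", "realestate:2"},
]

print(missing_from_expedia(clusters))
print(duplicate_counts(clusters))
```

Here the second cluster is a property seen only on a real estate site, and the first cluster contains one redundant listing on the hypothetical "expedia" source.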

Enhancing Accuracy with Data Science Techniques

To refine the image-matching results, Padma’s team applied several data science techniques. These included:

  1. Cosine Similarity for Titles and Descriptions: Ensured that the text data matched accurately.
  2. Lat-Long Distance Calculations: Verified the geographical proximity of properties.
  3. Binary Classification Algorithms: Assessed the likelihood of properties being duplicates based on multiple features.

These enhancements improved the precision of the matching process, ensuring that only genuinely duplicate properties were flagged.
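
The first two checks listed above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the team's implementation: the bag-of-words tokenization, the haversine formula for lat-long distance, the Earth radius of 6371 km, and the record field names are all choices made here. In practice the resulting features would feed a binary classifier trained on labeled pairs, which is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat-long points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(h))


def pair_features(a: Dict, b: Dict) -> List[float]:
    """Feature vector for a candidate pair: [title similarity, distance in km].

    The 'title', 'lat', and 'lon' field names are hypothetical.
    """
    return [
        cosine_similarity(a["title"], b["title"]),
        haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]),
    ]


# Hypothetical candidate pair flagged by image matching.
listing_a = {"title": "Cozy lake cabin with deck", "lat": 47.60, "lon": -122.33}
listing_b = {"title": "Cozy cabin on the lake", "lat": 47.60, "lon": -122.33}

print(pair_features(listing_a, listing_b))
```

A pair with matching images but a low title similarity or a large distance is a likely false positive, which is how these extra signals can raise precision.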

Responding to the COVID-19 Crisis

The COVID-19 pandemic posed a unique challenge, disrupting traditional travel patterns and rendering many pre-existing insights obsolete. Padma’s team used their graph-based approach to adapt to the new landscape. They discovered that travelers preferred vacation rentals over hotels, especially rentals in remote areas with flexible cancellation policies.

New Insights and Strategies

By analyzing the graph data, the team identified emerging trends and provided actionable recommendations to travel partners. These included targeting properties in less congested areas and promoting those with better cancellation terms. This strategic pivot helped Expedia recover and drive revenue growth during the pandemic.

The Future of Data-Driven Business

Padma concluded her talk by emphasizing the importance of understanding relationships between data objects in addressing real business problems. She advocated for a shift from traditional, pattern-based analytics to more dynamic, relationship-focused approaches that can adapt to changing circumstances.

In summary, Padma Chitturi’s presentation at the Data Engineering Summit 2024 highlighted the transformative potential of innovative data engineering techniques. By leveraging image matching, graph technology, and advanced data science methods, businesses can uncover hidden insights, adapt to market changes, and drive sustainable growth even in uncertain times.
