Duplicate record detection using GenAI techniques to improve data quality

Ian Ormesher

Wednesday 11:45 in Platinum3
  • A description of the problem of duplicate records and their impact on businesses
  • An overview of the proposed solution
  • How to use GenAI models and techniques to identify potential duplicate records
  • Step 1: identifying the columns to match on
  • Step 2: creating embedding vectors for these columns
  • Step 3: creating match clusters
  • Step 4: presenting those clusters to users, who can then decide what to do with the duplicates
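The four steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the speaker's implementation. The sample records, the column list, the 0.6 similarity threshold and every function name are hypothetical. The `embed` function is a character-trigram stand-in for a real GenAI embedding model, used only so the example runs without network access or model weights. The talk outline does not say which clustering method is used. Here, clusters are formed by linking any pair whose cosine similarity clears a threshold and then taking connected components with union-find.

```python
import math
from collections import Counter

# Hypothetical sample records; in practice these would come from your database.
RECORDS = [
    {"id": 1, "name": "Jon Smith", "email": "jon.smith@example.com", "city": "Leeds"},
    {"id": 2, "name": "John Smith", "email": "jon.smith@example.com", "city": "Leeds"},
    {"id": 3, "name": "Alice Brown", "email": "alice.b@example.org", "city": "York"},
    {"id": 4, "name": "J. Smith", "email": "jonsmith@example.com", "city": "Leeds"},
]

# Step 1: the columns to match on (an assumption for this example).
MATCH_COLUMNS = ["name", "email", "city"]


def record_text(record, columns):
    """Concatenate the chosen columns into one lower-cased string."""
    return " | ".join(str(record[c]).lower().strip() for c in columns)


def embed(text, n=3):
    """Step 2 stand-in: a sparse vector of character n-gram counts.
    ASSUMPTION: a real pipeline would call a GenAI embedding model here."""
    padded = f"  {text}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(records, columns, threshold=0.6):
    """Step 3: link pairs at or above the threshold; return connected components."""
    vectors = [embed(record_text(r, columns)) for r in records]
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Brute-force pairwise comparison: O(n^2), fine for a small illustration.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(find(i), []).append(r["id"])
    # Only clusters with more than one member are candidate duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Step 4: present each candidate cluster for a user to review.
    for n, ids in enumerate(cluster(RECORDS, MATCH_COLUMNS), start=1):
        print(f"Cluster {n}: records {ids}")
```

Run as a script, this prints `Cluster 1: records [1, 2, 4]` and leaves record 3 out. Threshold linking is transitive, so a chain of moderately similar records can merge into one cluster. That is one reason a human review step such as Step 4 is useful. For large tables, the pairwise loop would typically be replaced by an approximate nearest-neighbour search over the embedding vectors.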

Ian Ormesher

Ian Ormesher is a seasoned full-stack data scientist with a robust background in training and deploying AI models in production environments. Over a career spanning more than four decades, he has honed his skills in Machine Learning, Deep Neural Networks, Reinforcement Learning, and Computer Vision. He is proficient in a wide array of programming languages and data analysis tools, with a proven track record of implementing data-oriented solutions in the cloud.