Declarative Entity Resolution Via Matching Dependencies and Combining Matching Dependencies With Machine Learning for Entity Resolution
Public Deposited
- Contributors
- Zeinab Bahmani (Author)
- Abstract
Entity resolution (ER) is an important problem in data cleaning. It is about identifying and merging records in a database that represent the same external entity. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. An ER process supported by MDs over a dirty instance may lead to multiple clean instances.

In this thesis, we first present disjunctive answer set programs that capture through their models the class of alternative clean instances obtained after an ER process based on MDs. With these programs, we can obtain clean answers to queries by skeptically reasoning from the program. As an important practical case of ER, we provide a declarative reconstruction of the so-called union-case ER methodology, as presented through a generic approach to ER, the so-called Swoosh approach. We extend our ASP-based account of the union-case of Swoosh with negative rules.

In this work, we extend MDs to relational MDs, which capture more application semantics, and identify classes of relational MDs for which the proposed declarative specifications for ER via MDs can be automatically rewritten into stratified Datalog programs.

We also show the process and the benefits of integrating four components of ER: (a) Building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) Use of relational MDs for supporting the blocking phase of ML; (c) Record merging on the basis of the classifier results; and (d) The use of the declarative language LogiQL, an extended form of Datalog supported by the LogicBlox platform, for all activities related to data processing, and the specification and enforcement of MDs.
- Rights Notes
Copyright © 2017 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
- Date Created
- 2017
Items
Title | Date Uploaded | Visibility | Actions |
---|---|---|---|
bahmani-declarativeentityresolutionviamatchingdependencies.pdf | 2023-05-05 | Public | Download |
bahmani-declarativeentityresolutionviamatchingdependencies-supplemental.zip | 2023-05-05 | Public | Download |
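
For readers unfamiliar with the union-case merging mentioned in the abstract, the following is a minimal procedural sketch in Python. It is not the thesis's answer set programming specification, and it is not taken from the deposited files. The record layout (attribute name mapped to a set of values), the `matches` rule (equal normalized names or a shared phone value), and all function names are assumptions made for illustration only. The idea shown is that two records judged similar are merged by taking the union of their attribute values, and the merged record is compared again against the rest.

```python
def normalize(value):
    """Crude stand-in for a similarity relation: keep lowercase alphanumerics only."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def matches(r1, r2):
    """Hypothetical match rule: same normalized name, or a shared phone value."""
    names1 = {normalize(v) for v in r1.get("name", ())}
    names2 = {normalize(v) for v in r2.get("name", ())}
    phones1 = set(r1.get("phone", ()))
    phones2 = set(r2.get("phone", ()))
    return bool(names1 & names2) or bool(phones1 & phones2)


def merge(r1, r2):
    """Union-case merge: each attribute keeps the union of both records' values."""
    return {
        attr: frozenset(r1.get(attr, ())) | frozenset(r2.get(attr, ()))
        for attr in r1.keys() | r2.keys()
    }


def resolve(records):
    """Repeatedly merge matching records until no two remaining records match."""
    pending = list(records)
    resolved = []
    while pending:
        current = pending.pop()
        partner = next((r for r in resolved if matches(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # Replace the matched record by the merged one and re-examine it,
            # since the merge may now match records it did not match before.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


# Illustrative data only (invented for this sketch).
records = [
    {"name": {"J. Doe"}, "phone": {"555-0100"}},
    {"name": {"j doe"}, "phone": {"555-0199"}},
    {"name": {"Jane Roe"}, "phone": {"555-0199"}},
    {"name": {"Al Smith"}, "phone": {"555-0142"}},
]
clean = resolve(records)
```

In this sample, the first and third records do not match each other directly; they end up in the same merged record only because each matches the second one. The declarative approach described in the abstract specifies such outcomes through the models of a logic program rather than through an explicit loop like this.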