Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach

Freitas, Andre; Curry, Edward

Abstract:

The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the way users express their information needs and the representation of the data. This work aims to provide a natural language interface and an associated semantic index to support an increased level of vocabulary independency for queries over Linked Data/Semantic Web datasets, using a distributional-compositional semantics approach. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-occurring words in large-scale texts. The proposed query model targets the following features: (i) a principled semantic approximation approach with low adaptation effort (independent from manually created resources such as ontologies, thesauri or dictionaries), (ii) comprehensive semantic matching supported by the inclusion of large volumes of distributional (unstructured) commonsense knowledge into the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieved avg. recall=0.81, mean avg. precision=0.62 and mean reciprocal rank=0.49.

Reference:

Andre Freitas, Edward Curry, "Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach", In 19th International Conference on Intelligent User Interfaces (IUI'14), ACM, Haifa, Israel, pp. 279-288, 2014.

Bibtex Entry:

@inproceedings{Freitas2014,
abstract = {The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the way users express their information needs and the representation of the data. This work aims to provide a natural language interface and an associated semantic index to support an increased level of vocabulary independency for queries over Linked Data/Semantic Web datasets, using a distributional-compositional semantics approach. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-occurring words in large-scale texts. The proposed query model targets the following features: (i) a principled semantic approximation approach with low adaptation effort (independent from manually created resources such as ontologies, thesauri or dictionaries), (ii) comprehensive semantic matching supported by the inclusion of large volumes of distributional (unstructured) commonsense knowledge into the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieved avg. recall=0.81, mean avg. precision=0.62 and mean reciprocal rank=0.49.},
address = {Haifa, Israel},
author = {Freitas, Andre and Curry, Edward},
booktitle = {19th International Conference on Intelligent User Interfaces (IUI'14)},
pages = {279--288},
publisher = {ACM},
title = {{Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach}},
url = {http://www.edwardcurry.org/publications/preprint_iui_2014.pdf},
year = {2014}
}