The Role of Community-Driven Data Curation for Enterprises

Curry, Edward; Freitas, Andre; O'Riáin, Sean

doi:10.1007/978-1-4419-7665-9

Abstract:

With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes, and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider upon which best practices for both social and technical aspects of community-driven data curation are described.

Reference:

Edward Curry, Andre Freitas, Sean O'Riáin, "The Role of Community-Driven Data Curation for Enterprises", Chapter in Linking Enterprise Data, Springer US, Boston, MA, pp. 25-47, 2010.

Bibtex Entry:

@incollection{Curry2010,
abstract = {With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes, and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider upon which best practices for both social and technical aspects of community-driven data curation are described.},
address = {Boston, MA},
annote = {<a href="http://www.slideshare.net/edwardcurry/the-role-of-communitydriven-data-curation-for-enterprises">[slides]</a>},
author = {Curry, Edward and Freitas, Andre and O'Ri{\'{a}}in, Sean},
booktitle = {Linking Enterprise Data},
chapter = {2},
doi = {10.1007/978-1-4419-7665-9},
editor = {Wood, David},
isbn = {978-1-4419-7664-2},
keywords = {Community Data Curation,Computer Science,Linked Data,Precompetitive Collaboration,Treo},
mendeley-tags = {Treo},
pages = {25--47},
publisher = {Springer US},
title = {{The Role of Community-Driven Data Curation for Enterprises}},
url = {http://www.edwardcurry.org/publications/curry_LED_Curation_2010.pdf},
year = {2010}
}