Case Studies on Mutually Enriched Data Projects

Case Studies on Mutually Enriched Data Projects (Part 1)
Tyng-Ruey Chuang, with Huang-Sin Syu and Wen-Ting Yang
Institute of Information Science
Academia Sinica, Taipei, Taiwan

Case Studies on Mutually Enriched Data Projects (Part 2)
-- Daudu Plateau Investigation Project
Shih-Chieh Ilya Li (@ilya) and Der-Tsai Lee
CEO and President, Honghua Environmental Protection and Digital Future Foundation

Slide URL for Part 2: https://hackmd.io/s/S1UfzMwqm


These are the slide sets for a talk at the SciDataCon 2018 session "Citizen Science Data – from Collection to Curation to Management".

These are works in progress. Feedback and comments are welcome!



More and more data is being acquired, accumulated, analyzed, reused, distributed, and preserved. As scientific research becomes increasingly collaborative, research data collections are often produced and maintained by communities. Research data can be sourced from individuals (as in citizen science projects), taken from the public domain (e.g. old manuscripts and maps), generated by automatic sensors, or derived and combined from various data sources. Collaborative projects focusing on data collection, curation, and dissemination are everywhere. Encyclopedia of Life, OpenStreetMap, Project Gutenberg, and Wikidata, among many others, are exemplary in demonstrating that data in support of everyday use and research work can be collaboratively produced, expertly curated, and freely reused.

It has also been observed that cross-fertilization among these data projects is increasing. Data from diverse sources is being linked for domain applications. Datasets are mixed and visualized together. An authoritative data collection may become the basis of cross-reference for other data collections, and one dataset may be used as the controlled vocabulary in others.

There seem to be many types of cross-project mutual enrichment. We aim to look into cases where such mutual enrichment crosses traditional boundaries (e.g. proprietary/free, public sector/commercial applications, top-down/collaborative). There are cases where (anonymized) datasets from social media services are being used to enrich community-generated datasets, and vice versa. The use cases surrounding the OpenStreetMap datasets and services are good starting points. We will also study cases where the collection of concepts and entities in Wikidata has been used as a source of semantic grounding. The aggregation, distribution, and repurposing of various open government data collections will be a focus area too.


