Note: The full text was published in the Journal of Library and Information Science (《圖書館學與資訊科學》), Vol. 43, No. 1, pp. 7-46, April 2017.
http://jlis.glis.ntnu.edu.tw/ojs../index.php/jlis/article/view/722
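Reuse of Structured Data: Semantics, Linkage, and Realization
Andrea Wei-Ching Huang (黃韋菁)
Project Manager, Institute of Information Science, Academia Sinica
Cheng-Jen Lee (李承錱)
Research Assistant, Institute of Information Science, Academia Sinica
Tyng-Ruey Chuang (莊庭瑞)
Associate Research Fellow, Institute of Information Science, Academia Sinica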
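Abstract
Continuously creating semantics and links for data, and disseminating over the World Wide Web structured data that both people and machines can process and understand, so as to increase the "reuse value" of datasets, is a widely recognized issue today. It is also the motivation and goal that moved this research from theoretical inquiry toward system implementation. This paper briefly reviews international projects and technical developments related to Linked Open Data (LOD), introduces five cross-domain knowledge bases and seven specialized knowledge bases built with a Linked Open Data approach, and analyzes how data quality, metadata, and data provenance relate to one another. This research also implements a website, data.odw.tw, which hosts catalogue records of archival collections, and designs an ontology (voc4odw) for converting semi-structured data into semantically rich linked data. On the one hand, we extend the CKAN (The Comprehensive Knowledge Archive Network) dataset management system as a platform for storing and presenting linked data, emphasize the staged conversion steps from the original catalogue data to semantically linked data, and release the conversion programs for each step, together with the CKAN software code, as open source. On the other hand, because the source data for this research are released under Creative Commons public licenses, the research results are released in the same way, so that, on an open basis, the data and code can be preserved and developed, and freely reused and disseminated.
Keywords: CKAN, data provenance, data quality, knowledge base, Linked Open Data (LOD), ontology, semantic representation.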
Reuse of Structured Data: Semantics, Linkage, and Realization
Andrea Wei-Ching Huang
Project Manager (Research)
Institute of Information Science, Academia Sinica, Taiwan
Cheng-Jen Lee
Research Assistant
Institute of Information Science, Academia Sinica, Taiwan
Tyng-Ruey Chuang
Associate Research Fellow
Institute of Information Science, Academia Sinica, Taiwan
Abstract
To increase the reuse value of existing datasets, it is becoming general practice to add semantic links among the records in a dataset and to link these records to external resources. The enriched datasets are published on the web for both humans and machines to consume and re-purpose. In this paper, we make use of publicly available structured records from a digital archive catalogue, and we demonstrate a principled approach to converting the records into semantically rich and interlinked resources for all to reuse. While exploring the various issues involved in reusing and re-purposing existing datasets, we review recent progress in the field of Linked Open Data (LOD) and examine twelve well-known knowledge bases built with a Linked Data approach. We also discuss the general issues of data quality, metadata vocabularies, and data provenance. The concrete outcomes of this research are the following: (1) a website, data.odw.tw, that hosts more than 840,000 semantically enriched catalogue records across multiple subject areas, (2) a lightweight ontology, voc4odw, for describing data reuse and provenance, among others, and (3) a set of open source software tools available to all for performing the kind of data conversion and enrichment we did in this research. We have used and extended CKAN (The Comprehensive Knowledge Archive Network) as a platform to host and publish Linked Data. Our extensions to CKAN are open-sourced as well. As the records we drew from the original catalogue are released under Creative Commons licenses, the semantically enriched resources we now republish on the Web are free for all to reuse as well.
Please refer to the PDF for full text.