Reference Understanding in the Social Sciences

Making bibliographic data available is important in all disciplines because it ensures easy and fast access to the literature and to other scientific resources such as research datasets. To this end, many publishers strive to index their publications in bibliographic databases, which enables publications to be linked in a citation graph. Still, a significant part of the citation data in disciplines such as the social sciences is not accessible via bibliographic databases.

Our previous project, EXCITE, addressed this problem and narrowed the gap between the availability of citation data in the social sciences and in other disciplines. EXCITE developed tools that locate, extract and segment reference strings in PDF documents and then match them against bibliographic databases. One of the main conclusions derived from EXCITE is that the metadata of 60% of the cited papers and other scientific resources are not contained in the available bibliographic databases. Extracted reference strings that cannot be matched are called non-source items. Non-source items include incomplete or erroneous references as well as references that genuinely do not exist in the available bibliographic databases, especially references to datasets, websites and other material.

The main goal of OUTCITE is to research, develop and deploy a toolchain that follows up on the output of the EXCITE pipeline in order to link non-source items to their sources.

Operating Time: 04/2021 - 03/2023

Source of Funding: DFG


Web Site: https://excite.informatik.uni-stuttgart.de


