August 29, 2020 / Ramin Hedeshy

Creating an Open Citation Graph from PDF Documents

The German Research Foundation (DFG) has accepted the second phase of our project, which creates an open citation graph from PDF documents. In the first phase, titled EXCITE, which we ran at the University of Koblenz-Landau, we successfully developed, deployed, and open-sourced methods that locate, extract, and segment citations. Our partner GESIS, Leibniz Institute for the Social Sciences, maintains bibliographic databases for the social sciences and has matched the output of our methods, run on social science literature, against the bibliographic records it maintains. Social science research papers are particularly challenging because they use a wider variety of citation styles and often place references in footnotes (cf. our paper at JCDL-2019). While our methods are domain-independent, they were designed to also cope with the specifics of social sciences and humanities research.
In the second phase, titled OUTCITE, which we will run at the University of Stuttgart, we will deal with the unmatched citations, i.e., citations that are not found in GESIS’ bibliographic records. As before, we will open-source the tools we develop and feed the extracted citations into OpenCitations, an open knowledge graph for citation networks.
