Research Core Dataset
German Council of Science and Humanities Fraunhofer FIT


The aim of the project is to specify the concepts outlined in the recommendations of the German Council of Science and Humanities on a future research core dataset. The research core dataset is meant to be an offer to universities and research institutions to support existing efforts in measuring research activities and processes. In a first step, relevant parameters of the science system will be identified in order to specify the concepts of the research core dataset. In a second step, the project team, in close cooperation with experts, will derive uniform data definitions and formats that will constitute the basis for future standardisation. In addition, the project will work towards a reform of existing subject classifications. Integrating the research core dataset into local research information management systems is intended to lay the foundation for future electronic data exchange.

Given the large number of elements to be defined and the heterogeneous reporting requirements, it is unlikely that the full range of concepts can be standardised within the project period. However, since concerted and constructive efforts are needed in the German science system, the project will focus on clarifying core issues.

The project work is carried out in four working groups: Definitions and data formats, Bibliometrics, Technology and interfaces, and Subject classification. The project management coordinates and supervises the working groups. The working groups screen existing definitions, data sources, publication formats and classifications in order to develop specification proposals in internal consultation processes. A working group of the German Council of Science and Humanities supports the project as an advisory board. Its members are appointed by the German Council of Science and Humanities.

Given the large number of actors involved, their interests and requirements, and the variety of existing definitions and best practices, hearings will be held during the project so that its members can draw on external expertise. In this way, the project members aim to design a draft specification that gains widespread acceptance and fits the German science system well.

In order to fully reflect the actual reporting requirements of universities and research institutions, pilot institutions will be involved in the standardisation process at an early stage. On the one hand, this approach ensures that the specification can be implemented as smoothly as possible. On the other hand, it will initiate an exchange on the costs and benefits of implementing different elements of the specification.

The overall success of the specification as a standard depends on communicating results early and continuously, including during the project period. The involvement of all relevant actors is likewise essential for the lasting success of the specification as a general standard.

The project work is guided by the following definitional aims:

System-wide data standard

Given the heterogeneity of existing data collections, institutional comparisons are hardly feasible. By creating a system-wide data standard, the research core dataset will enable well-informed analyses and the compilation of reliable data. For this reason, a national standardisation of data formats is indispensable. The research core dataset defines and standardises data formats:

  1. at the lowest level of data collection (i.e. the individual),
  2. for different units within a given subject and, based on a suitable aggregation approach, at higher levels,
  3. for different institutional units, such as entire universities.

The resulting data will be used by different actors in the German science system for a number of purposes, including analyses and comparisons, reports, and internal assessments. In addition, students will benefit from the conclusions drawn from these data. Finally, the core dataset will provide transparent information on research-related processes and activities of interest to scientists and researchers.

Quality assurance

The standardisation introduced by the research core dataset removes the definitional ambiguity that often impedes the provision of comparable data. This enhances data quality. In addition, the broad implementation of the standard creates incentives for data quality checks at all levels, among researchers and institutions alike. Responsibility for implementing adequate quality assurance procedures will rest with the research institutions and universities.

Interpretability

Well-defined data formats also facilitate the interpretation of data and therefore enhance their informative value. Unambiguous data are a prerequisite for meaningful comparisons over time.

Reducing complexity

Standardisation minimises the effort of data collection because universities and research institutions will maintain a data inventory. Standardised data are suitable for multiple uses and different requests. If the standards remain valid over the long term, costs will also fall for the recipients of the data. After a short initial implementation period, universities and research institutions will be able to refer to the research core dataset in order to decline other requests for data, e.g. incompatible ones. The definitions constitute the core of the available data, which can be extended where needed by additional "layers" for specific requests.

Reducing the number of requests

On the basis of the research core dataset, requests for essential information on research activities can simply be directed to the universities and research institutions. As a consequence, the indicators of the research core dataset will become standard instruments for the measurement and evaluation of research outputs and achievements. This process will reduce the necessity and extent of additional data collection and requests.

Data ownership

As before, data on research activities and processes remain with the universities and research institutions. Unlike at present, however, data requests can be answered easily because certain core information will be kept available at research institutions and universities. As the research core dataset is based entirely on existing data inventories, existing data privacy regulations continue to apply.