News

From Access to Impact: Exploring the Role of French Open Science Platforms in Measuring Societal Change

By Simon Apartis & Tommaso Venturini, CNRS

Introduction

In this interview, the case study leaders discuss how French Open Science platforms (HAL, OpenEdition, and RechercheDataGouv) shape the impact of Open Science policies. The study analyzes connection logs and focuses on user access patterns to explore the societal impact of Open Science beyond citation metrics.

Why was this particular study selected to support testing and operationalization of Open Science indicators?

Over the past decade, France has implemented ambitious Open Science (OS) policies underpinned by two National Plans for Open Science (2018–2021 and 2021–2024). These initiatives aim to foster "an ecosystem where scientific research becomes more cumulative, better supported by data, and more transparent, with faster and more universal access to results." The ultimate vision is to make "knowledge accessible to all, serving research, education, the economy, and society." 

As incentives for researchers to publish in Open Access became widespread and public funding for OS expanded, two French platforms established in 2000, HAL (Hyper Articles en Ligne) and OpenEdition, have emerged as central pillars of France's open publication and Open Data ecosystem. HAL, in particular, has seen remarkable growth, with total deposits rising from 500,000 in 2010 to 3.5 million by 2024.

More recently, the RechercheDataGouv Platform, funded as part of the National Plan for Open Science, was launched to promote data recognition as an independent research output. The platform aims to improve reproducibility, reduce redundancy, and encourage the open sharing of research data to increase citation rates. It intends to support researchers in managing their data effectively, enabling them to retrieve, reuse, and decide whether to preserve, share, or delete data, thereby hoping to minimize the digital footprint. 

As OS practices and platforms expand, and in line with the principle of evidence-based steering of public policies, there is a growing need for advanced monitoring tools to properly understand and demonstrate the effects that OS policies produce, and are expected to produce, on science, society and the economy. Launched in 2018, the Open Science Barometer aims to address this need. Much remains to be done, however: aligning it with the indicators set by UNESCO's working group on Open Science monitoring, exploring the complexity of societal effects, especially in the long run, and moving beyond monitoring to counterfactual assessment, that is, evaluating effects as consequences causally attributable to OS policies and OS outputs.

Why do you think this study is important for the broader Open Science context?

Open Science platforms are a pivotal meeting point between the scientific supply on the one hand and the scientific, economic and societal demand for science on the other. Studying them is therefore crucial to better understand the effects of OS policies and practices. Access to Open Science through platforms is the first step towards its reuse by a wide variety of heterogeneous players. Had OS platforms not been available in the first place, the societal, economic and scientific practices of these players would not have been affected, nor would they have gone on to affect the world around them in unprecedented ways.

Our study primarily focuses on the preliminary step for impact: access. Access can be studied at the platform level by focusing on the connection logs. Our approach boils down to three core questions. Which private and public organizations are the leading users of these platforms? When are these platforms used the most, and why? Which websites refer to these platforms, and how are they connected?

We will tackle those questions by analysing the connection logs to HAL, OpenEdition and RechercheDataGouv and enriching them with data based on the IP address, referrer-based data and data from OpenAlex.
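To make the referrer question more concrete, here is a minimal sketch of how referring domains could be counted from raw connection logs. It assumes the logs follow the Apache/Nginx "combined" format, which the study does not confirm; the function names and the sample lines (built from documentation-reserved IP addresses and example domains) are invented for illustration only.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed format: Apache/Nginx "combined" log lines (not confirmed by the study).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def referrer_domain(referrer):
    """Return the host of a referrer URL, or None for direct access ('-')."""
    if not referrer or referrer == "-":
        return None
    return urlparse(referrer).hostname


def count_referrers(lines):
    """Count parsed requests per referring domain, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        domain = referrer_domain(record["referrer"])
        if domain:
            counts[domain] += 1
    return counts


# Made-up sample lines; IPs come from ranges reserved for documentation.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Oct/2022:10:15:32 +0200] "GET /article/123 HTTP/1.1" '
    '200 5120 "https://scholar.example.org/search?q=open" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Oct/2022:10:16:01 +0200] "GET /article/456 HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [02/Oct/2022:08:02:11 +0200] "GET /article/123 HTTP/1.1" '
    '200 5120 "https://scholar.example.org/" "Mozilla/5.0"',
]

referrer_counts = count_referrers(SAMPLE_LINES)
print(referrer_counts.most_common())
```

The same parsed records could then be enriched with organization data derived from the IP address, which is the step that requires external databases and is not shown here.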

Figure 1 shows the first results of the first version of our automated classification script, run on the list of all identifiable organizations (~10%) that accessed OpenEdition during the first four days of October 2022. Of these, 4,267 organization domains (about 95%) were divided into 22 categories, while 228 (5.07%) could not be appropriately classified.

Figure 1: Preliminary results - distribution of identifiable organizations accessing OpenEdition, categorized by sector

Our goal is to analyze how patterns of OS usage (specifically, which categories of users access certain types of resources) vary based on their degree of openness. For example, we might find that articles published through diamond Open Access are accessed by a larger proportion of non-academic users compared to articles available only as preprints. This trend could be an indicator of the potential for greater societal impact of scientific work published in Open Access formats.
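The comparison described above can be sketched as a simple share computation. The records below are hand-made and purely hypothetical: neither the category labels nor the counts come from the study, and the real classification uses 22 categories rather than three.

```python
from collections import defaultdict

# Hypothetical access records: (access_type, user_category).
# Labels and counts are invented for illustration, not study data.
ACCESSES = [
    ("diamond_oa", "academic"),
    ("diamond_oa", "company"),
    ("diamond_oa", "ngo"),
    ("diamond_oa", "academic"),
    ("preprint", "academic"),
    ("preprint", "academic"),
    ("preprint", "academic"),
    ("preprint", "company"),
]


def non_academic_share(accesses, academic_label="academic"):
    """Return, for each access type, the fraction of accesses by non-academic users."""
    totals = defaultdict(int)
    non_academic = defaultdict(int)
    for access_type, category in accesses:
        totals[access_type] += 1
        if category != academic_label:
            non_academic[access_type] += 1
    return {kind: non_academic[kind] / totals[kind] for kind in totals}


shares = non_academic_share(ACCESSES)
for kind, share in shares.items():
    print(f"{kind}: {share:.0%} non-academic accesses")
```

A difference in these shares is only descriptive. On its own it does not show that the degree of openness causes the difference in audience.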

How will this study contribute to the main aims of the project?  

This case study seeks to understand the role of OS platforms as central nodes in the impact pathways of Open Science. Rather than focusing solely on citations—which primarily reflect the academic diffusion and impact of OS—we expand our analysis to include 1) referrers pointing to OS artifacts and, more importantly, 2) connection logs. This approach aims to develop replicable and scalable tools for assessing the societal impact of OS while laying the foundation for further qualitative analyses of how OS artifacts are used and reused by various actors in society.

Our methodology is closely aligned with the PathOS framework. It is structured around data sprints, fostering collaboration with platform technical teams and experts from the French Ministry of Higher Education. Following PathOS' iterative process, we integrate the causality model, insights from meta-analyses on OS impact assessment and recommendations from the Open Science handbook. At the same time, in a four-step iterative manner, we address technical challenges, explore the possibilities and constraints of log analysis, and incorporate the practical knowledge of platform technical teams alongside the policy needs reported by the ministry collaborators.

Who are the main actors involved and why are they important within the R&I ecosystem represented in this study?  

Based on a generalist conception of the R&I ecosystem, a term that remains widely discussed[1], it is possible to distinguish four main groups of stakeholders that are either indirectly challenged by or directly involved in OS platforms: (1) competing science dissemination infrastructures, (2) academic science producers and consumers, (3) non-academic science producers and consumers, and (4) Open Science professionals who manage the platforms. Among those four categories, the most significant stakeholders are large commercial publishers; small commercial publishers; not-for-profit publishers; university libraries; universities, research communities and individual researchers at research performing organizations (RPOs); companies; patient groups; citizen science, non-governmental organizations (NGOs), activists and citizens; and the technical, administrative and support departments of platforms at RPOs. It is important to note, however, that the extent and nature of these stakeholders' involvement and transformations in the R&I ecosystem remain under investigation. This inquiry includes both quantitative and econometric methods, like those employed by PathOS, and detailed ethnographic approaches to research ecosystems.

What kind of impact is expected to be generated by the results/outcomes of the study? 

By leading this case study we are already strengthening collaboration between Open Science platforms and the technical teams that run them. Connection logs are the common denominator of thousands of Open Science platforms, and working on a standardized log analysis method could also help achieve broader strategic goals.

To make future upscaling easy, we are coding a Node.js middleware for the ezPAARSE toolkit, a set of COUNTER 5-compatible digital tools for electronic resources dedicated to the detection, collection, enrichment and dynamic visualization of usage data, which is already being used by hundreds of platforms. Our development script is available here and will be published on ezPAARSE's own GitHub repository once it has been approved for production.

Because it relies on widely used standards and Open Source software, our method could be easily replicated and applied to larger sets of connection logs, from both for-pay and Open Access platforms. This would allow for greater comparability and scale effects due to the volume of data, and for an even stronger counterfactual assessment method if for-pay and Open Access data can be used as comparison groups, one serving as the test group and the other as the control group.

The potential impact stems from our tools enabling everyone to easily visualize and intuitively grasp how strongly different degrees of openness influence the kinds of actors who access science. Such evidence could support the case for lasting and significant funding and support of OS policies at the highest political level, at a time when science is expected to bridge its gap with society, address societal challenges and support social innovation.


[1] See for instance:

  • Altman, Micah and Philip N. Cohen (2022). “The Scholarly Knowledge Ecosystem: Challenges and Opportunities for the Field of Information,” Frontiers in Research Metrics and Analytics, vol. 6. https://doi.org/10.3389/frma.2021.751553
  • Kuehn, Evan F. (2022). “The information ecosystem concept in information literacy: A theoretical approach and definition,” JASIST, pp. 1–10. https://doi.org/10.1002/asi.24733
  • Lyle, Peter, Henrik Korsgaard and Susanne Bødker (2020). “What’s in an Ecology? A Review of Artifact, Communicative, Device and Information Ecologies,” NordiCHI ’20, October 25–29, 2020, Tallinn, Estonia. https://doi.org/10.1145/3419249.3420185
  • Mounier, Pierre and Simon Dumas Primbault (2023). “Sustaining Knowledge and Governing its Infrastructure in the Digital Age: An Integrated View,” Zenodo. https://doi.org/10.5281/zenodo.10036402
  • Star, Susan Leigh (ed.) (1995). Ecologies of Knowledge: Work and Politics in Science and Technology. New York, State University of New York Press.

 



New Preprint - Introduction to causality in science studies

Sound causal inference is crucial for advancing the study of science. Incorrectly interpreting predictive effects as causal might be ineffective or even detrimental. Many publications in science studies lack appropriate methods to substantiate their causal claims. In this preprint we provide an introduction to structural causal models. Such models allow researchers to make their causal assumptions transparent and provide a foundation for causal inference. We illustrate how to use structural causal models based on simulated data of a hypothetical structural causal model of Open Science. We hope our introduction helps researchers in science studies to consider causality explicitly.


The PathOS context

Concerns of causality are centre stage for the PathOS project. Without a proper understanding of causality, it is impossible to provide proper policy recommendations. For example, imagine we observe that published research using open data is less reproducible. Even if open data does in fact have a positive effect on reproducibility, this negative association might appear if journals select research based on open data and rigour. That is, journals may be more likely to publish research if it has open data, but also if it is more rigorous. If published research has no open data, it therefore tends to be more rigorous, otherwise it would not be published at all. Research that is more rigorous tends to be more reproducible, and this effect might be stronger than the effect of open data. For this reason, the association between open data and reproducibility might be negative, even if the actual causal effect is positive. If we incorrectly interpret the negative association as causal, and then recommend not to incentivise open data, we would be providing ill advice.
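The selection effect described in this example can be reproduced with a small simulation. The sketch below is not the model from the preprint: the effect sizes, the publication rule and all names are assumptions chosen only to illustrate the mechanism. Open data is given a positive causal effect on reproducibility, and journals are assumed to select on both open data and rigour.

```python
import random
from statistics import mean


def simulate(n=20_000, seed=42):
    """Simulate papers as (open_data, reproducibility, published) tuples."""
    rng = random.Random(seed)
    papers = []
    for _ in range(n):
        open_data = rng.random() < 0.5
        rigour = rng.gauss(0.0, 1.0)
        # Assumed true model: open data has a POSITIVE causal effect (+0.5),
        # while rigour has a stronger effect (+2.0 per unit).
        reproducibility = 0.5 * open_data + 2.0 * rigour + rng.gauss(0.0, 1.0)
        # Assumed selection rule: journals favour open data AND rigour.
        published = 1.5 * open_data + rigour > 1.0
        papers.append((open_data, reproducibility, published))
    return papers


def open_minus_closed(papers):
    """Mean reproducibility with open data minus mean reproducibility without."""
    with_open = [repro for is_open, repro, _ in papers if is_open]
    without_open = [repro for is_open, repro, _ in papers if not is_open]
    return mean(with_open) - mean(without_open)


papers = simulate()
diff_all = open_minus_closed(papers)
diff_published = open_minus_closed([p for p in papers if p[2]])

print(f"all research:       {diff_all:+.2f}")
print(f"published research: {diff_published:+.2f}")
```

Under these assumptions the difference is positive across all research but turns negative once only published research is considered, because publication without open data requires much higher rigour. Conditioning on publication is what creates the misleading association.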


Read the preprint here

What's next?

Having a common understanding of causality and structural causal modelling helps the PathOS project interpret the existing literature and identify impact pathways. This requires us to differentiate the impact of open science from the effect of openness on that impact. That is: how does the fact that something is open (be it a publication, data, code or review) have a causal effect on its impact? The introduction to causality provides such a common understanding. This will be especially important as PathOS builds upon the knowledge gained through our evidence scoping and intervention logic definition to further map and validate Open Science impact pathways and their verification methods (work on which is well underway, so watch this space!).


Unveiling Open Science Impacts with Cost-Benefit Analysis

Written by Jessica Catalano, CSIL

Understanding the impacts of Open Science (OS) and the extent to which they materialise requires a solid methodological framework, which is not yet fully established. Within PathOS we are currently working on developing a model that can identify these impacts and the different paths through which they occur in academia, economy and society.

As part of this model, project partner CSIL is developing a Cost Benefit Analysis (CBA) framework tailored for Open Science. This framework is designed to methodically and thoroughly quantify the impacts of Open Science. It does so by considering not only the benefits but also the costs, and crucially, it involves a comparison with a hypothetical scenario where Open Science is not implemented. While the overall model will describe the entire causal pathways associated with OS practices, the CBA framework will specifically focus on those impacts directly attributable to the OS practice under assessment and will allow the quantification, in monetary terms, of the net effect of a changed scenario.
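The core arithmetic of such a comparison can be sketched as follows. This is a generic illustration of a cost-benefit comparison against a counterfactual, not the PathOS CBA framework itself: the use of a discount rate, the function names and all figures are assumptions made for the example.

```python
def present_value(flows, discount_rate):
    """Discount yearly flows (year 0 first) back to year 0 and sum them."""
    return sum(flow / (1 + discount_rate) ** year for year, flow in enumerate(flows))


def net_effect(os_benefits, os_costs, cf_benefits, cf_costs, discount_rate):
    """Net present value of the OS scenario minus that of the counterfactual."""
    npv_os = present_value(os_benefits, discount_rate) - present_value(
        os_costs, discount_rate
    )
    npv_cf = present_value(cf_benefits, discount_rate) - present_value(
        cf_costs, discount_rate
    )
    return npv_os - npv_cf


# Hypothetical yearly figures in thousands of euros (invented, not project data).
OS_BENEFITS = [0, 80, 80]  # e.g. reduced access costs, time savings
OS_COSTS = [60, 10, 10]  # e.g. set-up and running costs of the OS practice
CF_BENEFITS = [0, 20, 20]  # what would have happened without the OS practice
CF_COSTS = [0, 5, 5]

undiscounted = net_effect(OS_BENEFITS, OS_COSTS, CF_BENEFITS, CF_COSTS, 0.0)
discounted = net_effect(OS_BENEFITS, OS_COSTS, CF_BENEFITS, CF_COSTS, 0.03)
print(f"net effect without discounting: {undiscounted:.1f}")
print(f"net effect at an assumed 3% rate: {discounted:.1f}")
```

The point of subtracting the counterfactual is that only the difference between the two scenarios is attributed to the OS practice, rather than the full benefits observed after its introduction.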

Case studies Scoping Exercise 

The PathOS CBA framework for OS aspires to become a benchmark tool for assessing the costs and benefits of different OS practices. In November 2022, a wider discussion with PathOS case study leaders was initiated to select specific Open Science practices for testing the CBA framework. Different criteria were considered for the selection, including the clear identification of the Open Science practice and its scope, the willingness of responsible institutions to be involved in data collection and interviews, and the availability of data needed for the analysis of costs and benefits from a CBA perspective. As a result, OpenEdition, CNRS’s electronic publishing infrastructure for scientific communication in the humanities and social sciences, and UniProt, an ELIXIR resource that serves as a central hub for collecting information on proteins, were selected for testing the framework. Additional practices are being considered for this testing phase, including the Repository infrastructure for Open Access (RCAAP) managed by the University of Minho.

What's next?

Performing a Cost Benefit Analysis (CBA) for the selected Open Science (OS) practices raises a number of challenges. These include defining a scenario in which OS is absent (the counterfactual) and collecting data on both the costs and the benefits linked to OS. To address them, we plan to hold regular discussions with the leaders of each case study, focusing on the accessibility of data while addressing issues around sensitive information, such as cost details. We will also conduct in-depth interviews with key stakeholders and comprehensive surveys designed to identify the tangible benefits of OS, such as reductions in access costs and time savings. We aim to complete this data gathering by mid-2024, paving the way for a validation phase involving a series of focus groups with stakeholders. This phase is key to ensuring the reliability and accuracy of our findings.

We expect to present the final results at the start of 2025.

Reflecting on the 1st PathOS General Assembly

In early November, the PathOS project held its inaugural General Assembly and International Advisory Board meetings followed by an intensive technical session in Athens, Greece, in a hybrid format. Hosted by project coordinator Athena RC, this was a key moment to reflect on progress and strategize for challenges ahead.

The interaction with the PathOS International Advisory Board was invaluable, offering diverse perspectives that significantly shaped our strategic direction, particularly in enhancing the sustainability and impact of our outcomes.

This year's significant achievements include completing an extensive scoping review, with three ensuing preprints due in early 2024 set to enrich our knowledge base. We also plan to make a Zotero library publicly accessible.

The development of the Open Science Intervention Logic marks a key preparatory step for the anticipated Open Science Impact Pathways model in Spring 2024. This model, crucial to PathOS and validated through our case studies, is designed to cohesively link various aspects of the project.

This year also saw the release of the Open Science Indicator Handbook, a comprehensive guide to Open Science and Reproducibility indicators, which is open for community feedback via GitHub. As the project evolves, we will continue enhancing this handbook, refining and enriching it with Open Science impact indicators.

Developing a Cost-Benefit Analysis framework for Open Science is also expected to be a key project contribution. This year, we laid the framework's foundation and selected UniProt (an ELIXIR resource) and OpenEdition as initial case studies.

On the event's second day, the spotlight was on the PathOS case studies. Discussions centered on causal inference, impact pathways logic, and identifying key impact indicators. This session, one of many in our series of project-wide technical meetings, played a crucial role in addressing the project's technical dimensions.

As we continue on this journey, we invite you to stay connected. More updates, insights, and breakthroughs from PathOS are on the horizon.

Check out the PathOS Zenodo community for all published reports and presentations.

Follow us on X and LinkedIn to stay up-to-date with all our activities and developments.