
From Open Data to Industry Impact: A Journey of the BY-COVID Project 

Within PathOS we are collecting stories on how Open Science (Open Access to publications, Open/FAIR data and software, collaborations with citizens) has made a positive or negative impact. Our ultimate aim is to highlight stories of Open Science practices and how these are linked to impactful outcomes. In this way, we hope to foster a learning experience and to inspire others to follow. Join us and read the first Open Science stories!

Could you briefly introduce yourself and what your Open Science story is about, including its time (e.g. year range) and location? 

I am Despoina Sousoni, the Programme Manager for Impact, Innovation, and Industry at the ELIXIR Hub, the European Research Infrastructure for Bioinformatics. ELIXIR is a distributed digital infrastructure that unites bioinformaticians from 21 countries to manage data, compute, tool, and training resources across various life sciences domains, operating under Open Science principles. Most of these resources are completely open and free, often not even requiring registration. This openness, while beneficial, makes it difficult to assess the impact of these resources. Over the years, we have developed several methods to demonstrate the impact of our resources to funders, so that they remain well sustained through community efforts and stay open and free. As part of this journey, during the BY-COVID project (funded by EOSC, 2021 to 2024), we demonstrated the value of open infectious disease data for COVID-19-related industry innovation. 

"Quantifying the value of Open Data in innovation is difficult, as academic research often takes many years before it reaches its full potential or becomes an invention and is produced at an industrial scale"

What was the context or background in which this Open Science practice was used? What were the goals or expected outcomes? 

The BY-COVID project was launched in autumn 2021 as part of the European Commission’s HERA incubator plan, 'Anticipating together the threat of COVID-19 variants.' The aim was to consolidate solutions, often rapidly assembled during the COVID-19 pandemic, to support the ongoing response to COVID-19 and prepare for future infectious disease outbreaks. The project aimed to make COVID-19 data easily accessible not only to scientists in laboratories but also to medical staff in hospitals, government officials, and anyone else who could benefit from it. 

One of the key tasks was industry engagement, which aimed to explore how industry uses COVID-19 data and related resources, and what value it derives from them. This task also aimed to demonstrate the importance of Open Data and research infrastructures during and beyond the COVID-19 pandemic for developing vaccines, medicines, and other industrial products. 

The outcome of this task was a deliverable report entitled "Industry value of Infectious disease data." This report included a desktop research analysis of patents and publications that mentioned the COVID-19 Data Portal or at least one of its integrated biodata resources. The analysis focused on identifying industrial affiliations and on analysing these companies and their inventions. The report also included statements from industry representatives, gathered in interviews, describing how open biodata resources are integrated into their R&D work and how they supported operational flexibility during the pandemic. 

What was your role or relationship to this Open Science practice? Who were the key actors involved? 

I had a leading role in the industry-related task of the BY-COVID project, while the ELIXIR Hub coordinated the overall project. For the industry-related task, I collaborated closely with colleagues from Uppsala University, particularly during the initial stages of industry engagement, when we approached companies to collect use cases. After several unsuccessful attempts, I decided to shift to desktop research. Here I drew on the expertise I had built over the years through my previous Open Science work at the European Commission and UNESCO. I also drew on more recent knowledge from the PathOS project and from ELIXIR's bioinformatics case study "Innovation from Open Research Resources", developed through the PathOS Handbook of Indicators and the Cost-Benefit Analysis conducted by CSIL. Through this shift, we were able to build a compelling narrative for the BY-COVID partners that demonstrated the impact of Open Data on innovation during a health crisis. 

How was this Open Science practice implemented, to your knowledge?  

The transition to desktop research in this task was crucial because of the difficulties in engaging with industry representatives. Based on my experience working in Open Science over the last five years, I can confidently say that the hardest Open Science topics are impact assessment and open innovation. Therefore, when I joined ELIXIR at the end of the COVID-19 pandemic (2022) and became involved in the BY-COVID project, I soon saw the opportunity to build a complete story about the applications of Open Data in innovation during a crisis such as the COVID-19 pandemic. And this is what we did in this report. 

Were there any quantifiable outcomes or measurable successes linked to this practice? What metrics or indicators were used to evaluate these outcomes, if any? 

This study includes both quantitative and qualitative information to demonstrate the impact of COVID-19 open biodata resources on the operations of the private sector and COVID-19 related innovation. 

The quantitative information focuses on patent and publication analysis, along with further analysis of the affiliated companies. Over 1,000 patent mentions reference at least one of the COVID-19 Data Portal resources (5% of the total found in this search). Note that this number counts only mentions of the resources' names, not uses of the data contained in them. Of these patents, 30% are affiliated with for-profit companies, most of which are SMEs in the pharma and biotech sectors. Additionally, we examined how often these patents were cited, to identify the most impactful inventions, and analysed how many patents each identified company had in this search (50% of the companies found had more than one). These findings highlight the importance of open resources to the industry sector and how successfully they have been integrated into it. 

The COVID-19 Data Portal was mentioned in scientific articles, with 25 for-profit companies cited. The most cited article referred to the portal as a "great example of international collaboration for building infrastructure for a global approach." 

The qualitative information in this study comes from interviews conducted during and beyond this project, because innovation is not always documented in scientific publications or patent filings; whether it is depends on each company's mandate and operating procedures. In interviews conducted by K.B. Lauer as part of her thesis (2022), interviewees agreed that Open Data resources, free and without restrictions, are crucial for enabling scientific discovery and for benefiting society through job creation, tax contributions, and lifesaving medicines. Further interviews later in the project aimed to understand how companies, and research infrastructures that work with industry, operated during and after the COVID-19 pandemic. All interviewees agreed that standardised data collection and sharing procedures are crucial for a rapid pandemic response, along with efforts to break down silos and build collaborative approaches to research. 

What impacts, both expected and unexpected, did this practice have? Were there any surprising developments or results? 

The immediate impact of this study has been to demonstrate to funders the socio-economic benefits of the COVID-19 Data Portal and its open resources during the pandemic. The same reasoning could potentially be extended to other open research infrastructures that play a crucial role in boosting innovation in academia and industry, creating a social mandate to sustain them as open and free-of-charge resources. 

In addition to the socio-economic impact, we also observed an impact on future pandemic preparedness. Some interviewees mentioned the need for standards in data collection and sharing, along with flexible guidelines for emergency procedures. These areas are now a focus of upcoming EC-funded projects. 

What challenges were associated with this practice, from your perspective? What lessons can be drawn from its implementation? 

Despite the successful outcomes of the BY-COVID project, including the COVID-19 Data Portal, an infectious disease toolkit, and more (see success stories), and their continuing impact on pandemic preparedness (Pathogen Portal, EVORA project), the industry engagement task in the BY-COVID project was neither easy nor straightforward. Our initial efforts focused on surveys and engagement through events, but we ran into the following issues: 

  • Industry representatives often do not see the return on investing their time in sharing experiences, or they may not have a full story to share. 
  • After early 2022, COVID-19 research was no longer a high priority for companies. 
  • There is no established methodology for assessing the impact of open digital data resources on innovation. 

Nevertheless, we managed to collect some stories from industry representatives about the use and benefits of Open Data in their COVID-19-related work, and combining quantitative and qualitative evidence proved the best way to demonstrate how deeply open biodata resources are integrated into companies' R&D. 

A useful insight I have taken from this work is that not all companies are willing to mention their use of open resources in their publicly available method descriptions, as this might invite replication or questions about their procedures. Understanding this was important for recognising the limitations of the information we gathered through desktop research and through engagement with companies throughout the project. My lessons learned from this work are therefore to communicate clearly that the collected numbers are underestimates, given the limitations of the methodologies used, and to give positive recognition to companies that mention their use of open resources in their methodology. 

How do you perceive this practice's influence on the wider scientific community or society? Has it affected your own views or approaches to research? 

When a digital resource is completely open, it typically does not require user identification. However, this makes it challenging to track who is accessing the data and how it is being used. Additionally, quantifying the value of Open Data in innovation is difficult, as academic research often takes many years to reach its full potential, become an invention, and be produced at industrial scale. These aspects of Open Science make it challenging to assess the return on public investment in research infrastructures, and they create a continuous race to find impact stories and supplementary data demonstrating their use in products and services. This study is an attempt to show how we could start building good practices and stories that tackle these challenges. 

In addition, this work demonstrates the importance of collaborative efforts across domains to build a common infrastructure that benefits scientists in academia and industry, as well as medical staff in hospitals, government officials, and citizens. The impact of such efforts is very hard to measure, but they are essential for the success of Open Science. 

Based on your experience or observation, would you recommend this Open Science practice to others? Why or why not? 

Definitely. As this study highlights, it is essential to establish mechanisms to continuously monitor the use of Open Science resources in innovation. Implementing good practices is the only way to keep track of the value of Open Data in innovation and to address new challenges collectively. This study demonstrates the impact of open resources on company development and on the creation of products and services with high social value. 

Additionally, it is important to ensure positive visibility for companies that are willing to acknowledge their use of open resources. A better understanding of industry's contributions to research and society can further encourage the adoption of Open Science practices in the private sector. 
