The REF 2021 Impact Data Analysis was a small project between King's Digital Lab (KDL) and colleagues in the King's College London (KCL) Research Management & Innovation Directorate (RMID) to support the analysis of the college's REF 2021 impact case studies and environment statements.
The data used during development comprised 153 impact case studies and environment statements in PDF format (5-10 pages of text each). The documents follow standard templates, but the descriptions and language within them vary widely.
The project applied several machine learning and natural language processing techniques to extract structured insights from the unstructured text: zero-shot topic classification using a BART-based transformer model to categorise documents against REF impact areas, fields of research, and pathways to impact; named entity extraction using spaCy to identify organisations, places, and products as partners and beneficiaries; abstractive text summarisation; and semantic search over document embeddings. Extracted place entities were geocoded using OpenStreetMap to enable geographic analysis.
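The semantic search step can be illustrated with a minimal sketch. It assumes each document has already been converted to an embedding vector by some model, and that relevance is ranked by cosine similarity. Neither the embedding model nor the similarity measure is stated above, so both are assumptions, and the function names, document identifiers, and three-dimensional vectors below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, document_vectors, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: real ones would come from a language model
# and have hundreds of dimensions.
document_vectors = {
    "case_study_a": [0.9, 0.1, 0.0],
    "case_study_b": [0.1, 0.8, 0.3],
    "environment_statement_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's search query.
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, document_vectors, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors the query points in almost the same direction as `case_study_a`, so that document ranks first. A brute-force scan like this is usually adequate for a collection of a few hundred documents.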
The results were presented through an interactive dashboard built with Streamlit and Plotly, enabling research impact leads to explore the data and address questions such as:
- What are the main types of impact KCL has delivered? Which pathways have been used to deliver those impacts?
- Who are the key partners and beneficiaries of our impacts?
- Where are they - local (London), national or global?
- Is there a correlation between discipline and types of impact or pathways to impact used?
- What are the areas identified as strengths, areas for development and future plans?
The work was presented at RSECon 2022.
Team
- Arianna Ciula, KDL Research Software Analyst, King's Digital Lab
- Geoffroy Noël, KDL Research Software Engineer, King's Digital Lab
- Miguel Vieira, KDL Research Software Engineer, King's Digital Lab
- Tiffany Ong, KDL Research Software Designer