Applications of Data Science in Evaluation

15 November 2023

Evaluation offers many opportunities, as well as some risks, in using new data and methods to measure programme impact and build robust evaluative judgments.

A decade ago, the United Nations declared that a data revolution was underway, enabling governments to make evidence-based policy decisions (UN, 2013). More recently, the World Bank echoed this sentiment, stating that new data sources and analytical methods are improving our ability to conduct robust evaluations (World Bank, 2020). At Verian, we fully support the use of big data and innovative methods, including the development of our own AI tools, to enhance evaluation work within the public sphere, in the UK and beyond.

Last week, Verian led a session on this topic at the Research Methods e-Festival organised by the National Centre for Research Methods.

Big Data's three Vs

Applying data science techniques to both new and existing large data sources can increase the efficiency, quality, and breadth of evaluative methods and analysis. However, managing and analysing large datasets can be challenging because of three key characteristics, known as the three Vs of Big Data:

a large Volume of data in many different environments;
a wide Variety of data types stored in data systems; and
data generated, collected and processed at speed (Velocity).

By applying data science methods to this type of data, we can address new research and evaluation questions or answer existing questions in new ways.

This includes using text analytics and natural language processing to classify information and support evaluative synthesis, as well as identifying patterns, trends, and predictions in the data that may not be immediately apparent.
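To make the idea of text classification concrete, the following is a minimal sketch of a lexicon-based tone classifier. It is purely illustrative and is not a description of the methods used at Verian: the word lists, the function name classify_tone and the example sentences are all assumptions made for this example, and real NLP work would normally rely on trained language models rather than a hand-written word list.

```python
import re

# Hypothetical, hand-picked word lists used only for illustration.
POSITIVE_WORDS = {"good", "safe", "improved", "helpful"}
NEGATIVE_WORDS = {"bad", "unsafe", "worse", "dangerous"}


def classify_tone(text: str) -> str:
    """Label a piece of text as positive, negative or neutral.

    The score is the number of positive words minus the number of
    negative words; its sign determines the label.
    """
    # Lower-case the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical example comments.
comments = [
    "The new street lighting feels safe and helpful",
    "Crime is worse and the park is unsafe",
    "The council met on Tuesday",
]
labels = [classify_tone(comment) for comment in comments]
```

A word-count approach like this ignores negation, sarcasm and context, which is one reason why any automated classification should be checked against human judgment before it informs evaluation findings.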

Numerous data science techniques are already being applied to evaluation, with many more holding the potential for future use. Geospatial analytics, segmentation, and social and traditional media scraping are just a few examples of what we are currently doing and exploring at Verian:

Geospatial analytics and geospatial data involve large sets of spatial data from different sources, including satellite imagery, Census and survey microdata, cell phone data and CAD images of buildings and other structures that can provide geographic and architectural data. Combining these data sources can help build composite indexes that are mapped to specific locations, in urban or rural areas. For instance, in Nigeria, we helped develop a vulnerability index to guide the Global Community Engagement and Resilience Fund (GCERF) on where to target community development grants. The composite index used survey data, social media content, satellite imagery, and GIS data to map out vulnerability across the country. From an evaluation perspective, the same type of map could be used to determine whether, where, and how vulnerabilities change over time in areas and communities targeted by the grant.
Segmentation is the process of taking a dataset and breaking it down into smaller pieces, or “segments”, where all the cases within a segment are similar to one another. Data science techniques such as machine learning can help identify potential subgroups of interest that would be difficult to identify by relying on theory or manual data analysis. As part of our Safer Streets Fund (SSF) evaluation for the Home Office in the UK, we used segmentation to allocate SSF locations and related interventions aimed at reducing crime to different groups. We then identified areas not covered by SSF that could serve as an acceptable comparison for all the locations in each group. This enhanced both the robustness and efficiency of our evaluation approach.
Social and traditional media scraping entails extracting data from the web (e.g. social media platforms and news websites), cleaning it, analysing it, and then often using a visualisation tool to present the emerging insights. Natural Language Processing (NLP) techniques can be employed to understand the tone (e.g. positive, negative, or neutral) behind the text and language of the extracted data. We are exploring the possibility of using these new data and methods alongside our extensive survey data collection (e.g. Address-Based Online Surveying – ABOS) and panel research capabilities (e.g. Public Voice) to gather the views and opinions of the general population, or of specific audiences, concerning the themes, policies, and programmes that we are evaluating. While remaining mindful of privacy and ethical considerations, including issues around consent, anonymity, and bias, we believe that data from the web has the potential to become one of the sources of evaluation evidence.
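The composite index described in the geospatial example above can be illustrated with a short sketch. This is one common way of building such an index (rescaling each indicator to a 0 to 1 range and taking a weighted average), not necessarily the method used for the GCERF vulnerability index. The indicator names, weights and values below are hypothetical and chosen only to show the mechanics.

```python
from typing import Dict, List


def min_max_normalise(values: List[float]) -> List[float]:
    """Rescale values to the range [0, 1]; a constant column maps to 0.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def composite_index(
    indicators: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return one score per area: a weighted average of normalised indicators."""
    n_areas = len(next(iter(indicators.values())))
    total_weight = sum(weights[name] for name in indicators)
    scores = [0.0] * n_areas
    for name, values in indicators.items():
        if len(values) != n_areas:
            raise ValueError(f"Indicator '{name}' has the wrong number of areas")
        for i, value in enumerate(min_max_normalise(values)):
            scores[i] += weights[name] * value / total_weight
    return scores


# Hypothetical indicators for three areas (higher values = more vulnerable).
areas = ["Area A", "Area B", "Area C"]
indicators = {
    "survey_poverty_rate": [0.2, 0.5, 0.8],
    "night_light_deficit": [10.0, 30.0, 20.0],
}
weights = {"survey_poverty_rate": 0.6, "night_light_deficit": 0.4}

scores = composite_index(indicators, weights)
# Pair each area with its score and sort from most to least vulnerable.
ranking = sorted(zip(areas, scores), key=lambda pair: pair[1], reverse=True)
```

In practice, the choice of indicators and weights is itself an evaluative judgment, so it is worth testing how sensitive the resulting map is to different weighting schemes.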

The application of data science in evaluation is thus an opportunity to enhance decision-making and generate valuable insights for stakeholders at national and local levels.

The examples mentioned above represent only a fraction of the data science and Artificial Intelligence (AI) tools that can be integrated into the evaluation toolkit. However, it is also crucial to acknowledge and address the associated risks and ethical considerations, as the analysis and interpretation of evaluation data can have significant implications for individuals and communities. A growing body of literature focuses on ethical considerations in data science, emphasising the need for transparency, privacy protection, and informed consent when collecting and analysing new (and existing) data (Bormida, 2021).

Returning to the examples discussed above, geospatial analysis and data can be powerful evaluation tools, but they can also raise ethical concerns because they may inadvertently reveal sensitive information about individuals or communities, leading to privacy breaches. Segmentation also comes with ethical considerations, as the definition of distinct categories and groups may perpetuate stereotypes and reinforce biases. Finally, while social and traditional media scraping offers opportunities for understanding public sentiment and behaviour, safeguards must be in place to protect privacy and ensure responsible data handling (Zimmer, 2010).

In conclusion, to fully harness the potential of data science in evaluation while mitigating risks, it is essential to adopt ethical frameworks and guidelines.

At Verian, we always prioritise rigour, privacy, consent, and fairness throughout our data science and evaluation cycles. We do so because we want to responsibly realise the full potential of data science to contribute to more informed decision-making through robust evaluation evidence.

References

Bormida, M.D. (2021). The Big Data World: Benefits, Threats and Ethical Challenges. In Iphofen, R. and O'Mathúna, D. (Eds.), Ethical Issues in Covert, Security and Surveillance Research (Advances in Research Ethics and Integrity, Vol. 8). Emerald Publishing Limited, Bingley, pp. 71-91.

Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures and their consequences. SAGE Publications Ltd.

UN (2013). A New Global Partnership: Eradicate Poverty and Transform Economies through Sustainable Development. United Nations Publications, 300 E 42nd Street, New York, NY 10017.

World Bank (2020). Rewiring Evaluation Approaches at the Intersection of Data Science and Evaluation. Independent Evaluation Group (worldbankgroup.org).

Zimmer, M. (2010). But the data is already public: On the ethics of research in Facebook. Ethics and Information Technology, Vol. 12, Issue 4, pp. 313-325.