Research — HeMI : IDI
Demonstration Project 1:
Advancing knowledge and reasoning in Artificial Intelligence to predict and understand zoonotic transmission and spillover


Figure 1. Conceptual overview of demonstration project.

Zoonotic diseases are impacted by multiple interconnected systems involving animals, humans, and the environment. The viral evolution and basis of transmission are often examined in isolation, without considering interactions between host behavior and ecology. This project aims to use artificial intelligence to enhance predictive intelligence by learning patterns from data to advance system reasoning (what will happen next and why?).

This demonstration project will utilize existing knowledge about disease-specific systems and system relations to create a comprehensive framework of knowledge about Highly Pathegenic Avian Influenza (HPAI) across both human and non-human animals at a global scale. Additionally, this project expects to create and test an automated reasoning engine to fill in gaps in informational concerning HPAI.

HPAI Knowledge Graphs

The growing occurrence of zoonotic disease outbreaks underscores the interconnectedness of animal and human networks.

Researchers have developed numerous mechanistic, statistical, and machine learning models to capture the multifaceted and dynamic nature of zoonotic disease outbreaks systems from multiple disciplinary perspectives. However, bridging data and models across disciplines presents a significant challenge, in part due to the lack of interoperability between datasets.

Knowledge graphs offer a powerful solution for reconciliation, integration, and synthesis of heterogeneous data. We demonstrate the utility of a knowledge graph to integrate laboratory, field, and historical data about highly pathogenic avian influenza (HPAI) and generate new hypotheses upon which future disease mitigation will depend.

Traditional data systems are not interoperable across sources (Fig 2). To make related heterogeneous datasets interoperable, we first define an ontology, represented as a graph of element types and relations among types (Fig 3). Source datasets can then be ingested into a knowledge graph with standardized elements and relations.

Figure 2. Traditional data systems are not interoperable across sources. The figure shows sample records from three source datasets. Each row shows the same species (Passer domesticus). Because the record id’s are not standardized and the structure of the datasets are not the same, the datasets cannot easily be joined.

Figure 3. A zoonotic disease ontology aligns data across domains based on shared elements for integration in a knowledge graph. Ontologies are defined in terms of data types (nodes) and relations between data types (edges). The graph represents the zoonotic disease ontology we have developed for our knowledge graph. Blue nodes represents elements of that have already been ingested from source datasets; gray nodes represent elements from other datasets that have yet to be ingested.

Machine Learning Applied to Knowledge Graphs

The goal of our project is to develop a knowledge and reasoning engine for HPAI knowledge graphs. To do this, we use a class of machine learning models called Knowledge Graph Embeddings (KGE). Such models are able to learn the structure of a graph and predict links. These models are interpretable because they capture semantic information about entities and relations; this helps provide explanations for model predictions.

The technique has been proven for drug discovery and social network recommendations; we are applying KGEs to HPAI knowledge graphs for the first time.

Supplemental Information

Poster: Robertson, H., E. Graeden, A. Castellanos, D. Rosado, J.M. Drake, B. Han. “Knowledge Graphs for Scalable Data Integration: A Case Study of Highly Pathogenic Avian Influenza (HPAI).” MIDAS Network Annual Meeting. October 29-31, 2023. (pdf)