Natural history museums commonly house research specimens and digital data collections that are critical to advancing our understanding of biodiversity. While many data resources focus on preserving specimen information, some record species interactions (e.g., a parasite species infecting a specific host species). One of the largest host-parasite databases in existence focuses solely on helminth parasites (parasitic worms). Although it is a valuable resource, it lacks precise location data, has received no new interactions since 2003, and contains some incorrect taxonomic information for host and parasite species.
The researchers will clean, curate, augment, and georeference these data. This work will benefit researchers seeking to understand parasite host specificity, public health researchers assessing the potential spillover of helminths into humans or livestock, and biogeographers studying the distribution of biodiversity.
Helminth parasites are a remarkably diverse group in terms of life history, transmission mode, and host range. The London Natural History Museum host-helminth database currently details over 250,000 interactions between helminths and their hosts, georeferenced only to the level of geopolitical boundaries. However, this resource is no longer actively maintained. The researchers will take over the curation, maintenance, augmentation, and distribution of this important data resource. The overall goals of this research are to georeference occurrence points at finer resolution (ideally latitude and longitude with associated uncertainty), to augment the existing data (no new records have been added since 2003), and to provide a consistent taxonomic backbone for the data. This research will include training at least two graduate researchers and a team of undergraduate researchers in georeferencing and data curation best practices, codeFest activities, programming instruction for middle and high school students through local collaborations, and development of modules for upper-level undergraduate courses. Finally, the curated data will be made available through a new web portal, programmatically through a dedicated application programming interface (API), and through the R package ‘helminthR’.