Data, blood and malaria


Dr Abdisalan Mohammed Noor tackles the subject of malaria surrounded by computers, with no microscope in sight. A recent groundbreaking paper in the highly regarded scientific journal The Lancet shows how he and his colleagues conducted cutting-edge research built on painstaking curation of data from parasitological studies across Africa.

Unequal studies, equal data

In general, when epidemiologists want to track how a disease spreads and who dies from it, they rely on mortality data, backed by laboratory confirmation, to establish that link. Across much of Africa this is hard to do for malaria, for two reasons. Firstly, health information systems are poor and reporting is incomplete. Secondly, it is difficult to confirm which deaths are directly attributable to malaria.

Noor and his colleagues took a new approach, moving away from the restrictive task of establishing which deaths were caused by malaria. Instead, they used laboratory-confirmed data from numerous field surveys from all over Africa. This morbidity-based approach determines who has malaria parasites in their blood. That is not necessarily who is ill from malaria, but it shows, at the very least, the pressure a community faces from malaria transmission.

The team set out to compile a database of all such community-based malaria prevalence studies from published sources. This meant reading through every published source on malaria they could find and extracting whatever data points each paper offered. For each place where a survey was conducted, they recorded the year of the survey, the number of people examined in each age group, and the proportion infected. Each location was then carefully geo-coded, so that it had a precise longitude and latitude.
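A minimal sketch of one record in such a database, assuming a simple flat schema. The class and field names are hypothetical, chosen to mirror the fields described above, and are not the team's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyPoint:
    """One geo-coded community prevalence survey (hypothetical schema)."""
    site: str          # place name as given in the source
    year: int          # year the survey was conducted
    age_min: float     # lower bound of the surveyed age group (years)
    age_max: float     # upper bound of the surveyed age group (years)
    examined: int      # number of people tested
    positive: int      # number found with parasites in their blood
    longitude: float   # decimal degrees, east positive
    latitude: float    # decimal degrees, north positive

    @property
    def prevalence(self) -> float:
        """Proportion of those examined who were infected."""
        return self.positive / self.examined


# Illustrative values only, not a real survey
point = SurveyPoint("Example village", 2005, 2, 9, 120, 30, 36.82, -1.29)
print(f"{point.prevalence:.2f}")  # prints 0.25
```

Keeping the counts (examined and positive) rather than only the percentage matters: later steps need to know how much evidence each point carries.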

Many of the published papers left out key information. The researchers had to track down the scientists involved and try to obtain it. Sometimes this meant getting hold of the original data set and calculating the standardised metrics themselves. Not surprisingly, this was an arduous and thankless task. From this first wave of effort, the team had compiled about 8,000 data points by 2009.

The data set seemed large, but it was full of holes: a surfeit of studies in one geographic area, time period or age range meant that entire regions, even entire countries, were unrepresented in the database. Their task, and their ambition, grew. Many unpublished sources of data lay hidden away in national malaria programme offices, colonial archives, international organisations and universities. The team set out to tackle this ‘grey’ literature, tracking down every unpublished, geographically specific study they could find for communities across Africa. The next phase began.

They despatched teams to forgotten archives all over Africa, and further afield. The tools of the trade were laptops, handheld scanners, cameras, data sticks and any friendly photocopiers found en route. These unpublished sources were hidden away in some unlikely places. The teams visited the World Health Organisation’s archives in Brazzaville, in the Republic of Congo, as well as lusophone datasets secreted away in Lisbon. It took eight years to compile the database used for this groundbreaking research.

Eventually they reached just over 28,000 data points. All was still not well, as not all data points were equally useful. Scientific rigour demanded more. Studies had to be excluded if:

1. they could not be placed on a map and geographically identified
2. they covered too wide a geographic area, which meant the estimate wasn’t precise enough
3. they had fewer than 15 people, too few to give a reliable ‘reading’ for the community they were associated with.
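The three exclusion rules can be sketched as a simple filter. The 15-person minimum comes from the text; the maximum survey area (`MAX_AREA_KM2`) and the `Record` fields are hypothetical stand-ins, since the article does not say how wide counted as "too wide".

```python
from dataclasses import dataclass
from typing import Optional

MIN_EXAMINED = 15      # rule 3, from the text
MAX_AREA_KM2 = 25.0    # rule 2: HYPOTHETICAL threshold, for illustration only


@dataclass
class Record:
    examined: int                 # number of people tested
    longitude: Optional[float]    # None if the site could not be geo-coded
    latitude: Optional[float]
    area_km2: float               # approximate area the survey represents


def keep(r: Record) -> bool:
    """Return True if a record passes all three exclusion rules."""
    if r.longitude is None or r.latitude is None:
        return False  # 1. cannot be placed on a map
    if r.area_km2 > MAX_AREA_KM2:
        return False  # 2. covers too wide a geographic area
    if r.examined < MIN_EXAMINED:
        return False  # 3. fewer than 15 people examined
    return True


records = [
    Record(200, 34.75, 0.28, 1.0),     # passes every rule
    Record(200, None, None, 1.0),      # no location
    Record(200, 30.06, -1.95, 400.0),  # area too coarse
    Record(12, 36.82, -1.29, 1.0),     # too few people
]
kept = [r for r in records if keep(r)]
print(len(kept))  # prints 1
```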

Now down to just over 21,000 data points, the team faced a new goal: finalising the tools needed to model ‘malaria pressure’ at a 1x1km pixel level for the whole of Africa. To put the magnitude of the computational task another way, Dr Noor and his team set out to estimate malaria prevalence for every square kilometre of Africa in a given year.
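To give a sense of scale: Africa covers roughly 30 million km², so a 1x1km grid means on the order of 30 million pixels for each year modelled. The sketch below assumes a common convention for "1 km" global grids, cells of 30 arc-seconds (1/120 of a degree). These are about 0.93 km wide at the equator and narrower in longitude away from it. The grid layout and `cell_index` function are illustrative assumptions, not the team's documented grid.

```python
import math

CELLS_PER_DEGREE = 120  # ASSUMED 30 arc-second cells, roughly 0.93 km at the equator


def cell_index(lon: float, lat: float) -> tuple[int, int]:
    """Return the (column, row) of the grid cell containing a point.

    Columns count eastward from longitude -180, rows southward from latitude 90.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    return col, row


# Back-of-envelope scale: roughly 30 million km² of land, modelled for two years
africa_km2 = 30_000_000
estimates = africa_km2 * 2
print(f"{estimates:,} pixel-level estimates")  # prints 60,000,000 pixel-level estimates
```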

To Markov or to Laplace?

The next challenge was choosing appropriate computational methods. “The classic age for peak infection rates is 2-9 years,” says Dr Noor. This means that in any community, those most at risk of dying from malaria are children who have lost the protection they gained from their mothers, but have not yet been exposed to malaria often enough to acquire their own immunity.

Methodologically, they needed an “age-correction algorithm” so that whatever they modelled would generate estimates for this age group. Together with collaborators, they fine-tuned a method for doing this, drawing on additional information from a multitude of other clinical and field studies. For example, they incorporated knowledge about how malaria immunity is acquired, and repeatedly hypothesised, tested and verified until they had developed an appropriate algorithm for the task.
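The team's age-correction algorithm is not described in detail here, so what follows is a deliberately simplified illustration of the idea, not their method. Given an assumed shape for how infection varies with age, prevalence measured in one age range can be rescaled to the 2-9 year band. The age curve in the hypothetical `relative_risk` function is invented for illustration.

```python
import math


def relative_risk(age: float) -> float:
    """HYPOTHETICAL age profile of infection: rises as maternal protection
    wanes, then falls as immunity is acquired. Shape chosen for illustration."""
    return (1.0 - math.exp(-age / 1.5)) * math.exp(-age / 25.0)


def mean_risk(a_lo: float, a_hi: float, steps: int = 1000) -> float:
    """Average of relative_risk over [a_lo, a_hi] using the midpoint rule."""
    width = (a_hi - a_lo) / steps
    return sum(relative_risk(a_lo + (i + 0.5) * width) for i in range(steps)) / steps


def standardise(prevalence: float, a_lo: float, a_hi: float,
                target: tuple[float, float] = (2.0, 10.0)) -> float:
    """Rescale prevalence observed in ages [a_lo, a_hi] to the target band.

    The default target (2.0, 10.0) covers ages 2 through 9 in completed years.
    """
    ratio = mean_risk(*target) / mean_risk(a_lo, a_hi)
    return min(1.0, prevalence * ratio)  # a proportion cannot exceed 1


# A survey of all ages 0-60 that found 20% infected
print(round(standardise(0.20, 0.0, 60.0), 3))
```

Under this assumed curve, an all-age survey understates prevalence in the 2-9 band, so the estimate is scaled up. A survey that already covered ages 2-9 is left unchanged.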

They used iterative Bayesian models, in which each round of estimation builds on the output of the previous one, and each data point required thousands of iterations. “We used a Markov chain initially,” says Dr Noor, “but this is intensive computationally.”
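To illustrate what a Markov chain means here, below is a toy random-walk Metropolis sampler for a single survey's prevalence, assuming a binomial likelihood and a flat prior. The team's actual models were spatial and far larger. This toy only shows why such methods are computationally intensive: each step depends on the previous one, and tens of thousands of steps are spent on one simple estimate.

```python
import math
import random


def log_posterior(p: float, positive: int, examined: int) -> float:
    """Log of binomial likelihood times a flat Beta(1, 1) prior, up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return positive * math.log(p) + (examined - positive) * math.log(1.0 - p)


def metropolis(positive: int, examined: int, n_iter: int = 20_000,
               step: float = 0.05, seed: int = 1) -> list[float]:
    """Random-walk Metropolis sampler for one site's prevalence."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, positive, examined)
    samples = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal, positive, examined)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)  # each state builds on the previous one
    return samples


draws = metropolis(positive=30, examined=120)
burned = draws[2_000:]  # discard the burn-in period
# Should be close to the exact posterior mean, (30 + 1) / (120 + 2)
print(round(sum(burned) / len(burned), 3))
```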

In principle, cloud computing lets almost anyone, anywhere, use as much computing power as they need. In practice there are limits: the size of the files you need to upload, and your skill at dividing tasks up for parallel processing. Eventually the team realised their modelling was taking too long. They needed computational improvements.

The best alternative was a technique developed by researchers in Norway: integrated nested Laplace approximation (INLA). The team sent a modeller to learn the method, and then customised it for their purposes. They purchased additional cloud capacity and bought some high-end computing facilities to use back in Nairobi. Eventually, they arrived at a set of computationally efficient techniques for running their models.
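The Laplace idea behind INLA can be shown on the same toy problem. Instead of sampling, we find the peak (mode) of the posterior and fit a Gaussian to its curvature there. The sketch below is a single Laplace approximation on the logit scale, assuming a flat prior on prevalence. Full INLA, available as the R-INLA package, nests such approximations over latent spatial fields and hyperparameters and is not reproduced here.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def laplace_logit(positive: int, examined: int, iters: int = 50) -> tuple[float, float]:
    """Gaussian (Laplace) approximation to the posterior of theta = logit(p),
    with a flat prior on p. Returns (mode, standard deviation) of theta."""
    # With a flat prior on p, log posterior = a*theta - b*log(1 + e^theta)
    a, b = positive + 1, examined + 2
    theta = 0.0
    for _ in range(iters):  # Newton's method to locate the mode
        s = sigmoid(theta)
        grad = a - b * s
        hess = -b * s * (1.0 - s)
        theta -= grad / hess
    s = sigmoid(theta)
    sd = 1.0 / math.sqrt(b * s * (1.0 - s))  # curvature at the mode sets the spread
    return theta, sd


mode, sd = laplace_logit(positive=30, examined=120)
low, high = sigmoid(mode - 1.96 * sd), sigmoid(mode + 1.96 * sd)
print(f"p ≈ {sigmoid(mode):.3f}, 95% interval {low:.3f} to {high:.3f}")
```

A few Newton steps replace the tens of thousands of sampling steps in the Markov chain sketch, which is the source of the speed-up, at the cost of assuming the posterior is close to Gaussian.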

Next came the seemingly simple matter of running the computations. Each country in sub-Saharan Africa had to be modelled twice, once for 2000 and again for 2010. Some small countries, such as Rwanda, had thousands of data points; some big countries, such as the Congo, had far fewer. Each model took 7-10 days per country to run, which added up to 6-8 months of continuous computational processing. By running parts of the modelling concurrently, the researchers could reduce this time, but they still had to allow for re-runs, for example when the “model mesh did not fit properly”.

Africa-specific challenges

Although their offices are now in a more conventionally high-tech glass building, the Malaria Research group did most of this work in rather modest surroundings. A long, low-ceilinged bungalow annexe, built in the austere style of the immediate post-war years, housed the computers they used to connect to the outside world. Boiling in the Nairobi dry seasons and rather chilly in the wet, it was where the team began testing the models it needed to run for the study.

Doing such research in Africa brought practical difficulties, the most significant being limited human resource capacity. These are advanced, cutting-edge techniques. Most published papers in this area come from institutions outside Africa and other emerging economies. Nothing on this scale had ever been attempted in Africa before.

The team has also broken through a limitation of previous malaria prevalence maps, which merged countries into regions for the sake of computational efficiency. The results of those models were imprecise. For example, if Southern Africa is combined into a single “computational tile”, the model treats Swaziland (with fewer than a hundred malaria cases per year, and therefore close to elimination) as equivalent to Mozambique (with endemic areas, and therefore much more malaria). Dr Noor’s study is the first to apply these methods country by country, taking into account malaria control measures introduced within each country’s borders.

Less guess-work with policy

This kind of research makes malaria policy less ad hoc. It enables those who are willing to listen to think about how they deliver malaria interventions, which ones are likely to work best, and how to maximise the impact of their budgets. Some health ministries have enthusiastically embraced the findings for their countries and begun to use them to change how they manage malaria.

The comparative maps show one important fact in a simple format: how much malaria prevalence changed in the decade between 2000 and 2010. They provide a visual summary of the biggest impacts across the continent after a decade of intense global effort and resources. But the maps do much more than that. They are powerful tools for the future, and can help countries and governments decide where to focus their efforts. Doing this effectively means paying attention to historical malaria pressure when deciding what resources to put in place.

For example, some countries in the Sahel are using “seasonal malaria chemoprevention”, a way of pre-emptively treating populations just before the peak malaria season in order to reduce infections. The maps can be used to estimate the total population that needs to be targeted, and therefore the budget required to do this well.
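Below is a minimal sketch of that budgeting calculation. It sums the target population over pixels whose modelled prevalence reaches an eligibility threshold. The threshold, the cost per person, the `Pixel` fields and the pixel values are all hypothetical, chosen only to make the arithmetic visible. Real programmes apply their own eligibility criteria.

```python
from dataclasses import dataclass

THRESHOLD = 0.10        # HYPOTHETICAL prevalence cut-off for eligibility
COST_PER_PERSON = 4.0   # HYPOTHETICAL cost per person treated per season


@dataclass
class Pixel:
    target_pop: int     # people in the group the programme would treat
    prevalence: float   # modelled infection prevalence, 0-1


def smc_budget(pixels: list[Pixel]) -> tuple[int, float]:
    """Total target population in eligible pixels, and the implied budget."""
    target = sum(p.target_pop for p in pixels if p.prevalence >= THRESHOLD)
    return target, target * COST_PER_PERSON


pixels = [Pixel(120, 0.35), Pixel(80, 0.05), Pixel(200, 0.12), Pixel(50, 0.10)]
people, budget = smc_budget(pixels)
print(people, budget)  # prints 370 1480.0
```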

A different kind of husbandry

What he and his colleagues have achieved is rather a big deal. Putting it precisely, Dr Noor says, “It is the first pixel level product of malaria infection prevalence using integrated Laplace approximation for inference at this scale.” In more everyday words: never before in the history of humankind has a group of data-hungry malaria scientists pursued a goal so doggedly. They built a detailed map of Africa in 5x5km chunks, backed by the right technology and scientific rigour, to produce estimates of malaria risk that help national malaria programmes spend their money wisely.

Dr Noor grew up in Wajir, a semi-arid area of Northern Kenya with low-intensity seasonal malaria. As a boy, he looked after camels during his school holidays. He says, “Actually they start you off with goats, and then you move to cattle.” He eventually proved himself skilled enough to graduate to herding camels. He also learnt to read the landscape to find his way home, saying “This is normal. You use trees and other landmarks to determine where you’ve come from and where you are going.”

He learnt about science in a classroom with gaps left in the walls for windows. He worked as an unpaid intern at ILRI, the International Livestock Research Institute, which was then the best place in Kenya to learn about geospatial modelling, telling himself that “Having no salary was no reason not to do it”.

It has taken him years to hone his skills in another kind of husbandry: mastering spatial epidemiology and applying computer-based techniques to geospatial problems. To extend the analogy well past its creaking point, Noor has used his knowledge of a new geographic landscape and succeeded in bringing the humps of data home.