It is easy to get excited about all the new information we now have about the world’s development projects. Maps and tables, charts and graphs flood our inboxes with ‘big data.’ Most recently, AidData published a huge dataset on Tracking Chinese Aid to Africa. All the hype has caused some backlash, and rightfully so. Big data is still data and requires the same careful handling as any other dataset. This is not meant to dull enthusiasm for data or discourage its use. It is a caution against the misuse and overgeneralization of big data: overgeneralizing from any dataset, large or small, can be dangerous. Here are four downsides of big data identified by practitioners and academics.
1) Big data is not a panacea. One size does not fit all. Development projects are dynamic, and many are specific to a particular time and place. While sweeping data collection projects can lead to better practices at high-level institutions, basing policy on improperly generalized data is both bad policy and a poor use of data.
2) Difficulties in filtering relevant information. Data from developing countries on health systems, political upheaval, natural disasters, etc. are most often reported by vulnerable people experiencing the event firsthand, and the source is often social media. Aside from possible problems with the validity of the data, the sheer amount of potential data is enormous. Keyword searches across selected media yield thousands of data points, which have to be carefully reviewed to filter for relevance. Computer programs are simply not nuanced enough to distinguish hate speech, for example, from slang (as shown in a recent study on mapping hate speech on Twitter). A parallel problem is the availability of reliable and secure statistical processing. Unlike data processing for pharmaceutical companies, aid data processing is not backed by billions of dollars in profit.
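To make the filtering problem concrete, here is a minimal Python sketch of a naive keyword search. It is an illustration only, not the method used in the Twitter study mentioned above: the watch list, the function name `keyword_hits`, and the sample posts are all invented for this example.

```python
import re

# Hypothetical watch list (an assumption for illustration);
# a real project would use a curated lexicon.
KEYWORDS = {"trash", "vermin"}


def keyword_hits(post):
    """Return the watch-list words that appear in the post, ignoring case."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return words & KEYWORDS


# Invented sample posts, not real data.
posts = [
    "Those people are vermin and should leave",  # hostile use
    "My new phone is trash, the battery died",   # harmless slang
    "Clinic reopened today after the flood",     # no keyword at all
]

# A keyword match only says the word is present, not how it is used.
flagged = [post for post in posts if keyword_hits(post)]
print(len(flagged), "of", len(posts), "posts flagged")
```

The hostile post and the harmless one are flagged alike, because the filter sees only the word and not the intent. Telling them apart still takes a human reviewer, which is exactly the bottleneck when a search returns thousands of matches.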
3) Data exhaustion on the ground. By the time social scientists are through cleaning, manipulating, and making sense of the data, the situation on the ground has often changed. This is called “data exhaustion.” The big data collectors (UN, World Bank, USAID, AidData) are constantly playing catch-up, which means that people on the ground cannot use the most up-to-date information. The use of social media has reduced the delay; however, extracting data and implementing policies based on it is a top-down approach that may not accord with the culture of the project or with what is practically feasible. For example, big data analyses might suggest that the best way to empower women is to get them into the workplace, giving them independent incomes. The on-the-ground reality might be that they are already responsible for unpaid work, such as childcare or maintaining subsistence crops, which takes up their whole day.
4) Validity of data is questionable. As the debates over the validity of AidData’s Tracking Chinese Aid to Africa indicate, socially sourced data cannot be the only data that influences policy. Self-reporting has inherent “barriers, blindspots and biases.” For example, the information collected during the Arab Spring was based on self-reported accounts of events. The outside world relied on texts, tweets, Facebook posts and blog posts to analyze the situation.
These four potential downsides of big data all suggest the need for caution in using data to inform development policy.
– Katherine Zobre