What Big Data means for smart cities

July 2, 2020

Big Data is often discussed in the context of international development and improving the urban environment - but what exactly is Big Data, how can it be used, and why is it a fundamental component of smart cities?

Big Data refers to data generated by digital technologies such as mobile phones, websites, satellites or sensors. This data requires data science - or artificial intelligence - tools or methods in order to capture, curate, manage, and analyse it in an efficient way. Big Data is often a by-product of our digital behaviour, founded on the digital 'crumbs' that we leave behind as we interact with systems or machines in our daily lives. 

In the context of the Sustainable Development Goals, Big Data is being applied to everything from analysing mobile phone data to predict individual poverty (Blumenstock, 2015) to understanding gender gaps in urban mobility (GovLab and others, 2019). However, Big Data has particular relevance in the context of smart cities - and driving improvements in the urban environment more broadly.

The data sources behind Big Data can be grouped into human-sourced data (such as social media, blogs, and crowdsourcing), process-mediated data (including financial transactions, electronic health records, and tax records), machine-generated data (for example, satellite imagery, and data from Internet-of-Things devices) and media-sourced data (such as radio broadcasts and digital news).

All of these data sources are particularly present in an urban environment - and leveraging them can play a crucial role in making cities more liveable places. For example, machine-generated data from Internet-of-Things devices - including in relation to mobility, transport usage, and living conditions - is helping many city officials, and their partners, to make better policies. Mobile network data is also being used to understand how people interact with a city. Many of these data sources can also be combined to produce information that is more detailed, timely and relevant.

However, Big Data is not a panacea. It has some real limitations, and can present real challenges. Big Data is not a complete data source: it has gaps, which highlights the need to combine data sources. It is also often inaccessible: owned or managed by many of the private organisations that drive large aspects of our digital behaviour. This also highlights a further challenge: bias is a constant risk. Big Data - by definition - does not include the digitally-excluded. It is skewed toward wealthier, younger, and more educated citizens - and the billions of people marginalised from digital society are likewise absent from it.
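One way to make this kind of skew visible is to compare each group's share of a dataset with its share of the wider population. The following is a minimal sketch in Python, not a definitive method: the function name `representation_ratios`, the age groups, and all of the figures are hypothetical and chosen purely for illustration.

```python
def representation_ratios(sample_counts, population_shares):
    """Return each group's share of the sample divided by its population share.

    A ratio above 1 means the group is over-represented in the data;
    a ratio below 1 means it is under-represented.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one record")
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }


# Hypothetical figures, for illustration only (not real data).
sample_counts = {"18-34": 600, "35-59": 300, "60+": 100}
population_shares = {"18-34": 0.30, "35-59": 0.40, "60+": 0.30}

ratios = representation_ratios(sample_counts, population_shares)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

In this made-up example the youngest group appears at twice its population share, while the oldest group appears at roughly a third of it. A check like this only flags a possible gap; deciding how to correct for it still requires other data sources and human judgement.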

Related to this, a number of ethical principles need to be considered when applying Big Data approaches. Has the data been collected, and is it being used, fairly and lawfully? Are suitable controls and accountability processes in place? Is the data of high quality, or is it biased? And are we using the minimum amount of data - or collecting huge volumes of data without considering the rights, privacy, and agency of individuals?

Notwithstanding these concerns and considerations, Big Data presents real opportunities for smart cities - and for international development more broadly. In exploring the role of Big Data in any project, it's crucial to think about the following five questions:

  1. Is Big Data better data? The composition of the data is more important than its size. Data that is gathered through platforms in which different groups are able to participate and feel included will be more complete - and solutions informed by this data will be more inclusive. There is an important risk of ‘elite capture’ in Big Data, with certain groups being excluded - whilst social inequalities can be amplified by unfair algorithms trained on biased data.
  2. Is it valid and accurate data? The value of Big Data lies in its ability to generate reliable and valid measurements. There is a mutual dependence between traditional data such as surveys or administrative data and Big Data. As passive data that can be repurposed, Big Data needs to be validated through other data sources - such as more traditional sources. This traditional data, if collected and processed well, can be used as a benchmark to assess the quality of Big Data.
  3. What type of knowledge does it entail? As with other types of data, Big Data needs to be processed and fitted to statistical, algorithmic, visual, or other models in order to be interpreted. Each of these steps requires making decisions about the data, and these decisions are not neutral. For example, distinguishing between what comprises the ‘signal’ and the ‘noise’ in a dataset, identifying how data is clustered, and deciding how outliers are handled are important decisions. These are also all informed by the worldview, beliefs, and biases of the individuals making these decisions. In order to tackle this risk, these decisions should not be left up to individuals. Data scientists should be working side-by-side with social scientists - and other experts - from different genders, age-groups, and ethnicities.
  4. Is there meaning without context? Big Data needs to be interpreted within the context in which it is generated - and from the perspective of the individuals represented in the data. However, social norms - and other principles - that inform who has access to the digital platform and how the platform is used are intertwined with cultural factors, perceptions of fairness, and data security. A good technical understanding of how data has been sourced can be particularly important - especially as patterns in the data may be reinforced by users’ behaviour or artefacts in the process.
  5. Does being accessible make it ethical? Just because data can be collected does not mean that it has to be collected - or that it should be used. Accessible and ethical are not synonymous, and using particular datasets or data sources can cause real and irreparable harm to individuals and societies. These harms can be compounded by data leakages, irresponsible sharing, and misreporting. Ethical data usage is about prioritising dignity, respect and privacy in the collection, analysis, and usage of data. This includes ensuring that individuals benefit from the use of their data.
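
Returning to the second question, benchmarking Big Data against traditional data can be as simple as comparing the two sets of measurements area by area. The following is a minimal sketch in Python, assuming a hypothetical indicator measured in four districts by both a household survey and a Big Data-derived estimate. The function names and every figure are illustrative assumptions rather than real results, and a real validation exercise would involve far more than these two summary statistics.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(estimates, benchmark):
    """Average absolute gap between the estimates and the benchmark values."""
    if len(estimates) != len(benchmark) or not benchmark:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(e - b) for e, b in zip(estimates, benchmark)) / len(benchmark)


# Hypothetical district-level values, for illustration only (not real data).
survey_benchmark = [10, 20, 30, 40]    # e.g. from a household survey
big_data_estimate = [12, 18, 33, 41]   # e.g. derived from mobile network data

r = pearson(big_data_estimate, survey_benchmark)
mae = mean_absolute_error(big_data_estimate, survey_benchmark)
print(f"correlation: {r:.3f}")
print(f"mean absolute error: {mae:.1f}")
```

A high correlation with a small average error would suggest that the Big Data estimate tracks the benchmark across districts. A weak correlation or a large error would be a signal to investigate gaps or bias before relying on it.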

Big Data demands a fundamental rethink of how we use data in international development. This includes the skills, tools, and processes involved and required. However, like any other technology or innovation, Big Data also needs to be applied in a strategic, thoughtful, and inclusive way. This includes avoiding creating, exacerbating, or entrenching any digital divide or other inequality.

Big Data is an important element of smart cities, and could be crucial in improving how cities function and work for the benefit of citizens and other stakeholders. Echoing our systems focus at the Global Centre, we can also use Big Data to see a city in its broadest sense: including informal settlements, and rural and urban linkages. When applied well, Big Data has the potential to drive the shaping of urban spaces that can work for all citizens.