Can Google search queries help countries map their diasporas?

March 18, 2020

A diaspora is an invaluable asset to its country of origin. For example, it has been estimated that more than 4 million people from Serbia reside abroad. Annually, their remittances make up close to 9% of the country's GDP, but what makes the diaspora valuable is much more than the money it sends back home. The potential of people in the diaspora lies in their education, professional experience and professional networks, which can help open up opportunities for trade, investment, and innovation.

To take advantage of these opportunities, home countries need to know where their diaspora is, and for many countries this is a challenge. Estimates of between-country migration flows are often imprecise, and official statistics cannot keep up with the dynamics of modern migration.

Researchers who used digital tools to analyse web searches came a little closer to real-time mapping of people’s movement between countries. An estimated 4.5 billion people are online and 92% of them use Google to search the web.

Google search data could give us some insight into the geographic distribution of citizens who live abroad if we choose the right queries carefully and creatively. Google Trends is a publicly available analytical tool that standardizes search volume by language and location over time. It aggregates the billions of instances each day in which someone types a query into a search box, and it analyses and visualizes the most popular queries over time by region and language.

There are several examples of Google Trends data being used in development and crisis information management. During the 2015 refugee crisis, Google data was used to anticipate the movement of refugees. A Pew analysis started off in Turkey and followed the refugees’ journey. Using Google Trends, the researchers looked at how often Arabic speakers in Turkey used the search term “Greece.” They also looked at how often the search term “Germany” was used in Arabic from Greece. Those searches started to climb steeply in the summer of 2015. A couple of months later, tens of thousands of immigrants were landing on Greek shores.

Williams and Ralphs (2013) [1] found that Google searches for ‘Polski’ were closely related to statistics on Polish nationals in the UK. The results were also promising for migration from Romania and Lithuania, as these countries’ emigrants speak unique languages and their populations are large, but less so for the other countries in scope. Böhme et al. [2] used Google Trends to predict international migration in their research “Searching for a better life: Predicting international migration with online search keywords”. They selected a set of close to fifty words specific to the migrant population, such as passport, asylum, embassy and social assistance. This allowed real-time predictions of current migration flows ahead of official statistics and improved the performance of conventional models of migration flows.

Source: Google Trends

  1. Hesse 100
  2. Baden-Württemberg 92
  3. Bavaria 81
  4. Hamburg 60
  5. Berlin 59
  6. North Rhine-Westphalia 58
  7. Rhineland-Palatinate 55
  8. Lower Saxony 38
  9. Bremen 36
  10. Saarland 34

We at the UNDP Serbia Accelerator Lab carried out a similar analysis to map the Serbian diaspora. We started with the assumption that migrants are likely to search Google in their native language. Using the Google Trends tool, we captured Serbian-language searches from Germany, Austria and other countries with large concentrations of the Serbian diaspora.

The analysis was performed for selected words, such as “srpski” (“Serbian”), that can be linked to emigrants’ search patterns. Google releases hourly, daily and weekly search data. It does not release the actual number of searches conducted; instead, it provides a metric that captures the relative change in searches over a specified time period. The metric ranges from 0 to 100 and indicates low- or high-volume search activity for the time period: 100 represents the highest popularity of a search term in the selected region and time frame, and a score of 0 means that there was not enough data for the term.
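To make the 0 to 100 metric more concrete, here is a minimal sketch of how a relative index of this kind can be computed. It is an illustration only, not Google's actual implementation: the region names, the counts and the `relative_index` function are all hypothetical, and Google does not publish raw search counts. The sketch assumes that the index is based on a term's share of all searches in a region, rescaled so that the largest share becomes 100.

```python
# Hypothetical counts for illustration only; Google does not publish raw volumes.
regions = {
    "Region A": {"term_searches": 1200, "all_searches": 400_000},
    "Region B": {"term_searches": 1500, "all_searches": 1_000_000},
    "Region C": {"term_searches": 50, "all_searches": 250_000},
}


def relative_index(regions):
    """Rescale each region's search share so that the largest share maps to 100."""
    # Share of all searches in the region that contain the term.
    shares = {
        name: counts["term_searches"] / counts["all_searches"]
        for name, counts in regions.items()
    }
    top_share = max(shares.values())
    if top_share == 0:
        # No region has any searches for the term.
        return {name: 0 for name in shares}
    return {name: round(100 * share / top_share) for name, share in shares.items()}


index = relative_index(regions)
for name, value in index.items():
    print(name, value)
```

In this made-up example, Region B has the most searches for the term in absolute numbers, yet Region A scores 100 because the term makes up a larger share of its searches. This is why a high score should be read as relative popularity and not as a head count.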

Source: Google Trends

  1. Munich 100
  2. Frankfurt 98
  3. Nuremberg 94
  4. Stuttgart 82
  5. Hamburg 72
  6. Berlin 69
  7. Cologne 61

For Germany, we used data from the last five years. According to official German statistics (2011 census data), the majority of the Serbian population (64%) is concentrated in four federal states: Hesse, North Rhine-Westphalia, Baden-Württemberg and Bavaria. However, a considerable part of the Serbian migrant population was not counted as Serbian in the German census, because these people were still registered under former nationalities (Federal Republic of Yugoslavia, Serbia and Montenegro) at the time of the census. The Google search trends for “srpski” mostly confirm these data, with a slightly lower percentage (61%).
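A rough way to compare the two sources is to check where the four census states fall in the Google Trends ranking. The sketch below assumes that the state-level list earlier in this post is the “srpski” result for Germany. The variable names are ours, and the sketch does not attempt to reproduce the 61% figure, since that calculation is not described here.

```python
# State-level index values as listed earlier in this post (Source: Google Trends).
# Assumption: this list is the "srpski" result for Germany.
trends_index = {
    "Hesse": 100,
    "Baden-Württemberg": 92,
    "Bavaria": 81,
    "Hamburg": 60,
    "Berlin": 59,
    "North Rhine-Westphalia": 58,
    "Rhineland-Palatinate": 55,
    "Lower Saxony": 38,
    "Bremen": 36,
    "Saarland": 34,
}

# The four states named in the 2011 census figures.
census_top_states = {
    "Hesse",
    "North Rhine-Westphalia",
    "Baden-Württemberg",
    "Bavaria",
}

# Rank states by index value, highest first.
ranked = sorted(trends_index, key=trends_index.get, reverse=True)
ranks = {state: position for position, state in enumerate(ranked, start=1)}

# How many census states are also among the top four by search index?
top_by_trends = set(ranked[: len(census_top_states)])
overlap = top_by_trends & census_top_states

for state in sorted(census_top_states, key=ranks.get):
    print(f"{state}: rank {ranks[state]} in Google Trends")
print(f"{len(overlap)} of {len(census_top_states)} census states are in the Trends top four")
```

Because the index measures relative popularity and not absolute search volume, a ranking comparison like this is only a coarse consistency check.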

Similarly, we could map the Serbian diaspora in Austria. These two countries top the list of destination countries for migrants from Serbia.

Source: Google Trends

  1. Vienna 100
  2. Salzburg 58
  3. Vorarlberg 56
  4. Upper Austria 48
  5. Lower Austria 46
  6. Tyrol 38
  7. Carinthia 32
  8. Styria 28
  9. Burgenland 22

Source: Google Trends

  1. Vienna 100
  2. Wels 76
  3. Salzburg 70
  4. Dornbirn 57
  5. Linz 51
  6. Wiener Neustadt 47
  7. Graz 37
  8. Innsbruck 37
  9. Klagenfurt 31

Mapping can be done quickly at the region or city level and can give us new insights into migrant populations. Big data and Internet search tools can help us address development challenges by adding a layer of information to other statistics, by allowing patterns to be compared, and by indicating possible change on the ground or flagging events such as refugee crises, protests, disasters or epidemics.

On the other hand, it is often hard to interpret Google search trends meaningfully. We are limited by language selection, broad geographic location and time. In some countries the available regions are broad: in Serbia, for example, one can choose only between Vojvodina and Central Serbia. The data has been available since 2004, but the algorithm was improved in 2011, which complicates the comparison of data over time. It would be helpful to have access to the actual volumes of Google searches, and not only the Search Volume Index. In this particular case, competing populations, such as those from Bosnia and Herzegovina and from Croatia, who also speak the BCS (Bosnian-Croatian-Serbian) language, may influence the accuracy of the findings.

When we can say with certainty that a set of words is used only by a certain migratory population, Google Trends can be a source of useful information. By combining datasets from other sources with Google Trends data we could get better insights on the geographic location of the diasporas, helping countries of origin connect with them.

[1] Susan Williams and Martin Ralphs, “Preliminary Research into Internet Data Sources”, UK Government Statistical Service, 18th GSS Methodology Symposium, 2013 

[2] Böhme, M.H., et al., “Searching for a better life: Predicting international migration with online search keywords”, Journal of Development Economics, 2019

Drasko Draskovic is the Head of Exploration within the UNDP Accelerator Lab Serbia.