A Bold Transatlantic Plan to Open Corporate Databases

Photo credit: Ian Battaglia

It is high time for tech companies to open their vaults for researchers. Governments should step in.

When Presidents von der Leyen and Biden met at the EU-U.S. summit, they agreed to form a new EU-U.S. Trade and Technology Council (TTC). Originally proposed by the European Commission last November, the TTC will seek to stabilize the semiconductor supply chain, align technology regulatory and standardization efforts, and resolve the longstanding dilemma over the transfer of personal data between the EU and the U.S. These seem to be the most prominent goals of the TTC, along with the understated desire to create a stark contrast with China’s use of technology for surveillance and social scoring. These are worthwhile efforts within the traditional norms of international cooperation, but data governance should also be a top priority as the U.S. and the EU seek to work more closely on tech regulation.

The United States and European Union face an identical problem in technology governance: their public institutions are systemically disadvantaged by an enormous information asymmetry between themselves and the largest online platforms. Today, technology companies have exclusive access to accumulated datasets of unprecedented size and scope. The data contains critical insights into core questions of democratic stability (e.g., the spread of disinformation and political polarization), public health (e.g., child pornography and child self-harm from cyberbullying), and the functions of online markets (e.g., concerns about market collusion driven by artificial intelligence, or AI). Seen together, such data is essential to keeping the public and policy-makers sufficiently informed about the societal role of the internet, but both sides of the Atlantic are in the dark.

Despite its clear value, the network data that defines these large platforms remains largely unavailable to independent and public-oriented researchers. Private-sector efforts to make it more accessible have so far been insufficient. The data released by Facebook’s Social Science One is aggregated and altered to the point where it cannot be used to answer these critical questions, leading its European advisory committee to step down. Uber’s academic research program appears to have political objectives that make trusting the resulting research difficult or impossible. And while Google produces a wide range of academic research, the controversy around the dismissal of AI researcher Dr. Timnit Gebru has raised concerns about the company’s commitment to academic research.

In response to this challenge, the European Commission included a provision in the proposed Digital Services Act (DSA) which would require the largest internet platforms to open up their data to independent researchers with Commission approval. The DSA would affect companies with at least 10% of EU citizens as active users, which would likely include Facebook, YouTube, Twitter, TikTok, Amazon, and others. This is a promising proposal that would help hold tech firms accountable and enable a far greater understanding of the impact of online platforms on society. In fact, many researchers in the U.S. have come to the same conclusion. Princeton Computer Science Professor Arvind Narayanan writes about YouTube that there is “no good way for external researchers to quantitatively study radicalization.” University of North Carolina Professor Zeynep Tufekci says, “we need independent research access” to study polarization on Facebook. This data is key to governmental oversight in the U.S. too, as Stanford University’s Nathaniel Persily argues that data access is “the first step towards the regulation of these platforms.”

Given that the EU and the U.S. face the same problem, and would benefit from the same solution, this is a meaningful opportunity for transatlantic collaboration. Simply by agreeing to allow researchers access to joint EU-U.S. datasets from the online platforms (e.g., all the EU and U.S. user data on Facebook), these programs would be substantially more beneficial even if implemented separately. In part, this is because the effect of online social networks and markets on EU citizens is intertwined with their effect on U.S. citizens. A U.S.-based company would be rightly reluctant to hand over sensitive data about Europeans to U.S. researchers based only on a U.S. law, and Europeans would certainly object, too. But working together solves this problem. Enabling independent researchers' access to the joint datasets will paint a fuller picture simply by including a larger percentage of platform activity.

Further, the broader concerns of the EU and U.S., from algorithmic discrimination in advertisements to COVID-19 misinformation, are strikingly similar. This means that opening databases to researchers would increase the number of relevant research projects for both sides, and would likely foster more research collaboration (also a stated goal of the TTC). This might help the EU and the U.S. make progress on other issues, too. From data privacy to AI governance, there are stark differences in how the EU and U.S. approach technology governance. Yet a common base of scientific evidence, at least regarding large technology platforms, might help build trust and consensus between the potentially diverging governments.

This is unquestionably a bold proposal, and I would not argue with anyone who calls it unrealistic. It would require intentionally aligned legislation from both sides of the Atlantic, as well as some ongoing operational cooperation. Still, it is worth considering new forms of international partnership, especially because the nature of today’s technology platforms challenges the jurisdictional limits of governance. By default, websites are accessible regardless of geography, data flows freely across borders, and AI systems can be developed in one country and then easily applied in another.

The inherent challenge of digital governance justifies rethinking the normal bounds of international collaboration. A transatlantic agreement on independent researcher data access is certainly ambitious, but it is an intervention proportionate to the difficulty of governing platforms that span millions or even billions of users around the world.

July 8, 2021