
🎙️ The fourth path
TL;DR – AI systems enhance human autonomy if we deliberately design them that way. This piece proposes one piece of that puzzle: ensuring the data that underpins AI models is bespoke, and owned and stewarded by those represented in the data.
This essay was originally submitted in response to the Cosmos Institute Essay Competition. It answered the question: ‘How should AI be developed or governed to protect and enhance human autonomy, safeguarding both freedom of thought and freedom of action?’
Shaping the trajectory of artificial intelligence (AI) towards human autonomy is one of this century’s biggest challenges.
Here’s my foundational premise: data is one of the biggest levers we have for building AI that safeguards our freedom of thought and action.
To take a term popularised by Bill Gurley, datasets are the backhoes of modern AI systems. In 1996, Gurley observed that backhoes dig trenches, which host cables, which power the internet. Because of this, internet speeds depend on backhoes, and on improvements in backhoe technology.
Similarly, data underpins AI systems. How we construct and use datasets affects how effective AI systems become, how safe they are, and how much they contribute to human autonomy. Or: garbage in, garbage out.
Smart people call this a data-centric approach. It means putting data at the heart of building better AI systems.
And in this essay, I propose that grappling with and improving datasets gives us a template for how to build an AI that promotes human autonomy.
Namely, it guides us towards a fourth path, different from:
- Rejecting AI altogether, for fear of its catastrophic negative impact on autonomy.
- Extensive regulation to slow down AI progress, while we ‘figure out’ what developments in AI mean for autonomy.
- Undertaking rapid technological advancement at any cost, without considering the wider consequences on autonomy (or blindly assuming that development in AI always leads to enhanced autonomy).
The fourth path is to deliberately design and build AI systems that maximise autonomy. To do this, we have to use all the levers we have — including and especially data — at our disposal towards this end.
This path resonates with what Vitalik Buterin has called ‘defensive accelerationism’: boldly pursuing technological development, while focusing on tech that defends us from harm.
This path also puts humankind, and our autonomy, at the centre of the story. We’re not putting artificial limits on our ability to build and harness tech (paths 1 & 2). Neither are we assuming technological progress has its own logic, which will shape our destiny no matter what (path 3). We’re intelligently and intentionally building tech and exercising our agency over it, towards a future that maximises human autonomy.
So, how can better datasets get us to this future?
I’ll jump right to the answer. Diverse, bespoke datasets, owned and stewarded by those represented in them, are a game-changer for our collective autonomy. The rest of this essay unpacks this statement (particularly the words in italics).
The power of bespoke data
Drive through Tanzania, and you’re in for a bumpy journey.
Dry, uneven roads criss-cross much of the country.
For travellers, it’s an uncomfortable ride. But for those who live more than 5km from a good road, it means lower incomes, and isolation from family, friends and markets1.
In 2018, I supported Tim, James and Bertrand2, to explore whether images of roads, taken by drones and analysed through custom-built algorithms, could tell you which roads needed repair in Tanzania. This was one of the radical projects funded by the UK government’s Frontier Tech Hub.
Existing algorithms, trained on smooth roads in Europe and the US, had no value in Tanzania, where 95%+ of roads are dirt or partly paved. We needed to train an algorithm from the ground up specifically for Tanzania.
For that, we needed a training dataset.
Building a first-of-a-kind dataset, from the ground up, is hard work3. The steps James, Bertrand and the University of Nottingham team took were roughly as follows:
- Evaluating and stitching together photos taken from drones battered by wind and rain
- Driving hundreds of miles in cars fitted with sensors to monitor bumpiness
- Manually assigning thousands of photos with classification labels, based on the bumpiness (“very good”, “good”, “poor”, etc)
- Double-checking the photos and labels against memory, and conversations with Tanzanian partners.
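The labelling step above can be sketched as a tiny pipeline. This is a hypothetical illustration: the function names, roughness thresholds, and file names are mine, not the project’s real values.

```python
# Hypothetical sketch of the labelling step: bumpiness readings from the
# sensor-fitted survey cars are bucketed into the condition labels the
# team used. Thresholds are illustrative only.

def label_roughness(roughness: float) -> str:
    """Map a road-roughness reading to a condition label."""
    if roughness < 4.0:
        return "very good"
    elif roughness < 8.0:
        return "good"
    elif roughness < 12.0:
        return "poor"
    return "very poor"

def build_dataset(samples):
    """Pair each drone photo with the label derived from the
    bumpiness recorded on that stretch of road."""
    return [
        {"image": path, "label": label_roughness(roughness)}
        for path, roughness in samples
    ]

# One (imaginary) stretch of survey data: (photo tile, roughness reading)
survey = [("tile_001.jpg", 2.1), ("tile_002.jpg", 9.5), ("tile_003.jpg", 14.0)]
dataset = build_dataset(survey)
```

The real work, of course, is in everything around this function: stitching weather-battered drone imagery, calibrating the sensors, and double-checking labels with local partners.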
This hard work and smarts came together to create a first-of-its-kind, labelled dataset that evaluated the condition of unpaved roads.
A dataset representing the kinds of roads you get in most of the world.
The government in Tanzania, and particularly in Zanzibar (a semi-autonomous part of Tanzania), uses this data, alongside other sources, to make decisions about which roads to repair. Anecdote, happenstance, and politics no longer dictate road maintenance. The bespoke dataset has laid the foundations for greater connection, economic growth, and autonomy.
We can also glimpse the importance of bespoke datasets in how people use large language model (LLM) powered chatbots today.
Whenever my behavioural scientist friend is using Claude, he starts every prompt with “As Daniel Kahneman, brainstorm ideas for…”.
Whenever my teacher friend is using ChatGPT to plan lessons, she’ll start the prompt with “As a knowledgeable biologist, explain how photosynthesis…”.
Asking AI to adopt personas is a strategy, designed to direct it to a specific part of its latent space. The goal is to get a more tailored response. Although it works at the level of a prompt (not the data), it’s akin to harnessing a bespoke dataset, and meets a similar need.
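The persona strategy amounts to little more than string templating before the prompt ever reaches a model. A minimal sketch (the function name and personas are mine, not any chatbot’s API):

```python
# Minimal sketch of the persona strategy: prepend a role line to steer
# the model toward a specific region of its latent space. This only
# builds the prompt text; sending it to a model is a separate step.

def persona_prompt(persona: str, task: str) -> str:
    """Prefix a task with a persona, as in the examples above."""
    return f"As {persona}, {task}"

msg = persona_prompt(
    "a knowledgeable biologist",
    "explain how photosynthesis converts light into energy.",
)
```

It’s a crude lever compared to actually changing the underlying data, which is precisely the essay’s point: the persona is a workaround for a dataset that wasn’t bespoke to begin with.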
I’m fortunate to spend large parts of my working life with entrepreneurs in sub-Saharan Africa. These amazing humans are, like James and Bertrand, building products that can radically improve lives. And they believe AI, built on data that reflects the African context, is what’s needed to get us there.
Take Kiko. He founded Angaza Elimu in 2017, to bring eLearning into every school in Kenya. Students use the platform to learn, and teachers get data and insights into how to plan lessons to meet their pupils’ needs. I worked with Kiko on his product strategy earlier this year, and he was particularly excited about using AI models like GPT-4 to generate lesson plans, saving teachers’ time.
However, when prompted to plan a Mathematics lesson on fractions, the model would output a lesson built around pizza slices.
The catch: outside of big cities, no child in Kenya eats pizza.
Even after fine-tuning, GPT-4 wouldn’t stop suggesting pizza-based lessons.
What Kiko needs is bespoke data.
Look at any potential use case for AI in education, according to educators: creating learning materials and exercises for students, lesson plans for teachers, assessing learner levels, translating high quality materials into local languages. All would be done much better with AI models built on bespoke datasets4.
And all of those use cases would lead to flourishing lives.
The next step: data owned and stewarded by those represented by it
Crafting bespoke datasets is one part of the puzzle.
But to truly harness data as a lever for human autonomy, I believe those datasets need to be stewarded and owned by those represented in them.
What do I mean by stewarded?
One of the most groundbreaking models for data stewardship I’ve seen is the Māori Data Governance Model (MDGov), developed by Māori data experts for use across the Aotearoa New Zealand public service.
MDGov puts data on the Māori — the indigenous people of New Zealand — in Māori hands. Concretely, this means:
- The Māori elect a Chief Data Steward, with the resource and mandate to represent Māori interests and perspectives when it comes to data that represents them.
- Through the Chief Data Steward, Māori can lead and design their own way of classifying the data. For instance, they co-designed the Te Kupenga post-census survey of 5000 Māori adults, ensuring it captured subjective factors related to household and extended family wellbeing.
- Māori are the primary judges of the quality and accuracy of data about them.
- Consent to use data on the Māori is considered an ongoing relationship, with the Māori holding authority to repatriate their data back into Māori hands.
In a tangible sense, the Māori are stewards of their bespoke dataset. Agency is rooted in the community represented in the data.
And what do I mean by owned?
I was lucky enough to live in Rwanda for 18 months, between 2021 and 2022.
As in much of Africa, co-operatives are everywhere. Reflecting the community spirit inherent in Rwandese culture, co-operatives of one kind or another have for centuries informally organised mutual assistance and labour pools. For instance, Ubudehe is the Rwandese tradition of calling on a neighbour’s help to cultivate a field. In the last couple of centuries, co-operatives have morphed to include formal communal ownership and management of shared assets, spreading returns among members5.
This time-honoured tradition of co-operatives has touched the world of data. For example, in India, a co-operative of indigenous women farmers is pooling data on income and credit history, to improve their collective creditworthiness. This dataset is owned by the co-operative, on behalf of all women members6.
Advances in AI capabilities can turbo-charge the data co-operative. Where demand for bespoke datasets was previously niche, it will become widespread as AI models proliferate.
There’s considerable value to banks if the AI models they use are built on high-quality data on women farmers’ income trends. Just as there’s considerable value to governments in low-income countries in AI models built on James and Bertrand’s road quality data.
Imagine a world where this value accrues to those who are represented by the data. They deserve it: through the work of crafting the dataset, and their wisdom and experiences represented in it.
And through the co-operative, we have a model for communal ownership that has stood the test of time.
Autonomy on two fronts
So far, I have argued for new approaches in building, governing, and owning datasets. But how would such datasets enhance autonomy?
At its most basic level, to be an autonomous agent is to be “self-governing”7.
Yet, as Amartya Sen (a Nobel Prize-winning economist) and Martha Nussbaum (a moral philosopher) have pointed out, this autonomy to self-govern has to be “real and substantive”8. By this, they mean it has to include basic capabilities that allow the agent to go out and do the things they want to do, or be the person they want to be: nourishment, education, good health, and so on9.
Sen and Nussbaum term this the ‘capability approach’ to freedom. For them, building capabilities should be the primary goal of any government policy. I believe it should be the goal of technological progress, too.
Imagine a world with a diversity of datasets, tailored to a wide diversity of needs and contexts. This world will give us AI products that build capabilities and, by doing so, enhance autonomy.
A dataset on educational content, tailored to the Kenyan context, means Kiko’s eLearning platform will have greater relevance for the children using it. Which improves the likelihood of learning, and the likelihood those children go on to shape and live a life that means something to them.
A dataset on road quality might lead to an AI product that suggests road maintenance and repairs in the Global South. This product can give communities the freedom to travel should they choose to do so.
In short: a plurality of datasets equals contextually relevant and effective AI products. Which equals more capabilities, which equals greater autonomy.
And if communities own that data through co-operative style models, the autonomy compounds. The dataset generates wealth, giving a much larger chunk of the world’s population real and substantive freedoms through the AI value chain. No longer is AI delivering outsized benefits to a small, concentrated group. AI development is enhancing wealth, and autonomy, for more people.
Feminist perspectives provide a further lens on the link between autonomy and bespoke, community owned datasets.
Imagine datasets that reflect a diversity of values, perspectives and human experiences.
If those datasets get used in AI models, more of us will see ourselves and our contexts reflected in AI products, wherever and whoever we are. The indigenous woman farmer in India will see her context reflected in a bank’s algorithms. The teacher will see her community represented in the AI generated lesson plan on fractions.
Feminist perspectives on autonomy emphasise that autonomy is relational10. Decisions made by self-governing agents are never made in isolation. They’re made within networks of relationships and interactions: communities of people who care about you, and who reflect your own values and experiences. Just as Sen and Nussbaum argued that autonomy is built on capabilities, feminist perspectives argue that autonomy is built on relationships.
People who are truly free understand that, without other people, autonomy is inconceivable. They harness their networks and communities in making decisions, and in doing the things they want to do and being the person they want to be.
For AI products to enhance autonomy, they need to reflect the user’s context and make salient the wisdom of their community. To do that, they need to be built on high quality, bespoke datasets.
Today’s LLMs, built on data scraped from the internet, represent just one of many possible contexts. If products built on such LLMs become pervasive, they funnel humankind down a narrow set of thoughts and actions, limiting our autonomy.
Concluding remarks
I have argued for bespoke, community-owned datasets, which enhance autonomy by building capabilities and reflecting an agent’s networks, wherever and whoever they are.
The journey — of thoughtful and deliberate development of AI that maximises autonomy — is the kind I believe we need to embark on, if we’re to bend the arc of technological progress towards human flourishing.
I’ll end with two final observations.
Firstly, I believe what makes this journey powerful is that it is realistic and it aligns the incentives of everyone involved. Bespoke community-owned datasets bring value to users of AI products, by making those products more effective and contextually relevant. They bring financial value to data producers.
And, crucially, they bring value to developers of AI models and products too. Despite being the backhoes of modern AI systems, datasets today have huge constraints. They are out of date, opaque, noisy, unrepresentative of global wisdom, perspectives and experiences, and have little accountability for incorrect or dangerous content11. As AI advances, a market worth billions of dollars opens up for data; a market we can shape towards human autonomy.
Secondly, this essay cuts against “scale thinking”, the dominant mental model of Silicon Valley12. Scale thinking takes ‘more’ to mean ‘better’, and prioritises systems that can fit an ever growing number of users (i.e. systems that are ‘scalable’).
Scale thinking dictates that AI development advances through ever larger datasets, with ever larger amounts of compute.
And while scalability has democratised access to amazing tech products, any dominant mental model can constrain thinking. Datasets that are built and owned by, and for, specific communities, rooted in their specific material needs and contexts, push us to a radically different mode of thought.
And, by so doing, they assert our own autonomy too.
1 Financing Roads in the United Republic of Tanzania (F. Haule, 2005)
2 Tim was a pioneering civil servant at the British High Commission in Tanzania, and James and Bertrand were machine learning specialists at the University of Nottingham. A truly mission-driven group.
3 You can read about their efforts in this delightful series of blogs the team wrote while they were doing it.
4 I’m particularly excited about the potential of Lelapa’s InkubaLM, trained on African language data, for use cases like Kiko’s.
5 Jump-starting the Rwandan cooperative movement (J.D. Nyamwasa, 2008)
6 How to Build a Data Cooperative: A Practitioner’s Handbook (Aapti Institute et al, 2024)
7 Personal Autonomy (Stanford Encyclopedia of Philosophy, 2014)
8 The Capability Approach (Stanford Encyclopedia of Philosophy, 2018)
9 Many others have had similar ideas. Isaiah Berlin, for example, developed two concepts of liberty. “Freedom from” constraints on its own isn’t enough: we must also have the “freedom to” act on our own free will and realise our potential.
10 Feminist Perspectives on Autonomy (Stanford Encyclopedia of Philosophy, 2014)
11 The social construction of datasets: On the practices, processes and challenges of dataset creation for machine learning (W. Orr & K. Crawford, 2023)
12 Against Scale: Provocations and Resistances to Scale Thinking (A. Hanna & T. Park, 2020)
🎬 Thanks to Joel Christiansen, Alice Sholto-Douglas and Jude Klinger for looking at drafts of this.
🤔 Got thoughts? Don’t keep them to yourself. Email me on asad@asadrahman.io. Let’s figure this out together.
If you enjoyed this, subscribe to get pieces just like it straight to your inbox. One email, every so often (and nothing else).
Banner generated by DALL·E from OpenAI, in response to the prompt: “visualise the data that gives you knowledge”.