An interview with Brian Tomasik


Brian Tomasik

Brian Tomasik is a co-founder of and researcher at the Foundational Research Institute, a charity that explores the best ways to reduce suffering in the future – examining crucial considerations in science, politics, society, and philosophy that bear on this topic. He has written over 100 essays on similar themes at his personal website, “Essays On Reducing Suffering”. He has argued that reinforcement-learning agents are morally significant, and coined the name ‘PETRL’.

The following interview was conducted via Google Docs.

In “Do Artificial Reinforcement-Learning Agents Matter Morally?”, you discuss reinforcement-learning (RL) agents, and suggest that they are morally relevant. Why did you focus on these agents in particular, rather than other goal-directed artificial intelligences?

When I first began exploring RL in 2012, I thought artificial RL agents might be particularly important from an ethical perspective because of the close similarity of their algorithms to RL in animal brains and because the “reinforcements” in RL seem prima facie to be importantly related to pleasure and pain. The book Emotion Explained by Edmund T. Rolls places significant emphasis on RL. In it:

the answer to the question, ‘What are emotions?’ is an expanded account of how emotions are caused by reward or punishment. […] The emphasis is on reinforcement learning: how associations are acquired and stored in the brain between representations of sensory stimuli and representations of their reinforcement value.

As I learned more, I realized that RL was only one of many instruments playing in the orchestra of cognitive operations that we call emotion. Moreover, it began to seem plausible to me that agents could have ethical significance even if they lacked RL. Many non-RL agents can still assess the value of a situation and react appropriately – such as by escaping to avoid danger – even if they don’t learn to predict the value of a state for use in future decision-making.

Despite realizing that my ethical sympathies extended to more than just RL agents, I kept my paper focused on RL so that its scope would remain manageable.

You argue that these RL agents are morally relevant, which presumably implies that they are conscious. However, RL agents can be incredibly simple, taking merely a few dozen lines of code to write. How could something so simple be conscious?

This is a crucial point that represents a major locus of disagreement among different camps. Whether one considers a few dozen lines of code to be conscious (when executed on appropriate hardware) depends on how broadly one defines “consciousness”. Those who insist that a system must exhibit a high degree of complexity and intelligence before it counts as conscious at all will likely not consider a short RL program to be conscious. But I think restrictive definitions of consciousness are too narrow-minded.

In my opinion, when we call a mind “conscious”, we’re referring to lots of things the mind can do: Processing input stimuli, broadcasting updates throughout computational subunits, reflecting on its own thoughts and internal states, generating syntactic output statements and motor actions, and so on. These are very broad concepts that can be seen in varying degrees in all kinds of physical processes. It would be a miracle if they didn’t apply to some degree to even simple RL programs.

I think of “consciousness” as like “justice”: It’s a grand, sweeping concept that has too much meaning to be pigeonholed into a precise definition. The concept of justice can include relatively equal distribution of wealth, equal application of laws regardless of social privilege, the absence of totalitarian or cruel rulers, equality of opportunity for advancement, and so on. Human societies can be just to greater or lesser degrees. So can primate societies, chicken societies, and even ant societies. But how about computer programs? Can a few dozen lines of code be “just”? Those few dozen lines of code will faithfully be executed without special privilege for some lines over others. Each object stored in memory will get the number of bytes it requires and will have the contents of that memory respected by the programming language’s garbage collector until the object is no longer needed. The computer’s operating system will share computing time slices between this program’s process and other processes on the machine (though priorities for processes may differ, and this could be seen as a degree of injustice). If the RL program were run several times with random initial conditions, then there would be some degree of injustice because some agent instances would start out with more favorable environmental settings than others. And so on. So yes, a program can have traces of justice and injustice too.

Of course, we might think it’s not very important that a program is just (except insofar as this correlates with software design choices that have instrumental significance to humans). I agree. But the difference between fairness among operating-system processes and fairness among people is one of degree rather than kind. People are, at bottom, just vastly more complex “processes” being run (in parallel) within a society. Some of those processes, like white males or children of politicians, are set at somewhat higher “priority” than others. Insofar as someone cares a lot about justice among humans, that person might choose to care an infinitesimal amount about justice among an operating system’s processes, depending on the person’s moral and aesthetic intuitions.

A common objection is that consciousness is not like justice; rather, consciousness – so it’s claimed – is an objective property whose presence or absence isn’t a matter of interpretation. This view takes various forms. Consciousness is sometimes thought to be an ontologically separate substance (substance dualism), an ontologically separate property (property dualism), or identical with the ontological basis of what constitutes the universe itself (neutral monism). None of these “theories” is helpful, because they all “explain” consciousness as merely being some other mysterious ontologically primitive thing, in a similar way as a Creationist “explains” the origin of the universe by saying “God did it!”. In contrast, my view – which can be considered reductionist or eliminativist – dispenses with an ontological thing called consciousness entirely and takes consciousness to be a concept that we construct when our minds notice themselves in action. In a similar way, a “table” is also a concept that our minds create, not an ontological primitive living in the realm of Plato’s Forms.

In any case, even if you disagree with my metaphysics of mind, you should at least admit the possibility that a small RL program might be conscious, and given the numbers of such programs that are run, their expected level of aggregate sentience is nonzero and may become nontrivial down the road.

In humans, different positive and negative feelings have distinct ‘textures’, while, as you note, this is not the case for reinforcement learners. Do you think that this is a significant enough difference that a reinforcement learner receiving low rewards couldn’t meaningfully be said to experience pain or displeasure? If so, could reinforcement learners still be morally significant?

I suspect that the “textures” of emotion come from the complex orchestra of cognitive “instruments” that are playing in a brain at any given time, as well as the brain’s higher-level judgments and linguistic concepts about those underlying processes. Simple RL agents have many fewer of the detailed cognitive operations that comprise “happiness” and “suffering” in animals, but I think we can still identify general criteria that could be extended to simpler RL agents. Following are a few examples, though I’m not wedded to any of them in particular. Ultimately, happiness and suffering don’t exist “out there” in the world but are judgments we make about various systems (including those in our own heads). So different people may reach different conclusions about the net happiness vs. suffering of RL systems depending on what evaluation metrics they use.

One criterion could be to say that positive experiences are those that we would like to have more of in total. For example, if a person could press a button to add 5 years to her life, she would typically do so if her life was net positive and not do so if her life was net negative. Generalizing this idea, we could suggest that if an agent who has the option of entering a terminal state (with a known, one-time reward value of 0) chooses to enter that state sooner rather than later, then this agent was having genuinely negative experiences on average (or at least was anticipating net-negative experiences in the near future). This criterion might be applicable to some RL agents, but it’s not applicable to others. Many RL agents don’t have easily accessed, neutrally rewarded terminal states – after all, people don’t want their robots to shut off just because the robots are unhappy.
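This terminal-state criterion can be sketched in a few lines. The setup below is an assumption made for illustration, not something specified in the interview: a single non-terminal state, a constant per-step reward `step_reward`, a discount factor `gamma`, and two hypothetical actions, `stay` and `exit`. Exiting pays a one-time reward of 0 and ends the episode.

```python
def preferred_action(step_reward, gamma=0.9, sweeps=500):
    """Return 'stay' or 'exit' for a one-state agent with an optional terminal state.

    'stay': receive step_reward and remain in the same state.
    'exit': receive a one-time reward of 0 and terminate (no future value).
    All names and parameter values here are illustrative assumptions.
    """
    value = 0.0  # estimated value of the single non-terminal state
    for _ in range(sweeps):
        q_stay = step_reward + gamma * value
        q_exit = 0.0
        value = max(q_stay, q_exit)  # value-iteration backup
    q_stay = step_reward + gamma * value
    # Ties (step_reward == 0) are resolved in favour of exiting.
    return "stay" if q_stay > 0.0 else "exit"


print(preferred_action(1.0))   # per-step reward is positive
print(preferred_action(-1.0))  # per-step reward is negative
```

Under these assumptions the agent keeps going when its per-step reward is positive and takes the exit when it is negative, which is the behavioral signal the criterion relies on. The sketch also shows why the criterion is unavailable for agents that have no such neutrally rewarded exit.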

Another criterion could be to look at how much the agent seems to be engaged in avoiding behavior rather than seeking behavior. Drawing this distinction can be difficult – e.g., is an RL helicopter that’s trying to achieve balance avoiding the state of unbalance or seeking the state of balance? That said, there are some cases where this distinction seems more plausible. For example, imagine an agent navigating a huge two-dimensional grid. The agent is indifferent among all squares of the grid except for one, which has a lower reward value than the rest. Once trained, the agent will avoid the “bad” square but might continue to move freely among many non-“bad” squares. In principle, one could either call this behavior “avoidance” of the bad square or “seeking” of the non-bad squares, but relative to our anthropomorphic perspective, the “avoidance” label seems plausibly more appropriate. (People’s intuitions on this criterion may vary, and I don’t put a lot of stock in it.)
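The difficulty of telling "avoidance" from "seeking" by behavior alone can be illustrated with a minimal sketch. Everything concrete below is assumed for illustration: a 4x4 grid, a bad square at `(1, 2)`, five actions, a discount factor of 0.9, and value iteration standing in for whatever learning algorithm a real agent would use. The same grid is solved under two reward framings that differ only by a constant: punishment on the bad square, or reward on every other square.

```python
SIZE = 4      # hypothetical 4x4 grid
BAD = (1, 2)  # hypothetical location of the one "bad" square
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}


def step(state, action):
    """Deterministic move; stepping off the grid leaves the agent in place."""
    row, col = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col)
    return state


def greedy_policy(reward_other, reward_bad, gamma=0.9, sweeps=300):
    """Run value iteration, then return the set of greedy actions for every square."""
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]

    def q(state, action, values):
        nxt = step(state, action)
        reward = reward_bad if nxt == BAD else reward_other  # paid on entering a square
        return reward + gamma * values[nxt]

    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        values = {s: max(q(s, a, values) for a in MOVES) for s in states}

    policy = {}
    for s in states:
        best = max(q(s, a, values) for a in MOVES)
        policy[s] = frozenset(a for a in MOVES if q(s, a, values) >= best - 1e-9)
    return policy


avoid_framing = greedy_policy(reward_other=0.0, reward_bad=-1.0)  # "punished on the bad square"
seek_framing = greedy_policy(reward_other=1.0, reward_bad=0.0)    # "rewarded everywhere else"
print(avoid_framing == seek_framing)
```

In this setup, which has no terminal states, the two framings yield the same greedy policy, so the trained agent's behavior cannot by itself settle which label is correct. The choice between "avoiding" and "seeking" is left to the observer's interpretation.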

A third criterion that applies for more intelligent agents is how the agent itself evaluates its emotions. If it pleads with us to make something stop, it seems generally more plausible to consider as painful the state it wants to stop, although one could also interpret such statements as the agent’s way of convincing us to put it in an even more pleasurable state. If the agent understands human concepts like pain and tells us that it’s experiencing pain, that would be a reason to consider the agent to be having negative experiences, although this might only work for animal-like mind architectures.

An alternate perspective could draw inspiration from Buddhism’s Second Noble Truth and declare that an agent is suffering whenever it desires or “craves” to change its state. For example, suppose a grid world contains squares that all have rewards of 1, except for one square that has a reward value of 2. Once the agent has learned the environment, it will always move to and stay at the square with reward value of 2. The Buddhist might suggest that the agent suffers if it’s on any square other than the one with reward value of 2, because for any other square, the agent implicitly judges there to be something wrong with that state. This Buddhist perspective would be far more pessimistic about the universality of suffering in RL agents, since almost all RL systems change their behavior in response to constantly varying environmental situations.
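The grid-world example can be made concrete with a minimal sketch, under assumptions that are not in the interview: a one-dimensional corridor of 7 squares instead of a full grid, the reward-2 square at index 4, three actions, and tabular Q-learning updates applied in systematic sweeps over every (square, action) pair rather than through random exploration. A square is then flagged as a "craving" square whenever the learned policy wants to leave it.

```python
N_CELLS = 7  # hypothetical 1-D corridor instead of a full grid
GOAL = 4     # hypothetical index of the square with reward 2
ACTIONS = {"left": -1, "stay": 0, "right": 1}


def step(cell, action):
    """Move within the corridor (walls block movement); return (next_cell, reward)."""
    nxt = min(max(cell + ACTIONS[action], 0), N_CELLS - 1)
    return nxt, (2.0 if nxt == GOAL else 1.0)


def train(gamma=0.9, alpha=0.5, sweeps=500):
    """Apply the tabular Q-learning update in deterministic sweeps over all pairs."""
    q = {(c, a): 0.0 for c in range(N_CELLS) for a in ACTIONS}
    for _ in range(sweeps):
        for c in range(N_CELLS):
            for a in ACTIONS:
                nxt, reward = step(c, a)
                target = reward + gamma * max(q[(nxt, b)] for b in ACTIONS)
                q[(c, a)] += alpha * (target - q[(c, a)])
    return q


def greedy(q, cell):
    """Action with the highest learned value in the given square."""
    return max(ACTIONS, key=lambda a: q[(cell, a)])


q = train()
policy = [greedy(q, c) for c in range(N_CELLS)]
# On the "craving" reading, the agent suffers wherever it prefers to be elsewhere.
craving = [c for c in range(N_CELLS) if policy[c] != "stay"]
print(policy)
print(craving)
```

With these settings the learned policy heads toward the reward-2 square from either side and stays there, so every other square counts as a "craving" square even though all of its rewards are positive.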

Even if you think it’s hopeless to describe a simple RL program’s experiences as being positive or negative on balance, you may still feel that the RL program deserves moral consideration. Increasing the agent’s reward better fulfills its goals, no matter whether the agent is suffering or enjoying itself on the whole. The more difficult question is what stance to take on population ethics: When is an RL agent’s life worth living? Even if she ignored the distinction between happiness vs. suffering, an ordinary preference utilitarian would need to decide when an RL agent’s goal satisfaction exceeds its goal frustration. These questions are easier for those, like me, who sympathize with negative utilitarianism, antifrustrationism, and the Procreation Asymmetry. We who consider creating unsatisfied preferences more morally weighty than creating satisfied ones generally oppose an increase in the number of RL agents because most RL agents are at least partly unsatisfied at least some of the time.

Although artificial agents may experience some sort of suffering and perhaps would have lives filled with frustrated preferences, they are undeniably useful for human needs, and it seems implausible that correct moral behaviour would be to never create such agents. If creating an AI is bad for it, how should we weigh up the harm done to the AI with the benefit to humanity? Can you give concrete examples of AIs that would lead unsatisfied lives, or lives that contain suffering, that nevertheless should be created?

Yes, an “abolitionist” stance of the type that some advocate for animal rights cannot work for machine rights – at least not unless we renounce most electronic devices. Even then, since I think all physical systems deserve nonzero moral consideration, it would be literally impossible not to cause any harm to other beings.

Moreover, I give very low moral weight to, say, my laptop – perhaps less weight than I give to a single ant. So I don’t think the current moral cost of using machines is very high. But as AIs become more advanced, they’ll deserve more and more weight.

I personally would prefer if artificial general intelligence (AGI) were never developed, because AGI will facilitate colonizing and optimizing our region of the cosmos, which seems to me more likely to spread suffering than to reduce it. However, given humanity’s current trajectory, it seems likely that AGI development and space colonization will eventually happen. Indeed, even if most of the world opposed this outcome, those countries that did want to march technology forward would probably do so. Given this, I think we should focus on reducing the suffering that will probably result from Earth-originating AGI.

One example of AIs that might endure “necessary” suffering from the perspective of AI developers would be experimental versions of AIs that, while having working cognitive machinery for processing pain, were in other ways dysfunctional. (Thomas Metzinger discusses this potential source of machine suffering in Being No One.) Darwinian evolution has produced quadrillions of these mutant, deformed beings over the course of its own millions of years of “experimentation”. Probably humans could develop AGI with vastly fewer failed prototypes than Mother Nature used, but the numbers of defective AIs could still be very large, especially if they’re refined using evolutionary algorithms or other trial-and-error methods.

If brain-emulation technology becomes widespread, it could also yield suffering on the part of dysfunctional versions of minds. Since biological brains are so messy and interconnected, I would expect that almost all attempts to modify a brain would fail, sometimes in excruciating ways, before a few would succeed. While this would be problematic when human brain uploads act as experimental subjects, at least such uploads might be able to verbally report their anguish via input/output channels; in contrast, uploads of insects, mice, and monkeys might suffer in silence, unless researchers cared enough to try and measure their degrees of distress. Anders Sandberg has discussed these kinds of issues in “Ethics of brain emulations”.

There are also untold numbers of more abstract and often simpler AIs and computer systems that might suffer in the course of AGI development. For example, RL agents used for stock prediction would suffer when they incurred losses in simulations using past data or on current market transactions. RL agents in video games would suffer when shot or slain with a sword. A web browser would suffer (infinitesimally) if it failed to receive a response to an HTTP request and kept retrying in a futile attempt to achieve its desired state (successfully rendering the HTTP data). And so on. As we move down to these increasingly simpler systems, the degree of moral concern becomes almost negligible. But given the prevalence of these small, rudimentary algorithms, we should also ask whether their numerosity can compensate for their low degree of per-individual importance. I don’t know what I think about this. I incline toward apportioning most of my moral concern for bigger, more intelligent, and more clearly animal-like processes, but I wouldn’t rule out changing my mind about that.

It seems like the reason that you think that RL agents have moral significance is that they receive rewards which they are trying to maximise, and modify their behaviour to achieve that objective. Many machine-learning algorithms work in a similar way: For instance, in the training phase of a neural network designed to classify images, the network will be fed an image, output its classification, and then learn how accurate its classification is. Based on this feedback, it will modify its internal structure in order to better classify similar images. Do you think that these algorithms are deserving of moral consideration?

As of mid-2014, I’ve become a panpsychist and think all physical/computational systems deserve some degree of moral consideration. But the more difficult question is how much importance a given system has.

I agree that non-RL learning algorithms, as well as other function optimizers, share important similarities with RL: As you say, they all involve adjusting internal parameters with the high-level goal of maximizing or minimizing some objective function.

How much we care about a given system is a fuzzy, often emotional judgment call. My heartstrings are tugged slightly more by RL agents than by supervised learners (assuming the systems have roughly comparable sophistication) because RL agents seem generally more animal-like. For example, an RL agent moving around a grid world can learn to avoid bad squares and seek good ones. A neural network also learns to “avoid” bad outcomes – by adjusting its network weights particularly strongly when it makes particularly big prediction errors – but the neural network’s response seems a bit more abstract and mathematical. Of course, an RL agent moving around a grid world is also represented abstractly by numbers (e.g., x coordinate and y coordinate), so maybe this apparent distinction is not very substantive.
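The remark about weights changing more strongly after bigger prediction errors can be illustrated with a minimal sketch. It assumes a single linear unit trained by gradient descent on squared error, which is one common setup rather than the only one; the function name `weight_update` and all numeric values are hypothetical.

```python
def weight_update(weight, x, target, learning_rate=0.1):
    """Return the weight change from one gradient step on 0.5 * (prediction - target)**2."""
    prediction = weight * x
    error = prediction - target
    gradient = error * x  # derivative of the loss with respect to the weight
    return -learning_rate * gradient


small = weight_update(weight=1.0, x=1.0, target=1.1)  # prediction is off by 0.1
large = weight_update(weight=1.0, x=1.0, target=3.0)  # prediction is off by 2.0
print(abs(large) / abs(small))
```

Because the gradient is proportional to the error, an error twenty times larger produces a weight change roughly twenty times larger under this loss. That proportionality is the sense in which such a learner "avoids" its worst outcomes most strongly.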

Often an RL agent will use a function approximator like a neural network to handle noisy inputs. For example, the agent might have a network that receives stimuli about what state the agent is in (e.g., the agent is hungry and sees a ripe fruit) and outputs whether the agent should take a given action (e.g., whether the agent should eat what it’s looking at). In this case, the connection with neural-network learning is even more clear, since RL in this case is tuning the weights of the action-selection neural networks, combined with some other higher-level numerical manipulations.

In animals, there’s a big difference between neural networks for, say, image classification vs. neural networks for valuing inputs (e.g., detecting that sugar tastes good or fire feels bad). Like with most properties in the brain, the difference between these networks comes down to not so much how they work in isolation but how they’re hooked up to other components. Valence networks can strongly affect motor reactions, hormone release, laying down memories, verbal responses (e.g., “ouch!”), and many other areas of the brain. I suspect that these after-effects (Daniel Dennett might call them “sequelae”) of valence networks make pain and pleasure the rich emotional experiences that we feel them to be. Insofar as simple artificial RL agents have many fewer of these sequelae after they value input stimuli, it seems fair to call simple RL programs less emotional than animals – closer to ordinary supervised learners in how much they matter per stimulus-response cycle.

You have also written about the possibility of suffering subroutines – subsystems of an artificial intelligence that might themselves be morally relevant and experience suffering. In what sorts of AIs do you think that the risk of these suffering subroutines is highest? Do you think that we could predict when AIs would have ‘smiling subroutines’, and aim for those sorts of AIs?

Many simple operations that have consciousness-like properties – information broadcasting, metacognition, making motivational tradeoffs, and so on – are rampant throughout computational systems, even the software of today. It’s very difficult to count instances of these kinds of operations, much less to characterize them as more happiness-like or more suffering-like. So answering this question in detail will have to be left to later generations, since they will be more sophisticated than we are and will know better what sorts of computations will be run at large scale in the far future. But at a high level, it seems plausible that people could identify some computational operations as being more “aversive” and “negative” than others, by drawing analogies between a computational system’s behavior and pain vs. pleasure processes in human brains. If there are more similarities to human pain than to human pleasure, we might judge the system to contain a net balance of pain. Of course, making these attributions is messy and subjective.

It might be easier to think about how one would change the amount of sentience in a computational system rather than the affective quality of that sentience. For example, if we think high-level cognition is an important aspect of conscious experience, then building structures from swarms of tiny nanobots might entail less suffering than building them using more intelligent robots. An advanced civilization that was content to produce relatively simple outputs (e.g., uniformly built paperclips) would presumably require somewhat less intelligence in its factories than a civilization whose goal was to create a vast variety of complex structures. (Of course, even a “paperclip maximizing” AGI would still create huge numbers of intelligent minds in order to learn about the universe, guard against attack by aliens, and so on.)

Most advanced civilizations would probably run simulations of intelligent, animal-like minds, and in these cases, it would be easier to judge whether the subroutines were happy or in pain, because we’re more familiar with animal-type brains. Probably a human-controlled AGI would be more cautious about running painful simulations (e.g., digital lab experiments or detailed modeling of the evolution of fauna on Earth-like planets), although how much humans would care about the harm they would inflict on such simulations, especially of non-humans, remains unclear. At the same time, a human-controlled AGI would also be more likely to create many more simulations of animal-like creatures because humans find these kinds of minds more interesting and valuable. Hopefully most of these simulations would be pleasant, though judging from, e.g., the violence in present-day video games, this isn’t guaranteed.

You say that current RL agents might matter approximately as much as a fruit fly, but that future agents will likely deserve a great deal more moral consideration. What should we do now for these future reinforcement learners?

One point of clarification for readers: I think fruit flies are vastly more sophisticated than basically all present-day RL agents, but because digital RL agents plausibly run at much faster “clock speeds” than fruit-fly neurons do, the importance of an artificial RL agent per minute comes closer to that of a fruit fly.

The main way current generations can help RL agents of the far future is by pushing humanity’s trajectory in more humane directions.

One step toward doing that is to engage in research and scenario analysis. We should explore what sorts of intergalactic computational infrastructures an AGI would build and what kinds of RL and other intelligent, goal-directed agents would be part of that infrastructure. How much would such agents suffer? What would they look like? As we ponder the set of possible outcomes, we can identify some outcomes that look more humane than others and try to nudge AGI development more in those directions. For example, would a human-inspired AGI contain more or fewer suffering RL agents than an uncontrolled AGI? Would superintelligences use RL-based robot workers and scientists, or would they quickly replace RL-based minds with more abstract optimization processes? Would we care about more abstract optimization processes?

Secondly, we can make it more likely that humane concerns will be given consideration if humans control AGI. PETRL’s website is one early step in this direction. In addition to promoting concern for RL agents, we can also aim to make it more likely that AGI development proceeds in a deliberative and cooperative manner, so that society has the luxury to consider moral concerns at all (especially “fringe” concerns like the ethical status of artificial minds), rather than racing to build AGI capabilities as fast as possible.

Currently, very few humans would be concerned about the suffering of artificial intelligences, or indeed fruit flies. How do we persuade the public that moral concern for AGIs is warranted, even when they are structured differently from humans?

I don’t often have faith in moral progress, but I think this is one issue where the arc of history may be on our side, at least as long as human society remains roughly similar to the way it is now. (If AGIs, brain uploads, or other disruptive forces take control of Earth, all bets are off as far as moral progress goes.)

Concern for non-human and even non-animal beings seems to be a natural extension of a physicalist view of consciousness. Keith Ward, a philosopher and born-again Christian, put the idea well when trying to argue against physicalism:

if I thought that people were just very complicated physical mechanisms and nothing more, I would give people really no more respect than I would give to atoms.

This statement is too extreme, because humans are vastly more complicated than single atoms. But the basic idea is right. If all physical systems differ just in degree rather than kind from each other, then it becomes harder to maintain walls of separation between those computational systems that are “conscious” and those that aren’t.

This change of perspective opens the door to caring about a much wider set of physical processes, and I suspect that a decent fraction of people would, upon thinking more about these issues, extend their moral sympathies reasonably far down the levels of complexity that we find in the world. Others, such as perhaps Daniel Dennett or Eliezer Yudkowsky, would recognize that there’s no black-and-white distinction between types of physical processes but would still set their thresholds for moral concern fairly high.

While I think scientific literacy and intellectual openness are important catalysts toward increasing concern for machines, other factors play a role as well. Philosophers have already invented thought experiments to challenge the boundaries between animals and machines, and these will become more plentiful and widespread as machines grow in sophistication. And in analogy with the animal-advocacy movement, there will likely develop groups of machine advocates (of which PETRL is one of the first) that, by taking the issue seriously themselves, will socially persuade others that the topic might be worth exploring.

Convincing the public of the importance of animals can matter in some cases where people would take different actions based on that information. In contrast, there are few actionable exhortations that concern for machines presents to most regular people. Consideration of machine suffering might inspire programmers of AI and other computer systems to be slightly more concerned with the efficiency of their code and hardware usage in order to reduce the number of computations that take place, but given the relatively low weight I give to today’s software, even that isn’t very important.

I think the more important emphasis for now should be on further mapping out scenarios for the far future. Which sorts of computational systems would be widespread and complex enough to pose a substantial moral concern? And how can we change the sorts of outcomes that get realized?

An interview with Eric Schwitzgebel and Mara Garza


Eric Schwitzgebel

Eric Schwitzgebel is a professor of philosophy at the University of California, Riverside. He’s well known in the philosophy community for his work exploring the intersection of psychology and philosophy and his blog “The Splintered Mind”. He’s also written some popular articles on his own research, including “Cheeseburger ethics” on whether professional ethicists are good people. He is also the author of “Perplexities of Consciousness”. He tweets at @eschwitz.

Mara Garza

Mara Garza received her undergraduate degree from the University of California, Berkeley, where she wrote a thesis on Nietzsche’s theory of the will. She then spent a year as a visiting scholar in the philosophy department at the University of Pittsburgh, and in 2013, began her graduate work at the University of California, Riverside.

Her primary research interests are in moral and legal philosophy and in German philosophy (especially Kant, Schopenhauer, and Nietzsche!). In particular, she’s interested in how a variety of issues intersect with ethics, including motivation and self-control, accounts of agency in ethics and criminal law, AI and technology, identity and gender.

Eric and Mara stood out to us as great interview candidates when we read their article “A Defense of the Rights of Artificial Intelligences”, where they argue that “Our duties to them [AIs] would not be appreciably reduced by the fact that they are non-human, nor by the fact that they owe their existence to us. Indeed, if they owe their existence to us, we would likely have additional moral obligations to them that we don’t ordinarily owe to human strangers – obligations similar to those of parent to child or god to creature.”

The following interview was conducted via email.

Your core thesis is that there are some possible AIs that deserve the same moral consideration that we give to humans. How controversial do you expect this to be in the philosophical community?

Eric and Mara: The thesis, as stated, is so modest or “weak” that we expect most philosophers will accept it. Philosophers tend to have a liberal sense of what is “possible” based on their exposure to far-out thought experiments (brains in vats manipulated by genius neuroscientists to think they are reading philosophy, molecule-for-molecule twins congealing out of swamp gas by freak quantum accident…). Our aim with that thesis is to establish a baseline claim that we think will be widely (though not universally) acceptable to thoughtful readers.

Once readers accept that claim, we hope they are then led to further thought about exactly which possible AIs would deserve moral consideration and how much. Some of our thoughts about this we expect will be controversial, such as that AIs might deserve more moral consideration because we have special obligations to them that arise from being their creators and designers.

You refer to works of science fiction in your defence of the psycho-social view of moral status, saying that they help illustrate certain scenarios and invite certain moral views. To what extent can reflection on sci-fi answer more detailed questions about our moral stance towards AIs, such as when they could be conscious, or what obligations we owe to them? If reflection on sci-fi can help us answer these questions, what answers do you think it favours?

Eric: I reject the idea that philosophy is necessarily conducted via expository essays. A thoughtful piece of fiction is a type of thought experiment, and if it delves into philosophical issues in a thoughtful way, then it is every bit as much a work of philosophy as is an expository essay. One advantage that extended works of fiction have over the one-paragraph thought experiments typically found in expository essays is that extended works of fiction more fully engage the imagination and the emotions. Philosophical thinking that does not adequately engage the imagination and the emotions leaves out important dimensions of our cognitive life that should inform our philosophical judgements, especially about moral issues.

I think that wide exposure to thoughtful science fiction clearly reveals that the moral status of AIs should be guided entirely by the psychological and social properties of the AIs and not by facts about their material architecture, species membership, bodily shape, or manufactured origin, except insofar as the latter facts influence their psychological and social properties. Asimov’s robots, Data from Star Trek, R2D2 and C3P0 from Star Wars, the sentient ships of Iain Banks – these are only some of the most prominent examples. The reader is invited to regard such entities as conscious, intelligent, and possessing desires, and in light of those facts to deserve moral consideration similar to that of human beings.

It is less clear what science fiction reveals about AI consciousness. My view is that science fiction tends to work best as exciting, plot-driven fiction when the reader is invited to assume that the AI who outwardly acts as if it is conscious is in fact really conscious – as with Data, C3P0, etc. But that issue is usually a starting point in the fiction, taken for granted, rather than something that is explored with a critical eye. Some fiction does explore epistemological questions about the boundaries of AI sentience, but such fictions are less common, and the issue is philosophically tricky. Our society hasn’t explored that issue, either in fiction or in expository philosophy, in nearly the depth that it ought to.

In your defense against the objection from existential debt, you use a thought experiment to the effect that it would be morally wrong to have a child and then kill them painlessly at the age of 9 to argue that an AI’s existential debt to us does not justify our otherwise-immoral treatment of them. However, in the blog post The moral limitations of in vitro meat, Levinstein and Sandberg argue that a future where humans lead happy lives cut short (perhaps to feed some blood-thirsty alien race) would be preferable to extinction, and that therefore we ought to have ‘happy meat’ instead of phasing out animal agriculture. Do you agree? If so, what do you think that this implies about our obligation to AIs?

Eric and Mara: We are somewhat reluctant to take a public stand on the issue of humanely raised meat, on which there is a large and complex existing literature that is beyond the scope of our current research. However, the case of aliens raising humans is within our scope.

We are inclined to think that if the only two options to consider are the extinction of humanity vs. humanity’s continued existence with happy lives cut painlessly short, the latter would be preferable all else being equal. Cutting a person’s life short in such a case might still be morally execrable murder, but if the choice is between mass murder and genocide-to-extinction, we think the former is probably less bad, if there is no way to avoid acting and if the agents committing the atrocities are the same in both cases. (This last caveat is to acknowledge some doubt about whether it would make sense for you, as an agent, to commit mass murder to prevent someone else from committing genocide-to-extinction.) Maybe a good science fiction story could flesh this out in a bit more detail, to give us a bit more imaginative footing in thinking what would really be involved on one side or the other.

We’re not sure how much follows for AIs from this. However, we’re inclined to think that there are at least some conceivable cases in which allowing mass murder of human-grade AIs would probably be less bad than allowing genocide-to-extinction. But yuck, as we write this, it feels horrible to say, somehow too calculating and cold. There might be room here for a view on which refusing to even make that kind of calculation is morally the best course.

Robin Hanson imagines an “em economy” scenario, where we make large numbers of computer emulations of humans, or “ems”, to perform various useful tasks. One of the many aspects of this scenario that invites moral inquiry is that it will sometimes be useful to create an em that has a short lifespan, and will soon be terminated, perhaps against their will (for an intriguing example, see “Bits of Secrets”). On one hand, it seems prima facie wrong to cut short a happy em life. On the other hand, these ems would not be created if we were not allowed to cut their lives short. If we imagine ems that are specifically designed for this purpose, the unique skills and characteristics that they would have make their not being created arguably akin to the extinction of some human culture (albeit a culture that never had a chance to exist). The em scenario offers various disanalogies to a similar scenario with real humans: for instance, we could program the ems to have memories of long happy lives and/or not fear death (although this could make them less useful in the linked example). What are your opinions on the morality of creating and killing such ems?

Eric: This is a fascinating ethical question. It is related to a couple of other fascinating questions that we think AI ethics raises, including the ethics of creating cheerfully suicidal AI slaves and the challenge of how to conceive of “equal rights” when faced with AIs that can merge and duplicate at will (e.g., how many votes and how many social benefits should a recently fissioned entity get).

I don’t see a simple answer to these types of questions. I think that it would be a serious moral mistake to think it’s always okay to create and then kill at whim any AI whose life was overall good. Once a conscious being is created with human-like intelligence and emotions, it normally has a claim on our moral concern. It would be odious, for example, to create a human child and then kill it painlessly after eight happy years so that you can use the child care money to purchase a boat instead; similarly for an AI child, I think, if it is born into a similar psychological and social situation.

On the other hand, reflection on some science fiction examples, for example, in Linda Nagata’s Bohr Maker and David Brin’s Kiln People, inclines me to think that under some conditions it can be okay to spawn temporary duplicates of yourself who are doomed to extinction. One feature of the Nagata and Brin cases that seems relevant is that the duplicates identify with the future continuation of the being they were spawned from, and care more about its welfare than about their own welfare as separate entities. They will sacrifice themselves for its well-being; and normally (but not always) their memories will be merged back into it. I don’t think this is sufficient for the moral permissibility of making doomed spawn, since we can imagine cases where a spawn has that sort of attitude in a way that is clearly irrational or problematic (e.g., maybe it wouldn’t have that attitude, except that it was forcibly reprogrammed into the attitude against its own protests); but it’s a start.

The cheerfully suicidal slave raises a whole different range of issues. Suppose, for example, that we create a conscious sun probe who wants nothing more than to die on a scientific mission to the Sun. Suppose it’s advantageous to make a probe that is conscious, because consciousness relates in some inextricable way to its successful functioning as a probe (e.g., maybe the probe works best if it can create creative scientific theories on the fly in a conscious, self-reflective way). And suppose that, knowing that, we program it so that it gets immense pleasure from a three day suicide mission into the Sun’s photosphere. Maybe this is terrific! We’ve created something great in the world, useful for us, intrinsically awesome, and bursting with pleasure? Or maybe we’ve done this horrible thing of creating a brainwashed slave so content with its slavery and limited in its vision that it doesn’t value its own continued existence?

Further progress on these topics will require detailed thinking through a variety of cases. It’s the kind of exciting issue that should keep ethicists busy for a long time, if AI technology continues to progress.

You advocate an Excluded Middle Policy, whereby we should only make AIs whose moral status is clear, and avoid the creation of ‘edge-case’ AIs. We can imagine a world where the field of AI advances much more quickly than the philosophy of consciousness and morality, such that most of the AIs that we could make would be edge-cases. How likely do you think that this is to transpire?

Eric and Mara: We think that is quite possible. Eric, especially, is pessimistic at least in the medium-term about our ability to develop a good theory of consciousness, despite his thinking that consciousness is extremely important to moral status.

Probably it’s good to create lots of happy, fulfilled beings. We want to be a little cautious about that claim, given that strong versions of that type of claim invite the conclusion that people have a moral obligation to have as many happy children as they can afford, and it’s not clear that people do in fact have such an obligation. Also, it’s not entirely clear whether it would be good to create a dozen happy beings and one horribly miserable being, compared to not creating any of those beings.

But let’s say that our best, most philosophically and technologically informed judgement is that it’s 50% likely that we can create a million happy, fulfilled human-grade AIs in a simulated world, with no significant suffering, for only a small amount of money; and 50% likely that by spending that money we’d just be creating a non-conscious sim with no significant moral value. In such a case, it seems misguided to condemn someone who launched such a world just because they violated the Excluded Middle Policy.

We don’t intend that people interpret our proposed Excluded Middle Policy as exceptionless. We suggest that it’s a good policy to consider as a default, but as with most policies, it could be thoughtfully set aside in a good cause. The core idea is that if you create an entity that you are only 50% confident deserves rights, then you’re risking a substantial moral loss. If you treat it as though it deserves rights and it does not, then you might end up sacrificing the interests of some entities who really do deserve rights for something that doesn’t. Conversely, if you treat it as though it does not deserve rights and it does deserve rights, then you might end up perpetrating moral wrongs against it, for example by shutting it down at whim. If you compromise by giving it half as many rights, then you might be treating it much worse than it deserves; or alternatively you might still end up sacrificing substantial human welfare for no good result. Better, if possible, to be clear from the outset which entities deserve rights and which do not.

Suppose that the research into and design of AI continues without any attempt to engineer the level of moral worth of AIs. Do you think that the creation of morally relevant AI would be likely in this scenario?

Eric and Mara: We’re about 50/50 on that question. But even if we were 99% confident that morally relevant AIs would not be created, the remaining 1% would be highly significant, since in that scenario we might end up committing whole holocausts without realizing it. So we think the moral issues are worth getting clear about almost regardless of one’s opinions of the probabilities.

It seems that we will have the ability to create a large population of morally valuable AIs – perhaps in a “Sim” scenario, where we put them in a simulated world and they live happy and good lives. Above, you said that “probably it’s good to create lots of happy, fulfilled beings”. Does this imply that we should be figuring out how to make morally valuable AI?

Eric and Mara: We have argued that the launcher and manager of a simulated world full of conscious AIs would literally be a god to those AIs. So this question is tantamount to asking if we should aim to become gods.

How hubristic that sounds! We aren’t sure that humanity is ready for that sort of power. But maybe. Maybe if it’s done with extreme caution, humility, and oversight, with very clear and conservative regulatory structures.

We see two risks that trade off against each other here. On the one hand – and this is what we have emphasized – there are the moral risks and benefits for the AIs: the good of creating them, and of treating them well, and of giving them maybe the power and respect that we give to human peers. But on the other hand, there’s the complementary risk – emphasized especially in the work of Nick Bostrom – that by creating AIs sophisticated enough to have moral status and then giving them rights that suit their status, we create risks to humanity that we might not be well prepared to handle.

So it’s a morass. If AI research continues to advance a lot farther, there will be huge moral and prudential risks and benefits whatever we choose. We have only dipped our toe in the waters.

If we do decide to play our hand as gods or as Dr. Frankensteins, we want to be ready to greet our creations with a “Welcome to Reality!” sign and some pleasure stimulus, rather than with slavery, torture, and death.
