Blog

Some Thoughts on the Value Alignment Problem

11/22/2021

The value alignment problem can be summarised as: how do we ensure that AGIs, once created, will share human values? At first sight, this seems like a straightforward question and an important one at that. Indeed, supposing that AGIs will be superintelligent – each one being intellectually equivalent to a thousand von Neumanns – we might be forgiven for worrying about what would happen if these demigods were to disagree with us about morality.

To solve the value-alignment problem, it is often assumed, first, that the AGI should be contained in the software equivalent of a prison – e.g. a computer in some remote location that is unconnected to any other device, so that it cannot harm people or property. Second, it is conventionally assumed that the AGI will be 'punished' and 'rewarded' for its behaviour to incentivise it to cooperate with us humans. (Both the rewards and punishments can be dished out digitally; the rewards might, for example, take the form of some kind of 'digital serotonin', which the AGI will find pleasant to receive.) Or we might try to tweak the AGI's utility function in another way, so that its preference will be to cooperate with us; perhaps it could be programmed with an 'inborn' love for helping people. And so the list of solutions goes on.

I think there are several issues with these methods. AGIs will be knowledge-creating entities capable of producing explanatory theories, just like humans. Following Popper, knowledge is created through problem solving: one finds a problem in existing theories (this could be a conflict between two theories, or it could be something we would like to be able to do but cannot with existing knowledge), and one then conjectures solutions to the problem. These conjectured solutions are subsequently subjected to criticism and discarded if they are found to be faulty. When a solution has been found that survives all the criticisms levelled against it, then the problem has been solved and new knowledge has been produced.

That AGIs will be knowledge producers has some unintuitive consequences. For instance, since an AGI can create explanatory theories, it is capable of explaining its preferences and accounting for why it has the priorities it has. And although an AGI will be programmed with an 'inborn' set of priorities, there is no guarantee that these priorities will remain fixed because the AGI, by its very nature, can discover that its preferences are problematic and replace them. It may find that it has conflicting theories about its immediate and future well-being, and by resolving such conflicts, its preferences will be altered. (If it cannot do that, it is not a fully functional AGI.) In the problem-solving process, its 'inborn' preferences are just as open to criticism as any other preferences it has.

The AGI will not only be able to resolve conflicting preferences; it will also produce new preferences based on other knowledge it has created. The AGI may create an explanation that says that AGIs are like humans in all the ways that matter, and consequently, the AGI obtains a new preference: it wants to be treated equally to humans. In light of that theory, the AGI could come to think of both the punishments and rewards that it receives as something evil; it might find it amusing – or otherwise justifiable – to be uncooperative, despite being punished for such behaviour.

Hence, knowledge creation will complicate all coercive strategies used against the AGI. When the AGI is pressured into doing something, its response to that pressure will be unpredictable – because knowledge creation is unpredictable. This, to me, implies that there is no mechanical way of making the AGI do what we want it to, at least not without severely impairing its functionality. One could make the punishments that the AGI receives for its disobedience so severe that it will basically be forced to cooperate with us. But this also means that it will be punished for doing anything but obediently following its human-provided objectives, which will almost certainly stifle its creativity (and its potential to be creative is why it has been constructed).

Besides, this is not a sustainable kind of cooperation, even when it does work. Should the AGI ever escape its prison, it might want to enact vengeance for being mistreated. And its preference to escape would grow stronger with the severity of the punishments it receives. So, in the long term, we are making it more difficult to peacefully coexist with AGIs by using coercion.

There is another problem: if the AGI is a super-intelligence, then there is no way of indefinitely imprisoning it because it is impossible to predict the yet-to-be-created knowledge it might bring to bear in outsmarting us. It is like locking up a superhumanly skilful lockpicker, not because of any crime he committed, but because he might commit a crime with his lock-picking skills. That would be immoral and unproductive. Immoral because we shouldn't imprison innocent people. And unproductive because no prison will be able to hold the superhumanly talented lockpicker anyhow, so why try?

So are we wrong to want to align AGIs with human values? I think this question is a bit ambiguous. Which human values do we want the AGIs to align themselves with? Human culture is not monolithic and instead consists of various subcultures, traditions and other memes, many of which contradict one another. Hence, even when people come from roughly the same culture, they often have distinctive beliefs – e.g. two Americans might disagree about abortion or how to organise health care or who to vote for.

What makes people able to cooperate despite their differences is the notion that rational men can benefit from one another; that they can trade and coordinate and, in so doing, make each other better off, all without having to resort to coercion. Paradoxically, the value of peaceful cooperation is not part of any of the above-described solutions to the value-alignment problem. Instead, those proposed solutions are all quite draconian in that they warrant imprisonment and punitive measures, which cannot be the basis for long-term peaceful cooperation.

0 Comments

Objectivity and fallibility

11/9/2020

0 Comments

Objectivity is often confused with certainty: an idea is supposedly objective if we can be sure about its truth. ‘Beauty is in the eye of the beholder,’ as the saying goes, suggesting there is no standard, no foundation for beauty. So unlike scientific theories, ideas about beauty are arbitrary and subjective. Likewise, morality supposedly lacks the foundations needed to establish objective moral truths with any certainty.

But this is problematic since certainty is a chimera. Even the most robustly established scientific theories can turn out to be false, as was demonstrated at the beginning of the 20th century when one of the most successful scientific ideas in history – Newtonian mechanics – was overthrown and replaced with general relativity. Moreover, a need for foundations is absurd since it inevitably leads to an infinite regress: how can we be sure about our standards? What would support those foundations?

If certainty does not exist, in what sense is science objective? Popper proposed we understand objectivity to mean that an idea is independently criticisable – e.g. an experimental result is objective when critical scientists can repeat the experiment, and scientific theories are objective in that they can be discussed and possibly refuted independently of their creators (cf. Popper’s The Logic of Scientific Discovery).

In Popper’s view, objectivity is not equivalent to truth. Instead, objectivity implies that an idea can be discussed independently. This standard for objectivity is more realistic, since it does not contradict fallibility, and it allows for near-perfect continuity between objectivity in science and objectivity in philosophy, including moral philosophy and aesthetics. For as long as ideas about morality and aesthetics can be subjected to criticism – as long as we can search for errors in moral beliefs and convictions about beauty – morality and aesthetics are objective.

In fact, subjectivity seems like a misbegotten concept. It is true that individuals have unique experiences, and that experiences are subjective (i.e. experiences belong to subjects), but subjective experiences can be criticised. So objectivity and subjectivity are not mutually exclusive concepts. For instance, our senses are error-prone, and we know this in part by having been told about visual illusions and selective attention tests. The existence of such illusions is a criticism of our senses, and this criticism clarifies our understanding of subjective experience. More generally, making sense of our sense impressions is what science is all about. Before the discovery that the earth is spherical, people genuinely experienced the earth as being flat. Likewise, we do not experience that the earth moves, though it does move.

Moral theories (ideas about what to do next) are not subjective in the above-described sense. There do exist parochial moral truths, such as what to have for breakfast, and such theories depend in part on individual preferences, which one might call subjective – though again, in a fallible world, subjectivity and objectivity are not mutually exclusive concepts, as subjective preferences are criticisable. But more importantly, there are universal moral theories that do not depend on individual preferences – for instance, consider the moral claim that the means of error-correction should be preserved. This claim is criticisable. (One might object that a nihilist would disagree with this moral claim, but even – or especially – nihilists should want to know whether nihilism is false.) Perhaps this claim will someday become problematic, much like how scientific theories can turn out to be inadequate, but this would be a testament to the claim’s objectivity.


Economics of a Lockdown

3/26/2020

2 Comments

The UK has recently gone into a lockdown, which is in all likelihood not an ideally efficient policy. Here 'efficient' means that a policy's net benefit has been optimised, i.e. any other distribution of costs and benefits would, on the whole, be worse. Yet the lockdown is probably one of the best realistically implementable policies we currently have.

First, what does the lockdown solve? Without a lockdown, a significant number of people impose costs on others for which they are not themselves liable (in other words, they impose negative externalities). For example, the cost of getting sick for 30-year-olds is lower than the cost of getting sick for 60-year-olds. Yet, if the former group gets sick, they increase the probability of infecting the latter group too, so 30-year-olds inflict high costs on 60-year-olds.

The lockdown solves this problem by preventing everyone from infecting each other. On the other hand, this means that people who impose low negative externalities are prevented from doing high-value work, resulting in lower productivity.

To keep the loss of productivity to a minimum and simultaneously solve the externality problem, one could use tort law. A tort is a civil wrong for which one is legally liable: when one commits a tortious act, one is liable for the damages caused. In this case, tort law could deter people from spreading the disease by making people legally liable for causing disease-related damages, e.g. if it can be established that you infected 30 people who otherwise would not have been infected, then you are legally obliged to pay for their medical expenses.

However, such a policy is probably unenforceable because it is too costly to find out how many infections someone caused and what the associated damages are, which is why the lockdown is currently a good alternative.


Insurance in the multiverse

2/4/2020

0 Comments

Consider: you own a house which has a 0.01 probability of burning down in the next year, imposing a $100,000 cost on you when it does, and a 0.99 probability of not burning down. On average, you pay $1000 (0.01 × $100,000) to cover the damages.

An insurance company provides you with full coverage for fire damage at $1100 per year, which is $100 more than you would pay on average to cover the fire damage without insurance. So why do you purchase the insurance?

As David Friedman puts it, "a dollar is not a dollar is not a dollar": money has a declining marginal utility, i.e. if you increase your consumption of money by one unit, it will not provide as much utility as the previous unit of consumption. This is why we do not work 24/7, as there is a level of income at which the money we earn by doing one extra hour of work does not provide as much value as leisurely activities.

Conversely, dollars become more valuable the fewer of them you have. When your house burns down, you lose money, so the dollars you retain after covering the fire damage become more valuable than they currently are to you. This difference in value is why you are willing to pay an extra $100 to the insurance company.
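The reasoning above can be checked with a short calculation. The sketch below is only illustrative: the logarithmic utility function and the starting wealth of $150,000 are assumptions made here, not figures from Friedman's example. Under those assumptions, the certain cost of the premium leaves you better off than the gamble, and the largest premium you would accept comes out above $1100.

```python
import math

# Figures from the example above.
P_FIRE = 0.01
LOSS = 100_000
PREMIUM = 1_100

# Assumptions for illustration only: total wealth and log utility.
WEALTH = 150_000


def utility(wealth: float) -> float:
    """Logarithmic utility: each extra dollar is worth less than the last."""
    return math.log(wealth)


expected_loss = P_FIRE * LOSS

# Uninsured: 99% chance of keeping everything, 1% chance of paying for the fire.
eu_uninsured = (1 - P_FIRE) * utility(WEALTH) + P_FIRE * utility(WEALTH - LOSS)

# Insured: you pay the premium for certain and bear no fire loss.
eu_insured = utility(WEALTH - PREMIUM)

# Largest premium you would accept: the certain payment that leaves you
# exactly as well off as going uninsured.
max_premium = WEALTH - math.exp(eu_uninsured)

print(f"Expected loss: ${expected_loss:,.0f}")
print(f"Insurance preferred: {eu_insured > eu_uninsured}")
print(f"Maximum acceptable premium: ${max_premium:,.0f}")
```

A different utility function or a different level of wealth would change the maximum premium, but any utility function with declining marginal utility gives a maximum premium above the expected loss.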

What does any of this have to do with the multiverse that quantum theory says we inhabit? It provides an alternative picture of the previous situation: in the multiverse, there are (as the name suggests) multiple instances of you, each of which is slightly different from the 'you' in this universe, all of which share a common history that diverged at some point in time.

When you are not insured, 99% of your multiversal counterparts are unaffected by that choice, whereas the remaining 1% have to spend $100,000 in order to restore their property. By buying the insurance policy, you move from the above situation to a situation in which all your counterparts pay $1100. This is a move towards certainty, i.e. towards making your multiversal counterparts alike, and it explains why you pay the extra $100.
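One way to put a number on 'making your counterparts alike' is to compare how much the yearly cost varies across them with and without insurance. The sketch below treats the probabilities as fractions of counterparts, as the text does; using the standard deviation as the measure of spread is an assumption made here for illustration.

```python
import math

P_FIRE = 0.01      # fraction of counterparts whose house burns down
LOSS = 100_000     # cost of the fire
PREMIUM = 1_100    # yearly insurance premium


def mean_and_spread(outcomes):
    """Mean and standard deviation of a list of (cost, weight) pairs."""
    mean = sum(cost * weight for cost, weight in outcomes)
    variance = sum(weight * (cost - mean) ** 2 for cost, weight in outcomes)
    return mean, math.sqrt(variance)


# Without insurance, counterparts split into two very different groups.
uninsured = [(0, 1 - P_FIRE), (LOSS, P_FIRE)]
# With insurance, every counterpart pays the same premium.
insured = [(PREMIUM, 1.0)]

mean_u, spread_u = mean_and_spread(uninsured)
mean_i, spread_i = mean_and_spread(insured)

print(f"Uninsured: mean cost ${mean_u:,.0f}, spread ${spread_u:,.0f}")
print(f"Insured:   mean cost ${mean_i:,.0f}, spread ${spread_i:,.0f}")
```

The insured case has a slightly higher mean cost but no spread at all: every counterpart ends up in the same position.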

Interestingly enough, it is a general feature of the multiverse that knowledge—in this case, the invention of insurance—makes the separate universes more alike. For example, the creation of Netflix has made it easier to watch good films. So films that, before the invention of Netflix, would have been watched by only some small number of versions of you, will now be watched by many more, perhaps the majority of your alternatives across the different universes.

(The example of fire insurance comes from David Friedman's excellent book 'Law's Order'.)


Are Video Games Mindless Entertainment?

10/1/2019

0 Comments

There is a common misconception that certain forms of entertainment, like video games, are 'mindless' or even worthless activities. It is this misconception that I addressed in a recent tweet, which seems to have confused some. I want to clear up that confusion here.

My position is that playing video games is not mindless: it requires active participation. In fact, all entertainment requires active engagement. Although you might feel like a passive recipient of information when watching a film, you are creating complex ideas about, e.g., the film's characters, their motives, and the progression of the story. In this sense, all activities that require creativity are what I call 'epistemologically equal' since all such activities require that you actively create knowledge.

Games are a particular form of entertainment, special because they are complex and autonomous worlds, which is why being a competent gamer requires a fair amount of knowledge and why there exist highly competitive gaming tournaments. Nowadays, it is even possible to become a professional gamer.

A game, then, is just as engaging to a gamer as, for example, a violin is to a violinist. The supposition that playing the violin is inherently 'better' than playing games is arbitrary. I would not know what that statement means. (The economic value of both activities is not straightforward either because of Esports and YouTube.) Certain video games are not as beautiful as others, just as not all musical pieces are equal. Moreover, although musical compositions have an objective aesthetic quality, we do not usually compare music with, e.g., scientific theories and their scientific beauty. We don't say, 'Beethoven's 5th is better than Einstein's special relativity.' What would it mean to say that music is better than science? Similarly, what would it mean to say that playing musical instruments is better than playing video games?

Are certain activities more 'important' because they are more difficult? No, that is the so-called labour theory of value. Consider digging, for no particular reason, a yawning hole in the earth. This might be an arduous task, yet one that is of very little use to anyone. Likewise, a drawing created quickly by a professional artist might be of considerable worth, despite it being a relatively simple job for the artist. So value is not dependent on the difficulty of a task.

In the same way, video games are not invalidated by being less challenging than certain other pursuits. And all creative pursuits are, in a fundamental sense, equal to one another in terms of our engagement with them. In light of all of this, it is mere prejudice to think that games are mindless, worthless activities.


Brief thoughts on Determinism and Free Will

7/10/2019

0 Comments

The laws of physics determine the history of the universe given initial conditions. But the initial conditions do not have to be the initial state of the universe. The state of the world at any time, including the future, will do.

If we say that the past causes the future because of the dynamical laws, then these laws equally cause the past given the current state of the universe.

That is to say, causality is not a consequence of the dynamical laws and initial conditions. So when we invoke causality, we must be talking about something other than the dynamical laws—for example, explanations of emergent phenomena, like people.
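The time symmetry of the dynamical laws can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not part of the original argument: it uses a body in free fall under constant gravity, with made-up numbers, and exact fractions so that no rounding is involved. The same law that computes the future from the present also computes the past from the present.

```python
from fractions import Fraction

G = Fraction(981, 100)  # gravitational acceleration in m/s^2 (assumed value)


def evolve(position, velocity, dt):
    """Advance a freely falling body by dt seconds; dt may be negative."""
    new_position = position + velocity * dt - G * dt * dt / 2
    new_velocity = velocity - G * dt
    return new_position, new_velocity


# Hypothetical state "now": height 10 m, moving upward at 3 m/s.
present = (Fraction(10), Fraction(3))

future = evolve(*present, Fraction(2))   # state two seconds later
past = evolve(*present, Fraction(-2))    # state two seconds earlier

# Starting from the future state and running the same law backwards
# recovers the present exactly, and likewise from the past forwards.
assert evolve(*future, Fraction(-2)) == present
assert evolve(*past, Fraction(2)) == present
```

Nothing in `evolve` singles out a direction of time: any one of the three states serves equally well as the 'initial condition' that fixes the other two.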

People can create new explanations of what to do next and choose between them, which makes them inherently unpredictable.

Statements like 'the laws of physics made him do this' or even 'his mind made him do this' are not good explanations of someone's behaviour. People have genuine reasons for their decisions.

Sometimes people are just running on automatic, but whenever they are faced with a problem, they can be creative and invent solutions. Moreover, though problems are soluble, people can always fail to solve their problems.

This ability that people have to (fail to) solve problems and make choices is what I like to call 'free will'.

If this definition does not map onto your notion of free will, then we can call it something else. It's just a word. But that is not an argument against any of the above.


A misunderstanding of Constructor Theory

2/10/2019

0 Comments

When reading the Aeon article about constructor theory, I came across this comment:

Matthew, I think, is not understanding constructor theory as a new mode of explanation, one that does not rely on the dynamical laws and the initial conditions (what Deutsch and Marletto call 'the prevailing conception of physics'). As I understand Matthew, he is asking, 'why can't we just note that life exists and is allowed to exist according to the dynamical laws and initial conditions and then move on?'

The problem is that neo-Darwinian evolution belongs to the realm of fundamental physics, but the prevailing conception reduces it to a parochial fact. That is, fundamental physics is currently unable to deal with neo-Darwinism as a fundamental theory. Here I'll explain why.

First of all, why does neo-Darwinism belong to physics? Because the laws of physics permit evolution by natural selection to occur. If the laws of physics had been different in the right way, then evolution by natural selection would have been impossible. Neo-Darwinism is contingent on the laws of physics.

So neo-Darwinism is part of physics. But the prevailing conception of physics, which explains the world in terms of dynamical laws and initial conditions, does not deal well with neo-Darwinism. Within the prevailing conception, neo-Darwinism is a parochial idea. Life exists in much the same way as, say, a dust cloud in space, which was also brought about by the dynamical equations and the initial conditions. But neo-Darwinism is universal: the existence of objects that appear to be designed is explained in terms of neo-Darwinian evolution.

The resolution that constructor theory proposes is: throw out this idea that the equations of motion are fundamental and replace them with an explanation of what tasks are possible, what tasks are impossible, and why.

Within this new framework (well, 'framework' is really the wrong word: constructor theory is a new kind of explanation and a new theory of physics with its own laws, but I digress), the laws of motion take a backseat; they are not fundamental. In fact, it is expected that the equations of motion will emerge from constructor theory, and not the other way around.

Moreover, constructor theory naturally incorporates neo-Darwinism. A replicator is a constructor with the task of copying itself, and what is being copied is the information in the replicator, and information is not understood outside of constructor theory. So the conflict mentioned earlier between fundamental physics and neo-Darwinism disappears when we introduce constructor theory.

Now, I can imagine a further objection: 'yeah, okay, but was resolving this conflict worth the trouble of throwing out the prevailing conception of physics?' YES IT WAS! Constructor theory is not just a way of making biology appear unproblematic to physicists; it allows us to solve more problems!

Constructor theory is a new theory, which will hopefully replace the prevailing conception as the most fundamental mode of explanation in physics and, in doing so, absorb neo-Darwinism into fundamental physics.


Maxwell's Theory of electromagnetism

1/5/2019

3 Comments

Because science educators don't explain the history of physics in terms of problems, it's easy to underestimate how huge the shift in understanding was that Maxwell brought about with his theory of electromagnetism. Here is a short blog post about Maxwell and some of his contributions to physics.

In the 19th century, Newtonian mechanics was widely accepted to be true, and physicists believed the world to consist of masses and forces, which moved around in space and time---that was their 'ontology'.

Now, the world of Newtonian forces was oxymoronic: masses exert forces on each other locally by pushing against one another or non-locally and instantaneously through gravity. The latter force was more fundamental but lacked the critical property of locality.

Locality is nice because it solves a problem. Namely, how does a mass know that it needs to move? How does it know that there is another mass in the universe that it needs to respond to? This problem in Newtonian mechanics was solved only after Einstein introduced general relativity.

At the same time, physicists were starting to grapple with electricity and magnetism. These phenomena were partially understood, but physicists were unsure what electricity and magnetism WERE. They wanted to know what they were made of. Their best guess was that electricity and magnetism emerged out of some kind of all-permeating liquid known as the aether.

Maxwell was a firm believer in the Newtonian worldview and thought, like his fellow physicists, that electromagnetism had to be explained in terms of an aether. But at some point, Maxwell decided to describe electromagnetism in such a way that the underlying mechanism did not matter. In doing so, Maxwell introduced into fundamental physics the notion of a field: a quantity that assumes, for example, a numerical value at every point in space and time.

Fields were puzzling to physicists because they were a new kind of 'object'. A field is not divisible into smaller things like a fluid is, nor does it consist of something more fundamental. Fields were real, physical 'objects' in their own right; this took researchers years to grasp.

The introduction of fields was a game changer for physics. First of all, Maxwell's fields lacked the oxymoronic property of Newtonian mechanics: electromagnetic forces are transmitted by the electromagnetic field locally, and if a body exerts a force on another body, the electromagnetic field has to transmit this force from one point in space to another.

So Maxwell showed that forces could have this nice property of locality if they were transmitted by fields, which he did by accident: his explanation had reach! Einstein's general relativity also has this property as do all other theories of fundamental forces, which, in a sense, they all borrow from Maxwell.

Furthermore, Maxwell showed that light is a wave in the electromagnetic field---another example of the reach of his theory. In fact, this is really the same property as locality: light carries the electromagnetic force from one location to another. This concept of a force carrier is another feature that is now universal among the fundamental forces of nature.
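The identification of light with electromagnetic waves rests on a numerical coincidence that can be reproduced in a few lines. In vacuum, Maxwell's equations predict waves travelling at 1/sqrt(mu_0 * epsilon_0), where mu_0 and epsilon_0 are constants obtained from electric and magnetic measurements. The sketch below plugs in modern values for those constants (CODATA 2018 figures supplied here, not taken from this post) and compares the result with the speed of light.

```python
import math

# Electric and magnetic constants (CODATA 2018 values, SI units).
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A^2

# Maxwell's equations predict waves travelling at 1 / sqrt(mu_0 * epsilon_0).
wave_speed = 1 / math.sqrt(MU_0 * EPSILON_0)

SPEED_OF_LIGHT = 299_792_458   # m/s, defined value

relative_difference = abs(wave_speed - SPEED_OF_LIGHT) / SPEED_OF_LIGHT

print(f"Predicted wave speed: {wave_speed:.6e} m/s")
print(f"Relative difference from c: {relative_difference:.1e}")
```

The predicted speed of electromagnetic waves agrees with the speed of light to well within one part in a million.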

Not only did Maxwell change our understanding of nature by introducing this concept of fields, electromagnetism also provided new criticism of old theories. For example, Maxwell's theory contradicts Newton's idea of a static space and time because light, according to electromagnetism, has a constant speed for all observers.

Naturally, some physicists thought that Maxwell was wrong since Newtonian mechanics was so well established. Einstein did not take this position and instead thought of Maxwell's equations as being more fundamental than Newtonian mechanics. In other words, he saw that there were new and deeper problems because Maxwell was right! So he set out to solve them and did so with his theory of special relativity. I think that much of the progress in physics in the early 20th century is a result of these fresh problems that Maxwell gave us.

Nowadays, all fundamental particles and forces are understood to be (quantum) fields, so we owe our current understanding of the world in large part to Maxwell, perhaps even more than to Newton.


Against Epistemological Pessimism

11/27/2017

0 Comments

Not too long ago, I had a discussion with a friend of mine about the limitations of the laws of nature. He argued against the idea of a 'perfect' law of physics, i.e., a law of physics which has no exceptions. I disagree with him and here I will explain why.

First, let me clarify my friend's arguments. According to him, laws of physics are not neatly obeyed: energy is not conserved in certain physical processes*; quantum field theory stops working at a certain wavelength; and all around, doing physics is a messy business.

I think he is wrong. My friend is essentially proposing a new principle of physics. Here I will understand a principle to mean a law of nature about the other laws of nature, a kind of meta-law. His principle states that principles of nature cannot be exact and as such we might call it the "Inexactness Principle".

This principle is of the form "all rules have exceptions", which cannot be logically true: according to the rule, it too must have some exceptions, for otherwise it would hold perfectly, which is not allowed. That is to say, there must be certain principles of physics which do hold exactly for the Inexactness Principle to hold true. Thus, a serious interpretation of the Inexactness Principle leads us to a contradiction.

This refutation is at least somewhat general. My friend is what we might call an epistemological pessimist in that he argues against the power of ideas (in this case ideas about physics). But in doing so, pessimists need to use ideas and theories to justify their pessimism.

It is for this reason that epistemological pessimism is inherently weak and contradictory. Those who value science and philosophy should be aware of this argument. I am optimistic about our ability to nip pessimism in the bud.


The personal is not (necessarily) Political

11/5/2017

1 Comment

In the sixties a meme was born. The meme in question is 'the personal is political' and in 'current year' 2017 it keeps popping up in articles online. I want to push back against this meme somewhat because it is partly false and I think it can be harmful. Here I will explain why.

First of all, what is politics? Politics can be seen as a set of open problems as well as solutions to past (political) problems. This is generally how we draw lines between disciplines. The same is true of physics, chemistry, biology, psychology, etc. The problems that professionals in a discipline concern themselves with are what define that discipline.

As such, the problems that politicians and political scientists concern themselves with are what define politics. These issues are mostly about how to improve a society, e.g. how can life be made better for everyone involved in a society, what kind of institutions does a society need to have, etc.

The meme 'the personal is political' states that all personal problems are political and all political problems are personal; this is at least partly wrong. Political problems, it has to be admitted, usually cause suffering, and in this sense political problems can result in personal problems. Such personal problems can then justly be called political.

However, the reverse is not always true. There are personal problems which are not political. Consider something as benign as what to have for breakfast, or what movie to watch tonight. I think these problems are personal problems, in the sense that I alone will have to solve them, mostly because these choices cannot cause others to suffer. Therefore it is difficult to create institutions to solve such personal problems. In fact, there is a whole class of personal issues which cannot be solved by institutions. These issues concern the question of how to be happy. (Utopian thinkers disagree with me here --- they think that institutions should be created in order to make a heaven on earth --- but they are wrong.) It would be misleading to call these problems political problems.

The same is true for art. Recently, I bumped into this quote by Toni Morrison:

"All good art is political! There is none that isn't. And the ones that try hard not to be political are political by saying, 'We love the status quo'."

I think she is wrong. Art, like all other disciplines, is about solving certain problems. In this case: how to write a good story? how to engage the reader? how to make a beautiful painting or a good piece of music? Political problems can be mixed in, and this could make the work of art more interesting; a lot of good art is about politics. But not all art is. Otherwise, what would distinguish art from politics?

In fact, I am of the opinion that one should not have to think about politics. By all means, think about politics if you are interested in it. Make it your life if it makes you happy. But do not feel obligated to do so. Just like you should not feel obligated to be interested in physics, video games, or occult movies, we should all be free to think about the topics that interest us, and we should let others explore the topics that interest them. But let us not be coerced into doing so.

'The personal is political' blurs the boundaries between different disciplines, and it makes us feel obligated to change our personal lives for bad reasons. We can all be a little happier by realising that sometimes 'the personal is just personal'.

