
Monthly Archives: February 2015

The Moral Arc

I am seeking suggestions for things to read on a specific topic, which I am struggling to articulate. I would like to read an analysis of how individuals understand their own moral development. Moral philosophers have accounts of what is moral, how it should be understood. This lacks the first person perspective I want to explore – I want to read something that takes seriously the subjective moral life as it is, not as it should be. Experimental philosophers have accounts of differences in people’s responses to moral dilemmas. This is too static – I want to read something that takes seriously our ability to change morally, and particularly to be agents of our own changes in belief. Biographies, particularly of spiritual or political figures, have first person accounts of moral change – why people lost their faith, or changed faith, in deities, parties or principles – but these don’t allow the comparison across people that I’d like.

I wonder if such a book exists. Something like “In a Different Voice”, but with more emphasis on adult development, or The Intellectual Life of the British Working Classes, with a specific focus on moral change.

The motivation is to escape the implicit model of many psychological accounts, which portray people as passive information processors; at their worst stimulus-response machines, but even at their best mere suboptimal rational agents. I’d like to think more about people as active moral agents – as having principles which are consciously developed, seriously considered, subject to revision, passionately defended and debated. Then, of course, the trick is to design empirical psychology research which, because it takes this perspective seriously, allows this side of people to manifest rather than denying or denigrating it.

Habits as action sequences: hierarchical action control and changes in outcome value

Dezfouli, Lingawi and Balleine (2014) advocate hierarchical reinforcement learning (hierarchical RL) as a framework for understanding important features of animal action learning.

Hierarchical RL and model-free RL are both capable of coping with complex environments where outcomes may be delayed until a sequence of actions is completed. In these situations simple model-based (goal-directed) RL does not scale. The key difference between hierarchical and model-free RL is that in model-free RL actions are evaluated at each step, whereas in hierarchical RL they are evaluated at the end of an action sequence.

The authors note two features of the development of habits. The concatenation of actions, such that sequences can be units of selection, is predicted by hierarchical RL. The insensitivity of actions to the devaluation of their outcomes is predicted by model-free RL. Here they report experiments, and draw on prior modelling work, to show that hierarchical RL can also lead to outcome devaluation insensitivity. This brings these two features of habit learning under a common mechanism, and renders a purely model-free RL account of action learning redundant. Instead model-free RL is subsumed within a hierarchical RL controller, which is involved in early learning of action components but later devolves oversight (hence the insensitivity to devaluation).
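A toy sketch of the key computational difference may help (this is my own illustration, not the authors’ model; the action names, learning rate and reward values are all arbitrary assumptions). A model-free controller updates a value for each component action at each step, while a hierarchical controller treats the whole sequence as one unit and updates its value only from the terminal outcome – which is why a mid-sequence change in outcome value cannot interrupt a chunked sequence:

```python
# Illustrative sketch, not Dezfouli et al's actual model.
ALPHA = 0.1  # learning rate (arbitrary)

def update_model_free(q, actions, step_rewards):
    """Model-free control: each component action is evaluated at each step."""
    for a, r in zip(actions, step_rewards):
        q[a] = q[a] + ALPHA * (r - q[a])
    return q

def update_hierarchical(q_seq, sequence, terminal_reward):
    """Hierarchical control: the sequence is a single unit, evaluated only
    at its end ('open-loop' with respect to intermediate outcomes)."""
    key = tuple(sequence)
    q_seq[key] = q_seq[key] + ALPHA * (terminal_reward - q_seq[key])
    return q_seq

# Per-step values respond immediately to intermediate outcomes...
q_steps = update_model_free({"press": 0.5, "pull": 0.5},
                            ["press", "pull"], [0.0, 1.0])
# ...but the chunked sequence has a single value, touched only by the
# terminal outcome, so mid-sequence devaluation leaves it unchanged.
q_chunk = update_hierarchical({("press", "pull"): 0.5},
                              ["press", "pull"], 1.0)
```

The point of the contrast is that devaluation insensitivity falls out of the hierarchical representation itself, rather than needing a separate model-free habit system.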

Hierarchical RL leads to two kinds of action errors, planning errors and action slips (for which they distinguish two types).

  • Planning errors result from ballistic control, meaning that intervening changes in outcome do not affect the action sequence once it is launched.
  • Action slips are also due to ‘open-loop control’, ie a lack of outcome evaluation for component actions. The first kind is where ballistic control means a sequence is completed despite a reward being delivered mid-sequence (rendering completion of the sequence irrelevant, see refs 30 and 31 in the original). The second is the ‘capture error’ or ‘strong habit intrusion’, where a well-rehearsed completion of a sequence runs off from initial action(s) which were intended as part of a different sequence.

I don’t see a fundamental difference between the first type of action slip and the planning error, but that may be my failing.

They note that model-free RL does not predict the specific timing of errors (hierarchical RL predicts errors due to devaluation in the middle of sequences, and habitual intrusions at the joins between sequences, see Botvinick & Bylsma, 2005), and doesn’t predict action slips (as Dezfouli et al define them).


They use a two stage decision task to show insensitivity to intermediate outcomes in a sequence, in humans.

Quoting Botvinick & Weinstein (2014)’s description of the result, because their own is less clear:
“they observed that when subjects began a trial with the same action that they had used to begin the previous trial, in cases where that previous trial had ended with a reward, subjects were prone to follow up with the same second-step action as well, regardless of the outcome of the first action. And when this occurred, the second action was executed with a brief reaction time, compared to trials where a different second-step action was selected.”

The first action, because it was part of a previously successful sequence, was reinforced (more likely to be chosen, and executed more quickly), even on occasions when the intermediate outcome – the one that resulted from that first action – was not successful.


Rats tested in extinction recover goal-directed control over their actions (as indicated by outcome devaluation having the predicted effect). This is predicted by a normative analysis where habits should only exist when their time/effort saving benefits outweigh the costs.

The authors note that this is “consistent with a report showing that the pattern of neuronal activity, within dorso-lateral striatum that marks the beginning and end of the action sequences during training, is diminished when the reward is removed during extinction [37]”


They review evidence for a common locus (the striatum of the basal ganglia) and common mechanism (dopamine signals) for action valuation and sequence learning. Including:
“evidence suggests that the administration of a dopamine antagonist disrupts the chunking of movements into well-integrated sequences in capuchin monkeys [44], which can be reversed by co-administration of a dopamine agonist [45]. In addition, motor chunking appears not to occur in Parkinson’s patients [46] due to a loss of dopaminergic activity in the sensorimotor putamen, which can be restored in patients on L-DOPA [47].”

My memory of this literature is that the evidence on chunking in Parkinson’s is far from convincing or consistent, so I might take these two results with a pinch of salt.

Their conclusion: “This hierarchical view suggests that the development of action sequences and the insensitivity of actions to changes in outcome value are essentially two sides of the same coin, explaining why these two aspects of automatic behaviour involve a shared neural structure.”


Botvinick, M. M., & Bylsma, L. M. (2005). Distraction and action slips in an everyday task: Evidence for a dynamic representation of task context. Psychonomic bulletin & review, 12(6), 1011-1017.

Botvinick, M., & Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1655), 20130480.

Dezfouli, A., Lingawi, N. W., & Balleine, B. W. (2014). Habits as action sequences: hierarchical action control and changes in outcome value. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1655), 20130482.

Limits on claims of optimality

Jarvstad et al (2014) provide a worked illustration showing that it is not straightforward to declare perceptuo-motor decision making optimal, or even more optimal than cognitive decisions.

They note that, in contrast to cognitive-level decisions, perceptuo-motor decisions have been described as optimal or near-optimal (Seydell et al, 2008; Trommershäuser et al, 2006). But they also note that the two domains differ in their performance conditions and in the criteria by which performance is assessed as optimal. Jarvstad et al (2013) demonstrated that when these differences are eliminated, claims about differences between domains are harder to substantiate.

In this paper, Jarvstad et al (2014) compare two reaching tasks to explore the notional optimality of human perceptuo-motor performance. They show that minor changes in task parameters can affect whether participants are classified as behaving optimally or not (even if these changes do not affect the level of performance of an optimal agent). Specifically they adjusted the size and distance of the reaching targets in their experiment, without qualitatively altering the experiment (and without changing the instructions or protocol at all).

The bound below which performance is classified as sub-optimal depends on a number of factors: the ease of the task (for easier tasks observed performance will be closer to optimal), the variability in an optimal agent’s performance, and the accuracy with which that optimal performance can be estimated. Jarvstad et al conclude that, for this task at least, it is not straightforward to know how changes in an experiment will affect the bounds within which a subject is classified as optimal. They say (p.413):

“That statements about optimality are specific and conditional in this way – that is, a behaviour is optimal given a task of this difficulty, and given these capacity constraints included in the optimal agent – may be appreciated by many; however, the literatures typically do not make this explicit, and many claims are simply unsustainable once this fact is taken into account.”
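A toy simulation makes the point concrete (my own construction, not Jarvstad et al’s analysis; the noise level and target sizes are arbitrary assumptions). An ‘optimal agent’ that shares the subject’s motor noise but aims perfectly sets the bound performance is judged against, and that bound moves with target size:

```python
import random

def hit_rate(aim_noise_sd, target_halfwidth, n=20_000, seed=0):
    """Proportion of reaches landing on target for a perfectly aimed
    agent whose only limitation is Gaussian motor noise."""
    rng = random.Random(seed)
    hits = sum(abs(rng.gauss(0.0, aim_noise_sd)) <= target_halfwidth
               for _ in range(n))
    return hits / n

# Same motor noise, different targets: the 'optimal' bound shifts.
easy_bound = hit_rate(aim_noise_sd=1.0, target_halfwidth=3.0)  # large target
hard_bound = hit_rate(aim_noise_sd=1.0, target_halfwidth=0.5)  # small target
```

On this sketch the same subject-level hit rate (say 90%) would sit well below the bound on the easy task but comfortably above it on the hard one, illustrating why a re-parameterised task can flip the classification without changing the subject.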


Jarvstad, A., Hahn, U., Warren, P. A., & Rushton, S. K. (2014). Are perceptuo-motor decisions really more optimal than cognitive decisions?. Cognition, 130(3), 397-416.

Seydell, A., McCann, B. C., Trommershäuser, J., & Knill, D. C. (2008). Learning stochastic reward distributions in a speeded pointing task. The Journal of Neuroscience, 28, 4356–4367.

Trommershäuser, J., Landy, M. S., & Maloney, L. T. (2006). Humans rapidly estimate expected gain in movement planning. Psychological Science, 17, 981–988.

Perceptuo-motor, cognitive, and description-based decision-making seem equally good

Jarvstad et al (2013) show that when perceptuo-motor and ‘cognitive’ decisions are assessed in the same way there are no marked differences in performances.

The context for this is the difference between studies of perceptual-motor and perceptual decision making (which have emphasised the optimality of human performance) and studies of more cognitive choices (for which the ‘heuristics and biases’ tradition has purported to demonstrate substantial departures from optimality).

Jarvstad and colleagues note that experiments in these two domains differ in several important ways. One is the difference between basing decisions on probabilities derived from descriptions versus derived from experience (which has its own literature; Hertwig & Erev, 2009). Another is that perceptuo-motor tasks often involve extensive training, with feedback, whereas cognitive decision making tasks are often one-shot and/or without feedback.

The definition of optimality employed also varies across the domains. Perceptual-motor tasks usually compare performance to that of an optimal agent, often modelled incorporating some constraints on task performance (e.g. motor noise). Cognitive tasks have often sought to compare performance to the standard of rational utility maximisers, designing choices in the experiments precisely to demonstrate violation of axioms on which rational choice rests (e.g. transitivity).

In short, claiming a difference in decision making across these two domains may be premature if other influences on both task performance and task assessment are not comparable.

To carry out a test of performance in the two domains, Jarvstad et al carried out the following experiment. They compared a manual aiming task (A) and a numerical arithmetic task (B). During a learning phase they assessed variability on the two tasks (ie the frequency and range of error in physical (A) or numerical (B) distance). Both kinds of stimuli varied in the ease with which the required response could be successfully produced (ie they varied in difficulty). They also elicited explicit judgements of which stimuli participants thought would match set levels of success (e.g. stimuli they judged they had a 50% or a 75% chance of getting right).

During a decision phase they asked participants to choose between pairs of stimuli with different rewards (upon success) and different difficulties. Importantly, the difficulties were chosen – using the data provided by the learning phase – so as to match certain explicit probabilities (such as might be provided in a traditional decision-from-description experiment on risky choice). They also tested such decisions from explicit probabilities, in a task labelled ‘C’.

The results show that all three tasks had a comparable proportion of decisions which were optimal, in the sense of maximising the chance of reward (Fig 3A). For all three tasks more optimal decisions were made on those decisions which were more consequential (ie which had a bigger opportunity cost and which, consequently, were presumably easier to discriminate between; Fig 3B – shown).

Fig 3B from Jarvstad, A., Hahn, U., Rushton, S. K., & Warren, P. A. (2013). Perceptuo-motor, cognitive, and description-based decision-making seem equally good. Proceedings of the National Academy of Sciences, 110(40), 16271-16276.

Using individual participant data, it is possible to recover – via model fitting – the subjective weights for value and probability functions. These show an underweighting of low objective probabilities in the perceptuo-motor task (Fig 4D) and an overweighting of low objective probabilities in the classical probability-from-description task (Fig 4F). This is in line with previous literature reporting a divergence between the domains in the way low probability events are treated (Hertwig et al, 2004). However, Jarvstad et al use the explicit judgements obtained in the learning phase to show that the apparent discrepancy in weighting results from differences in the subjective probability function (ie how likely success is judged to be in the perceptuo-motor domain) rather than in the weighting given to this probability. If probability estimations are held constant, then similar weightings of low probability events are found across the domains.
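The shape of this kind of weighting can be illustrated with a standard one-parameter weighting function (the Prelec form; a generic illustration, not the specific function Jarvstad et al fitted):

```python
import math

def prelec_weight(p, gamma):
    """Prelec probability weighting: gamma < 1 overweights small
    probabilities, gamma > 1 underweights them; gamma = 1 is linear."""
    return math.exp(-((-math.log(p)) ** gamma))

p = 0.05  # a low objective probability
overweighted = prelec_weight(p, gamma=0.5)   # description-like pattern
underweighted = prelec_weight(p, gamma=1.5)  # motor-task-like pattern
```

Jarvstad et al’s point is that an apparent difference in the weighting parameter across domains can instead reflect a difference in the subjective probabilities that are being fed into it.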

They also show that an individual’s performance on a task is better predicted by their performance on a task in a different domain than by the average performance in that domain – ie individual differences matter more than task differences for the nature and extent of divergence from optimality.


Jarvstad, A., Hahn, U., Rushton, S. K., & Warren, P. A. (2013). Perceptuo-motor, cognitive, and description-based decision-making seem equally good. Proceedings of the National Academy of Sciences, 110(40), 16271-16276.

Hertwig R, Barron G, Weber EU, Erev I (2004) Decisions from experience and the effect of rare events in risky choice. Psychol Sci 15(8):534–539.

Hertwig R, Erev I (2009) The description-experience gap in risky choice. Trends Cogn Sci 13(12):517–523.

The publication and reproducibility challenges of shared data

Poldrack and Poline’s new paper in TICS (2015) asserts pretty clearly that the field of neuroimaging is behind on open science. Data and analysis code are rarely shared, despite the clear need: studies are often underpowered, and there are multiple possible analytic paths.

They offer some guidelines for best practice around data sharing and re-analysis:

  • Recognise that researcher error is not fraud
  • Share analysis code, as well as data
  • Distinguish ‘Empirical irreproducibility’ (failure to replicate a finding on the original researchers’ own terms) from ‘interpretative irreproducibility’ (failure to endorse the original researchers’ conclusions based on a difference of, e.g., analytic method)

They also offer three useful best practice guidelines for any researchers who are thinking of blogging a reanalysis based on other researchers’ data (as Russ Poldrack has himself):

  • Contact the original authors before publishing to give them right of reply
  • Share your analysis code, along with your conclusions
  • Allow comments

And there are some useful comments about authorship rights for research based on open data. Providing the original data alone should not entitle you to authorship on subsequent papers (unless you have also contributed significant expertise to a re-analysis). Rather, it would be better if the researchers contributing data to an open repository publish a data paper which can be cited by anyone performing additional analyses.


Poldrack, R. A., & Poline, J. B. (2015). The publication and reproducibility challenges of shared data. Trends in Cognitive Sciences, 19(2), 59–61.

Light offsets as reinforcing as light onsets

Further support that surprise and not novelty supports sensory reinforcement comes from the evidence that light offsets are more-or-less as good reinforcers as light onsets (Glow, 1970; Russell and Glow, 1974). But in the case of light offset, where is the “novel” stimulus that acts as a reinforcer (by supposedly triggering dopamine)? In this case it is even more clear that it is the unexpectedness of the event (surprise), not the novelty of the stimulus (which is absent), that is at play.

From Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise?. Frontiers in psychology, 4.


Glow P. (1970). Some acquisition and performance characteristics of response contingent sensory reinforcement in the rat. Aust. J. Psychol. 22, 145–154 10.1080/00049537008254568

Russell A., Glow P. (1974). Some effects of short-term immediate prior exposure to light change on responding for light change. Learn. Behav. 2, 262–266 10.3758/BF03199191

Animal analogs of human biases and suboptimal choice

Zentall (2015) summarises a rich literature on experiments showing that analogues of canonical human biases exist in animals. Specifically, he takes the phenomena of

  • justification of effort: rewards which require more effort are overweighted
  • sunk cost fallacy: past effort is weighted in evaluation of future rewards
  • less-is-more effect: high value rewards are valued less if presented along with a low value reward
  • risk neglect: overweighting of low probability but high value rewards
  • base rate neglect: e.g. over-reliance on events which are likely to be false positives

The demonstration of all these phenomena in animals (often pigeons, and sometimes dogs, in Zentall’s own research) presents a challenge to explanations of these biases in human choice. It suggests they are unlikely to be the result of cultural conditioning, social pressure or experience, or elaborate theories (such as theories of probability or cosmic coincidence in the case of suboptimal choice regarding probabilities, see Blanchard, Wilke and Hayden, 2014).

Zentall suggests that these demonstrations compel us to consider that suboptimal choice in the laboratory can only exist because of some adaptive value in the wild – either via common mechanisms underlying multiple biases, or via the independent evolution of each bias/heuristic in a separate module. At the end of the paper he presents some loose speculations on the possible adaptive benefit of each of the discussed biases.

Three interesting recent results from Zentall’s lab concern risky choice in pigeons:

1. Laude et al (2014) showed that for individual pigeons there was a correlation between the degree of suboptimal choice on a gambling task (overweighting of rare but large rewards) and impulsivity as measured by a delay discounting task. As well as seeming to show ‘individual differences’ in pigeon personality, this suggests the possibility of some common factors in these two kinds of choices (choices which experimental human work has found to be dissociable in various ways).

2. Zentall and Stagner (2011) show that conditioned reinforcers (stimuli which predict reward) are critical in the gambling task (for pigeons). Without these intermediate stimuli, when actions lead directly to reward (still under the same probabilities of outcome), pigeons choose optimally. Zentall suggests that a thought experiment on the human case confirms the generality of this result. Would slot machines be popular without the spinning wheels? Or (my suggestion) the lottery without the ball machine? My speculation is that the promise of insight into the causal mechanism governing the outcome is important. We know that human and non-human animals are guided by intrinsic motivation as well as the promise of material rewards (ie as well as being extrinsically motivated). Rats, for example, will press a lever to turn a light on or off, in the absence of the food reward normally used to train lever pressing (Kish, 1955). One plausible explanation for results like this is that our learning systems are configured to seek control or understanding of the world – to be driven by mere curiosity – in order to generate exploratory actions which will, in the long term, have adaptive benefit. Given this, it makes sense if situations where there is the possibility of causal insight – as with the intermediate stimuli in the gambling task – can inspire actions which are less focussed on exploiting known probabilities (ie are ‘exploratory’, in some loose sense), even if the promise of causal insight is illusory and the exploratory actions are, as defined by the experiment, futile and suboptimal.

3. Pattison, Laude and Zentall (2013) showed that pigeons who were given the opportunity for social interaction (with other pigeons) were less likely to choose the improbable large reward action over lower expected value but more certain reward. Zentall’s suggestion is that the experience of social interaction diminishes the perceived magnitude of the improbable reward, making it seem like a less attractive choice (which makes sense if neglect of the probability and focus on the magnitude is part of the dynamic driving suboptimal choice in this gambling task). Whatever the reason, the result is a reminder that the choices of animals – human and non-human – cannot be studied in isolation from the experience and environment of the organism. This may sound like an obviousity, but discussions of problematic choices (think gambling, or drug use) often conceptualise behaviours as compelled, part of an immutable biological (addiction as disease) or chemical (drugs as inevitably producing catastrophic addiction) destiny. This result, and others (remember Rat Park), give the lie to that characterisation.


Blackburn, M., & El-Deredy, W. (2013). The future is risky: Discounting of delayed and uncertain outcomes. Behavioural processes, 94, 9-18.

Blanchard, T. C., Wilke, A., & Hayden, B. Y. (2014). Hot-hand bias in rhesus monkeys. Journal of Experimental Psychology: Animal Learning and Cognition, 40(3), 280-286.

Kish, G.: Learning when the onset of illumination is used as the reinforcing stimulus. J. Comp. Physiol. Psycho. 48(4), 261–264 (1955)

Laude, J.R., Beckmann, J.S., Daniels, C.W., Zentall, T.R., 2014. Impulsivity affects gambling-like choice by pigeons. J. Exp. Psychol. Anim. Behav. Process. 40, 2–11.

Pattison, K.F., Laude, J.R., Zentall, T.R., 2013. Social enrichment affects suboptimal, risky, gambling-like choice by pigeons. Anim. Cogn. 16, 429–434.

Zentall, T.R., Stagner, J.P., 2011. Maladaptive choice behavior by pigeons: an animal analog of gambling (sub-optimal human decision making behavior). Proc. R. Soc. B: Biol. Sci. 278, 1203–1208.

Zentall, T. R. (2015). When animals misbehave: analogs of human biases and suboptimal choice. Behavioural processes, 112, 3-13.

Discounting of delayed and uncertain outcomes

Blackburn & El-Deredy (2013) provide a nice review of the literature on temporal (delay) and probabilistic discounting. They note these features of similarity and difference:

  • Both follow hyperbolic (rather than exponential) discount functions
  • Not correlated: impulsivity might be interpreted as steeper temporal discounting, and shallower probabilistic discounting, but steep temporal discounters don’t appear to be shallow probabilistic discounters
  • Magnitude effect opposite: for probabilistic rewards, larger rewards are more steeply discounted, for delayed rewards, larger rewards are more shallowly discounted
  • Sign effect the same: gains vs losses affect discounting similarly for temporal and probabilistic discounting (gains are more steeply discounted).

To this list their experiments add a dissociable effect of ‘uncertain outcomes’ (all or nothing) vs ‘uncertain amounts’ (graded reward).
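For reference, the standard functional forms from this literature can be written out (these are the textbook forms, not Blackburn & El-Deredy’s fitted parameters; `k` and `h` are free discounting parameters):

```python
import math

def exponential_discount(amount, delay, k):
    """Exponential discounting: a constant discount rate per unit delay."""
    return amount * math.exp(-k * delay)

def hyperbolic_delay(amount, delay, k):
    """Hyperbolic delay discounting - the form behavioural data
    typically follow better than the exponential."""
    return amount / (1.0 + k * delay)

def hyperbolic_probability(amount, p, h):
    """Hyperbolic probability discounting, defined over the odds
    against winning, theta = (1 - p) / p."""
    theta = (1.0 - p) / p
    return amount / (1.0 + h * theta)
```

Writing probabilistic discounting over odds-against is what lets the two kinds of discounting be compared on the same hyperbolic footing.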

Blackburn, M., & El-Deredy, W. (2013). The future is risky: Discounting of delayed and uncertain outcomes. Behavioural processes, 94, 9-18.

Permanent Zero

‘Email overload’ is one of those phrases everyone thinks they know the meaning of: “I get too many emails!”. Last autumn I met Steve Whittaker, who has a reasonable claim to have actually coined the phrase, way back in 1996. He explained to me that the point wasn’t to say that we get too much email, but that email is used for too many different things. We’re using it to send messages, receive messages, get notifications, schedule tasks, chat, delegate tasks, archive information and so on forever.

Shifting the focus from email as number of individual messages (too many!), to email as functions (still too many!) lets you see why the ‘Inbox Zero‘ idea doesn’t quite work. Inbox Zero appeals to my sense of being in control over my email, and it is better for me than not having a righteous scheduling system for my email, but it doesn’t split the multiple functions for which I use email.

Now, for you today, I’d like to share my newest strategy for managing my email, which is inspired by Whittaker’s ‘Email overload’ distinction.

The first thing to do is to separate off the single largest function of email – receiving messages – from the others. You need to stop emails arriving in your inbox, leaving you free to send and search without distraction. Create a ‘received mail’ folder and a filter that moves all incoming mail to it. Now stare in satisfaction at “You have no new email!” in your inbox. Schedule a time to go to your received mail folder and kill as many emails as you can, using your favourite inbox zero strategies (protip: if you send emails at 4.30 you minimise the chances of someone replying that day). Now your workflow which only involves sending messages and dealing with old messages isn’t tangled up with the distraction of receiving new messages.

Next, separate off all email that isn’t personal correspondence. Set a second filter which removes all email without your email address in the ‘to’ or ‘cc’ fields. These are circulars. You can scan the titles and delete en masse.

If you are using gmail, you can import these filters (after editing to make relevant adjustments).
remove from inbox, unless sent to ‘exception’ address
remove all circulars
Right click to ‘save as’, they won’t show up in a browser. Note that my new folders begin with ‘A_’ so they are top of my alphabetised folder list.
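If the downloads don’t work for you, a Gmail filter export is just an Atom feed of apps:property entries, so you can write the circulars filter by hand and import it via Settings → Filters. A sketch (the address and label here are placeholders you’d edit to match your own; to my knowledge these property names are what Gmail’s own export produces, but check against a filter you export yourself):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:apps="http://schemas.google.com/apps/2006">
  <!-- Circulars: mail not addressed directly to you skips the inbox
       and gets a label that sorts to the top of the folder list -->
  <entry>
    <category term="filter"/>
    <apps:property name="hasTheWord" value="-to:you@example.com"/>
    <apps:property name="shouldArchive" value="true"/>
    <apps:property name="label" value="A_Circulars"/>
  </entry>
</feed>
```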