Habits as action sequences: hierarchical action control and changes in outcome value

Dezfouli, Lingawi and Balleine (2014) advocate hierarchical reinforcement learning (hierarchical RL) as a framework for understanding important features of animal action learning.

Hierarchical RL and model-free RL are both capable of coping with complex environments where outcomes may be delayed until a sequence of actions is completed. In these situations simple model-based (goal-directed) RL does not scale. The key difference between hierarchical and model-free RL is that in model-free RL actions are evaluated at each step, whereas in hierarchical RL they are evaluated at the end of an action sequence.
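To make the contrast concrete, here is a minimal toy sketch of step-by-step evaluation versus evaluation only at the end of a sequence. It is an illustration of the idea, not the authors' model: the action names, the outcome values and both function names are made up for the example.

```python
def stepwise_run(sequence, outcomes):
    """Closed-loop control: check the outcome after every action
    and abandon the sequence as soon as an outcome is bad."""
    executed = []
    for action, outcome in zip(sequence, outcomes):
        executed.append(action)
        if outcome < 0:  # hypothetical rule: a negative outcome stops the run
            break
    return executed


def chunked_run(sequence, outcomes):
    """Open-loop control: run the whole chunk without checking
    intermediate outcomes; only the final outcome is evaluated."""
    executed = list(sequence)
    return executed, outcomes[-1]


# Hypothetical three-action sequence whose middle outcome has gone bad.
sequence = ["press_lever", "enter_magazine", "consume"]
outcomes = [0.0, -1.0, 1.0]

print(stepwise_run(sequence, outcomes))  # stops after the second action
print(chunked_run(sequence, outcomes))   # runs to completion regardless
```

The chunked controller is cheaper (one evaluation instead of three) but blind to what happens mid-sequence, which is the trade-off the rest of the paper turns on.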

The authors note two features of the development of habits. The concatenation of actions, such that sequences can be units of selection, is predicted by hierarchical RL. The insensitivity of actions to the devaluation of their outcomes is predicted by model-free RL. Here they report experiments, and draw on prior modelling work, to show that hierarchical RL can lead to outcome devaluation insensitivity. This encompasses these two features of habit learning under a common mechanism, and renders a purely model-free RL account of action learning redundant. Instead, model-free RL will be subsumed within a hierarchical RL controller, which is involved in early learning of action components but will later devolve oversight (hence insensitivity to devaluation).

Hierarchical RL leads to two kinds of action errors, planning errors and action slips (for which they distinguish two types).

Planning errors result from ballistic control, meaning that intervening changes in outcome do not affect the action sequence.
Action slips are also due to ‘open-loop control’, i.e. due to a lack of outcome evaluation for component actions. The first kind is where ballistic control means an action is completed despite a reward being delivered mid-sequence (and so rendering completion of the action irrelevant, see refs 30 and 31 in the original). The second subcategory of action slip is ‘capture error’ or ‘strong habit intrusion’, which is where a well-rehearsed completion of a sequence runs off from initial action(s) which were intended as part of a different sequence.

I don’t see a fundamental difference between the first type of action slip and the planning error, but that may be my failing.

They note that model-free RL does not predict specific timing of errors (hierarchical RL predicts errors due to devaluation in the middle of sequences, and habitual intrusions at joins in sequences, see Botvinick & Bylsma, 2005), and doesn’t predict action slips (as Dezfouli et al define them).

EXPT 1

They use a two-stage decision task to show insensitivity to intermediate outcomes in a sequence, in humans.

Quoting Botvinick & Weinstein (2014)’s description of the result, because their own is less clear:
“they observed that when subjects began a trial with the same action that they had used to begin the previous trial, in cases where that previous trial had ended with a reward, subjects were prone to follow up with the same second-step action as well, regardless of the outcome of the first action. And when this occurred, the second action was executed with a brief reaction time, compared to trials where a different second-step action was selected.”

The first action, because it was part of a successful sequence, was reinforced (more likely to be chosen, quicker), despite the occasions when the intermediate outcome – the one that resulted from that first action – was not successful.
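The behavioural signature described above can be sketched as a simple tally over consecutive trials. This is only my illustration of the logic, not the authors' analysis: the `Trial` record, its field names and the `repeat_rate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    a1: str         # first-step action
    state: str      # intermediate outcome (the state the first action led to)
    a2: str         # second-step action
    rewarded: bool  # whether the trial ended with a reward


def repeat_rate(trials):
    """For trials that repeat the first action of a rewarded previous trial,
    return the proportion that also repeat the second action, split by
    whether the intermediate outcome matched the previous trial's."""
    counts = {True: [0, 0], False: [0, 0]}  # same_state -> [repeats, total]
    for prev, cur in zip(trials, trials[1:]):
        if prev.rewarded and cur.a1 == prev.a1:
            same_state = cur.state == prev.state
            counts[same_state][1] += 1
            counts[same_state][0] += cur.a2 == prev.a2
    return {k: (r / n if n else None) for k, (r, n) in counts.items()}


# Tiny made-up data set, purely to show the call.
trials = [
    Trial("A1", "S1", "L", True),
    Trial("A1", "S2", "L", True),
    Trial("A1", "S2", "L", False),
    Trial("A1", "S1", "R", True),
    Trial("A2", "S1", "R", True),
]
print(repeat_rate(trials))
```

On the sequence account, the rate under the `False` key (intermediate outcome changed) should stay high, because the second action runs off as part of the chunk rather than being chosen in response to the intermediate outcome.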

EXPT 2

Rats tested in extinction recover goal-directed control over their actions (as indicated by outcome devaluation having the predicted effect). This is predicted by a normative analysis where habits should only exist when their time/effort saving benefits outweigh the costs.

The authors note that this is “consistent with a report showing that the pattern of neuronal activity, within dorso-lateral striatum that marks the beginning and end of the action sequences during training, is diminished when the reward is removed during extinction [37]”

Discussion

They review evidence for a common locus (the striatum of the basal ganglia) and common mechanism (dopamine signals) for action valuation and sequence learning, including:
“evidence suggests that the administration of a dopamine antagonist disrupts the chunking of movements into well-integrated sequences in capuchin monkeys [44], which can be reversed by co-administration of a dopamine agonist [45]. In addition, motor chunking appears not to occur in Parkinsons patients [46] due to a loss of dopaminergic activity in the sensorimotor putamen, which can be restored in patients on L-DOPA [47].”

My memory of this literature is that evidence on chunking in Parkinson’s is far from convincing or consistent, so I might take these two results with a pinch of salt.

Their conclusion: “This hierarchical view suggests that the development of action sequences and the insensitivity of actions to changes in outcome value are essentially two sides of the same coin, explaining why these two aspects of automatic behaviour involve a shared neural structure.”

REFERENCES

Botvinick, M. M., & Bylsma, L. M. (2005). Distraction and action slips in an everyday task: Evidence for a dynamic representation of task context. Psychonomic Bulletin & Review, 12(6), 1011-1017.

Botvinick, M., & Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1655), 20130480.

Dezfouli, A., Lingawi, N. W., & Balleine, B. W. (2014). Habits as action sequences: hierarchical action control and changes in outcome value. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1655), 20130482.

Teaching critical thinking: What if? What if not?

I am teaching a course, where I ask students to critically review papers reporting psychology experiments. It is making me question how you try and teach a skill as fundamental as critical thinking. I am trying to do it by example, walking the students through my reading of each paper, showing them what I do and consider when looking at it, and then providing them with a model answer (which is identical in structure to the final coursework I will require of them).

Reading a few practice questions students have handed in, I’m struck that there is a habit I want my students to acquire, which they haven’t quite got yet, and which I don’t even know the name of. Which is why I write this here, to ask you – Kind Interwebs – if you know what it is I am about to talk about and how I can convey it best to people taking my course.

This habit is that of asking the skeptical follow-up questions to every proposal they make. So, for example, most undergraduate psychology students will have up their sleeve a well-rehearsed list of possible flaws in experiments. Things like: was the sample representative of the population? Are there confounds in the experiment which prevent you cleanly inferring the causation?

It is nice, but inadequate, to write a review of a paper listing flaws like this. Doing so does not constitute a useful or interesting critical review (or, on my course, a gradeworthy one).

What I would like is to encourage my students to go the extra mile and, having spotted a potential flaw, assess it for plausibility, and consider what it would mean if the flaw was a significant one (i.e. how does it limit our interpretation of the current experiment, and what does it mean for future possible experiments?).

Here’s a concrete example: one paper I am teaching is one of my own, which looked at how students’ use of a course wiki predicted final exam score. A student suggested that because students knew their wiki use was being monitored, demand effects may have played a role (demand effects are a classic psychology experiment confound: participants distort their behaviour according to what they think you want to find). Now this is fair enough, but there are a number of follow-up questions.

What is required for this to be the case?
That students were able and willing to alter their final exam score based on their wiki use, but not because of it, perhaps. This seems implausible.

What is implied if it is the case? If the demand effect did hold, would it even mean that the wiki use wasn’t effective? For example, we might decide that since students can’t easily score higher on exams at a whim, even an effect via demand was an effect worth having.

How could I test if it is the case? Demand effects may hold, but how could we tell if they do hold?

What if it isn’t the case? What are the differences between the two situations? Imagine two worlds with and without demand effects. What are the crucial differences between them, and what implications do these differences have for our experiment interpretation or further research? If there are no major differences, maybe we don’t need to worry about demand effects.

I pick demand effects because I wanted to use a specific example, but my aim is to encourage students to deploy these questions about every possible flaw or improvement that they suggest. My question today, though, is this: is there a general principle which students could follow to guide them in asking these kinds of skeptical follow-up questions? It seems like there isn’t anything too domain specific about this, so even if you aren’t an expert in psychology experiments you could semi-independently develop this skill of probing the logical structure of claims about an interpretation. It also seems that, cognitively, such thinking puts a heavy demand on your working memory, since it consists of layers and iterations of hypotheticals and counter-factuals. This makes it extra hard. Is there any way to make it easier?

If there is no general principle, it may be that my students and I are stuck with going through worked examples. I’m seeking short cuts up the mountain.

Quote #300: Graeber on the popular appeal of the right

One of the perennial complaints of the progressive left is that so many working-class Americans vote against their own economic interests—actively supporting Republican candidates who promise to slash programs that provide their families with heating oil, who savage their schools and privatize their Medicare. To some degree the reason is simply that the scraps the Democratic Party is now willing to throw its “base” at this point are so paltry it’s hard not to see their offers as an insult: especially when it comes down to the Bill Clinton– or Barack Obama–style argument “we’re not really going to fight for you, but then, why should we? It’s not really in our self-interest when we know you have no choice but to vote for us anyway.” Still, while this may be a compelling reason to avoid voting altogether—and, indeed, most working Americans have long since given up on the electoral process—it doesn’t explain voting for the other side.

The only way to explain this is not that they are somehow confused about their self-interest, but that they are indignant at the very idea that self-interest is all that politics could ever be about. The rhetoric of austerity, of “shared sacrifice” to save one’s children from the terrible consequences of government debt, might be a cynical lie, just a way of distributing even more wealth to the 1 percent, but such rhetoric at least gives ordinary people a certain credit for nobility. At a time when, for most Americans, there really isn’t anything around them worth calling a “community,” at least this is something they can do for everybody else.

The moment we realize that most Americans are not cynics, the appeal of right-wing populism becomes much easier to understand. It comes, often enough, surrounded by the most vile sorts of racism, sexism, homophobia. But what lies behind it is a genuine indignation at being cut off from the means for doing good.

Take two of the most familiar rallying cries of the populist right: hatred of the “cultural elite” and constant calls to “support our troops.” On the surface, it seems these would have nothing to do with each other. In fact, they are profoundly linked. It might seem strange that so many working-class Americans would resent that fraction of the 1 percent who work in the culture industry more than they do oil tycoons and HMO executives, but it actually represents a fairly realistic assessment of their situation: an air conditioner repairman from Nebraska is aware that while it is exceedingly unlikely that his child would ever become CEO of a large corporation, it could possibly happen; but it’s utterly unimaginable that she will ever become an international human rights lawyer or drama critic for The New York Times. Most obviously, if you wish to pursue a career that isn’t simply for the money—a career in the arts, in politics, social welfare, journalism, that is, a life dedicated to pursuing some value other than money, whether that be the pursuit of truth, beauty, charity—for the first year or two, your employers will simply refuse to pay you. As I myself discovered on graduating college, an impenetrable bastion of unpaid internships places any such careers permanently outside the reach of anyone who can’t fund several years’ free residence in a city like New York or San Francisco—which, most obviously, immediately eliminates any child of the working class. What this means in practice is that not only do the children of this (increasingly in-marrying, exclusive) class of sophisticates see most working-class Americans as so many knuckle-dragging cavemen, which is infuriating enough, but that they have developed a clever system to monopolize, for their own children, all lines of work where one can both earn a decent living and also pursue something selfless or noble. 
If an air conditioner repairman’s daughter does aspire to a career where she can serve some calling higher than herself, she really only has two realistic options: she can work for her local church, or she can join the army.

This was, I am convinced, the secret of the peculiar popular appeal of George W. Bush, a man born to one of the richest families in America: he talked, and acted, like a man that felt more comfortable around soldiers than professors. The militant anti-intellectualism of the populist right is more than merely a rejection of the authority of the professional-managerial class (who, for most working-class Americans, are more likely to have immediate power over their lives than CEOs), it’s also a protest against a class that they see as trying to monopolize for itself the means to live a life dedicated to anything other than material self-interest. Watching liberals express bewilderment that they thus seem to be acting against their own self-interest—by not accepting a few material scraps they are offered by Democratic candidates—presumably only makes matters worse.

David Graeber (2013), The Democracy Project: A History. A Crisis. A Movement., p123-125

Control your dreams (ebook)

Anyone can learn to have lucid dreams, and this ebook tells you how. Lucid dreams are those dreams where you become aware you are dreaming, and can even begin to control the reality of the dream. Adventure, problem-solving and consequence-free indulgence await! And for those interested in the mind, lucid dreams are a great place to explore the nature of their own consciousness. The ebook is written as a sort of travel guide, telling you what you need to take on your journey and what to expect when you start to lucid dream. It finishes off with a quick review of the scientific literature on lucid dreaming and links and references for further reading if you want to continue your exploration of lucid dreaming.

I wrote this with friend, and lucid dreamer, Cat Bardsley. My wife Harriet Cameron provided some beautiful illustrations which you can find throughout the book (and on the cover you can see here). The book is Creative Commons licensed so you can copy it and share it as you will, and even modify and improve it (as long as you keep the CC licensing). It’s available on smashwords on a pay-what-you-want basis (and that includes nothing, so it is yours for free if you’d like).

“Control your dreams” is my second self-published ebook. You can also get “Explore your blindspot” from smashwords (which is completely free, and also CC licensed). The wonderful folk at 40k books published my essay The Narrative Escape last year (and after doing all the formatting and admin associated with these two new ebooks I am more and more in awe of what they did).

Sweet Dreams!

(Cross-posted at mindhacks.com)

Quote #248: Hume’s Bundle


…I may venture to affirm of the rest of mankind, that they are nothing but a bundle or collection of different perceptions, which succeed each other with an inconceivable rapidity, and are in a perpetual flux and movement. Our eyes cannot turn in their sockets without varying our perceptions. Our thought is still more variable than our sight; and all our other senses and faculties contribute to this change: nor is there any single power of the soul, which remains unalterably the same, perhaps for one moment. The mind is a kind of theatre, where several perceptions successively make their appearance; pass, repass, glide away, and mingle in an infinite variety of postures and situations. There is properly no simplicity in it at one time, nor identity in different, whatever natural propension we may have to imagine that simplicity and identity. The comparison of the theatre must not mislead us. They are the successive perceptions only, that constitute the mind; nor have we the most distant notion of the place where these scenes are represented, or of the materials of which it is composed.

David Hume, in A Treatise of Human Nature, Book I, Part 4, Section 6, ‘Of Personal Identity’

Waiting for the miracle


Baby, I’ve been waiting,
I’ve been waiting
Night and day
I didn’t see the time,
I waited half my life away
There were lots of invitations
And I know you sent me some,
But I was waiting
For the miracle,
For the miracle to come

Waiting for the Miracle, Leonard Cohen


Sometimes I don’t know where this dirty road is taking me
Sometimes I can’t even see the reason why
I guess I keep on gamblin’, lots of booze and lots of ramblin’
It’s easier than just a-waitin’ ’round to die

Waitin’ Around to Die, Townes Van Zandt (and the Be Good Tanyas)

the wire-mesh mother of methodology


It might be the case that, while exploratory factor analysis isn’t a generally reliable tool for causal inference, for some reason it happens to work in psychological testing. To believe this, I would want to see many cases where it had at least contributed to important discoveries about mental structure which had some other grounds of support. These are scarce. The five-factor theory of personality, as I mentioned above, is probably the best candidate, and it fails confirmatory factor analysis tests. As Clark Glymour points out, lesion studies in neuropsychology have uncovered a huge array of correlations among cognitive abilities, many of them very specific, none of which factor analyses predicted, or even hinted at. Similarly, congenital defects of cognition, like Williams’s Syndrome, drive home the point that thought is a biological process with a genetic basis (if that needs driving). But Williams’s Syndrome is simply not the kind of thing anyone would have expected from factor analysis, and for that matter a place where the IQ score, while not worthless, is not much help in understanding what’s going on.

The psychologist Robert Abelson has a very nice book on Statistics as Principled Argument where he writes that “Criticism is the mother of methodology”. I was going to say that such episodes cast that in doubt, but it occurred to me that Abelson never says what kind of mother. To combine Abelson’s metaphor with Harlow’s famous experiments on love in monkeys, observational social science has been offered a choice between two methodological mothers, one of them warm and cuddly and familiar and utterly un-nourishing (the old world of linear regression, analysis of variance, factor analysis, etc.), the other cold, metallic, hurtful and actually able to help materially (statistical methods which are at least not definitely unable to do what people want). Not surprisingly, social scientists, being primates, overwhelmingly go for the warm fuzzies. This, to me, indicates a deep failure on the part of the statistical profession to which I am otherwise proud to belong. It is never a good sign when your discipline’s knowledge is the wire-mesh mother all the baby monkeys avoid if at all possible. Less metaphorically, the perpetuation of these fallacies decade after decade shows there is something deeply amiss with the statistical education of social scientists.

Cosma Shalizi, ‘g, a statistical myth’