18 December 2017

A review of the research of Dr. Nicolas Guéguen

Over the last couple of weeks, James Heathers and I have been blogging about some rather strange articles by Dr. Nicolas Guéguen, of the Université Bretagne-Sud in France.  In this joint post, we want to summarise the apparent issues that we have identified in this researcher’s output.  Some of the points we make here have already been covered, sometimes in more depth, in an excellent article by Cathleen O’Grady at Ars Technica.

There is a lot to read

The first thing one notices in looking at his Google Scholar record is that Dr. Guéguen is a remarkably prolific researcher.  He regularly publishes 10 or more sole-authored empirical articles per year (this total reached 20 in 2015), many of which include extensive fieldwork and the collection of data from large numbers (sometimes even thousands) of participants. Yet, none of the many research assistants and other junior collaborators who must have been involved in these projects ever seem to be included as co-authors, or even have their names mentioned; indeed, we have yet to see an Acknowledgments section in any sole-authored article by Dr. Guéguen.  This seems unusual, especially given that in some cases the data collection process must have required the investment of heroic amounts of the confederates’ time.

Much of Dr. Guéguen's research focuses on human relationships, in what one might euphemistically term a rather "old-fashioned" way.  You can get a flavour of his research interests by browsing through his Google Scholar profile.  As well as the articles we have blogged about, he has published research on such vital topics as whether women with larger breasts get more invitations to dance in nightclubs (they do), whether women are more likely to give their phone number to a man if asked while walking near a flower shop (they are), and whether a male bus driver is more likely to let a woman (but not a man) ride the bus for free if she touches him (we’ll let you guess the answer to this one).  One might call it “Benny Hill research”, although Dr. Guéguen has also published plenty of articles on other lightweight pop-social-psychology topics such as consumer behaviour in restaurants (does that sound familiar?) that do not immediately conjure up images of sexual stereotypes.

Neither Dr. Guéguen’s theories, nor his experimental designs, generally present any great intellectual challenges.  However, despite their simplicity, and the almost trivial nature of the manipulations, his studies often produce effect sizes of the kind more normally associated with the use of domestic cleaning products against germs. Many of the studies also seem to be run on a production line system, with almost every combination of independent and dependent variables being tested (something that was noted by Hans van Maanen in the Dutch newspaper De Volkskrant as far back as 2012).  For example, waitresses get more tips if they have blonde hair or use make-up or wear a red t-shirt, but women wearing red also find it easier to hitch a ride, as do women with blonde hair or larger breasts.  Those same women with blonde hair or larger breasts also get asked to dance more in nightclubs. As well as earning her more tips if she is a waitress, using make-up also makes a woman more likely to be approached by a man in a bar, although her choice to wear make-up might reflect the fact that she is near ovulation, at which point she is also more likely to accept that invitation to dance; and so it goes, round and round.

It seems that some of this research is actually taken quite seriously by some psychologists.  For example, it is cited in recent work by Andrew Elliot and colleagues at the University of Rochester that claims to show that women wear red clothes as a sexual signal (thus also providing a piece of Dr. Guéguen’s IV/DV combination bingo card that would otherwise have been missing). The skeptical psychologist Dr. Richard Wiseman also seems to be something of a fan of Guéguen's work; for example, in his 2009 book 59 Seconds: Think a Little, Change a Lot, Wiseman noted that "Nicolas Guéguen has spent his career investigating some of the more unusual aspects of everyday life, and perhaps none is more unusual than his groundbreaking work on breasts", and he also cited Guéguen several times in his 2013 book The As If Principle.

And of course, as is common for research with themes of sexual attraction and other aspects of everyday human behaviour, these results readily find their way into the popular media, such as the Daily Mail (women are more likely to give their phone number to a man who is carrying a guitar), the Guardian (people drink more in bars with louder music), The Atlantic (men think that women wearing red are more likely to be interested in sex) and the New York Times (customers spend more money in a restaurant if it smells of lavender).

Beyond a joke

But our concerns go well beyond the apparent borderline teenage sexism that seems to characterise much of this research.  A far bigger scientific problem is the combination of extraordinary effect sizes, remarkably high (in some cases, 100%) response rates among participants recruited in the street (cf. this study, where every single one of the 500 young female participants who were intercepted in the street agreed to reveal their age to the researchers, and every single one of them turned out to be aged between 18 and 25), other obvious logistical obstacles, and the large number of statistical errors or mathematically impossible results reported in many of the analyses.

We also have some concerns about the ethicality of some of Dr. Guéguen’s field experiments.  For example, in these two studies, his male confederates asked other men how likely it was that a female confederate would have sex with them on a first date, which might be a suitable topic for bar-room banter among friends but appears to us to be somewhat intrusive.  In another study, women participants were secretly filmed from behind with the resulting footage being shown to male observers who rated the “sexiness” of the women’s gait (in order to test the theory that women might walk “more sexily” in front of men when they are ovulating; again, readers may not be totally surprised to learn that this is what was found). In this study, the debriefing procedure for the young female participants involved handing them a card with the principal investigator’s personal phone number; this procedure was “refined” in another study, where participants who had agreed to give their phone number to an attractive male confederate were called back, although it is not entirely clear by whom. (John Sakaluk has pointed out that there may also be issues around how these women’s telephone numbers were recorded and stored.)

It is unclear from the studies presented that any of these protocols received individual ethical approval, as study-specific details from an IRB are not offered. Steps to mitigate potential harms/dangers are not mentioned, even though in several cases data collection could have been problematic, with confederates dressing deliberately provocatively in bars and so on. Ethical approval is mentioned only occasionally, usually accompanied by the reference number “CRPCC-LESTIC EA 1285”.  This might look like an IRB approval code of some kind, but in fact it is just the French national science administration’s identification code for Dr. Guéguen’s own laboratory.

It is also noteworthy that none of the articles we have read mention any form of funding. Sometimes, however, the expenses must have been substantial.  In this study (hat tip to Harry Manley for spotting it), 99 confederates stood outside bars and administered breathalyser tests to 1,965 customers as they left.  Even though the breathalyser device that was used is a basic model that sells for €29.95, it seems that at least 21 of them were required; plus, as the “Accessories” tab of that page shows, the standard retail price of the sterile mouthpieces (one of which was used per participant) before they were discontinued was €4.45 per 10, meaning that the total cash outlay for this study would have been in the region of €1500.  One would have thought that a laboratory that could afford to pay for that out of petty cash for a single study could also pick up the tab in a nightclub from time to time.
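For readers who want to check the arithmetic behind the figure of roughly €1500, here is a minimal sketch. The prices and counts come from the paragraph above; the assumption that mouthpieces can only be bought in whole packs of 10 is ours, and the function name is purely illustrative.

```python
import math

# Figures from the text, in euro cents to avoid floating-point rounding.
DEVICE_PRICE_CENTS = 2995       # EUR 29.95 per breathalyser
DEVICES_NEEDED = 21             # "at least 21 of them were required"
PACK_PRICE_CENTS = 445          # EUR 4.45 per pack of mouthpieces
MOUTHPIECES_PER_PACK = 10
PARTICIPANTS = 1965             # one sterile mouthpiece per participant


def breathalyser_study_cost_cents(whole_packs=True):
    """Estimated cash outlay in cents (hypothetical helper, not from the article)."""
    devices = DEVICE_PRICE_CENTS * DEVICES_NEEDED
    if whole_packs:
        # Assumption: mouthpieces are only sold in complete packs.
        packs = math.ceil(PARTICIPANTS / MOUTHPIECES_PER_PACK)
        mouthpieces = packs * PACK_PRICE_CENTS
    else:
        # Pro-rata price, as if single mouthpieces could be bought.
        mouthpieces = PARTICIPANTS * PACK_PRICE_CENTS / MOUTHPIECES_PER_PACK
    return devices + mouthpieces


total = breathalyser_study_cost_cents()
print(f"Estimated outlay: EUR {total / 100:.2f}")  # prints "Estimated outlay: EUR 1505.60"
```

Either way of counting the mouthpieces lands within a few euros of €1500, before any drinks, travel, or other expenses.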

This has been quite the saga

It is almost exactly two years to the day since we started to put together an extensive analysis (over 15,000 words) focused on 10 sole-authored articles by Dr. Guéguen, which we then sent to the French Psychological Society (SFP). The SFP’s research department agreed that we had identified a number of issues that required an answer and asked Dr. Guéguen for his comments. Neither they nor we have received any coherent response in the interim, even though it would take just a few minutes to produce any of the following: (a) the names and contact details of any of the confederates, (b) the field notes that were made during data collection, (c) the e-mails that were presumably sent to coordinate the field work, (d) administrative details such as insurance for the confederates and reimbursement of expenses, (e) minutes of ethics committee meetings, etc.

At one point Dr. Guéguen claimed that he was too busy looking after a sick relative to provide a response, circumstances which did not prevent him from publishing a steady stream of further articles in the meantime.  In the autumn of 2016, he sent the SFP a physical file (about 500 sheets of A4 paper) containing 25 reports of field experiments that had been conducted by his undergraduates, none of which had any relevance to the questions that we had asked.  In the summer of 2017, Dr. Guéguen finally provided the SFP with a series of half-hearted responses to our questions, but these systematically failed to address any of the specific issues that we had raised.  For example, in answer to our questions about funding, Dr. Guéguen seemed to suggest that his student confederates either pay all of their out-of-pocket expenses themselves, or otherwise regularly improvise solutions to avoid incurring those expenses, such as by having a friend who works at each of the nightclubs that they visit and who can get them in for free.

We want to offer our thanks here to the officials at the SFP who spent 18 months attempting to get Dr. Guéguen to accept his responsibilities as a scientist and respond to our requests for information. They have indicated to us that there is nothing more that they can do in their role as intermediary, so we have decided to bring these issues to the attention of the broader scientific community.

Hence, this post should be regarded as a reiteration of our request for Dr. Guéguen to provide concrete answers to the questions that we have raised. It should be very easy to provide at least some evidence to back up his remarkable claims, and to explain how he was able to conduct such a huge volume of research with no apparent funding, using confederates who worked for hours or days on end with no reward, and obtain remarkable effect sizes from generally minor social priming or related interventions, while committing so many statistical errors and reporting so many improbable results.

Further reading

We have made a copy of the current state of our analysis of 10 articles by Dr. Guéguen available here, along with his replies (which are written in French).  For completeness, that folder also includes the original version of our analysis that we sent to the SFP in late 2015, since that is the version to which Dr. Guéguen eventually replied.  The differences between the versions are minor, but they include the removal of one or two points where we no longer believe that our original analysis made a particularly strong case. We also now have a better understanding of how to report some of the issues that we raise, partly because we have more experience with the application of tools such as GRIM and SPRITE.

Despite its length (around 50 pages), we hope that interested readers will find our analysis to be a reasonably digestible introduction to the problems with this research.  Most of the original journal articles are behind paywalls, but none are so obscure that they cannot be obtained from standard University subscriptions.

Nick Brown
James Heathers

08 December 2017

If you're trying to contact me: My e-mail address got hacked

My e-mail address, nick.brown@free.fr, got hacked earlier this week.  The hosting company "suspended" the account, which doesn't just mean I can't access it or send mails from it; it also means that if you send me a mail you will get a message that the user doesn't exist.

Their procedure for dealing with this is fairly... amazing.  If you read French, it's described here.  I had to send them an e-mail explaining what might have been the reason why I got hacked (virus or trojan on my PC, a password that wasn't long enough, reusing the e-mail/password combination as the login details for a site that itself got hacked, etc).  Despite their disclaimer ("Il ne s'agit pas ici de distribuer les bons points et les réprimandes", i.e., "The point here is not to hand out gold stars and reprimands"), it seems pretty clear that the point of the exercise is to cause people sufficient annoyance that they take more care in future, a bit like a mildly sadistic schoolteacher forcing a student to write 500 words on "why I will not forget to bring my gym clothes in future".  I posted about this in a French online forum and discovered that several other people have been victims of this too. My hosting company is screwing me over 1000 times worse than the hackers.

The account was suspended on Wednesday evening (6 December 2017 around 17:00 UTC, which is now more than 48 hours ago) and I sent the required e-mail straight away, but I haven't heard anything since. The technical support line is always busy, and in any case I don't know if they provide support for e-mail. The address to which I sent my "explanation" was abuse@theirdomain, so it is presumably in the hands of the e-mail server managers.

The problem, of course, is that for many people, losing access to their e-mail has the potential to be economically disastrous.  Yes, we can all do things more securely, but the hackers only sent out a few pieces of spam; the real damage is being done by the company trying to teach me a lesson.  And I have a certain amount of computer knowledge.  How J. Random Customer, who just uses e-mail, is meant to respond to that list of points is beyond me.

I don't know how long this will take to sort out. For all I know it could be forever, since the suspension is presumably triggered by an algorithm and I don't know if anyone is there to read mails sent to abuse@theirdomain.  This will be rather boring since I have about 30,000 e-mails in there - pretty much everything I've done for the last five or more years.

As a result of this, I'm starting to move everything over to a new Gmail address, "nicholasjlbrown".  This will take a while; I estimate that I have over 150 accounts with various sites out there that use my e-mail address either as the username or the contact address or both.  So if I ever lose the password to those I will be stuck; plus, if those sites send me a mail and it bounces, they might have a policy of deactivating the account.  So I'm going to have a very boring weekend updating logins (and discovering which sites didn't [yet] bother to implement a mechanism to change your e-mail address; to my surprise, PubPeer is in this category).

If you have been expecting to hear back from me, I might no longer have your address.  This applies in particular to people who have written to me in the last few weeks about things that have come up on this blog, so this post is to apologise in advance and invite you to recontact me at my new Gmail address.


04 December 2017

More problematic sexual attraction research, this time with high heels

Another post about some strange issues in the work of Dr. Nicolas Guéguen. Today's article is:

Guéguen, N. (2015).  High heels increase women's attractiveness. Archives of Sexual Behavior, 44, 2227–2235. http://dx.doi.org/10.1007/s10508-014-0422-z

There are four studies reported in this article; I want to concentrate on Study 4, although as you will see if you read the whole thing, there are plenty of questions one could ask about the other studies as well.

Brief summary of the study
Participants were male customers in bars. The author's hypothesis was that men would be quicker to approach a woman drinking on her own in a bar if she was wearing shoes with high (versus medium or flat) heels.  A female confederate was instructed to sit on her own "at a free table near the bar where single men usually stand" (p. 2231).  She was identically dressed in all conditions apart from the size of her heels, and she was told to "cross her legs on one side so that people around could clearly view her shoes" (p. 2231). Meanwhile, two male observers seated nearby timed how long it took before a man approached the female confederate.  When this happened, she told the man that her friend was expected to arrive shortly, and one of the observers then "arrived" to meet her, thus ending the interaction with the participant.  If no contact was made within 30 minutes, the confederates were instructed to leave the bar.

The results showed that the mean time before a male customer of the bar approached the woman was lower when her heels were higher.  This difference was statistically significant for high heels (versus medium or flat heels), but not for medium heels (versus flat heels).  Although it was not reported whether contact was made in every case, the degrees of freedom of the reported ANOVA imply that it was, even when the woman was wearing flat shoes.

There are a few readily apparent problems with this study.

1. The research design is inefficient and implausible
This study seems to be a very inefficient way of gathering data. You need three young volunteers (it's not exactly clear why two male confederates were necessary rather than just one) to give up their Wednesday and Saturday evenings for six straight weeks. They have to visit three bars each time, and no mention is made of funding to pay for the drinks that they would presumably need to buy in order to maintain their credibility as ordinary customers.  As soon as contact is made between a participant and the female confederate, data collection ends.  The three confederates leave the bar and walk to the next, taking care to spend half an hour on the walk so they don't arrive too early for the next session.  (Or maybe they drive and spend 26 minutes chatting in the car.  Sounds like fun.)  And after all this, you get a maximum of three data points in an entire evening.

Even the choice of "time taken before someone approaches the female confederate" as the dependent variable seems strange.  Let's imagine for a moment that you are the kind of man who goes to bars in the hope of meeting attractive single women. Today is your lucky day; one such individual has just come into the bar and sat down on her own, close to where you hang out with your fellow bachelor drinkers. She is wearing "a skirt and an off the shoulder tight fitting top" (p. 2231). You have to decide whether or not to approach her (presumably before anyone else does, if I may be allowed to show off my limited knowledge of what one might call "folk evolutionary psychology" for a moment).  The apparent claim of the study is that the degree of sexual availability conveyed only by the height of the woman's heels will affect, not whether you ultimately decide to approach her or not, but how long you will hesitate before doing so.  I don't find this very convincing.  What else are all of the single males in the bar (the number of whom, incidentally, is not reported anywhere in the article) thinking about during that time?  Whether they can get a "better deal" if her identical twin appears at the next table wearing slightly higher heels?  See also point 4, below.

2. Repeated use of the same bars
The study took place in each of three different bars on twelve different nights (Wednesdays and Saturdays).  The same female confederate thus made twelve visits to each bar, in each case sitting on her own at a table "near the bar where single men usually stand" (p. 2231).  You might imagine that the staff or the regular customers of the bar might notice what was going on, as a different man each evening attempted to make contact with the same female confederate who was always identically dressed (apart from her heels) and sitting in an area of the bar where one might not expect a woman who was waiting for her boyfriend to feel comfortable, only to be told that her friend would be arriving shortly (which, indeed, transpired every time).  But the article describes no precautions that might have been taken to deal with this issue, which has obvious implications for the validity of the study.  After a few visits, the regular customers might have started taking bets among themselves as to who was going to try his luck this evening (perhaps trying not to giggle as he introduced himself with "Hello, I’ve never seen you here before"), only for the woman's boyfriend to show up immediately afterwards.  Even if the staff were aware of the experiment, it would seem to be hard to take into account the possible range of behaviours of young single men in a bar, especially just before midnight on a Saturday evening.

3. The effect size is huge
Remember that the only difference between the conditions was the height of the woman's heels, which, even with her legs crossed as described in the article, were probably not going to be something that many people --- even single men on the lookout for some action --- would necessarily even notice.  Yet, Cohen's (1988, pp. 274–277) formula gives an effect size (f) of 0.67 for the numbers in Table 4 of Guéguen's article, which corresponds (for k=3 groups) to 1.64 in the more familiar terms of Cohen's d.  Such effect sizes are very rare in psychological studies, and indeed in real life (James will be covering this in his next post).  It seems highly implausible that a manipulation of  this kind could have such an effect.
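As a rough cross-check on those two numbers, here is a sketch that back-derives f from an F statistic instead of from the means and SDs in Table 4 (which are not reproduced in this post). It assumes three equal groups of 12 and an F of about 8.1, the middle of the corrected range given in point 5 below. The conversion to d assumes the two extreme means are d standard deviations apart with the third at the midpoint; with k = 3, equally spaced means give the same result. The function names are illustrative.

```python
import math


def cohens_f_from_anova_f(f_stat, k, n_per_group):
    """Cohen's f for a one-way ANOVA with k equal-sized groups.

    f is the SD of the group means (dividing by k) over the pooled
    within-group SD, which gives F = n * k * f**2 / (k - 1).
    """
    return math.sqrt(f_stat * (k - 1) / (k * n_per_group))


def cohens_d_from_f(f, k):
    """Standardised range d implied by f when one mean sits at each
    extreme and any remaining means sit at the midpoint: f = d / sqrt(2k)."""
    return f * math.sqrt(2 * k)


f = cohens_f_from_anova_f(8.1, k=3, n_per_group=12)  # 8.1 is an assumed mid-range F
d = cohens_d_from_f(f, k=3)
print(round(f, 2), round(d, 2))  # prints "0.67 1.64"
```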

4. The pattern of behaviour by the men is very strange
Despite my advanced age, my personal lifetime experience of hanging around in bars waiting to hit on single women is exactly zero.  However, it seems to me that for individuals who list that particular activity as one of their hobbies, time is probably of the essence.  If you're going to start talking to a girl who has just sat down and crossed her legs so you can see how high her heels are, you probably want to do it fairly quickly, if only to stake your claim before any of your buddies does.

So what would we expect the distribution of the waiting-time-until-contact to look like?  I don't think we can apply something like queuing theory here since the behaviour of the men probably can't be assumed to be random, but I'm guessing it's likely to look like some kind of Poisson or negative binomial distribution, with a lot of guys trying their luck in the first few minutes, resulting in a big right skew.

So I decided to simulate some data.  For each condition, I generated 12-item samples from a uniform distribution, with a minimum of 0 minutes and a maximum that I determined with some preliminary testing to be the largest possible time that could give the mean and SD reported in the article, plus or minus 0.05 in each case.  I ran this simulation until I had 400 samples for each condition, which required about 250 million iterations per condition.  Then I plotted the simulated amounts of time to make contact, to the nearest minute, from those samples:



Given that the high heels were meant to be especially irresistible, you might expect a certain number of contacts to have been made within the first minute in that condition.  But you can see from the plot that in the high heels condition (blue bars), no values below 2 minutes were returned by my simulation.  In fact, when I forced one of the 12 values in the sample to be 30 seconds, I didn't find a single valid sample in 100 million iterations in the high heels condition.  When I set the minimum to 1 minute, I found three valid samples, but they all looked weird: the value of 1 meant that the other values were all very close to 8 minutes (i.e., when the woman was wearing high heels, if one man approached her after a minute, the other 11 would all have had to approach her after 8 minutes, plus or minus a few seconds).
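The kind of brute-force search described above can be sketched roughly as follows. This is an illustration of the idea only, not the code that produced the plot: the function name, parameters, and the toy numbers in the demonstration are all assumptions, and the means from the article are not reproduced here. With 12 items and a tolerance of 0.05 the acceptance rate is tiny, which is why the real runs needed hundreds of millions of iterations.

```python
import random
import statistics


def simulate_matching_samples(target_mean, target_sd, upper, n=12,
                              tol=0.05, wanted=400, max_iter=1_000_000,
                              seed=None):
    """Draw n-item samples from Uniform(0, upper) and keep those whose
    mean and sample SD both fall within tol of the targets."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_iter):
        sample = [rng.uniform(0.0, upper) for _ in range(n)]
        if (abs(statistics.fmean(sample) - target_mean) <= tol
                and abs(statistics.stdev(sample) - target_sd) <= tol):
            found.append(sample)
            if len(found) == wanted:
                break
    return found


# Toy demonstration with deliberately loose, made-up settings so that it
# finishes quickly; these are NOT the values reported in the article.
demo = simulate_matching_samples(target_mean=5.0, target_sd=2.0, upper=10.0,
                                 n=3, tol=0.5, wanted=5, seed=1)
print(len(demo), "matching toy samples found")
```

Rounding each retained value to the nearest minute and tallying the results per condition would then give a histogram of the kind described above.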

You can also see in the above plot that the aggregate of the simulated values in each condition is nicely normally distributed.  The most highly skewed 12-item samples were not in fact very badly skewed at all; for example, here is the most right-skewed sample out of 400 in the high heels condition:



So even here, we can see that these single men are taking a certain amount of time before talking to the woman, even though their tongues are apparently all hanging out at the height of her heels.  The limiting factor here is that the standard deviations (4.87, 3.67, and 2.18 minutes, for the flat, medium, and high heels conditions, respectively) are too small, relative to the range of values allowed (0 to 30), to allow any of the 12 responses to be very far from the others (or, if one value is a little bit further away, this requires all of the others to bunch up).  As we saw in James's post about dead plants and global warming, the subjects in this study all appear to be intensely moderate in their behaviour; the manipulation (increasing the size of the woman's heels) simply reduces the diversity of that moderation somewhat.
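The constraint that the SDs impose can also be put in numbers without any simulation. The sketch below uses Samuelson's inequality, a standard result that is not part of the analysis described in this post: in a sample of size n with sample standard deviation s (n - 1 denominator), no single value can lie more than s(n - 1)/√n from the sample mean. The SDs are the ones quoted above; the function name is illustrative, and rounding of the reported SDs is ignored.

```python
import math


def max_deviation_from_mean(sd, n):
    """Samuelson's inequality: the largest possible distance between any
    single observation and the sample mean, given the sample SD and n."""
    return sd * (n - 1) / math.sqrt(n)


for label, sd in [("flat", 4.87), ("medium", 3.67), ("high", 2.18)]:
    bound = max_deviation_from_mean(sd, 12)
    print(f"{label:>6} heels: no value more than {bound:.2f} minutes from the mean")
```

For the high heels condition the bound is a little under 7 minutes, and it is only reached when the other 11 values are all identical, which matches the bunching seen in the simulated samples.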

5. The reported statistics are incorrect
Readers who are familiar with some recent corrections of work from the Cornell Food and Brand Lab may have been anticipating this problem: the reported F statistic (7.18 with 2 and 33 degrees of freedom) is incorrect.  With the given means and SDs, the correct F statistic should be between 8.06 and 8.16, depending on rounding.  This does not change the statistical significance of the reported result, but it makes one wonder what numbers were run in order to produce the incorrect F statistic, and where those numbers came from. (Just as an aside, the standard deviations appear to be substantially different between the groups, but no indication is given in the article about whether the standard ANOVA checks for homogeneity of variance were made; however, given the context, perhaps asking for this is like criticising Donald Trump for not having his tie straight.)
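For anyone who wants to repeat this kind of check, a one-way ANOVA F statistic can be recomputed from nothing more than the per-group means, SDs, and sample sizes. The sketch below shows the standard calculation; the function name is illustrative, and the demonstration uses made-up toy numbers (three groups of 12, to match the design), not the values from Table 4, which are not reproduced in this post.

```python
def anova_f_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from per-group summary statistics.

    Returns (F, df_between, df_within). The SDs are assumed to be
    sample SDs (n - 1 denominator).
    """
    k = len(means)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy numbers only: means 1, 2, 3 with SD 1 in each group of 12.
f_stat, df1, df2 = anova_f_from_summary([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [12, 12, 12])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # prints "F(2, 33) = 12.00"
```

Because the reported means and SDs are rounded to two decimal places, rerunning the function with each input nudged by up to 0.005 in either direction is one way to obtain a range of possible F values like the one quoted above.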

Conclusion
The report of this study sounds like it is describing a thought experiment for an undergraduate methods class (in a world where nobody is too concerned about crass sexist stereotypes), rather than the results of a field experiment carried out under real-world conditions. The premise is based on a pastiche of evolutionary psychology (skeptics of this subfield can fill in their own joke here), the scenario is a minefield of strange decisions, the effect size is absurdly large, the implied behaviour patterns of the participants are weird, and the statistics haven't been reported correctly. Yet, this article was the subject of uncritical pieces in Huffington Post (under the headline "High Heels Increase A Woman's Attractiveness, And For Once It's Not A Bogus Survey"), the Boston Globe, and Psychology Today (twice). It seems that there is quite a market for sexist junk science out there.