Earlier this week, I attended the BBSRC Eastbio Annual Symposium, a meeting for PhD students funded by the BBSRC’s doctoral training programme. The theme of this year’s meeting was ‘Making an Impact’. Alongside one or two talks on the REF impact of ‘spinning out’ scientific businesses that I found utterly, soul-crushingly devoid of anything honourable, there were a number of great talks on the value of public engagement.

Of these, the talk I enjoyed most was given by Dr Jan Barfoot of EuroStemCell, who spoke about the huge number of ways in which researchers can engage the public in their work. Amidst the extraverted commotion about bright clubs and elevator pitches that had permeated the rest of the symposium, I took comfort from Jan’s acknowledgement of the other ways in which people like me might want to communicate with those who might find their work interesting. As has been explored in great depth in Susan Cain’s book Quiet, there are quite a few people, 33-50% of the population, who find that the idea of networking, public speaking and generally ‘putting yourself out there’ lies somewhere on the continuum from unpleasant to terrifying. This minority of introverts is well represented in academia, though sadly for me, not well represented enough to have done away with oral presentations, conference socials and the idea that there’s something wrong with you if you don’t enjoy talking about your work to people who might not give a shit.

Jan’s talk got me thinking about the public engagement I do. In doing so I realised that the majority of the activities I’ve been involved with since my arrival at St Andrews have been written (have I mentioned that I don’t enjoy giving talks?). Writing these public engagement pieces has almost always been made much easier by my experience of writing posts for this blog (maybe some of the more popular blog posts I have written even count as public engagement). I generally write these posts with a scientifically clued-up, but non-specialist audience in mind – PhD students, researchers in other fields, interested members of the public – most of whom I expect will have stumbled across this site via Google. As I’ve practiced writing for this audience a fair amount, I find it relatively easy to switch into this mode when asked to write bits for the St Andrews Campaign magazine (see below) or the Development blog. As lame as it sounds to those who crave the rush of applause and laughter, blogging is my bright club.

St Andrews Campaign Magazine: University Sport

Of course it takes time and commitment to keep it up (no-one thinks much of a one-post “Hello World” blog) but I didn’t say it was easier than other forms of public engagement. It’s just a better format for me. Considering the investment of time it requires alongside the real benefits it can have, it’s a shame when other researchers dismiss blogging as less meaningful than the engagement work they do, something which happens all too often. Consequently, I always feel guilty when writing posts for it during work hours. Why should this be the case? I wouldn’t feel bad about practicing a public engagement talk or meeting a community of patients for whom my research is relevant, so why the self-flagellation over writing? No doubt, this perception will be further reinforced when REF Impact Statements are circulated to departments across the UK, with blogging written up as just something that all academics in all research groups do, probably under the misapprehension that a PURE research profile like this counts as a blog. This does a real disservice to those whose blogs often act as a first source of information for people googling something they’ve just heard about on the news, or whose blogs help raise and maintain the profiles of the universities at which they work.

If you want to blog, do it. You’ll write better and one way or another you’ll probably get asked to write in a more formal capacity for the organisation you work for. Just don’t expect to be promoted, or even appreciated, because of it.

I spent most of this summer in St Andrews writing research papers. This prolonged period of writing gave me time to think about the publication of my own work along lines I hadn’t fully explored before. I was able to consider not only the quality of science typically reported in the journal to which I was considering submission, but also that journal’s publication model. For the first time in my career I felt it not only desirable, but also sensible, to submit to open access journals. It’s not that I haven’t wanted to publish in open access journals before, it’s just that there have been too many things stopping me from breaking with the traditional journals.

Open access image via salfordpgrs on flickr: http://www.flickr.com/photos/salfordpgrs/

So what changed? Of course, the traditional publishing houses have had a lot of bad press. Their support of the ultimately unsuccessful Research Works Act, their soaring profits and the unsavoury business practice of journal bundling all demonstrate how committed traditional publishing houses are to making money, not increasing access to research. More personally, PubMed’s RSS feeds for custom search terms (informing me as soon as something related to ‘recognition memory’ is published), Twitter’s water-cooler paper retweets and Google Scholar’s pdf indexing mean that I usually learn about, and can get access to, articles I am interested in without needing to know where they are published. Over the past months and years, subscription-model journals have started to feel old-fashioned, maybe even wilfully so. It’s now the case that if university scientists are interested in my research, they will find it regardless of where it is published. If the public are interested in my research, whether or not they will be able to read it depends entirely on how it is published.

That said, Google Scholar would be useless as a source of papers if researchers and universities didn’t make pdfs available for it to index. It’s here that the work university libraries do to promote open access is crucial. At St Andrews, we use the PURE system, which makes green open access – uploading author versions, or publisher versions usually after an embargo period – straightforward. Beyond this though, the Open Access Team frequently encourage us to provide this sort of green open access. For example, earlier this week one of the open access librarians tweeted me to tell me that I was entitled to upload the final version of a paper I had been holding off on uploading. In doing this, they cultivate an environment in which providing open access is seen as a responsibility we have to those who might want to read our research.

While green open access can work, it requires efficient management. The St Andrews Open Access Team seem to have a million publisher-specific checks they have to make before they will allow a pdf to go into the Research@StAndrews:FullText repository. Surely gold open access – publishing in journals whose business model doesn’t involve restricting access to their outputs – would make things much easier. The one problem with gold open access, even from the point of view of a researcher who wants more than anything to publish in this way, is that it is expensive, really expensive. A paper in PLOS ONE costs $1,350; Frontiers, €1,600; and SpringerPlus, £725 (though this all may change with PeerJ’s author subscription model). Of course it makes sense: a journal that doesn’t charge subscription fees needs to recoup its costs by charging to publish. And here’s where we run into major barriers to the uptake of gold open access. First, gold open access publishers are asking universities to spend money to publish their own researchers’ work when they’re already spending an eye-watering amount on accessing work that the same researchers have previously published. Second, and maybe more importantly in terms of journal submission choices, gold open access publishers are now forcing researchers, not their university libraries, to face up to the costs of publication.

That I, not the head of my library, must think about how to fund the journals that publish my research goes against the traditional subscription model of academic publishing. Moreover, this financial division, and the problem it poses to open access journals, almost certainly exists at every single university in the UK. In an ideal world, I would be able to dip into the library’s subscription budget every time I publish in a gold open access journal. If all researchers knew that their submission to Frontiers in Psychology wasn’t jeopardising their travel to next year’s conference in San Diego, gold open access would be set. It’s only when universities recognise that gold open access publishing payments should come from the same pot as journal subscription payments that open access publishing will take off.

And so to why I was able to consider submitting to a gold open access journal. The St Andrews Library Open Access Team have a fund specifically for gold open access publishing. A cheeky Twitter request as to whether they would support my submission to an open access journal was all it took to get the thumbs up. Together with the green open access resource at Research@StAndrews:FullText and the maintenance of existing closed access journal subscriptions (for now), the gold open access fund provides the full range of publication options for St Andrews researchers. It’s a comprehensive approach to open access that makes me proud to work here.

The process of rotating data in Excel, such that rows become columns and columns become rows, is pretty straightforward. Copy, and then right-click on the destination and select the ‘transpose cells’ paste option.

transpose
Right-clicking to transpose data.

Things get a little more complicated if you want to transpose a series of cell references or formulae, e.g. “=A14” or “=NORMSINV(A14)-NORMSINV(1-B14)”. If you don’t have all your cell references in absolute format, Excel will get the transposition all wrong. One way of getting round this is to find and replace (CTRL-H) all the = signs in the array you want to transpose with a symbol that Excel finds meaningless, like #. You can then copy and paste-transpose your # cell references, and once you find and replace the #s with =s (in both your original and transposed arrays), you’ll have achieved the transposition you’re after.
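If you find yourself doing this a lot, it can be less fiddly to script the transposition. Below is a minimal Python sketch using openpyxl (the file name, sheet names and cell range are placeholder assumptions). Because openpyxl reads and writes formulas as plain text, references are copied verbatim rather than ‘helpfully’ adjusted, which is exactly what the =/# trick achieves inside Excel.

```python
# Minimal sketch: transpose a block of cells, formulas included, with openpyxl
# (pip install openpyxl). File, sheet names and range are placeholders.
from openpyxl import load_workbook

wb = load_workbook("data.xlsx")   # default data_only=False keeps formulas as text
src = wb["Sheet1"]
dst = wb.create_sheet("Transposed")

for r, row in enumerate(src.iter_rows(min_row=1, max_row=10,
                                      min_col=1, max_col=2), start=1):
    for c, cell in enumerate(row, start=1):
        # Formula strings like "=A14" are copied verbatim (row/column swapped),
        # so no cell references get reinterpreted along the way.
        dst.cell(row=c, column=r, value=cell.value)

wb.save("data_transposed.xlsx")
```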

In A, the transposed cell reference gets messed up (Excel transposes the direction of the reference during the transposition). In B, CTRL-H has been used to find and replace all =s with #s. The cell references look to have transposed correctly. The correct transposition is confirmed in C once CTRL-H has been used to re-replace all the #s with =s.

The blog is making a relatively straightforward transition from wordpress.com to a self-hosted wordpress.org installation.

As the akiraoconnor.org domain is coming with me, there has been very little disruption to Google indexing, which sends most traffic to the site. However, RSS feeds and WordPress.com’s Follow function on this akiraoconnor.wordpress.com website will no longer point to anything that will be updated in future.

RSS – I am switching to the following RSS feed for the .org installation: http://feeds.feedburner.com/OCML. This should be working right now.

Email subscription – I won’t be coming up with an immediate replacement for WordPress.com Follow. I’m sorry if this was how you stayed up to date with the blog.

A group of second-year students asked me to contribute a ‘Real World Stats’ piece to their new psychology publication, MAZE. I reworked a section from one of my most popular statistics lectures, on probability theory and roulette. Below is the article in full.

Roulette is a straightforward casino game. While the wheel is spinning, a ball is released. This ball eventually ends up stopping in a numbered (1-36) and coloured (red or black) pocket. You bet on the final resting place of the ball by selecting a number, a range of numbers or a colour. When betting on colours, if you pick correctly, you double your money. A £20 stake on black would get you £40 back if the ball landed in a black pocket, and nothing back if it landed in a red one.

A roulette wheel. (Photo credit: Wikipedia)

A few years ago, I received quite a few spam e-mails with the following tip on how to win at roulette.

> So I found a way you can win everytime:
> bet $1 on black if it goes black you win $1
> now again bet $1 on black, if it goes red bet $3 on black, if it goes red again bet $8
> on black, if red again bet $20 on black, red again bet $52 on black (always multiple
> you previous lost bet around 2.5) if now is black you win $52 so you have $104 and you bet:
> $1 + $3 + $8 + $20 + $52 = $84 So you just won $20 :)
> now when you won you start with $1 on blacks again etc etc. its always
> bound to go black eventually (it’s 50/50) so that way you eventually always win.

If you ignore the atrocious spelling and grammar, the basic idea seems to be a good one. In fact, it’s an established betting strategy known as the Martingale system. Under this system, you double losing bets until you win; that way, you always win an amount equivalent to your first stake. If we build a probability tree for a gambler who only bets on black and provide her with a fairly standard outcome, two losses followed by one win, you’ll see how this is meant to work.

An unsurprising outcome under the Martingale system. Bets on black are unsuccessful twice, before paying off the third time.

Over three bets, she has spent £7, but won £8. Not too shabby. She just needs to do this over and over until she has won an amount she’s happy with. Fool-proof, right?

Not quite. Casinos have stayed in business over centuries for a reason: they know how to work probabilities. One of their standard strategies is to have minimum and maximum stake limits, with a typical range of £10-£1000. These limits expose a huge flaw in our spam-based strategy.

Imagine you’re trying the Martingale strategy and you go on a losing streak. £10, £20, £40, £80, £160, £320 and £640 all go on losing bets and all of a sudden you’re down £1270. Here’s where you come up against the casino’s maximum bet policy. You can’t place a £1280 bet to recoup your losses. But how likely is losing 7 bets in a row?

Minimum bet: £10. Maximum bet: £1,000. It takes seven straight losses to break the Martingale system.

Not very likely at all, if you’re only trying to win £10. According to the multiplication rule for independent events, the exact odds are (1/2)^7, which is equal to .0078. Put another way, the probability of this happening is 1 in 128.
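If you’d rather not take the algebra on trust, a few lines of Python will confirm it by brute force (a quick sketch; the trial count is arbitrary):

```python
import random

# Estimate the probability of seven straight losses on a fair 50/50 bet.
# Should come out near 1/128 = .0078.
trials = 1_000_000
streaks = sum(
    all(random.random() < 0.5 for _ in range(7))  # True if all seven spins lose
    for _ in range(trials)
)
print(streaks / trials)
```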

But problems arise when you try to make more than £10. To understand the next set of calculations, we need to reverse the probability of losing and think about how likely it is that we will win £10 each time we try. Using the addition rule for mutually exclusive events, we can calculate that the probability of winning £10 is equal to the probability of not losing:

p(win) + p(lose) = 1

p(win) = 1 – p(lose)

p(win) = 1 – 1/128 = 127/128 = .9922

We can now work out the probabilities of making various amounts of profit, once again using the multiplication rule:

£20 profit    = (127/128)^2   = .9844
£100 profit   = (127/128)^10  = .9246
£200 profit   = (127/128)^20  = .8548

But here’s the kicker. If you want to double the money you bring to the casino to place these bets, you’re looking at close to a 2 in 3 chance that you will lose everything.

£1270 profit  = (127/128)^127 = .3693
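These figures are nothing more than repeated applications of the multiplication rule, which makes them easy to verify (a quick sketch):

```python
# One £10 Martingale cycle succeeds with probability 127/128, so n successful
# cycles in a row (£10n of profit) occur with probability (127/128)^n.
p_cycle = 127 / 128
for n_cycles, profit in [(2, 20), (10, 100), (20, 200), (127, 1270)]:
    print(f"£{profit} profit: {p_cycle ** n_cycles:.4f}")
```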

“I’m not greedy!” I hear you cry. “I’d just want to go home with a little more than if I had invested the money and not had any fun at all.” Let’s say you wanted to take home a little more than 6%, the best savings interest rate you can currently find on moneysupermarket.com (as of when this article was written). How much would you need to win?

£1270 x .06 = £76.20

You would need to win 8 times in a row to go home with a little more than a 6% interest rate. And what are the odds of this happening?

£80 profit    = (127/128)^8   = .9392

Put another way, 15 out of 16 times you will exceed a savings account interest rate. You will enter the casino with £1270 and leave with £1350. But 1 in 16 times, you will leave the casino with nothing. Not even enough to get the 99 bus back over the Tay. Sadly, this sort of thing is all too common, especially when people are new to gambling and think they have found a way of beating the system: e.g. http://casinogambling.about.com/od/moneymanagement/a/martingale.htm.

Even if you find a casino with no maximum bet, you need huge financial resources to make the system work. It all starts to seem even more hopeless when you factor in something I neglected to mention at the start: your odds of winning are actually worse than 50%. If the ball lands on 0, the casino takes all the money.
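To see how the zero and the table limits gang up on you, here is a rough Monte Carlo sketch (the bankroll, limits and trial count are assumptions for illustration):

```python
import random

P_BLACK = 18 / 37   # 18 black pockets out of 37: the zero makes black worse than 50/50
MIN_BET, MAX_BET = 10, 1000
BANKROLL = 1270     # just covers seven doublings from £10

def double_or_bust():
    """One casino visit: Martingale on black until the bankroll doubles or the strategy fails."""
    cash, bet = BANKROLL, MIN_BET
    while 0 < cash < 2 * BANKROLL:
        if bet > min(cash, MAX_BET):
            return False              # can't place the next doubled bet
        cash -= bet
        if random.random() < P_BLACK:
            cash += 2 * bet           # even-money payout
            bet = MIN_BET             # cycle won: back to the minimum stake
        else:
            bet *= 2                  # cycle lost: double and go again
    return cash >= 2 * BANKROLL

trials = 100_000
wins = sum(double_or_bust() for _ in range(trials))
print(f"doubled the bankroll on {wins / trials:.1%} of visits")
```

With the zero included, the chance of doubling your £1270 drops from the 37% calculated above to somewhere nearer 30%.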

The take-home-message? It’s probably best to ignore financial advice you read in your spam folder.

EDIT (17/1/2014): A link to moneysupermarket.com was removed following receipt of an email from a moneysupermarket.com employee requesting that I comply with their request of “removing or adding a nofollow attribute to the links to our MoneySuperMarket.com website”. A quick search to find out why reveals https://groups.google.com/forum/#!topic/FleetStreet/HDVHXFwQFdI, which suggests that this is all about SEO, such that “some [links] may look un-natural or paid for in the eyes of Google. This unfortunately means that we have to take down a large number of our links, some of which were genuine and of use/interest to users of the sites on which they were posted”. The suggestion on that last link is that moneysupermarket.com might be doing this because they have previously been penalised for paying for links to their site and are now doing what they can to stop this perception. I don’t know what authority they have to enforce removal of links like this (none, I suspect), but I don’t really care enough to kick up a fuss… link removed.

A recent submission to (and rejection from) Psychological Science has provided me with enough information on the editorial process, via Manuscript Central, to blog a follow-up to my Elsevier Editorial System blog of 2011. (I’m not the only person who is making public their manuscript statuses either, see also Guanyang Zhang’s original and most recent posts.)

Psychological Science Decision

Below is the chronology of the status updates a submission from my lab received from Psychological Science. As stated in the confirmation-of-submission letter received from the Editor-in-Chief, the process of obtaining a first decision should take up to 8 weeks from initial submission.

Triage

  • “Awaiting Initial Review Evaluation” – 09/01/2013: The manuscript is submitted and awaits triage, where it is read by two members of the editorial team. An email is sent to the corresponding author from the Editor-in-Chief. The triage process takes up to two weeks and determines whether or not the manuscript will go out for full review.

Full Review

  • “Awaiting Reviewer Selection” – 22/01/2013: An email is sent to the corresponding author from the Editor-in-Chief informing them that the manuscript has passed the triage (initial review) process. The extended review process is stated as lasting 6-8 weeks from receipt of this email.
  • “Awaiting Reviewer Assignment” – 28/01/2013
  • “Awaiting Reviewer Invitation” – 28/01/2013
  • “Awaiting Reviewer Assignment” – 29/01/2013
  • “Awaiting Reviewer Selection” – 29/01/2013: I may have missed some status updates here. Essentially, I think these status updates reflect the Associate Editor inviting reviewers to review the manuscript and the reviewers choosing whether or not to accept the invitation.
  • “Awaiting Reviewer Scores” – 05/02/2013: The reviewers have agreed to review the manuscript and the Manuscript Central review system awaits their reviews.
  • “Awaiting AE Decision” – 15/03/2013: The reviewers have submitted their reviews, which the Associate Editor uses to make a decision about the manuscript.
  • “Decline” – 16/03/2013: An email is sent to the corresponding author from the Associate Editor informing them of the decision and providing feedback from the reviewers.

The whole process took just under ten weeks, so not quite within the 8-week estimate that the initial confirmation-of-submission email suggested.

It’s a shame that I can’t blog the status updates post-acceptance, but the final status update is supposedly what 89% of submissions to Psychological Science will end with. Onwards.

I am heavily reliant on Google Reader for how I keep up with scientific literature.

I have customised RSS feeds for PubMed search terms. I have RSS feeds for journal tables of contents. I access my Reader account on my work computer via the website, on my iPad with the paid version of Feeddler and on my Android with the official Google app. I use IFTTT and a Gmail filter to send everything I star for reading back to my email account so it can all get dealt with at work. It’s not perfect, but it’s efficient, and it has taken me well over five years to arrive at this system.
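For what it’s worth, the raw feed-fetching is the easy part to rebuild; a few lines of Python with the feedparser library will pull down any feed (the URL below is a placeholder for the RSS feed PubMed generates for a saved search):

```python
import feedparser  # pip install feedparser

# Placeholder: substitute the feed URL PubMed gives you for your saved search.
FEED_URL = "http://example.com/pubmed-recognition-memory.rss"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    print(entry.title)
    print(entry.link)
```

It’s the starring, cross-device syncing and read/unread state that Reader handled, and that’s the hard part to replace.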

And now, thanks to Google’s decision to kill Reader, I’m going to have to figure it all out again. That is, if Google Reader’s demise doesn’t kill RSS.

Right now, this video resonates with me.

Finding needle in haystack (Photo credit: Bindaas Madhavi)

A former colleague of mine at an institution I no longer work at has admitted to being a science fraudster.*

I participated in their experiments, I read their papers, I respected their work. I felt a very personal outrage when I heard what they had done with their data. But the revelation went some way to answering questions I ask myself when reading about those who engage in scientific misconduct. What are they like? How would I spot a science fraudster?

Here are the qualities of the fraudster that stick with me.

  • relatively well-dressed.
  • OK (not great, not awful) at presenting their data.
  • doing well (but not spectacularly so) at an early stage of their career.
  • socially awkward but with a somewhat overwhelming projection of self-confidence.

And that’s the problem. I satisfy three of the four criteria above. So do most of my colleagues. If you were to start suspecting every socially awkward academic of fabricating or manipulating their data, that wouldn’t leave you with many people to trust. Conversations with those who worked much more closely with the fraudster reveal more telling signs that something wasn’t right with their approach, but again, the vast majority of the people with similar character flaws don’t fudge their data. It’s only once you formally track every single operation that has been carried out on their original data that you can know for sure whether or not someone has perpetrated scientific misconduct. And that’s exactly how this individual’s misconduct was discovered – an eagle-eyed researcher working with the fraudster noticed some discrepancies in the data after one stage of the workflow. Is it all in the data?

Let’s move beyond the ‘few bad apples’ argument. A more open scientific process (e.g. the inclusion of original data with the journal submission) would have flagged some of the misconduct being perpetrated here, but only after someone had gone to the (considerable) trouble of replicating the analyses in question. Most worryingly, it would also have missed the misconduct that took place at an earlier stage of the workflow. It’s easy to modify original data files, especially if you have coded the script that writes them in the first place. It’s also easy to change ‘Date modified’ and ‘Date created’ timestamps within the data files.

Failed replication would have helped, but the file drawer problem, combined with the pressure on scientists to publish or perish, typically stops this sort of endeavor (though there are notable exceptions, such as the “Replications of Important Results in Cognition” special issue of Frontiers in Cognition). I also worry that the publication process, in its current form, does nothing more constructive than start an unhelpful rumour-mill that never moves beyond gossip and hearsay. The pressure to publish or perish is also cited as motivation for scientists to cook their data. In this fraudster’s case, they weren’t at a stage of their career typically thought of as being under this sort of pressure (though that’s probably a weak argument when applied to anyone without a permanent position). All of which sends us back to trying to spot the fraudster and not the dodgy data. It’s a circular path that’s no more helpful than uncharitable whispers in conference centre corridors.

So how do we identify scientific misconduct? Certainly not with a personality assessment, and only partially with an open science revolution. If someone wants to diddle their data, they will. Like any form of misconduct, if they do it enough, they will probably get caught. Sadly, that’s probably the most reliable way of spotting it. Wait until they become comfortable enough that they get sloppy. It’s just a crying shame it wastes so much of everyone’s time, energy and trust in the meantime.


*I won’t mention their name in this post for two reasons: 1) to minimise collateral damage that this is having on the fraudster’s former collaborators, former institution and their former (I hope) field; and 2) because this must be a horrible time for them, and whatever their reason for the fraud, it’s not going to help them rehabilitate themselves in ANY career if a Google search on their name returns a tonne of condemnation.

If you’re trying to decide on a journal to submit your latest manuscript to, Jane, the Journal/Author Name Estimator, can point you in the right direction. This isn’t exactly breaking news, but it’s worth a reminder.

To use Jane, copy and paste your title and/or abstract into the text box and click “Find journals”. Using a similarity index computed against all Medline-indexed publications from the past 10 years, Jane will spit out a list of journals worth considering. Alongside a confidence score, which summarises your text’s similarity to other manuscripts published in that journal, you’re also provided with a citation-based indication of that journal’s influence within the field.


The other available searches are “Find articles” and “Find authors”, the last of which I suspect I would use if I were an editor with no idea about whom to send an article to for review. As an author, it’s worth running abstracts through these searches too, to make sure you don’t miss any references or authors you definitely ought to cite in your manuscript.

There’s more information on Jane from the Biosemantics Group here: http://biosemantics.org/jane/faq.php.

The Raspberry Pi (photo credit: Wikipedia)

A few months ago, I suggested that Raspberry Pis could be used as barebones experiment presentation machines. Since then I have got my hands on one and tinkered a little, only to be reminded yet again that my inability to do much in either Linux or python is a bit of a problem.

Fortunately, others with more technological nous have been busy exploring the capabilities of the Pi, with some exciting findings. On the Cognitive Science Stack Exchange, user appositive asked “Is the Raspberry Pi capable of operating as a stimulus presentation system for experiments?” and followed up at the end of January with a great answer to their own question, including this paragraph:

> The RPi does not support OpenGL. I approached this system with the idea of using a python environment to create and present experiments. There are two good options for this that I know of, opensesame and psychopy. Psychopy requires an OpenGL python backend (pyglet), so it won’t run on the Rpi. Opensesame gives you the option of using the same backend as PsychoPy uses but has other options, one of which does not rely on openGL (based on pygames). This ‘legacy’ backend works just fine. But the absence of openGL means that graphics rely solely on the 700 mHz CPU, which quickly gets overloaded with any sort of rapidly changing visual stimuli (ie. flowing gabors, video, etc.).

Because of the lack of OpenGL support on the Pi, PsychoPy is out (for now), leaving OpenSesame as the best cog psych-focused python environment for experiment presentation. The current situation seems to be that the Pi is suboptimal for graphics-intensive experiments, though this may improve as hardware acceleration is incorporated to take advantage of the Pi’s beefy graphics hardware. As things stand though, experiments with words and basic picture stimuli should be fine. It’s just a case of getting hold of one and brushing up on python.
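For a flavour of what the non-OpenGL route looks like, here is a minimal pygame sketch (not tested on a Pi; the screen size, words and timings are arbitrary) that flashes words on screen, roughly the kind of undemanding presentation the Pi should manage:

```python
import pygame

# Minimal word-presentation loop using pygame's software rendering (no OpenGL).
pygame.init()
screen = pygame.display.set_mode((800, 600))
font = pygame.font.Font(None, 96)  # pygame's built-in default font

for word in ["APPLE", "HOUSE", "RIVER"]:
    pygame.event.pump()                 # keep the window responsive
    screen.fill((0, 0, 0))
    stim = font.render(word, True, (255, 255, 255))
    screen.blit(stim, stim.get_rect(center=screen.get_rect().center))
    pygame.display.flip()
    pygame.time.wait(1000)              # stimulus on screen for ~1000 ms

    screen.fill((0, 0, 0))              # blank inter-stimulus interval
    pygame.display.flip()
    pygame.time.wait(500)

pygame.quit()
```

pygame.time.wait makes no frame-level timing guarantees, so anything timing-critical would need vsync-locked flips, but it illustrates the approach.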

UPDATE via Comments (1/4/2013) – Sebastiaan Mathôt has published some nice Raspberry Pi graphics benchmarking data, which are well worth a look if you’re interested: http://www.cogsci.nl/blog/miscellaneous/216-running-psychological-experiments-on-a-raspberry-pi-with-opensesame