The process of rotating data in Excel, so that rows become columns and columns become rows, is pretty straightforward. Copy the cells, then right-click on the destination and select the ‘Transpose’ paste option.
Things get a little more complicated if you want to transpose a series of cell references or formulae, e.g. “=A14” or “=NORMSINV(A14)-NORMSINV(1-B14)”. Unless all your cell references are absolute, Excel will shift the relative references as it pastes and get the transposition all wrong. One way round this is to use find and replace (Ctrl+H) to swap every = sign in the array you want to transpose for a symbol that Excel won’t read as the start of a formula, such as #. You can then copy and paste-transpose your #-prefixed text. Once you find and replace the #s with =s again (in both your original and transposed arrays), you’ll have the transposition you’re after.
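The three steps of the trick can also be modelled outside Excel. The Python below is only a minimal sketch, not anything Excel itself runs. Each cell is a plain string, and the hypothetical helpers `find_replace` and `transpose` stand in for Ctrl+H and paste-transpose. Because the #-version is inert text, nothing is reinterpreted on the way through, so the references arrive unchanged.

```python
def find_replace(grid, old, new):
    """Stand-in for Excel's Find & Replace (Ctrl+H) across every cell."""
    return [[cell.replace(old, new) for cell in row] for row in grid]

def transpose(grid):
    """Stand-in for paste-transpose: rows become columns and vice versa."""
    return [list(column) for column in zip(*grid)]

def transpose_formulas(grid):
    # Step 1: '=' -> '#' turns each formula into inert text.
    inert = find_replace(grid, "=", "#")
    # Step 2: transpose the text; no references are adjusted along the way.
    moved = transpose(inert)
    # Step 3: '#' -> '=' turns the text back into formulas.
    return find_replace(moved, "#", "=")

grid = [
    ["=A14", "=NORMSINV(A14)-NORMSINV(1-B14)"],
    ["=A15", "=NORMSINV(A15)-NORMSINV(1-B15)"],
]
for row in transpose_formulas(grid):
    print(row)
# ['=A14', '=A15']
# ['=NORMSINV(A14)-NORMSINV(1-B14)', '=NORMSINV(A15)-NORMSINV(1-B15)']
```

The sketch makes one assumption of the trick explicit. Because every = and every # is swapped, the formulas must not contain any # characters before you start.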
The blog is making a relatively straightforward transition from WordPress.com to a self-hosted WordPress.org installation.
As the akiraoconnor.org domain is coming with me, there has been very little disruption to Google indexing, which sends most of the traffic to the site. However, RSS feeds and WordPress.com’s Follow function on this akiraoconnor.wordpress.com website will no longer point to anything that will be updated in future.
RSS – I am switching to the following RSS feed for the .org installation: http://feeds.feedburner.com/OCML. This should be working right now.
Email subscription – I won’t be coming up with an immediate replacement for WordPress.com Follow. I’m sorry if this was how you stayed up to date with the blog.
A group of second-year students asked me to contribute a ‘Real World Stats’ piece to their new psychology publication, MAZE. I reworked a section from one of my most popular statistics lectures, on probability theory and roulette. Below is the article in full.
Roulette is a straightforward casino game. While the wheel is spinning, a ball is released. This ball eventually comes to rest in a numbered (1-36) and coloured (red or black) pocket. You bet on the final resting place of the ball by selecting a number, range of numbers or a colour. When betting on colours, if you pick correctly, you double your money. A £20 stake on black would get you £40 back if the ball landed in a black pocket, and nothing back if it landed in a red one.
A few years ago, I received quite a few spam e-mails with the following tip on how to win at roulette.
> So I found a way you can win everytime:
> bet $1 on black if it goes black you win $1
> now again bet $1 on black, if it goes red bet $3 on black, if it goes red again bet $8
> on black, if red again bet $20 on black, red again bet $52 on black (always multiple
> you previous lost bet around 2.5) if now is black you win $52 so you have $104 and you bet:
> $1 + $3 + $8 + $20 + $52 = $84 So you just won $20 :)
> now when you won you start with $1 on blacks again etc etc. its always
> bound to go black eventually (it’s 50/50) so that way you eventually always win.
If you ignore the atrocious spelling and grammar, the basic idea seems to be a good one. In fact, it’s an established betting strategy known as the Martingale system. Under this system, you double your stake after every loss until you win. That way, you always win an amount equal to your first stake. If we build a probability tree for a gambler who only bets on black and give her a fairly standard outcome, two losses followed by one win, you’ll see how this is meant to work.
Over three bets (£1, £2 and £4), she has spent £7 but won £8, a profit equal to her first stake. Not too shabby. She just needs to do this over and over until she has won an amount she’s happy with. Fool-proof, right?
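The tree itself hasn’t survived here, but the arithmetic behind it can be retraced. The following is a minimal Python sketch. It assumes a first stake of £1, inferred from the £7 spent over three doubled bets. `martingale_outcome` is a hypothetical helper, not part of any library.

```python
def martingale_outcome(results, first_stake=1):
    """Follow Martingale betting on black through a string of results
    ('L' = loss, 'W' = win). Returns (total staked, total returned)."""
    stake = first_stake
    staked = returned = 0
    for result in results:
        staked += stake
        if result == "W":
            returned += 2 * stake  # even-money win: stake back plus the same again
            stake = first_stake    # start again from the first stake
        else:
            stake *= 2             # double up after a loss
    return staked, returned

staked, returned = martingale_outcome("LLW")
print(staked, returned, returned - staked)  # 7 8 1

# Six losses then a win at £10 minimum stakes: still only the first stake in profit.
staked, returned = martingale_outcome("LLLLLLW", first_stake=10)
print(staked, returned, returned - staked)  # 1270 1280 10
```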
Not quite. Casinos have stayed in business for centuries for a reason: they know how to work probabilities. One of their standard strategies is to set minimum and maximum stakes, with a typical range of £10-£1000. These limits expose a huge flaw in our spam-based strategy.
Imagine you’re trying the Martingale strategy and you go on a losing streak. £10, £20, £40, £80, £160, £320 and £640 all go on losing bets and all of a sudden you’re down £1270. Here’s where you come up against the casino’s maximum bet policy. You can’t place a £1280 bet to recoup your losses. But how likely is losing 7 bets in a row?
Not very likely at all, if you’re only trying to win £10. According to the multiplication rule for independent events, the probability is $(1/2)^7 = 1/128 \approx .0078$. Put another way, it happens about once in every 128 attempts.
But problems arise when you try to make more than £10. To understand the next set of calculations, we need to flip the probability of losing around and think about how likely it is that we will win £10 each time we try. Winning and losing are mutually exclusive, and between them they cover every outcome. So the addition rule tells us that the probability of winning £10 is equal to the probability of not losing:
$$p_{\text{win}} + p_{\text{lose}} = 1$$
$$p_{\text{win}} = 1 - p_{\text{lose}}$$
$$p_{\text{win}} = 1 - \frac{1}{128} = \frac{127}{128} \approx .9922$$
We can now work out the probabilities of making various amounts of profit, once again using the multiplication rule:
$$P(\text{£1270 profit}) = \left(\frac{127}{128}\right)^{127} \approx .3693$$
But here’s the kicker. If you want to double the £1270 you bring to the casino to place these bets, you’re looking at close to a 2 in 3 chance (1 − .3693, or about .63) that you will lose everything.
“I’m not greedy!” I hear you cry. “I’d just want to go home with a little more than if I had invested the money and not had any fun at all.” Let’s say you wanted to take home a little more than 6%, the best savings interest rate you could find on moneysupermarket.com when this article was written. How much would you need to win?
$$\text{£1270} \times 0.06 = \text{£76.20}$$
You would need to win £10 eight times in a row (£80 in total) to go home with a little more than a 6% return. And what are the odds of this happening?
$$P(\text{£80 profit}) = \left(\frac{127}{128}\right)^{8} \approx .9392$$
Put another way, roughly 15 out of 16 times you will beat a savings account interest rate: you will enter the casino with £1270 and leave with £1350. But roughly 1 in 16 times you will leave the casino with nothing. Not even enough to get the 99 bus back over the Tay. Sadly, this sort of thing is all too common, especially when people are new to gambling and think they have found a way of beating the system: e.g. http://casinogambling.about.com/od/moneymanagement/a/martingale.htm.
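For anyone who wants to check the arithmetic, here is a minimal Python sketch that redoes the calculations above with exact fractions. It assumes, as the article does at this point, a fair 50/50 bet on black, £10 runs and a £1000 table limit, so seven straight losses end a run. `p_profit` is a hypothetical helper name.

```python
from fractions import Fraction

# Seven losses in a row (£10 + £20 + ... + £640 = £1270) exhaust the table limit.
p_lose_run = Fraction(1, 2) ** 7
# Addition rule: a £10 run either succeeds or hits the limit.
p_win_run = 1 - p_lose_run

def p_profit(pounds, unit=10):
    """Multiplication rule: every one of the pounds // unit runs must succeed."""
    return p_win_run ** (pounds // unit)

print(p_lose_run, float(p_lose_run))       # 1/128 0.0078125
print(p_win_run)                           # 127/128
print(round(float(p_profit(80)), 4))       # 0.9392
print(round(float(p_profit(1270)), 4))     # 0.3693
print(sum(10 * 2 ** k for k in range(7)))  # 1270
```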
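Even if you find a casino with no maximum bet, you need huge financial resources to make the system work. It all starts to seem even more hopeless when you factor in something I neglected to mention at the start: your chances of winning are actually worse than 50%. The wheel also has a green 0 pocket. On a single-zero wheel, only 18 of the 37 pockets are black, so each bet on black wins with probability 18/37 ≈ .486. If the ball lands on 0, the casino takes all the money.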
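A quick simulation shows how much the zero hurts. This is a rough Monte Carlo sketch in Python. It assumes the article’s numbers (£1270 bankroll, £10 minimum, £1000 maximum, an £80 target) and that the casino keeps even-money bets when the ball lands on 0. The function names are hypothetical. The sketch compares the simulated chance of reaching the target with the exact figure from the multiplication rule.

```python
import random

def martingale_session(rng, p_win, bankroll=1270, base=10, max_bet=1000, target=80):
    """One casino visit: bet on black, doubling after each loss, until the profit
    target is reached or the next doubled stake can't be placed.
    Returns True if the target profit was reached."""
    start, stake = bankroll, base
    while bankroll - start < target:
        if stake > max_bet or stake > bankroll:
            return False       # next doubled bet not allowed: the system breaks
        if rng.random() < p_win:
            bankroll += stake  # even-money win
            stake = base       # start a fresh £10 run
        else:
            bankroll -= stake
            stake *= 2         # double up after a loss
    return True

def success_rate(p_win, sessions=20_000, seed=1):
    """Fraction of simulated visits that reach the target."""
    rng = random.Random(seed)
    return sum(martingale_session(rng, p_win) for _ in range(sessions)) / sessions

def exact(p_win, runs=8, max_losses=7):
    """Each run fails only after 7 straight losses; all 8 runs must succeed."""
    return (1 - (1 - p_win) ** max_losses) ** runs

for label, p in [("no zero, p = 1/2", 0.5), ("single zero, p = 18/37", 18 / 37)]:
    print(f"{label}: simulated {success_rate(p):.4f}, exact {exact(p):.4f}")
```

With p = 1/2, the exact value is the .9392 from the article. At 18/37 it falls to roughly .927, so the chance of walking out with nothing rises by about a fifth.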
The take-home-message? It’s probably best to ignore financial advice you read in your spam folder.
EDIT (17/1/2014): A link to moneysupermarket.com was removed after I received an email from a moneysupermarket.com employee asking me to comply with their request of “removing or adding a nofollow attribute to the links to our MoneySuperMarket.com website”. A quick search to find out why turned up https://groups.google.com/forum/#!topic/FleetStreet/HDVHXFwQFdI. That thread suggests this is all about search engine optimisation (SEO), because “some [links] may look un-natural or paid for in the eyes of Google. This unfortunately means that we have to take down a large number of our links, some of which were genuine and of use/interest to users of the sites on which they were posted”. The suggestion at that last link is that moneysupermarket.com might be doing this because they have previously been penalised for paying for links to their site and are now doing what they can to dispel this perception. I don’t know what authority they have to enforce removal of links like this (none, I suspect), but I don’t really care enough to kick up a fuss… link removed.
Below is the chronology of the status updates that a submission from my lab received from Psychological Science. According to the confirmation-of-submission letter from the Editor-in-Chief, obtaining a first decision should take up to 8 weeks from initial submission.
“Awaiting Initial Review Evaluation” – 09/01/2013: The manuscript is submitted and awaits triage, where it is read by two members of the editorial team. An email is sent to the corresponding author from the Editor-in-Chief. The triage process takes up to two weeks and determines whether or not the manuscript will go out for full review.
“Awaiting Reviewer Selection” – 22/01/2013: An email is sent to the corresponding author from the Editor-in-Chief informing them that the manuscript has passed the triage initial review process. The extended review process is stated as lasting 6-8 weeks from receipt of this email.
“Awaiting Reviewer Assignment” – 28/01/2013
“Awaiting Reviewer Invitation” – 28/01/2013
“Awaiting Reviewer Assignment” – 29/01/2013
“Awaiting Reviewer Selection” – 29/01/2013: I may have missed some status updates here. Essentially, I think these status updates reflect the Associate Editor inviting reviewers to review the manuscript and the reviewers choosing whether or not to accept the invitation.
“Awaiting Reviewer Scores” – 05/02/2013: The reviewers have agreed to review the manuscript and the Manuscript Central review system awaits their reviews.
“Awaiting AE Decision” – 15/03/2013: The reviewers have submitted their reviews, which the Associate Editor uses to make a decision about the manuscript.
“Decline” – 16/03/2013: An email is sent to the corresponding author from the Associate Editor informing them of the decision and providing feedback from the reviewers.
The whole process took just under ten weeks, so not quite within the 8-week estimate that the initial confirmation-of-submission email suggested.
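The durations are easy to check. Below is a minimal Python sketch using the standard `datetime` module. The dates are transcribed from the list above, and only the main status changes are included.

```python
from datetime import date

# Main status changes from the list above (the post gives dates as dd/mm/yyyy).
timeline = [
    ("Awaiting Initial Review Evaluation", date(2013, 1, 9)),
    ("Awaiting Reviewer Selection",        date(2013, 1, 22)),
    ("Awaiting Reviewer Scores",           date(2013, 2, 5)),
    ("Awaiting AE Decision",               date(2013, 3, 15)),
    ("Decline",                            date(2013, 3, 16)),
]

submitted = timeline[0][1]
for status, when in timeline:
    days = (when - submitted).days
    print(f"{status:<36} day {days:>2} ({days / 7:.1f} weeks)")

total = (timeline[-1][1] - submitted).days
weeks, extra_days = divmod(total, 7)
print(f"Total: {total} days = {weeks} weeks {extra_days} days")  # Total: 66 days = 9 weeks 3 days
```

The same arithmetic shows that the extended review stage (22/01/2013 to 16/03/2013) took 53 days, about 7.6 weeks, which is inside its stated 6-8 weeks. The overrun against the 8-week figure comes from the 13-day triage on top.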
I rely heavily on Google Reader to keep up with the scientific literature.
I have customised RSS feeds for PubMed search terms. I have RSS feeds for journal tables of contents. I access my Reader account on my work computer via the website, on my iPad with the paid version of Feeddler and on my Android phone with the official Google app. I use IFTTT and a Gmail filter to send everything I star for reading back to my email account so it can all be dealt with at work. It’s not perfect, but it’s efficient, and it has taken me well over five years to arrive at this system.
A former colleague of mine at an institution I no longer work at has admitted to being a science fraudster.*
I participated in their experiments, I read their papers, I respected their work. I felt a very personal outrage when I heard what they had done with their data. But the revelation went some way to answering questions I ask myself when reading about those who engage in scientific misconduct. What are they like? How would I spot a science fraudster?
Here are the qualities of the fraudster that stick with me.
OK (not great, not awful) at presenting their data.
Doing well (but not spectacularly so) at an early stage of their career.
Socially awkward but with a somewhat overwhelming projection of self-confidence.
And that’s the problem. I satisfy three of the four criteria above. So do most of my colleagues. If you were to start suspecting every socially awkward academic of fabricating or manipulating their data, you wouldn’t be left with many people to trust. Conversations with those who worked much more closely with the fraudster reveal more telling signs that something wasn’t right with their approach. But again, the vast majority of people with similar character flaws don’t fudge their data. It’s only once you formally track every single operation that has been carried out on the original data that you can know for sure whether or not someone has perpetrated scientific misconduct. And that’s exactly how this individual’s misconduct was discovered: an eagle-eyed researcher working with the fraudster noticed some discrepancies in the data after one stage of the workflow.
Is it all in the data?
Let’s move beyond the ‘few bad apples’ argument. A more open scientific process (e.g. including the original data with the journal submission) would have flagged some of the misconduct being perpetrated here, but only after someone had gone to the (considerable) trouble of replicating the analyses in question. Most worryingly, it would also have missed the misconduct that took place at an earlier stage of the workflow. It’s easy to modify original data files, especially if you have coded the script that writes them in the first place. It’s also easy to change the ‘Date modified’ and ‘Date created’ timestamps on those files.
Failed replication would have helped, but the file drawer problem, combined with the pressure on scientists to publish or perish, typically stops this sort of endeavour (though there are notable exceptions, such as the “Replications of Important Results in Cognition” special issue of Frontiers in Cognition). I also worry that the publication process, in its current form, does nothing more constructive than start an unhelpful rumour-mill that never moves beyond gossip and hearsay. The pressure to publish or perish is also cited as motivation for scientists to cook their data. In this fraudster’s case, they weren’t at a stage of their career typically thought of as being under this sort of pressure (though that’s probably a weak argument when applied to anyone without a permanent position). All of which sends us back to trying to spot the fraudster and not the dodgy data. It’s a circular path that’s no more helpful than uncharitable whispers in conference centre corridors.
So how do we identify scientific misconduct? Certainly not with a personality assessment, and only partially with an open science revolution. If someone wants to diddle their data, they will. Like any form of misconduct, if they do it enough, they will probably get caught. Sadly, that’s probably the most reliable way of spotting it. Wait until they become comfortable enough that they get sloppy. It’s just a crying shame it wastes so much of everyone’s time, energy and trust in the meantime.
*I won’t mention their name in this post for two reasons: 1) to minimise collateral damage that this is having on the fraudster’s former collaborators, former institution and their former (I hope) field; and 2) because this must be a horrible time for them, and whatever their reason for the fraud, it’s not going to help them rehabilitate themselves in ANY career if a Google search on their name returns a tonne of condemnation.
If you’re trying to decide which journal to submit your latest manuscript to, Jane (the Journal/Author Name Estimator) can point you in the right direction. This isn’t exactly breaking news, but it’s worth a reminder.
To use Jane, paste your title and/or abstract into the text box and click “Find journals”. Jane applies a similarity index against all Medline-indexed publications from the past 10 years and produces a list of journals worth considering. Each journal comes with a confidence score, which summarises your text’s similarity to other manuscripts published in that journal. You’re also given a citation-based indication of that journal’s influence within the field.
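Jane’s internals aren’t described here, so the following is only a toy illustration of the general idea of ranking journals by text similarity. It is a minimal Python sketch of TF-IDF cosine similarity, assuming each journal is represented by a pooled snippet of its published abstracts. The journal names and texts are invented for illustration, and this is not Jane’s actual algorithm.

```python
import math
from collections import Counter

def tokens(text):
    """Lower-case words with simple punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """One {term: weight} dict per document, weighting tf * log(N / df)."""
    counts = [Counter(tokens(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical journals, each summarised by a pooled snippet of its abstracts.
journals = {
    "Journal A (memory)": "recognition memory confidence judgements familiarity recollection",
    "Journal B (vision)": "visual attention contrast gabor orientation perception",
    "Journal C (imaging)": "fmri pattern analysis cortex activation classifier",
}
query = "Confidence in recognition memory judgements, measured with fMRI."

vectors = tfidf_vectors(list(journals.values()) + [query])
query_vec = vectors[-1]
scores = {name: cosine(vec, query_vec) for name, vec in zip(journals, vectors)}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

The top-ranked journal is the one that shares the most distinctive words with the query, which is roughly what a similarity-based score rewards. A real tool would work over far larger corpora and handle stemming and stop words.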
The other available searches are the “Find articles” and the “Find authors” search, the last of which I suspect I would use if I were an editor with no idea about whom to send an article to for review. As an author, it’s worth running abstracts through these searches too to make sure you don’t miss any references or authors you definitely ought to cite in your manuscript.
The RPi does not support OpenGL. I approached this system with the idea of using a Python environment to create and present experiments. There are two good options for this that I know of: OpenSesame and PsychoPy. PsychoPy requires an OpenGL Python backend (pyglet), so it won’t run on the RPi. OpenSesame lets you use the same backend as PsychoPy, but it also has other options, one of which (based on pygame) does not rely on OpenGL. This ‘legacy’ backend works just fine. But without OpenGL, graphics rely solely on the 700 MHz CPU, which quickly gets overloaded by any sort of rapidly changing visual stimuli (e.g. drifting Gabor patches, video, etc.).
Because of the lack of OpenGL support on the Pi, PsychoPy is out (for now), leaving OpenSesame as the best cog psych-focused Python environment for experiment presentation. The current situation seems to be that the Pi is suboptimal for graphics-intensive experiments, though this may improve as hardware acceleration is incorporated to take advantage of the Pi’s beefy graphics hardware. As things stand, though, experiments with words and basic picture stimuli should be fine. It’s just a case of getting hold of one and brushing up on Python.
I often bang on about how useful twitter is for crowd-sourcing a research community. Today I was reminded of how brilliant people on twitter can be at helping to overcome an ‘I don’t know where to start’-type information problem.
I’m currently helping to design an fMRI study which could benefit considerably from the application of multivoxel pattern analysis (MVPA). Having no practical experience with MVPA means I’m trying to figure out what I need to do to make the MVPA bit of the study a success. After a few hours of searching, I have come across and read a number of broad theoretical methods papers, but nothing that gives me the confidence that anything I come up with will be viable. Of course, there’s no right way of designing a study, but there are a tonne of wrong ways, and I definitely want to avoid those.
So, I turned to twitter:
Twitter help please. Can anyone recommend a beginner’s guide to fMRI MVPA, from trial counts required to software, analysis steps etc? — Akira O’Connor (@akiraoc) January 23, 2013
Relays and retweets from @hugospiers, @zarinahagnew and @neuroconscience led to the following tweets coming my way (stripped of @s for ease of reading… kind of).
Our lab works with min 40 trials per condition for MVPA. I think there is a poster out there maybe from the Haxby group on this. — M Barnett-Cowan (@multisensebrain) January 23, 2013
Sounds about right – depends a LOT on task/design though. Could perhaps get away with less. — Matt Wall (@m_wall) January 23, 2013
Sure, I could have come up with as many articles to read by typing “MVPA” into Google Scholar (as I have done in the past), but the best thing about my twitter-sourced reading list is that I’m confident it’s pitched at the right level.
I’m humbled by how generous people are with their time, and glad so many friendly academics are on twitter. I hope collegiality and friendliness like this encourages many more to join our ranks.
IFTTT (If This Then That) is an online, multi-service task automation tool I first read about on Lifehacker last year. I finally started using it today, and am seriously impressed.
Once you’ve signed up for an account, you can create IFTTT ‘recipes’ that check for actions and events on one online service (e.g. Google Reader, Dropbox, WordPress, Facebook etc.) and use them to automatically trigger a predetermined action in another (e.g. Gmail, Google Calendar, Tumblr etc.).
Example: To keep track of journal articles I should read, I monitor journal table-of-contents RSS feeds and e-mail interesting posts to myself for later download and consumption. I use my iPad, my phone, and occasionally my PC browser to access Google Reader. But I struggle with how fiddly it is to e-mail myself on my mobile devices (with my filter-trigger keywords in the message body) whenever I find an article I want to read. I’m sure I’ve missed articles I ought to have read because I set my threshold for e-mailing them a bit too high. That is a direct result of how annoying it is to e-mail myself articles using the various Google Reader interfaces on my mobile devices. Today I set up IFTTT to check for starred Google Reader items and automatically do everything beyond that which I find annoying. Perfect!
IFTTT will check for custom recipe triggers every 15 minutes, so it isn’t something you’d want to use for actions you require to be instantaneous, but it’s perfect for situations like the above. The services with which it is integrated are many and varied, and the possibilities therefore nearly limitless.
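To see what a 15-minute check means in practice, here is a toy model of the polling behaviour described above. It is not how IFTTT is actually built. In this minimal Python sketch, a hypothetical `trigger` (‘if this’: items starred since the last check) and `action` (‘then that’: a stand-in for the e-mail) are run every 15 minutes. The starred items and times are invented.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=15)  # the check frequency described above

# Hypothetical starred feed items: (title, time the item was starred).
starred = [
    ("Article on recognition memory", datetime(2013, 1, 23, 9, 1)),
    ("Article on MVPA methods",       datetime(2013, 1, 23, 9, 20)),
    ("Article on metacognition",      datetime(2013, 1, 23, 9, 45)),
]

def trigger(since, until):
    """'If this': items starred after the previous check, up to this one."""
    return [(title, t) for title, t in starred if since < t <= until]

outbox = []

def action(item, now):
    """'Then that': stand-in for e-mailing the item to yourself."""
    title, starred_at = item
    outbox.append((title, now, now - starred_at))

def run_polls(start, n_polls):
    """Check the trigger once per interval and fire the action for new items."""
    last_check = start
    for i in range(1, n_polls + 1):
        now = start + i * POLL_INTERVAL
        for item in trigger(last_check, now):
            action(item, now)
        last_check = now

run_polls(datetime(2013, 1, 23, 9, 0), n_polls=4)
for title, sent, delay in outbox:
    print(f"{sent:%H:%M}  {title}  (delay {delay})")
```

Each item waits until the next check, so the worst-case delay is one polling interval. That is fine for a reading list, but not for anything time-critical.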
UPDATE 16/04/2014: I just came across this page and found that I had referenced the now defunct Google Reader. When Reader died I moved all of my RSS feeds across to feedly, which IFTTT supports with identical functionality. I also apply the same rule to twitter posts I favourite, meaning that I have a Gmail folder in which IFTTT aggregates all of the stuff I want to read from both feedly and twitter.