When I submit my work, all of the variables at play, including the quality of the thing being judged, combine to give me a probability that a positive outcome will occur, e.g. 0.4 (2 out of 5 times, a good thing will happen). BUT, probabilities produce lumpy strings of outcomes. That is, to us pattern-spotting humans, good and bad outcomes will appear clustered rather than what we would describe as “random”, which we tend to think of as evenly spaced (see the first link above).
To illustrate, I did something very straightforward in Excel to crudely simulate trying to publish 8 papers.
Column A: =RAND() << (pseudo)randomly assigns a number between 0 and 1 (strictly, at least 0 and less than 1).
Column B: =IF(Ax>0.4,0,1) << where x is the row number (e.g. =IF(A1>0.4,0,1) in row 1): if the number in column A exceeds .4, this cell equals 0; otherwise it equals 1.
Thus, column B gives me a list of successes (1s) and failures (0s) with an overall success rate of ~.4. It took me four refreshes before I got the following:
Although the rejections look clustered, each one is determined independently. I have almost certainly had strings of rejections like those shown above. The only thing that has made them bearable is that I have switched papers, moving on to a new project after ~3 rejections and giving up on the thrice-rejected paper, which I assume to be a total failure. As a result, I am almost certainly sitting on good data that has been tainted by bad luck.
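For anyone without Excel to hand, here is a rough Python sketch of the same exercise, plus an exact answer to "how often would I see a run like that?". The function names, the choice of a run of 3 or more rejections, and the 0.4 acceptance rate carried over from above are illustrative assumptions, not a model of any real journal. The streak calculation uses a small dynamic programme that tracks the length of the current run of rejections.

```python
import random
from fractions import Fraction

def simulate_submissions(n_papers=8, p_success=0.4, seed=None):
    """Python version of the spreadsheet: column A = RAND(), column B = IF(A > 0.4, 0, 1)."""
    rng = random.Random(seed)
    column_a = [rng.random() for _ in range(n_papers)]       # uniform on [0, 1)
    return [0 if a > p_success else 1 for a in column_a]     # 1 = accepted, 0 = rejected

def p_rejection_streak(n=8, p_accept=Fraction(2, 5), streak=3):
    """Exact probability of at least one run of `streak` or more consecutive
    rejections among n independent submissions (hypothetical helper)."""
    p_reject = 1 - p_accept
    # state[k]: probability that no qualifying run has happened yet and the
    # current trailing run of rejections has length k (0 <= k < streak)
    state = [Fraction(1)] + [Fraction(0)] * (streak - 1)
    for _ in range(n):
        new = [Fraction(0)] * streak
        for k, prob in enumerate(state):
            new[0] += prob * p_accept              # an acceptance resets the run
            if k + 1 < streak:
                new[k + 1] += prob * p_reject      # a rejection extends it
            # otherwise the run reaches `streak`; that probability leaves `state`
        state = new
    return 1 - sum(state)

print(simulate_submissions())        # a fresh "refresh" on every run
print(float(p_rejection_streak()))   # chance of 3+ rejections in a row in 8 submissions
```

Under these assumptions the exact figure comes out at roughly 0.60: a run of three or more straight rejections across eight submissions is more likely than not, even though every outcome is independent.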
A group of second-year students asked me to contribute a ‘Real World Stats’ piece to their new psychology publication, MAZE. I reworked a section from one of my most popular statistics lectures, on probability theory and roulette. Below is the article in full.
Roulette is a straightforward casino game. While the wheel is spinning, a ball is released. This ball eventually comes to rest in a numbered (1-36) and coloured (red or black) pocket. You bet on the final resting place of the ball by selecting a number, a range of numbers or a colour. When betting on colours, if you pick correctly, you double your money. A £20 stake on black would get you £40 back if the ball landed in a black pocket, and nothing back if it landed in a red one.
A few years ago, I received quite a few spam e-mails with the following tip on how to win at roulette.
> So I found a way you can win everytime:
> bet $1 on black if it goes black you win $1
> now again bet $1 on black, if it goes red bet $3 on black, if it goes red again bet $8
> on black, if red again bet $20 on black, red again bet $52 on black (always multiple
> you previous lost bet around 2.5) if now is black you win $52 so you have $104 and you bet:
> $1 + $3 + $8 + $20 + $52 = $84 So you just won $20 :)
> now when you won you start with $1 on blacks again etc etc. its always
> bound to go black eventually (it’s 50/50) so that way you eventually always win.
If you ignore the atrocious spelling and grammar, the basic idea seems to be a good one. In fact, it’s an established betting strategy known as the Martingale system. Under this system, you double your stake after every loss until you win; that way, the winning bet recovers all of your previous losses and leaves you ahead by an amount equal to your first stake. If we build a probability tree for a gambler who only bets on black and give her a fairly standard outcome, two losses followed by one win, you’ll see how this is meant to work.
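The same two-losses-then-a-win sequence can be traced with a short Python sketch. The £1 first stake is my assumption, inferred from the £1 + £2 + £4 = £7 total that follows; `martingale_sequence` is a hypothetical helper written for this illustration.

```python
def martingale_sequence(results, first_stake=1):
    """Follow a bettor who always backs black and doubles after every loss.
    `results` is a string of 'L' (red) and 'W' (black) spins.
    Returns (total staked, total paid back by the casino)."""
    stake, staked, returned = first_stake, 0, 0
    for result in results:
        staked += stake
        if result == "W":
            returned += 2 * stake      # an even-money win pays back double the stake
            stake = first_stake        # start again from the first stake
        else:
            stake *= 2                 # double up after a loss
    return staked, returned

staked, returned = martingale_sequence("LLW")
print(staked, returned, returned - staked)   # 7 8 1
```

However long the losing run, a run of losses that ends in a win leaves her exactly one first stake ahead, which is the whole appeal of the system.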
Over three bets, she has spent £7, but won £8. Not too shabby. She just needs to do this over and over until she has won an amount she’s happy with. Fool-proof, right?
Not quite. Casinos have stayed in business for centuries for a reason: they know how to work probabilities. One of their standard strategies is to have minimum and maximum stake limits, with a typical range of £10-£1000. These limits expose a huge flaw in our spam-based strategy.
Imagine you’re trying the Martingale strategy and you go on a losing streak. £10, £20, £40, £80, £160, £320 and £640 all go on losing bets and all of a sudden you’re down £1270. Here’s where you come up against the casino’s maximum bet policy. You can’t place a £1280 bet to recoup your losses. But how likely is losing 7 bets in a row?
Not very likely at all, if you’re only trying to win £10. According to the multiplication rule for independent events, the exact probability is $(1/2)^7 = 1/128 \approx .0078$. Put another way, this will happen 1 time in 128.
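Both numbers in this part of the argument can be checked with a few lines of Python: the £10-£1000 limits cap the doubling after seven bets, and brute-force enumeration of every red/black sequence of seven spins gives the same 1/128 as the multiplication rule. As in the text, this sketch ignores the zero, and `stakes_within_limits` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def stakes_within_limits(min_bet=10, max_bet=1000):
    """Doubling stakes a Martingale bettor can place before the table maximum bites."""
    stakes, stake = [], min_bet
    while stake <= max_bet:
        stakes.append(stake)
        stake *= 2
    return stakes, stake          # `stake` is now the first bet the table would refuse

allowed, refused = stakes_within_limits()
print(allowed, sum(allowed), refused)   # [10, 20, 40, 80, 160, 320, 640] 1270 1280

# Enumerate all 2**7 equally likely red/black sequences (ignoring the zero for now)
# and count the one in which every spin goes against you.
sequences = list(product("RB", repeat=len(allowed)))
all_red = sum(1 for seq in sequences if set(seq) == {"R"})
p_lose_all = Fraction(all_red, len(sequences))
print(p_lose_all, float(p_lose_all))    # 1/128 0.0078125
```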
But problems arise when you try to make more than £10. To understand the next set of calculations, we need to reverse the probability of losing and think about how likely it is that we will win £10 each time we try. Using the addition rule for mutually exclusive events, we can calculate that the probability of winning £10 is equal to the probability of not losing:
$$p_{\text{win}} + p_{\text{lose}} = 1$$
$$p_{\text{win}} = 1 - p_{\text{lose}}$$
$$p_{\text{win}} = 1 - \frac{1}{128} = \frac{127}{128} \approx .9922$$
We can now work out the probabilities of making various amounts of profit, once again using the multiplication rule:
But here’s the kicker. If you want to double the money you bring to the casino to place these bets, you’re looking at close to a 2 in 3 chance that you will lose everything.
£1270 profit: $(127/128)^{127} \approx .3693$
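The same calculation works for any profit target: each £10 of profit is one completed doubling cycle, each cycle survives with probability 127/128, and the multiplication rule chains the cycles together. This Python sketch (with a hypothetical `p_reach_profit` helper, assuming fair 50/50 spins and the £10-£1000 limits) prints the chance of reaching a few example targets.

```python
import math
from fractions import Fraction

def p_reach_profit(profit, unit=10, max_losses=7, p_lose_spin=Fraction(1, 2)):
    """Probability of banking `profit` pounds from Martingale cycles that each win `unit`
    pounds, before any cycle runs into `max_losses` straight losses."""
    p_bust = p_lose_spin ** max_losses        # one cycle wiped out: (1/2)**7 = 1/128
    p_cycle = 1 - p_bust                      # complement rule: 127/128
    cycles_needed = math.ceil(profit / unit)  # each completed cycle adds one unit
    return p_cycle ** cycles_needed           # multiplication rule across cycles

for target in (10, 100, 500, 1270):
    print(f"£{target} profit: {float(p_reach_profit(target)):.4f}")
```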
“I’m not greedy!” I hear you cry. “I’d just want to go home with a little more than if I had invested the money and not had any fun at all.” Let’s say you wanted to take home a little more than 6%, the best savings interest rate you could find on moneysupermarket.com when this article was written. How much would you need to win?
£1270 × .06 = £76.20
You would need to win 8 times in a row to go home with a little more than a 6% interest rate. And what are the odds of this happening?
£80 profit: $(127/128)^{8} \approx .9392$
Put another way, 15 out of 16 times, you will beat a savings account interest rate. You will enter the casino with £1270 and leave with £1350. But 1 in 16 times you will leave the casino with nothing. Not even enough to get the 99 bus back over the Tay. Sadly, this sort of thing is all too common, especially when people are new to gambling and think they have found a way of beating the system: e.g. http://casinogambling.about.com/od/moneymanagement/a/martingale.htm.
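The 6% scenario can be checked two ways in Python: exactly, by rounding £76.20 up to whole £10 cycles, and by brute-force simulation of many casino trips. The simulation, its helper `martingale_session`, the seed and the number of trials are all illustrative assumptions; it plays fair 50/50 spins with £10-£1000 limits, and its estimate should wobble around the exact value rather than match it digit for digit.

```python
import math
import random
from fractions import Fraction

# Exact calculation for the 6% target.
bankroll, unit = 1270, 10
target = bankroll * 0.06                      # £76.20
wins_needed = math.ceil(target / unit)        # 8 completed cycles, i.e. £80
p_exact = Fraction(127, 128) ** wins_needed
print(wins_needed, round(float(p_exact), 4))  # 8 0.9392

def martingale_session(rng, wins_wanted, min_bet=10, max_bet=1000, p_black=0.5):
    """Play Martingale cycles on black; return True if `wins_wanted` cycles succeed
    before a cycle needs a stake above `max_bet`."""
    wins = 0
    while wins < wins_wanted:
        stake = min_bet
        while True:
            if rng.random() < p_black:   # black: cycle won, profit of min_bet
                wins += 1
                break
            stake *= 2                   # red: double up
            if stake > max_bet:          # table refuses the bet; this cycle's losses stand
                return False
    return True

rng = random.Random(2014)                     # arbitrary seed, only for reproducibility
trials = 100_000
estimate = sum(martingale_session(rng, wins_needed) for _ in range(trials)) / trials
print(estimate)                               # should sit close to the exact value
```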
Even if you find a casino with no maximum bet, you need huge financial resources to make it work. It all starts to seem even more hopeless when you factor in something I neglected to mention at the start: your odds of winning are actually worse than 50%. If the ball lands on the green 0, the casino takes all the money on red and black, so on a single-zero wheel your chance of winning a bet on black is 18/37, or about .486.
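Redoing the earlier sums with the zero included only changes the probability of losing a single spin. This Python sketch assumes a single-zero wheel (18 black, 18 red and one green 0) and the same £10-£1000 limits; on a wheel with an extra 00 pocket the numbers get worse still.

```python
from fractions import Fraction

# Assumption: single-zero wheel with 18 black, 18 red and one green 0 pocket.
p_black = Fraction(18, 37)
p_lose_spin = 1 - p_black            # 19/37: red or zero both lose a bet on black

p_bust = p_lose_spin ** 7            # seven straight losses at £10-£1000 limits
p_cycle = 1 - p_bust                 # one £10 cycle completed

print(f"P(7 losses in a row): {float(p_bust):.4f}  (fair coin: {1/128:.4f})")
print(f"P(£80 profit):        {float(p_cycle ** 8):.4f}")
print(f"P(£1270 profit):      {float(p_cycle ** 127):.4f}")
```

Every figure moves in the casino’s favour: losing streaks become more likely, so both the modest £80 target and the double-your-money target become harder to reach.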
The take-home-message? It’s probably best to ignore financial advice you read in your spam folder.
EDIT (17/1/2014): A link to moneysupermarket.com was removed after I received an email from a moneysupermarket.com employee asking me to comply with their request of “removing or adding a nofollow attribute to the links to our MoneySuperMarket.com website”. A quick search to find out why turned up https://groups.google.com/forum/#!topic/FleetStreet/HDVHXFwQFdI, which suggests that this is all about search engine optimisation (SEO): “some [links] may look un-natural or paid for in the eyes of Google. This unfortunately means that we have to take down a large number of our links, some of which were genuine and of use/interest to users of the sites on which they were posted”. The suggestion at that last link is that moneysupermarket.com may be doing this because it has previously been penalised for paying for links to its site and is now doing what it can to shake off that perception. I don’t know what authority they have to enforce the removal of links like this (none, I suspect), but I don’t care enough to kick up a fuss… link removed.
It has been a busy and exciting few weeks, but I’m now ready to get back to business and do some research!
By way of re-introducing myself to my own blog (and the blogs of others), here’s a nice blog post by Tal Yarkoni of  on the common practice of data peeking. It should make you think twice about peeking at the data halfway through data collection, though I’m not sure whether researchers really are as systematic in their peeking as Tal suggests in his worst-case scenario. I’m sure that Tal would argue it doesn’t matter how systematic you are: once you peek, you know, and this necessarily changes your approach to the data.
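To see why even a single peek matters, here is a small Python simulation of my own, not a reproduction of Tal’s analysis. It assumes the null hypothesis is true (data drawn from a normal distribution with mean 0 and known SD 1), a one-sample two-sided z-test at α = .05, and a researcher who tests once at n = 20 and, if nothing turns up, again at n = 40. The function name and all parameter values are hypothetical.

```python
import math
import random

def peeking_false_positive_rate(n_final=40, peek_at=20, trials=20_000, seed=1):
    """Share of simulated null studies declared 'significant' when the researcher tests
    once at `peek_at` observations and again at `n_final`, stopping at the first
    two-sided z-test with |z| > 1.96. Data are N(0, 1), so the null is always true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        data = [rng.gauss(0, 1) for _ in range(n_final)]
        for n in (peek_at, n_final):
            z = sum(data[:n]) / math.sqrt(n)   # z for mean 0, known SD 1
            if abs(z) > 1.96:
                false_positives += 1
                break                          # stop collecting: 'found' an effect
    return false_positives / trials

print(peeking_false_positive_rate(peek_at=40))  # one look only: close to 0.05
print(peeking_false_positive_rate())            # peek at n = 20, then n = 40
```

With only one look the false-positive rate stays near the nominal 5%; adding a single halfway peek pushes it comfortably above that, and more peeks push it higher still.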