Tuesday, June 19, 2012

The Fairest Way to Pick a Team

What is the best way to pick a team? As kids we always strictly alternated: Team 1 had the first pick, Team 2 the second, then Team 1 again, and so on.

Most things you can measure about people follow a bell curve. A small number of people are bad, most are in the middle and a few are good. There are a few well-known metrics of ability. None is perfect; no one number can sum up ability. The simpler the sport, the more a single metric can tell you: in cycling, VO2 max is a very good indicator, whereas in soccer VO2 max, kicking speed, vertical leap, the number of keepie-uppies you can do and so on each measure only some part of footballing ability.

So say there was one good metric for a task and teams were picked based on it. Is the standard strict alternation, where Team 1 picks, then Team 2, and so on, fair? Fair here means both teams end up with similar quality.

I wrote a program in R, not because I know it well but because it is perfect for this sort of problem. If you are picking 5-a-side and each team always picks the best player left, how much better off is the first picker?

For strict alternation the code is

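players <- 10
# start with an empty vector of results
# (starting from z <- 0 would add a spurious zero to the results)
z <- numeric(0)
# run 10000 simulations
for (i in 1:10000)
{
  # rnorm generates normally distributed values:
  # here 10 of them, with a mean of 100 and a standard deviation of 12
  # sort puts the smallest first and the biggest (the best player) last
  x <- sort(rnorm(players, mean=100, sd=12))
  # the first picker gets the best player and every second one after that,
  # i.e. the even positions; the second picker gets the odd positions
  # record the second picker's total minus the first picker's total
  z <- append(z, sum(x[c(1,3,5,7,9)] - x[c(2,4,6,8,10)]))
}
# spread of the difference between the two teams
print(sd(z))
# average difference between the two teams (negative: first picker is ahead)
print(mean(z))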

> print(sd(z))

[1] 8.794016

> print(mean(z))

[1] -22.59786

I chose a mean of 100 and a standard deviation of 12 to resemble IQ scores (most modern IQ tests are actually scaled to a standard deviation of 15, which would scale the differences up without changing the comparison). IQ isn't used much to pick soccer teams, but many things follow a similar pattern. In software development IQ wouldn't be the worst metric to pick a team on, and agile teams are supposed to have between 5 and 9 members. So think of this as people picking teams of developers.

In this simulation Team 1 ends up about 22.6 points ahead on average, roughly 0.225 of an average (100-point) person. The more people on each team, the greater the advantage the first picker gets.

18 players

> print(sd(z))

[1] 8.287164

> print(mean(z))

[1] -25.52077

16 players

> print(sd(z))

[1] 8.20681

> print(mean(z))

[1] -25.00685
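
The 16- and 18-player runs need different index vectors from the 10-player code. A minimal sketch of how strict alternation could be generalised to any even number of players is below. The function name strict_gap and the seed are assumptions made for illustration, and it uses replicate and colSums in place of the loop.

```r
# Sketch: strict alternation for any even number of players.
# strict_gap is a hypothetical helper name, not part of the original program.
strict_gap <- function(players, sims = 10000) {
  stopifnot(players %% 2 == 0)
  # Each column is one simulated pool, sorted worst (row 1) to best (last row).
  x <- replicate(sims, sort(rnorm(players, mean = 100, sd = 12)))
  second_picker <- seq(1, players, by = 2)  # odd rows: 1, 3, 5, ...
  first_picker  <- seq(2, players, by = 2)  # even rows, includes the best player
  # Second picker's total minus first picker's total, one value per simulation.
  # Negative values mean the first picker's team has the higher total.
  colSums(x[second_picker, , drop = FALSE] - x[first_picker, , drop = FALSE])
}

set.seed(42)  # arbitrary seed, only so that a run can be repeated
for (n in c(10, 16, 18)) {
  z <- strict_gap(n)
  cat(n, "players: mean", mean(z), "sd", sd(z), "\n")
}
```

The sign convention matches the code above, so the means should come out negative and close to the figures already shown.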

Would another way of picking the teams be fairer?

Balanced alternation comes from The Win-Win Solution by Brams and Taylor: 'strict alternation can give a big boost to the first chooser when there are only two parties. What we need to do is reduce this advantage of the first chooser by amending strict alternation.'

Balanced alternation lets the captains take turns being first chooser.

The pick order is

Team 1 Team 2 Team 2 Team 1 Team 1 Team 2 Team 2 Team 1....

The code is

players <- 10
# start with an empty vector of results
z <- numeric(0)
# run 10000 simulations
for (i in 1:10000)
{
  # rnorm generates normally distributed values:
  # here 10 of them, with a mean of 100 and a standard deviation of 12
  # sort puts the smallest first and the biggest (the best player) last
  x <- sort(rnorm(players, mean=100, sd=12))
  # picking from the best (position 10) down in the order 1 2 2 1 1 2 2 1 1 2,
  # Team 1 gets positions 10,7,6,3,2 and Team 2 gets positions 9,8,5,4,1
  # record Team 2's total minus Team 1's total
  z <- append(z, sum(x[c(1,4,5,8,9)] - x[c(2,3,6,7,10)]))
}
print(sd(z))
print(mean(z))

> print(sd(z))

[1] 9.757417

> print(mean(z))

[1] -9.04198

This method looks better than standard strict alternation: Team 1's average lead drops from about 22.6 points to about 9.
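
The positions in that code are written out by hand for 10 players. As a rough sketch, the balanced pick order could be generated for other pool sizes by repeating the pattern 1 2 2 1. The names balanced_order and balanced_gap are made up for this example, and the pool size is assumed to be even.

```r
# Sketch: balanced alternation (1 2 2 1 1 2 2 1 ...) for any even pool size.
# balanced_order and balanced_gap are hypothetical helper names.
balanced_order <- function(players) {
  # Team number for each pick, listed from the best player down to the worst.
  rep(c(1, 2, 2, 1), length.out = players)
}

balanced_gap <- function(players, sims = 10000) {
  stopifnot(players %% 2 == 0)
  # Simulated pools are sorted worst-to-best, so reverse the pick order
  # to get the team that owns each row.
  team_of_row <- rev(balanced_order(players))
  x <- replicate(sims, sort(rnorm(players, mean = 100, sd = 12)))
  # Team 2's total minus Team 1's total; negative means Team 1 is ahead.
  colSums(x[team_of_row == 2, , drop = FALSE]) -
    colSums(x[team_of_row == 1, , drop = FALSE])
}

# For 10 players, Team 2's rows are the same positions used above: 1 4 5 8 9
print(which(rev(balanced_order(10)) == 2))
```

Calling mean(balanced_gap(10)) should then give a figure close to the one reported above, subject to simulation noise.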

Thinking about the bell curve, though, it would make sense if the team that got the best player also got the worst, the team that got the second best also got the second worst, and so on. This should even up the teams well. The code for this is

players <- 10

# start with an empty vector of results
z <- numeric(0)
# run 10000 simulations
for (i in 1:10000)
{
  # rnorm generates normally distributed values:
  # here 10 of them, with a mean of 100 and a standard deviation of 12
  # sort puts the smallest first and the biggest (the best player) last
  x <- sort(rnorm(players, mean=100, sd=12))
  # Team 1 gets the best and the worst (positions 10 and 1), Team 2 the second
  # best and second worst (9 and 2), and so on; the middle pair (6 and 5) is
  # split, with Team 2 getting position 6
  # record Team 2's total minus Team 1's total
  z <- append(z, sum(x[c(2,4,6,7,9)] - x[c(1,3,5,8,10)]))
}
print(sd(z))
print(mean(z))

> print(sd(z))

[1] 9.3498

> print(mean(z))

[1] 3.027536

This has a smaller average difference, and this time it is Team 2 that comes out slightly ahead. The fact that the difference is as high as it is makes me think I may have a bug in my code.

To implement this method, kids would alternate picks until Team 2 was about to pick the middle player of its team. Team 2 would then get two picks in a row, after which the picks would alternate again starting with Team 1. This sounds almost practical.
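
That pick order can be sketched for other team sizes too. The version below is only an illustration under my own assumptions: the name mirrored_order is invented, the pool size must be even and at least 4, and when the team size is odd the middle pair is split with Team 2 taking the better of the two, as in the 10-player code above.

```r
# Sketch: "best with worst" pick order, listed from the best player down.
# mirrored_order is a hypothetical helper name.
mirrored_order <- function(players) {
  stopifnot(players %% 2 == 0, players >= 4)
  half <- players / 2
  top <- rep(c(1, 2), length.out = half)  # strict alternation for the top half
  bottom <- rev(top)                      # mirror it: best pairs with worst, etc.
  # With an odd team size the middle pair would both land on Team 1,
  # so Team 2 takes the better of the two middle players instead.
  if (half %% 2 == 1) top[half] <- 2
  c(top, bottom)
}

print(mirrored_order(10))  # 1 2 1 2 2 1 2 1 2 1
```

With an even team size, such as 8 players, every pair stays together and no team gets the extra middle pick.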

When you played a sport (particularly soccer) as a kid what rules did you pick teams by? Can you think of a better algorithm now?

14 comments:

Anonymous said...

May be that the difference in the last version is roughly the difference between the 6th and 5th pick (and not a bug in the code). These aren't balanced by another pick late in the distribution.

David Curran said...

Thanks anonymous. If I swap 5 and 6 the average difference pretty much reverses, so it looks like it is that alright

z <- append(z, sum(x[c(2,4,5,7,9)]-x[c(1,3,6,8,10)]))
+ }
> print(sd(z))
[1] 9.31697
> print(mean(z))
[1] -2.954866

Paul Rubin said...

Assuming that both sides know the pertinent data for every player, you could use a variation on the method for dividing a pie among two children: let one captain divide the participants into two teams, then let the other captain select which team he/she will lead.

David Curran said...

Thanks Paul. As Homer would say upon realising he'd been stoopid, Doh!

When are more than 2 teams picked in real life? Medical students going to hospitals is a bit like team picking.

Does the US army use fitness metrics to decide what part of the army to send you to?

Paul Rubin said...

It happens I've been doing a bit of work on this. In some locales here, young children are enrolled in youth recreation leagues at the league level, rather than at a team level. The league is responsible for creating teams, and they want as much parity as possible.

Also, some schools (including my former employer) assign MBA students to project groups or study teams. Again, the desire is to balance teams along various demographic and competitiveness dimensions.

Not sure about the Army. They take into account recruit preferences, recruit aptitude scores and where the demand is (open billets) ... I think weighted heavily toward the last one.

larrydag said...

As an R fanboy and R User Group organizer I thought I would oblige to simplify your code a bit.

### improved simulation code
players <- 10
x <- replicate(10000, sort(rnorm(players, mean=100, sd=12)))
z <- colSums(x[c(1,3,5,7,9),]-x[c(2,4,6,8,10),])
print(sd(z))
print(mean(z))

David Curran said...

Thanks LarryDag

There was a discussion on google+
https://plus.google.com/u/0/114950266501968037434/posts/9phLU8xZRhG
That came up with similar code to yours. Thanks for fixing up my code, and for not ragging on me for being such a poor R coder.

Paul, I had not considered having to balance demographic factors. It sounds a bit like being the Handicapper General from Harrison Bergeron http://www.tnellen.com/cybereng/harrison.html

Paul Rubin said...

David,

A volunteer for a youth recreation league found me online and posed the problem to me - including attributes I would not have come up with on my own. From teaching, I already knew that our program was sensitive to distributing international students across teams.

David Curran said...

This article might interest you Paul
Understanding uncertainty: The Premier League
http://plus.maths.org/content/understanding-uncertainty-premier-league

Once you distribute players trying to make the teams even you might want to see how well it worked at the end of the season.

John Pollard said...

Very interesting discussion! I finally decided to write an online program for fair team picking. Initially it was for my 5-a-side football group, but as other groups were interested I have now made it available to any interested group. See Team Picker. It lets you rank players and then, crucially, it auto-adjusts ranks after each game, so in the long run every player will win about half of all games. We have found it has made for close games most weeks.
John

John Pollard said...

Sorry link above broken. You can find it at team-picker dot com, try again Team Picker
John

David Curran said...

Thanks for that John. Nice work. Is this Java based?

John Pollard said...

Hi David, you are right that Team Picker is written in Java (how did you know?). It is starting to gain interest; recently got Jacksonville Beach Volleyball group on board and they pick multiple teams in a round-robin match up which I added support for. If you want to set up a group to have a go with, let me know via http://www.team-picker.com/ and I'll set it up. I am particularly proud of the multi-team picking of fair sides as it is potentially something that could grind the server to a halt with escalating combinations of possible players in teams.