Wednesday, January 13, 2010

Analytics X Prize Outside Bets

Black swans are not predictable, but are there fairly rare events that could occur this year in Philadelphia and skew the homicide rate in an area?

I know of nothing that would cause a sudden drop in the number of homicides in one particular area, except a mass evacuation. There are a few things that could show up as a sudden rise, though.

Terrorist attack: Philadelphia is not a well-known terrorist target, so any attack there would be pretty unpredictable. Terrorist attacks are also very rare, so I do not think they are worth considering in a model.

Going Postal: These seem common enough in America. They have their own list here. I doubt you can predict where they will occur though. Might be worth considering.

Religious nutjobs: The Solar Temples and the Kool-Aid drinkers in general tend to head off into the sticks before topping each other. So I think there is not much chance of a mass homicide by a cult in Philadelphia.

Riots: Cities kick off on a fairly regular basis. The LA riots in 1992 resulted in 53 deaths, for example. Philadelphia has had riots in the past. Riots in America seem to be mainly caused by racial tension. It might be possible, if not to predict them, then at least to localise where they are most likely to occur. One of the Analytics X Prize submissions could then carry a higher homicide count for that area as an outside bet.

Prisoners: They love a good ruck. In general, prison populations have a higher homicide rate, so changes in the prisoner ecosystem in Philadelphia might be worth a look.

Natural Disasters: People generally think that after a natural disaster the world turns into something from a Romero film. The evidence for this is not that strong; for example, tales of post-Katrina anarchy seem to be overblown. Natural disasters could also reduce the homicide rate, as people leave an area after one. Philadelphia is unlikely to undergo a natural disaster, though.

There are rare events that still occur often enough to make some sort of prediction about. I do not think any of these is worth including in a model, with the possible exception of a riot. But predicting that would need more information about riots and Philadelphia than I have at the moment.
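The riot outside bet could be sketched roughly as follows: take a baseline set of proportions, push one zip code's share up, and rescale so everything still sums to 1. This is only an illustration. The function name, the zip labels and the numbers are all made up, and the competition's actual submission format is not covered here.

```python
def outside_bet(proportions, zip_code, extra_share):
    """Raise one zip code's share by extra_share, then rescale to sum to 1.

    Hypothetical helper for illustration; not part of any competition tooling.
    """
    if zip_code not in proportions:
        raise KeyError(f"unknown zip code: {zip_code}")
    if extra_share < 0:
        raise ValueError("extra_share must be non-negative")
    bumped = dict(proportions)
    bumped[zip_code] += extra_share
    total = sum(bumped.values())
    return {z: share / total for z, share in bumped.items()}


# Made-up baseline for three hypothetical zip codes (not real data).
baseline = {"zip_a": 0.5, "zip_b": 0.3, "zip_c": 0.2}
bet = outside_bet(baseline, "zip_c", 0.1)
```

Every other zip code loses a little share to pay for the bet, which is the cost of being wrong about where a riot might happen.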

Tuesday, January 12, 2010

Survey the people of Philadelphia

In order to tell if a zip code in Philadelphia is getting more dangerous maybe we should ask people who live there.

So I created a survey here to ask them. If you live in Philadelphia, or know someone who does, it would be great if you could fill it out.

The idea is to find areas where people think safety is changing and see whether this turns out to help predict homicides. If it does, this could be used to focus police resources in future to help reduce homicides.





I will release any data that is input, as I am not the best person to analyse it. People going to the effort of filling in a survey deserve to get the most out of it possible. I will anonymise any data that contains personal info before releasing it. There shouldn't be any info like this, but I will check through the data just in case.

I think ideally such a survey would let users place pins on a map in areas they think are improving or disimproving. Any thoughts on the survey? Or on the idea of asking people for their local predictions in general?

Monday, January 11, 2010

Analytics X Prize

There is a competition here to try to predict what proportion of murders in Philadelphia will occur in each of the city's 47 zip codes. Many people who are interested in these sorts of puzzles have started submitting predictions.

So how would you go about predicting the murder proportion in each zip code?
Well, if nothing changes in Philadelphia you would expect each zip code to keep the same proportion of murders, apart from some random variation you could not predict. So my first guess is a repeat of exactly what happened last year.
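As a minimal sketch of that first guess, last year's counts per zip code can be turned into proportions and submitted unchanged. The counts and zip labels below are invented for illustration and are not taken from the dataset.

```python
def baseline_proportions(last_year_counts):
    """Return each zip code's share of last year's homicides."""
    total = sum(last_year_counts.values())
    if total == 0:
        raise ValueError("need at least one homicide to form proportions")
    return {z: count / total for z, count in last_year_counts.items()}


# Made-up counts for three hypothetical zip codes (not real data).
example_counts = {"zip_a": 30, "zip_b": 15, "zip_c": 5}
baseline = baseline_proportions(example_counts)
```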

But in the real world things do change. Say the population changes: if every person had the same chance of being murdered, then the proportion of murders in a zip code would change in proportion to the change in population. If this were the case, the prediction problem would become finding out what changes in population will take place over the year.
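One way to sketch the population idea is to scale each zip code's share from last year by its population change and then rescale so the shares sum to 1. This assumes each zip code's per-person risk stays what it was last year, which is my reading of the idea rather than anything tested. The names and figures are hypothetical.

```python
def population_adjusted(proportions, old_pop, new_pop):
    """Scale last year's shares by population growth, then renormalise.

    Assumes each zip code keeps last year's per-person risk (an assumption).
    """
    scaled = {
        z: share * new_pop[z] / old_pop[z]
        for z, share in proportions.items()
    }
    total = sum(scaled.values())
    return {z: value / total for z, value in scaled.items()}


# Made-up shares and populations for three hypothetical zip codes.
last_year = {"zip_a": 0.6, "zip_b": 0.3, "zip_c": 0.1}
old_pop = {"zip_a": 40000, "zip_b": 30000, "zip_c": 20000}
new_pop = {"zip_a": 40000, "zip_b": 33000, "zip_c": 20000}  # zip_b grows 10%
adjusted = population_adjusted(last_year, old_pop, new_pop)
```

Only zip_b grows here, so its share rises while the other two fall slightly, even though their populations did not change.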



The dataset I am using is here; some errors in it need to be removed. Each dot is a zip code. It looks like the number of murders does roughly follow population, but it is not nearly an exact match. So changes in population are important, but they are far from the only thing we need to predict.

How expensive the houses in an area are, the average income, or the number of people per house might help indicate the murder rate. Here I am looking at the murder rate as (murders/population)*10000, i.e. murders per 10,000 residents.
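The rate itself is simple arithmetic. A small sketch with made-up rows (zip label, murders, population), none of which come from the real dataset:

```python
def murders_per_10k(murders, population):
    """Murder rate expressed per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return murders / population * 10000


# Hypothetical rows: (zip label, murders, population). Not real data.
rows = [("zip_a", 30, 40000), ("zip_b", 15, 30000)]
rates = {z: murders_per_10k(m, p) for z, m, p in rows}
```

Using a rate rather than a raw count stops big zip codes from looking dangerous just because more people live there.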





So it looks to me a bit like areas with crowded houses could be more likely to have murders.






House cost does not look connected to murder rate. This could be because a zip code is too coarse-grained for this to be a good measure. Maybe the average cost of a house in a block would be a better measure of risk. Philadelphia has even been broken down into 60ft squares here.



Does household income look like it is related to murder rate?

So if the graph is a random scattering of dots, then it looks like the independent variable on the x-axis has no relation to the homicide rate, the dependent variable on the y-axis. If the dots form a line (well, not just a line, but that is another story), then the homicide rate may be related to that independent variable. It really is not this simple, but that's the basics.
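Eyeballing the dots can be backed up with a number. A minimal sketch is the Pearson correlation coefficient, which is close to 0 for a random scatter and close to 1 or -1 when the dots sit near a straight line. It only picks up straight-line relationships, so it is not the whole story. The values below are invented, not read off the graphs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: people per house (x) and murders per 10,000 (y).
people_per_house = [2.0, 2.5, 3.0, 3.5]
murder_rate = [3.0, 4.0, 6.5, 7.0]
r = pearson_r(people_per_house, murder_rate)
```

With only 47 zip codes, even a fairly large coefficient can come about by chance, so it is a rough guide rather than proof of a link.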



As Siah pointed out here, young black males seem to be murdered at a disproportionate rate. The graph above does seem to suggest that predicting changes in the ethnicity of a zip code may improve predictions. Age is another important variable and I do not have data on that, so that might be the next thing to get.

There are interesting posts already on this puzzle: "Evaluating Spatial Predictions", "Second Pass at Analytics X Prize" and "Homicides as non homogeneous poisson processes" are very informative.