I got some wheat data from here, the US Department of Agriculture. The map of the US comes from here.
Then I cleaned up the data by keeping only the columns for state, county and total wheat production. The dataset includes county codes 888 and 999, but those seem to be combinations of a state's counties, so I stripped them out. There are also more than 50 states in these county datasets, which seems to be standard. With this sort of manipulation, numbers often get read in as strings, so some casting is needed.
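As a minimal sketch of that cleanup step (not the actual script linked in this post), the following keeps three columns, drops the 888 and 999 county codes, and casts production to a number. The column names `state`, `county` and `production`, the comma-separated thousands, and the sample rows are all assumptions made up for illustration; the real file layout may differ.

```python
import csv
import io

# County codes that stand for combined counties rather than a single county.
AGGREGATE_COUNTY_CODES = {"888", "999"}


def clean_wheat_rows(csv_file):
    """Return (state, county, production) tuples, skipping aggregate rows.

    Column names are hypothetical; adjust them to the real header.
    """
    cleaned = []
    for row in csv.DictReader(csv_file):
        county = row["county"].strip()
        if county in AGGREGATE_COUNTY_CODES:
            continue
        # Values arrive as strings, possibly with thousands separators.
        raw = row["production"].replace(",", "").strip()
        if not raw:
            continue  # no production reported for this county
        cleaned.append((row["state"].strip(), county, float(raw)))
    return cleaned


# Invented sample rows, only to show the shape of the input.
sample = io.StringIO(
    "state,county,production,extra\n"
    '20,001,"1,250,000",x\n'
    "20,888,500000,x\n"
    "20,999,9000000,x\n"
    "53,009,3400,x\n"
)
rows = clean_wheat_rows(sample)
```

State and county codes are kept as strings on purpose, so leading zeros such as `001` survive for matching against map ids.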
The SVG is 1.9 MB and Google Drive does not want to store or convert it at the moment, but if anyone wants it I can send it to them. At this quality, zooming in on an individual state, like Kansas, works fine.
The code to create this picture is here.
JDLong on Twitter pointed out where to get data for countries. I got the grains data from here, and running 'head psd_grains_pulses.csv' shows the file layout.
I think I want Country_code and value for the commodity wheat in every country, for the most recent year. The country code is two characters (ISO 3166-1 alpha-2), and the map I have from Wikipedia uses the same format; you can get it here.
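A rough sketch of that selection, assuming the CSV has header names like `Commodity_Description`, `Country_Code`, `Market_Year`, `Attribute_Description` and `Value`. Those exact names, the "Production" attribute filter, and the sample numbers are assumptions for illustration, not taken from the real file.

```python
import csv
import io


def latest_wheat_by_country(csv_file, commodity="Wheat", attribute="Production"):
    """Map each country code to its value for the most recent year on file."""
    latest = {}  # country code -> (year, value)
    for row in csv.DictReader(csv_file):
        if row["Commodity_Description"].strip() != commodity:
            continue
        if row["Attribute_Description"].strip() != attribute:
            continue
        code = row["Country_Code"].strip()
        year = int(row["Market_Year"])
        value = float(row["Value"])
        # Keep only the newest year seen so far for this country.
        if code not in latest or year > latest[code][0]:
            latest[code] = (year, value)
    return {code: value for code, (year, value) in latest.items()}


# Invented sample rows, only to show the shape of the input.
sample = io.StringIO(
    "Commodity_Description,Country_Code,Market_Year,Attribute_Description,Value\n"
    "Wheat,US,2010,Production,60000\n"
    "Wheat,US,2011,Production,54000\n"
    "Wheat,FR,2011,Production,36000\n"
    "Barley,US,2011,Production,3000\n"
    "Wheat,US,2011,Imports,2500\n"
)
wheat = latest_wheat_by_country(sample)
```

Note that "most recent year" here is per country, so a country whose reporting stopped early would still show up with an older value.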
The code to produce colors for each country based on this data is here. Again, this is based on the "Visualize This" book by Yau. The resulting CSS, which sets the color of each country, gets pasted into the style section of the BlankMap-World6.svg file. I should read all the documentation describing the values before doing any analysis like this, but I am only doing this to make pretty pictures in Python, so I am making assumptions to work quickly.
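For illustration, here is one way such CSS could be generated; it is a sketch, not the linked code. It assumes the SVG lets you style a country with a lowercase two-letter class selector such as `.us` (if the file uses ids instead, swap the leading `.` for `#`). The five-step palette and the equal-width buckets are also my own choices, not something prescribed by the book or the map.

```python
# Light-to-dark palette; these particular hex values are an arbitrary choice.
COLORS = ["#f2f0f7", "#cbc9e2", "#9e9ac8", "#756bb1", "#54278f"]


def color_for(value, max_value, colors=COLORS):
    """Pick a shade by splitting 0..max_value into equal-width buckets."""
    if max_value <= 0:
        return colors[0]
    index = int(value / max_value * len(colors))
    # The maximum value itself would land one past the end, so clamp it.
    return colors[min(index, len(colors) - 1)]


def css_rules(values_by_country):
    """Build one CSS fill rule per country code."""
    max_value = max(values_by_country.values(), default=0)
    lines = []
    for code in sorted(values_by_country):
        fill = color_for(values_by_country[code], max_value)
        lines.append(".%s { fill: %s; }" % (code.lower(), fill))
    return "\n".join(lines)


# Invented values, only to show the output format.
css = css_rules({"US": 100.0, "FR": 10.0})
print(css)
```

Equal-width buckets let one very large producer push most countries into the lightest shade; quantile or log-scaled buckets are common alternatives if the map looks washed out.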
Extra: I made a stacked area graph of which crops have been grown when here, with the code here.
Dave, welcome to the world of NASS data.
For more info on the 999 and 888 codes, see here: http://www.nass.usda.gov/Data_and_Statistics/County_Data_Files/Frequently_Asked_Questions/county_list.txt
The 999 county codes are aggregates. If the district = 99 & county = 999 then the value is a state total. If district != 99 then they are district totals. District 88 and county 888 are special cases where, for some reason, the district or county is not individually reported so they are reported in a "special group".
FWIW, there's also a RESTful API for pulling NASS data: http://quickstats.nass.usda.gov/api
The API does require an API key, but they give them out liberally.
-JD
and I see from your code that you already understood the 888 and 999 :)
Why wheat, as opposed to say barley? Seems to me that wheat beers are a fairly small portion of total beer production. ;-)
Looks like it misses the smaller counties. I live in Clallam County, WA, and produce over a hundred tons of wheat, and I have neighbors growing wheat as well. I'm sure some of the other white counties are similar. Obviously that was a USDA miss; awesome map.