Here’s a tutorial on the steps I went through to generate the data for Assignment 3, statistical modeling of stop & frisk data with precinct as the unit of analysis. The first problem is adding a precinct ID to every stop and frisk event, which means matching stop locations to precinct areas. To get baseline racial makeup we also need to sum census data over precincts.
Thanks to John Keefe of WNYC for valuable data and advice.
Just the data
- Here is NYPD’s 2011 stop & frisk data, with appended precinct and lat/long. The column names are explained here.
- Here are all NYC 2010 census blocks, with appended precinct, from John Keefe, and here is the census data dictionary (see pp. 6-10).
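Since the goal is a baseline racial makeup per precinct, the block-level census counts have to be summed over every block that shares a precinct. Here is a minimal sketch of that aggregation in plain Python. The column names (`precinct`, `total_pop`, `black_pop`) and the sample rows are made up for illustration; the real field names are in the census data dictionary, and the real input would be the census block CSV rather than an in-memory string.

```python
import csv
import io
from collections import defaultdict


def sum_by_precinct(rows, key="precinct", count_fields=("total_pop",)):
    """Sum the given count columns over all census blocks sharing a precinct."""
    totals = defaultdict(lambda: dict.fromkeys(count_fields, 0))
    for row in rows:
        precinct = row[key].strip()
        if not precinct:
            continue  # skip blocks that were not matched to any precinct
        for field in count_fields:
            # Treat an empty cell as zero people.
            totals[precinct][field] += int(row[field] or 0)
    return dict(totals)


# Tiny made-up sample standing in for the real census block file.
# All column names here are hypothetical.
sample = io.StringIO(
    "block_id,precinct,total_pop,black_pop\n"
    "b1,1,100,20\n"
    "b2,1,50,5\n"
    "b3,2,80,40\n"
)

totals = sum_by_precinct(
    csv.DictReader(sample), count_fields=("total_pop", "black_pop")
)

for precinct, counts in sorted(totals.items()):
    share = counts["black_pop"] / counts["total_pop"]
    print(precinct, counts, round(share, 3))
```

To run this on the real file, replace `sample` with an open file handle and pass whichever count columns you need.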
How to make this data
1. First, download NYPD raw stop and frisk data here. I used 2011, since that was the year with the most stops before the program was drastically curtailed. There’s also documentation on that page, which explains the data format.
2. The XCOORD and YCOORD columns record the position of each stop in the “New York-Long Island State Plane Coordinate System,” also known as EPSG 2908. To do anything useful with this, we need to convert it to lat/long using QGIS. Use Layer -> Add Layer -> Add Delimited Text Layer to load the CSV file. It should automatically guess the correct column names. When asked, pick EPSG 2908 for the coordinate system. If all goes well, you are now looking at this:
3. We need to save this in a more standard coordinate system for comparison to precinct boundaries. Right click on the “2011” layer, choose Save As, and save it as an ESRI Shapefile with a CRS (“coordinate reference system”) of WGS84, which should also be the “Project CRS.” Ensure “add saved file to map” is checked. Now you’ve got a new layer in standard GPS-style lat/long coordinates.
4. Now we need to know the geographic outlines for each police precinct. The raw shape files are here, or you can look at them in a Fusion Table here. In QGIS, use Add Layer -> Add Vector Layer and select the precinct shapefile (nypp_4326.zip).
5. To assign points to their precinct, run “Add polygon attributes to point” in the data processing toolbox (Processing -> Toolbox). The “Precinct” field should appear automatically. Again you’ll have a new layer. Right click on it and save it as a CSV. You should end up with a file that is exactly the same as the NYPD’s 2011 CSV, but with lat/long and precinct columns added.
6. Keefe has also done the work of enriching New York City census blocks with precinct keys, which he’s put in this Fusion Table. The data above is a CSV download of this file.
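For intuition about what the point-to-precinct matching in step 5 does, here is a rough pure-Python sketch of a point-in-polygon test using ray casting. It is not what QGIS runs internally, just an illustration of the idea. The two square “precincts” and the stop coordinates are made up, and the sketch assumes each precinct is a single simple polygon with no holes, already in the same lat/long coordinates as the stops. Real precinct shapes can have several parts, so a GIS tool or library is the better choice for the actual data.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: cast a ray to the right of (x, y) and count edge crossings.

    `polygon` is a list of (x, y) vertices; an odd number of crossings
    means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x position where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_precinct(lon, lat, precincts):
    """Return the ID of the first precinct containing the point, or None."""
    for precinct_id, polygon in precincts.items():
        if point_in_polygon(lon, lat, polygon):
            return precinct_id
    return None


# Hypothetical precinct outlines: two adjacent unit squares.
precincts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

# Hypothetical stop locations as (lon, lat) pairs.
stops = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]

assigned = [assign_precinct(lon, lat, precincts) for lon, lat in stops]
print(assigned)
```

A stop that falls outside every outline comes back as `None`, which is worth counting in the real data as a check on the coordinate conversion.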