This web report includes descriptive statistics of the Seattle 911 CAD data. The report starts with an overall summary of the structure of the dataset and then steps through each variable in the dataset.

Dataset Description

Let’s start by identifying the dimensions in the dataset.

## [1] 480811     17

The full CAD data export contains 752,421 events and 17 variables. For this analysis, we narrow the focus to 911 and other telephone call types only. This reduces the dataset to 480,811 events, which matches the dimensions shown above.

The variable names from the CAD export are listed below.

##  [1] "CAD_Event_ID"                        "Dispatch_ID"                        
##  [3] "Event_First_Dispatch_Time_ATTR"      "Call_Priority_Code"                 
##  [5] "Call_Type_Desc"                      "Case_Type_Final_Desc"               
##  [7] "Case_Type_Initial_Desc"              "Clear_By_Desc"                      
##  [9] "Dispatch_Address"                    "Officer_Serial_Num"                 
## [11] "Precinct"                            "Sector"                             
## [13] "Squad_Desc"                          "Dispatch_Blurred_Latitude"          
## [15] "Dispatch_Blurred_Longitude"          "CAD_Event_Response_Time_Seconds_SUM"
## [17] "Total_Service_Time_Seconds_SUM"

Now, let’s find the number of categories in the categorical variables. In subsequent sections, I will step through each variable and summarize the distributions in greater detail.

# of Categories for Categorical Variables
Variable # Categories
Dispatch ID 474,571
Priority Code 8
Call Type 2
Case Type Final Desc 320
Case Type Initial Desc 231
Clear By Desc 23
Precinct 6
Sector 18

Dispatch ID - This is some sort of identifier. Interestingly, the identifiers are not unique to each event: there are 474,571 distinct IDs across 480,811 events. What does the dispatch ID identify? Is this identifier going to be relevant for our analysis?
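One way to investigate what the Dispatch ID identifies is to list the IDs that repeat and then inspect the events behind them. The Python sketch below uses only the standard library. `repeated_ids` is a hypothetical helper and the sample IDs are made up.

```python
from collections import Counter

def repeated_ids(ids):
    """Return {id: count} for identifiers that appear on more than one row."""
    counts = Counter(ids)
    return {key: n for key, n in counts.items() if n > 1}

# Made-up Dispatch_ID values for illustration only.
sample_ids = ["D1", "D2", "D2", "D3", "D4", "D4", "D4"]
print(len(set(sample_ids)), "distinct IDs across", len(sample_ids), "rows")
print(repeated_ids(sample_ids))
```

Comparing the addresses, times, and officers on rows that share an ID would show whether the ID tracks a dispatch, a unit, or something else.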

Call priority code and call type description have a manageable number of categories: 8 and 2, respectively. In the full export, call type has 8 possible categories, but we focus on only two of them: 911 and other telephone (not 911). After a closer look at the univariate statistics in the sections below, and at what these categories mean, we can decide whether any of them should be aggregated.

Case Type Final and Case Type Initial Descriptions - Apart from the Dispatch ID, these two variables have the most categories: 320 and 231, respectively. We will want to parse the categories and regroup them into a smaller, more manageable set for analysis. After looking over the categories, we can work out strategies for aggregating them.

Clear by description - There are 23 categories in this variable. After further review below, we can look to see if any aggregation is necessary.

Precinct is a categorical spatial indicator. Its 6 categories consist of 5 regional precincts (north, west, south, east, and southwest) plus Unknown.

Sector - There are 18 categories: 17 named sectors plus a missing value. This variable appears to be another spatial category nested within precinct. It is described in the sector section below.

Before diving into the distributions of the categorical variables in greater detail, let’s take advantage of the fact that the data are time-stamped and get a sense of the frequency of events throughout the year.

Event Dates & Times

The data are time stamped to the minute. In the graph below, I have displayed the frequency of events per day. Hover your mouse over the line graph to see the number of events that occurred on a given day.

The highest daily count was 1,667 events, recorded on July 5th. In general, the summer months appear to have higher frequencies than the rest of the year.

November 14th, 2019 is the date with the sharpest drop in events: only 61 events were recorded that day. As Table 1 shows, this is far below even the other low-volume days, which raises the possibility of a glitch in the reporting system for that day. The day before, November 13th, is also unusually low, with 366 events.

Table 1: Dates with Highest and Fewest Calls
Date # Calls Rank
2019-07-05 1,667 1
2019-06-12 1,637 2
2019-06-14 1,623 3
2019-07-13 1,617 4
2019-05-30 1,603 5
2019-06-10 1,590 6
2019-05-02 1,572 7
2019-07-04 1,572 7
2019-05-31 1,561 9
2019-06-01 1,556 10
2019-12-22 1,080 356
2019-03-03 1,068 357
2019-12-24 1,054 358
2019-11-28 1,038 359
2019-12-25 1,037 360
2019-02-10 1,030 361
2019-11-26 1,024 362
2019-02-03 999 363
2019-11-13 366 364
2019-11-14 61 365

On average, there were 1,317 calls for service per day in 2019. Apart from the low counts on November 13th and 14th (366 and 61 events), the distribution shows little skew.

Table 2: Calls Over Time Summary
Daily Avg Std. Dev Median
1,317.29 147.26 1,324
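The daily counts, the Table 2 summary, and a simple check for unusually low days can be sketched in Python as follows. The sketch uses only the standard library. The helper names and the three-standard-deviation cutoff are assumptions, not the report's method.

```python
import statistics
from collections import Counter
from datetime import datetime

def daily_summary(timestamps):
    """Count events per calendar day and summarize the daily counts.

    Days with zero events never appear in the Counter, so they would have
    to be added explicitly before computing the summary."""
    per_day = Counter(ts.date() for ts in timestamps)
    counts = list(per_day.values())
    summary = {
        "mean": statistics.mean(counts),
        "sd": statistics.stdev(counts),  # sample SD (n - 1), like R's sd()
        "median": statistics.median(counts),
    }
    return per_day, summary

def low_outliers(per_day, mean, sd, z_cutoff=-3.0):
    """Return the days whose count is more than |z_cutoff| SDs below the mean."""
    return sorted(day for day, n in per_day.items() if (n - mean) / sd < z_cutoff)
```

With the Table 2 figures (mean 1,317.29, SD 147.26), this cutoff flags only November 13th (about 6.5 SDs below the mean) and November 14th (about 8.5 SDs below). The next-lowest day, February 3rd with 999 events, sits about 2.2 SDs below the mean.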

Call Priority Codes

Code 2 is the most common priority code, recorded for 189,788 events, or about 40% of the events in 2019 (Table 3). Just over 96% of the calls for service are assigned priority codes 1 through 3.

Codes 6 and 7 were very rare. They barely show up in the graph, but Table 3 shows 44 and 75 calls, respectively.

One other point to note is that there is not a code 8; the codes skip from 7 to 9.

Table 3: Call Priority Codes
Code # Calls %
1 154,849 32.21
2 189,788 39.47
3 118,628 24.67
4 8,971 1.87
5 6,565 1.37
6 44 0.01
7 75 0.02
9 1,891 0.39

Call Type Description

Table 4: Call Type Description
Type # Events %
911 325,008 67.6
TELEPHONE OTHER, NOT 911 155,803 32.4

We have retained only calls for service that came in via 911 or other telephone calls (not via 911). 911 calls are about 68% of the calls and other telephone source makes up the remaining 32%.

For reference, prior to reducing the dataset, 911 calls were about 43% of the events and other telephone was 21%.

Case Type Final Description

Flip through the pages in the table to view the number of events with each type of case final description. Recall that this variable has 320 different descriptions.

Some of these descriptions consist of a general description, a dash, and then a more specific description (e.g., DISTURBANCE - OTHER). We could parse out the general description and aggregate on it to get a smaller set of categories. I demonstrate this in the table below.

This aggregation strategy reduced the number of categories by a little over half, to 140. Disturbance cases are the most common, followed by suspicious circumstances and traffic. Flipping through the pages reveals some categories that appear similar to these top 3. For instance, traffic stop, listed on page 6, seems like it could also fit under traffic. Also on page 6 is the category “Dist”, which is an abbreviation for disturbance. All of the descriptions and frequencies for the final case type descriptions are listed in the exported Excel file (shared over email and on the GitHub page).
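The parse-and-aggregate step can be sketched in Python as below. It assumes the separator is a dash with a space on each side, so hyphenated words such as NON-CRIMINAL stay intact. The helper names are hypothetical and this is not the report's code.

```python
from collections import Counter

def general_description(desc):
    """Keep the text before the first ' - ' separator, trimmed and upper-cased."""
    return desc.split(" - ", 1)[0].strip().upper()

def aggregate(counts):
    """Collapse {description: count} onto the general descriptions."""
    totals = Counter()
    for desc, n in counts.items():
        totals[general_description(desc)] += n
    return dict(totals)

# Descriptions taken from this report; the counts are made up.
example = {
    "DISTURBANCE - OTHER": 10,
    "TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)": 3,
    "CRISIS COMPLAINT - GENERAL": 4,
    "ASSAULTS, OTHER": 2,
}
print(aggregate(example))
```

Descriptions without the separator, such as ASSAULTS, OTHER, pass through unchanged. This is one reason the regular-expression cleanup in the comments below is still needed.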

Other Comments
  • Need to make sure to catch abbreviations using regular expressions (e.g., burg –> burglary).
  • Similarly, use regular expressions for categories that look alike but differ in spacing or wording (e.g., Arson, Bombs, Explo; Abandoned car & Abandoned vehicle).
  • “#NAME?” looks like it might be the classification for events that were not classified. There are 977 events with this classification, about 0.2% of events.
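A Python sketch of this kind of regular-expression cleanup follows. It uses only the standard library. The replacement list is hypothetical: it contains only the examples named in these notes (including HAZ from the initial descriptions below). A real list would come from reviewing the exported descriptions.

```python
import re

# Hypothetical lookup table built from the examples in the notes.
REPLACEMENTS = [
    (r"\bBURG\b", "BURGLARY"),
    (r"\bDIST\b", "DISTURBANCE"),
    (r"\bHAZ\b", "HAZARD"),
    (r"\bABANDONED CAR\b", "ABANDONED VEHICLE"),
]

def normalize(desc):
    """Upper-case, standardize spacing, then expand known abbreviations."""
    text = desc.upper()
    text = re.sub(r"\s*,\s*", ", ", text)     # one space after each comma
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    for pattern, replacement in REPLACEMENTS:
        # \b word boundaries keep DIST from matching inside DISTURBANCE.
        text = re.sub(pattern, replacement, text)
    return text
```

Running `normalize` before the dash-based aggregation would merge variants such as “Dist” and DISTURBANCE into one group.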

Case Type Initial Descriptions

The top two initial case type descriptions are similar to the final case description types.

One note on structure: fewer of the initial descriptions follow the pattern seen in the final descriptions, namely a general description followed by a dash “-” and a more specific detail. Below, I have parsed the descriptions as I did for the final case descriptions, although this approach may be less useful here.

Other Comments/Questions
  • Need to make sure to catch abbreviations using regular expressions (e.g., HAZ –> HAZARD).
  • “#NAME?” shows up again in this set of descriptions, though not as frequently as it did in the final descriptions (n=12,132).
  • Would it be useful to compare final and initial descriptions? We could use fuzzy matching and regular expressions if this is important. If the final description is missing (coded as #NAME?) but the initial description is not, should the initial description be applied?
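If the answer to the last question is yes, the fallback rule and a rough fuzzy comparison could look like the Python sketch below. It uses only the standard library, and `difflib.SequenceMatcher` is one possible similarity measure. The set of “missing” codes and the helper names are assumptions.

```python
from difflib import SequenceMatcher

# Codes treated as "no description" (an assumption based on the notes above).
MISSING = {"#NAME?", ""}

def resolved_case_type(final, initial):
    """Prefer the final description, fall back to the initial one, and
    report which source was used so the substitution can be audited."""
    if final and final not in MISSING:
        return final, "final"
    if initial and initial not in MISSING:
        return initial, "initial"
    return None, "missing"

def similarity(a, b):
    """Rough 0-1 similarity between two descriptions (difflib ratio)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()
```

Keeping the source tag makes it easy to count how many events received a substituted description.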

Aggregating reduced the number of descriptions down to 123. The top four descriptions remain the same, but the rest of the top 10 have shifted ranks (e.g., assault, trespass).

NOTE: Unknown is pretty substantial here (n=13,907, 2.89%). The #NAME? description is less frequent (n=528), but appears to also signify unknown case descriptions.

Clear by Description

Table 4: Clear By Descriptions
Description # Events %
ASSISTANCE RENDERED 167,457 34.83
REPORT WRITTEN (NO ARREST) 148,495 30.88
UNABLE TO LOCATE INCIDENT OR COMPLAINANT 55,366 11.52
PHYSICAL ARREST MADE 41,140 8.56
NO POLICE ACTION POSSIBLE OR NECESSARY 19,290 4.01
CITATION ISSUED (CRIMINAL OR NON-CRIMINAL) 11,705 2.43
RESPONDING UNIT(S) CANCELLED BY RADIO 10,169 2.11
ORAL WARNING GIVEN 5,772 1.20
FOLLOW-UP REPORT MADE 4,178 0.87
DUPLICATED OR CANCELLED BY RADIO 4,137 0.86
OTHER REPORT MADE 2,968 0.62
FALSE COMPLAINT/UNFOUNDED 2,872 0.60
STREET CHECK WRITTEN 2,477 0.52
- 1,878 0.39
INCIDENT LOCATED, PUBLIC ORDER RESTORED 1,839 0.38
RADIO BROADCAST AND CLEAR 429 0.09
PROBLEM SOLVING PROJECT 330 0.07
TRANSPORTATION OR ESCORT PROVIDED 143 0.03
NON-CRIMINAL REFERRAL 80 0.02
EXTRA UNIT 35 0.01
SERVICE OF DVPA ORDER 21 0.00
(NOT CURRENTLY USED) ALARM NO RESPONSE 16 0.00
NO SUCH ADDRESS OR LOCATION 14 0.00

Assistance was rendered in about 35% of the service calls (n=167,457). The next most common clear-by type was report written (no arrest), which was applied to about 31% of the calls (n=148,495).

The next most common clear-by type was unable to locate incident or complainant, applied to about 11.5% of calls (n=55,366). Another 14 calls were marked as no such address or location. It is not clear whether this should be treated as similar to unable to locate incident.

A physical arrest was made in about 8.6% of calls (n=41,140). No police action was possible or necessary in 4% of calls (n=19,290).

OTHER NOTES:
  • It looks like a dash “-” represents a missing clear-by description (n=1,878).
  • I do not know what some descriptions mean or how they differ from others. For instance, how do “responding unit(s) cancelled by radio” and “duplicated or cancelled by radio” differ?
  • Unable to locate incident or complainant is about 11.5% of the events.

Precinct & Sector

Table 5: Calls per Precinct
Precinct # Calls %
NORTH 141,513 29.43
WEST 134,448 27.96
SOUTH 76,186 15.85
EAST 75,810 15.77
SOUTHWEST 51,572 10.73
UNKNOWN 1,282 0.27

The north and west precincts had the most calls, with about 29% and 28% of all calls, respectively. The south and east precincts had similar shares of calls, at about 16% each. The southwest precinct had the fewest calls recorded: 51,572 (11%).

For 1,282 calls, the precinct is listed as unknown. We may be able to identify a precinct for these events if they have valid latitude and longitude coordinates. Let’s look to see if they do have lat and long:

Table 6: Unknown Precinct Coordinate Status
Coordinate Status # Calls
Not valid coords 720
Valid coords 562

About 44% of the calls with an unknown precinct (562 of 1,282) have coordinates within the geographic extent of Seattle, so we can assign a precinct to those 562 events. When I create a spatial object from the coordinates, as shown a few sections below, I will be able to plot these. For some of them, the precinct may be obvious from the precinct labels of neighboring events. Where it is not obvious, the best approach is to obtain a shapefile of the polygons for each of the five precincts and overlay it on the events. Each point then gets the precinct whose polygon it falls within or is nearest to. Seattle’s Open Data website has such a shapefile, which I use in the spatial geoprocessing section below.
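In R, this overlay is typically done with a spatial join such as `sf::st_join()`. The Python sketch below only illustrates the underlying point-in-polygon idea with the classic even-odd (ray casting) test. It assumes simple polygons given as (longitude, latitude) vertex lists and does not handle holes or multipolygons. The helper names are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd (ray casting) test; polygon is a list of (lon, lat) vertices.

    Casts a ray from the point toward the east and counts how many polygon
    edges it crosses: an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_precinct(lon, lat, precincts):
    """Return the name of the first precinct polygon containing the point."""
    for name, polygon in precincts.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Points for which `assign_precinct` returns `None` are the ones that would still need a nearest-precinct rule.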

There are some interesting bivariate relationships that could be explored, for example, between call priority codes and precincts. View the interactive stacked bar chart below.

A few things stand out in the stacked bar graph of call priority codes and precincts:
  • The breakdown of precincts within codes 1 and 2 is very similar. The north and west precincts have very similar shares in these two codes.
  • Most of the unknown-precinct calls were classified as code 9.
  • The south precinct had no code 7 cases.
  • Over half of the calls in code 9 were in the west precinct.
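Comparing the “breakdown of precincts within codes” amounts to a cross-tabulation normalized within each code. The Python sketch below assumes the chart shows these within-code proportions. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def precinct_shares_by_code(rows):
    """rows: iterable of (priority_code, precinct) pairs.

    Returns {code: {precinct: share of that code's calls}}, i.e. the
    proportions a 100% stacked bar per code would display."""
    counts = defaultdict(Counter)
    for code, precinct in rows:
        counts[code][precinct] += 1
    shares = {}
    for code, by_precinct in counts.items():
        total = sum(by_precinct.values())
        shares[code] = {p: n / total for p, n in by_precinct.items()}
    return shares
```

Normalizing within each code makes rare codes such as 6, 7, and 9 comparable to the large codes 1 and 2. A raw-count stack would hide them.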

Let’s turn to the sectors. There are 17 distinct sector names. 1,282 calls were not given a sector, and these are the same calls that are missing a precinct classification.

Table 7: Calls by Precinct-Sector
Precinct Sector # Calls % of Precinct
SOUTH ROBERT 30,327 39.81
SOUTH SAM 24,427 32.06
SOUTH OCEAN 21,432 28.13
EAST EDWARD 33,631 44.36
EAST GEORGE 21,241 28.02
EAST CHARLIE 20,938 27.62
SOUTHWEST WILLIAM 25,852 50.13
SOUTHWEST FRANK 25,720 49.87
WEST KING 45,650 33.95
WEST DAVID 32,159 23.92
WEST MARY 29,461 21.91
WEST QUEEN 27,178 20.21
NORTH BOY 33,684 23.80
NORTH UNION 32,389 22.89
NORTH NORA 27,150 19.19
NORTH LINCOLN 27,065 19.13
NORTH JOHN 21,225 15.00
UNKNOWN NA 1,282 100.00

Sectors are unique to precincts, so we can think of a sector as a subdivision of a precinct.

King sector in the west precinct leads in the number of calls, with 45,650. This is about 34% of all calls in the west precinct. The other three west sectors (David, Mary, and Queen) each account for a share of the precinct’s calls that is roughly 10 to 14 percentage points lower than King’s.

Boy in the north precinct and Edward in the east precinct are next, each with over 33,000 calls. Boy’s share of calls is not substantially greater than that of the other north sectors. Edward, however, clearly has the largest share of calls in the east precinct, at about 44%.

The two sectors of the southwest precinct - William and Frank - have a 50-50 split of the calls.

NOTE: The Seattle Open Data website does not appear to have a boundary shapefile or API for sector. This may be something to inquire about if we want to do point-in-polygon analyses at the sector level.

Squad Description

This is one of the variables with an unmanageable number of categories. Only 1,487 events are missing a squad description. Flipping through the pages of the table shows that the squads are named in various ways. Some names are based on the field or specialty the squad works in (e.g., forensics, Arson/Bomb) and others on location (i.e., precinct + sector). NOTE: If this variable is considered important, we would need to aggregate it the same way as the case type descriptions. That means using the first descriptor before the dash, regular expressions, and fuzzy matching to get broad categories and to reconcile abbreviations, misspellings, and differences in word order.

Officer Identifier

## [1] 1262

There are 1,262 officers in this dataset.

Response Time

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0      246      574     2121     1886 14773646

Response time for each event is reported in seconds. The summary statistics suggest that some very long response times are outliers. The longest response time is 14,773,646 seconds, or just under 171 days. Let’s convert the seconds into larger units of time.

With the times parsed into periods and sorted from longest to shortest, we can see that the longest time was over 170 days and the case was a test call. This makes it a candidate for exclusion. For completeness, the data are also displayed below sorted from shortest to longest, so that the short response times are easier to see.
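The conversion from seconds to periods can be sketched in Python as below. The output mirrors the “d H M S” style of the parsed periods in this report (in R, `lubridate::seconds_to_period()` prints periods similarly). `format_period` is a hypothetical helper, not the report’s code.

```python
def format_period(seconds):
    """Format seconds as e.g. '170d 23H 47M 26S', dropping leading zero units."""
    sign = "-" if seconds < 0 else ""
    remaining = abs(int(seconds))
    days, remaining = divmod(remaining, 86_400)
    hours, remaining = divmod(remaining, 3_600)
    minutes, secs = divmod(remaining, 60)
    units = [(days, "d"), (hours, "H"), (minutes, "M"), (secs, "S")]
    while len(units) > 1 and units[0][0] == 0:
        units.pop(0)  # drop leading zero units, but always keep seconds
    return sign + " ".join(f"{value}{label}" for value, label in units)

print(format_period(14_773_646))  # 170d 23H 47M 26S
```

The same helper applied to the maximum total service time below (107,771 seconds) gives 1d 5H 56M 11S.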

Total Service Time

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -2604     598    1455    2936    3664  107771     221

The distribution of total service time is unusual. There are 221 events missing a total service time, and at least one event has a negative total service time. First, let’s see how many negative values there are.

Table 8: Events with Negative Total Service Time
Date Service Time (Seconds) Response Time (Parsed) Case Type Final
2019-11-03 -2,604 3H 41M 19S TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)
2019-11-03 -1,829 1H 22M 0S CRISIS COMPLAINT - GENERAL
2019-11-03 -1,829 1H 22M 0S CRISIS COMPLAINT - GENERAL
2019-11-03 -1,091 9M 1S DISTURBANCE - OTHER
2019-11-03 -998 4M 50S ASSAULTS, OTHER
2019-11-03 -699 2M 9S DISTURBANCE - OTHER
2019-11-03 -699 2M 9S DISTURBANCE - OTHER

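There are only 7 events in the dataset with negative values. When we include information like the event date, parsed response time, and case description type, two pairs of rows turn out to be duplicates: the two CRISIS COMPLAINT rows and the last two DISTURBANCE rows. That leaves 5 distinct events. The other thing that stands out is that all of these events were recorded on the same date, November 3rd. It is possible that the negative values were a recording error that occurred that day. November 3rd, 2019 was also the day daylight saving time ended in the U.S., when clocks fell back one hour, and every negative value here is smaller in magnitude than one hour (3,600 seconds). We could also check the average service time of other events of a similar type to see whether the absolute values of these service times are reasonable.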
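The daylight saving explanation can be illustrated in Python. The sketch assumes that CAD timestamps are stored as local wall-clock times without a UTC offset, which the export does not confirm. The event times are hypothetical, and the fixed offsets stand in for a proper time zone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Seattle on either side of the 2019 fall-back transition.
PDT = timezone(timedelta(hours=-7), "PDT")  # before 2:00 a.m. on Nov 3
PST = timezone(timedelta(hours=-8), "PST")  # after clocks fall back

# Hypothetical event: dispatched at 1:50 a.m. PDT and cleared 20 minutes
# later, which the wall clock shows as 1:10 a.m. (the repeated hour, now PST).
dispatched = datetime(2019, 11, 3, 1, 50)
cleared = datetime(2019, 11, 3, 1, 10)

# Subtracting naive wall-clock times gives a negative duration.
wall_clock_seconds = (cleared - dispatched).total_seconds()

# Attaching the correct offsets makes Python convert both to UTC first.
elapsed_seconds = (cleared.replace(tzinfo=PST)
                   - dispatched.replace(tzinfo=PDT)).total_seconds()

print(wall_clock_seconds, elapsed_seconds)  # -2400.0 1200.0
```

Under this assumption, a negative value can be at most one hour in magnitude, which is consistent with the values in Table 8.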

Now, let’s look at the NA values.

The events with missing values vary in case type. There appear to be some duplicates, e.g., the assault-DV case on January 13th. Again, the event date and response time seem useful for identifying duplicates in this dataset.
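Duplicate detection on a composite key can be sketched in Python as below. The field names are hypothetical stand-ins for the event date, `CAD_Event_Response_Time_Seconds_SUM`, and `Case_Type_Final_Desc`. The sample rows are taken from Table 8, with 1H 22M 0S = 4,920 seconds and 2M 9S = 129 seconds.

```python
from collections import defaultdict

def find_duplicates(events, key_fields=("date", "response_time", "case_type")):
    """Group events on the given fields and return only groups with >1 row."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[field] for field in key_fields)
        groups[key].append(event)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}

sample_events = [
    {"date": "2019-11-03", "response_time": 4920, "case_type": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "response_time": 4920, "case_type": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "response_time": 541, "case_type": "DISTURBANCE - OTHER"},
]
print(list(find_duplicates(sample_events)))
```

A shared key only flags candidates. Checking the other columns (e.g., address or officer) would confirm whether the rows really describe the same event.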

For the sake of consistency, I parsed the total service time into time periods as I did with the response time. See some of the output below.

With the period-parsed version of service time, we see that the upper end of the service time distribution is about 1 day and 6 hours (107,771 seconds, or 1d 5H 56M 11S).

Spatial Object

Before transforming the dataframe into a spatial object, the calls with missing or invalid coordinates need to be removed. After filtering those events out, the transformed spatial object contains 459,132 calls with locations. There were 21,679 calls for service that do not have valid coordinates. Mapping all of these calls as points results in over-plotting as shown below.
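A coordinate-validity filter can be sketched in Python as below. The bounding box is an approximate, hypothetical extent around Seattle, because the report does not state the rule it used. A box is also looser than the city boundary, so some points just outside city limits would still pass.

```python
# Approximate bounding box around Seattle (an assumption for illustration).
LAT_MIN, LAT_MAX = 47.48, 47.74
LON_MIN, LON_MAX = -122.46, -122.22

def has_valid_coords(lat, lon):
    """True if both values parse as numbers inside the bounding box."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):  # None, blanks, or non-numeric text
        return False
    # NaN fails every comparison, so it is rejected here as well.
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
```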

There are other approaches for visualizations that would be more informative. One approach is to create a point density map to show where the highest and lowest number of events per area occurred in the city. Another approach is to aggregate the points to meaningful geographic units like zipcodes or neighborhoods. The following sections demonstrate these approaches.

Point Density Mapping

This interactive map clusters points that are close together. Zoom into different parts of the city to see where clusters tend to occur.

Smoothed Point Density Map

The interactive map below shows the areas with high density of calls. Only areas with statistically significant densities are mapped. Highest density areas are in yellow and lowest are red.
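The smoothing and significance method behind this map is not described here. As a simpler, unsmoothed point of comparison, the Python sketch below bins points into a regular grid and counts them per cell. The cell size is an arbitrary assumption.

```python
import math
from collections import Counter

def grid_counts(points, cell_deg=0.005):
    """Count (lat, lon) points per square grid cell of cell_deg degrees.

    At Seattle's latitude, 0.005 degrees is roughly 550 m north-south and
    about 375 m east-west; the cell size is an arbitrary choice."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts
```

`grid_counts(points).most_common(10)` lists the ten busiest cells. Unlike a kernel-smoothed surface, the results depend on where the grid lines fall.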

Precinct Aggregation & Mapping

Now, let’s turn to visualizing the frequency of events in different geographic regions of Seattle. In one of the prior sections, I showed the frequency of events per precinct. However, 1,282 of the calls did not have a precinct listed. Now that the dataframe has been transformed into a spatial object, I can identify a precinct for those locations based on which precinct polygon each coordinate pair lies within.

Service Calls per precinct, spatial overlay version
Precinct # Calls
NORTH 135,485
WEST 124,569
EAST 76,677
SOUTH 72,369
SOUTHWEST 48,901
NA 1,131

One thing stands out from using the spatial overlay approach to assign precincts: 1,131 points are not assigned to a precinct because they lie outside the precinct boundaries (see the map below). To make use of all of these points, the best approach would be to keep the precincts provided in the original dataset. We can then merge in the spatial-overlay precincts for the events whose original precinct was unknown. Finally, any points still missing a precinct can be assigned to the precinct they are nearest to. Let’s do that and then visualize the results.
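The three-step precinct assignment can be sketched in Python as below. For simplicity, the sketch replaces “nearest precinct” with “nearest precinct centroid”. A distance-to-polygon-edge rule would match the description more closely. The centroid coordinates would come from the precinct shapefile, and the function name is hypothetical.

```python
import math

def combined_precinct(original, overlay, lat, lon, centroids):
    """Pick a precinct in priority order and report where it came from:
    1. the precinct recorded in the CAD export,
    2. the spatial-overlay (point-in-polygon) result,
    3. the nearest precinct, approximated here by the nearest centroid."""
    if original and original != "UNKNOWN":
        return original, "original"
    if overlay:
        return overlay, "overlay"
    if lat is None or lon is None:
        return None, "missing"
    # Plain distance in degrees: a rough approximation that ignores how
    # degrees of longitude shrink east-west at Seattle's latitude.
    nearest = min(centroids, key=lambda name: math.dist((lat, lon), centroids[name]))
    return nearest, "nearest"
```

Returning the source alongside the precinct keeps a record of how many events each step filled in.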

The map shows not only the events per precinct but also the events that lie outside the precinct boundaries. NOTE: I assigned the “outlying” points to the nearest precinct for the precinct layer. I included them in the visual only to show that some of the locations lie outside the city limits.

Zipcode Aggregation & Mapping

Another aggregation we can perform and visualize is at the zipcode level. Zipcode boundaries were pulled from Seattle’s Open Data website.

The table below lists the count of events per zipcode. There is a sizable range in calls per zipcode from 7 to 46,396. The map below shows the counts per zipcode.

Calls for Service per zipcode
Zipcode # Calls
98104 46,396
98101 35,380
98122 30,939
98118 28,848
98103 26,205
98144 24,904
98105 23,154
98125 22,243
98109 22,224
98108 18,709
98107 18,199
98133 17,998
98121 17,270
98106 15,464
98134 15,150
98115 14,375
98126 12,678
98102 12,366
98116 10,678
98117 10,288
98112 9,839
98119 9,588
98136 4,773
98199 4,488
98178 2,561
98177 1,848
98146 1,008
98195 839
98155 549
98168 89
98188 11
98166 7

Zipcodes in the core of the city tend to have the highest counts. The zipcodes along the southeastern edge of the city also have relatively high counts, especially compared to the zipcodes along the southwestern side of the city.

Neighborhood Aggregation & Mapping

The Seattle Open Data website also makes neighborhood boundaries available. In the table below, the events were aggregated to the neighborhoods, which lets us drill down to smaller units than the zipcodes. The neighborhoods and their counts are also featured in the map below. The neighborhoods in the city’s core, like the CBD, Broadway, and Pioneer Square, had the most calls. Just to the south of the city’s core, the Industrial District also had a relatively high number of calls. In the northern half of the city, the University District had the most calls.

Events per neighborhood
Neighborhood # Calls
Central Business District 25,898
Broadway 24,422
Pioneer Square 23,567
Belltown 21,659
University District 18,528
Industrial District 15,100
First Hill 12,397
Greenwood 10,966
International District 10,836
North Beacon Hill 10,026
Adams 9,820
South Lake Union 9,762
Lower Queen Anne 9,678
Columbia City 9,217
Haller Lake 7,960
Fremont 7,726
Yesler Terrace 7,615
Atlantic 7,584
Georgetown 6,940
Minor 6,748
Dunlap 6,532
West Woodland 6,523
Wallingford 6,116
Pinehurst 5,917
Stevens 5,807
Bitter Lake 5,768
North College Park 5,586
South Delridge 5,407
Pike-Market 5,315
Brighton 5,267
Mid-Beacon Hill 5,165
Olympic Hills 5,127
Mount Baker 4,941
Cedar Park 4,772
Green Lake 4,651
High Point 4,549
Genesee 4,419
South Park 4,270
Maple Leaf 4,036
Roosevelt 4,011
Highland Park 3,958
North Admiral 3,957
Roxhill 3,740
East Queen Anne 3,606
North Delridge 3,488
Ravenna 3,471
Fairmount Park 3,177
Holly Park 3,047
North Queen Anne 2,960
Phinney Ridge 2,827
Alki 2,625
South Beacon Hill 2,613
Victory Heights 2,553
Interbay 2,550
Rainier Beach 2,431
Mann 2,412
Riverview 2,275
West Queen Anne 2,188
Broadview 2,068
Loyal Heights 2,061
Seward Park 1,996
Whittier Heights 1,992
Lawton Park 1,975
Eastlake 1,966
Crown Hill 1,951
Leschi 1,827
Westlake 1,761
Montlake 1,667
Wedgwood 1,610
Fauntleroy 1,509
Gatewood 1,471
Seaview 1,462
Sunset Hill 1,457
Rainier View 1,428
Matthews Beach 1,313
Madrona 1,275
Bryant 1,251
Meadowbrook 1,217
Southeast Magnolia 1,205
Arbor Heights 1,205
Sand Point 1,129
Madison Park 959
Laurelhurst 909
North Beach/Blue Ridge 781
Harrison/Denny-Blaine 739
Briarcliff 656
View Ridge 552
Harbor Island 468
Windermere 450
Portage Bay 421

Potential next steps for mapping/spatial analysis:

  • Faceted maps of call locations subset by: 1) case type, 2) call priority, 3) clear by, 4) call type.
  • Density maps by any of the above categories.
  • Space-time slice maps/diagrams for cases of interest.
  • Bring in census block groups and integrate ACS demographics.