According to a recent review by the Police Executive Research Forum (PERF), the Albuquerque Police Department (APD) faced a riddle. In recent years, violent crime and assaults on officers had been declining; training had been adapted to focus on de-escalating conflicts and avoiding physical confrontations; the department had procured less-lethal weaponry and trained officers in its use; a non-disciplinary Early Warning System had been put in place to identify officers showing warning signs of excessive use of force based on recent incidents; and a Crisis Intervention Team had been created to help deal with mentally ill persons, who are involved in 65% of officer-involved shootings nationwide. Still, Albuquerque's rate of police-suspect encounters involving use of force remained unusually high relative to comparable cities.
Eager to find out what was going wrong, APD collected all the data it could. Already, any police use of force, from grabbing a suspect to a fatal gun battle, was reported and reviewed by numerous committees across the police hierarchy, the Internal Affairs Division, and civilian government. As anyone involved with law enforcement will tell you, reporting and record-keeping are extensive. On top of that, APD monitored its officers with audio recorders on the gun belts and video recorders on the shirts of every patrol officer, as well as dashboard cameras on squad cars involved in higher-risk operations. It even instituted a database that signaled when an officer was involved too often in activities indicating risk. APD pored over this information to revise training and procedure, to little avail: use of force rates remained steady.
The Albuquerque Police Department is not the only one puzzled by use of force. Nationwide, experts agree that despite extensive writing on the subject and software to track incidents, use of force, and especially wrongful use of force, is poorly understood due to gaps in data and inconsistencies in reporting between departments. There are many "snapshots" of use of force from select departments, studies, and reports, some contradictory, but no big picture. Even Albuquerque, where the police department follows the most progressive practices on monitoring violence, suffers from data gaps and faulty reporting procedures. PERF therefore recommended better verification and reporting of use of force, documentation of the Crisis Intervention Team's responses and outcomes, more follow-up reports on responses and outcomes, and tighter integration of the various monitoring and reporting mechanisms already in place.
One way to achieve these goals is to bring all the data on police use of force together, rather than just storing a count of incidents in a database. Ideally, a use of force database would include data from police departments across the country, like the FBI's national fingerprint database (IAFIS) or its Combined DNA Index System (CODIS), which would allow towns, cities, counties, and states to learn from each other, share best practices, and improve reporting and monitoring nationwide. Even for one city like Albuquerque, however, this would be a Big Data challenge.
Consider all the information that could be pooled to provide a more complete view of use of force incidents. On top of a description of the actual incident, it would be beneficial to have dispatcher data to see what the officer knew going in and how he or she was dispatched. Of course, the police report would be included, as well as any other reports and findings around the use of force, which for a shooting, fatal or not, would include the findings of a Multi-Jurisdictional Investigative Team, a Homicide Unit investigation, an Internal Affairs investigation, interviews with psychologists, and reviews by the Grand Jury and the city's Independent Review Officer. To capture the citizen's and suspect's perspective, complaints would be included in the database, as well as any relevant newspaper articles. To get a better understanding of the officers, their records, commendations, citations, and training would be included. Aside from this mass of text, a whole host of audio-visual recordings would objectively chronicle the incident, including the audio from the officer's gun belt, the video from his or her uniform, and any relevant footage from a dash-cam or video surveillance camera in the area.
Storing and analyzing this information would be no small feat. The data is complex and unstructured, stored in different formats from text to indexed reports to audio and video. If other police departments were included, even more formats would arise. The data is also massive. To give an idea of the scale, APD alone produces about 9,000 police reports a year for 45,000 calls for service and makes about 1,200 felony arrests. Nationwide, fewer than 2% of interactions result in use of force; applying 2% to the 45,000 calls for service suggests roughly 900 use of force incidents every year. Though daunting, this would be a challenge worth taking. In fields from business to medicine and intelligence, better and more holistic data, when analyzed, has revealed conclusions contrary to anecdotal evidence and practitioner impressions. Even in law enforcement, extensive data analysis of old problems has yielded shocking results, such as correlations between moon phase and crime, later attributed to how bright it was at night. Bringing together traditional reports with evaluations and reviews adds depth and significance; including more non-police observations and commentary adds perspective; and audio and video provide a means of verifying the information.
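To make the scale concrete, here is a back-of-envelope sketch of the numbers above. The incident estimate comes straight from the figures in the text; the per-incident storage sizes for reports, audio, and video are purely illustrative assumptions, not measured values.

```python
# Rough estimate of annual incident counts and data volume for one
# department, using the APD figures cited above. Per-incident payload
# sizes are hypothetical assumptions for illustration only.
calls_for_service = 45_000
use_of_force_rate = 0.02  # "fewer than 2%" used as an upper bound

incidents = calls_for_service * use_of_force_rate
print(f"Estimated use-of-force incidents per year: {incidents:.0f}")

# Hypothetical per-incident payload, in gigabytes
text_gb = 0.001   # reports, reviews, complaints, news clippings
audio_gb = 0.5    # gun-belt recorder audio
video_gb = 4.0    # uniform camera plus dash-cam footage

yearly_gb = incidents * (text_gb + audio_gb + video_gb)
print(f"Rough storage per year: {yearly_gb / 1000:.1f} TB")
```

Even with these conservative made-up payload sizes, a single mid-sized department generates terabytes per year; a national system would multiply that by thousands of agencies.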
Storing and analyzing this Big Data cheaply and effectively is possible thanks to advances in data science driven by business intelligence. Apache Hadoop, for example, can store and process massive amounts of unstructured data in its original form for pennies on the dollar: it is open source, requiring no licensing fees, and runs on clusters of commodity hardware instead of supercomputers. It also provides a platform for complex and evolving analysis with a variety of open source software.
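To give a flavor of how processing on Hadoop works, here is a minimal sketch in the MapReduce style that Hadoop Streaming supports (any script reading stdin and writing stdout can serve as a mapper or reducer). The record layout and field names are hypothetical; the example runs locally on sample data to show the idea.

```python
# Sketch of a MapReduce-style job tallying use-of-force incidents by
# force type. The tab-separated record format is a made-up example;
# on a real cluster, Hadoop Streaming would feed records via stdin.
from collections import Counter

def map_records(lines):
    """Map phase: emit (force_type, 1) for each incident record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            incident_id, officer_id, force_type = fields[:3]
            yield force_type, 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

sample = [
    "INC-001\tOFF-42\ttaser",
    "INC-002\tOFF-17\tphysical",
    "INC-003\tOFF-42\ttaser",
]
print(reduce_counts(map_records(sample)))  # {'taser': 2, 'physical': 1}
```

The same split into a stateless map step and an aggregating reduce step is what lets Hadoop spread the work across a cluster of cheap machines.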
This analysis can then automatically sift through terabytes to petabytes of data to find trends and warning signs with far more depth than the current system, which only looks at counts of certain incidents. Department-wide trends can be generated, and when an incident or officer is being reviewed, relevant data can be pulled from the system. Already, a host of analytics exists to help with text, audio, and video analysis, some designed specifically for Hadoop. IBM's Watson, for example, used textual analysis to defeat human contestants on Jeopardy!, generating questions for the answers written on the game board in real time by poring through a virtual library. Other software can analyze audio recordings for emotional state and signs of aggression, and Video Content Analysis can flag suspected use of force and erratic or suspicious behavior. Products like piXserve can automatically index and search video. Machine learning tools like Mahout, developed to work with Hadoop, can pick up trends that analysts were not even examining. And the full stack of Hadoop-related components in the Cloudera Distribution including Apache Hadoop (CDH) delivers high-performing, high-availability analytical and management tools.
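As a toy illustration of the early-warning idea described above, the sketch below flags officers whose incident counts sit unusually far above the department average. All counts and the threshold are invented for the example; a real system would combine many richer signals than raw counts.

```python
# Illustrative early-warning check: flag officers whose incident
# counts are well above the department mean. Data and the z-score
# threshold are hypothetical.
from statistics import mean, stdev

incident_counts = {
    "OFF-01": 2, "OFF-02": 3, "OFF-03": 1,
    "OFF-04": 2, "OFF-05": 11,  # clear outlier
}

def flag_outliers(counts, threshold=1.5):
    """Return officers whose count exceeds the mean by more than
    `threshold` standard deviations."""
    mu = mean(counts.values())
    sigma = stdev(counts.values())
    return [officer for officer, n in counts.items()
            if sigma > 0 and (n - mu) / sigma > threshold]

print(flag_outliers(incident_counts))  # ['OFF-05']
```

The point is not the statistic itself but that, once the data lives in one place, such checks can run continuously instead of waiting for a manual review.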
Altogether, a Big Data analysis approach to police records would mean more than closing gaps and cross-checking accounts. Even the PERF report, which combined interviews with extensive statistical analysis, was not able to pinpoint where the problem was for Albuquerque, instead suggesting 40 small changes that may together lead to positive trends. Most are slight improvements to procedure, some as minor as proposed name changes. Overall, APD seemed to have done everything right, leaving even the experts with what the intelligence community after 9/11 termed a "failure of imagination." When done well, however, data analysis can find trends that analysts could never imagine or search for. For example, there may be a telling correlation between an officer's geographic location, the aggression in his or her voice, and the outcome of the interaction. As a result, Big Data storage and analysis tools like Apache Hadoop could revolutionize not only police record keeping but, through the findings, the practice of policing itself.
- Cloud Computing for Law Enforcement (CTOvision.com)
- Automating the Extraction of Useable Knowledge From Videos and Pictures (CTOvision.com)
- Common Hadoopable Problems (CTOvision.com)
As we all know, the world is inundated with data about practically everything we do, from where we are to who we know to what we eat, and it's an extremely exciting time to be working in a field trying to make sense of all of it. However, as I and others have pointed out, a lot of effort in our discipline goes toward what I feel are rather "bourgeois" applications of data science, such as using complex machine learning algorithms and rich datasets not to enhance communication or improve government, but to let people know there's a 5% deal on an iPad within a one-mile radius of where they are. In my opinion, these applications bring vanishingly small incremental improvements to lives that are arguably already pretty awesome.
On the other hand, there are lots of NGOs and non-profits out there doing wonderful things for the world, from rehabilitating criminals to battling hunger to providing clean drinking water. However, they increasingly find themselves with more and more data about their practices, their clients, and their missions that they don't have the resources or budgets to analyze. At the same time, the data/dev communities love hacking together weekend projects where we play with new datasets or build helpful scripts, but these usually just culminate in a blog post or some Twitter buzz. Wouldn't it be rad if we could get these two sides together?