Data-Driven Government: A Case Study in U.S. Criminal Justice

How can big data be leveraged to improve government? We report back evidence from the 2017 Winter Innovation Summit on the U.S. criminal justice system.

(The 2017 Winter Innovation Summit main stage in Salt Lake City, UT, available on flickr.)

Criminal justice systems in the U.S. often result in high costs for government and poor outcomes for residents. Individuals who enter the criminal justice system, usually through a police encounter or emergency medical service (EMS) call, often suffer from mental illness, substance abuse, socioeconomic strain, and chronic health problems. While public systems track these conditions, they do so separately with little coordination. As a result, jurisdictions cannot direct the right resources to those who need them, and such individuals often re-enter the system and end up incarcerated as repeat offenders. There is a better way to break this cycle. County officials are increasingly turning to big data to measure and improve their systems, linking mental health interventions with policing practices to prevent recurring incarceration.

In January, I attended the 2017 Winter Innovation Summit to dig into how county governments in the U.S. are implementing big data initiatives and to unpack existing challenges. The Summit brought together non-profit organizations, universities, and government to discuss and share innovations in social services. Attendees included University of Chicago's Data Science for Social Good Fellowship (DSSG), the Data Driven Justice Initiative (DDJ), as well as public officials from over 130 U.S. county jurisdictions.

As a computational research specialist for my team at MIT GOV/LAB, I share a parallel goal with officials tasked with data-driven social impact: analyze large, public datasets to draw meaningful insights for questions around better governance. As such, my main question at the Summit was: how can government best use data to tackle social problems like criminal justice? I found that a successful data-driven framework relies on three pillars: 1) opening silos; 2) operationalizing findings; and 3) maintaining strong public-private networks. These practices are explained below using specific case studies.

Opening data silos

One success story highlighted during the Summit was Johnson County, Kansas, where DSSG and DDJ partnered in 2016 to break down data silos between agencies and departments. A shared data exchange between three separate databases (Sheriff’s Department, County Mental Health Services, and EMS) made it possible to trace 127,000 residents across the three systems. Combining records across public systems helps officials understand how these problems interact. For instance, the Johnson County data showed that the residents most likely to go to jail were those who had come into contact with County Mental Health Services or had a previous jail booking.
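The idea of tracing one resident across three systems can be illustrated with a minimal Python sketch. It assumes every record already carries a shared `person_id`, which the source does not confirm; real linkage usually has to match on names, birth dates, and similar fields. All field names and records here are hypothetical and synthetic.

```python
from collections import defaultdict

# Synthetic, hypothetical records. The shared "person_id" key is an assumption.
jail_bookings = [
    {"person_id": "A1", "booking_date": "2015-03-02"},
    {"person_id": "B2", "booking_date": "2015-06-11"},
    {"person_id": "A1", "booking_date": "2016-01-20"},
]
mental_health_contacts = [
    {"person_id": "A1", "contact_date": "2015-02-14"},
    {"person_id": "C3", "contact_date": "2015-09-30"},
]
ems_calls = [
    {"person_id": "B2", "call_date": "2015-06-10"},
    {"person_id": "A1", "call_date": "2015-12-25"},
]


def link_records(**systems):
    """Group the records of each named system under a shared person_id."""
    linked = defaultdict(lambda: {name: [] for name in systems})
    for name, records in systems.items():
        for record in records:
            linked[record["person_id"]][name].append(record)
    return dict(linked)


linked = link_records(
    jail=jail_bookings,
    mental_health=mental_health_contacts,
    ems=ems_calls,
)

# Residents who appear in every one of the three systems.
in_all_three = sorted(
    pid for pid, by_system in linked.items() if all(by_system.values())
)
print(in_all_three)
```

Once records are grouped per person, questions such as "how many people with a mental health contact also have a jail booking?" become simple filters over `linked`.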

(Johnson County Courthouse in Kansas, Wikipedia Creative Commons)

However, police don’t just encounter residents from their own municipalities. To this point, Robert Sullivan, Criminal Justice Coordinator of Johnson County, noted the value in data sharing across county lines to inform police about individuals from outside the Johnson County municipality.

Opening up data vaults across agencies and geographies can create helpful statistics and highlight patterns in an individual’s history in the criminal justice system. But how can these databases inform law enforcement officers on the ground about future encounters?

Operationalizing data findings

In the tech world, to “operationalize” means to go beyond analysis or, in other words, to integrate an analytical model into a real-time workflow that provides actionable insights for stakeholders. Often, this involves moving from descriptive or explanatory statistics to full-blown predictive modeling - advances in the field of machine learning have made such prediction tasks faster and more powerful than ever. Where social scientists using a randomized controlled trial (RCT) approach must manually select from an infinite pool of possible treatments, machine learning can automate this selection process. Similarly, findings from an RCT must be interpreted and translated into policy implications. In contrast, machine learning can scale up from a backward-looking, exploratory tool to a forward-looking, predictive tool.

(The machine learning pipeline - from historical data to predictive analytics. Slide from the 2016 DSSG Data Fest.)

Before these tools became popular, corrections officers used simple additive "points" systems to determine an individual's propensity for violence. Now, in Johnson County, a machine learning classifier predicts whether currently incarcerated individuals will re-offend. Instead of a one-size-fits-all system, models are trained on highly specific variables across vulnerable subpopulations (e.g. juveniles, the mentally ill, low income) to predict an individual’s tendency towards violence. Already, the Johnson County platform has predicted the jail bookings of over 104 individuals, identifying a savings of over $250,000 for the county - money that can be reinvested into early intervention.
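The difference between an additive points system and a learned classifier can be sketched in a few lines of Python. This is an illustration only: the source does not say which model Johnson County uses, so logistic regression stands in here, and the two features, the point weights, and all data are hypothetical and synthetic.

```python
import math


def points_score(person):
    """Additive 'points' system: fixed, hand-chosen weights (hypothetical)."""
    return 2 * person["prior_bookings"] + 1 * person["mental_health_contacts"]


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus actual
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(x, weights, bias):
    """Estimated probability of the positive label for one feature row."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic rows: [prior_bookings, mental_health_contacts]; label 1 = re-booked.
rows = [[0, 0], [0, 1], [1, 0], [1, 2], [2, 1], [3, 2], [3, 0], [0, 2]]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

weights, bias = train_logistic(rows, labels)
p_low = predict_proba([0, 0], weights, bias)
p_high = predict_proba([3, 2], weights, bias)
print(points_score({"prior_bookings": 2, "mental_health_contacts": 1}))
print(p_low < p_high)
```

The points system applies the same fixed weights to everyone, while the learned model derives its weights from historical records, so it can be retrained on different subpopulations or on new data.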

Developing public-private networks

Finally, the structural key to any data-driven social impact is to develop valuable public-private networks. A network of stakeholders from different sectors and geographies creates value in two ways: it expedites idea exchanges between different areas of knowledge (e.g. the tech and policy worlds) and it mobilizes different forms of capital towards a unified goal.

In the context of criminal justice, Bexar County, Texas provides an exemplary model for both knowledge exchange and mobilization of capital between public and private groups. Raising nearly $6 million from pharmaceutical companies, $1 million in increased tax revenue, a boost in Medicaid money, and $20 million from local philanthropy, Bexar County built highly effective crisis centers combining treatment, social services, and housing. Building the crisis centers, an effort involving private investment, political will, and clinical research, has diverted more than 100,000 people from jail and emergency rooms to treatment.

(Congressman John Delaney (D-MD 6th district) outlines the importance of mobilizing private capital for public good. Flickr.)

Summing up the summit

While big data no doubt creates tremendous value for improved governance, the collaboration between civil society, private finance, researchers, and government is necessary for deploying the systems, teams, and people to use it.

The big data challenge for government is distinct from the large-scale data science at traditional tech companies (e.g., Google and Facebook). Data sits in its silo of origin, not in a central warehouse, so for results to serve vulnerable or marginalized subpopulations in real time, practitioners must create value by spanning geographies and disciplines. Correspondingly, a big data approach centered around accessibility, forward-thinking operationalization, and cross-sector collaboration will be equipped to tackle social problems like criminal justice.

At GOV/LAB, we are working to apply machine learning to political challenges like online government transparency. To learn more, attend one of our Data Science to Solve Social Problems Seminars, shoot me an email (sbarari(at), or check out past events online.