Abstract
“[I]t is better that ten guilty persons escape than that one innocent suffer.” 4 William Blackstone, Commentaries *358.
“[I]t is better that ten innocent men suffer than that one guilty man escape.” Otto von Bismarck, Germany’s first chancellor, quoted in John W. Wade, Uniform Comparative Fault Act, 14 Forum 379, 385 (1979).
Punishing the innocent is considered an “error” that the legal system must minimize. In reality, it is a choice. When evidence points to one of a few suspects as the victim’s assailant, society must decide whether to punish all, some, or none. Despite the stated distaste, in many cases lawmakers, regulators, and enforcers elect to sanction a large group of innocent actors. Mass arrests, sobriety checkpoints, profiling, increased searches and seizures, and excessive police force are all forms of collective punishment. They allow the targeting and sanctioning of an entire population in the name of finding the culprit. The decision is often masked by confusing terms, but its effect is the same: the freedoms of many are sacrificed to punish the one. These methods have been widely criticized, but what can economics say on the subject? Are these methods even effective? Under what conditions?
The standard economic model of crime has not offered a clear answer to these questions because it typically focuses on the apprehension and punishment of a single actor—the offender. It therefore does not consider the possibility of punishing a group of innocent actors as a policy tool.
This Article seeks to fill this gap. First, in deviation from previous models, it develops a realistic framework in which the enforcer can choose not only the probability of detection and the nature and severity of the punishment but also the size of the “punishment group.” Second, our model analyzes the welfare implications of collective punishment and the menu of choices available to enforcers and regulators. To be clear, our intention is not to argue for group punishment as an enforcement strategy. Rather, our goal is to explain the logic of collective liability regimes and unravel the way detection of offenders is currently carried out. Our analysis also shows that in many cases enforcers applying collective sanctions are motivated by self-interest and bigotry. Concerningly, even when the enforcer’s goal is to enhance deterrence, the cost to members of the punishment group is discounted or ignored, resulting in a substantial loss of freedoms and welfare. The silver lining is that many of these concerns can be addressed by appropriate reforms.
Introduction
In 1999, a female student was brutally raped in Grand Rapids, Michigan. DNA evidence pointed to identical twin brothers: Jerome and Tyrone Cooper. It was clear that one of them had committed the heinous crime, but it was impossible to determine conclusively which one was the assailant. Both twins had prior records of sexual assault, and both denied any involvement. Their denials proved effective: Tyrone and Jerome are still walking free. And they are not the only ones.
Whodunnit twin cases may be rare, but the problem—separating the one wolf from the sheep—is not. Shaken baby cases provide another example. In these cases, it is often clear that either the father or mother abused the injured child. Despite the fact that the child remains in danger, criminal proceedings are unlikely to take place. As one prosecutor explained, “when you have a child who cannot testify and you cannot tell who caused the injury, we do not charge. We cannot.”
In other cases, prosecutors take the opposite stance. To target a single assailant, they indict a large group of actors. A recent example involves the protests that erupted on January 20, 2017, during President Trump’s inauguration. A group of more than 200 protestors, including innocent activists, members of the media, and medical and legal observers, was arrested for felony rioting and conspiracy. In these so-called “J20” cases, prosecutors “argued for a far-reaching notion of liability.” They “sought to hold dozens of people criminally responsible for crimes—like smashing the windows of several storefronts—that only a handful of them had carried out.”
In the examples above, the dilemma is explicit. Prosecutors must decide whether to charge a group of actors for crimes committed by just one or a subset of its members. In other cases, the choice is more implicit. Screening, profiling, increasing the number of searches and seizures, and using excessive police force are all group punishment tools that allow the targeting and sanctioning of an entire population in the name of punishing the culprit. In these cases the sanction is often informal. It does not take the traditional form of imprisonment or fines. Still, the sanction imposes a real cost on the innocents. It denies basic freedoms (as in the case of unjustified detention, searches, and seizures) and imposes substantial physical, emotional, and pecuniary harms on those whose sole “crime” was being in the wrong place at the wrong time.
In these examples, members of the group do not act in unison, nor do individuals aid, abet, or support the offender. In fact, they may not even know each other or know that a crime has been committed. These individuals are subject to a collective punishment simply because they were in the wrong place at the wrong time or because the enforcer believes, based on some other indicators, that the true offender is one of them. In all of these cases the same questions arise: How far should society go to punish criminals? When a crime is committed by an unidentified member of a group, should criminal liability be imposed on all (as in the case of the rioters), none (as in the twin cases), or some (as with profiling)? As the examples illustrate, the question is by no means theoretical. Legislators, courts, prosecutors, and regulators face this dilemma often.
To date, the economic literature has failed to provide clear answers to these pressing questions. The standard economic model of crime, beginning with Becker, focuses on apprehending and punishing a criminal by way of imposing individual liability. It theorizes that an actor will commit a crime so long as the expected benefits from committing that crime outweigh the expected costs. This expected cost, or the “price” of committing a crime, can be denoted by ps, where p is the probability of apprehension and conviction, and s is the sanction (e.g., a fine or imprisonment). Importantly, innocent actors are never punished in Becker’s model. “[O]nly convicted offenders are punished . . . .” Thus, in this model, “there are only Type I errors, i.e., the guilty criminal might escape,” but there are no Type II errors, i.e., there are no false convictions. In other words, only the criminal may be caught and punished.
Others, like Polinsky and Shavell, have extended the model to include the possibility of false convictions. But these models, with rare exceptions, have not even considered the possibility of sanctioning a group that clearly includes innocent actors in order to punish the culprit.
Surprisingly, only a few articles have analyzed collective liability as a policy tool. These articles extol the ability of collective liability to encourage group members to monitor each other and thus ex ante deter actors from engaging in harmful behaviors and ex post extract information that would identify the culprit. This scant literature is largely informal, relies on eclectic examples (mostly historical), and primarily focuses on civil sanctions. Moreover, this literature focuses on the benefits of collective liability but, with a few exceptions, ignores the costs and countervailing effects that come with holding innocent actors criminally liable. Importantly, most of this literature is merely descriptive. It therefore leaves open the most pressing questions regarding collective liability’s utility in criminal law with respect to the most relevant choice variables: how many people should be subject to criminal liability, what should the collective sanction be, and how severe should it be? Consequently, the literature provides policy makers with little guidance as to the effects and roles of collective punishments.
This Article takes the first steps in filling this gap. To be clear, our intention is not to argue for group punishment as an enforcement strategy. The idea of group punishment would be considered contrary to any sensible approach to law enforcement, and indeed is seen as morally objectionable to many. Rather, our goal is to demonstrate that understanding the effects of group punishment can shed light on the way that detection of offenders is actually carried out, as illuminated by the above examples.
To motivate our approach, we note that in a world of imperfect detection, criminals seek to avoid apprehension by “melting” into the crowd. The task of law enforcers is to narrow the set of suspects down, ideally to a single individual. However, one can rarely be certain of that person’s guilt, which is the reason for the various procedural safeguards of defendants’ rights in criminal prosecutions. The willingness to punish a group of potential offenders can increase the chances of punishing the true perpetrator, if that is the primary goal of law enforcement. Indeed, as the above examples show, in some cases it may be the only way to ensure punishment of the actual offender. Our extension of the economic model of crime explicitly allows that possibility and asks how a group sanction affects the prescriptions of the model based on the goal of achieving optimal deterrence.
We carry out our analysis within the context of the Becker-Polinsky-Shavell (BPS) framework. This allows us to highlight the differences between our conclusions and those obtained from the standard approach.
Our key findings can be summarized as follows. First, in deviation from the prior literature, we show that collective criminal sanctions, although undertheorized and understudied, are commonplace. We show that these sanctions are often implicit, which may explain why they have been neglected by the literature. Revealing the true nature of sanctions allows us to make another important contribution. We provide the first economic account of Othering, a term that encompasses different forms of group-based exclusion and has been called “[t]he problem of the twenty-first century.” Specifically, we show how by discounting and ignoring the impact of sanctions on minority groups, the law further marginalizes these groups and exacerbates inequalities.
Second, while the prior literature treats the imposition of sanctions on innocent people as an “error” that must be minimized, this Article shows that in reality it is a choice. Moreover, this Article reveals that regulators, enforcers, and courts regularly approve the imposition of collective sanctions. They do so even in cases where the vast majority (99%) of those subject to the collective sanction are believed to be innocent. We further show that many of these collective sanctions are legal, whereas others occur in the shadow or in contravention of the law.
Third, this Article explains, for the first time, the factors that decisionmakers are considering in making their choices and the welfare implications of these choices. Our analysis shows that when punishment is perceived as costless (as in the case of fines) and wrongful punishment is ignored (even when it should not be), optimal enforcement from a deterrence standpoint involves random punishment in the sense that no effort is expended to identify the true offender. Rather, in these cases a sanction is imposed on a randomly chosen individual or on all members of a randomly chosen subset of the population to which the offender is known to belong. The punishment group may even include the entire population. Both individual and group punishment can therefore be effective as long as the sanction (e.g., a fine) is appropriately scaled. However, when the sanction is bounded (e.g., because members of the punishment group do not have the means to pay the fine), enforcers employ some amount of group punishment to maintain deterrence. In this sense, group punishment mitigates the judgment-proof problem, but it can result in a different and more concerning problem: Othering.
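The scaling logic can be illustrated with a short Python sketch. It assumes random exclusion, under which a randomly chosen group of n out of the N suspects contains the offender with probability n/N, and a hypothetical target expected sanction that the enforcer wants the offender to face; the names target and s_max, and the numbers used, are illustrative assumptions rather than part of the model.

```python
def required_sanction(target: float, n: int, N: int) -> float:
    """Sanction each of n randomly chosen members must bear so that the
    offender's expected sanction (n / N) * s equals `target`."""
    if not 1 <= n <= N:
        raise ValueError("punishment group size must satisfy 1 <= n <= N")
    return target * N / n


def smallest_group(target: float, N: int, s_max: float) -> int:
    """Smallest random punishment group that reaches `target` without
    exceeding the sanction cap s_max (e.g., what members are able to pay)."""
    for n in range(1, N + 1):
        if required_sanction(target, n, N) <= s_max:
            return n
    # Even punishing all N (offender caught for sure) requires s = target.
    raise ValueError("cap is below the target; deterrence cannot be reached")


# Hypothetical case: 10 suspects and a target expected sanction of 2.
print(required_sanction(2.0, 1, 10))   # punishing one person requires s = 20
print(smallest_group(2.0, 10, 5.0))    # with a cap of 5, the group grows to 4
```

With no cap, punishing a single random individual with a sanction of 20 deters as well as punishing all ten with a sanction of 2. Once the cap binds, the smallest feasible group (here four people, at least three of them innocent) necessarily grows.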
When punishment is costly to impose (e.g., because it consists of imprisonment), it is optimal for enforcers to invest in efforts to identify the true offender. The enforcer will keep eliminating, at a cost, the least likely suspects and, at the same time, increase the sanction (to ensure optimal deterrence). This process could theoretically continue until a single individual is punished. However, we show that even in these cases, when the sanction reaches the maximum accepted level first (that is, before the culprit is identified), the punishment group will necessarily include innocent individuals. Thus, group punishment is again an enticing option for enforcers, and it becomes more likely as the maximal sanction gets smaller. However, individual punishment combined with a less-than-maximal sanction may also be optimal. This result is in contrast to the BPS model, where the optimal sanction is maximal even when it is prison, and in this sense our result is more consistent with actual practice.
The rest of the paper is organized as follows. Part I reviews the prior literature and explains its shortcomings. Part II focuses on the theory of optimal deterrence. It identifies and analyzes the menu of choices available to enforcers and regulators who seek to impose collective liability regimes. In deviation from the prior literature, but consistent with practice, we treat (1) the size of the punishment group, (2) the type of sanction, and (3) the severity of the sanction—as choices. This Part also provides simple examples to illustrate the implications and welfare effects of these choices. Part II further shows that collective sanctions can be informal and investigates their efficacy—a consideration that has not escaped the courts.
Moving from theory to practice, Part III discusses current collective liability regimes in light of the analysis and how they can be abused. We identify here how a narrow focus on resource efficiency would lead law enforcers to punish groups rather than conduct full investigations. We use this insight to recast common punitive police practices as attempts to preserve local resources at the expense of victim rights. Rather than attributing these practices to animus alone, our explanation grounds them in the goal of maximizing deterrence, and thus opens them to a host of novel interventions in wrongful police practices. The practices we examine include police raids on a group of individuals based on the belief that some possess drugs, targeting all members of a racially defined group (e.g., those with Mexican appearance) in the hope of catching a few culprits, setting up checkpoints where all drivers are stopped while the police look for offenders, and detaining every black person in town because of an attack allegedly committed by one of them.
Our goal is to unravel and explain the logic of these collective liability regimes, all of which have been found to be legal, and to shed light on the way detection of offenders is currently carried out. Moreover, our analysis shows that in many cases enforcers applying collective sanctions are motivated by self-interest, bigotry, and xenophobia. Concerningly, even when the enforcer’s goal is to enhance deterrence, the cost to members of the punishment group is often ignored or discounted, resulting in a substantial loss of freedoms and welfare. The silver lining is that many of these concerns can be addressed. By uncovering the operation and effects of these sanctions, we hope to reignite and reinvigorate the debate regarding collective sanctions.
I. The Prior Literature
A. The Standard (BPS) Model of Crime
Under the Becker model, every criminal offense results in a benefit for the criminal, b, but imposes a harm on the victim, h, resulting in a net social loss of h−b. A major insight from the standard model is that an individual will commit crimes so long as the expected net private benefit from the crime exceeds the expected gains from other activities.
This insight implies an important role for the criminal system. To deter criminals, the law must erode the perpetrators’ expected benefits by increasing the “price” of committing such acts, ps. But this endeavor comes at a cost. Increasing the severity of the sanction, s, or the probability of detection and conviction, p, requires more investment in policing, adjudication, and punishment.
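The decision rule behind this “price” fits in a few lines. The following Python sketch is illustrative only: it assumes a risk-neutral actor who offends when the benefit b strictly exceeds p·s, and the function names and numbers are hypothetical.

```python
def commits_crime(b: float, p: float, s: float) -> bool:
    """Standard-model decision rule for a risk-neutral actor (an assumption):
    offend only if the benefit b strictly exceeds the expected cost p * s.
    At b == p * s the actor is indifferent; we treat that as not offending."""
    return b > p * s


def min_deterring_sanction(b: float, p: float) -> float:
    """Smallest sanction s with p * s >= b, holding the detection probability p fixed."""
    if p <= 0:
        raise ValueError("if p = 0, no finite sanction deters")
    return b / p


# Hypothetical numbers: benefit 5, detection probability 0.5.
print(commits_crime(5.0, 0.5, 8.0))    # expected cost 4 < 5, so the actor offends
print(commits_crime(5.0, 0.5, 12.0))   # expected cost 6 > 5, so the actor is deterred
print(min_deterring_sanction(5.0, 0.5))
```

Raising either p or s raises the price p·s; which lever to pull, given what each one costs, is the balancing question described in the next paragraph.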
For example, choosing p=1 means that all offenders will be caught and convicted, but it also requires more expenditures on investigators, prosecutors, juries, judges, and prisons. Thus, optimal deterrence demands a balancing act. The enforcer must choose the probability, magnitude, and type of sanction (e.g., fines or prison sentences) that minimize the sum of (i) the net damage from crimes (h−b), (ii) the cost of apprehension and conviction, and (iii) the cost of punishing offenders (if any). Under the standard model, this can be done by setting p and s at the right levels.
Most importantly, in the standard model liability is individual. This model does not consider the possibility of holding a group of actors liable for the acts of one. Moreover, the model assumes that only one actor committed the crime and only the one true criminal is ever convicted and sanctioned. False convictions (that is, the convictions of innocent actors) are not even a possibility in the standard model. In other words, either the actual criminal is convicted and sanctioned, or no one is.
B. Mistaken Convictions and Mistaken Acquittals
Others have extended the standard model to address the possibility of false convictions. These articles, however, treat these situations as “mistakes” or “errors” that should be minimized. Only a few explicitly discuss group liability as an enforcement tool. None, however, have treated the punishment group—the subset of the population that includes the culprit and should thus be punished—as a choice variable.
Harris, for example, shows that in addition to p and s, the enforcer has a third choice variable. The enforcer can determine the degree of legal protections available to suspects. These protections include the suspects’ safeguards against indiscriminate stops, searches, and arrests. The enforcer can increase these protections and thus lower the probability of punishment, p, if false convictions become a concern; or it can relax these safeguards (e.g., make it easier to arrest suspects) and increase p if the level of crime increases.
Others have analyzed the effect of the burden of proof and the tradeoff between Type I (false acquittals) and Type II (false convictions) errors on optimal policy. Polinsky and Shavell, for example, analyzed mistaken convictions and acquittals as extensions to Becker’s model. They concluded that both “errors” lead to underdeterrence and may be offset by increasing p or s. By contrast, our model shows that false convictions may be, under certain circumstances, part of an effective, and even optimal, deterrence policy (when deterrence is the enforcer’s primary goal). In the case of collective responsibility, mistaken convictions are not “errors.” As we show below, they are the result of choices that enforcers and courts are currently making in their battle against crime.
Persson and Siven developed a model in which criminal procedure (e.g., the threshold evidentiary level necessary for conviction) is endogenous. In this model, actors are either workers or criminals and interact in groups of three. One of the workers is the victim, and each of the two remaining individuals (the suspects) claims innocence. The enforcer weighs the evidence against each of the suspects and attempts to identify the culprit. Although the authors note that society may prefer to convict both actors rather than allow the criminal to escape responsibility, they do not develop this point further. In other words, they do not discuss the option of imposing collective liability as an enforcement tool. Rather, they treat collective liability as a possible consequence. Moreover, in their model, the group size is fixed. By contrast, in our model, the size of the “punishment group,” that is, the number of actors who will be subject to formal (e.g., fines) or informal (e.g., mass arrests) punishment, is a choice variable.
C. Group Liability
A third strand of the prior literature directly discusses the possibility of imposing group liability. Surprisingly, however, the economic and legal literature on collective responsibility is sparse and mainly informal, often relying on historical and eclectic examples.
The modern literature on collective liability regimes begins with Posner’s observation that primitive societies relied solely on tort law to control crime, and this liability was often collective. One reason was the lack of privacy. In environments without “separate rooms, doors, [and] opportunities for solitude or anonymity” where “everyone kn[ew] everything about everyone else,” collective liability was a tool to harness information about the culprit (i.e., to increase p). The threat of paying for the wrongful acts of another enlisted those subject to group liability as informers, and it encouraged them to turn in the assailant. It also incentivized the group to control those who were likely to commit crimes but lacked the means to pay for their actions. Collective liability also served another function. It allowed primitive societies “to set fines [s, in the BPS model] that exceed the individual’s ability to pay.”
Unlike Posner, Parisi and Dari-Mattiacci provide a formal model to explain the transition from communal to individual liability in tribal societies. In their model, the individual incentives to provide security and reduce crime are tied to the tribe’s initial wealth. These incentives diminish as the group gets larger and wealthier due to free riding and less cooperation. Both the Posner and the Parisi and Dari-Mattiacci models hypothesize that collective liability is efficient when imposed on a small group, such as the assailant’s kinsmen. However, they do not discuss the cost of false convictions. Nor do they derive optimality conditions.
Building on Posner’s insights, others extol the ability of collective liability to (i) ex ante increase deterrence by encouraging group members to monitor each other and take preventative measures, and (ii) ex post extract information that would identify the culprit. Varian, for example, used a principal-agent model to explain how group liability can help parties overcome information asymmetries. Extending loans to villagers in developing countries provides a good example. Banks in these regions face a major obstacle. They cannot observe the applicant’s characteristics (e.g., their creditworthiness) or their actions (e.g., performance). As a result, banks cannot extend credit. But applicants (who are familiar with each other and each other’s business) are or can be privy to such information. One solution, famously adopted by the Grameen Bank, is to allow villagers to organize in small groups and apply for loans together. Each applicant is responsible for his own loan and also serves as a co-guarantor for the other applicants’ loans. If one applicant fails to make payments, the other group members must foot the bill. In microfinancing, communal accountability allows the bank to enlist applicants as its agents in order to harness private information.
Varian’s model, however, differs from ours in a number of important ways. First, it focuses on a cooperative environment where the principal’s and agents’ interests are aligned. Second, his model assigns the task of forming the group to the agents themselves: the villagers, not the bank, assemble their groups. By contrast, in our model, the enforcer must choose the group size, the type and severity of the sanction, and the probability of detection. Finally, Varian’s model applies to civil matters, which differ fundamentally in nature from criminal cases, the focus of our paper.
Unlike Varian, other scholars have focused on cases where an unknown actor harms another. As noted, these articles are for the most part informal, and with a few exceptions, they focus on the benefits of collective liability while ignoring or merely mentioning the costs of such regimes. Importantly, many of these articles focus on civil cases or lump them together with criminal cases, as their focus is mainly on the information-forcing feature of collective liability, which we will ignore.
There are, however, striking differences between civil and criminal liability regimes. One major difference is the presence of dilution. In a civil case, liability cannot exceed total harm, and so the expected liability of each actor is a function of the number of actors in the group. In a group of two, each actor can expect to pay half of the damage to the victim; in a group of three, each can expect to pay one third; and so on. By contrast, in a criminal case, each actor is subject to a sanction that cannot be shared by others. A second difference is in the nature and severity of the sanction. In tort law, the sanction is mainly compensation, and it is often limited by both law (e.g., in the form of statutory caps) and the resources available to the injurer. Criminal law does not suffer from the same limitations. Criminal sanctions range in nature and severity. And prison sentences will likely deter insolvent injurers when a compensatory verdict would not. A third difference is the administrative cost of the systems. A tort is a private action. The costs of investigating, litigating, and collecting are imposed (although not completely) on the plaintiffs. These costs are substantially higher in the criminal system and are imposed on all.
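The dilution point can be made concrete. In the Python sketch below, which uses hypothetical numbers, a civil judgment equal to the harm is split among the n group members, while a criminal sanction is borne in full by each of them. Equal civil shares are an assumption, matching the text’s one-half and one-third example; the function name is illustrative.

```python
from fractions import Fraction


def exposure(amount: int, n: int) -> dict:
    """Per-member and total burden when a group of n is held liable.

    Civil: total liability cannot exceed the harm, so it is diluted
    (equal shares assumed). Criminal: every member bears the full sanction.
    """
    if n < 1:
        raise ValueError("group must have at least one member")
    amt = Fraction(amount)
    return {
        "civil_per_member": amt / n,
        "civil_total": amt,
        "criminal_per_member": amt,
        "criminal_total": amt * n,
    }


# Hypothetical harm (civil) and sanction (criminal), both 90 units.
for n in (1, 2, 3):
    e = exposure(90, n)
    print(n, e["civil_per_member"], e["criminal_per_member"], e["criminal_total"])
```

The per-member civil burden falls as the group grows, while the total criminal burden imposed on the group rises linearly with n, which is one reason enlarging the punishment group is costlier in the criminal setting.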
Articles that focus on criminal law are also limited in scope. These articles tend to focus on cases where all members of the group participated in a crime, as is the case under the Pinkerton rule and felony murder. The Pinkerton rule holds every participant in a criminal conspiracy (e.g., drug trafficking) liable for any crime committed by others in the course of the conspiracy. Similarly, under the felony murder doctrine, a getaway driver can be held liable for murders committed by her colleagues in a robbery gone wrong. These collective punishment theories differ from the ones we focus on. First, in all of these cases the group is well defined: it comprises those who agreed to partake in the initial crime (e.g., drug trafficking or robbery). Moreover, in these cases all members of the group are culpable. By contrast, we investigate cases where the enforcer must determine whether and to what extent liability should be imposed on a group of actors that includes completely innocent individuals.
The papers closest to the current one are by Miceli and Segerson, and Dillbary. Miceli and Segerson developed a formal model of group versus individual punishment within the context of the standard economic model of crime. Their focus was on demonstrating the circumstances in which one or the other approach is preferred, depending on the objective of the enforcer (specifically, deterrence versus retribution) and the technology of detection. The analysis, however, treated the group size as a parameter rather than a choice variable. As a result, it did not derive the optimal enforcement strategy in the most general sense, which is the objective of the current paper. Focusing primarily on the cost-side, Dillbary showed that collective liability can erode actors’ incentive to take care, incentivize those who can identify the true injurer to lie or collude with others to lie, and even engage in harmful activities. Dillbary focused, however, on civil cases where liability is more easily diluted. Moreover, like Miceli and Segerson, he treated the group size as a parameter and not as a choice variable.
II. The Model
A. Motivation
As noted, a key choice variable of the enforcement authority in the BPS model is p, the probability of apprehension. This probability determines both the level of deterrence and the expected cost of punishment. Despite its immense importance, however, the meaning of this variable remains unclear. Becker defines it as the “probability that an offense is cleared by conviction.” Polinsky and Shavell refer to it as the “probability of detection.” In other words, according to the BPS model p is the probability that the actual perpetrator of a criminal offense is caught and punished. What is less clear is how we should interpret the complementary probability, 1−p. Is it the probability that no one is caught and punished, or that someone other than the actual offender is caught and punished? In terms of the impact on deterrence, this distinction actually does not matter because the crime rate only depends on how p influences the decisions of rational offenders. This explains why no attention has been paid to the meaning of 1−p.
However, the interpretation of p does matter with respect to the cost of imposing punishment. The usual formulation is to write the expected cost of punishment per crime as pβs, where s is the severity of the sanction (measured in terms of the cost to the offender) and β is the unit cost to society of imposing it, assuming that β>1. It follows that 1−p is the probability that no one is punished (i.e., that no cost is incurred). Under this approach, enforcement efforts are directed toward detecting the true offender. Only when he or she is identified and convicted does a punishment ensue. In other words, either the true offender is apprehended and punished (with probability p), or no one is apprehended (with probability 1−p). The notation of the standard model is summarized below:
b = the benefits to the offender from committing the crime
h = the harm to the victim
e = the expenditure on enforcement (identifying the offender)
s = the severity of sanction
c = the unit cost to society of administering the sanction
β = (1+c) = the unit cost to society for imposing the sanction
p(e) = the probability of convicting the one true offender
1−p = the probability that no one is punished
The enforcer’s task: choose e and s to maximize social welfare
Now consider a different interpretation of 1−p. Suppose that a crime has been committed and the offender is known with certainty to be a member of a group of size N>1, which could be very large. Detection generally involves winnowing this group down by gradually eliminating the least likely suspects, ideally until only one remains. In this context, we define p to be the probability that this person—the last one remaining—is the true offender. But unlike BPS, we define 1−p as the probability that he is not. Under this interpretation, exactly one person is apprehended and punished, but it is not possible to know with certainty whether that is the right person.
To see the differences between the two definitions, consider a case in which an unknown individual smashed a storefront window with a large object. Under both BPS and our model, p=0.9 means that there is a 90% chance that the true offender will be identified and punished. The models diverge, however, on the interpretation of (1−p). Under BPS, there is a 10% chance that no one will be punished. By contrast, under our model someone is going to be punished, and there is a 10% chance that the punished person will be innocent.
From a pure deterrence perspective, the implications of this formulation are identical to those arising from the BPS model. Under both models (ours and BPS’s), the would-be offender faces the same expected punishment, ps. For example, if the sanction for rioting is a ten-year sentence, under both models the expected punishment is nine years (10 years × 90%). From a broader policy perspective, however, the two approaches may generate very different prescriptions, as the subsequent analysis will show.
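The following Python sketch contrasts the two readings of 1−p for the ten-year example. It is a hedged illustration built on stated assumptions: a hypothetical administration cost c = 0.2 (so β = 1.2); a social cost of punishment per crime of p·β·s under BPS, and of β·s under the single-suspect reading, since exactly one person is always punished there. The function and field names are our own.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    expected_sanction_offender: float   # p * s, the quantity that drives deterrence
    expected_sanction_innocent: float   # sanction borne by innocents, per crime
    expected_punishment_cost: float     # social cost of punishment, per crime


def bps(p: float, s: float, c: float) -> Outcome:
    """BPS reading: with probability 1 - p no one is punished."""
    beta = 1 + c
    return Outcome(p * s, 0.0, p * beta * s)


def single_suspect(p: float, s: float, c: float) -> Outcome:
    """Alternative reading: exactly one person is always punished,
    and with probability 1 - p that person is innocent."""
    beta = 1 + c
    return Outcome(p * s, (1 - p) * s, beta * s)


if __name__ == "__main__":
    p, s, c = 0.9, 10.0, 0.2  # 90% detection, ten-year sentence, hypothetical c
    for name, model in [("BPS", bps), ("single suspect", single_suspect)]:
        o = model(p, s, c)
        print(f"{name}: offender {o.expected_sanction_offender:.1f} yrs, "
              f"innocent {o.expected_sanction_innocent:.1f} yrs, "
              f"cost {o.expected_punishment_cost:.1f}")
```

Both readings leave the offender facing the same nine-year expected sanction, so deterrence is unchanged. What differs is that under the single-suspect reading an innocent person bears one expected year per crime, and punishment is imposed, and paid for, every time.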
B. The Basic Setup
As a generalization of our alternative perspective, let us define the punishment group, n, to be a subset of the full population N that will be subject to collective liability, where N is the smallest population known to contain the offender with certainty (n≤N). We define p(n) to be the probability that the true offender is a member of this subset. For example, in the protest hypothetical, N=200 may include everyone who attended the protest (including police officers and bystanders). However, the size of the punishment group can be reduced to a much smaller group, for example, n=10, consisting only of those protestors who were in close proximity to the store when it was damaged and thus were physically able to throw the object. By definition, therefore, p(N)=1 (i.e., it is certain that the offender is one of the 200 individuals who attended the protest). We also assume that p is increasing in n (p’>0), meaning that the larger the punishment group, n, the more likely it is that the offender will be included in it. If the punishment group is reduced to a single member, then p(1)<1 is the probability that the sole remaining member is the true offender.
One way to reduce the size of the original group is by random exclusion. Those who remain are then subject to a sanction. With n actors in the punishment group, the probability that the true culprit is punished is thus p(n)=n/N. If the punishment group consists of one individual, then p(1)=1/N. For example, suppose that the true offender is one of ten suspects. If liability is imposed randomly on eight of the suspects, the probability that the true offender is among the punished is 8/10 or 80%. In such a case, at least seven innocent individuals will be punished. By contrast, if only one member of the group is punished, the probability that that individual is the culprit drops to 10% (1/10). Random punishment, or punishment by lottery, although not unheard of, is not the way criminal justice should be administered.
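Random exclusion can be checked numerically. The Python sketch below is illustrative only: it computes p(n)=n/N and then simulates drawing n of N suspects at random, labeling suspect 0 as the true offender (an arbitrary assumption). With N=10 and n=8, the simulated share of draws that include the offender should be close to 80%, and the average number of innocent people punished close to 7.2 (seven when the offender is included, eight when not).

```python
import random

def p_random(n, N):
    """Probability the offender is in a punishment group of n drawn at random from N."""
    return n / N

def simulate(n, N, trials=50_000, seed=0):
    """Monte Carlo check. Suspect 0 is labeled the true offender (an arbitrary choice).
    Returns (share of draws containing the offender, mean number of innocents punished)."""
    rng = random.Random(seed)
    hits = innocents = 0
    for _ in range(trials):
        group = rng.sample(range(N), n)   # n distinct suspects, chosen uniformly
        caught = 0 in group
        hits += caught
        innocents += n - caught           # everyone in the group except the offender
    return hits / trials, innocents / trials

print(p_random(8, 10), p_random(1, 10))   # 0.8 0.1
print(simulate(8, 10))                    # close to (0.8, 7.2)
```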
Suppose, therefore, that the enforcement authority proceeds more deliberately by sequentially eliminating the least likely suspects, as described above. This presumably reflects the way that detection actually happens, but it is a costly process. To capture this cost, we define p(e,n) to be the probability that the true offender is a member of a subset of size n when an expenditure of e is made by the enforcer to choose that subset deliberately. For example, in the protest hypothetical, e includes the cost of investigating and determining who was close enough to the store when it was damaged. With a larger investment, the enforcer could be able to reduce that group even further. For example, witness testimonies or video feeds could identify the injurer’s gender and age range, thereby reducing the punishment group from N=200 to n=5. We assume that the more the enforcer invests in detection, the greater is the chance that the true offender will be in the designated subgroup. We also assume that p is increasing at a decreasing rate in both e and n. This simply means that the most effective detection efforts are used first, and the least likely suspects are excluded first. Further, absent any investment, p(0,n)=n/N for all n≥1. Thus, random punishment is a special case when no effort is devoted to detection. Finally, in the extreme cases, when the punishment group includes the entire population, n=N, it is certain the offender is a member of the group, p(e,N)=1 for all e≥0, and when no one is held accountable, n=0, then p(e,0)=0.
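The text does not commit to a functional form for p(e,n). As a purely hypothetical example (our assumption, not part of the model), p(e,n)=1−(1−n/N)^(1+e) satisfies every stated property: p(0,n)=n/N, p(e,N)=1, p(e,0)=0, and p rises at a decreasing rate in e and in n (weakly in n when e=0). The sketch below encodes this form and spot-checks those properties on a grid.

```python
def p(e, n, N):
    """Hypothetical detection function: probability that the offender is in a
    deliberately chosen group of size n, given detection spending e >= 0."""
    return 1 - (1 - n / N) ** (1 + e)

def check_properties(N=200, efforts=(0, 0.5, 1, 2, 5)):
    """Spot-check the properties assumed in the text on a grid of values."""
    for n in range(N + 1):
        assert abs(p(0, n, N) - n / N) < 1e-12            # no effort = random punishment
    for e in efforts:
        assert abs(p(e, N, N) - 1) < 1e-12                 # whole population: certainty
        assert p(e, 0, N) == 0                             # no one punished
        values = [p(e, n, N) for n in range(N + 1)]
        steps = [b - a for a, b in zip(values, values[1:])]
        assert all(d > 0 for d in steps)                   # increasing in n
        assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(steps, steps[1:]))  # concave in n
    for n in (1, 10, 100):
        values = [p(e, n, N) for e in efforts]
        assert all(b > a for a, b in zip(values, values[1:]))  # increasing in e
    return True

print(check_properties())  # True
```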
In this formulation, n, the size of the punishment group, is a choice variable, along with e, the cost of deliberately choosing the punishment group, and s, the sanction imposed on each member of the group. In the standard BPS model, by contrast, n is constrained to equal one, and absent legal error, that one person, who is detected with probability p, is assumed to be the true offender. In our formulation (as in real life), such certainty is not possible unless the entire population of N is punished. Erroneous punishment is therefore inherent to this alternative interpretation, though greater expenditure of effort increases the likelihood that the offender is in the targeted subgroup. In other words, the BPS model does not emerge as a special case of the above formulation by simply imposing the constraint that n=1. The notation of our model can be summarized as follows:
N = the smallest population that includes the offender
n = the size of the punishment group (0≤n≤N)
e = the enforcer’s cost of choosing the punishment group
s = the severity of the sanction
p(e,n) = the probability the offender is in the punishment group
1−p = the probability the offender is not in the punishment group
The enforcer’s task: choose e, s, and n to maximize social welfare
When only one individual is apprehended (n=1):
p(e,1) = the probability that the chosen party is the true offender
1−p = the probability that the chosen party is not the offender
Given the preceding setup, the following Sections derive the optimal enforcement policy in the context of this model and compare it to those from the BPS model. The objective is taken to be the same as in the BPS model—namely, maximization of social welfare (when deterrence is the primary goal), which consists of the net gain to offenders less the cost of enforcement. Our inclusion of offenders’ gains in welfare is consistent with the law enforcement literature since Becker. It also reflects the possible applicability of our approach to other forms of externalities, including certain tortious injuries. We emphasize the intuition for the results in the text.
C. Costless Punishment
We first consider sanctions that are costless to impose. This is often the case with monetary sanctions, which, unlike imprisonment, are considered costless. We recognize that the sanction cannot exceed a maximal level, s̄. In the context of a fine, s̄ is the largest amount the least wealthy suspect in the punishment group can pay. Examples of monetary sanctions include traffic fines or restitution orders for pecuniary losses resulting from the crime. An example of a nonmonetary yet costless-to-impose sanction is the revocation of a license. For now, we do not consider the cost of wrongfully holding innocent actors liable (i.e., false positives), an issue we address later.
Following the traditional BPS formulation, an offender will commit a prohibited act if the benefit from doing so exceeds the expected punishment. This is the rational offender assumption, which underlies all economic models of crime. As noted, social welfare consists of the sum of the net benefits from the offense minus enforcement costs. The enforcer’s problem is to choose e, s, and n to maximize welfare, that is, to achieve optimal deterrence, subject to the constraints that s≤s̄ and n≤N. The solution to this problem is described as follows:
Proposition 1: Assume it is optimal to punish someone (i.e., n*>0). Then (i) the optimal policy involves no expenditure on identifying the culprit (e=0); (ii) if the maximal sanction that can be imposed on each member of the group is at least equal to the harm (s̄≥h), the first-best outcome can be attained. In such a case, the size of the punishment group, n, can be varied so long as the severity of the sanction is modified to account for the group size, such that ns=Nh (subject to s≤s̄ and n≤N); (iii) if the maximal sanction that can be imposed is lower than the harm caused (s̄<h), the first-best outcome is not attainable. In such a case, the optimal policy is to subject each member of the population N to the maximal sanction (s=s̄ and n=N).
To illustrate part (i), reconsider the protest hypothetical. Suppose that rioting resulted in property damage of h=$500, and the sanction (restitution) would be of the same amount (i.e., s=$500). With a large enough expenditure on personnel and detection technologies (such as drones), the police might be able to identify the actual offender (n=1). In such a case, that offender would pay a fine of $500 with certainty. Alternatively, instead of the large investment in detecting the actual culprit, the enforcement authority can simply collect s=$500 from every protester who was known to be present (n=N). Under this course of action, the expected liability, and thus deterrence, remains the same because each protestor can still expect to pay s. The difference is that now many innocent actors would also be fined. The point is that by increasing the punishment group from n=1 to n=N, the enforcer is able to keep deterrence at the same level while reducing enforcement cost to e=0 and keeping the sanction, s, constant at h. Of course, if the police know the identity of a smaller subset of the 200 individuals who attended the protest (e.g., because submitting a list of attendees was a licensing requirement), but just cannot identify the rioter, the punished group can be set at that number (1<n<N). This is a case where e=0 and n<N. (In effect, though, that smaller subset is the population.)
To illustrate part (ii), note that if each of the N=200 protesters is wealthy enough, the enforcer can impose a sanction of $500 (200 × $500/200) on each and every one of the N=200 protestors. At the other extreme, the enforcer can collect from one (n=1) randomly chosen protestor a sum of s=$100,000 (s=Nh/n=200 × $500/1). This case, however, requires s̄≥$100,000 (i.e., the randomly chosen individual must be able to pay the high fine), a much stricter requirement. The enforcer can also impose a sanction on any intermediate number of protestors by appropriately scaling the fine. For example, the enforcer can collect from each of n=50 randomly chosen protestors a sum of $2,000 (200 × $500/50), which requires that s̄≥$2,000.
In all of these cases, deterrence is optimal and identical. In the first case, when everyone is sanctioned, each protestor faces a sanction of $500 with certainty. In the second case, when only one is randomly sanctioned, each protestor (including the offender) can expect to pay $500, i.e., there is a 1 in 200 chance of paying $100,000. And the same is true for the third case, where each protestor has a 1 in 4 (50 in 200) chance of paying $2,000. In all cases, however, s̄ must be large enough to allow the required fine.
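The three cases in the protest hypothetical lie on the same iso-deterrence curve. The Python sketch below (the function name is hypothetical) assumes e=0, so each protestor is in the punishment group with probability n/N, and it scales the fine as s=Nh/n. Every line it prints shows an expected sanction of $500.

```python
def iso_deterrence_fine(n, N, h):
    """Fine per group member that keeps the expected sanction at h when e = 0.
    Each member is in the group with probability n/N, so (n/N) * s = h gives s = N*h/n."""
    return N * h / n

N, h = 200, 500   # 200 attendees, $500 of property damage
for n in (200, 1, 50):
    s = iso_deterrence_fine(n, N, h)
    chance = n / N
    print(f"n={n:>3}  fine=${s:,.0f}  chance={chance:.3f}  expected=${chance * s:,.0f}")
```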
To illustrate part (iii), suppose the maximal fine that the least wealthy member of the group can pay is s̄=$400. If all protesters are fined at the maximal amount (i.e., n=N), each can expect to pay only p(0,N)s̄=1 × $400=$400, which will result in underdeterrence ($400<$500).
Figure 1: Optimal Combinations of n and s (iso-deterrence, e=0)

Figure 1 depicts the possible outcomes of the optimal policy. The negatively sloped convex curve depicts the combinations of the sanction, s, and the punishment group size, n, that maintain a constant expected sanction. Thus, with e=0, any point along this iso-welfare locus achieves optimal (first-best) deterrence. The two extreme points represent pure group responsibility (n=N) and random individual punishment (n=1). With no constraint on s, either end point is feasible, and, from a deterrence standpoint, there is no basis for choosing one or the other, or any point in between. Under pure group punishment, n=N and s=h; that is, everyone in the population is fined an amount equal to the harm from the offender’s act. This would optimally deter the would-be offender because, with n=N, p=1, so the offender can expect to be fully punished with certainty (as will everyone else in the relevant population).
Under pure individual punishment, by contrast, n=1 and s=Nh; that is, a single, randomly chosen member of the population is assessed a fine equal to the harm multiplied by the size of the population. Although this solution is reminiscent of the standard Becker prescription of a high fine coupled with a low probability of apprehension (albeit taken to the extreme), the crucial difference is that here, the person being punished is randomly chosen and is likely innocent, since p=1/N. However, would-be offenders are still optimally deterred because the expected sanction for every member of the population is equal to the harm, h. As Part IV below explains, courts rely on this probability to determine the legality of random sanctions. In a series of decisions, the Supreme Court held that a probability equal to or above p=0.0024 meets the legal standard. In other words, the police can design a collective sanction under which 99.76% of those sanctioned are innocent, a very low standard.
Now suppose the cap on s is such that each member of the punishment group can pay the property damage, h, but has limited resources such that h≤s̄<Nh. In that case, the set of efficient points shrinks to the darkened segment of the iso-welfare locus in Figure 1. Since n=1 is no longer in the feasible set, some amount of group punishment is necessary to achieve the first-best level of deterrence, with the smallest such group being n=Nh/s̄. In this sense, group punishment works to correct for a judgment-proof problem on the part of offenders. This is the case where some protestors cannot pay a $100,000 fine; if, for example, s̄=$2,000, a $2,000 fine must instead be imposed on fifty (200 × $500/$2,000) randomly chosen protestors.
Finally, if some members of the punishment group cannot pay even the property damage (s̄<h), then punishing the entire population (n=N) is optimal from a deterrence standpoint, but the first-best is no longer attainable. In such a case, deterrence cannot be improved by investing more in detection because, with n=N, the offender is already being punished with certainty. Due to the low solvency level, the optimum will be on a lower iso-deterrence line than that shown in Figure 1.
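The three regimes of Proposition 1, with e=0, can be summarized in a short Python sketch. Here s_max stands for the cap s̄, and the function name is a hypothetical label. When the cap binds but is at least h, the smallest feasible group is the first integer n at or above Nh/s̄; when s̄<h, everyone is punished at the cap and the expected sanction falls short of h.

```python
import math

def optimal_group(N, h, s_max):
    """Sketch of Proposition 1 with e = 0 (hypothetical helper).
    s_max stands for the cap on the fine. Returns
    (smallest feasible group size n, fine s per member, expected sanction)."""
    if s_max >= N * h:
        # Cap never binds: one randomly chosen member can pay N*h.
        return 1, N * h, h
    if s_max >= h:
        # Cap binds: smallest n with N*h/n <= s_max keeps the expected sanction at h.
        n = math.ceil(N * h / s_max)
        return n, N * h / n, h
    # s_max < h: fine everyone at the cap; deterrence falls short of first-best.
    return N, s_max, s_max

print(optimal_group(200, 500, 2_000))  # (50, 2000.0, 500)
print(optimal_group(200, 500, 400))    # (200, 400, 400)
```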
The preceding analysis shows that when maximizing deterrence is the sole criterion for enacting an optimal enforcement policy and when punishment is costless, not only is group punishment compatible with maximum social welfare, it may also be necessary to some extent if the fine is constrained.
More strikingly, an enforcement policy that only seeks to maximize deterrence never includes costly effort to identify the true offender—in essence, punishment should always be random (and at times in fact is—as Part IV below demonstrates). This is true because optimal deterrence can be achieved at no cost to the enforcer by randomly punishing any subset of the population in which the offender is known to reside, with the fine being inversely scaled to the size of the “punishment group.” When the sanction is bounded, deterrence is maintained by appropriately increasing the size of the punishment group, possibly up to the population as a whole. This insight carries a warning: enforcers interested in achieving deterrence at low cost (perhaps because of the incentive scheme they face) may be more inclined to impose formal and informal sanctions on groups of individuals.
To modern eyes, the preceding policy will seem unacceptably arbitrary, but it is really the logical extreme of the Becker result regarding probability scaling of punishments. Indeed, once it is accepted that convicted offenders can be punished above and beyond the level of the harm they caused as a way of saving on enforcement costs—which is one of Becker’s central insights—the prescription just derived is merely a quantitative extension of that policy.
As a final point, we consider the possibility that for some crimes, it may not be optimal to punish anyone; that is, n*=0. Indeed, in some cases, a prosecutor may decide to avoid pursuing a case. Examples include the twin rapist and the shaken baby cases that opened this Article. This will be true if welfare is decreasing in n for all n≥0. In the current case, however, because punishment is costless, it will always be optimal (from a deterrence standpoint) to punish someone for committing harmful acts.
D. Costly Punishment
We now consider the case where punishment is costly to impose, focusing initially on the administrative cost of punishment, as when it takes the form of imprisonment. (The next Section considers costs associated with wrongful punishment, whether by fine or imprisonment.)
We continue to measure s in terms of its cost to the offender, but we now suppose that there is a social cost of imposing s which is equal to βs=(1+c)s per person, where c is the unit cost of imprisonment to the state. For example, if the cost of imprisonment to the individual (e.g., loss of earnings and opportunities) is s=$60,000 and the cost to the state is 20% of that (i.e., c=0.2, β=1.2), then total social cost is $72,000 (1.2 × $60,000). All other variables are defined as above. Social welfare in this case includes the cost of imposing s, which is equal to nβs per crime, given that all n members of the punishment group are sanctioned (by definition). The problem again is to choose e, s, and n to maximize welfare subject to the constraints on s and n. In this case, we can prove the following result:
Proposition 2: Assume punishment is costly to administer and that it is optimal to punish someone. In such a case, the optimal policy in terms of deterrence involves either a maximal sanction (s=s̄), or punishment of a single individual (n=1).
The intuition is as follows. Because punishment is costly to administer, it is no longer costless to raise s, as was true with a fine. However, any increase in s can be offset by a proportionate decrease in n so as to leave punishment costs, nβs, unchanged. Further, because of the diminishing marginal impact of n on p, the expected sanction, ps, rises as a result of this adjustment, holding e fixed (i.e., as s increases and n decreases to keep nβs constant, p falls less than proportionately, so ps increases). In this way, deterrence can be increased without raising punishment or detection costs. It follows that these adjustments should be employed to the maximum extent possible in order to economize on enforcement costs. The limit of this strategy is reached when the constraint on s or n binds.
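To see the mechanism behind Proposition 2, the Python sketch below holds e fixed and shrinks n while raising s so that nβs stays at $30,000. It relies on a hypothetical detection function, p(e,n)=1−(1−n/N)^(1+e), chosen only because it satisfies the assumptions stated in Section B (in particular, concavity in n); the model itself does not specify a functional form, and the value e=1 is arbitrary. Under that assumption, the expected sanction ps rises as the group shrinks.

```python
def p(e, n, N):
    """Hypothetical detection function (an assumption): 1 - (1 - n/N)**(1 + e)."""
    return 1 - (1 - n / N) ** (1 + e)

N, beta, e = 200, 1.2, 1.0   # e is held fixed; its scale here is arbitrary
schedule = [(50, 500), (25, 1_000), (10, 2_500), (1, 25_000)]  # n * s fixed at 25,000

results = []
for n, s in schedule:
    cost = n * beta * s          # punishment cost, n * beta * s
    expected = p(e, n, N) * s    # expected sanction facing the offender, p * s
    results.append((n, s, cost, expected))
    print(f"n={n:>2}  s={s:>6,}  cost=${cost:,.0f}  ps=${expected:,.2f}")
```

The cost column stays at $30,000 on every line while ps climbs, which is the slack the enforcer can then use to cut e.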
To illustrate, suppose the sanction for inflicting property damage during a protest is one day’s imprisonment. Assume also that a day in prison costs s=$500 to the inmate and cs=$100 to the state (i.e., c=0.2 and β=1.2). The total social cost from a day of imprisonment per inmate is thus βs=$600. Punishing the entire group of 200 suspects therefore comes at a total cost of nβs=$120,000 (200 × $600), but it ensures that the culprit will be punished and therefore will face optimal deterrence.
To see how the state can achieve deterrence by sanctioning fewer people, suppose a deliberate elimination of suspects at a cost of e>0 results in a smaller pool of n=50 suspects. In such a case the total cost from imposing the sanction is nβs=50 × $600=$30,000. But suppose now that the probability that the culprit is in the remaining pool is only 50%. The expected liability the offender faces is thus p(e,n)s=0.5 × $500=$250. The enforcer can achieve the same level of deterrence by sanctioning a smaller group. For example, it can sanction only n=25 individuals. But to do so the enforcer will have to raise the sanction, s, to two days in prison, or $1,000 ($500 × 2). In such a case, the total social cost of imprisonment, nβs, will remain fixed at $30,000 (25 × 1.2 × $1,000). Thus, punishment costs are unchanged. Further, the assumptions we made about the p(e,n) function are such that the enforcer can lower its expenditures, e, so as to maintain the same expected liability, ps=$250. In other words, the simultaneous decrease in e and n can be chosen so that p(e,n)=0.25, which means that ps=0.25 × $1,000=$250, thereby holding deterrence fixed. With expected punishment costs and deterrence held constant, the decrease in e must result in a higher level of welfare, which implies that the initial configuration could not have been optimal. This process can continue until either the pool is reduced to one suspect, n=1, or until s=s̄—that is, until the sanction reaches the maximum accepted level (life in prison being the upper bound).
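The arithmetic of this example can be checked in a few lines of Python. The sketch takes β=1.2 and the two (n, p, s) configurations directly from the text; the helper name is hypothetical. Both configurations cost $30,000 to punish and present the offender with the same $250 expected sanction, which is why the saving on e is a pure welfare gain.

```python
beta = 1.2  # beta = 1 + c, with c = 0.2

def punishment_cost(n, s):
    """Total social cost of imprisoning all n group members: n * beta * s."""
    return n * beta * s

# (n, p(e, n), s) before and after the adjustment described in the text
configurations = [(50, 0.50, 500), (25, 0.25, 1_000)]
for n, p, s in configurations:
    # Each line shows a cost of 30000 and an expected sanction p * s of 250.0
    print(n, round(punishment_cost(n, s)), p * s)

print(round(punishment_cost(200, 500)))  # whole group of 200: 120000
```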
It is a well-known result from the BPS model that the optimal sanction is maximal, even when it is costly to impose. Here we find that the optimal sanction may be maximal, but only when the punishment group is larger than one. The reason for the difference is the differing interpretations of p in the two models. As explained above, in the BPS model, p is defined to be the probability of apprehending and punishing the true offender, and so punishment is only imposed if and when the actual offender is found. Thus, the expected social cost of punishment for any crime would be pβs. It follows that 1−p is the probability that no one is caught and punished. By contrast, we are assuming that n≥1 individuals are punished with certainty, and that p is the probability that the true offender is a member of that group, while 1−p is the probability he is not. Consequently, scaling down p in our model (i.e., reducing e) does not reduce the expected cost of punishment as long as n and s are held fixed; it only lowers the probability of punishing the right person. In other words, lowering p does not reduce punishment costs in our model (as it does in the BPS model) because, as long as n is held constant, the same number of people is punished. This is why we had to hold ns as well as ps fixed to keep deterrence constant while lowering e.
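The difference in expected punishment costs can be shown with a short Python sketch. It assumes the illustrative values β=1.2 and s=$500 from the imprisonment example, with n=1 in our model; the function names are hypothetical. Lowering p reduces the expected cost under the BPS reading but leaves it unchanged under ours.

```python
beta, s, n = 1.2, 500, 1   # illustrative values from the imprisonment example

def bps_expected_cost(p):
    """BPS reading: the sanction is imposed only if the offender is caught (prob. p)."""
    return p * beta * s

def group_expected_cost(p):
    """Our reading: all n group members are punished whatever p is, so p drops out.
    The argument is accepted only to mirror bps_expected_cost."""
    return n * beta * s

for p in (0.9, 0.5, 0.1):
    print(f"p={p}: BPS ${bps_expected_cost(p):,.0f}  ours ${group_expected_cost(p):,.0f}")
```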
An interesting implication of the preceding result is that punishment of a single individual may in fact be optimal, even though we are not yet explicitly accounting for the cost of wrongful punishment. When this is the case, however, the optimal sanction is not necessarily maximal, unlike in the BPS model. In other words, only one person is punished and the sanction is less than maximal. In this sense, our model actually gives a prescription that is more in line with actual policy.
As above, we conclude by considering the possibility that it may not be optimal to punish all crimes. This will be true in the current context if the harm from the act is small enough and/or if the marginal cost of punishment is large enough. Thus, in contrast to the case of costless punishment, it will not be optimal to prosecute minor crimes.
E. Wrongful Punishment Is Costly
Finally, we consider the case where there is a social cost of wrongfully punishing someone. This seems especially important in our model, given that wrongful punishment is the chief moral objection to the use of group punishment. For example, it is one of the objections that was raised early on against the prescriptions of the Becker model. For purposes of this discussion, we will focus only on monetary punishments, though the same logic would apply, a fortiori, to imprisonment.
It is worth emphasizing again that, in order to incorporate wrongful punishment into the BPS model, it would be necessary to redefine p to be the probability that someone is apprehended and punished, and then to define a new variable, q, to be the conditional probability that this person is the true offender. Thus, from the offender’s perspective, the probability of punishment would now equal pq, while the probability of a wrongful punishment would equal p(1−q). In this formulation, 1−p would be the probability that no one is apprehended and punished.
By contrast, in our model n individuals are punished with certainty and p is the probability that the true offender is in this group. Accordingly, wrongful punishment is inherent in our model. Either the true offender is in the group (with probability p), in which case there are n−1 false convictions, or she is not, in which case (with probability 1−p) there are n false convictions. The expected number of false convictions is thus p(n−1)+(1−p)n=n−p. Our assumptions about the p function ensure that this number is increasing in n, as we would naturally expect. For example, suppose there is a 70% chance that a punishment group with n=10 individuals includes the true offender. In such a case there is a 70% chance that there are nine false convictions and a 30% chance that there are ten false convictions. Accordingly, with p=0.7 and n=10, on average there are 9.3 (10−0.7) false convictions.
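The expected count of false convictions can be verified directly. The Python sketch below (the function name is hypothetical) evaluates p(n−1)+(1−p)n and confirms that it equals n−p for the example in the text.

```python
def expected_false_convictions(n, p):
    """With prob. p the offender is in the group (n - 1 innocents punished);
    with prob. 1 - p she is not (n innocents punished)."""
    return p * (n - 1) + (1 - p) * n

value = expected_false_convictions(10, 0.7)
print(round(value, 10))                  # 9.3
print(abs(value - (10 - 0.7)) < 1e-12)   # True: the expression simplifies to n - p
```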
Incorporating the cost of wrongful punishment into the social welfare function yields the following result:
Proposition 3: When punishment takes the form of a fine, wrongful punishment is costly, and it is optimal (in terms of deterrence) to punish someone, the optimal policy in terms of deterrence involves either a maximal sanction (s=s̄) or punishment of a single individual (n=1).
Although only wrongful punishments are costly here (as when punishment is by a fine), the welfare-enhancing strategy of lowering n and e while raising s so as to hold ps and ns constant continues to apply. Thus, as was true above, these adjustments should be used to the maximum extent allowed by the bounds on n and s before investing in e.
To illustrate, assume, as in the previous example, that initially n=50, p=0.5, s=$500, and let σ=1 be the unit cost of wrongful punishment. The expected sanction facing offenders is therefore ps=$250, and the social cost of imposing that sanction is (n−p)σs=(50−0.5) × 1 × $500=$24,750. The enforcer can lower n to 25 and correspondingly lower p to 0.25 while raising the fine to $1,000. The expected fine therefore remains at $250, and the social cost of punishment remains at $24,750 ((25−0.25) × 1 × $1,000). However, given the above assumptions about the p function, e can be lowered, thereby raising welfare. Finally, as was true for imprisonment, crimes that impose a low level of harm should not be punished.
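The arithmetic of this example can be checked in Python. The sketch uses σ=1 and the two configurations from the text; the function name is hypothetical. Because (n−p)s=ns−ps, holding both ns and ps fixed also holds the wrongful-punishment cost fixed.

```python
def wrongful_cost(n, p, s, sigma=1.0):
    """Expected social cost of wrongful fines: (n - p) * sigma * s."""
    return (n - p) * sigma * s

# (n, p, s) before and after halving n and doubling the fine, as in the text
for n, p, s in [(50, 0.5, 500), (25, 0.25, 1_000)]:
    print(n, p * s, wrongful_cost(n, p, s))   # expected fine 250.0, cost 24750.0
```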
Taken together, the results from the previous two Sections show that when punishment is costly, individual punishment (i.e., n=1) emerges as the optimal deterrence policy if the maximum feasible (or acceptable) sanction is not too restrictive; in other words, if the constraint that n≥1 binds first. In that case, the optimal sanction will not generally be maximal. Further, this conclusion is true whether the sanction takes the form of a fine or prison.
F. Impact of Imposing Individual Punishment as Constraint
The analysis to this point has shown that when group punishment is allowed as part of an enforcement strategy, the enforcer may be interested in making use of it. Specifically, we have seen that when punishment is by a fine and wrongful punishment is not viewed as being socially costly, group punishment (n>1) is always consistent with optimal deterrence whenever s̄≥h. And when punishment is costly to impose, whether administratively or because it is potentially wrongful, n>1 is optimal if the maximal sanction is sufficiently constrained, whether by offenders’ ability to pay a fine, the maximum time they can be incarcerated, or possibly by proportionality (fairness) considerations. As noted, the BPS model implicitly rules out group punishment by imposing n=1 as a side constraint. We now turn to analyze the implications of imposing this n=1 constraint in our model.
Consider first costless punishment. If we require that n=1, then the expected sanction from the offender’s perspective is p(e,1)s, which is increasing in e, as in the BPS model. It is also the case, however, that p(0,1)=1/N>0, whereas the BPS model assumes that p(0)=0. This again goes back to the differing definitions of p in the two models. In BPS, if no resources are invested in detection, no one is caught and punished. By contrast, in our model one person can always be randomly punished, and p is the probability that this is the true offender. We nevertheless saw that if s̄≥Nh, the first-best outcome from a deterrence standpoint can still be achieved by assessing a fine of Nh on that randomly chosen person. In this case, the optimal fine is not necessarily maximal, though it is scaled up by a factor of N, the inverse of p(0,1)=1/N. However, it would never be desirable to increase s above Nh, even if that were feasible. For small groups, this upper bound on s would therefore not be exceedingly large.
If s̄<Nh, the first-best is not attainable in our model with n=1 (as we saw in Figure 1), in which case the optimal fine is maximal (as in the BPS model). Moreover, it is optimal to invest in some positive level of detection effort in this case to increase deterrence, though at the optimum there will be some underdeterrence.
This solution thus mirrors that of the BPS model. From the perspective of offenders, the results in the two models look the same—that is, the probability of being caught is low and the fine is high. The implications for deterrence are therefore also the same. As we have emphasized, however, the models differ with respect to the meaning of 1−p. In the BPS model, it is the probability that no one is apprehended and punished, whereas in our model it is the probability that the wrong person is punished. In the case where punishment is assumed to be costless, however, this difference does not affect welfare at all.
Consider next the case where punishment is by imprisonment. Here, imposition of the constraint that n=1 forces an outcome in which the sanction will not generally be maximal. This result is different from the BPS model, where the optimal prison term is maximal when both s and e can be chosen optimally. The reason for the difference is that punishment is only imposed in the BPS model with probability p, the probability that the true offender is apprehended, whereas if the true offender is not detected, no one is punished. By contrast, in our model a single individual is apprehended and punished with certainty, and p is the probability that he is the guilty party. In this scenario, the optimal prison term is not generally maximal when n is constrained to be one. The same logic describes the situation where wrongful punishment is costly. In that case, the constraint that n=1 again forces an outcome in which the optimal sanction is not generally maximal.
The point of these illustrations is that the BPS model does not emerge from our model as a special case simply by constraining n to equal one. This suggests that the conclusions from that model, particularly the “impractical” prescription of a maximal sanction when punishment is costly, are to some extent artifacts of the particular interpretation of p. Further, we contend that the interpretation of p in our model is more consistent with actual practice in the sense that, when n=1, it reflects the probability that the person being punished is the true offender. In reality, there is always some residual uncertainty about the guilt of a defendant, no matter how much effort is devoted to detection.
III. The Law of Collective Punishments
At first glance, it may be hard to imagine regimes that randomly punish innocent individuals, or target an entire group of people in order to punish the culprit(s). Yet, as this Part demonstrates, such regimes are commonplace. Before we turn to a discussion of these regimes, it is worth emphasizing once again that our goal here is not normative. We do not argue that criminal liability should be imposed on a group that includes many innocent actors. Rather, our claim is that current regimes—the product of legislative, regulatory, and adjudicative efforts—do in fact take that approach. Our goal is to reveal these regimes and their operation.
We begin with what may seem to be the most theoretical portion of our model: costless sanctions. As explained above, when deterrence is the sole criterion, costless group punishment can be (1) effective (deterrence is optimal), (2) welfare maximizing, and (3) even necessary. The leading example in Section II.C was a fine. There we saw that an enforcer can achieve optimal deterrence by assessing a fine equal to h on each member of the group, a higher fine on a randomly chosen subset of that group, or a fine of Nh on one member of the group. In such cases the sanction is costless because the loss to the punished person, s, is a gain to others. However, once it is understood that a fine is but one form of costless sanction and that a sanction need be neither formal nor the result of an adjudicatory ruling, it is easy to identify many real-life examples.
Checkpoints are one example of sanctions that enforcers treat as if they were costless (although they are not). In these cases, all members of a certain group are momentarily detained. Temporary investigatory police stops are another example. Although the detainees incur a real cost (e.g., delay and humiliation), the Supreme Court, focusing on the length of the stop, has repeatedly taken the position that these costs are negligible in order to justify the legality of detainments. For this very reason, the Supreme Court has even authorized enforcement agents to target individuals by way of racial and ethnic profiling. In the name of deterrence, enforcement agents are now permitted to target a group of individuals, n>1, based on observed characteristics (e.g., race), which is perceived to impose a negligible cost in order to find the culprit.
While in some cases group punishment can be justified on efficiency grounds, in other cases it may serve for Othering. In these cases, the targeting of a group that includes innocent individuals is not aimed at enhancing deterrence. Rather, group punishment becomes a tool to impose sanctions on individuals that bigoted enforcers do not view as equal or legitimate members of society. Othering comes in many forms, including racism, xenophobia, homophobia, and antisemitism. In all of these cases the enforcer ignores the well-being of and the cost imposed on members of the targeted group. Still, as this Section shows, such sanctions may be held valid. The subjective intent of the enforcers, the Supreme Court held, is simply irrelevant.
Jails—where suspects awaiting trial are incarcerated—may, in some cases, also be treated as a form of costless sanction by enforcers. First, localities often underestimate the cost of jails. The reason is that “other government agencies bear a large share of jail costs that are not reflected in [the] jail budgets.” Another reason is a divergence of interests between the enforcement agent (e.g., the police officer) and its principal (e.g., the state). The police officer does not bear the cost of incarceration—the state, those arrested, and society do. But the officer may be interested in reducing her (and her colleagues’) own costs (time and inconvenience) that detainment entails. We refer to these arrests as “costless arrests” as the enforcer agent treats them as such. Armed with these understandings, we turn to an analysis of costless sanction cases.
A. Finding the Culprit: Separating the Wheat from the Chaff
We begin with costless arrests. In the United States, the police usually must show probable cause before they make an arrest, conduct a search, or seize property. However, whether the police can impose a collective sanction (e.g., search and seizure) on a group of individuals that is believed to include the culprit remains far from settled.
In Ybarra v. Illinois, an early case, the police received a warrant to search a tavern and an employee suspected of selling drugs. The Supreme Court had to determine whether the warrant allowed the police also to search each of the dozen other patrons who were present in the tavern. It held that the police cannot do so without a particularized and individualized suspicion that every searched individual violated the law. Simply being present at the tavern when the police had reason to believe someone brought contraband for sale is not enough.
Two decades later, in Maryland v. Pringle, the Supreme Court took a more lenient approach toward group punishment. In Pringle, a police officer stopped a speeding car in the early hours of the morning and observed a large amount of rolled-up cash and drugs. There was no indication that the three car occupants were colluding or pursuing an illegal end. Yet, in an attempt to identify the offender, the officer warned the three that if no one admitted ownership of the contraband, he would arrest all of them. When none of the car occupants offered information, the officer proceeded as promised. Later that day, Pringle admitted that he owned the drugs, that he planned to sell them at a party they were going to, and that none of the other occupants knew about them. The police released the two other occupants and pressed charges against Pringle.
The question before the Court was whether an entire group can be sanctioned (i.e., arrested) when it is clear that a member of the group committed an offense. The Maryland Court of Appeals found the arrest illegal. It explained that because the crime could have been committed by any of the occupants, the police lacked individualized probable cause. The Supreme Court reversed. It held that mere presence in a car with contraband gave the officer a reasonable belief that each one of the occupants had committed the crime independently or together with others. The Court reasoned that, unlike the patrons in Ybarra, passengers in a small car “will often be engaged in a common enterprise with the driver, and have the same interest in concealing the fruits or the evidence of their wrongdoing.” The teaching of Pringle is clear: passengers in a car are subject to group punishment. If it is reasonable to believe that one passenger committed a crime, all can be subject to a severe sanction.
Pringle fits squarely within the predictions of Proposition 1 in our model. This is a case where it is optimal to punish someone (n*>0). It is clear that at least one of the car occupants (perhaps even more than one) brought drugs into the car. The police can invest in trying to winnow down the group at a cost (e>0). For example, in Pringle, the police officer could have detained the car occupants while a forensic expert checked the drugs and cash for fingerprints that would identify the true culprit. In such a case, only the true culprit would have been sanctioned with s=h (i.e., arrested). Alternatively, instead of waiting for a long time in the very early hours of the morning, the costless option for the officer was simply to impose s=h on each occupant, that is, to arrest all of the car occupants (n=N=3). A third option, which we discuss later, is to randomly subject one of the car occupants to a punishment of s=3h. For example, the officer could (illegally) use excessive force during the arrest. In such a case, each of the occupants would also be optimally (although illegally) deterred, as each would face a 1/3 chance of incurring 3h, for an expected sanction of h.
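The three options open to the officer in Pringle can be compared in the same terms. The sketch below normalizes h to 1 and assumes, purely for illustration, that the forensic option identifies the culprit with certainty. For each option it reports the culprit's expected sanction and the total expected sanction borne by the two innocent occupants; the option labels and the helper function are hypothetical.

```python
from fractions import Fraction

h = Fraction(1)  # harm normalized to one unit
N = 3            # occupants of the car

def expected_burdens(option):
    """Return (culprit's expected sanction, total expected sanction on innocents)."""
    if option == "forensic":    # e > 0; assumed to single out the culprit: s = h on n = 1
        return h, Fraction(0)
    if option == "arrest all":  # e = 0: n = N = 3, each occupant gets s = h
        return h, (N - 1) * h
    if option == "random one":  # e = 0: one occupant, chosen at random, gets s = 3h
        each = Fraction(1, N) * (N * h)  # 1/3 chance of incurring 3h
        return each, (N - 1) * each
    raise ValueError(f"unknown option: {option}")

for option in ("forensic", "arrest all", "random one"):
    culprit, innocents = expected_burdens(option)
    print(option, culprit, innocents)
```

All three options give the culprit the same expected sanction, h, so all deter equally. What differs is that the two low-cost options load an expected 2h onto the innocent occupants, which is the cost an enforcer who treats the sanction as costless leaves out of the calculation.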
By contrast, when the officer can reduce the punishment group at no cost, she will be required to do so. United States v. Di Re is such a case. The facts in Di Re are very similar to Pringle. Based on a tip from an informant, the police arrested three car occupants, even though the informant singled out only one of them, Buttitta. The Supreme Court held that only the named occupant, Buttitta, could be arrested. Here, because the officer could identify the culprit at no cost (e=0), the punishment group had to be reduced to n=1 (in effect, N=1).
Pringle is not an outlier; it has become the rule. Based on Pringle, courts have held that the police can arrest (i) an entire small group of middle school students when it is clear that some of them injured another in the bathroom; and (ii) a group of police officers when “some unidentifiable members . . . of that group have committed a crime.”
B. Investigatory Stops: High Crime Areas, Profiling & Bigotry
In Terry v. Ohio, the Supreme Court recognized an exception to the probable cause requirement. It held that in the interest of “effective crime prevention and detection,” a police officer who reasonably believes that criminal activity is afoot can detain an individual for investigatory purposes without probable cause. The officer may continue with a search if she reasonably suspects the person stopped is armed and dangerous. The police officer is not required to provide Miranda warnings, can use a limited amount of force, and may seize any contraband in plain view or “plain feel,” as well as any contraband found during a protective patdown. In traffic stops, the Terry requirement is minimal. It is enough that the police legally stopped the vehicle for a potential traffic violation. The police can then order the vehicle occupants to step out and search for weapons, even if there is no reason to believe that any of the occupants engaged in criminal activity. The result is that “wholly innocent passengers in a taxi, bus, or private car” may all be subject to detention “simply because they have the misfortune to be seated in a car whose driver has committed” an illegal act.
When multiple parties are involved, Terry stops become a form of group punishment. They allow the police to impose a cost on a large group of innocent individuals in the hope of finding the true offender. The Supreme Court recognized as much when it noted that “Terry accepts the risk that officers may stop innocent people.” Importantly, the Supreme Court treated the sanction as negligible, or to use the model’s terminology, costless. Stops and frisks, the Court held, “amount to a mere ‘minor inconvenience and petty indignity.’” They also come at little cost to the state (e≈0) “where the police . . . have no interest in prosecuting” those detained. As this Section shows, the constitutional right of individuals to be secure in their persons and effects is sometimes sacrificed on the altar of finding the culprit. In other cases, punishing the innocent is done in the name of deterrence or due to Othering and bigotry. Regardless of the motivation, with a few exceptions, the decision to impose collective punishment is deliberate and consistent with our model’s predictions.
1. High Crime Areas
Terry stops have been used as a form of collective punishment in cases where the true offender is unknown but suspected to be included in a group. One example is defining the group based on location—being present in “high crime areas.” Little more than this arbitrary condition is needed to conduct a Terry stop. The Supreme Court held that the police only need a “plus factor” to detain and search individuals in a high crime area. These plus factors can be unusual conduct, a furtive gesture, or physical and temporal proximity to the location of the criminal activity.
The unbearable ease with which a police officer can (Terry) stop a group of innocent individuals is demonstrated in United States v. Broomfield. About thirty minutes after a store was robbed, a police officer stopped Broomfield, a Black male. The stated reason: Broomfield was staring straight ahead when he drove past the police car. Judge Posner found the explanation unpersuasive:
Had Broomfield instead glanced around him, the officer would doubtless have testified that Broomfield seemed nervous or, the preferred term because of its vagueness, “furtive.” Whether you stand still or move, drive above, below, or at the speed limit, you will be described by the police as acting suspiciously should they wish to stop or arrest you.
As one scholar noted, “the clear message is that in ‘high crime’ . . . areas . . . the possibility of criminal activity is so substantial as to make everyone in the area subject to police inquiry.” That message was well received by the lower courts, where the nature of the area is often the only explanation offered to support the stop. In other words, when the police cannot identify the culprit, they can define the punishment group, n, based on what is truly nothing more than the individuals’ presence in a certain locale. The size of the locale varies. In one case, the police argued that “the entire city” is a high crime area that may justify detainment and a dog sniff. A locale may of course be a mere proxy for race or ethnicity—the subject of the next examples and an issue we investigate further below.
2. Racial and Ethnic Profiling
Profiling, a widespread practice, is another form of group punishment. It allows the police to impose a sanction (e.g., in the form of a Terry stop) based on the fact that the individual belongs to a group of people who share an observable characteristic such as race, ethnicity, or national origin. To use the model’s terminology, profiling allows police officers, at a very low cost (e≈0), to narrow the population, N, to a smaller group of people, n, who meet certain criteria, while treating the sanction as costless.
It may be surprising, but the U.S. Constitution allows the police to engage in racial and ethnic profiling. In border areas, for example, an enforcement agent is allowed to target and stop car occupants for questioning based in part on their apparent Mexican ancestry, even if they did not violate any law. The leading case is United States v. Brignoni-Ponce. In this case, roving patrol agents near San Clemente, California, pursued and stopped a car for questioning. They later testified that “their only reason for doing so was that its three occupants appeared to be of Mexican descent.” Upon learning that the two passengers were illegal aliens, they arrested all three.
At trial, the driver, Brignoni-Ponce—a U.S. citizen of Puerto Rican ancestry—was found guilty of knowingly transporting illegal immigrants. The U.S. Court of Appeals for the Ninth Circuit reversed. It held that there was nothing that could have given the roving agents a reason to suspect the car occupants. “All that they knew was that Brignoni-Ponce and his companions appeared to be of Mexican descent and were in a sedan . . . .” And that, the court held, was not enough. “The conduct does not become suspicious simply because the skins of the occupants are nonwhite.”
Relying on Terry, the Supreme Court held that “because of the importance of the governmental interest [in preventing illegal entry], the minimal intrusion of a brief stop [(s≈0)], and the absence of practical alternatives for policing the border,” all the police need to detain someone is a reasonable suspicion. It then affirmed the Ninth Circuit’s reversal of Brignoni-Ponce’s conviction. Mexican appearance, the Court held, is a relevant factor, but it cannot be the only factor. However, combined with “[a]ny number of factors,” Mexican appearance can justify a stop. Such factors include the “proximity to the border,” “the usual patterns of traffic,” “previous experience with alien traffic,” “information about recent illegal border crossings,” “[t]he driver’s behavior,” and characteristics of the car (whether it appears to be heavily loaded, large enough to conceal aliens, or has a large number of passengers).
Brignoni-Ponce won, but for many it was a Pyrrhic victory. “Today, race dominates immigration enforcement, in no small part due to . . . Brignoni-Ponce.” A year after the decision, in United States v. Martinez-Fuerte, the Supreme Court held that an even lower standard applies when an enforcement agent refers a car from a checkpoint to a secondary inspection area for questioning about immigration status. In such cases, the police do not even need a reasonable suspicion. Moreover, such referrals are legal “even if [they] are made largely on the basis of apparent Mexican ancestry.” The result, as the dissent noted, is that an enforcement agent can “target motorists of Mexican appearance,” and they can do so “wholly on whim.”
The difference between the majority and the dissent in Martinez-Fuerte can be explained by our model. For the majority, the ethnic singling out on the basis of a “visual inspection” came at little to no cost (i.e., e≈0). It also involved a negligible, or to use the term of the model, costless sanction. In the eyes of the majority, being singled out at a checkpoint for further inquiry involves “a brief detention” that does not “‘stigmatiz[e]’ those diverted” (i.e., s≈0). This “modest” intrusion, the Court explained, “is sufficiently minimal that no particularized reason need exist,” and it is justified by the strong interest in deterring illegal entry. By contrast, the dissent thought that the sanction that “arbitrarily selected motorists” had to endure—“the delay and humiliation of detention and interrogation”—was costly. It would also “inescapably discriminate against citizens of Mexican ancestry and Mexican aliens lawfully in this country for no other reason than that they unavoidably possess the same ‘suspicious’ physical and grooming characteristics of illegal Mexican aliens.”
Another factor that played a role in Martinez-Fuerte was effective deterrence. The majority noted that of the 7 million cars that passed through the checkpoint each year, about 17,000 vehicles harbored illegal aliens. In the words of our model, the probability of finding a culprit in the punishment group was p=0.0024, implying that roughly 99.76% of those sanctioned were innocent. For the Court, this “rate of apprehensions” provided “a rather complete picture of the effectiveness of the . . . checkpoint.”
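The ratio can be recomputed from the two figures the majority cited. The short sketch below treats every car passing the checkpoint as a member of the punishment group, which is the reading used in the text; the variable names are ours.

```python
# Figures reported in the text for the Martinez-Fuerte checkpoint.
cars_per_year = 7_000_000
cars_with_aliens = 17_000

p = cars_with_aliens / cars_per_year  # probability a stopped car holds a "culprit"
innocent_share = 1 - p                # share of stopped cars that held no one

print(f"p = {p:.4f}")                            # p = 0.0024
print(f"innocent share = {innocent_share:.2%}")  # innocent share = 99.76%
```

On these figures, about 2.4 of every 1,000 cars stopped contained a "culprit," so roughly 997.6 of every 1,000 were sanctioned without cause.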
In Whren v. United States, the Supreme Court went a step further. It held that the subjective intent of an officer initiating a stop is irrelevant. This is so even if the stop was clearly pretextual or motivated by bigotry or racism. Whren involved plainclothes vice officers patrolling a high drug area in an unmarked police car. “Their suspicions were aroused when they passed a dark Pathfinder truck with temporary license plates and [two] youthful occupants waiting at a stop sign” for about twenty seconds. Both occupants were Black men. That was enough for the officers. A pursuit ensued. Once they stopped the car, the police found several types of illegal drugs and arrested both occupants.
In their motion to suppress, the defendants argued that the stop was pretextual: that it was nothing more than an excuse to search for drugs without probable cause or even reasonable suspicion. A police officer, they explained, “will almost invariably be able to catch any given motorist in a technical violation,” and they can do so “based on decidedly impermissible factors, such as the race of the car’s occupants.”
There was additional support for the defendants’ claim that the stop was pretextual. Vice officers usually do not concern themselves with traffic violations. The Court admitted as much when it noted that the “District of Columbia police regulations . . . permit plainclothes officers in unmarked vehicles to enforce traffic laws ‘only in the case of a violation that is so grave as to pose an immediate threat to the safety of others.’” No such threat could have been shown in Whren.
The Supreme Court was not persuaded. In a unanimous decision, it held that the actual motive of the officers—the real reason for a stop—is irrelevant. An ulterior motive for a stop, even if it is racially based, is simply not enough to render a stop invalid. Thus, once enforcement agents find probable cause that a traffic violation—any violation—has occurred, they may stop the car even if they do so because of the race or ethnicity of its occupants. Relying on Whren, some courts held that the officers’ motive is irrelevant, even where the officers admitted that the stop was pretextual. Others recognized that “in practice” this means that traffic stops will be used as pretext.
The scope of racial profiling was tested in Brown v. City of Oneonta. In Brown, a seventy-seven-year-old woman reported that a young Black male had attacked her and had cut himself with a knife during the struggle. A canine unit tracked the assailant to the local university campus, where it lost the trail. The police then obtained a list of every Black male student from the university and sought to locate and question each one of them. When the endeavor failed to produce suspects, the police broadened their search. They “stopp[ed] and question[ed] non-white persons on the streets and inspect[ed] their hands for cuts.” More than 200 Black males were questioned in this manner.
The students and others who were questioned filed suit, claiming that the police had violated their constitutional rights. The court dismissed the plaintiffs’ equal protection claim, reasoning that despite the disparate impact on the Black minority, there was no evidence of discriminatory intent. The police, the court held, were simply trying to winnow down the group by focusing on Black males.
3. Discriminatory Motive
In other cases, the discriminatory intent is evident. Enforcement agents may sanction a minority group because they believe that it is more likely that the culprit is a member of that group or because the group members are more likely to commit a crime. In other cases, the group sanctions may be motivated by bigotry and animus. These sanctions should all be analyzed as if they are costless. Not because they are. They are not. But rather because the bigoted agent, by ignoring the welfare of the target group, treats the sanctions as costless.
Courts have long recognized the ease with which Terry stops may be abused to discriminate against minority groups. In fact, Terry itself is a good example. In Terry, an officer in plain clothes suspected that two individuals he had never seen before were up to something, but “he was unable to say precisely what drew his eye to them.” By his own admission, the gravamen of Terry and Chilton’s conduct was simply standing at the intersection of two streets at midday. The only explanation the officer offered for taking an interest in the two was that they “didn’t look right to [him].” What the Court failed to mention, despite the many indications that race was the true reason for the stop, was that Terry and Chilton were people of color.
A recent disturbing example of the abuse of Terry stops is Floyd v. City of New York. Between 2004 and 2012, over 4.4 million Terry stops were conducted in New York. Statistical and anecdotal evidence showed that the NYPD’s stop and frisk policies intentionally and expressly targeted “racially defined groups,” many of whom “have done nothing to attract the unwanted attention.” The police’s own records indicated “that at least 200,000 stops were made without reasonable suspicion.” This is perhaps not surprising, given that police officers were encouraged to “target[ ] ‘the right people,’” code for young Black and Hispanic men. Such racial profiling, the court held, is illegal. The police should not—although the court found that they did—“subject all members of a racially defined group to heightened police enforcement because some members of that group are criminals.”
4. Checkpoints & Random Enforcement
Checkpoints are a form of random enforcement. They can be set up at any time in selected geographic areas. Once set, all vehicles passing through the checkpoint must stop or slow down. Drivers showing signs of intoxication are then directed to a secondary location where the officers check registration and may conduct a sobriety test.
The legality of such checkpoints was at issue in Michigan Department of State Police v. Sitz. During a period of seventy-five minutes, 126 drivers passed through the checkpoint, with an average delay of approximately twenty-five seconds per vehicle. The Court first determined that the delay at the checkpoint was a seizure. It then explained that the constitutionality of the checkpoint rests on a three-prong test. This test requires “balancing the state’s interest in preventing accidents caused by drunk drivers, the effectiveness of the sobriety checkpoints in achieving that goal, and the level of intrusion on an individual’s privacy caused by the checkpoints.” In essence, the Court determined that the enforcer must consider our three choice variables. The state interest in deterrence must be evaluated in light of the probability of apprehension, p, which is determined by the level of expenditure made by the enforcer, e, as well as the nature and size of the punishment group, and the type and severity of the sanction, s.
In applying the test, the Court found deterrence to be an important goal. It then considered the type of sanction chosen. “[T]he choice among . . . reasonable alternatives,” the Court explained, “remains with the governmental officials.” It also found that the severity of the sanction—the “intrusion” on drivers’ privacy, “fear and surprise”—was “slight.” Finally, the majority found the sanction to be effective. The level of investment, e, in winnowing down the group of drivers included setting a permanent checkpoint where all cars had to stop for a brief examination by uniformed police officers. As a result, the ratio of offenders arrested (those driving under the influence) to all drivers stopped—that is, p(e, N)—was 1.6% (of the 126 drivers stopped at the checkpoint, only two were arrested). That ratio, the Court explained, was much higher than the 0.24% ratio of vehicles carrying illegal aliens to vehicles stopped in Martinez-Fuerte.
In contrast to the permanent checkpoint in Sitz, the Court had earlier found that random stops aimed at identifying unlicensed drivers and unsafe vehicles were illegal. Empirical data showed that few people operated vehicles without a license. Consequently, “the number of licensed drivers who will be stopped in order to find one unlicensed operator will be large.” In other words, the low investment in a random stop resulted in a low probability of apprehension and a severe sanction.
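The Court's comparison amounts to comparing two estimates of p, the share of those stopped who turned out to be offenders. A sketch using only the figures given in the text (the variable names are ours):

```python
# Sitz: of 126 drivers stopped, two were arrested for driving under the influence.
sitz_p = 2 / 126
# Martinez-Fuerte ratio as reported in the text.
martinez_fuerte_p = 0.0024

print(f"Sitz p = {sitz_p:.1%}")  # Sitz p = 1.6%
print(f"relative yield = {sitz_p / martinez_fuerte_p:.1f}x")
```

On these figures, a driver stopped at the Sitz checkpoint was roughly six and a half times as likely to be an offender as a car stopped in Martinez-Fuerte, which is the sense in which the Court found the sobriety checkpoint comparatively effective.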
5. Identifying the Culprit
Terry stops and searches involve cases where the police are concerned that a crime is afoot. In other cases, the police work begins after the crime has been committed. Courts and scholars draw a distinction between the identification phase (i.e., finding the suspect) and the ensuing investigation (i.e., the evidence-gathering phase). In the preliminary identification stage, enforcers have much flexibility in taking actions against large groups of innocent individuals in order to find the one culprit. And they can do so without probable cause.
An example is Davis v. Mississippi, which involved a rape. The only description the victim could give was that the assailant was Black. The other lead was fingerprints found on the victim’s window. Without a warrant or probable cause, the police interrogated and fingerprinted a large group of approximately seventy Black youths. One of them, the defendant, was a match.
Echoing Terry, the Supreme Court held that detaining an entire group of people for the sole purpose of obtaining fingerprints that may identify the one offender may be consistent with the Fourth Amendment, provided a proper procedure is in place. Such detention, the majority explained, imposes “a much less serious intrusion” (i.e., sanction) on members of the group compared to Terry stops and searches. However, because no such procedure was followed, the Court reversed the conviction. The two dissenting Justices had no issue with the way the fingerprints were collected. One of them, Justice Stewart, thought that fingerprints are simply a form of evidence, which, like the color of a person’s eyes, can be obtained without any concern.
A few years later, in a decision authored by Justice Stewart, the Supreme Court held that a large number of individuals can be coerced to produce voice samples in order to identify the culprit. United States v. Dionisio involved grand jury witnesses. In the course of investigating potential violations of gambling laws, the grand jury came across a voice recording. To identify the people on the recording, the grand jury compelled twenty people to provide voice exemplars that could be compared to the recorded message. To that end, each was required to read a transcript of the recording and have their reading recorded.
The respondent, Dionisio, refused to do so and was held in contempt by the lower court. The U.S. Court of Appeals for the Seventh Circuit reversed. The Fourth Amendment, it held, “bans ‘wholesale intrusions’ upon personal security,” and compelling a large number of individuals to appear because one of them may be the culprit was simply unreasonable. The Supreme Court disagreed. It explained that unlike Terry stops and searches, a subpoena is not abrupt and does not involve a social stigma—that is, it comes at a low cost. The Court added that the large size of the punishment group was necessary “to identify one voice.”
C. Denmark’s 2018 Ghetto Laws
Denmark’s Ghetto Laws are another example where the punishment group is defined by locale, ethnicity, and race. The loaded term “Ghetto” was originally used to denote a part of the city where Jews were forced to live, and evokes dark memories of the Holocaust. But in 2010, the Danish legislature formally introduced the term as part of a reform titled “The Ghetto [B]ack to [S]ociety—Confronting [P]arallel [S]ocieties in Denmark.” The stated goal of the new policy was to integrate immigrant communities and reduce crime levels. The chosen method was “social mixing” combined with collective punishment. To that end, the Common Housing Act was amended to require the government to prepare a list of “Ghettos” that would be subject to intensive assimilation and enforcement efforts. These include reducing the public housing available to populations that are not considered “Western” enough, through eviction and displacement, forced sales, real-estate repurposing, and demolitions. Parents who receive social benefits must also enroll their children in state daycares “with a maximum migrant intake of 30%.”
Following a 2018 amendment, a Ghetto is now defined as a public housing neighborhood with a minimum of 1,000 residents that satisfies a number of criteria. Chief among them is that “the share of immigrants and descendants from non-Western countries exceeds 50%.” The definition of “non-Western” not only leads to discrimination on the basis of nationality, but also on the basis of race, ethnic origin, and religious orientation. It targets “non-white, non-European ethnic populations” in what are “largely Muslim neighborhoods.”
Importantly, the undesirable “Ghetto” status provides the government with draconian enforcement measures. One of these measures is to “allow[] for collective punishment—by eviction—of entire families if one of their members commits a criminal act.” Another measure allows the local police to define geographical zones in which punishments for particular crimes can be doubled and fines may be converted to imprisonment.
It is thus no surprise that some found “Denmark’s Ghetto laws [to be] a basic exercise of scapegoating.” However, as despicable and offensive as they may be to most people’s notions of morality, they may nevertheless be consistent with optimal deterrence policy. The reduction in the size of the punishment group, n, was accompanied by an increase in the sanction, s.
D. The Frankpledge and Riot Laws
We now return to the riot example. As Section III.C explained, to achieve optimal deterrence, the enforcer can impose a sanction of s=h=500 on each member of the punishment group (n=N=200 individuals in the example), but deterrence can also be achieved by imposing a higher sanction on a subset of that group—for example, by collecting $2,000 (200 × 500/50) from each of fifty randomly chosen individuals. Both alternatives require, however, that members of the punishment group are wealthy enough to pay the fine. In the first case, each member must be able to pay $500. In the second, each of the chosen fifty must be able to pay $2,000. When members of the punishment group are not wealthy enough (i.e., when their wealth is less than h), deterrence will be suboptimal. For example, if each member of the punishment group can only pay $400, sanctioning all members will still result in underdeterrence (400<500). These insights may explain the demise of both the frankpledge system and the English and American riot laws modeled after it.
The frankpledge was a low-cost collective enforcement system. It was invented to fight crime at a time when prisons, police, and other means to identify offenders were rarely, if at all, available (i.e., e=0). The idea, according to some scholars, was that imposing a collective fine on a group of individuals would deter offenders by incentivizing group members to monitor each other ex ante and to identify the culprits ex post.
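The role of the wealth constraint can be sketched as follows. The sketch assumes, for illustration only, that every member of the group has the same wealth, that the fine actually collected from a chosen member is the lesser of the required fine and that wealth, and that the expected sanction is (n/N) times the amount collected. The function names are hypothetical.

```python
from fractions import Fraction

N, h = 200, 500  # riot example: group size and harm, in dollars

def max_expected_sanction(n, wealth):
    """Largest expected sanction when n of the N members, chosen at random,
    are fined and no one can pay more than `wealth` (assumed uniform)."""
    required_fine = Fraction(N * h, n)         # fine that would make the expected sanction h
    collectable = min(required_fine, wealth)   # fines are capped by ability to pay
    return Fraction(n, N) * collectable

def deters(n, wealth):
    """Optimal deterrence requires an expected sanction of at least h."""
    return max_expected_sanction(n, wealth) >= h

print(deters(200, 500))    # True: everyone pays $500
print(deters(50, 2_000))   # True: fifty pay $2,000 each
print(deters(200, 400))    # False: $400 < $500 even if everyone is fined
print(deters(50, 400))     # False: fifty members paying $400 yields only $100 expected
```

Because the expected sanction can never exceed a member's wealth, no choice of n restores deterrence once that wealth falls below h; shrinking the group only raises the fine each chosen member would have to pay.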
The system worked by sorting individuals into small groups called tithings. The number of individuals in the tithing varied. In some cases, it was ten, in others hundreds, and sometimes it was even an entire locality. When an unidentified member of the group committed a crime, a collective punishment in the form of a fine was imposed on each member of the group. The tithing, arbitrarily selected, was thus the punishment group, n. The system, however, suffered from a number of limitations. As explained below, in some cases the fine was too high. In others it was too low. Both cases resulted in suboptimal deterrence.
According to some descriptions, each member of the punishment group deposited a certain amount which was forfeited if the tithing failed to turn in the offender. This form of frankpledge could deter only the commission of low-value crimes—those where the deposit was higher than the benefit to the offender. Ability to pay was a major issue (many were poor), and so the deposit (or obligation to pay) could not be set at a meaningful level.
Consistent with our model, one solution would have been to target only a few wealthy individuals. In fact, the frankpledge system did so by nominating one or two members of the group as “chief pledges.” These were men who had means and stature in the community. When a crime was committed by an unidentified member of the tithing, the chief pledges had to pay when others could not. Their expected sanction was thus much higher and so was the incentive to use their influence to identify the offender. There is also evidence that, at least in some cases, the imposition of the sanction was done at random. In these frankpledge systems, although the sanction was nominally imposed on a large group, the sanction could have been collected from any randomly chosen (wealthy) member of the group.
In other cases, the fine was too low. This occurred when, rather than imposing a predetermined fine on each individual, the fine was imposed on the entire group. Consequently, as the number of group members increased, the amount each member actually had to pay decreased. Thus, when the punishment group comprised a large number of individuals, the result was a suboptimal level of deterrence. This was the case when the tithing was defined as the inhabitants of the “hundred” (a sub-unit of a county or a shire), that is, when n was large:
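The dilution effect described in the preceding paragraph can be illustrated with a minimal sketch. It assumes that a fixed collective fine is split equally among the n members of the group and compares each member's share with the harm benchmark h = 500 from the riot example. The group fine of $10,000 and the helper names per_member_share and deters are hypothetical choices for illustration only.

```python
def per_member_share(group_fine: float, group_size: int) -> float:
    """Share of a fixed collective fine borne by each member when split equally."""
    return group_fine / group_size

def deters(group_fine: float, group_size: int, harm: float) -> bool:
    """Assumed deterrence test: each member's share must reach the harm benchmark h."""
    return per_member_share(group_fine, group_size) >= harm

HARM = 500          # harm benchmark h, borrowed from the riot example
GROUP_FINE = 10_000 # hypothetical fixed fine imposed on the whole group

# As the group grows (e.g., from a tithing of ten to a whole "hundred"),
# each member's stake shrinks and falls below h once n exceeds 20.
for n in (10, 20, 100, 1_000):
    print(n, per_member_share(GROUP_FINE, n), deters(GROUP_FINE, n, HARM))
```

Under these assumptions, a fine that is adequate for a group of ten or twenty members becomes inadequate once n is large, which mirrors the complaint in the passage quoted next.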
To bring home to each locality a realizing sense of its responsibility, therefore, [in 1285] Edward I enacted a new law making the people of each hundred and franchise responsible for robberies and damages arising through their failure to produce the offenders. The half-mark usually paid by the tithing for the escape of an offending member in the time of Henry II [i.e., the 1100s],—so heavy a burden that in some instances the sheriff seems to have been compelled to defer its collection for a year or even longer,—had now come to represent a far slighter value, the payment of which was inadequate to spur the community to capture a fugitive neighbor with whom it was often in sympathy.
The frankpledge disappeared, but it gave birth to other collective punishment regimes. One example was the English Riot Act of 1714. The act, some argue, was born of necessity. “[M]any rebellious riots and tumults” spread in the kingdom, while “the punishments” were “not adequate to [address the] heinous offen[s]es.” The act criminalized participation in a riot. It also allowed individuals who incurred property damage from rioters to recover from the inhabitants of the “hundred” with funds collected by raising a general tax.
Many U.S. states followed the English model. By 1936, at least twenty-four states had enacted statutes that allowed the victims of property damage and, in some cases, personal injury, to recover from cities and counties. Although many of these statutes have since been abolished, some, like Maryland’s, remain as relics. The reason, as the model theorizes and the Court of Appeals of Maryland explained, is deterrence. The Maryland statute was enacted “on the assumption . . . that it would deter rioters by spreading the tax burden upon the entire community.”
IV. Conclusion
Modern concepts of criminal justice, and more broadly the assignment of liability for harmful behavior, center on the principle of individual responsibility, or punishment of the truly guilty. The idea of collective responsibility, by contrast, is usually associated with a primitive past. At that time, social order was maintained by threats of retaliation by competing kinship groups, and vengeance rather than justice was the governing ethic. Yet examples of collective punishment remain in modern law enforcement, suggesting that there is a functional aspect to the strategy.
With this idea in mind, the analysis in this Article draws on the standard economic theory of crime and punishment to explore the usefulness of group punishment as a law enforcement tool. Our purpose is twofold. The first is to gain an understanding of the conditions under which this practice potentially enhances the pursuit of welfare maximization (when deterrence is the enforcer’s primary goal) in a world where the detection of lawbreakers is imperfect. Existing models essentially assume away collective punishment. By contrast, our approach showed that the imposition of liability on multiple innocent individuals is inherent to the way that law enforcement is actually conducted in the presence of uncertainty.
The results of the analysis showed that group punishment is often part of an effective enforcement policy. This is especially so when punishment is not costly to impose or when such costs are ignored. This is true because when the identity of the true offender is unknown, punishing a group of individuals that contains the true offender with a high probability promotes deterrence while saving on detection costs. The same result can be achieved by punishing a randomly chosen suspect with a fine that is scaled up by the size of the population that is known to contain the true offender. Although these extreme policies are highly objectionable to modern sensibilities, they follow logically from a strict economic approach aimed at achieving optimal deterrence.
The prescriptions are less extreme when punishment is costly, whether due to detention costs or aversion to wrongful punishment, but some amount of group punishment nevertheless remains part of an optimal deterrence policy. This is true because reducing the size of the “punishment group” to one necessarily weakens deterrence, holding all other factors constant, since the true offender faces punishment with a decreasing probability. This can be offset by raising the sanction, but the ability of offenders to pay fines, or the maximum time they can serve in prison, is necessarily limited. That leaves investing more resources in detection, or raising the size of the punishment group, as the only tools for achieving greater deterrence. And since both are costly, the conscious choice to punish a single individual will not generally be optimal from a strict welfare perspective, though it may be the morally preferred one.
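The trade-off between the size of the sanctioned group and the ability to pay can be made concrete with the riot example's numbers. The following sketch assumes, as in that example, that k of the N members known to contain the offender are sanctioned at random, that the fine s cannot exceed each member's wealth w, and that deterrence requires the expected sanction (k/N) × s to reach the harm h. The function min_sanctioned is a hypothetical helper, not a formula taken from the Article.

```python
import math
from typing import Optional

def min_sanctioned(harm: float, group_size: int, wealth: float) -> Optional[int]:
    """Smallest number k of randomly chosen members who must be sanctioned so that
    the offender's expected sanction (k / N) * s reaches h, with s capped at w.
    Returns None when even sanctioning the whole group falls short (w < h)."""
    if wealth < harm:
        return None  # wealth cap binds: no group size restores full deterrence
    # Setting s = w, deterrence requires (k / N) * w >= h, i.e. k >= h * N / w.
    return math.ceil(harm * group_size / wealth)

# Riot example: h = 500, N = 200.
print(min_sanctioned(500, 200, 2_000))   # richer members allow a smaller sanctioned group
print(min_sanctioned(500, 200, 500))     # members who can pay only h must all be sanctioned
print(min_sanctioned(500, 200, 400))     # w < h: underdeterrence whatever k is chosen
```

In this sketch, better detection would show up as a smaller N (a narrower population known to contain the offender), which lowers the required k for a given wealth cap; shrinking k to one is feasible only when a single member can pay h × N.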
This brings us to the second goal of this Article. Using our model, we are able to uncover different forms of collective sanctions and analyze their welfare effects. Our review of the pervasive use of group punishment in practice suggests that collective sanctions are imposed more commonly than policy makers might care to acknowledge. Moreover, the analysis reveals that enforcers often discount the cost of the sanction imposed on the punishment group. In other cases, they treat a costly sanction as costless. Both result in a loss of welfare and freedoms. By unearthing these practices, we are able to identify situations in which enforcers are more likely to abuse their power, and thereby lay the groundwork for change.