Containing the spread of crime in urban societies remains a major challenge. Empirical evidence suggests that, if left unchecked, crime may recur and proliferate. On the other hand, eradicating a culture of crime may be difficult, especially under extreme social circumstances that impair the creation of a shared sense of social responsibility. Although our understanding of the mechanisms that drive the emergence and diffusion of crime is still incomplete, recent research highlights applied mathematics and methods of statistical physics as valuable theoretical resources that may help us better understand criminal activity. We review different approaches aimed at modeling and improving our understanding of crime, focusing on the nucleation of crime hotspots using partial differential equations, self-exciting point processes and agent-based modeling, adversarial evolutionary games, and the network science behind the formation of gangs and large-scale organized crime. We emphasize that the statistical physics of crime can meaningfully inform the design of successful crime prevention strategies, as well as improve the accuracy of expectations about how different policing interventions should impact malicious human activity that deviates from social norms. We also outline possible directions for future research, related to the effects of social and coevolving networks and to the hierarchical growth of criminal structures due to self-organization.