Reinforcement learning has achieved notable success in games such as chess, backgammon, and Go [21, 22, 24]. However, in most of these games, agents have full knowledge of the environment at all times. In this paper, we describe a deep learning model in which agents adapt to different classes of opponents and learn the optimal counter-strategy using reinforcement learning in a game under partial observability. We apply our model to $\mathsf{FlipIt}$ [25], a two-player security game in which both players, the attacker and the defender, compete for ownership of a shared resource and only receive information on the current state of the game upon making a move. Our model is a deep neural network combined with Q-learning and is trained to maximize the defender’s time of ownership of the resource. Despite the noisy information, our model learns a cost-effective counter-strategy that outperforms its opponent’s strategies, which shows the advantages of deep reinforcement learning in game-theoretic scenarios. We also extend $\mathsf{FlipIt}$ to a game with a larger action space by introducing a new lower-cost move, and we generalize the model to n-player $\mathsf{FlipIt}$.
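To make the game mechanics concrete, the following is a minimal sketch of a two-player FlipIt round, not the model described in this paper. It assumes discrete time steps, two purely periodic players, a defender who owns the resource at the start, ties resolved in the defender's favor, and a benefit defined as ownership time minus the total cost of moves. The function name `simulate_flipit` and all of its parameters are hypothetical and chosen only for illustration.

```python
def simulate_flipit(defender_period, attacker_period, attacker_phase=0,
                    horizon=100, move_cost=1.0):
    """Simulate discrete-time FlipIt between two periodic players.

    Returns (defender_owned_steps, defender_flips, defender_benefit).
    All modelling choices here are illustrative assumptions.
    """
    owner = "defender"  # assumption: the defender holds the resource at t = 0
    owned_steps = 0
    flips = 0
    for t in range(1, horizon + 1):
        attacker_moves = (t - attacker_phase) % attacker_period == 0
        defender_moves = t % defender_period == 0
        if attacker_moves:
            owner = "attacker"
        if defender_moves:
            # assumption: a simultaneous move is resolved for the defender
            owner = "defender"
            flips += 1
        if owner == "defender":
            owned_steps += 1
    # assumption: benefit = time of ownership minus total move cost
    benefit = owned_steps - move_cost * flips
    return owned_steps, flips, benefit


if __name__ == "__main__":
    # Defender flips every 5 steps; attacker flips every 10 steps, offset by 3.
    print(simulate_flipit(defender_period=5, attacker_period=10, attacker_phase=3))
```

A periodic defender needs no feedback, so this sketch leaves out the partial observability that the learning agent has to handle. It only illustrates the trade-off between ownership time and move cost that a cost-effective counter-strategy has to balance.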