© 2021 by the authors. Licensee MDPI, Basel, Switzerland.A communication system based on unmanned aerial vehicles (UAVs) is a viable alternative for meeting the coverage and capacity needs of future wireless networks. However, because of the limitations of UAV-enabled communications in terms of coverage, energy consumption, and flying laws, the number of studies focused on the sustainability element of UAV-assisted networking in the literature was limited thus far. We present a solution to this problem in this study; specifically, we design a Q-learning-based UAV placement strategy for long-term wireless connectivity while taking into account major constraints such as altitude regulations, nonflight zones, and transmit power. The goal is to determine the best location for the UAV base station (BS) while reducing energy consumption and increasing the number of users covered. Furthermore, a weighting method is devised, allowing energy usage and the number of users served to be prioritized based on network/battery circumstances. The suggested Q-learning-based solution is contrasted to the standard k-means clustering method, in which the UAV BS is positioned at the centroid location with the shortest cumulative distance between it and the users. The results demonstrate that the proposed solution outperforms the baseline k-means clustering-based method in terms of the number of users covered while achieving the desired minimization of the energy consumption.