With the increasing use of unmanned aerial vehicles (UAVs) in infrastructure monitoring and environmental inspection, stable image quality has become critical for deep learning and photogrammetry applications. However, UAV images are often degraded by environmental disturbances, and existing quality filtering still relies on manual inspection, making it unsuitable for high-frequency or real-time deployment. This study proposes a real-time, reference-free image quality assessment (IQA) framework based on a Swin-Unet architecture that improves screening efficiency, ensures stable data quality, and simultaneously generates image quality maps (IQMs) for downstream applications. To overcome the limitations of traditional SSIM-based methods, namely the requirement for reference images and sensitivity to pixel misalignment, an improved metric termed CLIP-SSIM (CSSIM) is introduced to construct an image scoring model. A probability-weighted Swin-Transformer is first employed to generate high-accuracy IQMs (RMSE = 0.0193); however, its pixel-wise inference is computationally expensive (532 s per image). The generated IQMs are therefore used as supervisory labels to train a Swin-Unet model, enabling real-time inference (0.3 s per image) with acceptable accuracy (RMSE = 0.04). The proposed approach provides an efficient, accurate, and scalable solution for UAV image screening, effectively replacing manual inspection in high-frequency UAV applications.
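To make concrete the baseline limitation the abstract mentions, the sketch below computes a global (single-window) SSIM in NumPy; this illustrative variant makes the reference-image requirement explicit, since the score is defined only between an image and a pristine reference. It is not the paper's CSSIM metric, and all names here are illustrative.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Global (single-window) SSIM between two float images in [0, data_range].

    Note: SSIM is inherently full-reference -- it needs both a degraded image
    and an aligned pristine reference, which is exactly the limitation a
    reference-free IQA framework aims to remove.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
reference = rng.random((64, 64))                      # stand-in "pristine" image
degraded = np.clip(reference + 0.1 * rng.standard_normal(reference.shape), 0.0, 1.0)

print(round(float(global_ssim(reference, reference)), 4))  # identical images score 1.0
print(float(global_ssim(reference, degraded)) < 1.0)       # degraded copy scores lower
```

Practical SSIM implementations (e.g. `skimage.metrics.structural_similarity`) use a sliding Gaussian window rather than global statistics, which is also why pixel misalignment between the two inputs distorts the score.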