Controlled Benchmarking of CNN Architectures for Fake Face Classification under Standardized Synthetic Conditions

Rakhi Chauhan; Monika Sethi; Sachin Ahuja

doi:10.15157/ijitis.2026.9.1.597-624

Authors

Rakhi Chauhan Department of Computer Science and Engineering, Chitkara University Institute of Engineering & Technology, Rajpura, Punjab, India https://orcid.org/0009-0007-8984-9834
Monika Sethi Department of Computer Science and Engineering, Chitkara University Institute of Engineering & Technology, Rajpura, Punjab, India https://orcid.org/0000-0002-3655-0894
Sachin Ahuja Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India https://orcid.org/0000-0002-2565-6685

Keywords:

Deep Learning, Artificial Intelligence, Convolutional Neural Networks, Generative Adversarial Networks, Fake Face Detection, CNN Benchmarking

Abstract

Recent advances in generative adversarial networks (GANs) have made synthetic faces increasingly realistic. This raises concerns about whether image authenticity can still be verified and whether deep learning-based detectors are sufficiently robust. Despite an extensive literature, architectures for fake face detection cannot currently be compared objectively, because papers differ in dataset selection, pre-processing pipelines, and training procedures. This inconsistency hinders reproducibility and obscures the intrinsic learning behaviour of each architecture. This paper introduces a controlled, reproducible benchmarking framework for evaluating representative convolutional neural network (CNN) architectures under standardized synthetic conditions. A balanced dataset of real and synthetic faces was constructed, with synthetic faces generated by StyleGAN and real faces drawn exclusively from Flickr-Faces-HQ (FFHQ). To isolate architectural effects, widely used CNN models (VGG16, VGG19, ResNet50, DenseNet201, MobileNetV2, InceptionV3, and EfficientNet-B0) were trained from the same transfer learning initialization, with identical hyper-parameter values and limited training budgets. Performance was evaluated using classification accuracy, precision, recall, F1-score, convergence dynamics, area under the ROC curve (AUC), and computational efficiency. Architectures with more sophisticated connectivity patterns and compound scaling converged more efficiently and behaved more stably under limited optimisation budgets. EfficientNet-B0 achieved the best classification accuracy (88.67%), precision (0.91), F1-score (0.88), and training time, demonstrating a good trade-off between predictive power and computational efficiency.
Rather than addressing deployment-level forensic generalization, this contribution focuses on methodologically isolating architectural learning behaviour. The proposed benchmarking framework provides a formal, consistent foundation for systematic architectural comparison in synthetic face detection experiments.
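The threshold-based metrics (accuracy, precision, recall, F1-score) and the AUC named in the abstract are standard binary-classification measures. As an illustration only, and not the authors' evaluation code, the following minimal pure-Python sketch computes them. It assumes that label 1 denotes a synthetic (fake) face and is treated as the positive class, that `y_pred` holds thresholded predictions, and that `scores` holds per-image probabilities of being fake. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), assuming label 1 = fake (positive class)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from thresholded predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fake outscores a random real image.

    Ties count as one half. This pairwise form is O(P * N), which is fine for
    a sketch but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only (not data from the paper).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

On the toy data, two fakes and two reals are classified correctly, so accuracy, precision, recall and F1 all equal 2/3. Eight of the nine fake/real score pairs are ranked correctly, giving an AUC of 8/9. AUC is computed from the raw scores, so unlike the other four metrics it does not depend on a decision threshold.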