Adversarial Prompt Attacks and Cultural Misrepresentation Risks in Large-Scale Image Generation Systems
Keywords:
adversarial prompt attacks, cultural misrepresentation, text-to-image generation, bias amplification, socio-technical systems, model robustness, fairness governance
Abstract
Large-scale image generation systems can now synthesize high-fidelity visual content from natural language prompts. However, their increasing deployment in commercial and public-facing applications introduces critical vulnerabilities that extend beyond traditional security concerns. This paper investigates the intersection of adversarial prompt attacks and cultural misrepresentation risks in text-to-image models. We argue that adversarial perturbations to user prompts can systematically trigger the reproduction of harmful stereotypes, erase underrepresented cultural elements, and amplify biases embedded in training data. Drawing on recent advances in adversarial machine learning, cultural computing, and socio-technical systems analysis, we propose a unified framework for understanding how prompt-level manipulations interact with model architectures, training distributions, and deployment infrastructures to produce culturally harmful outputs. We examine structural trade-offs between model robustness, fairness, and utility, and discuss governance mechanisms including prompt filtering, adversarial training, and participatory auditing. Through cross-domain comparisons with large language models and recommendation systems, we highlight the unique challenges posed by the visual modality and the opacity of latent space representations. The paper concludes with forward-looking recommendations for building culturally resilient image generation systems that prioritize equity without sacrificing generative quality.
License
Copyright (c) 2026 International Journal of Artificial Intelligence Research

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



