Persuadable Machines: Ethically Hacking GenAI with Rhetoric

Thu, May 15th, 2025, 4:15 pm - 5:30 pm

  • This event has passed.

A public talk entitled “Persuadable Machines: Ethically Hacking GenAI with Rhetoric” with Gerol Petruzella, Ph.D., Academic Technology Consultant, Williams College.

“τῶν δὲ πίστεων αἱ μὲν ἄτεχνοί εἰσιν αἱ δ᾽ ἔντεχνοι”
“Of the modes of persuasion some are non-technical, others technical” (Aristotle, Rhetoric I.2 1355b36).

Historically, we’ve persuaded (really, commanded) computers to perform tasks through technical means like writing well-formed code arguments. A sea change with generative AI is that non-technical natural-language persuasion (i.e., rhetoric) is now an effective means to induce an application to do our bidding. The rhetorical techniques Aristotle, Cicero, and Quintilian identified as efficacious for persuading a human audience are also effective at ‘jailbreaking’ large language models. In this presentation, Petruzella will share his recent experiences as an invited participant in a red-teaming (ethical hacking) exercise held in October 2024 at a machine-learning security conference, where he used rhetorical exploits such as appeals to authority and enthymeme to bypass safety guardrails in generative AI models. As this technology has moved swiftly into the realm of agentic AI, with abilities to take unsupervised action in the world, its persuadability poses serious challenges to our assumptions about trust, reliability, and security.

Gerol Petruzella is an Academic Technology Consultant at Williams. He is a co-author of “Humanity’s Last Exam” (2025), a multi-modal LLM benchmark developed through the Center for AI Safety. In 2018, he was a contributor to the IEEE whitepaper Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems. This past Winter Study at Williams, he taught PHIL 98: Automata to AI.

This event is free and open to the public.

Event sponsored by the Philosophy Department.
