Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
First-year PhD student in NLP at Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi.
I am supervised by Prof. Xiuying Chen and Prof. Xiaojun Chang. I received my Bachelor of Engineering (Honours) in Software Engineering with First Class Honours from the University of Technology Sydney, where I also received the Dean's List 2025 prize (Top 2% of students).
Before that, I was a member of UTS-NLP from Oct 2023, where I was fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang.
Currently working on multimodal reasoning, geolocation, embodied agents, and trustworthy MLLMs.
Five selected works — first-author or co-first-author. The full list, including the freshest ICML 2026 and ACL 2026 acceptances, lives on Google Scholar.
An open, comprehensive benchmark exposing how large audio-language models can be jailbroken through the audio channel.
Reinforcement learning for reasoning in embodied manipulation, aligning large vision-language models with affordance and trajectory rewards.
A taxonomy of domain knowledge injection: dynamic injection, static embedding, modular adapters, and prompt optimization.
AnomalyGen builds virtual anomaly scenes without human annotation to make household robots more robust.
A comprehensive benchmark for the cross-style visual capability of LMMs.
* equal contribution
My current research goal is to integrate multimodal information to improve the performance of large language models, with applications in geolocation and embodied AI.
I am also interested in jailbreaks of and attacks on multimodal language models, particularly in the vision and audio modalities.
One question keeps pulling me back: given a photograph of a street it has never seen, can a model reason its way home?
PhD, Mohamed bin Zayed University of Artificial Intelligence · Expected · NLP Department · UAE Government Scholarship
B.E. (Honours), University of Technology Sydney · Software Engineering · First Class Honours · GPA 3.90/4.00 · Dean's List 2025 (Top 2%)
Algorithm Engineer (Research Intern) · Supervised by Xiang Wang
MBZUAI · Visiting student → PhD · Supervisor: Prof. Xiuying Chen · Trustworthy MLLMs
University of Technology Sydney · Research Intern · Prof. Ling Chen & Prof. Meng Fang · Multimodal Agents
Nanyang Technological University · Research Intern · Supervisor: Prof. Alex Kot · Multimodal
First year of the PhD. The desert keeps its own hours, and so do I.
Lately I have been circling one question: what does it mean for a model to understand the world it has been shown. Not the act of prediction (we have plenty of that), but the quieter thing underneath. Whether it knows where it stands. Whether it can tell when it is wrong. Whether, given a photograph of a street it has never seen, it can reason its way home.
Most of what I work on lives near this question. Multimodal reasoning. Geolocation. Embodied agents that must act in places they have never been. The trust we extend, or refuse, to what a model claims it sees. I keep walking into the same room through different doors.
I read more than I write. I rewrite more than I publish. Some of the papers listed under my name belong to a younger version of me, and I am still learning how to be honest about that.
I have never loved being alive this much. New cities. New languages overheard on the bus. New collaborators who became friends before they became coauthors. I owe the courage of this season to the UAE Government Scholarship, which let me walk through doors I had only read about. I do not take that lightly.
The plan, if it can be called one, is to stay here long enough to plant something. The desert is not empty; it is patient. I would like to grow a small oasis on it, the slow kind, one paper, one student, one honest conversation at a time.
If you are working on something you cannot let go of, I would like to hear about it. My inbox is mostly quiet after midnight.
Last updated: May 2026
The fastest way to reach me is email. My inbox is mostly quiet after midnight — that is when I read carefully.