GenAI Optimization

This direction focuses on making generative AI systems efficient enough for real deployment without treating compression as a purely mechanical size-reduction problem. I work on methods that connect LLM/VLM compression, neural architecture search, runtime orchestration, and formal specifications with practical constraints such as energy, latency, fairness, and thermal safety.

The core question is how to optimize GenAI systems while preserving the properties that matter in deployment: task reliability, safety constraints, edge deployability, and predictable performance.

Relevant Papers and Projects

TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge
K. Khalil, K. A. Hoque
ICCAD 2025.

TOGGLE repository
Safe LLM compression framework.

FairCompress repository
Fairness- and energy-aware LLM compression.

VeriNAIS repository
Signal-temporal-logic-guided neural architecture search.