GenAI Optimization

This direction focuses on making generative AI systems efficient enough for real-world deployment, treating compression as more than a mechanical size-reduction problem. I work on methods that tie LLM/VLM compression, neural architecture search, runtime orchestration, and formal specifications to practical constraints such as energy, latency, fairness, and thermal safety.

The core question is how to optimize GenAI systems while preserving the behavior that matters: task reliability, safety constraints, edge deployability, and predictable performance.

Relevant Papers and Projects

TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge
K. Khalil, K. A. Hoque
Spec2VLM: Specification-Guided Hardware-Aware Compression of VLMs via Constrained Design Space Exploration
K. Khalil, K. A. Hoque
FairCompress: Enabling Trustworthy Edge AI via Fairness- and Energy-aware LLM Compression with Formal Guarantees
K. Khalil, K. A. Hoque
VeriNAIS: Signal Temporal Logic-Guided Neural Architecture Search for Verified Cyber-Physical AI
K. Khalil, K. A. Hoque
TOGGLE repository
FairCompress repository