Responsibilities
Deploy and secure an on-premises AI infrastructure for hosting large language models (LLMs)
Install, configure, and maintain AI model-serving frameworks on internal GPU-enabled servers
Develop and maintain robust, scalable APIs that provide internal access to AI capabilities and enable seamless integration with enterprise applications and data systems
Collaborate on the implementation of a Retrieval-Augmented Generation (RAG) pipeline and AI agents to automate business workflows
Requirements
Bachelor’s degree or higher in computer science, electrical/computer engineering, or a related field
Minimum of 4 years of experience in a systems engineering, DevOps, or MLOps role
Proficiency in Linux server administration
Strong working knowledge of GPU-accelerated compute environments
Proficiency in Python for scripting, automation, and building AI/ML data pipelines
Experience deploying LLMs or generative AI models in production environments
Working knowledge of RAG architectures, including vector databases, embedding models, and retrieval strategies