AI Systems & Fine-Tuning Engineer (Python / Open-Source LLMs)


TYPE OF WORK

Full Time

WAGE / SALARY

500-2500

HOURS PER WEEK

TBD

DATE UPDATED

Mar 23, 2026

JOB OVERVIEW

About the role
We are looking for an AI Engineer to build, train, and optimize high-performance local models. This isn’t a "prompt engineering" or API-wrapping role—you’ll be working under the hood with fine-tuning, LoRAs, and deploying models directly on GPU hardware. Your goal is to take raw open-source weights and turn them into specialized, production-ready engines that are fast, efficient, and private.

What you’ll do
Specialized Training: Execute fine-tuning on local LLMs (LoRA, QLoRA, and full fine-tuning) to adapt models for specific domain tasks.

Model Optimization: Work with the latest open-source architectures (Qwen, Llama, Mistral) to maximize their utility in real-world applications.

Data & Pipelines: Build the "factory" for our models—handling everything from data scraping and cleaning to automated training and evaluation loops.

Inference Engineering: Optimize for speed. You’ll implement quantization, batching, and high-efficiency runtimes (vLLM/TGI) to keep latency low.

On-Prem Deployment: Manage model stability on local GPU servers and multi-GPU setups, ensuring high availability for internal services.

Rapid R&D: Stay at the bleeding edge—test new papers, quantization methods, and training techniques as soon as they hit GitHub.

What we’re looking for
Core Python: Deep experience building the scripts and tooling that power AI pipelines.

Fine-Tuning Expertise: Practical experience using PEFT, LoRA, and QLoRA to steer model behavior.

Local Execution: You’ve spent significant time running models on your own hardware or dedicated instances (not just hitting an OpenAI endpoint).

Architecture Knowledge: A strong handle on tokenization, context management, and high-quality dataset preparation.

The AI Stack: Fluency in Hugging Face, PyTorch, and the Transformers library.

Hardware Proficiency: Experience navigating CUDA, managing VRAM constraints, and orchestrating multi-GPU environments.

Problem Solver: You can debug a failing training run and diagnose why a model is hallucinating or underperforming.

Nice to have
Generative Media: Experience with ComfyUI, Stable Diffusion, or video generation workflows.

Edge Inference: Knowledge of GGUF, AWQ, or running models via engines like Ollama.

Scale: Experience with distributed training or managing massive, multi-terabyte datasets.

Agents: A background in building autonomous AI agents or internal productivity tools.

Top 3 Skills to select on OnlineJobs for this role:
Python (Absolute must-have for the codebase).

Machine Learning (To capture the general AI talent pool).

PyTorch (This filters for people who understand the underlying math/frameworks).
