Project Showcase
Fine-Tuned Tool-Calling Agent
A Small Model Taught to Use 40+ Tools
A LangGraph ReAct agent over a custom MCP backend exposing 40+ tools, with the decision model replaced by a fine-tuned Gemma 3 4B. Successful multi-turn tool trajectories from production were filtered and reformatted into an SFT dataset, then used to train Gemma 3 4B with QLoRA. The result is lower cost and latency, with far more reliable tool-call formatting.
Key Features
Fine-Tuned Decision Model
Gemma 3 4B trained with SFT + QLoRA on the agent’s own successful tool trajectories, replacing an expensive prompt-steered API model.
40+ Tool MCP Backend
A custom MCP server exposes the existing backend as 40+ tools the ReAct agent calls during multi-step tasks.
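To make the agent-and-tools loop concrete, here is a minimal sketch in plain Python. It uses neither the MCP SDK nor LangGraph: the `tool` decorator, the two example tools, and the `scripted_decide` function are all hypothetical stand-ins, with `scripted_decide` playing the role of the decision model.

```python
TOOLS = {}


def tool(fn):
    """Register a plain function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order(order_id):
    # Hypothetical backend lookup, hard-coded for the sketch.
    return {"order_id": order_id, "status": "shipped"}


@tool
def notify_customer(order_id, message):
    # Hypothetical side-effecting tool; here it only echoes its input.
    return {"order_id": order_id, "sent": True, "message": message}


def run_agent(decide, task, max_steps=5):
    """Alternate model decisions and tool calls until the model answers."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = decide(history)
        if action["type"] == "final":
            return action["answer"], history
        result = TOOLS[action["name"]](**action["arguments"])
        history.append(("observation", action["name"], result))
    raise RuntimeError("step budget exhausted without a final answer")


def scripted_decide(history):
    """Stand-in for the decision model: a fixed two-tool plan."""
    observations = [h for h in history if h[0] == "observation"]
    if len(observations) == 0:
        return {"type": "tool", "name": "get_order",
                "arguments": {"order_id": 42}}
    if len(observations) == 1:
        status = observations[0][2]["status"]
        return {"type": "tool", "name": "notify_customer",
                "arguments": {"order_id": 42,
                              "message": f"Your order is {status}."}}
    return {"type": "final", "answer": "Customer notified."}
```

Calling `run_agent(scripted_decide, "Tell the customer about order 42")` walks through both tools before returning the final answer. In the real system, the decision step would be the fine-tuned model and the registry would be the MCP server's tool list.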
Trajectory-Mined Dataset
Production multi-turn runs are filtered to verified successes, deduplicated, and formatted with the model’s own chat template, using multi-turn response masking so that only the assistant turns contribute to the training loss.
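The filtering, deduplication, and masking steps might look roughly like the sketch below. It is illustrative only: the `success` flag, the message layout, and the whitespace `toy_encode` tokenizer are assumptions, and a real pipeline would apply the model's chat template and tokenizer instead.

```python
import hashlib
import json

IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def dedupe_successes(trajectories):
    """Keep verified-successful runs, dropping exact duplicate conversations."""
    seen, kept = set(), []
    for traj in trajectories:
        if not traj.get("success"):  # hypothetical flag set by a verifier
            continue
        key = hashlib.sha256(
            json.dumps(traj["messages"], sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(traj)
    return kept


def mask_non_assistant(messages, encode):
    """Build (input_ids, labels) so only assistant turns carry loss."""
    input_ids, labels = [], []
    for msg in messages:
        ids = encode(msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)  # supervised tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out
    return input_ids, labels


def toy_encode(text):
    """Stand-in tokenizer: one fake id per whitespace-separated token."""
    return [len(tok) for tok in text.split()]
```

Masking every non-assistant token means user messages and tool observations still condition the model but are never trained as targets.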
Reliable Tool Calling
Training on the exact tool-call format makes the model emit valid name + JSON-args calls consistently, so the harness parses them reliably.
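As an illustration of what "parses reliably" means on the harness side, the sketch below validates a single tool call. The wire format (a JSON object with `name` and `arguments` keys) and the `TOOL_SPECS` registry are assumptions made for the example, not the project's actual format.

```python
import json

# Hypothetical registry: tool name -> required argument names.
TOOL_SPECS = {
    "get_order": {"order_id"},
    "search_customers": {"query"},
}


def parse_tool_call(raw):
    """Return (name, args) for a well-formed call, or raise ValueError."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOL_SPECS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = TOOL_SPECS[name] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return name, args
```

A check like this can also be used to score model outputs offline: the share of generations that pass it is one way to measure formatting reliability.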
API Usage
from peft import LoraConfig
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="bfloat16",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=[  # attention AND MLP
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
# optimizer: paged_adamw_8bit · base stays frozen

Impact
Replaced a large prompt-steered model with a small fine-tuned one: lower cost and latency, and far more reliable tool-call formatting across multi-step tasks.
40+ MCP Tools
4B Gemma Parameters
QLoRA Single-GPU Training