llm_engin.py
Custom Reward Functions
🔁 Training Loop Overview
The Ladder loop consists of the following key stages:
- Problem Decomposition: The task is recursively broken down into simpler subproblems by the Recursive Tree Engine.
- Solution Generation: The smaller LLM attempts to solve each subproblem.
- Verification & Evaluation: A larger LLM or a domain-specific Verification Engine checks the quality of each solution.
- Reward Assignment: Rewards are computed from the correctness and usefulness of the answers, and they guide learning via GRPO (Group Relative Policy Optimization).
- Model Update: The smaller LLM is fine-tuned using the collected rewards and verified data points.
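To make the Reward Assignment stage more concrete, here is a minimal sketch of the group-relative normalization that GRPO-style training typically applies to rewards: several solutions are sampled for the same subproblem, and each reward is compared against the group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` parameter, and the binary 0/1 rewards are assumptions made for illustration; they are not taken from this project's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled solutions.

    Hypothetical helper: each advantage is the reward's distance from the
    group mean, scaled by the group's (population) standard deviation.
    `eps` avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed setup: four sampled solutions to one subproblem, scored 1.0 if
# verification passed and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # roughly [1.0, -1.0, -1.0, 1.0]
```

Verified solutions receive a positive advantage and failed ones a negative advantage, so the update step can reinforce the former without a separately trained value model. When every solution in a group gets the same reward, all advantages are zero and that group contributes no learning signal.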
🔍 Role of Engines
Ladder coordinates multiple modular Engines (or agents), each responsible for a specific part of the process:
- Recursive Tree Engine: Breaks down problems into subproblems
- Verification Engine: Validates output correctness
- Difficulty Engine: Modulates complexity
- LLM Engine: Interfaces with both large and small models
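The division of labor between engines can be sketched with a toy task (adding a list of integers). Everything below is a hypothetical illustration: the class names mirror the engine names above, but the method names (`decompose`, `solve`, `verify`), the `Problem` type, and the stub solver are assumptions and do not reflect the project's actual interfaces. The Difficulty Engine and the model update are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Problem:
    numbers: Tuple[int, ...]  # toy task: compute the sum of these numbers


class RecursiveTreeEngine:
    """Splits a problem in half until each piece has at most max_size numbers."""

    def decompose(self, problem: Problem, max_size: int = 2) -> List[Problem]:
        if len(problem.numbers) <= max_size:
            return [problem]
        mid = len(problem.numbers) // 2
        left = Problem(problem.numbers[:mid])
        right = Problem(problem.numbers[mid:])
        return self.decompose(left, max_size) + self.decompose(right, max_size)


class LLMEngine:
    """Wraps a model call; here the 'model' is just a Python callable."""

    def __init__(self, solve_fn: Callable[[Problem], int]):
        self.solve_fn = solve_fn

    def solve(self, problem: Problem) -> int:
        return self.solve_fn(problem)


class VerificationEngine:
    """Checks an answer against ground truth computed directly."""

    def verify(self, problem: Problem, answer: int) -> bool:
        return answer == sum(problem.numbers)


def collect_rewards(task, tree, llm, verifier):
    """Run decomposition, solving, and verification; return scored attempts."""
    results = []
    for sub in tree.decompose(task):
        answer = llm.solve(sub)
        reward = 1.0 if verifier.verify(sub, answer) else 0.0
        results.append((sub, answer, reward))
    return results


def flaky_solver(problem: Problem) -> int:
    # Stand-in for the smaller LLM: correct only on single-number problems.
    total = sum(problem.numbers)
    return total if len(problem.numbers) <= 1 else total + 1


task = Problem((1, 2, 3, 4, 5))
tree = RecursiveTreeEngine()
results = collect_rewards(task, tree, LLMEngine(flaky_solver), VerificationEngine())
for sub, answer, reward in results:
    print(sub.numbers, answer, reward)
```

Keeping each engine behind a small interface like this means a verifier or model backend can be swapped without touching the loop that ties them together.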