1- Dataset Generation
First, I used the GPT-3.5-Turbo model for the dataset generation process, with the following problem description and example:
```python
from ladder.schema import Example

problem_description = """
Given a curve / line equation and a specific point, find the nearest point on the curve
to the given point. Provide your final answer directly as a tuple (x, y) without
any additional explanation or commentary.
NOTES:
1. Always mention the point as (x, y) in your question.
2. Don't use wording like "origin" or "point on the x axis"; always specify the x and y of the point.
3. The curve can be of any degree (1, 2, 3, ...); aim for diverse output.
4. Don't mention any other variables in the given equation like a, t, ...; only y in terms of x, e.g. y = x**2 + 2.
5. Make sure the maximum power you generate in an equation is 5.
IMPORTANT: Make sure the represented curve or line is a single equation; don't mention multiple equations.
"""

examples = [
    Example(
        question="Find the point on the curve defined by the equation y = 2x**2 + 5*x - 3 that is closest to the point (6, -1). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.",
        answer="(2,1)",
    )
]
```
I then configured LADDER to fine-tune Meta Llama 3.1 8B:
```python
import os

config = LadderConfig(
    # 1- Main LLM for dataset generation and the LADDER implementation
    instructor_llm=OpenAIModel(model="gpt-3.5-turbo", api_key=os.environ.get("OPENAI_API_KEY")),
    # instructor_llm=TogetherAIModel(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", api_key=os.environ.get("TOGETHER_API_KEY")),

    # 2- LLM runner used to run the same model that will be fine-tuned
    finetune_llm_runner=GroqModel(model="llama-3.1-8b-instant", api_key=os.environ.get("GROQ_API_KEY")),

    # 3- Target LLM to fine-tune (Hugging Face Hub compatible)
    target_finetune_llm_id="meta-llama/Llama-3.1-8B-Instruct",

    # 4- Number of epochs to fine-tune
    num_train_epochs=1,
)
```
Sample Generated Dataset
```json
{
  "question": "Find the point on the curve defined by the equation y = 3x**3 - 2x**2 + 4*x - 1 that is closest to the point (2, -3). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.",
  "correct_answer": [1.65, 7.277],
  "small_llm_answer": "(-2/3, 17/3)",
  "sub_problems": [
    {
      "question": "Find the point on the curve defined by the equation y = x**2 - 4x + 3 that is closest to the point (1, 2). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.",
      "correct_answer": [2, 1],
      "small_llm_answer": [-2.0, 3.0],
      "transformations_applid": [],
      "difficulty_level": null,
      "is_solvable": false
    },
    {
      "question": "Find the point on the curve defined by the equation y = x**2 - 4x + 3 that is closest to the point (3, 1). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.",
      "correct_answer": [2, 1],
      "small_llm_answer": [-2.0, 3.0],
      "transformations_applid": [],
      "difficulty_level": null,
      "is_solvable": false
    },
    {
      "question": "Find the point on the curve defined by the equation y = 2x - 1 that is closest to the point (3, 1). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.",
      "correct_answer": [2, 3],
      "small_llm_answer": [1.0, 1.0],
      "transformations_applid": [],
      "difficulty_level": null,
      "is_solvable": false
    }
  ],
  "difficulty_level": 0.8,
  "is_solvable": false
}
```
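The full generated set can be inspected with a few lines of Python. The file path below is an assumption (whatever location the generation step wrote to); the field names match the record format shown above:

```python
import json

# Hypothetical path to the generated dataset
with open("ladder_dataset.json") as f:
    problems = json.load(f)

print(len(problems), "problems")
print(sum(p["is_solvable"] for p in problems), "already solvable by the small LLM")
print(sum(len(p["sub_problems"]) for p in problems), "generated sub-problems")
```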
2- Custom Verification Engine
The custom verification engine sometimes failed to extract or evaluate the equation, so a more robust parsing method may be needed.
```python
from ladder.engines import VerificationEngine, LLMEngine
from typing_extensions import Union
from loguru import logger

try:
    from scipy.optimize import minimize_scalar
    import numpy as np
    import sympy as sp
except ImportError:
    raise ImportError("NearestPointVerificationEngine requires scipy, numpy, and sympy to be installed. Run `pip install scipy numpy sympy`")

import re


class NearestPointVerificationEngine(VerificationEngine):
    """Custom verification engine that scores how close a given answer is to the true nearest point on a curve/line."""

    def verify(self, problem_question: str, given_answer: str | tuple):
        # Get the correct answer
        closest_point = self.get_correct_answer(problem_question)
        if not closest_point:
            return 0

        # Calculate the difference between the answer and the closest point
        answer_tuple = self._ensure_tuple(given_answer)
        if not answer_tuple:
            return 0

        difference = self._calculate_difference(answer_tuple, closest_point)
        logger.success(f"difference: {difference}")
        return max(0, 1 - difference)

    def get_correct_answer(self, problem_question: str):
        """Get the correct answer for the problem question."""
        equation_str, reference_point = self._parse_problem_question(problem_question)
        if not equation_str or not reference_point:
            return None
        return self._closest_point_on_curve_tool(equation_str, reference_point)

    def _parse_problem_question(self, question_text: str) -> tuple[str, tuple]:
        """Extract the equation and reference point from the problem question text."""
        # A fairly robust equation pattern that handles complete mathematical expressions:
        # look for "y =" followed by a mathematical expression until hitting specific stop words.
        # The stop words cover the phrasings typically produced by the generation prompt.
        equation_pattern = r'y\s*=\s*([x\d\s\*\+\-\/\.\^()]+(?:\s*[\+\-]\s*[x\d\s\*\+\-\/\.\^()]+)*)\s*(?=\s+(?:that|which|where|find|give|at|near|closest|from)\b|$)'
        equation_match = re.search(equation_pattern, question_text, re.IGNORECASE)

        if not equation_match:
            fallback_pattern = r'y\s*=\s*([^()]*?)(?=\s*\([^)]*\d[^)]*\)|\s+that\b|\s+which\b|\s+where\b)'
            equation_match = re.search(fallback_pattern, question_text, re.IGNORECASE)

        if not equation_match:
            return None, None

        equation_str = equation_match.group(1).strip()
        equation_str = self._normalize_equation(equation_str)

        # If the equation was cut off at an operator, retry with an extended pattern
        if equation_str.endswith(('+', '-', '*', '/', '^')):
            extended_pattern = r'y\s*=\s*([x\d\s\*\+\-\/\.\^()]+(?:\s*[\+\-]\s*[x\d\s\*\+\-\/\.\^()]+)*(?:\s*[\+\-]\s*\d+)?)'
            extended_match = re.search(extended_pattern, question_text, re.IGNORECASE)
            if extended_match:
                equation_str = self._normalize_equation(extended_match.group(1).strip())

        point_pattern = r'\(\s*([+-]?\d*\.?\d+)\s*,\s*([+-]?\d*\.?\d+)\s*\)'
        point_match = re.search(point_pattern, question_text)
        if not point_match:
            if "origin" in question_text.lower():
                return equation_str, (0, 0)
            return equation_str, None

        x_coord = float(point_match.group(1))
        y_coord = float(point_match.group(2))
        reference_point = (x_coord, y_coord)
        return equation_str, reference_point

    def _normalize_equation(self, equation_str: str) -> str:
        """Normalize the equation string for SymPy parsing."""
        equation_str = re.sub(r'\s+', '', equation_str)
        equation_str = equation_str.replace('^', '**')
        equation_str = re.sub(r'(\d)([x])', r'\1*\2', equation_str)   # 2x  -> 2*x
        equation_str = re.sub(r'([x])(\d)', r'\1*\2', equation_str)   # x2  -> x*2
        equation_str = re.sub(r'(\d)([x])', r'\1*\2', equation_str)   # repeated to catch leftovers
        equation_str = re.sub(r'(\d)\(', r'\1*(', equation_str)       # 2(  -> 2*(
        equation_str = re.sub(r'\)(\d)', r')*\1', equation_str)       # )2  -> )*2
        equation_str = re.sub(r'\)([x])', r')*\1', equation_str)      # )x  -> )*x
        equation_str = re.sub(r'([x])\(', r'\1*(', equation_str)      # x(  -> x*(
        return equation_str

    def _ensure_tuple(self, answer: str | list | tuple) -> tuple:
        """
        Ensure the answer is in tuple format.
        Handles various input formats: tuple, list, string representation, etc.
        """
        if not answer:
            return None
        try:
            if isinstance(answer, tuple):
                return answer
            elif isinstance(answer, list) and len(answer) == 2:
                return tuple(answer)
            elif isinstance(answer, str):
                # Try to parse a string representation of a tuple/coordinates.
                # Handles formats like "(1.5, 2.3)", "1.5, 2.3", "[1.5, 2.3]"
                clean_str = answer.strip().strip('()[]')
                coords = [float(x.strip()) for x in clean_str.split(',')]
                if len(coords) == 2:
                    return tuple(coords)
                else:
                    logger.warning(f"Answer string does not contain exactly 2 coordinates: {answer}")
                    return None
            else:
                logger.warning(f"Answer format not supported: {type(answer)} - {answer}")
                return None
        except Exception as e:
            logger.warning(f"Failed to parse answer {answer}: {e}")
            return None

    def _calculate_difference(self, answer_tuple: tuple, closest_point: tuple):
        """Calculate the Euclidean distance between the answer tuple and the closest point."""
        x1, y1 = answer_tuple
        x2, y2 = closest_point
        # Euclidean distance
        difference = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
        return difference

    def _closest_point_on_curve_tool(self, equation_str: str, point: tuple) -> tuple:
        """
        Given a curve equation y = f(x) as a string and a point (x0, y0),
        return the closest point (x, y) on the curve to that point.

        Args:
            equation_str (str): Curve equation in terms of x, e.g. "-0.5*x**3 + 4*x**2 + 2*x + 5"
            point (tuple): Coordinates (x0, y0) of the reference point

        Note: this can be seen as the reverse of the verification in the LADDER paper
        (there the task is integration; here the verifier effectively differentiates, by
        numerically minimizing the distance function).

        Returns:
            tuple: Closest point (x, y) on the curve
        """
        try:
            x = sp.symbols('x')
            expr_y = sp.sympify(equation_str)
            x0, y0 = point
            f_y = sp.lambdify(x, expr_y, 'numpy')

            def dist_sq(x_val):
                # Squared distance from (x_val, f(x_val)) to the reference point
                y_val = f_y(x_val)
                return (x_val - x0)**2 + (y_val - y0)**2

            result = minimize_scalar(dist_sq)
            x_closest = result.x
            y_closest = f_y(x_closest)
            return (float(x_closest), float(y_closest))
        except Exception as e:
            logger.warning(f"Error in closest_point_on_curve_tool: {e}")
            return None
```
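As a sanity check of the core minimization (independent of the class above), the same SymPy + SciPy recipe can be run directly. For y = x**2 and the point (3, 0), the true nearest point is (1, 1), since the only real root of 2(x - 3) + 4x^3 = 0 is x = 1:

```python
import sympy as sp
from scipy.optimize import minimize_scalar

x = sp.symbols('x')
f_y = sp.lambdify(x, sp.sympify("x**2"), 'numpy')
x0, y0 = 3, 0

# Minimize the squared distance from (x, f(x)) to (x0, y0)
result = minimize_scalar(lambda xv: (xv - x0)**2 + (f_y(xv) - y0)**2)
print(round(result.x, 3), round(float(f_y(result.x)), 3))  # -> 1.0 1.0
```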
3- Reward Functions
```python
import re


def composite_reward_func(prompts, completions, answer, **kwargs) -> list[float]:
    """
    Returns a reward for each completion:
    - up to 0.9 depending on how numerically close the answer is (within a threshold)
    - 0.1 if the answer is wrapped in <answer>...</answer> tags
    - 1.0 if the answer is both exact and correctly formatted
    """
    def extract_tuple_from_answer(text: str):
        if not isinstance(text, str):
            return None
        match = re.search(r'\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)', text)
        if match:
            return float(match.group(1)), float(match.group(2))
        return None

    threshold = 0.3
    rewards = []
    answer_pattern = r"<answer>.*?</answer>"

    for completion, correct in zip(completions, answer):
        if isinstance(completion, list) and isinstance(completion[0], dict) and 'content' in completion[0]:
            response = completion[0]['content']
        else:
            response = completion

        # Format check (10%)
        has_answer_tag = bool(re.search(answer_pattern, response, re.DOTALL))
        format_reward = 0.1 if has_answer_tag else 0.0

        # Accuracy check (90%), scaled by the coordinate differences
        pred_tuple = extract_tuple_from_answer(response)
        gold_tuple = extract_tuple_from_answer(correct)
        print(pred_tuple, gold_tuple)

        if pred_tuple is not None and gold_tuple is not None:
            diff_x = abs(pred_tuple[0] - gold_tuple[0])
            diff_y = abs(pred_tuple[1] - gold_tuple[1])
            print(f"diff_x: {diff_x}")
            print(f"diff_y: {diff_y}")

            # Scale the reward: diff of 0 -> max reward, diff >= threshold -> zero reward
            def scaled_component(diff, threshold):
                return max(0.0, 1.0 - (diff / threshold))  # linear drop-off

            x_reward = scaled_component(diff_x, threshold)
            y_reward = scaled_component(diff_y, threshold)
            average_accuracy = (x_reward + y_reward) / 2
            accuracy_reward = 0.9 * average_accuracy
        else:
            accuracy_reward = 0.0

        total_reward = format_reward + accuracy_reward
        rewards.append(total_reward)

    return rewards
```
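For illustration (not part of the original run), calling the function on a few hand-written completions shows how the scaling behaves; the values in the comment are approximate:

```python
completions = [
    "<answer>(1.0, 1.0)</answer>",  # exact and correctly formatted
    "The point is (1.1, 1.0)",      # slightly off, no <answer> tags
    "I am not sure.",               # no tuple at all
]
answers = ["(1.0, 1.0)"] * 3

print(composite_reward_func(prompts=None, completions=completions, answer=answers))
# -> roughly [1.0, 0.75, 0.0]
```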
4- Training
I used a LoRA adapter to keep fine-tuning tractable. Before training, we converted the LADDER dataset into a reasoning-style chat format: the sub-problems serve as internal reasoning steps and are fed to the model as chat messages (a sketch of this conversion follows the example below). This is the final demo prompt used during fine-tuning:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful math research assistant that solves complex problems by breaking them down into easier sub-problems and reasoning step by step. When giving your final answer, always enclose it in <answer>...</answer> tags."
    },
    {
      "role": "user",
      "content": "Find the point on the curve defined by the equation y = 3x**3 - 2x**2 + 4*x - 1 that is closest to the point (2, -3). Provide the coordinates of this nearest point as a tuple (x, y) without any additional explanation or commentary.\n\nPlease give your answer in the form: <answer>(x, y)</answer>"
    }
  ],
  "correct_answer": [-0.32736493947155765, -2.6290443096884073],
  "reasoning": " Please follow through each step carefully.\n\nLet's begin:\n\nWe are given:\n- A **curve**: $ y(x) = 3x^3 - 2x^2 + 4x - 1 $\n- An **external point**: $ P_0 = (2, -3) $\n\n### Goal\nFind the point(s) $(x,y)$ lying *on* the curve such that its Euclidean distance from $P_0$ is minimized.\n\n---\n\n## Step 1: Distance Function Between Point on Curve & External Point\n\nThe general form of a point on the curve is:\n$$\n(x,\\; f(x)) \\quad\\text{where } f(x)=3x^3 - 2x^2 + 4x - 1\n$$\n\nSo we define the square of the Euclidean distance between $(x,f(x))$ and $P_0=(2,-3)$ as follows:\n$$\nD(x)^2 = (x - 2)^2 + [f(x)+3]^2 \\\\\n= (x - 2)^2 + [(3x^3 - 2x^2 + 4x - 1) + 3]^2 \\\\\n= (x",
  "small_llm_answer": "(-2/3, 17/3)"
}
```
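A minimal sketch of that conversion (the exact helper is not shown here, so the field handling is an assumption based on the record format above; in the actual data the `reasoning` field holds a generated step-by-step derivation rather than a simple concatenation):

```python
SYSTEM_PROMPT = (
    "You are a helpful math research assistant that solves complex problems by breaking them "
    "down into easier sub-problems and reasoning step by step. When giving your final answer, "
    "always enclose it in <answer>...</answer> tags."
)

def to_chat_format(record: dict) -> dict:
    """Turn one LADDER record into the chat format above, using its sub-problems
    as a stand-in for the internal reasoning trace."""
    reasoning = "\n".join(
        f"Sub-problem {i + 1}: {sub['question']} -> {sub['correct_answer']}"
        for i, sub in enumerate(record.get("sub_problems", []))
    )
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": record["question"]
                + "\n\nPlease give your answer in the form: <answer>(x, y)</answer>"},
        ],
        "correct_answer": record["correct_answer"],
        "reasoning": reasoning,
        "small_llm_answer": record.get("small_llm_answer"),
    }
```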
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="Llama-3.1-8B-ladder",
    learning_rate=1e-5,
    remove_unused_columns=False,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    fp16=True,
    max_completion_length=64,
    num_generations=4,
    max_prompt_length=512,
    report_to=["tensorboard"],
    logging_steps=10,
    push_to_hub=True,
    save_strategy="steps",
    save_steps=10,
)
```
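For reference, a minimal sketch of how this config, a LoRA adapter, and the reward function from section 3 could be wired together with TRL's `GRPOTrainer`. The dataset path, column mapping, and LoRA hyperparameters here are assumptions, not the exact values used in the run:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOTrainer

# Hypothetical file name; GRPO expects a "prompt" column, and composite_reward_func
# expects an "answer" column, so the LADDER chat records are mapped into that shape.
dataset = load_dataset("json", data_files="ladder_train.jsonl", split="train")
dataset = dataset.map(
    lambda ex: {"prompt": ex["messages"], "answer": str(tuple(ex["correct_answer"]))}
)

peft_config = LoraConfig(  # assumed LoRA hyperparameters
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    reward_funcs=composite_reward_func,  # defined in section 3
    args=training_args,                  # the GRPOConfig above
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```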
5- Evaluation
The model achieved 33.85% accuracy over 130 problems before fine-tuning, and after fine-tuning :D
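The evaluation code is not shown here; one way such an accuracy could be computed is sketched below (the tolerance and the source of the prediction/gold pairs are assumptions):

```python
import numpy as np

def is_correct(pred: tuple, gold: tuple, tol: float = 0.3) -> bool:
    # Count an answer as correct if it lies within `tol` (Euclidean) of the true nearest point
    return np.hypot(pred[0] - gold[0], pred[1] - gold[1]) <= tol

# eval_pairs: list of (predicted_tuple, correct_tuple) over the 130 held-out problems
# accuracy = sum(is_correct(p, g) for p, g in eval_pairs) / len(eval_pairs)
```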
6- Notes
- We need to generate a much larger dataset to give a final judgment on whether LADDER is really effective. In this experiment we used 700 problems; we likely need at least 5000, similar to the paper.
- We used only Llama 3.1 8B; we might try other models such as Qwen, DeepSeek, and Mistral.
- According to the paper, the choice of hyperparameters and reward functions strongly affects the final results, and my experiments confirm this (see the variation sketched below).
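One variation worth trying on the reward side (an assumption, not something that was run): replace the linear drop-off in `composite_reward_func` with an exponential decay, which keeps a nonzero gradient signal even for answers slightly beyond the 0.3 threshold:

```python
import math

def scaled_component_exp(diff: float, scale: float = 0.3) -> float:
    # Exponential decay: 1.0 at diff = 0, ~0.37 at diff = scale, never exactly zero
    return math.exp(-diff / scale)
```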