
How to use RLHF to train a model to generate code that compiles (Tutorial)

Step 1: The Interpreter

Find or write an interpreter for the code that you want your model to generate. This is not limited to programming code: any kind of interpreter that can run or validate the model's output will do. A minimal sketch of one follows the list below.

There are many different kinds of interpreters, but some common examples include:

  1. Programming language interpreters: These interpreters execute instructions written in a programming language, such as Python or C++.
  2. Command line interpreters: Also known as shell interpreters, these programs allow users to enter commands, execute programs, and manage their computer’s operating system from a command line interface (CLI).
  3. Database interpreters: These interpreters process and execute instructions written in a database query language, such as SQL.
  4. Markup language interpreters: These interpreters process and display instructions written in a markup language, such as HTML or XML.
  5. Regular expression interpreters: These interpreters process and evaluate strings of text according to a set of rules defined using regular expressions.
  6. Virtual machine interpreters: These interpreters execute instructions written in a virtual machine language, such as the Java Virtual Machine (JVM) language.
  7. Brainfuck interpreters: These interpreters execute instructions written in the esoteric programming language Brainfuck.
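
For example, if the model generates Python snippets, a toy interpreter might simply compile and execute each snippet and return the string "ERROR" when it fails. The sketch below is one assumed way to do this (including the assumed convention that a snippet stores its answer in a variable named result); it is not a production-grade sandbox:

def interpreter(code):
    """Toy interpreter for generated Python snippets.

    Returns the value bound to the variable `result` (a convention assumed
    here), or the string "ERROR" if the code fails to compile or raises.
    """
    try:
        compiled = compile(code, "<generated>", "exec")
        namespace = {}
        exec(compiled, namespace)
        return namespace.get("result", "ERROR")
    except Exception:
        return "ERROR"

Note that exec-ing untrusted model output is risky; in practice you would want to run it in a sandboxed subprocess with a timeout.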

Step 2: Reward Function

Next, write a reward function that runs each generated sample through the interpreter and scores it. Here is an example:

def reward_fn(samples):
    reward_list = []
    for sample in samples:
        # Each sample is expected to look like "Output: <expected> Function: <code>".
        code = sample.split("Function:")[1].strip()
        expected_output = eval(sample.split("Output:")[1].split("Function:")[0].strip())
        interpreted_output = interpreter(code)
        if interpreted_output == "ERROR":
            # The code is unparsable or crashes: strong negative reward.
            reward_list.append(-1)
        elif interpreted_output == expected_output:
            # The code runs and produces the expected output: positive reward.
            reward_list.append(1)
        else:
            # The code runs but the output is wrong: mild negative reward.
            reward_list.append(-0.5)

    return reward_list

As the example shows, the code generated by the model is fed into the interpreter and the interpreted output is checked. If the code fails to parse or run, the model receives a reward of -1; if it runs but produces the wrong output, it receives -0.5; if the output matches the expected one, it receives a reward of 1.
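
To sanity-check the reward function before training, you can run it on a few hand-written samples. The examples below assume the toy interpreter sketched in Step 1 and its result convention:

# Hand-written samples in the "Output: ... Function: ..." format the reward function expects.
samples = [
    "Output: 4 Function: result = 2 + 2",   # runs and matches    -> reward 1
    "Output: 4 Function: result = 2 + 3",   # runs but is wrong   -> reward -0.5
    "Output: 4 Function: result = 2 +",     # does not even parse -> reward -1
]

print(reward_fn(samples))  # [1, -0.5, -1]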

Refer to my previous post to learn the basics of RLHF training.
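
If you use an RLHF library such as CarperAI's trlx (this post does not prescribe a specific library, so treat the call below as an assumption about its API), the reward function can be passed straight to the training entry point:

import trlx

# Prompts contain the expected output; the model must fill in the function.
prompts = ["Output: 4 Function:", "Output: 9 Function:"]

trainer = trlx.train(
    "gpt2",               # hypothetical base model name
    reward_fn=reward_fn,  # the reward function defined above; newer trlx versions
                          # pass extra keyword arguments, so you may need
                          # def reward_fn(samples, **kwargs)
    prompts=prompts,
)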
