In this tutorial we’ll show how to add vision capabilities to the on-chain LLM call (currently available with OpenAI’s gpt-4o and gpt-4-turbo). Simply put, this functionality enables the LLM to recognize what is depicted in an image: for example, given an image and the question “What’s in this picture?”, the model can return a description of the image.

Prerequisites

This tutorial is written as a continuation of the Calling an LLM tutorial. We recommend going through that one first, as here we’re only changing a few functions.

  • Make sure your compiler configuration has viaIR set to true. See our config example here.
  • A Galadriel devnet account. For more information on setting up a wallet, visit Setting Up A Wallet.
  • Some devnet tokens. Get your free devnet tokens from the Faucet.
  • Working code from Calling an LLM.

Steps to modify the ChatGPT contract

  1. Change the contract name to OpenAiChatGptVision.

  2. Remove the IOracle interface from the code.

interface IOracle {
    function createLlmCall(
        uint promptId
    ) external returns (uint);
}
  3. Replace it with an import statement.
import "./interfaces/IOracle.sol";
  4. If this produces an error, it’s because the compiler cannot find the IOracle.sol file being imported.

    Next to your main contract file, create an /interfaces folder and add an IOracle.sol file to it.

    Into the newly created IOracle.sol file, copy & paste the code from here. For reference, a minimal sketch of the relevant parts is shown after the file tree below.

    Depending on your setup, the file tree should look something like this:

    /contracts
        /interfaces
            IOracle.sol
        OpenAiChatGptVision.sol
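
    For reference, the parts of IOracle.sol that this tutorial relies on look roughly like the sketch below. Treat it purely as an illustration: the field types are approximate and the real file defines additional structs and functions, so always copy the canonical file from the link above.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

// Illustrative sketch only: copy the canonical IOracle.sol from the link above.
// The real file contains more definitions and the authoritative field types.
interface IOracle {
    // One part of a message: either plain text or an image URL.
    struct Content {
        string contentType; // "text" or "image_url"
        string value;
    }

    // A chat message with a role ("user", "assistant") and its content parts.
    struct Message {
        string role;
        Content[] content;
    }

    // Configuration for an OpenAI call; the fields mirror the values assigned
    // in the constructor later in this tutorial (types here are approximate).
    struct OpenAiRequest {
        string model;
        int8 frequencyPenalty;
        string logitBias;
        uint32 maxTokens;
        int8 presencePenalty;
        string responseFormat;
        uint seed;
        string stop;
        uint temperature;
        uint8 topP;
        string tools;
        string toolChoice;
        string user;
    }

    // The oracle's answer; this tutorial only reads `content`, the real
    // struct carries additional metadata (token counts, model, etc.).
    struct OpenAiResponse {
        string id;
        string content;
    }

    function createOpenAiLlmCall(
        uint promptId,
        OpenAiRequest memory request
    ) external returns (uint);
}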
    
  5. Delete the Message struct from the code.

struct Message {
    string role;
    string content;
}
  6. Edit the messages field in the ChatRun struct so that it uses IOracle.Message.
struct ChatRun {
    address owner;
    IOracle.Message[] messages;
    uint messagesCount;
}
  7. Edit the constructor. Declare an IOracle.OpenAiRequest variable named config above the constructor and assign the OpenAI request configuration to it in the constructor, as shown below.
IOracle.OpenAiRequest private config;

constructor(address initialOracleAddress) {
    owner = msg.sender;
    oracleAddress = initialOracleAddress;
    chatRunsCount = 0;

    config = IOracle.OpenAiRequest({
        model : "gpt-4-turbo",
        frequencyPenalty : 21, // > 20 for null
        logitBias : "", // empty str for null
        maxTokens : 1000, // 0 for null
        presencePenalty : 21, // > 20 for null
        responseFormat : "{\"type\":\"text\"}",
        seed : 0, // null
        stop : "", // null
        temperature : 10, // Example temperature (scaled up, 10 means 1.0), > 20 means null
        topP : 101, // Percentage 0-100, > 100 means null
        tools : "",
        toolChoice : "", // "none" or "auto"
        user : "" // null
    });
}
  8. Swap out the whole startChat function for the following. This starts the LLM call with a list of image URLs and a text input. In addition to https and base64-encoded image data URLs, the Oracle also supports ipfs:// URLs.
function startChat(string memory message, string[] memory imageUrls) public returns (uint i) {
    ChatRun storage run = chatRuns[chatRunsCount];
    run.owner = msg.sender;
    IOracle.Message memory newMessage = IOracle.Message({
        role: "user",
        content: new IOracle.Content[](imageUrls.length + 1)
    });
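    // content[0] carries the text prompt; the loop below appends one image_url entry per provided URL.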
    newMessage.content[0] = IOracle.Content({
        contentType: "text",
        value: message
    });
    for (uint u = 0; u < imageUrls.length; u++) {
        newMessage.content[u + 1] = IOracle.Content({
            contentType: "image_url",
            value: imageUrls[u]
        });
    }
    run.messages.push(newMessage);
    run.messagesCount = 1;
    uint currentId = chatRunsCount;
    chatRunsCount = chatRunsCount + 1;
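    // Ask the oracle to run the OpenAI vision call; the answer arrives asynchronously via onOracleOpenAiLlmResponse.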
    IOracle(oracleAddress).createOpenAiLlmCall(currentId, config);
    emit ChatCreated(msg.sender, currentId);
    return currentId;
}
  9. Swap out the whole onOracleLlmResponse function for the onOracleOpenAiLlmResponse function below.
function onOracleOpenAiLlmResponse(
    uint runId,
    IOracle.OpenAiResponse memory response,
    string memory errorMessage
) public onlyOracle {
    ChatRun storage run = chatRuns[runId];
    require(
        keccak256(abi.encodePacked(run.messages[run.messagesCount - 1].role)) == keccak256(abi.encodePacked("user")),
        "No message to respond to"
    );

    if (!compareStrings(errorMessage, "")) {
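        // The oracle reported an error: store it as the assistant's reply so the run is not left waiting on a user message.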
        IOracle.Message memory newMessage = IOracle.Message({
            role: "assistant",
            content: new IOracle.Content[](1)
        });
        newMessage.content[0].contentType = "text";
        newMessage.content[0].value = errorMessage;
        run.messages.push(newMessage);
        run.messagesCount++;
    } else {
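        // Normal case: store the model's text answer as the assistant's reply.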
        IOracle.Message memory newMessage = IOracle.Message({
            role: "assistant",
            content: new IOracle.Content[](1)
        });
        newMessage.content[0].contentType = "text";
        newMessage.content[0].value = response.content;
        run.messages.push(newMessage);
        run.messagesCount++;
    }
}
  10. Swap out the addMessage function with the code below.
function addMessage(string memory message, uint runId) public {
    ChatRun storage run = chatRuns[runId];
    require(
        keccak256(abi.encodePacked(run.messages[run.messagesCount - 1].role)) == keccak256(abi.encodePacked("assistant")),
        "No response to previous message"
    );
    require(
        run.owner == msg.sender, "Only chat owner can add messages"
    );

    IOracle.Message memory newMessage = IOracle.Message({
        role: "user",
        content: new IOracle.Content[](1)
    });
    newMessage.content[0].contentType = "text";
    newMessage.content[0].value = message;
    run.messages.push(newMessage);
    run.messagesCount++;

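    // Request a new completion from the oracle for the follow-up message.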
    IOracle(oracleAddress).createOpenAiLlmCall(runId, config);
}
  11. Let’s also refactor the getMessageHistory function to the following.
function getMessageHistory(uint chatId) public view returns (IOracle.Message[] memory) {
    return chatRuns[chatId].messages;
}
  12. Finally, add one helper function to the end.
function compareStrings(string memory a, string memory b) private pure returns (bool) {
    return (keccak256(abi.encodePacked((a))) == keccak256(abi.encodePacked((b))));
}

Putting it all together

That’s it! If you now deploy the contract to the Galadriel Devnet, you can start sending image URLs together with text as input to the on-chain LLM.

You can find the full OpenAiChatGptVision contract file here. The code in that contract is ordered slightly differently.
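
If you want to exercise the deployed contract from other on-chain code, for example from a small companion contract or a Foundry test, a minimal caller could look like the sketch below. VisionChatCaller, describeImage and latestMessageText are illustrative names, not part of the tutorial code; the sketch only relies on the startChat and getMessageHistory functions defined above.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.9;

import "./interfaces/IOracle.sol";
import "./OpenAiChatGptVision.sol";

// Hypothetical helper contract (not part of the tutorial code) showing one way
// to drive the vision chat contract from other on-chain code.
contract VisionChatCaller {
    OpenAiChatGptVision public chat;

    constructor(address chatAddress) {
        chat = OpenAiChatGptVision(chatAddress);
    }

    // Starts a run that asks the model to describe a single image URL.
    function describeImage(string memory imageUrl) external returns (uint) {
        string[] memory images = new string[](1);
        images[0] = imageUrl;
        return chat.startChat("What is depicted in this image?", images);
    }

    // Reads the text of the latest message in a run.
    function latestMessageText(uint runId) external view returns (string memory) {
        IOracle.Message[] memory history = chat.getMessageHistory(runId);
        IOracle.Message memory last = history[history.length - 1];
        // In this tutorial's message layout the text always lives in content[0].value.
        return last.content[0].value;
    }
}

Because the oracle answers asynchronously, latestMessageText returns the original prompt until the oracle has submitted its response in a later transaction.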

What’s Next?

Congratulations on deploying your on-chain ChatGPT! Explore further:

  • Implement a more advanced chatbot by adding retrieval-augmented generation to your contract.
  • Dive deeper into the Galadriel documentation, particularly the How It Works section, to understand the underlying technology.
  • Experiment with different LLMs, e.g. Groq-hosted open-source LLMs, or take control over the nuances of text generation: see the Solidity reference and the example contract.
  • Explore other Use Cases to get inspired for your next project.

Happy building!