Multimodal: vision
In this tutorial we’ll show how to add vision capabilities to the on-chain LLM call (currently available with OpenAI’s gpt-4o and gpt-4-turbo). Simply put, this functionality enables the LLM to recognize what is depicted in an image. For example, given an image and the question “What’s on here?”, the model can provide a description of the image.
Prerequisites
This tutorial is a continuation of the Calling an LLM tutorial. We recommend going through that tutorial first, as we’re simply changing a few functions here.
- Make sure your compiler configuration has viaIR set to true. See our config example here.
- A Galadriel devnet account. For more information on setting up a wallet, visit Setting Up A Wallet.
- Some devnet tokens. Get your free devnet tokens from the Faucet.
- Working code from Calling an LLM.
Steps to modify the ChatGPT contract
- Change the contract name to OpenAiChatGptVision.
- Remove the IOracle interface from the code.
interface IOracle {
    function createLlmCall(
        uint promptId
    ) external returns (uint);
}
- Replace it with an import statement.
import "./interfaces/IOracle.sol";
- If this produces an error, it’s because the compiler cannot find the IOracle.sol file that is being imported. Next to your main contract file, create a folder /interfaces and add an IOracle.sol file to it. Copy and paste the code from here into the newly created IOracle.sol file. Depending on where you work, the file tree should look something like this:
/contracts
    /interfaces
        IOracle.sol
    OpenAiChatGptVision.sol
- Delete the Message struct from the code.
struct Message {
    string role;
    string content;
}
- Edit the Message type in the ChatRun struct so that it refers to IOracle.Message.
struct ChatRun {
    address owner;
    IOracle.Message[] messages;
    uint messagesCount;
}
- Edit the constructor. Declare the config state variable above the constructor, and assign the OpenAiRequest configuration to it inside the constructor.
IOracle.OpenAiRequest private config;
constructor(address initialOracleAddress) {
    owner = msg.sender;
    oracleAddress = initialOracleAddress;
    chatRunsCount = 0;
    config = IOracle.OpenAiRequest({
        model : "gpt-4-turbo",
        frequencyPenalty : 21, // > 20 for null
        logitBias : "", // empty str for null
        maxTokens : 1000, // 0 for null
        presencePenalty : 21, // > 20 for null
        responseFormat : "{\"type\":\"text\"}",
        seed : 0, // null
        stop : "", // null
        temperature : 10, // Example temperature (scaled up, 10 means 1.0), > 20 means null
        topP : 101, // Percentage 0-100, > 100 means null
        tools : "",
        toolChoice : "", // "none" or "auto"
        user : "" // null
    });
}
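The comments in the constructor describe an integer encoding in which out-of-range values stand for "not set". A minimal sketch of that convention with named constants might look like the following. The library name OpenAiParamsSketch and all of its members are hypothetical and not part of the Galadriel contracts, and the uint256 types are an assumption: values may need casting to whatever field types IOracle.sol declares.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

/// @notice Hypothetical helper (not part of the Galadriel contracts) that names
/// the sentinel values described in the constructor comments.
library OpenAiParamsSketch {
    // Sentinels copied from the tutorial's comments: out-of-range means null.
    uint256 internal constant PENALTY_UNSET = 21;     // > 20 means null
    uint256 internal constant TEMPERATURE_UNSET = 21; // > 20 means null
    uint256 internal constant TOP_P_UNSET = 101;      // > 100 means null
    uint256 internal constant MAX_TOKENS_UNSET = 0;   // 0 means null

    /// @notice Encodes a temperature given in tenths, e.g. 10 encodes 1.0.
    /// @dev Assumes the usable range is 0..20, since values above 20 mean null.
    function encodeTemperature(uint256 tenths) internal pure returns (uint256) {
        require(tenths <= 20, "temperature out of range");
        return tenths;
    }

    /// @notice Encodes topP as a whole percentage in the range 0..100.
    function encodeTopP(uint256 percent) internal pure returns (uint256) {
        require(percent <= 100, "topP out of range");
        return percent;
    }
}
```

Named constants like these only make the configuration easier to read; the literal values in the constructor behave the same way.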
- Swap out the whole startChat function for the following. This starts the LLM call with a list of image URLs and text input. In addition to https and base64-encoded image data URLs, the Oracle supports ipfs:// URLs too.
function startChat(string memory message, string[] memory imageUrls) public returns (uint i) {
    ChatRun storage run = chatRuns[chatRunsCount];
    run.owner = msg.sender;
    IOracle.Message memory newMessage = IOracle.Message({
        role: "user",
        content: new IOracle.Content[](imageUrls.length + 1)
    });
    newMessage.content[0] = IOracle.Content({
        contentType: "text",
        value: message
    });
    for (uint u = 0; u < imageUrls.length; u++) {
        newMessage.content[u + 1] = IOracle.Content({
            contentType: "image_url",
            value: imageUrls[u]
        });
    }
    run.messages.push(newMessage);
    run.messagesCount = 1;
    uint currentId = chatRunsCount;
    chatRunsCount = chatRunsCount + 1;
    IOracle(oracleAddress).createOpenAiLlmCall(currentId, config);
    emit ChatCreated(msg.sender, currentId);
    return currentId;
}
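Since startChat accepts arbitrary strings as image URLs, you may want to reject unsupported schemes before the request reaches the Oracle. The following is a minimal sketch of such a check, based only on the three URL kinds listed above (https, data and ipfs). The library ImageUrlSketch and its functions are hypothetical names, not part of the Galadriel contracts, and the Oracle may apply its own, different validation.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

/// @notice Hypothetical helper, not part of the Galadriel contracts.
library ImageUrlSketch {
    /// @dev Returns true if `str` begins with `prefix` (byte-wise, case-sensitive).
    function startsWith(string memory str, string memory prefix) internal pure returns (bool) {
        bytes memory s = bytes(str);
        bytes memory p = bytes(prefix);
        if (p.length > s.length) {
            return false;
        }
        for (uint256 i = 0; i < p.length; i++) {
            if (s[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    /// @dev Accepts the three URL kinds the tutorial lists: https, data and ipfs.
    function isSupportedImageUrl(string memory url) internal pure returns (bool) {
        return startsWith(url, "https://")
            || startsWith(url, "data:")
            || startsWith(url, "ipfs://");
    }
}
```

One way to use it would be a line such as require(ImageUrlSketch.isSupportedImageUrl(imageUrls[u]), "Unsupported image URL"); inside the loop of startChat. The check only inspects the prefix and costs extra gas for each URL, so doing the same validation off-chain before sending the transaction is a reasonable alternative.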
- Swap out the whole onOracleLlmResponse function for the onOracleOpenAiLlmResponse function below.
function onOracleOpenAiLlmResponse(
    uint runId,
    IOracle.OpenAiResponse memory response,
    string memory errorMessage
) public onlyOracle {
    ChatRun storage run = chatRuns[runId];
    require(
        keccak256(abi.encodePacked(run.messages[run.messagesCount - 1].role)) == keccak256(abi.encodePacked("user")),
        "No message to respond to"
    );
    if (!compareStrings(errorMessage, "")) {
        IOracle.Message memory newMessage = IOracle.Message({
            role: "assistant",
            content: new IOracle.Content[](1)
        });
        newMessage.content[0].contentType = "text";
        newMessage.content[0].value = errorMessage;
        run.messages.push(newMessage);
        run.messagesCount++;
    } else {
        IOracle.Message memory newMessage = IOracle.Message({
            role: "assistant",
            content: new IOracle.Content[](1)
        });
        newMessage.content[0].contentType = "text";
        newMessage.content[0].value = response.content;
        run.messages.push(newMessage);
        run.messagesCount++;
    }
}
- Swap out the addMessage function with the code below.
function addMessage(string memory message, uint runId) public {
    ChatRun storage run = chatRuns[runId];
    require(
        keccak256(abi.encodePacked(run.messages[run.messagesCount - 1].role)) == keccak256(abi.encodePacked("assistant")),
        "No response to previous message"
    );
    require(
        run.owner == msg.sender, "Only chat owner can add messages"
    );
    IOracle.Message memory newMessage = IOracle.Message({
        role: "user",
        content: new IOracle.Content[](1)
    });
    newMessage.content[0].contentType = "text";
    newMessage.content[0].value = message;
    run.messages.push(newMessage);
    run.messagesCount++;
    IOracle(oracleAddress).createOpenAiLlmCall(runId, config);
}
- Let’s also refactor the getMessageHistory function to the following.
function getMessageHistory(uint chatId) public view returns (IOracle.Message[] memory) {
    return chatRuns[chatId].messages;
}
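Because getMessageHistory now returns structured messages instead of plain strings, a caller has to look inside each message's content array to read a reply. The sketch below shows one way to pull out the text of the most recent assistant message. It assumes the import path and the role, content and value fields used in the steps above; the library HistorySketch and the function lastAssistantText are hypothetical names, not part of the Galadriel contracts.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "./interfaces/IOracle.sol";

/// @notice Hypothetical helper, not part of the Galadriel contracts.
library HistorySketch {
    /// @dev Walks the history backwards and returns the first content value of
    /// the newest "assistant" message, or an empty string if there is none.
    function lastAssistantText(IOracle.Message[] memory history)
        internal
        pure
        returns (string memory)
    {
        for (uint256 i = history.length; i > 0; i--) {
            IOracle.Message memory m = history[i - 1];
            bool isAssistant = keccak256(bytes(m.role)) == keccak256(bytes("assistant"));
            if (isAssistant && m.content.length > 0) {
                return m.content[0].value;
            }
        }
        return "";
    }
}
```

Note that onOracleOpenAiLlmResponse stores Oracle error messages with the assistant role as well, so the returned text may be an error description and not a model answer.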
- Finally, add one helper function to the end.
function compareStrings(string memory a, string memory b) private pure returns (bool) {
    return (keccak256(abi.encodePacked((a))) == keccak256(abi.encodePacked((b))));
}
Putting it all together
That’s it. If you now deploy the contract to the Galadriel Devnet, you can start sending image URLs along with text as input to the on-chain LLM.
You can find the full OpenAiChatGptVision contract file here. The code in that contract is ordered slightly differently.
What’s Next?
Congratulations on deploying your on-chain ChatGPT! Explore further:
- Implement a more advanced chatbot by adding retrieval-augmented generation to your contract.
- Dive deeper into the Galadriel documentation, particularly the How It Works section, to understand the underlying technology.
- Experiment with different LLMs, e.g. Groq-hosted open-source LLMs, or take control over the nuances of text generation: see the Solidity reference and the example contract.
- Explore other Use Cases to get inspired for your next project.
Happy building!