
This post is part of a series; the previous post can be found here, where we start exploring text generation and streaming outputs as a gentle introduction to using the AI SDK to interact with LLMs.
So far, we have seen how to generate text-based output from LLMs. That is probably the most common use case for LLMs and is what platforms like ChatGPT and Gemini offer. However, we typically have no control over the structure of the data we get back. Even if we tell the model, for example, that we want the steps of a recipe as a list, that isn't really guaranteed, although it will likely do a good job of it. We can take a different approach to get outputs from the LLM in a structure we define ourselves.
Why use structured outputs?
When building an agent for tasks like mathematical analysis or report generation, it's often useful to have the agent's final output in a consistent format that your application can process. We can even force the LLM, via a tool call that always returns structured output, to guarantee the shape of the data we receive on our side for further processing.
Before we start using structured outputs
Make sure that you use a model that supports structured outputs. Not all models have the same capabilities. Check whether your model supports structured outputs; some providers list the same capability under the name object generation.
I will be using Gemini's API with the gemini-1.5-flash model. You can check the AI Studio page to see how to get an API key (yes, a free tier is available, and the model mentioned is free to use).
A benefit of using the AI SDK is that we can swap out LLM providers at any point without modifying much of the other code we have already written.
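As a minimal setup sketch, assuming the @ai-sdk/google provider package and an API key exposed through the GOOGLE_GENERATIVE_AI_API_KEY environment variable:

```typescript
import { google } from '@ai-sdk/google';

// The provider picks up the API key from the GOOGLE_GENERATIVE_AI_API_KEY
// environment variable by default (assumption based on the provider docs).
const model = google('gemini-1.5-flash');
```

Swapping providers later mostly means swapping this import and model id; the rest of the code stays the same.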
Defining the structure of our outputs
Let's prepare a prompt that will give us the steps for a recipe. It's a simple one.
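Something along these lines works; the exact dish is just an illustrative placeholder:

```typescript
// A plain text prompt asking for a recipe.
const prompt = 'Give me a recipe for a classic margherita pizza.';
```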
We can now use the generateObject or streamObject methods from the SDK to get the structured output in the form of an object. We define which model we will be using, the prompt, and, most importantly, the schema of the response object we want the model to adhere to.
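A sketch of what that call can look like, using the recipe schema described next (model and prompt as set up above):

```typescript
import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

const { object } = await generateObject({
  model: google('gemini-1.5-flash'),
  prompt: 'Give me a recipe for a classic margherita pizza.',
  // The schema the model's response must adhere to.
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(
        z.object({
          name: z.string(),
          amount: z.string(),
        })
      ),
      steps: z.array(z.string()),
    }),
  }),
});
```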
Here we are using zod, a TypeScript validation library, to define the schema that our model should adhere to. We say that the top-level response should be an object with a recipe property, which includes a name, an ingredients array containing the name and amount of each ingredient, and a steps array of strings explaining how to make the dish.
This makes the output predictable: the LLM's response will adhere to this schema, which lets us access the returned data with confidence.
Accessing the returned data
This is now pretty straightforward, just like accessing any other object data. We can pick and modify the data as we like, since it is now an object that adheres to the schema. One modification I am making here is to join the recipe steps from the array into one big string separated by newlines.
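Assuming the object returned by the generateObject call above, that can look like:

```typescript
const { recipe } = object;

// Ordinary property access; the shape is guaranteed by the schema.
console.log(recipe.name);

// Join the array of steps into one big newline-separated string.
const steps = recipe.steps.join('\n');
console.log(steps);
```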
That yields the following output:
Question generation - an actual use case
Let's say that we want to ask the LLM to generate some questions along with multiple-choice answers that we can then access on our frontend to build some sort of revision or flashcards app. Sounds like a nice, small use case, right?
In the following example I go a bit further with the code. There is a system prompt along with the actual prompt; it is prepended to the request and followed by the LLM as instructions. System prompts are helpful when you want the LLM to behave in a certain way no matter what the user asks or what query comes from the other side. Here I set the system prompt to ensure the following:
- To always generate 4 options as answer choices to the questions.
- To make all the answer choices of relatively the same length and tone.
- To not pack more detail into the correct option than into the incorrect ones.
- To make all options sound and seem equally probable with sufficient text in each.
Based on my testing over a few initial runs, I found that the LLM was providing a somewhat detailed answer for the correct option and leaving the other options relatively short, making it easy to guess the answer. Also, by default it was generating 3 answer choices, so I forced it to always generate 4.
I guess prompt engineering is definitely a thing now!
Here is the full code, describing an output format with a questions array, where each answer choice carries its content and a boolean flag marking it as correct or wrong, along with the system prompt that enforces the rules above.
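A sketch of what that can look like (the system prompt wording and the topic in the user prompt are illustrative placeholders):

```typescript
import { generateObject } from 'ai';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

const { object } = await generateObject({
  model: google('gemini-1.5-flash'),
  // System prompt enforcing the rules listed above.
  system:
    'You generate multiple-choice revision questions. ' +
    'Always generate exactly 4 answer choices per question. ' +
    'Keep all choices roughly the same length and tone. ' +
    'Do not make the correct choice more detailed than the incorrect ones; ' +
    'every choice should sound equally plausible.',
  // Placeholder topic; swap in whatever the user wants to revise.
  prompt: 'Generate 5 revision questions about the JavaScript event loop.',
  // Each question has answer choices with content and a correctness flag.
  schema: z.object({
    questions: z.array(
      z.object({
        question: z.string(),
        choices: z.array(
          z.object({
            content: z.string(),
            isCorrect: z.boolean(),
          })
        ),
      })
    ),
  }),
});

console.log(JSON.stringify(object.questions, null, 2));
```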
Finally, we get an answer that strictly adheres to the structure and can be reliably passed on to the frontend, shown in a UI, answered by the user, and marked as correct or wrong. This can mark the beginning of a helpful study assistant or a flashcards app sort of thingy!
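As an illustration, a hypothetical frontend helper (the type and function names here are mine) could check the user's pick against the isCorrect flag:

```typescript
// Hypothetical frontend helper; the types mirror the schema used above.
type Choice = { content: string; isCorrect: boolean };
type Question = { question: string; choices: Choice[] };

// Returns true when the choice the user picked is flagged as correct.
function checkAnswer(question: Question, pickedIndex: number): boolean {
  return question.choices[pickedIndex]?.isCorrect ?? false;
}
```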
References