Switched tool calls from XML to OpenAI-compatible JSON and unified the parse/execute flow. Summary of changes:

Added JSON tool_calls parsing/serialization and replaced the core execution path and prompts with JSON-only versions: JsonToolCallParser.cs, AIIntelligenceCore.cs
Removed XML parsing from the tool base class and unified the JSON argument reading and type-conversion helpers: AITool.cs
Unified the tool implementations on JSON args/UsageSchema (including rewrites/fixes): Tool_ModifyGoodwill.cs, Tool_SendReinforcement.cs, Tool_GetMapPawns.cs, Tool_GetMapResources.cs, Tool_GetAvailablePrefabs.cs, Tool_CallPrefabAirdrop.cs, Tool_CallBombardment.cs, Tool_GetAvailableBombardments.cs, Tool_GetPawnStatus.cs, Tool_GetRecentNotifications.cs, Tool_SearchThingDef.cs, Tool_SearchPawnKind.cs, Tool_ChangeExpression.cs, Tool_SetOverwatchMode.cs, Tool_RememberFact.cs, Tool_RecallMemories.cs, Tool_SpawnResources.cs, Tool_AnalyzeScreen.cs
Unified bombardment-related parsing on a JSON dictionary and made numeric parsing more robust: BombardmentUtility.cs
UI conversation display now strips JSON tool_calls: Overlay_WulaLink.cs, Dialog_AIConversation.cs
2025-12-31 01:45:38 +08:00
parent 0cea79ddff
commit b906a468b6
32 changed files with 6396 additions and 542 deletions


@@ -1,83 +0,0 @@
# Wula AI x Gemini Integration: Technical Handover Document
**Version**: 1.0
**Date**: 2025-12-28
**Author**: AntiGravity (Agent)
**Target Audience**: Codex / Future Maintainers
---
## 1. Overview
This document details the specific challenges, bugs, and architectural decisions made to stabilize the integration between **WulaFallenEmpire** (RimWorld Mod) and **Gemini 3 / OpenAI-Compatible Agents**. It specifically addresses "stubborn" issues related to API format compliance, JSON construction, and multimodal context persistence.
---
## 2. Critical Issues & Fixes
### 2.1 The "Streaming" Trap (SSE Handling)
**Symptoms**: AI responses were truncated (e.g., only "Comman" displayed instead of "Commander").
**Root Cause**: Even when `stream: false` is explicitly requested in the payload, some API providers (or reverse proxies wrapping Gemini) force a **Server-Sent Events (SSE)** response format (`data: {...}`). The original client only parsed the first line.
**Fix Implementation**:
- **File**: `SimpleAIClient.cs` -> `ExtractContent`
- **Logic**: Inspects response for `data:` prefix. If found, it iterates through **ALL** lines, strips `data:`, parses individual JSON chunks, and aggregates the `choices[0].delta.content` into a single string.
- **Defense**: This ensures compatibility with both standard JSON responses and forced Stream responses.
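The aggregation described above can be sketched in Python. This is an illustrative sketch, not the mod's C# `ExtractContent`: the function name `extract_content` is hypothetical, and it assumes the OpenAI-compatible shapes `choices[0].message.content` (plain response) and `choices[0].delta.content` (SSE chunk), with `[DONE]` as the stream terminator.

```python
import json


def extract_content(body: str) -> str:
    """Return the assistant text from a plain JSON body or an SSE-style body."""
    text = body.strip()
    if not text.startswith("data:"):
        # Standard non-streaming response: a single JSON object.
        return json.loads(text)["choices"][0]["message"]["content"] or ""

    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Read every chunk, not just the first one, and keep its text fragment.
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Reading only the first `data:` line of a forced stream is what produces a truncated reply such as "Comman"; joining all fragments restores the full text.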
### 2.2 The "Trailing Comma" Crash (HTTP 400)
**Symptoms**: AI actions failed silently or returned `400 Bad Request`.
**Root Cause**: In `SimpleAIClient.cs`, the JSON payload construction loop had a logic flaw.
- When filtering out `toolcall` roles inside the loop, the index `i` check `(i < messages.Count - 1)` failed to account for skipped items, leaving a trailing comma after the last valid item: `[{"role":"user",...},]` -> **Invalid JSON**.
- Additionally, if the message list was empty (or all items filtered), the comma after the System Message remained: `[{"role":"system",...},]` -> **Invalid JSON**.
**Fix Implementation**:
- **Logic**:
1. Pre-filter `validMessages` into a separate list **before** JSON construction.
2. Only append the comma after the System Message `if (validMessages.Count > 0)`.
3. Iterate `validMessages` to guarantee correct comma placement between items.
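A minimal Python sketch of the three steps above, assuming messages are dicts with `role` and `content` keys. The function name `build_messages_json` is hypothetical, and the real fix lives in the C# payload builder in `SimpleAIClient.cs`.

```python
import json


def build_messages_json(system_prompt, messages):
    """Serialize the system message plus all non-toolcall messages as a JSON array."""
    # Step 1: filter first, so separators are computed over what is actually emitted.
    valid = [m for m in messages if m["role"] != "toolcall"]

    items = [json.dumps({"role": "system", "content": system_prompt})]
    items += [json.dumps({"role": m["role"], "content": m["content"]}) for m in valid]

    # Steps 2 and 3: join() puts a comma only between items, so nothing follows
    # the system message when `valid` is empty, and nothing follows the last item.
    return "[" + ",".join(items) + "]"
```

The design point is that separators are derived from the filtered list rather than from indices into the unfiltered one, so skipped items cannot leave a dangling comma.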
### 2.3 Gemini 3's "JSON Obsession" & The Dual-Defense Strategy
**Symptoms**: Gemini 3 Flash Preview ignores System Prompts demanding XML (`<visual_click>`) and persistently outputs JSON (`[{"action":"click"...}]`).
**Root Cause**: RLHF tuning of newer models biases them heavily towards standard JSON tool-calling schemas, overriding prompt constraints.
**Strategy**: **"Principled Compromise"** (Double Defense).
1. **Layer 1 (Prompt)**: Explicitly list JSON and Markdown as `INVALID EXAMPLES` in `AIIntelligenceCore.cs`. This discourages compliance-oriented models from using them.
2. **Layer 2 (Code Fallback)**: If XML regex fails, the system attempts to parse **Markdown JSON Blocks** (` ```json ... ``` `).
- **File**: `AIIntelligenceCore.cs` -> `ExecuteXmlToolsForPhase`
- **Logic**: Extracts `point` arrays `[x, y]` and synthesizes a valid `<visual_click>` XML tag internally.
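The Layer 2 fallback might look roughly like the following Python sketch. It is not the code in `ExecuteXmlToolsForPhase`: the function name `fallback_visual_click` and the inner `<x>`/`<y>` layout of the synthesized tag are assumptions made for illustration, since the handover does not show the tag's real structure.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the example contains no literal fence
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def fallback_visual_click(reply: str):
    """Return a <visual_click> tag synthesized from a Markdown JSON block, or None."""
    match = JSON_BLOCK.search(reply)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

    # Accept either a single action object or a list of actions.
    actions = data if isinstance(data, list) else [data]
    for action in actions:
        point = action.get("point") if isinstance(action, dict) else None
        if isinstance(point, list) and len(point) == 2:
            x, y = point
            # Hypothetical tag layout; adapt to whatever the XML parser expects.
            return f"<visual_click><x>{x}</x><y>{y}</y></visual_click>"
    return None
```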
### 2.4 The Coordinate System Mess
**Symptoms**: Clicks occurred off-screen or at (0,0).
**Root Cause**:
- Gemini 3 often returns coordinates in a **0-1000** scale (e.g., `[115, 982]`).
- The previous logic normalized by `Screen.width`, which is **not thread-safe** and caused crashes, and it scaled incorrectly because it assumed the model returned pixel coordinates.
**Fix Implementation**:
- **Logic**: In the JSON Fallback parser, if `x > 1` or `y > 1`, divide by **1000.0f**. This standardizes coordinates to the mod's required 0-1 proportional format.
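The scaling rule can be sketched as below. The function name `normalize_point` is hypothetical, and the final clamp to the 0-1 range is an added safety assumption that the handover does not mention.

```python
def normalize_point(x, y):
    """Map a model-supplied point to the 0-1 proportional format."""
    if x > 1 or y > 1:
        # Treat the pair as 0-1000 scale, as Gemini 3 often returns it.
        x, y = x / 1000.0, y / 1000.0
    # Assumption: clamp so a slightly out-of-range value cannot land off-screen.
    return (min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0))
```

One caveat of this heuristic: a 0-1000 point whose components are both 0 or 1 is indistinguishable from a proportional point and is left unscaled.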
### 2.5 Visual Context Persistence (The "Blind Reply" Bug)
**Symptoms**: AI acted correctly (Phase 2) but "forgot" what it saw when replying to the user (Phase 3), or hallucinated headers.
**Root Cause**:
- Phase 3 (Reply) sends a message history ending with System Tool Results.
- `SimpleAIClient` only attached the image if the **very last message** was from `user`.
- Thus, in Phase 3, the image was dropped, rendering the AI blind.
**Fix Implementation**:
- **File**: `SimpleAIClient.cs`
- **Logic**: Instead of checking the last index, the code now searches **backwards** for the `lastUserIndex`. The image is attached to that specific user message, regardless of how many system messages follow it.
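A sketch of the backward search, assuming OpenAI-compatible message dicts. The helper names are hypothetical, and the `image_url` content-part shape is the OpenAI-compatible multimodal format, which may differ from what the mod actually emits.

```python
def last_user_index(messages):
    """Index of the most recent user message, or -1 if there is none."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return i
    return -1


def attach_image(messages, image_b64):
    """Return a copy of messages with the image attached to the last user turn."""
    idx = last_user_index(messages)
    out = [dict(m) for m in messages]
    if idx < 0:
        return out  # no user message, nothing to attach to
    out[idx]["content"] = [
        {"type": "text", "text": messages[idx]["content"]},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64," + image_b64}},
    ]
    return out
```

Because the search does not depend on the final message's role, the image survives in Phase 3 even when the history ends with system tool results.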
---
## 3. Future Maintenance Guide
### If Gemini 4 Breaks Format Again:
1. **Check `SimpleAIClient.cs`**: Ensure the JSON parser handles whatever new wrapper they add (e.g., nested `candidates`).
2. **Check `AIIntelligenceCore.cs`**: If it invents a new tool format (e.g., YAML), add a regex parser in `ExecuteXmlToolsForPhase` similar to the JSON Fallback. **Do not fight the model; adapt to it.**
### If API Errors Return:
1. Enable `DevMode` in RimWorld.
2. Check `Player.log` for `[WulaAI] Request Payload`.
3. Copy the payload to a JSON Validator. **Look for trailing commas.**
### Adding New Visual Tools:
1. Define tool in `Tools/`.
2. Update `GetToolSystemInstruction` whitelist.
3. **Crucially**: If the tool helps with **Action** (Silent), ensure `GetPhaseInstruction` enforces silence. If it helps with **Reply** (Descriptive), ensure it runs in Phase 3.
---
**End of Handover.**


@@ -0,0 +1,58 @@
JSON Output
In many scenarios, users need the model to output in strict JSON format to achieve structured output, facilitating subsequent parsing.
DeepSeek provides JSON Output to ensure the model outputs valid JSON strings.
Notice
To enable JSON Output, users should:
Set the response_format parameter to {'type': 'json_object'}.
Include the word "json" in the system or user prompt, and provide an example of the desired JSON format to guide the model in outputting valid JSON.
Set the max_tokens parameter reasonably to prevent the JSON string from being truncated midway.
When using the JSON Output feature, the API may occasionally return empty content. We are actively working on optimizing this issue. You can try modifying the prompt to mitigate such problems.
Sample Code
Here is the complete Python code demonstrating the use of JSON Output:
import json
from openai import OpenAI

client = OpenAI(
    api_key="<your api key>",
    base_url="https://api.deepseek.com",
)

system_prompt = """
The user will provide some exam text. Please parse the "question" and "answer" and output them in JSON format.
EXAMPLE INPUT:
Which is the highest mountain in the world? Mount Everest.
EXAMPLE JSON OUTPUT:
{
    "question": "Which is the highest mountain in the world?",
    "answer": "Mount Everest"
}
"""

user_prompt = "Which is the longest river in the world? The Nile River."

messages = [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    response_format={
        'type': 'json_object'
    }
)

print(json.loads(response.choices[0].message.content))
The model will output:
{
"question": "Which is the longest river in the world?",
"answer": "The Nile River"
}
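Because the notice above says the API may occasionally return empty content, and a low `max_tokens` can truncate the JSON string, calling `json.loads` directly on the reply can raise. A minimal defensive sketch follows; the helper name `parse_json_output` is hypothetical and not part of any SDK.

```python
import json


def parse_json_output(content):
    """Return the parsed object, or None if content is empty, truncated, or not an object."""
    if content is None or not content.strip():
        return None  # the API returned empty content
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return None  # e.g. output cut off by max_tokens
    return parsed if isinstance(parsed, dict) else None
```

A caller can retry with an adjusted prompt or a larger `max_tokens` when this returns `None`.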


@@ -0,0 +1,273 @@
Tool Calls
Tool Calls allows the model to call external tools to enhance its capabilities.
Non-thinking Mode
Sample Code
Here is an example of using Tool Calls to get the current weather information of the user's location, demonstrated with complete Python code.
For the specific API format of Tool Calls, please refer to the Chat Completion documentation.
from openai import OpenAI

def send_messages(messages):
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=messages,
        tools=tools
    )
    return response.choices[0].message

client = OpenAI(
    api_key="<your api key>",
    base_url="https://api.deepseek.com",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather of a location, the user should supply a location first.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"]
            },
        }
    },
]

messages = [{"role": "user", "content": "How's the weather in Hangzhou, Zhejiang?"}]
message = send_messages(messages)
print(f"User>\t {messages[0]['content']}")

tool = message.tool_calls[0]
messages.append(message)

messages.append({"role": "tool", "tool_call_id": tool.id, "content": "24℃"})
message = send_messages(messages)
print(f"Model>\t {message.content}")
The execution flow of this example is as follows:
User: Asks about the current weather in Hangzhou
Model: Returns the function get_weather({location: 'Hangzhou'})
User: Calls the function get_weather({location: 'Hangzhou'}) and provides the result to the model
Model: Returns in natural language, "The current temperature in Hangzhou is 24°C."
Note: In the above code, the functionality of the get_weather function needs to be provided by the user. The model itself does not execute specific functions.
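Since the caller has to execute the function itself, a small dispatcher is usually needed. The sketch below works on plain dicts in the OpenAI-compatible `tool_calls` shape, where `function.arguments` is a JSON-encoded string. The names `TOOLS` and `run_tool_calls` and the stub `get_weather` are hypothetical.

```python
import json


def get_weather(location: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return "24℃"


# Registry mapping tool names to local callables (assumed design).
TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and return the `tool` role messages to send back."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"] or "{}")
            content = TOOLS[name](**args)
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Unknown tool, wrong parameters, or malformed JSON arguments.
            content = f"error: {exc!r}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return results
```

Returning the error as the tool result, rather than raising, lets the model see what went wrong and retry with corrected arguments.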
Thinking Mode
From DeepSeek-V3.2, the API supports tool use in the thinking mode. For more details, please refer to Thinking Mode
strict Mode (Beta)
In strict mode, the model strictly adheres to the format requirements of the Function's JSON schema when outputting a tool call, ensuring that the model's output complies with the user's definition. It is supported by both thinking and non-thinking mode.
To use strict mode, you need to:
Use base_url="https://api.deepseek.com/beta" to enable Beta features
In the tools parameter, every function must set the strict property to true
The server will validate the JSON Schema of the Function provided by the user. If the schema does not conform to the specifications or contains JSON schema types that are not supported by the server, an error message will be returned
The following is an example of a tool definition in the strict mode:
{
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": true,
        "description": "Get weather of a location, the user should supply a location first.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"],
            "additionalProperties": false
        }
    }
}
Supported JSON Schema Types in strict Mode
object
string
number
integer
boolean
array
enum
anyOf
object
The object defines a nested structure containing key-value pairs, where properties specifies the schema for each key (or property) within the object. All properties of every object must be set as required, and the additionalProperties attribute of the object must be set to false.
Example
{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name", "age"],
"additionalProperties": false
}
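The two object rules (every property listed in `required`, and `additionalProperties` set to false) can be checked locally before a schema is sent. The following is a rough sketch of such a pre-flight check, not the server's actual validation; the function name `strict_object_problems` is hypothetical.

```python
def strict_object_problems(schema, path="$"):
    """Return a list of strict-mode object rule violations found in a schema."""
    problems = []
    if not isinstance(schema, dict):
        return problems

    if schema.get("type") == "object":
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems += strict_object_problems(sub, f"{path}.{name}")
    elif schema.get("type") == "array":
        problems += strict_object_problems(schema.get("items"), f"{path}[]")

    # Nested alternatives may themselves contain objects.
    for i, sub in enumerate(schema.get("anyOf", [])):
        problems += strict_object_problems(sub, f"{path}.anyOf[{i}]")
    return problems
```

An empty list means the schema satisfies both object rules at every nesting level this sketch visits; `$ref` targets are not followed.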
string
Supported parameters:
pattern: Uses regular expressions to constrain the format of the string
format: Validates the string against predefined common formats. Currently supported formats:
email: Email address
hostname: Hostname
ipv4: IPv4 address
ipv6: IPv6 address
uuid: UUID
Unsupported parameters:
minLength
maxLength
Example:
{
"type": "object",
"properties": {
"user_email": {
"type": "string",
"description": "The user's email address",
"format": "email"
},
"zip_code": {
"type": "string",
"description": "Six digit postal code",
"pattern": "^\\d{6}$"
}
}
}
number/integer
Supported parameters:
const: Specifies a constant numeric value
default: Defines the default value of the number
minimum: Specifies the minimum value
maximum: Specifies the maximum value
exclusiveMinimum: Defines a value that the number must be greater than
exclusiveMaximum: Defines a value that the number must be less than
multipleOf: Ensures that the number is a multiple of the specified value
Example:
{
"type": "object",
"properties": {
"score": {
"type": "integer",
"description": "A number from 1-5, which represents your rating, the higher, the better",
"minimum": 1,
"maximum": 5
}
},
"required": ["score"],
"additionalProperties": false
}
array
Unsupported parameters:
minItems
maxItems
Example
{
"type": "object",
"properties": {
"keywords": {
"type": "array",
"description": "Five keywords of the article, sorted by importance",
"items": {
"type": "string",
"description": "A concise and accurate keyword or phrase."
}
}
},
"required": ["keywords"],
"additionalProperties": false
}
enum
The enum ensures that the output is one of the predefined options. For example, in the case of order status, it can only be one of a limited set of specified states.
Example
{
"type": "object",
"properties": {
"order_status": {
"type": "string",
"description": "Ordering status",
"enum": ["pending", "processing", "shipped", "cancelled"]
}
}
}
anyOf
Matches any one of the provided schemas, allowing fields to accommodate multiple valid formats. For example, a user's account could be either an email address or a phone number:
{
"type": "object",
"properties": {
"account": {
"anyOf": [
{ "type": "string", "format": "email", "description": "Can be an email address" },
{ "type": "string", "pattern": "^\\d{11}$", "description": "Or an 11-digit mobile phone number" }
]
}
}
}
$ref and $def
You can use $def to define reusable modules and then use $ref to reference them, reducing schema repetition and enabling modularization. Additionally, $ref can be used independently to define recursive structures.
{
    "type": "object",
    "properties": {
        "report_date": {
            "type": "string",
            "description": "The date when the report was published"
        },
        "authors": {
            "type": "array",
            "description": "The authors of the report",
            "items": {
                "$ref": "#/$def/author"
            }
        }
    },
    "required": ["report_date", "authors"],
    "additionalProperties": false,
    "$def": {
        "author": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "author's name"
                },
                "institution": {
                    "type": "string",
                    "description": "author's institution"
                },
                "email": {
                    "type": "string",
                    "format": "email",
                    "description": "author's email"
                }
            },
            "additionalProperties": false,
            "required": ["name", "institution", "email"]
        }
    }
}


@@ -0,0 +1,533 @@
You can configure Gemini models to generate responses that adhere to a provided JSON Schema. This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.
Using structured outputs is ideal for a wide range of applications:
- **Data extraction:** Pull specific information from unstructured text, like extracting names, dates, and amounts from an invoice.
- **Structured classification:** Classify text into predefined categories and assign structured labels, such as categorizing customer feedback by sentiment and topic.
- **Agentic workflows:** Generate structured data that can be used to call other tools or APIs, like creating a character sheet for a game or filling out a form.
In addition to supporting JSON Schema in the REST API, the Google GenAI SDKs for Python and JavaScript also make it easy to define object schemas using [Pydantic](https://docs.pydantic.dev/latest/) and [Zod](https://zod.dev/), respectively. The example below demonstrates how to extract information from unstructured text that conforms to a schema defined in code.
This example demonstrates how to extract structured data from text using basic JSON Schema types like `object`, `array`, `string`, and `integer`.
### Python
from google import genai
from pydantic import BaseModel, Field
from typing import List, Optional

class Ingredient(BaseModel):
    name: str = Field(description="Name of the ingredient.")
    quantity: str = Field(description="Quantity of the ingredient, including units.")

class Recipe(BaseModel):
    recipe_name: str = Field(description="The name of the recipe.")
    prep_time_minutes: Optional[int] = Field(description="Optional time in minutes to prepare the recipe.")
    ingredients: List[Ingredient]
    instructions: List[str]

client = genai.Client()

prompt = """
Please extract the recipe from the following text.
The user wants to make delicious chocolate chip cookies.
They need 2 and 1/4 cups of all-purpose flour, 1 teaspoon of baking soda,
1 teaspoon of salt, 1 cup of unsalted butter (softened), 3/4 cup of granulated sugar,
3/4 cup of packed brown sugar, 1 teaspoon of vanilla extract, and 2 large eggs.
For the best part, they'll need 2 cups of semisweet chocolate chips.
First, preheat the oven to 375°F (190°C). Then, in a small bowl, whisk together the flour,
baking soda, and salt. In a large bowl, cream together the butter, granulated sugar, and brown sugar
until light and fluffy. Beat in the vanilla and eggs, one at a time. Gradually beat in the dry
ingredients until just combined. Finally, stir in the chocolate chips. Drop by rounded tablespoons
onto ungreased baking sheets and bake for 9 to 11 minutes.
"""

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)

recipe = Recipe.model_validate_json(response.text)
print(recipe)
### JavaScript
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const ingredientSchema = z.object({
name: z.string().describe("Name of the ingredient."),
quantity: z.string().describe("Quantity of the ingredient, including units."),
});
const recipeSchema = z.object({
recipe_name: z.string().describe("The name of the recipe."),
prep_time_minutes: z.number().optional().describe("Optional time in minutes to prepare the recipe."),
ingredients: z.array(ingredientSchema),
instructions: z.array(z.string()),
});
const ai = new GoogleGenAI({});
const prompt = `
Please extract the recipe from the following text.
The user wants to make delicious chocolate chip cookies.
They need 2 and 1/4 cups of all-purpose flour, 1 teaspoon of baking soda,
1 teaspoon of salt, 1 cup of unsalted butter (softened), 3/4 cup of granulated sugar,
3/4 cup of packed brown sugar, 1 teaspoon of vanilla extract, and 2 large eggs.
For the best part, they'll need 2 cups of semisweet chocolate chips.
First, preheat the oven to 375°F (190°C). Then, in a small bowl, whisk together the flour,
baking soda, and salt. In a large bowl, cream together the butter, granulated sugar, and brown sugar
until light and fluffy. Beat in the vanilla and eggs, one at a time. Gradually beat in the dry
ingredients until just combined. Finally, stir in the chocolate chips. Drop by rounded tablespoons
onto ungreased baking sheets and bake for 9 to 11 minutes.
`;
const response = await ai.models.generateContent({
model: "gemini-2.5-flash",
contents: prompt,
config: {
responseMimeType: "application/json",
responseJsonSchema: zodToJsonSchema(recipeSchema),
},
});
const recipe = recipeSchema.parse(JSON.parse(response.text));
console.log(recipe);
### Go
package main
import (
"context"
"fmt"
"log"
"google.golang.org/genai"
)
func main() {
ctx := context.Background()
client, err := genai.NewClient(ctx, nil)
if err != nil {
log.Fatal(err)
}
prompt := `
Please extract the recipe from the following text.
The user wants to make delicious chocolate chip cookies.
They need 2 and 1/4 cups of all-purpose flour, 1 teaspoon of baking soda,
1 teaspoon of salt, 1 cup of unsalted butter (softened), 3/4 cup of granulated sugar,
3/4 cup of packed brown sugar, 1 teaspoon of vanilla extract, and 2 large eggs.
For the best part, they'll need 2 cups of semisweet chocolate chips.
First, preheat the oven to 375°F (190°C). Then, in a small bowl, whisk together the flour,
baking soda, and salt. In a large bowl, cream together the butter, granulated sugar, and brown sugar
until light and fluffy. Beat in the vanilla and eggs, one at a time. Gradually beat in the dry
ingredients until just combined. Finally, stir in the chocolate chips. Drop by rounded tablespoons
onto ungreased baking sheets and bake for 9 to 11 minutes.
`
config := &genai.GenerateContentConfig{
ResponseMIMEType: "application/json",
ResponseJsonSchema: map[string]any{
"type": "object",
"properties": map[string]any{
"recipe_name": map[string]any{
"type": "string",
"description": "The name of the recipe.",
},
"prep_time_minutes": map[string]any{
"type": "integer",
"description": "Optional time in minutes to prepare the recipe.",
},
"ingredients": map[string]any{
"type": "array",
"items": map[string]any{
"type": "object",
"properties": map[string]any{
"name": map[string]any{
"type": "string",
"description": "Name of the ingredient.",
},
"quantity": map[string]any{
"type": "string",
"description": "Quantity of the ingredient, including units.",
},
},
"required": []string{"name", "quantity"},
},
},
"instructions": map[string]any{
"type": "array",
"items": map[string]any{"type": "string"},
},
},
"required": []string{"recipe_name", "ingredients", "instructions"},
},
}
result, err := client.Models.GenerateContent(
ctx,
"gemini-2.5-flash",
genai.Text(prompt),
config,
)
if err != nil {
log.Fatal(err)
}
fmt.Println(result.Text())
}
### REST
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts":[
{ "text": "Please extract the recipe from the following text.\nThe user wants to make delicious chocolate chip cookies.\nThey need 2 and 1/4 cups of all-purpose flour, 1 teaspoon of baking soda,\n1 teaspoon of salt, 1 cup of unsalted butter (softened), 3/4 cup of granulated sugar,\n3/4 cup of packed brown sugar, 1 teaspoon of vanilla extract, and 2 large eggs.\nFor the best part, they will need 2 cups of semisweet chocolate chips.\nFirst, preheat the oven to 375°F (190°C). Then, in a small bowl, whisk together the flour,\nbaking soda, and salt. In a large bowl, cream together the butter, granulated sugar, and brown sugar\nuntil light and fluffy. Beat in the vanilla and eggs, one at a time. Gradually beat in the dry\ningredients until just combined. Finally, stir in the chocolate chips. Drop by rounded tablespoons\nonto ungreased baking sheets and bake for 9 to 11 minutes." }
]
}],
"generationConfig": {
"responseMimeType": "application/json",
"responseJsonSchema": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
"description": "The name of the recipe."
},
"prep_time_minutes": {
"type": "integer",
"description": "Optional time in minutes to prepare the recipe."
},
"ingredients": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "Name of the ingredient."},
"quantity": { "type": "string", "description": "Quantity of the ingredient, including units."}
},
"required": ["name", "quantity"]
}
},
"instructions": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["recipe_name", "ingredients", "instructions"]
}
}
}'
**Example Response:**
{
"recipe_name": "Delicious Chocolate Chip Cookies",
"ingredients": [
{
"name": "all-purpose flour",
"quantity": "2 and 1/4 cups"
},
{
"name": "baking soda",
"quantity": "1 teaspoon"
},
{
"name": "salt",
"quantity": "1 teaspoon"
},
{
"name": "unsalted butter (softened)",
"quantity": "1 cup"
},
{
"name": "granulated sugar",
"quantity": "3/4 cup"
},
{
"name": "packed brown sugar",
"quantity": "3/4 cup"
},
{
"name": "vanilla extract",
"quantity": "1 teaspoon"
},
{
"name": "large eggs",
"quantity": "2"
},
{
"name": "semisweet chocolate chips",
"quantity": "2 cups"
}
],
"instructions": [
"Preheat the oven to 375°F (190°C).",
"In a small bowl, whisk together the flour, baking soda, and salt.",
"In a large bowl, cream together the butter, granulated sugar, and brown sugar until light and fluffy.",
"Beat in the vanilla and eggs, one at a time.",
"Gradually beat in the dry ingredients until just combined.",
"Stir in the chocolate chips.",
"Drop by rounded tablespoons onto ungreased baking sheets and bake for 9 to 11 minutes."
]
}
## Streaming
You can stream structured outputs, which allows you to start processing the response as it's being generated, without having to wait for the entire output to be complete. This can improve the perceived performance of your application.
The streamed chunks will be valid partial JSON strings, which can be concatenated to form the final, complete JSON object.
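Because each chunk is only a fragment of the final document, parsing has to wait until the fragments are joined. A minimal sketch, assuming the text of each chunk has already been pulled out into an iterable of strings; the function name `assemble_stream` is hypothetical.

```python
import json


def assemble_stream(chunks):
    """Concatenate streamed text fragments and parse the completed JSON object."""
    buffer = []
    for text in chunks:
        if text:  # some chunks may carry no text
            buffer.append(text)
    # Individual fragments are not valid JSON on their own; only the whole is.
    return json.loads("".join(buffer))
```

Calling `json.loads` on a single fragment would raise `json.JSONDecodeError`, so any incremental display should treat fragments as raw text until the stream ends.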
### Python
from google import genai
from pydantic import BaseModel, Field
from typing import Literal

class Feedback(BaseModel):
    sentiment: Literal["positive", "neutral", "negative"]
    summary: str

client = genai.Client()
prompt = "The new UI is incredibly intuitive and visually appealing. Great job. Add a very long summary to test streaming!"

response_stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents=prompt,
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Feedback.model_json_schema(),
    },
)

for chunk in response_stream:
    print(chunk.candidates[0].content.parts[0].text)
### JavaScript
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const ai = new GoogleGenAI({});
const prompt = "The new UI is incredibly intuitive and visually appealing. Great job! Add a very long summary to test streaming!";
const feedbackSchema = z.object({
sentiment: z.enum(["positive", "neutral", "negative"]),
summary: z.string(),
});
const stream = await ai.models.generateContentStream({
model: "gemini-2.5-flash",
contents: prompt,
config: {
responseMimeType: "application/json",
responseJsonSchema: zodToJsonSchema(feedbackSchema),
},
});
for await (const chunk of stream) {
console.log(chunk.candidates[0].content.parts[0].text)
}
## Structured outputs with tools
**Preview:** This feature is available only for the Gemini 3 series models, `gemini-3-pro-preview` and `gemini-3-flash-preview`.
Gemini 3 lets you combine Structured Outputs with built-in tools, including [Grounding with Google Search](https://ai.google.dev/gemini-api/docs/google-search), [URL Context](https://ai.google.dev/gemini-api/docs/url-context), and [Code Execution](https://ai.google.dev/gemini-api/docs/code-execution).
### Python
from google import genai
from pydantic import BaseModel, Field
from typing import List

class MatchResult(BaseModel):
    winner: str = Field(description="The name of the winner.")
    final_match_score: str = Field(description="The final match score.")
    scorers: List[str] = Field(description="The name of the scorer.")

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Search for all details for the latest Euro.",
    config={
        "tools": [
            {"google_search": {}},
            {"url_context": {}}
        ],
        "response_mime_type": "application/json",
        "response_json_schema": MatchResult.model_json_schema(),
    },
)

result = MatchResult.model_validate_json(response.text)
print(result)
### JavaScript
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const ai = new GoogleGenAI({});
const matchSchema = z.object({
winner: z.string().describe("The name of the winner."),
final_match_score: z.string().describe("The final score."),
scorers: z.array(z.string()).describe("The name of the scorer.")
});
async function run() {
const response = await ai.models.generateContent({
model: "gemini-3-pro-preview",
contents: "Search for all details for the latest Euro.",
config: {
tools: [
{ googleSearch: {} },
{ urlContext: {} }
],
responseMimeType: "application/json",
responseJsonSchema: zodToJsonSchema(matchSchema),
},
});
const match = matchSchema.parse(JSON.parse(response.text));
console.log(match);
}
run();
### REST
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts": [{"text": "Search for all details for the latest Euro."}]
}],
"tools": [
{"googleSearch": {}},
{"urlContext": {}}
],
"generationConfig": {
"responseMimeType": "application/json",
"responseJsonSchema": {
"type": "object",
"properties": {
"winner": {"type": "string", "description": "The name of the winner."},
"final_match_score": {"type": "string", "description": "The final score."},
"scorers": {
"type": "array",
"items": {"type": "string"},
"description": "The name of the scorer."
}
},
"required": ["winner", "final_match_score", "scorers"]
}
}
}'
## JSON schema support
To generate a JSON object, set the `response_mime_type` in the generation configuration to `application/json` and provide a `response_json_schema`. The schema must be a valid [JSON Schema](https://json-schema.org/) that describes the desired output format.
The model will then generate a response that is a syntactically valid JSON string matching the provided schema. When using structured outputs, the model will produce outputs in the same order as the keys in the schema.
Gemini's structured output mode supports a subset of the [JSON Schema](https://json-schema.org) specification.
The following values of `type` are supported:
- **`string`**: For text.
- **`number`**: For floating-point numbers.
- **`integer`**: For whole numbers.
- **`boolean`**: For true/false values.
- **`object`**: For structured data with key-value pairs.
- **`array`**: For lists of items.
- **`null`**: To allow a property to be null, include `"null"` in the type array (e.g., `{"type": ["string", "null"]}`).
These descriptive properties help guide the model:
- **`title`**: A short description of a property.
- **`description`**: A longer and more detailed description of a property.
### Type-specific properties
**For `object` values:**
- **`properties`**: An object where each key is a property name and each value is a schema for that property.
- **`required`**: An array of strings, listing which properties are mandatory.
- **`additionalProperties`**: Controls whether properties not listed in `properties` are allowed. Can be a boolean or a schema.
**For `string` values:**
- **`enum`**: Lists a specific set of possible strings for classification tasks.
- **`format`**: Specifies a syntax for the string, such as `date-time`, `date`, `time`.
**For `number` and `integer` values:**
- **`enum`**: Lists a specific set of possible numeric values.
- **`minimum`**: The minimum inclusive value.
- **`maximum`**: The maximum inclusive value.
**For `array` values:**
- **`items`**: Defines the schema for all items in the array.
- **`prefixItems`**: Defines a list of schemas for the first N items, allowing for tuple-like structures.
- **`minItems`**: The minimum number of items in the array.
- **`maxItems`**: The maximum number of items in the array.
## Model support
The following models support structured output:
| Model | Structured Outputs |
|------------------------|--------------------|
| Gemini 3 Pro Preview | ✔️ |
| Gemini 3 Flash Preview | ✔️ |
| Gemini 2.5 Pro | ✔️ |
| Gemini 2.5 Flash | ✔️ |
| Gemini 2.5 Flash-Lite | ✔️ |
| Gemini 2.0 Flash | ✔️\* |
| Gemini 2.0 Flash-Lite | ✔️\* |
*\* Note that Gemini 2.0 requires an explicit `propertyOrdering` list within the JSON input to define the preferred structure. You can find an example in this [cookbook](https://github.com/google-gemini/cookbook/blob/main/examples/Pdf_structured_outputs_on_invoices_and_forms.ipynb).*
## Structured outputs vs. function calling
Both structured outputs and function calling use JSON schemas, but they serve different purposes:
| Feature | Primary Use Case |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Structured Outputs** | **Formatting the final response to the user.** Use this when you want the model's *answer* to be in a specific format (e.g., extracting data from a document to save to a database). |
| **Function Calling** | **Taking action during the conversation.** Use this when the model needs to *ask you* to perform a task (e.g., "get current weather") before it can provide a final answer. |
## Best practices
- **Clear descriptions:** Use the `description` field in your schema to provide clear instructions to the model about what each property represents. This is crucial for guiding the model's output.
- **Strong typing:** Use specific types (`integer`, `string`, `enum`) whenever possible. If a parameter has a limited set of valid values, use an `enum`.
- **Prompt engineering:** Clearly state in your prompt what you want the model to do. For example, "Extract the following information from the text..." or "Classify this feedback according to the provided schema...".
- **Validation:** While structured output guarantees syntactically correct JSON, it does not guarantee the values are semantically correct. Always validate the final output in your application code before using it.
- **Error handling:** Implement robust error handling in your application to gracefully manage cases where the model's output, while schema-compliant, may not meet your business logic requirements.
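The validation and error-handling points can be illustrated with a standard-library sketch that checks a parsed reply against application rules. The `sentiment`/`summary` fields mirror the streaming example's `Feedback` schema, while the function name `validate_feedback` and the non-empty-summary rule are assumptions chosen for illustration.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_feedback(raw: str) -> dict:
    """Parse a structured reply and enforce application-level rules on it."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data.get('sentiment')!r}")
    summary = data.get("summary")
    # Business rule (assumed): a schema-valid but blank summary is still unusable.
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")
    return data
```

Raising a single exception type keeps the caller's retry or fallback path simple: one `except ValueError` covers malformed JSON and rule violations alike.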
## Limitations
- **Schema subset:** Not all features of the JSON Schema specification are supported. The model ignores unsupported properties.
- **Schema complexity:** The API may reject very large or deeply nested schemas. If you encounter errors, try simplifying your schema by shortening property names, reducing nesting, or limiting the number of constraints.
