Schema Definition Explained¶
Schema is the core of apcore, defining the input and output structure of modules.
1. Why is Schema Mandatory?¶
Traditional Module:
def process(data): # What is data? Unknown
return result # What is result? Unknown
apcore Module:
input_schema = ProcessInput # Input structure is clear
output_schema = ProcessOutput # Output structure is clear
def execute(inputs, context):
# inputs are validated, structure is known
return {...} # Output will be validated
Purpose of Schema:
| Purpose | Description |
|---|---|
| AI Understanding | LLM knows how to call the module through Schema |
| Auto Validation | Framework automatically validates input and output |
| Documentation Generation | Automatically generate API documentation |
| Type Safety | Type hints available during development |
| Cross-Language | YAML Schema can be shared across languages |
2. Schema Definition Methods¶
2.1 Pydantic Model (Recommended)¶
The preferred method for Python development.
from pydantic import BaseModel, Field
from typing import Literal
from datetime import datetime
class OrderInput(BaseModel):
"""Order creation input"""
# Required fields
product_id: str = Field(
..., # ... means required
description="Product ID",
min_length=1,
max_length=50
)
quantity: int = Field(
...,
description="Purchase quantity",
ge=1, # >= 1
le=100 # <= 100
)
# Optional fields (must have default value)
note: str | None = Field(
None, # Default value
description="Order note",
max_length=500
)
# Enum type
payment_method: Literal["alipay", "wechat", "card"] = Field(
"alipay",
description="Payment method"
)
# Nested object
shipping_address: "Address" = Field(
...,
description="Shipping address"
)
class Address(BaseModel):
"""Address information"""
province: str = Field(..., description="Province")
city: str = Field(..., description="City")
district: str = Field(..., description="District")
detail: str = Field(..., description="Detailed address")
postal_code: str = Field(..., description="Postal code", pattern=r"^\d{6}$")
2.2 YAML Schema (Cross-Language)¶
Used for sharing Schema definitions across languages.
# schemas/executor/order/create_order.schema.yaml
$schema: "https://apcore.dev/schema/v1"
version: "1.0.0"
module_id: "executor.order.create_order"
description: |
Order creation module
Supports multiple payment methods and shipping address configuration.
input_schema:
type: object
properties:
product_id:
type: string
description: "Product ID"
minLength: 1
maxLength: 50
quantity:
type: integer
description: "Purchase quantity"
minimum: 1
maximum: 100
note:
type: string
description: "Order note"
maxLength: 500
default: null
payment_method:
type: string
description: "Payment method"
enum: ["alipay", "wechat", "card"]
default: "alipay"
shipping_address:
$ref: "#/definitions/Address"
description: "Shipping address"
required: [product_id, quantity, shipping_address]
output_schema:
type: object
properties:
order_id:
type: string
description: "Order ID"
status:
type: string
description: "Order status"
enum: ["created", "pending", "paid", "failed"]
total_amount:
type: number
description: "Total order amount"
created_at:
type: string
format: date-time
description: "Creation time"
required: [order_id, status, total_amount, created_at]
definitions:
Address:
type: object
properties:
province:
type: string
description: "Province"
city:
type: string
description: "City"
district:
type: string
description: "District"
detail:
type: string
description: "Detailed address"
postal_code:
type: string
description: "Postal code"
pattern: "^\\d{6}$"
required: [province, city, district, detail, postal_code]
3. Field Types¶
3.1 Basic Types¶
from pydantic import BaseModel, Field
from typing import Any
class BasicTypes(BaseModel):
# String
name: str = Field(..., description="Name")
# Integer
age: int = Field(..., description="Age")
# Float
price: float = Field(..., description="Price")
# Boolean
active: bool = Field(..., description="Is active")
# Any type (avoid when possible)
data: Any = Field(..., description="Any data")
Corresponding JSON Schema:
properties:
name:
type: string
age:
type: integer
price:
type: number
active:
type: boolean
data: {} # Any type
3.2 Optional Types¶
from typing import Optional
class OptionalTypes(BaseModel):
# Method 1: | None
email: str | None = Field(None, description="Email")
# Method 2: Optional (equivalent to above)
phone: Optional[str] = Field(None, description="Phone")
# Non-None field with default value
count: int = Field(default=0, description="Count")
3.3 Lists and Dictionaries¶
from typing import List, Dict
class CollectionTypes(BaseModel):
# String list
tags: list[str] = Field(default=[], description="Tag list")
# Object list
items: list["OrderItem"] = Field(..., description="Order items")
# Dictionary
metadata: dict[str, Any] = Field(default={}, description="Metadata")
# Dictionary with specified value type
scores: dict[str, int] = Field(default={}, description="Score mapping")
class OrderItem(BaseModel):
product_id: str = Field(..., description="Product ID")
quantity: int = Field(..., description="Quantity")
price: float = Field(..., description="Unit price")
Corresponding JSON Schema:
properties:
tags:
type: array
items:
type: string
default: []
items:
type: array
items:
$ref: "#/definitions/OrderItem"
metadata:
type: object
additionalProperties: true
default: {}
scores:
type: object
additionalProperties:
type: integer
default: {}
3.4 Enum Types¶
from typing import Literal
from enum import Enum
# Method 1: Literal (recommended)
class Order1(BaseModel):
status: Literal["pending", "paid", "shipped", "done"] = Field(
...,
description="Order status"
)
priority: Literal[1, 2, 3] = Field(
default=2,
description="Priority: 1-high 2-medium 3-low"
)
# Method 2: Enum
class OrderStatus(str, Enum):
PENDING = "pending"
PAID = "paid"
SHIPPED = "shipped"
DONE = "done"
class Order2(BaseModel):
status: OrderStatus = Field(..., description="Order status")
3.5 Date and Time¶
from datetime import datetime, date, time
from pydantic import BaseModel, Field
class DateTimeTypes(BaseModel):
# Datetime
created_at: datetime = Field(..., description="Creation time")
# Date only
birth_date: date = Field(..., description="Birth date")
# Time only
alarm_time: time = Field(..., description="Alarm time")
# Date string format (needs to be parsed manually)
date_str: str = Field(
...,
description="Date string",
pattern=r"^\d{4}-\d{2}-\d{2}$" # YYYY-MM-DD
)
3.6 Cross-Language Type Mapping Quick Reference¶
| JSON Schema Type | Python | Rust | Go | Java | TypeScript |
|---|---|---|---|---|---|
string |
str |
String |
string |
String |
string |
integer |
int |
i64 |
int64 |
long / Long |
number |
number |
float |
f64 |
float64 |
double / Double |
number |
boolean |
bool |
bool |
bool |
boolean / Boolean |
boolean |
null |
None |
() |
nil |
null |
null |
object (with properties) |
BaseModel subclass |
struct |
struct |
class |
z.object({}) |
additionalProperties |
dict[str, V] |
HashMap<String, V> |
map[string]V |
Map<String, V> |
Record<string, V> |
array |
list[T] |
Vec<T> |
[]T |
List<T> |
T[] |
string + format: date-time |
datetime |
DateTime<Utc> |
time.Time |
OffsetDateTime |
Date / string |
string + format: date |
date |
NaiveDate |
civil.Date |
LocalDate |
string |
string + format: time |
time |
NaiveTime |
Custom | LocalTime |
string |
string + format: email |
EmailStr |
String + validation |
string + validation |
String + @Email |
z.string().email() |
string + format: uri |
AnyUrl |
url::Url |
string + validation |
String + @URL |
z.string().url() |
string + format: uuid |
UUID4 |
uuid::Uuid |
uuid.UUID |
UUID |
z.string().uuid() |
string enum |
Literal[...] / StrEnum |
enum + serde |
type T string + const |
enum |
z.enum([...]) |
integer enum |
IntEnum |
#[repr(i64)] enum |
type T int64 + iota |
enum |
z.union(z.literal(...)) |
oneOf |
Discriminated Union | enum (tagged) |
interface{} |
Sealed Class | z.discriminatedUnion() |
anyOf |
Union type | enum (untagged) |
interface{} |
Object |
Union type |
T \| null |
T \| None |
Option<T> |
*T |
@Nullable T |
T \| null |
This table is a quick reference. For complete type mapping specifications (including serialization fidelity and edge cases), see docs/spec/type-mapping.md.
4. Field Constraints¶
4.1 String Constraints¶
class StringConstraints(BaseModel):
# Length constraints
username: str = Field(
...,
description="Username",
min_length=3, # Minimum length
max_length=20 # Maximum length
)
# Regular expression
email: str = Field(
...,
description="Email",
pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$"
)
phone: str = Field(
...,
description="Phone number",
pattern=r"^1[3-9]\d{9}$"
)
# Format (JSON Schema standard format)
website: str = Field(
...,
description="Website",
# Common formats: email, uri, date, time, date-time, uuid
)
4.2 Numeric Constraints¶
class NumberConstraints(BaseModel):
# Range constraints
age: int = Field(
...,
description="Age",
ge=0, # >= 0
le=150 # <= 150
)
price: float = Field(
...,
description="Price",
gt=0, # > 0
lt=1000000 # < 1000000
)
# Multiple constraint
quantity: int = Field(
...,
description="Quantity (must be multiple of 10)",
multiple_of=10
)
4.3 List Constraints¶
from pydantic import BaseModel, Field, field_validator
class ListConstraints(BaseModel):
# Length constraints
tags: list[str] = Field(
...,
description="Tags",
min_length=1, # At least 1
max_length=10 # At most 10
)
# Unique elements (requires custom validation)
unique_ids: list[str] = Field(
...,
description="Unique ID list"
)
@field_validator("unique_ids")
@classmethod
def check_unique(cls, v):
if len(v) != len(set(v)):
raise ValueError('List elements must be unique')
return v
5. LLM Extension Fields¶
These fields help AI/LLM better understand Schema.
5.1 x-llm-description¶
class LLMFriendlyInput(BaseModel):
sql: str = Field(
...,
description="SQL statement",
json_schema_extra={
"x-llm-description": """
SQL query statement to execute.
- Only SELECT statements allowed
- DROP, DELETE, UPDATE and other modification operations are not allowed
- Table names must use schema.table format
"""
}
)
Corresponding YAML:
sql:
type: string
description: "SQL statement"
x-llm-description: |
SQL query statement to execute.
- Only SELECT statements allowed
- DROP, DELETE, UPDATE and other modification operations are not allowed
- Table names must use schema.table format
5.2 x-examples¶
class WithExamples(BaseModel):
email: str = Field(
...,
description="Email address",
json_schema_extra={
"x-examples": [
"[email protected]",
"[email protected]",
"[email protected]"
]
}
)
phone: str = Field(
...,
description="China mainland mobile number",
pattern=r"^1[3-9]\d{9}$",
json_schema_extra={
"x-examples": ["13800138000", "15912345678"]
}
)
5.3 x-sensitive¶
class WithSensitive(BaseModel):
username: str = Field(..., description="Username")
password: str = Field(
...,
description="Password",
json_schema_extra={
"x-sensitive": True # Mark as sensitive field
}
)
api_key: str = Field(
...,
description="API key",
json_schema_extra={
"x-sensitive": True
}
)
Handling of sensitive fields: - Automatically masked in logs - Not recorded in trace data - AI/LLM is advised not to store
6. Nesting and References¶
6.1 Nested Objects¶
class Address(BaseModel):
"""Address information"""
city: str = Field(..., description="City")
street: str = Field(..., description="Street")
postal_code: str = Field(..., description="Postal code")
class Company(BaseModel):
"""Company information"""
name: str = Field(..., description="Company name")
address: Address = Field(..., description="Company address") # Nested
class Employee(BaseModel):
"""Employee information"""
name: str = Field(..., description="Name")
company: Company = Field(..., description="Company") # Multi-level nesting
home_address: Address = Field(..., description="Home address") # Reuse
6.2 References in YAML¶
# Using $ref references
input_schema:
type: object
properties:
shipping_address:
$ref: "#/definitions/Address"
billing_address:
$ref: "#/definitions/Address"
definitions:
Address:
type: object
properties:
city:
type: string
street:
type: string
6.3 Cross-File References¶
# schemas/order.schema.yaml
input_schema:
type: object
properties:
customer:
$ref: "./common/customer.schema.yaml#/definitions/Customer"
items:
type: array
items:
$ref: "./common/product.schema.yaml#/definitions/OrderItem"
7. Custom Validation¶
7.1 Field Validators¶
from pydantic import BaseModel, Field, field_validator
class UserInput(BaseModel):
username: str = Field(..., description="Username")
email: str = Field(..., description="Email")
password: str = Field(..., description="Password")
confirm_password: str = Field(..., description="Confirm password")
@field_validator('username')
@classmethod
def username_alphanumeric(cls, v: str) -> str:
if not v.isalnum():
raise ValueError('Username can only contain letters and numbers')
return v
@field_validator('email')
@classmethod
def email_valid(cls, v: str) -> str:
if '@' not in v:
raise ValueError('Email format is incorrect')
return v.lower() # Convert to lowercase
7.2 Model Validators¶
from pydantic import BaseModel, model_validator
class PasswordInput(BaseModel):
password: str = Field(..., description="Password")
confirm_password: str = Field(..., description="Confirm password")
@model_validator(mode='after')
def passwords_match(self) -> 'PasswordInput':
if self.password != self.confirm_password:
raise ValueError('Password entries do not match')
return self
7.3 Complex Business Validation¶
class OrderInput(BaseModel):
items: list["OrderItem"] = Field(..., description="Order items")
coupon_code: str | None = Field(None, description="Coupon")
total_amount: float = Field(..., description="Total amount")
@model_validator(mode='after')
def validate_order(self) -> 'OrderInput':
# Calculate total product price
calculated_total = sum(item.price * item.quantity for item in self.items)
# Validate total amount
if abs(self.total_amount - calculated_total) > 0.01:
raise ValueError(f'Total amount is incorrect, should be {calculated_total}')
# Validate coupon
if self.coupon_code and not self._is_valid_coupon(self.coupon_code):
raise ValueError('Coupon is invalid or expired')
return self
def _is_valid_coupon(self, code: str) -> bool:
# Coupon validation logic
return True
8. Schema Loading Strategy¶
apcore supports multiple Schema loading methods:
# apcore.yaml
schema:
# Loading strategy
strategy: "yaml_first" # yaml_first | native_first | yaml_only
# yaml_first: Load from YAML first, native implementation can override
# native_first: Prefer Python class definition, YAML as fallback
# yaml_only: YAML only (pure cross-language scenarios)
paths:
yaml_schemas: "./schemas"
validation:
strict: true # Strict mode: extra fields not allowed
coerce_types: true # Type coercion
9. Best Practices¶
9.1 Every Field Should Have a Description¶
# ✅ Good
class GoodSchema(BaseModel):
name: str = Field(..., description="User name, 2-50 characters")
age: int = Field(..., description="User age, 0-150 years")
# ❌ Bad
class BadSchema(BaseModel):
name: str
age: int
9.2 Use Explicit Types¶
# ✅ Good: Clear types
class GoodSchema(BaseModel):
status: Literal["active", "inactive", "pending"]
count: int
price: float
# ❌ Bad: Vague types
class BadSchema(BaseModel):
status: str # Can be any string
count: Any # Unknown type
price: Any
9.3 Set Reasonable Constraints¶
# ✅ Good: Has reasonable constraints
class GoodSchema(BaseModel):
username: str = Field(..., min_length=3, max_length=20)
age: int = Field(..., ge=0, le=150)
email: str = Field(..., pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
# ❌ Bad: No constraints
class BadSchema(BaseModel):
username: str # Can be empty string or very long
age: int # Can be negative or absurd number
email: str # Can be any string
9.4 Separate Complex Schemas¶
# ✅ Good: Reusable and clear
class Address(BaseModel):
"""Address - reusable"""
city: str = Field(..., description="City")
street: str = Field(..., description="Street")
class OrderInput(BaseModel):
shipping_address: Address
billing_address: Address
# ❌ Bad: Duplicate definitions
class OrderInput(BaseModel):
shipping_city: str
shipping_street: str
billing_city: str
billing_street: str
10. Edge Case Handling¶
10.1 null vs Empty String¶
| Scenario | JSON Value | Meaning |
|---|---|---|
Field value is null |
{"name": null} |
Field exists but has no value |
| Field does not exist | {} |
Field is missing |
| Empty string | {"name": ""} |
Field exists and value is empty string |
Implementations must distinguish these three cases. The required constraint checks if the field exists, nullable controls whether null is allowed.
10.2 Large Number Handling¶
apcore specifies 2^53 - 1 (9007199254740991) as the cross-language integer safe boundary (determined by JavaScript Number.MAX_SAFE_INTEGER).
| Scenario | Schema Definition | Description |
|---|---|---|
| Within safe range (≤ 2^53 - 1) | type: integer |
Use directly, lossless across all languages |
| Beyond safe range | type: string + format: int64 |
64-bit integer, transmitted as string |
| Arbitrary precision integer | type: string + format: bigint |
Like blockchain nonce |
| High precision decimal | type: string + format: decimal |
Like currency, exchange rate |
from pydantic import BaseModel, Field
class PaymentInput(BaseModel):
# Within safe range — use int directly
user_id: int = Field(..., description="User ID", ge=0)
# Beyond safe range — use string + format
order_no: str = Field(
...,
description="Snowflake order number",
pattern=r"^\d+$",
json_schema_extra={"format": "int64"}
)
# High precision amount
amount: str = Field(
...,
description="Payment amount (accurate to cent)",
pattern=r"^-?\d+\.\d{2}$",
json_schema_extra={"format": "decimal"}
)
For complete large number type mapping specifications, see docs/spec/type-mapping.md §14.1.
10.3 Unicode Handling¶
- All
stringtypes must support UTF-8 encoding minLength/maxLengthshould count by Unicode characters (code points), not bytes- When containing Emoji and combining characters, implementations should use NFC normalized form
10.4 Nesting Depth Limit¶
- Schema object nesting depth should not exceed 16 levels
$refrecursion depth must not exceed 32 levels- Implementations should issue warnings when detecting excessive nesting
11. AI-Friendly Schema Design¶
All mainstream AI protocols (MCP, OpenAI, Anthropic, Gemini, LangChain) are based on JSON Schema. apcore's use of JSON Schema Draft 2020-12 is the correct choice. This section provides design guidelines to make Schema easier for AI/LLM to correctly understand and call.
11.1 Flat is Better Than Nested¶
AI/LLM accuracy significantly decreases with deeply nested structures. input_schema should keep nesting within 2-3 levels.
# ❌ Bad: Deep nesting, hard for AI to fill accurately
input_schema:
type: object
properties:
config:
type: object
properties:
database:
type: object
properties:
connection:
type: object
properties:
host:
type: string
port:
type: integer
# ✅ Good: Flat, clear for AI
input_schema:
type: object
properties:
db_host:
type: string
description: "Database server hostname or IP address"
x-examples: ["localhost", "db.example.com"]
db_port:
type: integer
description: "Database port number"
default: 5432
Reference: §10.4 specifies nesting depth limit of 16 levels, but modules for AI calling should not exceed 3 levels.
11.2 Description Quality Rules¶
Each field's description should answer "What does the AI need to know to correctly fill this field?"
| Rule | Description |
|---|---|
| Explain semantics, not type | Schema already declares type: string, description doesn't need to repeat "string type" |
| Explain value range | Like "Supported formats: png, jpg, gif" |
| Explain default behavior | Like "If not filled, send to all subscribers" |
| Explain related constraints | Like "When format is html, template_id must be provided" |
# ❌ Bad: Repeats type information, no actual guidance
body: str = Field(
...,
description="Email body, string type"
)
# ✅ Good: Explains semantics and usage
body: str = Field(
...,
description="Email body content, supports plain text or HTML. HTML format requires html=true"
)
11.3 Token Awareness¶
AI protocols serialize Schema and inject it into the prompt, consuming context tokens. Reducing unnecessary token consumption can improve AI reasoning.
| Recommendation | Description |
|---|---|
| Avoid repeating type info in description | type: string already declares type, description doesn't need to mention it |
| Use x-llm-description when enum has more than 5 values | Avoid AI guessing enum meanings one by one |
| Use first sentence description in compact mode | See Registry API compact export |
| Send simplified Schema during module discovery | Load full Schema after module is selected (progressive disclosure) |
# ❌ Bad: Many enum values without explanation
status:
type: string
enum: ["draft", "pending_review", "in_review", "approved", "rejected",
"published", "archived", "suspended", "deleted"]
# ✅ Good: Use x-llm-description to explain usage scenarios
status:
type: string
enum: ["draft", "pending_review", "in_review", "approved", "rejected",
"published", "archived", "suspended", "deleted"]
description: "Article status"
x-llm-description: |
Article lifecycle status. Use draft when creating, pending_review for submission,
approved after review, published for going live.
Other statuses (archived/suspended/deleted) are for admin operations, generally not used during creation.
11.4 Required Field Priority¶
In the properties block, list required fields (required) first, then optional fields. This helps AI grasp core parameters faster when reading Schema.
# ✅ Recommended: Required fields first
input_schema:
type: object
properties:
# --- Required fields ---
to:
type: string
description: "Recipient email address"
subject:
type: string
description: "Email subject"
body:
type: string
description: "Email body"
# --- Optional fields ---
cc:
type: array
items: { type: string }
description: "CC list"
default: []
html:
type: boolean
description: "Is HTML format"
default: false
required: [to, subject, body]
11.5 Design Checklist¶
Before publishing modules for AI calling, it's recommended to check against the following checklist:
- [ ] input_schema nesting does not exceed 3 levels
- [ ] Every field has description, and does not repeat type information
- [ ] Required fields are listed before optional fields in properties
- [ ] Provide x-llm-description when enum has more than 5 values
- [ ] Complex modules (oneOf/anyOf, 5+ required fields) provide examples
- [ ] Sensitive fields marked with x-sensitive
12. Cross-Protocol Compatibility¶
apcore uses JSON Schema Draft 2020-12, but some AI protocols (like MCP) are still based on Draft 7. Modules aimed at broad AI protocol compatibility should prioritize using the common subset of both versions.
12.1 Safe Features (Draft 7 + 2020-12 Common)¶
The following keywords can be safely used in all mainstream AI protocols:
| Keyword | Description |
|---|---|
type |
Type declaration |
properties |
Object property definition |
required |
Required field list |
enum |
Enum values |
const |
Constant value |
description |
Field description |
default |
Default value |
minimum / maximum |
Numeric range |
minLength / maxLength |
String length |
pattern |
Regular expression constraint |
items |
Array element definition |
$ref |
Reference (local reference) |
oneOf / anyOf / allOf |
Combined types |
additionalProperties |
Allow additional properties |
format |
Format annotation (like date-time, email) |
12.2 Use with Caution (Draft 2020-12 Specific)¶
The following keywords are only available in Draft 2020-12, some AI protocols may not support:
| Keyword | Description | Risk |
|---|---|---|
if / then / else |
Conditional Schema | MCP Draft 7 doesn't support; some AI protocols ignore |
$dynamicRef / $dynamicAnchor |
Dynamic reference | Most AI protocols don't support |
prefixItems |
Tuple definition (replaces old items array form) |
Draft 7 uses items array form |
$anchor |
Named anchor | Draft 7 uses $id |
dependentRequired / dependentSchemas |
Dependency declaration | Draft 7 uses dependencies |
12.3 Recommendations¶
- Modules aimed at broad AI protocol compatibility SHOULD limit to the safe subset in §12.1
- Modules used only within apcore and not exported to external AI protocols can freely use all Draft 2020-12 features
- When using
oneOf/anyOf, SHOULD provideexamplesto help AI understand branch meanings - Cross-file
$refSHOULD be inlined when exported, as most AI protocols don't support external reference resolution
Next Steps¶
- Creating Modules Guide - Complete module creation tutorial
- Module Interface Definition - API reference
- ACL Configuration Guide - Access control configuration