Schema and Formula Generation
This guide documents how to move from semantic search to structured Guardian schema generation using the Hedera Guardian AI Toolkit.
Overview
At this stage:
Documents have already been ingested
Semantic search is working
The MCP server is connected to your AI client
You will now:
Design schemas using natural language
Generate Guardian-compatible Excel schema files
Extract formulas from methodologies
Translate formulas into structured schema components
Iteratively refine schema structures
Prerequisites
Before generating schemas, ensure:
MCP server is running
Document ingestion is complete
Your AI client supports MCP
The File System extension (if using Claude Desktop) is enabled
The output directory is allowed for file access
Schema files are written to:
Each generated schema includes:
An Excel (.xlsx) file
A JSON representation (source of truth)
Step 1 — Configure Your AI Client
For structured schema workflows:
Use a dedicated project workspace
Add persistent instructions to guide tool usage
Ensure the AI uses MCP schema tools instead of generating raw Excel via Python
Best practice instructions include:
Always inspect available tools first
Use search tools before schema generation
Use schema builder tools for Excel creation
Do not manually generate Excel via scripting
Validate schema updates before overwriting files
This ensures controlled and repeatable schema construction.
Step 2 — Design a Root Schema
You may begin with a high-level root schema.
Example natural language prompt:
Create a root schema for a project description.
Include project title (string).
Include project hub account ID.
Include certification type dropdown with options VCS and CCP.
Add placeholder subschemas for future expansion.
The AI will:
Draft schema structure in JSON
Suggest field types
Define enum options
Propose visibility logic (if applicable)
You can refine field types and requirements before generating the file.
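As a rough illustration, a draft for the prompt above might resemble the structure below. The key names (key, type, required, options, ref), the type labels, and the required flags are all assumptions made for this sketch; the actual JSON layout is defined by the toolkit's schema builder tool and may differ.

```python
import json

# Hypothetical draft of the root schema described in the example prompt.
# Key names, type labels, and required flags are illustrative assumptions,
# not the toolkit's actual JSON format.
root_schema = {
    "name": "Project Description",
    "fields": [
        {"key": "project_title", "type": "string", "required": True},
        {"key": "project_hub_account_id", "type": "string", "required": True},
        {
            "key": "certification_type",
            "type": "enum",
            "options": ["VCS", "CCP"],
            "required": True,
        },
        # Placeholder subschema, to be replaced during later expansion.
        {
            "key": "future_section",
            "type": "subschema",
            "ref": "Placeholder",
            "required": False,
        },
    ],
}


def enum_fields_missing_options(schema):
    """Return the keys of enum fields that define no options."""
    return [
        field["key"]
        for field in schema["fields"]
        if field["type"] == "enum" and not field.get("options")
    ]


# Print the draft so it can be reviewed before any file is generated.
print(json.dumps(root_schema, indent=2))
```

Reviewing a draft in this form makes it easier to spot a missing enum option or an unintended field type before asking the AI to generate the file.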
Step 3 — Generate the Excel Schema File
Once the structure is approved:
The AI calls the schema builder MCP tool
JSON schema metadata is converted into Excel format
Validation is applied
The file is written to disk
The generated file includes:
Field definitions
Data types
Enum sheets
Visibility conditions
Subschema references
Both the Excel and JSON versions are stored.
Step 4 — Extract Structure from Methodology Documents
To extend schemas properly, first analyze the methodology structure.
Ask the AI to:
Inspect section headings
Extract table of contents
Identify subsection hierarchy
Discover required vs optional fields
The AI will:
Explore metadata fields
Use filtered semantic search
Identify section names and structure
Propose corresponding subschemas
This allows schema structure to mirror document structure.
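The idea of mirroring document structure can be sketched as follows. This example assumes the section headings have already been retrieved as plain strings with dotted numbering such as "2.1 Baseline Emissions"; the heading list and the build_outline helper are hypothetical and are not part of the toolkit.

```python
import re

# Hypothetical section headings, as they might be retrieved from document
# metadata. The dotted numbering format is an assumption.
headings = [
    "1 Project Details",
    "1.1 Project Location",
    "1.2 Project Proponent",
    "2 Baseline Scenario",
    "2.1 Baseline Emissions",
]

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def build_outline(lines):
    """Nest numbered headings into a tree based on their numbering depth."""
    root = {"children": []}
    stack = [(0, root)]  # (depth, node) pairs for the current branch
    for line in lines:
        match = HEADING.match(line)
        if not match:
            continue  # skip lines that are not numbered headings
        number, title = match.groups()
        depth = number.count(".") + 1
        node = {"number": number, "title": title, "children": []}
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root["children"]


outline = build_outline(headings)
for section in outline:
    subsections = [child["title"] for child in section["children"]]
    print(section["title"], "->", subsections)
```

Each top-level node is a candidate subschema, and its children are candidate nested subschemas or field groups, subject to your review.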
Step 5 — Create Subschemas
Subschemas can be created incrementally.
Best practice:
Create in small batches
Validate after each update
Avoid large multi-schema generation in one step
This reduces hallucination risk and improves schema correctness.
If a placeholder schema is referenced elsewhere, validation prevents unsafe removal and enforces safe updates.
Step 6 — Transform Tables into Structured Fields
Many methodology sections are presented as tables.
Instead of copying table structure directly:
Convert logical rows into structured fields
Use help text fields for grouping
Define risk or category groupings explicitly
Convert qualitative tables into structured input/output fields
This creates machine-readable structure rather than static formatting.
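A minimal sketch of this conversion is shown below. The table rows, column names, and the help_text and number type labels are invented for illustration; the field types actually available depend on the schema builder tool.

```python
import re

# Hypothetical rows from a methodology table. Column names and values are
# invented for illustration.
table_rows = [
    {"category": "Internal Risk", "item": "Project Management", "unit": "%"},
    {"category": "Internal Risk", "item": "Financial Viability", "unit": "%"},
    {"category": "External Risk", "item": "Land Tenure", "unit": "%"},
]


def to_key(label):
    """Turn a human-readable label into a snake_case field key."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def rows_to_fields(rows):
    """Emit one help-text field per category, followed by its input fields."""
    fields = []
    seen_categories = set()
    for row in rows:
        category = row["category"]
        if category not in seen_categories:
            seen_categories.add(category)
            # A help-text field marks the start of each group.
            fields.append(
                {"key": to_key(category), "type": "help_text", "text": category}
            )
        fields.append(
            {
                "key": to_key(row["item"]),
                "type": "number",
                "unit": row["unit"],
                "group": to_key(category),
            }
        )
    return fields


fields = rows_to_fields(table_rows)
for field in fields:
    print(field["type"], field["key"])
```

Each logical row becomes its own field, and the grouping that the table expressed visually is carried by explicit help-text entries and a group attribute.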
Step 7 — Extract and Interpret Formulas
You can ask:
How is net emission reduction calculated?
What formulas are involved?
Explain each variable.
Propose a schema capturing all inputs and outputs.
The AI will:
Perform targeted semantic search
Filter by document name
Retrieve LaTeX-converted formulas
Identify dependencies
Break formulas into calculation chains
Propose input and computed fields
Formula extraction includes:
Root equations
Intermediate equations
Parameter definitions
Dependency structure
You can then:
Design schemas for calculated parameters
Separate user inputs from computed outputs
Build multi-schema dependency structures
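Separating inputs from computed outputs and ordering a calculation chain can be sketched with Python's standard graphlib module (Python 3.9+). The parameter names and the dependency map below are placeholders, not taken from any specific methodology, and this is not how the toolkit itself resolves formulas.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each computed parameter lists the parameters
# it is calculated from. Names are placeholders for illustration.
dependencies = {
    "net_reduction": {"baseline_emissions", "project_emissions", "leakage"},
    "baseline_emissions": {"activity_data", "emission_factor"},
    "project_emissions": {"fuel_use", "emission_factor"},
}


def split_inputs_and_outputs(deps):
    """Return (user inputs, computed parameters) for a dependency map."""
    computed = set(deps)
    referenced = set().union(*deps.values())
    # Anything referenced but never computed must be supplied by the user.
    inputs = referenced - computed
    return sorted(inputs), computed


def calculation_order(deps):
    """Return computed parameters so that dependencies come first."""
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name in deps]


inputs, computed = split_inputs_and_outputs(dependencies)
order = calculation_order(dependencies)
print("User inputs:", inputs)
print("Calculation order:", order)
```

The user inputs map naturally to input fields, and the ordered computed parameters map to calculated fields or to separate calculation schemas. TopologicalSorter raises a CycleError if the dependencies are circular, which is a useful signal that a formula was misread.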
Step 8 — Iterative Refinement
You remain the domain expert.
The AI assists by:
Drafting structure
Translating formulas
Suggesting schema layouts
Maintaining validation rules
You review and refine before finalizing.
Generated schema files can be:
Extended
Edited
Connected via subschema references
Updated safely through MCP tools
Validation & Safety
Schema updates are validated automatically.
The tool returns a validation error and requires a correction if any of the following occurs:
A referenced schema is removed
A required field is missing
A type mismatch occurs
This ensures structural integrity of Guardian-compatible files.
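The kinds of checks described above can be approximated with the sketch below. It is not the toolkit's validator: the schema layout, the ALLOWED_TYPES set, and the error messages are assumptions, and the three checks only loosely correspond to the conditions listed.

```python
# Hypothetical in-memory schema set. Key names and type labels are
# assumptions made for this sketch.
schemas = {
    "Root": {
        "fields": [
            {"key": "project_title", "type": "string", "required": True},
            {"key": "details", "type": "subschema", "ref": "Details", "required": False},
        ]
    },
    "Details": {
        "fields": [
            {"key": "start_year", "type": "number", "required": True},
        ]
    },
}

ALLOWED_TYPES = {"string", "number", "enum", "subschema"}


def validate(schema_set):
    """Return a list of problems; an empty list means no problems were found."""
    errors = []
    for name, schema in schema_set.items():
        for field in schema["fields"]:
            # Every field definition needs at least a key and a type.
            for attr in ("key", "type"):
                if attr not in field:
                    errors.append(f"{name}: field is missing '{attr}'")
            field_type = field.get("type")
            if field_type is not None and field_type not in ALLOWED_TYPES:
                errors.append(f"{name}.{field.get('key')}: unknown type '{field_type}'")
            # A subschema reference must point at a schema that still exists.
            if field_type == "subschema" and field.get("ref") not in schema_set:
                errors.append(
                    f"{name}.{field.get('key')}: references missing schema '{field.get('ref')}'"
                )
    return errors


print(validate(schemas))

# Simulate removing a schema that is still referenced from Root.
without_details = {k: v for k, v in schemas.items() if k != "Details"}
print(validate(without_details))
```

Returning every problem at once, rather than stopping at the first, makes it easier to correct a batch of schema updates in a single pass.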
Output Structure
Each generated schema results in two files: an Excel (.xlsx) file and a JSON file.
The JSON file is the canonical editable representation. The Excel file is the exported Guardian-compatible snapshot.
Recommended Workflow
Search methodology
Extract structure
Draft root schema
Generate Excel file
Extend with subschemas
Extract formulas
Create calculation schemas
Refine iteratively
What Success Looks Like
You have successfully completed this stage when:
Root schema exists
Subschemas mirror document structure
Formulas are interpreted correctly
Inputs and calculated outputs are structured
Excel files are valid and Guardian-compatible
At this point, methodology understanding has been transformed into structured, machine-readable schema artifacts.
What Comes Next
The final step in the open-source workflow is:
Schema Ingestion & Mapping (Transformation)
This stage enables:
Matching external JSON inputs to Guardian schema fields
Structured mapping logic
Controlled transformation pipelines