Schema and Formula Generation

This guide documents how to move from semantic search to structured Guardian schema generation using the Hedera Guardian AI Toolkit.

Overview

At this stage:

Documents have already been ingested
Semantic search is working
The MCP server is connected to your AI client

You will now:

Design schemas using natural language
Generate Guardian-compatible Excel schema files
Extract formulas from methodologies
Translate formulas into structured schema components
Iteratively refine schema structures

Prerequisites

Before generating schemas, ensure:

The MCP server is running
Document ingestion is complete
Your AI client supports MCP
The File System extension (if using Claude Desktop) is enabled
The output directory is allowed for file access

Schema files are written to:

data/output/

Each generated schema includes:

An Excel (.xlsx) file
A JSON representation (source of truth)

Step 1 — Configure Your AI Client

For structured schema workflows:

Use a dedicated project workspace
Add persistent instructions to guide tool usage
Ensure the AI uses MCP schema tools instead of generating raw Excel via Python

Best practice instructions include:

Always inspect available tools first
Use search tools before schema generation
Use schema builder tools for Excel creation
Do not manually generate Excel via scripting
Validate schema updates before overwriting files

This ensures controlled and repeatable schema construction.

Step 2 — Design a Root Schema

You may begin with a high-level root schema.

Example natural language prompt:

Create a root schema for a project description.
Include project title (string).
Include project hub account ID.
Include certification type dropdown with options VCS and CCP.
Add placeholder subschemas for future expansion.

The AI will:

Draft schema structure in JSON
Suggest field types
Define enum options
Propose visibility logic (if applicable)

You can refine field types and requirements before generating the file.
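As a rough illustration, the draft proposed for the prompt above might resemble the structure below, written as a Python dictionary so it can be inspected before any file is generated. The key names (`key`, `type`, `required`, `options`, `ref`), the field identifiers, the string type for the account ID, and the required flags are all assumptions made for this sketch; they are not the toolkit's actual JSON layout.

```python
import json

# Hypothetical draft of the root schema from the example prompt.
# Key names, field identifiers, and required flags are illustrative
# assumptions, not the toolkit's actual JSON format.
root_schema_draft = {
    "name": "Project Description",
    "fields": [
        {"key": "project_title", "type": "string", "required": True},
        {"key": "project_hub_account_id", "type": "string", "required": True},
        {
            "key": "certification_type",
            "type": "enum",
            "required": True,
            "options": ["VCS", "CCP"],
        },
        # Placeholder subschema reserved for later expansion.
        {
            "key": "placeholder_details",
            "type": "subschema",
            "ref": "Placeholder Details",
            "required": False,
        },
    ],
}


def summarize(schema: dict) -> list[str]:
    """Return one review line per field: key, type, and required/optional."""
    lines = []
    for field in schema["fields"]:
        flag = "required" if field["required"] else "optional"
        lines.append(f"{field['key']}: {field['type']} ({flag})")
    return lines


if __name__ == "__main__":
    print(json.dumps(root_schema_draft, indent=2))
    print("\n".join(summarize(root_schema_draft)))
```

A one-line-per-field summary like this is a convenient form for reviewing types and requirements before approving the structure.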

Step 3 — Generate the Excel Schema File

Once the structure is approved:

The AI calls the schema builder MCP tool
JSON schema metadata is converted into Excel format
Validation is applied
The file is written to disk

The generated file includes:

Field definitions
Data types
Enum sheets
Visibility conditions
Subschema references

Both the Excel and JSON versions are stored in data/output/.

Step 4 — Extract Structure from Methodology Documents

To extend schemas properly, first analyze the methodology structure.

Ask the AI to:

Inspect section headings
Extract table of contents
Identify subsection hierarchy
Discover required vs optional fields

The AI will:

Explore metadata fields
Use filtered semantic search
Identify section names and structure
Propose corresponding subschemas

This allows schema structure to mirror document structure.
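The idea of mirroring document structure can be sketched as follows. The headings and their dotted numbering are hypothetical sample data, and the functions are illustrative helpers rather than part of the toolkit; in practice the AI would obtain the headings through the search tools.

```python
import re

# Hypothetical section headings, as they might be recovered from
# document metadata. The dotted numbering style is an assumption.
headings = [
    "1 Project Details",
    "1.1 Project Title",
    "1.2 Project Location",
    "2 Baseline Scenario",
    "2.1 Baseline Emissions",
]


def build_outline(lines: list[str]) -> dict:
    """Nest numbered headings by their dotted section number."""
    outline: dict = {}
    for line in lines:
        match = re.match(r"^(\d+(?:\.\d+)*)\s+(.+)$", line)
        if not match:
            continue  # skip lines without a section number
        number, title = match.groups()
        parts = number.split(".")
        node = outline
        # Walk down through parent sections, creating empty ones if absent.
        for depth in range(1, len(parts)):
            parent = ".".join(parts[:depth])
            node = node.setdefault(parent, {"title": "", "children": {}})["children"]
        node[number] = {"title": title, "children": {}}
    return outline


def propose_subschemas(outline: dict) -> dict[str, list[str]]:
    """Map each top-level section to a candidate subschema,
    with one candidate field per direct subsection."""
    return {
        section["title"]: [c["title"] for c in section["children"].values()]
        for section in outline.values()
    }


if __name__ == "__main__":
    print(propose_subschemas(build_outline(headings)))
```

The result is only a proposal: each top-level section becomes a candidate subschema that you review before anything is written to disk.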

Step 5 — Create Subschemas

Subschemas can be created incrementally.

Best practice:

Create in small batches
Validate after each update
Avoid large multi-schema generation in one step

This reduces hallucination risk and improves schema correctness.

If a placeholder schema is still referenced by another schema, validation blocks its removal and accepts only updates that keep the reference intact.
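The kind of reference check described here can be sketched in a few lines. The schema names and the `schema_refs` mapping are hypothetical, and this is not the toolkit's validator; it only shows why a referenced schema cannot be removed safely.

```python
# Hypothetical in-memory view: each schema mapped to the subschemas
# it references. Names are illustrative only.
schema_refs = {
    "Project Description": ["Project Boundary", "Monitoring Plan"],
    "Project Boundary": [],
    "Monitoring Plan": ["Project Boundary"],
}


def referenced_by(target: str, refs: dict[str, list[str]]) -> list[str]:
    """Return the schemas that still reference `target`, sorted by name."""
    return sorted(name for name, children in refs.items() if target in children)


def can_remove(target: str, refs: dict[str, list[str]]) -> bool:
    """A schema is safe to remove only when nothing references it."""
    return not referenced_by(target, refs)


if __name__ == "__main__":
    for name in schema_refs:
        print(name, "removable:", can_remove(name, schema_refs))
```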

Step 6 — Transform Tables into Structured Fields

Many methodology sections are presented as tables.

Instead of copying table structure directly:

Convert logical rows into structured fields
Use help text fields for grouping
Define risk or category groupings explicitly
Convert qualitative tables into structured input/output fields

This creates machine-readable structure rather than static formatting.
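A minimal sketch of this conversion, assuming a qualitative risk table whose rows carry a category and an item. The table content, the `help_text` and `enum` type names, and the rating options are assumptions for illustration, not values from a specific methodology or from the toolkit.

```python
import re

# Hypothetical rows from a qualitative risk table.
table_rows = [
    {"category": "Internal risk", "item": "Project management"},
    {"category": "Internal risk", "item": "Financial viability"},
    {"category": "External risk", "item": "Land tenure"},
]

# Assumed rating scale offered for every row.
RATING_OPTIONS = ["Low", "Medium", "High"]


def slugify(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def rows_to_fields(rows: list[dict]) -> list[dict]:
    """Emit one help-text field per category, followed by one
    rating field per table row in that category."""
    fields = []
    seen_groups = set()
    for row in rows:
        group = row["category"]
        if group not in seen_groups:
            seen_groups.add(group)
            fields.append({"key": slugify(group), "type": "help_text", "text": group})
        fields.append(
            {
                "key": slugify(row["item"]) + "_rating",
                "type": "enum",
                "options": RATING_OPTIONS,
                "group": group,
            }
        )
    return fields


if __name__ == "__main__":
    for field in rows_to_fields(table_rows):
        print(field)
```

Each logical row becomes its own input field, while the help-text entries preserve the grouping that the table expressed visually.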

Step 7 — Extract and Interpret Formulas

You can ask:

How is net emission reduction calculated?
What formulas are involved?
Explain each variable.
Propose a schema capturing all inputs and outputs.

The AI will:

Perform targeted semantic search
Filter by document name
Retrieve LaTeX-converted formulas
Identify dependencies
Break formulas into calculation chains
Propose input and computed fields

Formula extraction includes:

Root equations
Intermediate equations
Parameter definitions
Dependency structure

You can then:

Design schemas for calculated parameters
Separate user inputs from computed outputs
Build multi-schema dependency structures
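Separating inputs from computed outputs and ordering a calculation chain can be sketched as below. The two formulas are a simplified, illustrative chain (baseline emissions from activity data and an emission factor, then net reductions after project emissions and leakage); they are not taken from any specific methodology, and the symbol names are assumptions. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Illustrative calculation chain (not from a specific methodology):
#   BE = AD * EF          baseline emissions
#   ER = BE - PE - LE     net emission reductions
formulas = {
    "BE": {"inputs": ["AD", "EF"], "compute": lambda v: v["AD"] * v["EF"]},
    "ER": {
        "inputs": ["BE", "PE", "LE"],
        "compute": lambda v: v["BE"] - v["PE"] - v["LE"],
    },
}


def split_fields(formulas: dict) -> tuple[list[str], list[str]]:
    """Return (user inputs, computed outputs) as sorted lists of names."""
    computed = set(formulas)
    referenced = {name for spec in formulas.values() for name in spec["inputs"]}
    return sorted(referenced - computed), sorted(computed)


def evaluate(formulas: dict, user_inputs: dict) -> dict:
    """Evaluate every formula after the values it depends on."""
    graph = {name: set(spec["inputs"]) for name, spec in formulas.items()}
    values = dict(user_inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in formulas:  # plain inputs are already in `values`
            values[name] = formulas[name]["compute"](values)
    return values


if __name__ == "__main__":
    print(split_fields(formulas))
    print(evaluate(formulas, {"AD": 100.0, "EF": 2.0, "PE": 50.0, "LE": 10.0}))
```

The input list maps naturally onto user-entered schema fields, and the computed list onto calculated fields; the topological order shows which intermediate results must exist before the root equation can be evaluated.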

Step 8 — Iterative Refinement

You remain the domain expert.

The AI assists by:

Drafting structure
Translating formulas
Suggesting schema layouts
Maintaining validation rules

You review and refine before finalizing.

Generated schema files can be:

Extended
Edited
Connected via subschema references
Updated safely through MCP tools

Validation & Safety

Schema updates are validated automatically.

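An update is rejected if any of the following occurs:

A referenced schema is removed
A required field is missing
A type mismatch occurs

In each case the tool returns a validation error, and the issue must be corrected before the update is applied.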

This ensures structural integrity of Guardian-compatible files.
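To make the three failure cases concrete, here is a minimal sketch of such checks over an in-memory schema set. The dictionary layout (`required`, `fields`, `type`, `ref`, `default`) and the error wording are assumptions for illustration; the toolkit's own validator and its messages may differ.

```python
# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def validate(schemas: dict) -> list[str]:
    """Collect errors for missing required fields, dangling subschema
    references, and defaults whose type does not match the field type."""
    errors = []
    for name, schema in schemas.items():
        fields = schema["fields"]
        for key in schema.get("required", []):
            if key not in fields:
                errors.append(f"{name}: required field '{key}' is missing")
        for key, spec in fields.items():
            if spec["type"] == "subschema" and spec.get("ref") not in schemas:
                errors.append(
                    f"{name}.{key}: referenced schema '{spec.get('ref')}' does not exist"
                )
            expected = PYTHON_TYPES.get(spec["type"])
            if expected and "default" in spec:
                value = spec["default"]
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(value, bool) or not isinstance(value, expected):
                    errors.append(
                        f"{name}.{key}: default {value!r} is not of type {spec['type']}"
                    )
    return errors


# Hypothetical schema set containing one instance of each problem.
broken = {
    "Project Description": {
        "required": ["project_title"],
        "fields": {
            "area_ha": {"type": "number", "default": "ten"},
            "boundary": {"type": "subschema", "ref": "Project Boundary"},
        },
    }
}

if __name__ == "__main__":
    for error in validate(broken):
        print(error)
```

Returning a list of errors rather than raising on the first one lets all problems in an update be reported and fixed together.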

Output Structure

Each generated schema results in:

data/output/
├── project_description_schema.xlsx
└── project_description_schema.json

The JSON file is the canonical editable representation. The Excel file is the exported Guardian-compatible snapshot.
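Because each schema is expected to produce both files, a quick consistency check of the output directory can be useful. The sketch below assumes only that the two files of a pair share the same base name, as in the listing above; the helper name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(output_dir: Path) -> dict[str, list[str]]:
    """Report schema base names that have only one of the two expected files."""
    xlsx = {p.stem for p in output_dir.glob("*.xlsx")}
    json_names = {p.stem for p in output_dir.glob("*.json")}
    return {
        "missing_json": sorted(xlsx - json_names),
        "missing_xlsx": sorted(json_names - xlsx),
    }


if __name__ == "__main__":
    output = Path("data/output")
    if output.is_dir():
        print(find_unpaired(output))
    else:
        print(f"{output} does not exist yet")
```

An Excel file without a matching JSON file is worth investigating, since the JSON file is the representation that should be edited.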

Recommended Workflow

Search methodology
Extract structure
Draft root schema
Generate Excel file
Extend with subschemas
Extract formulas
Create calculation schemas
Refine iteratively

What Success Looks Like

You have successfully completed this stage when:

A root schema exists
Subschemas mirror document structure
Formulas are interpreted correctly
Inputs and calculated outputs are structured
Excel files are valid and Guardian-compatible

At this point, methodology understanding has been transformed into structured, machine-readable schema artifacts.

What Comes Next

The final step in the open-source workflow is:

Schema Ingestion & Mapping (Transformation)

This stage enables:

Matching external JSON inputs to Guardian schema fields
Structured mapping logic
Controlled transformation pipelines
