From Natural Language to Robot: How Kindly IDE Works Under the Hood
"Make me a 4-DOF robotic arm that can pick up objects from a conveyor belt." Type that into Kindly IDE and you get a complete robot design in about 30 seconds. Here's what happens in those 30 seconds.
The Pipeline
The generation pipeline has four stages: interpretation, structure generation, physics validation, and refinement. Each stage can use a different LLM backend—Kindly IDE supports OpenAI (GPT-4o), Anthropic (Claude), and Google (Gemini) out of the box.
Stage 1: Interpretation
Your natural language prompt is sent to the LLM along with a system prompt that defines what a valid robot structure looks like. The model extracts the number and type of joints, kinematic topology, dimensions, end-effector type, and mounting configuration.
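Kindly IDE's internal field names aren't public, but the interpreted requirements for the conveyor-belt prompt from the intro might look something like this sketch (every field name here is an illustrative assumption, not the actual schema):

```typescript
// Illustrative sketch of Stage 1 output -- field names are assumptions,
// not Kindly IDE's actual internal schema.
interface InterpretedRequirements {
  jointCount: number;
  jointTypes: ("revolute" | "prismatic" | "continuous" | "fixed")[];
  topology: "serial" | "parallel"; // kinematic topology
  reachMeters?: number;            // overall dimensions, if the prompt gives any
  endEffector: "gripper" | "suction" | "tool" | "none";
  mounting: "floor" | "table" | "ceiling" | "wall";
}

// "Make me a 4-DOF robotic arm that can pick up objects from a
// conveyor belt" might interpret to:
const pickAndPlaceArm: InterpretedRequirements = {
  jointCount: 4,
  jointTypes: ["revolute", "revolute", "revolute", "revolute"],
  topology: "serial",
  endEffector: "gripper",
  mounting: "table",
};
```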
Stage 2: Structure Generation
The interpreted requirements are fed into a second LLM call with structured output enforced via a Zod schema. The model must return a JSON object matching the RobotStructure schema exactly—no free-form text, no hallucinated fields.
{
name: string,
links: [{ name, visual: { geometry }, collision: { geometry }, inertial: { mass, inertia } }],
joints: [{ name, type, parent, child, origin, axis, limits }]
}
Structured output means the response is guaranteed to parse. No regex extraction, no "please format your response as JSON" prayers.
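The real pipeline enforces this with Zod; the dependency-free sketch below just mirrors the same shape in plain TypeScript to show why parsing becomes trivial (the inertia layout and the optional `limits` on fixed joints are assumptions):

```typescript
// Dependency-free sketch mirroring the RobotStructure shape above.
// Kindly IDE enforces this with a Zod schema; these interfaces are
// for illustration only.
interface Vec3 { x: number; y: number; z: number }

interface Link {
  name: string;
  visual: { geometry: object };
  collision: { geometry: object };
  inertial: { mass: number; inertia: number[] }; // assumed order: ixx, iyy, izz, ixy, ixz, iyz
}

interface Joint {
  name: string;
  type: "revolute" | "prismatic" | "continuous" | "fixed";
  parent: string; // parent link name
  child: string;  // child link name
  origin: { xyz: Vec3; rpy: Vec3 };
  axis: Vec3;
  limits?: { lower: number; upper: number }; // assumed optional for fixed joints
}

interface RobotStructure {
  name: string;
  links: Link[];
  joints: Joint[];
}

// Because structured output guarantees the response matches the schema,
// "parsing" is JSON.parse plus a sanity check -- no regex extraction.
function parseRobot(json: string): RobotStructure {
  const obj = JSON.parse(json) as RobotStructure;
  if (!obj.name || !Array.isArray(obj.links) || !Array.isArray(obj.joints)) {
    throw new Error("response does not match RobotStructure schema");
  }
  return obj;
}
```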
Stage 3: Physics Validation
The generated structure passes through the physics linter—this is deterministic code, not another LLM call. It checks that inertia tensors are positive-definite, that principal moments of inertia satisfy the triangle inequality, that masses are consistent across links, that joint limits are valid, that the kinematic tree is well-formed, and that the robot is free of self-collisions at its default and joint-limit poses.
Issues are auto-corrected where possible (e.g., swapping reversed joint limits) and flagged as warnings where human judgment is needed.
Stage 4: Iterative Refinement
After the initial generation, you can refine the robot through conversation: "Make the base link heavier for stability," or "Add a camera mount to the end effector." Each refinement runs through the same pipeline, editing the existing validated structure rather than starting from scratch.
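The refinement loop can be sketched as follows (function names and signatures are assumptions, not Kindly IDE's API; the key point is that the current structure travels with every request, and the linter re-runs on every edit):

```typescript
// Illustrative refinement loop -- names are assumptions, not Kindly IDE's API.
type Robot = { name: string; links: object[]; joints: object[] };

function refine(
  current: Robot,                                     // the existing valid structure
  request: string,                                    // e.g. "make the base link heavier"
  editWithLLM: (prompt: string, ctx: Robot) => Robot, // Stage 2, seeded with context
  lint: (r: Robot) => Robot,                          // Stage 3: deterministic linter
): Robot {
  const edited = editWithLLM(request, current); // model edits, not regenerates
  return lint(edited);                          // every refinement is re-validated
}
```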
Why Multi-Model Support Matters
Different LLMs have different strengths. GPT-4o produces the most creative configurations, Claude generates the most physically realistic inertial properties, and Gemini is fastest for simple robots. You can switch models mid-session—the structured output schema ensures compatibility.
See It in Action
Try the full generation pipeline in your browser or download the desktop IDE.