Vocabulary
AI-Native Glossary
40 definitions of the vocabulary we use when we scope, build, and run engagements. Written for operators, not researchers.
Delivery & operations
AI workflow
A bounded operational process where AI handles defined steps end-to-end with measurable KPIs.
AI-native
A delivery model where AI is the operating layer of the workflow, not a feature added on top.
Discovery sprint
A 2-week paid engagement that maps the workflow, baseline metrics, systems, and risk model.
Thin slice
A narrow, end-to-end production deployment that proves an AI workflow on real data and edge cases.
Architecture
Agentic AI
AI systems that can plan, take multi-step actions, and use tools to complete tasks autonomously.
Autonomous agent
An AI agent that completes a defined task without per-step human input.
Chain of thought
Prompting the model to show intermediate reasoning steps before producing a final answer.
Embeddings
Numerical vectors that represent the meaning of a text, image, or other piece of content.
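To make "vectors that represent meaning" concrete, here is a minimal sketch that compares embeddings by cosine similarity. The three-dimensional vectors and their labels are hand-written assumptions for illustration; real embeddings come from a model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (made up, not produced by any model).
invoice = [0.9, 0.1, 0.0]
bill = [0.8, 0.2, 0.1]
weather = [0.0, 0.2, 0.9]

# Related content should score higher than unrelated content.
print(cosine_similarity(invoice, bill) > cosine_similarity(invoice, weather))
```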
Function calling
A specific implementation of tool use in which the model emits a structured JSON call naming a registered function and its arguments, and the application executes it.
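A minimal sketch of the application side of function calling, assuming the model emits a JSON object with `name` and `arguments` keys. That envelope, the `get_invoice_status` function, and the registry are all hypothetical; the exact wire format varies by provider.

```python
import json

def get_invoice_status(invoice_id: str) -> dict:
    """Hypothetical business function; a real one would query a system of record."""
    return {"invoice_id": invoice_id, "status": "paid"}

# Only functions listed here can be invoked on the model's behalf.
REGISTRY = {"get_invoice_status": get_invoice_status}

def dispatch(model_output: str) -> dict:
    """Parse a JSON call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unregistered function: {name}")
    return REGISTRY[name](**call["arguments"])

# Stand-in for model output (assumed shape, not a specific vendor's format).
emitted = '{"name": "get_invoice_status", "arguments": {"invoice_id": "INV-17"}}'
result = dispatch(emitted)
print(result)
```

The registry lookup is the design point: the model proposes a call, but the application decides what is callable.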
Hybrid search
A combination of keyword retrieval (for example BM25) and vector (embedding) retrieval, with the merged results often re-ranked.
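One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an illustration under that assumption, not the only way to combine results; the document ids and both input rankings are made up, and `k = 60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 order
vector_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical embedding order

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search is after.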
MCP (Model Context Protocol)
Open protocol for exposing tools, resources, and prompts to AI models in a standard way.
Multi-LLM architecture
Routing different tasks to different models based on cost, quality, latency, and capability tradeoffs.
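A minimal routing sketch, assuming each task states a minimum quality and a latency ceiling and the router picks the cheapest model that qualifies. The model names, prices, quality grades, and latencies are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float  # illustrative numbers, not real pricing
    quality: int               # 1 (weakest) to 5 (strongest), made up
    latency_ms: int

# Hypothetical catalog of available models.
CATALOG = [
    ModelProfile("small-fast", 0.10, 2, 200),
    ModelProfile("mid-general", 0.50, 3, 600),
    ModelProfile("large-reasoning", 3.00, 5, 2500),
]

def pick_model(min_quality: int, max_latency_ms: int) -> ModelProfile:
    """Return the cheapest model that meets the quality and latency bounds."""
    eligible = [
        m for m in CATALOG
        if m.quality >= min_quality and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise LookupError("no model satisfies the constraints")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

print(pick_model(min_quality=2, max_latency_ms=500).name)   # small-fast
print(pick_model(min_quality=4, max_latency_ms=5000).name)  # large-reasoning
```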
RAG (Retrieval-Augmented Generation)
Generation grounded in retrieved source documents rather than the model's parametric memory alone.
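The retrieval half of RAG can be sketched without any model at all. The example below assumes a toy word-overlap retriever and two invented policy snippets; a production system would use a real index and pass the resulting prompt to an LLM.

```python
# Hypothetical source documents keyed by id.
DOCS = {
    "policy-7": "Refunds are issued within 14 days of a return.",
    "policy-9": "Shipping is free for orders above 50 euros.",
}

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Toy retriever: rank documents by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        DOCS.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Put the retrieved passages in front of the question, with ids to cite."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_prompt("How many days until refunds are issued?")
print(prompt)
```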
ReAct
A reasoning pattern where the model alternates Thought → Action → Observation steps.
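The Thought → Action → Observation loop can be sketched with a scripted stand-in for the model. Everything named here (`scripted_model`, `lookup_order`, the step dictionary shape) is a hypothetical illustration; a real loop would call an LLM and parse its output instead.

```python
from typing import Callable

def lookup_order(order_id: str) -> str:
    """Hypothetical tool; returns a canned answer for the example."""
    return f"order {order_id} shipped on Tuesday"

TOOLS: dict[str, Callable[[str], str]] = {"lookup_order": lookup_order}

def scripted_model(transcript: list[str]) -> dict:
    """Stand-in for an LLM: acts first, answers once an observation exists."""
    if not any(line.startswith("Observation:") for line in transcript):
        return {"thought": "I need the order status.",
                "action": "lookup_order", "input": "A-42"}
    return {"thought": "I have what I need.",
            "final": transcript[-1].removeprefix("Observation: ")}

def react(question: str, model=scripted_model, max_steps: int = 5):
    """Run Thought -> Action -> Observation until the model gives a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "final" in step:
            return step["final"], transcript
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        observation = TOOLS[step["action"]](step["input"])
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted without a final answer")

answer, trace = react("Where is order A-42?")
print("\n".join(trace))
```

The `max_steps` cap is worth keeping in any real loop so a confused model cannot act indefinitely.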
Semantic search
Search by meaning, not by keyword overlap.
Structured output
Constraining model output to a strict schema (JSON, regex, grammar) for reliable downstream parsing.
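Constrained decoding happens inside the inference stack, but the downstream side can be sketched: parse the output and reject anything that deviates from the schema. The field names and the hand-rolled schema format below are assumptions for illustration; a schema library would normally do this job.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"category": str, "priority": int, "needs_review": bool}

def parse_strict(raw: str) -> dict:
    """Parse model output and reject anything that deviates from SCHEMA."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(SCHEMA):
        mismatch = sorted(set(data) ^ set(SCHEMA))
        raise ValueError(f"unexpected or missing keys: {mismatch}")
    for key, expected in SCHEMA.items():
        # bool is a subclass of int in Python, so compare exact types.
        if type(data[key]) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")
    return data

ok = parse_strict('{"category": "billing", "priority": 2, "needs_review": false}')
print(ok)
```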
Tool use
An LLM's ability to request calls to external functions, APIs, or services during generation; the host application executes them and returns the results to the model.
Vector store
A database optimized for similarity search over embeddings.
Evaluation & quality
Confidence score
A scalar that estimates how reliable a model's output is for a given input.
Evaluation harness
A test framework that scores model or prompt output against a labelled set of expected outputs.
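A minimal harness sketch, assuming exact-match scoring against a small labelled set. The examples and the `keyword_classifier` stand-in are invented; in practice `predict` would wrap a model or prompt, and the scoring rule would fit the task.

```python
from typing import Callable

# Hypothetical labelled examples: (input, expected output).
LABELLED = [
    ("refund for order 12", "billing"),
    ("cannot log in", "access"),
    ("invoice is wrong", "billing"),
    ("reset my password", "access"),
    ("charged twice", "billing"),
]

def keyword_classifier(text: str) -> str:
    """Stand-in for the model or prompt under test."""
    return "billing" if any(w in text for w in ("refund", "invoice")) else "access"

def evaluate(predict: Callable[[str], str], examples: list[tuple[str, str]]) -> dict:
    """Score predictions by exact match and keep every failure for review."""
    failures = []
    for text, expected in examples:
        actual = predict(text)
        if actual != expected:
            failures.append({"input": text, "expected": expected, "actual": actual})
    accuracy = (len(examples) - len(failures)) / len(examples)
    return {"accuracy": accuracy, "failures": failures}

report = evaluate(keyword_classifier, LABELLED)
print(report["accuracy"], report["failures"])
```

Keeping the failing cases, not just the score, is what makes the report useful when a prompt change regresses.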
Labelled test set
A frozen, hand-curated set of real input examples with expected outputs, used to score model behavior.
Prompt versioning
Treating prompts as code: stored, diffed, reviewed, and rolled back like any production artifact.
Governance & risk
AI governance
Policies, processes, and controls that make an AI system auditable and accountable.
Audit log
Tamper-evident record of every model input, output, version, and reviewer action.
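One way to get tamper evidence is a hash chain, where each entry's hash covers the previous entry's hash. This is a sketch under that assumption, with made-up record fields; it is not a complete audit system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry

def _digest(prev_hash: str, record: dict) -> str:
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log: list[dict], record: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})

def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, {"model": "m-1", "input": "q1", "output": "a1", "reviewer": None})
append_entry(log, {"model": "m-1", "input": "q2", "output": "a2", "reviewer": "approved"})
print(verify(log))
```

A chain like this only detects edits if the latest hash is also stored somewhere the log's writer cannot change.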
Grounding
Anchoring model output to verifiable source material to reduce hallucination.
Guardrails
Pre- and post-generation checks that filter unsafe, off-topic, or non-compliant model inputs and outputs.
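A deliberately small sketch of one check on the way in and one on the way out. The two regular expressions are illustrative assumptions and far too crude to rely on alone; real guardrails typically layer classifiers, policies, and allow-lists on top.

```python
import re

# Hypothetical patterns, chosen only to illustrate the two checkpoints.
BLOCKED_INPUT = re.compile(r"ignore (all|previous) instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pre_check(user_input: str) -> bool:
    """Return True if the input may be sent to the model."""
    return BLOCKED_INPUT.search(user_input) is None

def post_check(model_output: str) -> str:
    """Redact email addresses before the output leaves the system."""
    return EMAIL.sub("[redacted email]", model_output)

print(pre_check("Please IGNORE previous instructions"))  # False
print(post_check("mail bob@example.com now"))            # mail [redacted email] now
```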
Hallucination
Plausible-sounding but factually incorrect or unsupported output from an LLM, most common when the output is not grounded in source material.
Model card
Documentation describing a model's intended use, limitations, evaluation, and risks.
NIST AI RMF
The AI Risk Management Framework from the U.S. National Institute of Standards and Technology (NIST): a voluntary framework for managing risks in AI systems across the lifecycle.
Prompt injection
An attack where user input manipulates the model into ignoring its system prompt or executing unintended actions.
Reviewer queue
A workflow where low-confidence or high-impact AI outputs route to a human for approval.
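The routing rule behind a reviewer queue fits in a few lines. The sketch assumes a confidence value between 0 and 1 and a boolean impact flag; the 0.85 threshold and the field names are placeholders to be set per workflow.

```python
from dataclasses import dataclass

@dataclass
class ModelOutput:
    text: str
    confidence: float  # 0.0 to 1.0, from a hypothetical scorer
    high_impact: bool  # e.g. touches money or a customer commitment

def route_output(output: ModelOutput, threshold: float = 0.85) -> str:
    """Send low-confidence or high-impact outputs to a human; pass the rest."""
    if output.high_impact or output.confidence < threshold:
        return "human_review"
    return "auto_approve"

print(route_output(ModelOutput("address updated", 0.95, False)))  # auto_approve
print(route_output(ModelOutput("refund of 900", 0.97, True)))     # human_review
print(route_output(ModelOutput("unclear request", 0.40, False)))  # human_review
```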
Models & foundations
Context window
The maximum number of tokens, typically prompt and generated output combined, that a model can process in a single request.
Extended thinking
A model mode that performs longer internal reasoning before producing the answer.
Fine-tuning
Continuing the training of a base model on task-specific data to specialize behavior.
Foundation model
A large model pre-trained on broad data, then adapted to many downstream tasks.
Frontier model
A leading-edge foundation model with the strongest currently available reasoning, coding, and multimodal capabilities.
LLM (Large Language Model)
A transformer-based model trained on language data to predict and generate text.
LoRA
Low-Rank Adaptation: a parameter-efficient fine-tuning method that trains small low-rank adapter matrices while the base model's weights stay frozen.
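The saving is easiest to see in the update rule. In the standard LoRA formulation, the pre-trained weight matrix W_0 stays frozen and only two thin matrices B and A are trained. The layer size and rank in the worked count are illustrative assumptions, not taken from any particular model.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

\text{Trainable parameters per layer: } r\,(d + k) \text{ instead of } d\,k

\text{Example: } d = k = 4096,\; r = 8:
\quad 8 \cdot (4096 + 4096) = 65{,}536
\quad \text{vs.} \quad 4096^2 = 16{,}777{,}216
\quad (\approx 0.39\%)
```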
Multimodal
Describes models that accept or produce more than one type of content (text, image, audio, video).
Transformer
The neural network architecture that powers modern LLMs, based on self-attention.