Below is a concise, up‑to‑date field guide to the BAML (BoundaryML) ecosystem—what it is, what’s new, what to use, and practical techniques/tips that work right now.


TL;DR — what’s new (mid‑2025)

  • Fast release cadence: Current release line is 0.206.x (Aug 28, 2025) with recent fixes for the VS Code playground & Jinja media types. (GitHub)
  • IDE coverage: In addition to VS Code, there are JetBrains and Zed editor extensions; JetBrains has frequent updates and is on the official marketplace. (JetBrains Marketplace)
  • Boundary Studio v2 (alpha): New analytics/observability iteration announced in the releases; Studio remains the commercial, hosted part of the stack. (GitHub, Y Combinator)
  • Workflows (Tech Preview): Compose multi‑model, multi‑step pipelines directly in BAML (with fetch_value, expressions, etc.). Marked experimental (don’t use in prod yet). (BAML)
  • “Semantic streaming” matured: stream partial but valid JSON chunks aligned to your schema (not token-by-token). (Boundary Documentation, BAML)
  • Providers & runtimes: Wide provider support (OpenAI, Anthropic/Gemini/Vertex, Azure OpenAI, Groq, Cerebras, OpenRouter, Together, vLLM, LM Studio, etc.) via built‑in providers and the openai-generic adapter. (Boundary Documentation)

What BAML is (mental model)

BAML is a small language + compiler for type‑safe LLM functions. You define types (class, enum, unions) and “functions” that specify model, prompt (Jinja), and typed output. The compiler generates strongly‑typed clients for Python/TS/Go/Ruby, with first‑class testing, streaming, and observability. (Boundary Documentation)

Core flow

  1) Write .baml types & functions → 2) test inside the IDE playground → 3) run baml-cli generate to emit a baml_client in your language → 4) call the generated functions in app code. (BAML, Boundary Documentation)
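
As a sketch of steps 1 and 3, a minimal .baml file might look like the following (the class, client, and function names here are illustrative, not taken from the docs):

```baml
// invoice.baml — a typed output schema plus one LLM function.

class Invoice {
  vendor string
  total float
  line_items string[]
}

// Illustrative client; any supported provider works.
client<llm> GPT4o {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

function ExtractInvoice(text: string) -> Invoice {
  client GPT4o
  prompt #"
    Extract the invoice details from the text below.

    {{ ctx.output_format }}

    {{ _.role("user") }} {{ text }}
  "#
}
```

After baml-cli generate, the Python client exposes this as b.ExtractInvoice(text), returning a typed Invoice.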

The ecosystem at a glance

Language & compiler

  • Typed schemas (class, enum, unions), Jinja prompts (use {{ ctx.output_format }} to inject schema/output instructions), tests in BAML, and streaming semantics. (Boundary Documentation)

CLI & codegen

  • The baml-cli commands (init | generate | test | dev | fmt) generate typed SDKs for Python, TypeScript, Go, and Ruby, plus a REST/OpenAPI dev server for “any language” clients. The TS generator supports ESM. (Boundary Documentation)

Editors / playground

  • VS Code, JetBrains, and Zed extensions with an integrated playground for instant tests, prompt previews, and raw cURL visibility. (Visual Studio Marketplace, JetBrains Marketplace)

Framework integrations

  • React/Next.js plugin generates server actions & React hooks (non‑streaming & streaming) for your BAML functions. (Boundary Documentation)

Observability

  • Local Collector to capture raw requests/responses/usage/timings. Hosted Boundary Studio for tracing, labeling, metrics, and post‑deploy evaluation. (Boundary Documentation, BAML)

Providers

  • Native providers for OpenAI, Google Gemini, Azure OpenAI, etc., plus openai-generic for OpenAI‑compatible APIs (Groq, LM Studio, OpenRouter, Together, Hugging Face, vLLM, Tinfoil, Cerebras). Fallback/round‑robin strategies included. (Boundary Documentation)

Techniques & patterns that work well

1) Schema‑Aligned Parsing (SAP) for structured output

BAML’s parser accepts “sloppy” model output and repairs it against your schema; in Boundary’s write‑ups this approach beats API‑native structured outputs and grammar‑constrained decoding on BFCL metrics. To use it, simply define your types and include {{ ctx.output_format }} in the prompt; no provider‑specific modes are required. (BAML)

Why you care: works with more models, coexists with Chain‑of‑Thought in prompts, and avoids JSON‑mode constraints. (BAML)

2) Tool use ≈ structured outputs

Model “tools” as classes and the model’s choice as a typed union. Disambiguate similar tools with string‑literal sentinel fields (e.g., tool_name "get_weather"). You then switch on the returned type in your app. (Boundary Documentation)
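
A minimal sketch of this pattern (the tool shapes and the GPT4o client are assumptions for illustration):

```baml
// Each tool is a class; a string-literal field disambiguates them.
class GetWeather {
  tool_name "get_weather"
  city string
}

class SendEmail {
  tool_name "send_email"
  to string
  body string
}

// The model's choice comes back as a typed union.
function ChooseTool(request: string) -> GetWeather | SendEmail {
  client GPT4o  // assumed defined elsewhere
  prompt #"
    Pick the right tool for the user's request.

    {{ ctx.output_format }}

    {{ _.role("user") }} {{ request }}
  "#
}
```

In app code you then switch (match/isinstance) on the returned type to dispatch the actual call.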

3) Semantic streaming (not token streaming)

Annotate output types to stream semantically valid partials (e.g., @stream.not_null). Your app receives typed partials and a typed final. Much easier for UI than token streams. (Boundary Documentation)
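
For example, using the documented streaming attributes (the field names are illustrative):

```baml
class BlogPost {
  // Never appears as null in partials; waits until a value exists.
  title string @stream.not_null
  // Only included once the whole value has finished streaming.
  status string @stream.done
  body string
}
```

The generated clients then expose a streaming variant (e.g., b.stream.YourFn(...) in Python) that yields typed partials and a typed final result.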

4) Dynamic types at runtime

When categories/fields are not known at compile time, mark types/enums @@dynamic and extend at runtime using the TypeBuilder (Python/TS/Go/Ruby). Great for “user‑defined schema” or “DB‑backed categories.” (Boundary Documentation)
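
A sketch of the BAML side (the runtime side then extends these through the generated client's TypeBuilder):

```baml
enum Category {
  Refund
  Shipping
  // Variants can be added at runtime via TypeBuilder.
  @@dynamic
}

class Ticket {
  summary string
  category Category
  // Fields can be added at runtime too.
  @@dynamic
}
```

In Python this pairs with something like tb = TypeBuilder(); tb.Category.add_value("Billing") passed alongside the call; check the TypeBuilder docs for your language's exact API.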

5) Tests, checks & asserts in BAML

Keep evals close to the prompt: write `test { @@check …; @@assert … }` right next to your function and run in the IDE or CI. Checks don’t throw; asserts do (and can remove invalid container items). (Boundary Documentation)
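
A sketch of such a test (the function and fields are illustrative; see the docs for the exact @@check/@@assert syntax):

```baml
test ExtractsVendor {
  functions [ExtractInvoice]
  args {
    text "Invoice from ACME Corp, total due: $42.50"
  }
  // Check: recorded as pass/fail, does not fail the run.
  @@check(has_vendor, {{ this.vendor|length > 0 }})
  // Assert: failing this fails the test.
  @@assert({{ this.total > 0 }})
}
```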

6) Reliability strategies built‑in

Configure retry policies, fallback chains (e.g., try Groq Llama → OpenAI), and round‑robin load balancing at the client level. These are part of the BAML client strategies. (Boundary Documentation)
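
These strategies look roughly like this (client/policy names and the Groq model ID are illustrative):

```baml
retry_policy Exponential {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> GroqLlama {
  provider openai-generic
  retry_policy Exponential
  options {
    base_url "https://api.groq.com/openai/v1"
    api_key env.GROQ_API_KEY
    model "llama-3.1-8b-instant"
  }
}

// Try Groq first; fall back to an OpenAI client on failure.
client<llm> Resilient {
  provider fallback
  options {
    strategy [GroqLlama, GPT4o]
  }
}
```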

7) Provider portability without re‑work

Use openai-generic to point to OpenAI‑compatible endpoints (Groq, OpenRouter, Together, LM Studio, Cerebras, vLLM, Tinfoil, etc.)—change base_url, model, and you’re off. (Boundary Documentation)
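
For instance (model names are assumptions; LM Studio serves on port 1234 by default):

```baml
// Same client shape for any OpenAI-compatible host:
// only base_url, model, and the API key change.
client<llm> LocalLMStudio {
  provider openai-generic
  options {
    base_url "http://localhost:1234/v1"
    model "your-local-model"
  }
}

client<llm> ViaOpenRouter {
  provider openai-generic
  options {
    base_url "https://openrouter.ai/api/v1"
    api_key env.OPENROUTER_API_KEY
    model "meta-llama/llama-3.3-70b-instruct"
  }
}
```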

8) Next.js streaming UX

Add the Next.js plugin to auto‑generate server actions + React hooks with streaming support for your functions. This pairs beautifully with semantic streaming. (Boundary Documentation)

9) Prompt hygiene that matters

  • Always include {{ ctx.output_format }} (or a customized version) to imprint the schema & guardrails. (Boundary Documentation)
  • Use @description and aliases on fields to increase extraction accuracy without bloating text. (Boundary Documentation)
  • Prefer enum for classification over free strings; change to dynamic if taxonomy evolves. (Boundary Documentation)
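
Put together, the hygiene points above look roughly like this (fields and values are illustrative):

```baml
enum Sentiment {
  Positive
  Negative
  Mixed @description("Contains both praise and complaints")
  Other @alias("unclear")
}

class Review {
  rating int @description("1-5 stars; infer if not explicit")
  sentiment Sentiment
}
```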

10) Observability from day one

  • Turn on the local Collector to capture raw requests/responses & token usage while iterating. (Boundary Documentation)
  • Wire up Boundary Studio to get pipeline‑level traces/evals in production (Studio is the paid/hosted part). (BAML, Y Combinator)

Notable recent features (mid‑2025 highlights)

  • BAML VM (WIP): groundwork for loops, assignment/bitwise ops—part of the push toward Workflows/agent syntax. (WIP; expect changes.) (Boundary Documentation)
  • JetBrains & Zed IDE support + Boundary Studio v2 alpha shipped in recent releases. (Boundary Documentation)
  • REST API moved out of preview; stream support documented; easier to expose functions to other languages via OpenAPI. (Boundary Documentation)
  • Next.js integration & ESM option in TS generator; Ruby 3.4 and Windows ARM64 (Python) supported. (Boundary Documentation)
  • Provider improvements: Claude via Vertex, OpenAI audio input, and many OpenAI‑compatible endpoints via openai-generic. (Boundary Documentation)

Services, tools & where they fit

  • Open‑source core: language, CLI, IDE extensions, runtime clients. (GitHub)
  • Boundary Studio (hosted): tracing, labeling, evals/metrics after deploy. (Commercial; simple marketing page + YC profile.) (BAML, Y Combinator)
  • Playground in editor (VS Code/JetBrains/Zed) for instant tests & raw cURL visibility. (Visual Studio Marketplace, JetBrains Marketplace)
  • Examples & templates: BAML example repo and interactive examples (RAG, PII scrub, function tools, CoT). (GitHub, Boundary Documentation)

Quick start “recipes”

1) Typed extraction service (Python)

  • pip install baml-py && baml-cli init && baml-cli generate
  • Define class types and a function ExtractX(...) -> MyType whose prompt includes {{ ctx.output_format }}; write a test block; run it in the playground; then from baml_client import b in your FastAPI worker. (Boundary Documentation)

2) Tool‑using agent (any language)

  • Model each tool as a class, have your selection function return a union; switch on the returned type at runtime; optionally stream. (Boundary Documentation)

3) Add streaming UI (Next.js)

  • npm i @boundaryml/baml @boundaryml/baml-nextjs-plugin → configure plugin → baml-cli init → baml-cli generate → use the generated hooks; annotate long fields with streaming attributes. (Boundary Documentation)

4) Keep providers portable

  • Start with openai-generic + base_url for your chosen host (Groq, OpenRouter, Together, LM Studio, Cerebras, Tinfoil, vLLM). Add fallback and round‑robin if you need resiliency. (Boundary Documentation)

Tips & “gotchas”

  • Always test: Put test { @@check …; @@assert … } beside your function; run in CI (baml-cli test). (Boundary Documentation)
  • Don’t rely on JSON mode: SAP lets you keep CoT and still get correct JSON. (BAML)
  • Azure OpenAI needs an explicit api_version or requests fail. (Boundary Documentation)
  • Class syntax: field declarations don’t use a colon (name string, not name: string); inheritance is intentionally unsupported, so prefer composition. (Boundary Documentation)
  • Streaming compatibility varies; set supports_streaming flags per provider if required. (Boundary Documentation)

Learn more (good entry points)

  • Boundary Documentation (docs.boundaryml.com): language reference, provider guides, streaming & testing docs. (Boundary Documentation)
  • GitHub: BoundaryML/baml for source code, releases/changelog, and the examples repo. (GitHub)

Bottom line

If you want reliable structured outputs, tool use without provider lock‑in, typed streaming, and fast inner‑loop iteration (tests + prompt preview + raw cURL inside your IDE), BAML’s current stack is one of the strongest options—and it’s improving quickly. Use the stable parts (language/CLI/SDKs/streaming), try Workflows in a sandbox, and hook up Studio when you need production‑grade telemetry/evals. (Boundary Documentation, BAML)