Introduction
ODIN Catalog is an open-source data catalog built on W3C and OMG standards. It bridges the gap between raw
technical metadata and business understanding — giving data teams a semantic layer, end-to-end lineage, and
AI-powered discovery out of the box.
What makes ODIN different
Most data catalogs stop at documentation. ODIN goes further:
- Semantic vocabulary mappings — every data element can be bound to a concept in FIBO,
schema.org, or your own ontology using SKOS match types.
- Live lineage graph — OpenLineage events and SQL DDL are parsed into an Apache AGE
property graph queryable by Cypher.
- Data product governance — the DPROD standard gives every dataset a business owner,
lifecycle stage, and access policy.
- AI-powered Q&A — a Spring AI RAG pipeline runs over your metadata corpus using Ollama
(local) or OpenAI.
- Zero lock-in — all metadata is exportable as DCAT 3.0 JSON-LD.
Standards at the core
| Standard | Body | Role in ODIN |
|---|---|---|
| DCAT 3.0 | W3C | Catalog, Dataset, Distribution, DataService resources |
| DPROD | OMG | DataProduct, Port, lifecycle, access policy |
| CSV-W | W3C | Physical schema (table, column, datatype) harvested from source systems |
| OpenLineage | Linux Foundation | Job/Run/Dataset lineage events ingested via REST |
| FIBO | EDM Council | Pre-loaded financial ontology vocabulary (FND, FBC, SEC, MD) |
| SKOS | W3C | Mapping properties: exactMatch, closeMatch, relatedMatch, broadMatch, narrowMatch |
Note: ODIN is currently in private alpha. APIs and database schemas may change between
releases. Not recommended for production workloads yet.
Quick Start
Get a full ODIN stack running locally in under five minutes using Docker Compose.
1. Clone and configure
git clone https://github.com/odin-catalog/odin.git
cd odin
cp .env.example .env # review and edit credentials
2. Start the stack
make up
# or: docker compose up -d
# Watch the services become healthy:
docker compose ps
Services start in dependency order. Allow ~60 seconds for Kafka, PostgreSQL, and OpenSearch to initialise
before the Spring Boot services become healthy.
3. Create your first dataset
curl -s -X POST http://localhost:8001/api/v1/datasets \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-H "X-Tenant-Id: 00000000-0000-0000-0000-000000000001" \
-d '{
"title": "Trade Blotter",
"description": "Intraday trade records from the front-office OMS.",
"keywords": ["trading", "blotter", "positions"],
"accrualPeriodicity": "daily"
}' | jq .
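ODIN's services are written in Java 21. The same request can be sketched with the JDK's built-in java.net.http client instead of curl. This is an illustration only, not an official ODIN client. The class name CreateDatasetRequest is hypothetical, and the base URL, dev key, and tenant ID are the local defaults from the curl example above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper mirroring the Quick Start curl call; not part of ODIN. */
class CreateDatasetRequest {
    // Local defaults taken from the curl example.
    static final String BASE_URL = "http://localhost:8001";
    static final String DEV_API_KEY = "dev-local";
    static final String TENANT_ID = "00000000-0000-0000-0000-000000000001";

    /** Builds the POST /api/v1/datasets request without sending it. */
    static HttpRequest build() {
        String body = """
            {
              "title": "Trade Blotter",
              "description": "Intraday trade records from the front-office OMS.",
              "keywords": ["trading", "blotter", "positions"],
              "accrualPeriodicity": "daily"
            }
            """;
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/v1/datasets"))
                .header("Content-Type", "application/json")
                .header("X-API-Key", DEV_API_KEY)
                .header("X-Tenant-Id", TENANT_ID)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request; needs the Docker Compose stack to be running. */
    static String send() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(build(), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

build() needs no running services. Call send() only after docker compose ps shows the stack as healthy.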
4. Open the frontend
| App | URL | Purpose |
|---|---|---|
| Producer (management) | http://localhost:3000 | Publish, govern, harvest |
| Consumer (discovery) | http://localhost:3001 | Search, explore, ask AI |
Note: The dev API key X-API-Key: dev-* (any value starting with dev-) grants full
catalog:admin scope and bypasses Keycloak. Use it for local smoke testing only.
5. Load sample data
make seed # loads financial services sample data
make reindex # pushes all datasets into OpenSearch
The seed script creates 12 financial datasets, 5 data products, logical models with FIBO vocabulary mappings,
and OpenLineage pipeline events for a BCBS 239 risk aggregation scenario.
Prerequisites
Runtime requirements
| Requirement | Minimum | Notes |
|---|---|---|
| Docker | 25.0 | Docker Desktop or Docker Engine on Linux |
| Docker Compose | v2.24 | Bundled with Docker Desktop; docker compose (v2 plugin) |
| RAM | 12 GB available | OpenSearch and Kafka are the largest consumers |
| Disk | 8 GB free | Container images + volumes |
Development requirements
| Requirement | Version | Notes |
|---|---|---|
| Java | 21 (LTS) | Required to build services; virtual threads (Project Loom) |
| Gradle | 8.x | Wrapper included; run ./gradlew |
| Node.js | 20 LTS | Required for frontend builds |
| pnpm | 9.x | npm install -g pnpm |
Optional — AI features
# Install Ollama for local LLM inference
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull nomic-embed-text # embedding model (768 dimensions)
ollama pull llama3 # chat model
# Then start the AI profile:
docker compose --profile ai up -d
Without Ollama, the ai-service will not start unless OPENAI_API_KEY is set in .env, in which case OpenAI is
used instead. All other services function normally either way.
Configuration
All runtime configuration is driven by environment variables. Copy .env.example to
.env and edit before running make up.
Core variables
| Variable | Default | Description |
|---|---|---|
| POSTGRES_PASSWORD | odin | Shared password for all Postgres instances (change in production) |
| KEYCLOAK_ADMIN | admin | Keycloak admin username |
| KEYCLOAK_ADMIN_PASSWORD | admin | Keycloak admin password |
| MINIO_ROOT_USER | minio | MinIO root access key |
| MINIO_ROOT_PASSWORD | minio123 | MinIO root secret key |
| JWT_SECRET | (none) | HS256 secret for dev API key validation (32+ chars) |
AI variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://ollama:11434 | Ollama inference endpoint |
| OPENAI_API_KEY | (empty) | If set, OpenAI is used for embeddings and chat instead of Ollama |
| AI_CHAT_MODEL | llama3 | Ollama model name for chat completions |
| AI_EMBED_MODEL | nomic-embed-text | Embedding model; must produce 768-dimension vectors |
Warning: The default .env.example values are intentionally weak. Change all passwords and secrets
before exposing any port to a network.
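The AI variables table sets two rules: a non-empty OPENAI_API_KEY takes precedence over Ollama, and embeddings must have 768 dimensions. Both can be sketched as follows. AiProviderConfig and its methods are hypothetical names for illustration. This is not the ai-service's actual Spring AI wiring.

```java
import java.util.Map;

/** Hypothetical resolution of the AI variables; not ai-service code. */
class AiProviderConfig {
    enum Provider { OPENAI, OLLAMA }

    record Resolved(Provider provider, String ollamaBaseUrl, String chatModel, String embedModel) {}

    /** Reads the variables, applying the documented defaults. */
    static Resolved resolve(Map<String, String> env) {
        String openAiKey = env.getOrDefault("OPENAI_API_KEY", "");
        // A non-empty OpenAI key wins; otherwise fall back to Ollama.
        Provider provider = openAiKey.isBlank() ? Provider.OLLAMA : Provider.OPENAI;
        return new Resolved(
                provider,
                env.getOrDefault("OLLAMA_BASE_URL", "http://ollama:11434"),
                env.getOrDefault("AI_CHAT_MODEL", "llama3"),           // Ollama chat model name
                env.getOrDefault("AI_EMBED_MODEL", "nomic-embed-text")); // Ollama embedding model name
    }

    /** Rejects vectors that do not match the 768-dimension store. */
    static void checkEmbeddingDimension(float[] vector) {
        if (vector.length != 768) {
            throw new IllegalStateException("Expected a 768-dimension embedding but got " + vector.length);
        }
    }
}
```

For example, AiProviderConfig.resolve(System.getenv()) applies the same defaults as the table when a variable is unset.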
Architecture Overview
ODIN follows Domain-Driven Design with a database-per-service pattern. Six Spring Boot 3.3 microservices
communicate via Kafka events. Traefik routes external HTTP traffic.
Browser / CLI
│
▼
Traefik (port 80/443)
├── catalog.local/ → consumer-frontend (nginx, port 3001)
├── manage.catalog.local/ → producer-frontend (nginx, port 3000)
└── api.catalog.local/ → services (ports 8001–8006)
│
├── catalog-service :8001 PostgreSQL :5433
├── harvest-service :8002 PostgreSQL :5434 + MinIO :9000
├── lineage-service :8003 PostgreSQL+AGE :5435
├── search-service :8004 OpenSearch :9200
├── ai-service :8005 PostgreSQL+pgvector :5437
└── identity-service :8006 PostgreSQL :5436 + Keycloak :8180
│
└─── Apache Kafka :9092 (KRaft, no ZooKeeper)
Design principles
- API-first — every capability is a versioned REST endpoint before any UI is built on top.
- Database-per-service — no service shares a database with another. Cross-service reads go
through REST or Kafka events.
- Event-driven — state changes publish Kafka events, and the catalog.*.changes topics are log-compacted. Downstream
services maintain their own read models.
- Standards-based exports — the catalog exports DCAT 3.0 JSON-LD via Apache Jena; the
lineage service accepts OpenLineage JSON.
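For a log-compacted topic such as catalog.datasets.changes, a downstream read model can be as simple as keeping the latest value per message key and dropping the key on a tombstone. This sketch makes two assumptions that the page does not state: the dataset ID is the message key, and deletions are published as null-valued tombstones (the standard Kafka convention for compacted topics). DatasetReadModel is a hypothetical class and uses no Kafka client.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical read model fed by a log-compacted topic such as catalog.datasets.changes.
 * Records are plain values here; a real consumer would receive them from a Kafka client.
 */
class DatasetReadModel {
    /** key = dataset ID (assumed compaction key); value = latest payload, or null for a tombstone. */
    record ChangeRecord(String key, String value) {}

    private final Map<String, String> latestByKey = new LinkedHashMap<>();

    void apply(ChangeRecord change) {
        if (change.value() == null) {
            latestByKey.remove(change.key());              // tombstone: the entity was deleted
        } else {
            latestByKey.put(change.key(), change.value()); // last write per key wins
        }
    }

    void applyAll(List<ChangeRecord> changes) {
        changes.forEach(this::apply);
    }

    /** Immutable copy of the current state, one entry per live entity. */
    Map<String, String> snapshot() {
        return Map.copyOf(latestByKey);
    }
}
```

Compaction retains the latest record for each key. A new consumer that replays the compacted topic from the beginning therefore ends with the same snapshot as one that saw the full history.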
Services
| Service | Port | Database | Responsibility |
|---|---|---|---|
| catalog-service | 8001 | PostgreSQL 16 | DCAT/DPROD/CSV-W metadata, logical models, vocabulary mappings, Kafka event publisher |
| harvest-service | 8002 | PostgreSQL 16 + MinIO | Spring Batch crawlers for Snowflake, AWS Glue, Teradata, DCAT HTTP; Quartz scheduler |
| lineage-service | 8003 | PostgreSQL + Apache AGE | OpenLineage REST ingestion, DDL parsing via Calcite, Cypher graph queries |
| search-service | 8004 | OpenSearch 2.x | Full-text + semantic indexing, FIBO facets, autocomplete suggestions |
| ai-service | 8005 | PostgreSQL + pgvector | Spring AI RAG pipeline, embeddings, SSE chat streaming, Ollama / OpenAI |
| identity-service | 8006 | PostgreSQL 16 | Keycloak OAuth2/OIDC integration, ABAC policies, API keys, tenant management |
Event Topology
All inter-service communication uses Kafka with an envelope schema that carries tenant, event type, and
schema version on every message.
Topics
| Topic | Producer | Consumers | Compacted |
|---|---|---|---|
| catalog.datasets.changes | catalog-service | search-service, ai-service | Yes |
| catalog.data-products.changes | catalog-service | search-service, ai-service | Yes |
| harvest.entities.discovered | harvest-service | catalog-service | No |
| harvest.ddl.discovered | harvest-service | lineage-service | No |
| lineage.graph.updated | lineage-service | search-service | No |
Event envelope
{
"eventId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"eventType": "DatasetCreated",
"schemaVersion": "1.0",
"producerService": "catalog-service",
"tenantId": "00000000-0000-0000-0000-000000000001",
"timestamp": "2026-05-18T10:23:00Z",
"payload": { ... }
}
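A consumer usually maps the envelope onto a typed value before dispatching on eventType and schemaVersion. Below is a minimal Java record that mirrors the JSON fields above. The record, its validation rules, and the of(...) factory are illustrative assumptions, not ODIN's published classes. The timestamp is ISO-8601 UTC, so Instant.parse("2026-05-18T10:23:00Z") reads it directly.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical typed view of the Kafka event envelope; field names mirror the JSON above. */
record EventEnvelope<T>(
        UUID eventId,
        String eventType,
        String schemaVersion,
        String producerService,
        UUID tenantId,
        Instant timestamp,
        T payload) {

    EventEnvelope {
        Objects.requireNonNull(eventId, "eventId");
        Objects.requireNonNull(tenantId, "tenantId");   // every message is tenant-scoped
        Objects.requireNonNull(timestamp, "timestamp");
        if (eventType == null || eventType.isBlank()) {
            throw new IllegalArgumentException("eventType is required");
        }
        if (schemaVersion == null || schemaVersion.isBlank()) {
            throw new IllegalArgumentException("schemaVersion is required");
        }
    }

    /** Assumed convenience factory: random event ID, current time, schema version "1.0". */
    static <P> EventEnvelope<P> of(String eventType, String producerService, UUID tenantId, P payload) {
        return new EventEnvelope<>(UUID.randomUUID(), eventType, "1.0", producerService,
                tenantId, Instant.now(), payload);
    }
}
```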
Security Model
Authentication methods
| Method | Header | Use case |
|---|---|---|
| Bearer JWT (OIDC) | Authorization: Bearer <token> | User sessions via Keycloak |
| API Key | X-API-Key: <key> | Service-to-service, CI pipelines, curl |
| Dev key | X-API-Key: dev-* | Local development only; bypasses auth entirely |
Tenant isolation
Every resource row carries a tenant_id UUID. The X-Tenant-Id header (set by Traefik
or the frontend nginx) scopes all queries. Rows from other tenants are never returned.
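The isolation rule amounts to a mandatory tenant predicate on every read. Here is a minimal in-memory sketch. It assumes a missing or malformed X-Tenant-Id is rejected rather than defaulted. In a real service the filter would be a tenant_id condition in SQL, and TenantScopedQueries is a hypothetical name.

```java
import java.util.List;
import java.util.UUID;

/** Hypothetical illustration of tenant scoping; a real service filters in SQL on tenant_id. */
class TenantScopedQueries {
    record ResourceRow(UUID id, UUID tenantId, String title) {}

    /** Parses the X-Tenant-Id header value; a missing or malformed value is rejected. */
    static UUID requireTenant(String headerValue) {
        if (headerValue == null || headerValue.isBlank()) {
            throw new IllegalArgumentException("X-Tenant-Id header is required");
        }
        return UUID.fromString(headerValue.trim()); // throws IllegalArgumentException if malformed
    }

    /** Equivalent of "... WHERE tenant_id = ?": rows from other tenants are never returned. */
    static List<ResourceRow> findAll(List<ResourceRow> table, String tenantHeader) {
        UUID tenant = requireTenant(tenantHeader);
        return table.stream()
                .filter(row -> row.tenantId().equals(tenant))
                .toList();
    }
}
```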
Warning: Never use X-API-Key: dev-* in production. It grants unrestricted admin access to all
tenants.
Metamodel Overview
ODIN's metamodel has three tiers: conceptual (business), logical (semantic), and physical (technical).
Conceptual DataProduct (DPROD)
├── InputPort → DataService
└── OutputPort → DataService → Distribution
Logical Dataset (DCAT)
├── VocabularyProfile → Vocabulary (FIBO / schema.org)
└── LogicalModel
└── LogicalDataElement
├── physicalColumnId ──FK──┐
└── VocabularyMapping │ (SKOS)
│
Physical Distribution (DCAT) │
└── CSVWTable → CSVWSchema → CSVWColumn ◄┘
Lineage OpenLineage Job ─[READS_FROM/WRITES_TO]→ Dataset
(stored in Apache AGE graph, linked to DCAT Dataset)
Layer responsibilities
| Layer | Key entities | Purpose |
|---|---|---|
| Conceptual | DataProduct, Port, DataService | Business ownership, governance, lifecycle (Ideation → Consume) |
| Logical | LogicalModel, LogicalDataElement, VocabularyMapping | Business meaning, semantic annotations, vocabulary alignment |
| Physical | Distribution, CSVWTable, CSVWColumn | Technical structure as harvested from source systems |
DCAT Datasets & Distributions
ODIN models datasets and distributions using the DCAT 3.0
vocabulary. The full catalog can be exported as DCAT JSON-LD.
Dataset fields
| Field | DCAT property | Type | Notes |
|---|---|---|---|
| title | dct:title | string | Human-readable name |
| description | dct:description | string | Free-text description |
| keywords | dcat:keyword | string[] | Used for search facets |
| themes | dcat:theme | string[] | Domain classification IRIs |
| accrualPeriodicity | dct:accrualPeriodicity | string | e.g. daily, hourly |
| license | dct:license | URI | License IRI |
| conformsTo | dct:conformsTo | URI[] | Standards this dataset conforms to |
DCAT export
# Export full catalog as DCAT 3.0 JSON-LD
curl http://localhost:8001/api/v1/catalogs/{id}/export \
-H "Accept: application/ld+json" \
-H "X-API-Key: dev-local"
DPROD Data Products
Data products are modelled using the OMG DPROD standard. A data
product represents a business-owned, governed unit of data with a defined lifecycle.
Lifecycle stages
| Stage | Description |
|---|---|
| Ideation | Concept identified; no data yet |
| Design | Schema and SLA being defined |
| Build | Pipeline under development |
| Deploy | Running in production, not yet published |
| Consume | Publicly available for consumers |
Create a data product
curl -X POST http://localhost:8001/api/v1/data-products \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{
"title": "Trade Risk Data Product",
"description": "Aggregated risk metrics for regulatory reporting.",
"lifecycleStatus": "Consume",
"keywords": ["risk", "trading", "BCBS239"],
"informationSensitivity": "Internal"
}'
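The five stages form an ordered enum. The value sent as lifecycleStatus, or as status in PATCH /api/v1/data-products/{id}/lifecycle, maps onto one of them. The transition rule below (forward, one stage at a time) is a hypothetical policy for illustration; this page does not say which transitions ODIN permits. LifecycleStage is likewise a hypothetical name.

```java
import java.util.Arrays;

/** DPROD lifecycle stages in the order listed above (hypothetical Java model). */
enum LifecycleStage {
    IDEATION("Ideation"),
    DESIGN("Design"),
    BUILD("Build"),
    DEPLOY("Deploy"),
    CONSUME("Consume");

    private final String label;

    LifecycleStage(String label) {
        this.label = label;
    }

    /** The string used in JSON, e.g. "lifecycleStatus": "Consume". */
    String label() {
        return label;
    }

    static LifecycleStage fromLabel(String label) {
        return Arrays.stream(values())
                .filter(stage -> stage.label.equals(label))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown lifecycle stage: " + label));
    }

    /** Hypothetical policy: advance exactly one stage at a time, never backwards. */
    boolean canTransitionTo(LifecycleStage next) {
        return next.ordinal() == ordinal() + 1;
    }
}
```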
CSV-W Physical Schema
The physical layer is modelled using CSV on the Web
(CSV-W). Each distribution that is harvested from a source system produces a CSVWTable with
a CSVWSchema containing one CSVWColumn per field.
Column fields
| Field | Type | Description |
|---|---|---|
| name | string | Column name as it appears in the source system |
| titles | string[] | Alternate names / aliases |
| datatype | string | Source system type: DECIMAL(18,4), VARCHAR(50), etc. |
| required | boolean | Whether the column is NOT NULL |
| description | string | Column comment from the source DDL |
| propertyUrl | URI | Linked Data property IRI if available |
Physical columns are created automatically during harvest. A physicalColumnId UUID is assigned
to each and can be referenced from a LogicalDataElement to create the logical–physical binding.
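Because datatype keeps the source-system type verbatim, code that needs the precision, length, or scale has to parse it. Below is a small hypothetical helper, not part of ODIN, that splits strings such as DECIMAL(18,4) and VARCHAR(50).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for source datatypes of the form NAME, NAME(n) or NAME(n,m). */
class SourceDatatype {
    record Parsed(String baseType, OptionalInt precisionOrLength, OptionalInt scale) {}

    // Group 1: base name (may contain spaces); group 2: precision/length; group 3: scale.
    private static final Pattern TYPE = Pattern.compile(
            "^\\s*([A-Za-z_][A-Za-z0-9_ ]*?)\\s*(?:\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\))?\\s*$");

    static Parsed parse(String datatype) {
        Matcher m = TYPE.matcher(datatype);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised datatype: " + datatype);
        }
        return new Parsed(
                m.group(1).toUpperCase(),
                m.group(2) == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(m.group(2))),
                m.group(3) == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(m.group(3))));
    }
}
```

A multi-word type such as TIMESTAMP WITH TIME ZONE comes back whole as the base type. Vendor forms beyond NAME(n[,m]), such as ARRAY<STRING>, are rejected.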
Logical Models
A LogicalModel belongs to a Dataset and provides the business-oriented view of its
structure. It contains LogicalDataElements — each representing a named business concept with
an optional binding to a physical column and zero or more vocabulary mappings.
LogicalDataElement fields
| Field | Type | Description |
|---|---|---|
| name | string | Business name: Trade Amount, Settlement Currency |
| logicalType | string | Semantic type: MonetaryAmount, Identifier, Date, Party |
| physicalColumnId | UUID (nullable) | FK to harvested csvw_columns row; null until harvest or manual bind |
| isIdentifier | boolean | True if this element forms part of the logical primary key |
| isNullable | boolean | Whether the business concept permits absence of a value |
Bind a physical column
curl -X POST \
http://localhost:8001/api/v1/logical-data-elements/{elementId}/bind \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{ "physicalColumnId": "b3f1a2e4-..." }'
Auto-scaffold from harvest
When a harvest run discovers columns for a dataset that has no published LogicalModel, ODIN automatically
generates a draft LogicalModel with one LogicalDataElement per CSVWColumn.
Each element has its physicalColumnId pre-bound and its logicalType inferred from
the source datatype. You can then enrich the draft with business names and vocabulary mappings.
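The auto-scaffold step can be sketched as a pure function from harvested columns to draft elements. The page says only that logicalType is inferred from the source datatype. Everything else here is a hypothetical guess: the inference rules (an _ID name suffix gives Identifier, DATE/TIMESTAMP gives Date, DECIMAL/NUMERIC/NUMBER gives MonetaryAmount), the Unclassified placeholder, the class name, and deriving isNullable from required.

```java
import java.util.List;
import java.util.Locale;
import java.util.UUID;

/** Hypothetical scaffolder: one draft LogicalDataElement per harvested CSV-W column. */
class LogicalModelScaffolder {
    record CsvwColumn(UUID id, String name, String datatype, boolean required) {}

    record LogicalDataElement(String name, String logicalType, UUID physicalColumnId,
                              boolean isIdentifier, boolean isNullable) {}

    /** Hypothetical heuristics only; ODIN's actual inference rules may differ. */
    static String inferLogicalType(String columnName, String datatype) {
        String name = columnName.toUpperCase(Locale.ROOT);
        String type = datatype.toUpperCase(Locale.ROOT);
        if (name.endsWith("_ID")) return "Identifier";
        if (type.startsWith("DATE") || type.startsWith("TIMESTAMP")) return "Date";
        if (type.startsWith("DECIMAL") || type.startsWith("NUMERIC") || type.startsWith("NUMBER")) {
            return "MonetaryAmount"; // a guess: many decimals are not money, so review the draft
        }
        return "Unclassified"; // placeholder for a person to fill in
    }

    /** One element per column, with physicalColumnId pre-bound. */
    static List<LogicalDataElement> scaffold(List<CsvwColumn> columns) {
        return columns.stream()
                .map(c -> new LogicalDataElement(
                        c.name(),                                 // technical name until enriched
                        inferLogicalType(c.name(), c.datatype()),
                        c.id(),                                   // pre-bound physical column
                        false,                                    // identifiers confirmed by a person
                        !c.required()))                           // NOT NULL column -> not nullable
                .toList();
    }
}
```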
Vocabulary & FIBO
ODIN ships with six system vocabularies pre-loaded. You can register additional RDF vocabularies at any time.
Pre-loaded vocabularies
| Vocabulary | Prefix | Type | Base IRI |
|---|---|---|---|
| schema.org | schema | general | https://schema.org/ |
| FIBO FND | fibo-fnd | financial | https://spec.edmcouncil.org/fibo/ontology/FND/ |
| FIBO FBC | fibo-fbc | financial | https://spec.edmcouncil.org/fibo/ontology/FBC/ |
| FIBO SEC | fibo-sec | financial | https://spec.edmcouncil.org/fibo/ontology/SEC/ |
| FIBO MD | fibo-md | financial | https://spec.edmcouncil.org/fibo/ontology/MD/ |
| SKOS | skos | general | http://www.w3.org/2004/02/skos/core# |
SKOS match types
| Match type | When to use |
|---|---|
| exactMatch | The element represents precisely the same concept |
| closeMatch | Very similar but not identical (e.g. trade date ↔ schema:startDate) |
| relatedMatch | Related but distinct concepts |
| broadMatch | The vocabulary concept is broader / more general |
| narrowMatch | The vocabulary concept is narrower / more specific |
Add a vocabulary mapping
curl -X POST \
http://localhost:8001/api/v1/logical-data-elements/{elementId}/vocab-mappings \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{
"vocabularyId": "...",
"conceptIri": "https://spec.edmcouncil.org/fibo/ontology/FND/Accounting/CurrencyAmount/MonetaryAmount",
"conceptLabel": "MonetaryAmount",
"matchType": "exactMatch"
}'
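The matchType string in the request body names one of the five SKOS mapping properties from the table above, and each has a full IRI in the SKOS namespace. Below is a hypothetical enum (not an ODIN class) that validates the value and expands it to that IRI.

```java
import java.util.Arrays;

/** Hypothetical enum for the matchType field; the IRIs are the standard SKOS ones. */
enum SkosMatchType {
    EXACT_MATCH("exactMatch"),
    CLOSE_MATCH("closeMatch"),
    RELATED_MATCH("relatedMatch"),
    BROAD_MATCH("broadMatch"),
    NARROW_MATCH("narrowMatch");

    static final String SKOS_NS = "http://www.w3.org/2004/02/skos/core#";

    private final String localName;

    SkosMatchType(String localName) {
        this.localName = localName;
    }

    String localName() {
        return localName;
    }

    /** Full property IRI, e.g. http://www.w3.org/2004/02/skos/core#exactMatch. */
    String iri() {
        return SKOS_NS + localName;
    }

    /** Validates the "matchType" value from a request body. */
    static SkosMatchType fromJson(String value) {
        return Arrays.stream(values())
                .filter(type -> type.localName.equals(value))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unsupported matchType: " + value));
    }
}
```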
OpenLineage Integration
ODIN's lineage-service exposes an OpenLineage-compatible HTTP endpoint. Any tool that emits OpenLineage
events (Spark, dbt, Airflow, Flink) can send lineage directly to ODIN.
Send a lineage event
curl -X POST http://localhost:8003/api/v1/lineage \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{
"eventType": "COMPLETE",
"eventTime": "2026-05-18T10:00:00Z",
"producer": "https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark",
"schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
"run": { "runId": "3fa85f64-5717-4562-b3fc-2c963f66afa6" },
"job": { "namespace": "TRADING_DB", "name": "risk_aggregation_job" },
"inputs": [{ "namespace": "TRADING_DB.BLOTTER", "name": "TRADE_BLOTTER" }],
"outputs": [{ "namespace": "REGULATORY_DB.BCBS239", "name": "RISK_AGGREGATION" }]
}'
Query lineage graph
# Upstream lineage, 4 hops
curl "http://localhost:8003/api/v1/datasets/REGULATORY_DB.BCBS239/RISK_AGGREGATION/lineage?direction=upstream&depth=4" \
-H "X-API-Key: dev-local"
# Downstream impact analysis
curl "http://localhost:8003/api/v1/datasets/TRADING_DB.BLOTTER/TRADE_BLOTTER/impact" \
-H "X-API-Key: dev-local"
Lineage is stored in an Apache AGE property graph on PostgreSQL. Cypher queries traverse
DERIVED_FROM, READ_BY, and WRITES_TO edges. Column-level lineage uses
COLUMN_LINEAGE edges.
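A depth-limited upstream query can be read as a breadth-first walk over reversed data-flow edges that stops after depth hops (the Lineage API accepts depth=1..10). This in-memory sketch only illustrates those semantics; the lineage-service itself runs Cypher against Apache AGE. LineageTraversal, its Edge record, and collapsing each job into a direct dataset-to-dataset edge are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of direction=upstream&depth=N; jobs are collapsed into edges. */
class LineageTraversal {
    /** Data flows from source to target (for example, via a job that reads source and writes target). */
    record Edge(String source, String target) {}

    /** Returns each upstream dataset with its shortest hop distance from start. */
    static Map<String, Integer> upstream(List<Edge> edges, String start, int depth) {
        if (depth < 1 || depth > 10) {
            throw new IllegalArgumentException("depth must be between 1 and 10");
        }
        Map<String, List<String>> sourcesOf = new HashMap<>(); // reversed adjacency: target -> sources
        for (Edge e : edges) {
            sourcesOf.computeIfAbsent(e.target(), k -> new ArrayList<>()).add(e.source());
        }
        Map<String, Integer> hops = new LinkedHashMap<>();
        ArrayDeque<String> queue = new ArrayDeque<>();
        hops.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int h = hops.get(node);
            if (h == depth) {
                continue; // do not expand past the requested depth
            }
            for (String source : sourcesOf.getOrDefault(node, List.of())) {
                if (!hops.containsKey(source)) { // BFS: first visit is the shortest path; also stops cycles
                    hops.put(source, h + 1);
                    queue.add(source);
                }
            }
        }
        hops.remove(start);
        return hops;
    }
}
```

Each dataset is reported at its shortest hop distance. The visited check also keeps cyclic lineage from looping forever.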
Catalog Service
The catalog-service is the primary metadata store. It owns all DCAT, DPROD, CSV-W, logical model, and
vocabulary resources. All other services treat it as the source of truth.
Key responsibilities
- Persist and version DCAT Datasets, Distributions, DataServices, Catalogs
- Persist DPROD DataProducts, Ports, and lifecycle transitions
- Store CSV-W tables and columns (populated by harvest events)
- Manage LogicalModels and LogicalDataElements with physical column bindings
- Maintain the vocabulary registry and per-dataset vocabulary profiles
- Export the full catalog as DCAT 3.0 JSON-LD via Apache Jena
- Publish catalog.*.changes Kafka events on all mutations
Database
PostgreSQL 16 on port 5433 (Docker). Migrations managed by Flyway. Key tables: resources,
datasets, distributions, data_products, csvw_columns,
logical_models, logical_data_elements, vocabularies.
Harvest Service
The harvest-service crawls external data sources, normalises their metadata, and publishes it to Kafka for
the catalog-service to ingest. Jobs are scheduled with Quartz and executed as Spring Batch jobs.
Supported connectors
| Connector | Source type | What it harvests |
|---|---|---|
| dcat_http | Any DCAT HTTP endpoint | Datasets, distributions via Apache Jena (JSON-LD, Turtle, RDF/XML) |
| aws_glue | AWS Glue Data Catalog | Databases, tables, columns, partitions via AWS SDK v2 |
| snowflake | Snowflake | SHOW TABLES, DESCRIBE TABLE, GET_DDL() |
| teradata | Teradata | DBC.TablesV, DBC.ColumnsV, DBC.ShowSQL |
Configure a source
curl -X POST http://localhost:8002/api/v1/sources \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{
"name": "Snowflake Production",
"sourceType": "snowflake",
"baseUrl": "orgname-accountname.snowflakecomputing.com",
"databaseName": "TRADING_DB",
"schemaFilter": ["BLOTTER", "RISK"],
"credentialRef": "vault://snowflake/prod"
}'
Lineage Service
The lineage-service ingests OpenLineage events and DDL, persists them to PostgreSQL, and builds a property
graph in Apache AGE for multi-hop Cypher traversal.
DDL lineage
Submit raw DDL to extract lineage without running a pipeline:
curl -X POST http://localhost:8003/api/v1/ddl/submit \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-d '{
"dialect": "SNOWFLAKE",
"ddl": "CREATE VIEW RISK_DB.MARKET_RISK.DAILY_POSITIONS AS SELECT t.*, p.close_price FROM TRADING_DB.BLOTTER.TRADE_BLOTTER t JOIN kafka://prices-realtime p ON t.instrument_id = p.instrument_id"
}'
Apache Calcite parses the DDL across Snowflake, Teradata, and Hive dialects. A DERIVED_FROM edge
is created in the AGE graph between each source table and the view.
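The sketch below makes the result concrete. It pulls the view name and its FROM/JOIN sources out of a single CREATE VIEW statement and emits one DERIVED_FROM pair per source. It is a deliberately naive regular-expression stand-in, not Calcite: it ignores subqueries, CTEs, comments, and quoted identifiers. NaiveViewLineage is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive regex stand-in for the Calcite step (hypothetical; handles one simple CREATE VIEW). */
class NaiveViewLineage {
    /** One DERIVED_FROM pair: the view is derived from the source. */
    record DerivedFrom(String source, String view) {}

    private static final Pattern VIEW = Pattern.compile(
            "(?i)CREATE\\s+(?:OR\\s+REPLACE\\s+)?VIEW\\s+([\\w.]+)");
    // Identifier after FROM or JOIN; also accepts URI-like names such as kafka://topic.
    private static final Pattern SOURCE = Pattern.compile(
            "(?i)\\b(?:FROM|JOIN)\\s+([\\w.:/-]+)");

    static List<DerivedFrom> extract(String ddl) {
        Matcher v = VIEW.matcher(ddl);
        if (!v.find()) {
            throw new IllegalArgumentException("Not a CREATE VIEW statement");
        }
        String view = v.group(1);
        List<DerivedFrom> edges = new ArrayList<>();
        Matcher s = SOURCE.matcher(ddl.substring(v.end())); // only scan the SELECT body
        while (s.find()) {
            edges.add(new DerivedFrom(s.group(1), view));
        }
        return edges;
    }
}
```

For the Snowflake DDL above it yields two pairs, TRADING_DB.BLOTTER.TRADE_BLOTTER and kafka://prices-realtime, each paired with the view RISK_DB.MARKET_RISK.DAILY_POSITIONS.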
Search Service
The search-service maintains an OpenSearch index that is enriched with logical model data, vocabulary concept
labels, and FIBO IRIs. It consumes Kafka events to stay in sync with the catalog.
Search query
# Full-text search with filters
curl "http://localhost:8004/api/v1/search?q=trade&type=dataset&domain=Finance&hasLineage=true" \
-H "X-API-Key: dev-local"
# FIBO concept facet search
curl "http://localhost:8004/api/v1/search?fibo_concept=MonetaryAmount" \
-H "X-API-Key: dev-local"
# Autocomplete suggestions
curl "http://localhost:8004/api/v1/search/suggest?q=trad" \
-H "X-API-Key: dev-local"
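When a query contains spaces or reserved characters, the parameters must be URL-encoded. Below is a hypothetical helper built on the JDK's URLEncoder. URLEncoder encodes a space as +, the form-encoding convention that most servers accept in query strings.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper for building search-service URLs with encoded parameters. */
class SearchUrls {
    static final String SEARCH = "http://localhost:8004/api/v1/search";

    /** Parameters are emitted in the map's iteration order; pass a LinkedHashMap to control it. */
    static URI search(Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create(query.isEmpty() ? SEARCH : SEARCH + "?" + query);
    }

    private static String encode(String value) {
        // Form-style encoding: spaces become '+', reserved characters become %XX.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For example, q=trade blotter, type=dataset and hasLineage=true produce http://localhost:8004/api/v1/search?q=trade+blotter&type=dataset&hasLineage=true.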
Reindex
curl -X POST http://localhost:8004/api/v1/admin/reindex \
-H "X-API-Key: dev-local"
AI Service
The ai-service provides a RAG (Retrieval-Augmented Generation) pipeline over your metadata corpus using
Spring AI. It can run fully on-premises with Ollama or use the OpenAI API.
Start a conversation
# Create a conversation
CONV=$(curl -s -X POST http://localhost:8005/api/v1/conversations \
-H "X-API-Key: dev-local" -H "Content-Type: application/json" \
-d '{"title": "My session"}' | jq -r .id)
# Ask a question (streaming SSE response)
curl -N -X POST http://localhost:8005/api/v1/conversations/$CONV/messages \
-H "Content-Type: application/json" \
-H "X-API-Key: dev-local" \
-H "Accept: text/event-stream" \
-d '{"content": "Which datasets contain monetary amounts mapped to FIBO?"}'
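The AI API delivers response tokens as data: <token> events. A client without an SSE library can extract them itself. This sketch follows the text/event-stream rules:

- each data: line contributes one value, with a single leading space stripped;
- several data lines in one event are joined with a newline;
- a blank line ends the event;
- lines beginning with : are comments.

SseTokens is a hypothetical name, and the sketch ignores the event, id, and retry fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal reader for the data fields of a text/event-stream body. */
class SseTokens {
    static List<String> dataEvents(String stream) {
        List<String> events = new ArrayList<>();
        StringBuilder data = null; // null until the current event has a data line
        for (String line : stream.split("\r\n|\r|\n", -1)) {
            if (line.isEmpty()) {
                // Blank line: dispatch the event; an empty data buffer is not dispatched.
                if (data != null && data.length() > 0) {
                    events.add(data.toString());
                }
                data = null;
            } else if (line.startsWith(":")) {
                // Comment line (often a keep-alive); ignored.
            } else if (line.equals("data") || line.startsWith("data:")) {
                String value = line.length() > 5 ? line.substring(5) : "";
                if (value.startsWith(" ")) {
                    value = value.substring(1); // strip exactly one leading space
                }
                if (data == null) {
                    data = new StringBuilder(value);
                } else {
                    data.append('\n').append(value); // multiple data lines join with LF
                }
            }
            // Other fields (event:, id:, retry:) are ignored in this sketch.
        }
        return events;
    }
}
```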
Embedding pipeline
The ai-service listens on catalog.datasets.changes and
catalog.data-products.changes. On each event it fetches the enriched entity from catalog-service,
chunks the text (title + description + element names + vocabulary labels), embeds the chunks using the
configured model, and upserts into the pgvector store.
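The chunking step can be sketched as joining the four text sources and splitting the result into overlapping word windows. The page does not say how ai-service sizes its chunks. The window size, the overlap, word-based splitting, and the MetadataChunker name are therefore all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical chunker; window size and overlap are illustrative, not ai-service settings. */
class MetadataChunker {
    /** Concatenates title, description, element names and vocabulary labels into one document. */
    static String document(String title, String description,
                           List<String> elementNames, List<String> vocabularyLabels) {
        return String.join("\n",
                title,
                description,
                String.join(", ", elementNames),
                String.join(", ", vocabularyLabels));
    }

    /** Splits text into windows of maxWords words; neighbouring windows share `overlap` words. */
    static List<String> chunk(String text, int maxWords, int overlap) {
        if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
            throw new IllegalArgumentException("require maxWords > overlap >= 0");
        }
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        if (words.length == 1 && words[0].isEmpty()) {
            return chunks; // blank input produces no chunks
        }
        int step = maxWords - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With maxWords = 3 and overlap = 1, the text a b c d e f g yields a b c, c d e, and e f g. The overlap keeps a phrase that straddles a boundary retrievable from at least one chunk.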
Note: The ai-service is optional. Start it with
docker compose --profile ai up -d ai-service ollama. All other services run without it.
Identity Service
The identity-service manages organisations, users, roles, and access policies. It integrates with Keycloak 24
for OIDC token issuance and validation.
Keycloak configuration
A realm export is included at infra/keycloak/realm-export.json. The realm is imported
automatically on first startup. Keycloak runs on port 8180.
Note: Keycloak 24 uses the KEYCLOAK_ADMIN and KEYCLOAK_ADMIN_PASSWORD environment
variables. The KC_BOOTSTRAP_ADMIN_* variables introduced in newer Keycloak releases are not supported.
Authentication
Request headers
| Header | Required | Description |
|---|---|---|
| Authorization: Bearer <jwt> | One of Authorization or X-API-Key | Keycloak OIDC access token |
| X-API-Key: <key> | One of Authorization or X-API-Key | API key from identity-service |
| X-Tenant-Id: <uuid> | Yes | Tenant scoping; set by Traefik/nginx in production |
Dev API key
Any key starting with dev- bypasses authentication and grants catalog:admin scope.
The tenant ID defaults to 00000000-0000-0000-0000-000000000001.
# These are equivalent for local development:
curl -H "X-API-Key: dev-anything" ...
curl -H "X-API-Key: dev-local" ...
curl -H "X-API-Key: dev-my-test-key" ...
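The dev-key rule can be written down as a small decision function. DevKeyRule is a hypothetical name, and two of its behaviours are assumptions rather than documented ODIN settings. The devKeysEnabled switch is an assumed safeguard that reflects the rule that dev keys must never reach production. Honouring an explicit X-Tenant-Id over the default tenant is also assumed.

```java
import java.util.Optional;
import java.util.UUID;

/** Hypothetical encoding of the documented dev-key rule; not identity-service code. */
class DevKeyRule {
    static final UUID DEFAULT_DEV_TENANT = UUID.fromString("00000000-0000-0000-0000-000000000001");

    record Principal(String scope, UUID tenantId) {}

    /**
     * Any key starting with "dev-" yields a catalog:admin principal. devKeysEnabled is an
     * assumed safety switch that a production deployment would leave false.
     */
    static Optional<Principal> resolve(String apiKey, String tenantHeader, boolean devKeysEnabled) {
        if (!devKeysEnabled || apiKey == null || !apiKey.startsWith("dev-")) {
            return Optional.empty(); // fall through to normal JWT or API-key validation
        }
        UUID tenant = (tenantHeader == null || tenantHeader.isBlank())
                ? DEFAULT_DEV_TENANT              // documented default
                : UUID.fromString(tenantHeader);  // assumption: an explicit header wins
        return Optional.of(new Principal("catalog:admin", tenant));
    }
}
```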
Catalog API :8001
Datasets
GET /api/v1/datasets
List datasets. Query params: page, size, domain, format.
POST /api/v1/datasets
Create a new dataset.
GET /api/v1/datasets/{id}
Get dataset by ID.
PATCH /api/v1/datasets/{id}
Partial update.
DELETE /api/v1/datasets/{id}
Soft-delete a dataset.
Data Products
GET /api/v1/data-products
List. Query params: lifecycleStatus, domain.
POST /api/v1/data-products
Create a data product.
PATCH /api/v1/data-products/{id}/lifecycle
Transition lifecycle: {"status":"Deploy"}.
Logical Models
GET /api/v1/datasets/{id}/logical-models
List logical models for a dataset.
POST /api/v1/datasets/{id}/logical-models
Create a new logical model.
GET /api/v1/logical-models/{id}/elements
List data elements.
POST /api/v1/logical-data-elements/{id}/bind
Bind element to a physical column.
POST /api/v1/logical-data-elements/{id}/vocab-mappings
Add a SKOS vocabulary mapping.
Vocabularies
GET /api/v1/vocabularies
List all registered vocabularies.
GET /api/v1/vocabularies/{id}/concepts/search
Search concepts. Query: q=price&limit=20.
Catalogs
GET /api/v1/catalogs/{id}/export
Export as DCAT JSON-LD. Set Accept: application/ld+json.
Harvest API :8002
Sources
GET /api/v1/sources
List all harvest sources. Filter by type.
POST /api/v1/sources
Register a new source.
POST /api/v1/sources/{id}/test
Test connectivity to the source.
Jobs & Runs
GET /api/v1/jobs
List harvest jobs.
POST /api/v1/jobs
Create a scheduled or on-demand job.
POST /api/v1/jobs/{id}/trigger
Trigger an immediate harvest run.
POST /api/v1/jobs/{id}/cancel
Cancel a running job.
GET /api/v1/runs/{id}/items
Inspect per-entity results of a run.
Lineage API :8003
POST /api/v1/lineage
Ingest an OpenLineage RunEvent (START, RUNNING, COMPLETE, FAIL, ABORT).
GET /api/v1/datasets/{ns}/{name}/lineage
Graph traversal. Params: direction=upstream|downstream, depth=1..10.
GET /api/v1/datasets/{ns}/{name}/column-lineage
Column-level lineage. Params: column, direction.
GET /api/v1/datasets/{ns}/{name}/impact
Downstream impact analysis: all datasets that depend on this one.
POST /api/v1/ddl/submit
Submit DDL for Calcite parsing. Body: {"dialect":"SNOWFLAKE","ddl":"CREATE VIEW ..."}.
GET /api/v1/jobs/{ns}/{name}/runs
Run history for a job. Filter by state, start, end.
Search API :8004
GET /api/v1/search
Full-text search. Params: q, type, domain, lifecycleStatus, format, hasLineage, fibo_concept, page, size.
GET /api/v1/search/suggest
Autocomplete. Param: q. Returns up to 10 suggestions.
POST /api/v1/search/saved
Save a search query for later reuse.
POST /api/v1/admin/reindex
Trigger a full reindex from catalog-service. Admin only.
AI API :8005
GET /api/v1/conversations
List conversations for the current user.
POST /api/v1/conversations
Create a new conversation. Body: {"title":"..."}.
POST /api/v1/conversations/{id}/messages
Send a message. Set Accept: text/event-stream for SSE streaming. Response tokens arrive as
data: <token> events.
POST /api/v1/semantic-search
Vector similarity search. Body: {"query":"...","limit":10,"entityTypes":["dataset"]}.
POST /api/v1/admin/embeddings/refresh
Re-embed all documents. Admin only.
Docker Compose
The repository ships a docker-compose.yml that starts the full stack and a
docker-compose.override.yml for development hot-reload overrides.
Makefile targets
| Target | Description |
|---|---|
| make up | Start all services in detached mode |
| make down | Stop and remove containers (preserves volumes) |
| make destroy | Stop containers and delete all volumes |
| make migrate | Run Flyway migrations manually |
| make seed | Load financial services sample data |
| make reindex | Full OpenSearch reindex |
| make build | Build all Docker images from source |
| make test | Run all service tests in Docker |
| make logs svc=catalog-service | Tail logs for a specific service |
Profiles
| Profile | Services started |
|---|---|
| (default) | All services except ai-service (5) + Kafka + PostgreSQL × 4 + OpenSearch + MinIO + Keycloak |
| ai | Adds ai-service and Ollama |
# Start including AI features
docker compose --profile ai up -d
Environment Variables
Full reference
| Variable | Service | Default | Description |
|---|---|---|---|
| POSTGRES_PASSWORD | All | odin | PostgreSQL password (shared) |
| KEYCLOAK_ADMIN | identity | admin | Keycloak admin username |
| KEYCLOAK_ADMIN_PASSWORD | identity | admin | Keycloak admin password |
| JWT_SECRET | All | (none) | HS256 signing secret for dev API keys |
| MINIO_ROOT_USER | harvest | minio | MinIO access key |
| MINIO_ROOT_PASSWORD | harvest | minio123 | MinIO secret key |
| OPENSEARCH_PASSWORD | search | admin | OpenSearch admin password |
| OLLAMA_BASE_URL | ai | http://ollama:11434 | Ollama base URL |
| OPENAI_API_KEY | ai | (empty) | OpenAI key; takes precedence over Ollama if set |
| AI_CHAT_MODEL | ai | llama3 | Chat model name |
| AI_EMBED_MODEL | ai | nomic-embed-text | Embedding model (must be 768-dim) |
| SNOWFLAKE_ACCOUNT | harvest | (none) | Snowflake account identifier |
| AWS_ACCESS_KEY_ID | harvest | (none) | AWS credentials for Glue connector |
| AWS_SECRET_ACCESS_KEY | harvest | (none) | AWS credentials for Glue connector |
| AWS_REGION | harvest | us-east-1 | AWS region for Glue |
Kubernetes (Helm)
Helm charts are provided under infra/helm/charts/ for each service. A top-level umbrella chart
is planned for GA.
Warning: Kubernetes deployment is in early stages. The Helm charts are functional but not hardened for production.
Use Docker Compose for evaluation and development.
# Install catalog-service chart
helm install odin-catalog infra/helm/charts/catalog-service \
--namespace odin --create-namespace \
--set postgresql.password=changeme \
--set kafka.brokers=kafka:9092
Local Development
Build all services
./gradlew build # compile + test all services
./gradlew :services:catalog-service:bootRun # run one service
Build the frontends
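# Run from the repository root. Build the shared package first:
(cd frontend/shared && pnpm install && pnpm build)
# Each dev server runs in the foreground, so start them in separate terminals:
(cd frontend/producer && pnpm install && pnpm dev)   # http://localhost:3000
(cd frontend/consumer && pnpm install && pnpm dev)   # http://localhost:3001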
Rebuild a single Docker image
# Always build then up — `restart` reuses the old image
docker compose build catalog-service
docker compose up -d catalog-service
Contribution Guide
Before you start
- Open an issue describing the bug or feature before submitting a PR.
- One logical change per PR — keep diffs reviewable.
- All new API endpoints need an integration test that runs against a real database (no mocks).
Code conventions
- Java: hexagonal architecture — domain classes have no Spring annotations; all infrastructure concerns live
in the
infrastructure/ package.
- TypeScript: no
any; shared API types live in frontend/shared/src/types/.
- SQL: all schema changes via numbered Flyway migrations (
V{n}__description.sql).
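Here is a minimal sketch of the hexagonal split described in the Java convention. The domain record and its repository port carry no Spring annotations, and adapters sit on the infrastructure side. All type names are hypothetical. In a real module each group would live in its own package. The in-memory adapter suits unit tests of domain logic only; integration tests for new endpoints still run against a real database, as required above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// ---- domain (would live in a domain package): plain Java, no Spring annotations ----
record DatasetId(UUID value) {}

record Dataset(DatasetId id, String title, String accrualPeriodicity) {
    Dataset {
        if (title == null || title.isBlank()) {
            throw new IllegalArgumentException("title is required"); // domain invariant
        }
    }
}

/** Outbound port owned by the domain; infrastructure adapters implement it. */
interface DatasetRepository {
    Dataset save(Dataset dataset);
    Optional<Dataset> findById(DatasetId id);
}

/** Application service that depends only on the port, never on a framework type. */
class CreateDatasetService {
    private final DatasetRepository repository;

    CreateDatasetService(DatasetRepository repository) {
        this.repository = repository;
    }

    Dataset create(String title, String accrualPeriodicity) {
        return repository.save(new Dataset(new DatasetId(UUID.randomUUID()), title, accrualPeriodicity));
    }
}

// ---- infrastructure/ (framework and persistence concerns live here) ----
/** In-memory adapter for domain unit tests; a Spring-managed JDBC adapter would sit alongside it. */
class InMemoryDatasetRepository implements DatasetRepository {
    private final Map<DatasetId, Dataset> store = new ConcurrentHashMap<>();

    @Override
    public Dataset save(Dataset dataset) {
        store.put(dataset.id(), dataset);
        return dataset;
    }

    @Override
    public Optional<Dataset> findById(DatasetId id) {
        return Optional.ofNullable(store.get(id));
    }
}
```

Swapping the adapter never touches Dataset or CreateDatasetService, which is the point of keeping Spring out of the domain.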
License
ODIN Catalog is released under the Apache 2.0 License. By contributing you agree your
changes will be licensed under the same terms.