This repository supports two deployment paths:
- hosted deployment: Azure Container Apps or Render
- self-hosted deployment: Docker Compose via docker-compose.yml
For turning this into your own benchmark product, see BUILD_YOUR_OWN_BENCHMARK.md.
make compose-dev

This brings up the normal local product surface without the demo answer service or demo seeder.

make compose-demo

This brings up the full local product surface and runs a seeded end-to-end verification pass.

make compose-faults

This runs controlled Docker fault injection against the local stack, verifies the live HTTP outcomes, and restores the seeded demo deployment at the end.

docker compose --profile legacy --profile demo up -d --build

Use this when you want the full dataset instead of the two-example demo cap set by make compose-demo.
You can inject a different benchmark without editing code by passing dataset and evaluator overrides at launch time:
METIVTA_DATASET_NAME=My-Benchmark \
METIVTA_DATASET_LOCAL_PATH=/app/custom-dataset \
METIVTA_DATASET_FILES_QUESTIONS=questions.json \
METIVTA_DATASET_FILES_QUESTIONS_ONLY=questions-only.json \
METIVTA_EVALUATION_DAAT_EVALUATORS=hebrew_presence,url_format,response_length,daat_score \
docker compose --profile legacy --profile demo up -d --build

To stop the local dev stack:

make compose-dev-down

Or, if you started the seeded demo stack:
make compose-demo-down

Local endpoints:
- gateway: http://localhost:18000
- health: http://localhost:18000/health
- readiness: http://localhost:18000/ready
- Scalar docs: http://localhost:18000/api/v2/docs
- runtime signup page: http://localhost:18080/signup
- legacy leaderboard page: http://localhost:18080/leaderboard
- dataset info: http://localhost:18080/dataset-info
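The endpoint list above can be checked in one pass once the stack is up. The following is a minimal sketch using only the Python standard library; the helper names (`fetch_status`, `check_endpoints`) are hypothetical and not part of this repository, and treating HTTP 200 as "loads" is an assumption rather than documented behavior.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# URLs copied from the endpoint list above.
LOCAL_ENDPOINTS: Dict[str, str] = {
    "gateway": "http://localhost:18000",
    "health": "http://localhost:18000/health",
    "readiness": "http://localhost:18000/ready",
    "Scalar docs": "http://localhost:18000/api/v2/docs",
    "runtime signup page": "http://localhost:18080/signup",
    "legacy leaderboard page": "http://localhost:18080/leaderboard",
    "dataset info": "http://localhost:18080/dataset-info",
}


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def check_endpoints(
    endpoints: Dict[str, str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, bool]:
    """Map each endpoint name to True when it answers with HTTP 200 (assumed success code)."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}
```

Calling `check_endpoints(LOCAL_ENDPOINTS)` returns a name-to-boolean mapping that can be printed or asserted on. The `fetch` parameter exists so the check can be exercised without a running stack.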
Render blueprint file:
Defined services:
- metivta-fastapi
- metivta-flask
- metivta-worker
Set these in the Render dashboard for the relevant services:
- DATABASE_URL
- METIVTA_SECURITY_SECRET_KEY
- METIVTA_WORKER_BROKER
- METIVTA_WORKER_RESULT_BACKEND
Optional integrations:
- ANTHROPIC_API_KEY
- LANGCHAIN_API_KEY
- BROWSERLESS_TOKEN
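A quick pre-deploy check for the variables above might look like the sketch below. The function names are hypothetical, and the sketch assumes every required variable is checked in one environment; in practice each Render service may need only a subset.

```python
import os
from typing import Dict, Iterable, List, Mapping, Optional

# Variable names copied from the lists above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "METIVTA_SECURITY_SECRET_KEY",
    "METIVTA_WORKER_BROKER",
    "METIVTA_WORKER_RESULT_BACKEND",
)
OPTIONAL_VARS = ("ANTHROPIC_API_KEY", "LANGCHAIN_API_KEY", "BROWSERLESS_TOKEN")


def missing_vars(env: Mapping[str, str], names: Iterable[str]) -> List[str]:
    """Return the names that are unset or blank in env, in the given order."""
    return [name for name in names if not env.get(name, "").strip()]


def report(env: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Summarize which required and optional variables are missing."""
    env = os.environ if env is None else env
    return {
        "required": missing_vars(env, REQUIRED_VARS),
        "optional": missing_vars(env, OPTIONAL_VARS),
    }
```

An empty `required` list means all four required variables are present; entries under `optional` only indicate integrations that are not configured.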
Azure Container Apps is fully supported and was validated with live hosted E2E checks for DAAT and MTEB flows.
- one Azure Resource Group
- one Azure Container Apps Environment
- one Azure Container Registry
- Container Apps for:
  - gateway
  - fastapi
  - redis
  - postgres

- gateway is external and serves the public domain
- fastapi should be internal-only
- redis and postgres should be internal-only
Set the gateway environment variable:
PUBLIC_DOCS_ONLY=true

In this mode, public traffic is intentionally limited to:
- /
- /api/v2/docs
- /api/v2/openapi.json
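To make the docs-only behavior concrete, here is an illustrative model of the allowlist. It is not the gateway's actual implementation: the function name is hypothetical, and exact-path matching (after dropping the query string and a trailing slash) is an assumption. The real gateway may also serve static assets beneath these paths.

```python
# Paths copied from the list above.
PUBLIC_DOCS_PATHS = frozenset({"/", "/api/v2/docs", "/api/v2/openapi.json"})


def is_publicly_served(path: str, public_docs_only: bool) -> bool:
    """Model which request paths are reachable for a given PUBLIC_DOCS_ONLY value."""
    if not public_docs_only:
        return True  # PUBLIC_DOCS_ONLY=false: the full runtime is exposed
    # Drop the query string, then normalize a trailing slash (keeping "/" itself).
    clean = path.split("?", 1)[0]
    if len(clean) > 1:
        clean = clean.rstrip("/")
    return clean in PUBLIC_DOCS_PATHS
```

Under this model, `/signup` is rejected when `public_docs_only` is true and accepted when it is false, which matches the intent described above.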
If you only want a public homepage, API reference, and guide for a maintainer-run promotional site
such as metivta.co, build this static bundle instead of publishing the full application runtime:
make site-build

This writes deployable files to dist/static-site/:
- index.html
- guide/index.html
- signup/index.html
- api/v2/docs/index.html
- api/v2/openapi.json
This is the correct artifact for Azure Static Web Apps, Azure Storage static website, or any other static host. This bundle is for the maintainer-operated docs/promotional site, not for benchmark operators running the full stack. Keep Azure Container Apps only when you want the public edge to proxy the live app during internal verification.
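Before uploading the bundle, it can help to confirm that the build produced every file listed above. The sketch below assumes the default output directory `dist/static-site`; the function name is hypothetical, and treating an empty file as missing is a choice made here, not a rule from the build.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the file list above.
EXPECTED_FILES = (
    "index.html",
    "guide/index.html",
    "signup/index.html",
    "api/v2/docs/index.html",
    "api/v2/openapi.json",
)


def missing_bundle_files(root: str = "dist/static-site") -> List[str]:
    """Return the expected files that are absent or empty under root."""
    base = Path(root)
    missing = []
    for rel in EXPECTED_FILES:
        path = base / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

An empty return value after `make site-build` suggests the bundle is complete enough to publish.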
For temporary hosted E2E testing of auth/eval/leaderboard routes, set:
PUBLIC_DOCS_ONLY=false

After validation, set it back to true for docs-only public launch if desired.
For apex + www on Azure Container Apps:
- @ A -> <gateway static IP>
- www CNAME -> <gateway container app fqdn>
- asuid TXT -> <customDomainVerificationId>
- asuid.www TXT -> <customDomainVerificationId>
Both asuid and asuid.www are required for managed certificate validation on apex and www.
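The four records can be generated from the three values Azure gives you, which makes it harder to forget one of the `asuid` entries. This is a sketch only: `DnsRecord` and `azure_custom_domain_records` are hypothetical names, and the host labels are relative to your zone as most DNS providers expect.

```python
from typing import List, NamedTuple


class DnsRecord(NamedTuple):
    host: str   # label relative to the zone, "@" meaning the apex
    rtype: str  # record type: A, CNAME, or TXT
    value: str


def azure_custom_domain_records(
    static_ip: str, app_fqdn: str, verification_id: str
) -> List[DnsRecord]:
    """Build the apex + www records described above."""
    return [
        DnsRecord("@", "A", static_ip),
        DnsRecord("www", "CNAME", app_fqdn),
        DnsRecord("asuid", "TXT", verification_id),
        DnsRecord("asuid.www", "TXT", verification_id),
    ]
```

For example, `azure_custom_domain_records("203.0.113.10", "gateway.example.invalid", "ABC123")` uses placeholder values and returns four records, two of them TXT records carrying the same verification ID.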
Before calling a full application deployment ready:
- uv run ruff check .
- uv run mypy src
- uv run pytest -q
- go test -race ./...
- GET /health returns healthy
- GET /ready returns all required dependencies as ready
- GET /api/v2/docs loads
- GET /signup loads
- GET /leaderboard loads
- register -> login -> create API key works
- at least one DAAT evaluation works
- at least one MTEB evaluation works if retrieval mode is enabled
Before calling the public promo/docs site ready:
- GET / loads
- GET /guide loads
- GET /api/v2/docs loads
- GET /api/v2/openapi.json loads
- the site does not expose runtime auth or evaluation routes publicly

- /submit is retained for compatibility; new integrations should target /api/v2/*
- keep full ground-truth datasets private and publish safe question-only views publicly
- when you customize the benchmark harness, update config.toml, dataset files, and rubric files together