Show HN: Marmot – Single-binary data catalog (no Kafka, no Elasticsearch)

94 points - yesterday at 2:59 PM


charlie-haley yesterday at 3:03 PM
Hey HN, I wanted to show off my project Marmot! I decided to build Marmot after discovering a lot of data catalogs can be complex and require many external dependencies such as Kafka, Elasticsearch or an external orchestrator like Airflow.

Marmot is a single Go binary backed by Postgres. That's it!

It already supports:

- Full-text search across tables, topics, queues, buckets, and APIs
- Glossary and asset-to-term associations
- A flexible API, so it can support almost any data asset
- Terraform/Pulumi/CLI for managing a catalog as code
- 10+ plugins (and growing)
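For anyone wondering how a catalog can do full-text search without Elasticsearch: a relational database's built-in full-text indexing is usually plenty at catalog scale. A rough, self-contained sketch of the idea using SQLite's FTS5 (Marmot itself is backed by Postgres, per the post; the table and sample assets here are invented for illustration):

```python
import sqlite3

# In-memory "catalog" of data assets with full-text search via SQLite FTS5.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE assets USING fts5(name, kind, description)")
db.executemany(
    "INSERT INTO assets VALUES (?, ?, ?)",
    [
        ("orders", "table", "customer order facts, one row per order"),
        ("clickstream", "topic", "raw web events streamed from the app"),
        ("exports", "bucket", "nightly CSV exports for partners"),
    ],
)

# MATCH searches every indexed column; ORDER BY rank sorts by BM25 relevance.
hits = db.execute(
    "SELECT name, kind FROM assets WHERE assets MATCH ? ORDER BY rank",
    ("customer",),
).fetchall()
print(hits)  # [('orders', 'table')]
```

Postgres offers the same capability through its `tsvector`/`tsquery` machinery, which is presumably what makes the single-binary, no-search-cluster design workable.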

Live demo: https://demo.marmotdata.io

pratio yesterday at 4:25 PM
Hey there, Great to see Marmot here and I'm a huge fan of your project. Recently, we deployed a catalog but we went with open-metadata https://open-metadata.org/ another amazing project.

What we missed in Marmot were existing integrations with Airflow and plugins like Tableau and Power BI, as well as features such as SSO and MCP.

We're an enterprise and needed a more mature product. Fingers crossed Marmot gets there soon.

hilti yesterday at 7:09 PM
I’ve been burned by metadata platforms twice now and honestly, it’s exhausting.

The demo is always incredible - finally, we’ll know where our data lives! No more asking “hey does anyone know which table has the real customer data?” in Slack at 3pm.

Then reality hits.

Week 1 looks great. Week 8, you search “customer data” and get back 47 tables with brilliant names like `customers_final_v3` and `cust_data_new`. Zero descriptions because nobody has time to write them.

You try enforcing it. Developers are already swamped and now you’re asking them to stop and document every column? They either write useless stuff like “customer table contains customers” or they just… don’t. Can’t really blame them.

Three months in, half the docs are outdated.

I don’t know. Maybe it’s a maturity thing? Or maybe we’re all just pretending we’re organized enough for these tools when we’re really not.

paddy_m yesterday at 3:36 PM
When should you reach for a data catalog via a data warehouse or data lake? If you are choosing a data catalog this is probably obvious to you, if you just happened on this HN post less so.

Also, what key decisions do other data catalogs make versus your choices? What led to those decisions, and what is the benefit to users?

badmonster today at 4:55 AM
This looks great! I'm curious about the plugin architecture - how does Marmot handle schema evolution and versioning across different data sources? For instance, if a Postgres table's schema changes, does the catalog automatically detect and update the lineage, or is there a manual reconciliation step?

Also, given that you're using OpenLineage for cross-system lineage tracking, have you considered building native integrations with data orchestration tools beyond Airflow (e.g., Dagster, Prefect) to automatically capture DAG-level lineage?

e1gen-v yesterday at 6:44 PM
How are you able to see a dataset's lineage across storage types? For example, how are you able to see that an S3 bucket's files are the ancestor of some table in Postgres?
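(For context on how tools in this space typically answer this: OpenLineage, which the thread mentions, models lineage as run events whose inputs and outputs are datasets named by a storage-scheme namespace, so an S3 object and a Postgres table can sit in the same event. A minimal sketch of that general event shape as a plain dict; the job, bucket, and table names are invented, and this shows the OpenLineage convention rather than Marmot's specific implementation:)

```python
import json

# One ETL run that reads an S3 object and writes a Postgres table.
# The namespace carries the storage scheme, which is what lets a single
# lineage graph span storage types. All identifiers below are made up.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "producer": "https://example.com/my-etl",
    "job": {"namespace": "etl", "name": "load_orders"},
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
    "inputs": [
        {"namespace": "s3://raw-data", "name": "orders/2024-01-01.csv"}
    ],
    "outputs": [
        {"namespace": "postgres://db.internal:5432", "name": "mydb.public.orders"}
    ],
}
print(json.dumps(event, indent=2))
```

A catalog that ingests events like this can join the S3 dataset and the Postgres dataset into one ancestry chain, since each is globally identified by namespace plus name.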
rawkode yesterday at 5:54 PM
This looks fantastic! I’ll need to explore building a SQLite / D1 plugin to consolidate all my worker data
mrbluecoat yesterday at 8:30 PM
If single binary is a selling point, why not use sqlite instead of postgres?
stym06 yesterday at 4:14 PM
How's it different from existing open source data catalogs like amundsen.io?
nchmy yesterday at 7:00 PM
Not to be confused with Marmot, the multi-master distributed SQLite server, which has been around for a couple years longer and just came out of 2 years in hibernation, shed its NATS/Raft fat in favour of a native gossip protocol for replication.

https://github.com/maxpert/marmot