Integrafy-OS · 02 Data Lake

The single store of your truth.

Raw and normalized data in the same place. Native versioning with time travel. End-to-end lineage. Multi-tenant and time-based partitioning. S3-compatible on Apache Iceberg.

Diagram of the Integrafy-OS Data Lake

What makes this Data Lake different?

Two zones, one storage

Raw zone for data as it arrives; normalized zone with the canonical model applied. Both coexist and can be joined on demand.
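As a rough illustration of the "joined on demand" idea (a toy sketch, not Integrafy-OS code — every field name here is invented), a shared event id is what links a raw record to its normalized counterpart:

```python
# Toy sketch of the two-zone idea: the same event id links a raw record
# to its normalized counterpart, so the zones can be joined on demand.
# All field names are illustrative, not the actual Integrafy-OS schema.

raw_zone = [
    {"event_id": "evt-1", "payload": '{"qty": "3", "sku": " A-100 "}'},
    {"event_id": "evt-2", "payload": '{"qty": "1", "sku": "B-200"}'},
]

normalized_zone = [
    {"event_id": "evt-1", "qty": 3, "sku": "A-100"},
    {"event_id": "evt-2", "qty": 1, "sku": "B-200"},
]

def join_zones(raw, normalized):
    """Join raw and normalized records on their shared event id."""
    norm_by_id = {row["event_id"]: row for row in normalized}
    return [
        {"raw": row, "normalized": norm_by_id[row["event_id"]]}
        for row in raw
        if row["event_id"] in norm_by_id
    ]

joined = join_zones(raw_zone, normalized_zone)
```

In practice the join runs in the query engine over both zones; the point is that neither zone replaces the other.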

Automatic lineage

Every row in the lake knows which event it came from, what transformations it went through, and when it was written. Full traceability for audit and debug.
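Conceptually, that per-row metadata can be pictured like this (a toy sketch under invented names, not the real lineage format):

```python
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    """Illustrative lineage metadata carried alongside each row."""
    source_event: str                                # event the row came from
    transforms: list = field(default_factory=list)   # steps applied, in order
    written_at: str = ""                             # ISO timestamp of the write

def trace(lineage: LineageRecord) -> str:
    """Walk the lineage back from the row to its origin, newest step first."""
    steps = " <- ".join(reversed(lineage.transforms))
    return f"{steps} <- {lineage.source_event}"

lineage = LineageRecord(
    source_event="ecommerce:order-4711",
    transforms=["parse_json", "map_to_canonical", "dedupe"],
    written_at="2024-05-02T10:15:00Z",
)

origin = trace(lineage)
# "dedupe <- map_to_canonical <- parse_json <- ecommerce:order-4711"
```

Following the chain backwards is exactly the move used in the discrepancy example below: from a suspect row straight to the original event.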

Time travel

Query the state of the lake at any past moment. SELECT ... AS OF yesterday. Historical reports without manual snapshots.
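The mechanics can be sketched as a store of timestamped snapshots where a query picks the newest snapshot at or before the requested moment (a toy model of the idea, not how Apache Iceberg actually stores snapshots):

```python
import bisect
from datetime import datetime

class VersionedTable:
    """Minimal snapshot store: every write appends a timestamped version,
    and as_of() returns the table as it looked at a past moment."""

    def __init__(self):
        self._snapshots = []  # list of (timestamp, rows), appended in time order

    def write(self, ts: datetime, rows: list):
        self._snapshots.append((ts, rows))

    def as_of(self, ts: datetime):
        """Return the newest snapshot at or before ts (the SELECT ... AS OF idea)."""
        times = [t for t, _ in self._snapshots]
        i = bisect.bisect_right(times, ts)
        if i == 0:
            raise LookupError("no snapshot at or before that time")
        return self._snapshots[i - 1][1]

table = VersionedTable()
table.write(datetime(2024, 5, 1), [{"order": 1, "total": 100}])
table.write(datetime(2024, 5, 2), [{"order": 1, "total": 90}])  # later adjustment

yesterday_view = table.as_of(datetime(2024, 5, 1, 12, 0))
# [{"order": 1, "total": 100}] -- the pre-adjustment state
```

Nothing is copied per query: old snapshots already exist, so "the lake as of last Tuesday" is a lookup, not a restore.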

Smart partitioning

Partitioned by date, tenant, and event type, so queries read only the relevant partitions. Scales to terabytes without sacrificing speed.
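Partition pruning in miniature (a toy sketch; keys and file names are invented): files are grouped under (date, tenant, event_type) keys, and a query touches only the groups that match its filters instead of scanning everything.

```python
# Files grouped by partition key (date, tenant, event_type). Illustrative only.
partitions = {
    ("2024-05-01", "acme",   "order"):   ["file-001.parquet"],
    ("2024-05-01", "acme",   "invoice"): ["file-002.parquet"],
    ("2024-05-01", "globex", "order"):   ["file-003.parquet"],
    ("2024-05-02", "acme",   "order"):   ["file-004.parquet"],
}

def prune(partitions, date=None, tenant=None, event_type=None):
    """Return only the files whose partition key matches every given filter."""
    files = []
    for (d, t, e), fs in partitions.items():
        if date and d != date:
            continue
        if tenant and t != tenant:
            continue
        if event_type and e != event_type:
            continue
        files.extend(fs)
    return sorted(files)

hit = prune(partitions, date="2024-05-01", tenant="acme")
# ["file-001.parquet", "file-002.parquet"] -- the other two files are never read
```

The cost of a filtered query tracks the size of the matching partitions, not the size of the whole lake.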

Automatic compaction

The lake compacts small files into larger ones to maintain performance. No manual maintenance, no degradation over time.
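The core of compaction is simple to picture (a toy sketch with invented names; sizes are in rows for simplicity, where real engines work in bytes): files below a threshold are rewritten into one larger file so reads open fewer objects, while total data is unchanged.

```python
SMALL_FILE_THRESHOLD = 100  # files with fewer rows than this get compacted

def compact(files: dict) -> dict:
    """Merge all small files into a single compacted file; keep big ones as-is."""
    small = {n: rows for n, rows in files.items() if rows < SMALL_FILE_THRESHOLD}
    big = {n: rows for n, rows in files.items() if rows >= SMALL_FILE_THRESHOLD}
    if small:
        # One rewrite replaces many tiny objects. Name is illustrative.
        big["compacted-000.parquet"] = sum(small.values())
    return big

before = {"a.parquet": 10, "b.parquet": 25, "c.parquet": 500, "d.parquet": 7}
after = compact(before)
# {"c.parquet": 500, "compacted-000.parquet": 42}
```

Fewer, larger files mean fewer object-store requests per query, which is where the "no degradation over time" claim comes from.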

Native multi-tenant

Each tenant has isolated partitions with ACLs. Security by default without replicating infrastructure.

Real example: resolving an accounting discrepancy

Problem: An invoice issued last month doesn't match the delivery note.

Without a lake: Search in ERP logs, EDI files, manual backups. Hours of work, sometimes days.

With Integrafy-OS Data Lake:

→ Query the order AS OF 30 days ago

→ Follow lineage back to the original eCommerce event

→ Compare with the current ERP state

→ The difference is obvious: a later manual adjustment broke the chain.

Frequently asked questions about Data Lake

What storage technology does the Data Lake use?

S3-compatible object storage (AWS S3, Cloudflare R2, MinIO on-prem), with data in the Apache Iceberg table format or plain Parquet files. We pick the layer that fits the customer's environment: managed EU Cloud or On-Premise on your own infrastructure.

What is lineage and why does it matter?

Lineage is data traceability: which source it came from, what transformations it went through, where it's replicated. When a KPI looks off, instead of investigating blindly, you follow the lineage and find the origin in seconds. Essential for auditing and debugging.

Are previous versions of the data kept?

Yes. Native time travel: you can query the exact state of the lake at any past moment (configurable retention, typically 90 days). This lets you rebuild historical reports without manual snapshots.

How much storage do I need?

Depends on event volume and retention. For a typical industrial B2B (100-500 orders/day, 50k products) the initial lake runs around 10-50 GB. EU Cloud scales automatically; On-Premise is sized with your team.

Does the Data Lake replace my current data warehouse?

It can, but it's not mandatory. Integrafy-OS can coexist with your existing Snowflake/BigQuery/Redshift as a complementary source. Many customers keep their DWH for historical analytics and use Integrafy-OS for real-time operational data.

How many hours does your team lose hunting for data?

Free 30-minute diagnostic.