Integrafy-OS · 01 Ingestion

From any source to the lake. In minutes.

ERP, eCommerce, CRM, API and file connectors on the same engine. Streaming and batch. Validation at entry. Retries with backoff. No manual pipelines.

[Diagram: data ingestion in Integrafy-OS]

What kind of sources?

Modern APIs

REST, GraphQL, gRPC. OAuth 2.0, JWT, API Key authentication. Rate limiting handled automatically.
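As a minimal sketch of what "rate limiting handled automatically" means on the client side: wait out HTTP 429 responses using the server's Retry-After hint, then retry. The `RateLimited` exception and `call_with_rate_limit` helper are hypothetical names, not Integrafy-OS APIs.

```python
import time

class RateLimited(Exception):
    """Signals an HTTP 429 response carrying the server's Retry-After hint."""
    def __init__(self, retry_after: float):
        super().__init__(retry_after)
        self.retry_after = retry_after

def call_with_rate_limit(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), sleeping out rate-limit responses up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(exc.retry_after)  # honor the server's Retry-After hint
```

The `sleep` parameter is injected so the wait strategy can be swapped out (or stubbed in tests).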

Legacy protocols

SOAP, RFC, EDI (EDIFACT, X12), COM. For ERPs that haven't been updated in decades.

Databases

PostgreSQL, MySQL, SQL Server, Oracle, DB2. Change Data Capture when available; scheduled polling otherwise.
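When Change Data Capture is not available, scheduled polling can still be incremental by tracking a watermark cursor. A sketch, assuming each row carries an `updated_at` field and a hypothetical `fetch_since(cursor)` callable that queries the source:

```python
def poll_incremental(fetch_since, cursor):
    """One polling cycle: pull rows changed after `cursor`, advance the watermark.

    fetch_since(cursor) returns rows as dicts with an 'updated_at' field;
    the new cursor is the highest value seen, so reprocessing is avoided.
    """
    rows = fetch_since(cursor)
    new_cursor = max((r["updated_at"] for r in rows), default=cursor)
    return rows, new_cursor
```

Persisting the cursor between runs is what makes the second poll cheap: an unchanged source returns nothing.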

Files

CSV, JSON, XML, Parquet. On FTP, SFTP, S3, Azure Blob or local folders. Incremental processing.

Webhooks

Secure signed endpoint to receive push events. PrestaShop, Shopify, HubSpot and Salesforce are all supported.
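"Secure signed endpoint" typically means HMAC verification of the raw request body, the scheme used by Shopify and similar platforms. A minimal sketch (the function name is illustrative, not an Integrafy-OS API):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the comparison
    return hmac.compare_digest(expected, signature_hex)
```

Verifying against the raw bytes (before any JSON parsing) matters: re-serialized payloads rarely match byte-for-byte.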

Streaming

Kafka, RabbitMQ, AWS Kinesis. Real-time ingestion for high volume with at-least-once guarantees.
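At-least-once delivery implies the same event may arrive more than once, so consumers deduplicate by event id. A sketch with hypothetical names, assuming events carry a unique `id`:

```python
def process_at_least_once(events, handler, seen: set):
    """Handle each event exactly once by skipping ids already processed."""
    for event in events:
        if event["id"] in seen:
            continue  # duplicate redelivery, already handled
        handler(event)
        seen.add(event["id"])  # record only after the handler succeeds
```

In production the `seen` set would live in durable storage; an in-memory set only survives one process lifetime.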

Visual pipeline: from event to Data Lake

1. Event reaches the connector (webhook, polling, file)
2. Signature and authentication validation
3. Declarative transformation (SQL, Python, JavaScript)
4. Schema and business rule validation
5. Write to the Data Lake with timestamp and lineage
6. Notify the rest of the system (event bus)
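The six steps above can be sketched as a single function. Everything here is a hypothetical stand-in (`verify`, `transform`, `validate`, and the `lake`/`bus` sinks), not the actual Integrafy-OS interface:

```python
from datetime import datetime, timezone

def run_pipeline(event, verify, transform, validate, lake, bus):
    """Steps 2-6 for one event; step 1 is the connector delivering `event`."""
    if not verify(event):                                 # 2. signature/auth check
        raise PermissionError("event rejected at entry")
    record = transform(event)                             # 3. declarative transform
    validate(record)                                      # 4. schema & business rules
    record["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    record["_lineage"] = {"source": event.get("source")}  # 5. timestamp + lineage
    lake.append(record)                                   #    write to the lake
    bus.append({"type": "record_ingested", "id": record.get("id")})  # 6. notify
    return record
```

Failing fast at steps 2 and 4 is the point: nothing unvalidated ever reaches step 5.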

Frequently asked questions about Ingestion

What source types can I ingest?

Any source with a REST API, GraphQL, SOAP, database (PostgreSQL, MySQL, SQL Server, Oracle, DB2), files (CSV, JSON, Parquet, XML, EDI) over FTP/SFTP/S3, or inbound webhooks. If your system exposes data in some form, Integrafy-OS can read it.

How are changing schemas handled?

Connectors support schema-on-read (raw ingestion, schema applied later) and schema-on-write (schema validated at entry). When a field changes at the source, the lake keeps previous versions with explicit lineage, and Data Hub offers assisted reconciliation.

Streaming or batch?

Both on the same engine. Real-time events via webhooks/Kafka/web services; scheduled batch for heavy sources (daily files, weekly full loads). The decision is per connector, not per product.

What happens if a source goes down?

Integrafy-OS keeps an event buffer and retries with exponential backoff. When the source returns, the buffer is drained respecting order. Insight alerts notify the team if the delay exceeds configurable thresholds.
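The two mechanisms in that answer, an exponential backoff schedule and an ordered buffer drain, can be sketched as follows (hypothetical helper names, not product APIs):

```python
from collections import deque

def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6):
    """Exponential backoff schedule, capped so retries never wait unboundedly."""
    return [min(base * factor ** i, cap) for i in range(attempts)]

def drain(buffer: deque, deliver) -> None:
    """Replay buffered events in arrival order once the source is back.

    If deliver() raises, the event stays at the head for the next attempt,
    preserving ordering across partial drains.
    """
    while buffer:
        deliver(buffer[0])
        buffer.popleft()
```

Popping only after a successful delivery is what keeps the drain safe to interrupt and resume.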

Can I validate data before it reaches the lake?

Yes. Every pipeline supports validation rules (type, range, regex, references to other tables) and transformations (cleanup, enrichment, deduplication). Records that fail validation go to a dead letter queue for review.
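A minimal sketch of that routing logic: named rules (here a type check, a range check, and a regex check, all invented for the example) run against each record, and failures land in the dead letter queue with the rule names attached.

```python
import re

SKU_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")

# Hypothetical example rules: type, range, and regex validation.
EXAMPLE_RULES = {
    "qty_is_int": lambda r: isinstance(r.get("qty"), int),
    "qty_positive": lambda r: isinstance(r.get("qty"), int) and r["qty"] > 0,
    "sku_format": lambda r: bool(SKU_PATTERN.fullmatch(r.get("sku", ""))),
}

def validate_and_route(records, rules, lake, dlq):
    """Clean records reach the lake; failures go to the DLQ for review."""
    for rec in records:
        failed = [name for name, rule in rules.items() if not rule(rec)]
        if failed:
            dlq.append({"record": rec, "failed_rules": failed})
        else:
            lake.append(rec)
```

Keeping the failed rule names alongside the record is what makes DLQ review actionable rather than a guessing game.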

Which source do you still need to connect?

Free 30-minute diagnostic.