Cloud + Warehouse + Integration: governed data

A data platform is not a database. It is a contract with the business: that the numbers in the dashboard are the right numbers; that the customer record an analyst is looking at is the same record a marketer is targeting; that what you change in source A shows up in destination B in a predictable amount of time. Most ‘data platforms’ we are asked to fix never wrote that contract down. The combination of AWS, Snowflake, and Informatica — what Shinrai calls the Innovation Trifecta — exists to make that contract enforceable.

Why these three

AWS is the cloud. It runs the infrastructure: VPCs, IAM, S3 buckets that hold raw and cleansed data, the network plumbing that makes all of it talk. AWS is opinionated about how to run infrastructure and indifferent about how to model data.

Snowflake is the analytics engine. It separates compute from storage, scales each independently, and gives the business something approaching a single query surface across structured and semi-structured data. Snowflake is opinionated about how data is queried and indifferent about how it got there.

Informatica is the integration and governance backbone. It moves data between systems, tracks where it came from, catalogues it for the business, and enforces the access and quality rules that turn a warehouse into a platform you can trust. Informatica is opinionated about lineage, governance, and the contract.

The combination works because each owns its layer cleanly. Substitutes exist: Amazon Redshift instead of Snowflake, AWS Glue plus Amazon DataZone instead of Informatica. For an AWS-only customer with simple integration needs they may be the right answer. We pick Snowflake + Informatica when (a) the customer has, or might have, workloads across more than one cloud; (b) governance and lineage need to be first-class concerns from day one; or (c) the data team already has Snowflake or Informatica skills.

The hidden benefit: a data layer that doesn’t lock you to AWS

One of the strongest strategic arguments for this combination is rarely the first reason customers reach for it: the data layer is portable. Snowflake runs natively on AWS, Azure, and Google Cloud. Each Snowflake account lives in a single cloud region, but one Snowflake organization can hold accounts on multiple clouds and replicate databases between them. Informatica’s Intelligent Data Management Cloud (IDMC) runs across AWS, Azure, Google Cloud, and hybrid on-premises configurations.

The AWS-native substitutes do not offer this. Redshift is AWS-only. Glue, DataZone, and the rest of the native data-integration stack run on AWS. There are good engineering reasons to pick the native stack: tight IAM integration, single-vendor support, lower licence cost. There is one strategic argument against it: if regulatory, sovereignty, or commercial reasons ever require running data workloads outside AWS, the move is a contained migration with Snowflake + Informatica and a re-platforming exercise without.

For African and Gulf enterprises navigating data-residency frameworks — Kenya’s Data Protection Act, the UAE PDPL, the KSA PDPL, plus GDPR for European customers — this optionality has cash value even if it is never exercised. AWS is the cloud you start on; the data layer is the one you do not want to redo.

The reference architecture

A standard Trifecta build, end to end:

1. Ingestion. Informatica Cloud Data Integration pulls data from source systems (databases, SaaS APIs, file drops, streaming feeds) into a landing zone on AWS S3.
2. Storage. S3 holds raw, cleansed, and curated zones. Snowflake reads from S3 via external stages, or copies into native tables for hot data.
3. Modelling. Snowflake holds the analytical models — dimensional, data-vault, or whatever the team has standardised on. dbt is the typical modelling tool; Snowflake’s native compute runs the transformations.
4. Governance and catalogue. Informatica Cloud Data Governance and Catalog (CDGC) catalogues the assets, tracks lineage from source through model, enforces data-quality rules, and surfaces the contract to business users.
5. Consumption. BI tools (Power BI, Tableau, QuickSight) connect to Snowflake; operational systems consume data downstream through Snowflake data sharing or reverse-ETL.
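
As a rough illustration of the storage step, the sketch below renders the two Snowflake statements that usually sit behind it: an external stage over an S3 prefix, and a COPY INTO for data that should live in native tables. The stage, bucket, storage-integration, and table names are hypothetical, and Parquet is an assumed file format. The rendered SQL would still need review before being run through your own Snowflake client.

```python
def external_stage_sql(stage: str, s3_url: str, integration: str) -> str:
    """Render a CREATE STAGE statement pointing Snowflake at an S3 prefix.

    All identifiers are assumed to be trusted, pre-validated names;
    this helper does no escaping.
    """
    return (
        f"CREATE STAGE IF NOT EXISTS {stage}\n"
        f"  URL = '{s3_url}'\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  FILE_FORMAT = (TYPE = PARQUET);"
    )


def copy_into_sql(table: str, stage: str) -> str:
    """Render a COPY INTO that loads staged Parquet files into a native table."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  FILE_FORMAT = (TYPE = PARQUET)\n"
        f"  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(external_stage_sql("raw_orders_stage",
                             "s3://example-landing/raw/orders/",
                             "s3_landing_int"))
    print(copy_into_sql("raw.orders", "raw_orders_stage"))
```

Keeping the statements as rendered text makes them easy to review and version alongside the models; whether hot data is copied or queried in place remains a per-dataset decision.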

The hard parts

The architecture is the easy half. The hard parts:

Lineage. A field on a dashboard is only trustworthy if you can trace it back to a source system, through every transformation, with every assumption documented along the way. Informatica CDGC is the part of the stack that holds this lineage. The discipline is making sure every new transformation gets registered.
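
To make the "trace it back" test concrete, here is a minimal sketch of a lineage check over an in-memory graph. It is not how CDGC stores or exposes lineage; the asset names, the `source.` prefix convention, and the dictionary format are all assumptions made for illustration. The idea is simply that any asset with no registered parents, and which is not itself a source, is a gap.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it is derived from.
LINEAGE = {
    "dashboard.revenue_by_region": ["mart.fct_revenue"],
    "mart.fct_revenue": ["staging.orders", "staging.fx_rates"],
    "staging.orders": ["source.erp.orders"],
    "staging.fx_rates": ["source.treasury.fx_rates"],
}


def trace_to_sources(asset, lineage, source_prefix="source."):
    """Walk upstream from `asset`.

    Returns (sources, gaps): the source systems reached, and any
    non-source assets that have no registered parents.
    """
    sources, gaps = set(), set()
    seen = {asset}
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        if node.startswith(source_prefix):
            sources.add(node)
            continue
        parents = lineage.get(node)
        if not parents:
            gaps.add(node)  # an unregistered transformation breaks the chain
            continue
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources, gaps


if __name__ == "__main__":
    print(trace_to_sources("dashboard.revenue_by_region", LINEAGE))
```

A dashboard field passes only when the gap set comes back empty, which is the kind of check that can run whenever a new model is registered.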

Access control. Who can see customer PII? Who can see customer identifiers but not PII (the marketer’s case)? Snowflake’s row access policies and dynamic data masking enforce this in the warehouse; Informatica’s policies enforce it on the ingestion side, before the data reaches the warehouse at all. Both layers, every time.
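
For the warehouse half of that control, a sketch of what a dynamic data masking policy might look like, rendered from Python so it can be templated per column. The policy, role, table, and column names are hypothetical, and the masking rule (show the value to listed roles, a fixed placeholder to everyone else) is one simple choice among many. The ingestion-side Informatica policies are not shown.

```python
from typing import Sequence


def masking_policy_sql(policy: str, allowed_roles: Sequence[str]) -> str:
    """Render a Snowflake masking policy for a STRING column.

    Roles in `allowed_roles` see the real value; every other role sees
    a placeholder. Names are assumed to be trusted identifiers.
    """
    roles = ", ".join(f"'{role}'" for role in allowed_roles)
    return (
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING)\n"
        f"  RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val\n"
        f"       ELSE '***MASKED***' END;"
    )


def apply_masking_sql(table: str, column: str, policy: str) -> str:
    """Render the statement that attaches the policy to a column."""
    return (
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET MASKING POLICY {policy};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(masking_policy_sql("pii_email_mask", ["PII_ANALYST"]))
    print(apply_masking_sql("crm.customers", "email", "pii_email_mask"))
```

In this sketch a marketing role that is left out of `allowed_roles` can still join on the customer identifier but sees only the placeholder in the email column.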

Real-time vs batch. The default Trifecta build is batch, on the assumption that most reports tolerate an hour’s latency. The moment a use case appears that does not — fraud detection, customer-service screens — the architecture has to shift: Snowflake’s Snowpipe Streaming, Informatica’s CDC capability, and a different style of consumption. Pretending the platform is real-time when it is batch is how trust breaks.
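
One way to keep that honesty explicit is to record each use case's tolerated staleness and derive the ingestion mode from it. The sketch below assumes the one-hour batch latency mentioned above as the threshold; the use-case names and tolerances are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumption: the batch pipeline delivers within roughly one hour.
BATCH_LATENCY = timedelta(hours=1)


@dataclass(frozen=True)
class UseCase:
    name: str
    max_staleness: timedelta  # how old the data may be before it misleads


def ingestion_mode(use_case: UseCase,
                   batch_latency: timedelta = BATCH_LATENCY) -> str:
    """Return 'batch' if the use case tolerates batch latency, else 'streaming'."""
    if use_case.max_staleness >= batch_latency:
        return "batch"
    return "streaming"


if __name__ == "__main__":
    cases = [
        UseCase("monthly revenue report", timedelta(hours=24)),
        UseCase("fraud detection", timedelta(seconds=5)),
    ]
    for case in cases:
        print(case.name, "->", ingestion_mode(case))
```

Anything that lands on the streaming side is a signal to plan for Snowpipe Streaming and CDC rather than to shorten a batch schedule and hope.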

Slowly-changing dimensions. The dull problem that every data-platform team eventually has to solve in earnest. Dimensional models with SCD Type 2 are well documented; the work is operationalising them across dozens of source systems with different update semantics. Informatica’s mapping templates plus Snowflake’s MERGE is the typical pattern. The discipline is consistency.
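
The logic a Type 2 MERGE has to implement can be sketched in plain Python. This is an in-memory illustration, not the Informatica mapping or the Snowflake statement itself; the row shape and field names are assumptions, and source-side deletes are deliberately left out.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                          # business key from the source system
    attrs: tuple                      # tracked attributes
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(dim, incoming, as_of):
    """Apply a snapshot `incoming` (key -> attrs) to dimension rows `dim`.

    Changed keys get their current row closed at `as_of` and a new
    current row opened; unseen keys get a first row; unchanged rows and
    history are left untouched. Deletes are not handled in this sketch.
    """
    current_keys = {row.key for row in dim if row.valid_to is None}
    result = []
    for row in dim:
        new_attrs = incoming.get(row.key)
        changed = (row.valid_to is None
                   and new_attrs is not None
                   and new_attrs != row.attrs)
        if changed:
            result.append(replace(row, valid_to=as_of))        # close old version
            result.append(DimRow(row.key, new_attrs, as_of))   # open new version
        else:
            result.append(row)
    for key, attrs in incoming.items():
        if key not in current_keys:
            result.append(DimRow(key, attrs, as_of))           # brand-new key
    return result


if __name__ == "__main__":
    dim = [DimRow("C1", ("Nairobi",), date(2024, 1, 1))]
    snapshot = {"C1": ("Mombasa",), "C2": ("Dubai",)}
    for row in apply_scd2(dim, snapshot, date(2024, 6, 1)):
        print(row)
```

Re-applying the same snapshot leaves the dimension unchanged, which is the property worth testing for every source system, since each one reports updates differently.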

Benchmarks the platform should hit

“Production-grade” is a vague claim until you write down what it means. The benchmarks below are the ones we hold engagements to, drawn from Informatica’s readiness frameworks and Snowflake’s published metrics.

Data quality (Informatica). Customer-record duplicates below 1%, match rates above 95%, real-time data observability — not weekly catch-ups.
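
These thresholds only mean something once the metrics are defined. The sketch below uses one plausible set of definitions, which are our assumption rather than Informatica's: duplicate rate as the share of records beyond one per distinct entity, and match rate as matched records over match candidates.

```python
def duplicate_rate(total_records: int, distinct_entities: int) -> float:
    """Share of records that are surplus copies of an existing entity."""
    if total_records == 0:
        return 0.0
    return (total_records - distinct_entities) / total_records


def match_rate(matched: int, candidates: int) -> float:
    """Share of candidate records that were matched to a golden record."""
    if candidates == 0:
        return 0.0
    return matched / candidates


def meets_quality_bar(dup_rate: float, m_rate: float,
                      max_dup: float = 0.01, min_match: float = 0.95) -> bool:
    """Check the two thresholds quoted above: duplicates < 1%, matches > 95%."""
    return dup_rate < max_dup and m_rate > min_match


if __name__ == "__main__":
    # Illustrative counts, not real engagement data.
    dup = duplicate_rate(total_records=10_000, distinct_entities=9_950)
    match = match_rate(matched=9_700, candidates=10_000)
    print(dup, match, meets_quality_bar(dup, match))
```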

Customer 360 deployment. Time-to-first-golden-record under 12 weeks. Beyond that, new silos are forming faster than the project is resolving them.

Governance. Documented lineage from every dashboard field back to its source, active metadata management, and automated access controls — not a spreadsheet of role assignments. Informatica publishes a Data Governance Readiness Assessment for benchmarking against industry peers.

Snowflake performance. The Snowflake Performance Index (SPI) tracks real-world workload duration on stable customer accounts — the public benchmark shows ~40% query-duration improvement since baseline. Useful as a peer-comparison signal, not as an SLA.

Operational SLOs. Data freshness (source-to-warehouse latency), median and p99 query latency, concurrency for 50–100+ simultaneous users, throughput targets (e.g. 50 GB transformed within 30 minutes). Each model the team ships should have these written down.
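
A written-down SLO can be as small as the sketch below: three limits per model and a function that reports which ones a given observation breaches. The field names and the nearest-rank percentile method are assumptions chosen for simplicity; the sample numbers are invented.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSLO:
    max_freshness_s: float        # source-to-warehouse latency limit
    max_median_latency_s: float   # median query latency limit
    max_p99_latency_s: float      # p99 query latency limit


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slo_breaches(slo: ModelSLO, freshness_s: float, latencies_s) -> list:
    """Return the names of the limits that the observations exceed."""
    breaches = []
    if freshness_s > slo.max_freshness_s:
        breaches.append("freshness")
    if percentile(latencies_s, 50) > slo.max_median_latency_s:
        breaches.append("median_latency")
    if percentile(latencies_s, 99) > slo.max_p99_latency_s:
        breaches.append("p99_latency")
    return breaches


if __name__ == "__main__":
    slo = ModelSLO(max_freshness_s=3600, max_median_latency_s=2.0,
                   max_p99_latency_s=10.0)
    latencies = [1.0] * 97 + [12.0, 12.0, 30.0]
    print(slo_breaches(slo, freshness_s=1800, latencies_s=latencies))
```

The example shows why the median alone is not enough: most queries are fast, but the tail is what users remember.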

FinOps. Credits per 1,000 queries and credits per 1 TB scanned at the technical layer; cost per customer acquired or cost per data product at the business layer. The technical metrics tell you whether the warehouse is efficient; the business metrics tell you whether the platform is worth what it costs. (For the underlying AWS bill, our note on where AWS bills bleed covers the structural levers.)
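
The arithmetic behind these unit metrics is simple enough to sketch. The figures in the usage example are invented, the price per credit is a placeholder rather than a quoted rate, and a terabyte is taken as 10^12 bytes.

```python
BYTES_PER_TB = 1e12  # assumption: decimal terabytes


def credits_per_1000_queries(credits: float, queries: int) -> float:
    """Technical-layer metric: credits consumed per 1,000 queries."""
    return credits / queries * 1000


def credits_per_tb_scanned(credits: float, bytes_scanned: float) -> float:
    """Technical-layer metric: credits consumed per terabyte scanned."""
    return credits / (bytes_scanned / BYTES_PER_TB)


def cost_per_data_product(credits: float, price_per_credit: float,
                          data_products: int) -> float:
    """Business-layer metric: warehouse spend per data product shipped."""
    return credits * price_per_credit / data_products


if __name__ == "__main__":
    # Illustrative month: 120 credits, 48,000 queries, 30 TB scanned.
    print(credits_per_1000_queries(120, 48_000))
    print(credits_per_tb_scanned(120, 30e12))
    # Hypothetical price of 3.0 currency units per credit, 6 data products.
    print(cost_per_data_product(120, 3.0, 6))
```

Tracking the two layers side by side makes it visible when a warehouse is getting more efficient per query while the cost per data product is still rising.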

Security and posture. Snowflake Trust Center monitors accounts against the CIS Snowflake Benchmarks; the Snowflake Well-Architected Framework grades the broader technical and economic posture. AWS’s own Well-Architected Framework grades the infrastructure underneath.

How we approach a Trifecta build

Our delivery is structured around the contract, not the architecture:

1. Discovery (1–2 weeks). Which sources matter, which models are first, what the business cares about measuring. The deliverable is a measurement contract, not a diagram.

2. Foundation (2–4 weeks). AWS landing zone, Snowflake account setup, Informatica IDMC orgs, IAM/SSO across all three, baseline governance policies registered in Informatica CDGC. The deliverable is a working environment.

3. First model (4–6 weeks). Deliberately small, end-to-end through all five layers, instrumented against the benchmarks above. The deliverable is one dashboard the business trusts and can audit back to source.

4. Iteration (ongoing). More models, more sources, more downstream consumers, governed by the lineage and quality discipline established in steps 1–3. Quarterly reviews against the SPI, FinOps, and governance benchmarks keep the platform honest.
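
The measurement contract from the discovery step does not need special tooling to start with. A minimal sketch, assuming a handful of fields we would expect such a contract to carry (the field names and the completeness rules are illustrative, not a fixed template):

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class MeasurementContract:
    metric: str                # what the business wants measured
    definition: str            # the agreed calculation, in plain words
    owner: str                 # who answers for the number
    source_systems: tuple      # where the inputs come from
    max_staleness: timedelta   # how fresh the number must be


def contract_gaps(contract: MeasurementContract) -> list:
    """Return the fields that are still missing or empty."""
    gaps = []
    if not contract.definition.strip():
        gaps.append("definition")
    if not contract.owner.strip():
        gaps.append("owner")
    if not contract.source_systems:
        gaps.append("source_systems")
    if contract.max_staleness <= timedelta(0):
        gaps.append("max_staleness")
    return gaps


if __name__ == "__main__":
    draft = MeasurementContract(
        metric="active customers",
        definition="distinct customers with a transaction in the last 90 days",
        owner="",
        source_systems=(),
        max_staleness=timedelta(hours=24),
    )
    print(contract_gaps(draft))
```

A contract with open gaps is a reason to extend discovery, not something to resolve later in the build.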

A data platform is judged by the trust the business places in it. Trust is built one verified number at a time. The Trifecta works because each platform in it is opinionated about the discipline that builds that trust — AWS about infrastructure, Snowflake about query, Informatica about lineage. The hard part is the contract. The platforms are the multiplier. (Many of the source systems that feed this platform are SAP; for that side of the estate see our SAP-on-AWS playbook.)

Where this applies: we deliver this stack most often for financial-services data platforms (customer 360, risk reporting), and for tech & SaaS firms building product-analytics surfaces. The natural-language layer on top is Prompt BI.

Written by Timothy Munyao · AWS Golden Jacket holder, first in Kenya.

Cloud, warehouse, integration: a data platform the business trusts.