## Transparency, Trust and Testability: Our Data Engineering Manifesto

The backbone of our company is the data engineering capability we offer our clients.

Although the technologies and design patterns used for data platforms evolve constantly, good engineering disciplines are universal. Our approach is to do the basics well - regardless of the stack and services you work with.

Our manifesto is simple:

  • Data processes should be simple and composable

  • Aggregations should be clearly written and testable

  • Transformations and data validation rules should be human-readable and reportable

  • The journey of every datum should be auditable back to source

  • If the data does not change, data processes should guarantee the same results each time they run

We believe - whether you are building a data warehouse, a data lake or a lakehouse - that these principles are fundamental and timeless. They are what make it safe to build up to complexity.
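As a sketch of what these principles look like in practice, here is a deliberately small Python pipeline. Everything in it (`Record`, `validate`, `total_pounds`, the file paths) is illustrative rather than drawn from any client engagement: a composable, pure transformation whose validation rule reads like the business rule it encodes, whose aggregation is trivially testable, and whose output is deterministic for a given input.

```python
from dataclasses import dataclass

# Illustrative only: all names and paths below are hypothetical,
# not taken from any client project.

@dataclass(frozen=True)
class Record:
    source: str        # provenance travels with the datum: auditable back to source
    amount_pence: int

def validate(record: Record) -> Record:
    """A human-readable validation rule: it reads like the business rule it encodes."""
    if record.amount_pence < 0:
        raise ValueError(f"negative amount in {record.source}")
    return record

def total_pounds(records: list[Record]) -> float:
    """A clearly written, testable aggregation. It is a pure function of its
    input, so if the data does not change, neither does the result."""
    pence = sum(validate(r).amount_pence for r in records)
    return pence / 100

if __name__ == "__main__":
    data = [
        Record(source="branch_ledger/2024-01.csv", amount_pence=1250),
        Record(source="branch_ledger/2024-02.csv", amount_pence=990),
    ]
    assert total_pounds(data) == 22.4  # the test doubles as the specification
    print(total_pounds(data))
```

Real pipelines run on the stacks listed below rather than on in-memory lists, but the disciplines transfer: keep provenance with the datum, keep transformations pure, and let the tests state the specification.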

## Typical Tech Stacks

  • OLTP RDBMS: PostgreSQL, Aurora, Azure SQL, SQLite (embedded)

  • ETL: Prefect, dbt, Apache Airflow, Pentaho

  • OLAP: Redshift, Snowflake

  • Caching and Search: Elasticsearch, Redis, memcached

  • Events and Streaming: AWS Lambda + SQS, Apache Kafka, RabbitMQ

  • Distributed Processing: AWS Glue, Apache Spark


## Project Spotlight: Data Engineering Solutions


### Needles in the Haystack

A cyber-intelligence capability for a global bank.

### Processing Clinical Data at Scale

Supporting Genomics England in sequencing 100,000 genomes.

### Mobile Services Data Commercialisation

Helping Upstream Systems find new ways to monetise their unique mobile services datasets.