In computing, the data warehouse has emerged as a central component of business intelligence. It is a system used for reporting and data analysis, and acts as a central repository of integrated data from one or more disparate sources. Data warehouses are designed to support queries and analysis, and can contain large amounts of historical data.
It is widely believed in the information management field that data warehousing can be rigid, slow to adapt and costly to maintain. However, Picnic, a Dutch e-grocer, has built a lakeless data warehouse that shows a centralised analytical data warehouse product can be successful.
Iliana Iankoulova, Data Engineering Lead at Picnic, says data warehousing fails not because of fundamental flaws in strategy but because of failures in execution. She adds that a successful data warehouse product needs to be balanced by a “number of antifragile organisational practices”.
What is a successful data warehousing (DWH) product?
According to Iliana, a data warehousing (DWH) product can be judged successful by how it scores on a few general parameters. If the DWH product scores highly on three business and technical criteria, the product is viable. These three criteria are:
- High adoption rate among analysts and business users for self-serve analytics.
- High quality of data and trust in the DWH product.
- Excellence in Google’s DevOps Research and Assessment (DORA) metrics.
The four DORA metrics Iliana recommends tracking are deployment frequency, lead time for changes, change failure rate and time required to restore a service.
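To make the four delivery measures named above concrete, here is a minimal Python sketch that derives them from a deployment log. The `Deployment` record, its field names, the `dora_summary` function and the sample data are all hypothetical illustrations; the source does not describe how Picnic records or calculates these figures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one DWH release (field names are assumptions)."""
    committed_at: datetime                 # when the change was committed
    deployed_at: datetime                  # when it reached production
    failed: bool = False                   # did the release cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    """Convert a timedelta to hours as a float."""
    return delta.total_seconds() / 3600


def dora_summary(deployments: list, period_days: int) -> dict:
    """Summarise the four delivery measures over a reporting period."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    total = len(deployments)
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": total / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / total,
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }


# Made-up sample: four releases in one week, one of which failed.
base = datetime(2021, 1, 4, 9, 0)
deployments = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=4)),
    Deployment(
        base + timedelta(days=2),
        base + timedelta(days=2, hours=6),
        failed=True,
        restored_at=base + timedelta(days=2, hours=7),
    ),
    Deployment(base + timedelta(days=3), base + timedelta(days=3, hours=8)),
]
summary = dora_summary(deployments, period_days=7)
print(summary)
```

Medians are used here rather than means so that a single unusually slow release does not dominate the figure; that is a design choice of this sketch, not something the source prescribes.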
Lessons on how to make a DWH product antifragile
Before we get down to antifragility principles recommended by data engineers at Picnic, it is important to define antifragility. Antifragility is defined as “a property of systems in which they increase in their capability to thrive as a result of stressors, shocks, volatility, mistakes, faults, or failures.”
The concept differs from resilience or robustness and was developed by Nassim Nicholas Taleb in his book Antifragile: Things That Gain from Disorder. In a blog post, Iliana says software engineering teams have been inspired by antifragile ideas and have adopted them in their practice.
Here are seven antifragility principles Picnic adopted to build a successful DWH product. They can serve as lessons for anyone planning to build a data warehousing product.
- Sticking to simple rules: The first lesson from Picnic is to stick to simple rules by building a lean tech stack, since a complex system is more prone to breakage and harder to fix. The startup believes in pragmatic documentation that lives together with the code. Everyone can access the business logic via GitHub, and comments are available on the tables, views and fields. This unified approach lets Picnic maintain the data catalogue in the same place where the logic lives. The startup is also careful about adopting new tools and assesses the tradeoffs of every technology choice.
- Avoiding naive interventions that do more harm than good in the long term: Picnic has built its DWH so that source systems are accountable and responsible for resolving data issues. With hundreds of sources, the startup has designed the DWH to make an issue visible so that it can be resolved jointly with the source system. This avoids fixing issues only in the DWH, in line with Picnic’s principle of “limited patching in the DWH code.” “We take measures to have proper data migration, compared to keeping dead logic in our codebase forever,” Iliana says. She also recommends that organisations keep their DWH code consistent across all markets and ensure that the data definition language (DDL) is the same everywhere.
- Built-in redundancy and layers: Another lesson from Picnic’s successful DWH product is to create built-in redundancy and layers, and to limit single points of failure. Picnic has done this by adopting and building open-source tools. It has open-sourced its Data Vault (DV) framework, which lets the startup work with a community that is independent of any contributor or vendor. Another recommendation that can be followed across tech teams is to “operate in smaller units called squads”. Iliana says these squads share the same DNA, and within a squad the engineers are both supported and challenged. All engineers are full-stack, end-to-end data warehousing consultants, a practice that any company wanting to build a successful DWH product should adopt.
- Ensuring that everyone has a stake: Picnic believes that “having capable people and giving them powerful tools is one of the best ways to be antifragile.” It does this by showing the power of data modelling and helping its people understand the value. The startup also has an SQL on-boarding programme, which every analyst must complete before getting DWH access. Picnic has also implemented a rotation programme in which every data engineer takes a turn as “Release Master of the Week”, making them the first line of support. Another rotation programme is “Tech Data Support” on Slack, which is reassigned daily on a round-robin basis among the data roles. These simple methods give people a sense of ownership of the product, and the approach can be followed in other verticals.
- Experimenting and tinkering: While pragmatism and simplicity are the order of the day at Picnic, the startup is not shy of taking lots of small risks. It does this by providing a sandbox and temporary table environment for anyone with DWH access. This allows data engineers to bring their own data and build custom tables, giving them the “freedom to tinker in a contained schema with clear rules.”
- Keeping our options open: One important rule of thumb in data science is that business models will change over time. Picnic’s data engineering team is aware of this and has built its data warehousing product to keep its options open. It has done this by using the Data Vault model as the DWH backend and the Kimball model as the front end. This gives the team robustness at the front and the ability to change the backend as business needs evolve. Iliana further notes that Picnic maintains strict separation between the DWH presentation layer and the backend by keeping them in different schemas. It also strongly prefers clear interfaces and tools that are vendor-agnostic.
- Not reinventing the wheel: This is another lesson that organisations and entrepreneurs can take from Picnic for different product life cycles. Picnic encourages its data engineers to use standards and frameworks that have been tested over the past 30 years of data warehousing. Instead of inventing its own, Picnic prefers to follow the standards established by Kimball and Linstedt. This includes processes and data modelling approaches that its team learned in practice as BI consultants and data engineers.
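The separation between a Data Vault backend and a Kimball presentation layer, described under “Keeping our options open”, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python’s built-in `sqlite3` module, and because SQLite has no named schemas in the warehouse sense, the table-name prefixes `dv_` and `pres_` stand in for the separate backend and presentation schemas. All table and column names are hypothetical; the source does not describe Picnic’s actual data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Backend (Data Vault style): the hub holds the business key,
    -- the satellite holds descriptive attributes with their load timestamp.
    CREATE TABLE dv_hub_customer (
        customer_hk TEXT PRIMARY KEY,
        customer_id TEXT NOT NULL UNIQUE
    );
    CREATE TABLE dv_sat_customer (
        customer_hk TEXT NOT NULL REFERENCES dv_hub_customer (customer_hk),
        load_ts     TEXT NOT NULL,
        city        TEXT,
        PRIMARY KEY (customer_hk, load_ts)
    );

    -- Presentation (Kimball style): a dimension view that exposes only the
    -- most recent attributes, so consumers never query the backend directly.
    CREATE VIEW pres_dim_customer AS
    SELECT h.customer_id, s.city
    FROM dv_hub_customer AS h
    JOIN dv_sat_customer AS s ON s.customer_hk = h.customer_hk
    WHERE s.load_ts = (
        SELECT MAX(s2.load_ts)
        FROM dv_sat_customer AS s2
        WHERE s2.customer_hk = h.customer_hk
    );
    """
)

# Made-up sample data: one customer whose city changed over time.
conn.execute("INSERT INTO dv_hub_customer VALUES ('hk1', 'C-001')")
conn.executemany(
    "INSERT INTO dv_sat_customer VALUES (?, ?, ?)",
    [("hk1", "2021-01-01", "Amsterdam"), ("hk1", "2021-06-01", "Utrecht")],
)

rows = conn.execute("SELECT customer_id, city FROM pres_dim_customer").fetchall()
print(rows)  # [('C-001', 'Utrecht')]
```

Because analysts would query only `pres_dim_customer`, the backend tables could be restructured as business needs change while the view keeps the same columns, which is the kind of flexibility the principle describes.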