In the meantime, data modelers feel left out of the development process... because they are! They fear for their jobs, long term if not sooner. This is a recurring theme we sense at every Fortune 500 company across the US and Europe when we give our training 'Agile Query-Driven Data Modeling for NoSQL'.
The reality is that data modeling needs to be re-invented in order to remain relevant. And since there is so much baggage associated with the term "data modeling", maybe we should give it a less threatening name, such as "schema design"?
Here, the purists generally stop me to say: "Wait, you can't go straight into physical modeling without first doing the conceptual and then the logical models." Well... maybe, but that's part of the issue. If you can't demonstrate that you facilitate speed to market, then you're viewed as being in the way, and autonomous agile teams will try to get around you.
Logical modeling is counter-productive (for NoSQL)
Working our way backwards through the traditional sequence (conceptual -> logical -> physical), we all know by now that schema design is actually more important with NoSQL than with relational databases, since JSON is so powerful and flexible, but not so forgiving.
Figure: Traditional Data Modeling Process
Figure: From Traditional Data Modeling to NoSQL Schema Design
Domain-Driven Design helps avoid "big balls of mud"
Creating an enterprise model is achievable for the initial incarnation of software systems. But without care and attention, inherent domain and technical complexity will, over time, turn monolithic applications into a pattern known as the "big ball of mud". Change is risky, and the best developers spend valuable time fixing technical complexity and technical debt, instead of adding value in domain evolution.
Domain-Driven Design is a language- and domain-centric approach to software design for complex problem domains. It recognizes that over time, an enterprise conceptual model will lose integrity as it grows in complexity, as multiple teams work on it, and as language becomes ambiguous. With DDD you decompose complex problems so you can be effective at modeling bounded contexts that are defined with unity and consistency. DDD promotes the use of a Ubiquitous Language to minimize the cost of translation between business and technical terminology and to enable deep insights into the domain thanks to a shared language and collaborative exploration during the modeling phase.
DDD consists of a collection of patterns, principles, and practices that enable teams to focus on what's core to the success of the business while crafting software that tackles the complexity in both the business and the technical spaces. One such pattern is an aggregate, a cluster of domain objects that can be treated as a single unit, for example an order and its order lines.
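As a minimal sketch of the order-and-order-lines aggregate mentioned above, the following Python models the order as the aggregate root and shows that the whole cluster serializes to one nested document. The class names, field names, and sample values are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: float


@dataclass
class Order:
    """Aggregate root: order lines are only reached through the order."""
    order_id: str
    customer_id: str
    lines: list[OrderLine] = field(default_factory=list)

    def add_line(self, sku: str, quantity: int, unit_price: float) -> None:
        # Business rules are checked at the aggregate boundary.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(sku, quantity, unit_price))

    def total(self) -> float:
        return sum(line.quantity * line.unit_price for line in self.lines)


# Hypothetical sample data.
order = Order(order_id="o-1001", customer_id="c-42")
order.add_line("sku-1", 2, 10.0)
order.add_line("sku-2", 1, 5.5)

# The whole aggregate converts to a single nested, JSON-ready document.
document = asdict(order)
print(json.dumps(document, indent=2))
```

Treating the order as the only entry point keeps the rules about its lines in one place, which is what makes the cluster behave as a single unit.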
Domain-Driven Design maps directly to the concepts of Agile and NoSQL
There's nothing in agile to suggest that one should skip design. It suggests that design should be evolutionary and iterative. DDD also encourages an iterative process, first at a strategic level to divide the work and focus on what's important to the business, then at a tactical level to understand the details of each bounded context.
On the database side, relational structures are vastly different from the types of structures that application developers use. Database joins slow down performance, and mapping between tables and objects creates an object-relational impedance mismatch, causing developers to move away from relational modeling and towards aggregate models. When an aggregate is retrieved from the database, the developer gets all the necessary related data at once, which makes it easier to manipulate.
A NoSQL document structure corresponds to the structure of a programming object in a much better way than a relational database does, and at the same time, can closely represent DDD aggregates of domain objects.
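To make the contrast concrete, here is a small sketch that uses plain in-memory Python structures as stand-ins for a relational store and a document store. No real database is involved, and every name and value is an assumption made for illustration.

```python
# Relational-style storage: two "tables" linked by order_id.
orders = [{"order_id": "o-1001", "customer_id": "c-42"}]
order_lines = [
    {"order_id": "o-1001", "sku": "sku-1", "quantity": 2},
    {"order_id": "o-1001", "sku": "sku-2", "quantity": 1},
]


def load_order_relational(order_id):
    """Rebuild the application object by joining rows from both tables."""
    header = next(o for o in orders if o["order_id"] == order_id)
    lines = [
        {"sku": row["sku"], "quantity": row["quantity"]}
        for row in order_lines
        if row["order_id"] == order_id
    ]
    return {**header, "lines": lines}


# Document-style storage: the aggregate is stored in the shape it is used.
order_documents = {
    "o-1001": {
        "order_id": "o-1001",
        "customer_id": "c-42",
        "lines": [
            {"sku": "sku-1", "quantity": 2},
            {"sku": "sku-2", "quantity": 1},
        ],
    }
}


def load_order_document(order_id):
    """A single key lookup returns the complete aggregate."""
    return order_documents[order_id]


print(load_order_relational("o-1001") == load_order_document("o-1001"))
```

Both functions return the same object, but only the relational version has to reassemble it from separate rows on every read.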
Figure: DDD maps directly to NoSQL document DB concepts
Figure: Logical modeling when DDD and NoSQL are used together
If you had a logical model, how would you go about doing your NoSQL schema design with no knowledge of what the queries and reports will look like? In other words, how would you aggregate entities without the context of the application screens and their content?
Document schema design
Having defined the aggregates of a bounded context, it is necessary to create additional artifacts: mainly a pragmatic charting of workflows and business rules (not a full BPMN diagram that would be hard to produce, maintain, and digest), plus mockups (or wireframes) for application screens and reports. What's important here is not to fall into the same traps as those reviewed earlier with enterprise data models! But the creation of these artifacts tends to reveal points of attention that may have been overlooked in the DDD phase.
Figure: Domain- and Query-Driven Schema Design for NoSQL
Say you've agreed to denormalize and aggregate information into one document. The next question is "how?" There are probably as many different ways to do it as there are members on your team: do you embed all related entity data locally? Or do you embed a partial duplicate or snapshot of remote entity data? Or do you refer to remote entity data, with one- or two-way referencing?
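The three options can be sketched side by side with plain Python dictionaries standing in for JSON documents. The field names (`_id`, `customer_id`, `order_ids`) and the sample values are hypothetical, and `resolve_customer` is an illustrative helper rather than any database API.

```python
# Hypothetical remote entity.
customer = {
    "_id": "c-42",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "city": "London",
}

# 1. Embed: data that belongs to the order lives inside the order document.
order_embedded = {
    "_id": "o-1001",
    "lines": [
        {"sku": "sku-1", "quantity": 2},
        {"sku": "sku-2", "quantity": 1},
    ],
}

# 2. Snapshot: copy only the remote fields the order needs,
#    as they were when the order was placed.
order_with_snapshot = {
    **order_embedded,
    "customer": {"_id": customer["_id"], "name": customer["name"]},
}

# 3a. One-way reference: the order points at the customer.
order_with_reference = {**order_embedded, "customer_id": customer["_id"]}

# 3b. Two-way reference: the customer also lists its orders.
customer_two_way = {**customer, "order_ids": [order_with_reference["_id"]]}


def resolve_customer(order, customers_by_id):
    """Following a reference costs a second lookup at read time."""
    return customers_by_id[order["customer_id"]]


resolved = resolve_customer(order_with_reference, {customer["_id"]: customer})
print(resolved["name"])
```

The snapshot reads in one step but can go stale, while the reference always sees current data at the price of an extra lookup.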
Here are a few factors influencing choices in relationship expression:
- cardinality: does high cardinality lead to practical or technical issues?
- strength of entity relationships: do they all conceptually belong together?
- query atomicity: what info needs to be returned together?
- update atomicity: must it all change together?
- update complexity: what's the impact if data is duplicated? How do we avoid data inconsistency?
- document size: how much time will it take to load? Are we in a mobile environment where data traffic matters? Will the document size grow indefinitely?
- coding complexity: does it all make sense in the code?
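Two of these factors, update complexity and document size, lend themselves to a quick sketch. The helper names and sample documents below are assumptions made for illustration; a real system would measure against its actual store and serialization format.

```python
import json


def document_size_bytes(doc):
    """Approximate size of a document once serialized as UTF-8 JSON."""
    return len(json.dumps(doc).encode("utf-8"))


def rename_customer(orders, customer_id, new_name):
    """Update every order that holds a duplicated customer snapshot.

    Returns the number of documents touched, i.e. the update fan-out.
    """
    touched = 0
    for order in orders:
        snapshot = order.get("customer")
        if snapshot and snapshot["_id"] == customer_id:
            snapshot["name"] = new_name
            touched += 1
    return touched


# Hypothetical orders, each carrying a snapshot of its customer.
orders = [
    {"_id": "o-1", "customer": {"_id": "c-42", "name": "Ada"}, "lines": []},
    {"_id": "o-2", "customer": {"_id": "c-42", "name": "Ada"}, "lines": []},
    {"_id": "o-3", "customer": {"_id": "c-7", "name": "Grace"}, "lines": []},
]

# Update complexity: one logical change touches every duplicate.
fan_out = rename_customer(orders, "c-42", "Ada Lovelace")

# Document size: an embedded array that keeps growing makes the document grow.
size_before = document_size_bytes(orders[0])
orders[0]["lines"].extend({"sku": f"sku-{i}", "quantity": 1} for i in range(100))
size_after = document_size_bytes(orders[0])

print(fan_out, size_before, size_after)
```

A high fan-out or an unbounded array is a signal to reconsider embedding in favor of a reference.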
The added value of Data Modelers
Beyond the provocative nature of the headline, the exercise of designing a NoSQL database is obviously far from trivial. The dynamic and evolving nature of a JSON structure is a wonderful opportunity that should not be spoiled by a careless approach. While developers are certainly capable of doing their own schema design, is it really the best allocation of resources? In enterprises dealing with any kind of application complexity, it quickly becomes obvious that data modelers can be tremendous contributors to the quality of agile development.
Years of experience in data modeling of relational databases have trained them to naturally:
- focus on the core business use case
- create pragmatic models without being overambitious or perfectionist
- reveal hidden insights and simplify
- experiment with different designs to reach a flexible solution
- challenge assumptions and look at things from a different perspective
- facilitate the dialog between application stakeholders
Data modeling is no longer an exercise taking place just in the early stage of an application lifecycle. Data modeling is now part of the iterative agile development and continuous integration loop, adding value every step of the way.
Figure: Data Modeling has a role in every step of the agile development process, including in production
As usual when a major shift is under way, there are two possible approaches: resist change, or embrace it. Data modelers should not fear agile development. They should enthusiastically embrace change, become the developers' best friends, and demonstrate their tremendous added value so that, together, they achieve higher-quality applications.