Decoupling Data: Strategies and Considerations
It may be that an organization must evolve its data's fundamental
structure in order to scale its operations. In the e-commerce space, this
can take several forms. For instance, the catalog may be unable to grow
without an unaffordable increase in computational cost. That is a problem
because a company cannot compete effectively without growing the catalog,
and so it loses market share and, ultimately, business. Another example is
slow developer velocity, which grinds to a halt because shipping even the
smallest feature requires testing a web of complex dependencies. Slow
velocity means inefficient operations, which also means less business.
These are two examples of how the data's inability to scale manifests as
business problems.
The root cause is access to the data's implementation details. This
manifests as the use of direct SQL rather than APIs, which means that
consumers have a very close relationship to the data's technological
configuration. That is, they know more about the data than what it
literally is: they know the tables, the columns, the foreign-key
relationships, which databases it lives in, hardware details, and so on.
It would be like knowing Google Search's internal querying logic, which is
very complex and also unnecessary for performing a simple query; for most
people, the Google Search website is sufficient. In a small company it is
fine to be close to the implementation details, but at scale it creates
problems, as we shall see.
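To make the coupling concrete, here is a minimal sketch of what such a consumer could look like; the table names, columns, foreign key, and host below are invented for illustration, not taken from any real system.

```typescript
// Hypothetical coupled consumer: it knows the table names, the columns, the
// foreign-key join, and even which database host to connect to.
import { Pool } from "pg";

const catalogDb = new Pool({
  host: "catalog-db-primary.internal", // knowledge of the physical deployment
  database: "monolith",
});

export async function productsWithPrices() {
  // Knowledge of the products and prices tables, and of their foreign key,
  // lives in the consumer's code rather than behind an API.
  const { rows } = await catalogDb.query(
    `SELECT p.id, p.name, pr.amount_cents
       FROM products p
       JOIN prices pr ON pr.product_id = p.id`
  );
  return rows;
}
```

Every detail in that query is an implementation detail the consumer should not have to know.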
Because of this access to implementation details, the quantity and
diversity of consumers make it challenging to evolve the data structure.
The reason is this: say a table had to be moved to another database. Then
every consumer would have to know about the move, and that would be a
major problem because each of them would have to reconfigure their queries
to reach the new table. The result is that a normally simple task becomes
nearly impossible, because all consumers have to migrate. They depend on
the current implementation to access the data, so it is messy and a lot of
work to have every consumer rewrite its code to accommodate the change.
Something has to be done about this, because if the refactoring doesn't
happen, the company loses the opportunity to scale the operation, and in
the worst case it cannot even support the current state.
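Continuing the hypothetical query sketched above: if the prices table were moved to its own database, every consumer holding a query like that would have to be rewritten, for example along these lines (again, every name here is an assumption for illustration).

```typescript
// After a hypothetical move of the prices table to its own database, the join
// can no longer be expressed in a single statement; each consumer must open a
// second connection and stitch the rows together itself.
import { Pool } from "pg";

const catalogDb = new Pool({ host: "catalog-db-primary.internal", database: "monolith" });
const pricingDb = new Pool({ host: "pricing-db-primary.internal", database: "pricing" });

export async function productsWithPrices() {
  const { rows: products } = await catalogDb.query("SELECT id, name FROM products");
  const { rows: prices } = await pricingDb.query(
    "SELECT product_id, amount_cents FROM prices"
  );
  // Rebuild by hand the join the database used to perform.
  const priceByProduct = new Map<string, number>();
  for (const pr of prices) priceByProduct.set(pr.product_id, pr.amount_cents);
  return products.map((p) => ({ ...p, amount_cents: priceByProduct.get(p.id) ?? null }));
}
```

Multiply that rewrite by every consumer in the company, and the "simple" table move becomes a migration project.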
Some other examples illuminate how this situation becomes untenable at
scale. To start, consider the existence of multiple sources of truth. At
scale, this can be a significant problem because of the complexity of
maintaining consistency, and it has a multiplying effect: the
implementation details are revealed in many places, many times over.
Another example is the challenge of testing amidst the tech debt. If, for
instance, there is a preponderance of 1000-line classes without tests,
then verifying a new feature is an unaffordable amount of work. Needless
to say, classes of that size mean only one thing: coupling to the
implementation details. In fact, the need for data refactoring is very
similar to the need for general refactoring in this way: the size of a
single entity causes too much coupling. But it is not the goal here to
elaborate on all the examples, because anyone in the thick of a data
restructuring knows the problem is real, and will not feel a dearth of
evidence.
See https://github.com/Adrianjewell91/decoupler-website/blob/main/README.md
The first requirement is to make it easy for consumers to migrate;
otherwise there is no point in solving the problem. In other words,
consumers should be able to continue as they did before the changes were
made, with no interruption to operations.
Another requirement is to refactor the data structure incrementally.
Strictly speaking, incremental wins are not a hard requirement, but they
should be, especially if some of the data needs to be refactored before
other parts. Additionally, it is good practice to think incrementally,
because stakeholders usually expect it and it avoids the temptation of
waterfall. This last point is important: waterfall creates the risk of
building a huge product that nobody ends up using. It comes from assuming
that clients will adopt the new platform no matter what, but things do not
work like that in practice.
The complexity of the effort will also pose a challenge, but thankfully it
divides roughly into two types. The first is physical: modifying columns,
adding new data, changing validations, and so on. These examples are real;
migrating tables to new databases is exactly what needs to happen, and
even more than that has to be done. Tables need to be rewritten,
refactored, and moved not only to new databases but to entirely different
kinds of databases, which matters a great deal for saving money on
software costs. However, there is also an important second kind, which is
the reworking of the business domains themselves. It is a conceptual
refactoring, where new business entities emerge from the growing
requirements of the business. Of course, the two types of refactoring
support and inform each other, but they are also distinct: migrating a
database table is different from conceiving of a new business domain, such
as a new Catalog or Marketing concept.
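The distinction can be sketched in code; everything below is a hypothetical illustration rather than a prescribed design.

```typescript
// Physical refactoring: the same read contract, backed by a different kind of
// store (here, a hypothetical in-memory key-value map instead of a SQL table).
interface ProductNameReader {
  getProductName(id: string): Promise<string | null>;
}

class KeyValueProductNameReader implements ProductNameReader {
  constructor(private readonly kv: Map<string, string>) {}
  async getProductName(id: string): Promise<string | null> {
    return this.kv.get(id) ?? null;
  }
}

// Domain refactoring: a new business entity that never existed as a table and
// emerges from new requirements (a hypothetical Marketing concept).
interface MarketingCampaign {
  id: string;
  name: string;
  featuredProductIds: string[];
}
```

The first kind changes how existing data is stored and served; the second introduces concepts the business did not previously model.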
Hide the implementation details in a systematic and simple way, and,
importantly, do it without interfering with the software systems that
currently consume the data. Practically, this means creating new
interfaces, migrating clients, and then refactoring behind the interfaces.
This leads to a great simplification of the software system by decoupling
the data from how it is stored. Once the data sits behind interfaces, each
part of it can evolve independently of the others while retaining the
existing relationships. This is what every successfully scaled company has
done in order to grow, and the benefits are discussed at great length by
the people who implemented it, so we know it works. It also works because,
in principle, the current data interfaces aren't bad; they just need to be
able to grow and develop independently of one another while keeping the
existing functionality. So it is fine to keep them, and to change and grow
them under the hood.
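As a minimal sketch of what putting a table behind an interface could look like, consider the hypothetical example below. The names (Product, CatalogReader, PgCatalogReader) and the use of the pg client are assumptions for illustration, not an existing API in any particular codebase.

```typescript
// Consumers depend on this interface, never on the table behind it.
import { Pool } from "pg";

export interface Product {
  id: string;
  name: string;
  priceCents: number;
}

export interface CatalogReader {
  getProduct(id: string): Promise<Product | null>;
}

// One possible implementation. The SQL, the table layout, and even the choice
// of database can change here without any consumer noticing.
export class PgCatalogReader implements CatalogReader {
  constructor(private readonly pool: Pool) {}

  async getProduct(id: string): Promise<Product | null> {
    const { rows } = await this.pool.query(
      "SELECT id, name, price_cents FROM products WHERE id = $1",
      [id]
    );
    if (rows.length === 0) return null;
    return { id: rows[0].id, name: rows[0].name, priceCents: rows[0].price_cents };
  }
}
```

A consumer that calls `getProduct` neither knows nor cares whether the data still lives in the original table, a different database, or somewhere else entirely.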
- Assume that the starting point is a monolithic software system. In this
  system, clients are coupled to the implementation details of the data,
  in the way expressed by the diagram below.
- First, wrap each table behind an interface (as in the sketch above).
  This is an agile way to make the new interfaces available immediately
  for adoption, without too much work, so restructuring can start
  happening incrementally.
- Second, create new domains and migrate data as needed. This gives teams
  their correct responsibilities: data owners address the implementation
  details, and domain owners take care of the domain restructuring.
- Third, migrate existing clients behind the scenes as needed. Inject the
  interfaces into the old system in a way that is hidden from the
  consumers. Then lock the old interface against new features and require
  any new work to use the new interfaces. There are many ways to do this,
  even at the database level. In a SQL world, the strategy would be to map
  every SQL query to an interface under the hood and then lock the SQL
  queries so that no new ones can be added. Any new feature would then
  require adopting the new data structures, because the old queries are
  locked against change. This strategy works because it is programmatic
  rather than negotiative: no effort is spent convincing teams to adopt;
  just set the rules, enforce them programmatically, and adoption will
  follow. A minimal sketch of such a lock appears after this list.
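One hypothetical way to enforce the lock programmatically is an allowlist of the legacy queries that existed at freeze time. The names below (LEGACY_QUERY_FINGERPRINTS, runLegacySql) and the hashing approach are assumptions for illustration; the real mechanism could just as well be a linter, a review rule, or a query gateway.

```typescript
// Freeze the legacy SQL surface so new features must use the new interfaces.
import { createHash } from "node:crypto";

// Fingerprints of the SQL statements that existed when the freeze began.
// (Placeholder value; a real list would be generated from the codebase.)
const LEGACY_QUERY_FINGERPRINTS: ReadonlySet<string> = new Set([
  "<sha256 of an existing legacy query>",
]);

function fingerprint(sql: string): string {
  return createHash("sha256").update(sql.trim()).digest("hex");
}

// The only path to raw SQL: anything not on the allowlist is rejected, which
// programmatically pushes all new work onto the domain interfaces.
export function runLegacySql(
  db: { query(sql: string): Promise<unknown> },
  sql: string
): Promise<unknown> {
  if (!LEGACY_QUERY_FINGERPRINTS.has(fingerprint(sql))) {
    throw new Error(
      "New raw SQL is locked. Use the domain interfaces (e.g. CatalogReader) instead."
    );
  }
  return db.query(sql);
}
```

The enforcement is mechanical, so no negotiation with individual teams is needed: existing queries keep working, and anything new fails fast with a pointer to the interfaces.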
One risk is making life difficult for consumers, mainly by mandating a
refactor of their own code with their own engineering effort. This mistake
is easy to make because the company doesn't "eat its own dog food" and
therefore assumes it is easy for clients to migrate. That is rarely the
case, and the decision costs the company a great deal of time, because
consumers have plenty of their own work to do and will not simply stop it
because an engineering team tells them to migrate to the new interface. It
basically defeats the whole point of solving the original problem, which
was to stop teams from having to change their integrations whenever the
data structure changes. So in the end there is a massive risk: consumers
won't migrate.
A second risk is striving for waterfall-based, rather than incremental,
wins. This happens when a company decides to build a new platform as
complex as the previous one before anyone has adopted it. That is a lot of
work: building it, and figuring out what to build, because the intended
consumption patterns and the new business domains have to be discovered
first. This is very hard to do because the future is unpredictable and
could change, and if it changes, the work will have been wasted. Hopefully
consumers adopt, but it is not known when. With that uncertainty, the
whole point of the project is in question: the important work of
refactoring the data structure. That work won't get started until the new
platform is built and then adopted, which will take a very long time. In
the meantime, the legacy data structures remain in place, so there is no
progress on the original problem.
A third risk is asking the data owners specifically to perform all of the
data restructuring. This will be a time sink, because teams get very
little done when they take on too much. It might not seem like a mistake,
because the data owners know a lot and have a view of how their data
should look in the future, but their picture is incomplete. The data
consumers also have their own ideas about new domains, what shape they
want the data to take, how they want to integrate it, and so on.
Therefore, the domain-specific refactoring should rest with the consumers
as much as possible, because they know what they want better than anyone
else. If the data owners try to do the work of the consumers, or vice
versa, teams will spend too much time asking other teams what they want,
which wastes a lot of time on discussions and planning instead of
building.
If any of these risks seems likely, it is a symptom of a greater problem:
the focus has shifted away from the original goal, which still stands,
namely to refactor the data structure. Probably a few consumers will still
adopt, but it is clear what is happening: the company is focused on
building a new platform, in a waterfall way, for an imagined, pre-planned,
overarching future use case which may or may not exist, and then hoping
everyone will adopt it. If all of that works out, then the company can
finally focus on rewriting the data structures under the hood. But that
moment is too far away, and by then it may be too late.
Additional Risks and Management
- Performance:
  - The decoupled architecture means the removal of the SQL join as an
    architectural lynchpin, which could be undesirable because the
    relational model dissolves and because joins are very efficient, so
    there are possible performance implications. However, this is
    permissible because the decoupled system offers many alternatives to
    the join, such as caching and aggregation, to name a few; a sketch of
    one such alternative appears after this list.
- Complexity:
  - A decoupled architecture introduces complexity elsewhere, as it will
    consist of many microservices, but that is permissible because they
    can be observed and monitored with modern observability software.
- Cost of Resources:
  - The cost of running the decoupled architecture needs to be cheaper
    than running the database monolith. That should happen because the
    system will be, on average, the same size as the database monolith,
    but only some of its components will need to scale, which nets out
    cheaper than scaling the whole monolith. Additionally, the ROI of
    decoupling should justify the costs.
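As a hedged sketch of what caching and aggregation in place of a join could look like, consider the hypothetical code below; the interfaces and class names are assumptions for illustration, not a prescribed design.

```typescript
// Instead of a SQL join between products and prices, the consumer aggregates
// two domain interfaces and caches the slower-moving one.
interface Product { id: string; name: string; }
interface CatalogReader { listProducts(): Promise<Product[]>; }
interface PriceReader { getPriceCents(productId: string): Promise<number | null>; }

// A tiny read-through cache in front of the price service.
class CachedPriceReader implements PriceReader {
  private readonly cache = new Map<string, number | null>();
  constructor(private readonly inner: PriceReader) {}

  async getPriceCents(productId: string): Promise<number | null> {
    if (!this.cache.has(productId)) {
      this.cache.set(productId, await this.inner.getPriceCents(productId));
    }
    return this.cache.get(productId) ?? null;
  }
}

// Aggregation in application code replaces the former JOIN.
async function productsWithPrices(catalog: CatalogReader, prices: PriceReader) {
  const products = await catalog.listProducts();
  return Promise.all(
    products.map(async (p) => ({ ...p, priceCents: await prices.getPriceCents(p.id) }))
  );
}
```

Whether this matches the performance of an in-database join depends on the workload, which is why it is framed here as an alternative rather than a guaranteed equivalent.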