Architecturals - A proposal for the normalization of Data Transformation


Let it be said. Data transformation is a key aspect of software architecture.

It involves converting data from one format or structure to another, ensuring it is suitable for its intended use. This process, generally disregarded for proofs of concept and small systems, is crucial in enterprise scenarios to enable flexibility, security, migration, and processing. The idea is to introduce data object normalization and ensure that data is consistently transformed and presented in a way that meets the requirements of each layer in a given architecture.

API Service Data Nomenclature

We will lay out the following in the inbound/outbound sequence.
System normalization of data transformation is often necessary to adapt data for different layers of the application stack.

Let's discuss generic requirements:

  • Communication: The data sent by user interactions and received by the applicative surface.
  • System IO: The data accepted and output by the applicative system, beyond its UI/API (infrastructure) surface.
    • System transformative inputs as Commands
    • System informative inputs as Queries
    • System informative outputs as Outcomes
  • Core Business Schematics: The data managed by the applicative system, that is, the business notions handled by the application.
  • Presentation: The data presented / sent back to the user.
  • Storage: The data stored in database solutions.

TL;DR: Visualization

   Applicative Layer          System Boundaries          Core Domain           Repositories
   -----------------          -----------------          -----------           ------------

Infrastructure Layer | System Layer                 | Domain Layer            | Data Layer
---------------------|------------------------------|-------------------------|------------
|                    |                              |                         |           |
| [DTO]              | <--> [DCO] (System Inputs)   ----\                     |           |
| (Surface Inputs)   |      |                       |   |                     |           |
|                    |      [DQO] (System Inputs)   ----|----->  [DBO]  <-------> [DSO]   |
|                    |      |                       |   |                     |  (Per DB) |
|                    |      |                       |   |  *here [DPO]        |           |
| [DTO]              | <--> [DOO] (System Outputs)  <---/                     |           |
| (Surface Outputs)  |                              |                         |           |
|                    |                              |                         |           |
---------------------|------------------------------|-------------------------|------------

note: the above does not showcase event-sourced/event-driven paradigms, which would require additional diagram complexity for event management.

Objects Representation

Let's propose the following terminology for data transformation in the context of an API service (note: a front end would have its own transformation pipeline sharing some of these concepts, a topic for another post):

DTO: Data Transfer Object. A simple object that carries data between processes, often used to encapsulate data sent over the network. It is typically used to transfer data between the API and the client. This is data shared on a network, which we may only sanitize. It may hold business and infrastructure notions alike.

  • Notes: The DTO is the object sent over the network, and is often used to encapsulate data that is not directly related to the business logic of the application, such as metadata or other information needed for service processing (e.g. security concerns: authentication, authorization, geolocalization, etc.).
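To make this concrete, here is a minimal TypeScript sketch of a DTO; the `CreateProductDto` shape and the `sanitizeCreateProductDto` helper are hypothetical names for illustration, not part of any framework. Note how business fields and infrastructure metadata travel together, and how the surface only sanitizes:

```typescript
// Hypothetical DTO for a product-creation request: business fields
// (name, price) and infrastructure metadata (trace ID, locale) share
// the wire format.
interface CreateProductDto {
  name: string;
  priceCents: number;
  traceId: string; // infrastructure: request correlation
  locale: string;  // infrastructure: localization hint
}

// At the surface we may only sanitize; no business rules are applied.
function sanitizeCreateProductDto(raw: unknown): CreateProductDto {
  const r = raw as Record<string, unknown>;
  const name = r.name;
  const priceCents = r.priceCents;
  if (typeof name !== "string" || typeof priceCents !== "number") {
    throw new Error("Malformed DTO");
  }
  return {
    name: name.trim(),
    priceCents: Math.trunc(priceCents),
    traceId: typeof r.traceId === "string" ? r.traceId : "",
    locale: typeof r.locale === "string" ? r.locale : "en",
  };
}
```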

DQO and DCO: Data Query Object and Data Command Object (normalizing the CQRS paradigm). These objects encapsulate the data required for querying or commanding the application. They are typically used to pass data to the domain layer for processing. The DQO is used for read operations, while the DCO is used for write operations. What we discuss here are the Applicative System Inputs, which differ from both the API surface and the domain entities.

  • Notes: Applicative Core/System Inputs. From the top: the data reaches the interface (middleware or controller with value added) as a DTO, and is transformed into a DQO (informative) or a DCO (transformative) to reach the Domain layer. In an ideal scenario, the domain would never ingest DTOs directly, for several reasons: Security, Integrity, and Separation of Concerns (networking, infrastructure, and business-logic needs).
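The DTO-to-system-input translation can be sketched as follows; the shapes and names are illustrative, not a prescribed schema. The key point is that infrastructure metadata is stripped before the domain is reached:

```typescript
// Hypothetical system inputs.
interface CreateProductCommand { // DCO: transformative input
  name: string;
  priceCents: number;
}

interface FindProductQuery { // DQO: informative input
  productId: string;
}

// Surface-to-system translation: strip infrastructure metadata so
// the domain never ingests the raw DTO.
function toCreateProductCommand(dto: {
  name: string;
  priceCents: number;
  traceId?: string;
}): CreateProductCommand {
  return { name: dto.name, priceCents: dto.priceCents };
}
```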

DBO: Data Business Object. An object that represents the business data and logic of the application. It is used to encapsulate the business rules and operations related to the data. It is the nitty-gritty of the service and its core logic.

  • Notes: The DBO is the object used to represent the business constructs, such as what a Product is, what a User is, etc. In theory, it has nothing to do with what the applicative surface ingests (DTO), nor with what the applicative system ingests (DQO/DCO).
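A minimal sketch of a DBO, continuing the hypothetical Product example: the business invariant lives in the DBO itself, independent of the wire (DTO) or input (DCO/DQO) shapes.

```typescript
// Hypothetical DBO: the business construct carries its own invariants.
class Product {
  private constructor(
    readonly id: string,
    readonly name: string,
    readonly priceCents: number,
  ) {}

  // The business rule lives here, not in the DTO or the DCO.
  static create(id: string, name: string, priceCents: number): Product {
    if (priceCents < 0) throw new Error("Price cannot be negative");
    return new Product(id, name, priceCents);
  }

  withPrice(priceCents: number): Product {
    return Product.create(this.id, this.name, priceCents);
  }
}
```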

DSO: Data Storage Object. The DSO is the object used to represent the data as it is stored in a given data storage mechanism/solution. It is used to encapsulate the data structure and format used by the various data stores (SQL, NoSQL, Cache, etc). Note that it is an interface only. Any ORM mapping should live not in the data layer, but in the infrastructure layer.

  • Notes: There is a clear separation of concerns, and very different needs, between the applicative core logic and what is actually stored in storage solutions. This makes it easier to compose DBOs, and provides choices (e.g. what critical data is stored, how, and where).
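As a sketch, a hypothetical DSO for a relational store might be a flat row shape mirroring the table, with storage-only concerns that the DBO never carries:

```typescript
// Hypothetical DSO: a flat row mirroring a relational table.
interface ProductRow {
  id: string;
  name: string;
  price_cents: number; // snake_case mirrors the column name
  updated_at: string;  // storage concern, not a business notion
}

// Per the note above, this mapping belongs in the infrastructure
// layer, not the data layer.
function toProductRow(
  dbo: { id: string; name: string; priceCents: number },
  now: Date,
): ProductRow {
  return {
    id: dbo.id,
    name: dbo.name,
    price_cents: dbo.priceCents,
    updated_at: now.toISOString(),
  };
}
```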

DPO: Data Presentation/Projection Object. Sometimes called "Projections" or "Result Models" in CQRS. This is generally a translation of the DBO, skimmed for projection, i.e. ready for showcase as output by the applicative core.
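A minimal sketch of a DPO, again with illustrative names: a projection of the DBO skimmed down and formatted for display.

```typescript
// Hypothetical DPO: a display-ready projection of the Product DBO.
interface ProductView {
  name: string;
  displayPrice: string; // formatted, presentation-ready
}

function toProductView(dbo: { name: string; priceCents: number }): ProductView {
  return {
    name: dbo.name,
    displayPrice: `$${(dbo.priceCents / 100).toFixed(2)}`,
  };
}
```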

DOO: Data Outcome Object. System Outputs - this emphasizes the end result, the outcomes of processing/operations. Having this buffer is critical to the separation of concerns between system handling and surface translation. It normalizes system communication to the surface layer, and gathers everything the infrastructure requires to run its processes, i.e. "What does it mean for a user when my system produces a successful operation?"

  • Notes: The Applicative Core/System Outputs are generally not what we would want to send back to the user. Note that system-to-surface communications may contain infrastructure metadata (e.g. error codes, core outcome metrics in the case of a monolith or microservices, and more) which may dictate service surface behavior.
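One way to sketch the DOO buffer is a normalized outcome envelope; the `Outcome` type and constructor names below are assumptions for illustration. It carries the result alongside the infrastructure metadata that may dictate surface behavior:

```typescript
// Hypothetical DOO: a normalized envelope handed from the system to
// the surface layer.
type Outcome<T> =
  | { kind: "ok"; value: T; metrics: { durationMs: number } }
  | { kind: "error"; code: string; metrics: { durationMs: number } };

function ok<T>(value: T, durationMs: number): Outcome<T> {
  return { kind: "ok", value, metrics: { durationMs } };
}

function failure<T>(code: string, durationMs: number): Outcome<T> {
  return { kind: "error", code, metrics: { durationMs } };
}
```

The surface layer can then translate a single, predictable shape into DTOs, HTTP statuses, or monitoring signals, instead of interpreting raw core outputs.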

Done!

Thanks for reading! We hope this overview and proposal for data transformation and normalization hits home and makes sense.