Artificial intelligence alone won’t save you – why data management is the key to success
Artificial intelligence (AI) has been a popular topic for quite some time — extensively covered in the media and actively explored for its practical applications. Alongside AI’s development, and with its support, data analysis is also attracting increased attention. It has helped organisations discover new directions and optimise processes for years, and it is now becoming ever easier to use.
Machine learning algorithms are an integral part of this ecosystem, forming a powerful toolkit alongside AI and analytics. We have previously discussed these topics in our articles How can BI (Business Analytics) and AI (Artificial Intelligence) enhance business efficiency even further? and How can companies benefit from machine-learning projects?
However, all of these rely on one fundamental factor: data. The old adage applies — if your data is poor, you cannot expect meaningful results. Garbage in, garbage out.
Therefore, to maximise the benefits of AI, analytics, and machine learning, businesses must focus on data quality and data management — both in daily operations and in software development. Even the most sophisticated algorithm is only as good as the dataset that powers it.
Data management is a broad and complex discipline, and every organisation must find a strategy that best suits its needs. Below, I have compiled key data management principles that are universally beneficial, whether your organisation is just beginning its data management journey or already has established practices.
These recommendations are based on my personal experience as an analyst and best practices outlined in the Data Management Body of Knowledge (DAMA-DMBoK®). While data security is undeniably a crucial aspect of data management, this article does not cover it, as it is an extensive and nuanced topic deserving of its own dedicated discussion.
At Trinidad Wiseman, we help businesses unlock the full potential of their data through advanced business analytics, data management, and machine learning solutions. Our specialists conduct user and market research to improve business and pre-analysis processes. Learn more about our research, business and pre-analysis, and machine learning services, or contact us to discuss how we can support your business.
Data as an asset
Many organisations fail to treat data as an asset — and this is a mistake. When data is not formally recognised as a valuable resource, associated costs and potential benefits remain obscured. While organisations often account for hardware expenses related to data storage, they may overlook costs arising from data quality issues.
Additionally, there is often a lack of clear understanding of how much investment is needed for data management to prevent and resolve these issues. By treating data as an asset, businesses can gain visibility into both its management costs and its potential value. However, quantifying the precise financial impact of data management can be challenging. Below are some key aspects to consider:
- Data acquisition and storage costs: How much do servers, system maintenance, and human resources for data collection and management cost? Some expenses may also arise from collecting data outside of traditional information systems.
- Direct costs of data errors: Inaccurate data entry can result in significant financial losses. A single mistake could cost an organisation millions. While rapid response can mitigate the damage, complete prevention is not always possible.
- Indirect costs of data errors: The time and resources spent on correcting errors can accumulate rapidly, affecting various departments:
  - Customer service representatives handling complaints due to inaccurate data;
  - Developers modifying code to fix data-related issues;
  - Data analysts spending time cleansing data instead of performing valuable analysis.
- Regulatory compliance costs: Failing to meet data protection regulations can result in fines. For example, Estonia’s Data Protection Inspectorate may impose penalties for violating personal data protection laws. Organisations must also allocate resources to ensure compliance.
- Revenue generated through data analytics: While harder to quantify, one approach is to measure the financial benefits of data-driven decisions and initiatives.
This list is not exhaustive but illustrates how effective data management directly impacts an organisation’s budget and operational efficiency. If businesses fail to evaluate the impact of data management, they may overlook significant opportunities for optimisation and growth.
Data management strategy and executive support
Data management does not happen automatically. For it to be effective and beneficial, it must be integrated into organisational goals and processes. This requires strong executive support. While many companies encourage grassroots initiatives from employees, even the best ideas cannot thrive without leadership commitment.
If data management lacks allocated resources and clear executive backing, it often takes a backseat or is ignored altogether. A data management strategy defines overarching goals and directions. It can be simple and streamlined or detailed and comprehensive, depending on organisational needs and ambitions.
A basic approach may incorporate data management into daily tasks, with a primary goal of improving data quality within regular workflows. Alternatively, a more advanced strategy may introduce dedicated teams or roles focused on data quality and process optimisation.
Ultimately, a data management strategy should be as simple or as sophisticated as necessary to meet an organisation’s objectives. Overcomplicating solutions when simpler, more effective alternatives exist is not advisable.
Defining and controlling data quality
To ensure that data is more than just numbers and characters in a database, organisations must actively invest in data quality. This involves establishing clear data quality requirements and regularly verifying compliance.
In information systems, defining and monitoring data quality is relatively straightforward. For example, software development projects can define data quality requirements early on and enforce them through system architecture.
Some examples of data quality rules include the following; the sketch after the list shows how such rules might be enforced in code:
- Mandatory field completion – specifying which fields must always be populated.
- Permitted value constraints – defining acceptable values for fields (e.g., predefined selections).
- Minimum and maximum value thresholds – setting range limits for numerical fields, such as age or price.
- Data object constraints – enforcing rules on uniqueness and valid relationships. For instance, ensuring that a customer can have only one active contract at a time.
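As an illustration, below is a minimal validation sketch in Python. The record structure, field names, and limits are hypothetical assumptions chosen for the example, not a prescription for any particular system.

```python
# Minimal data quality validation sketch. The record structure, field names,
# and limits below are hypothetical examples, not a real system's rules.

ALLOWED_COUNTRIES = {"EE", "LV", "LT"}  # permitted value constraint

def validate_customer(record: dict) -> list[str]:
    """Return a list of data quality violations for one customer record."""
    errors = []

    # Mandatory field completion
    for field in ("name", "country", "age"):
        if record.get(field) in (None, ""):
            errors.append(f"missing mandatory field: {field}")

    # Permitted value constraint
    country = record.get("country")
    if country and country not in ALLOWED_COUNTRIES:
        errors.append(f"country not in permitted values: {country}")

    # Minimum and maximum value thresholds
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        errors.append(f"age outside allowed range: {age}")

    # Data object constraint: at most one active contract per customer
    active = [c for c in record.get("contracts", []) if c.get("status") == "active"]
    if len(active) > 1:
        errors.append(f"{len(active)} active contracts found, expected at most 1")

    return errors

print(validate_customer({"name": "Mari", "country": "FI", "age": 130}))
# -> violations for the country and age rules
```

In a real system, rules like these would typically live in the application layer or in database constraints; the point is that each rule is explicit and testable rather than implied.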
Once a system is operational, regular data quality checks should be conducted — either in-house or with external expertise. This helps prevent costly and risk-prone data issues.
Data quality checks, illustrated in the sketch that follows the list, may include:
- Ensuring that sums of values remain within expected limits over specific periods (daily, monthly, etc.).
- Verifying row counts to match expected thresholds.
- Monitoring growth or decline trends within predefined tolerance levels.
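As a sketch of what such checks might look like, here is a small Python example using pandas. The column names (`date`, `amount`) and all thresholds are invented assumptions; real tolerances should come from your own data profile.

```python
import pandas as pd

# Minimal monitoring sketch: assumes daily transaction data with a datetime
# 'date' column and a numeric 'amount' column; all thresholds are examples.

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Flag days whose totals, row counts, or growth fall outside tolerance."""
    alerts = []
    daily = df.groupby(df["date"].dt.date).agg(
        total=("amount", "sum"), rows=("amount", "size")
    )

    # 1. Sums of values within expected limits per day
    bad_totals = daily[(daily["total"] < 0) | (daily["total"] > 1_000_000)]
    alerts += [f"{day}: daily total {t:,.2f} outside expected range"
               for day, t in bad_totals["total"].items()]

    # 2. Row counts matching expected thresholds
    bad_rows = daily[daily["rows"] < 100]
    alerts += [f"{day}: only {r} rows, expected at least 100"
               for day, r in bad_rows["rows"].items()]

    # 3. Growth or decline within a +/-30% day-over-day tolerance
    change = daily["rows"].pct_change().abs()
    alerts += [f"{day}: row count changed by {c:.0%} day-over-day"
               for day, c in change[change > 0.30].items()]

    return alerts
```

A script like this can run on a schedule and alert the responsible team when a check fails, which is usually far cheaper than discovering the problem downstream.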
Metadata collection and utilisation
Metadata is data that describes other data. It provides insight into where data is stored, how it moves, and how it is transformed. A clear metadata strategy accelerates data-driven projects and software development, while also reducing data duplication, which helps minimise both time and financial costs.
Metadata benefits multiple roles across an organisation:
- Data analysts: To locate the data required for analysis.
- Product owners: To determine whether the data they need for their product already exists within the organisation or needs to be created.
- System analysts and developers: To design and develop integrations between systems.
- Business analysts: To get an overview of possible changes that need to be made to processes and required dataset modifications.
Ideally, metadata collection and presentation should be automated, though this is not always feasible. That’s why organisations must carefully evaluate which metadata is most beneficial and how it should be collected.
Examples of metadata:
Conceptual organisational data model: Consolidating all of an organisation’s data objects and their details into a single diagram is challenging, but it is possible to create a conceptual data model that maps the data objects in use at the name level and, where necessary, includes more detailed references.
The goal is to provide an overview of the organisation’s existing data and the relationships between them, serving as a foundation for new business analyses and software developments. Since such a model cannot be fully automated, it is crucial that responsible individuals regularly update and maintain it.
Physical data models of enterprise systems: These describe the current state of data within a software system and are used for software changes, integration with new systems, and data analysis.
To ensure the model can be used effectively, data objects and attributes must be clearly defined and accompanied by descriptions of their business meaning and origin. In most cases, physical data models can be generated automatically from databases, and this should be a regular practice for every software system.
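As one hedged example of how such generation could start, the sketch below pulls table and column metadata from information_schema, assuming a PostgreSQL database and the psycopg2 driver; other engines expose equivalent catalogue views. The connection string is a placeholder.

```python
import psycopg2  # assumes PostgreSQL; other engines have equivalent catalogues

def extract_physical_model(dsn: str) -> dict[str, list[tuple[str, str, bool]]]:
    """Return {table: [(column, data_type, is_nullable), ...]} for one schema."""
    query = """
        SELECT table_name, column_name, data_type, is_nullable
        FROM information_schema.columns
        WHERE table_schema = 'public'
        ORDER BY table_name, ordinal_position;
    """
    model: dict[str, list[tuple[str, str, bool]]] = {}
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            for table, column, dtype, nullable in cur.fetchall():
                model.setdefault(table, []).append((column, dtype, nullable == "YES"))
    return model

# model = extract_physical_model("postgresql://user:pass@host/db")  # placeholder DSN
```

The output is only raw material: the business meaning and origin of each attribute still have to be added by people who know the system.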
Integration documentation: Defines which systems data moves between, under what conditions, and how it is transformed. This is crucial for understanding data flows and transformations, assisting both system analysts and developers in integrating with other systems. Although interface documentation is usually created during the analysis phase, in some cases it may be overlooked.
Integration development tools (such as Swagger) often allow documentation to be generated automatically, but in my experience this is not always sufficient: critical details, such as the conditions under which fields are populated and the origin of source data, are often missing. Without that essential information, the documentation becomes practically useless.
Data dictionary: A tool that links business concepts to their technical occurrences across different systems. I have seen purely business-focused data dictionaries being used, but their applicability is limited. When a dictionary includes not only business terms but also synonyms and references to database tables, it becomes valuable for everyone working with data — data analysts can use it to locate information, while analysts and developers can leverage it for software development and business analysis.
Since maintaining a data dictionary manually is time-consuming, it is worth exploring automation options whenever possible. For example, one approach could involve retrieving metadata from various databases based on business terms and using it to populate the dictionary.
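To make the shape of such a dictionary concrete, here is a minimal Python sketch; the terms, synonyms, and table references are invented examples rather than real systems.

```python
from dataclasses import dataclass, field

# Minimal data dictionary sketch. Terms, synonyms, and table references
# below are invented examples.

@dataclass
class DictionaryEntry:
    term: str                          # business concept
    definition: str                    # business meaning
    synonyms: set[str] = field(default_factory=set)
    locations: set[str] = field(default_factory=set)  # "system.table.column"

def lookup(dictionary: list[DictionaryEntry], word: str) -> list[DictionaryEntry]:
    """Find entries whose term or synonyms match a search word."""
    w = word.lower()
    return [e for e in dictionary
            if w == e.term.lower() or w in {s.lower() for s in e.synonyms}]

entries = [
    DictionaryEntry(
        term="Customer",
        definition="A person or company with at least one contract.",
        synonyms={"Client", "Account holder"},
        locations={"crm.customers.id", "billing.clients.client_id"},
    ),
]

print(lookup(entries, "client")[0].locations)
```

Even a structure this simple links the business vocabulary to its technical occurrences, which is exactly what a purely business-focused dictionary lacks.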
The examples above are just a few of the possibilities available. Every organisation has its own unique metadata requirements, so a thorough analysis should be conducted to determine which metadata elements are essential to your organisation.
Defining Master Data and Reference Data
Master data and reference data are two critical but often confused concepts. To make sure we are on the same page, I have defined them below:
Master Data
Master data represents the single source of truth for specific business entities within an organisation. For instance, personal data may be stored across multiple systems, but only one authoritative system should contain the correct and up-to-date version, which can then be used to update other systems as needed.
From a data management perspective, it is crucial to define the system that holds master data and ensure that this information is easily accessible across the organisation. Since personal details (e.g., legal names) can change, organisations must know which system contains the most accurate version. Additionally, maintaining a centralised master data repository helps resolve duplicate records across multiple systems.
It is important to remember that communication is key here. Details often get lost in everyday work, and regular reminders about how to use master data will keep it fresh in people’s minds.
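To make the "single source of truth" idea tangible, here is a tiny propagation sketch in Python. The system names ("crm", "billing") and records are hypothetical; in practice this would be an integration job rather than in-memory dictionaries.

```python
# Minimal master data propagation sketch. "crm" is assumed to be the
# authoritative system for person records; all names here are invented.

MASTER_SYSTEM = "crm"

def propagate_person(person_id: str, systems: dict[str, dict]) -> None:
    """Overwrite a person's attributes in downstream systems with the master copy."""
    master_record = systems[MASTER_SYSTEM][person_id]
    for name, store in systems.items():
        if name != MASTER_SYSTEM and person_id in store:
            store[person_id].update(master_record)  # the master version wins

systems = {
    "crm":     {"p1": {"name": "Mari Maasikas", "email": "mari@example.com"}},
    "billing": {"p1": {"name": "Mari Mets",     "email": "mari@example.com"}},
}
propagate_person("p1", systems)
print(systems["billing"]["p1"]["name"])  # -> Mari Maasikas
```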
Reference Data
Reference data consists of standardised datasets used across an organisation. Unlike master data, reference data is more stable and changes infrequently. An example is Estonia’s Address Data System (ADS), which serves as a standardised reference dataset for location data. Other common examples include product or service classifications.
Many organisations already have reference data internally defined, but ensuring accessibility, usability, and consistency is crucial. It is not sufficient to store reference data in an isolated document shared via email — this often leads to versioning issues where multiple copies circulate, causing confusion and inconsistency.
Reference data should be stored in a centralised repository with clearly defined governance rules. In software development, it is equally important to establish clear processes for managing reference data, whether it originates from a single central source or is managed through alternative solutions.
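As one hedged illustration of "a single central source", the sketch below fetches the current classification list from a shared endpoint instead of a local copy. The URL and payload shape are assumptions made for the example.

```python
import json
import urllib.request

# Minimal reference data retrieval sketch: one central endpoint instead of
# emailed copies. The URL and the payload structure are assumptions.

REFERENCE_URL = "https://example.org/reference/product-classifications.json"

def load_classifications() -> dict[str, str]:
    """Fetch the current classification list from the central repository."""
    with urllib.request.urlopen(REFERENCE_URL) as resp:
        payload = json.load(resp)  # expected: [{"code": ..., "label": ...}, ...]
    return {item["code"]: item["label"] for item in payload}
```

Because every consumer reads from the same endpoint, a change to a classification propagates everywhere at once and the versioning problem disappears.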
Conclusion
Successful AI, machine learning, and business analytics implementation depends on data quality. Even the most advanced algorithms cannot generate reliable insights if data integrity is compromised.
For organisations, this underscores the importance of a well-defined data management strategy, encompassing data quality requirements, continuous validation, efficient metadata management, and proper master and reference data governance.
Moreover, leadership commitment and resource allocation are crucial in making data management a sustainable and integral part of business operations. A strong data management framework forms the foundation that enables AI and data analytics to deliver real, measurable business value.