10 steps to cut through data management complexity
The modern enterprise is data-driven. The capability to quickly access and act upon information has become a key competitive advantage. But business data is often siloed and fragmented. To gain a competitive edge from your information, you need a single view of your data.
Most organizations today have a complicated process for managing their data, one that usually involves ingesting and transforming multiple data sources of variable structure, loading the results into an operational database and supporting the business applications that need the data. Analytics, business intelligence (BI) and reporting tools require access to the data, which frequently requires a separate data warehouse or data lake. These layers all need to comply with security protocols, information governance standards and other operational requirements.
Too often, the result of this complexity is that information becomes stranded in silos. Systems are built to handle the requirements of the moment rather than being carefully designed to fit with existing applications, or an existing service is extended with additional attributes to support new functionality. New data sources accumulate due to business mergers and acquisitions. Information on a single business entity, like a customer, winds up in a dozen different and disconnected places.
“We know data is all around us,” says Mat Keep, director of Product and Market Analysis at MongoDB, the company behind the open source, NoSQL document-oriented database of the same name. “It’s growing at 40 to 50 percent every year. Mobile, web, sensor data, social networks. Putting all that data into a single view is increasingly becoming a priority. It’s very complex, often in silos, rarely consistent and hard to make actionable. Companies have been trying to build a single view for a long, long time.”
To help organizations get there, MongoDB has developed a 10-step methodology for delivering a single view of data, based on hard-earned experience from customer engagements.
Step 1: Define project scope and sponsorship
Customers often approach single-view projects with very ambitious plans, Keep says. It’s good to have a vision, but it’s generally a mistake to start by planning to pull every piece of customer data you have from every system you have into your single view.
“What we have found is trying to boil the ocean, trying to get every piece of data in the first phase of the project is a big ask,” he says. “What we’ve found to be most successful is to focus on a single business problem.”
Perhaps you want to reduce the mean time to resolution (MTTR) for your call center. Narrowing the scope of your project to that specific goal will make it much simpler to identify the data that’s most pertinent to success.
“You should really walk before you run,” Keep says. “Start with a specific business problem which has a defined set of data you can pull from and a defined set of goals so you can measure success.”
This will also help you identify the key stakeholders who stand to benefit. They won’t run the project day-to-day, but they can help get the necessary resources in place to ensure the project is successful.
Step 2: Identify data consumers
Once you’ve identified the business problem you’re trying to solve, the next step is to understand the consumers of the single view of data that you’re going to create. To get the right requirements, you need to understand who they are, how they work and ultimately how you can make their jobs simpler.
“You have to block some time out with them,” Keep says. “Observe. How do they actually query the data? Is it a text search? A lookup by customer ID? You can’t overengineer this and you can’t get enough data.”
For example, Keep says, MongoDB has helped insurance company MetLife get a single view up and running for its call center reps. Observation revealed that the company’s call center reps had to navigate across as many as 15 different screens to answer common customer questions. By watching precisely what they were doing day-to-day — the questions they were answering for customers and what it took to reach those answers — MetLife and MongoDB were able to build something much simpler.
Step 3: Identify data producers
The third step, often symbiotic with Step 2, is to identify the data sources that generate the data you need for your project.
“This could potentially mean creating new data sources, but very often the data exists,” Keep says. “It’s a matter of knowing where it is and how to get it. It might mean modifying an existing application to catch a new attribute, or digitizing something that was previously manual.”
Like step 2, this step will help you identify the correct requirements.
Step 4: Appoint data stewards
The previous steps of the methodology encompassed the discovery phase of your single-view project. They were all about creating a framework of requirements. With step 4, you enter the development phase by appointing the data stewards responsible for the data in the source systems. Your data stewards will be the key players in both the creation of your single-view project and its ongoing maintenance.
“They often own the data sources discovered in steps 2 or 3,” Keep says. “They know what tables the data lives in, how it’s formatted, how it’s extracted. They know if there’s a clean way of getting data out without interrupting core data systems.”
Step 5: Develop the single view model
This critical step will govern everything that follows, but Keep notes it’s less daunting if you’ve successfully completed your initial upfront discovery. Identify the type of data, where it lives and how you need to query it.
“Here we might look at exactly what data is mandatory and what’s optional,” Keep says. “For your application, email address, date of birth and credit card number might be mandatory. The social media account might be optional. Then figure out what data needs to be indexed. That’s going to speed up the queries that the consuming applications are going to want to run. This is where a database with a flexible data model really, really helps. We don’t need to know what all the optional fields are, we can add them as we go. We just need the mandatory data.”
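The split between mandatory and optional data, and the choice of fields to index, can be sketched in a few lines of plain Python. This is a minimal illustration rather than MongoDB's own tooling: the field names, the MANDATORY_FIELDS set and both helper functions are assumptions made for the example, and the dictionary-based index only stands in for what a database index would do.

```python
# Hypothetical mandatory fields, taken from the examples quoted above.
MANDATORY_FIELDS = {"email", "date_of_birth", "credit_card_number"}


def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return {f for f in MANDATORY_FIELDS if record.get(f) in (None, "")}


def build_index(records, field):
    """Map each value of `field` to the positions of the records holding it.

    Records without the field are skipped, so optional fields can be
    indexed without forcing every record to carry them.
    """
    index = {}
    for position, record in enumerate(records):
        if field in record:
            index.setdefault(record[field], []).append(position)
    return index


customers = [
    {
        "email": "a@example.com",
        "date_of_birth": "1980-01-01",
        "credit_card_number": "4111",
        "twitter": "@a",  # optional field, present on some records only
    },
    {"email": "b@example.com", "date_of_birth": "1990-05-05"},
]

by_email = build_index(customers, "email")
incomplete = [c["email"] for c in customers if missing_mandatory(c)]
```

Here the second record would be flagged as incomplete because it has no credit card number, while the optional social media field on the first record needs no up-front declaration.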
Step 6: Data loading and standardization
Once you have your single-view data model in place, you need to define how you want data represented within that single view. You need to design common field names for the attributes you’re capturing. Your data sources might variously capture ‘DoB,’ ‘Date of Birth,’ and ‘Birthdate.’ You need to standardize those field names.
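One simple way to standardize field names is an alias table that maps each source spelling to a single canonical name. The sketch below assumes a canonical name of date_of_birth and a hypothetical FIELD_ALIASES table; a real project would build that table with the data stewards for each source system.

```python
# Hypothetical alias table: lower-cased source field name -> canonical name.
FIELD_ALIASES = {
    "dob": "date_of_birth",
    "date of birth": "date_of_birth",
    "birthdate": "date_of_birth",
}


def standardize(record):
    """Return a copy of the record with canonical field names.

    Unknown field names are lower-cased and have spaces replaced by
    underscores, so they at least follow one naming style.
    """
    standardized = {}
    for key, value in record.items():
        normalized = key.strip().lower()
        canonical = FIELD_ALIASES.get(normalized, normalized.replace(" ", "_"))
        standardized[canonical] = value
    return standardized


crm_record = standardize({"DoB": "1980-01-01", "Email": "a@example.com"})
billing_record = standardize({"Date of Birth": "1980-01-01"})
loyalty_record = standardize({"Birthdate": "1980-01-01", "Card Tier": "gold"})
```

All three records end up with the same date_of_birth key, which is what lets later steps compare them.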
“In stage six, what we actually do is make sure we’re transforming all the data from our source systems so it’s matching this standardization,” Keep says. “It starts with the initial data load.”
“With the initial load, you’ve got an empty single-view database and you pull in all the data from your source systems so it meets the requirements you’ve defined,” he adds. “Then you’ll capture updates to your single view. You might do that in batch, but what we’re seeing more commonly now is they want a much fresher view. For that, [Apache] Kafka is very popular now. It provides a near real-time version of the data. That’s what we call the delta load.”
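The difference between the initial load and the delta load can be shown with an in-memory dictionary standing in for the single-view database. This is a sketch under stated assumptions: the customer_id key, the shape of the change event and both function names are hypothetical, and the events are passed in directly rather than read from Kafka or any other stream.

```python
def initial_load(sources):
    """Build the single view from scratch out of every source system.

    `sources` is a list of record lists. Records sharing a customer_id
    are folded into one document; on conflicting fields, later sources
    overwrite earlier ones.
    """
    view = {}
    for source in sources:
        for record in source:
            view.setdefault(record["customer_id"], {}).update(record)
    return view


def apply_delta(view, event):
    """Apply one change event to the single view in place.

    An event is assumed to look like
    {"customer_id": "c1", "changes": {"plan": "gold"}}.
    Unknown customers are created on first sight.
    """
    customer_id = event["customer_id"]
    document = view.setdefault(customer_id, {"customer_id": customer_id})
    document.update(event["changes"])


crm = [{"customer_id": "c1", "email": "a@example.com"}]
billing = [
    {"customer_id": "c1", "plan": "gold"},
    {"customer_id": "c2", "plan": "free"},
]

view = initial_load([crm, billing])
apply_delta(view, {"customer_id": "c2", "changes": {"plan": "gold"}})
```

In a near real-time setup, apply_delta would be called once per message consumed from the stream; in a batch setup it would be called in a loop over the changes collected since the last run.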
Step 7: Match, merge and reconcile
Even though you standardized your data in the previous step, you’ll need to use algorithms to identify where records don’t line up based on source systems. For instance, a business travel application may draw on records that refer to ‘Mat Keep,’ ‘Mr. Keep’ and ‘Matthew Keep.’ Your single-view application needs to match, merge and reconcile those records.
“This is really one of the toughest stages to do,” Keep says. “I have to tell it I’m one and the same person to get my points. That’s where matching and merging comes in. You can use unique identifiers like credit card numbers: search on those fields to determine it’s the same person. If you don’t have that canonical data, or if there’s a typo, you need to catch file attributes. You can cluster records together with similar attributes and start to make decisions about whether that’s the same person or not. You can use tools to automate this process.”
Machine learning could potentially play a role here.
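A rough sketch of matching and merging: compare unique identifiers first, fall back to name similarity when an identifier is missing, then cluster and merge the records that match. The 0.8 threshold, the field names and the three functions are assumptions made for illustration, and the similarity measure is Python's standard difflib.SequenceMatcher rather than any particular matching product.

```python
from difflib import SequenceMatcher


def same_person(a, b, threshold=0.8):
    """Decide whether two records describe the same person."""
    card_a = a.get("credit_card_number")
    card_b = b.get("credit_card_number")
    if card_a and card_b:
        # Both records carry the unique identifier, so it decides alone.
        return card_a == card_b
    # Otherwise fall back to fuzzy name similarity (0.0 to 1.0).
    ratio = SequenceMatcher(
        None, a.get("name", "").lower(), b.get("name", "").lower()
    ).ratio()
    return ratio >= threshold


def cluster(records, threshold=0.8):
    """Greedily group records that match any record already in a group."""
    clusters = []
    for record in records:
        for group in clusters:
            if any(same_person(record, other, threshold) for other in group):
                group.append(record)
                break
        else:
            clusters.append([record])
    return clusters


def merge(group):
    """Merge a cluster into one record; the first value seen for a field wins."""
    merged = {}
    for record in group:
        for key, value in record.items():
            merged.setdefault(key, value)
    return merged


records = [
    {"name": "Mat Keep", "credit_card_number": "4111"},
    {"name": "Mr. Keep", "credit_card_number": "4111"},
    {"name": "Matt Keep"},
    {"name": "Jane Doe"},
]
clusters = cluster(records)
golden_records = [merge(group) for group in clusters]
```

With this data the first three records fall into one cluster, two through the shared card number and one through name similarity, and the fourth stays on its own. Greedy clustering and a first-value-wins merge are deliberately simple choices; real projects usually need per-field reconciliation rules and a review step for borderline matches.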
Step 8: Architecture design
The architecture design step marks the beginning of the deployment phase of your single-view project.
“This is how we’re physically going to deploy,” Keep says. “It’s about ensuring the underlying systems meet the performance goals and availability and security goals of the system.”
In this step, you’ll implement proper security protection for personally identifiable information (PII) and make certain the system is resilient to failures and outages.
Step 9: Modify the consuming systems
In this step, you’ll look at the systems that consume the data and make sure the applications are pointing to the single view. In most cases, this means creating RESTful APIs from which applications can pull their data.
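As a rough illustration of such an API, the sketch below exposes one read-only endpoint using only Python's standard library. The /customers/<id> route, the SINGLE_VIEW dictionary standing in for the database, and the port are all assumptions made for the example; a production service would add authentication and other protections for PII.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the single-view database.
SINGLE_VIEW = {"c1": {"customer_id": "c1", "name": "Mat Keep"}}


def route(path, view):
    """Map GET /customers/<id> to a (status code, JSON body) pair."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "customers":
        record = view.get(parts[1])
        if record is not None:
            return 200, json.dumps(record)
        return 404, json.dumps({"error": "customer not found"})
    return 404, json.dumps({"error": "unknown route"})


class SingleViewHandler(BaseHTTPRequestHandler):
    """Serve single-view records over HTTP."""

    def do_GET(self):
        status, body = route(self.path, SINGLE_VIEW)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), SingleViewHandler).serve_forever()
```

Calling serve() starts the server, after which a consuming application could request /customers/c1 instead of querying each source system itself. Keeping the routing logic in a plain function makes it easy to test without opening a socket.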
Step 10: Implement maintenance processes
No business systems are static. They’re constantly changing as new processes are added or bugs are fixed. You might create the perfect data model, and it will remain so for five days until one of the source systems changes or breaks. That’s why a flexible data model is key to getting your single-view project right. The data model needs to keep pace with rapidly changing source systems.
“Really, step 10 is a meta step,” Keep says. “To maintain the single view, you need to go back through the previous nine steps and continuously update the data model. Step 10 is really a loop around the previous processes. You need change management processes in place so the single view remains current. The data steward is really the guardian of the source system. As new application functionality is rolled out, they need to be working with the single-view team to tell them about changes. It should be on-demand; the single-view team has to be ready to accommodate changes as they’re made and the data stewards should be working closely with the development team.”
Single view maturity model
Once you’ve gotten several single-view projects under your belt and feel comfortable with the methodology, you can become more ambitious with your vision.
“It’s very tempting to try to boil the ocean, but it’s more effective to go with a defined problem,” Keep says.
“Once the single view has proven itself, you know it works, customers get more adventurous in how they use it,” he adds. “They start writing to the single view to get even fresher data. We have some customers, like International Banking Group, that have taken a single-view first approach. When they need new functionality, they implement it in the single view first. When they’ve made all the changes to the back-end source system, they reverse load to the source systems.”