Exclusive Interview: Informatica's Mike Anderson on the Future of Big Data in the Public Sector

This month, DLT Chief Data Scientist Sherry Bennett sat down with Informatica's Chief Strategist for the Public Sector, Mike Anderson, to discuss his views on the future of big data in the public sector. Read the exclusive interview below to hear directly from one of the industry's top big data leaders.

SB: To get us started, could you tell us a little bit about your role as Chief Strategist for Public Sector at Informatica?

MA: Thanks, Sherry. As Informatica's Chief Strategist, I have my hands in various efforts across the company to make sure our solutions, our capabilities, and our services match what our customers in the public sector are looking for, and what they need to get their arms around the big data challenges that are here today.

On the public-facing side, I try to make sure we connect the people who can do the most good with customers who are looking for the really cutting-edge technologies that are going to get them where they need to be today with data management.

We all know that getting our arms around data requires, to use an oft-used framework, a “People, Processes, and Technology” approach, especially when building toward a data-driven organizational culture. Those three components work together in concert and must be looked at iteratively to successfully manage and implement a data strategy in any organization. My job is to help tie all those components together, make sure we can provide that kind of support around the data challenges customers are facing today, and ensure Informatica is aligned with those challenges in the public sector.

SB: Thank you. Well, I know that you have been very engaged in the federal data strategy work across many agencies, including the DoD, and that since its inception you have been providing comments and feedback to stakeholders who are now beholden, from a legal standpoint, to adhere to the various action items that have been rolled out for FY21. Soon we will have another set of action items for FY22.

In all your work within the public sector, federal, state, and local, what do you see as the top three issues most agencies are still struggling with?

MA: Sure, I think the top three is probably a good way to approach it. One, right off the top, I would say, especially given the current crisis we're in and some of the challenges organizations across the public sector are facing, whether you're a state government, a municipal government, or at the federal level, is the sharing of data and the actions taken to ensure collaboration across departments within an agency, and then across agencies themselves. I think we've found that data sharing is key in the current crisis environment.

The collaboration we can enable with data is one of the most important things we can do in a crisis like the one we have faced with COVID recently. If we're still working in stovepipes, with a “mine, mine, mine” approach to data, then the evidence-based decision making required by the law you just mentioned never gets fully implemented or achieves what it was intended to. Evidence-based decision making is essentially having access to data across departments, across agencies, across persons, places, or things, so you can curate that data and make sure it's available, clean, and complete for decision makers.

“Now that you've got a [data] strategy in place, don't let it gather dust on the shelf.”

I think the second issue is the whole concept of data governance. You mentioned the Federal Data Strategy, which goes hand in glove with the Foundations for Evidence-Based Policymaking Act (the Evidence Act). The Federal Data Strategy was published by OMB one year ago, followed quickly by implementation or action plans. Now everybody is waiting breathlessly for the calendar year 2021 action plan, which we expect from the new administration, probably in late winter or early spring.

But the hard part comes next: now that you've got a strategy in place, don't let it gather dust on the shelf. We need to implement those strategies, and they must be implemented across the same framework I mentioned: people, processes, and technology. It seems that over the past year or two, organizations in the public sector have been focused on the people. Part of the issue is you need a chief data officer in charge somewhere. You also need other positions, such as chief evaluation officers and chief statistical officers. And you need to set up organizations, committees, or oversight boards, such as governance boards, that start to bring together lines of business or lines of mission and executives, to make sure you can get your arms around data and decide what kind of policies you're going to promulgate and publish within your organization.

You're establishing processes and data workflows through policy that addresses data across an organization. But if you're going to automate those processes and avoid creating more manual, redundant methodologies of managing data, technology capabilities must be brought to bear. This is where I think, if you've got a strategy in place and have taken those initial steps across people and processes, it's time for organizations to get after the technology that's going to enable the implementation of those strategies and objectives, whether it's a large city's data strategy, a state-level strategy, or a federal agency's. That's what we need to get after now; that's where some of the focus needs to be, I think, moving forward.

The last one I would say is a priority, and we see this across the board, is that the move to cloud continues. It has been happening for a decade, but that move to the cloud is, I think, taking off exponentially across the states and the federal government as well. Organizations have pointed themselves in the direction of wanting to move workloads to the cloud for efficiency and effectiveness. No one wants to continue to expend operational budget on building their own data centers and managing other on-prem environments. The cloud gets them out of that and allows them to use technology for the business and mission outcomes they want, versus managing the technology itself.

“Cloud and AI go together…data is at the core of that.”

On the topic of artificial intelligence, I like to talk about what I call the Trinity of AI. AI is taking off throughout the public sector, whether it's robotic process automation, machine learning, or advanced algorithms that help do predictive analytics or feed data models. Essentially, the trinity refers to the dependencies between AI, the cloud, and data. AI needs a tremendous amount of data to train its algorithms, and that amount of data can be efficiently and effectively provided only in a cloud environment. It's uneconomical in most cases to attempt this on-prem, where the compute and data costs will be too high. So, AI needs the cloud, and data is at the core of the trinity. Data must be discovered, wherever it lives and in whatever format it comes in; it needs to be cleaned, curated, and made fit for purpose before it's fed to an AI tool or any type of advanced analytics platform. Bad or incomplete data, or too little data, results in poor AI outcomes. Focus on the data plumbing first and the outcomes will achieve an organization's objectives.

SB: What are some salient use cases where Informatica has made an impact from a public sector perspective, enabling data sharing and collaboration?

MA: Of course. As leaders embrace and continue to develop capabilities to truly manage data and change the way their organizations use data for the outcomes they need, a few key points are worth looking at.

You and I have often talked about the Department of Defense (DoD). When it comes to the DoD, one of the things all Soldiers, Airmen, Marines, Sailors, and now Guardians (the new Space Force) learn early in their careers is the importance of being able to “know yourself”: know the capabilities of your organization and know the resources you have on hand, from equipment to manpower, to personnel readiness and qualifications, to supplies and training. The Services make sure they know themselves first before they even approach planning to address our adversaries' moves. You obviously need to know your adversary too, but more importantly, you need to know yourself first.

When you really break it down, knowing yourself and what you are capable of essentially comes down to data management. You need to be able to discover what your organization needs and know the data required for a particular product, program, or project. It is key to be able to do that, and to do it in an automated way, because trying to do it manually, with the amount of data organizations have access to today and are trying to collect, share, and collaborate on, would be nearly impossible, or massively inefficient at best. You need an automated capability to go out and discover that data. Where is it? Is it coming from sensors? Is it coming from the internet of things? Is it coming from workloads and applications in the cloud? Is it coming from mobile applications, perhaps reports from the field? Some of that data is even still coming from mainframes today, right? Where is that data? Let me find it so I can get it fit for purpose, so traditional experiential decision making in the military can be enhanced with the right data at the right time and place.

Collecting all that data into a simple table, with all the rows and columns and all the data associated with it, can get exponentially difficult to manage. What you are really talking about when you catalog data is a capability to manage the metadata. Ideally, you should be able to catalog data, find where the data you need lives through a simple query, and narrow down the specific information you need for your purpose, so you are not searching for a needle in a haystack.
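
To make the cataloging idea concrete, here is a minimal illustrative sketch in plain Python. It is not Informatica's product or API, and every dataset name, location, and tag is invented; it simply shows the core pattern of registering metadata about where data lives and then querying that metadata instead of digging through the data stores themselves.

```python
# Hypothetical metadata catalog: register datasets once, then search the
# metadata rather than the underlying data stores. All names are invented.
catalog = []

def register(name, location, fmt, tags):
    """Record where a dataset lives and how it is described."""
    catalog.append({"name": name, "location": location, "format": fmt, "tags": set(tags)})

def search(keyword):
    """Return datasets whose name or tags mention the keyword."""
    kw = keyword.lower()
    return [d for d in catalog
            if kw in d["name"].lower() or any(kw in t.lower() for t in d["tags"])]

register("aircraft_maintenance_logs", "s3://hypothetical-bucket/maintenance/", "parquet",
         ["readiness", "maintenance", "aircraft"])
register("personnel_training_records", "oracle://legacy-hr-db/training", "table",
         ["personnel", "training", "qualifications"])

for hit in search("readiness"):
    print(hit["name"], "->", hit["location"])
```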

“All data is not the same.”

You and I have also frequently discussed the importance of clean data, commonly known as data quality. Before you feed an analytics tool, a dashboard, or an AI algorithm, the data that you've discovered and now cataloged needs to be uncompromised. The analogy I like to give is that if you have bad data, it's garbage in, garbage out. I think that's pretty simple.

All data is not the same. We know that when you're collecting from multiple sources and then trying to send to a target system, data is going to be structured, it's going to be unstructured, it's going to be in different formats. It is key that there is a governance wrapper around your data, so you have a common workflow for that data and a common lexicon. It is important that when you're talking about a person, it is indeed a person, with a name and an associated place, because simple things like different data fields across different types and formats of data can get problematic when you're trying to feed them to that target system. Governance is part of all of that; it's really the core of any data management program.
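
As a rough illustration of what a governance wrapper and common lexicon can look like in code, the sketch below (plain Python, with hypothetical field names and rules, not an Informatica feature) checks incoming records against a few agreed rules, such as required fields, an agreed date format, and a controlled vocabulary, before they are passed to a target system.

```python
from datetime import datetime

# Hypothetical common lexicon: every source must map into these fields.
REQUIRED_FIELDS = {"person_id", "full_name", "record_date", "record_type"}
ALLOWED_TYPES = {"training", "medical", "assignment"}  # controlled vocabulary

def quality_issues(record):
    """Return a list of governance-rule violations for one incoming record."""
    issues = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS) if f not in record]
    if "record_date" in record:
        try:
            datetime.strptime(record["record_date"], "%Y-%m-%d")  # agreed date format
        except ValueError:
            issues.append(f"bad date format: {record['record_date']}")
    if record.get("record_type") not in ALLOWED_TYPES:
        issues.append(f"unknown record_type: {record.get('record_type')}")
    return issues

incoming = [
    {"person_id": "A123", "full_name": "Jane Doe", "record_date": "2020-06-01", "record_type": "training"},
    {"person_id": "B456", "full_name": "John Roe", "record_date": "06/01/2020", "record_type": "trng"},
]

for rec in incoming:
    problems = quality_issues(rec)
    print(rec["person_id"], "OK" if not problems else problems)
```
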
The next thing I think needs to be considered is what data should be accessible to which people, which roles, and which responsibilities. Take medical records, for example. Being able to see data across a patient's history is largely controlled and regulated as personal health information, by law and other restrictions, and a lot of times a researcher will only need to see the raw data, not data associated with a particular person.

There are many capabilities today that make sure people only see the data they need to and are authorized to see, and that protect personal health information and meet compliance requirements. As an example, take sensitive information at the Department of Defense or within the Department of Homeland Security. Those agencies have researchers, data scientists, policy makers, and decision makers who need access to data in order to apply it to a use case. But perhaps they don't need all of the information, so protection is important not just from a compliance standpoint, but from a mission standpoint as well.
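
A deliberately simplified sketch of that idea follows. The roles, fields, and masking policy are hypothetical, but it shows the basic mechanic: the same record is filtered differently depending on who is asking, so a researcher sees a de-identified view while a clinician sees the full record.

```python
# Hypothetical role-based masking: which fields each role may see in the clear.
VISIBLE_FIELDS = {
    "clinician": {"patient_name", "dob", "diagnosis", "lab_result"},
    "researcher": {"diagnosis", "lab_result"},  # de-identified view
}

def view_for(role, record):
    """Return a copy of the record with unauthorized fields masked."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {k: (v if k in allowed else "***MASKED***") for k, v in record.items()}

record = {"patient_name": "Jane Doe", "dob": "1980-02-14",
          "diagnosis": "J18.9", "lab_result": "positive"}

print(view_for("clinician", record))   # full view
print(view_for("researcher", record))  # identifying fields masked
```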

Lastly—and we've kind of touched on it—is mastering that data. It is critical to have one single Golden Record, if you will, with a full view, regardless of the different departments, agency silos, and systems. Being able to access and understand all the data associated with a person, place, or thing is the type of automation that exponentially drives the power of data for those who need it. Knowing where that data originated, knowing its lineage so you can trust it, also helps ensure that decisions are made on a solid foundation; that they are evidence-based. And that's what we really get at when we talk about master data management: simplifying it for the user to find all the data they need in one place, in a timely manner.
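
The sketch below illustrates the basic mechanics behind a golden record, using invented systems, fields, and merge rules rather than any actual DoD or Informatica implementation: records for the same person arrive from different stovepiped systems, the freshest non-empty value wins for each field, and the lineage of every value is kept so the result can be trusted.

```python
# Hypothetical master data management step: merge per-system records
# into one golden record per person, keeping lineage for each field.

source_records = [
    {"system": "training_db", "person_id": "A123", "rank": "SrA",
     "last_course": "Cyber 101", "updated": "2020-03-01"},
    {"system": "assignments", "person_id": "A123", "rank": "SSgt",
     "base": "Hypothetical AFB", "updated": "2020-09-15"},
]

def golden_record(records):
    """Newest non-empty value wins; remember which system supplied it."""
    merged, lineage = {}, {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field in ("system", "updated") or value in (None, ""):
                continue
            merged[field] = value
            lineage[field] = rec["system"]
    return merged, lineage

record, lineage = golden_record(source_records)
print(record)   # e.g. rank comes from the most recently updated system
print(lineage)  # which system each value came from
```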

What happens when you do all of this successfully? Well, you get accurate dashboards to make decisions on, and you get accurate, complete data to feed an artificial intelligence algorithm, or any other type of advanced analytics capability, that's going to give you the output you want.

“It is critical to have one single Golden Record [of your data] …”

There are some great examples of this in the DoD. The US Air Force Chief Data Officer’s office created the VAULT platform that has had great success with capabilities like automated cataloging, governance and data quality to help with things like aircraft readiness and maintenance, and personnel training and qualifications management. 

Another example within the Air Force is master data management of a person: you want to have a 360-degree view of the person in order to help manage their career and track their progress. They have full access to what the Air Force sees as their career path, their career progression, and where they should be going next to serve their own growth as well as the mission needs of the Air Force. That's all possible through data. Today, if you look at most human resources programs, you'll have a training database, an education database, maybe certifications and qualifications stored in another database, and yet another system with a record of assignment locations, and so on. Maybe they were assigned to one camp, post, or station for two years, and then another for the next two years. This data ends up being so stovepiped in those various systems and locations that it doesn't help the career of that person and it doesn't help the organization. So that's where the 360-degree view comes in.

Private Mike Anderson, or Airman Mike Anderson, wants to be able to get on a site and see all the information associated with their entire career. Where did they enlist? Where were they trained? What certificates do they have? What kind of education do they have? Where should they go next, and what kind of training should they seek? Instead of having to go to various systems to pull all of that information together, Private Anderson will have a better opportunity for future assignments, training, and education when they have that full view in one location.

Another area where Informatica capabilities have had a positive impact is the Food and Drug Administration (FDA). The FDA has an enormous responsibility to manage all the prescription drugs this country relies on. When you go pick up a prescription drug, it has typically been prescribed by your health care provider. But think about everything that goes into that one drug: the raw ingredients, manufacturing, development, transportation, shipping, and finally dispensing it to you, the patient. The FDA is responsible for making sure the supply chain of that drug is safe and well managed, and that the distribution of that drug is appropriate for where it's needed most throughout the country.

The FDA needs to have all of the information around each drug in order to manage it appropriately. That is the kind of view over an entire prescription drug that the FDA has today, because they've invested in automated data management capabilities that give them that view across all of the required information. So they can make decisions like, “Hey, do I need to increase supply or reduce it? Do I need to make sure it goes to the West Coast instead of the East Coast? Are we going to run out of that supply because a hurricane just wiped out the facility that brings all those raw materials together and manufactures the drug in one place?”

Finally, I'd like to talk about a recent use case at the state level. One of our larger states' Department of Health, like many organizations during the COVID crisis, was challenged in bringing in all the information on the status of potential patients testing positive for COVID, whether it was coming from nursing homes, hospitals, other providers, or drive-by testing facilities set up in parking lots everywhere. Much of that data was arriving in various formats at a central location at the Department of Health, and that information was driving decisions that impacted the economy, patient care, and more. Getting it all together was becoming very difficult and unsustainable, and it wasn't necessarily accomplished in a timely manner due to manual processes and the amount of data requiring integration and cleaning. So, in order to provide good, strong, complete data to decision makers deciding what actions needed to be taken to secure the health and welfare of the state's citizens, and what kind of personal protective equipment needed to go where, when, and with what priority, they made sure automated data management technology was leveraged appropriately to identify all the data, integrate it, and bring it into one location. This state's Department of Health really got their arms around it in 2020, early in the crisis. As a result, decision making became much better, based on complete and accurate information. That kind of result impacts all aspects of life under the COVID crisis we're going through.
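
For readers who want to picture the integration step, here is a toy sketch with invented sources, fields, and formats (it does not represent the state's actual systems): feeds arriving as a hospital CSV and a testing-site JSON payload are each mapped into one common record layout before reaching decision makers.

```python
import csv, io, json

# Hypothetical feeds in different formats from different reporting sources.
hospital_csv = "facility,test_date,result\nGeneral Hospital,2020-04-02,positive\n"
drive_thru_json = '[{"site": "Lot 7 Drive-Thru", "collected": "2020-04-02", "outcome": "negative"}]'

def from_hospital(text):
    """Map hospital CSV rows into the common record layout."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": row["facility"], "date": row["test_date"], "result": row["result"]}

def from_drive_thru(text):
    """Map drive-through JSON items into the same layout."""
    for item in json.loads(text):
        yield {"source": item["site"], "date": item["collected"], "result": item["outcome"]}

# One consolidated, commonly formatted feed for the Department of Health.
consolidated = list(from_hospital(hospital_csv)) + list(from_drive_thru(drive_thru_json))
for rec in consolidated:
    print(rec)
```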

SB: Thanks Mike, those use cases are very informative. Before we go, one last question: if you were King for a day and could dictate two high-priority items that need to get done to realize the potential of data as a strategic asset across the federal government, what would they be? What two things would you recommend to the Biden-Harris administration that are fundamental for the Evidence Act and the Federal Data Strategy to reach their intended purpose?

MA: Thanks, Sherry. It's great to have the Federal Data Strategy, and it's great to have a law that was passed that emphasizes it, along with the action plans that followed. But I think the two biggest things are non-technology issues, right?

One, there are positive signs around funding the implementation of the strategy; it has started fairly recently. In the early stages of 2020, the actions necessary to implement the Federal Data Strategy weren't necessarily funded. Agencies, especially at the federal level, were having to find available resources or repurpose operational dollars in order to carry out the activities associated with executing on the strategy. Establishing a CDO, a chief evaluation officer, a data governance board, and so on could largely be handled by repurposing funds for people, and that took care of a good portion of it.

“You got to have metrics for reporting and monitoring. And you've got to automate those processes.”

But now you start talking about, “I need to get all my data together. I've got policies to integrate data, to get people collaborating, to establish workflows and rules around data and a common lexicon, but how do I implement that?” So, funding becomes even more important in order to automate and bring in the technical capabilities needed to execute on all those plans. We've got a great start: you'll see data governance boards in almost every agency today, and Chief Data Officers in, I think, about 90% of them. The people, boards, and processes that have been established now need to shift into third gear and move forward to make sure the capabilities are there for them to implement. It's not really a technical issue; industry has capabilities today that can automate much of the data management work required by the strategy's action plans.

So funding, I think, is key to moving forward, and applying that funding to capabilities (largely commercial off-the-shelf capabilities that don't require lots of coding and development work) is a requirement for achieving the strategy's goals.

Secondly, I think critical to the strategy's success is holding organizations accountable for progress. This is a leadership issue: to show progress on the Federal Data Strategy and the action plans, and to demonstrate to Congress that the law it passed has made a difference. Metrics need to be very consciously developed, monitored, and reported on in order to show where the additional funding I mentioned in priority one is going to be applied and the impact it will have on those capabilities.

Holding organizations accountable for reporting and monitoring progress is the next step. As the funding starts to flow, and again, there are positive signs that it's starting to flow, automate, automate, automate! It's all great to put people in positions and to establish processes, but you've got to fund those positions and then provide them with the requisite resources to do their jobs. Then, holding leaders accountable for successful implementation will also help meet the stated goals. Metrics for reporting and monitoring, data management technology to automate that reporting and monitoring, all funded appropriately: that sounds like a recipe for success.