2025: The year we free our IAM data

For the last 20 years, the digital identity market has been primarily focused on protocols and what’s transiting the wires; it has not been standardizing interfaces to data at rest. We have not thought about data at rest, or its schema, since our LDAP days and inetOrgPerson.

Over the last year I have been writing about what I think modern IAM looks like and which architectures we need to support it. Although I do not fully understand this moment in the market, I feel strongly that the future of IAM includes a robust data tier and near-real-time events.

To that end, IAM’s drought of attention with respect to schema and data at rest must end. Regardless of what you call the latest iteration of the IAM market, the real battle is being fought over who becomes the new identity data tier for the enterprise. Technology providers are pulling different kinds of IAM data together, both admin-time and run-time. Some are even infusing all of that with event-time data. In this way, they are addressing one of the fundamental flaws in traditional IAM product approaches: those products only know about the data they can push and pull through their own connections to applications and services. While that’s great and needed, it is by no means enough. If an IAM solution’s data repository is closed to you, the buyer and consumer of its services, it does you less good than an open one.

It’s all about the data and data management

Data is at the center of the modern IAM architecture. Not just data that a single piece of IAM infrastructure can use, but data (and a supporting data tier) that your entire identity fabric can use. Consider that today, in your identity fabric, each component has its own data tier. I posit that each of those holds redundant data; my guess is that at least 30% of each data silo is redundant with the other data silos in your IAM infrastructure. It’s very likely that data objects such as User and Person are found in the data silo of every component in your IAM infrastructure, and that each copy is slightly different from the others, likely due to differences in data sync schedules and data transformations. That is not only needlessly complex, it is dangerous: differences between data stores cause policy clashes while increasing the surface area you have to protect.
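To make the redundancy-and-drift problem concrete, here is a minimal sketch of how you might compare the User records held by two components of an identity fabric. Everything in it is an assumption for illustration: the silo names, the record shape (a dictionary of string attributes keyed by user ID), and the `compare_silos` helper are hypothetical, not any vendor's actual export format.

```python
from typing import Mapping

Record = Mapping[str, str]


def compare_silos(a: Mapping[str, Record], b: Mapping[str, Record]) -> dict:
    """Report which users appear in both silos and which attributes disagree."""
    shared = sorted(set(a) & set(b))
    drifted = {}
    for uid in shared:
        # An attribute counts as drifted if its value differs or is missing on one side.
        keys = set(a[uid]) | set(b[uid])
        diffs = sorted(k for k in keys if a[uid].get(k) != b[uid].get(k))
        if diffs:
            drifted[uid] = diffs
    return {
        "shared_ids": shared,
        # Share of silo a's users that are duplicated in silo b.
        "redundancy_ratio_a": len(shared) / len(a) if a else 0.0,
        "drifted": drifted,
    }


# Hypothetical extracts from two IAM components.
iga_silo = {
    "u1": {"email": "alice@example.com", "department": "Finance"},
    "u2": {"email": "bob@example.com", "department": "Sales"},
}
access_mgmt_silo = {
    "u1": {"email": "alice@example.com", "department": "Fin"},
    "u3": {"email": "carol@example.com", "department": "Legal"},
}

report = compare_silos(iga_silo, access_mgmt_silo)
```

In this toy data, half of the first silo is duplicated in the second, and the one shared user already disagrees on `department`, which is exactly the kind of small difference that can lead to policy clashes.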

And let’s be real here: IAM practitioners are neither data practitioners nor data scientists, and we shouldn’t try to be. Moreover, IAM vendors, on the whole, are not data management vendors and their offerings are not built for data management.

What if standards existed?

I propose that the industry standardize interfaces for at least a core schema of workforce IAM and events. We could model them after the SCIM 2.0 core schema and the schemas from CAEP and RISC. These interfaces can be thought of as views in a traditional SQL database; they would provide a standardized window into the data tier for some of the most important entities in IAM-land. If these standardized interfaces existed, then you could:

operate any supporting IAM vendor’s offering on top of your own self-managed data tier
point multiple pieces of your IAM fabric at the exact same data tier
extract data from a supporting vendor in a vendor-neutral format for migration, reporting, or other purposes
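As a rough illustration of the "view" idea, here is a minimal sketch in Python. It is not a proposed specification. The `OidsUser` type, the `UserView` interface, and the `InMemoryDataTier` class are all hypothetical names invented for this example; only the attribute names (`id`, `userName`, `displayName`, `active`, `emails`) are borrowed from the SCIM core User schema, and `emails` is simplified here to a tuple of plain strings.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class OidsUser:
    """Hypothetical core entity; attribute names borrowed from SCIM's User."""
    id: str
    userName: str
    displayName: str
    active: bool
    emails: tuple[str, ...] = ()


class UserView(Protocol):
    """The 'view': a read-only window any IAM component could code against."""

    def get_user(self, user_id: str) -> Optional[OidsUser]: ...

    def list_users(self) -> Iterable[OidsUser]: ...


class InMemoryDataTier:
    """Stand-in for a self-managed data tier (assumed; a real one might be a warehouse)."""

    def __init__(self, users: Iterable[OidsUser]) -> None:
        self._users = {u.id: u for u in users}

    def get_user(self, user_id: str) -> Optional[OidsUser]:
        return self._users.get(user_id)

    def list_users(self) -> Iterable[OidsUser]:
        return list(self._users.values())


def active_user_names(view: UserView) -> list[str]:
    """One consumer (say, an access-review tool) reading through the view."""
    return sorted(u.userName for u in view.list_users() if u.active)


def export_users(view: UserView) -> str:
    """Another consumer: a vendor-neutral JSON extract for migration or reporting."""
    return json.dumps([asdict(u) for u in view.list_users()], sort_keys=True)


tier = InMemoryDataTier([
    OidsUser("1", "alice", "Alice Example", True, ("alice@example.com",)),
    OidsUser("2", "bob", "Bob Example", False),
])
```

Both `active_user_names` and `export_users` depend only on the `UserView` interface, so they could be pointed at the same data tier, or at a different implementation of it, without change. That is the property the three bullets above rely on.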

To this end, I propose Open IAM Data Schema (aka OIDS) with a loving tip of the hat to object identifiers in LDAP and the last time the industry standardized data at rest: inetOrgPerson.

Why OIDS in the enterprise?

There are many reasons for enterprises, of all sizes, to support OIDS. The first three that come to mind are:

avoid vendor lock-in
fix data quality
get out of the business of managing identity data

First, on the subject of vendor lock-in. Every decade or so, IAM teams change major components in their IAM infrastructure, such as their IGA system. Part and parcel of that change is a massive data migration and re-integration. To do that they often call upon a service provider, write a two-comma check, and hope for the best. If OIDS existed, that cost could be reduced and migration would be easier. Additionally, when selecting a new IAM component, IAM teams could point each vendor in their bakeoff to the exact same OIDS-compliant data store and compare the real value that each provides, instead of spending time marshalling data into proprietary formats.

Second, OIDS would help address data quality issues. We know that our tools are only as good as the data they operate upon. But those tools don’t give us world-class data management functions; they aren’t built for it. Often, to fix data quality issues, we have to attempt to fix them in the source systems, which we don’t own. (Ever tried telling HR to change their data? Not such a fun conversation.) Since we often cannot change data in source systems, we have to build a data ingestion process (and often a brittle one) that fixes the data. OIDS would give us an alternative. We could operate our own data tier in partnership with proper data management professionals and fix the data in that data tier. Then any product that understood OIDS could refer to that data. We can fix the data in a system we own without disturbing upstream systems and without depending on our IAM products to do so.
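A minimal sketch of "fix it in a tier we own" might look like the following. The field names, the cleanup rules, and the `clean_record` and `build_tier` helpers are assumptions made up for this example; real data quality work would be designed with data management professionals and would be far richer than trimming whitespace.

```python
from typing import Iterable


def clean_record(raw: dict) -> dict:
    """Return a cleaned copy of one source record; the source record is not modified."""
    cleaned = dict(raw)
    # Assumed rule: email addresses are trimmed and lower-cased.
    cleaned["email"] = raw.get("email", "").strip().lower()
    # Assumed rule: runs of whitespace in display names collapse to single spaces.
    cleaned["displayName"] = " ".join(raw.get("displayName", "").split())
    return cleaned


def build_tier(source_records: Iterable[dict]) -> dict:
    """Load cleaned records into our own tier, keyed by ID (later duplicates win)."""
    tier = {}
    for raw in source_records:
        record = clean_record(raw)
        tier[record["id"]] = record
    return tier


# Hypothetical feed from an upstream HR system we do not own.
hr_feed = [
    {"id": "e1", "email": " Alice@Example.COM ", "displayName": "Alice   Example"},
    {"id": "e2", "email": "bob@example.com", "displayName": " Bob Example"},
]

identity_tier = build_tier(hr_feed)
```

The upstream feed is left untouched; the corrections live only in the tier we operate, which is where OIDS-aware products would read from.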

Third, OIDS would be a major step toward getting IAM practitioners out of the business of data management. We have enough on our plates without also moonlighting as data management experts. Identity data is just data at the end of the day. Our applications (the IAM infrastructure) should be managed like any other enterprise application - at least from a data perspective. By standardizing schema, we create the opportunity for enterprises with the resources to build their own data tiers to power their IAM infrastructure. That data tier might live within a security data lake to benefit from adjacency with security-oriented contextual data, stand alone, or, in CIAM use cases, be a part of the customer data platform (CDP). The point is that we, as IAM practitioners building modern IAM architectures, need the help of data management practitioners, and OIDS is a step in that direction.

Finally, OIDS adds other value too… ask me about bring-your-own-AI-models next year ;-)

Why would IAM vendors support OIDS?

In modern IAM, the battle for customer dollars will be won by demonstrating superior orchestration and policy capabilities, ones that can use all three kinds of IAM data: admin-time, run-time, and event-time. You, as a vendor, are not going to win any customers on the strength of your 3rd normal form. Supporting OIDS acknowledges that the real value of your IAM tools lies elsewhere.

Additionally, the data that lives in IAM tools is the customer’s data after all. They should be able to manage it as they see fit, whether that means storing it in a repository you provide, or one that they provide.

By the way, you, as a vendor, likely already support a standardized data schema in other parts of your products: the Open Cybersecurity Schema Framework (OCSF). OCSF provides a means for standardized logging of critical security-related events. While there is an IAM category in the schema, it is focused on recording events like authentication, not on core data entities such as User, Person, or Entitlement. Those entities are what OIDS is proposing to standardize.

Lastly, supporting OIDS helps reduce your time to value. If you do not have to spend months marshalling data, getting data sources wired up, dealing with the inevitable quality issues, you can demonstrate the value of your offering faster, enabling your customer to see returns from their investment sooner, and, ideally, leading to better outcomes for both the customer and yourself.

What next?

I ran this idea by the captive audience at Gartner IAM last week. I talked to large enterprises who had the resources and willingness to build and maintain their own identity data tiers, and they immediately saw the value. I talked to vendor architects and strategists who also saw the long-term innovation this could enable. So my 2025 New Year’s wish is to capitalize on this support, find a home in a proper standards body, enlist some data management experts, and get to building v1 of OIDS. Who’s with me?