Before we get into metadata management, we must understand what metadata really is. To put it in simple terms, it is data about data. Metadata started as walls full of drawers, filled with little cards that showed library users where to find specific information. Librarians would meticulously catalogue information from each book onto cards categorised by keyword and subject. Today, this same practice is called Metadata Management, and it is responsible for maintaining a system of record of the information contained in each enterprise data lake.

Any time someone tries to tell you that metadata is ‘meaningless, don’t worry, it’s just who you call, it’s just phone records, it’s not a big deal’ – realize we kill people based on metadata. So they must be pretty darn certain that they think they know something based on metadata.

– Rand Paul

For enterprises, though, the question of metadata is a little less obvious. Enterprise data is:

  • Aggregated from all kinds of sources
  • Aggregated at different points in time
  • Aggregated by different people
  • Moved between multiple storage systems
  • Accessed by multiple applications and users
  • Transformed and modified at different instances
  • And ultimately managed till obsolescence

Metadata here could be defined as a system of record of every transaction made on enterprise data.

Why metadata

If you walked into a library with a thought in your head and nothing else, what you would experience is the weight of all the information contained in the library. That is not something you would enjoy, which is why search engines became so popular: they help put things into perspective before they display 4 million search results for your keyword or key phrase.

What metadata does is:

  • Traceability: It provides an authentic record of where the data came from, when and by whom it was acquired, where it was stored at different points in time, when it was last used, and by whom.
  • Searchability: As a system of record, it speeds up enterprise search, which is critical at a time when real-time access to information is crucial for decision making.
  • Validity: It supports the accuracy and authenticity of information, which are of paramount importance in industries such as oil and gas. The validity of data, especially geospatial data such as GIS maps of seismic data, can have a significant impact on exploration costs and ultimately productivity.
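To make the traceability point concrete, here is a minimal sketch of what a single metadata record might look like. The class names, field names and sample values are assumptions made for illustration; they are not taken from any particular metadata management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class AccessEvent:
    """One entry in a data set's usage history (hypothetical structure)."""
    user: str
    when: datetime
    location: str  # storage system the data lived on at the time


@dataclass
class MetadataRecord:
    """Illustrative metadata record; all field names are assumptions."""
    dataset_id: str
    source: str            # where the data came from
    acquired_by: str       # who acquired it
    acquired_at: datetime  # when it was acquired
    history: List[AccessEvent] = field(default_factory=list)

    def record_access(self, user: str, when: datetime, location: str) -> None:
        # Append to the history rather than overwrite, so the trail is kept.
        self.history.append(AccessEvent(user, when, location))

    def last_access(self) -> Optional[AccessEvent]:
        # The most recent event by timestamp, or None if never accessed.
        return max(self.history, key=lambda e: e.when, default=None)
```

A record like this answers the traceability questions directly: `source`, `acquired_by` and `acquired_at` cover where the data came from, while `history` and `last_access()` cover where it was stored and who used it last.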

Metadata management

Given its importance, metadata needs to be managed effectively for enterprise data to be productive and efficient. Before we step into what that encompasses, here is how metadata management is defined:

the end-to-end process and governance framework for creating, controlling, enhancing, attributing, defining and managing a metadata schema, model or other structured aggregation system, either independently or within a repository and the associated supporting processes (often to enable the management of content)

Imagine terabytes, or rather petabytes, of data in the form of maps, forms, whitepapers, reports, machine data, and a host of other formats that comprise enterprise data. Now imagine trying to unearth insights from this data hoard. With metadata, however, each of these data sets is categorised and catalogued under logical search strings that allow search engines to find the proverbial needle in the haystack. So when you're looking to identify the most productive location for drilling, the search engine can pull information from seismic data, drilling data, contract reports, logistics data and a host of other reports to give you information that can help you make that decision quickly and effectively. However, it all starts with creating the metadata registry.
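One simple way to picture how catalogued data sets become searchable is an inverted index that maps each keyword to the data sets tagged with it. The sketch below is only an illustration; the function names, data set names and keywords are hypothetical, and real tools are likely to use far richer indexing.

```python
from collections import defaultdict


def build_index(catalogue):
    """Map each keyword to the set of data set names tagged with it.

    `catalogue` is a dict of data set name -> iterable of keywords.
    """
    index = defaultdict(set)
    for name, keywords in catalogue.items():
        for keyword in keywords:
            index[keyword.lower()].add(name)
    return index


def find(index, *terms):
    """Return the data sets tagged with every one of the given terms."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*matches) if matches else set()


# Hypothetical catalogue spanning several kinds of enterprise data.
catalogue = {
    "seismic_survey_a": ["seismic", "block-7"],
    "drilling_log_b": ["drilling", "block-7"],
    "contract_report_c": ["contract"],
}
index = build_index(catalogue)
```

With this index, `find(index, "block-7")` pulls the seismic survey and the drilling log together even though they are different kinds of data, which is the cross-source lookup described above.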

Metadata discovery or harvesting

As a process, metadata management starts with Metadata Discovery, where we search enterprise data for logical associations within data sets and map them to search criteria. This forms the metadata registry. It works at three levels:

  • Lexical Matching: Where data is matched by words contained in the data or the description of said data. This includes matches for exact text, synonyms, and lexical patterns contained in the data sets.
  • Semantic Matching: Where the data is further mapped to search strings that are related in meaning to the lexical matches.
  • Statistical Matching: Where data sets are matched to results of statistical analysis such as standard deviations in logistics records.
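Two of these levels can be sketched in a few lines. The example below is one possible interpretation, not how any specific tool works: lexical matching is reduced to exact words plus a small hand-written synonym table, and statistical matching is reduced to flagging values that sit far from the mean in units of standard deviation. The synonym table, function names and threshold are assumptions, and semantic matching is left out because it usually depends on a domain vocabulary or ontology.

```python
import re
from statistics import mean, pstdev

# Hypothetical synonym table; a real registry would maintain its own.
SYNONYMS = {"well": {"borehole"}, "seismic": {"geophysical"}}


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_match(description, term, synonyms=SYNONYMS):
    """True if the description contains the term or one of its synonyms."""
    term = term.lower()
    candidates = {term} | synonyms.get(term, set())
    return bool(tokenize(description) & candidates)


def statistical_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) > threshold * sigma]
```

For instance, `lexical_match("Borehole survey report", "well")` matches through the synonym table, while `statistical_outliers` could be run over delivery times in logistics records to tag the data sets that contain unusual entries.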

Metadata management tools

There is a multitude of metadata management tools, from IBM to Informatica to Esquire. The list is long, and each has its specific benefits and challenges. For each enterprise, it is important that the tool is selected based on:

  • The size of the data lake
  • The diversity in data formats
  • Compatibility with storage systems
  • Compatibility with enterprise applications
  • And most importantly, the business case for metadata management

Each of the available tools offers automated metadata discovery and diverse features that help in metadata management. Ultimately, it is the enterprise-specific business case that will help identify the most efficient metadata management solution to implement.