When I wrote about the introduction of our new release of ASG Enterprise Data Intelligence and its associated Intelligent Data Catalog, I talked about the richness and complexity of the new data world. An important, but murky, part of that new world is the Data Lake.
The Data Lake is a deceptively simple idea. There is so much data in the new data world, and it accumulates so quickly and in such variety, that the complexity is daunting. The Data Lake is a massive repository for all that data in all its variety. It contains a tremendous amount of useful information, and many tools are emerging to put it to use: new programming languages, self-service data preparation tools, statistical packages and self-service analytic tools.
Storing all the data in the Data Lake means that we generally know where the data is. But how do we find the data we need to work on a particular business problem? There’s a risk that the valuable information we need is lost in the weeds! The Data Lake is a pretty useful metaphor when I think about it. Many sources feed a lake, and not all of them are clean. Unmanaged inflows can pollute it. And, of course, a lake can have more than one outflow, and those outflows might need controlling. The same is true of the Data Lake: it has many sources of varying quality, uncontrolled modification of data can pollute it with “toxic” data, and you definitely want to manage who can get what out of it.
The tools for using the data in the Lake offer sophisticated business users – Citizen Data Scientists – opportunities that have never been available to them before. But there’s a problem: how do they find the right data to use with those great new tools? Solving it is the purpose of a new capability of Data Intelligence solutions – the Intelligent Data Catalog.
The data catalog builds the inventory of data assets available for exploration. Ideally, the data catalog also relates data assets, as far as possible, to aspects of the business. Data assets that are not placed in a business context and connected to business terms are of little value to business users. Another key element is the inclusion of collaboration and social capabilities that let users work together and add their own knowledge to the knowledge the catalog builds automatically.
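To make those three ingredients concrete (an inventory of assets, links to business terms, and user-contributed knowledge), here is a minimal Python sketch. It is only an illustration under my own assumptions: the class names, fields and sample assets are hypothetical and do not describe how the Intelligent Data Catalog is actually built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One inventory entry (hypothetical structure, for illustration only)."""
    name: str
    location: str
    business_terms: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)


class DataCatalog:
    """A toy catalog: inventory, business-term links, and user comments."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        # Build the inventory of data assets.
        self._assets[asset.name] = asset

    def link_term(self, asset_name: str, term: str) -> None:
        # Place an asset in business context (terms stored in lower case).
        self._assets[asset_name].business_terms.add(term.lower())

    def add_comment(self, asset_name: str, comment: str) -> None:
        # Collaboration: users add their own knowledge to an asset.
        self._assets[asset_name].comments.append(comment)

    def find_by_term(self, term: str) -> list[str]:
        # Case-insensitive lookup of assets linked to a business term.
        wanted = term.lower()
        return sorted(
            a.name for a in self._assets.values() if wanted in a.business_terms
        )

    def unlinked(self) -> list[str]:
        # Assets with no business context yet.
        return sorted(
            a.name for a in self._assets.values() if not a.business_terms
        )


catalog = DataCatalog()
catalog.register(DataAsset("orders_2017", "lake/raw/orders"))
catalog.register(DataAsset("clickstream", "lake/raw/web"))
catalog.link_term("orders_2017", "Customer Order")
catalog.add_comment("orders_2017", "Excludes cancelled orders.")

print(catalog.find_by_term("customer order"))
print(catalog.unlinked())
```

In this sketch, a search on the business term finds `orders_2017`, while `clickstream` shows up only in the list of unlinked assets: it is in the inventory, but a business user searching by business language would never find it.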
Most current data catalog solutions are limited in scope: they focus on the Data Lake, and the most popular use of the data is exploring new analytic and business opportunities. But Big Data technologies are quickly being exploited for other things, and the Intelligent Data Catalog will expand to become the central source of Data Intelligence, covering operational systems, data warehouses, analytic data stores, and reference and master data, as well as the Data Lake. It might even be the Data Intelligence store for the Internet of Things.
Data catalogs have a great future. But the present reality is that a lot of data still isn’t in the Data Lake. The systems that run businesses still depend on traditional data stores, and many data warehouses still have a relational database foundation. How is that challenge to be addressed? ASG Technologies is solving the problem by using the common thread, the business model and language captured in the business glossary, to bridge the enterprise repository and the Intelligent Data Catalog. The broad search capability that allows me to “Find My Data” is a key searchlight for the Data Lake. There are a lot of juicy fish in that lake – but there are many weeds as well!
8/31/2017 Ian Rowlands
