ASG Perspectives

Data Curation with an Intelligent Data Catalog Shortens Time to Data Value

According to a 2018 survey of almost 24,000 respondents across 147 countries, the average data professional spends only 12% of the time finding insights, and 20% building data science models and putting them into production. To get more value out of data, data professionals must devote less time to data preparation and more time to data use. Data curation is crucial for taming the Four Big Data Vs: volume, velocity, variety and veracity. It’s a discipline that frees up time for value-creating activities by providing data consumers (data scientists, analysts and citizen data scientists) with data ready for analysis, increasing their productivity and the value they create from data.
 
Statistician and writer Nate Silver pointed out, “As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate.” That’s the volume/velocity/variety/veracity problem, the problem of “signal to noise.”
There’s so much data that finding real value requires data expertise and a structured approach to get to the “good” data. Machine learning and artificial intelligence can help, but only as part of a managed process in combination with subject matter expertise.

The Four Vs are often used to describe big data, but they are really about the opportunity for new insights and the need for new approaches. The point of view has grown from, “let’s decide what to do and design, and then get the right data,” to include, “and let’s capture and understand data, and learn what we can from it.” This shift in perspective changes the data quality issue—the question of veracity and of how far data can be trusted. Instead of being fixed, data quality requirements vary, depending on how the data is used. Curation is critical to establishing and recording quality attributes.
 
Data curators follow a path:
  1. Discover data
  2. Select the “right” data for the task
  3. Import the data into the right datastore
  4. Prepare the data for use—which includes cleansing and reducing duplication
  5. Improve the data—based on experience gained as it is used
  6. Share the data—make sure it can be found and communicated
  7. Govern the data—which includes understanding the quality and determining access rights
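
Step 4 of the path above, cleansing and reducing duplication, can be illustrated with a minimal Python sketch. It is not part of any ASG product; the record fields (`name`, `email`), the sample values and the rule "same email means same record" are assumptions made purely for illustration.

```python
# Minimal illustration of "cleansing and reducing duplication".
# The field names and the matching rule are hypothetical, not taken
# from any specific product or data set.

def cleanse(record):
    """Normalize whitespace and letter case so equivalent records compare equal."""
    return {
        "name": " ".join(record["name"].split()),   # collapse repeated spaces
        "email": record["email"].strip().lower(),   # trim and lowercase
    }

def deduplicate(records):
    """Cleanse each record and keep the first one seen per email address."""
    seen = set()
    unique = []
    for record in map(cleanse, records):
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique

raw = [
    {"name": "Ada  Lovelace", "email": "ADA@example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

prepared = deduplicate(raw)
for row in prepared:
    print(row)
```

In practice the matching rule is a curation decision in its own right: which fields identify a duplicate depends on how the data will be used.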
 
Curation is collaborative and iterative. One person may be designated as a curator, but several people are likely to be involved in the curation process. Using information builds understanding, which can then be used to improve the data, and the improved data can drive added value.

ASG’s Data Intelligence provides support for the entire data curation process.

Data Discovery: The platform discovers data in a wide range of data sources across mainframe, distributed and cloud environments. Your organization's expert users can be authorized to tag content with relevant concepts, add descriptions and more, to enhance discovery and understanding of the data. Faceted search leads users to the best data options for any challenge.

Data Selection: Users build knowledge about discovered data with lineage, profiling and data sampling. Crowdsourced ratings and comments, and the business glossary, add to that knowledge and allow the best data to be selected for any use case.

Data Import: Having identified a useful source of data, the curator builds a data set. A data set supports logical grouping of information about data elements within the same or different data catalogs. Data assets or features of interest are assigned to a data set and the visibility specified to control which users can view the data set.

Data Preparation: A user looking for information can discover data sets and then request access. For more extensive data preparation needs, the platform provides a pipeline that gives authorized users access to the data for preparation and analysis. Features such as a plugin to Google’s Cloud Data Fusion (previously the Cask™ Data Application Platform or CDAP) support self-service preparation, transformation and cleaning of data for analysis. Data and metadata can be retrieved for data profiling, sampling and pattern matching.

Data Improvement: The expert, authorized users in your organization continue to build tribal knowledge and enhance the data sets with ratings, recommendations and comments on data value. They can also flag issues when they find problems while profiling data and provide indications of data quality: completeness, value distribution, value ranges and data patterns. On top of this is the “wisdom of crowds,” as data users apply ratings and recommendations to comment on the value of data and flag issues when they find problems. Adding governed searchable tags and linking them to ASG’s Business Glossary helps with data classification.
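
To make the four data quality indicators named above concrete (completeness, value distribution, value ranges and data patterns), here is a minimal Python sketch of a column profile. It is a generic illustration, not ASG Data Intelligence's implementation. The function names `profile_column` and `to_pattern`, the sample postal codes and the pattern notation (`A` for a letter, `9` for a digit) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical helper names and sample data, for illustration only.

def to_pattern(text):
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    text = re.sub(r"[A-Za-z]", "A", text)
    return re.sub(r"[0-9]", "9", text)

def profile_column(values):
    """Compute simple quality indicators for one column of raw values."""
    present = [v for v in values if v not in (None, "")]

    # Completeness: share of values that are not missing.
    completeness = len(present) / len(values) if values else 0.0

    # Value distribution: how often each distinct value occurs.
    distribution = Counter(present)

    # Value range: min and max over the values that parse as numbers.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass
    value_range = (min(numeric), max(numeric)) if numeric else None

    # Data patterns: how often each value shape occurs.
    patterns = Counter(to_pattern(str(v)) for v in present)

    return {
        "completeness": completeness,
        "distribution": distribution,
        "value_range": value_range,
        "patterns": patterns,
    }

postal_codes = ["12345", "98765", "", "AB123", None, "12345"]
profile = profile_column(postal_codes)
print(f"completeness: {profile['completeness']:.2f}")
print("most common patterns:", profile["patterns"].most_common())
```

A minority pattern such as `AA999` in a column dominated by `99999` is the kind of finding a curator might flag as an issue or record as a quality note.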

Data Sharing: When data sets have been created, users can request access with a built-in workflow. Then they can use their preferred tools. The platform has built-in support for Jupyter notebooks, supporting and accelerating the data science workflow. The Data Intelligence platform provides a pipeline to allow users to access their datasets in Tableau, the leading business intelligence and data visualization platform. When they have completed their analysis and created an insightful new report, visualization or another dataset, they can then push it out to other catalog users to continue to build upon their efforts, creating a knowledge-building “virtuous circle.”

Data Governance: Finally, the Data Intelligence platform supports the creation of data value while mitigating data risks, solving the problems of searchability and simplifying the process of routing the right data to the right people. Role and workflow management supports the provisioning and problem resolution processes; tagging makes data easier to find, and ratings and recommendations allow data to be flagged for usability and trustability.

ASG Data Intelligence offers a range of capabilities to support data curation that reduce the amount of time spent searching for and preparing data. It’s an essential tool for data curators, who provide the vital link between business and data. They need a broad range of skills, and tools that simplify their work are critical to their success.

To learn how ASG Data Intelligence can support your organization’s data curation, visit this product page. For more information on the Three Cs of Big Data (curation, crowdsourcing and collaboration), read this blog post.
 
Silver, Nate. The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t. Penguin Books.