Do you have data sets scattered all over the place: multiple local folders, Git repos, cloud services, databases? Is it sometimes difficult to remember which data set contains what, and where they’re all stored?
Thanks to the pointblank R package, you can document each data set with an R script that generates a data dictionary report. The report not only describes column types and data provenance, but can also record where the data set is stored, how it gets updated, which key projects (if any) use it, and anything else you’d like to add. Because each report is generated by an R script, you can include whatever metadata fields are important to you, then reuse that same structure for every data set.
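As a rough illustration of what such a script might look like, here is a minimal sketch built on pointblank's informant functions. The data set (R's built-in `airquality`), the field names, and every description, storage location, schedule, and project name below are placeholders chosen for this example, not details from the article.

```r
library(pointblank)

# Start a data dictionary ("informant") for a table.
# airquality ships with base R and stands in for a real data set.
informant <- create_informant(
  tbl = airquality,
  tbl_name = "airquality",
  label = "Data dictionary sketch"
)

# Table-level metadata. The field names are free-form; these are
# hypothetical examples of fields you might standardize on.
informant <- info_tabular(
  informant,
  Description = "Daily air quality measurements (placeholder text).",
  Storage = "Placeholder: shared drive, Git repo, or database name.",
  `Update schedule` = "Placeholder: e.g., refreshed monthly."
)

# Column-level notes for a single column.
informant <- info_columns(
  informant,
  columns = "Ozone",
  Info = "Mean ozone reading (placeholder description)."
)

# A custom section for anything else, such as projects that use the data.
informant <- info_section(
  informant,
  section_name = "Projects",
  `Used by` = "Placeholder: list of projects."
)

# Build the report object; printing it in an interactive session
# renders the data dictionary.
report <- get_informant_report(informant, title = "airquality data dictionary")
```

Keeping the same set of `info_tabular()` and `info_section()` fields in every script is one way to get a consistent structure across data sets.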