Document type:
Publication date:
Hussain Shah, Syed Ashfaq
Hussain Shah, Syed Ashfaq; Petzold, Frank
Author affiliation:
Data profiling and migration processes
End date of data production:
Subject area:
DAT Data processing, computer science; INF Information science, library, documentation, archive and museum studies; WIR Economics; WIS Science studies
Resource type:
Text documents
Data type:
Texts
Other data type:
Code, JSON
In large research projects, participants are required to migrate their data to the official information infrastructure and to adopt that infrastructure for subsequent research activities. This package contains a migration plan and the corresponding data, which were created to migrate the data and users of large Collaborative Research Centres (CRC), e.g. TRR277 AMC (a CRC funded by the German Research Foundation, DFG).

This data package comprises two code packages, an example data profile, data profile templates, and a plan for the migration of data and users. The first code package is an example framework for profiling data prior to migration; its current implementation uses the WebDAV interface offered by PowerFolder as its example. The second code package creates directory structures and the corresponding metadata files in bulk for data to be placed in Data Science Storage (DSS). It creates all entities according to the specified naming convention for projects, work packages, storages and folders, together with the corresponding metadata templates. For details about the templates and structures, please refer to the data package “Simplified DataCite compliant metadata templates and directory structures to manage research data”. The naming convention maintains the contextual information and facilitates the integration of the data in TUM Workbench. The profile templates are in tabular form, which also suits spreadsheets and web or other digital forms. The example data profile, generated by the data profiling application, is in JSON format. It has been truncated to remove redundant information and edited to mask personal information, e.g. values are replaced with ###.
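
The bulk creation of directories with metadata templates and the generation of a masked JSON profile, as described above, can be illustrated with a minimal sketch. It is not the code shipped in this package: the function names, the naming pattern `<project>_<work package>_<storage>`, the metadata fields and the file name `metadata.json` are all assumptions made for illustration, and a local directory stands in for the WebDAV interface.

```python
import json
import tempfile
from pathlib import Path


def create_structure(root, projects, work_packages, storages, folders):
    """Create one folder per project / work package / storage / folder
    combination and place a metadata template file in each."""
    created = []
    for project in projects:
        for wp in work_packages:
            for storage in storages:
                for folder in folders:
                    # Hypothetical naming convention: <project>_<wp>_<storage>
                    target = Path(root) / project / f"{project}_{wp}_{storage}" / folder
                    target.mkdir(parents=True, exist_ok=True)
                    # Hypothetical template fields, left empty for later completion
                    template = {
                        "title": "",
                        "creator": "",
                        "project": project,
                        "work_package": wp,
                        "storage": storage,
                    }
                    (target / "metadata.json").write_text(
                        json.dumps(template, indent=2), encoding="utf-8"
                    )
                    created.append(target)
    return created


def profile_tree(root, mask="###"):
    """Build a simple profile of all files below root.
    Personal information (here: the owner) is replaced by the mask."""
    root = Path(root)
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "owner": mask,  # masked, as in the published example profile
            })
    return {
        "file_count": len(files),
        "total_bytes": sum(f["size_bytes"] for f in files),
        "files": files,
    }


if __name__ == "__main__":
    # Demonstration in a temporary directory; identifiers are made up.
    with tempfile.TemporaryDirectory() as tmp:
        create_structure(tmp, ["A01"], ["WP1", "WP2"], ["DSS"], ["raw", "processed"])
        print(json.dumps(profile_tree(tmp), indent=2))
```

Encoding the project, work package and storage in each directory name keeps the context of a folder recoverable from its path alone, which is the property the naming convention is said to provide.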

The contents of this data package are based on the AMC-specific policy, information infrastructure, project distribution and organisation, nature of the data, etc. For details about the policy, please refer to the data package “Research data Management policy for large CRC projects”. Target groups may adapt the defined strategy and procedures to their own policy, information infrastructure, distribution of tasks, data organisation schemes, etc.
Key words:
Data profiling, Data migration, User migration, Plan, PowerFolder, WebDAV, Framework, Metadata, Template, Data organisation, Data package generator, Project packaging, Digital object packaging, Data bundle, Scheme, RDM, Research data management, CRC, Collaborative research centre
Technical remarks:
View and download (12 MB total, 4 Files)
The data server also offers downloads with FTP
The data server also offers downloads with rsync (password m1735421):
rsync rsync://
Horizon 2020:
DFG, Project Number 414265976 - TRR 277