This repository contains the preprocessed data and pretrained model from the ICLR 2021 paper Language-Agnostic Representation Learning of Source Code from Structure and Context. The preprocessed data comes from the CodeSearchNet Challenge (https://github.com/github/CodeSearchNet/) as well as the code2seq paper (Java-small, Java-medium, Java-large; https://github.com/tech-srl/code2seq/). We first use GitHub's semantic library to construct an abstract syntax tree (AST) for each code snippet, and then compute distance metrics on the AST, as described in the paper. The models were trained using PyTorch. See the paper and code for details.
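
To illustrate the preprocessing idea, here is a minimal sketch of building an AST for a snippet and computing a pairwise shortest-path distance matrix over its nodes. This is not the repository's actual pipeline: the released data was produced with GitHub's semantic parser and the specific distance relations defined in the paper, while the sketch below uses Python's built-in ast module and networkx purely for illustration.

```python
# Minimal sketch: parse a snippet into an AST and compute pairwise
# shortest-path distances between AST nodes. The real pipeline uses
# GitHub's semantic parser and the distance relations from the paper;
# `ast` and networkx are stand-ins here for illustration only.
import ast
import networkx as nx


def ast_to_graph(source: str) -> nx.Graph:
    """Parse `source` and return its AST as an undirected graph over node ids."""
    tree = ast.parse(source)
    graph = nx.Graph()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            graph.add_edge(id(parent), id(child))
    return graph


def shortest_path_distances(graph: nx.Graph) -> dict:
    """Pairwise shortest-path lengths between all AST nodes
    (one of several possible distance relations on the tree)."""
    return dict(nx.all_pairs_shortest_path_length(graph))


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    g = ast_to_graph(snippet)
    dists = shortest_path_distances(g)
    print(f"{g.number_of_nodes()} AST nodes")
```

Other relations (e.g. ancestor or sibling distances) can be derived from the same graph; see the paper for the exact set used to train the models.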