This repository contains the preprocessed data and pretrained model from the ICLR 2021 paper Language-Agnostic Representation Learning of Source Code from Structure and Context. The preprocessed data comes from the CodeSearchNet Challenge (https://github.com/github/CodeSearchNet/) as well as the code2seq paper (Java-small, Java-medium, Java-large; https://github.com/tech-srl/code2seq/). We first use GitHub's semantic library to construct an abstract syntax tree (AST) for each code snippet, and then compute distance metrics on the AST, as described in the paper. The models were trained using PyTorch. See the paper and code for details.
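
To illustrate the preprocessing idea, here is a minimal sketch of building an AST for a snippet and computing a pairwise shortest-path distance matrix over its nodes. This is not the repository's actual pipeline: the released data was produced with GitHub's semantic parser and the specific distance relations defined in the paper, while the sketch below uses Python's built-in ast module and networkx purely for illustration.

```python
# Minimal sketch: parse a snippet into an AST and compute pairwise
# shortest-path distances between AST nodes. The real pipeline uses
# GitHub's semantic parser and the distance relations from the paper;
# `ast` and networkx are stand-ins here for illustration only.
import ast
import networkx as nx


def ast_to_graph(source: str) -> nx.Graph:
    """Parse `source` and return its AST as an undirected graph over node ids."""
    tree = ast.parse(source)
    graph = nx.Graph()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            graph.add_edge(id(parent), id(child))
    return graph


def shortest_path_distances(graph: nx.Graph) -> dict:
    """Pairwise shortest-path lengths between all AST nodes
    (one of several possible distance relations on the tree)."""
    return dict(nx.all_pairs_shortest_path_length(graph))


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    g = ast_to_graph(snippet)
    dists = shortest_path_distances(g)
    print(f"{g.number_of_nodes()} AST nodes")
```

Other relations (e.g. ancestor or sibling distances) can be derived from the same graph; see the paper for the exact set used to train the models.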