Software packages
We offer these software packages to interdisciplinary researchers as tools and methods for their work.
- SCORES is a graphical user interface that supports social scientists in processing free-text responses from surveys. The free-text responses are first converted into vectors via small language models; the vectors are then clustered via K-Means and agglomerative clustering. All steps of the process are configurable and explained, to support researchers without a machine learning background as much as possible in using the methods.
- edist implements a variety of edit distances between sequences and trees, including backtracing and metric learning (Paaßen et al., 2018), in Cython. In particular, the library contains implementations of the Levenshtein distance, dynamic time warping, the affine edit distance, and the tree edit distance, as well as support for further edit distances via algebraic dynamic programming (Giegerich, Meyer, and Steffen, 2004). The library is available on PyPI via pip3 install edist (currently Linux only).
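The embed-then-cluster pipeline behind SCORES can be sketched in a few lines. Note that this is an illustrative stand-in, not SCORES' actual (graphical) interface: we replace the language-model embeddings with hand-crafted toy vectors and use a minimal K-Means with deterministic farthest-point initialisation.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Toy K-Means: cluster embedding vectors into k groups."""
    # farthest-point initialisation: deterministic and well spread out
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sqdist(v, c) for c in centers)))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # assignment step: each vector goes to its nearest center
        for i, v in enumerate(vectors):
            assignment[i] = min(range(k), key=lambda c: sqdist(v, centers[c]))
        # update step: move each center to the mean of its cluster
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assignment[i] == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment

# In SCORES, the vectors would come from a small language model;
# here we use 2D stand-ins for responses on two distinct topics.
vectors = [[0.1, 0.9], [0.2, 1.0], [0.9, 0.1], [1.0, 0.2]]
print(kmeans(vectors, k=2))  # → [0, 0, 1, 1]
```

In the actual tool, the same clustering step runs on high-dimensional language-model embeddings, and the number of clusters is one of the configurable choices.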
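To illustrate what edist computes, here is a plain-Python version of the Levenshtein distance via dynamic programming. This is not edist's API, just the textbook recurrence that the library's Cython implementation accelerates (with unit costs; edist additionally supports custom costs, backtracing, and metric learning).

```python
def levenshtein(x, y):
    """Levenshtein distance between sequences x and y.

    D[i][j] is the edit distance between the first i symbols of x
    and the first j symbols of y.
    """
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i  # delete the first i symbols of x
    for j in range(1, n + 1):
        D[0][j] = j  # insert the first j symbols of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + 1,  # deletion of x[i-1]
                D[i][j - 1] + 1,  # insertion of y[j-1]
                D[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else 1),  # replacement
            )
    return D[m][n]

print(levenshtein("kitten", "sitting"))  # → 3
```

The tree edit distance and dynamic time warping follow the same dynamic programming pattern, only with different recurrences.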
Reference implementations
These software packages are meant as reference implementations accompanying our publications; they support reproducibility and further research. The target audience is machine learning researchers, and this software will generally be less polished than the packages above.
Machine Learning for Education
Machine Learning on Structured Data
- Recursive Tree Grammar Autoencoders are recursive neural networks that can auto-encode tree data if a grammar is known. The autoencoding accuracy and optimization performance of this model are generally higher compared to autoencoders that encode trees sequentially or do not use grammar knowledge. Reference Paper
- Graph Edit Networks are graph neural networks which can model changes in time by predicting graph edits at each node. Reference Paper
- Reservoir Stack Machines are an extension of Reservoir Memory Machines (see below) with a stack as memory. This raises the computational power to deterministic context-free languages (above Chomsky type-3 but below type-2). Reference Paper
- Reservoir Memory Machines are an extension of Echo State Networks with an explicit memory. This enables the networks to solve computational tasks, such as losslessly copying data, that are difficult or impossible for standard recurrent neural networks (even deep ones). The memory extension also raises the computational power of ESNs from below Chomsky type-3 to above it. Reference Paper
- Tree Echo State Autoencoders are a model to auto-encode tree data in domains where a tree grammar is known. Since the model follows the echo state framework (in particular the tree echo state networks of Gallicchio and Micheli, 2013), it is very simple to train: given a list of training trees, an autoencoder can be set up within seconds. Reference Paper
- Unordered Tree Edit Distance provides an A* algorithm to compute the NP-hard unordered tree edit distance with custom costs. Reference Paper
- Adversarial Edit Attacks provides an approach to attack classifiers for tree data using tree edits. Reference Paper
- Linear Supervised Transfer Learning provides a simple expectation maximization scheme to learn a mapping from a target space to a source space, based on a labelled Gaussian mixture model in the source space and very few target-space data points. Reference Paper
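The idea of learning a linear map from target to source space can be illustrated with a much-simplified sketch: here we work in one dimension and use hard, label-based assignments, so the M-step reduces to an ordinary least-squares fit of a scalar. The actual method handles multivariate data and soft EM responsibilities; all names and numbers below are illustrative.

```python
def fit_linear_map(target_points, labels, source_means):
    """1D sketch of the transfer-learning M-step: find the scalar h
    minimising sum_i (h * x_i - mu_{y_i})^2, where mu_{y_i} is the
    source-space class mean of point i. Closed-form least squares:
    h = sum(x_i * mu_{y_i}) / sum(x_i ** 2).
    """
    num = sum(x * source_means[y] for x, y in zip(target_points, labels))
    den = sum(x * x for x in target_points)
    return num / den

# toy example: the target space is the source space scaled by 0.5,
# so the learned map back to the source space should be roughly h = 2
source_means = {0: -1.0, 1: +1.0}         # class means of the source GMM
target_points = [-0.5, -0.45, 0.5, 0.55]  # very few labelled target points
labels = [0, 0, 1, 1]
print(round(fit_linear_map(target_points, labels, source_means), 2))  # → 1.99
```

With unlabelled target points, the full EM scheme would alternate between estimating class responsibilities under the source Gaussian mixture and re-fitting the map.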