Welcome to string2string’s documentation!

The string2string library is an open-source tool that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes both traditional algorithmic solutions and recent advanced neural approaches to address various problems in pairwise string alignment, distance measurement, lexical and semantic search, and similarity analysis. Additionally, the library provides several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods.

The library features notable algorithms such as the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. Where appropriate, it also wraps existing efficient and widely used implementations of frameworks and metrics such as sacreBLEU and ROUGE.
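As an illustration of the kind of algorithm listed above, the following is a minimal, self-contained sketch of the Wagner-Fischer dynamic program for Levenshtein edit distance with unit costs. It does not use the string2string API and is not the library's implementation; the function name edit_distance is hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program.

    Illustrative sketch only (unit cost for insertion, deletion, and
    substitution); this is not the string2string implementation.
    """
    # previous[j] holds the distance between the first i-1 characters of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of `source` into "" takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Keeping only two rows of the table reduces memory to O(len(target)) while the running time stays O(len(source) * len(target)).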

In general, the string2string library seeks to provide broader coverage and greater flexibility than existing string-processing libraries. It can be used for many downstream applications, tasks, and problems in natural language processing, bioinformatics, and computational social science, which makes it a useful resource for researchers and practitioners across these fields.

Getting Started

Install the string2string library by running the following command in your terminal:

pip install string2string

Once the installation is complete, you can import the library and start using its functionality.
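To confirm that the installation succeeded before importing, a small check along these lines may help. It is a sketch that relies only on the Python standard library; the helper name is_installed is hypothetical and is not part of string2string.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be located."""
    # find_spec returns None when no importable module of that name is found.
    return importlib.util.find_spec(package_name) is not None


if is_installed("string2string"):
    print("string2string is available; 'import string2string' should work.")
else:
    print("string2string was not found; run: pip install string2string")
```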

Remark: We recommend using Python 3.7+ for the library.

Tutorials

Citation

@article{suzgun2023string2string,
   title={string2string: A Modern Python Library for String-to-String Algorithms},
   author={Suzgun, Mirac and Shieber, Stuart M and Jurafsky, Dan},
   journal={arXiv preprint arXiv:2304.14395},
   year={2023}
}

Thanks

Our project owes a debt of gratitude to the following individuals for their contributions, comments, and feedback: Federico Bianchi, Corinna Coupette, Sebastian Gehrmann, Tayfun Gür, Şule Kahraman, Deniz Keleş, Luke Melas-Kyriazi, Christopher Manning, Tolúlopé Ògúnrèmí, Alexander “Sasha” Rush, Kyle Swanson, and Garrett Tanzer.