Datasets

Clean, high-quality data can be hard to come by, so I've collected a few datasets that I scraped and processed myself. If you find any of them useful, feel free to reach out!

EU Press Materials on Digital Policy

A clean corpus of more than 4,000 documents scraped from the EU website

Legal Texts for Competition Impact Assessment

A labeled dataset of more than 4,000 paragraphs mapped to categories of potentially anticompetitive provisions