Clean, high-quality data can be hard to come by, so I’ve gathered some datasets I personally scraped and processed. If you find any of them useful, feel free to reach out!
A clean corpus of more than 4,000 documents scraped from the EU website
A labeled dataset of more than 4,000 paragraphs mapped to categories of potentially anticompetitive provisions