Published March 21, 2025 | https://doi.org/10.59348/65tfk-v7p63

A critical bibliography about LibGen, the pirate site that Meta used for AI training

  • 1. Birkbeck, University of London

Yesterday, academic social media went into overdrive as many intellectuals discovered LibGen ("Library Genesis") for the first time, thanks to an article and tool in The Atlantic.

It is quite amazing to me that people have only just come to this. There has been a substantial volume of academic literature on the subject for several years now and, as it affects all disciplines, I would have expected some engagement with these meta-subjects around access to academic research.

I suppose what is interesting is that most of the backlash that I saw against the "revelations" in The Atlantic was directed at the use by Meta. This is of note because, as you will discover if you trawl through the secondary reading recommended below, most of the founders of these pirate platforms see themselves as adherents of socialism, communism, and other left philosophies. They see themselves as providing access to all those, globally, who are unable to pay for access to knowledge. But giving things away for free means that anyone can access them, even the "baddies".

In any case, here's a secondary reading list of some of the material that's out there on LibGen (the books platform) and Sci-Hub (articles and journals). As a technical point: LibGen also acts as the backing store for Sci-Hub. Finally, more recent platforms like Anna's Archive are not yet well covered in the secondary literature. Some of these links may, by now, be dead. You may also want to keep a lookout for publications resulting from Lance Eaton's PhD work, which is focused on academic piracy (and for which I am on the committee).

That said, if you want the really short version, have a read of:

  • Bodó, Balázs, 'The Genesis of Library Genesis: The Birth of a Global Scholarly Shadow Library', in Shadow Libraries: Access to Educational Materials in Global Higher Education, ed. by Joe Karaganis (The MIT Press, 2018), pp. 25–52
  • Eve, Martin Paul, 'Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms', Digital Humanities Quarterly, 16.2 (2022) http://www.digitalhumanities.org/dhq/vol/16/2/000587/000587.html

Additional details

Identifiers

UUID
63e0ef8e-c464-485f-bc12-3158158ca8ff
GUID
https://doi.org/10.59348/65tfk-v7p63
URL
https://eve.gd/2025/03/21/a-critical-bibliography-about-libgen-the-pirate-site-that-meta-used-for-ai-training

Dates

Issued
2025-03-21T01:00:00
Updated
2025-03-21T01:00:00

References

  1. Andročec, D. (2017). Analysis of Sci-Hub downloads of computer science papers. Acta Universitatis Sapientiae, Informatica, 9(1), 83–96. https://doi.org/10.1515/ausi-2017-0006
  2. Banks, Marcus, 'What Sci-Hub Is and Why It Matters', American Libraries, 47.6 (2016), pp. 46–49
  3. Barok, Dušan, and others, 'In Solidarity with Library Genesis and Sci-Hub', Custodians.Online, 30 November 2015 / http://custodians.online
  4. Bodó, Balázs, 'Coda: A Short History of Book Piracy', in Media Piracy in Emerging Economies, ed. by Joe Karaganis (Social Science Research Council, 2011), pp. 399–413
  5. ——, 'Library Genesis in Numbers: Mapping the Underground Flow of Knowledge', in Shadow Libraries: Access to Educational Materials in Global Higher Education, ed. by Joe Karaganis (The MIT Press, 2018), pp. 53–78
  6. ——, 'The Genesis of Library Genesis: The Birth of a Global Scholarly Shadow Library', in Shadow Libraries: Access to Educational Materials in Global Higher Education, ed. by Joe Karaganis (The MIT Press, 2018), pp. 25–52
  7. Cheney, Matthew, 'Supporting Openness Should Not Mean Supporting Piracy', Finite Eyes, 29 March 2020 / https://finiteeyes.net/open/supporting-openness-should-not-mean-supporting-piracy
  8. Dulong de Rosnay, M. (2021). Open Access Models, Pirate Libraries and Advocacy Repertoires: Policy Options for Academics to Construct and Govern Knowledge Commons. Westminster Papers in Communication and Culture, 16(1). https://doi.org/10.16997/wpcc.913
  9. Elbakyan, Alexandra, 'How The Chronicle Is Trying to Malign Sci-Hub', Engineuring, 9 July 2021 / https://engineuring.wordpress.com/2021/07/09/how-the-chronicle-is-trying-to-malign-sci-hub
  10. ——, 'Why Sci-Hub Is the True Solution for Open Access: Reply to Criticism', Engineuring, 24 February 2016 / https://engineuring.wordpress.com/2016/02/24/why-sci-hub-is-the-true-solution-for-open-access-reply-to-criticism
  11. ——, 'Sci-Hub Is a Goal, Changing the System Is a Method', Engineuring, 11 March 2016 / https://engineuring.wordpress.com/2016/03/11/sci-hub-is-a-goal-changing-the-system-is-a-method
  12. Elsevier contributors. (2019). Allegations linking Sci-Hub with Russian Intelligence. Elsevier. https://www.elsevier.com/connect/allegations-linking-sci-hub-with-russian-intelligence
  13. Eve, Martin Paul, 'Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms', Digital Humanities Quarterly, 16.2 (2022) http://www.digitalhumanities.org/dhq/vol/16/2/000587/000587.html
  14. ——, Warez: The Infrastructure and Aesthetics of Piracy (punctum books, 2021)
  15. Faust, J. S. (2016). Sci-Hub. Annals of Emergency Medicine, 68(1), A15–A17. https://doi.org/10.1016/j.annemergmed.2016.05.010
  16. Green, T. (2017). We've failed: Pirate black open access is trumping green and gold and we must change our approach. Learned Publishing, 30(4), 325–329. https://doi.org/10.1002/leap.1116
  17. Greshake, B. (2016). Correlating the Sci-Hub data with World Bank Indicators and Identifying Academic Use [Data set]. In The Winnower. Authorea, Inc. https://doi.org/10.15200/winn.146485.57797
  18. Himmelstein, D. S., Romero, A. R., Levernier, J. G., Munro, T. A., McLaughlin, S. R., Greshake Tzovaras, B., & Greene, C. S. (2018). Sci-Hub provides access to nearly all scholarly literature. eLife, 7. https://doi.org/10.7554/elife.32822
  19. Hoy, M. B. (2017). Sci-Hub: What Librarians Should Know and Do about Article Piracy. Medical Reference Services Quarterly, 36(1), 73–78. https://doi.org/10.1080/02763869.2017.1259918
  20. Machin-Mastromatteo, J. D., Uribe-Tirado, A., & Romero-Ortiz, M. E. (2016). Piracy of scientific papers in Latin America. Information Development, 32(5), 1806–1814. https://doi.org/10.1177/0266666916671080
  21. Martin, J. D. (2016). Piracy, public access, and preservation: An exploration of sustainable accessibility in a public torrent index. Proceedings of the Association for Information Science and Technology, 53(1), 1–6. https://doi.org/10.1002/pra2.2016.14505301123
  22. Masnick, Mike, 'Academic Publishers Get Their Wish: DOJ Investigating Sci-Hub Founder For Alleged Ties To Russian Intelligence', Techdirt, 3 January 2020 / https://www.techdirt.com/2020/01/03/academic-publishers-get-their-wish-doj-investigating-sci-hub-founder-alleged-ties-to-russian-intelligence
  23. Maxwell, Andy, 'Meet the Guy Behind the Libgen Torrent Seeding Movement', TorrentFreak, 5 December 2019 / https://torrentfreak.com/meet-the-guy-behind-the-libgen-torrent-seeding-movement-191205
  24. Reisner, Alex, 'Search LibGen, the Pirated-Books Database That Meta Used to Train AI', The Atlantic, 20 March 2025 / https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094
  25. Russell, C., & Sanchez, E. (2016). Sci-Hub unmasked: Piracy, information policy, and your library. College & Research Libraries News, 77(3), 122–125. https://doi.org/10.5860/crln.77.3.9457
  26. Sar, Ernesto Van Der, 'Sci-Hub, BookFi and LibGen Resurface After Being Shut Down', TorrentFreak, 21 November 2015 / https://torrentfreak.com/sci-hub-and-libgen-resurface-after-being-shut-down-151121
  27. Schiermeier, Q. (2015). Pirate research-paper sites play hide-and-seek with publishers. Nature. https://doi.org/10.1038/nature.2015.18876
  28. Till, B. M., Rudolfson, N., Saluja, S., Gnanaraj, J., Samad, L., Ljungman, D., & Shrime, M. (2019). Who is pirating medical literature? A bibliometric review of 28 million Sci-Hub downloads. The Lancet Global Health, 7(1), e30–e31. https://doi.org/10.1016/s2214-109x(18)30388-7