Rogue Scholar Posts

Published in Andrew Heiss's blog

At the end of June 2024, Posit released a beta version of its next-generation IDE for data science: Positron. This follows Posit’s general vision for language-agnostic data analysis software: RStudio PBC renamed itself to Posit PBC in 2022 to help move away from a pure R focus, and Quarto is a pan-lingual successor to R Markdown.

Published in Stories by Research Graph on Medium

Author Amir Aryani (ORCID: 0000-0002-4259-9774)

Introduction: In this article we look at Research Graph as an information model and as an approach to connect research outputs, researchers and research activities and to capture the connections between them. We explore the metadata model, and we discuss how to capture this graph in a Neo4j Graph Database.

Published in Stories by Research Graph on Medium

How to use GROBID to extract text from PDF
Author Aland Astudillo (ORCID: 0009-0008-8672-3168)

GROBID is a machine-learning-based tool that extracts text from PDF files and other files into a structured format. One of the key challenges in knowledge mining from academic articles is reading the content of PDF files.

Published in Stories by Research Graph on Medium

Authors: Nakul Nambiar (ORCID: 0009-0009-9720-9233), Zhuochen Wu (ORCID: 0009-0000-5642-5348)

Research Graph is a structured representation of research objects that captures information about entities and the relationships between Researcher, Organisation, Publication, Grant and Research Data.

A novel approach to improving the efficiency of text search in graph databases utilizing Neo4j, OpenAI, and Typesense.
Authors Nakul Nambiar (ORCID: 0009-0009-9720-9233), Aishwarya Nambissan (ORCID: 0009-0003-3823-6609)

The ability to use cutting-edge tools and frameworks is essential for staying ahead in the ever-changing field of technology.

A brief overview of different types of clustering techniques and their algorithms.
Authors Aishwarya Nambissan (ORCID: 0009-0003-3823-6609), Amir Aryani (ORCID: 0000-0002-4259-9774)

Background: Clustering is a fascinating technique used in machine learning, where patterns or data points are grouped based on their similarities. It’s like finding hidden connections among different data points without predefined labels.

Published in Chris von Csefalvay
Author Chris von Csefalvay

Every year, the Commandant of the Marine Corps publishes a reading list of books that often only bear on warfighting tangentially at best. The idea behind this is that those entrusted with the lives of servicemembers should have an understanding of the world that goes beyond the profession of arms. In much the same way, I have been advising data scientists to go beyond professional literature.