Luxembourg Institute of Science and Technology
Background
Textual content is growing exponentially in every domain. As a result, reading the texts in full and extracting the information needed to make decisions takes ever more time and resources.
Papyrus, AI-supported visual analytics software for exploring unstructured text data, was designed to:
produce an overview of textual content and related topics;
visually represent closely related aspects to generate or refine hypotheses;
drill down into topics to focus on specific aspects and gather evidence for hypothesis validation.
Technology Overview
Because textual data is unstructured by nature, natural language processing (NLP) and text mining algorithms are required to extract meaningful patterns and to help analysts retrieve the information they need to answer their questions and make decisions. The Papyrus software implements a data processing pipeline that models the corpus and extracts latent topics. The derived data includes term importance, sets of co-occurring terms and their related documents, as well as topics and topic variants. The NLP components can also exploit knowledge databases or ontologies to annotate the text with domain-specific concepts (e.g. MeSH in medicine). A set of interactive visualizations is provided to analyze topics, drill down into topic variants and precisely locate the related subset of documents, without prior knowledge of the corpus content.
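To make two of the derived data types more concrete, here is a minimal Python sketch of term importance and of co-occurring terms with their related documents. It is an illustration only and not the Papyrus implementation: the use of TF-IDF as the importance measure, the simple tokenizer, and the function names `tokenize`, `term_importance` and `cooccurring_terms` are all assumptions introduced for this example. Topic extraction and ontology annotation are not covered.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


def term_importance(docs):
    """Score each term by its TF-IDF summed over the corpus.

    TF-IDF is an assumption here: it is one common importance measure,
    not necessarily the one used by Papyrus.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def cooccurring_terms(docs):
    """Map each pair of terms to the indices of the documents containing both."""
    pairs = {}
    for idx, doc in enumerate(docs):
        terms = sorted(set(tokenize(doc)))
        for pair in combinations(terms, 2):
            pairs.setdefault(pair, []).append(idx)
    return pairs


if __name__ == "__main__":
    corpus = [
        "vaccine trial results",
        "vaccine trial delayed",
        "bank reports earnings",
    ]
    print(term_importance(corpus).most_common(3))
    print(cooccurring_terms(corpus)[("trial", "vaccine")])
```

In this sketch, a term that appears in every document receives a score of zero, which is the usual behaviour of an unsmoothed inverse document frequency. The document indices returned for each term pair are what would allow a drill-down from a set of co-occurring terms to the related subset of documents.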
Benefits
Most software available on the market uses keyword search or taxonomies to give access to large document collections, which are often presented as lists of results. Papyrus allows the analyst to explore large textual corpora interactively without any a priori knowledge of the content (no prerequisite keywords). The content is structured automatically into topics that are visualized in a compact interactive overview (unlike common list-based approaches). When a topic of interest is selected, its structure is further analyzed and visualized, again in a compact and interactive manner that supports full drill-down. The combination of overview and detailed visualizations, together with the available interactions, is key to allowing the analyst to pinpoint a handful of documents of interest among thousands.
Applications
The Papyrus software applies to any collection of unstructured text data and is not restricted to a specific sector. Papyrus was initially built for investigative journalism, to analyze newspapers.
It has also been used by the pharmaceutical sector for scientific scoping reviews and, in the healthcare sector, to explore scientific articles related to COVID-19.
For the financial sector, Papyrus can help analysts explain predictions of financial indexes (as proposed in the Fintex software) or understand ESG (Environmental, Social and Governance) data disclosures in large documents such as annual reports, fund prospectuses or key investor information documents, to support the transition to sustainable finance and prevent greenwashing.
Opportunity
LIST is looking for industrial partners in the pharmaceutical, healthcare and financial sectors who face challenges in analyzing unstructured text data. LIST is also looking for scientific or industrial collaborations to improve its technology and develop vertical software assets fitting the needs of the industry, service or public sectors.