COVIPEDIA

A recommendation system for navigating COVID-19 research articles with NLP and Unsupervised ML Topic Modeling

The goal of this project is to build a recommendation system that helps scientists and researchers navigate the current surge of papers about COVID-19, find what is relevant to their work, and uncover hidden semantic relationships between articles. From the COVID-19 Open Research Dataset, I used the abstracts of the subset of articles from January 2020 to May 2021 (about 260,000 articles) as the text for this project. With an LDA model, I assigned each document a dominant topic along with its relevance to that topic, then grouped the articles by topic for the recommendation system, so researchers can look up articles on topics related to their work. Lastly, I deployed a Streamlit app on Heroku with a smaller dataset that recommends the top 20 related articles for a selected topic.
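The topic-assignment and recommendation steps described above can be sketched in plain Python. This is a minimal sketch, not the project's code: it assumes each article's topic distribution has already been produced by the trained LDA model as a list of (topic_id, probability) pairs (the shape gensim's `LdaModel.get_document_topics` returns). The helper names `dominant_topic`, `group_by_topic`, and `recommend`, and the sample article IDs and probabilities, are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers; topic distributions are assumed to come from a trained
# LDA model as (topic_id, probability) pairs.


def dominant_topic(topic_dist):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(topic_dist, key=lambda pair: pair[1])


def group_by_topic(doc_topics):
    """Group article IDs by dominant topic, most relevant first."""
    groups = defaultdict(list)
    for article_id, topic_dist in doc_topics.items():
        topic_id, prob = dominant_topic(topic_dist)
        groups[topic_id].append((article_id, prob))
    # Sort each topic's articles by relevance, highest probability first.
    for articles in groups.values():
        articles.sort(key=lambda pair: pair[1], reverse=True)
    return dict(groups)


def recommend(groups, topic_id, n=20):
    """Return up to n article IDs for the selected topic (20 in the app)."""
    return [article_id for article_id, _ in groups.get(topic_id, [])[:n]]


# Toy distributions standing in for LDA output (made-up values).
doc_topics = {
    "paper_a": [(0, 0.7), (1, 0.2), (2, 0.1)],
    "paper_b": [(0, 0.4), (1, 0.5), (2, 0.1)],
    "paper_c": [(0, 0.9), (1, 0.05), (2, 0.05)],
}
groups = group_by_topic(doc_topics)
print(recommend(groups, 0))  # ['paper_c', 'paper_a']
print(recommend(groups, 1))  # ['paper_b']
```

Sorting each group once up front reduces every later lookup to a slice, which suits an app that only needs the top articles for whichever topic the user selects.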

Tools

Python (NumPy, pandas)
langdetect, regex
spaCy, scispaCy ("en_core_sci_lg" model for biomedical, scientific, and clinical vocabulary)
NLTK
Gensim - LDA
WordCloud
Scikit-learn
pyLDAvis
Streamlit, Heroku

Techniques/Algorithms

Text Preprocessing
Data Transformation
Topic Modeling
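The text preprocessing and data transformation steps can be illustrated with a standard-library sketch. It is a simplified stand-in, not the project's pipeline: the project lists langdetect, spaCy/scispaCy, and NLTK for these stages, whereas this version uses only regular expressions and a small hand-written stopword set (an assumption for illustration), with no language filtering or lemmatization. The transformation step turns each abstract into bag-of-words (token_id, count) pairs, the same format gensim's `Dictionary.doc2bow` produces as LDA input. The function names and sample abstracts are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword set only; a real pipeline would use a full list such as
# NLTK's English stopwords.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "for", "with",
             "on", "we", "this", "was", "were", "are"}


def preprocess(abstract):
    """Lowercase, drop URLs and punctuation, tokenize, and filter tokens."""
    text = abstract.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s-]", " ", text)   # keep letters, digits, hyphens
    tokens = (tok.strip("-") for tok in text.split())
    # Drop stopwords, very short tokens, and pure numbers such as years.
    return [
        tok for tok in tokens
        if len(tok) > 2 and tok not in STOPWORDS and not tok.isdigit()
    ]


def build_vocabulary(docs):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(tokens, vocab):
    """Convert tokens to sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(vocab[tok] for tok in tokens if tok in vocab)
    return sorted(counts.items())


abstracts = [
    "The spike protein of SARS-CoV-2 binds ACE2 receptors.",
    "Vaccine trials for SARS-CoV-2 in 2020: efficacy of mRNA vaccine candidates.",
]
docs = [preprocess(a) for a in abstracts]
vocab = build_vocabulary(docs)
corpus = [to_bow(doc, vocab) for doc in docs]
print(docs[1])
print(corpus[1])
```

Swapping `preprocess` for a scispaCy-based lemmatizer would change only the token lists; the vocabulary and bag-of-words steps would stay the same.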

Application Usage

The model was built into a web application that uses a smaller dataset (due to GitHub's file size limit) for demo use.

To Learn More, Check Out My:

Blog
Code
App
Presentation (Coming Soon)

Note: The app can take a while to load... please be patient :)
