Development of a Text Mining System for Summarising Sickle Cell Disease Research in Nigeria
Keywords:
Text Mining, Natural Language Processing (NLP), Sickle Cell Disease (SCD), Biomedical Literature, Web Crawler, Machine Learning
Abstract
Information about diseased and healthy individuals is readily available online. Mining such biomedical content can provide valuable insight into patients’ conditions and research trends. However, this content is scattered across many sources, and gathering the relevant data manually with traditional search engines is laborious and tends to miss relevant material. This paper reports on an ongoing study that aims to develop a text mining system for summarising research findings on Sickle Cell Disease (SCD) in Nigeria and forecasting research trends. The system includes a focused web crawler, coupled with natural language processing (NLP) tools, that extracts and summarises abstracts from relevant biomedical databases such as PubMed and BMJ Journals. Implemented in Python, the crawler mimics each database's search function to gather relevant records systematically. Results so far show that the crawler successfully scraped structured data, including article titles and abstracts, using targeted keywords such as "sickcell Nigeria" and "sick cell Nigeria." The next step is to integrate NLP-based summarisation techniques and use the summarised findings to forecast research trends. This study contributes to biomedical text mining and offers a scalable approach to automating knowledge extraction in SCD research across Nigeria. Ongoing improvements aim to make the crawler more robust, enabling bulk page crawling and expansion to additional databases.
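The crawler's two basic steps, composing a search request from keywords and pulling titles and abstracts out of the returned page, can be illustrated with a minimal Python sketch. It is not the study's implementation: the base URL, the query parameter names (`term`, `page`), the CSS class names (`article-title`, `article-abstract`) and the sample page are all hypothetical placeholders, and a real database would need its own URL scheme and selectors. Only the Python standard library is used, and no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(base_url, keywords, page=1):
    """Compose a search URL the way a site's search form would.

    The parameter names 'term' and 'page' are assumptions for illustration.
    """
    return f"{base_url}?{urlencode({'term': keywords, 'page': page})}"


class ArticleParser(HTMLParser):
    """Collect the text of elements whose class marks a title or an abstract."""

    # Hypothetical class names; each real site needs its own mapping.
    FIELDS = {"article-title": "title", "article-abstract": "abstract"}

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per article found
        self._field = None   # field currently being read, if any
        self._tag = None     # tag that opened the current field

    def handle_starttag(self, tag, attrs):
        field = self.FIELDS.get(dict(attrs).get("class") or "")
        if field is None or self._field is not None:
            return
        if field == "title":
            # A title starts a new article record.
            self.records.append({"title": "", "abstract": ""})
        if self.records:
            self._field, self._tag = field, tag

    def handle_data(self, data):
        if self._field is not None:
            self.records[-1][self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            record = self.records[-1]
            # Collapse runs of whitespace left over from the page layout.
            record[self._field] = " ".join(record[self._field].split())
            self._field = self._tag = None


def parse_articles(html):
    """Return a list of {'title', 'abstract'} dicts found in an HTML page."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.records


# Placeholder page standing in for a downloaded search-result page.
SAMPLE_PAGE = """
<div class="result">
  <h2 class="article-title">Example title one</h2>
  <p class="article-abstract">First   placeholder
     abstract.</p>
</div>
<div class="result">
  <h2 class="article-title">Example <i>title</i> two</h2>
  <p class="article-abstract">Second placeholder abstract.</p>
</div>
"""

if __name__ == "__main__":
    print(build_search_url("https://example.org/search", "sick cell Nigeria"))
    for article in parse_articles(SAMPLE_PAGE):
        print(article["title"], "|", article["abstract"])
```

In a full crawler the sample page would be replaced by the response body fetched from the composed URL, and the `page` parameter would be incremented to walk through result pages in bulk.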