Named Entity Recognition for Uncovering Clinical and Emotional Entities from Breast Cancer Patient Interviews

Authors

  • Norma Alias Universiti Teknologi Malaysia, Skudai, Malaysia
  • Agus Sundari Universitas Islam Negeri Sjech M. Djamil Djambek Bukittinggi, Indonesia

DOI:

https://doi.org/10.30983/knowbase.v5i1.10192

Keywords:

Named Entity Recognition (NER), Breast Cancer, Clinical Entities, Emotional Entities, Rule-Based, Natural Language Processing

Abstract

This study develops a Named Entity Recognition (NER) system that identifies clinical and emotional entities in interview transcripts of breast cancer patients. The corpus was manually annotated with the BIO scheme across seven entity categories: Social Support (Dukungan Sosial), Medical Actions (Tindakan Medis), Diagnosis, Negative Emotions (Emosi Negatif), Positive Emotions (Emosi Positif), Symptoms (Gejala), and Spiritual. After annotation, a rule-based method supported by entity dictionaries and word normalization was implemented, and the model was evaluated using precision, recall, and F1-score. Social Support was the most frequent entity, with 347 occurrences, followed by Medical Actions and Diagnosis. The rule-based NER model achieved an F1-score of 0.50 for the Diagnosis entity, although its performance on emotional and social entities remained low due to data imbalance. These findings highlight the importance of integrating clinical and emotional aspects in natural language processing to gain a more comprehensive understanding of patient narratives. The proposed approach has potential applications in healthcare text mining for detecting emotional experiences and medical contexts, and it can be further enhanced by integrating transformer-based models such as IndoBERT to improve entity recognition accuracy.
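The pipeline summarized in the abstract (word normalization, dictionary lookup, BIO tagging, then precision, recall, and F1-score) can be illustrated with a minimal sketch. This is not the authors' implementation: the normalization table, the dictionary entries, the label spellings, and the function names `normalize`, `tag_bio`, and `token_scores` are all assumptions made for illustration, and the example sentence is invented rather than taken from the corpus. Scoring here is done per token and ignores the B/I distinction, which may differ from how the study computed its metrics.

```python
from collections import defaultdict

# Hypothetical normalization table: informal form -> standard form.
NORMALIZATION = {"kemo": "kemoterapi"}

# Hypothetical entity dictionary: token sequence -> entity label.
ENTITY_DICT = {
    ("kanker", "payudara"): "DIAGNOSIS",
    ("kemoterapi",): "TINDAKAN_MEDIS",
    ("takut",): "EMOSI_NEGATIF",
    ("dukungan", "keluarga"): "DUKUNGAN_SOSIAL",
}
MAX_LEN = max(len(key) for key in ENTITY_DICT)


def normalize(text):
    """Lowercase, split on whitespace, strip punctuation, map informal words."""
    cleaned = (t.strip(".,!?;:") for t in text.lower().split())
    return [NORMALIZATION.get(t, t) for t in cleaned if t]


def tag_bio(tokens):
    """Assign BIO tags by longest-match lookup in the entity dictionary."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = ENTITY_DICT.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return tags


def token_scores(gold, pred):
    """Per-label token-level (precision, recall, F1), ignoring B-/I- prefixes."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for g, p in zip(gold, pred):
        g_lab = g.split("-", 1)[-1]  # "B-DIAGNOSIS" -> "DIAGNOSIS"; "O" stays "O"
        p_lab = p.split("-", 1)[-1]
        if p_lab != "O":
            counts[p_lab]["tp" if p_lab == g_lab else "fp"] += 1
        if g_lab != "O" and g_lab != p_lab:
            counts[g_lab]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        precision = c["tp"] / predicted if predicted else 0.0
        recall = c["tp"] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Invented example sentence, not drawn from the study's transcripts.
    tokens = normalize("Saya takut waktu mulai kemo untuk kanker payudara.")
    for token, tag in zip(tokens, tag_bio(tokens)):
        print(f"{token}\t{tag}")
```

A dictionary lookup of this kind can only find phrases that are listed, so varied or indirect wording for an entity would be missed unless it is added to the dictionary or normalization table.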

Published

2024-11-11

How to Cite

Alias, N., & Sundari, A. (2024). Named Entity Recognition for Uncovering Clinical and Emotional Entities from Breast Cancer Patient Interviews. Knowbase : International Journal of Knowledge in Database, 5(1), 110–120. https://doi.org/10.30983/knowbase.v5i1.10192

Section

Articles
