Named Entity Recognition for Uncovering Clinical and Emotional Entities from Breast Cancer Patient Interviews
DOI:
https://doi.org/10.30983/knowbase.v5i1.10192
Keywords:
Named Entity Recognition (NER), Breast Cancer, Clinical Entities, Emotional Entities, Rule-Based, Natural Language Processing
Abstract
This study aims to develop a Named Entity Recognition (NER) system capable of identifying clinical and emotional entities within interview transcripts of breast cancer patients. The corpus was manually annotated using the BIO scheme across seven main entity categories: Social Support (Dukungan Sosial), Medical Actions (Tindakan Medis), Diagnosis, Negative Emotions (Emosi Negatif), Positive Emotions (Emosi Positif), Symptoms (Gejala), and Spiritual. The annotation process was followed by the implementation of a rule-based method supported by entity dictionaries and word normalization, and the model was evaluated using precision, recall, and F1-score metrics. The analysis results revealed that Dukungan Sosial was the most dominant entity with 347 occurrences, followed by Tindakan Medis and Diagnosis. The rule-based NER model achieved an F1-score of 0.50 for the Diagnosis entity, although its performance on emotional and social entities remained low due to data imbalance. These findings highlight the importance of integrating clinical and emotional aspects in natural language processing to gain a more comprehensive understanding of patient narratives. The proposed approach has potential applications in healthcare text mining for detecting emotional experiences and medical contexts, and it can be further enhanced through the integration of transformer-based models such as IndoBERT to improve entity recognition accuracy.
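To make the pipeline described in the abstract concrete, the following is a minimal sketch of dictionary-driven, rule-based BIO tagging with word normalization, followed by per-entity precision, recall, and F1. It is not the authors' implementation: the normalization map, the dictionary entries, the label spellings, the example sentence, and the choice of exact-match span-level scoring are all illustrative assumptions, since the abstract does not specify them.

```python
import re

# Hypothetical normalization map (informal form -> standard form); not from the paper.
NORMALIZATION = {"kemo": "kemoterapi", "gak": "tidak"}

# Hypothetical entity dictionary (label -> phrases); the real dictionaries are not given.
ENTITY_DICT = {
    "DIAGNOSIS": ["kanker payudara"],
    "TINDAKAN_MEDIS": ["kemoterapi", "operasi"],
    "EMOSI_NEGATIF": ["takut", "sedih"],
    "DUKUNGAN_SOSIAL": ["suami", "keluarga"],
}


def normalize(text):
    """Lowercase, split into word tokens, and map informal forms to standard ones."""
    tokens = re.findall(r"\w+", text.lower())
    return [NORMALIZATION.get(tok, tok) for tok in tokens]


def tag_bio(tokens):
    """Assign BIO tags by matching dictionary phrases, longest phrase first."""
    phrases = sorted(
        ((tuple(p.split()), label) for label, ps in ENTITY_DICT.items() for p in ps),
        key=lambda item: -len(item[0]),
    )
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase, label in phrases:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts at this token
    return tags


def spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    found = set()
    start = label = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            found.add((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return found


def prf(gold_tags, pred_tags, label):
    """Exact-match span-level precision, recall, and F1 for one entity label."""
    gold = {s for s in spans(gold_tags) if s[2] == label}
    pred = {s for s in spans(pred_tags) if s[2] == label}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example sentence, not taken from the interview corpus.
sentence = "Saya takut waktu dokter bilang kanker payudara, tapi suami selalu menemani kemo."
tokens = normalize(sentence)
predicted = tag_bio(tokens)
for token, tag in zip(tokens, predicted):
    print(f"{token}\t{tag}")
```

Scoring each label separately, as `prf` does, mirrors the per-entity reporting in the abstract and makes the effect of data imbalance visible: a label with few dictionary entries or few gold spans can score near zero even when another label, such as Diagnosis, performs reasonably.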
License
Copyright (c) 2025 Norma Alias, Agus Sundari

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
