Reinforcement Learning with LinUCB: Comparing Reward Designs and Optimizing Alpha for Warfarin Dosage
© 2025 by IJCTT Journal
Volume-73 Issue-3
Year of Publication : 2025
Authors : Rishi Nandan Simhadri, Shantanu Awasthi
DOI : 10.14445/22312803/IJCTT-V73I3P103
How to Cite?
Rishi Nandan Simhadri, Shantanu Awasthi, "Reinforcement Learning with LinUCB: Comparing Reward Designs and Optimizing Alpha for Warfarin Dosage," International Journal of Computer Trends and Technology, vol. 73, no. 3, pp. 25-31, 2025. Crossref, https://doi.org/10.14445/22312803/IJCTT-V73I3P103
Abstract
Determining the appropriate dose of warfarin is a significant challenge: many factors influence the proper dose of the anticoagulant, and an incorrect dose can cause adverse side effects with serious health consequences for the patient. Commonly used approaches to determining the initial dose of warfarin are the pharmacogenetic algorithm, the clinical algorithm, and a fixed-dose approach. This research applies reinforcement learning, using the LinUCB algorithm, to identify the optimal warfarin dose through three major experiments. First, the authors employed lasso regression for feature selection to identify the most relevant predictors of warfarin dosage in the warfarin dataset, ensuring a more interpretable model. Second, they evaluated several reward designs, including sparse, accuracy-focused dense, time-decay, and distribution-based rewards, on metrics such as accuracy, precision, recall, and F1 score, and found that the accuracy-based dense reward was superior in predicting optimal doses on most metrics. Third, they improved the LinUCB algorithm's accuracy and F1 score by using Hyperopt to identify the optimal value of the hyperparameter alpha. Using data collected by the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB), this research presents reinforcement learning as a potential approach for determining warfarin doses. The results demonstrate the promise of reinforcement learning for improving current personalized medicine practices in warfarin dosing. Representing an advancement in the application of reinforcement learning within healthcare, this work opens avenues for future research aimed at optimizing medication dosages to improve patient outcomes.
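To make the setup above concrete, the following is a minimal sketch of the disjoint LinUCB contextual bandit applied to dose-bucket selection. It is not the authors' implementation: the patient contexts and "optimal dose" labels are synthetic, the sparse and dense reward functions are illustrative stand-ins for the paper's reward designs, and alpha is tuned here with a simple grid sweep rather than Hyperopt's TPE search.

```python
import numpy as np

def run_linucb(contexts, optimal_arms, n_arms, alpha, reward_fn):
    """Disjoint LinUCB: one ridge-regression model per arm (dose bucket)."""
    d = contexts.shape[1]
    A = [np.eye(d) for _ in range(n_arms)]    # per-arm Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward vectors
    correct = 0
    for x, best in zip(contexts, optimal_arms):
        # UCB score per arm: point estimate + alpha * exploration bonus
        scores = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]
            scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
        arm = int(np.argmax(scores))
        r = reward_fn(arm, best)
        A[arm] += np.outer(x, x)  # update only the chosen arm's model
        b[arm] += r * x
        correct += (arm == best)
    return correct / len(contexts)

# Illustrative reward designs: sparse gives credit only for the exact
# bucket; dense gives partial credit that decays with bucket distance.
sparse = lambda a, best: 1.0 if a == best else 0.0
dense  = lambda a, best: max(0.0, 1.0 - 0.5 * abs(a - best))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))             # synthetic patient features
true_theta = rng.normal(size=(3, 5))
y = np.argmax(X @ true_theta.T, axis=1)    # synthetic optimal dose buckets

# Grid sweep over alpha (the paper used Hyperopt's TPE search instead)
for alpha in (0.1, 0.5, 1.0):
    acc = run_linucb(X, y, n_arms=3, alpha=alpha, reward_fn=dense)
    print(f"alpha={alpha}: accuracy={acc:.3f}")
```

The sweep makes the exploration-exploitation trade-off visible: a larger alpha widens the confidence bonus and forces more exploration of uncertain dose buckets, which is exactly the quantity the paper optimizes with Hyperopt.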
Keywords
Reinforcement Learning, Warfarin, LinUCB, Optimizing Alpha, HyperOpt.
Reference
[1] James Bergstra, Dan Yamins, and David D. Cox, “Hyperopt: A Python Library for Optimizing the Hyperparameters of Machine Learning Algorithms,” Proceedings of the 12th Python in Science Conference, pp. 1-8, 2013.
[2] The International Warfarin Pharmacogenetics Consortium, “Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data,” New England Journal of Medicine, vol. 360, no. 8, pp. 753-764, 2009.
[3] Jonas Eschmann, “Reward Function Design in Reinforcement Learning,” Reinforcement Learning Algorithms: Analysis and Applications, vol. 883, pp. 25-33, 2021.
[4] Kenneth Foo Fangwei, “Contextual Bandits Analysis of LinUCB Disjoint Algorithm with Dataset,” 2020. [Online]. Available: https://kfoofw.github.io/contextual-bandits-linear-ucb-disjoint/
[5] Lihong Li et al., “A Contextual-Bandit Approach to Personalized News Article Recommendation,” Proceedings of the 19th International Conference on World Wide Web, Raleigh, North Carolina, USA, pp. 661-670, 2010.
[6] Arpita Vats, “Estimation of Warfarin Dosage with Reinforcement Learning,” arXiv, pp. 1-7, 2020.
[7] Munir Pirmohamed, “Warfarin: Almost 60 Years Old and Still Causing Problems,” British Journal of Clinical Pharmacology, vol. 62, no. 5, pp. 509-511, 2006.
[8] Yoan Russac, “Introduction to Linear Bandits,” pp. 1-38, 2019.