Enhancing Health Canada’s Guideline for Machine Learning-Enabled Medical Devices: Addressing Algorithmic Bias and Transparency

Keywords: Machine Learning-Enabled Medical Devices (MLMDs), Algorithmic Bias, Health Canada, Transparency, Patient Care

Saeed Ikra

4/4/2025 · 7 min read

Introduction

Machine learning (ML), a subset of artificial intelligence (AI), enables systems to learn from data and make accurate predictions on new or unseen datasets. Unlike traditional computer programs, ML systems continuously improve their performance through experience. The healthcare industry, like many others, is increasingly leveraging ML to extract critical clinical insights from the vast amounts of data generated by patients daily. These systems are termed Machine Learning-Enabled Medical Devices (MLMDs). MLMDs underscore the transformative role of computational technologies in clinical diagnosis, therapeutic decision-making, and patient care.

For instance, ML algorithms in medicine, such as linear regression models, can help determine which patients require surgery, medication, or other treatment, and even identify those assigned “do not resuscitate” orders. Unlike traditional medical devices, MLMDs adapt to new data inputs, improving accuracy and effectiveness over time. Currently, MLMDs are most prevalent in radiology because of the frequent use of imaging, the ease of implementation, and the service's reliance on accurate interpretation. However, their adoption is expected to grow across many other healthcare sectors.
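
To make the idea concrete, the following is a minimal sketch, using entirely synthetic data and hypothetical feature names, of the kind of regression-based risk model described above. It illustrates the technique only; it is not an implementation of any approved MLMD.

```python
# Minimal sketch of a regression-based clinical risk model. All data here is
# synthetic and the feature names are hypothetical; a real MLMD would be
# trained and validated on curated clinical datasets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: age, systolic blood pressure, composite lab score
X = np.column_stack([
    rng.normal(60, 12, n),
    rng.normal(130, 20, n),
    rng.normal(1.0, 0.3, n),
])
# Synthetic label: "needs surgical referral", driven mainly by age and lab score
logit = 0.05 * (X[:, 0] - 60) + 2.0 * (X[:, 2] - 1.0)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# The output is a probability intended to support, not replace, a clinician's
# decision about referral.
print("held-out accuracy:", model.score(X_test, y_test))
print("referral probability for one new patient:",
      model.predict_proba([[72, 145, 1.4]])[0, 1])
```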

While MLMDs have enhanced potential compared to traditional computer programs, they also raise unique regulatory challenges, including risks of algorithmic bias, performance degradation, and unintended consequences stemming from continual updates. As some commentators have rightly noted, the accuracy and reliability of these ML models depend heavily on the quality and robustness of the data they are trained on, making strong and representative datasets essential for effective performance.

In response to these concerns, Health Canada released its proposed Good Machine Learning Practice (GMLP) Guideline in October 2021, in collaboration with the United States Food and Drug Administration (FDA) and the United Kingdom's Medicines and Healthcare products Regulatory Agency (MHRA). The guideline outlines ten principles intended to ensure MLMDs are safe, effective, and of high quality throughout the healthcare lifecycle. It serves as a foundational framework to direct future regulatory policies, aiming to build trust among the stakeholders who will engage with these tools while driving innovation and sustainable growth in the MLMD sector.

While the GMLP principles emphasize transparency, accountability, and lifecycle management of MLMDs, their voluntary nature and lack of specificity undermine their practical utility. The GMLP's shortfalls are particularly evident in two critical areas: (1) mitigating algorithmic bias in data to ensure equitable healthcare outcomes, and (2) enhancing transparency and explainability to foster trust and usability in MLMDs. This blog post proposes targeted additions to the guideline in these two areas to introduce more robust, practical, and enforceable measures that foster meaningful engagement with its principles.

Mitigating Algorithmic Bias

Current Framework – Good Machine Learning Practice Guideline

Algorithmic bias in AI health systems occurs when algorithms amplify existing inequities, leading to poor healthcare outcomes. This often arises from the underrepresentation of certain demographic groups in training datasets and from algorithms that entrench those gaps, resulting in devices that perform poorly for the affected populations. For instance, a 2019 study on the use of AI for dermatological conditions in Uganda found that diagnostic algorithms trained predominantly on data from Caucasian populations frequently exhibit reduced predictive accuracy for Black or Indigenous patients.
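
The mechanism is straightforward to demonstrate. The following hedged sketch trains a simple classifier on synthetic data in which one group is heavily underrepresented, then audits accuracy for each group separately; the groups, features, and thresholds are all invented, and the point is the per-group audit rather than the specific numbers.

```python
# Minimal sketch, on synthetic data, of how underrepresentation in a training
# set can show up as a per-group accuracy gap. Group definitions, features,
# and thresholds are hypothetical; only the auditing step is the point.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_group(n, shift):
    # The same underlying condition presents slightly differently per group
    X = rng.normal(0, 1, (n, 3)) + shift
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > shift.sum()).astype(int)
    return X, y

# Group A dominates the training data; group B is underrepresented
Xa, ya = make_group(2000, np.array([0.0, 0.0, 0.0]))
Xb, yb = make_group(100, np.array([1.0, -0.5, 0.5]))
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Audit: evaluate each group separately instead of reporting one pooled score
Xa_test, ya_test = make_group(1000, np.array([0.0, 0.0, 0.0]))
Xb_test, yb_test = make_group(1000, np.array([1.0, -0.5, 0.5]))
print("accuracy, well-represented group:",
      accuracy_score(ya_test, model.predict(Xa_test)))
print("accuracy, underrepresented group:",
      accuracy_score(yb_test, model.predict(Xb_test)))
```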

Health Canada's GMLP acknowledges the importance of data quality and representativeness in training MLMDs. Principles 3 and 5 emphasize capturing relevant characteristics such as age, gender, sex, race, and ethnicity during data collection and implementing a reference standard to ensure datasets are adequately characterized. Further, Principles 6 to 8 emphasize tailoring algorithms to the available data to ensure clinically relevant outputs.

However, the guideline contains no clear statement of civil liability where a manufacturer fails to comply with the principles, which limits its effectiveness in addressing algorithmic bias. While it is understandable that a guideline carries no legal force, it could have been strengthened by requiring a standard liability clause in contracts between health centres and model developers. Such a clause would further incentivize developers to comply with Principles 3 and 5.

The proposed Artificial Intelligence and Data Act (AIDA) under Bill C-27, if passed, may make up for this gap in the GMLP guideline. As under the EU AI Act's risk-based approach, MLMDs would likely qualify as "high-impact AI systems" under s. 5(1) of AIDA. This classification reflects the significant risk that errors or unintended consequences in MLMDs could harm health and safety. AIDA references s. 3(1) of the Canadian Human Rights Act, which prohibits discriminatory practices in healthcare delivery, and explicitly prohibits biased outputs unless justified by natural data correlations. For example, while income may correlate with race or gender, it must not be used as a proxy for these sensitive characteristics. AIDA thus underscores the importance of identifying, evaluating, and mitigating bias.

Without the prospect of civil liability, the GMLP guideline by itself may not incentivize compliance with the applicable principles and risks perpetuating health inequities. Clear, actionable standards are necessary to ensure fair and equitable healthcare outcomes for all populations.

Proposed Enhancements

The GMLP guidelines closely align with AIDA in their shared commitment to reducing bias and ensuring accountability in high-impact AI systems. AIDA's focus on fairness, equity, validity, and robustness complements the GMLP's emphasis on clinical relevance and human-AI collaboration in MLMDs. Together, they provide a framework for addressing bias in both the data and the algorithms developed for healthcare applications. Both could be strengthened by expressly requiring the following:

  1. Participant-centred development strengthens this effort by involving underrepresented groups, such as Indigenous peoples, LGBTQ+ individuals, immigrants, and people with disabilities, in algorithm design. These communities can help identify biases and propose solutions to improve representation in datasets. Initiatives like the Open Artificial Pancreas System (OpenAPS), which uses accessible technology to help users manage type 1 diabetes, demonstrate how inclusive approaches generate valuable datasets and advance patient-driven research. Similarly, platforms like Open Humans empower participants to share data, design studies, and create algorithms. In addition to fostering trust, this approach can also help resolve some of the challenges companies typically face around data ethics, privacy, and patient involvement. This "choose your own adventure" approach allows end users to feel that their needs are being addressed and, as a result, makes them more willing to share information.

  2. Comprehensive field testing is equally critical: it evaluates algorithm performance across diverse populations and clinical settings, ensuring MLMDs are effective and equitable for all users and reducing disparities in healthcare outcomes. Field testing should also incorporate open science practices, including the preregistration of AI studies, to promote transparency. Common standardized metrics can play a role here in assessing not just quantitative accuracy but also the quality of care and patient outcomes (a sketch of such a subgroup report follows this list). Frameworks like the Guidelines for Good Evaluation Practice in Health Informatics offer a structured approach to addressing challenges at various stages of algorithmic development.
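
As referenced in item 2, the following is a minimal sketch of a standardized per-subgroup evaluation report that could support field testing. The column names ("site", "sex", "y_true", "y_pred") and chosen metrics are illustrative assumptions, not a prescribed GMLP format.

```python
# Hedged sketch of a standardized per-subgroup field-testing report:
# sensitivity, specificity, and sample size for each demographic/site stratum.
import pandas as pd

def subgroup_report(df, group_cols):
    """Compute sensitivity, specificity, and n for each subgroup."""
    rows = []
    for keys, g in df.groupby(group_cols):
        keys = keys if isinstance(keys, tuple) else (keys,)
        tp = ((g.y_true == 1) & (g.y_pred == 1)).sum()
        fn = ((g.y_true == 1) & (g.y_pred == 0)).sum()
        tn = ((g.y_true == 0) & (g.y_pred == 0)).sum()
        fp = ((g.y_true == 0) & (g.y_pred == 1)).sum()
        rows.append({
            **dict(zip(group_cols, keys)),
            "n": len(g),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Toy field-test results from two hypothetical sites and two recorded sexes
data = pd.DataFrame({
    "site":   ["rural", "rural", "urban", "urban", "urban", "rural"],
    "sex":    ["F", "M", "F", "M", "F", "M"],
    "y_true": [1, 0, 1, 1, 0, 1],
    "y_pred": [1, 0, 1, 0, 0, 0],
})
print(subgroup_report(data, ["site", "sex"]))
```

Reporting such a table per deployment site, rather than a single pooled accuracy figure, is one concrete way the standardized metrics described above could be operationalized.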

Finally, transparent reporting can also reveal the limitations of MLMDs. Communicating these findings to clinicians and policymakers is vital to ensure algorithms are applied appropriately, building trust and advancing equitable AI use in healthcare. While manufacturers may claim that such requirements burden companies and stifle innovation, proactive measures enhance MLMD reliability, build user trust, and elevate industry standards, ultimately improving AI-driven healthcare devices.

Ensuring Transparency and Explainability

Current Framework – Good Machine Learning Practice Guideline

Principles 1 and 9 of the GMLP highlight the importance of involving diverse stakeholders throughout the lifecycle of MLMDs. However, these stakeholders, including patients, healthcare providers, and hospital administrative decision-makers, require tailored information to address their unique needs. For instance, patients need simple explanations about how MLMDs affect their treatment; healthcare providers require detailed insights to assess a system's risks and limitations; and decision-makers in hospitals need information on implementation and the cost-effectiveness of the device. Despite this, the GMLP does not mandate standards for data transparency or model explainability. Manufacturers are not required to disclose critical details about training datasets, such as their provenance, curation methods, or demographic composition. Specifically, Principle 9 of the GMLP only requires that users of MLMDs be informed of the "characteristics of the data used to train and test the model, acceptable inputs, known limitations, user interface interpretation, and clinical workflow integration of the model". It does not oblige model developers to disclose the underlying datasets where required, suggesting that the guideline protects proprietary interests at the expense of transparency and robust explainability. AI companies often rely on proprietary rights to maintain unjustified secrecy and to disadvantage litigants. Such secrecy makes it difficult for end users to understand the predictive logic of ML systems.
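
To illustrate what a stronger disclosure requirement could look like in practice, below is a hypothetical, machine-readable sketch of the training-data details discussed above (provenance, curation, demographic composition, known limitations). The field names and values are invented and do not correspond to any Health Canada, GMLP, or other regulatory schema.

```python
# Hypothetical sketch of a machine-readable training-data disclosure of the
# kind Principle 9 could be strengthened to require. All fields and values
# are illustrative placeholders, not a real device's documentation.
training_data_disclosure = {
    "provenance": {
        "sources": ["teaching-hospital imaging archive", "public chest X-ray registry"],
        "collection_period": "2015-2022",
        "consent_basis": "secondary use approved by a research ethics board",
    },
    "curation": {
        "labelling": "two radiologists per image, disagreements adjudicated",
        "exclusions": "images with missing demographic metadata",
    },
    "demographics": {
        "age_median": 58,
        "sex": {"female": 0.47, "male": 0.52, "unreported": 0.01},
        "race_or_ethnicity_recorded": True,
    },
    "known_limitations": [
        "paediatric patients underrepresented",
        "single-country collection; external validity untested",
    ],
}
```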

Similar to the GMLP, AIDA lacks clear standards to ensure that high-impact AI systems, like MLMDs, are interpretable by various stakeholders. This leaves clinicians without the tools necessary to understand and use these systems effectively. The lack of transparency undermines decision-making on all levels.

Health Canada's expansion of the GMLP in its report, Transparency for machine learning-enabled medical devices: Guiding principles, attempts to address these gaps. The report emphasizes a human-centred design approach to communication that emulates "Part 210: Human-centred design for interactive systems," which has been published and is under review by the International Organization for Standardization. This approach asks manufacturers to consider who needs the information, why it is needed, what to share, where and when to share it, and how to do so effectively. Although addressing "how" is critical for fostering trust and ensuring MLMDs meet stakeholder needs, the report provides little clarity on the specific actions necessary to achieve this.

Proposed Enhancements

In addition to the proposal for proprietary disclosure, technical intervention could help realize the crucial objectives of the GMLP principles through a preference for explainable AI (XAI) models, although these come with trade-offs. Unlike "black box" algorithms, which make it unclear how outputs are generated from particular inputs, XAI provides interpretable models in which users can understand how and why a decision was made, as well as the algorithm's strengths and weaknesses. Decision trees and simple linear models, for example, tend to be easy to explain, although they can suffer from lower predictive accuracy than more complex models. XAI ensures clinicians and other parties involved can understand and use algorithmic recommendations, enabling safer and more effective care. By integrating these techniques into MLMDs, manufacturers can align with the GMLP's and AIDA's transparency standards, foster public trust, and support informed decision-making across all stakeholder groups.
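
As an illustration of this trade-off, the following minimal sketch trains a deliberately shallow decision tree on a public, non-clinical toy dataset and prints its complete decision rules. It stands in for the interpretable end of the spectrum; a production MLMD would require far more rigorous validation.

```python
# Minimal sketch of an interpretable model: a shallow decision tree whose
# entire decision logic can be printed and inspected by a clinician.
# Trained on a public toy dataset, not clinical data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Depth is deliberately limited: a shallower tree is easier to explain,
# usually at some cost in predictive accuracy.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("held-out accuracy:", round(tree.score(X_test, y_test), 3))
print(export_text(tree, feature_names=list(X.columns)))
```

The printed rules are the kind of artefact that could be shared with clinicians and regulators, in contrast to a black-box model whose internal logic cannot be communicated directly.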

Conclusion

While MLMDs have the potential to transform healthcare, the risks associated with their use create barriers, limiting entry into the highly regulated healthcare market. Health Canada’s GMLP guideline attempts to address these issues but lacks specificity and enforceable measures, limiting its effectiveness. I suggest that enhancing GMLP in two key areas – (1) mitigating algorithmic bias for equitable healthcare outcomes and (2) improving transparency and explainability of their models to build trust and usability – will be critical in ensuring that MLMDs meet ethical, clinical and legal standards.

To mitigate algorithmic bias and enhance transparency, the GMLP must require active measures such as participant-centred algorithm design, comprehensive field testing, and open science practices like preregistration and standardized metrics. These steps promote fairness, accountability, and real-world applicability. Additionally, adopting explainable AI (XAI) techniques, such as visual interpretability tools, will build trust by helping stakeholders understand MLMD decisions. Together, these measures enable manufacturers to address regulatory concerns, reduce bias, and build trust with patients and clinicians, ultimately improving safety and usability.

These actionable recommendations represent significant progress toward addressing the challenges of MLMDs and advancing AI-driven healthcare tools. While Principle 10 of the GMLP, which addresses post-market surveillance, merits further exploration, focusing on fairness and transparency can drive innovation while safeguarding equity and trust in healthcare.