HallResearch.ai Library of AI Assessment and Implementation Resources
June 12, 2026
(c) HallResearch.ai CC-BY 4.0, some rights reserved.
This repository is one of the two main branches of the HallResearch.ai Library, a curated public collection of resources for understanding, governing, assessing, and implementing artificial intelligence systems.
Use this branch for materials related to responsible machine learning assessment, implementation practice, evaluation, auditing, testing, tooling, documentation, and education.
For resources focused on AI governance, public policy, legal materials, institutional guidance, incidents, accountability, and critique, visit the companion branch: AI Governance and Policy Resources.
For the full Library index, visit the HallResearch.ai Library.
Maintained by Patrick Hall and Daniel Atherton. Maintenance and curation are sponsored by HallResearch.ai.
This repository grew out of the original Awesome Machine Learning Interpretability project and is maintained as part of the broader HallResearch.ai Library structure.
Contents
- Assessment and Implementation Guidance
- Education Resources
- Technical Resources
- Archived
Assessment and Implementation Guidance
Community Frameworks and Guidance
This section is for responsible ML guidance put forward by organizations or individuals, not for official government guidance.
- 2024 State of the AI Regulatory Landscape
- 8 Principles of Responsible ML
- A Brief Overview of AI Governance for Responsible Machine Learning Systems
- A checklist for auditing AI systems | ICT Institute
- A Digital Pandemic: Uncovering the Role of 'Yahoo Boys' in the Surge of Social Media-Enabled Financial Sextortion Targeting Minors | Network Contagion Research Institute (NCRI), January 2024
- A Flexible Maturity Model for AI Governance Based on the NIST AI Risk Management Framework
- A Guide to AI in Schools: Perspectives for the Perplexed
- A NIST Foundation To Support The Agency’s AI Mandate
- A Primer for Developers and Policymakers
- A Taxonomy of Trustworthiness for Artificial Intelligence | January 2023
- ABOUT ML Reference Document
- Acceptable Use Policies for Foundation Models
- Access Now, Regulatory Mapping on Artificial Intelligence in Latin America: Regional AI Public Policy Report
- Advanced AI Scaling Framework | Meta, April 2026
- Advancing Responsible AI Innovation: A Playbook | World Economic Forum, September 2025
- Adverse Event Reporting for AI: Developing the Information Infrastructure Government Needs to Learn and Act | July 2025
- AI Accidents: An Emerging Threat: What Could Happen and What to Do, CSET Policy Brief, July 2021
- AI Act – Provider Only: Certification Scheme v1.5 | ForHumanity, March 2025
- AI Governance on the Ground: Canada’s Algorithmic Impact Assessment Process and Algorithm has evolved
- AI Act Governance: Best Practices for Implementing the EU AI Act | Initiative for Applied Artificial Intelligence, June 2025
- AI Act Handbook | White & Case, June 2025
- AI Agent Governance: A Field Guide | Institute for AI Policy and Strategy (IAPS), April 2025
- AI alignment vs AI ethical treatment: Ten challenges | Adam Bradley and Bradford Saad, Global Priorities Institute, July 2024
- AI Assurance: A Repeatable Process for Assuring AI-enabled Systems | MITRE, June 2024
- AI Canon | Andreessen Horowitz (a16z)
- AI Decision-Making and the Courts: A guide for Judges, Tribunal Members, and Court Administrators | The Australasian Institute of Judicial Administration Inc., published June 2022 and revised and republished December 2023
- AI Ethics & Governance 2025: A Framework for Malaysia's Tech Industry | PIKOM, May 2025
- AI Ethics and Governance in Practice: AI Safety in Practice | The Alan Turing Institute
- AI Ethics and Governance in Practice | The Alan Turing Institute
- AI For a Planet Under Pressure | Stockholm Resilience Centre, Stockholm University, November 5, 2025
- AI Governance Alliance Briefing Paper Series | World Economic Forum, January 2024
- AI Governance and the EU's Strategic Role in 2025 | Florence School of Transnational Governance, Marta Cantero Gamito, August 2025
- AI Governance InternationaL Evaluation AGILE Index 2025 | July 2025
- AI Governance Needs Sociotechnical Expertise: Why the Humanities and Social Sciences Are Critical to Government Efforts
- AI Governance: A Framework for Responsible and Compliant Artificial Intelligence | Sołtysiński Kawecki & Szlęzak, September 2025
- AI in Africa | Global Center on AI Governance, AI in Africa: A Landscape Study, April 2025
- AI in the Public Service: From Principles to Practice | Oxford Commission on AI & Good Governance
- AI Inventories: Practical Challenges for Organizational Risk Management | Responsible AI Institute and Chevron
- AI Liability Along the Value Chain | Mozilla, 2025
- AI Model Registries: A Foundational Tool for AI Governance | September 2024
- AI Model Risk Management Framework | Cloud Security Alliance and AI Technology and Risk Working Group, July 23, 2024
- AI Policy | Taylor & Francis
- AI Red-Teaming Is Not a One-Stop Solution to AI Harms: Recommendations for Using Red-Teaming for AI Accountability | Data & Society
- AI Risk Atlas: Taxonomy and Tooling for Navigating AI Risks and Resources
- AI Safety Frameworks and Risk Governance | Long Term Resilience, February 2025
- AI Safety Governance, the Southeast Asian Way | Brookings Center for Technology Innovation, AI Safety Asia (AISA), August 2025
- AI Safety in Practice | The Alan Turing Institute
- AI Snake Oil
- AI Standards Hub | The Alan Turing Institute
- AI Sustainability Outlook: The Challenges, Potential, and Path Forward | Salesforce
- AI Verification
- AI Verify Foundation | AI Verify Foundation
- AI Won't Replace the General: Algorithms, Decision-making and Battlefield Command | The Alan Turing Institute, September 2025
- AI-enabled Biosecurity: Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism: What Policymakers Should Know | Center for Strategic and International Studies, August 2025
- AI-Generated Algorithmic Virality | AI Forensics, June 2025
- AI-Generated Disinformation in Europe and Africa: Use Cases, Solutions and Transnational Learning | Konrad Adenauer Stiftung, January 31, 2025
- AI-Relevant Regulatory Precedents: A Systematic Search Across All Federal Agencies
- An In-Depth Guide To Help You Start Auditing Your AI Models | Censius
- An Overview of Artificial Intelligence Ethics
- An Overview of Catastrophic AI Risks | Dan Hendrycks, Mantas Mazeika, and Thomas Woodside, October 9, 2023
- Architectural Risk Analysis of Large Language Models | Berryville Institute of Machine Learning, requires free account
- Artificial Intelligence Controls Matrix Bundle
- Artificial Intelligence Harm and Human Rights: A High Level Exploration of the Interaction of AI Harms | ICAAD and King & Wood Mallesons, September 29, 2025
- Artificial Intelligence Impact Assessment | ECP Platform voor de InformatieSamenleving, November 2018
- Artificial Intelligence in Africa: Challenges and Opportunities | Policy Center for the New South, Fahd Azaroual, May 2024
- Artificial Intelligence in the Securities Industry | Financial Industry Regulatory Authority
- Artificial Intelligence Tools Versus Practice in Conflict Prediction: The Case of Mali | The Hague Centre for Strategic Studies, April 29, 2020
- Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems | Center for Democracy and Technology (CDT), January 2025
- Assessing the Implementation of Federal AI Leadership and Compliance Mandates | Stanford University Human-Centered Artificial Intelligence (HAI)
- AuditBoard: 5 AI Auditing Frameworks to Encourage Accountability
- Auditing Artificial Intelligence | ISACA
- Auditing Guidelines for Artificial Intelligence | ISACA
- Auditing machine learning algorithms: A white paper for public auditors
- Azure AI Content Safety | Microsoft
- Best Practices for AI and Automation in Trust and Safety | Digital Trust & Safety Partnership, September 2024
- Brendan Bycroft's LLM Visualization
- Building an early warning system for LLM-aided biological threat creation | OpenAI
- C2PA: Coalition for Content Provenance and Authenticity | (C2PA)
- Capability Maturity Model Integration Resources | ISACA
- Casey Flores, AIGP Study Guide
- Cataloguing LLM Evaluations | Infocomm Media Development Authority (Singapore) and AI Verify Foundation, October 2023
- CEN-CENELEC JTC21 AI Standards: Complete Detailed Overview
- Center for AI and Digital Policy Reports
- Center for Countering Digital Hate, Fake Friend: How ChatGPT betrays vulnerable teens by encouraging dangerous behavior | Center for Countering Digital Hate, 2025
- Center for Countering Digital Hate, YouTube's Anorexia Algorithm: How YouTube Recommends Eating Disorders Videos to Young Girls | Center for Countering Digital Hate (CCDH)
- Center for Democracy and Technology, AI Policy & Governance
- Center for Democracy and Technology, Applying Sociotechnical Approaches to AI Governance in Practice
- Center for Democracy and Technology, In Deep Trouble: Surfacing Tech-Powered Sexual Harassment in K-12 Schools
- Center for Security and Emerging Technology, Adding Structure to AI Harm: An Introduction to CSET's AI Harm Framework
- Center for Security and Emerging Technology, AI Incident Collection: An Observational Study of the Great AI Experiment
- Center for Security and Emerging Technology, Chinese Critiques of Large Language Models: Finding the Path to General Intelligence | January 2025
- Center for Security and Emerging Technology, CSET Publications
- Center for Security and Emerging Technology, Putting Explainable AI to the Test: A Critical Look at AI Evaluation Approaches | February 2025
- Center for Security and Emerging Technology, Repurposing the Wheel: Lessons for AI Standards
- Center for Security and Emerging Technology, Translating AI Risk Management Into Practice
- Center for Security and Emerging Technology, Understanding AI Harms: An Overview
- Character Flaws: School Shooters, Anorexia Coaches, and Sexualized Minors: A Look at Harmful Character Chatbots and the Communities That Build Them | Graphika Atlas Report, March 2025
- Children & AI Design Code: A Protocol for the development and use of AI systems that impact children
- Chinese Critiques of Large Language Models: Finding the Path to General Intelligence | CSET, January 2025
- Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing
- Cloud Security Alliance, AI Model Risk Management Framework | AI Technology and Risk Working Group, July 23, 2024
- Cloud Security Alliance, Artificial Intelligence Controls Matrix Bundle
- Coalition for Content Provenance and Authenticity | (C2PA)
- Countries With Draft AI Legislation or Frameworks | Dominique Shelton Leipzig
- Data governance in AI: Mapping governance | Open Data Institute
- Data governance in the cloud - part 1 - People and processes | Google
- Data Governance in the Cloud - part 2 - Tools | Google
- Data Privacy FAQ | AWS
- Data Provenance Explorer
- Data Statements | University of Washington Tech Policy Lab
- Data Stewardship in Practice | The Alan Turing Institute
- Data Use Policy | Our Data Our Selves, Tactical Tech
- Dealing with Bias and Fairness in AI/ML/Data Science Systems
- Debugging Machine Learning Models | ICLR workshop proceedings
- Decision Points in AI Governance: Three Case Studies Explore Efforts to Operationalize AI Principles | University of California, Berkeley, Center for Long-Term Cybersecurity
- Deepfake Pornography Goes to Washington: Measuring the Prevalence of AI-Generated Non-Consensual Intimate Imagery Targeting Congress | American Sunlight Project, December 11, 2024
- Demos, AI – Trustworthy By Design: How to build trust in AI systems, the institutions that create them and the communities that use them
- Digital Policy Alert, The Anatomy of AI Rules: A systematic comparison of AI rules across the globe
- Disrupting malicious uses of AI: June 2025 | OpenAI
- Distill
- Doing AI Differently: Rethinking the foundations of AI via the humanities | Alan Turing Institute, July 31, 2025
- Emotional Manipulation by AI Companions | Harvard Business School, 2025
- Evaluating social and ethical risks from generative AI | Google
- Evidence of CCP Censorship, Propaganda in U.S. LLM Responses | Sentinel Brief
- Explainable AI in Finance: Addressing the Needs of Diverse Stakeholders | Cheryll-Ann Wilson, CFA Institute, Research & Policy Center, August 2025
- Fairly's Global AI Regulations Map
- Fairness and Bias in Algorithmic Hiring: A Multidisciplinary Survey
- FATML Principles and Best Practices
- Federation of American Scientists, A NIST Foundation To Support The Agency’s AI Mandate
- First of its kind Generative AI Evaluation Sandbox for Trusted AI by AI Verify Foundation and IMDA | Infocomm Media Development Authority (Singapore)
- Forging Global Cooperation on AI Risks: Cyber Policy as a Governance Blueprint | Paris Peace Forum, February 2025
- ForHumanity Body of Knowledge
- Framework for Identifying Highly Consequential AI Use Cases | Special Competitive Studies Project and Johns Hopkins University Applied Physics Laboratory
- Frameworks and Toolkits for Assuring Responsible AI | Responsible AI UK and Confiance.ai, August 2025
- From Principles to Practice: An interdisciplinary framework to operationalise AI ethics
- Frontier Safety Framework | Google DeepMind, September 2025
- Gage Repeatability and Reproducibility
- Gen-AI: Artificial Intelligence and the Future of Work | International Monetary Fund
- Generative AI Prohibited Use Policy | Google
- Generative AI Vendor Risk Assessment Guide | Future Society, FS-ISAC, February 2024
- Generative AI: A New Threat for Online Child Sexual Exploitation and Abuse | United Nations Interregional Crime and Justice Research Institute (UNICRI) Centre for AI and Robotics, Bracket Foundation, and Value for Good, September 2024
- Global AI Governance Law and Policy: Canada, EU, Singapore, UK and US | IAPP
- Guidance for Safe Foundation Model Deployment: A Framework for Collective Action
- Guide for Australian Business: Understanding 42001 | Standards Australia and National Artificial Intelligence Centre
- Guide for Preparing and Responding to Deepfake Events: From the OWASP Top 10 for LLM Applications Team | OWASP, Version 1, September 2024
- Guidelines for AI in parliaments | Inter-Parliamentary Union, December 2024
- Guidelines on the Application of the Definition of an AI System in the AI Act: ELI Proposal for a Three-Factor Approach | European Law Institute, Response of the ELI to the EU Commission's Consultation, November 1, 2024
- HackerOne Blog
- Health Care Artificial Intelligence Code of Conduct | National Academy of Medicine, 2025
- How Can We Tackle AI-Fueled Misinformation and Disinformation in Public Health? | Brown University
- How do I cite generative AI in MLA style? | MLA
- How Microsoft names threat actors | Microsoft
- How People Around the World View AI | Pew Research Center, October 15, 2025
- How to Perform an AI Audit for UK Organisations | Haptic Networks
- Human-Calibrated Automated Testing and Validation of Generative Language Models: An Overview | Agus Sudjianto, Aijun Zhang, Srinivas Neppalli, Tarun Joshi, and Michael Malohlava, December 7, 2024
- Identifying and Overcoming Common Data Mining Mistakes
- Implementing the AI Act in Belgium: Scope of Application and Authorities | Data & Society Knowledge Centre, December 2024
- Independent Audit of AI Systems | ForHumanity
- Information System Contingency Planning Guidance | Larry G. Wlosinski, April 30, 2021
- Institute for AI Policy and Strategy | (IAPS)
- Institute of Internal Auditors
- International AI Safety Report | First Key Update, Capabilities and Risk Implications, October 2025
- International Bar Association, The Future Is Now: Artificial Intelligence and the Legal Profession | International Bar Association and the Center for AI and Digital Policy
- International Monetary Fund, Gen-AI: Artificial Intelligence and the Future of Work
- International Organization for Standardization, ISO/IEC 42001:2023, Information technology — Artificial intelligence — Management system
- Intolerable Risk Threshold Recommendations for Artificial Intelligence: Key Principles, Considerations, and Case Studies to Inform Frontier AI Safety Frameworks for Industry and Government | University of California, Berkeley, Center for Long-Term Cybersecurity, February 2025
- ISO policy brief: Harnessing international standards for responsible AI development and governance | ISO, 2025
- ITI's AI Security Policy Principles | Information Technology Industry (ITI) Council, October 2024
- Just Security's Artificial Intelligence Archive
- Key Considerations When Using Artificial Intelligence in the Public Sector | EPI Center and AAAS, February 2025
- Know Your Data | Google
- Language Model Risk Cards: Starter Set
- Large language models explained with a minimum of math and jargon
- LC Labs AI Planning Framework | Library of Congress
- Learning from other domains to advance AI evaluation and testing | Microsoft, August 2025
- Llama 2 Responsible Use Guide
- LLM Visualization
- Machine Learning Quick Reference: Algorithms
- Machine Learning Quick Reference: Best Practices
- Manifest MLBOM Wiki
- Map of Practices: AutoPractices | Governing AI Technologies in Military Systems from the Bottom Up: Practices to Sustain and Strengthen Human Agency, September 2025, The AutoPractices Project. Odense: Center for War Studies
- Mapping AI Risk Mitigations: Evidence Scan and Draft Mitigation Taxonomy | MIT AI Risk Index, FutureTech, and MIT, July 2025
- Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis | Institute for AI Policy and Strategy (IAPS)
- AI Risk-Management Standards Profile for General-Purpose AI and Foundation Models | University of California, Berkeley, Center for Long-Term Cybersecurity, January 2025
- Mitigating the risk of generative AI models creating Child Sexual Abuse Materials: An analysis by child safety nonprofit Thorn | Partnership on AI and Thorn
- Model Transparency Ratings | Trustible
- model-cards-and-datasheets
- Multi-Agent Risks from Advanced AI | Cooperative AI Foundation, February 2025
- Navigating AI Compliance Part 1 Tracing Failure Patterns in History | Institute for Security and Technology (IST), December 2024
- Navigating AI Compliance Part 2 Risk Mitigation Strategies for Safeguarding Against Future Failures | Institute for Security and Technology (IST), March 2025
- Navigating the AI Frontier: A Primer on the Evolution and Impact of AI Agents | World Economic Forum and Capgemini, December 2024
- News Integrity in AI Assistants: An international PSM study | EBU and BBC, October 2025
- NewsGuard AI Tracking Center
- On Risk Assessment and Mitigation for Algorithmic Systems | Integrity Institute Report, February 2024
- Open Problems in Technical AI Governance: A repository of open problems in technical AI governance
- Open Sourcing Highly Capable Foundation Models
- Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism: What Policymakers Should Know | Center for Strategic and International Studies, August 2025
- Organization and Training of a Cyber Security Team
- Our Data Our Selves, Data Use Policy
- OWASP AI Testing Guide
- OWASP GenAI Security Project – Solutions Reference Guide Q2–Q3 2025 | OWASP GenAI Security Project, November 2025
- OWASP Top 10 for Agentic Applications 2026 | OWASP GenAI Security Project
- PAIR Explorables: Datasets Have Worldviews
- People + AI Guidebook | PAIR Guidebook
- Perspectives on Issues in AI Governance | Google
- Policy Center for the New South, Artificial Intelligence in Africa: Challenges and Opportunities | Fahd Azaroual, May 2024
- Preparedness Framework v2 | OpenAI, April 2025
- Prioritizing Real-Time Failure Detection in AI Agents | Partnership on AI, September 2025
- Privacy Notice | AWS
- PwC's Responsible AI
- Raising Standards: Data and Artificial Intelligence in Southeast Asia | Asia Society Policy Institute, Elina Noor and Mark Bryan Manantan, July 2022
- RAND Corporation, A Primer for Developers and Policymakers
- RAND Corporation, Analyzing Harms from AI-Generated Images and Safeguarding Online Authenticity
- RAND Corporation, Strengthening Emergency Preparedness and Response for AI Loss of Control Incidents | RAND Europe, July 30, 2025
- RAND Corporation, US Tort Liability for Large-Scale Artificial Intelligence Damages: A Primer for Developers and Policymakers
- Ravit Dotan's Projects
- Real People in Fake Porn: How a Federal Right of Publicity Could Assist in the Regulation of Deepfake Pornography
- Recommendations for the Independent International Scientific Panel on AI and the Global Dialogue on AI Governance | Simon Institute for Longterm Governance, February 2025
- Regulating Under Uncertainty: Governance Options for Generative AI | Stanford Cyber Policy Center, Florence G'Sell, September 2024
- Responsible AI at Stanford: Enabling innovation through AI best practices
- Responsible Data Stewardship in Practice | The Alan Turing Institute
- Responsible Enterprise AI in the Agentic Era | Infosys
- Responsible Practices for Synthetic Media: A Framework for Collective Action
- Responsible Scaling Policy Version 3.0 | Anthropic, February 2026
- Risk Taxonomy and Thresholds for Frontier AI Frameworks | Frontier Model Forum, June 18, 2025
- Risk Tiers: Towards a Gold Standard for Advanced AI | AI Governance Initiative, Oxford Martin School, and the University of Oxford, June 2025
- Safe and Reliable Machine Learning
- Sample AI Incident Response Checklist
- Secure AI Framework Approach | Google
- Secure AI Framework Summary | Google
- Securing Agentic Applications Guide 1.0 | OWASP GenAI Security Project, July 2025
- SHRM Generative Artificial Intelligence AI Chatbot Usage Policy
- Sovereign AI and Sustainable Computation for Indigenous Communities | Keolu Fox
- State of Agentic AI Security and Governance: OWASP Gen AI Security Project Agentic Security Initiative | Version 1.0, July 2025
- State of AI Safety in China | Concordia AI, July 2025
- Std 1012-1998 Standard for Software Verification and Validation
- Summary Report: Workshop on the Geopolitics of Critical Minerals and the AI Supply Chain | Institute for Advanced Study, August 2025
- Synthetic Data: The New Data Frontier | World Economic Forum, September 2025
- System cards | Meta
- Taskade: AI Audit PBC Request Checklist Template
- Taxonomy of Failure Mode in Agentic AI Systems | Microsoft
- Tech Policy Press - Artificial Intelligence
- Technology Trends Outlook 2025 | McKinsey & Company, July 2025, Fifth Edition
- TechTarget: 9 questions to ask when auditing your AI systems
- The AI Act between Digital and Sectoral Regulations | Bertelsmann Stiftung, December 2024
- The AI Act is coming: EU reaches political agreement on comprehensive regulation of artificial intelligence | Hogan Lovells
- The Complete Guide to Crowdsourced Security Testing, Government Edition | Synack
- The Ethics of AI Ethics: An Evaluation of Guidelines
- The Ethics of Developing, Implementing, and Using Advanced Warehouse Technologies: Top-Down Principles Versus The Guidance Ethics Approach
- The Foundation Model Transparency Index
- The Future Is Now: Artificial Intelligence and the Legal Profession | International Bar Association and the Center for AI and Digital Policy
- The Implications of Artificial Intelligence in Cybersecurity: Shifting the Offense-Defense Balance | Institute for Security and Technology (IST)
- The Landscape of ML Documentation Tools | Hugging Face
- The Responsible Use of AI in Healthcare | The Joint Commission and Coalition for Health AI, 2025
- The Rise of Generative AI and the Coming Era of Social Media Manipulation 3.0: Next-Generation Chinese Astroturfing and Coping with Ubiquitous AI
- Toward an evaluation science for generative AI systems
- Towards Effective Governance of Foundation Models and Generative AI | Future Society
- Towards Traceability in Data Ecosystems using a Bill of Materials Model | Manifest MLBOM Wiki
- Transformed by AI: How Generative Artificial Intelligence Could Affect Work in the UK—And How to Manage It | Institute for Public Policy Research (IPPR)
- Troubleshooting Deep Neural Networks
- Trustible, Enhancing the Effectiveness of AI Governance Committees
- Twitter Algorithmic Bias Bounty
- Understanding data governance in AI: Mapping governance | Open Data Institute
- Unite.AI: How to perform an AI Audit in 2023
- University of California, Berkeley, Center for Long-Term Cybersecurity, AI Risk-Management Standards Profile for General-Purpose AI and Foundation Models | Version 1.1, January 2025
- University of California, Berkeley, Center for Long-Term Cybersecurity, Decision Points in AI Governance: Three Case Studies Explore Efforts to Operationalize AI Principles
- University of California, Berkeley, Center for Long-Term Cybersecurity, Intolerable Risk Threshold Recommendations for Artificial Intelligence: Key Principles, Considerations, and Case Studies to Inform Frontier AI Safety Frameworks for Industry and Government | February 2025
- University of California, Berkeley, Information Security Office, How to Write an Effective Website Privacy Statement
- University of Washington Tech Policy Lab, Data Statements
- US Open-Source AI Governance: Balancing Ideological and Geopolitical Considerations with China Competition | Center for AI Policy, February 2025
- Warning Signs: The Future of Privacy and Security in an Age of Machine Learning
- What Are High-Risk AI Systems Within the Meaning of the EU’s AI Act, and What Requirements Apply to Them? | WilmerHale
- When Not to Trust Your Explanations
- Who Should Develop Which AI Evaluations?
- Why We Need to Know More: Exploring the State of AI Incident Documentation Practices
- Worldwide AI Ethics: A Review of 200 Guidelines and Recommendations for AI Governance
- You Created A Machine Learning Application Now Make Sure It's Secure
- YouTube's Anorexia Algorithm: How YouTube Recommends Eating Disorders Videos to Young Girls | Center for Countering Digital Hate (CCDH)
Infographics and Cheat Sheets
- Foundation Model Development Cheatsheet
- Future of Privacy Forum
- Generative AI framework and Generative AI value tree modelling diagram
- Global Index for AI Safety: AGILE Index on Global AI Safety Readiness Feb 2025
- IAPP
- Machine Learning Attack_Cheat_Sheet
- Navigating the EU AI Act: A Process Map for making AI Systems available | AppliedAI Institute
- Oliver Patel's Cheat Sheets
AI Red-Teaming Resources
Papers
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
- Exploiting Novel GPT-4 APIs
- GenAI Red Teaming Guide: A Practical Approach to Evaluating AI Vulnerabilities | OWASP Version 1.0, January 23, 2025
- Identifying and Eliminating CSAM in Generative ML Training Data and Models
- Jailbreaking Black Box Large Language Models in Twenty Queries
- LLM Agents can Autonomously Exploit One-day Vulnerabilities
- Red Teaming for GenAI Harms: Revealing the Risks and Rewards for Online Safety | Ofcom, July 23, 2024
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Red Teaming of Advanced Information Assurance Concepts
- Robust AI Security and Alignment: A Sisyphean Endeavor? | Apostol Vassilev, arXiv, 2026
Tools and Guidance
- @dotey on X/Twitter exploring GPT prompt security and prevention measures
- 0xeb / GPT-analyst
- 0xk1h0 / ChatGPT "DAN" and other "Jailbreaks"
- A Safe Harbor for AI Evaluation and Red Teaming
- ACL 2024 Tutorial: Vulnerabilities of Large Language Models to Adversarial Attacks
- Azure's PyRIT
- Berkeley Center for Long-Term Cybersecurity
- CDAO frameworks, guidance, and best practices for AI test & evaluation
- ChatGPT_system_prompt
- coolaj86 / Chat GPT "DAN" and other "Jailbreaks"
- CSET, What Does AI-Red Teaming Actually Mean?
- DAIR Prompt Engineering Guide
- Extracting Training Data from ChatGPT
- Frontier Model Forum: What is Red Teaming?
- Generative AI Red Teaming Challenge: Transparency Report 2024
- HackerOne, An Emerging Playbook for AI Red Teaming with HackerOne
- Humane Intelligence, SeedAI, and DEFCON AI Village, Generative AI Red Teaming Challenge: Transparency Report 2024
- In-The-Wild Jailbreak Prompts on LLMs
- Learn Prompting, Prompt Hacking
- leeky: Leakage/contamination testing for black box language models
- LLM Security & Privacy
- Membership Inference Attacks and Defenses on Machine Learning Models Literature
- Lakera AI's Gandalf
- leondz / garak
- Microsoft AI Red Team building future of safer AI
- OpenAI Red Teaming Network
- r/ChatGPTJailbreak
- Y Combinator, ChatGPT Grandma Exploit
Generative AI Explainability
- AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
- Anthropic
- Attention Is All You Need
- Backpack Language Models
- Jay Alammar
- Neuronpedia
- Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
Challenges and Competitions
This section contains challenges and competitions related to responsible ML.
- FICO Explainable Machine Learning Challenge
- OSD Bias Bounty
- National Fair Housing Alliance Hackathon
- Twitter Algorithmic Bias
Education Resources
This section collects courses, curricula, syllabi, tutorials, and teaching materials that help readers learn about responsible machine learning, AI assessment, AI safety, technical evaluation, interpretability, auditing, implementation practice, and related topics.
These resources are included here because they are primarily pedagogical or training-oriented. They teach readers how to understand, evaluate, build, test, or implement AI systems responsibly.
For university policies, syllabus-use rules, academic integrity statements, classroom AI guidance, and institutional teaching policies, see the companion branch: AI Governance and Policy Resources.
Courses and Curricula
AI Safety and LLM Safety Courses
- AI Safety, Ethics, and Society Virtual Course | Dan Hendrycks and others
- Introduction to ML Safety
- BlueDot Impact Courses
- CS 2881: AI Safety | Harvard University
- COS 597Q: AI Safety | Princeton University
- CS 194/294-267: Understanding Large Language Models: Foundations and Safety | University of California, Berkeley
Responsible AI, Ethics, and Public Policy Courses
- CS 281: Ethics of Artificial Intelligence | Stanford University
- CS 182: Ethics, Public Policy, and Technological Change | Stanford University
- CS 181/181W: Computers, Ethics, and Public Policy | Stanford University
- CS 21SI: AI for Social Good | Stanford University
- Responsible AI, Law, Ethics & Society | University of California, Berkeley
- Responsible AI in Practice | Stanford University
- CS 109: The Essentials of AI for Life and Society | University of Texas at Austin
Embedded Ethics Teaching Materials
- Embedded EthiCS Module Repository | Harvard University
- Stanford Embedded Ethics | Stanford University
AI Literacy and K–12 Curricula
- RAISE: Responsible AI for Social Empowerment and Education | MIT OpenCourseWare
- Day of AI Curriculum Resources | MIT RAISE
- AI4K12
- AI + Ethics for Middle School | AI4K12
Syllabus Collections
- Tech Ethics Curricula: A Collection of Syllabi | Casey Fiesler
Comprehensive Software Examples and Tutorials
This section is a curated collection of guides and tutorials for implementing responsible ML. The material ranges from basic model interpretability to advanced fairness techniques and is suitable for both novices and experts. Topics include COMPAS fairness analyses and explainable machine learning via counterfactuals.
- COMPAS Analysis Using Aequitas
- Explaining Quantitative Measures of Fairness with SHAP
- Getting a Window into your Black Box Model
- H2O.ai
- From GLM to GBM Part 1
- From GLM to GBM Part 2
- IML
- Interpretable Machine Learning with Python
- Interpreting Machine Learning Models with the iml Package
- Interpretable Machine Learning using Counterfactuals
- Machine Learning Explainability by Kaggle Learn
- Model Interpretability with DALEX
- Model Interpretation series by Dipanjan (DJ) Sarkar
- Partial Dependence Plots in R
- PiML
- Reliable-and-Trustworthy-AI-Notebooks
- Saliency Maps for Deep Learning
- Visualizing ML Models with LIME
- Visualizing and debugging deep convolutional networks
- What does a CNN see?
Free-ish Books
This section contains books that can be reasonably described as free, including some "historical" books dealing broadly with ethical and responsible tech.
- Adversarial Model Analysis | Przemyslaw Biecek, 2023
- An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI | Patrick Hall and Navdeep Gill, 2019, Second Edition
- Artificial Intelligence and Fundamental Rights: The AI Act of the European Union and its implications for global technology regulation | Trier Studies on Digital Law, Volume 4
- Case Studies in Information and Computer Ethics | Richard A. Spinello, 1997
- Case Studies in Information Technology Ethics | Richard A. Spinello, 2003, Second Edition
- Computer and Information Ethics | Marsha Cook Woodbury, 2003
- Computer Ethics: Analyzing Information Technology | Deborah G. Johnson and Keith W. Miller, 2009, Fourth Edition
- Computer Power and Human Reason: From Judgment to Calculation | Joseph Weizenbaum, 1976
- Computers, Ethics, and Society | M. David Ermann, Mary B. Williams, and Claudio Gutierrez, 1990
- Controlling Technology: Ethics and the Responsible Engineer | Stephen H. Unger, 1982, First Edition
- Controlling Technology: Ethics and the Responsible Engineer | Stephen H. Unger, 1994, Second Edition
- Ethical Aspects of Information Technology | Richard A. Spinello, 1995
- Ethics for people who work in tech
- Ethics in Information Technology | George Reynolds, 2002, Instructor's Edition
- Ethics in Information Technology | George Reynolds, 2002
- Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models. With examples in R and Python | Przemyslaw Biecek and Tomasz Burzykowski, 2020
- Fairness and Machine Learning: Limitations and Opportunities | Solon Barocas, Moritz Hardt, and Arvind Narayanan, 2022
- Fueling Our Future: A Dialogue about Technology, Ethics, Public Policy, and Remedial Action | Ed Dreby and Keith Helmuth, contributors, and Judy Lumb, editor, 2009
- How Humans Judge Machines | César A. Hidalgo, Diana Orghian, Jordi Albo-Canals, Filipa de Almeida, and Natalia Martin, 2021
- Information Technology Ethics: Cultural Perspectives | Soraj Hongladarom and Charles Ess, 2007
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable | Christoph Molnar, 2021
- Normal Accidents: Living with High-Risk Technologies with a New Afterword and a Postscript on the Y2K Problem | Charles Perrow, 1999
- Normal Accidents: Living with High-Risk Technologies | Charles Perrow, 1984
- Regulating under Uncertainty: Governance Options for Generative AI | Florence G'sell
- Responsible Machine Learning: Actionable Strategies for Mitigating Risks & Driving Adoption | Patrick Hall, Navdeep Gill, and Benjamin Cox, 2021
- Science and Technology Ethics | Raymond E. Spier (editor), 200
- Society, Ethics, and Technology | Morton E. Winston and Ralph D. Edelbach, 2003, Second Edition
- Society, Ethics, and Technology | Morton E. Winston and Ralph D. Edelbach, 2006, Third Edition
- Society, Ethics, and Technology | Morton E. Winston and Ralph D. Edelbach, 2000, First Edition
- The Cambridge Handbook of the Law, Ethics and Policy of Artificial Intelligence | Nathalie A. Smuha, ed., 2025
- Towards a Code of Ethics for Artificial Intelligence | Paula Boddington, 2017
- Trustworthy AI: African Perspectives | Damian Okaibedi Eke, Kutoma Wakunuma, Simisola Akintoye, and George Ogoh, eds., 2025
- Trustworthy Machine Learning: Concepts for Developing Accurate, Fair, Robust, Explainable, Transparent, Inclusive, Empowering, and Beneficial Machine Learning Systems | Kush R. Varshney, 2022
- Who Shall Live? Medicine, Technology, Ethics | Kenneth Vaux (editor), 1970
Glossaries and Dictionaries
This section features a collection of glossaries and dictionaries that are geared toward defining terms in ML, including some "historical" dictionaries.
- 50 AI terms every beginner should know | TELUS International
- A Glossary of AI Jargon: 29 AI Terms You Should Know | MakeUseOf
- A Multilingual Dictionary of Artificial Intelligence | Otto Vollnhals, 1992 (English, German, French, Spanish, Italian)
- A.I. For Anyone: The A-Z of AI
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations | National Institute of Standards and Technology (NIST), NIST AI 100-2 E2023
- AI dictionary: Be a native speaker of Artificial Intelligence | Dataconomy
- AI From A to Z: The Generative AI Glossary for Business Leaders | Salesforce
- AI Terms Glossary | Moveworks
- Appen Artificial Intelligence Glossary
- Artificial intelligence glossary | UK Parliament
- Artificial Intelligence Terms: A to Z Glossary | Coursera
- Artificial intelligence and illusions of understanding in scientific research | (glossary on second page)
- Artificial Intelligence Definitions | Stanford University HAI
- Artificial Intelligence Glossary | Siemens
- Artificial Intelligence Terminology: A Glossary for Beginners | CompTIA
- Brookings: The Brookings glossary of AI and emerging technologies
- Built In, Responsible AI Explained
- Center for Security and Emerging Technology: Glossary
- Collins Dictionary of Artificial Intelligence | Raoul Smith, 1990
- Council of Europe Artificial Intelligence Glossary
- Dictionary of Artificial Intelligence & Robotics | Jerry M. Rosenberg, 1986
- Dictionary of Artificial Intelligence | Dennis Mercadal, 1990
- Dictionary of Cognitive Science: Neuroscience, Psychology, Artificial Intelligence, Linguistics, and Philosophy | Oliver Houdé, 2004
- EU-U.S. Terminology and Taxonomy for Artificial Intelligence | European Commission, Second Edition
- G2: 70+ A to Z Artificial Intelligence Terms in Technology
- General Services Administration: AI Guide for Government: Key AI terminology
- Glossary for Discussion of Ethics of Autonomous and Intelligent Systems | IEEE, Version 1
- Glossary of artificial intelligence | Wikipedia
- Glossary of human-centric artificial intelligence | European Commission
- Google Developers Machine Learning Glossary
- H2O.ai Glossary
- IAPP
- IBM AI glossary
- International Dictionary of Artificial Intelligence | William J. Raynor, Jr, 2009, Second Edition
- ISO/IEC DIS 22989 Information technology — Artificial intelligence — Artificial intelligence concepts and terminology
- Lexicon | Chief Digital and Artificial Intelligence Office (CDAO)
- Open Access Vocabulary
- TechTarget: Artificial intelligence glossary: 60+ terms to know
- Terms from Artificial Intelligence: humans at the heart of algorithms
- The Alan Turing Institute: Data science and AI glossary
- The Facts on File Dictionary of Artificial Intelligence | Raoul Smith, 1989
- The International Dictionary of Artificial Intelligence | William J. Raynor, Jr, 1999, First Edition
- The Language of Trustworthy AI: An In-Depth Glossary of Terms | National Institute of Standards and Technology (NIST)
- The Machine Learning Dictionary | University of New South Wales, Bill Wilson
- Towards AI, Generative AI Terminology — An Evolving Taxonomy To Get You Started
- Vocabulary of AI Risks | VAIR
Open-ish Classes
This section features a selection of educational courses focused on ethical considerations and best practices in ML. The classes range from introductory courses on data ethics to specialized training in fairness and trustworthy deep learning.
- An Introduction to Data Ethics
- Awesome LLM Courses
- AWS Skill Builder
- Build a Large Language Model - From Scratch
- Certified Ethical Emerging Technologist
- Computational Ethics for NLP | Carnegie Mellon University
- CS 4910 - Special Topics in Computer Science: Algorithm Audits | Piotr Sapieżyński
- CS103F: Ethical Foundations of Computer Science
- Data Ethics course | Fast.ai
- DeepLearning.AI
- Disability-Centered AI And Ethics MOOC | OECD.AI
- ETH Zürich ReliableAI 2022 Course Project repository
- Fairness in Machine Learning
- Generative AI for Educators | Grow with Google
- Generative AI for Everyone | Coursera, DeepLearning.AI
- Generative AI with Large Language Models | Coursera, DeepLearning.AI
- Google Cloud Skills Boost
- Human-Centered Machine Learning
- IBM SkillsBuild
- INFO 4270: Ethics and Policy in Data Science
- Introduction to AI Ethics
- Introduction to Generative AI | Coursera, Google Cloud
- Introduction to Responsible Machine Learning
- Machine Learning Fairness by Google
- Prompt Engineering for ChatGPT | Coursera, Vanderbilt University
- Tech & Ethics Curricula
- Trustworthy Deep Learning
- Visualizing A Neural Machine Translation Model - Mechanics of Seq2seq Models With Attention | Jay Alammar
Course Syllabi
- AI Gov & Nat'l Policy '25 | Colin Shea-Blymyer's syllabus
Podcasts and Channels
This section features podcasts and channels (such as on YouTube) that offer insightful commentary and explanations on responsible AI and machine learning interpretability.
Technical Resources
Benchmarks and Evaluation Frameworks
This section contains benchmarks, and datasets used in benchmarks, for ML systems, particularly those related to responsible ML desiderata.
| Resource | Description |
|---|---|
| benchm-ml- | "A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.)." |
| Bias Benchmark for QA dataset-BBQ- | "Repository for the Bias Benchmark for QA dataset." |
| Cataloguing LLM Evaluations- | "This repository stems from our paper, 'Cataloguing LLM Evaluations,' and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers." |
| DecodingTrust- | "A Comprehensive Assessment of Trustworthiness in GPT Models." |
| EleutherAI, Language Model Evaluation Harness- | "A framework for few-shot evaluation of language models." |
| Evidently AI 100+ LLM benchmarks and evaluation datasets | "A database of LLM benchmarks and datasets to evaluate the performance of language models." |
| GEM | "GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics." |
| HELM | "A holistic framework for evaluating foundation models." |
| Hugging Face, evaluate- | "Evaluate: A library for easily evaluating machine learning models and datasets." |
| i-gallegos, Fair-LLM-Benchmark- | Benchmark from "Bias and Fairness in Large Language Models: A Survey" |
| Inspect AI- | "Inspect: A framework for large language model evaluations." |
| jphall663, Generative AI Risk Management Resources- | "A place for ideas and drafts related to GAI risk management." |
| MLCommons, AI Luminate: A collaborative, transparent approach to safer AI | "The AILuminate v1.1 benchmark suite is the first AI risk assessment benchmark developed with broad involvement from leading AI companies, academia, and civil society." |
| MLCommons, Introducing v0.5 of the AI Safety Benchmark from MLCommons | A paper about the MLCommons AI Safety Benchmark v0.5. |
| MLCommons, MLCommons AI Safety v0.5 Proof of Concept | "The MLCommons AI Safety Benchmark aims to assess the safety of AI systems in order to guide development, inform purchasers and consumers, and support standards bodies and policymakers." |
| ML.ENERGY Leaderboard | "Large language models (LLMs), especially the instruction-tuned ones, can generate human-like responses to chat prompts. Using Zeus for energy measurement, we created a leaderboard for LLM chat energy consumption." |
| ModelSlant.com | "How politically slanted are Large Language Models?" |
| Nvidia MLPerf | "MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services." |
| OpenML Benchmarking Suites | OpenML's collection of over two dozen benchmarking suites. |
| Real Toxicity Prompts - Allen Institute for AI | "A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models." |
| SafetyPrompts.com | "A Living Catalogue of Open Datasets for LLM Safety." |
| Sociotechnical Safety Evaluation Repository | An extensive spreadsheet of sociotechnical safety evaluations. |
| Trust-LLM-Benchmark Leaderboard | A series of sortable leaderboards of LLMs based on different trustworthiness criteria. |
| TrustLLM-Benchmark | "A Comprehensive Study of Trustworthiness in Large Language Models." |
| TruthfulQA- | "TruthfulQA: Measuring How Models Imitate Human Falsehoods." |
| WAVES: Benchmarking the Robustness of Image Watermarks | "This paper investigates the weaknesses of image watermarking techniques." |
| Wild-Time: A Benchmark of in-the-Wild Distribution Shifts over Time- | "Benchmark for Natural Temporal Distribution Shift (NeurIPS 2022)." |
| Winogender Schemas- | "Data for evaluating gender bias in coreference resolution systems." |
| yandex-research - tabred- | "A Benchmark of Tabular Machine Learning in-the-Wild with real-world industry-grade tabular datasets." |
Common or Useful Datasets
This section contains datasets that are commonly used in responsible ML evaluations, as well as repositories of interesting or important data sources.
- A dataset on EU legislation for the digital world | Bruegel
- Adult income dataset
- Balanced Faces in the Wild
- COMPAS Recidivism Risk Score Data and Analysis
- FANNIE MAE Single Family Loan Performance
- German Credit Data | Statlog
- Have I Been Trained?
- nikhgarg / EmbeddingDynamicStereotypes
- NYPD Stop, Question and Frisk Data
- Presidential Deepfakes Dataset
- socialfoundations / folktables
- Wikipedia Talk Labels: Personal Attacks
Domain-specific Software
This section curates specialized software tools for responsible ML within specific domains, such as healthcare, finance, or the social sciences.
Machine Learning Environment Management Tools
This section contains open source or open access ML environment management software.
| Resource | Description |
|---|---|
| dvc | "Manage and version images, audio, video, and text files in storage and organize your ML modeling process into a reproducible workflow." |
| gigantum- | "Building a better way to create, collaborate, and share data-driven science." |
| mlflow | "An open source platform for the machine learning lifecycle." |
| mlmd- | "For recording and retrieving metadata associated with ML developer and data scientist workflows." |
| modeldb- | "Open Source ML Model Versioning, Metadata, and Experiment Management." |
| neptune | "A single place to manage all your model metadata." |
| Opik- | "Evaluate, test, and ship LLM applications across your dev and production lifecycles." |
Personal Data Protection Tools
This section contains tools for personal data protection.
| Name | Description |
|---|---|
| LLM Dataset Inference: Did you train on my dataset?- | "Official Repository for Dataset Inference for LLMs" |
Open Source/Access Responsible AI Software Packages
This section contains open source or open access software used to implement responsible ML. As much as possible, descriptions are quoted verbatim from the respective repositories themselves. In rare instances, we provide our own descriptions (unmarked by quotes).
Browser
| Name | Description |
|---|---|
| DiscriLens- | "Discrimination in Machine Learning." |
| Hugging Face, BiasAware: Dataset Bias Detection | "BiasAware is a specialized tool for detecting and quantifying biases within datasets used for Natural Language Processing (NLP) tasks." |
| manifold- | "A model-agnostic visual debugging tool for machine learning." |
| PAIR-code - datacardsplaybook- | "The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation." |
| PAIR-code - facets- | "Visualizations for machine learning datasets." |
| PAIR-code - knowyourdata- | "A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues." |
| TensorBoard Projector | "Using the TensorBoard Embedding Projector, you can graphically represent high dimensional embeddings. This can be helpful in visualizing, examining, and understanding your embedding layers." |
| What-if Tool | "Visually probe the behavior of trained machine learning models, with minimal coding." |
C/C++
| Name | Description |
|---|---|
| Born-again Tree Ensembles- | "Born-Again Tree Ensembles: Transforms a random forest into a single, minimal-size, tree with exactly the same prediction function in the entire feature space (ICML 2020)." |
| Certifiably Optimal RulE ListS- | "CORELS is a custom discrete optimization technique for building rule lists over a categorical feature space." |
| Secure-ML- | "Secure Linear Regression in the Semi-Honest Two-Party Setting." |
JavaScript
| Name | Description |
|---|---|
| LDNOOBW- | "List of Dirty, Naughty, Obscene, and Otherwise Bad Words" |
Python
| Name | Description |
|---|---|
| acd- | "Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Official code for Hierarchical interpretations for neural network predictions.” |
| aequitas- | "Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.” |
| AI Explainability 360- | "Interpretability and explainability of data and machine learning models.” |
| AI Fairness 360- | "A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.” |
| ALEPython- | "Python Accumulated Local Effects package.” |
| Aletheia- | "A Python package for unwrapping ReLU DNNs.” |
| algofairness- | See Algorithmic Fairness. |
| Alibi- | "Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models.” |
| allennlp- | "An open-source NLP research library, built on PyTorch.” |
| anchor- | "Code for 'High-Precision Model-Agnostic Explanations' paper.” |
| Bayesian Case Model | |
| Bayesian Ors-Of-Ands- | "This code implements the Bayesian or-of-and algorithm as described in the BOA paper. We include the tictactoe dataset in the correct formatting to be used by this code.” |
| Bayesian Rule List - BRL | Rudin group at Duke Bayesian rule list implementation |
| BlackBoxAuditing- | "Research code for auditing and exploring black box machine-learning models.” |
| CalculatedContent, WeightWatcher- | "The WeightWatcher tool for predicting the accuracy of Deep Neural Networks." |
| captum- | "Model interpretability and understanding for PyTorch.” |
| casme- | "contains the code originally forked from the ImageNet training in PyTorch that is modified to present the performance of classifier-agnostic saliency map extraction, a practical algorithm to train a classifier-agnostic saliency mapping by simultaneously training a classifier and a saliency mapping.” |
| Causal Discovery Toolbox- | "Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.” |
| causalml- | "Uplift modeling and causal inference with machine learning algorithms.” |
| cdt15, Causal Discovery Lab., Shiga University- | "LiNGAM is a new method for estimating structural equation models or linear causal Bayesian networks. It is based on using the non-Gaussianity of the data." |
| checklist- | "Beyond Accuracy: Behavioral Testing of NLP models with CheckList.” |
| cleverhans- | "An adversarial example library for constructing attacks, building defenses, and benchmarking both.” |
| contextual-AI- | "Contextual AI adds explainability to different stages of machine learning pipelines |
| ContrastiveExplanation - Foil Trees- | "provides an explanation for why an instance had the current outcome (fact) rather than a targeted outcome of interest (foil). These counterfactual explanations limit the explanation to the features relevant in distinguishing fact from foil, thereby disregarding irrelevant features.” |
| counterfit- | "a CLI that provides a generic automation layer for assessing the security of ML models.” |
| dalex- | "moDel Agnostic Language for Exploration and eXplanation.” |
| debiaswe- | "Remove problematic gender bias from word embeddings.” |
| DeepExplain- | "provides a unified framework for state-of-the-art gradient and perturbation-based attribution methods. It can be used by researchers and practitioners for better undertanding the recommended existing models, as well for benchmarking other attribution methods.” |
| DeepLIFT- | "This repository implements the methods in 'Learning Important Features Through Propagating Activation Differences' by Shrikumar, Greenside & Kundaje, as well as other commonly-used methods such as gradients, gradient-times-input (equivalent to a version of Layerwise Relevance Propagation for ReLU networks), guided backprop and integrated gradients.” |
| deepvis- | "the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization.” |
| DIANNA- | "DIANNA is a Python package that brings explainable AI (XAI) to your research project. It wraps carefully selected XAI methods in a simple, uniform interface. It's built by, with and for (academic) researchers and research software engineers working on machine learning projects.” |
| DiCE- | "Generate Diverse Counterfactual Explanations for any machine learning model.” |
| DoWhy- | "DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.” |
| dtreeviz- | "A python library for decision tree visualization and model interpretation.” |
| ecco- | "Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTA, T5, and T0).” |
| effector- | "eXplainable AI for Tabular Data" |
| eli5- | "A library for debugging/inspecting machine learning classifiers and explaining their predictions.” |
| explabox- | "aims to support data scientists and machine learning (ML) engineers in explaining, testing and documenting AI/ML models, developed in-house or acquired externally. The explabox turns your ingestibles (AI/ML model and/or dataset) into digestibles (statistics, explanations or sensitivity insights).” |
| Explainable Boosting Machine EBM/GA2M- | "an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions.” |
| ExplainaBoard- | "a tool that inspects your system outputs, identifies what is working and what is not working, and helps inspire you with ideas of where to go next.” |
| explainerdashboard- | "Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.” |
| explainX- | "Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code.” |
| fair-classification- | "Python code for training fair logistic regression classifiers.” |
| fairlearn- | "a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues. Fairlearn contains mitigation algorithms as well as metrics for model assessment. Besides the source code, this repository also contains Jupyter notebooks with examples of Fairlearn usage.” |
| fairml- | "a python toolbox auditing the machine learning models for bias.” |
| fairness_measures_code- | "contains implementations of measures used to quantify discrimination.” |
| fairness-comparison- | "meant to facilitate the benchmarking of fairness aware machine learning algorithms.” |
| Falling Rule List - FRL | Rudin group at Duke falling rule list implementation |
| foolbox- | "A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX.” |
| Giskard- | "The testing framework dedicated to ML models, from tabular to LLMs. Scan AI models to detect risks of biases, performance issues and errors. In 4 lines of code.” |
| gplearn- | "implements Genetic Programming in Python, with a scikit-learn inspired and compatible API.” |
| Grad-CAM-(GitHub topic) | Grad-CAM is a technique for making convolutional neural networks more transparent by visualizing the regions of input that are important for predictions in computer vision models. |
| H2O-3 Monotonic GBM | "Builds gradient boosted classification trees and gradient boosted regression trees on a parsed data set." |
| H2O-3 Penalized Generalized Linear Models | "Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution." |
| H2O-3 Sparse Principal Components | "Builds a generalized low rank decomposition of an H2O data frame." |
| h2o-LLM-eval- | "Large-language Model Evaluation framework with Elo Leaderboard and A-B testing." |
| hate-functional-tests- | HateCheck: A dataset and test suite from an ACL 2021 paper, offering functional tests for hate speech detection models, including extensive case annotations and testing functionalities. |
| imodels- | "Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use.” |
| iNNvestigate neural nets- | A comprehensive Python library to analyze and interpret neural network behaviors in Keras, featuring a variety of methods like Gradient, LRP, and Deep Taylor. |
| Integrated-Gradients- | "a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).” |
| interpret_with_rules- | "induces rules to explain the predictions of a trained neural network, and optionally also to explain the patterns that the model captures from the training data, and the patterns that are present in the original dataset.” |
| interpret- | "an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.” |
| InterpretME- | "integrates knowledge graphs (KG) with machine learning methods to generate interesting meaningful insights. It helps to generate human- and machine-readable decisions to provide assistance to users and enhance efficiency.” |
| keract- | Keract is a tool for visualizing activations and gradients in Keras models; it's meant to support a wide range of Tensorflow versions and to offer an intuitive API with Python examples. |
| Keras-vis- | "a high-level toolkit for visualizing and debugging your trained keras neural net models.” |
| L2X- | "Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation at ICML 2018, by Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan.” |
| LangFair- | "LangFair is a Python library for conducting use-case level LLM bias and fairness assessments" |
| langtest- | "LangTest: Deliver Safe & Effective Language Models" |
| learning-fair-representations- | "Python numba implementation of Zemel et al. 2013 http://www.cs.toronto.edu/~toni/Papers/icml-final.pdf" |
| leeky: Leakage/contamination testing for black box language models- | "leeky - training data contamination techniques for blackbox models" |
| leondz / garak, LLM vulnerability scanner- | "LLM vulnerability scanner" |
| LiFT- | "The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness and the mitigation of bias in large-scale machine learning workflows. The measurement module includes measuring biases in training data, evaluating fairness metrics for ML models, and detecting statistically significant differences in their performance across different subgroups.” |
| lilac- | "Curate better data for LLMs." |
| lime- | "explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations).” |
| lit- | "The Learning Interpretability Tool (LIT, formerly known as the Language Interpretability Tool) is a visual, interactive ML model-understanding tool that supports text, image, and tabular data. It can be run as a standalone server, or inside of notebook environments such as Colab, Jupyter, and Google Cloud Vertex AI notebooks.” |
| LLM Dataset Inference: Did you train on my dataset?- | "Official Repository for Dataset Inference for LLMs" |
| lofo-importance- | "LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.” |
| lrp_toolbox- | "The Layer-wise Relevance Propagation (LRP) algorithm explains a classifer's prediction specific to a given data point by attributing relevance scores to important components of the input by using the topology of the learned model itself.” |
| MindsDB- | "enables developers to build AI tools that need access to real-time data to perform their tasks.” |
| ml_privacy_meter- | "an open-source library to audit data privacy in statistical and machine learning algorithms. The tool can help in the data protection impact assessment process by providing a quantitative analysis of the fundamental privacy risks of a (machine learning) model.” |
| ml-fairness-gym- | "a set of components for building simple simulations that explore the potential long-run impacts of deploying machine learning-based decision systems in social environments.” |
| mlxtend | "Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks." |
| mllp- | "This is a PyTorch implementation of Multilayer Logical Perceptrons (MLLP) and Random Binarization (RB) method to learn Concept Rule Sets (CRS) for transparent classification tasks, as described in our paper: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization.” |
| Monotonic Constraints | Guide on implementing and understanding monotonic constraints in XGBoost models to enhance predictive performance with practical Python examples. |
| Multilayer Logical Perceptron - MLLP- | "This is a PyTorch implementation of Multilayer Logical Perceptrons (MLLP) and Random Binarization (RB) method to learn Concept Rule Sets (CRS) for transparent classification tasks, as described in our paper: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization.” |
| OptBinning- | "a library written in Python implementing a rigorous and flexible mathematical programming formulation to solve the optimal binning problem for a binary, continuous and multiclass target type, incorporating constraints not previously addressed.” |
| Optimal Sparse Decision Trees- | "This accompanies the paper, "Optimal Sparse Decision Trees" by Xiyang Hu, Cynthia Rudin, and Margo Seltzer.” |
| parity-fairness | "This repository contains codes that demonstrate the use of fairness metrics, bias mitigations and explainability tool.” |
| PDPbox- | "Python Partial Dependence Plot toolbox. Visualize the influence of certain features on model predictions for supervised machine learning algorithms, utilizing partial dependence plots.” |
| PiML-Toolbox- | "a new Python toolbox for interpretable machine learning model development and validation. Through low-code interface and high-code APIs, PiML supports a growing list of inherently interpretable ML models.” |
| pjsaelin / Cubist- | "A Python package for fitting Quinlan's Cubist regression model" |
| Privacy-Preserving-ML- | "Implementation of privacy-preserving SVM assuming public model private data scenario (data in encrypted but model parameters are unencrypted) using adequate partial homomorphic encryption.” |
| ProtoPNet- | "This code package implements the prototypical part network (ProtoPNet) from the paper "This Looks Like That: Deep Learning for Interpretable Image Recognition" (to appear at NeurIPS 2019), by Chaofan Chen (Duke University), Oscar Li |
| pyBreakDown- | See dalex. |
| PyCEbox- | "Python Individual Conditional Expectation Plot Toolbox.” |
| pyGAM- | "Generalized Additive Models in Python.” |
| pymc3- | "PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.” |
| pySS3- | "The SS3 text classifier is a novel and simple supervised machine learning model for text classification which is interpretable, that is, it has the ability to naturally (self)explain its rationale.” |
| pytorch-grad-cam- | "a package with state of the art methods for Explainable AI for computer vision. This can be used for diagnosing model predictions, either in production or while developing models. The aim is also to serve as a benchmark of algorithms and metrics for research of new explainability methods.” |
| pytorch-innvestigate- | "PyTorch implementation of Keras already existing project: https://github.com/albermax/innvestigate/.” |
| Quantus- | "Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations." |
| rationale- | "This directory contains the code and resources of the following paper: "Rationalizing Neural Predictions". Tao Lei, Regina Barzilay and Tommi Jaakkola. EMNLP 2016. PDF Slides. The method learns to provide justifications, i.e. rationales, as supporting evidence of neural networks' prediction.” |
| responsibly- | "Toolkit for Auditing and Mitigating Bias and Fairness of Machine Learning Systems.” |
| REVISE: REvealing VIsual biaSEs- | "A tool that automatically detects possible forms of bias in a visual dataset along the axes of object-based, attribute-based, and geography-based patterns, and from which next steps for mitigation are suggested.” |
| RISE- | "contains source code necessary to reproduce some of the main results in the paper: Vitali Petsiuk, Abir Das, Kate Saenko (BMVC, 2018) [and] RISE: Randomized Input Sampling for Explanation of Black-box Models.” |
| Risk-SLIM- | "a machine learning method to fit simple customized risk scores in python.” |
| robustness- | "a package we (students in the MadryLab) created to make training, evaluating, and exploring neural networks flexible and easy.” |
| SAGE- | "SAGE (Shapley Additive Global importancE) is a game-theoretic approach for understanding black-box machine learning models. It quantifies each feature's importance based on how much predictive power it contributes, and it accounts for complex feature interactions using the Shapley value.” |
| SALib- | "Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the effects of model inputs or exogenous factors on outputs of interest.” |
| Scikit-Explain | "User-friendly Python module for machine learning explainability," featuring PD and ALE plots, LIME, SHAP, permutation importance and Friedman's H, among other methods. |
| scikit-fairness- | Historical link. Merged with fairlearn. |
| Scikit-learn Decision Trees | "a non-parametric supervised learning method used for classification and regression.” |
| Scikit-learn Generalized Linear Models | "a set of methods intended for regression in which the target value is expected to be a linear combination of the features.” |
| Scikit-learn Sparse Principal Components | "a variant of [principal component analysis, PCA], with the goal of extracting the set of sparse components that best reconstruct the data.” |
| scikit-multiflow | "a machine learning package for streaming data in Python.” |
| shap- | "a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions" |
| shapley- | "a Python library for evaluating binary classifiers in a machine learning ensemble.” |
| sklearn-expertsys- | "a scikit-learn compatible wrapper for the Bayesian Rule List classifier developed by Letham et al., 2015, extended by a minimum description length-based discretizer (Fayyad & Irani, 1993) for continuous data, and by an approach to subsample large datasets for better performance.” |
| skope-rules- | "a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.” |
| solas-ai-disparity- | "a collection of tools that allows modelers, compliance, and business stakeholders to test outcomes for bias or discrimination using widely accepted fairness metrics.” |
| Super-sparse Linear Integer models - SLIMs- | "a package to learn customized scoring systems for decision-making problems.” |
| tensorflow/fairness-indicators- | "designed to support teams in evaluating, improving, and comparing models for fairness concerns in partnership with the broader Tensorflow toolkit.” |
| tensorflow/lattice- | "a library that implements constrained and interpretable lattice based models. It is an implementation of Monotonic Calibrated Interpolated Look-Up Tables in TensorFlow.” |
| tensorflow/lucid- | "a collection of infrastructure and tools for research in neural network interpretability.” |
| tensorflow/model-analysis- | "a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. These metrics can be computed over different slices of data and visualized in Jupyter notebooks.” |
| tensorflow/model-card-toolkit- | "streamlines and automates generation of Model Cards, machine learning documents that provide context and transparency into a model's development and performance. Integrating the MCT into your ML pipeline enables you to share model metadata and metrics with researchers, developers, reporters, and more.” |
| tensorflow/model-remediation- | "a library that provides solutions for machine learning practitioners working to create and train models in a way that reduces or eliminates user harm resulting from underlying performance biases.” |
| tensorflow/privacy- | "the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided.” |
| tensorflow/tcav- | "Testing with Concept Activation Vectors (TCAV) is a new interpretability method to understand what signals your neural networks models uses for prediction.” |
| tensorfuzz- | "a library for performing coverage guided fuzzing of neural networks.” |
| TensorWatch- | "a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data.” |
| text_explainability | "text_explainability provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed.” |
| text_sensitivity | "Uses the generic architecture of text_explainability to also include tests of safety (how safe [is] the model in production, i.e. types of inputs it can handle), robustness (how generalizable the model is in production, e.g. stability when adding typos, or the effect of adding random unrelated data) and fairness (if equal individuals are treated equally by the model, e.g. subgroup fairness on sex and nationality).” |
| TextFooler- | "A Model for Natural Language Attack on Text Classification and Inference" |
| tf-explain- | "Implements interpretability methods as Tensorflow 2.x callbacks to ease neural network's understanding.” |
| themis-ml- | "A Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms.” |
| Themis- | "A testing-based approach for measuring discrimination in a software system.” |
| TorchUncertainty- | "A package designed to help you leverage uncertainty quantification techniques and make your deep neural networks more reliable.” |
| treeinterpreter- | "Package for interpreting scikit-learn's decision tree and random forest predictions.” |
| TRIAGE- | "This repository contains the implementation of TRIAGE, a "Data-Centric AI" framework for data characterization tailored for regression.” |
| woe- | "Tools for WoE Transformation mostly used in ScoreCard Model for credit rating.” |
| xai- | "A Machine Learning library that is designed with AI explainability in its core.” |
| xdeep- | "An open source Python library for Interpretable Machine Learning.” |
| XGBoost | "an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.” |
| xplique- | "A Python toolkit dedicated to explainability. The goal of this library is to gather the state of the art of Explainable AI to help you understand your complex neural network models.” |
| ydata-profiling- | "Provide(s) a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution.” |
| yellowbrick- | "A suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process.” |
R
| Name | Description |
|---|---|
| ALEPlot | "Visualizes the main effects of individual predictor variables and their second-order interaction effects in black-box supervised learning models." |
| arules | "Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005)." |
| Causal SVM- | "We present a new machine learning approach to estimate whether a treatment has an effect on an individual, in the setting of the classical potential outcomes framework with binary outcomes." |
| DALEX- | "moDel Agnostic Language for Exploration and eXplanation." |
| DALEXtra: Extension for 'DALEX' Package | "Provides wrapper of various machine learning models." |
| DrWhyAI- | "DrWhy is [a] collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models." |
| elasticnet | "Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for doing sparse PCA." |
| Explainable Boosting Machine - EBM/GA2M | "Package for training interpretable machine learning models." |
| ExplainPrediction- | "Generates explanations for classification and regression models and visualizes them." |
| fairmodels- | "Flexible tool for bias detection, visualization, and mitigation. Use models explained with DALEX and calculate fairness classification metrics based on confusion matrices using fairness_check() or try newly developed module for regression models using fairness_check_regression()." |
| fairness | "Offers calculation, visualization and comparison of algorithmic fairness metrics." |
| fastshap- | "The goal of fastshap is to provide an efficient and speedy approach (at least relative to other implementations) for computing approximate Shapley values, which help explain the predictions from any machine learning model." |
| featureImportance- | "An extension for the mlr package and allows to compute the permutation feature importance in a model-agnostic manner." |
| flashlight- | "The goal of this package is [to] shed light on black box machine learning models." |
| forestmodel | "Produces forest plots using 'ggplot2' from models produced by functions such as stats::lm(), stats::glm() and survival::coxph()." |
| fscaret | "Automated feature selection using variety of models provided by 'caret' package." |
| gam | "Functions for fitting and working with generalized additive models, as described in chapter 7 of "Statistical Models in S" (Chambers and Hastie (eds), 1991), and "Generalized Additive Models" (Hastie and Tibshirani, 1990)." |
| glm2 | "Fits generalized linear models using the same model specification as glm in the stats package, but with a modified default fitting method that provides greater stability for models that may fail to converge using glm." |
| glmnet | "Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression." |
| H2O-3 Monotonic GBM | "Builds gradient boosted classification trees and gradient boosted regression trees on a parsed data set." |
| H2O-3 Penalized Generalized Linear Models | "Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution." |
| H2O-3 Sparse Principal Components | "Builds a generalized low rank decomposition of an H2O data frame." |
| iBreakDown- | "A model agnostic tool for explanation of predictions from black boxes ML models." |
| ICEbox: Individual Conditional Expectation Plot Toolbox | "Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm." |
| iml- | "An R package that interprets the behavior and explains predictions of machine learning models." |
| ingredients- | "A collection of tools for assessment of feature importance and feature effects." |
| interpret: Fit Interpretable Machine Learning Models | "Package for training interpretable machine learning models." |
| lightgbmExplainer- | "An R package that makes LightGBM models fully interpretable." |
| lime- | "R port of the Python lime package." |
| live | "Helps to understand key factors that drive the decision made by complicated predictive model (black box model)." |
| mcr- | "An R package for Model Reliance and Model Class Reliance." |
| modelDown | "Website generator with HTML summaries for predictive models." |
| modelOriented- | GitHub repositories of Warsaw-based MI².AI. |
| modelStudio- | "Automates the explanatory analysis of machine learning predictive models." |
| Monotonic XGBoost | Enforces consistent, directional relationships between features and predicted outcomes, enhancing model performance by aligning with prior data expectations. |
| quantreg | "Estimation and inference methods for models for conditional quantile functions." |
| rpart | "Recursive partitioning for classification, regression and survival trees." |
| RuleFit | "Implements the learning method and interpretational tools described in Predictive Learning via Rule Ensembles." |
| Scalable Bayesian Rule Lists - SBRL | A more scalable implementation of Bayesian rule list from the Rudin group at Duke. |
| shapFlex- | Computes stochastic Shapley values for machine learning models to interpret them and evaluate fairness, including causal constraints in the feature space. |
| shapleyR- | "An R package that provides some functionality to use mlr tasks and models to generate shapley values." |
| shapper | "Provides SHAP explanations of machine learning models." |
| smbinning | "A set of functions to build a scoring model from beginning to end." |
| vip- | "An R package for constructing variable importance plots (VIPs)." |
| xgboostExplainer- | "An R package that makes xgboost models fully interpretable." |
Archived
Official Policy, Frameworks, and Guidance
For official government files pertaining to responsible AI practices that have been taken offline, we provide Wayback Machine mirror links below. If a document is still available on its original official domain, it can currently be found in its respective subsection above, although it may later be incorporated into this list. Documents may be removed for various reasons (whether political or through routine updates), but archiving them ensures they remain accessible for historical reference. If you're a researcher who finds a dead link to an older version of a government document or one that has altogether been deleted without comment, please feel free to submit a pull request drawing our attention to it and we'll consider it for inclusion. Where possible, we provide links to what appear to be the most recent URLs that governments may want the public to access.
- Artificial Intelligence and Worker Well-Being: Principles and Best Practices for Developers and Employers | United States, Department of Labor, archived February 5, 2025
- Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People, HTML | United States, The White House, Office of Science and Technology Policy, October 4, 2022, archived January 20, 2025
- Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People, PDF | United States, The White House, Office of Science and Technology Policy, October 4, 2022, archived January 20, 2025
- CISA Roadmap for Artificial Intelligence 2023-2024 | United States, Cybersecurity and Infrastructure Security Agency, November 2023
- Data Availability and Transparency Act 2022 | Australia, Office of the National Data Commissioner, April 1, 2022, archived March 14, 2024
- Developing Financial Sector Resilience in a Digital World: Selected Themes in Technology and Related Risks | Canada, Office of the Superintendent of Financial Institutions of Canada, September 2020, archived August 2, 2023
- Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | United States, The White House, October 30, 2023, archived January 20, 2025
- FACT SHEET: Biden-Harris Administration Announces New AI Actions and Receives Additional Major Voluntary Commitment on AI | United States, The White House, July 26, 2024, archived January 20, 2025
- FACT SHEET: Biden-Harris Administration Outlines Coordinated Approach to Harness Power of AI for U.S. National Security | United States, The White House, October 24, 2024, archived January 19, 2025
- FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI | United States, The White House, July 21, 2023, archived January 20, 2025
- FACT SHEET: Biden-Harris Administration Takes New Steps to Advance Responsible Artificial Intelligence Research, Development, and Deployment | United States, The White House, May 23, 2023, archived January 17, 2025
- FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence | United States, The White House, October 30, 2023, archived January 18, 2025
- Federal Register of Legislation, Data Availability and Transparency Act 2022
- Generative Artificial Intelligence Lexicon | United States, Department of Defense, Chief Digital and Artificial Intelligence Office (CDAO), archived September 26, 2024
- Generative Artificial Intelligence Risk Assessment SIMM 5305-F | State of California, Department of Technology, Office of Information Security, March 2024, archived May 24, 2024
- Guidelines on the Application of Republic Act No. 10173 or the Data Privacy Act of 2012 (DPA), Its Implementing Rules and Regulations, and the Issuances of the Commission to Artificial Intelligence Systems Processing Personal Data (NPC Advisory No. 2024-04) | Philippines, National Privacy Commission, December 19, 2024, archived January 12, 2025
- Introducing the DATA Scheme
- M-21-06 Memorandum for the Heads of Executive Departments and Agencies, Guidance for Regulation of Artificial Intelligence Applications | United States, Executive Office of the President, Office of Management and Budget, November 17, 2020, archived January 18, 2025
- M-24-18 Memorandum for the Heads of Executive Departments and Agencies, Advancing the Responsible Acquisition of Artificial Intelligence in Government | United States, Executive Office of the President, Office of Management and Budget, September 24, 2024, archived January 18, 2025
- Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence | United States, The White House, October 24, 2024, archived January 16, 2025
- National Artificial Intelligence Research and Development Strategic Plan 2023 Update | United States, Executive Office of the President, National Science and Technology Council, Select Committee on Artificial Intelligence, May 2023, archived January 16, 2025
- National Science and Technology Council | United States, The White House, Office of Science and Technology Policy, January 16, 2021, archived January 18, 2025
- Office of Science and Technology Policy | United States, The White House, Office of Science and Technology Policy, January 13, 2021, archived January 20, 2025
- Supervisory Guidance on Model Risk Management | United States, Federal Deposit Insurance Corporation, archived February 13, 2024
- Aiming for truth, fairness, and equity in your company’s use of AI | United States, Federal Trade Commission, Elisa Jillson, April 19, 2021, archived January 17, 2025
- Using Artificial Intelligence and Algorithms | United States, Federal Trade Commission, Andrew Smith, April 8, 2020, archived January 15, 2024
- Validation of Employee Selection Procedures | Office of Federal Contract Compliance Programs (archived)