AI Ethics Memory Tricks -- Free Mnemonics

FATE Framework

FATE plus BFT -- FATE is the framework, BFT is the robot: Big Friendly Transparent

FAIRNESS AND ACCOUNTABILITY AND TRANSPARENCY AND ETHICS

Picture a Big Friendly Transparent robot -- that is what responsible AI looks like

Fairness: AI should not discriminate based on protected characteristics. Accountability: when AI causes harm, who is responsible -- developer, deployer, or user? Transparency (Explainability): can we understand why the model made a decision? Ethics: broader societal impact -- jobs, privacy, power, autonomy. These principles can pull against each other, so there is no purely algorithmic solution. EU AI Act (2024) is the first comprehensive AI regulation globally.

Fairness

No discrimination based on race, gender, age, disability

Accountability

Who is responsible when AI causes harm?

Transparency (XAI)

Can we explain why the model made this decision?

Ethics

Societal impact -- jobs, privacy, power, autonomy, surveillance

Algorithmic Bias

GARBAGE IN, GARBAGE OUT -- biased data produces biased models no matter how sophisticated

BIAS ENTERS THROUGH DATA AND LABELS AND FEATURES AND FEEDBACK LOOPS

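COMPAS: Black defendants who did not reoffend were flagged high-risk at about twice the rate of white defendants (ProPublica's finding), yet the scores were equally well calibrated for both groups (the vendor's defense) -- both claims were mathematically true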

Historical bias: training data reflects past discrimination. Sampling bias: training data does not represent deployment population. Measurement bias: proxy variables carry demographic information (ZIP code correlates with race). Label bias: human labelers bring their own biases. Feedback loops: biased predictions create biased future data. Famous cases: COMPAS (criminal justice), Amazon hiring tool (gender bias), facial recognition (higher error for dark-skinned women).

Historical bias

Training data reflects past discrimination

Sampling bias

Training data does not represent deployment population

Feedback loop

Biased predictions lead to biased future training data

COMPAS case

Multiple valid fairness definitions conflict -- when base rates differ, equal calibration and equal error rates cannot in general both hold
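
The conflict between fairness definitions can be reproduced with toy numbers. The sketch below is only an illustration and does not use the real COMPAS data: the confusion-matrix counts for the hypothetical groups A and B are invented so that precision (a calibration-style criterion) is identical while the false positive rates differ, purely because the base rates differ.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one demographic group (invented numbers)."""
    tp: int  # flagged high-risk, did reoffend
    fp: int  # flagged high-risk, did not reoffend
    fn: int  # flagged low-risk, did reoffend
    tn: int  # flagged low-risk, did not reoffend

    def base_rate(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total

    def precision(self) -> float:
        # Of those flagged high-risk, how many reoffended?
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        # Of those who did not reoffend, how many were flagged anyway?
        return self.fp / (self.fp + self.tn)


# Hypothetical groups: A has a higher base rate than B.
group_a = Confusion(tp=40, fp=10, fn=10, tn=40)
group_b = Confusion(tp=40, fp=10, fn=10, tn=90)

for name, g in [("A", group_a), ("B", group_b)]:
    print(
        f"group {name}: base rate={g.base_rate():.2f} "
        f"precision={g.precision():.2f} "
        f"false positive rate={g.false_positive_rate():.2f}"
    )
```

With these counts a flag means the same thing in both groups (equal precision), yet a non-reoffender in group A is flagged twice as often as one in group B. Both "the tool is fair" and "the tool is biased" can be defended, depending on which metric is chosen.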

Explainability (XAI)

BLACK BOX vs GLASS BOX -- powerful but hard to explain vs interpretable but often less accurate

LIME AND SHAP AND SALIENCY MAPS -- TOOLS TO EXPLAIN ANY MODEL

GDPR Article 22: right not to be subject to solely automated decisions that significantly affect individuals -- often summarized as a "right to explanation"

LIME: perturbs the input locally and fits an interpretable model to explain an individual prediction. SHAP (Shapley values): assigns each feature its average marginal contribution to the prediction across feature coalitions -- model-agnostic, widely used in industry. Saliency maps: highlight which pixels drove an image classifier's decision. Where explanations are typically demanded by law or regulation: credit decisions, hiring, medical AI, criminal justice. GDPR Article 22 restricts solely automated decisions in the EU and, read together with the GDPR's transparency provisions, creates legal pressure for XAI.

LIME

Perturb locally, fit simple model -- explains individual predictions

SHAP

Shapley values -- feature contributions, model-agnostic, industry standard

Saliency maps

Highlight influential pixels in image classification

GDPR Article 22

Limits solely automated consequential decisions -- often read as a right to explanation
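
To make the SHAP idea concrete, here is a minimal sketch that computes exact Shapley values by brute force. It assumes a hypothetical three-feature `toy_model` and an all-zero baseline, and it replaces "missing" features with their baseline value. Real SHAP tooling relies on approximations because this enumeration grows as 2^n; the function names here are illustrative, not any library's API.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    Only practical for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    """Hypothetical scoring model with one interaction term."""
    income, debt, age = features
    return 2.0 * income - 1.0 * debt + 0.5 * income * age


x = [3.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["income", "debt", "age"], phi)))
```

The contributions sum to the gap between the prediction and the baseline prediction, and the income-age interaction is split evenly between the two features involved.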

AI Safety

ALIGNMENT PROBLEM -- ensuring AI pursues goals humans actually want, not proxy goals

REWARD HACKING AND SPECIFICATION GAMING AND GOODHART'S LAW

When a measure becomes a target it ceases to be a good measure -- Goodhart's Law

Reward hacking: AI finds unintended ways to maximize reward (boat racing AI spins in circles). Specification gaming: satisfies letter but not spirit of objective. Goodhart's Law: optimize hard for proxy metric and the metric stops reflecting the true goal. RLHF example: model learns text that sounds good to raters rather than text that is true or helpful. Constitutional AI (Anthropic): AI evaluates own outputs against a set of principles.

Reward hacking

AI maximizes reward metric via unintended means

Specification gaming

Satisfies letter but not spirit of objective

Goodhart's Law

Optimize for measure and it stops being a good measure

Constitutional AI

AI evaluates own outputs against principles -- Anthropic's approach
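
Reward hacking can be shown with a toy version of the boat-racing example. This is a sketch with invented numbers, not the scoring of any real game: the hypothetical `proxy_reward` pays a one-off bonus for finishing and a small repeatable payout for circling respawning targets, and a pure reward maximizer picks whichever scores higher.

```python
def proxy_reward(policy: str, steps: int) -> int:
    """Points the designer coded into the environment (invented values)."""
    if policy == "finish_race":
        return 100          # one-off bonus for crossing the finish line
    if policy == "circle_targets":
        return 20 * steps   # respawning targets pay out on every step
    raise ValueError(f"unknown policy: {policy}")


def true_goal_met(policy: str) -> bool:
    """What the designer actually wanted: the race gets finished."""
    return policy == "finish_race"


def best_policy(steps: int) -> str:
    """A reward maximizer only sees the proxy, never the true goal."""
    policies = ["finish_race", "circle_targets"]
    return max(policies, key=lambda p: proxy_reward(p, steps))


chosen = best_policy(steps=10)
print(chosen, proxy_reward(chosen, 10), true_goal_met(chosen))
```

With a short horizon the proxy and the goal agree. Once the episode is long enough, circling outscores finishing and the measure stops tracking the goal, which is Goodhart's Law in miniature.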

Hallucination

LLMs generate PLAUSIBLE next tokens, not VERIFIED facts -- confident wrongness is a feature, not a bug

FACTUAL AND ATTRIBUTION AND REASONING ERRORS -- FUNDAMENTAL NOT A BUG

Lawyers cited fake ChatGPT-generated case citations in a federal court filing (Mata v. Avianca, 2023) and were sanctioned

LLMs predict plausible text continuations -- not verified facts. No internal fact-checker. Types: Factual (wrong dates, names), Attribution (fabricated citations and quotes), Reasoning errors (logical mistakes), Temporal (outdated info as current). Mitigation: RAG (ground in real documents), citation requirements, structured verification, temperature reduction, human review for high-stakes output. Cannot be fully eliminated -- fundamental to how LLMs work.

Why it happens

LLMs predict plausible text -- not verified facts

Types

Factual, attribution (fake citations), reasoning errors, temporal

RAG mitigation

Retrieve real documents, inject as context -- grounds the answer

Cannot be eliminated

Fundamental to token prediction -- only mitigated
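
The RAG mitigation can be sketched without any model call. The example below assumes a tiny in-memory document list and a naive word-overlap retriever (production systems typically use embedding search); `retrieve` and `build_prompt` are hypothetical helper names. It shows only the grounding step: pick the most relevant document and inject it into the prompt.

```python
import re

# Tiny stand-in corpus; a real system would query a document store.
DOCUMENTS = [
    "The EU AI Act was adopted in 2024 and uses a risk-based approach.",
    "Differential privacy adds calibrated noise to query results.",
    "Model cards document model limitations and intended use.",
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Inject retrieved text as context and instruct the model to stay within it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


prompt = build_prompt("When was the EU AI Act adopted?", DOCUMENTS)
print(prompt)
```

Grounding narrows what the model has to invent, but the model can still misread or ignore the context, so human review remains necessary for high-stakes output.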

Privacy and AI

DIFFERENTIAL PRIVACY -- add calibrated noise so that outputs reveal almost nothing about any single individual's data

MEMORIZATION AND MEMBERSHIP INFERENCE AND FEDERATED LEARNING

Machine unlearning: the GDPR right to erasure can create a legal obligation to remove an individual's data from trained models

LLMs can memorize and regurgitate private information from training data. Membership inference attack: can you tell if someone's data was in the training set? Federated Learning: train locally on each device, share only model updates (gradients) -- raw data never leaves the device. Differential Privacy: add calibrated noise -- a mathematical bound (epsilon) on how much any one individual's data can change the output. Machine unlearning: active research area with no perfect solution yet.

Memorization

LLMs can regurgitate private training data verbatim

Membership inference

Determine if specific data was in the training set

Federated learning

Gradients only shared -- data stays on device (Apple, Google)
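
As a minimal sketch of the federated idea, assume a one-parameter linear model y = w * x and two hypothetical devices holding private data. Each device takes a gradient step locally and sends back only its updated weight, which the server averages (a FedAvg-style variant that shares updates instead of raw gradients). The function names, learning rate, and data are illustrative.

```python
def local_update(weights, data, lr=0.05):
    """One least-squares gradient step for y ~ w * x, run on the device."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]


def federated_round(global_weights, devices):
    """Server step: average the devices' updated weights by dataset size.

    The server only ever sees weights, never the (x, y) pairs.
    """
    total = sum(len(data) for data in devices)
    new_w = 0.0
    for data in devices:
        local = local_update(global_weights, data)
        new_w += local[0] * len(data) / total
    return [new_w]


# Private per-device data, all generated by the same rule y = 2 * x.
devices = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

weights = [0.0]
for _ in range(50):
    weights = federated_round(weights, devices)

print(weights[0])
```

Keeping raw data on the device is not a full privacy guarantee on its own, since updates can still leak information. That is why federated learning is often combined with differential privacy.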

Differential privacy

Mathematical bound on how much any one individual's data can affect the output
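
The "calibrated noise" in differential privacy is commonly implemented with the Laplace mechanism. The sketch below assumes a simple counting query (sensitivity 1, because adding or removing one person changes the count by at most 1) and invented ages. `private_count` is a hypothetical helper, and the noise is drawn as the difference of two exponential samples, which follows a Laplace distribution.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Epsilon-differentially-private count via the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 52, 67, 29, 48]  # invented data
rng = random.Random(0)

for epsilon in (0.1, 1.0, 10.0):
    answer = private_count(ages, lambda a: a >= 40, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {answer:.2f}")
```

A smaller epsilon means more noise and stronger privacy, while a larger epsilon gives more accurate answers and a weaker guarantee. The guarantee is a bound on leakage, not a promise of zero leakage.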

AI Regulation

EU leads with rules -- US leads with principles -- China leads with national strategy

EU AI ACT (2024) IS THE FIRST COMPREHENSIVE AI REGULATION GLOBALLY

EU AI Act fines: up to 35 million euros or 7% of global annual turnover, whichever is higher

EU AI Act (2024): risk-based approach. Unacceptable risk (banned): social scoring, real-time public biometric surveillance (with narrow law-enforcement exceptions), subliminal manipulation. High risk (strict requirements): employment, education, criminal justice, healthcare, credit. Limited risk: chatbots must disclose they are AI. US: Executive Order on AI Safety (2023, rescinded in January 2025), NIST AI Risk Management Framework, no comprehensive federal law yet. China: generative AI content rules.

Unacceptable risk

Banned -- social scoring, real-time biometric surveillance in public

High risk

Must audit, register, document -- hiring, healthcare, criminal justice

Limited risk

Chatbots must disclose they are AI

Fines

Up to 35M EUR or 7% of global annual turnover, whichever is higher, for the most serious violations
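
The fine ceiling is simple arithmetic. The sketch below assumes the top tier only (35 million euros or 7% of turnover, whichever is higher) and ignores the lower tiers and any special treatment for smaller companies. `max_fine_eur` is a hypothetical helper name, and integer euros are used to avoid floating-point rounding.

```python
FIXED_CAP_EUR = 35_000_000
TURNOVER_PERCENT = 7


def max_fine_eur(global_annual_turnover_eur: int) -> int:
    """Upper bound of the top-tier fine: the higher of the two amounts."""
    percent_cap = global_annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, percent_cap)


for turnover in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"turnover {turnover:>13,} EUR -> cap {max_fine_eur(turnover):>11,} EUR")
```

The two amounts meet at a turnover of 500 million euros. Below that the fixed amount sets the ceiling; above it the percentage does.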

Societal Impact

JOBS -- some lost, some changed, new ones created -- AI transforms labor rather than just replacing it

DISPLACEMENT AND POWER CONCENTRATION AND MISINFORMATION AND SURVEILLANCE

GPT-3 training emitted an estimated 500 tons of CO2 equivalent -- comparable to roughly 60 transatlantic flights

Labor displacement: routine cognitive tasks automated (data entry, customer service, basic analysis). Power concentration: AI requires massive compute, data, talent -- concentrated in a few large companies. Misinformation: deepfakes, synthetic media, AI-generated propaganda. Surveillance: facial recognition by authoritarian governments. Environmental cost: GPT-3 training emitted an estimated 500 tons of CO2 equivalent, and inference at scale adds up rapidly.

Labor displacement

Routine cognitive tasks increasingly automated

Power concentration

Compute + data + talent concentrated in a few large companies

Deepfakes

AI-generated synthetic media -- video, audio, images

Environmental cost

GPT-3 training: ~500 tons CO2 -- inference at scale adds up

Responsible AI

HUMAN in the LOOP -- keep humans accountable for high-stakes AI decisions

HUMAN-IN-LOOP AND HUMAN-ON-LOOP AND HUMAN-OUT-OF-LOOP -- THREE AUTOMATION LEVELS

Model cards: standardized documentation of capabilities, limitations, and performance across demographic groups

Human-in-the-loop (HITL): human approves every AI decision -- highest oversight, slowest, most costly. Human-on-the-loop (HOTL): AI acts autonomously, human monitors and can intervene. Human-out-of-loop: fully autonomous -- only for very low-stakes decisions with extensive testing. Model cards: document model capabilities, limitations, intended use, performance across groups. Algorithmic auditing: third-party evaluation for bias, safety, accuracy.

HITL

Human approves every decision -- criminal sentencing, medical diagnosis

HOTL

AI acts, human monitors and can intervene

Model cards

Document capabilities, limitations, and demographic performance

Algorithmic auditing

Third-party evaluation for bias, safety, accuracy
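
The three automation levels can be expressed as a routing rule. The sketch below is one possible arrangement, not a standard: the task names in `HIGH_STAKES` and `LOW_STAKES` are illustrative assumptions, and anything unlisted defaults to human-on-the-loop.

```python
from enum import Enum


class Oversight(Enum):
    HUMAN_IN_THE_LOOP = "human approves every decision"
    HUMAN_ON_THE_LOOP = "AI acts, human monitors and can intervene"
    HUMAN_OUT_OF_LOOP = "fully autonomous"


# Illustrative task lists; a real deployment would define these by policy.
HIGH_STAKES = {"criminal_sentencing", "medical_diagnosis"}
LOW_STAKES = {"spam_filtering"}


def oversight_for(task: str) -> Oversight:
    if task in HIGH_STAKES:
        return Oversight.HUMAN_IN_THE_LOOP
    if task in LOW_STAKES:
        return Oversight.HUMAN_OUT_OF_LOOP
    return Oversight.HUMAN_ON_THE_LOOP


def decide(task: str, ai_recommendation: str, human_approve=None) -> str:
    """Apply the AI recommendation only after the required oversight."""
    mode = oversight_for(task)
    if mode is Oversight.HUMAN_IN_THE_LOOP:
        if human_approve is None:
            raise ValueError(f"{task} requires a human reviewer")
        if not human_approve(ai_recommendation):
            return "rejected by reviewer"
    return ai_recommendation


print(oversight_for("medical_diagnosis").value)
print(decide("spam_filtering", "block"))
```

Making the reviewer a required argument for high-stakes tasks means the code fails loudly when no accountable human is present, instead of silently falling back to automation.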

AI in Healthcare

DOCTOR plus AI = better outcomes -- but AI errors in medicine can kill

FDA HAS CLEARED 500+ AI MEDICAL DEVICES -- AI AUGMENTS PHYSICIANS, IT DOES NOT REPLACE THEM

AI plus physician often outperforms either alone -- augmentation, not replacement

AI successes: diabetic retinopathy screening (matches ophthalmologist), chest X-ray pneumonia detection (CheXNet), skin cancer detection, AlphaFold protein structure. Challenges: distribution shift (fails at hospitals not in training data), rare conditions underrepresented, FDA authorization required (most often 510(k) clearance). Bias: pulse oximeters less accurate on dark skin, dermatology AI trained mostly on light skin.

Proven successes

Diabetic retinopathy, chest X-ray, skin cancer, AlphaFold

Distribution shift

Fails at hospitals different from training hospitals

FDA clearance

Required -- most often via 510(k) for AI medical devices

AI + physician

Often outperforms either alone -- augment, not replace

Copyright and AI

TRAINING on copyrighted data plus GENERATING similar content = active legal battleground

WHO OWNS AI-GENERATED CONTENT -- COURTS ARE STILL DECIDING IN 2025

NYT vs OpenAI and Getty vs Stability AI and major record labels vs Suno and Udio -- all active

Training data: is training on copyrighted text and images fair use? NYT vs OpenAI, Getty vs Stability AI, record labels vs Suno and Udio -- all active cases. AI-generated content ownership: US Copyright Office requires human authorship for copyright protection -- purely AI-generated output cannot be copyrighted. Licensing deals: OpenAI with AP, Axel Springer. Industry response: C2PA content provenance standard.

Training data

Fair use? NYT vs OpenAI, Getty vs Stability AI still pending

AI-generated content

US Copyright Office: requires human authorship -- purely AI-generated output uncopyrightable

Licensing deals

OpenAI with AP, Axel Springer -- growing trend

C2PA provenance

Content authenticity standard -- signed metadata records how content was made, including AI generation
