Memory tricks that make bias, fairness and responsible AI click
From algorithmic bias to the alignment problem -- these memory tricks lock in the ethical frameworks, real-world cases, and regulatory landscape every AI student must understand.
AI Ethics and Society
Memory Tricks
Proven Mnemonics & Acronyms -- fast to learn, hard to forget.
FATE Framework
FATE plus BFT -- FATE is the framework, BFT is the robot: Big Friendly Transparent
FAIRNESS AND ACCOUNTABILITY AND TRANSPARENCY AND ETHICS
Picture a Big Friendly Transparent robot -- that is what responsible AI looks like
Fairness: AI should not discriminate based on protected characteristics. Accountability: when AI causes harm, who is responsible -- developer, deployer, or user? Transparency (Explainability): can we understand why the model made a decision? Ethics: broader societal impact -- jobs, privacy, power, autonomy. These principles can pull against each other, so no purely algorithmic solution satisfies all of them at once. EU AI Act (2024) is the first comprehensive AI regulation globally.
Fairness
No discrimination based on race, gender, age, disability
GARBAGE IN, GARBAGE OUT -- biased data produces biased models no matter how sophisticated
BIAS ENTERS THROUGH DATA AND LABELS AND FEATURES AND FEEDBACK LOOPS
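COMPAS: Black defendants who did not reoffend were flagged high-risk at about twice the rate of white defendants (ProPublica), yet the scores were similarly calibrated across groups (Northpointe) -- both claims were mathematically true because base rates differed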
Historical bias: training data reflects past discrimination. Sampling bias: training data does not represent deployment population. Measurement bias: the features or labels actually recorded are imperfect proxies for what matters, and proxy variables can carry demographic information (ZIP code correlates with race). Label bias: human labelers bring their own biases. Feedback loops: biased predictions create biased future data. Famous cases: COMPAS (criminal justice), Amazon hiring tool (gender bias), facial recognition (higher error for dark-skinned women).
Historical bias
Training data reflects past discrimination
Sampling bias
Training data does not represent deployment population
Feedback loop
Biased predictions lead to biased future training data
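The COMPAS disagreement is easier to remember once you see that two fairness metrics can point in opposite directions on the same predictions. The sketch below uses invented confusion-matrix counts (not the real COMPAS data), and the names `Confusion`, `false_positive_rate`, and `precision` are illustrative helpers. It assumes two groups with different base rates of reoffending.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one demographic group (made-up numbers)."""
    tp: int  # flagged high-risk, did reoffend
    fp: int  # flagged high-risk, did not reoffend
    fn: int  # not flagged, did reoffend
    tn: int  # not flagged, did not reoffend


def false_positive_rate(c: Confusion) -> float:
    # Share of people who did NOT reoffend but were still flagged.
    return c.fp / (c.fp + c.tn)


def precision(c: Confusion) -> float:
    # Share of flagged people who DID reoffend (a calibration-style check).
    return c.tp / (c.tp + c.fp)


# Hypothetical groups: base rate of reoffending is 50% in A and 30% in B.
groups = {
    "group_a": Confusion(tp=30, fp=20, fn=20, tn=30),
    "group_b": Confusion(tp=21, fp=14, fn=9, tn=56),
}

for name, c in groups.items():
    print(f"{name}: FPR={false_positive_rate(c):.2f} precision={precision(c):.2f}")
# group_a: FPR=0.40 precision=0.60
# group_b: FPR=0.20 precision=0.60
```

In this toy data a flag means the same thing in both groups (60% of flagged people reoffend), yet group A's non-reoffenders are wrongly flagged twice as often. When base rates differ, equalizing one of these metrics generally unbalances the other.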
BLACK BOX vs GLASS BOX -- powerful but hard to explain vs interpretable but often less accurate
LIME AND SHAP AND SALIENCY MAPS -- TOOLS TO EXPLAIN ANY MODEL
GDPR Article 22: right not to be subject to solely automated decisions with legal or similarly significant effects -- the basis of the debated "right to explanation"
LIME: perturbs input locally, fits interpretable model to explain individual prediction. SHAP (Shapley values): assigns each feature its average marginal contribution to the prediction across all feature combinations -- model-agnostic, widely used in industry. Saliency maps: highlight which pixels drove an image classifier's decision. Where explanations matter most or are legally expected: credit decisions, hiring, medical AI, criminal justice. GDPR Article 22, read with Articles 13-15 and Recital 71, pushes EU deployers toward XAI, though how far a right to explanation extends is debated.
LIME
Perturb locally, fit simple model -- explains individual predictions
SHAP
Shapley values -- feature contributions, model-agnostic, industry standard
Saliency maps
Highlight influential pixels in image classification
GDPR Article 22
Limits solely automated consequential decisions -- often read as a right to explanation
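To make "feature contributions" concrete, here is a minimal sketch that computes exact Shapley values by brute force for a tiny, made-up scoring function. It is not the SHAP library's algorithm, which approximates these values and uses background data rather than a single baseline. The function `toy_credit_model`, the feature names, and the all-zero baseline are assumptions chosen for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features can be switched from baseline to
    the instance's value. Cost grows factorially, so toy inputs only."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_credit_model(x):
    # Hypothetical score with an income-debt interaction term.
    return 2.0 * x["income"] + 1.0 * x["age"] + 0.5 * x["income"] * x["debt"]


instance = {"income": 3.0, "age": 1.0, "debt": 2.0}
baseline = {"income": 0.0, "age": 0.0, "debt": 0.0}

phi = shapley_values(toy_credit_model, instance, baseline)
print(phi)  # {'income': 7.5, 'age': 1.0, 'debt': 1.5}
```

The contributions add up to the gap between the prediction for the instance (10.0) and for the baseline (0.0), and the interaction term is split evenly between `income` and `debt`.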
AI Safety
ALIGNMENT PROBLEM -- ensuring AI pursues goals humans actually want, not proxy goals
REWARD HACKING AND SPECIFICATION GAMING AND GOODHART'S LAW
When a measure becomes a target it ceases to be a good measure -- Goodhart's Law
Reward hacking: AI finds unintended ways to maximize reward (boat racing AI spins in circles). Specification gaming: satisfies letter but not spirit of objective. Goodhart's Law: optimize hard for proxy metric and the metric stops reflecting the true goal. RLHF example: model learns text that sounds good to raters rather than text that is true or helpful. Constitutional AI (Anthropic): AI evaluates own outputs against a set of principles.
Reward hacking
AI maximizes reward metric via unintended means
Specification gaming
Satisfies letter but not spirit of objective
Goodhart's Law
Optimize for measure and it stops being a good measure
Constitutional AI
AI evaluates own outputs against principles -- Anthropic's approach
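Goodhart's Law can be seen in a few lines of arithmetic. The sketch below is an invented toy, not a model of any real training setup: an agent splits a fixed effort budget between real work and gaming the metric, and the names `true_goal`, `proxy_metric`, and `BUDGET` are assumptions for illustration.

```python
BUDGET = 10  # total units of effort the agent can spend


def true_goal(work: int, gaming: int) -> int:
    # What humans actually want: only real work counts.
    return work


def proxy_metric(work: int, gaming: int) -> int:
    # What gets measured: gaming moves the number three times faster.
    return work + 3 * gaming


# Every way to split the budget between real work and gaming.
allocations = [(work, BUDGET - work) for work in range(BUDGET + 1)]

best_for_proxy = max(allocations, key=lambda a: proxy_metric(*a))
best_for_goal = max(allocations, key=lambda a: true_goal(*a))

print(best_for_proxy, true_goal(*best_for_proxy))  # (0, 10) 0
print(best_for_goal, true_goal(*best_for_goal))    # (10, 0) 10
```

The optimizer that chases the proxy scores highest on the metric while delivering nothing on the true goal, which is the same pattern as the boat that spins in circles collecting points.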
Hallucination
LLMs generate PLAUSIBLE next tokens, not VERIFIED facts -- confident wrongness follows from the design, it is not an incidental bug
FACTUAL AND ATTRIBUTION AND REASONING ERRORS -- FUNDAMENTAL NOT A BUG
Mata v. Avianca (S.D.N.Y. 2023): lawyers filed ChatGPT-fabricated case citations in federal court and were sanctioned
LLMs predict plausible text continuations -- not verified facts. No internal fact-checker. Types: Factual (wrong dates, names), Attribution (fabricated citations and quotes), Reasoning errors (logical mistakes), Temporal (outdated info as current). Mitigation: RAG (ground in real documents), citation requirements, structured verification, temperature reduction, human review for high-stakes output. Cannot be fully eliminated -- fundamental to how LLMs work.
RAG
Retrieve real documents, inject as context -- grounds the answer
Cannot be eliminated
Fundamental to token prediction -- only mitigated
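"Temperature reduction" in the mitigation list refers to sharpening the next-token distribution before sampling. The following is a minimal sketch with invented logits for three candidate tokens. The function name `softmax_with_temperature` is an assumption, not any particular library's API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature concentrates
    probability on the highest-scoring token."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

hot = softmax_with_temperature(logits, temperature=1.5)
cold = softmax_with_temperature(logits, temperature=0.2)

print([round(p, 3) for p in hot])
print([round(p, 3) for p in cold])
```

At the lower temperature almost all probability lands on the top-scoring token, so sampling becomes less random. That reduces unlikely continuations but does nothing if the top-scoring token is itself wrong, which is why temperature is a mitigation rather than a fix.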
Privacy and AI
DIFFERENTIAL PRIVACY -- add calibrated noise so outputs reveal almost nothing about any single individual's data
MEMORIZATION AND MEMBERSHIP INFERENCE AND FEDERATED LEARNING
Machine unlearning: the GDPR right to erasure (Article 17) may oblige removing a person's data from trained models, not just from databases
LLMs can memorize and regurgitate private information from training data. Membership inference attack: can you tell if someone's data was in the training set? Federated Learning: train locally on each device and share only model updates (gradients) -- raw data never leaves the device, although updates can still leak information without extra protection. Differential Privacy: add calibrated noise -- a mathematical bound (the privacy budget epsilon) on how much any one individual's data can change the output. Machine unlearning: active research area with no perfect solution yet.
Memorization
LLMs can regurgitate private training data verbatim
Membership inference
Determine if specific data was in the training set
Federated learning
Only model updates (gradients) are shared -- raw data stays on device (used by Apple, Google)
Differential privacy
Mathematical bound on how much one individual's data can change the output
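"Calibrated noise" usually means noise scaled to the query's sensitivity divided by the privacy budget epsilon. Below is a minimal sketch of the Laplace mechanism for a counting query, using only the standard library. The data, the epsilon value, and the helper names `laplace_noise` and `private_count` are assumptions for illustration, and a production system would rely on a vetted differential privacy library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.
    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [34, 51, 29, 62, 45, 38, 57]  # made-up records

noisy = private_count(ages, lambda age: age >= 50, epsilon=0.5, rng=rng)
print(noisy)  # the true count is 3; the released value is 3 plus noise
```

A smaller epsilon means more noise and stronger privacy. Any single release is noisy, but the noise averages out over many queries, which is why repeated queries consume the privacy budget.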
AI Regulation
EU leads with rules -- US leads with principles -- China leads with national strategy
EU AI ACT (2024) IS THE FIRST COMPREHENSIVE AI REGULATION GLOBALLY
EU AI Act fines: up to 35 million euros or 7% of global annual turnover, whichever is higher, for prohibited practices
EU AI Act (2024): risk-based approach. Unacceptable risk (banned): social scoring, real-time remote biometric identification in public spaces (with narrow law-enforcement exceptions), subliminal manipulation. High risk (strict requirements): employment, education, criminal justice, healthcare, credit. Limited risk: chatbots must disclose they are AI. US: Executive Order 14110 on AI safety (2023, rescinded January 2025), NIST AI Risk Management Framework, no comprehensive federal law yet. China: generative AI content rules.
Unacceptable risk
Banned -- social scoring, real-time remote biometric identification in public spaces
High risk
Must audit, register, document -- hiring, healthcare, criminal justice
Limited risk
Chatbots must disclose they are AI
Fines
Up to 35M EUR or 7% of global annual turnover, whichever is higher, for prohibited practices
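The "35 million or 7%" rule is a maximum of two numbers, which a short calculation makes easy to remember. This sketch covers only the top tier of fines for prohibited practices and assumes the "whichever is higher" reading that applies to larger companies. The function name and the example turnovers are invented, and lower tiers and special rules for small companies are not modeled.

```python
FIXED_CAP_EUR = 35_000_000   # fixed ceiling for the top fine tier
TURNOVER_PERCENT = 7         # percent of global annual turnover


def max_fine_prohibited_practice(global_annual_turnover_eur: int) -> int:
    """Upper bound of the fine: the larger of the fixed cap and 7% of turnover.
    Integer arithmetic avoids floating-point rounding on large amounts."""
    turnover_based = global_annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


# Hypothetical companies
print(max_fine_prohibited_practice(100_000_000))    # 35000000  (fixed cap is higher)
print(max_fine_prohibited_practice(2_000_000_000))  # 140000000 (7% of turnover is higher)
```

The crossover sits at a turnover of 500 million euros: below it the fixed cap sets the ceiling, and above it the percentage does.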
Societal Impact
JOBS -- some lost, some changed, new ones created -- AI transforms labor, not just replaces it
DISPLACEMENT AND POWER CONCENTRATION AND MISINFORMATION AND SURVEILLANCE
GPT-3 training emitted an estimated 500 tons of CO2 equivalent -- comparisons such as "60 transatlantic flights" depend heavily on whether one passenger or a whole aircraft is counted
Labor displacement: routine cognitive tasks automated (data entry, customer service, basic analysis). Power concentration: AI requires massive compute, data, talent -- concentrated in few large companies. Misinformation: deepfakes, synthetic media, AI-generated propaganda. Surveillance: facial recognition by authoritarian governments. Environmental cost: GPT-3 training emitted roughly 500 tons of CO2 equivalent. Inference at scale adds up rapidly.
Labor displacement
Routine cognitive tasks increasingly automated
Power concentration
Compute + data + talent concentrated in few large companies
Deepfakes
AI-generated synthetic media -- video, audio, images
Environmental cost
GPT-3 training: ~500 tons CO2 equivalent (estimate) -- inference at scale adds up
Responsible AI
HUMAN in the LOOP -- keep humans accountable for high-stakes AI decisions
HUMAN-IN-LOOP AND HUMAN-ON-LOOP AND HUMAN-OUT-OF-LOOP -- THREE AUTOMATION LEVELS
Model cards: standardized documentation of capabilities, limitations, and performance across demographic groups
Human-in-the-loop (HITL): human approves every AI decision -- highest oversight, slowest, most costly. Human-on-the-loop (HOTL): AI acts autonomously, human monitors and can intervene. Human-out-of-the-loop: fully autonomous -- only for very low-stakes decisions with extensive testing. Model cards: document model capabilities, limitations, intended use, performance across groups. Algorithmic auditing: third-party evaluation for bias, safety, accuracy.
HITL
Human approves every decision -- criminal sentencing, medical diagnosis
HOTL
AI acts, human monitors and can intervene
Model cards
Document capabilities, limitations, and demographic performance
Algorithmic auditing
Third-party evaluation for bias, safety, accuracy
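The three automation levels can be pictured as a routing rule that picks an oversight mode from the stakes of a decision. The sketch below encodes one possible policy consistent with the descriptions above. The stakes labels, the `extensively_tested` flag, and the function `choose_oversight` are assumptions for illustration, not an established standard.

```python
from enum import Enum


class Oversight(Enum):
    HUMAN_IN_THE_LOOP = "human approves every decision"
    HUMAN_ON_THE_LOOP = "AI acts, human monitors and can intervene"
    HUMAN_OUT_OF_THE_LOOP = "fully autonomous"


def choose_oversight(stakes: str, extensively_tested: bool) -> Oversight:
    """Hypothetical policy: stakes is 'high', 'medium', or 'low'."""
    if stakes not in {"high", "medium", "low"}:
        raise ValueError(f"unknown stakes level: {stakes}")
    if stakes == "high":
        # e.g. criminal sentencing, medical diagnosis
        return Oversight.HUMAN_IN_THE_LOOP
    if stakes == "low" and extensively_tested:
        # full autonomy only for low stakes AND after extensive testing
        return Oversight.HUMAN_OUT_OF_THE_LOOP
    return Oversight.HUMAN_ON_THE_LOOP


print(choose_oversight("high", extensively_tested=True).name)   # HUMAN_IN_THE_LOOP
print(choose_oversight("low", extensively_tested=False).name)   # HUMAN_ON_THE_LOOP
print(choose_oversight("low", extensively_tested=True).name)    # HUMAN_OUT_OF_THE_LOOP
```

Note that testing never downgrades oversight for high-stakes decisions in this policy: a human still approves each one.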
AI in Healthcare
DOCTOR plus AI = better outcomes -- but AI errors in medicine can kill
FDA HAS CLEARED 500+ AI MEDICAL DEVICES -- AI AUGMENTS PHYSICIANS, IT DOES NOT REPLACE THEM
AI plus physician often outperforms either alone -- augmentation, not replacement
AI successes: diabetic retinopathy screening (matches ophthalmologist), chest X-ray pneumonia detection (CheXNet), skin cancer detection, AlphaFold protein structure. Challenges: distribution shift (fails at hospitals not in training data), rare conditions underrepresented, FDA review required before marketing (most AI devices go through 510(k) clearance). Bias: pulse oximeters less accurate on dark skin, dermatology AI trained mostly on light skin.
Distribution shift
Fails at hospitals different from training hospitals
FDA clearance
Required before marketing -- most AI medical devices use the 510(k) pathway
AI + physician
Often outperforms either alone -- augment, not replace
Copyright and AI
TRAINING on copyrighted data plus GENERATING similar content = active legal battleground
WHO OWNS AI-GENERATED CONTENT -- COURTS ARE STILL DECIDING IN 2025
NYT vs OpenAI and Getty vs Stability AI and major record labels vs Suno and Udio -- all active
Training data: is training on copyrighted text and images fair use? NYT vs OpenAI, Getty vs Stability AI, record labels vs Suno and Udio -- all active cases. AI-generated content ownership: US Copyright Office requires human authorship for copyright protection -- pure AI output cannot be copyrighted. Licensing deals: OpenAI with AP, Axel Springer. Industry response: C2PA content provenance standard (signed metadata describing how content was made, which can complement watermarking).
Training data
Fair use? NYT vs OpenAI, Getty vs Stability AI still pending
AI-generated content
US Copyright Office: requires human authorship -- AI output uncopyrightable
Licensing deals
OpenAI with AP, Axel Springer -- growing trend
C2PA provenance
Content authenticity standard -- signed metadata that can flag AI-generated content