Memory tricks that make AI, ML & Deep Learning click
From narrow AI to the river of regret: these memory tricks lock in the foundational concepts, key distinctions, and vocabulary every AI student needs to know cold.
AI Basics
Memory Tricks
Proven Mnemonics & Acronyms — fast to learn, hard to forget.
Most Important
AI ⊃ ML ⊃ Deep Learning: each one lives INSIDE the other
THE NESTED CIRCLES: AI IS THE BIGGEST
AI, ML, and Deep Learning are nested, not interchangeable
The three terms are nested, not synonyms. Artificial Intelligence (AI): the broadest category, covering any technique that enables machines to mimic human intelligence. Machine Learning (ML): a subset of AI in which machines learn patterns from data without explicit programming. Deep Learning (DL): a subset of ML that uses multi-layered neural networks. Key exam point: everything deployed today (ChatGPT, self-driving cars, medical imaging AI) is Narrow AI.
AI
Umbrella: expert systems, search, logic, ML, and more
ML
Subset of AI: learns from data without explicit programming
Deep Learning
Subset of ML: uses multi-layered neural networks
Everything today
Narrow AI: excels at one specific task
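To make "learns from data without explicit programming" concrete, here is a minimal sketch that contrasts a hand-written rule (AI without ML) with a cutoff learned from labeled examples (ML). The exclamation-mark feature, the toy data, and every function name are assumptions invented for illustration.

```python
# Hypothetical toy data: (number of exclamation marks, is_spam label).
EXAMPLES = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]


def rule_based_is_spam(exclamations: int) -> bool:
    """AI without ML: a human wrote the rule and picked the cutoff by hand."""
    return exclamations >= 4


def learn_threshold(examples):
    """ML: choose the cutoff that misclassifies the fewest labeled examples."""
    candidates = sorted({x for x, _ in examples})

    def errors(cutoff):
        # Count examples where the prediction (x >= cutoff) disagrees with the label.
        return sum((x >= cutoff) != label for x, label in examples)

    return min(candidates, key=errors)


learned_cutoff = learn_threshold(EXAMPLES)


def learned_is_spam(exclamations: int) -> bool:
    """Same shape of rule, but the cutoff came from the data."""
    return exclamations >= learned_cutoff


print("learned cutoff:", learned_cutoff)
```

Both functions end up as a simple threshold; the difference is who chose it. Change the labeled examples and the learned cutoff moves, while the hand-written rule stays put.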
AI Types
GANG: General AI doesn't exist yet. Artificial Narrow AI is all we have. Get it?
ANI · AGI · ASI: THREE CAPABILITY LEVELS
Narrow AI exists today. AGI is the goal. Superintelligence is theoretical.
Artificial Narrow Intelligence (ANI): all current AI. It excels at ONE specific task and cannot transfer skills; ChatGPT writes text but cannot drive a car. Artificial General Intelligence (AGI): hypothetical human-level ability across ALL tasks. Does not yet exist. Artificial Superintelligence (ASI): theoretical; surpasses human intelligence in every domain. Key point: everything deployed today is Narrow AI.
ANI: Narrow
All current AI. One task only. ChatGPT, AlphaGo, image recognition.
AGI: General
Hypothetical. Human-level across all domains. Does not exist.
ASI: Super
Theoretical. Surpasses humans in every domain.
Learning Types
TEACHER · EXPLORER · REWARD DOG: three stories that lock in the three learning types forever
SUPERVISED · UNSUPERVISED · REINFORCEMENT
Three vivid scenes, one for each ML learning paradigm
Picture three vivid scenes. Supervised: a TEACHER with the answer key. Every example has a correct label, and the model learns to match inputs to outputs (spam filter, image classifier). Unsupervised: an EXPLORER dropped in an unknown jungle with no map, who discovers hidden structure with no labels (clustering, topic modeling). Reinforcement: a DOG trained with treats, which takes an action, gets a reward or correction, and learns what works (game AI, robotics, RLHF for ChatGPT).
Supervised
Teacher with answer key: labeled input-output pairs
Unsupervised
Explorer with no map: find patterns in unlabeled data
Reinforcement
Reward dog: agent maximizes cumulative reward
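The three scenes can be sketched in a few lines each. This is a toy illustration under stated assumptions, not how production systems are built: the data points, the "cat"/"dog" labels, the treat values, and all function names are hypothetical.

```python
# Supervised (teacher): labeled pairs; predict the label of the nearest example.
LABELED = [(1.0, "cat"), (1.5, "cat"), (8.0, "dog"), (9.0, "dog")]


def nearest_neighbor_label(x, labeled=LABELED):
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]


# Unsupervised (explorer): no labels; split 1-D points into two groups.
def two_means(points, steps=10):
    lo, hi = min(points), max(points)  # initial cluster centers
    for _ in range(steps):
        left = [p for p in points if abs(p - lo) <= abs(p - hi)]
        right = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return sorted(left), sorted(right)


# Reinforcement (reward dog): act, observe a reward, prefer what paid off.
REWARDS = {"sit": 1.0, "bark": 0.0}  # hypothetical treat per action


def train_agent(episodes=20):
    value = {action: 0.0 for action in REWARDS}  # estimated reward per action
    counts = {action: 0 for action in REWARDS}
    for _ in range(episodes):
        untried = [a for a in REWARDS if counts[a] == 0]
        # Try each action once, then greedily repeat the best-looking one.
        action = untried[0] if untried else max(value, key=value.get)
        reward = REWARDS[action]
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]  # running mean
    return max(value, key=value.get), counts


print(nearest_neighbor_label(2.0))
print(two_means([1.0, 1.5, 8.0, 9.0]))
print(train_agent())
```

Only the supervised function ever sees a label. The clustering function receives bare numbers, and the agent receives nothing but rewards for the actions it chose.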
Bias vs Variance
High Bias = Too Simple (underfits). High Variance = Too Complex (overfits).
THE FUNDAMENTAL ML TRADEOFF
The most tested concept in machine learning theory
Bias: error from wrong assumptions. The model is too simple and misses real patterns; a linear model on curved data has high bias (underfitting). Variance: error from sensitivity to training noise. The model memorizes the training data and fails on new data (overfitting). Goal: find the sweet spot, the right complexity for the data. Underfitting fix: more complex model, more features. Overfitting fix: more data, regularization, simpler model.
High Bias
Underfitting: poor on BOTH train and test. Model too simple.
High Variance
Overfitting: great on train, poor on test. Memorizes noise.
Sweet spot
Low bias AND low variance: right complexity for your data
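The two failure modes can be seen side by side with two deliberately extreme models. The sketch below assumes a sine curve plus Gaussian noise as the hidden truth; the data, the noise level, and the model names are all invented for illustration.

```python
import math
import random

random.seed(0)


def noisy_curve(x):
    """Hypothetical ground truth (a sine curve) plus Gaussian noise."""
    return math.sin(x) + random.gauss(0, 0.3)


# Train and test inputs interleave on a grid, so no x value repeats.
train = [(i * 0.2, noisy_curve(i * 0.2)) for i in range(0, 30, 2)]
test = [(i * 0.2, noisy_curve(i * 0.2)) for i in range(1, 30, 2)]


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# High bias: ignores x entirely and always predicts the training mean.
mean_y = sum(y for _, y in train) / len(train)


def constant_model(x):
    return mean_y


# High variance: memorizes every training point (1-nearest-neighbor lookup).
def memorizer(x):
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


for name, model in [("constant (high bias)", constant_model),
                    ("memorizer (high variance)", memorizer)]:
    print(f"{name}: train MSE = {mse(model, train):.3f}, "
          f"test MSE = {mse(model, test):.3f}")
```

The memorizer reproduces its training set exactly, so its training error is zero. The gap between that and its test error is the signature of overfitting. The constant model has no such gap, but it misses the curve on both sets.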
Training Loop
ERROR flows BACKWARDS like a RIVER OF REGRET: backpropagation in one vivid image
How a neural network actually learns: the cycle repeated thousands of times
Visualize a river of regret flowing upstream. Forward pass: data flows forward through the network, producing a prediction. Loss: the network measures how wrong it was; this is the regret. Backward pass (backpropagation): the regret flows BACKWARD using the chain rule, assigning blame to each weight. Weight update: each weight is nudged to reduce future regret (gradient descent). Repeat thousands of times. The river of regret gets smaller as the network learns.
Forward pass
Input flows through network, producing a prediction at the output
Loss
Measures how wrong the prediction was (MSE, cross-entropy)
Backpropagation
Chain rule carries error signal backward through all layers
Gradient descent
Nudge each weight in direction that reduces loss
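The four steps can be traced on the smallest possible "network": a single weight w that predicts w * x. This is a sketch under simplifying assumptions (toy data generated by y = 2x, a hand-derived gradient rather than a framework's autograd, an arbitrary learning rate of 0.01).

```python
# Toy data from the hypothetical rule y = 2 * x; the model is one weight w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # initial weight
learning_rate = 0.01  # assumed step size
losses = []

for step in range(200):
    # Forward pass: input flows through the "network" to a prediction.
    preds = [w * x for x in xs]
    # Loss: mean squared error, the "regret".
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    losses.append(loss)
    # Backward pass: the chain rule gives d(loss)/dw = mean(2 * (pred - y) * x).
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Weight update: gradient descent nudges w against the gradient.
    w -= learning_rate * grad

print(f"w = {w:.4f}, first loss = {losses[0]:.2f}, last loss = {losses[-1]:.6f}")
```

The weight moves from 0 toward 2 and the recorded loss shrinks with it. A real network repeats the same cycle with millions of weights, and backpropagation computes all their gradients in one backward sweep.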
AI History
DADA: Dartmouth (1956), AI Winters, Deep learning breakthrough (2012), Attention and LLM era (2017+)
FOUR ERAS OF AI HISTORY
From the 1956 birth of AI to the era of ChatGPT
Dartmouth Conference (1956): John McCarthy coins 'Artificial Intelligence' and the field is born. AI Winters (1970s, 1980s): funding collapses when expectations outstrip results, twice. Deep Learning Breakthrough (2012): AlexNet wins ImageNet by a massive margin using deep CNNs and GPUs, and the modern AI era begins. LLM Era (2017–present): Transformers → BERT → GPT → ChatGPT → explosion of generative AI. Key names: Turing (test), McCarthy (AI term), Hinton/LeCun/Bengio (deep learning pioneers, 2018 Turing Award).
1956
Dartmouth Conference: AI field officially born
1970s/80s
Two AI Winters: funding collapses twice
2012
AlexNet wins ImageNet: deep learning revolution begins
2017+
Transformers paper → BERT → GPT → ChatGPT → now
LLM Vocabulary
TOKEN · TEMPERATURE · CONTEXT WINDOW · HALLUCINATION: four LLM terms you must know cold
THE VOCABULARY OF LARGE LANGUAGE MODELS
What these four terms mean and why each matters
Token: the unit of text an LLM processes, roughly ¾ of a word. Temperature: controls randomness; 0.0 = deterministic/predictable, 1.0+ = creative/random. Context window: the maximum number of tokens the model can see at once (for example, Claude: 200K, Gemini 1.5: 1M; limits vary by model version). Hallucination: when an LLM confidently states false information. This is a fundamental limitation, not a bug: models predict plausible next tokens, not verified facts.
Token
~¾ of a word. 750 words ≈ 1000 tokens. Affects cost and context.
Temperature
0=deterministic, 0.7=balanced, 1+=creative/random
Context window
All text the model can see at once; everything outside is forgotten
Hallucination
Confident false statements: models predict the plausible, not the verified
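Two of these terms reduce to simple arithmetic. The sketch below applies the ¾-word rule of thumb for tokens and shows how dividing scores by a temperature reshapes next-token probabilities. The three logits are made-up numbers and the function names are assumptions, not any vendor's API.

```python
import math


def estimate_tokens(word_count: int) -> int:
    """Rule of thumb from the text: one token is roughly 3/4 of a word."""
    return round(word_count / 0.75)


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into next-token probabilities; temperature must be > 0."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


print(estimate_tokens(750))

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures pile probability onto the top-scoring token; higher ones flatten the distribution so unlikely tokens get sampled more often. A temperature of exactly 0 cannot be divided by, so it is typically handled as "always pick the highest score".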
AI Ethics
FATE: Fairness, Accountability, Transparency, Ethics. Picture a BFT: Big Friendly Transparent robot
THE FOUR PILLARS OF RESPONSIBLE AI
FATE is the framework. BFT (Big Friendly Transparent) is the robot that embodies it.
Fairness: AI should not discriminate based on protected characteristics. Accountability: when AI causes harm, who is responsible? Transparency (Explainability): can we understand why the model made a decision? Ethics: broader societal impact, including jobs, privacy, power, autonomy, and surveillance. These principles are in tension, and there is no pure algorithmic solution. EU AI Act (2024) is the first comprehensive AI regulation globally.
Fairness
No discrimination based on race, gender, age, disability
Accountability
Who is responsible when AI causes harm?
Transparency
Can we explain why the model made a decision? (XAI)
Ethics
Broader societal impact: jobs, privacy, power, autonomy, surveillance
CRAG: Classification, Regression, And Generation, with the C doing double duty for Clustering. Match the task to the right algorithm
FOUR CORE AI TASK TYPES
Most AI problems fit into one of these four categories
Classification: predict a category (spam/not spam, cat/dog). Output = discrete label. Regression: predict a number (house price, temperature). Output = continuous value. Clustering: group similar items without labels (customer segments). Unsupervised. Generation: produce new content (text, images, code). LLMs and diffusion models. Knowing the task type immediately narrows which algorithms, loss functions, and evaluation metrics are appropriate.
Classification
Predict a category: discrete label output
Regression
Predict a number: continuous value output
Clustering
Group similar items: unsupervised, no labels
Generation
Produce new content: text, images, code, audio
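As one illustration of how the task type narrows the evaluation metric, the sketch below scores a classifier with accuracy (discrete labels either match or not) and a regressor with mean squared error (continuous values can be close). The labels and prices are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Classification metric: fraction of discrete labels predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def mean_squared_error(true_values, predicted_values):
    """Regression metric: average squared gap between continuous values."""
    gaps = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return sum(gaps) / len(gaps)


# Classification: the output is a discrete label, so count exact matches.
spam_accuracy = accuracy(["spam", "ham", "spam", "ham"],
                         ["spam", "ham", "ham", "ham"])

# Regression: the output is a number (hypothetical house prices in $1000s).
price_error = mean_squared_error([200.0, 310.0], [210.0, 300.0])

print(spam_accuracy, price_error)
```

Swapping the metrics makes no sense: a predicted price is almost never exactly equal to the true one, and "spam" minus "ham" is not a number. That is why the task type comes first.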
Math Foundations
LAMP: Linear algebra, Analysis (calculus), Matrix ops, Probability. The four math pillars of AI
THE MATHEMATICS BEHIND EVERY AI SYSTEM
You don't need to master all four, but you need to recognize what each contributes
Linear Algebra: vectors (data points), matrices (weight matrices), dot products (similarity), eigenvalues (PCA). Calculus: derivatives (gradients), chain rule (backpropagation), gradient descent finds a minimum of the loss function. Matrix Operations: matrix multiplication is the core operation of neural networks, and GPUs are optimized for it. Probability: Bayes' theorem, probability distributions, expected value. It underpins everything from Naive Bayes to diffusion models.
Linear Algebra
Vectors, matrices, dot products, eigenvalues: data representation
Calculus
Derivatives, chain rule, gradient descent: the learning engine
Matrix Ops
Core of neural networks: GPUs optimized for matrix multiply
Probability
Distributions, Bayes' theorem: underpins all of ML
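Three of the pillars fit in a few lines of plain Python: a dot product used as a similarity score, a matrix-vector product standing in for one network layer, and Bayes' theorem. The vectors, weights, and probabilities are made-up values and the function names are assumptions for this sketch.

```python
import math


def dot(u, v):
    """Linear algebra: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product normalized by vector lengths; 1.0 means same direction."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def matvec(matrix, vector):
    """Matrix ops: one layer without activation; each row is one neuron's weights."""
    return [dot(row, vector) for row in matrix]


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Probability: P(hypothesis | evidence) for a yes/no hypothesis."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(matvec([[1.0, 0.0], [0.5, 0.5]], [4.0, 2.0]))
print(bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

With a 1% prior, a 90% hit rate, and a 5% false-positive rate, the posterior is only about 15%, which is the usual Bayes' theorem surprise. Calculus, the fourth pillar, is the gradient computation inside any training loop.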
AI Ecosystem
FHOT: Frameworks, Hardware, Open-source models, Tools. The modern AI development stack
PYTORCH · TENSORFLOW · HUGGING FACE · SCIKIT-LEARN
What each major tool is used for and when to reach for it
Frameworks: PyTorch (dominant in research and industry), TensorFlow/Keras (Google, production deployment). Hardware: NVIDIA GPUs (CUDA), Google TPUs, Apple Silicon. Open-source models: Hugging Face Hub, with 500K+ pretrained models (BERT, Llama, Stable Diffusion). Tools (libraries): scikit-learn (classical ML), NumPy/Pandas (data), Matplotlib (visualization). For beginners: scikit-learn for ML fundamentals, PyTorch + Hugging Face for deep learning.
PyTorch
Dynamic graphs, Pythonic: dominant in research and industry
scikit-learn
Classical ML: regression, trees, clustering, evaluation
NumPy/Pandas
Data manipulation: the foundation of all ML data work
AI Capabilities
PARVS: Perception, Autonomous action, Reasoning, Vision, Speech. What AI can do today
WHAT AI IS GENUINELY GOOD AT, AND WHERE IT STILL STRUGGLES
Know the real strengths and limits so you are never surprised
AI is genuinely excellent at: pattern recognition in images and audio, language generation and translation, game playing, protein structure prediction (AlphaFold), recommendation systems, code generation, speech recognition. AI still struggles with: robust common-sense reasoning, reliable arithmetic and logic, consistent long-term planning, physical world understanding, verified factual accuracy. Key insight: AI excels at interpolating within its training distribution; it struggles when situations require true generalization beyond what it has seen.
AI excels at
Pattern recognition, generation, game playing, protein folding
AI struggles with
Common sense, arithmetic, long-term planning, novel situations
Key insight
Interpolates within training distribution; struggles outside it