AI Basics Memory Tricks — Free Artificial Intelligence Mnemonics

⭐ Most Important

AI ⊃ ML ⊃ Deep Learning: each smaller circle lives INSIDE the bigger one

THE NESTED CIRCLES — AI IS THE BIGGEST

AI, ML, and Deep Learning are nested, not interchangeable

The three terms are nested, not synonyms. Artificial Intelligence (AI): the broadest category — any technique enabling machines to mimic human intelligence. Machine Learning (ML): a subset of AI where machines learn patterns from data without explicit programming. Deep Learning (DL): a subset of ML using multi-layered neural networks. Key exam point: everything deployed today — ChatGPT, self-driving cars, medical imaging AI — is Narrow AI.

AI

Umbrella — expert systems, search, logic, ML, and more

ML

Subset of AI — learns from data without explicit programming

Deep Learning

Subset of ML — uses multi-layered neural networks

Everything today

Narrow AI — excels at one specific task
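The gap between the outer AI circle and the ML circle inside it can be sketched in Python. Both functions below count as "AI" in the broad sense, but only the second is machine learning: the first follows a rule a human wrote, while the second derives its rule from labeled examples. This is a toy sketch; the keyword list, the four training messages, and all function names are hypothetical.

```python
from collections import Counter

# AI but not ML: a hand-written rule, in the spirit of a tiny expert system.
SPAM_KEYWORDS = {"winner", "free", "prize"}  # chosen by a human, never learned

def rule_based_is_spam(text):
    return any(word in SPAM_KEYWORDS for word in text.lower().split())

# ML: which words signal spam is learned from labeled examples instead.
def learn_spam_words(examples):
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        (spam_counts if is_spam else ham_counts).update(text.lower().split())
    # Keep words that appear more often in spam than in normal mail.
    return {w for w, c in spam_counts.items() if c > ham_counts[w]}

def learned_is_spam(text, spam_words):
    return any(word in spam_words for word in text.lower().split())

training_data = [
    ("claim your cash now", True),
    ("cash bonus claim today", True),
    ("see your meeting notes", False),
    ("lunch today with the team", False),
]
learned_words = learn_spam_words(training_data)
print(sorted(learned_words))                             # ['bonus', 'cash', 'claim', 'now']
print(rule_based_is_spam("claim your cash"))             # False: the rule never mentions these words
print(learned_is_spam("claim your cash", learned_words)) # True: learned from the data
```

Neither function is deep learning; that would require a multi-layered neural network.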

🤖 AI Types

GANG: General AI doesn't exist yet. Artificial Narrow Intelligence is all we have. Get it?

ANI · AGI · ASI — THREE CAPABILITY LEVELS

Narrow AI exists today. AGI is the goal. Superintelligence is theoretical.

Artificial Narrow Intelligence (ANI): all current AI — excels at ONE specific task. Cannot transfer skills. ChatGPT writes text but cannot drive a car. Artificial General Intelligence (AGI): hypothetical — human-level ability across ALL tasks. Does not yet exist. Artificial Superintelligence (ASI): theoretical — surpasses human intelligence in every domain. Key point: everything deployed today is Narrow AI.

ANI — Narrow

All current AI. One task only. ChatGPT, AlphaGo, image recognition.

AGI — General

Hypothetical. Human-level across all domains. Does not exist.

ASI — Super

Theoretical. Surpasses humans in every domain.

📚 Learning Types

TEACHER · EXPLORER · REWARD DOG — three stories that lock in the three learning types forever

SUPERVISED · UNSUPERVISED · REINFORCEMENT

Three vivid scenes — one for each ML learning paradigm

Picture three vivid scenes. Supervised: a TEACHER with the answer key, where every example has a correct label. The model learns to map inputs to outputs (spam filter, image classifier). Unsupervised: an EXPLORER dropped in an unknown jungle with no map, who discovers hidden structure without labels (clustering, topic modeling). Reinforcement: a DOG trained with treats; it takes an action, gets a reward or correction, and learns what works (game AI, robotics, and RLHF, reinforcement learning from human feedback, used to fine-tune ChatGPT).

Supervised

Teacher with answer key — labeled input-output pairs

Unsupervised

Explorer with no map — find patterns in unlabeled data

Reinforcement

Reward dog — agent maximizes cumulative reward
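The three scenes can be sketched side by side in plain Python. Each function is a deliberately tiny stand-in, assuming 1-D numeric data: a nearest-centroid classifier learns from labeled pairs (teacher), a two-cluster k-means finds groups without labels (explorer), and a greedy multi-armed bandit, the simplest reinforcement-learning setting, learns which action earns more reward (dog). All names and numbers are hypothetical.

```python
# Supervised (teacher): every training example comes with the right answer.
def fit_nearest_centroid(xs, labels):
    """Learn one average point per label from labeled data."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Unsupervised (explorer): no labels, just look for structure.
def two_means(xs, iterations=10):
    """1-D k-means with k=2; assumes xs has at least two distinct values."""
    c0, c1 = min(xs), max(xs)  # simple deterministic starting centres
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1

# Reinforcement (reward dog): act, receive a reward, update, repeat.
def greedy_bandit(rewards, steps=20):
    """Greedy agent with optimistic starting estimates; returns (estimates, counts)."""
    estimates = [1.0] * len(rewards)  # optimism makes the agent try every action
    counts = [0] * len(rewards)
    for _ in range(steps):
        action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # fixed rewards keep the sketch deterministic
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

centroids = fit_nearest_centroid([150, 160, 180, 190], ["short", "short", "tall", "tall"])
print(predict(centroids, 175))                   # tall
print(two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.0]))
print(greedy_bandit([0.2, 0.8]))
```

The bandit tries the worse action once, sees a low reward, and spends the remaining steps on the better one, which is the "learn what earns treats" loop in its smallest form.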

📉 Bias vs Variance

High Bias = Too Simple (underfits). High Variance = Too Complex (overfits).

THE FUNDAMENTAL ML TRADEOFF

One of the most frequently tested concepts in machine learning theory

Bias: error from wrong assumptions — model too simple, misses real patterns. A linear model on curved data has high bias (underfitting). Variance: error from sensitivity to training noise — model memorizes training data, fails on new data (overfitting). Goal: find the sweet spot — right complexity for the data. Underfitting fix: more complex model, more features. Overfitting fix: more data, regularization, simpler model.

High Bias

Underfitting — poor on BOTH train and test. Model too simple.

High Variance

Overfitting — great on train, poor on test. Memorizes noise.

Sweet spot

Low bias AND low variance — right complexity for your data
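A minimal sketch of the tradeoff, assuming noisy data drawn from a straight line (hypothetical, generated with fixed seeds). Three models of increasing flexibility are fitted: a constant that ignores the input (high bias), a least-squares line (about the right complexity), and a 1-nearest-neighbour lookup that memorizes every training point (high variance). Exact numbers depend on the seeds; the pattern to look for is the memorizer's perfect training score next to a clearly non-zero test error, and the constant model doing poorly on both.

```python
import random

def make_data(n, seed):
    """Noisy samples from y = 2x + 1, the hypothetical 'true' pattern."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Too simple (high bias): ignores x and always predicts the mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """About right: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Too complex (high variance): returns the y of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

train_x, train_y = make_data(30, seed=0)
test_x, test_y = make_data(200, seed=1)
results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE={results[name][0]:7.2f}  test MSE={results[name][1]:7.2f}")
```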

🔄 Training Loop

ERROR flows BACKWARDS like a RIVER OF REGRET — backpropagation in one vivid image

FORWARD PREDICT · MEASURE REGRET · FLOW BACKWARD · ADJUST WEIGHTS

How a neural network actually learns — the cycle repeated thousands of times

Visualize a river of regret flowing upstream. Forward pass: data flows forward through the network producing a prediction. Loss: the network measures how wrong it was — this is the regret. Backward pass (backpropagation): the regret flows BACKWARD using the chain rule — assigns blame to each weight. Weight update: each weight is nudged to reduce future regret (gradient descent). Repeat thousands of times. The river of regret gets smaller as the network learns.

Forward pass

Input flows through network → prediction at output

Loss

Measures how wrong the prediction was (MSE, cross-entropy)

Backpropagation

Chain rule carries error signal backward through all layers

Gradient descent

Nudge each weight in direction that reduces loss
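The four steps can be written out by hand for the smallest possible "network": a single linear neuron, pred = w·x + b, trained with mean squared error. This is a sketch under that simplifying assumption; a real network has many layers, so the chain rule in step 3 is applied layer by layer, but every iteration has the same forward, loss, backward, update shape. The data and learning rate are hypothetical.

```python
# Toy data that follows y = 2x + 1 exactly (hypothetical example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # the whole "network": one weight and one bias
learning_rate = 0.05
n = len(xs)
losses = []

for step in range(2000):
    # 1. Forward pass: input flows through the model to a prediction.
    preds = [w * x + b for x in xs]
    # 2. Loss: mean squared error measures how wrong the predictions are.
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    losses.append(loss)
    # 3. Backward pass: chain rule, dL/dw = sum of dL/dpred * dpred/dw.
    dloss_dpred = [2 * (p - y) / n for p, y in zip(preds, ys)]
    grad_w = sum(g * x for g, x in zip(dloss_dpred, xs))  # dpred/dw = x
    grad_b = sum(dloss_dpred)                              # dpred/db = 1
    # 4. Gradient descent: nudge each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.2f}, last loss={losses[-1]:.2e}")
```

The learning rate matters: too large and the updates overshoot and diverge, too small and the loss shrinks very slowly.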

🌐 AI History

DADA: Dartmouth (1956), AI Winters, Deep learning breakthrough (2012), Age of LLMs (2017+)

FOUR ERAS OF AI HISTORY

From the 1956 birth of AI to the era of ChatGPT

Dartmouth Conference (1956): John McCarthy coins 'Artificial Intelligence' and the field is born. AI Winters (1970s, 1980s): funding collapses when expectations outstrip results, twice. Deep Learning Breakthrough (2012): AlexNet wins ImageNet by a massive margin using deep CNNs and GPUs, and the modern AI era begins. LLM Era (2017 to present): Transformers (2017) → GPT and BERT (2018) → ChatGPT (2022) → explosion of generative AI. Key names: Turing (test), McCarthy (AI term), Hinton/LeCun/Bengio (deep learning pioneers, 2018 Turing Award).

1956

Dartmouth Conference — AI field officially born

1970s/80s

Two AI Winters — funding collapses twice

2012

AlexNet wins ImageNet — deep learning revolution begins

2017+

Transformers paper (2017) → GPT and BERT (2018) → ChatGPT (2022) → now

🌡️ LLM Vocabulary

TOKEN · TEMPERATURE · CONTEXT WINDOW · HALLUCINATION — four LLM terms you must know cold

THE VOCABULARY OF LARGE LANGUAGE MODELS

What these four terms mean and why each matters

Token: the unit of text an LLM processes — roughly ¾ of a word. Temperature: controls randomness — 0.0 = deterministic/predictable, 1.0+ = creative/random. Context window: maximum tokens the model can see at once — Claude: 200K, Gemini 1.5: 1M. Hallucination: when an LLM confidently states false information — a fundamental limitation, not a bug. Models predict plausible next tokens, not verified facts.

Token

~3/4 of a word. 750 words ≈ 1000 tokens. Affects cost and context.

Temperature

0=deterministic, 0.7=balanced, 1+=creative/random

Context window

All text the model can see at once — everything outside is forgotten

Hallucination

Confident false statements — predict plausible, not verified
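Three of the four terms can be made concrete in a few lines of Python. This sketch assumes the card's ¾-word rule of thumb for tokens, the standard softmax-with-temperature formula for turning next-token scores into probabilities, and a "keep the most recent tokens" policy for the context window (a common simplification; real applications may truncate or summarise differently). The helper names and the example scores are hypothetical.

```python
import math

def estimate_tokens(word_count: int) -> int:
    """Rule of thumb from the card: one token is about 3/4 of a word."""
    return round(word_count / 0.75)

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        # Temperature 0 treated as greedy decoding: all probability on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    top = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fit_to_context_window(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens; anything earlier is invisible to the model."""
    return tokens[-window:] if window > 0 else []

print(estimate_tokens(750))                      # 1000, matching the card
logits = [2.0, 1.0, 0.1]                         # hypothetical scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(fit_to_context_window(["the", "cat", "sat", "on", "the", "mat"], 3))
```

Lower temperatures sharpen the distribution toward the top-scoring token; higher temperatures flatten it, which is why output becomes more varied and more random.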

⚖️ AI Ethics

FATE — Fairness, Accountability, Transparency, Ethics — and picture a BFT: Big Friendly Transparent robot

THE FOUR PILLARS OF RESPONSIBLE AI

FATE is the framework. BFT (Big Friendly Transparent) is the robot that embodies it.

Fairness: AI should not discriminate based on protected characteristics. Accountability: when AI causes harm, who is responsible? Transparency (Explainability): can we understand why the model made a decision? Ethics: broader societal impact — jobs, privacy, power, autonomy, surveillance. These principles are in tension — no pure algorithmic solution. EU AI Act (2024) is the first comprehensive AI regulation globally.

Fairness

No discrimination based on race, gender, age, disability

Accountability

Who is responsible when AI causes harm?

Transparency

Can we explain why the model made a decision? (XAI)

Ethics

Societal impact — jobs, privacy, power, regulation

📋 AI Task Types

CRAG: Classification, Regression, And Generation, plus Clustering as the fourth type. Match the task to the right algorithm

FOUR CORE AI TASK TYPES

Almost every AI problem fits into one of these four categories

Classification: predict a category (spam/not spam, cat/dog). Output = discrete label. Regression: predict a number (house price, temperature). Output = continuous value. Clustering: group similar items without labels (customer segments). Unsupervised. Generation: produce new content (text, images, code). LLMs and diffusion models. Knowing the task type immediately narrows which algorithms, loss functions, and evaluation metrics are appropriate.

Classification

Predict a category — discrete label output

Regression

Predict a number — continuous value output

Clustering

Group similar items — unsupervised, no labels

Generation

Produce new content — text, images, code, audio
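One practical consequence of knowing the task type is choosing the evaluation metric. The sketch below scores made-up predictions: a classifier with accuracy (share of labels exactly right) and a regressor with mean absolute error (average distance from the true number). Accuracy would be meaningless for house prices, and an average distance is meaningless for "cat" vs "dog". These are common defaults, assumed here for illustration, not the only valid metrics.

```python
def accuracy(true_labels, predicted_labels):
    """Classification: share of discrete labels predicted exactly right."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def mean_absolute_error(true_values, predicted_values):
    """Regression: average distance between true and predicted numbers."""
    return sum(abs(t - p) for t, p in zip(true_values, predicted_values)) / len(true_values)

# Hypothetical classifier output: discrete labels.
print(accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"]))  # 0.75
# Hypothetical regressor output: house prices in thousands.
print(mean_absolute_error([300, 450, 500], [310, 440, 530]))  # about 16.67
```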

🔢 Math Foundations

LAMP — Linear algebra, Analysis (calculus), Matrix ops, Probability — the four math pillars of AI

THE MATHEMATICS BEHIND EVERY AI SYSTEM

You don't need to master all four — but you need to recognize what each contributes

Linear Algebra: vectors (data points), matrices (weight matrices), dot products (similarity), eigenvalues (PCA). Calculus: derivatives (gradients), chain rule (backpropagation); gradient descent uses the gradient to step toward a minimum of the loss function. Matrix Operations: matrix multiplication is the core operation of neural networks, and GPUs are optimized for it. Probability: Bayes' theorem, probability distributions, expected value; these underpin everything from Naive Bayes to diffusion models.

Linear Algebra

Vectors, matrices, dot products, eigenvalues — data representation

Calculus

Derivatives, chain rule, gradient descent — the learning engine

Matrix Ops

Core of neural networks — GPUs optimized for matrix multiply

Probability

Distributions, Bayes' theorem: the probability toolkit that underpins all of ML
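Each pillar can be seen in a few lines of plain Python. These helpers are illustrative sketches with hypothetical names (libraries such as NumPy do the same work far faster), and the spam probabilities in the Bayes example are made-up numbers.

```python
def dot(u, v):
    """Linear algebra: dot product, a basic similarity score between two vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(matrix, vector):
    """Matrix ops: a dense layer (ignoring bias and activation) is a matrix-vector product."""
    return [dot(row, vector) for row in matrix]

def derivative(f, x, h=1e-6):
    """Calculus: central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def bayes_posterior(p_evidence_given_h, p_h, p_evidence_given_not_h):
    """Probability: Bayes' theorem, P(H|E) = P(E|H) P(H) / P(E)."""
    p_evidence = p_evidence_given_h * p_h + p_evidence_given_not_h * (1 - p_h)
    return p_evidence_given_h * p_h / p_evidence

print(dot([1, 2, 3], [4, 5, 6]))                 # 32
print(matvec([[1, 2], [3, 4]], [1, 1]))          # [3, 7]
print(derivative(lambda x: x ** 2, 3.0))         # close to 6
# Hypothetical numbers: 20% of mail is spam; the word "free" appears in
# 60% of spam and in 5% of normal mail.
print(bayes_posterior(0.6, 0.2, 0.05))           # about 0.75
```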

🚀 AI Ecosystem

FHOT — Frameworks, Hardware, Open-source models, Tools — the modern AI development stack

PYTORCH · TENSORFLOW · HUGGING FACE · SCIKIT-LEARN

What each major tool is used for and when to reach for it

Frameworks: PyTorch (dominant in research and industry), TensorFlow/Keras (Google, production deployment). Hardware: NVIDIA GPUs (CUDA), Google TPUs, Apple Silicon. Open-source models: Hugging Face Hub, with 500K+ pretrained models (BERT, Llama, Stable Diffusion). Tools and libraries: scikit-learn (classical ML), NumPy/Pandas (data), Matplotlib (visualization). For beginners: scikit-learn for ML fundamentals, PyTorch + Hugging Face for deep learning.

PyTorch

Dynamic graphs, Pythonic — dominant in research and industry

scikit-learn

Classical ML — regression, trees, clustering, evaluation

Hugging Face

500K+ pretrained models — BERT, Llama, Stable Diffusion

NumPy/Pandas

Data manipulation — the foundation of all ML data work

🔮 AI Capabilities

PARVS: Perception, Autonomous action, Reasoning, Vision, Speech. What AI can do today, though its reasoning is still limited

WHAT AI IS GENUINELY GOOD AT — AND WHERE IT STILL STRUGGLES

Know the real strengths and limits so you are never surprised

AI is genuinely excellent at: pattern recognition in images and audio, language generation and translation, game playing, protein structure prediction (AlphaFold), recommendation systems, code generation, speech recognition. AI still struggles with: robust common-sense reasoning, reliable arithmetic and logic, consistent long-term planning, physical world understanding, verified factual accuracy. Key insight: AI excels at interpolating within its training distribution — it struggles when situations require true generalization beyond what it has seen.

AI excels at

Pattern recognition, generation, game playing, protein folding

AI struggles with

Common sense, arithmetic, long-term planning, novel situations

Key insight

Interpolates within training distribution — struggles outside it
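The key insight can be demonstrated with a deliberately crude stand-in for a learned model: a nearest-neighbour lookup trained only on inputs between 0 and 5, then asked about one point inside that range and one far outside it. This toy setup is an assumption for illustration, not a description of how modern networks fail in detail, but it shows the basic pattern: near the training data the answer is reasonable, far from it the model can only echo the closest thing it has seen.

```python
def fit_nearest_neighbour(xs, ys):
    """A toy learned model: predict the y of the closest training example."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def true_function(x):
    return x ** 2

# Training distribution: x only ever falls between 0 and 5.
train_x = [i * 0.5 for i in range(11)]           # 0.0, 0.5, ..., 5.0
model = fit_nearest_neighbour(train_x, [true_function(x) for x in train_x])

for x in (2.4, 10.0):                            # inside vs far outside the training range
    error = abs(model(x) - true_function(x))
    print(f"x={x}: predicted {model(x):.2f}, true {true_function(x):.2f}, error {error:.2f}")
```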
