Robot AI -- AI Algorithms

Memory tricks that make search, optimization and classic AI click

From A-star search to genetic algorithms -- these memory tricks lock in the classical AI algorithms, optimization methods, and ensemble techniques that power modern AI systems.

Community
📋
AI Algorithms Forum
Ask questions · Share tricks
💬
AI Algorithms Study Room
Live · Study together now

Or continue to the sub-topics below for more specialized Study Rooms and Forums

AI Algorithms

Memory Tricks

Proven Mnemonics & Acronyms — fast to learn, hard to forget.

Search Algorithms
A* = g(n) + h(n) -- actual cost PLUS heuristic estimate to goal
BFS AND DFS AND UNIFORM COST AND A-STAR
A-star is optimal if the heuristic is admissible -- never overestimates the true cost
BFS (queue, level-by-level): complete and optimal for unweighted graphs. DFS (stack, goes deep first): memory efficient but not optimal. Uniform Cost Search: optimal for weighted graphs. A* Search: f(n) = g(n) + h(n). g(n) = actual cost from start. h(n) = heuristic estimate to goal. Optimal if heuristic is admissible (never overestimates). Used in GPS navigation and game pathfinding.
BFS
Queue (FIFO), level-by-level, complete, optimal for unweighted
DFS
Stack (LIFO), memory efficient, NOT optimal
A* f(n)=g(n)+h(n)
Optimal if heuristic admissible (never overestimates)
Admissible examples
Straight-line distance, misplaced tiles in 8-puzzle
Minimax
MINIMAX -- MAXimize your score, MINimize opponent score at alternating levels
ASSUMES OPTIMAL OPPONENT -- ALWAYS PLAN FOR WORST CASE
Alpha-beta pruning eliminates branches that cannot affect the final decision -- free speed
Minimax for two-player zero-sum games (chess, tic-tac-toe). MAX player (you) chooses action maximizing score. MIN player (opponent) minimizes your score. Alternate MAX and MIN levels. Alpha-Beta Pruning: prune branches where alpha is >= beta -- same result, dramatically fewer nodes. Reduces from O(b^d) to O(b^(d/2)) in best case. AlphaZero: minimax plus deep learning plus self-play -- beat world champions.
Minimax
Alternate MAX (your move) and MIN (opponent move) levels
Alpha-beta pruning
Prune when alpha >= beta -- same result, far fewer nodes
AlphaZero
Minimax + deep learning + self-play -- beat Go, chess, shogi
CSP
CSP = Variables, Domains, Constraints must all be satisfied simultaneously
SUDOKU AND MAP COLORING AND SCHEDULING ARE ALL CSPs
MRV heuristic -- assign the most constrained variable first -- fails early and saves time
Variables (cells, regions), Domains (digits 1-9, colors), Constraints (no repeats, adjacent regions differ). Backtracking: assign one variable at a time, backtrack when constraint violated. MRV (Minimum Remaining Values): assign most constrained variable first. AC-3 Arc Consistency: preprocessing that removes domain values with no compatible assignment.
Variables
What needs to be assigned (cells, regions, time slots)
Domains
Possible values for each variable
Constraints
Rules that must hold (no repeats, adjacency, order)
MRV heuristic
Assign most constrained variable first -- catches failures early
Genetic Algorithms
SELECT, CROSS, MUTATE -- evolution as an optimization algorithm
POPULATION AND FITNESS AND SELECTION AND CROSSOVER AND MUTATION
Mutation maintains diversity -- without it the population converges to a local optimum
Population: candidate solutions. Fitness function: evaluate solution quality. Selection: fitter solutions more likely to reproduce. Crossover: combine two parents to produce offspring. Mutation: randomly alter part of a solution -- maintains diversity, avoids local optima. Repeat for many generations. Used when search space is huge or gradient is unavailable.
Population
Set of candidate solutions
Fitness function
Evaluates how good each solution is
Crossover
Combine parents -- offspring inherits from both
Mutation
Random change -- prevents convergence to local optima
Bayesian Networks
BAYES NET -- probabilistic graphical model where ARROWS MEAN CAUSES
DIRECTED ACYCLIC GRAPH WITH CONDITIONAL PROBABILITY TABLES
Given its parents, a node is conditionally independent of all non-descendants
Directed Acyclic Graph (DAG). Nodes = random variables. Edges = direct probabilistic influence (X to Y means X directly influences Y). Each node stores a Conditional Probability Table (CPT) given its parents. Applications: medical diagnosis, spam filtering, fault diagnosis. Inference: given evidence, compute probability of query variable. Inference is NP-hard in general.
Nodes
Random variables (symptoms, diseases, sensor readings)
Edges
X to Y means X directly influences Y
CPT
P(node|parents) -- conditional probability table at each node
Inference
Given evidence, compute P(query) -- NP-hard in general
Gradient Descent
GRADIENT DESCENT goes DOWNHILL -- like a ball rolling to the lowest point
BATCH AND SGD AND MINI-BATCH AND ADAM -- FOUR VARIANTS
Learning rate is the most critical hyperparameter -- too high diverges, too low is very slow
Batch GD: gradient on entire dataset -- stable, slow. SGD: gradient on one example -- fast, very noisy. Mini-batch (32-256 samples): standard in deep learning -- best of both worlds. Adam: adaptive per-parameter learning rate -- most popular optimizer, works with defaults (lr=0.001). Momentum: adds inertia, avoids oscillation, escapes saddle points.
Batch GD
Entire dataset per update -- stable but slow
SGD
One example per update -- fast, noisy, can escape local optima
Mini-batch (32-256)
Standard in deep learning -- best compromise
Adam
Adaptive per-parameter LR -- use by default, start lr=0.001
Ensemble Methods
BAG and BOOST -- Bagging = parallel, Boosting = sequential error correction
RANDOM FOREST (BAGGING) AND XGBOOST (BOOSTING) AND STACKING
Ensembles consistently outperform single models -- they win Kaggle competitions
Bagging: parallel independent models on random subsets, average predictions -- reduces variance. Random Forest = bagging of decision trees. Boosting: sequential, each model corrects previous errors -- reduces bias. XGBoost and LightGBM = gradient boosting. Stacking: meta-model learns to combine base model outputs. No Free Lunch: try ensembles before complex single models.
Bagging
Parallel, independent -- reduces variance (overfitting)
Boosting
Sequential, corrects prior errors -- reduces bias (underfitting)
Random Forest
Bagging of decision trees -- reliable off-the-shelf algorithm
XGBoost
Gradient boosting -- most powerful for tabular data
PCA
PCA finds the DIRECTIONS OF MAXIMUM VARIANCE in the data
REDUCE DIMENSIONS WHILE PRESERVING THE MOST IMPORTANT STRUCTURE
Always standardize data before PCA -- large-scale features will otherwise dominate
PCA finds orthogonal directions (principal components) of maximum variance. PC1: greatest variance direction. Project data onto top k components. Scree plot: variance explained vs k -- choose where curve elbows (~95% variance kept). Always standardize first. t-SNE and UMAP: nonlinear alternatives for visualization only.
PC1
Direction of greatest variance in the data
Scree plot
Variance explained vs k -- pick the elbow (~95% total)
Standardize first
Large-scale features dominate without standardization
t-SNE and UMAP
Nonlinear -- visualization only, not for feature extraction
Monte Carlo
SAMPLE then AVERAGE -- Monte Carlo estimates by running many random simulations
LAW OF LARGE NUMBERS: RANDOM SAMPLE AVERAGE CONVERGES TO TRUE VALUE
Error decreases as 1 over sqrt(N) -- quadruple samples to halve the error
Run many random simulations, average results -- converges to true value as samples increase. Monte Carlo Tree Search (MCTS): simulate random game playouts from each position to guide search -- used in AlphaGo. MCMC (Markov Chain Monte Carlo): sample from complex probability distributions -- used in Bayesian inference.
Core idea
Many random samples, take average, converges to true value
MCTS
Game AI -- simulate playouts to estimate position value
MCMC
Sample from complex distributions -- Bayesian inference
Error rate
Decreases as 1/sqrt(N) -- 4x samples to halve error
Association Rules
SUPPORT, CONFIDENCE, LIFT -- the three metrics that make bread-to-butter meaningful
MARKET BASKET ANALYSIS -- ITEMS THAT FREQUENTLY APPEAR TOGETHER
Apriori uses anti-monotonicity -- if A,B is infrequent then A,B,C must also be infrequent
Support: how often do X and Y appear together? Confidence (A to B): P(B given A). Lift: is the relationship stronger than chance? Lift greater than 1 = positive association beyond chance. Apriori: uses anti-monotonicity to prune itemset search efficiently. Applications: recommendations, fraud detection, drug interactions.
Support
Frequency of itemset -- how common is the pair?
Confidence (A to B)
P(B|A) -- given A, how often is B also present?
Lift greater than 1
Items appear together more than expected by chance
Apriori
Prune by anti-monotonicity -- efficient frequent itemset mining
Dynamic Programming
DIVIDE into subproblems, STORE solutions, REUSE -- never solve the same subproblem twice
OPTIMAL SUBSTRUCTURE PLUS OVERLAPPING SUBPROBLEMS = USE DP
Viterbi algorithm uses DP for HMMs -- speech recognition and POS tagging both use it
DP solves complex problems by breaking into overlapping subproblems, solving each once, storing results (memoization). Two conditions: optimal substructure and overlapping subproblems. Top-down (memoization): recursive with caching. Bottom-up (tabulation): iterative. Viterbi algorithm: DP for finding most likely sequence in a Hidden Markov Model -- used in speech recognition and POS tagging.
Optimal substructure
Optimal solution contains optimal sub-solutions
Overlapping subproblems
Same subproblems solved repeatedly in naive recursion
Memoization
Store results -- top-down recursive approach
Viterbi
DP for HMM sequences -- speech recognition and POS tagging
0
Correct
0
Missed
0
Remaining
What does this mean / stand for?
0
Correct
0
Wrong
0
Remaining
© 2026 MemoryTricks 🧠