A4.1.1 Describe machine learning algorithms (AO2)

A4.1.1_1 Algorithms: DL, RL, supervised, TL, UL

Deep Learning (DL)

  • Uses neural networks with multiple layers to model complex patterns
  • Requires significant computational power and data
  • Excels in handling unstructured data (e.g., images, audio)
  • Example: CNNs for image recognition

Reinforcement Learning (RL)

  • Agent learns by interacting with environment
  • Optimizes actions based on rewards or penalties
  • Trial-and-error approach
  • Example: Q-learning for game-playing AI

Supervised Learning

  • Trains models on labeled data to predict outcomes
  • Uses input-output pairs
  • Includes classification (categorical) and regression (continuous)
  • Example: Linear regression for house prices

Transfer Learning (TL)

  • Reuses pre-trained model on new, related task
  • Fine-tunes for specific data
  • Reduces training time and data needs
  • Example: Fine-tuning BERT for sentiment analysis

Unsupervised Learning (UL)

  • Finds patterns in unlabeled data
  • No predefined outputs
  • Includes clustering and dimensionality reduction
  • Example: K-means for customer segmentation

A4.1.1_2 Characteristics of each approach

Algorithm | Strengths | Weaknesses
Deep Learning | Handles complex, high-dimensional data; excels in image and speech recognition | Computationally intensive; requires large datasets; less interpretable
Reinforcement Learning | Effective for dynamic environments and sequential tasks | Slow learning; requires well-defined reward functions; sensitive to environment changes
Supervised Learning | Accurate with sufficient labeled data; straightforward for defined tasks | Relies on quality labeled data; less effective for unstructured data without preprocessing
Transfer Learning | Leverages existing models; reduces training time and data requirements | Limited by similarity between pre-trained and target tasks; potential overfitting
Unsupervised Learning | Discovers hidden patterns without labels; flexible for exploratory tasks | Results may be less actionable; harder to evaluate without ground truth

A4.1.1_3 Applications: market basket analysis, medical imaging, NLP, object detection, robotics, sentiment analysis

Market Basket Analysis

  • Algorithm: Unsupervised Learning (association rule mining)
  • Use: Identifies items frequently purchased together
  • Example: Apriori algorithm finding bread and butter often bought together

Medical Imaging

  • Algorithm: Deep Learning (CNNs)
  • Use: Analyzes scans to detect diseases
  • Example: CNN identifying tumors in MRI images

Natural Language Processing

  • Algorithm: Deep Learning, Transfer Learning
  • Use: Processes text for translation or chatbots
  • Example: BERT for sentiment analysis in reviews

Object Detection

  • Algorithm: Deep Learning (YOLO, Faster R-CNN)
  • Use: Identifies and locates objects in images
  • Example: Autonomous vehicles detecting pedestrians

Robotics

  • Algorithm: Reinforcement Learning, Deep Learning
  • Use: Trains robots for navigation or manipulation
  • Example: RL training robotic arm to pick objects

Sentiment Analysis

  • Algorithm: Supervised Learning, Transfer Learning
  • Use: Determines emotional tone in text
  • Example: Classifying social media posts as positive/negative

A4.1.2 Describe machine learning hardware requirements (AO2)

A4.1.2_1 Configurations for processing, storage, scalability

Processing

  • Requires high computational power for training and inference
  • Configurations include CPUs, GPUs, or specialized hardware like TPUs
  • Example: Training deep neural networks requires GPUs with thousands of cores (e.g., Nvidia A100)

Storage

  • Large datasets demand high-capacity, fast-access storage
  • Configurations include SSDs for quick retrieval or HDDs for archival data
  • Example: 1 million high-resolution images may require terabytes of storage; smaller images need far less
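
The storage estimate above can be checked with simple arithmetic. The sketch below assumes uncompressed images with one byte per colour channel; the function name and the two resolutions are illustrative choices, not figures from the syllabus.

```python
def dataset_size_gb(num_images, width, height, channels=3, bytes_per_value=1):
    """Rough size of an uncompressed image dataset in gigabytes (1 GB = 1e9 bytes)."""
    return num_images * width * height * channels * bytes_per_value / 1e9

# Assumed resolutions, chosen only to show how quickly the size grows.
small = dataset_size_gb(1_000_000, 224, 224)     # thumbnail-sized images
large = dataset_size_gb(1_000_000, 1024, 1024)   # high-resolution images

print(f"224x224:   {small:.1f} GB")
print(f"1024x1024: {large:.1f} GB")
```

Under these assumptions the same number of images needs roughly 150 GB at low resolution and over 3 TB at high resolution, which is why storage needs depend on the data as much as on the record count.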

Scalability

  • Hardware must scale to handle increasing data sizes
  • Through distributed systems or cloud infrastructure
  • Example: AWS EC2 instances scaling GPU resources for larger ML workloads

A4.1.2_2 Range: laptops to advanced infrastructure

Laptops

  • Suitable for small-scale ML tasks
  • Prototyping or inference with pre-trained models
  • Limited by lower processing power (e.g., 4–8 CPU cores, entry-level GPUs)
  • Example: Running sentiment analysis on laptop with 16 GB RAM and Intel i7

Advanced Infrastructure

  • High-performance systems for large-scale training
  • GPU clusters, supercomputers, or cloud platforms
  • Supports complex models with massive datasets
  • Example: Google Cloud's TPU clusters training large language models

A4.1.2_3 Infrastructure: ASICs, edge devices, FPGAs, GPUs, TPUs, cloud, HPC

ASICs

  • Application-Specific Integrated Circuits
  • Custom chips for specific ML tasks
  • Example: Google's TPU optimized for TensorFlow

Edge Devices

  • Lightweight hardware for ML inference at edge
  • Limited processing, suitable for real-time tasks
  • Example: Raspberry Pi for object detection in smart camera

FPGAs

  • Field-Programmable Gate Arrays
  • Reconfigurable hardware for flexible ML tasks
  • Example: Xilinx FPGAs in custom ML pipelines

GPUs

  • Graphics Processing Units
  • Parallel processing for training and inference
  • Example: Nvidia RTX 3090 for training CNNs

TPUs

  • Tensor Processing Units
  • Google's ASICs optimized for tensor operations
  • Example: Used in Google Cloud for neural network training

Cloud

  • Scalable, on-demand infrastructure
  • AWS, Azure, Google Cloud for ML training
  • Example: AWS SageMaker for distributed training

HPC

  • High-Performance Computing
  • Supercomputers for massive ML tasks
  • Example: Oak Ridge Summit for scientific ML simulations

A4.1.2_4 Application-specific hardware needs

Application Type | Hardware Requirements | Example
Small-Scale Applications (e.g., simple regression) | Laptops or entry-level servers with CPUs or modest GPUs | Laptop with Intel i5 and 8 GB RAM for linear regression
Deep Learning (e.g., image recognition, NLP) | GPUs, TPUs, or cloud-based GPU clusters for parallel processing | Nvidia A100 GPUs for training CNN on millions of images
Real-Time Inference (e.g., autonomous vehicles, IoT) | Edge devices or FPGAs for low-latency, low-power processing | Jetson Nano for object detection in drones
Big Data Analytics (e.g., market basket analysis) | Cloud or HPC clusters with high-capacity storage and distributed processing | Apache Spark on AWS EMR for analyzing transaction data

A4.2.1 Describe data cleaning significance (AO2) HL

A4.2.1_1 Impact on model performance

Significance

  • Data cleaning removes errors, inconsistencies, and irrelevant data
  • Ensures high-quality input for machine learning models
  • Clean data improves accuracy, reduces bias, and prevents misleading predictions

Impact on Performance

  • Accuracy: Dirty data leads to incorrect predictions or overfitting
  • Example: Inconsistent square footage data produces unreliable house price predictions
  • Efficiency: Clean data reduces processing overhead
  • Example: Removing duplicates speeds up training
  • Generalization: Clean data helps models generalize better to new data
  • Example: Correcting mislabeled categories improves model robustness

A4.2.1_2 Techniques: handle outliers, duplicates, incorrect/irrelevant data, transform formats, impute/delete/predict missing data

Handle Outliers

  • Identify and address deviant data points
  • Using statistical methods like z-scores or IQR
  • Example: Removing $1 billion house price in typical $100K-$500K dataset
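
One way to apply the IQR rule mentioned above is sketched here. The prices are made-up values, and the multiplier k = 1.5 is a common convention rather than a fixed requirement.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical house prices in dollars; the last value is an obvious entry error.
prices = [120_000, 150_000, 180_000, 210_000, 250_000,
          320_000, 480_000, 1_000_000_000]
outliers = iqr_outliers(prices)
print(outliers)
```

Whether a flagged value should be removed, corrected, or kept is still a judgment call that depends on the dataset.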

Handle Duplicates

  • Remove or merge identical records
  • Prevents bias in model training
  • Example: Deleting duplicate customer entries in sales dataset

Incorrect/Irrelevant Data

  • Correct errors or invalid values
  • Remove data irrelevant to the task
  • Example: Fixing invalid date "2025-13-01" or removing "Notes" column

Transform Formats

  • Standardize data formats for consistency
  • Ensure consistent dates, units, or encodings
  • Example: Converting all dates to YYYY-MM-DD format
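
A minimal sketch of format standardization, assuming the raw data mixes three numeric date layouts. The list of accepted formats is an assumption for illustration; a real dataset would need its own list.

```python
from datetime import datetime

# Assumed input layouts; extend this tuple to match the actual data.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def to_iso(date_text):
    """Convert a date string to YYYY-MM-DD, or return None if it is invalid."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(date_text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    return None

raw = ["31/12/2024", "05.03.2025", "2025-01-15", "2025-13-01"]
cleaned = [to_iso(d) for d in raw]
print(cleaned)
```

Invalid entries such as month 13 come back as None, so they can be corrected or removed in a later cleaning step.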

Impute Missing Data

  • Fill missing values using mean/median, mode
  • Or interpolation methods
  • Example: Replacing missing ages with average age
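
Mean imputation can be sketched as follows, assuming missing values are stored as None. The ages are invented for the example.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]   # hypothetical ages with two gaps
filled = impute_mean(ages)
print(filled)
```

Swapping mean for median from the same module gives median imputation, which is less affected by outliers.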

Delete Missing Data

  • Remove records with missing values
  • When minimal or non-critical
  • Example: Dropping rows with missing grades in small dataset

Predict Missing Data

  • Use ML models to estimate missing values
  • Based on other features
  • Example: Predicting missing income using regression based on age/occupation

A4.2.1_3 Normalization, standardization as preprocessing

Normalization

  • Scales data to fixed range [0, 1]
  • Ensures features contribute equally to model
  • Formula: (x - min(x)) / (max(x) - min(x))
  • Example: Normalizing house prices $100K-$1M to [0,1]
  • Purpose: Prevents larger scale features from dominating

Standardization

  • Transforms data to have mean of 0, SD of 1
  • Improves convergence for certain algorithms
  • Formula: (x - mean(x)) / std(x)
  • Example: Standardizing test scores for comparison
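  • Purpose: Helps algorithms that are sensitive to feature scale (e.g., gradient-based or distance-based methods); it does not by itself make data normally distributed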

Significance

  • Both techniques ensure comparable feature scales
  • Improving model performance and training stability
  • Example: In dataset with age (20-80) and income ($20K-$200K), normalization ensures balanced influence
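
Both formulas above translate directly into code. This is a small sketch on invented age and income values; it uses the population standard deviation and assumes each feature has more than one distinct value, so the denominators are never zero.

```python
from statistics import mean, pstdev

def min_max_normalize(xs):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Shift and scale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

ages = [20, 35, 50, 80]                        # hypothetical values
incomes = [20_000, 60_000, 110_000, 200_000]   # hypothetical values

print(min_max_normalize(ages))
print(min_max_normalize(incomes))
print(standardize(incomes))
```

After either transformation, age and income sit on comparable scales, so neither dominates purely because of its units.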

A4.2.2 Describe feature selection role (AO2) HL

A4.2.2_1 Identify, retain informative attributes

Role of Feature Selection

  • Identifies and retains most relevant attributes in dataset
  • Contributes to accurate predictions in ML models
  • Reduces irrelevant or redundant features to improve performance
  • Enhances model interpretability

Identification Process

  • Evaluates features based on correlation with target variable
  • Or predictive power
  • Example: In house price prediction, selecting square footage and location while discarding house color

Retention Benefits

  • Enhances model accuracy by focusing on strong predictive features
  • Reduces noise from irrelevant features
  • Improves model generalization
  • Example: Retaining Age and Income for credit risk model as they correlate with repayment ability

A4.2.2_2 Strategies: filter, wrapper, embedded methods

Filter Methods

  • Select features based on statistical measures
  • Independent of ML model
  • Uses metrics like correlation, chi-square, mutual information
  • Example: Pearson correlation to select features correlated with house prices
  • Advantage: Computationally efficient
  • Disadvantage: Ignores feature interactions
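
A filter method can be sketched with Pearson correlation as the scoring measure. The feature names, the data, and the 0.5 threshold are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_select(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical housing data: prices are in thousands of dollars.
features = {
    "square_footage": [800, 1200, 1500, 2000, 2600],
    "colour_code":    [3, 1, 2, 3, 1],
}
price = [150, 210, 260, 330, 420]
selected = filter_select(features, price)
print(selected)
```

Each feature is scored on its own, which is what makes filter methods fast and also why they can miss features that only matter in combination.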

Wrapper Methods

  • Evaluate feature subsets by training and testing model
  • Selects subset with best performance
  • Uses algorithms like recursive feature elimination (RFE)
  • Example: RFE with decision tree for customer churn prediction
  • Advantage: Considers feature interactions
  • Disadvantage: Computationally expensive

Embedded Methods

  • Perform feature selection during model training
  • Using model-specific criteria
  • Built into algorithms like Lasso regression or decision trees
  • Example: Lasso regression selecting features by assigning zero weights
  • Advantage: Balances efficiency and relevance
  • Disadvantage: Limited to specific algorithms

A4.2.3 Describe dimensionality reduction importance (AO2) HL

A4.2.3_1 Address overfitting, complexity, sparsity, distance metrics, sample size, visualization, memory

Overfitting

  • High-dimensional data increases risk of learning noise
  • Dimensionality reduction removes irrelevant features
  • Improves model generalization
  • Example: Reducing features prevents overfitting to unique IDs

Complexity

  • High dimensions increase computational complexity
  • Slows training and inference
  • Reduction lowers processing time
  • Example: Simplifying 100-feature dataset speeds up neural network

Sparsity

  • High-dimensional data often has many zero/missing values
  • Complicates analysis
  • Reduction creates denser representations
  • Example: Compressing sparse text data for classification

Distance Metrics

  • In high dimensions, distance becomes less meaningful
  • Reduction ensures reliable calculations
  • Improves algorithms like k-NN
  • Example: Reducing dimensions improves similarity in recommendation systems

Visualization

  • High-dimensional data is hard to visualize
  • Reduction enables 2D/3D representation
  • Aids interpretation
  • Example: Reducing many features to two or three summary dimensions for plotting

Sample Size

  • More dimensions usually require more training examples
  • Sparse coverage makes patterns harder to learn reliably
  • Reduction can help when available data is limited
  • Example: A small dataset with too many features may not represent enough combinations

Memory

  • High dimensions require significant storage
  • Reduction decreases memory usage
  • Enables efficient processing
  • Example: Compressing 1000 features to 50 reduces memory needs

A4.2.3_2 Reduce variables, preserve relevant data aspects

Reduction Process

  • Eliminates or combines features to create lower-dimensional representation
  • Retains key information
  • Removes variables that are irrelevant, duplicated, or too noisy to help prediction
  • Example: Combining many similar sensor readings into fewer useful inputs for a model

Preserving Relevant Data

  • Ensures reduced dimensions retain critical patterns
  • Reduction should keep the information most relevant to the prediction or classification task
  • Example: In fraud detection, keeping transaction amount, frequency, and location while removing duplicate identifiers

Course Scope

  • Focus on why dimensionality reduction is useful, not on detailed statistical methods
  • Specific techniques such as PCA and LDA are beyond the scope of this course
  • Use Case: Reducing sensor readings for real-time anomaly detection while preserving important patterns

A4.3.1 Explain linear regression for continuous outcomes (AO2) HL

A4.3.1_1 Predictor-response variable relationship

Description

  • Models relationship between predictors and response
  • Assumes linear relationship: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
  • y = response, xᵢ = predictors, βᵢ = coefficients, ε = error

Predictor-Response Relationship

  • Predictors influence response through learned coefficients
  • Example: Square footage has positive relationship with house price
  • Larger houses tend to cost more

Use Case

  • Predicting continuous outcomes like sales, temperature, stock prices
  • Based on input features
  • Example: Estimating student exam score based on study hours and previous grades

A4.3.1_2 Slope, intercept significance

Intercept (β₀)

  • Represents predicted value when all predictors are zero
  • Example: Base price of house with zero square footage (often not practically meaningful)
  • Sets the baseline for predictions

Slope (βᵢ)

  • Indicates change in response for one-unit increase in predictor
  • Holding other predictors constant
  • Example: Slope of 100 for square_footage means each additional sq ft increases price by $100

Significance

  • Intercept sets baseline, slopes quantify impact of each predictor
  • Coefficients learned during training to minimize prediction errors
  • Example: In sales prediction, slope of 50 for advertising_budget suggests $50 sales increase per dollar spent
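
For a single predictor, the intercept and slope that minimize squared error have a closed form. The sketch below uses invented study-hours data that lies exactly on a line, so the fitted values are easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

hours = [1, 2, 3, 4, 5]           # hypothetical study hours
scores = [52, 54, 56, 58, 60]     # hypothetical exam scores
b0, b1 = fit_line(hours, scores)
print(f"score = {b0:.1f} + {b1:.1f} * hours")
```

Here the slope says each extra hour adds two points to the predicted score, and the intercept is the predicted score at zero hours.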

A4.3.1_3 Model fit assessment (r²)

r² (Coefficient of Determination)

  • Measures how well model explains variance in response
  • Typically ranges from 0 to 1 (1 = perfect fit, 0 = no explanatory power); can be negative on new data if the model predicts worse than the mean
  • Example: r²=0.85 means 85% of price variability explained by predictors

Assessment Process

  • Formula: r² = 1 - (SSE / SST)
  • SSE (Sum of Squared Errors) = sum of squared differences between actual and predicted values
  • SST (Total Sum of Squares) = sum of squared differences between actual values and their mean
  • Higher r² indicates better fit, but adding predictors can inflate r² on training data (overfitting)
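
The r² formula can be computed in a few lines. The actual and predicted values below are invented so that the arithmetic is easy to follow.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # squared errors
    sst = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1 - sse / sst

actual = [3, 5, 7, 9]
predicted = [2, 6, 7, 9]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

With these numbers SSE is 2 and SST is 20, so r² is 0.9. A model that always predicts the mean would score 0.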

Use Case

  • Evaluate model quality
  • Compare models to select best explanatory power
  • Example: Student score model with r²=0.9 explains more of the variation in scores than one with r²=0.6

A4.3.2 Explain classification techniques in supervised learning (AO2) HL

A4.3.2_1 K-NN, decision trees for categorical outcomes

K-Nearest Neighbors (K-NN)

  • Classifies based on majority class of k closest neighbors
  • Uses distance metrics (e.g., Euclidean)
  • Characteristics: Non-parametric, simple, sensitive to k choice
  • Example: Classifying email as spam based on similar emails
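
A minimal K-NN classifier, assuming Euclidean distance and a simple majority vote. The two-feature training points and their labels are invented.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical emails described by two numeric features.
train = [((1, 1), "ham"), ((1, 2), "ham"), ((2, 1), "ham"),
         ((8, 8), "spam"), ((8, 9), "spam"), ((9, 8), "spam")]

print(knn_predict(train, (2, 2)))
print(knn_predict(train, (7, 9)))
```

There is no training step: every prediction scans the stored data, which is why K-NN slows down as the dataset grows.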

Decision Trees

  • Builds tree-like model with decisions based on feature values
  • Nodes test features, leaves give categorical outcomes
  • Characteristics: Interpretable, handles mixed data, prone to overfitting
  • Example: Predicting disease based on symptoms like fever or blood pressure

A4.3.2_2 K-NN applications: recommendation systems

K-NN in Recommendation Systems

  • Use: Recommends items by finding similar users or items
  • Based on user behavior or item features
  • Mechanism: Treats users/items as points in feature space
  • Recommends items from nearest neighbors
  • Example: Netflix recommending movies based on similar users' viewing history

Advantages & Challenges

  • Advantages: Simple, effective for collaborative filtering
  • Captures local patterns in user preferences
  • Challenges: Scales poorly with large datasets
  • Requires efficient distance calculations

A4.3.2_3 Decision trees applications: medical diagnosis

Decision Trees in Medical Diagnosis

  • Use: Classifies patients into diagnostic categories
  • Based on symptoms or test results
  • Mechanism: Tree where nodes test features (e.g., "Is temperature > 38°C?")
  • Leads to diagnosis at leaf nodes
  • Example: Diagnosing diabetes using blood sugar, age, and BMI

Advantages & Challenges

  • Advantages: Easy to interpret, visualizable as flowchart
  • Handles mixed data types effectively
  • Challenges: Overfitting if tree is too deep
  • Requires pruning or ensemble methods for robustness

A4.3.3 Explain hyperparameter tuning in supervised learning (AO2) HL

A4.3.3_1 Metrics: accuracy, precision, recall, F1 score

Accuracy

  • Proportion of correct predictions out of all predictions
  • Formula: (TP+TN)/(TP+TN+FP+FN)
  • Use: Overall model performance; best for balanced datasets
  • Example: 95% accuracy spam classifier correctly identifies 95/100 emails

Precision

  • Proportion of true positives out of all positive predictions
  • Formula: TP/(TP+FP)
  • Use: Important when false positives are costly
  • Example: High precision cancer model minimizes false positive diagnoses

Recall

  • Proportion of true positives identified out of all actual positives
  • Formula: TP/(TP+FN)
  • Use: Critical when missing positives is costly
  • Example: High recall fraud model catches most fraudulent transactions

F1 Score

  • Harmonic mean of precision and recall
  • Formula: 2 × (Precision × Recall)/(Precision + Recall)
  • Use: For imbalanced datasets where both FP and FN matter
  • Example: F1=0.85 indicates balanced performance in sentiment analysis
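
All four metrics follow from the confusion-matrix counts. The counts below are hypothetical, and the sketch assumes at least one positive prediction and one actual positive so that no division by zero occurs.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical results for 100 test emails.
acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

In this example precision is higher than recall: the model is rarely wrong when it flags spam, but it misses one in five spam emails.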

A4.3.3_2 Tuning impact on performance

Hyperparameter Tuning

  • Adjusts hyperparameters: settings chosen before training rather than learned from the data
  • e.g., learning rate, number of trees in random forest
  • Methods: Grid search, random search, Bayesian optimization

Impact on Performance

  • Improved Accuracy: Proper tuning aligns model complexity with data
  • Example: Tuning learning rate to 0.01 improves neural network convergence
  • Balanced Precision/Recall: Optimizes for specific metrics
  • Example: Increasing k in K-NN may improve recall but reduce precision
  • Reduced Overfitting/Underfitting: Adjusts complexity appropriately
  • Example: Adjusting tree depth prevents overfitting in decision trees

A4.3.3_3 Overfitting, underfitting considerations

Overfitting

  • Model learns training data too well, including noise
  • Poor generalization on new data
  • Tuning Solution: Reduce complexity (e.g., fewer layers, stronger regularization)
  • Example: Reducing the number of neural network layers helps prevent overfitting on small dataset

Underfitting

  • Model is too simple to capture patterns
  • Poor performance on both training and test data
  • Tuning Solution: Increase complexity (e.g., deeper trees, more layers, weaker regularization)
  • Example: Increasing the number of training iterations may improve performance

Tuning Considerations

  • Use cross-validation to assess hyperparameter performance
  • Balance model complexity with dataset size
  • Smaller datasets require simpler models
  • Example: Grid search for SVM kernel type and penalty parameter (C)
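
Grid search with cross-validation can be sketched without any library. To stay self-contained, this example tunes k for a small K-NN classifier rather than the SVM mentioned above, and uses leave-one-out cross-validation; the data and the candidate values are invented.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Majority vote among the k nearest (features, label) pairs."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation: test each point against all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

def grid_search(data, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

data = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "a"),
        ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"), ((9, 9), "b")]
best_k, scores = grid_search(data, [1, 3, 7])
print(best_k, scores)
```

With k = 7 every held-out point is outvoted by the other class, so accuracy collapses. This is a small illustration of how a poorly chosen hyperparameter can make a model too simple for the data.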

A4.3.4 Describe clustering in unsupervised learning (AO2) HL

A4.3.4_1 Group data by feature similarities

Description

  • Unsupervised technique that groups data points into clusters
  • Based on similarity in features, without predefined labels
  • Similarity measured using distance metrics (e.g., Euclidean) or density

Process

  • Algorithms identify patterns or structures in data
  • Common approach: choose a similarity measure, group similar records, then interpret each group
  • Example: Grouping customers by purchasing behavior

Characteristics

  • No prior knowledge of group labels required
  • Suitable for exploratory analysis
  • Clusters formed based on feature proximity
  • Minimizes intra-cluster variance, maximizes inter-cluster differences
  • Example: Retail purchases grouped by item type and frequency
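
K-means is one common way to group points by proximity. The sketch below assumes the number of clusters and the starting centroids are supplied by the caller and runs a fixed number of iterations; the points are invented.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket size).
points = [(1, 2), (2, 1), (1, 1), (9, 8), (8, 9), (9, 9)]
centroids, clusters = k_means(points, centroids=[(1, 2), (9, 8)])
print(centroids)
print(clusters)
```

No labels are used at any point; the two groups emerge only from distances between feature values, and interpreting what each group means is left to the analyst.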

A4.3.4_2 Applications: customer segmentation

Customer Segmentation

  • Use: Divides customers into groups based on shared characteristics
  • To tailor marketing, improve service, or optimize offerings
  • Mechanism: Analyzes features like age, purchase history, browsing behavior
  • Example: Retailer using K-means to group "budget shoppers," "luxury buyers," "occasional buyers"

Benefits

  • Enables targeted marketing (e.g., personalized promotions)
  • Improves customer experience through preference understanding
  • Supports business strategy for inventory planning

Example Application

  • E-commerce platform clusters users by purchase frequency and order value
  • Sends tailored discounts to low-frequency, high-value customers
  • Aims to boost retention and increase sales

A4.3.5 Describe association rule learning (AO2) HL

A4.3.5_1 Uncover attribute relations in large datasets

Description

  • Unsupervised technique that identifies frequent patterns, correlations
  • Among attributes in large transactional datasets
  • Generates rules in form "If {antecedent} then {consequent}"
  • Example: {bread} → {butter} based on co-occurrence

Key Metrics

  • Support: Frequency of itemset in dataset
  • e.g., percentage of transactions containing both bread and butter
  • Confidence: Probability consequent occurs given antecedent
  • e.g., likelihood of buying butter if bread is bought
  • Lift: Measures association strength vs. random occurrence
  • lift > 1 indicates positive association
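
Support, confidence, and lift for a single rule can be computed by counting transactions. The baskets are invented, and the sketch assumes the antecedent and consequent each appear at least once.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= t for t in transactions)            # contain antecedent
    count_c = sum(c <= t for t in transactions)            # contain consequent
    count_both = sum((a | c) <= t for t in transactions)   # contain both
    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift slightly above 1 here means bread buyers are a little more likely than average to buy butter. Algorithms such as Apriori automate this counting across all frequent itemsets.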

Process & Applications

  • Algorithms: Apriori (iterative generation), FP-Growth (tree structures)
  • Process: Scan for frequent itemsets, generate rules, prune weak ones
  • Applications: Market basket analysis, web usage mining, bioinformatics
  • Example: Retail suggesting products based on purchase patterns

A4.3.6 Describe reinforcement learning decision-making (AO2) HL

A4.3.6_1 Cumulative reward, agent-environment interaction (actions, states, rewards, policies)

Cumulative Reward

  • RL goal is to maximize total reward over time
  • Rewards are numerical values guiding agent toward desirable outcomes
  • Example: Game-playing AI earns points for winning moves

Agent-Environment Interaction

  • Agent: Decision-maker (e.g., robot, game AI)
  • Environment: External system providing states and rewards
  • Actions: Choices agent makes (e.g., move left, buy stock)
  • States: Environment's current situation (e.g., position in maze)
  • Rewards: Feedback based on actions (e.g., +10 for goal, -1 for wrong move)
  • Policies: Strategies defining actions for each state
  • Example: Self-driving car observes road conditions, chooses to brake/accelerate
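
The interaction loop can be sketched with a toy environment. Everything here is an assumption for illustration: a corridor with positions 0 to 4, a goal at position 4, a reward of +10 for reaching it and -1 for every other step, and a fixed policy rather than a learned one.

```python
def step(state, action, goal=4):
    """Environment: apply a move of -1 or +1 and return (next_state, reward, done)."""
    next_state = max(0, min(goal, state + action))
    if next_state == goal:
        return next_state, 10, True
    return next_state, -1, False

def run_episode(policy, start=0, max_steps=20):
    """Agent follows the policy and accumulates reward until done or out of steps."""
    state, total = start, 0
    for _ in range(max_steps):
        action = policy(state)                      # policy maps state to action
        state, reward, done = step(state, action)   # environment responds
        total += reward                             # cumulative reward
        if done:
            break
    return total

def always_right(state):
    return +1

def always_left(state):
    return -1

print(run_episode(always_right))
print(run_episode(always_left))
```

A learning algorithm such as Q-learning would adjust the policy over many episodes so that the cumulative reward moves toward the better of these two outcomes.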

A4.3.6_2 Exploration vs exploitation trade-off

Exploration

  • Agent tries new actions to discover effects
  • Purpose: Uncovers optimal strategies
  • Example: Game AI trying new move to find higher score

Exploitation

  • Agent chooses known high-reward actions
  • Purpose: Maximizes immediate rewards
  • Example: AI repeatedly using known winning strategy

Trade-Off & Strategies

  • Balance: Too much exploration wastes time on poor actions and slows convergence; too much exploitation misses better options
  • Epsilon-Greedy: Best action most of time, explore randomly with probability ε
  • Softmax: Assign probabilities based on expected rewards, favoring better actions
  • Example: Recommendation system exploits known preferences but explores new genres
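
Epsilon-greedy selection fits in a few lines. The estimated action values and ε = 0.1 are invented, and a seeded random generator is used only to make the run repeatable.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

rng = random.Random(0)
q_values = [0.2, 0.8, 0.5]   # hypothetical expected rewards for three actions
choices = [epsilon_greedy(q_values, 0.1, rng) for _ in range(1000)]
print({action: choices.count(action) for action in range(3)})
```

Most choices go to action 1, the current best estimate, while the occasional random pick keeps the other actions from being ignored entirely.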

A4.3.7 Describe genetic algorithms applications (AO2) HL

A4.3.7_1 Components: population, fitness, selection, crossover, mutation, evaluation, termination

Population

  • Set of potential solutions (individuals)
  • Represented as data structures (e.g., bit strings)
  • Example: Multiple route configurations for delivery optimization

Fitness

  • Measure of how well solution solves problem
  • Defined by fitness function
  • Example: Total distance traveled in route (lower is better)

Selection

  • Chooses individuals for reproduction based on fitness
  • Example: Selecting shortest routes to create next generation

Crossover

  • Combines parts of two parent solutions
  • Creates offspring with new trait combinations
  • Example: Merging segments of two delivery routes

Mutation

  • Randomly alters parts of solution
  • Maintains diversity, avoids local optima
  • Example: Randomly swapping two stops in a route

Evaluation

  • Assesses fitness of new solutions
  • After crossover and mutation
  • Example: Calculating total distance of new routes

Termination

  • Stops algorithm when condition met
  • e.g., max iterations, satisfactory fitness
  • Example: Ending after 100 generations or when route distance below threshold

A4.3.7_2 Applications: route planning (e.g., travelling salesperson)

Route Planning (Travelling Salesperson Problem)

  • Description: Optimizes paths visiting multiple locations while minimizing distance/cost
  • Process:
    • Population: Multiple possible routes (city visit sequences)
    • Fitness: Total distance or travel time
    • Selection: Choose routes with shorter distances
    • Crossover: Combine parts of two routes without repeating a city
    • Mutation: Randomly swap two cities in route
    • Termination: Stop after a set number of generations or when route is good enough
  • Example: Optimizing delivery truck route for 20 cities, minimizing fuel costs
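
The components above can be combined into a small genetic algorithm for the travelling salesperson problem. This is one possible sketch, not a reference implementation: it assumes truncation selection (keep the fitter half), order crossover, swap mutation, and a fixed number of generations, and the city coordinates are invented.

```python
import random
from math import dist

def route_length(route, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    return sum(dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def order_crossover(p1, p2, rng):
    """Copy a slice from p1, then fill the gaps in the order cities appear in p2."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    fill = [city for city in p2 if city not in kept]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def mutate(route, rng, rate=0.2):
    """With probability rate, swap two randomly chosen stops."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def genetic_tsp(cities, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):                       # termination: fixed count
        population.sort(key=lambda r: route_length(r, cities))    # evaluation
        parents = population[:pop_size // 2]           # selection: fitter half
        children = [mutate(order_crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children                # parents survive unchanged
    return min(population, key=lambda r: route_length(r, cities))

# Hypothetical city coordinates.
cities = [(0, 0), (0, 5), (3, 8), (6, 5), (6, 0), (3, 3), (1, 7), (5, 2)]
best = genetic_tsp(cities)
print(best, round(route_length(best, cities), 2))
```

Because the fitter half always survives, the best route never gets worse between generations. The result is a good route, not a guaranteed optimal one.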

Benefits & Other Applications

  • Benefits: Handles complex problems where exhaustive search impractical
  • Finds near-optimal solutions efficiently
  • Other Applications:
  • Scheduling: Optimizing task schedules (e.g., job-shop)
  • Engineering Design: Evolving designs for performance/cost
  • Machine Learning: Tuning hyperparameters or evolving neural architectures

A4.3.8 Outline ANN structure, function (AO2) HL

A4.3.8_1 ANN for classification, regression, pattern recognition

Artificial Neural Networks (ANNs)

  • Computational models inspired by biological neural networks
  • Used for classification, regression, pattern recognition
  • Classification: Assigns inputs to discrete categories
  • Example: ANN classifying images as "cat" or "dog"
  • Regression: Predicts continuous values
  • Example: ANN predicting stock prices
  • Pattern Recognition: Identifies patterns in complex data
  • Example: ANN recognizing handwritten digits

Function

  • Processes input data through layers of interconnected nodes
  • Learns patterns via weighted connections
  • Produces outputs based on learned patterns
  • Adapts weights during training to minimize prediction errors

A4.3.8_2 Single perceptron: input, weights, bias, activation, output

Single Perceptron

  • Basic unit of ANN, mimicking a neuron
  • Used for simple tasks like binary classification

Components

  • Input: Features fed into perceptron (e.g., pixel values)
  • Weights: Numerical values representing importance (learned during training)
  • Bias: Constant added to weighted sum, allowing activation shift
  • Activation: Function (e.g., sigmoid, ReLU) introducing non-linearity
  • Output: Result after activation, representing prediction
  • Function: output = activation(∑(inputᵢ × weightᵢ) + bias)
  • Example: Perceptron classifying email as spam by weighting word features
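
The perceptron formula above maps directly to code. The word-count features, weights, and bias are invented values, not learned ones, and sigmoid is used as the activation.

```python
from math import exp

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + exp(-z))

def perceptron(inputs, weights, bias, activation=sigmoid):
    """output = activation(sum(input_i * weight_i) + bias)"""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical features: counts of the words "free", "winner", "meeting".
weights = [1.5, 2.0, -1.0]
bias = -2.0

spam_score = perceptron([2, 1, 0], weights, bias)
ham_score = perceptron([0, 0, 3], weights, bias)
print(round(spam_score, 3), round(ham_score, 3))
```

Treating scores above 0.5 as spam turns the output into a binary decision. Training would adjust the weights and bias instead of fixing them by hand.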

A4.3.8_3 MLP: input, hidden, output layers

Multi-Layer Perceptron (MLP)

  • ANN with multiple layers of nodes
  • Capable of solving complex, non-linear problems

Layers

  • Input Layer: Receives raw data (e.g., features like age, income)
  • Hidden Layers: Process inputs through weighted connections
  • Apply activation functions to learn complex patterns
  • Multiple hidden layers enable deep learning
  • Output Layer: Produces final prediction
  • e.g., class label or continuous value

Function

  • Data flows from input to hidden layers
  • Transforming through weights and activations
  • Produces output at final layer
  • Example: MLP predicting house prices with input (square footage, location)
  • Hidden layers learning interactions, output giving price
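
A forward pass through a tiny MLP shows how data flows between layers. The sketch assumes two inputs, one hidden layer of two ReLU neurons, and one linear output neuron; all weights are made up so the result can be checked by hand.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = dense(x, hidden_w, hidden_b, relu)                # hidden layer
    return dense(hidden, out_w, out_b, lambda z: z)[0]         # linear output

hidden_w = [[1.0, 0.5], [-1.0, 2.0]]   # hypothetical weights
hidden_b = [0.0, 1.0]
out_w = [[2.0, 1.0]]
out_b = [0.5]

y = mlp_forward([2.0, 1.0], hidden_w, hidden_b, out_w, out_b)
print(y)
```

The hidden neurons output 2.5 and 1.0, and the output neuron combines them into 6.5. A linear output like this suits regression; classification would typically end with sigmoid or softmax.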

A4.3.9 Describe CNNs for spatial feature learning (AO2) HL

A4.3.9_1 Architecture: input, convolutional, activation, pooling, fully connected, output layers

Input Layer

  • Receives raw data, typically images
  • Pixel values in 2D/3D arrays
  • Example: 28x28 grayscale image from MNIST

Convolutional Layer

  • Applies convolution operations using filters
  • Extracts features like edges, textures
  • Example: 3x3 filter detecting vertical edges

Activation Layer

  • Applies non-linear function to feature maps
  • e.g., ReLU setting negatives to zero
  • Example: ReLU improving convergence in classification

Pooling Layer

  • Reduces spatial dimensions (downsampling)
  • Uses max pooling or average pooling
  • Example: 2x2 max pooling with stride 2 reducing 28x28 to 14x14

Fully Connected Layer

  • Connects all neurons from previous layer
  • Combines features for final predictions
  • Example: Combining edges/shapes to classify as "cat"

Output Layer

  • Produces final prediction
  • e.g., class probabilities for classification
  • Example: Softmax outputting digit probabilities (0-9)
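
The first three processing stages can be sketched on a tiny image. This example assumes no padding, a stride of 1 for the convolution, and non-overlapping 2x2 max pooling; as in most deep learning libraries, the "convolution" slides the kernel without flipping it. The image and kernel are invented.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu_map(feature_map):
    """Set negative values to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# 6x6 image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

features = relu_map(conv2d(image, vertical_edge))
pooled = max_pool(features)
print(features)
print(pooled)
```

The 3x3 filter turns the 6x6 image into a 4x4 feature map that responds only near the edge, and pooling halves each dimension to 2x2 while keeping the strongest responses.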

A4.3.9_2 Impact of layers, kernel size, stride, activation, loss function

Component | Impact | Example
Layers | More layers enable hierarchical feature learning but increase complexity | Deeper CNNs can learn more complex image features
Kernel Size | Smaller kernels capture fine details; larger capture broader patterns | 3x3 for edges; 7x7 for textures in images
Stride | Larger strides reduce output size but may miss details | Stride of 2 halves feature map dimensions
Activation | Non-linear functions improve learning; choice affects convergence | ReLU accelerates training vs. sigmoid
Loss Function | Measures prediction error; guides optimization | Cross-entropy for classification improves accuracy

A4.3.10 Explain model selection, comparison (AO2) HL

A4.3.10_1 Algorithm performance varies by data, problem

Performance Variation

  • Different ML algorithms perform better or worse depending on dataset and problem
  • Data Characteristics: Size, dimensionality, noise, distribution impact suitability
  • Example: Linear regression works for linear relationships but fits strongly non-linear data poorly
  • Problem Type: Classification, regression, clustering require different algorithms
  • Example: Decision trees effective for classification with categorical features

Factors Influencing Performance

  • Data Size: Large datasets favor complex models; small datasets suit simpler ones
  • Example: Small datasets usually suit simpler models; very large datasets can support more complex models
  • Feature Types: Numeric vs. categorical affects algorithm choice
  • Example: K-NN for numeric; Naive Bayes for categorical
  • Complexity: Simple problems need simple models; complex need advanced algorithms
  • Example: House price prediction may use linear regression; object detection needs CNNs

    A4.3.10_2 Model selection based on problem nature, complexity, outcomes

    Selection Factor | Considerations | Example
    Problem Nature | Classification predicts categories; regression predicts continuous values; clustering groups similar data | Fraud detection is classification; house price prediction is regression
    Complexity | Simple problems: less complex models to avoid overfitting; complex problems: advanced models like deep learning | Linear regression for student grades; CNNs for high-res image recognition
    Outcomes | Desired outcomes guide selection: accuracy, interpretability, speed; trade-offs between simpler (faster) and complex (more accurate) models | Decision trees for interpretable medical diagnosis; neural networks for high-accuracy image classification

    A4.3.10_3 Data characteristics impact performance

    Data Size

    • Small datasets: Risk overfitting with complex models
    • Use simpler models like Naive Bayes
    • Large datasets: Enable complex models like neural networks
    • Example: 100 samples usually require a simpler model; millions of samples can support a more complex model

    Data Noise

    • Noisy data: Requires robust algorithms
    • Random forests handle noise well
    • Example: Noisy sales dataset benefits from ensemble model

    Feature Distribution

    • Algorithms assume specific distributions
    • Linear regression assumes linear relationships
    • Example: Non-linear data needs decision trees or neural networks
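
To make the linearity assumption concrete, the sketch below fits a straight line by ordinary least squares to one linear and one curved dataset and compares R-squared. The data are synthetic and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]   # linear relationship
curved_ys = [x ** 2 for x in xs]      # non-linear relationship

linear_r2 = r_squared(xs, linear_ys, *fit_line(xs, linear_ys))
curved_r2 = r_squared(xs, curved_ys, *fit_line(xs, curved_ys))
print(round(linear_r2, 3), round(curved_r2, 3))  # 1.0 0.0
```

The line explains the linear data perfectly but none of the variance in the curved data, which is the situation where a decision tree or neural network would be the better choice.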

    Sparsity

    • Sparse datasets (e.g., text) benefit from specific algorithms
    • SVM or neural networks with dimensionality reduction
    • Example: Text classification with TF-IDF uses SVM
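
A small sketch of why text features are sparse: each document below gets TF-IDF weights only for the few vocabulary terms it contains. This assumes one common variant (tf = count / document length, idf = log(N / document frequency)); real libraries often use smoothed versions, and the documents are invented.

```python
import math


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = {}
    for tokens in tokenized:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    vectors = []
    for tokens in tokenized:
        weights = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            weights[term] = tf * idf
        vectors.append(weights)
    return vectors


docs = ["cheap pills buy now", "meeting agenda for monday", "buy cheap tickets now"]
vectors = tf_idf(docs)
vocabulary = {term for doc in docs for term in doc.split()}
print(len(vocabulary), len(vectors[1]))  # 9 4
```

Each vector stores 4 of the 9 vocabulary terms; with a realistic vocabulary of tens of thousands of words the share of non-zero entries becomes far smaller.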

    Imbalanced Data

    • Imbalanced datasets require special handling
    • Ensemble methods or techniques like SMOTE
    • Example: Random forests with class weighting for fraud detection
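
Class weighting can be illustrated with a common inverse-frequency heuristic, weight = n_samples / (n_classes * class_count). The sketch below is an assumption about how such weights might be computed, with hypothetical fraud labels; it is not tied to a specific library.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 95 legitimate transactions, 5 fraudulent ones.
labels = ["legit"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
print(weights["fraud"])            # 10.0
print(round(weights["legit"], 3))  # 0.526
```

Errors on the rare class then count roughly 19 times as much as errors on the common class during training.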

    Evaluation

    • Use appropriate metrics based on data and problem
    • F1 for imbalanced data, accuracy for balanced
    • Example: Cross-validating imbalanced dataset using F1 score
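
The sketch below shows why accuracy can mislead on imbalanced data while F1 does not. The labels and both sets of predictions are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical fraud labels: 95 legitimate (0) followed by 5 fraudulent (1).
y_true = [0] * 95 + [1] * 5
always_legit = [0] * 100                               # never flags fraud
catches_some = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2  # 2 false alarms, 3 of 5 frauds caught

print(accuracy(y_true, always_legit), f1_score(y_true, always_legit))  # 0.95 0.0
print(accuracy(y_true, catches_some), round(f1_score(y_true, catches_some), 2))  # 0.96 0.6
```

The two models differ by only one percentage point of accuracy, but F1 separates a model that never detects fraud from one that detects most of it.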

    A4.4.1 Discuss machine learning ethical implications (AO3)

    A4.4.1_1 Issues: accountability, fairness, bias, consent, environment, privacy, security, societal impact, transparency

    Accountability

    • Issue: Determining responsibility for ML decisions
    • Especially in critical applications like healthcare
    • Example: Who is accountable if ML misdiagnoses a patient?
    • Consideration: Clear governance frameworks needed

    Fairness

    • Issue: Ensuring models treat all groups equitably
    • Avoiding discrimination based on race, gender, etc.
    • Example: Hiring algorithm favoring male candidates
    • Consideration: Fairness metrics and regular audits

    Bias

    • Issue: Models inherit biases from training data
    • Leading to skewed or unfair predictions
    • Example: Facial recognition with lower accuracy for darker skin
    • Consideration: Diverse datasets and bias mitigation

    Consent

    • Issue: Using personal data without explicit consent
    • Example: Training recommendation system without permission
    • Consideration: Transparent policies and opt-in mechanisms

    Environment

    • Issue: Training large models consumes significant energy
    • Contributing to carbon emissions
    • Example: Training a large language model such as GPT-3 has been estimated to emit hundreds of tonnes of CO2, far more than a typical car emits in a year
    • Consideration: Optimizing algorithms and efficient hardware

    Privacy

    • Issue: Models may expose sensitive data
    • Can also enable re-identification from anonymized data
    • Example: Model trained on medical records leaking patient info
    • Consideration: Differential privacy or anonymization
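
As one illustration of differential privacy, the sketch below adds Laplace noise to a count query, a standard mechanism for this kind of statistic. The patient records, the function names, and the epsilon value are all hypothetical, and a real deployment would need careful privacy budgeting.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of matching records; a count query has sensitivity 1."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


def has_condition(record):
    return record["has_condition"]


# Hypothetical medical records.
patients = [
    {"id": 1, "has_condition": True},
    {"id": 2, "has_condition": False},
    {"id": 3, "has_condition": True},
    {"id": 4, "has_condition": True},
    {"id": 5, "has_condition": False},
]

rng = random.Random(42)  # seeded only so the sketch is repeatable
noisy = private_count(patients, has_condition, epsilon=0.5, rng=rng)
print(noisy)  # a value near the true count of 3, different for each seed
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives more accurate but less private answers.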

    Security

    • Issue: Vulnerable to adversarial attacks
    • Manipulating predictions with altered inputs
    • Example: Altering image to fool self-driving car detection
    • Consideration: Robust model design and security testing
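
An adversarial input can be sketched on a toy linear classifier: each feature is nudged by a small amount in the direction that lowers the score, in the spirit of the fast gradient sign method. The weights, input, and epsilon below are invented, and real attacks target far larger models.

```python
def score(weights, bias, x):
    """Linear classifier score; a positive score means 'stop sign detected'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(value):
    return (value > 0) - (value < 0)


def adversarial_example(weights, x, epsilon):
    """Shift every feature by epsilon against the score's gradient.

    For a linear model the gradient with respect to the input is the weight vector.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input features.
weights = [0.8, -0.5, 0.3]
bias = -0.1
x = [0.5, 0.2, 0.4]

x_adv = adversarial_example(weights, x, epsilon=0.25)
print(score(weights, bias, x) > 0)      # True: original input is detected
print(score(weights, bias, x_adv) > 0)  # False: perturbed input is missed
```

No feature moved by more than 0.25, yet the prediction flipped, which is why robustness testing against bounded perturbations matters.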

    Societal Impact

    • Issue: Can disrupt jobs, exacerbate inequality
    • Can also influence societal behaviors
    • Example: Automation replacing low-skill jobs
    • Consideration: Workforce retraining and equitable access

    Transparency

    • Issue: Complex models are "black boxes"
    • Decisions hard to explain
    • Example: Loan denial without clear reasoning
    • Consideration: Explainable AI techniques (e.g., SHAP values)

    A4.4.1_2 Training data bias challenges

    Challenges

    • Data Representation: Training data may underrepresent certain groups
    • Leading to biased predictions
    • Example: Hiring model trained on male-dominated resumes undervalues female candidates
    • Historical Bias: Data reflecting past discrimination perpetuates bias
    • Example: Predictive policing using biased arrest data unfairly targets communities
    • Data Collection: Non-transparent methods can introduce bias
    • Example: Social media data skewed toward active users may not represent all populations

    Mitigation

    • Use diverse, representative datasets for fair outcomes
    • Apply bias detection tools (e.g., fairness metrics)
    • Use reweighting techniques to balance data
    • Regularly audit models for biased predictions
    • Adjust training data or algorithms accordingly
    • Example: Rebalancing dataset to include equal gender representation in hiring model
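
One simple fairness metric is the demographic parity difference: the gap between the highest and lowest selection rates across groups. The sketch below uses hypothetical hiring decisions and group names; an audit in practice would combine several metrics.

```python
def selection_rate(decisions):
    """Fraction of candidates who received a positive decision (1)."""
    return sum(decisions) / len(decisions)


def demographic_parity_difference(decisions_by_group):
    """Return (gap, rates): the largest gap in selection rate, and each group's rate."""
    rates = {group: selection_rate(d) for group, d in decisions_by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical model outputs (1 = shortlisted, 0 = rejected).
decisions = {
    "group_a": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    "group_b": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
}

gap, rates = demographic_parity_difference(decisions)
print(rates)          # {'group_a': 0.7, 'group_b': 0.3}
print(round(gap, 2))  # 0.4
```

A gap near 0 suggests similar treatment across groups; a large gap such as 0.4 would prompt a closer look at the training data or the model.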

    A4.4.1_3 Ethics in online communication: misinformation, harassment, anonymity, privacy

    Misinformation

    • Issue: ML can amplify false information
    • Through recommendation systems, or by generating misleading content
    • Example: Social media algorithms promoting viral false news
    • Consideration: Content moderation and fact-checking algorithms

    Harassment

    • Issue: ML platforms may fail to detect harassment
    • Enabling toxic behavior
    • Example: Chatbots not filtering abusive language effectively
    • Consideration: NLP models to detect and flag harmful content

    Anonymity

    • Issue: Anonymity enables harmful behavior but over-identification risks privacy
    • Example: Anonymous accounts spreading hate vs. requiring IDs exposing data
    • Consideration: Balance with moderated platforms or pseudonymity

    Privacy

    • Issue: ML systems processing communication data may compromise privacy
    • Through data collection or inference
    • Example: Sentiment analysis inferring personal details from public posts
    • Consideration: Strict privacy policies, anonymization, federated learning

    A4.4.2 Discuss ethical aspects of technology integration (AO3)

    A4.4.2_1 Reassess ethical guidelines as technology advances

    Need for Reassessment

    • Rapid advancements in AI, quantum computing, AR, VR outpace existing frameworks
    • New capabilities introduce unforeseen ethical challenges
    • Example: Rise of generative AI (deepfakes) necessitates new guidelines

    Process

    • Regularly review and update ethical standards
    • Involve stakeholders: developers, policymakers, users
    • Incorporate interdisciplinary perspectives (ethics, law, sociology)
    • Example: IEEE's Ethically Aligned Design framework evolves for AI

    Challenges

    • Balancing innovation with regulation to avoid stifling progress
    • Global coordination as standards vary across cultures/jurisdictions
    • Example: Differing privacy laws (GDPR vs. less strict regulations) complicate universal AI ethics

    A4.4.2_2 Implications of quantum computing, AR, VR, AI on society, rights, privacy, equity

    Technology | Societal Impact | Rights | Privacy | Equity
    Quantum Computing | Accelerates innovation; disrupts encryption | Threatens data security rights | Risks decrypting sensitive data | Limited access for less affluent
    Augmented Reality | Enhances experiences; blurs reality | Raises surveillance concerns | Tracks user environments | High-cost devices exclude some users
    Virtual Reality | Transforms engagement; risks isolation | Data ownership concerns | Tracks biometrics, behavior | Expensive hardware limits access
    Artificial Intelligence | Automates tasks; risks job loss | Fairness in automated decisions | Data leakage in training | Unequal access to benefits

    Key Considerations

    • Quantum Computing: Develop post-quantum cryptography; ensure equitable access
    • AR: Enforce transparent data policies; privacy-by-design principles
    • VR: Implement strict data minimization; user control over data collection
    • AI: Promote fairness through bias audits; transparent decision-making; inclusive practices

    Synthesis: Quantum computing offers transformative potential in medicine, logistics, and AI — but its ability to break current encryption standards poses a serious and time-sensitive threat. The ethical imperative is to accelerate post-quantum cryptography standards before quantum hardware reaches practical scale.

    Synthesis: AI integration brings efficiency and new capabilities but raises urgent questions about accountability, bias, and employment. A balanced position requires both embracing the benefits and establishing clear regulatory frameworks to govern automated decision-making.

    Synthesis: Immersive technologies enhance education, healthcare, and accessibility, but introduce risks around privacy, psychological wellbeing, and the blurring of physical and digital realities. Benefits are real but contingent on responsible design standards.