A4.1.1 Describe machine learning algorithms (AO2)

A4.1.1_1 Algorithms: DL, RL, supervised, TL, UL

Deep Learning (DL)

  • Uses neural networks with multiple layers to model complex patterns
  • Requires significant computational power and data
  • Excels in handling unstructured data (e.g., images, audio)
  • Example: CNNs for image recognition

Reinforcement Learning (RL)

  • Agent learns by interacting with environment
  • Optimizes actions based on rewards or penalties
  • Trial-and-error approach
  • Example: Q-learning for game-playing AI

Supervised Learning

  • Trains models on labeled data to predict outcomes
  • Uses input-output pairs
  • Includes classification (categorical) and regression (continuous)
  • Example: Linear regression for house prices

Transfer Learning (TL)

  • Reuses pre-trained model on new, related task
  • Fine-tunes for specific data
  • Reduces training time and data needs
  • Example: Fine-tuning BERT for sentiment analysis

Unsupervised Learning (UL)

  • Finds patterns in unlabeled data
  • No predefined outputs
  • Includes clustering and dimensionality reduction
  • Example: K-means for customer segmentation

A4.1.1_2 Characteristics of each approach

Deep Learning

  • Strengths: Handles complex, high-dimensional data; excels in image and speech recognition
  • Weaknesses: Computationally intensive; requires large datasets; less interpretable

Reinforcement Learning

  • Strengths: Effective for dynamic environments and sequential tasks
  • Weaknesses: Slow learning; requires well-defined reward functions; sensitive to environment changes

Supervised Learning

  • Strengths: Accurate with sufficient labeled data; straightforward for defined tasks
  • Weaknesses: Relies on quality labeled data; less effective for unstructured data without preprocessing

Transfer Learning

  • Strengths: Leverages existing models; reduces training time and data requirements
  • Weaknesses: Limited by similarity between pre-trained and target tasks; potential overfitting

Unsupervised Learning

  • Strengths: Discovers hidden patterns without labels; flexible for exploratory tasks
  • Weaknesses: Results may be less actionable; harder to evaluate without ground truth

A4.1.1_3 Applications: market basket analysis, medical imaging, NLP, object detection, robotics, sentiment analysis

Market Basket Analysis

  • Algorithm: Unsupervised Learning (association rule mining)
  • Use: Identifies items frequently purchased together
  • Example: Apriori algorithm finding bread and butter often bought together

Medical Imaging

  • Algorithm: Deep Learning (CNNs)
  • Use: Analyzes scans to detect diseases
  • Example: CNN identifying tumors in MRI images

Natural Language Processing

  • Algorithm: Deep Learning, Transfer Learning
  • Use: Processes text for translation or chatbots
  • Example: BERT for sentiment analysis in reviews

Object Detection

  • Algorithm: Deep Learning (YOLO, Faster R-CNN)
  • Use: Identifies and locates objects in images
  • Example: Autonomous vehicles detecting pedestrians

Robotics

  • Algorithm: Reinforcement Learning, Deep Learning
  • Use: Trains robots for navigation or manipulation
  • Example: RL training robotic arm to pick objects

Sentiment Analysis

  • Algorithm: Supervised Learning, Transfer Learning
  • Use: Determines emotional tone in text
  • Example: Classifying social media posts as positive/negative

A4.1.2 Describe machine learning hardware requirements (AO2)

A4.1.2_1 Configurations for processing, storage, scalability

Processing

  • Requires high computational power for training and inference
  • Configurations include CPUs, GPUs, or specialized hardware like TPUs
  • Example: Training deep neural networks requires GPUs with thousands of cores (e.g., Nvidia A100)

Storage

  • Large datasets demand high-capacity, fast-access storage
  • Configurations include SSDs for quick retrieval or HDDs for archival data
  • Example: 1 million high-resolution images may require hundreds of gigabytes to terabytes of SSD storage

Scalability

  • Hardware must scale to handle increasing data sizes
  • Through distributed systems or cloud infrastructure
  • Example: AWS EC2 instances scaling GPU resources for larger ML workloads

A4.1.2_2 Range: laptops to advanced infrastructure

Laptops

  • Suitable for small-scale ML tasks
  • Prototyping or inference with pre-trained models
  • Limited by lower processing power (e.g., 4–8 CPU cores, entry-level GPUs)
  • Example: Running sentiment analysis on laptop with 16 GB RAM and Intel i7

Advanced Infrastructure

  • High-performance systems for large-scale training
  • GPU clusters, supercomputers, or cloud platforms
  • Supports complex models with massive datasets
  • Example: Google Cloud's TPU clusters training large language models

A4.1.2_3 Infrastructure: ASICs, edge devices, FPGAs, GPUs, TPUs, cloud, HPC

ASICs

  • Application-Specific Integrated Circuits
  • Custom chips for specific ML tasks
  • Example: Google's TPU optimized for TensorFlow

Edge Devices

  • Lightweight hardware for ML inference at edge
  • Limited processing, suitable for real-time tasks
  • Example: Raspberry Pi for object detection in smart camera

FPGAs

  • Field-Programmable Gate Arrays
  • Reconfigurable hardware for flexible ML tasks
  • Example: Xilinx FPGAs in custom ML pipelines

GPUs

  • Graphics Processing Units
  • Parallel processing for training and inference
  • Example: Nvidia RTX 3090 for training CNNs

TPUs

  • Tensor Processing Units
  • Google's ASICs optimized for tensor operations
  • Example: Used in Google Cloud for neural network training

Cloud

  • Scalable, on-demand infrastructure
  • AWS, Azure, Google Cloud for ML training
  • Example: AWS SageMaker for distributed training

HPC

  • High-Performance Computing
  • Supercomputers for massive ML tasks
  • Example: Oak Ridge Summit for scientific ML simulations

A4.1.2_4 Application-specific hardware needs

Small-Scale Applications (e.g., simple regression)

  • Hardware Requirements: Laptops or entry-level servers with CPUs or modest GPUs
  • Example: Laptop with Intel i5 and 8 GB RAM for linear regression

Deep Learning (e.g., image recognition, NLP)

  • Hardware Requirements: GPUs, TPUs, or cloud-based GPU clusters for parallel processing
  • Example: Nvidia A100 GPUs for training CNN on millions of images

Real-Time Inference (e.g., autonomous vehicles, IoT)

  • Hardware Requirements: Edge devices or FPGAs for low-latency, low-power processing
  • Example: Jetson Nano for object detection in drones

Big Data Analytics (e.g., market basket analysis)

  • Hardware Requirements: Cloud or HPC clusters with high-capacity storage and distributed processing
  • Example: Apache Spark on AWS EMR for analyzing transaction data

A4.2.1 Describe data cleaning significance (AO2)

A4.2.1_1 Impact on model performance

Significance

  • Data cleaning removes errors, inconsistencies, and irrelevant data
  • Ensures high-quality input for machine learning models
  • Clean data improves accuracy, reduces bias, and prevents misleading predictions

Impact on Performance

  • Accuracy: Dirty data leads to incorrect predictions or overfitting
  • Example: Inconsistent square footage data produces unreliable house price predictions
  • Efficiency: Clean data reduces processing overhead
  • Example: Removing duplicates speeds up training
  • Generalization: Clean data helps models generalize better to new data
  • Example: Correcting mislabeled categories improves model robustness

A4.2.1_2 Techniques: handle outliers, duplicates, incorrect/irrelevant data, transform formats, impute/delete/predict missing data

Handle Outliers

  • Identify and address deviant data points
  • Using statistical methods like z-scores or IQR
  • Example: Removing $1 billion house price in typical $100K-$500K dataset
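The IQR method named above can be sketched as follows. The 1.5 × IQR fence is a common convention rather than something these notes specify, and the prices are made-up values.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical house prices in dollars, with one implausible entry
prices = [120_000, 150_000, 200_000, 250_000, 310_000,
          400_000, 480_000, 1_000_000_000]
outliers = iqr_outliers(prices)
cleaned = [p for p in prices if p not in outliers]
```

Whether a flagged value is removed, capped, or kept depends on the task; the sketch only identifies candidates.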

Handle Duplicates

  • Remove or merge identical records
  • Prevents bias in model training
  • Example: Deleting duplicate customer entries in sales dataset

Incorrect/Irrelevant Data

  • Correct errors or invalid values
  • Remove data irrelevant to the task
  • Example: Fixing invalid date "2025-13-01" or removing "Notes" column

Transform Formats

  • Standardize data formats for consistency
  • Ensure consistent dates, units, or encodings
  • Example: Converting all dates to YYYY-MM-DD format
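A minimal sketch of date standardization, assuming the raw data arrives in one of three formats (the format list is hypothetical). Values that match no format, such as the invalid "2025-13-01", come back as None so they can be corrected or dropped.

```python
from datetime import datetime

# Input formats assumed for illustration; real data may need others
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

raw = ["2025-01-31", "31/01/2025", "January 31, 2025", "2025-13-01"]
standardized = [to_iso_date(r) for r in raw]
```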

Impute Missing Data

  • Fill missing values using mean, median, or mode
  • Or interpolation methods
  • Example: Replacing missing ages with average age
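Mean imputation can be sketched in a few lines. Missing values are represented as None here, which is an assumption about how the data is stored.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]  # hypothetical ages
imputed = impute_mean(ages)
```

Swapping `statistics.mean` for `statistics.median` gives median imputation, which is less affected by outliers.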

Delete Missing Data

  • Remove records with missing values
  • When minimal or non-critical
  • Example: Dropping rows with missing grades in small dataset

Predict Missing Data

  • Use ML models to estimate missing values
  • Based on other features
  • Example: Predicting missing income using regression based on age/occupation

A4.2.1_3 Normalization, standardization as preprocessing

Normalization

  • Scales data to fixed range [0, 1]
  • Ensures features contribute equally to model
  • Formula: (x - min(x)) / (max(x) - min(x))
  • Example: Normalizing house prices $100K-$1M to [0,1]
  • Purpose: Prevents larger scale features from dominating

Standardization

  • Transforms data to have mean of 0, SD of 1
  • Improves convergence for certain algorithms
  • Formula: (x - mean(x)) / std(x)
  • Example: Standardizing test scores for comparison
  • Purpose: Helps scale-sensitive algorithms (e.g., SVM, PCA, gradient descent); does not itself make data normally distributed

Significance

  • Both techniques ensure comparable feature scales
  • Improving model performance and training stability
  • Example: In dataset with age (20-80) and income ($20K-$200K), normalization ensures balanced influence
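The two formulas above can be sketched directly. The age and income values are hypothetical, and the population standard deviation is used here (the sample standard deviation is another common choice).

```python
import statistics

def min_max_normalize(xs):
    """(x - min) / (max - min): maps values onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """(x - mean) / std: result has mean 0 and standard deviation 1."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]

ages = [20, 35, 50, 65, 80]
incomes = [20_000, 65_000, 110_000, 155_000, 200_000]

ages_scaled = min_max_normalize(ages)
incomes_scaled = min_max_normalize(incomes)
ages_z = standardize(ages)
```

After normalization both features lie on the same [0, 1] scale, so income no longer dominates purely because its raw numbers are larger.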

A4.2.2 Describe feature selection role (AO2)

A4.2.2_1 Identify, retain informative attributes

Role of Feature Selection

  • Identifies and retains most relevant attributes in dataset
  • Contributes to accurate predictions in ML models
  • Reduces irrelevant or redundant features to improve performance
  • Enhances model interpretability

Identification Process

  • Evaluates features based on correlation with target variable
  • Or predictive power
  • Example: In house price prediction, selecting square footage and location while discarding house color

Retention Benefits

  • Enhances model accuracy by focusing on strong predictive features
  • Reduces noise from irrelevant features
  • Improves model generalization
  • Example: Retaining Age and Income for credit risk model as they correlate with repayment ability

A4.2.2_2 Strategies: filter, wrapper, embedded methods

Filter Methods

  • Select features based on statistical measures
  • Independent of ML model
  • Uses metrics like correlation, chi-square, mutual information
  • Example: Pearson correlation to select features correlated with house prices
  • Advantage: Computationally efficient
  • Disadvantage: Ignores feature interactions
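A filter method based on Pearson correlation might look like the sketch below. The feature names, data, and the 0.5 cut-off are all illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical dataset
features = {
    "square_footage": [800, 1000, 1200, 1500, 2000],
    "house_color_code": [3, 1, 2, 1, 3],
}
price = [160, 200, 240, 300, 400]  # thousands of dollars

# Score each feature independently of any model, then keep the strong ones
scores = {name: abs(pearson(col, price)) for name, col in features.items()}
selected = [name for name, s in scores.items() if s >= 0.5]
```

Each feature is scored on its own, which is why this approach is fast but cannot see interactions between features.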

Wrapper Methods

  • Evaluate feature subsets by training and testing model
  • Selects subset with best performance
  • Uses algorithms like recursive feature elimination (RFE)
  • Example: RFE with decision tree for customer churn prediction
  • Advantage: Considers feature interactions
  • Disadvantage: Computationally expensive

Embedded Methods

  • Perform feature selection during model training
  • Using model-specific criteria
  • Built into algorithms like Lasso regression or decision trees
  • Example: Lasso regression selecting features by assigning zero weights
  • Advantage: Balances efficiency and relevance
  • Disadvantage: Limited to specific algorithms

A4.2.3 Describe dimensionality reduction importance (AO2)

A4.2.3_1 Address overfitting, complexity, sparsity, distance metrics, visualization, memory

Overfitting

  • High-dimensional data increases risk of learning noise
  • Dimensionality reduction removes irrelevant features
  • Improves model generalization
  • Example: Reducing features prevents overfitting to unique IDs

Complexity

  • High dimensions increase computational complexity
  • Slows training and inference
  • Reduction lowers processing time
  • Example: Simplifying 100-feature dataset speeds up neural network

Sparsity

  • High-dimensional data often has many zero/missing values
  • Complicates analysis
  • Reduction creates denser representations
  • Example: Compressing sparse text data for classification

Distance Metrics

  • In high dimensions, distance becomes less meaningful
  • Reduction ensures reliable calculations
  • Improves algorithms like k-NN
  • Example: Reducing dimensions improves similarity in recommendation systems

Visualization

  • High-dimensional data is hard to visualize
  • Reduction enables 2D/3D representation
  • Aids interpretation
  • Example: Using t-SNE to visualize image data in 2D

Memory

  • High dimensions require significant storage
  • Reduction decreases memory usage
  • Enables efficient processing
  • Example: Compressing 1000 features to 50 reduces memory needs

A4.2.3_2 Reduce variables, preserve relevant data aspects

Reduction Process

  • Eliminates or combines features to create lower-dimensional representation
  • Retains key information
  • Techniques: PCA, t-SNE, autoencoders
  • Example: PCA transforms 50 features into 5 principal components capturing most variance

Preserving Relevant Data

  • Ensures reduced dimensions retain critical patterns
  • Methods prioritize directions or features contributing most to variance or performance
  • Example: In fraud detection, PCA keeps components that capture most of the variation in features such as transaction amount and frequency

Techniques

  • PCA: Linear transformation creating uncorrelated components, maximizing variance
  • t-SNE: Non-linear method for visualization, preserving local structures
  • Autoencoders: Neural networks learning compressed representations
  • Use Case: Reducing sensor readings for real-time anomaly detection in IoT

A4.3.1 Explain linear regression for continuous outcomes (AO2)

A4.3.1_1 Predictor-response variable relationship

Description

  • Models relationship between predictors and response
  • Assumes linear relationship: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
  • y = response, xᵢ = predictors, βᵢ = coefficients, ε = error

Predictor-Response Relationship

  • Predictors influence response through learned coefficients
  • Example: Square footage has positive relationship with house price
  • Larger houses tend to cost more

Use Case

  • Predicting continuous outcomes like sales, temperature, stock prices
  • Based on input features
  • Example: Estimating student exam score based on study hours and previous grades

A4.3.1_2 Slope, intercept significance

Intercept (β₀)

  • Represents predicted value when all predictors are zero
  • Example: Base price of house with zero square footage (often not practically meaningful)
  • Sets the baseline for predictions

Slope (βᵢ)

  • Indicates change in response for one-unit increase in predictor
  • Holding other predictors constant
  • Example: Slope of 100 for square_footage means each additional sq ft increases price by $100

Significance

  • Intercept sets baseline, slopes quantify impact of each predictor
  • Coefficients learned during training to minimize prediction errors
  • Example: In sales prediction, slope of 50 for advertising_budget suggests $50 sales increase per dollar spent
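For a single predictor, the intercept and slope have a closed-form least-squares solution. The sketch below uses made-up house data chosen so that the slope comes out at 100 dollars per square foot.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

square_footage = [1000, 1500, 2000, 2500]
price = [150_000, 200_000, 250_000, 300_000]

b0, b1 = fit_simple_linear_regression(square_footage, price)
predicted = b0 + b1 * 1800  # predicted price of an 1800 sq ft house
```

Here the intercept (50,000) is the baseline and the slope (100) is the predicted price change per additional square foot.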

A4.3.1_3 Model fit assessment (r²)

r² (Coefficient of Determination)

  • Measures how well model explains variance in response
  • Typically ranges from 0 to 1 (1 = perfect fit, 0 = no explanatory power); can be negative on unseen data if the model predicts worse than the mean
  • Example: r²=0.85 means 85% of price variability explained by predictors

Assessment Process

  • Formula: r² = 1 - (SSE / SST)
  • SSE (Sum of Squared Errors) = sum of squared differences between actual and predicted values
  • SST (Total Sum of Squares) = sum of squared differences between actual values and their mean
  • Higher r² indicates better fit, but overfitting can inflate r²
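The r² formula can be computed directly from actual and predicted values. The numbers below are invented for illustration.

```python
def r_squared(actual, predicted):
    """r^2 = 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in actual)                # total
    return 1 - sse / sst

actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
r2 = r_squared(actual, predicted)  # SSE = 26, SST = 500
```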

Use Case

  • Evaluate model quality
  • Compare models to select best explanatory power
  • Example: Student score model with r²=0.9 explains more of the variance in scores than one with r²=0.6

A4.3.2 Explain classification techniques in supervised learning (AO2)

A4.3.2_1 K-NN, decision trees for categorical outcomes

K-Nearest Neighbors (K-NN)

  • Classifies based on majority class of k closest neighbors
  • Uses distance metrics (e.g., Euclidean)
  • Characteristics: Non-parametric, simple, sensitive to k choice
  • Example: Classifying email as spam based on similar emails
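A minimal K-NN classifier, assuming Euclidean distance and two hypothetical numeric features per email. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label). Returns the majority label
    among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (number of links, number of exclamation marks)
emails = [((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
          ((5, 4), "spam"), ((6, 5), "spam"), ((4, 6), "spam")]

label = knn_predict(emails, (5, 5), k=3)
```

There is no training step: the stored examples are the model, which is why prediction cost grows with dataset size.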

Decision Trees

  • Builds tree-like model with decisions based on feature values
  • Nodes test features, leaves give categorical outcomes
  • Characteristics: Interpretable, handles mixed data, prone to overfitting
  • Example: Predicting disease based on symptoms like fever or blood pressure

A4.3.2_2 K-NN applications: recommendation systems

K-NN in Recommendation Systems

  • Use: Recommends items by finding similar users or items
  • Based on user behavior or item features
  • Mechanism: Treats users/items as points in feature space
  • Recommends items from nearest neighbors
  • Example: Netflix recommending movies based on similar users' viewing history

Advantages & Challenges

  • Advantages: Simple, effective for collaborative filtering
  • Captures local patterns in user preferences
  • Challenges: Scales poorly with large datasets
  • Requires efficient distance calculations

A4.3.2_3 Decision trees applications: medical diagnosis

Decision Trees in Medical Diagnosis

  • Use: Classifies patients into diagnostic categories
  • Based on symptoms or test results
  • Mechanism: Tree where nodes test features (e.g., "Is temperature > 38°C?")
  • Leads to diagnosis at leaf nodes
  • Example: Diagnosing diabetes using blood sugar, age, and BMI
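The node-and-leaf mechanism can be sketched with a hand-written tree. In practice the tests are learned from data; the features, thresholds, and labels below are purely illustrative and are not clinical guidance.

```python
# Each internal node tests one feature against a threshold; leaves are labels.
tree = {
    "feature": "temperature_c", "threshold": 38.0,
    "yes": {"feature": "cough", "threshold": 0.5,
            "yes": "flu-like illness", "no": "fever, other cause"},
    "no": "no fever",
}

def classify(node, patient):
    """Walk from the root to a leaf, taking 'yes' when value > threshold."""
    while isinstance(node, dict):
        branch = "yes" if patient[node["feature"]] > node["threshold"] else "no"
        node = node[branch]
    return node

result = classify(tree, {"temperature_c": 39.2, "cough": 1})
```

Because every prediction is a readable path of yes/no questions, the model can be drawn as a flowchart.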

Advantages & Challenges

  • Advantages: Easy to interpret, visualizable as flowchart
  • Handles mixed data types effectively
  • Challenges: Overfitting if tree is too deep
  • Requires pruning or ensemble methods for robustness

A4.3.3 Explain hyperparameter tuning in supervised learning (AO2)

A4.3.3_1 Metrics: accuracy, precision, recall, F1 score

Accuracy

  • Proportion of correct predictions out of all predictions
  • Formula: (TP+TN)/(TP+TN+FP+FN)
  • Use: Overall model performance; best for balanced datasets
  • Example: 95% accuracy spam classifier correctly identifies 95/100 emails

Precision

  • Proportion of true positives out of all positive predictions
  • Formula: TP/(TP+FP)
  • Use: Important when false positives are costly
  • Example: High precision cancer model minimizes false positive diagnoses

Recall

  • Proportion of true positives identified out of all actual positives
  • Formula: TP/(TP+FN)
  • Use: Critical when missing positives is costly
  • Example: High recall fraud model catches most fraudulent transactions

F1 Score

  • Harmonic mean of precision and recall
  • Formula: 2 × (Precision × Recall)/(Precision + Recall)
  • Use: For imbalanced datasets where both FP and FN matter
  • Example: F1=0.85 indicates balanced performance in sentiment analysis
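The four formulas above can be computed from confusion-matrix counts. The counts below are hypothetical results for a spam filter evaluated on 100 emails.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

The sketch assumes at least one positive prediction and one actual positive; otherwise precision or recall would divide by zero.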

A4.3.3_2 Tuning impact on performance

Hyperparameter Tuning

  • Adjusts hyperparameters: settings chosen before training rather than learned from data
  • e.g., learning rate, number of trees in random forest
  • Methods: Grid search, random search, Bayesian optimization

Impact on Performance

  • Improved Accuracy: Proper tuning aligns model complexity with data
  • Example: Tuning learning rate to 0.01 improves neural network convergence
  • Balanced Precision/Recall: Optimizes for specific metrics
  • Example: Increasing k in K-NN may improve recall but reduce precision
  • Reduced Overfitting/Underfitting: Adjusts complexity appropriately
  • Example: Adjusting tree depth prevents overfitting in decision trees

A4.3.3_3 Overfitting, underfitting considerations

Overfitting

  • Model learns training data too well, including noise
  • Poor generalization on new data
  • Tuning Solution: Reduce complexity (e.g., fewer layers, more regularization)
  • Example: Reducing the number of neural network layers helps prevent overfitting on small dataset

Underfitting

  • Model is too simple to capture patterns
  • Poor performance on both training and test data
  • Tuning Solution: Increase complexity (e.g., deeper trees, more layers, more boosting iterations)
  • Example: Increasing iterations in gradient boosting improves performance

Tuning Considerations

  • Use cross-validation to assess hyperparameter performance
  • Balance model complexity with dataset size
  • Smaller datasets require simpler models
  • Example: Grid search for SVM kernel type and penalty parameter (C)
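Grid search with cross-validation can be sketched without any library. This example tunes k for a small K-NN classifier (a swapped-in model, chosen for brevity instead of the SVM above) using leave-one-out cross-validation on a made-up one-feature dataset.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out CV: hold out each point once, predict it from the rest."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

data = [((1.0,), "a"), ((1.2,), "a"), ((1.4,), "a"), ((1.6,), "a"),
        ((3.0,), "b"), ((3.2,), "b"), ((3.4,), "b"), ((3.6,), "b")]

grid = [1, 3, 5, 7]                       # candidate hyperparameter values
results = {k: loo_accuracy(data, k) for k in grid}
best_k = max(results, key=results.get)    # first best value in grid order
```

With k=7 every held-out point is outvoted by the other class, so accuracy collapses: a small demonstration of an over-smoothed, underfitting setting.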

A4.3.4 Describe clustering in unsupervised learning (AO2)

A4.3.4_1 Group data by feature similarities

Description

  • Unsupervised technique that groups data points into clusters
  • Based on similarity in features, without predefined labels
  • Similarity measured using distance metrics (e.g., Euclidean) or density

Process

  • Algorithms identify patterns or structures in data
  • Common algorithms: K-means, hierarchical clustering, DBSCAN
  • Example: Grouping customers by purchasing behavior

Characteristics

  • No prior knowledge of group labels required
  • Suitable for exploratory analysis
  • Clusters formed based on feature proximity
  • Minimizes intra-cluster variance, maximizes inter-cluster differences
  • Example: Retail purchases grouped by item type and frequency
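A compact sketch of K-means on hypothetical customer data. The starting centroids are supplied by hand to keep the result reproducible; real implementations usually choose them randomly or with a seeding heuristic.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; leave it if the cluster is empty
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (purchases per month, average order value)
customers = [(1, 20), (2, 25), (1, 30), (8, 200), (9, 220), (10, 210)]
centroids, clusters = k_means(customers, centroids=[(1, 20), (10, 210)])
```

Because order value has a much larger range than purchase count, it dominates the distance here; scaling the features first would balance them.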

A4.3.4_2 Applications: customer segmentation

Customer Segmentation

  • Use: Divides customers into groups based on shared characteristics
  • To tailor marketing, improve service, or optimize offerings
  • Mechanism: Analyzes features like age, purchase history, browsing behavior
  • Example: Retailer using K-means to group "budget shoppers," "luxury buyers," "occasional buyers"

Benefits

  • Enables targeted marketing (e.g., personalized promotions)
  • Improves customer experience through preference understanding
  • Supports business strategy for inventory planning

Example Application

  • E-commerce platform clusters users by purchase frequency and order value
  • Sends tailored discounts to low-frequency, high-value customers
  • Aims to boost retention and increase sales

A4.3.5 Describe association rule learning (AO2)

A4.3.5_1 Uncover attribute relations in large datasets

Description

  • Unsupervised technique that identifies frequent patterns, correlations
  • Among attributes in large transactional datasets
  • Generates rules in form "If {antecedent} then {consequent}"
  • Example: {bread} → {butter} based on co-occurrence

Key Metrics

  • Support: Frequency of itemset in dataset
  • e.g., percentage of transactions containing both bread and butter
  • Confidence: Probability consequent occurs given antecedent
  • e.g., likelihood of buying butter if bread is bought
  • Lift: Measures association strength vs. random occurrence
  • lift > 1 indicates positive association
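Support, confidence, and lift for the rule {bread} → {butter} can be computed on a toy set of transactions (the baskets are invented).

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
rule_lift = lift({"bread"}, {"butter"})
```

Here the lift is 1.25, above 1, so butter is bought with bread more often than its overall popularity alone would suggest.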

Process & Applications

  • Algorithms: Apriori (iterative generation), FP-Growth (tree structures)
  • Process: Scan for frequent itemsets, generate rules, prune weak ones
  • Applications: Market basket analysis, web usage mining, bioinformatics
  • Example: Retail suggesting products based on purchase patterns

A4.3.6 Describe reinforcement learning decision-making (AO2)

A4.3.6_1 Cumulative reward, agent-environment interaction (actions, states, rewards, policies)

Cumulative Reward

  • RL goal is to maximize total reward over time
  • Rewards are numerical values guiding agent toward desirable outcomes
  • Example: Game-playing AI earns points for winning moves

Agent-Environment Interaction

  • Agent: Decision-maker (e.g., robot, game AI)
  • Environment: External system providing states and rewards
  • Actions: Choices agent makes (e.g., move left, buy stock)
  • States: Environment's current situation (e.g., position in maze)
  • Rewards: Feedback based on actions (e.g., +10 for goal, -1 for wrong move)
  • Policies: Strategies defining actions for each state
  • Example: Self-driving car observes road conditions, chooses to brake/accelerate
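The interaction loop can be sketched with a toy environment. The corridor, its reward values (+10 at the goal, -1 otherwise), and the fixed policy are all hypothetical; a learning agent would update its policy from the rewards instead of keeping it fixed.

```python
def step(state, action):
    """Hypothetical 1-D corridor with cells 0..4; the goal is cell 4.
    Returns (next_state, reward, done)."""
    move = 1 if action == "right" else -1
    next_state = max(0, min(4, state + move))
    reward = 10 if next_state == 4 else -1
    return next_state, reward, next_state == 4

# Policy: maps each state to an action (here, always move right)
policy = {s: "right" for s in range(5)}

state, total_reward, done = 0, 0, False
while not done:
    action = policy[state]                       # agent chooses an action
    state, reward, done = step(state, action)    # environment responds
    total_reward += reward                       # cumulative reward
```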

A4.3.6_2 Exploration vs exploitation trade-off

Exploration

  • Agent tries new actions to discover effects
  • Purpose: Uncovers optimal strategies
  • Example: Game AI trying new move to find higher score

Exploitation

  • Agent chooses known high-reward actions
  • Purpose: Maximizes immediate rewards
  • Example: AI repeatedly using known winning strategy

Trade-Off & Strategies

  • Balance: Too much exploration delays learning; too much exploitation misses better options
  • Epsilon-Greedy: Best action most of time, explore randomly with probability ε
  • Softmax: Assign probabilities based on expected rewards, favoring better actions
  • Example: Recommendation system exploits known preferences but explores new genres
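Epsilon-greedy selection fits in a few lines. The estimated action values are made up, and a seeded generator is used only to make the run repeatable.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

q_values = [1.0, 5.0, 2.0]   # hypothetical estimated rewards for three actions
rng = random.Random(0)
choices = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = choices.count(1) / len(choices)
```

With ε = 0.1 the best-known action is chosen roughly nine times in ten, while the remaining picks keep sampling the alternatives.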

A4.3.7 Describe genetic algorithms applications (AO2)

A4.3.7_1 Components: population, fitness, selection, crossover, mutation, evaluation, termination

Population

  • Set of potential solutions (individuals)
  • Represented as data structures (e.g., bit strings)
  • Example: Multiple route configurations for delivery optimization

Fitness

  • Measure of how well solution solves problem
  • Defined by fitness function
  • Example: Total distance traveled in route (lower is better)

Selection

  • Chooses individuals for reproduction based on fitness
  • Example: Selecting shortest routes to create next generation

Crossover

  • Combines parts of two parent solutions
  • Creates offspring with new trait combinations
  • Example: Merging segments of two delivery routes

Mutation

  • Randomly alters parts of solution
  • Maintains diversity, avoids local optima
  • Example: Randomly swapping two stops in a route

Evaluation

  • Assesses fitness of new solutions
  • After crossover and mutation
  • Example: Calculating total distance of new routes

Termination

  • Stops algorithm when condition met
  • e.g., max iterations, satisfactory fitness
  • Example: Ending after 100 generations or when route distance below threshold

A4.3.7_2 Applications: route planning (e.g., travelling salesperson)

Route Planning (Travelling Salesperson Problem)

  • Description: Optimizes paths visiting multiple locations while minimizing distance/cost
  • Process:
  • Population: Multiple possible routes (city visit sequences)
  • Fitness: Total distance or travel time
  • Selection: Choose routes with shorter distances
  • Crossover: Combine parts of two routes
  • Mutation: Randomly swap two cities in route
  • Termination: Stop when optimal/near-optimal route found
  • Example: Optimizing delivery truck route for 20 cities, minimizing fuel costs
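The process above can be sketched as a small genetic algorithm. Design choices such as ordered crossover, keeping the fitter half as parents, the mutation rate, and the six city coordinates are assumptions made for illustration, not the only way to build one.

```python
import math
import random

def route_length(route, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    return sum(math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def crossover(p1, p2, rng):
    """Ordered crossover: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[a:b]
    rest = [c for c in p2 if c not in middle]
    return rest[:a] + middle + rest[a:]

def mutate(route, rng, rate=0.2):
    """With probability rate, swap two stops."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def genetic_tsp(cities, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):                  # termination: fixed count
        population.sort(key=lambda r: route_length(r, cities))  # evaluation
        parents = population[:pop_size // 2]      # selection: fitter half
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=lambda r: route_length(r, cities))

cities = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1)]
best = genetic_tsp(cities)
best_length = route_length(best, cities)
```

Because parents survive into the next generation, the best route found never gets worse, though reaching the true optimum is not guaranteed.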

Benefits & Other Applications

  • Benefits: Handles complex problems where exhaustive search impractical
  • Finds near-optimal solutions efficiently
  • Other Applications:
  • Scheduling: Optimizing task schedules (e.g., job-shop)
  • Engineering Design: Evolving designs for performance/cost
  • Machine Learning: Tuning hyperparameters or evolving neural architectures

A4.3.8 Outline ANN structure, function (AO2)

A4.3.8_1 ANN for classification, regression, pattern recognition

Artificial Neural Networks (ANNs)

  • Computational models inspired by biological neural networks
  • Used for classification, regression, pattern recognition
  • Classification: Assigns inputs to discrete categories
  • Example: ANN classifying images as "cat" or "dog"
  • Regression: Predicts continuous values
  • Example: ANN predicting stock prices
  • Pattern Recognition: Identifies patterns in complex data
  • Example: ANN recognizing handwritten digits

Function

  • Processes input data through layers of interconnected nodes
  • Learns patterns via weighted connections
  • Produces outputs based on learned patterns
  • Adapts weights during training to minimize prediction errors

A4.3.8_2 Single perceptron: input, weights, bias, activation, output

Single Perceptron

  • Basic unit of ANN, mimicking a neuron
  • Used for simple tasks like binary classification

Components

  • Input: Features fed into perceptron (e.g., pixel values)
  • Weights: Numerical values representing importance (learned during training)
  • Bias: Constant added to weighted sum, allowing activation shift
  • Activation: Function (e.g., sigmoid, ReLU) introducing non-linearity
  • Output: Result after activation, representing prediction
  • Function: output = activation(∑(inputᵢ × weightᵢ) + bias)
  • Example: Perceptron classifying email as spam by weighting word features
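The perceptron formula translates almost directly into code. A step activation and hand-picked weights implementing logical AND are used here as assumptions; in practice the weights and bias are learned.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias, activation=step):
    """output = activation(sum(input_i * weight_i) + bias)"""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Hand-picked parameters: only (1, 1) pushes the sum above zero
weights, bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
```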

A4.3.8_3 MLP: input, hidden, output layers

Multi-Layer Perceptron (MLP)

  • ANN with multiple layers of nodes
  • Capable of solving complex, non-linear problems

Layers

  • Input Layer: Receives raw data (e.g., features like age, income)
  • Hidden Layers: Process inputs through weighted connections
  • Apply activation functions to learn complex patterns
  • Multiple hidden layers enable deep learning
  • Output Layer: Produces final prediction
  • e.g., class label or continuous value

Function

  • Data flows from input to hidden layers
  • Transforming through weights and activations
  • Produces output at final layer
  • Example: MLP predicting house prices with input (square footage, location)
  • Hidden layers learning interactions, output giving price
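A forward pass through a tiny MLP shows why a hidden layer matters. The weights below are hand-set (not trained) so that the network computes XOR, a non-linear function that a single perceptron cannot represent.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def mlp_xor(x1, x2):
    # Hidden layer: two ReLU neurons
    hidden = dense([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1],
                   activation=relu)
    # Output layer: one linear neuron combining the hidden activations
    output = dense(hidden, weights=[[1, -2]], biases=[0],
                   activation=lambda z: z)
    return output[0]

xor_table = [mlp_xor(a, b) for a in (0, 1) for b in (0, 1)]
```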

A4.3.9 Describe CNNs for spatial feature learning (AO2)

A4.3.9_1 Architecture: input, convolutional, activation, pooling, fully connected, output layers

Input Layer

  • Receives raw data, typically images
  • Pixel values in 2D/3D arrays
  • Example: 28x28 grayscale image from MNIST

Convolutional Layer

  • Applies convolution operations using filters
  • Extracts features like edges, textures
  • Example: 3x3 filter detecting vertical edges

Activation Layer

  • Applies non-linear function to feature maps
  • e.g., ReLU setting negatives to zero
  • Example: ReLU improving convergence in classification

Pooling Layer

  • Reduces spatial dimensions (downsampling)
  • Uses max pooling or average pooling
  • Example: 2x2 max pooling with stride 2 reducing 28x28 to 14x14

Fully Connected Layer

  • Connects every neuron to all neurons in previous layer
  • Combines features for final predictions
  • Example: Combining edges/shapes to classify as "cat"

Output Layer

  • Produces final prediction
  • e.g., class probabilities for classification
  • Example: Softmax outputting digit probabilities (0-9)
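The convolution, activation, and pooling stages can be sketched on a tiny image. The sketch assumes a square single-channel image, a square kernel, and no padding; the 6x6 image and the edge filter are illustrative.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the products."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]

def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each size x size block."""
    n = len(feature_map) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(n)]
            for r in range(n)]

# 6x6 image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu(conv2d(image, vertical_edge))  # 4x4 feature map
pooled = max_pool(features)                    # 2x2 after pooling
```

The filter responds only where brightness changes from left to right, and pooling halves each spatial dimension while keeping the strongest responses.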

A4.3.9_2 Impact of layers, kernel size, stride, activation, loss function

Layers

  • Impact: More layers enable hierarchical feature learning but increase complexity
  • Example: Deep CNNs like ResNet excel in facial recognition

Kernel Size

  • Impact: Smaller kernels capture fine details; larger capture broader patterns
  • Example: 3x3 for edges; 7x7 for textures in images

Stride

  • Impact: Larger strides reduce output size but may miss details
  • Example: Stride of 2 halves feature map dimensions

Activation

  • Impact: Non-linear functions improve learning; choice affects convergence
  • Example: ReLU accelerates training vs. sigmoid

Loss Function

  • Impact: Measures prediction error; guides optimization
  • Example: Cross-entropy for classification improves accuracy

A4.3.10 Explain model selection, comparison (AO2)

A4.3.10_1 Algorithm performance varies by data, problem

Performance Variation

  • Different ML algorithms perform better or worse depending on dataset and problem
  • Data Characteristics: Size, dimensionality, noise, distribution impact suitability
  • Example: Linear regression works for linear relationships, fails for complex data
  • Problem Type: Classification, regression, clustering require different algorithms
  • Example: Decision trees effective for classification with categorical features

Factors Influencing Performance

  • Data Size: Large datasets favor complex models; small datasets suit simpler ones
  • Example: Small dataset uses logistic regression; millions use deep learning
  • Feature Types: Numeric vs. categorical affects algorithm choice
  • Example: K-NN for numeric; Naive Bayes for categorical
  • Complexity: Simple problems need simple models; complex need advanced algorithms
  • Example: House price prediction may use linear regression; object detection needs CNNs
A4.3.10_2 Model selection based on problem nature, complexity, outcomes

Problem Nature

  • Classification: logistic regression, SVM, random forests
  • Regression: linear regression, gradient boosting, neural networks
  • Clustering: K-means, DBSCAN
  • Example: Fraud detection (classification) uses random forest for robustness to imbalanced data

Complexity

  • Simple problems: less complex models to avoid overfitting
  • Complex problems: advanced models like deep learning
  • Example: Linear regression for student grades; CNNs for high-res image recognition

Outcomes

  • Desired outcomes guide selection: accuracy, interpretability, speed
  • Trade-offs between simpler (faster) and complex (more accurate) models
  • Example: Decision trees for interpretable medical diagnosis; neural networks for high-accuracy image classification

    A4.3.10_3 Data characteristics impact performance

    Data Size

    • Small datasets: Risk overfitting with complex models
    • Use simpler models like Naive Bayes
    • Large datasets: Enable complex models like neural networks
    • Example: 100 samples use logistic regression; millions use deep learning

    Data Noise

    • Noisy data: Requires robust algorithms
    • Random forests handle noise well
    • Example: Noisy sales dataset benefits from ensemble model

    Feature Distribution

    • Algorithms assume specific distributions
    • Linear regression assumes linear relationships
    • Example: Non-linear data needs decision trees or neural networks
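To see why the linearity assumption matters, the following sketch fits a least-squares line to two small synthetic datasets and compares R² (the share of variance explained). The data and the helper names `fit_line` and `r_squared` are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = list(range(-5, 6))
linear_ys = [2 * x + 1 for x in xs]    # truly linear relationship
quadratic_ys = [x * x for x in xs]     # symmetric, non-linear relationship

r2_linear = r_squared(xs, linear_ys, *fit_line(xs, linear_ys))           # 1.0
r2_quadratic = r_squared(xs, quadratic_ys, *fit_line(xs, quadratic_ys))  # 0.0
```

On the quadratic data the best straight line is flat, so it explains none of the variance. A decision tree or neural network could capture this kind of curve.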

    Sparsity

    • Sparse datasets (e.g., text) benefit from specific algorithms
    • SVM or neural networks with dimensionality reduction
    • Example: Text classification with TF-IDF uses SVM
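The sparsity of text features can be seen in a small TF-IDF sketch. TF-IDF has several variants; this one assumes term frequency as count divided by document length and inverse document frequency as log(N / document frequency). The three short documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (only non-zero entries)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical mini-corpus.
docs = ["cheap pills buy now", "meeting agenda for monday", "buy cheap tickets now"]
vectors = tf_idf(docs)
vocabulary = sorted({term for vec in vectors for term in vec})
```

Each document uses only 4 of the 9 vocabulary terms, so most entries in a full feature vector would be zero. Terms unique to one document (such as "pills") get a higher weight than terms shared across documents (such as "cheap").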

    Imbalanced Data

    • Imbalanced datasets require special handling
    • Ensemble methods or techniques like SMOTE
    • Example: Random forests with class weighting for fraud detection
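Class weighting can be sketched with the common inverse-frequency heuristic, weight = n_samples / (n_classes × class count), which is also the formula behind scikit-learn's `class_weight="balanced"` option. The label counts below are a made-up fraud dataset, and `balanced_class_weights` is a name chosen for this sketch.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to how often it appears."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 95 legitimate transactions, 5 fraudulent ones.
labels = ["legit"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
# fraud gets weight 10.0, legit about 0.526
```

With these weights, misclassifying a fraud case costs the model about 19 times as much as misclassifying a legitimate one, which offsets the imbalance during training.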

    Evaluation

    • Use appropriate metrics based on data and problem
    • F1 for imbalanced data, accuracy for balanced
    • Example: Cross-validating imbalanced dataset using F1 score
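The difference between accuracy and F1 on imbalanced data shows up clearly in a small hand-computed example. The labels and the two sets of predictions below are invented for illustration; 1 marks the rare positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A always predicts the majority class.
always_negative = [0] * 100
acc_a = accuracy(y_true, always_negative)   # 0.95
f1_a = f1_score(y_true, always_negative)    # 0.0

# Model B finds 3 of the 5 positives and raises 2 false alarms.
some_detection = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93
acc_b = accuracy(y_true, some_detection)    # 0.96
f1_b = f1_score(y_true, some_detection)     # about 0.6
```

The two models have nearly identical accuracy, but only F1 reveals that Model A never detects a positive case.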

    A4.4.1 Discuss machine learning ethical implications (AO3)

    A4.4.1_1 Issues: accountability, fairness, bias, consent, environment, privacy, security, societal impact, transparency

    Accountability

    • Issue: Determining responsibility for ML decisions
    • Especially in critical applications like healthcare
    • Example: Who is accountable if ML misdiagnoses a patient?
    • Consideration: Clear governance frameworks needed

    Fairness

    • Issue: Ensuring models treat all groups equitably
    • Avoiding discrimination based on race, gender, etc.
    • Example: Hiring algorithm favoring male candidates
    • Consideration: Fairness metrics and regular audits

    Bias

    • Issue: Models inherit biases from training data
    • Leading to skewed or unfair predictions
    • Example: Facial recognition with lower accuracy for darker skin
    • Consideration: Diverse datasets and bias mitigation

    Consent

    • Issue: Using personal data without explicit consent
    • Example: Training recommendation system without permission
    • Consideration: Transparent policies and opt-in mechanisms

    Environment

    • Issue: Training large models consumes significant energy
    • Contributing to carbon emissions
    • Example: Training a large language model such as GPT can emit as much CO2 as several cars do in a year
    • Consideration: Optimizing algorithms and efficient hardware

    Privacy

    • Issue: Models may expose sensitive data
    • Enable re-identification from anonymized data
    • Example: Model trained on medical records leaking patient info
    • Consideration: Differential privacy or anonymization

    Security

    • Issue: Vulnerable to adversarial attacks
    • Manipulating predictions with altered inputs
    • Example: Altering image to fool self-driving car detection
    • Consideration: Robust model design and security testing

    Societal Impact

    • Issue: Can disrupt jobs, exacerbate inequality
    • Influence societal behaviors
    • Example: Automation replacing low-skill jobs
    • Consideration: Workforce retraining and equitable access

    Transparency

    • Issue: Complex models are "black boxes"
    • Decisions hard to explain
    • Example: Loan denial without clear reasoning
    • Consideration: Explainable AI techniques (e.g., SHAP values)

    A4.4.1_2 Training data bias challenges

    Challenges

    • Data Representation: Training data may underrepresent certain groups
    • Leading to biased predictions
    • Example: Hiring model trained on male-dominated resumes undervalues female candidates
    • Historical Bias: Data reflecting past discrimination perpetuates bias
    • Example: Predictive policing using biased arrest data unfairly targets communities
    • Data Collection: Non-transparent methods can introduce bias
    • Example: Social media data skewed toward active users may not represent all populations

    Mitigation

    • Use diverse, representative datasets for fair outcomes
    • Apply bias detection tools (e.g., fairness metrics)
    • Use reweighting techniques to balance data
    • Regularly audit models for biased predictions
    • Adjust training data or algorithms accordingly
    • Example: Rebalancing dataset to include equal gender representation in hiring model
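One simple fairness metric for bias detection is the demographic parity difference: the gap between groups in the rate of positive predictions. The sketch below uses invented hiring predictions (1 = shortlisted). The group labels and function names are assumptions for illustration, and this is only one of several fairness metrics in use.

```python
def selection_rates(groups, predictions):
    """Share of positive predictions within each group."""
    totals, positives = {}, {}
    for group, pred in zip(groups, predictions):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {group: positives[group] / totals[group] for group in totals}

def demographic_parity_difference(groups, predictions):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())

# Hypothetical model output for 6 male and 4 female applicants.
groups = ["M"] * 6 + ["F"] * 4
predictions = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)               # M: 4/6, F: 1/4
gap = demographic_parity_difference(groups, predictions)   # about 0.417
```

A gap near 0 means both groups are shortlisted at similar rates. A large gap like this one is a signal to audit the training data or apply reweighting.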

    A4.4.1_3 Ethics in online communication: misinformation, harassment, anonymity, privacy

    Misinformation

    • Issue: ML can amplify false information
    • Through recommendation systems or by generating misleading content
    • Example: Social media algorithms promoting viral false news
    • Consideration: Content moderation and fact-checking algorithms

    Harassment

    • Issue: ML platforms may fail to detect harassment
    • Enabling toxic behavior
    • Example: Chatbots not filtering abusive language effectively
    • Consideration: NLP models to detect and flag harmful content

    Anonymity

    • Issue: Anonymity enables harmful behavior but over-identification risks privacy
    • Example: Anonymous accounts spreading hate vs. requiring IDs exposing data
    • Consideration: Balance with moderated platforms or pseudonymity

    Privacy

    • Issue: ML systems processing communication data may compromise privacy
    • Through data collection or inference
    • Example: Sentiment analysis inferring personal details from public posts
    • Consideration: Strict privacy policies, anonymization, federated learning

    A4.4.2 Discuss ethical aspects of technology integration (AO3)

    A4.4.2_1 Reassess ethical guidelines as technology advances

    Need for Reassessment

    • Rapid advancements in AI, quantum computing, AR, VR outpace existing frameworks
    • New capabilities introduce unforeseen ethical challenges
    • Example: Rise of generative AI (deepfakes) necessitates new guidelines

    Process

    • Regularly review and update ethical standards
    • Involve stakeholders: developers, policymakers, users
    • Incorporate interdisciplinary perspectives (ethics, law, sociology)
    • Example: IEEE's Ethically Aligned Design framework evolves for AI

    Challenges

    • Balancing innovation with regulation to avoid stifling progress
    • Global coordination as standards vary across cultures/jurisdictions
    • Example: Differing privacy laws (GDPR vs. less strict regulations) complicate universal AI ethics

    A4.4.2_2 Implications of quantum computing, AR, VR, AI on society, rights, privacy, equity

    Technology | Societal Impact | Rights | Privacy | Equity
    Quantum Computing | Accelerates innovation; disrupts encryption | Threatens data security rights | Risks decrypting sensitive data | Limited access for less affluent
    Augmented Reality | Enhances experiences; blurs reality | Raises surveillance concerns | Tracks user environments | High-cost devices exclude some users
    Virtual Reality | Transforms engagement; risks isolation | Data ownership concerns | Tracks biometrics, behavior | Expensive hardware limits access
    Artificial Intelligence | Automates tasks; risks job loss | Fairness in automated decisions | Data leakage in training | Unequal access to benefits

    Key Considerations

    • Quantum Computing: Develop post-quantum cryptography; ensure equitable access
    • AR: Enforce transparent data policies; privacy-by-design principles
    • VR: Implement strict data minimization; user control over data collection
    • AI: Promote fairness through bias audits; transparent decision-making; inclusive practices