A4.1.1 Describe machine learning algorithms (AO2)
A4.1.1_1 Algorithms: DL, RL, supervised, TL, UL
Deep Learning (DL)
- Uses neural networks with multiple layers to model complex patterns
- Requires significant computational power and data
- Excels in handling unstructured data (e.g., images, audio)
- Example: CNNs for image recognition
Reinforcement Learning (RL)
- Agent learns by interacting with environment
- Optimizes actions based on rewards or penalties
- Trial-and-error approach
- Example: Q-learning for game-playing AI
Supervised Learning
- Trains models on labeled data to predict outcomes
- Uses input-output pairs
- Includes classification (categorical) and regression (continuous)
- Example: Linear regression for house prices
Transfer Learning (TL)
- Reuses pre-trained model on new, related task
- Fine-tunes for specific data
- Reduces training time and data needs
- Example: Fine-tuning BERT for sentiment analysis
Unsupervised Learning (UL)
- Finds patterns in unlabeled data
- No predefined outputs
- Includes clustering and dimensionality reduction
- Example: K-means for customer segmentation
A4.1.1_2 Characteristics of each approach
| Algorithm | Strengths | Weaknesses |
|---|---|---|
| Deep Learning | Handles complex, high-dimensional data; excels in image and speech recognition | Computationally intensive; requires large datasets; less interpretable |
| Reinforcement Learning | Effective for dynamic environments and sequential tasks | Slow learning; requires well-defined reward functions; sensitive to environment changes |
| Supervised Learning | Accurate with sufficient labeled data; straightforward for defined tasks | Relies on quality labeled data; less effective for unstructured data without preprocessing |
| Transfer Learning | Leverages existing models; reduces training time and data requirements | Limited by similarity between pre-trained and target tasks; potential overfitting |
| Unsupervised Learning | Discovers hidden patterns without labels; flexible for exploratory tasks | Results may be less actionable; harder to evaluate without ground truth |
A4.1.1_3 Applications: market basket analysis, medical imaging, NLP, object detection, robotics, sentiment analysis
Market Basket Analysis
- Algorithm: Unsupervised Learning (association rule mining)
- Use: Identifies items frequently purchased together
- Example: Apriori algorithm finding bread and butter often bought together
Medical Imaging
- Algorithm: Deep Learning (CNNs)
- Use: Analyzes scans to detect diseases
- Example: CNN identifying tumors in MRI images
Natural Language Processing
- Algorithm: Deep Learning, Transfer Learning
- Use: Processes text for translation or chatbots
- Example: BERT for sentiment analysis in reviews
Object Detection
- Algorithm: Deep Learning (YOLO, Faster R-CNN)
- Use: Identifies and locates objects in images
- Example: Autonomous vehicles detecting pedestrians
Robotics
- Algorithm: Reinforcement Learning, Deep Learning
- Use: Trains robots for navigation or manipulation
- Example: RL training robotic arm to pick objects
Sentiment Analysis
- Algorithm: Supervised Learning, Transfer Learning
- Use: Determines emotional tone in text
- Example: Classifying social media posts as positive/negative
A4.1.2 Describe machine learning hardware requirements (AO2)
A4.1.2_1 Configurations for processing, storage, scalability
Processing
- Requires high computational power for training and inference
- Configurations include CPUs, GPUs, or specialized hardware like TPUs
- Example: Training deep neural networks requires GPUs with thousands of cores (e.g., Nvidia A100)
Storage
- Large datasets demand high-capacity, fast-access storage
- Configurations include SSDs for quick retrieval or HDDs for archival data
- Example: 1 million high-resolution images at roughly 1 MB each need about 1 TB of storage
Scalability
- Hardware must scale to handle increasing data sizes
- Through distributed systems or cloud infrastructure
- Example: AWS EC2 instances scaling GPU resources for larger ML workloads
A4.1.2_2 Range: laptops to advanced infrastructure
Laptops
- Suitable for small-scale ML tasks
- Prototyping or inference with pre-trained models
- Limited by lower processing power (e.g., 4–8 CPU cores, entry-level GPUs)
- Example: Running sentiment analysis on laptop with 16 GB RAM and Intel i7
Advanced Infrastructure
- High-performance systems for large-scale training
- GPU clusters, supercomputers, or cloud platforms
- Supports complex models with massive datasets
- Example: Google Cloud's TPU clusters training large language models
A4.1.2_3 Infrastructure: ASICs, edge devices, FPGAs, GPUs, TPUs, cloud, HPC
ASICs
- Application-Specific Integrated Circuits
- Custom chips for specific ML tasks
- Example: Google's TPU optimized for TensorFlow
Edge Devices
- Lightweight hardware for ML inference at edge
- Limited processing, suitable for real-time tasks
- Example: Raspberry Pi for object detection in smart camera
FPGAs
- Field-Programmable Gate Arrays
- Reconfigurable hardware for flexible ML tasks
- Example: Xilinx FPGAs in custom ML pipelines
GPUs
- Graphics Processing Units
- Parallel processing for training and inference
- Example: Nvidia RTX 3090 for training CNNs
TPUs
- Tensor Processing Units
- Google's ASICs optimized for tensor operations
- Example: Used in Google Cloud for neural network training
Cloud
- Scalable, on-demand infrastructure
- AWS, Azure, Google Cloud for ML training
- Example: AWS SageMaker for distributed training
HPC
- High-Performance Computing
- Supercomputers for massive ML tasks
- Example: Oak Ridge Summit for scientific ML simulations
A4.1.2_4 Application-specific hardware needs
| Application Type | Hardware Requirements | Example |
|---|---|---|
| Small-Scale Applications (e.g., simple regression) | Laptops or entry-level servers with CPUs or modest GPUs | Laptop with Intel i5 and 8 GB RAM for linear regression |
| Deep Learning (e.g., image recognition, NLP) | GPUs, TPUs, or cloud-based GPU clusters for parallel processing | Nvidia A100 GPUs for training CNN on millions of images |
| Real-Time Inference (e.g., autonomous vehicles, IoT) | Edge devices or FPGAs for low-latency, low-power processing | Jetson Nano for object detection in drones |
| Big Data Analytics (e.g., market basket analysis) | Cloud or HPC clusters with high-capacity storage and distributed processing | Apache Spark on AWS EMR for analyzing transaction data |
A4.2.1 Describe data cleaning significance (AO2) HL
A4.2.1_1 Impact on model performance
Significance
- Data cleaning removes errors, inconsistencies, and irrelevant data
- Ensures high-quality input for machine learning models
- Clean data improves accuracy, reduces bias, and prevents misleading predictions
Impact on Performance
- Accuracy: Dirty data leads to incorrect predictions or overfitting
- Example: Inconsistent square footage data produces unreliable house price predictions
- Efficiency: Clean data reduces processing overhead
- Example: Removing duplicates speeds up training
- Generalization: Clean data helps models generalize better to new data
- Example: Correcting mislabeled categories improves model robustness
A4.2.1_2 Techniques: handle outliers, duplicates, incorrect/irrelevant data, transform formats, impute/delete/predict missing data
Handle Outliers
- Identify and address deviant data points
- Using statistical methods like z-scores or IQR
- Example: Removing $1 billion house price in typical $100K-$500K dataset
Handle Duplicates
- Remove or merge identical records
- Prevents bias in model training
- Example: Deleting duplicate customer entries in sales dataset
Incorrect/Irrelevant Data
- Correct errors or invalid values
- Remove data irrelevant to the task
- Example: Fixing invalid date "2025-13-01" or removing "Notes" column
Transform Formats
- Standardize data formats for consistency
- Ensure consistent dates, units, or encodings
- Example: Converting all dates to YYYY-MM-DD format
Impute Missing Data
- Fill missing values using mean/median, mode
- Or interpolation methods
- Example: Replacing missing ages with average age
Delete Missing Data
- Remove records with missing values
- When minimal or non-critical
- Example: Dropping rows with missing grades in small dataset
Predict Missing Data
- Use ML models to estimate missing values
- Based on other features
- Example: Predicting missing income using regression based on age/occupation
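The cleaning techniques above can be sketched in a few lines. This is a minimal illustration rather than a production pipeline: the records, the column layout (customer id, age, house price), and the 1.5 × IQR cut-off are all assumptions made for the example.

```python
from statistics import mean, quantiles

# Hypothetical records: (customer_id, age, house_price); None marks a missing value.
rows = [
    (1, 34, 250_000),
    (2, None, 310_000),
    (2, None, 310_000),       # exact duplicate of the previous record
    (3, 45, 1_000_000_000),   # implausible price (outlier)
    (4, 29, 180_000),
    (5, 52, 420_000),
    (6, 41, 275_000),
    (7, 39, 350_000),
]

def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

def impute_mean_age(records):
    """Replace missing ages with the mean of the known ages."""
    fill = mean(age for _, age, _ in records if age is not None)
    return [(cid, fill if age is None else age, price) for cid, age, price in records]

def remove_price_outliers(records, k=1.5):
    """Drop records whose price lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([price for _, _, price in records], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if low <= r[2] <= high]

cleaned = remove_price_outliers(impute_mean_age(drop_duplicates(rows)))
for record in cleaned:
    print(record)
```

The order of the steps is a design choice: duplicates are removed first so they do not distort the mean or the quartiles.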
A4.2.1_3 Normalization, standardization as preprocessing
Normalization
- Scales data to fixed range [0, 1]
- Ensures features contribute equally to model
- Formula: (x - min(x)) / (max(x) - min(x))
- Example: Normalizing house prices $100K-$1M to [0,1]
- Purpose: Prevents larger scale features from dominating
Standardization
- Transforms data to have a mean of 0 and a standard deviation of 1
- Improves convergence for certain algorithms
- Formula: (x - mean(x)) / std(x)
- Example: Standardizing test scores for comparison
- Purpose: Helps algorithms that are sensitive to feature scale or expect zero-centred inputs (it does not make the data normally distributed)
Significance
- Both techniques ensure comparable feature scales
- Improving model performance and training stability
- Example: In dataset with age (20-80) and income ($20K-$200K), normalization ensures balanced influence
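Both formulas above translate directly into code. The sketch below uses made-up age and income values and the population standard deviation; those choices are assumptions for illustration, and the functions assume the values are not all identical (otherwise the denominator is zero).

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """(x - min) / (max - min): rescales values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """(x - mean) / std: rescales values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

ages = [20, 35, 50, 65, 80]                             # hypothetical ages
incomes = [20_000, 65_000, 110_000, 155_000, 200_000]   # hypothetical incomes

print(min_max_normalize(ages))      # both features now share the same 0..1 scale
print(min_max_normalize(incomes))
print(standardize(ages))
```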
A4.2.2 Describe feature selection role (AO2) HL
A4.2.2_1 Identify, retain informative attributes
Role of Feature Selection
- Identifies and retains most relevant attributes in dataset
- Contributes to accurate predictions in ML models
- Reduces irrelevant or redundant features to improve performance
- Enhances model interpretability
Identification Process
- Evaluates features based on correlation with target variable
- Or predictive power
- Example: In house price prediction, selecting square footage and location while discarding house color
Retention Benefits
- Enhances model accuracy by focusing on strong predictive features
- Reduces noise from irrelevant features
- Improves model generalization
- Example: Retaining Age and Income for credit risk model as they correlate with repayment ability
A4.2.2_2 Strategies: filter, wrapper, embedded methods
Filter Methods
- Select features based on statistical measures
- Independent of ML model
- Uses metrics like correlation, chi-square, mutual information
- Example: Pearson correlation to select features correlated with house prices
- Advantage: Computationally efficient
- Disadvantage: Ignores feature interactions
Wrapper Methods
- Evaluate feature subsets by training and testing model
- Selects subset with best performance
- Uses algorithms like recursive feature elimination (RFE)
- Example: RFE with decision tree for customer churn prediction
- Advantage: Considers feature interactions
- Disadvantage: Computationally expensive
Embedded Methods
- Perform feature selection during model training
- Using model-specific criteria
- Built into algorithms like Lasso regression or decision trees
- Example: Lasso regression selecting features by shrinking the weights of uninformative features to exactly zero
- Advantage: Balances efficiency and relevance
- Disadvantage: Limited to specific algorithms
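As an illustration of a filter method, the sketch below scores each feature by its Pearson correlation with the target and keeps those above a cut-off. The feature values, the feature names, and the 0.5 threshold are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical house data; prices are in thousands.
features = {
    "square_footage": [800, 1200, 1500, 2000, 2500],
    "bedrooms": [1, 2, 3, 3, 4],
    "colour_code": [3, 1, 4, 1, 3],
}
price = [150, 210, 260, 330, 400]

def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores

selected, scores = select_features(features, price)
print(selected)
```

Because each feature is scored on its own, no model is trained, which is why filter methods are cheap but blind to feature interactions.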
A4.2.3 Describe dimensionality reduction importance (AO2) HL
A4.2.3_1 Address overfitting, complexity, sparsity, distance metrics, sample size, visualization, memory
Overfitting
- High-dimensional data increases risk of learning noise
- Dimensionality reduction removes irrelevant features
- Improves model generalization
- Example: Reducing features prevents overfitting to unique IDs
Complexity
- High dimensions increase computational complexity
- Slows training and inference
- Reduction lowers processing time
- Example: Simplifying 100-feature dataset speeds up neural network
Sparsity
- High-dimensional data often has many zero/missing values
- Complicates analysis
- Reduction creates denser representations
- Example: Compressing sparse text data for classification
Distance Metrics
- In high dimensions, distance becomes less meaningful
- Reduction ensures reliable calculations
- Improves algorithms like k-NN
- Example: Reducing dimensions improves similarity in recommendation systems
Visualization
- High-dimensional data is hard to visualize
- Reduction enables 2D/3D representation
- Aids interpretation
- Example: Reducing many features to two or three summary dimensions for plotting
Sample Size
- More dimensions usually require more training examples
- Sparse coverage makes patterns harder to learn reliably
- Reduction can help when available data is limited
- Example: A small dataset with too many features may not represent enough combinations
Memory
- High dimensions require significant storage
- Reduction decreases memory usage
- Enables efficient processing
- Example: Compressing 1000 features to 50 reduces memory needs
A4.2.3_2 Reduce variables, preserve relevant data aspects
Reduction Process
- Eliminates or combines features to create lower-dimensional representation
- Retains key information
- Removes variables that are irrelevant, duplicated, or too noisy to help prediction
- Example: Combining many similar sensor readings into fewer useful inputs for a model
Preserving Relevant Data
- Ensures reduced dimensions retain critical patterns
- Reduction should keep the information most relevant to the prediction or classification task
- Example: In fraud detection, keeping transaction amount, frequency, and location while removing duplicate identifiers
Course Scope
- Focus on why dimensionality reduction is useful, not on detailed statistical methods
- Specific techniques such as PCA and LDA are beyond the scope of this course
- Use Case: Reducing sensor readings for real-time anomaly detection while preserving important patterns
A4.3.1 Explain linear regression for continuous outcomes (AO2) HL
A4.3.1_1 Predictor-response variable relationship
Description
- Models relationship between predictors and response
- Assumes linear relationship: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
- y = response, xᵢ = predictors, βᵢ = coefficients, ε = error
Predictor-Response Relationship
- Predictors influence response through learned coefficients
- Example: Square footage has positive relationship with house price
- Larger houses tend to cost more
Use Case
- Predicting continuous outcomes like sales, temperature, stock prices
- Based on input features
- Example: Estimating student exam score based on study hours and previous grades
A4.3.1_2 Slope, intercept significance
Intercept (β₀)
- Represents predicted value when all predictors are zero
- Example: Base price of house with zero square footage (often not practically meaningful)
- Sets the baseline for predictions
Slope (βᵢ)
- Indicates change in response for one-unit increase in predictor
- Holding other predictors constant
- Example: Slope of 100 for square_footage means each additional sq ft increases the predicted price by $100
Significance
- Intercept sets baseline, slopes quantify impact of each predictor
- Coefficients learned during training to minimize prediction errors
- Example: In sales prediction, slope of 50 for advertising_budget suggests $50 sales increase per dollar spent
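A minimal sketch of how the intercept and slope are learned for a single predictor, using the standard least-squares formulas. The square-footage and price values are invented so that the fitted slope matches the $100-per-square-foot example.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training data lying exactly on a straight line.
square_footage = [1000, 1500, 2000, 2500]
price = [150_000, 200_000, 250_000, 300_000]

intercept, slope = fit_line(square_footage, price)
predicted = intercept + slope * 1800   # prediction for an 1,800 sq ft house
print(intercept, slope, predicted)
```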
A4.3.1_3 Model fit assessment (r²)
r² (Coefficient of Determination)
- Measures how well model explains variance in response
- Typically ranges from 0 to 1 (1 = perfect fit, 0 = no better than always predicting the mean)
- Example: r²=0.85 means 85% of price variability explained by predictors
Assessment Process
- Formula: r² = 1 - (Sum of Squared Errors / Total Sum of Squares)
- SSE = sum of squared differences between actual and predicted values
- SST = sum of squared differences between actual values and their mean
- Higher r² indicates better fit, but overfitting can inflate r²
Use Case
- Evaluate model quality
- Compare models to select best explanatory power
- Example: Student score model with r²=0.9 is more reliable than one with r²=0.6
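The r² formula can be computed directly from actual and predicted values. The exam scores below are hypothetical.

```python
def r_squared(actual, predicted):
    """r^2 = 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # unexplained error
    sst = sum((y - mean_actual) ** 2 for y in actual)            # total variation
    return 1 - sse / sst

actual = [50, 60, 70, 80, 90]       # hypothetical exam scores
predicted = [52, 58, 71, 79, 92]    # hypothetical model predictions

print(round(r_squared(actual, predicted), 3))
```

Here SSE is 14 and SST is 1000, so r² is 1 - 14/1000 = 0.986.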
A4.3.2 Explain classification techniques in supervised learning (AO2) HL
A4.3.2_1 K-NN, decision trees for categorical outcomes
K-Nearest Neighbors (K-NN)
- Classifies based on majority class of k closest neighbors
- Uses distance metrics (e.g., Euclidean)
- Characteristics: Non-parametric, simple, sensitive to k choice
- Example: Classifying email as spam based on similar emails
Decision Trees
- Builds tree-like model with decisions based on feature values
- Nodes test features, leaves give categorical outcomes
- Characteristics: Interpretable, handles mixed data, prone to overfitting
- Example: Predicting disease based on symptoms like fever or blood pressure
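The K-NN description above can be sketched as a majority vote among the closest labelled points. The two email features (number of links, number of exclamation marks) and all data values are assumptions made for the example.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled emails: ((links, exclamation_marks), label)
emails = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
    ((6, 5), "spam"), ((7, 4), "spam"), ((5, 6), "spam"),
]

print(knn_predict(emails, (6, 4)))
print(knn_predict(emails, (1, 1)))
```

Euclidean distance is used here; K-NN has no training step, so all the work happens at prediction time, which is why it scales poorly with large datasets.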
A4.3.2_2 K-NN applications: recommendation systems
K-NN in Recommendation Systems
- Use: Recommends items by finding similar users or items
- Based on user behavior or item features
- Mechanism: Treats users/items as points in feature space
- Recommends items from nearest neighbors
- Example: Netflix recommending movies based on similar users' viewing history
Advantages & Challenges
- Advantages: Simple, effective for collaborative filtering
- Captures local patterns in user preferences
- Challenges: Scales poorly with large datasets
- Requires efficient distance calculations
A4.3.2_3 Decision trees applications: medical diagnosis
Decision Trees in Medical Diagnosis
- Use: Classifies patients into diagnostic categories
- Based on symptoms or test results
- Mechanism: Tree where nodes test features (e.g., "Is temperature > 38°C?")
- Leads to diagnosis at leaf nodes
- Example: Diagnosing diabetes using blood sugar, age, and BMI
Advantages & Challenges
- Advantages: Easy to interpret, visualizable as flowchart
- Handles mixed data types effectively
- Challenges: Overfitting if tree is too deep
- Requires pruning or ensemble methods for robustness
A4.3.3 Explain hyperparameter tuning in supervised learning (AO2) HL
A4.3.3_1 Metrics: accuracy, precision, recall, F1 score
Accuracy
- Proportion of correct predictions out of all predictions
- Formula: (TP+TN)/(TP+TN+FP+FN)
- Use: Overall model performance; best for balanced datasets
- Example: A spam classifier with 95% accuracy labels 95 of every 100 emails correctly
Precision
- Proportion of true positives out of all positive predictions
- Formula: TP/(TP+FP)
- Use: Important when false positives are costly
- Example: High precision cancer model minimizes false positive diagnoses
Recall
- Proportion of true positives identified out of all actual positives
- Formula: TP/(TP+FN)
- Use: Critical when missing positives is costly
- Example: High recall fraud model catches most fraudulent transactions
F1 Score
- Harmonic mean of precision and recall
- Formula: 2 × (Precision × Recall)/(Precision + Recall)
- Use: For imbalanced datasets where both FP and FN matter
- Example: F1=0.85 indicates balanced performance in sentiment analysis
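The four formulas above can be evaluated from the confusion-matrix counts. The counts below are hypothetical, and the sketch assumes there is at least one predicted positive and one actual positive so that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical results for 100 emails: 40 TP, 45 TN, 5 FP, 10 FN.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.85, precision is 40/45, recall is 40/50, and F1 is 80/95, so the single accuracy figure hides the fact that one in five positives is missed.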
A4.3.3_2 Tuning impact on performance
Hyperparameter Tuning
- Adjusts hyperparameters: settings chosen before training rather than learned from the data
- e.g., learning rate, number of trees in random forest, k in K-NN
- Methods: Grid search, random search, Bayesian optimization
Impact on Performance
- Improved Accuracy: Proper tuning aligns model complexity with data
- Example: Tuning learning rate to 0.01 improves neural network convergence
- Balanced Precision/Recall: Optimizes for specific metrics
- Example: Increasing k in K-NN may improve recall but reduce precision
- Reduced Overfitting/Underfitting: Adjusts complexity appropriately
- Example: Adjusting tree depth prevents overfitting in decision trees
A4.3.3_3 Overfitting, underfitting considerations
Overfitting
- Model learns training data too well, including noise
- Poor generalization on new data
- Tuning Solution: Reduce complexity (e.g., fewer layers, stronger regularization)
- Example: Reducing the number of layers in a neural network helps prevent overfitting on a small dataset
Underfitting
- Model is too simple to capture patterns
- Poor performance on both training and test data
- Tuning Solution: Increase complexity (e.g., deeper trees, more layers) or train for longer
- Example: Increasing the number of training iterations may improve performance
Tuning Considerations
- Use cross-validation to assess hyperparameter performance
- Balance model complexity with dataset size
- Smaller datasets require simpler models
- Example: Grid search for SVM kernel type and penalty parameter (C)
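A minimal sketch of grid search combined with cross-validation, here tuning k for a small K-NN classifier instead of an SVM. The dataset, the candidate values of k, and the use of leave-one-out cross-validation are assumptions chosen to keep the example short.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Cross-validation in which each point is held out once as the test case."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

def grid_search(data, candidate_ks):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical 2D points in two groups, plus one deliberately noisy label.
data = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B"),
    ((2, 3), "B"),
]

best_k, scores = grid_search(data, [1, 3, 5])
print(best_k, scores)
```

Scoring on held-out points rather than on the training data is what lets the search detect values of k that overfit or underfit.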
A4.3.4 Describe clustering in unsupervised learning (AO2) HL
A4.3.4_1 Group data by feature similarities
Description
- Unsupervised technique that groups data points into clusters
- Based on similarity in features, without predefined labels
- Similarity measured using distance metrics (e.g., Euclidean) or density
Process
- Algorithms identify patterns or structures in data
- Common approach: choose a similarity measure, group similar records, then interpret each group
- Example: Grouping customers by purchasing behavior
Characteristics
- No prior knowledge of group labels required
- Suitable for exploratory analysis
- Clusters formed based on feature proximity
- Minimizes intra-cluster variance, maximizes inter-cluster differences
- Example: Retail purchases grouped by item type and frequency
A4.3.4_2 Applications: customer segmentation
Customer Segmentation
- Use: Divides customers into groups based on shared characteristics
- To tailor marketing, improve service, or optimize offerings
- Mechanism: Analyzes features like age, purchase history, browsing behavior
- Example: Retailer using K-means to group "budget shoppers," "luxury buyers," "occasional buyers"
Benefits
- Enables targeted marketing (e.g., personalized promotions)
- Improves customer experience through preference understanding
- Supports business strategy for inventory planning
Example Application
- E-commerce platform clusters users by purchase frequency and order value
- Sends tailored discounts to low-frequency, high-value customers
- Aims to boost retention and increase sales
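The segmentation example can be sketched with a bare-bones K-means loop. The customer values (purchases per month, average order value), the starting centroids, and the fixed iteration count are assumptions; a real analysis would usually scale the features first.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (purchases per month, average order value)
customers = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]

centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied; the two groups emerge from feature proximity alone, and naming them (for example "occasional buyers") is left to the analyst.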
A4.3.5 Describe association rule learning (AO2) HL
A4.3.5_1 Uncover attribute relations in large datasets
Description
- Unsupervised technique that identifies frequent patterns, correlations
- Among attributes in large transactional datasets
- Generates rules in form "If {antecedent} then {consequent}"
- Example: {bread} → {butter} based on co-occurrence
Key Metrics
- Support: Frequency of itemset in dataset
- e.g., percentage of transactions containing both bread and butter
- Confidence: Probability consequent occurs given antecedent
- e.g., likelihood of buying butter if bread is bought
- Lift: Measures association strength vs. random occurrence
- lift > 1 indicates positive association
Process & Applications
- Algorithms: Apriori (iterative generation), FP-Growth (tree structures)
- Process: Scan for frequent itemsets, generate rules, prune weak ones
- Applications: Market basket analysis, web usage mining, bioinformatics
- Example: Retail suggesting products based on purchase patterns
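Support, confidence, and lift for the rule {bread} → {butter} can be computed by counting transactions. The five baskets below are hypothetical.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how often the consequent occurs anyway."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"butter"})
print("support:", support(rule[0] | rule[1]))
print("confidence:", confidence(*rule))
print("lift:", lift(*rule))
```

Here bread and butter appear together in 3 of 5 baskets (support 0.6), butter follows bread in 3 of 4 bread baskets (confidence 0.75), and lift is 0.75 / 0.6 = 1.25, which indicates a positive association.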
A4.3.6 Describe reinforcement learning decision-making (AO2) HL
A4.3.6_1 Cumulative reward, agent-environment interaction (actions, states, rewards, policies)
Cumulative Reward
- RL goal is to maximize total reward over time
- Rewards are numerical values guiding agent toward desirable outcomes
- Example: Game-playing AI earns points for winning moves
Agent-Environment Interaction
- Agent: Decision-maker (e.g., robot, game AI)
- Environment: External system providing states and rewards
- Actions: Choices agent makes (e.g., move left, buy stock)
- States: Environment's current situation (e.g., position in maze)
- Rewards: Feedback based on actions (e.g., +10 for goal, -1 for wrong move)
- Policies: Strategies defining actions for each state
- Example: Self-driving car observes road conditions, chooses to brake/accelerate
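The interaction loop can be sketched with a toy environment. The five-cell corridor, the reward values (+10 at the goal, -1 per other move), and the fixed policies are all hypothetical; no learning takes place here, the code only shows how states, actions, rewards, and a policy fit together.

```python
def step(state, action):
    """Hypothetical corridor with positions 0..4 and the goal at position 4."""
    move = 1 if action == "right" else -1
    next_state = max(0, min(4, state + move))
    reward = 10 if next_state == 4 else -1
    done = next_state == 4
    return next_state, reward, done

def run_episode(policy, start=0, max_steps=20):
    """Follow the policy from the start state and return the cumulative reward."""
    state, total_reward = start, 0
    for _ in range(max_steps):
        action = policy[state]                      # policy: state -> action
        state, reward, done = step(state, action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward

always_right = {state: "right" for state in range(5)}
always_left = {state: "left" for state in range(5)}

print(run_episode(always_right))
print(run_episode(always_left))
```

The first policy collects -1, -1, -1, +10 for a total of 7; the second never reaches the goal and accumulates -20, which is the signal a learning agent would use to prefer the first.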
A4.3.6_2 Exploration vs exploitation trade-off
Exploration
- Agent tries new actions to discover effects
- Purpose: Uncovers optimal strategies
- Example: Game AI trying new move to find higher score
Exploitation
- Agent chooses known high-reward actions
- Purpose: Maximizes immediate rewards
- Example: AI repeatedly using known winning strategy
Trade-Off & Strategies
- Balance: Too much exploration delays learning; too much exploitation misses better options
- Epsilon-Greedy: Best action most of time, explore randomly with probability ε
- Softmax: Assign probabilities based on expected rewards, favoring better actions
- Example: Recommendation system exploits known preferences but explores new genres
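A minimal sketch of the epsilon-greedy strategy. The estimated action values and ε = 0.1 are assumptions, and a seeded generator is used only to make the run repeatable.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

rng = random.Random(0)
q_values = [1.0, 5.0, 2.0]   # hypothetical estimated rewards for three actions

choices = [epsilon_greedy(q_values, 0.1, rng) for _ in range(1000)]
print({action: choices.count(action) for action in range(3)})
```

Action 1 is chosen most of the time, but the other actions are still sampled occasionally, so a better option would eventually be noticed if the estimates were wrong.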
A4.3.7 Describe genetic algorithms applications (AO2) HL
A4.3.7_1 Components: population, fitness, selection, crossover, mutation, evaluation, termination
Population
- Set of potential solutions (individuals)
- Represented as data structures (e.g., bit strings)
- Example: Multiple route configurations for delivery optimization
Fitness
- Measure of how well solution solves problem
- Defined by fitness function
- Example: Total distance traveled in route (lower is better)
Selection
- Chooses individuals for reproduction based on fitness
- Example: Selecting shortest routes to create next generation
Crossover
- Combines parts of two parent solutions
- Creates offspring with new trait combinations
- Example: Merging segments of two delivery routes
Mutation
- Randomly alters parts of solution
- Maintains diversity, avoids local optima
- Example: Randomly swapping two stops in a route
Evaluation
- Assesses fitness of new solutions
- After crossover and mutation
- Example: Calculating total distance of new routes
Termination
- Stops algorithm when condition met
- e.g., max iterations, satisfactory fitness
- Example: Ending after 100 generations or when route distance below threshold
A4.3.7_2 Applications: route planning (e.g., travelling salesperson)
Route Planning (Travelling Salesperson Problem)
- Description: Optimizes paths visiting multiple locations while minimizing distance/cost
- Process:
- Population: Multiple possible routes (city visit sequences)
- Fitness: Total distance or travel time
- Selection: Choose routes with shorter distances
- Crossover: Combine parts of two routes
- Mutation: Randomly swap two cities in a route
- Termination: Stop when optimal/near-optimal route found
- Example: Optimizing delivery truck route for 20 cities, minimizing fuel costs
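The process above can be sketched as a small genetic algorithm. The city coordinates, population size, mutation rate, the keep-the-shorter-half selection rule, and the particular crossover are all assumptions; other selection and crossover schemes are equally valid.

```python
import random
from math import dist

def route_length(route, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    return sum(
        dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )

def crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the gaps in parent_b's order."""
    start, end = sorted(rng.sample(range(len(parent_a)), 2))
    middle = parent_a[start:end]
    rest = [city for city in parent_b if city not in middle]
    return rest[:start] + middle + rest[start:]

def mutate(route, rng, rate=0.2):
    """With probability rate, swap two stops in a copy of the route."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def evolve(cities, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):                                   # termination
        population.sort(key=lambda r: route_length(r, cities))     # evaluation
        parents = population[: pop_size // 2]                      # selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=lambda r: route_length(r, cities))

# Hypothetical city coordinates.
cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 8), (7, 0), (3, 3)]
best = evolve(cities)
print(best, round(route_length(best, cities), 2))
```

Because the shorter half of each generation is carried over unchanged, the best route found never gets worse from one generation to the next, though it is not guaranteed to be optimal.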
Benefits & Other Applications
- Benefits: Handles complex problems where exhaustive search impractical
- Finds near-optimal solutions efficiently
- Other Applications:
- Scheduling: Optimizing task schedules (e.g., job-shop)
- Engineering Design: Evolving designs for performance/cost
- Machine Learning: Tuning hyperparameters or evolving neural architectures
A4.3.8 Outline ANN structure, function (AO2) HL
A4.3.8_1 ANN for classification, regression, pattern recognition
Artificial Neural Networks (ANNs)
- Computational models inspired by biological neural networks
- Used for classification, regression, pattern recognition
- Classification: Assigns inputs to discrete categories
- Example: ANN classifying images as "cat" or "dog"
- Regression: Predicts continuous values
- Example: ANN predicting stock prices
- Pattern Recognition: Identifies patterns in complex data
- Example: ANN recognizing handwritten digits
Function
- Processes input data through layers of interconnected nodes
- Learns patterns via weighted connections
- Produces outputs based on learned patterns
- Adapts weights during training to minimize prediction errors
A4.3.8_2 Single perceptron: input, weights, bias, activation, output
Single Perceptron
- Basic unit of ANN, mimicking a neuron
- Used for simple tasks like binary classification
Components
- Input: Features fed into perceptron (e.g., pixel values)
- Weights: Numerical values representing importance (learned during training)
- Bias: Constant added to weighted sum, allowing activation shift
- Activation: Function (e.g., sigmoid, ReLU) introducing non-linearity
- Output: Result after activation, representing prediction
- Function: output = activation(∑(inputᵢ × weightᵢ) + bias)
- Example: Perceptron classifying email as spam by weighting word features
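The perceptron formula above maps directly to code. The three word-count features, the weights, and the bias are invented for illustration; in practice the weights and bias would be learned during training.

```python
from math import exp

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1 / (1 + exp(-z))

def perceptron(inputs, weights, bias, activation=sigmoid):
    """output = activation(sum(input_i * weight_i) + bias)"""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Hypothetical features: counts of the words "free", "winner", "meeting".
weights = [1.5, 2.0, -1.0]
bias = -2.0

spam_score = perceptron([1, 1, 0], weights, bias)   # weighted sum = 1.5
ham_score = perceptron([0, 0, 2], weights, bias)    # weighted sum = -4.0
print(round(spam_score, 3), round(ham_score, 3))
```

An output above 0.5 would be read as "spam" and below 0.5 as "not spam"; the bias shifts where that boundary falls.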
A4.3.8_3 MLP: input, hidden, output layers
Multi-Layer Perceptron (MLP)
- ANN with multiple layers of nodes
- Capable of solving complex, non-linear problems
Layers
- Input Layer: Receives raw data (e.g., features like age, income)
- Hidden Layers: Process inputs through weighted connections
- Apply activation functions to learn complex patterns
- Multiple hidden layers enable deep learning
- Output Layer: Produces final prediction
- e.g., class label or continuous value
Function
- Data flows from input to hidden layers
- Transforming through weights and activations
- Produces output at final layer
- Example: MLP predicting house prices with input (square footage, location)
- Hidden layers learning interactions, output giving price
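A forward pass through a tiny MLP shows how data flows from the input layer through a hidden layer to the output. The layer sizes, weights, and biases are arbitrary illustrative values, not trained ones.

```python
def relu(z):
    """ReLU activation: negative values become zero."""
    return max(0.0, z)

def identity(z):
    """No activation, as commonly used for a regression output."""
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    hidden = dense(inputs, hidden_w, hidden_b, relu)     # hidden layer
    return dense(hidden, out_w, out_b, identity)         # output layer

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
hidden_w = [[1.0, 0.5], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [[2.0, 3.0]]
out_b = [1.0]

print(mlp_forward([2.0, 1.0], hidden_w, hidden_b, out_w, out_b))
```

For the input [2.0, 1.0] the hidden neurons produce 2.5 and 0.0 (the second is clipped by ReLU), so the output is 2.0 × 2.5 + 3.0 × 0.0 + 1.0 = 6.0.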
A4.3.9 Describe CNNs for spatial feature learning (AO2) HL
A4.3.9_1 Architecture: input, convolutional, activation, pooling, fully connected, output layers
Input Layer
- Receives raw data, typically images
- Pixel values in 2D/3D arrays
- Example: 28x28 grayscale image from MNIST
Convolutional Layer
- Applies convolution operations using filters
- Extracts features like edges, textures
- Example: 3x3 filter detecting vertical edges
Activation Layer
- Applies non-linear function to feature maps
- e.g., ReLU setting negatives to zero
- Example: ReLU improving convergence in classification
Pooling Layer
- Reduces spatial dimensions (downsampling)
- Uses max pooling or average pooling
- Example: 2x2 max pooling with stride 2 reducing a 28x28 feature map to 14x14
Fully Connected Layer
- Connects every neuron to all neurons in the previous layer
- Combines features for final predictions
- Example: Combining edges/shapes to classify as "cat"
Output Layer
- Produces final prediction
- e.g., class probabilities for classification
- Example: Softmax outputting digit probabilities (0-9)
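The convolution, activation, and pooling steps can be sketched on a small grid. The 6x6 image and the vertical-edge filter are made-up values, and the "convolution" slides the filter without flipping it, as deep learning libraries commonly do.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu_map(feature_map):
    """Apply ReLU to every value: negatives become zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = relu_map(convolve2d(image, vertical_edge))
pooled = max_pool(features)
print(features)
print(pooled)
```

The 3x3 filter turns the 6x6 image into a 4x4 feature map that responds only where the brightness changes, and 2x2 pooling then halves it to 2x2.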
A4.3.9_2 Impact of layers, kernel size, stride, activation, loss function
| Component | Impact | Example |
|---|---|---|
| Layers | More layers enable hierarchical feature learning but increase complexity | Deeper CNNs can learn more complex image features |
| Kernel Size | Smaller kernels capture fine details; larger capture broader patterns | 3x3 for edges; 7x7 for textures in images |
| Stride | Larger strides reduce output size but may miss details | Stride of 2 halves feature map dimensions |
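| Activation | Introduces non-linearity so the network can learn complex patterns | ReLU sets negative values to zero |
| Loss Function | Defines the prediction error that training minimizes; the choice depends on the task | Cross-entropy for classification |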
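The effect of kernel size and stride on output size follows the standard formula floor((n + 2p - k) / s) + 1, where n is the input width, k the kernel size, s the stride, and p the padding. The padding parameter is not discussed in these notes and is included here as an assumption for completeness.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of the feature map produced by a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, kernel_size=3))                        # 3x3 kernel, stride 1
print(conv_output_size(28, kernel_size=7))                        # larger kernel, smaller output
print(conv_output_size(28, kernel_size=3, stride=2, padding=1))   # stride 2 halves the size
print(conv_output_size(28, kernel_size=2, stride=2))              # 2x2 pooling window
```

These calls give 26, 22, 14, and 14 respectively.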
A4.3.10 Explain model selection, comparison (AO2) HL
A4.3.10_1 Algorithm performance varies by data, problem
Performance Variation
- Different ML algorithms perform better or worse depending on dataset and problem
- Data Characteristics: Size, dimensionality, noise, distribution impact suitability
- Example: Linear regression works for linear relationships but performs poorly when the relationship is non-linear
- Problem Type: Classification, regression, clustering require different algorithms
- Example: Decision trees effective for classification with categorical features
Factors Influencing Performance
- Data Size: Large datasets favor complex models; small datasets suit simpler ones
- Example: Small datasets usually suit simpler models; very large datasets can support more complex models
- Feature Types: Numeric vs. categorical affects algorithm choice
- Example: K-NN for numeric; Naive Bayes for categorical
- Complexity: Simple problems need simple models; complex need advanced algorithms
- Example: House price prediction may use linear regression; object detection needs CNNs
A4.3.10_2 Model selection based on problem nature, complexity, outcomes
| Selection Factor | Considerations | Example |
|---|---|---|
| Problem Nature | Classification predicts categories; regression predicts continuous values; clustering groups similar data | Fraud detection is classification; house price prediction is regression |
| Complexity | Simple problems: less complex models to avoid overfitting; complex problems: advanced models like deep learning | |
| Outcomes | Desired outcomes guide selection: accuracy, interpretability, speed; trade-offs between simpler (faster) and complex (more accurate) models | |
A4.3.10_3 Data characteristics impact performance
Data Size
- Small datasets: Risk overfitting with complex models
- Use simpler models like Naive Bayes
- Large datasets: Enable complex models like neural networks
- Example: 100 samples usually require a simpler model; millions of samples can support a more complex model
Data Noise
- Noisy data: Requires robust algorithms
- Random forests handle noise well
- Example: Noisy sales dataset benefits from ensemble model
Feature Distribution
- Algorithms assume specific distributions
- Linear regression assumes linear relationships
- Example: Non-linear data needs decision trees or neural networks
Sparsity
- Sparse datasets (e.g., text) benefit from specific algorithms
- SVM or neural networks with dimensionality reduction
- Example: Text classification with TF-IDF uses SVM
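To see why text data is sparse, here is a minimal sketch of TF-IDF weighting using only the standard library. It assumes one common textbook variant, term frequency times log(N / document frequency); real libraries often use smoothed variants, so their numbers will differ. The three documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical mini-corpus
docs = [
    "cheap loans apply now",
    "meeting agenda for monday",
    "apply for the monday meeting",
]

vectors = tf_idf(docs)
vocabulary = sorted({term for vec in vectors for term in vec})

print("vocabulary size:", len(vocabulary))
for doc, vec in zip(docs, vectors):
    print(doc, "->", len(vec), "stored terms")
```

Each document stores weights for only its own four or five terms out of a nine-term vocabulary. With a realistic vocabulary of tens of thousands of terms, almost every entry is zero, which is why algorithms that cope well with sparse, high-dimensional input (such as linear SVMs) are a common choice here.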
Imbalanced Data
- Imbalanced datasets require special handling
- Ensemble methods or techniques like SMOTE
- Example: Random forests with class weighting for fraud detection
Evaluation
- Use appropriate metrics based on data and problem
- F1 for imbalanced data, accuracy for balanced
- Example: Cross-validating imbalanced dataset using F1 score
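The choice of metric above can be checked with a short worked sketch. The data is invented: 95 legitimate transactions (label 0) and 5 fraudulent ones (label 1). One hypothetical model always predicts "legitimate"; the other catches 4 of the 5 frauds at the cost of 6 false alarms.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Imbalanced ground truth: 95 legitimate (0), then 5 fraud (1)
y_true = [0] * 95 + [1] * 5

# Model A: always predicts the majority class
always_legit = [0] * 100

# Model B: 6 false alarms, 89 correct negatives, 4 frauds caught, 1 missed
detector = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

for name, preds in [("always_legit", always_legit), ("detector", detector)]:
    print(name, "accuracy:", accuracy(y_true, preds), "F1:", round(f1_score(y_true, preds), 3))
```

Accuracy ranks the useless model higher (0.95 against 0.93), while F1 ranks the detector higher (about 0.533 against 0.0). That gap is the reason F1 is preferred when classes are imbalanced.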
A4.4.1 Discuss machine learning ethical implications (AO3)
A4.4.1_1 Issues: accountability, fairness, bias, consent, environment, privacy, security, societal impact, transparency
Accountability
- Issue: Determining responsibility for ML decisions
- Especially in critical applications like healthcare
- Example: Who is accountable if ML misdiagnoses a patient?
- Consideration: Clear governance frameworks needed
Fairness
- Issue: Ensuring models treat all groups equitably
- Avoiding discrimination based on race, gender, etc.
- Example: Hiring algorithm favoring male candidates
- Consideration: Fairness metrics and regular audits
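One simple fairness metric an audit might compute is the demographic parity difference: the gap between the highest and lowest selection rate across groups. The sketch below is a minimal version with invented hiring decisions; the function names are assumptions for illustration, and real audits usually combine several metrics.

```python
def selection_rates(decisions, groups):
    """Share of positive decisions (1 = selected) within each group."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    return {group: sum(ds) / len(ds) for group, ds in by_group.items()}


def demographic_parity_difference(decisions, groups):
    """Gap between the highest and lowest group selection rate (0 means parity)."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model output: 1 = shortlisted, 0 = rejected
decisions = [1, 1, 1, 0, 1, 0,   1, 0, 0, 0, 0, 1]
groups = ["male"] * 6 + ["female"] * 6

rates = selection_rates(decisions, groups)
gap = demographic_parity_difference(decisions, groups)

print(rates)
print("demographic parity difference:", round(gap, 3))
```

Here 4 of 6 male candidates and 2 of 6 female candidates are shortlisted, so the gap is about 0.333. A regular audit could track this number over time and flag the model when it exceeds an agreed threshold.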
Bias
- Issue: Models inherit biases from training data
- Leading to skewed or unfair predictions
- Example: Facial recognition with lower accuracy for darker skin
- Consideration: Diverse datasets and bias mitigation
Consent
- Issue: Using personal data without explicit consent
- Example: Training recommendation system without permission
- Consideration: Transparent policies and opt-in mechanisms
Environment
- Issue: Training large models consumes significant energy
- Contributing to carbon emissions
- Example: Training a large GPT-style model is estimated to emit at least as much CO2 as several cars do in a year
- Consideration: Optimizing algorithms and efficient hardware
Privacy
- Issue: Models may expose sensitive data
- Enable re-identification from anonymized data
- Example: Model trained on medical records leaking patient info
- Consideration: Differential privacy or anonymization
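Differential privacy can be sketched with the Laplace mechanism: a counting query is answered with random noise added, so that no single record changes the answer distribution very much. The version below is a minimal sketch, not a production implementation. The patient ages, the epsilon values, and the function names are all assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer 'how many records satisfy predicate?' with epsilon-differential privacy."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def over_60(age):
    return age > 60


# Hypothetical medical dataset: patient ages
patient_ages = [34, 71, 52, 66, 45, 80, 29, 63, 58, 77]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for epsilon in (0.1, 1.0):
    answer = private_count(patient_ages, over_60, epsilon, rng)
    print("epsilon =", epsilon, "noisy count =", round(answer, 2))
```

A smaller epsilon means more noise and stronger privacy, at the cost of a less accurate answer. The true count here is 5, and averaged over many queries the noisy answers centre on that value.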
Security
- Issue: Vulnerable to adversarial attacks
- Manipulating predictions with altered inputs
- Example: Altering image to fool self-driving car detection
- Consideration: Robust model design and security testing
Societal Impact
- Issue: Can disrupt jobs, exacerbate inequality
- Influence societal behaviors
- Example: Automation replacing low-skill jobs
- Consideration: Workforce retraining and equitable access
Transparency
- Issue: Complex models are "black boxes"
- Decisions hard to explain
- Example: Loan denial without clear reasoning
- Consideration: Explainable AI techniques (e.g., SHAP values)
A4.4.1_2 Training data bias challenges
Challenges
- Data Representation: Training data may underrepresent certain groups
- Leading to biased predictions
- Example: Hiring model trained on male-dominated resumes undervalues female candidates
- Historical Bias: Data reflecting past discrimination perpetuates bias
- Example: Predictive policing using biased arrest data unfairly targets communities
- Data Collection: Non-transparent methods can introduce bias
- Example: Social media data skewed toward active users may not represent all populations
Mitigation
- Use diverse, representative datasets for fair outcomes
- Apply bias detection tools (e.g., fairness metrics)
- Use reweighting techniques to balance data
- Regularly audit models for biased predictions
- Adjust training data or algorithms accordingly
- Example: Rebalancing dataset to include equal gender representation in hiring model
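One possible reweighting scheme, sketched below under stated assumptions, gives each training example the weight P(group) × P(label) / P(group, label). Combinations that are underrepresented relative to independence (here, hired female candidates) get weights above 1, and overrepresented ones get weights below 1. The dataset and function names are invented for illustration.

```python
from collections import Counter


def reweighting_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    return positive / total


# Hypothetical historical hiring data: 1 = hired, 0 = not hired
groups = ["male"] * 8 + ["female"] * 4
labels = [1, 1, 1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighting_weights(groups, labels)

for group in ("male", "female"):
    raw = sum(y for g, y in zip(groups, labels) if g == group) / groups.count(group)
    balanced = weighted_positive_rate(groups, labels, weights, group)
    print(group, "raw hire rate:", round(raw, 3), "weighted hire rate:", round(balanced, 3))
```

Before weighting, the hire rates are 0.75 for male and 0.25 for female candidates. After weighting, both groups have the same weighted hire rate (7/12, about 0.583), so a model trained with these sample weights no longer sees group membership as predictive of the label.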
A4.4.1_3 Ethics in online communication: misinformation, harassment, anonymity, privacy
Misinformation
- Issue: ML can amplify false information through recommendation systems
- It can also generate misleading content
- Example: Social media algorithms promoting viral false news
- Consideration: Content moderation and fact-checking algorithms
Harassment
- Issue: ML platforms may fail to detect harassment
- Enabling toxic behavior
- Example: Chatbots not filtering abusive language effectively
- Consideration: NLP models to detect and flag harmful content
Anonymity
- Issue: Anonymity enables harmful behavior but over-identification risks privacy
- Example: Anonymous accounts spreading hate vs. requiring IDs exposing data
- Consideration: Balance with moderated platforms or pseudonymity
Privacy
- Issue: ML systems processing communication data may compromise privacy
- Through data collection or inference
- Example: Sentiment analysis inferring personal details from public posts
- Consideration: Strict privacy policies, anonymization, federated learning
A4.4.2 Discuss ethical aspects of technology integration (AO3)
A4.4.2_1 Reassess ethical guidelines as technology advances
Need for Reassessment
- Rapid advancements in AI, quantum computing, AR, VR outpace existing frameworks
- New capabilities introduce unforeseen ethical challenges
- Example: Rise of generative AI (deepfakes) necessitates new guidelines
Process
- Regularly review and update ethical standards
- Involve stakeholders: developers, policymakers, users
- Incorporate interdisciplinary perspectives (ethics, law, sociology)
- Example: IEEE's Ethically Aligned Design framework evolves for AI
Challenges
- Balancing innovation with regulation to avoid stifling progress
- Global coordination as standards vary across cultures/jurisdictions
- Example: Differing privacy laws (GDPR vs. less strict regulations) complicate universal AI ethics
A4.4.2_2 Implications of quantum computing, AR, VR, AI on society, rights, privacy, equity
| Technology | Societal Impact | Rights | Privacy | Equity |
|---|---|---|---|---|
| Quantum Computing | Accelerates innovation; disrupts encryption | Threatens data security rights | Risks decrypting sensitive data | Limited access for less affluent |
| Augmented Reality | Enhances experiences; blurs reality | Raises surveillance concerns | | |
| Virtual Reality | Transforms engagement; risks isolation | | | |
| Artificial Intelligence | Automates tasks; risks job loss | | | |
Key Considerations
- Quantum Computing: Develop post-quantum cryptography; ensure equitable access
- AR: Enforce transparent data policies; privacy-by-design principles
- VR: Implement strict data minimization; user control over data collection
- AI: Promote fairness through bias audits; transparent decision-making; inclusive practices
Synthesis (Quantum Computing): Quantum computing offers transformative potential in medicine, logistics, and AI, but its ability to break current encryption standards poses a serious and time-sensitive threat. The ethical imperative is to accelerate post-quantum cryptography standards before quantum hardware reaches practical scale.
Synthesis (AI): AI integration brings efficiency and new capabilities but raises urgent questions about accountability, bias, and employment. A balanced position requires both embracing the benefits and establishing clear regulatory frameworks to govern automated decision-making.
Synthesis (AR/VR): Immersive technologies enhance education, healthcare, and accessibility, but introduce risks around privacy, psychological wellbeing, and the blurring of physical and digital realities. Benefits are real but contingent on responsible design standards.