A4.1.1 Describe machine learning algorithms (AO2)
A4.1.1_1 Algorithms: DL, RL, supervised, TL, UL
Deep Learning (DL)
- Uses neural networks with multiple layers to model complex patterns
- Requires significant computational power and data
- Excels in handling unstructured data (e.g., images, audio)
- Example: CNNs for image recognition
Reinforcement Learning (RL)
- Agent learns by interacting with environment
- Optimizes actions based on rewards or penalties
- Trial-and-error approach
- Example: Q-learning for game-playing AI
Supervised Learning
- Trains models on labeled data to predict outcomes
- Uses input-output pairs
- Includes classification (categorical) and regression (continuous)
- Example: Linear regression for house prices
Transfer Learning (TL)
- Reuses pre-trained model on new, related task
- Fine-tunes for specific data
- Reduces training time and data needs
- Example: Fine-tuning BERT for sentiment analysis
Unsupervised Learning (UL)
- Finds patterns in unlabeled data
- No predefined outputs
- Includes clustering and dimensionality reduction
- Example: K-means for customer segmentation
A4.1.1_2 Characteristics of each approach
Algorithm | Strengths | Weaknesses |
---|---|---|
Deep Learning | Handles complex, high-dimensional data; excels in image and speech recognition | Computationally intensive; requires large datasets; less interpretable |
Reinforcement Learning | Effective for dynamic environments and sequential tasks | Slow learning; requires well-defined reward functions; sensitive to environment changes |
Supervised Learning | Accurate with sufficient labeled data; straightforward for defined tasks | Relies on quality labeled data; less effective for unstructured data without preprocessing |
Transfer Learning | Leverages existing models; reduces training time and data requirements | Limited by similarity between pre-trained and target tasks; potential overfitting |
Unsupervised Learning | Discovers hidden patterns without labels; flexible for exploratory tasks | Results may be less actionable; harder to evaluate without ground truth |
A4.1.1_3 Applications: market basket analysis, medical imaging, NLP, object detection, robotics, sentiment analysis
Market Basket Analysis
- Algorithm: Unsupervised Learning (association rule mining)
- Use: Identifies items frequently purchased together
- Example: Apriori algorithm finding bread and butter often bought together
Medical Imaging
- Algorithm: Deep Learning (CNNs)
- Use: Analyzes scans to detect diseases
- Example: CNN identifying tumors in MRI images
Natural Language Processing
- Algorithm: Deep Learning, Transfer Learning
- Use: Processes text for translation or chatbots
- Example: BERT for sentiment analysis in reviews
Object Detection
- Algorithm: Deep Learning (YOLO, Faster R-CNN)
- Use: Identifies and locates objects in images
- Example: Autonomous vehicles detecting pedestrians
Robotics
- Algorithm: Reinforcement Learning, Deep Learning
- Use: Trains robots for navigation or manipulation
- Example: RL training robotic arm to pick objects
Sentiment Analysis
- Algorithm: Supervised Learning, Transfer Learning
- Use: Determines emotional tone in text
- Example: Classifying social media posts as positive/negative
A4.1.2 Describe machine learning hardware requirements (AO2)
A4.1.2_1 Configurations for processing, storage, scalability
Processing
- Requires high computational power for training and inference
- Configurations include CPUs, GPUs, or specialized hardware like TPUs
- Example: Training deep neural networks typically uses GPUs with thousands of cores (e.g., Nvidia A100)
Storage
- Large datasets demand high-capacity, fast-access storage
- Configurations include SSDs for quick retrieval or HDDs for archival data
- Example: 1 million high-resolution images can occupy hundreds of gigabytes to several terabytes of SSD storage
Scalability
- Hardware must scale to handle increasing data sizes
- Through distributed systems or cloud infrastructure
- Example: AWS EC2 instances scaling GPU resources for larger ML workloads
A4.1.2_2 Range: laptops to advanced infrastructure
Laptops
- Suitable for small-scale ML tasks
- Prototyping or inference with pre-trained models
- Limited by lower processing power (e.g., 4–8 CPU cores, entry-level GPUs)
- Example: Running sentiment analysis on laptop with 16 GB RAM and Intel i7
Advanced Infrastructure
- High-performance systems for large-scale training
- GPU clusters, supercomputers, or cloud platforms
- Supports complex models with massive datasets
- Example: Google Cloud's TPU clusters training large language models
A4.1.2_3 Infrastructure: ASICs, edge devices, FPGAs, GPUs, TPUs, cloud, HPC
ASICs
- Application-Specific Integrated Circuits
- Custom chips for specific ML tasks
- Example: Google's TPU optimized for TensorFlow
Edge Devices
- Lightweight hardware for ML inference at edge
- Limited processing, suitable for real-time tasks
- Example: Raspberry Pi for object detection in smart camera
FPGAs
- Field-Programmable Gate Arrays
- Reconfigurable hardware for flexible ML tasks
- Example: Xilinx FPGAs in custom ML pipelines
GPUs
- Graphics Processing Units
- Parallel processing for training and inference
- Example: Nvidia RTX 3090 for training CNNs
TPUs
- Tensor Processing Units
- Google's ASICs optimized for tensor operations
- Example: Used in Google Cloud for neural network training
Cloud
- Scalable, on-demand infrastructure
- AWS, Azure, Google Cloud for ML training
- Example: AWS SageMaker for distributed training
HPC
- High-Performance Computing
- Supercomputers for massive ML tasks
- Example: Summit supercomputer at Oak Ridge National Laboratory for scientific ML simulations
A4.1.2_4 Application-specific hardware needs
Application Type | Hardware Requirements | Example |
---|---|---|
Small-Scale Applications (e.g., simple regression) | Laptops or entry-level servers with CPUs or modest GPUs | Laptop with Intel i5 and 8 GB RAM for linear regression |
Deep Learning (e.g., image recognition, NLP) | GPUs, TPUs, or cloud-based GPU clusters for parallel processing | Nvidia A100 GPUs for training CNN on millions of images |
Real-Time Inference (e.g., autonomous vehicles, IoT) | Edge devices or FPGAs for low-latency, low-power processing | Jetson Nano for object detection in drones |
Big Data Analytics (e.g., market basket analysis) | Cloud or HPC clusters with high-capacity storage and distributed processing | Apache Spark on AWS EMR for analyzing transaction data |
A4.2.1 Describe data cleaning significance (AO2)
A4.2.1_1 Impact on model performance
Significance
- Data cleaning removes errors, inconsistencies, and irrelevant data
- Ensures high-quality input for machine learning models
- Clean data improves accuracy, reduces bias, and prevents misleading predictions
Impact on Performance
- Accuracy: Dirty data leads to incorrect predictions or overfitting
- Example: Inconsistent square footage data produces unreliable house price predictions
- Efficiency: Clean data reduces processing overhead
- Example: Removing duplicates speeds up training
- Generalization: Clean data helps models generalize better to new data
- Example: Correcting mislabeled categories improves model robustness
A4.2.1_2 Techniques: handle outliers, duplicates, incorrect/irrelevant data, transform formats, impute/delete/predict missing data
Handle Outliers
- Identify and address deviant data points
- Using statistical methods like z-scores or IQR
- Example: Removing $1 billion house price in typical $100K-$500K dataset
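A minimal sketch of the IQR rule in Python, assuming the conventional 1.5 × IQR fences and the default quartile method of the standard library's `statistics.quantiles`; the prices are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical house prices in dollars; the last one is a data-entry error
prices = [150_000, 220_000, 310_000, 275_000, 480_000, 199_000, 1_000_000_000]
print(iqr_outliers(prices))  # [1000000000]
```

With only seven values, a z-score cutoff of 3 would not flag the error here, because the extreme value inflates the standard deviation it is measured against; the quartile-based fences are far less affected by the outlier they are looking for.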
Handle Duplicates
- Remove or merge identical records
- Prevents bias in model training
- Example: Deleting duplicate customer entries in sales dataset
Incorrect/Irrelevant Data
- Correct errors or invalid values
- Remove data irrelevant to the task
- Example: Fixing invalid date "2025-13-01" or removing "Notes" column
Transform Formats
- Standardize data formats for consistency
- Ensure consistent dates, units, or encodings
- Example: Converting all dates to YYYY-MM-DD format
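A small sketch of format standardization with Python's `datetime`. The list of accepted input formats is an assumption (in particular, `%d/%m/%Y` assumes day-first dates); values that match no format, such as the invalid "2025-13-01", come back as `None` so they can be flagged as incorrect data.

```python
from datetime import datetime

# Hypothetical formats seen in the raw data; "%d/%m/%Y" assumes day-first dates
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    return None  # e.g. an invalid month: flag the record for correction

raw = ["2025-01-31", "31/01/2025", "31.01.2025", "2025-13-01"]
print([to_iso_date(d) for d in raw])  # ['2025-01-31', '2025-01-31', '2025-01-31', None]
```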
Impute Missing Data
- Fill missing values with the mean, median, or mode of the feature
- Or estimate them by interpolation between neighboring values
- Example: Replacing missing ages with average age
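A minimal sketch of mean and median imputation, assuming missing values are stored as `None`; the ages are hypothetical and include one large value to show how the median resists it.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 70, None]
print(impute(ages))                     # [20, 40, 30, 70, 40]
print(impute(ages, strategy="median"))  # [20, 30, 30, 70, 30]
```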
Delete Missing Data
- Remove records with missing values
- When minimal or non-critical
- Example: Dropping rows with missing grades in small dataset
Predict Missing Data
- Use ML models to estimate missing values
- Based on other features
- Example: Predicting missing income using regression based on age/occupation
A4.2.1_3 Normalization, standardization as preprocessing
Normalization
- Scales data to fixed range [0, 1]
- Puts all features on the same scale before training
- Formula: (x - min(x)) / (max(x) - min(x))
- Example: Normalizing house prices $100K-$1M to [0,1]
- Purpose: Prevents larger scale features from dominating
Standardization
- Transforms data to have mean of 0, SD of 1
- Improves convergence for certain algorithms
- Formula: (x - mean(x)) / std(x)
- Example: Standardizing test scores for comparison
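- Purpose: Helps algorithms that expect zero-centered, similarly scaled features (e.g., SVM, PCA, gradient descent); it does not make data normally distributed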
Significance
- Both techniques ensure comparable feature scales
- Improving model performance and training stability
- Example: In dataset with age (20-80) and income ($20K-$200K), normalization ensures balanced influence
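Both formulas can be sketched in a few lines of Python. This version uses the population standard deviation (`statistics.pstdev`), which is also what scikit-learn's `StandardScaler` divides by; the age and income values are hypothetical.

```python
import statistics

def min_max_normalize(xs):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("a constant feature cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale to mean 0 and standard deviation 1 using (x - mean) / std."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)  # population SD
    return [(x - mu) / sigma for x in xs]

ages = [20, 35, 50, 80]
incomes = [20_000, 60_000, 110_000, 200_000]
print(min_max_normalize(ages))     # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(incomes))  # incomes now also span [0, 1]
print(standardize(incomes))        # mean 0, standard deviation 1
```

After either transform, a change of 1.0 means roughly the same amount of "spread" for age and income, so distance-based or gradient-based models no longer treat income as thousands of times more important simply because of its units.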
A4.2.2 Describe feature selection role (AO2)
A4.2.2_1 Identify, retain informative attributes
Role of Feature Selection
- Identifies and retains most relevant attributes in dataset
- Contributes to accurate predictions in ML models
- Reduces irrelevant or redundant features to improve performance
- Enhances model interpretability
Identification Process
- Evaluates features based on correlation with target variable
- Or predictive power
- Example: In house price prediction, selecting square footage and location while discarding house color
Retention Benefits
- Enhances model accuracy by focusing on strong predictive features
- Reduces noise from irrelevant features
- Improves model generalization
- Example: Retaining Age and Income for credit risk model as they correlate with repayment ability
A4.2.2_2 Strategies: filter, wrapper, embedded methods
Filter Methods
- Select features based on statistical measures
- Independent of ML model
- Uses metrics like correlation, chi-square, mutual information
- Example: Pearson correlation to select features correlated with house prices
- Advantage: Computationally efficient
- Disadvantage: Ignores feature interactions
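A dependency-free sketch of a correlation-based filter: compute Pearson's r between each feature and the target, then keep features whose absolute r meets a threshold. The 0.5 threshold, feature names, and data are hypothetical. Because r scores one feature at a time and only measures linear association, the sketch also shows why filter methods ignore interactions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_select(features, target, threshold=0.5):
    """Keep feature names whose |r| with the target is at least threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

features = {
    "square_footage": [800, 1200, 1500, 2000, 2500],
    "bedrooms": [1, 2, 3, 3, 4],
    "house_color_code": [3, 1, 4, 1, 5],
}
prices = [150, 210, 260, 330, 400]  # thousands of dollars
print(filter_select(features, prices))  # ['square_footage', 'bedrooms']
```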
Wrapper Methods
- Evaluate feature subsets by training and testing model
- Selects subset with best performance
- Uses algorithms like recursive feature elimination (RFE)
- Example: RFE with decision tree for customer churn prediction
- Advantage: Considers feature interactions
- Disadvantage: Computationally expensive
Embedded Methods
- Perform feature selection during model training
- Using model-specific criteria
- Built into algorithms like Lasso regression or decision trees
- Example: Lasso regression (L1 penalty) shrinking the coefficients of weak features to exactly zero, removing them from the model
- Advantage: Balances efficiency and relevance
- Disadvantage: Limited to specific algorithms
A4.2.3 Describe dimensionality reduction importance (AO2)
A4.2.3_1 Address overfitting, complexity, sparsity, distance metrics, visualization, memory
Overfitting
- High-dimensional data increases risk of learning noise
- Dimensionality reduction removes irrelevant features
- Improves model generalization
- Example: Reducing features prevents overfitting to unique IDs
Complexity
- High dimensions increase computational complexity
- Slows training and inference
- Reduction lowers processing time
- Example: Simplifying 100-feature dataset speeds up neural network
Sparsity
- High-dimensional data often has many zero/missing values
- Complicates analysis
- Reduction creates denser representations
- Example: Compressing sparse text data for classification
Distance Metrics
- In high dimensions, distance becomes less meaningful
- Reduction makes distance calculations more meaningful
- Improves algorithms like k-NN
- Example: Reducing dimensions improves similarity in recommendation systems
Visualization
- High-dimensional data is hard to visualize
- Reduction enables 2D/3D representation
- Aids interpretation
- Example: Using t-SNE to visualize image data in 2D
Memory
- High dimensions require significant storage
- Reduction decreases memory usage
- Enables efficient processing
- Example: Compressing 1000 features to 50 reduces memory needs
A4.2.3_2 Reduce variables, preserve relevant data aspects
Reduction Process
- Eliminates or combines features to create lower-dimensional representation
- Retains key information
- Techniques: PCA, t-SNE, autoencoders
- Example: PCA transforms 50 features into 5 principal components capturing most variance
Preserving Relevant Data
- Ensures reduced dimensions retain critical patterns
- Methods prioritize features contributing to variance or performance
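- Example: In fraud detection, PCA components capture most of the variance in features such as transaction amount and frequency (each component is a weighted combination of the original features)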
Techniques
- PCA: Linear transformation creating uncorrelated components, maximizing variance
- t-SNE: Non-linear method for visualization, preserving local structures
- Autoencoders: Neural networks learning compressed representations
- Use Case: Reducing sensor readings for real-time anomaly detection in IoT
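A dependency-free sketch of PCA for the two-feature case, where the eigenvalues and leading eigenvector of the 2×2 covariance matrix have a closed form; with more features one would normally call a library such as scikit-learn's `PCA`. The sensor readings are hypothetical. The function returns each point's coordinate on the first principal component (a weighted combination of both features) and the fraction of total variance that component keeps.

```python
import math

def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (projections, explained_variance_ratio)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + spread, mid - spread
    # Eigenvector for the largest eigenvalue (the first principal component)
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    projections = [x * vx + y * vy for x, y in centered]
    return projections, lam1 / (lam1 + lam2)

readings = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
proj, ratio = pca_2d(readings)
print([round(p, 2) for p in proj], round(ratio, 4))  # ratio close to 1
```

Here two strongly correlated readings collapse into one number per point while keeping nearly all of their variance, which is the "reduce variables, preserve relevant aspects" idea in miniature.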
A4.3.1 Explain linear regression for continuous outcomes (AO2)
A4.3.1_1 Predictor-response variable relationship
Description
- Models relationship between predictors and response
- Assumes linear relationship: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε
- y = response, xᵢ = predictors, βᵢ = coefficients, ε = error
Predictor-Response Relationship
- Predictors influence response through learned coefficients
- Example: Square footage has positive relationship with house price
- Larger houses tend to cost more
Use Case
- Predicting continuous outcomes like sales, temperature, stock prices
- Based on input features
- Example: Estimating student exam score based on study hours and previous grades
A4.3.1_2 Slope, intercept significance
Intercept (β₀)
- Represents predicted value when all predictors are zero
- Example: Base price of house with zero square footage (often not practically meaningful)
- Sets the baseline for predictions
Slope (βᵢ)
- Indicates change in response for one-unit increase in predictor
- Holding other predictors constant
- Example: Slope of 100 for square_footage means each additional sq ft increases the predicted price by $100
Significance
- Intercept sets baseline, slopes quantify impact of each predictor
- Coefficients learned during training to minimize prediction errors
- Example: In sales prediction, slope of 50 for advertising_budget suggests $50 sales increase per dollar spent
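For a single predictor, the least-squares coefficients have a closed form: β₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β₀ = ȳ − β₁x̄. A minimal sketch with hypothetical, perfectly linear house data chosen so the slope matches the $100-per-sq-ft example; models with several predictors are usually fitted with a linear-algebra routine such as NumPy's `numpy.linalg.lstsq`.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1*x. Returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx       # slope: change in y per one-unit increase in x
    b0 = my - b1 * mx    # intercept: predicted y when x = 0
    return b0, b1

sqft = [1000, 1500, 2000, 2500]
price = [150_000, 200_000, 250_000, 300_000]
b0, b1 = fit_simple_linear(sqft, price)
print(b0, b1)            # 50000.0 100.0
print(b0 + b1 * 1800)    # 230000.0 (prediction for an 1800 sq ft house)
```

The fitted intercept of $50,000 is the "zero square footage" price, which illustrates why the intercept is often not practically meaningful on its own.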
A4.3.1_3 Model fit assessment (r²)
r² (Coefficient of Determination)
- Measures how well model explains variance in response
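- Ranges from 0 to 1 for a least-squares fit with an intercept on its training data (1 = perfect fit, 0 = no better than predicting the mean); can be negative on new data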
- Example: r²=0.85 means 85% of price variability explained by predictors
Assessment Process
- Formula: r² = 1 - (Sum of Squared Errors / Total Sum of Squares)
- SSE = sum of squared differences between actual and predicted values
- SST = sum of squared differences between actual values and their mean (total variation in the response)
- Higher r² indicates better fit, but overfitting can inflate r²
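The formula r² = 1 − SSE/SST translates directly into code. A minimal sketch; the actual and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1 - sse / sst

actual = [3, 5, 7, 9]
predicted = [2.8, 5.3, 6.9, 9.2]
print(round(r_squared(actual, predicted), 3))  # 0.991
print(r_squared(actual, [6, 6, 6, 6]))         # 0.0 (always predicting the mean)
```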
Use Case
- Evaluate model quality
- Compare models to select best explanatory power
- Example: On the same data, a student score model with r²=0.9 explains more variance than one with r²=0.6 (checking r² on test data guards against overfitting)
A4.3.2 Explain classification techniques in supervised learning (AO2)
A4.3.2_1 K-NN, decision trees for categorical outcomes
K-Nearest Neighbors (K-NN)
- Classifies based on majority class of k closest neighbors
- Uses distance metrics (e.g., Euclidean)
- Characteristics: Non-parametric, simple, sensitive to k choice
- Example: Classifying email as spam based on similar emails
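A minimal K-NN classifier using Euclidean distance (`math.dist`) and a majority vote. The two email features and the training points are hypothetical; choosing an odd k with two classes avoids tied votes.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical features: (number of links, count of the word "free")
emails = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"), ((2, 0), "ham"),
]
print(knn_predict(emails, (7, 5)))  # spam
print(knn_predict(emails, (1, 1)))  # ham
```

There is no training step: all work happens at prediction time, which is why K-NN scales poorly to large datasets.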
Decision Trees
- Builds tree-like model with decisions based on feature values
- Nodes test features, leaves give categorical outcomes
- Characteristics: Interpretable, handles mixed data, prone to overfitting
- Example: Predicting disease based on symptoms like fever or blood pressure
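A tree is built by repeatedly choosing the feature test that best separates the classes. This sketch finds a single threshold (a "decision stump") using Gini impurity, one common splitting criterion; the temperatures and labels are hypothetical, and practical implementations often place the threshold midway between adjacent values rather than at a data point.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Threshold t for the test 'value > t' minimizing weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

temps = [36.8, 37.0, 38.5, 39.1, 37.2, 38.9]
diagnosis = ["healthy", "healthy", "flu", "flu", "healthy", "flu"]
print(best_threshold(temps, diagnosis))  # (37.2, 0.0)
```

A full tree applies the same search recursively to each side of the split, which is also why unlimited depth leads to overfitting.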
A4.3.2_2 K-NN applications: recommendation systems
K-NN in Recommendation Systems
- Use: Recommends items by finding similar users or items
- Based on user behavior or item features
- Mechanism: Treats users/items as points in feature space
- Recommends items from nearest neighbors
- Example: Netflix recommending movies based on similar users' viewing history
Advantages & Challenges
- Advantages: Simple, effective for collaborative filtering
- Captures local patterns in user preferences
- Challenges: Scales poorly with large datasets
- Requires efficient distance calculations
A4.3.2_3 Decision trees applications: medical diagnosis
Decision Trees in Medical Diagnosis
- Use: Classifies patients into diagnostic categories
- Based on symptoms or test results
- Mechanism: Tree where nodes test features (e.g., "Is temperature > 38°C?")
- Leads to diagnosis at leaf nodes
- Example: Diagnosing diabetes using blood sugar, age, and BMI
Advantages & Challenges
- Advantages: Easy to interpret, visualizable as flowchart
- Handles mixed data types effectively
- Challenges: Overfitting if tree is too deep
- Requires pruning or ensemble methods for robustness
A4.3.3 Explain hyperparameter tuning in supervised learning (AO2)
A4.3.3_1 Metrics: accuracy, precision, recall, F1 score
Accuracy
- Proportion of correct predictions out of all predictions
- Formula: (TP+TN)/(TP+TN+FP+FN)
- Use: Overall model performance; best for balanced datasets
- Example: Spam classifier with 95% accuracy correctly classifies 95 of every 100 emails
Precision
- Proportion of true positives out of all positive predictions
- Formula: TP/(TP+FP)
- Use: Important when false positives are costly
- Example: High precision cancer model minimizes false positive diagnoses
Recall
- Proportion of true positives identified out of all actual positives
- Formula: TP/(TP+FN)
- Use: Critical when missing positives is costly
- Example: High recall fraud model catches most fraudulent transactions
F1 Score
- Harmonic mean of precision and recall
- Formula: 2 × (Precision × Recall)/(Precision + Recall)
- Use: For imbalanced datasets where both FP and FN matter
- Example: F1=0.85 indicates balanced performance in sentiment analysis
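All four metrics follow from the confusion-matrix counts. A minimal sketch with hypothetical counts for a fraud model on 1,000 transactions, 50 of them fraudulent; it shows high accuracy hiding a model that misses 40% of fraud.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = classification_metrics(tp=30, tn=940, fp=10, fn=20)
print(m)  # accuracy 0.97, precision 0.75, recall 0.6, F1 about 0.667
```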
A4.3.3_2 Tuning impact on performance
Hyperparameter Tuning
- Adjusts settings (hyperparameters) that are chosen before training rather than learned from data
- e.g., learning rate, number of trees in random forest
- Methods: Grid search, random search, Bayesian optimization
Impact on Performance
- Improved Accuracy: Proper tuning aligns model complexity with data
- Example: Tuning learning rate to 0.01 improves neural network convergence
- Balanced Precision/Recall: Optimizes for specific metrics
- Example: Changing k in K-NN smooths or sharpens the decision boundary, which can trade precision against recall
- Reduced Overfitting/Underfitting: Adjusts complexity appropriately
- Example: Adjusting tree depth prevents overfitting in decision trees
A4.3.3_3 Overfitting, underfitting considerations
Overfitting
- Model learns training data too well, including noise
- Poor generalization on new data
- Tuning Solution: Reduce complexity (e.g., fewer layers, more regularization)
- Example: Lowering neural network layers prevents overfitting on small dataset
Underfitting
- Model is too simple to capture patterns
- Poor performance on both training and test data
- Tuning Solution: Increase complexity (e.g., more trees, greater tree depth, less regularization)
- Example: Increasing iterations in gradient boosting improves performance
Tuning Considerations
- Use cross-validation to assess hyperparameter performance
- Balance model complexity with dataset size
- Smaller datasets require simpler models
- Example: Grid search for SVM kernel type and penalty parameter (C)
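The section's example tunes an SVM; to stay dependency-free, this sketch swaps in K-NN and tunes its k by grid search with leave-one-out cross-validation (libraries such as scikit-learn offer `GridSearchCV` for the general case). The one-dimensional data, with one deliberately noisy label, is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation: hold out each point once and predict it."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, features, k) == label
    return correct / len(data)

def grid_search_k(data, candidate_ks):
    """Score every candidate k; return the best one (the first wins on ties)."""
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    return max(scores, key=scores.get), scores

# Hypothetical data: the point at 3.5 is labelled "B" among "A" neighbors
data = [((1.0,), "A"), ((2.0,), "A"), ((3.0,), "A"), ((3.5,), "B"), ((4.0,), "A"),
        ((6.0,), "B"), ((7.0,), "B"), ((8.0,), "B"), ((9.0,), "B")]
best_k, scores = grid_search_k(data, [1, 3, 5])
print(best_k, scores)
```

With k = 1 the model follows the noisy point and misclassifies its neighbors (overfitting); k = 3 and k = 5 smooth it out and score higher under cross-validation.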
A4.3.4 Describe clustering in unsupervised learning (AO2)
A4.3.4_1 Group data by feature similarities
Description
- Unsupervised technique that groups data points into clusters
- Based on similarity in features, without predefined labels
- Similarity measured using distance metrics (e.g., Euclidean) or density
Process
- Algorithms identify patterns or structures in data
- Common algorithms: K-means, hierarchical clustering, DBSCAN
- Example: Grouping customers by purchasing behavior
Characteristics
- No prior knowledge of group labels required
- Suitable for exploratory analysis
- Clusters formed based on feature proximity
- Minimizes intra-cluster variance, maximizes inter-cluster differences
- Example: Retail purchases grouped by item type and frequency
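A minimal sketch of K-means (Lloyd's algorithm). To keep the result deterministic, the caller supplies the initial centroids; real implementations usually pick them randomly or with k-means++. The customer data is hypothetical, and in practice features on such different scales would be normalized first.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until centroids stop moving."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (purchases per month, average order value in $)
customers = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centroids, clusters = kmeans(customers, [(1, 20), (10, 200)])
print(centroids)  # [(1.3333333333333333, 25.0), (11.0, 210.0)]
```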
A4.3.4_2 Applications: customer segmentation
Customer Segmentation
- Use: Divides customers into groups based on shared characteristics
- To tailor marketing, improve service, or optimize offerings
- Mechanism: Analyzes features like age, purchase history, browsing behavior
- Example: Retailer using K-means to group "budget shoppers," "luxury buyers," "occasional buyers"
Benefits
- Enables targeted marketing (e.g., personalized promotions)
- Improves customer experience through preference understanding
- Supports business strategy for inventory planning
Example Application
- E-commerce platform clusters users by purchase frequency and order value
- Sends tailored discounts to low-frequency, high-value customers
- Aims to boost retention and increase sales
A4.3.5 Describe association rule learning (AO2)
A4.3.5_1 Uncover attribute relations in large datasets
Description
- Unsupervised technique that identifies frequent patterns, correlations
- Among attributes in large transactional datasets
- Generates rules in form "If {antecedent} then {consequent}"
- Example: {bread} → {butter} based on co-occurrence
Key Metrics
- Support: Frequency of itemset in dataset
- e.g., percentage of transactions containing both bread and butter
- Confidence: Probability consequent occurs given antecedent
- e.g., likelihood of buying butter if bread is bought
- Lift: Measures association strength vs. random occurrence
- lift > 1 indicates positive association
Process & Applications
- Algorithms: Apriori (level-wise candidate generation), FP-Growth (compact FP-tree, no candidate generation)
- Process: Scan for frequent itemsets, generate rules, prune weak ones
- Applications: Market basket analysis, web usage mining, bioinformatics
- Example: Retail suggesting products based on purchase patterns
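Support, confidence, and lift for one candidate rule can be computed directly from a list of baskets; Apriori and FP-Growth automate the search for which rules to score. The baskets below are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # baskets with antecedent
    count_c = sum(1 for t in transactions if c <= t)         # baskets with consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets with both
    support = count_ac / n
    confidence = count_ac / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
s, conf, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(round(s, 2), round(conf, 2), round(lift, 2))  # 0.6 0.75 1.25
```

A lift of 1.25 means butter appears in bread baskets 25% more often than in baskets overall, a positive association.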
A4.3.6 Describe reinforcement learning decision-making (AO2)
A4.3.6_1 Cumulative reward, agent-environment interaction (actions, states, rewards, policies)
Cumulative Reward
- RL goal is to maximize total reward over time
- Rewards are numerical values guiding agent toward desirable outcomes
- Example: Game-playing AI earns points for winning moves
Agent-Environment Interaction
- Agent: Decision-maker (e.g., robot, game AI)
- Environment: External system providing states and rewards
- Actions: Choices agent makes (e.g., move left, buy stock)
- States: Environment's current situation (e.g., position in maze)
- Rewards: Feedback based on actions (e.g., +10 for goal, -1 for wrong move)
- Policies: Strategies defining actions for each state
- Example: Self-driving car observes road conditions, chooses to brake/accelerate
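The interaction loop can be sketched with tabular Q-learning on a tiny corridor. Everything here is a hypothetical setup chosen for illustration: five cells, a goal in the last cell, +10 for reaching it, −1 for every other move, and the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` values. The discount factor makes the agent value near-term rewards slightly more than distant ones while still maximizing cumulative reward.

```python
import random

N_STATES, GOAL = 5, 4      # corridor cells 0..4; the goal is cell 4
ACTIONS = (-1, +1)         # index 0 = move left, index 1 = move right

def env_step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if nxt == GOAL:
        return nxt, 10.0, True   # +10 for reaching the goal
    return nxt, -1.0, False      # -1 for every other move

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:                                # explore
                a = rng.randrange(len(ACTIONS))
            else:                                                      # exploit
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, reward, done = env_step(state, ACTIONS[a])
            # Move Q(s, a) toward reward + discounted value of the best next action
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)  # the learned policy: move right in every non-goal cell
```

The table `q` is the agent's learned estimate of long-term reward for each state-action pair, and the policy is simply "pick the action with the highest estimate in the current state".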
A4.3.6_2 Exploration vs exploitation trade-off
Exploration
- Agent tries new actions to discover effects
- Purpose: Uncovers optimal strategies
- Example: Game AI trying new move to find higher score
Exploitation
- Agent chooses known high-reward actions
- Purpose: Maximizes immediate rewards
- Example: AI repeatedly using known winning strategy
Trade-Off & Strategies
- Balance: Too much exploration delays learning; too much exploitation misses better options
- Epsilon-Greedy: Choose the best-known action with probability 1 − ε; choose a random action with probability ε
- Softmax: Assign probabilities based on expected rewards, favoring better actions
- Example: Recommendation system exploits known preferences but explores new genres
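Epsilon-greedy can be sketched on a simple multi-armed bandit, where each "arm" stands for one option (here, a genre to recommend). The true average ratings, the Gaussian noise with standard deviation 1, and ε = 0.1 are all hypothetical; the agent only sees noisy rewards.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=42):
    """Estimate each option's average reward while mostly exploiting the best one."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k         # how often each option was tried
    estimates = [0.0] * k    # running average reward per option
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore: random option
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit: best so far
        reward = rng.gauss(true_means[arm], 1.0)             # noisy feedback
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

# Hypothetical average ratings for three genres (unknown to the agent)
estimates, counts = epsilon_greedy_bandit([1.0, 2.0, 5.0])
print(counts)  # most plays should go to the third genre
```

Setting `epsilon=0` can leave the agent stuck on whichever option looked good first, while a large ε wastes many plays on options already known to be worse.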
A4.3.7 Describe genetic algorithms applications (AO2)
A4.3.7_1 Components: population, fitness, selection, crossover, mutation, evaluation, termination
Population
- Set of potential solutions (individuals)
- Represented as data structures (e.g., bit strings)
- Example: Multiple route configurations for delivery optimization
Fitness
- Measure of how well solution solves problem
- Defined by fitness function
- Example: Total distance traveled in route (lower is better)
Selection
- Chooses individuals for reproduction based on fitness
- Example: Selecting shortest routes to create next generation
Crossover
- Combines parts of two parent solutions
- Creates offspring with new trait combinations
- Example: Merging segments of two delivery routes
Mutation
- Randomly alters parts of solution
- Maintains diversity, avoids local optima
- Example: Randomly swapping two stops in a route
Evaluation
- Assesses fitness of new solutions
- After crossover and mutation
- Example: Calculating total distance of new routes
Termination
- Stops algorithm when condition met
- e.g., max iterations, satisfactory fitness
- Example: Ending after 100 generations or when route distance below threshold
A4.3.7_2 Applications: route planning (e.g., travelling salesperson)
Route Planning (Travelling Salesperson Problem)
- Description: Optimizes paths visiting multiple locations while minimizing distance/cost
- Process:
- Population: Multiple possible routes (city visit sequences)
- Fitness: Total distance or travel time
- Selection: Choose routes with shorter distances
- Crossover: Combine parts of two routes
- Mutation: Randomly swap or move cities within a route
- Termination: Stop when a near-optimal route is found or after a set number of generations
- Example: Optimizing delivery truck route for 20 cities, minimizing fuel costs
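A compact sketch of the process above for a small travelling salesperson instance. The design choices are assumptions, not the only options: routes are permutations of city indices, selection keeps the shorter half of the population (so the best route is never lost), crossover is a simplified order crossover, mutation swaps two cities, and termination is a fixed number of generations. The city coordinates are hypothetical.

```python
import math
import random

def route_length(route, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    return sum(math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def order_crossover(p1, p2, rng):
    """Copy a slice from parent 1, fill the gaps with parent 2's remaining order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    fill = [c for c in p2 if c not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(route, rng, rate=0.2):
    """With probability rate, swap two cities to keep the population diverse."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route

def genetic_tsp(cities, pop_size=50, generations=200, seed=1):
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]  # random routes
    for _ in range(generations):                                     # termination
        population.sort(key=lambda r: route_length(r, cities))       # evaluation
        parents = population[: pop_size // 2]                         # selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = parents + children
    return min(population, key=lambda r: route_length(r, cities))

# Hypothetical stops on the edge of a 2x2 square; the shortest tour has length 8
cities = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
best = genetic_tsp(cities)
print(best, round(route_length(best, cities), 2))
```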
Benefits & Other Applications
- Benefits: Handles complex problems where exhaustive search impractical
- Finds near-optimal solutions efficiently
- Other Applications:
- Scheduling: Optimizing task schedules (e.g., job-shop)
- Engineering Design: Evolving designs for performance/cost
- Machine Learning: Tuning hyperparameters or evolving neural architectures
A4.3.8 Outline ANN structure, function (AO2)
A4.3.8_1 ANN for classification, regression, pattern recognition
Artificial Neural Networks (ANNs)
- Computational models inspired by biological neural networks
- Used for classification, regression, pattern recognition
- Classification: Assigns inputs to discrete categories
- Example: ANN classifying images as "cat" or "dog"
- Regression: Predicts continuous values
- Example: ANN predicting stock prices
- Pattern Recognition: Identifies patterns in complex data
- Example: ANN recognizing handwritten digits
Function
- Processes input data through layers of interconnected nodes
- Learns patterns via weighted connections
- Produces outputs based on learned patterns
- Adapts weights during training to minimize prediction errors
A4.3.8_2 Single perceptron: input, weights, bias, activation, output
Single Perceptron
- Basic unit of ANN, mimicking a neuron
- Used for simple tasks like binary classification
Components
- Input: Features fed into perceptron (e.g., pixel values)
- Weights: Numerical values representing importance (learned during training)
- Bias: Constant added to weighted sum, allowing activation shift
- Activation: Function (e.g., step, sigmoid, ReLU) introducing non-linearity
- Output: Result after activation, representing prediction
- Function: output = activation(∑(inputᵢ × weightᵢ) + bias)
- Example: Perceptron classifying email as spam by weighting word features
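The formula output = activation(∑(inputᵢ × weightᵢ) + bias) can be sketched directly, together with the classic perceptron learning rule. The step activation, the learning rate of 1, and the AND-gate training data are illustrative choices.

```python
def step(z):
    """Step activation: output 1 when the weighted input reaches 0."""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum + bias
    return step(z)

def train_perceptron(samples, epochs=10, lr=1):
    """Perceptron learning rule: w_i += lr * (target - output) * x_i."""
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(inputs, weights, bias)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print(weights, bias)                                               # [2, 1] -3
print([perceptron_output(x, weights, bias) for x, _ in and_gate])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane), so the same rule never converges on XOR; that limitation is what multi-layer networks address.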
A4.3.8_3 MLP: input, hidden, output layers
Multi-Layer Perceptron (MLP)
- ANN with multiple layers of nodes
- Capable of solving complex, non-linear problems
Layers
- Input Layer: Receives raw data (e.g., features like age, income)
- Hidden Layers: Process inputs through weighted connections
- Apply activation functions to learn complex patterns
- Multiple hidden layers enable deep learning
- Output Layer: Produces final prediction
- e.g., class label or continuous value
Function
- Data flows from input to hidden layers
- Transforming through weights and activations
- Produces output at final layer
- Example: MLP predicting house prices with input (square footage, location)
- Hidden layers learning interactions, output giving price
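A forward pass through a small MLP (2 inputs, 2 hidden neurons, 1 output) with sigmoid activations. The weights are hand-set for illustration rather than trained (training would normally use backpropagation), and the task is XOR rather than house prices because a single perceptron cannot solve it: the hidden neurons act roughly as OR and NAND, and the output neuron combines them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Hand-set (untrained) weights chosen for illustration
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]  # neuron 1 ~ OR, neuron 2 ~ NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                  # output ~ AND of the hidden neurons
OUTPUT_B = [-30.0]

def mlp_forward(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B, sigmoid)        # input -> hidden layer
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]  # hidden -> output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp_forward(x)))  # XOR: 0, 1, 1, 0
```

For a regression task such as house prices, the output neuron would typically use a linear (identity) activation instead of a sigmoid.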
A4.3.9 Describe CNNs for spatial feature learning (AO2)
A4.3.9_1 Architecture: input, convolutional, activation, pooling, fully connected, output layers
Input Layer
- Receives raw data, typically images
- Pixel values in 2D/3D arrays
- Example: 28x28 grayscale image from MNIST
Convolutional Layer
- Applies convolution operations using filters
- Extracts features like edges, textures
- Example: 3x3 filter detecting vertical edges
Activation Layer
- Applies non-linear function to feature maps
- e.g., ReLU setting negatives to zero
- Example: ReLU improving convergence in classification
Pooling Layer
- Reduces spatial dimensions (downsampling)
- Uses max pooling or average pooling
- Example: 2x2 max pooling with stride 2 reducing 28x28 to 14x14
Fully Connected Layer
- Connects all neurons from previous layer
- Combines features for final predictions
- Example: Combining edges/shapes to classify as "cat"
Output Layer
- Produces final prediction
- e.g., class probabilities for classification
- Example: Softmax outputting digit probabilities (0-9)
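The convolution, activation, and pooling layers can be sketched on plain lists. Like most deep learning libraries, this computes cross-correlation under the name "convolution", with no padding, so the output size is (W − K) / S + 1 per dimension. The 6x6 image and the vertical-edge kernel are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D convolution as used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1   # (W - K) / S + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(image[i * stride + u][j * stride + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Activation layer: negative responses become zero."""
    return [[max(0, x) for x in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# 6x6 image: a bright vertical stripe (9) in columns 2-3 on a dark background (0)
image = [[0, 0, 9, 9, 0, 0] for _ in range(6)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to dark-to-bright

raw = conv2d(image, vertical_edge)  # 4x4, each row [27, 27, -27, -27]
fmap = relu(raw)                    # bright-to-dark responses (-27) are zeroed
pooled = max_pool(fmap)             # 2x2: [[27, 0], [27, 0]]
print(pooled)
print(conv2d(image, vertical_edge, stride=2))  # stride 2 gives a 2x2 map
```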
A4.3.9_2 Impact of layers, kernel size, stride, activation, loss function
Component | Impact | Example |
---|---|---|
Layers | More layers enable hierarchical feature learning but increase complexity | Deep CNNs like ResNet excel in facial recognition |
Kernel Size | Smaller kernels capture fine details; larger capture broader patterns | 3x3 for edges; 7x7 for textures in images |
Stride | Larger strides reduce output size but may miss details | Stride of 2 halves feature map dimensions |
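Activation | Non-linear functions let the network learn complex patterns; the choice affects training speed and stability (ReLU suffers less from vanishing gradients than sigmoid) | ReLU in hidden layers; softmax at the output for multi-class classification |
Loss Function | Defines the error the network minimizes during training; must match the task | Cross-entropy for classification; mean squared error for regression |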
A4.3.10 Explain model selection, comparison (AO2)
A4.3.10_1 Algorithm performance varies by data, problem
Performance Variation
- Different ML algorithms perform better or worse depending on dataset and problem
- Data Characteristics: Size, dimensionality, noise, distribution impact suitability
- Example: Linear regression works for linear relationships, fails for complex data
- Problem Type: Classification, regression, clustering require different algorithms
- Example: Decision trees effective for classification with categorical features
Factors Influencing Performance
- Data Size: Large datasets favor complex models; small datasets suit simpler ones
- Example: Small dataset uses logistic regression; millions use deep learning
- Feature Types: Numeric vs. categorical affects algorithm choice
- Example: K-NN for numeric; Naive Bayes for categorical
- Complexity: Simple problems need simple models; complex need advanced algorithms
- Example: House price prediction may use linear regression; object detection needs CNNs
A4.3.10_2 Model selection based on problem nature, complexity, outcomes
Selection Factor | Considerations | Example |
---|---|---|
Problem Nature | Classification: logistic regression, SVM, random forests; Regression: linear regression, gradient boosting, neural networks; Clustering: K-means, DBSCAN | Fraud detection (classification) uses random forest for robustness to imbalanced data |
Complexity | Simple problems: less complex models to avoid overfitting; Complex problems: advanced models like deep learning | House price prediction may use linear regression; object detection needs CNNs |
Outcomes | Desired outcomes guide selection: accuracy, interpretability, speed; trade-offs between simpler (faster) and complex (more accurate) models | Medical diagnosis may favor an interpretable decision tree over a less transparent deep network |
A4.3.10_3 Data characteristics impact performance
Data Size
- Small datasets: Risk overfitting with complex models
- Use simpler models like Naive Bayes
- Large datasets: Enable complex models like neural networks
- Example: 100 samples use logistic regression; millions use deep learning
Data Noise
- Noisy data: Requires robust algorithms
- Random forests handle noise well
- Example: Noisy sales dataset benefits from ensemble model
Feature Distribution
- Algorithms assume specific distributions
- Linear regression assumes linear relationships
- Example: Non-linear relationships call for models such as decision trees or neural networks
Sparsity
- Sparse, high-dimensional datasets (e.g., text) benefit from algorithms that handle many mostly-zero features
- Linear SVMs, or neural networks combined with dimensionality reduction
- Example: Text classification with TF-IDF features uses a linear SVM
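A vocabulary may contain tens of thousands of terms while each document uses only a few, so text vectors are mostly zeros. The sketch below computes TF-IDF vectors as sparse dictionaries (term → weight) that store only non-zero entries. It uses raw term frequency and the unsmoothed idf = log(N / df); libraries such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants by default, so their numbers will differ. Training the SVM on these vectors is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Sparse TF-IDF: one {term: weight} dict per document, zero weights omitted."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)             # raw term frequency
            idf = math.log(n_docs / df[term])    # unsmoothed inverse document frequency
            if tf * idf != 0.0:
                vec[term] = tf * idf
        vectors.append(vec)
    return vectors


docs = ["the match was great", "the stock market fell", "the team won the match"]
for vec in tfidf(docs):
    print(vec)  # "the" occurs in every document, so its weight is 0 and it is not stored
```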
Imbalanced Data
- Imbalanced datasets require special handling
- Ensemble methods, class weighting, or resampling techniques like SMOTE (synthetic minority oversampling)
- Example: Random forests with class weighting for fraud detection
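Both mitigation ideas are short to sketch. `balanced_class_weights` uses the formula n_samples / (n_classes × count), the same one scikit-learn applies for `class_weight="balanced"`; `smote_like` is a simplified version of SMOTE's core step (interpolate between a minority sample and one of its k nearest minority neighbours) with brute-force neighbour search and 2-D points only. Both function names and the data are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]); rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: synthesize minority points between a sample and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours by squared Euclidean distance (brute force).
        neighbours = sorted(others, key=lambda p: (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2)[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x towards nb
        synthetic.append((x[0] + gap * (nb[0] - x[0]), x[1] + gap * (nb[1] - x[1])))
    return synthetic


labels = ["legit"] * 95 + ["fraud"] * 5
print(balanced_class_weights(labels))  # each fraud case weighs 19x a legit one

fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote_like(fraud_points, n_new=10)
print(len(fraud_points) + len(new_points), "fraud samples after oversampling")
```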
Evaluation
- Use appropriate metrics based on data and problem
- F1 score for imbalanced data; accuracy for balanced data
- Example: Stratified cross-validation on an imbalanced dataset, scored with F1
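The sketch below shows why accuracy misleads on imbalanced data: a model that never predicts the rare class still scores 95% accuracy but an F1 of 0. It also shows a stratified fold split, which keeps the class ratio the same in every fold so each validation fold contains some positives. The helper names are illustrative, indices would normally be shuffled before splitting, and training a model inside each fold is omitted.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (rare) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def stratified_folds(labels, k):
    """Assign indices to k folds round-robin within each class, preserving class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


y = [1] * 10 + [0] * 190          # 5% positives, e.g. fraud cases
always_negative = [0] * len(y)    # a useless model that never flags fraud
print(accuracy(y, always_negative), f1_score(y, always_negative))
for fold in stratified_folds(y, k=5):
    print(len(fold), "samples,", sum(y[i] for i in fold), "positives")
```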
A4.4.1 Discuss machine learning ethical implications (AO3)
A4.4.1_1 Issues: accountability, fairness, bias, consent, environment, privacy, security, societal impact, transparency
Accountability
- Issue: Determining responsibility for ML decisions
- Especially in critical applications like healthcare
- Example: Who is accountable if ML misdiagnoses a patient?
- Consideration: Clear governance frameworks needed
Fairness
- Issue: Ensuring models treat all groups equitably
- Avoiding discrimination based on race, gender, etc.
- Example: Hiring algorithm favoring male candidates
- Consideration: Fairness metrics and regular audits
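Two widely used group-fairness metrics compare selection rates between groups: the demographic parity difference and the disparate impact ratio. A common rule of thumb (the "four-fifths rule") flags ratios below 0.8. The sketch assumes binary predictions (1 = shortlisted); the hiring data and function names are hypothetical.

```python
def selection_rate(preds, groups, group):
    """Fraction of members of `group` who received a positive prediction."""
    chosen = [p for p, g in zip(preds, groups) if g == group]
    return sum(chosen) / len(chosen)


def fairness_report(preds, groups, privileged, unprivileged):
    r_priv = selection_rate(preds, groups, privileged)
    r_unpriv = selection_rate(preds, groups, unprivileged)
    ratio = r_unpriv / r_priv
    return {
        "demographic_parity_difference": r_unpriv - r_priv,  # 0 means equal rates
        "disparate_impact_ratio": ratio,                      # 1 means equal rates
        "fails_four_fifths_rule": ratio < 0.8,
    }


# Hypothetical audit: 10 male and 10 female applicants; 1 = shortlisted by the model.
groups = ["male"] * 10 + ["female"] * 10
preds = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(fairness_report(preds, groups, privileged="male", unprivileged="female"))
```

Running such a report on every model release is one concrete form of the "regular audits" above.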
Bias
- Issue: Models inherit biases from training data
- Leading to skewed or unfair predictions
- Example: Facial recognition with lower accuracy for darker skin
- Consideration: Diverse datasets and bias mitigation
Consent
- Issue: Using personal data without explicit consent
- Example: Training a recommendation system on user data without permission
- Consideration: Transparent policies and opt-in mechanisms
Environment
- Issue: Training large models consumes significant energy
- Contributing to carbon emissions
- Example: Training a large language model (e.g., GPT-3) has been estimated to emit as much CO2 as several cars over their entire lifetimes
- Consideration: Optimizing algorithms and efficient hardware
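A common back-of-the-envelope estimate multiplies hardware power, training time, data-centre overhead (PUE, power usage effectiveness) and the grid's carbon intensity. All numbers below are hypothetical placeholders, not measurements of any real model; the point is that each factor (efficient hardware, shorter training, cleaner grids) scales emissions directly.

```python
def training_emissions_kg(n_gpus, gpu_power_kw, hours, pue, grid_kg_co2_per_kwh):
    """Rough CO2 estimate for a training run.

    energy (kWh) = number of GPUs x average power per GPU (kW) x hours x PUE
    emissions (kg CO2) = energy x grid carbon intensity (kg CO2 per kWh)
    """
    energy_kwh = n_gpus * gpu_power_kw * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Hypothetical run: 8 GPUs at 0.3 kW for 100 hours, PUE 1.5, on two different grids.
fossil_heavy = training_emissions_kg(8, 0.3, 100, 1.5, grid_kg_co2_per_kwh=0.4)
low_carbon = training_emissions_kg(8, 0.3, 100, 1.5, grid_kg_co2_per_kwh=0.05)
print(f"{fossil_heavy:.0f} kg vs {low_carbon:.0f} kg CO2")
```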
Privacy
- Issue: Models may expose sensitive data
- Enable re-identification from anonymized data
- Example: Model trained on medical records leaking patient info
- Consideration: Differential privacy or anonymization
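Differential privacy adds calibrated random noise to query results so that any one person's presence in the data has only a bounded effect on what is released. The sketch below is the textbook Laplace mechanism for a counting query (sensitivity 1); the query, the count and the epsilon values are hypothetical.

```python
import random


def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most `sensitivity`,
    so noise is drawn from Laplace(0, sensitivity / epsilon). Smaller epsilon
    means more noise and stronger privacy.
    """
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(0)
# Hypothetical query: number of patients in a dataset with a given diagnosis.
for eps in (1.0, 0.1):
    print(eps, [round(laplace_count(42, eps, rng), 1) for _ in range(3)])
```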
Security
- Issue: Vulnerable to adversarial attacks
- Manipulating predictions with altered inputs
- Example: Altering image to fool self-driving car detection
- Consideration: Robust model design and security testing
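The Fast Gradient Sign Method (FGSM) is a classic adversarial attack: nudge every input feature by a small amount ε in whichever direction increases the model's loss. The sketch applies it to a hypothetical three-feature logistic regression instead of an image model, because there the gradient has a closed form; with images the same step is applied to every pixel, so the change can be hard to see.

```python
import math


def predict_proba(w, b, x):
    """Logistic regression: probability of class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method for logistic regression with log-loss.

    d(loss)/d(x_i) = (p - y) * w_i, so each feature moves by eps in the
    direction that increases the loss; no feature changes by more than eps.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical model and an input correctly classified as class 1 (e.g. "stop sign").
w, b = [1.0, -2.0, 0.5], 0.0
x = [0.5, -0.2, 0.4]
x_adv = fgsm(w, b, x, y=1, eps=0.4)
print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

Adversarial training (adding such perturbed inputs to the training set) is one way the "robust model design" above is put into practice.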
Societal Impact
- Issue: Can disrupt jobs, exacerbate inequality
- Influence societal behaviors
- Example: Automation replacing low-skill jobs
- Consideration: Workforce retraining and equitable access
Transparency
- Issue: Complex models are "black boxes"
- Decisions hard to explain
- Example: Loan denial without clear reasoning
- Consideration: Explainable AI techniques (e.g., SHAP values)
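SHAP explanations are based on Shapley values from game theory: a feature's attribution is its average marginal contribution over all subsets of the other features. The sketch computes them exactly by brute force for a hypothetical loan-scoring function, replacing "absent" features with a baseline value (one common convention). The SHAP library uses faster approximations and model-specific algorithms; this version is exponential in the number of features and only suits a handful of them.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalitions of the other n - 1 features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def loan_score(f):
    """Hypothetical credit model: features are income, debt and years employed."""
    income, debt, years = f
    return 0.5 * income - 0.8 * debt + 0.1 * income * years


applicant = [4.0, 2.0, 3.0]
average_applicant = [2.0, 1.0, 1.0]
phis = shapley_values(loan_score, applicant, average_applicant)
print([round(p, 3) for p in phis])
# Efficiency property: the attributions sum to score(applicant) - score(baseline).
print(round(sum(phis), 3), round(loan_score(applicant) - loan_score(average_applicant), 3))
```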
A4.4.1_2 Training data bias challenges
Challenges
- Data Representation: Training data may underrepresent certain groups
- Leading to biased predictions
- Example: Hiring model trained on male-dominated resumes undervalues female candidates
- Historical Bias: Data reflecting past discrimination perpetuates bias
- Example: Predictive policing using biased arrest data unfairly targets communities
- Data Collection: Non-transparent methods can introduce bias
- Example: Social media data skewed toward active users may not represent all populations
Mitigation
- Use diverse, representative datasets for fair outcomes
- Apply bias detection tools (e.g., fairness metrics)
- Use reweighting techniques to balance data
- Regularly audit models for biased predictions
- Adjust training data or algorithms accordingly
- Example: Rebalancing dataset to include equal gender representation in hiring model
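One well-known reweighting scheme, often called "reweighing" (Kamiran and Calders), gives each training example the weight P(group) × P(label) / P(group, label), so that in the weighted data the label no longer depends on the group. The data and function names below are hypothetical; the weights would then be passed to any learner that accepts per-sample weights.

```python
from collections import Counter


def reweighing(groups, labels):
    """Per-sample weights w(g, y) = P(g) * P(y) / P(g, y).

    Combinations rarer than independence would predict (e.g. hired women in a
    male-dominated history) get weights above 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] / n) * (label_counts[y] / n) / (joint_counts[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(weights, groups, labels, group):
    num = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    den = sum(w for w, g in zip(weights, groups) if g == group)
    return num / den


# Hypothetical historical hiring data: 40/80 men hired vs 4/20 women hired.
groups = ["m"] * 80 + ["f"] * 20
labels = [1] * 40 + [0] * 40 + [1] * 4 + [0] * 16
weights = reweighing(groups, labels)
for g in ("m", "f"):
    print(g, round(weighted_positive_rate(weights, groups, labels, g), 3))
```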
A4.4.1_3 Ethics in online communication: misinformation, harassment, anonymity, privacy
Misinformation
- Issue: ML can amplify false information
- Through recommendation systems, or by generating misleading content
- Example: Social media algorithms promoting viral false news
- Consideration: Content moderation and fact-checking algorithms
Harassment
- Issue: ML-based moderation may fail to detect harassment
- Enabling toxic behavior
- Example: Chatbots not filtering abusive language effectively
- Consideration: NLP models to detect and flag harmful content
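Production moderation systems use large neural language models, but the core idea (learning which words signal harmful content from labelled examples) can be sketched with the Naive Bayes classifier mentioned earlier. The training sentences and labels below are a tiny hypothetical set; a real system needs far more, and more diverse, data.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Multinomial Naive Bayes: count words per class."""
    word_counts = {c: Counter() for c in set(labels)}
    for text, c in zip(texts, labels):
        word_counts[c].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, Counter(labels), vocab


def predict_nb(model, text):
    """Pick the class with the highest log-probability (add-one smoothing for unseen words)."""
    word_counts, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(class_counts[c] / n_docs)
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Tiny hypothetical training set.
texts = [
    "you are an idiot", "shut up you loser", "nobody likes you idiot",
    "thanks for the help", "have a nice day", "great point thanks",
]
labels = ["harmful"] * 3 + ["ok"] * 3
model = train_nb(texts, labels)
for message in ["you idiot", "thanks for your help"]:
    print(message, "->", predict_nb(model, message))
```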
Anonymity
- Issue: Anonymity enables harmful behavior but over-identification risks privacy
- Example: Anonymous accounts spreading hate vs. mandatory ID verification exposing personal data
- Consideration: Balance with moderated platforms or pseudonymity
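Pseudonymity can be implemented by replacing real identities with keyed hashes. The platform, which holds a secret key, can recompute a user's pseudonym to link activity to an account for moderation, while other users and anyone without the key cannot. The sketch uses Python's standard `hmac` and `hashlib` modules; the `user_` prefix, 12-character truncation and key handling are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonym(user_id: str, key: bytes) -> str:
    """Stable pseudonym for `user_id`: HMAC-SHA256 with a secret key, truncated."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]  # 12 hex chars = 48 bits; the length is an arbitrary choice


key = secrets.token_bytes(32)  # in practice kept in a secure key store, never published
print(pseudonym("alice@example.com", key))
print(pseudonym("alice@example.com", key) == pseudonym("alice@example.com", key))  # stable
```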
Privacy
- Issue: ML systems processing communication data may compromise privacy
- Through data collection or inference
- Example: Sentiment analysis inferring personal details from public posts
- Consideration: Strict privacy policies, anonymization, federated learning
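Federated learning keeps raw data (e.g., messages) on users' devices: each device trains locally and only model parameters are sent to a server, which averages them weighted by how much data each device holds (the FedAvg scheme). The sketch uses a hypothetical one-parameter model y = w·x and two simulated clients. Parameter updates can still leak some information, so federated learning is often combined with secure aggregation or differential privacy.

```python
def local_train(w, data, lr=0.1, steps=5):
    """A few gradient-descent steps on one client's data for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(client_data, rounds=20, w=0.0):
    """FedAvg: clients train locally; the server averages weights by client data size.

    Only the parameter w is shared with the server; the (x, y) pairs never leave a client.
    """
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        local_ws = [local_train(w, d) for d in client_data]
        w = sum(lw * len(d) for lw, d in zip(local_ws, client_data)) / total
    return w


# Two hypothetical clients whose private data both follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]
print(federated_averaging(clients))
```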
A4.4.2 Discuss ethical aspects of technology integration (AO3)
A4.4.2_1 Reassess ethical guidelines as technology advances
Need for Reassessment
- Rapid advancements in AI, quantum computing, AR, VR outpace existing frameworks
- New capabilities introduce unforeseen ethical challenges
- Example: Rise of generative AI (deepfakes) necessitates new guidelines
Process
- Regularly review and update ethical standards
- Involve stakeholders: developers, policymakers, users
- Incorporate interdisciplinary perspectives (ethics, law, sociology)
- Example: IEEE's Ethically Aligned Design framework has been revised over successive editions as AI evolves
Challenges
- Balancing innovation with regulation to avoid stifling progress
- Global coordination as standards vary across cultures/jurisdictions
- Example: Differing privacy laws (GDPR vs. less strict regulations) complicate universal AI ethics
A4.4.2_2 Implications of quantum computing, AR, VR, AI on society, rights, privacy, equity
Technology | Societal Impact | Rights | Privacy | Equity |
---|---|---|---|---|
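Quantum Computing | Accelerates innovation; disrupts encryption | Threatens data security rights | Risks decrypting sensitive data | Limited access for less affluent |
Augmented Reality | Enhances experiences; blurs reality | Raises surveillance concerns | Collects data on users and their surroundings | |
Virtual Reality | Transforms engagement; risks isolation | | Collects extensive user data | |
Artificial Intelligence | Automates tasks; risks job loss | Opaque decisions affect individuals | May expose or infer sensitive data | Biased models can disadvantage groups |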
Key Considerations
- Quantum Computing: Develop post-quantum cryptography; ensure equitable access
- AR: Enforce transparent data policies; privacy-by-design principles
- VR: Implement strict data minimization; user control over data collection
- AI: Promote fairness through bias audits; transparent decision-making; inclusive practices