Table of Contents
- Introduction
- Architecture Overview
- Configuration Parameters
- Genetic Lifecycle Management
- Population Initialization
- Fitness Evaluation and Callbacks
- Selection and Reproduction
- Mutation and CrossoverMutation ProcessCrossover Process
- Encoding and DecodingGene Encoding and DecodingDelimiter Handling
- Delimiter Handling and Repair Mechanisms
- Logging and Experiment Tracking
- Code ImplementationClass InitializationLogging MethodsMutation and Crossover Methods
- Usage Examples
- Conclusion
Introduction
The M_E_GA_Base class is the GA interface with the M_E_Engine. Together they make the MEGA framework. M_E_GA_Base is responsible for managing the GA loop which consists of initializing the initial population, evaluating fitness, selecting parents for breeding, performing crossover, managing mutations.
This document provides a comprehensive breakdown of M_E_GA_Base, focusing on its core attributes, primary responsibilities, and method implementations, enriched with code examples for clarity.
Architecture Overview
M_E_GA_Base structures the genetic algorithm’s main processes, from population initialization to mutation, crossover, and logging. Its design emphasizes flexibility and modularity, featuring:
- Configurable Initialization: Customizable algorithm parameters for tailored GA behavior.
- Lifecycle Management: Orchestrates fitness evaluation and new population selection.
- Dynamic Mutation and Crossover: Implements probabilistic gene selection for mutations and crossovers.
- Mutable Encoding: Supports encoding and decoding mechanisms, including delimiter handling for nested gene structures and the creation and management of Meta Genes and the Meta Genome.
- Detailed Logging: Captures comprehensive logs for analysis and experiment tracking.
Configuration Parameters
Upon initialization, M_E_GA_Base allows extensive customization through its parameters:
- Mutation and Crossover Control:
mutation_prob
: Probability of mutation per gene.delimited_mutation_prob
: Probability of mutation within delimited segments.delimit_delete_prob
: Probability of deleting delimiters.open_mutation_prob
: Probability of performing an open mutation.capture_mutation_prob
: Probability of capturing a gene segment as a meta-gene.delimiter_insert_prob
: Probability of inserting a pair of delimiters.crossover_prob
: Probability of crossover between parents.base_gene_prob
: Probability of selecting a base gene over a captured segment during gene selection.capture_gene_prob
: Probability of capturing a gene as a meta-gene.
- Delimiter Management:
delimiters
: Enables the use of delimiters (Start
andEnd
) in the initial population.delimiter_space
: Number of genes between delimiters when delimiters are used.
- Population and Generation Control:
population_size
: Number of individuals in the population.num_parents
: Number of parents selected for reproduction.elitism_ratio
: Proportion of elite individuals preserved each generation.max_generations
: Maximum number of generations to run.max_individual_length
: Maximum length of an an organism when creating initial population. Lengths are random up to the Max.
- Logging Options:
logging
: Enables or disables logging.generation_logging
: Logs generation-level data.mutation_logging
: Logs mutation events.crossover_logging
: Logs crossover events.individual_logging
: Logs individual organism data.
- Callbacks:
before_fitness_evaluation
: Function called before fitness evaluation.after_population_selection
: Function called after selecting the new population.before_generation_finalize
: Function called before finalizing the current generation.
- Other Parameters:
genes
: List of initial genes available for the genetic algorithm.fitness_function
: Function used to evaluate the fitness of individuals.experiment_name
: Name of the experiment for logging purposes.encodings
: External encodings to integrate with the encoding manager.seed
: Seed value for random number generation to ensure reproducibility.
Genetic Lifecycle Management
Population Initialization
The initialize_population
method generates an initial population of organisms based on the provided genes and maximum individual length.
def initialize_population(self): population = [] for _ in range(self.population_size): individual_length = random.randint(2, self.max_individual_length) organism = self.encoding_manager.generate_random_organism( functional_length=individual_length, include_specials=self.delimiters, probability=0.10, verbose=False ) population.append(organism) return population
Fitness Evaluation and Callbacks
The fitness of each organism is evaluated using the fitness_function
provided during initialization. Callbacks allow for custom actions before and after fitness evaluation.
def evaluate_population_fitness(self): if self.before_fitness_evaluation: self.before_fitness_evaluation(self) self.fitness_scores = [self.fitness_function(individual, self) for individual in self.population] if self.after_population_selection: self.after_population_selection(self) return self.fitness_scores
Callback Example
def custom_before_evaluation(ga_instance): print(f"Evaluating fitness for generation {ga_instance.current_generation}") ga = M_E_GA_Base( genes=['A', 'B', 'C'], fitness_function=my_fitness_function, before_fitness_evaluation=custom_before_evaluation )
Selection and Reproduction
The select_and_generate_new_population
method selects the fittest individuals and generates a new population through elitism, crossover, and mutation.
def select_and_generate_new_population(self, generation): # Sort population based on fitness scores sorted_population = sorted( zip(self.population, self.fitness_scores), key=lambda x: x[1], reverse=True ) # Apply elitism num_elites = int(self.elitism_ratio * self.population_size) elites = [individual for individual, _ in sorted_population[:num_elites]] new_population = elites[:] # Select parents selected_parents = [individual for individual, _ in sorted_population[:self.num_parents]] # Generate offspring while len(new_population) < self.population_size: parent1, parent2 = random.sample(selected_parents, 2) offspring1, offspring2 = self.crossover(parent1, parent2) offspring1 = self.mutate_organism(offspring1, generation) offspring2 = self.mutate_organism(offspring2, generation) new_population.extend([offspring1, offspring2]) return new_population[:self.population_size]
Mutation and Crossover
Mutation Process
Mutations introduce genetic diversity by altering genes based on predefined probabilities.
Mutation Types
- Point Mutation: Replaces a gene with another randomly selected gene.
- Insertion: Inserts a new gene at a random position.
- Deletion: Removes a gene from the organism.
- Swap: Swaps two adjacent genes.
- Delimiter-specific Mutations: Handles mutations involving
Start
andEnd
delimiters.
Point Mutation Example
def perform_point_mutation(self, organism, index): new_gene = self.select_gene() organism[index] = new_gene return organism, index
Crossover Process
Crossover combines genetic material from two parent organisms to produce offspring.
Single-point Crossover Example
def crossover(self, parent1, parent2): if random.random() < self.crossover_prob: crossover_point = random.randint(1, min(len(parent1), len(parent2)) - 1) offspring1 = parent1[:crossover_point] + parent2[crossover_point:] offspring2 = parent2[:crossover_point] + parent1[crossover_point:] else: offspring1, offspring2 = parent1[:], parent2[:] return offspring1, offspring2
Maintaining Structural Integrity
The crossover process accounts for delimiters to avoid breaking nested gene structures.
def crossover(self, parent1, parent2): # Identify valid crossover points that don't disrupt delimiters non_delimited_indices = self.get_non_delimiter_indices(parent1, parent2) crossover_point = self.choose_crossover_point(non_delimited_indices) # Perform crossover as before if crossover_point is not None: offspring1 = parent1[:crossover_point] + parent2[crossover_point:] offspring2 = parent2[:crossover_point] + parent1[crossover_point:] else: offspring1, offspring2 = parent1[:], parent2[:] return offspring1, offspring2
Encoding and Decoding
Gene Encoding and Decoding
Genes are encoded into hash keys for compact storage and efficient lookup.
def encode_string(self, genetic_string): encoded_sequence = [] for gene in genetic_string: encoded_gene = self.encoding_manager.reverse_encodings.get(gene) if encoded_gene is None: self.encoding_manager.add_gene(gene) encoded_gene = self.encoding_manager.reverse_encodings[gene] encoded_sequence.append(encoded_gene) return encoded_sequence
def decode_organism(self, encoded_organism): decoded_genes = self.encoding_manager.decode(encoded_organism) return decoded_genes
Delimiter Handling
Delimiters (Start
and End
) define hierarchical gene segments and are managed carefully during mutations and crossovers.
def is_fully_delimited(self, organism): start_codon = self.encoding_manager.reverse_encodings['Start'] end_codon = self.encoding_manager.reverse_encodings['End'] return organism[0] == start_codon and organism[-1] == end_codon
Delimiter Handling and Repair Mechanisms
Unbalanced delimiters can disrupt the genetic structure. The repair
method ensures that delimiters are correctly paired. This is more for debugging/development purposes as delimiters can be programmatically handled to avoid mismatch. Enabling this creates significant overhead so only use if you are experiencing issues and just need to make things work until you can fix the problem.
def repair(self, organism): start_codon = self.encoding_manager.reverse_encodings['Start'] end_codon = self.encoding_manager.reverse_encodings['End'] depth = 0 indices_to_remove = [] for i, codon in enumerate(organism): if codon == start_codon: depth += 1 elif codon == end_codon: if depth == 0: indices_to_remove.append(i) else: depth -= 1 # Remove unpaired 'End' codons for index in sorted(indices_to_remove, reverse=True): del organism[index] # Remove extra 'Start' codons if depth > 0 while depth > 0: organism.remove(start_codon) depth -= 1 return organism
Logging and Experiment Tracking
Detailed logging enables tracking of the algorithm’s progress and facilitates analysis.
Logging Configuration
Logging can be customized via initialization parameters:
logging
: Enable or disable logging.generation_logging
: Log generation summaries.mutation_logging
: Log mutation events.crossover_logging
: Log crossover events.individual_logging
: Log individual organism data.
Saving Logs
Logs are saved in JSON format for easy processing.
def save_logs(self, logs, file_name=None): if file_name is None: file_name = f"{self.experiment_name}_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.json" with open(file_name, 'w') as f: json.dump(logs, f, indent=4) print(f"Logs saved to {file_name}")
Code Implementation
Class Initialization
The __init__
method sets up the genetic algorithm with all configurable parameters.
class M_E_GA_Base: def __init__(self, genes, fitness_function, mutation_prob=0.01, crossover_prob=0.5, population_size=100, max_generations=1000, logging=True, experiment_name=None, seed=None, **kwargs): self.genes = genes self.fitness_function = fitness_function self.mutation_prob = mutation_prob self.crossover_prob = crossover_prob self.population_size = population_size self.max_generations = max_generations self.logging = logging self.experiment_name = experiment_name or "Experiment" self.seed = seed if seed is not None: random.seed(seed) # Initialize other attributes and encoding manager
Logging Methods
Logging methods capture various aspects of the GA’s execution.
def start_new_generation_logging(self, generation_number): generation_log = { "generation": generation_number, "summary": {}, "individuals": [], "mutations": [], "crossovers": [] } self.logs.append(generation_log)
Mutation and Crossover Methods
Methods implementing the genetic operations, accounting for delimiters and encoding.
def mutate_organism(self, organism, generation): for i in range(len(organism)): if random.random() < self.mutation_prob: mutation_type = self.select_mutation_type() organism, i = self.apply_mutation(organism, i, mutation_type) return organism
Utility Functions
Helper methods for gene selection, depth calculation, and more.
def select_gene(self): return random.choice(list(self.encoding_manager.encodings.keys())) def calculate_depth(self, organism, index): depth = 0 start_codon = self.encoding_manager.reverse_encodings['Start'] end_codon = self.encoding_manager.reverse_encodings['End'] for codon in organism[:index + 1]: if codon == start_codon: depth += 1 elif codon == end_codon: depth -= 1 return depth
Usage Examples
Example 1: Basic Usage
# Define genes and fitness function genes = ['A', 'B', 'C'] def fitness_function(organism, ga): decoded = ga.decode_organism(organism) return decoded.count('A') # Fitness based on count of 'A' # Initialize GA ga = M_E_GA_Base( genes=genes, fitness_function=fitness_function, population_size=50, max_generations=20, logging=True, experiment_name="BasicUsageExample" ) # Run GA ga.run_algorithm()
Example 2: Custom Callbacks and Logging
def before_evaluation(ga_instance): print(f"Starting fitness evaluation for generation {ga_instance.current_generation}") def after_selection(ga_instance): print(f"Generation {ga_instance.current_generation} completed") # Initialize GA with callbacks and custom logging ga = M_E_GA_Base( genes=['X', 'Y', 'Z'], fitness_function=my_fitness_function, before_fitness_evaluation=before_evaluation, after_population_selection=after_selection, logging=True, generation_logging=True, mutation_logging=True, experiment_name="CustomCallbacksExample" ) ga.run_algorithm() ga.save_logs(ga.logs, "CustomCallbacksExample_logs.json")
Example 3: Adjusting Mutation Rates
# Initialize GA with higher mutation probability ga = M_E_GA_Base( genes=['G1', 'G2', 'G3'], fitness_function=my_fitness_function, mutation_prob=0.05, # Increased mutation rate crossover_prob=0.7, population_size=100, max_generations=50, logging=True, experiment_name="HighMutationRateExample" ) ga.run_algorithm()
Conclusion
The M_E_GA_Base class provides a powerful and flexible foundation for implementing genetic algorithms with mutable encodings and nested gene structures. Its configurable parameters, advanced mutation and crossover mechanisms, and detailed logging capabilities make it well-suited for complex optimization problems requiring adaptable genetic representations.
By leveraging the encoding and delimiter features, users can model hierarchical and nested genetic information, enabling the exploration of more intricate solution spaces. The inclusion of callbacks and extensive logging further enhances the ability to customize and monitor the genetic algorithm’s behavior, facilitating deeper insights into the evolutionary process.