M_E_GA_Base Technical

Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Configuration Parameters
  4. Genetic Lifecycle Management
  5. Population Initialization
  6. Fitness Evaluation and Callbacks
  7. Selection and Reproduction
  8. Mutation and CrossoverMutation ProcessCrossover Process
  9. Encoding and DecodingGene Encoding and DecodingDelimiter Handling
  10. Delimiter Handling and Repair Mechanisms
  11. Logging and Experiment Tracking
  12. Code ImplementationClass InitializationLogging MethodsMutation and Crossover Methods
  13. Usage Examples
  14. Conclusion

Introduction

The M_E_GA_Base class is the GA interface with the M_E_Engine. Together they make the MEGA framework. M_E_GA_Base  is responsible for managing the GA loop which consists of initializing the initial population, evaluating fitness, selecting parents for breeding, performing crossover, managing mutations.

This document provides a comprehensive breakdown of M_E_GA_Base, focusing on its core attributes, primary responsibilities, and method implementations, enriched with code examples for clarity.

Architecture Overview

M_E_GA_Base structures the genetic algorithm’s main processes, from population initialization to mutation, crossover, and logging. Its design emphasizes flexibility and modularity, featuring:

  • Configurable Initialization: Customizable algorithm parameters for tailored GA behavior.
  • Lifecycle Management: Orchestrates fitness evaluation and new population selection.
  • Dynamic Mutation and Crossover: Implements probabilistic gene selection for mutations and crossovers.
  • Mutable Encoding: Supports encoding and decoding mechanisms, including delimiter handling for nested gene structures and the creation and management of Meta Genes and the Meta Genome.
  • Detailed Logging: Captures comprehensive logs for analysis and experiment tracking.

Configuration Parameters

Upon initialization, M_E_GA_Base allows extensive customization through its parameters:

  • Mutation and Crossover Control:
    • mutation_prob: Probability of mutation per gene.
    • delimited_mutation_prob: Probability of mutation within delimited segments.
    • delimit_delete_prob: Probability of deleting delimiters.
    • open_mutation_prob: Probability of performing an open mutation.
    • capture_mutation_prob: Probability of capturing a gene segment as a meta-gene.
    • delimiter_insert_prob: Probability of inserting a pair of delimiters.
    • crossover_prob: Probability of crossover between parents.
    • base_gene_prob: Probability of selecting a base gene over a captured segment during gene selection.
    • capture_gene_prob: Probability of capturing a gene as a meta-gene.
  • Delimiter Management:
    • delimiters: Enables the use of delimiters (Start and End) in the initial population.
    • delimiter_space: Number of genes between delimiters when delimiters are used.
  • Population and Generation Control:
    • population_size: Number of individuals in the population.
    • num_parents: Number of parents selected for reproduction.
    • elitism_ratio: Proportion of elite individuals preserved each generation.
    • max_generations: Maximum number of generations to run.
    • max_individual_length: Maximum length of an an organism when creating initial population. Lengths are random up to the Max.
  • Logging Options:
    • logging: Enables or disables logging.
    • generation_logging: Logs generation-level data.
    • mutation_logging: Logs mutation events.
    • crossover_logging: Logs crossover events.
    • individual_logging: Logs individual organism data.
  • Callbacks:
    • before_fitness_evaluation: Function called before fitness evaluation.
    • after_population_selection: Function called after selecting the new population.
    • before_generation_finalize: Function called before finalizing the current generation.
  • Other Parameters:
    • genes: List of initial genes available for the genetic algorithm.
    • fitness_function: Function used to evaluate the fitness of individuals.
    • experiment_name: Name of the experiment for logging purposes.
    • encodings: External encodings to integrate with the encoding manager.
    • seed: Seed value for random number generation to ensure reproducibility.

Genetic Lifecycle Management

Population Initialization

The initialize_population method generates an initial population of organisms based on the provided genes and maximum individual length.

def initialize_population(self):
    population = []
    for _ in range(self.population_size):
        individual_length = random.randint(2, self.max_individual_length)
        organism = self.encoding_manager.generate_random_organism(
            functional_length=individual_length,
            include_specials=self.delimiters,
            probability=0.10,
            verbose=False
        )
        population.append(organism)
    return population

Fitness Evaluation and Callbacks

The fitness of each organism is evaluated using the fitness_function provided during initialization. Callbacks allow for custom actions before and after fitness evaluation.

def evaluate_population_fitness(self):
    if self.before_fitness_evaluation:
        self.before_fitness_evaluation(self)

    self.fitness_scores = [self.fitness_function(individual, self) for individual in self.population]

    if self.after_population_selection:
        self.after_population_selection(self)

    return self.fitness_scores

Callback Example

def custom_before_evaluation(ga_instance):
    print(f"Evaluating fitness for generation {ga_instance.current_generation}")

ga = M_E_GA_Base(
    genes=['A', 'B', 'C'],
    fitness_function=my_fitness_function,
    before_fitness_evaluation=custom_before_evaluation
)

Selection and Reproduction

The select_and_generate_new_population method selects the fittest individuals and generates a new population through elitism, crossover, and mutation.

def select_and_generate_new_population(self, generation):
    # Sort population based on fitness scores
    sorted_population = sorted(
        zip(self.population, self.fitness_scores),
        key=lambda x: x[1],
        reverse=True
    )

    # Apply elitism
    num_elites = int(self.elitism_ratio * self.population_size)
    elites = [individual for individual, _ in sorted_population[:num_elites]]
    new_population = elites[:]

    # Select parents
    selected_parents = [individual for individual, _ in sorted_population[:self.num_parents]]

    # Generate offspring
    while len(new_population) < self.population_size:
        parent1, parent2 = random.sample(selected_parents, 2)
        offspring1, offspring2 = self.crossover(parent1, parent2)
        offspring1 = self.mutate_organism(offspring1, generation)
        offspring2 = self.mutate_organism(offspring2, generation)
        new_population.extend([offspring1, offspring2])

    return new_population[:self.population_size]

Mutation and Crossover

Mutation Process

Mutations introduce genetic diversity by altering genes based on predefined probabilities.

Mutation Types

  • Point Mutation: Replaces a gene with another randomly selected gene.
  • Insertion: Inserts a new gene at a random position.
  • Deletion: Removes a gene from the organism.
  • Swap: Swaps two adjacent genes.
  • Delimiter-specific Mutations: Handles mutations involving Start and End delimiters.

Point Mutation Example

def perform_point_mutation(self, organism, index):
    new_gene = self.select_gene()
    organism[index] = new_gene
    return organism, index

Crossover Process

Crossover combines genetic material from two parent organisms to produce offspring.

Single-point Crossover Example

def crossover(self, parent1, parent2):
    if random.random() < self.crossover_prob:
        crossover_point = random.randint(1, min(len(parent1), len(parent2)) - 1)
        offspring1 = parent1[:crossover_point] + parent2[crossover_point:]
        offspring2 = parent2[:crossover_point] + parent1[crossover_point:]
    else:
        offspring1, offspring2 = parent1[:], parent2[:]
    return offspring1, offspring2

Maintaining Structural Integrity

The crossover process accounts for delimiters to avoid breaking nested gene structures.

def crossover(self, parent1, parent2):
    # Identify valid crossover points that don't disrupt delimiters
    non_delimited_indices = self.get_non_delimiter_indices(parent1, parent2)
    crossover_point = self.choose_crossover_point(non_delimited_indices)
    
    # Perform crossover as before
    if crossover_point is not None:
        offspring1 = parent1[:crossover_point] + parent2[crossover_point:]
        offspring2 = parent2[:crossover_point] + parent1[crossover_point:]
    else:
        offspring1, offspring2 = parent1[:], parent2[:]
    
    return offspring1, offspring2

Encoding and Decoding

Gene Encoding and Decoding

Genes are encoded into hash keys for compact storage and efficient lookup.

def encode_string(self, genetic_string):
    encoded_sequence = []
    for gene in genetic_string:
        encoded_gene = self.encoding_manager.reverse_encodings.get(gene)
        if encoded_gene is None:
            self.encoding_manager.add_gene(gene)
            encoded_gene = self.encoding_manager.reverse_encodings[gene]
        encoded_sequence.append(encoded_gene)
    return encoded_sequence
def decode_organism(self, encoded_organism):
    decoded_genes = self.encoding_manager.decode(encoded_organism)
    return decoded_genes

Delimiter Handling

Delimiters (Start and End) define hierarchical gene segments and are managed carefully during mutations and crossovers.

def is_fully_delimited(self, organism):
    start_codon = self.encoding_manager.reverse_encodings['Start']
    end_codon = self.encoding_manager.reverse_encodings['End']
    return organism[0] == start_codon and organism[-1] == end_codon

Delimiter Handling and Repair Mechanisms

Unbalanced delimiters can disrupt the genetic structure. The repair method ensures that delimiters are correctly paired. This is more for debugging/development purposes as delimiters can be programmatically handled to avoid mismatch. Enabling this creates significant overhead so only use if you are experiencing issues and just need to make things work until you can fix the problem.

def repair(self, organism):
    start_codon = self.encoding_manager.reverse_encodings['Start']
    end_codon = self.encoding_manager.reverse_encodings['End']
    depth = 0
    indices_to_remove = []
    for i, codon in enumerate(organism):
        if codon == start_codon:
            depth += 1
        elif codon == end_codon:
            if depth == 0:
                indices_to_remove.append(i)
            else:
                depth -= 1
    # Remove unpaired 'End' codons
    for index in sorted(indices_to_remove, reverse=True):
        del organism[index]
    # Remove extra 'Start' codons if depth > 0
    while depth > 0:
        organism.remove(start_codon)
        depth -= 1
    return organism

Logging and Experiment Tracking

Detailed logging enables tracking of the algorithm’s progress and facilitates analysis.

Logging Configuration

Logging can be customized via initialization parameters:

  • logging: Enable or disable logging.
  • generation_logging: Log generation summaries.
  • mutation_logging: Log mutation events.
  • crossover_logging: Log crossover events.
  • individual_logging: Log individual organism data.

Saving Logs

Logs are saved in JSON format for easy processing.

def save_logs(self, logs, file_name=None):
    if file_name is None:
        file_name = f"{self.experiment_name}_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.json"
    with open(file_name, 'w') as f:
        json.dump(logs, f, indent=4)
    print(f"Logs saved to {file_name}")

Code Implementation

Class Initialization

The __init__ method sets up the genetic algorithm with all configurable parameters.

class M_E_GA_Base:
    def __init__(self, genes, fitness_function, mutation_prob=0.01, crossover_prob=0.5, 
                 population_size=100, max_generations=1000, logging=True, 
                 experiment_name=None, seed=None, **kwargs):
        self.genes = genes
        self.fitness_function = fitness_function
        self.mutation_prob = mutation_prob
        self.crossover_prob = crossover_prob
        self.population_size = population_size
        self.max_generations = max_generations
        self.logging = logging
        self.experiment_name = experiment_name or "Experiment"
        self.seed = seed
        if seed is not None:
            random.seed(seed)
        # Initialize other attributes and encoding manager

Logging Methods

Logging methods capture various aspects of the GA’s execution.

def start_new_generation_logging(self, generation_number):
    generation_log = {
        "generation": generation_number,
        "summary": {},
        "individuals": [],
        "mutations": [],
        "crossovers": []
    }
    self.logs.append(generation_log)

Mutation and Crossover Methods

Methods implementing the genetic operations, accounting for delimiters and encoding.

def mutate_organism(self, organism, generation):
    for i in range(len(organism)):
        if random.random() < self.mutation_prob:
            mutation_type = self.select_mutation_type()
            organism, i = self.apply_mutation(organism, i, mutation_type)
    return organism

Utility Functions

Helper methods for gene selection, depth calculation, and more.

def select_gene(self):
    return random.choice(list(self.encoding_manager.encodings.keys()))

def calculate_depth(self, organism, index):
    depth = 0
    start_codon = self.encoding_manager.reverse_encodings['Start']
    end_codon = self.encoding_manager.reverse_encodings['End']
    for codon in organism[:index + 1]:
        if codon == start_codon:
            depth += 1
        elif codon == end_codon:
            depth -= 1
    return depth

Usage Examples

Example 1: Basic Usage

# Define genes and fitness function
genes = ['A', 'B', 'C']

def fitness_function(organism, ga):
    decoded = ga.decode_organism(organism)
    return decoded.count('A')  # Fitness based on count of 'A'

# Initialize GA
ga = M_E_GA_Base(
    genes=genes,
    fitness_function=fitness_function,
    population_size=50,
    max_generations=20,
    logging=True,
    experiment_name="BasicUsageExample"
)

# Run GA
ga.run_algorithm()

Example 2: Custom Callbacks and Logging

def before_evaluation(ga_instance):
    print(f"Starting fitness evaluation for generation {ga_instance.current_generation}")

def after_selection(ga_instance):
    print(f"Generation {ga_instance.current_generation} completed")

# Initialize GA with callbacks and custom logging
ga = M_E_GA_Base(
    genes=['X', 'Y', 'Z'],
    fitness_function=my_fitness_function,
    before_fitness_evaluation=before_evaluation,
    after_population_selection=after_selection,
    logging=True,
    generation_logging=True,
    mutation_logging=True,
    experiment_name="CustomCallbacksExample"
)

ga.run_algorithm()
ga.save_logs(ga.logs, "CustomCallbacksExample_logs.json")

Example 3: Adjusting Mutation Rates

# Initialize GA with higher mutation probability
ga = M_E_GA_Base(
    genes=['G1', 'G2', 'G3'],
    fitness_function=my_fitness_function,
    mutation_prob=0.05,  # Increased mutation rate
    crossover_prob=0.7,
    population_size=100,
    max_generations=50,
    logging=True,
    experiment_name="HighMutationRateExample"
)

ga.run_algorithm()

Conclusion

The M_E_GA_Base class provides a powerful and flexible foundation for implementing genetic algorithms with mutable encodings and nested gene structures. Its configurable parameters, advanced mutation and crossover mechanisms, and detailed logging capabilities make it well-suited for complex optimization problems requiring adaptable genetic representations.

By leveraging the encoding and delimiter features, users can model hierarchical and nested genetic information, enabling the exploration of more intricate solution spaces. The inclusion of callbacks and extensive logging further enhances the ability to customize and monitor the genetic algorithm’s behavior, facilitating deeper insights into the evolutionary process.