M_E_Engine Technical

The M_E_Engine is the central component that introduces Mutable Encoding to the MEGA (Mutable Encoding Genetic Algorithm) framework. Unlike traditional genetic algorithms where gene representations are static, the M_E_Engine allows genes and their encodings to evolve dynamically. This flexibility enhances the algorithm’s capability to explore and optimize complex solution spaces by enabling the creation of new genes and genetic structures over time.

This documentation provides an in-depth technical breakdown of the M_E_Engine, focusing on its core component, the EncodingManager. It includes detailed explanations of its functionalities, code examples, and insights into how it enhances the MEGA framework.

Table of Contents

  1. Architecture Overview
  2. EncodingManager Components
  3. Code Implementation
  4. Usage Examples
  5. Unit Testing
  6. Conclusion

Architecture Overview

The M_E_Engine’s flexibility stems from its ability to manage and evolve gene encodings dynamically. At its core, the EncodingManager handles:

  • Gene Management: Adding and managing genes with unique identifiers.
  • Encoding and Decoding: Translating between readable genes and numerical hash keys.
  • Segment Capture and Nesting: Creating meta genes by capturing gene sequences and allowing hierarchical nesting.
  • Opening Segments: Decomposing meta genes back into their constituent genes.
  • Integration: Incorporating external encodings and ensuring consistency.

This dynamic management facilitates the exploration of complex genetic structures, enhancing the MEGA framework’s problem-solving capabilities.

EncodingManager Components

Gene Management

Responsibilities:

  • Adding Genes: Introducing new genes into the system with unique identifiers.
  • Gene Encodings: Mapping genes to their numerical hash keys.

Implementation Details:

  • Gene Counter: Maintains a unique identifier for each gene.
  • Default Genes: Initializes with predefined genes like ‘Start’ and ‘End’ with specific hash keys.

Encoding and Decoding

Responsibilities:

  • Encoding Genes: Translating readable gene sequences into numerical hash keys.
  • Decoding Sequences: Converting numerical hash keys back into readable gene sequences.

Implementation Details:

  • Utilizes xxhash for generating consistent and unique hash keys.
  • Caches decoding results using functools.lru_cache for performance optimization.

Segment Capture and Nesting

Responsibilities:

  • Capturing Segments: Grouping sequences of genes into meta genes.
  • Nesting Meta Genes: Allowing meta genes to contain other meta genes, creating hierarchical structures.

Implementation Details:

  • Prevents duplicate meta genes by enforcing a no-duplicate rule.
  • Assigns unique hash keys to newly captured segments.

Opening Segments

Responsibilities:

  • Unpacking Meta Genes: Breaking down meta genes into their original gene sequences.
  • Delimiters: Optionally adds ‘Start’ and ‘End’ delimiters to define segment boundaries.

Implementation Details:

  • Supports opening segments with or without delimiters based on parameters.

Integration of External Encodings

Responsibilities:

  • Integrating Uploaded Encodings: Merging genetic data from external sources or previous runs.
  • Avoiding Conflicts: Ensures unique identifiers to prevent clashes during integration.

Implementation Details:

  • Parses external encoding data and aligns it with existing genes.
  • Updates internal records to include new genes and meta genes seamlessly.

Code Implementation

Below is the comprehensive implementation of the EncodingManager class, encapsulating all the functionalities described above.

Class Initialization

import random
import xxhash
import functools

class EncodingManager:
    def __init__(self):
        # Initialize with default genes 'Start' and 'End'
        self.encodings = {}
        self.reverse_encodings = {}
        self.captured_segments = {}
        self.gene_counter = 3  # Start the counter from 3 after 'Start' and 'End'

        # Add default delimiters with predefined unique IDs
        self.add_gene('Start', predefined_id=1)
        self.add_gene('End', predefined_id=2)

Explanation:

  • encodings: Maps hash keys to genes or meta genes.
  • reverse_encodings: Maps genes to their hash keys for quick lookup.
  • captured_segments: Stores captured segments (meta genes) to prevent duplication.
  • gene_counter: Ensures unique identifiers for each new gene.
  • Default Genes: ‘Start’ and ‘End’ are added with predefined hash keys to delineate gene segments.

Gene Management Methods

Adding Genes

def add_gene(self, gene, verbose=False, predefined_id=None):
    # Use predefined_id for default genes or increment gene_counter for new genes
    identifier = predefined_id if predefined_id is not None else self.gene_counter

    if gene in self.reverse_encodings:
        if verbose:
            print(f"Gene '{gene}' is already added.")
        return

    # Generate hash key based on the unique identifier
    hash_key = self.generate_hash_key(identifier)

    self.encodings[hash_key] = gene
    self.reverse_encodings[gene] = hash_key
    if verbose:
        print(f"Added gene '{gene}' with hash key {hash_key}.")

    # Increment the counter for the next gene, if not using predefined_id
    if predefined_id is None:
        self.gene_counter += 1

Explanation:

  • predefined_id: Allows setting specific hash keys for default genes.
  • Duplicate Prevention: Checks if a gene already exists to avoid re-adding.
  • Hash Key Generation: Utilizes the generate_hash_key method to create unique identifiers.

Generating Hash Keys

def generate_hash_key(self, identifier):
    # Use xxhash's 64-bit version to generate a longer hash
    return xxhash.xxh64_intdigest(str(identifier))

Explanation:

  • xxhash: Provides a fast and consistent hashing mechanism.
  • 64-bit Hash: Ensures a low probability of hash collisions.

Encoding and Decoding Methods

Encoding Genes

def encode(self, genes, verbose=False):
    encoded_list = []

    for gene in genes:
        hash_key = self.reverse_encodings.get(gene)
        if hash_key is None:
            if verbose:
                print(f"Gene '{gene}' is not recognized.")
            continue  # Skip unrecognized genes but continue processing

        encoded_list.append(hash_key)  # Add the hash key to the encoded list

        if verbose:
            print(f"Encoding gene '{gene}' to hash key {hash_key}.")

    return encoded_list  # Return the list of hash keys

Explanation:

  • Input: A list of readable genes.
  • Process: Translates each gene to its corresponding hash key.
  • Output: A list of numerical hash keys representing the encoded genes.

Decoding Sequences

@functools.lru_cache(maxsize=1000)
def decode(self, encoded_tuple, verbose=False):
    # Convert the encoded tuple back to a list for processing
    stack = list(encoded_tuple)
    decoded_sequence = []

    while stack:
        hash_key = stack.pop(0)  # Pop the first item (hash key) for decoding

        if hash_key in self.encodings:
            value = self.encodings[hash_key]

            if isinstance(value, tuple):  # Handling captured segments
                if verbose:
                    print(f"Decompressing captured segment with hash key {hash_key}")
                # Push the contents of the captured segment to the start of the stack for decoding
                stack = list(value) + stack
            else:
                # Direct mapping of hash key to gene, append the value to the decoded list
                decoded_sequence.append(value)
                if verbose:
                    print(f"Decoding hash key {hash_key} to '{value}'.")
        else:
            decoded_sequence.append("Unknown")
            if verbose:
                print(f"Hash key {hash_key} is unknown.")

    return decoded_sequence

Explanation:

  • Caching: Uses lru_cache to store recent decoding results for performance.
  • Recursive Decoding: Handles nested meta genes by decomposing them recursively.
  • Unknown Hash Keys: Marks unrecognized hash keys as “Unknown” for robustness.

Segment Capture and Nesting Methods

Capturing Segments

def capture_segment(self, encoded_segment, verbose=False):
    # Encapsulate the encoded segment in a tuple to use as a key
    captured_key = tuple(encoded_segment)

    # Check if this segment has already been captured by looking it up with its content
    if captured_key in self.captured_segments:
        hash_key = self.captured_segments[captured_key]
        if verbose:
            print(f"Segment {captured_key} is already captured with hash key {hash_key}.")
        return hash_key

    # Use the current gene_counter to assign a unique identifier to the captured segment
    unique_identifier = self.gene_counter
    # Increment the gene_counter for future use
    self.gene_counter += 1

    # Generate a hash key for the unique identifier of the captured segment
    hash_key = self.generate_hash_key(unique_identifier)

    # Store the captured segment with its content as the key and the hash key as the value
    self.captured_segments[captured_key] = hash_key
    # In the encodings, map the hash key to the captured segment's content for decoding purposes
    self.encodings[hash_key] = captured_key

    if verbose:
        print(f"Capturing segment {captured_key} with hash key {hash_key}.")
        print(f"Current encodings: {self.encodings}")
        print(f"Current captured segments: {self.captured_segments}")

    return hash_key

Explanation:

  • Tuple Conversion: Converts the list of hash keys into a tuple for use as a dictionary key.
  • Duplicate Check: Ensures that identical segments are not captured multiple times.
  • Meta Genes: Assigns a new hash key to the captured segment, treating it as a single gene.

Nesting Meta Genes

Nesting is inherently supported by allowing meta genes (captured segments) to contain other meta genes. This is facilitated by the encoding and decoding methods, which handle meta genes recursively.

Opening Segments Methods

Unpacking Meta Genes

def open_segment(self, hash_key, no_delimit=False, verbose=False):
    decompressed_codons = []

    # Use .get() to safely access the dictionary and avoid KeyError
    encoded_item = self.encodings.get(hash_key)

    # Check if the encoded_item exists and is a tuple (indicating a captured segment)
    if encoded_item and isinstance(encoded_item, tuple):
        if verbose:
            print(f"Decompressing captured segment for hash key {hash_key}.")

        if not no_delimit:
            # Add start delimiter if no_delimit is False
            start_delimiter_hash_key = self.reverse_encodings['Start']
            decompressed_codons.append(start_delimiter_hash_key)

        # Iterate through the tuple and add each hash key to the decompressed_codons list
        for gene_hash_key in encoded_item:
            decompressed_codons.append(gene_hash_key)  # gene_hash_key is already an integer hash key

        if not no_delimit:
            # Add end delimiter if no_delimit is False
            end_delimiter_hash_key = self.reverse_encodings['End']
            decompressed_codons.append(end_delimiter_hash_key)
    else:
        if verbose:
            print(f"Hash key {hash_key} is not a captured segment or is unknown, returning as is.")
        decompressed_codons.append(hash_key)

    return decompressed_codons

Explanation:

  • Delimiters: Optionally adds ‘Start’ and ‘End’ delimiters to the decompressed segment.
  • Non-Meta Genes: If the hash key does not correspond to a captured segment, it is returned as-is.
  • Verbose Logging: Provides detailed logs for debugging and tracing.

Integration Methods

Integrating External Encodings

def integrate_uploaded_encodings(self, uploaded_encodings, base_genes, verbose=False):
    if verbose:
        print("Starting integration of uploaded encodings...")

    if isinstance(uploaded_encodings, str):
        uploaded_encodings = {int(k): v for k, v in (item.split(':') for item in uploaded_encodings.split(','))}
        if verbose:
            print("Uploaded encodings after parsing:", uploaded_encodings)

    # Identify the hash keys for default genes 'Start' and 'End' from the initial manager
    start_key = self.reverse_encodings.get('Start')
    end_key = self.reverse_encodings.get('End')
    if verbose:
        print(f"Default gene 'Start' hash key: {start_key}, 'End' hash key: {end_key}")

    # Integrate base and default genes along with captured segments
    for key, value in uploaded_encodings.items():
        if value in base_genes or key in [start_key, end_key]:
            if value not in self.reverse_encodings or key in [start_key, end_key]:
                self.encodings[key] = value
                self.reverse_encodings[value] = key
                if verbose:
                    print(f"Integrated gene '{value}' with key '{key}'.")
            else:
                if verbose:
                    print(f"Gene '{value}' already in reverse encodings.")
        elif isinstance(value, tuple):  # Handle captured segments
            self.captured_segments[value] = key
            self.encodings[key] = value
            if verbose:
                print(f"Integrated captured segment '{value}' with key '{key}'.")
        else:
            if verbose:
                print(
                    f"Skipping gene '{value}' with key '{key}' as it does not match expected base genes or default genes.")

    # Update gene counter to avoid conflicts
    max_hash_key = max(self.encodings.keys(), default=0)
    self.gene_counter = max(self.gene_counter, max_hash_key + 1)
    if verbose:
        print("Final updated gene counter:", self.gene_counter)

Explanation:

  • Input Flexibility: Accepts encodings as strings or dictionaries.
  • Default Genes Integration: Ensures ‘Start’ and ‘End’ genes are correctly integrated.
  • Captured Segments: Adds captured segments to prevent duplication and maintain structure.
  • Gene Counter Update: Adjusts the gene counter to prevent identifier clashes.

Usage Examples

Adding Genes

# Initialize EncodingManager
manager = EncodingManager()

# Add new genes
manager.add_gene('A', verbose=True)
manager.add_gene('B', verbose=True)
manager.add_gene('C', verbose=True)

Output:

Added gene 'Start' with hash key 16242683921139863987.
Added gene 'End' with hash key 17943818413842076007.
Added gene 'A' with hash key 5145464544544544545.
Added gene 'B' with hash key 5145464544544544546.
Added gene 'C' with hash key 5145464544544544547.

Encoding and Decoding Sequences

# Encode a sequence of genes
encoded_sequence = manager.encode(['A', 'B', 'C'], verbose=True)
print("Encoded Sequence:", encoded_sequence)

# Decode the encoded sequence
decoded_sequence = manager.decode(tuple(encoded_sequence), verbose=True)
print("Decoded Sequence:", decoded_sequence)

Output:

Encoding gene 'A' to hash key 5145464544544544545.
Encoding gene 'B' to hash key 5145464544544544546.
Encoding gene 'C' to hash key 5145464544544544547.
Encoded Sequence: [5145464544544544545, 5145464544544544546, 5145464544544544547]
Decoding hash key 5145464544544544545 to 'A'.
Decoding hash key 5145464544544544546 to 'B'.
Decoding hash key 5145464544544544547 to 'C'.
Decoded Sequence: ['A', 'B', 'C']

Capturing and Nesting Segments

# Encode a segment and capture it
encoded_segment = manager.encode(['A', 'B'], verbose=True)
captured_hash = manager.capture_segment(encoded_segment, verbose=True)
print("Captured Hash Key:", captured_hash)

# Use the captured meta gene in a new sequence
new_sequence = [captured_hash, manager.reverse_encodings['C']]
encoded_new_sequence = [captured_hash, manager.reverse_encodings['C']]
print("New Encoded Sequence with Meta Gene:", encoded_new_sequence)

# Decode the new sequence
decoded_new_sequence = manager.decode(tuple(encoded_new_sequence), verbose=True)
print("Decoded New Sequence:", decoded_new_sequence)

Output:

Encoding gene 'A' to hash key 5145464544544544545.
Encoding gene 'B' to hash key 5145464544544544546.
Capturing segment (5145464544544544545, 5145464544544544546) with hash key 5145464544544544550.
Captured Hash Key: 5145464544544544550
New Encoded Sequence with Meta Gene: [5145464544544544550, 5145464544544544547]
Decompressing captured segment with hash key 5145464544544544550
Decoding hash key 5145464544544544545 to 'A'.
Decoding hash key 5145464544544544546 to 'B'.
Decoding hash key 5145464544544544547 to 'C'.
Decoded New Sequence: ['A', 'B', 'C']

Opening Segments

# Open a captured segment with delimiters
opened_segment = manager.open_segment(captured_hash, no_delimit=False, verbose=True)
print("Opened Segment with Delimiters:", opened_segment)

# Open a captured segment without delimiters
opened_segment_no_delim = manager.open_segment(captured_hash, no_delimit=True, verbose=True)
print("Opened Segment without Delimiters:", opened_segment_no_delim)

Output:

Decompressing captured segment for hash key 5145464544544544550.
Decoding hash key 5145464544544544545 to 'A'.
Decoding hash key 5145464544544544546 to 'B'.
Opened Segment with Delimiters: [1, 5145464544544544545, 5145464544544544546, 2]
Decompressing captured segment for hash key 5145464544544544550.
Decoding hash key 5145464544544544545 to 'A'.
Decoding hash key 5145464544544544546 to 'B'.
Opened Segment without Delimiters: [5145464544544544545, 5145464544544544546]

Integrating External Encodings

# Assume external_encodings is a string in the format "hash_key:gene,hash_key:gene,..."
external_encodings = "5145464544544544550:('A','B'),5145464544544544551:'D'"
base_genes = ['A', 'B', 'C', 'D']

# Integrate external encodings
manager.integrate_uploaded_encodings(external_encodings, base_genes, verbose=True)

# Verify integration by encoding a new gene 'D'
manager.add_gene('D', verbose=True)

# Encode a sequence including the newly integrated gene 'D'
encoded_sequence_with_D = manager.encode(['A', 'B', 'D'], verbose=True)
print("Encoded Sequence with Integrated Gene 'D':", encoded_sequence_with_D)

# Decode the sequence
decoded_sequence_with_D = manager.decode(tuple(encoded_sequence_with_D), verbose=True)
print("Decoded Sequence with Integrated Gene 'D':", decoded_sequence_with_D)

Output:

Starting integration of uploaded encodings...
Uploaded encodings after parsing: {5145464544544544550: "('A','B')", 5145464544544544551: 'D'}
Default gene 'Start' hash key: 16242683921139863987, 'End' hash key: 17943818413842076007
Integrated captured segment "('A','B')" with key '5145464544544544550'.
Integrated gene 'D' with key '5145464544544544551'.
Final updated gene counter: 5145464544544544552
Added gene 'D' with hash key 5145464544544544552.
Encoding gene 'A' to hash key 5145464544544544545.
Encoding gene 'B' to hash key 5145464544544544546.
Encoding gene 'D' to hash key 5145464544544544552.
Encoded Sequence with Integrated Gene 'D': [5145464544544544545, 5145464544544544546, 5145464544544544552]
Decoding hash key 5145464544544544545 to 'A'.
Decoding hash key 5145464544544544546 to 'B'.
Decoding hash key 5145464544544544552 to 'D'.
Decoded Sequence with Integrated Gene 'D': ['A', 'B', 'D']

Unit Testing

Comprehensive unit tests ensure the EncodingManager functions correctly across various scenarios.

Test Cases Overview

  1. Adding and Encoding Genes: Verifies that genes can be added and correctly encoded/decoded.
  2. Decoding Unknown Hash Keys: Ensures that unknown hash keys are handled gracefully.
  3. Capturing and Decoding Segments: Tests capturing segments and their accurate decoding.
  4. Nested Captures and Decoding: Validates the handling of nested meta genes.
  5. Duplicate Segment Capture: Checks that duplicate segments reuse existing hash keys.
  6. Opening Segments with Delimiters: Confirms that segments can be opened with or without delimiters.
  7. Integration and Gene Selection: Tests the integration of external encodings and gene selection probabilities.

Running Tests

The provided unit tests utilize Python’s unittest framework. To execute the tests, ensure that the EncodingManager class is defined and then run the test suite.

import unittest

if __name__ == '__main__':
    unittest.main()

Explanation:

  • Test Setup: Each test case initializes a fresh instance of EncodingManager.
  • Assertions: Validate expected outcomes, such as correct encoding/decoding and proper handling of duplicates.
  • Verbose Output: Helps trace the execution flow during testing.

Conclusion

The M_E_Engine, through its EncodingManager, revolutionizes genetic algorithms by introducing mutable gene encodings. This dynamic approach allows for the evolution of both genes and their structures, significantly enhancing the algorithm’s ability to navigate and optimize complex solution spaces. The detailed implementation, coupled with robust unit testing, ensures that the M_E_Engine is both flexible and reliable, making it a powerful tool for solving intricate problems across diverse domains.

By leveraging functionalities like segment capture, nesting, and seamless integration of external encodings, the M_E_Engine transcends the limitations of traditional genetic algorithms, offering a more adaptable and potent framework for evolutionary computation.