Hidden Image Identification Using Convolutional Neural Networks (CNN)

Photo by Mitya Ivanov / Unsplash

Introduction

This case study describes a project to develop a solution for hidden image identification. The initiative grew out of our interest in contributing to digital forensics services that help organizations identify content creators, detect unauthorized information leaks, and aid law enforcement in tracing criminal organizations via hidden image tracing.

The project relied on machine learning, specifically Convolutional Neural Networks (CNNs), a type of deep learning model well suited to processing images.

Project Objective

The primary objectives of the project were:

  1. Develop a system to identify hidden watermarks in images, used for tracking the origin of the content.
  2. Use CNNs to distinguish between images with hidden content and those without.
  3. Develop an image tracing system that can help trace back images to their source, supporting anti-piracy measures and criminal investigations.

Technical Approach

New Collar's team approached this project by first creating a watermarking system to embed QR codes into images. A unique QR code was discreetly embedded into each image, giving every image its own identifier. An excerpt of the Python-based watermarking system follows, with omitted steps marked by "...":

import uuid

import qrcode
from PIL import Image


def apply_watermark(image_path, output_image_path, transparency, color_factor=0.5):
    ...
    # Generate unique identifier
    unique_id = str(uuid.uuid4())
    ...
    # Generate QR code
    qr = qrcode.QRCode(
        version=1,
        error_correction=qrcode.constants.ERROR_CORRECT_H,
        box_size=20,
        border=4,
    )
    qr.add_data(unique_id)
    qr.make(fit=True)
    watermark = qr.make_image(fill_color='black', back_color='white').convert("RGBA")
    ...
    # Save the QR code for debugging
    watermark.save('debug_qr_code.png')
    ...
    # Open the image
    image = Image.open(image_path).convert("RGBA")
    ...
    # Paste the colored watermark onto the image, taking only x and y dimensions from position
    image.paste(watermark_colored, min_gradient_position, watermark_mask)
    ...
    # Save watermarked image
    image.save(output_image_path)
...
# transparency=255 lowers the transparency (presumably full opacity on a 0-255 alpha scale), making the QR code more visible
apply_watermark('woman.PNG', 'qr_masked.PNG', transparency=255)

To place the watermark in the area of the image with the least detail, the team computed the image gradient and selected the region with the smallest average gradient. The watermark's color was chosen to contrast with the average color of this region, keeping the code readable while remaining subtle. This watermarking code was used to generate a diverse dataset of watermarked images for the CNN model.
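The region-selection step is elided in the excerpt above, so the following is only a minimal sketch of how it could work, not the team's actual code. It assumes a grayscale image held in a NumPy array, and the names `find_smoothest_region` and `contrasting_color`, the `stride` parameter, and the black-or-white color choice are all hypothetical.

```python
import numpy as np


def find_smoothest_region(gray, box_h, box_w, stride=8):
    """Return (x, y) of the top-left corner of the box with the lowest mean gradient.

    Hypothetical helper: `gray` is a 2-D array of pixel intensities.
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)

    # Summed-area table with a zero row/column in front, so any box sum
    # can be read with four lookups.
    integral = np.pad(magnitude, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

    height, width = gray.shape
    best_score, best_xy = np.inf, (0, 0)
    for y in range(0, height - box_h + 1, stride):
        for x in range(0, width - box_w + 1, stride):
            total = (integral[y + box_h, x + box_w] - integral[y, x + box_w]
                     - integral[y + box_h, x] + integral[y, x])
            score = total / (box_h * box_w)
            if score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy


def contrasting_color(region_rgb):
    """Pick black or white, whichever contrasts with the region's average color.

    Assumption: a simple luminance threshold; the original used a
    `color_factor` parameter whose exact role is not shown.
    """
    mean = region_rgb.reshape(-1, 3).mean(axis=0)
    luminance = 0.299 * mean[0] + 0.587 * mean[1] + 0.114 * mean[2]
    return (0, 0, 0) if luminance > 127.5 else (255, 255, 255)
```

The returned `(x, y)` pair has the same form as the position argument that `image.paste` expects, which is why the sketch reports the top-left corner rather than the box center.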

The second phase was training a CNN model to identify images containing these hidden watermarks. CNNs, with their multi-layered architecture, are well suited to image classification tasks. The team used TensorFlow and Keras to build and train the model, which combined multiple convolutional layers to extract image features with fully connected layers to interpret those features and classify the images.
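The exact architecture is not given here, so the following Keras model is only an illustrative sketch of the kind of network described: a stack of convolutional layers followed by fully connected layers ending in a single sigmoid output. The function name `build_watermark_classifier`, the 128x128 input size, the layer widths, and the dropout rate are all assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_watermark_classifier(input_shape=(128, 128, 3)):
    """Build a small binary CNN classifier (watermarked vs. not watermarked).

    Hypothetical architecture: every size below is an illustrative guess.
    """
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),              # map pixel values to [0, 1]
        layers.Conv2D(32, 3, activation="relu"),  # low-level edges and textures
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),  # mid-level patterns
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),      # interpret extracted features
        layers.Dropout(0.5),                      # regularization
        layers.Dense(1, activation="sigmoid"),    # probability of a watermark
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

A single sigmoid unit with binary cross-entropy loss is a common choice for a two-class problem like this one; a two-unit softmax output would work equally well.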

The dataset comprised thousands of images, half containing hidden watermarks and half without, creating a balanced binary classification problem. The dataset was split into training and testing sets so that the model's performance could be evaluated on unseen images and overfitting could be detected.
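One way to preserve the 50/50 class balance in both sets is a stratified split, sketched below using only the standard library. The function name `stratified_split`, the 20% test fraction, and the `(path, label)` sample format are assumptions made for illustration; the split ratio actually used is not stated.

```python
import random


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (path, label) pairs into train and test lists, keeping class balance.

    Hypothetical helper: each class is shuffled and split separately.
    """
    rng = random.Random(seed)

    # Group samples by label so each class is split on its own.
    by_label = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)

    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])

    # Shuffle again so the classes are interleaved within each set.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Fixing the seed keeps the split reproducible, so the same images stay in the test set from one training run to the next.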

Results

The CNN model performed exceptionally well on the testing set, achieving a high accuracy rate in detecting watermarked images. The model was also proficient at identifying the watermark's position, which allowed the original content creator or source to be traced efficiently. This result was a breakthrough for New Collar's effort to aid anti-piracy and law enforcement partners.