Quickly Find Out What’s Being Processed in Your Migration Pipeline

Buffalo migration is messy, and so is debugging your migration code.

Drupal 8 has a wonderful migration system that generally boils each migration into a particular entity type down to a single YAML file. This file specifies the data source, the destination entity type, how to process the input fields, and where in the entity they get stored.

However, there can be a number of complications. Two in particular are bad or unexpected input data values, and not knowing exactly what form the input data (or even the partially processed data) is in: is it a scalar, an array, a nested array, and so on? Because the migration system doesn’t provide a built-in way to look inside the process pipeline, these kinds of issues can be difficult and frustrating to debug.

One way to quickly see what’s going on, without firing up xdebug and groping around in the code for the various plugins provided by the migrate, migrate_drupal, migrate_plus, and migrate_tools modules, is to create a debug plugin and place it wherever you need it in the process pipeline.

A custom migration module (here named migration) is usually structured like this:

migration/
├── config
│   └── install
│       ...
│       ├── migrate_plus.migration.node_article.yml
│       ...
├── migration.info.yml
├── README.txt
└── src
    └── Plugin
        └── migrate
            ├── destination
            ├── process
            │   ...
            │   ├── Debug.php
            │   ...
            └── source

As an example, let’s look at the node_article migration. Its YAML file could look something like this:

id: node_article
label: Node - Article
migration_group: migration
migration_tags:
  - Drupal 6
source:
  plugin: node
  node_type: blog
  constants:
    type: article
process:
  nid: tnid
  type: constants/type
  langcode:
    plugin: default_value
    source: language
    default_value: "und"
  title: title
  uid: node_uid
  status: status
  created: created
  changed: changed
  promote: promote
  sticky: sticky
  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text'      # Filtered HTML "no links"
        3: 'html_text'          # HTML
        5: 'plain_text'         # PHP code
        6: 'filtered_text'      # Filtered HTML
        7: 'plain_text'         # Email Filter
    -
      plugin: default_value
      default_value: 'html_text'
  'body/value': body
  'body/summary': teaser
  revision_uid: revision_uid
  revision_log: log
  revision_timestamp: timestamp
destination:
  plugin: entity:node

But say it turns out that body/format is not being translated correctly when the node_article migration runs. How can you debug this to tell whether the YAML structure is wrong or the data values are?

Let’s have a look at migration/src/Plugin/migrate/process/Debug.php:

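<?php

namespace Drupal\migration\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Prints the value passing through a process pipeline, for debugging.
 *
 * Optional configuration keys:
 * - message: A label printed before the dump.
 * - row: If truthy, also dump the whole Row object.
 *
 * @MigrateProcessPlugin(
 *   id = "debug"
 * )
 */
class Debug extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // 'message' is optional; fall back to the destination property name
    // (e.g. 'body/format') so an unlabelled step doesn't raise a notice.
    $message = isset($this->configuration['message']) ? $this->configuration['message'] : $destination_property;
    echo "DEBUG: " . $message . "\n";
    print_r(['value' => $value]);
    // !empty() already checks both "is set" and "is truthy".
    if (!empty($this->configuration['row'])) {
      print_r(['row' => $row]);
    }
    // Return the value untouched so the rest of the pipeline is unaffected.
    return $value;
  }

}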

Debug.php provides a process plugin that can be placed “invisibly” into a process pipeline to report what is happening inside it. The transform() method of every process plugin receives, among other things, the $value being processed and the $row being migrated. It returns the result of whatever processing the plugin does, and that return value becomes the input to the next plugin in the pipeline.
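To make that flow concrete, here is a standalone PHP model (it runs without Drupal) of a pipeline like the body/format one. Each step’s return value becomes the next step’s input, so a step that returns its $value untouched can be dropped in anywhere without changing the outcome. The run_pipeline() and debug_step() names and the simplified stand-ins for static_map and default_value are illustrative assumptions, not Drupal’s actual MigrateExecutable code.

```php
<?php
// Simplified, standalone model of a migrate process pipeline. This is NOT
// Drupal's MigrateExecutable; it only mimics the idea that each step receives
// the current value (plus the source row) and returns the value handed on to
// the next step.
function run_pipeline($value, array $row, array $steps) {
  foreach ($steps as $step) {
    $value = $step($value, $row);
  }
  return $value;
}

// Hypothetical debug step: prints a label and the value it sees, then returns
// the value untouched, so it does not change the pipeline's final result.
function debug_step($message) {
  return function ($value, array $row) use ($message) {
    echo "DEBUG: " . $message . "\n";
    print_r(['value' => $value]);
    return $value;
  };
}

$map = [2 => 'filtered_text', 3 => 'html_text', 5 => 'plain_text', 6 => 'filtered_text', 7 => 'plain_text'];

$steps = [
  // Rough stand-in for static_map. The real plugin handles unmapped keys
  // differently; in this model they simply become NULL.
  function ($value, array $row) use ($map) {
    return isset($map[$value]) ? $map[$value] : NULL;
  },
  debug_step('body/format after static_map'),
  // Rough stand-in for default_value: replaces an empty value.
  function ($value, array $row) {
    return empty($value) ? 'html_text' : $value;
  },
];

// A row whose format value (1) is not in the map.
$row = ['format' => 1];
echo "Result: " . run_pipeline($row['format'], $row, $steps) . "\n";
```

In this model the dump between the two steps shows an empty value, which points at the mapping step rather than the default as the place where the format was lost.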

ProcessPluginBase also holds a $configuration array, where the named parameters that appear in the process pipeline YAML are stored. For example, the static_map plugin reads its lookup table from $this->configuration['map'].
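As a rough illustration, the sketch below writes the static_map step from the body/format pipeline out as a PHP array, approximately what a plugin receives as its configuration, and a toy plugin that reads its map from $this->configuration. SketchProcessPlugin and SketchStaticMap are hypothetical stand-ins for ProcessPluginBase and the real static_map plugin, kept free of Drupal so the snippet runs on its own.

```php
<?php
// Standalone sketch (no Drupal): named YAML parameters end up as keys in a
// plugin's $configuration array. Both class names here are hypothetical.
abstract class SketchProcessPlugin {

  protected $configuration;

  public function __construct(array $configuration) {
    // Every key from the YAML step (plugin, source, map, ...) is kept here.
    $this->configuration = $configuration;
  }

  abstract public function transform($value);

}

class SketchStaticMap extends SketchProcessPlugin {

  public function transform($value) {
    // Look the incoming value up in the 'map' parameter from the YAML.
    $map = $this->configuration['map'];
    return array_key_exists($value, $map) ? $map[$value] : NULL;
  }

}

// The static_map step from the body/format pipeline, written as a PHP array.
$configuration = [
  'plugin' => 'static_map',
  'source' => 'format',
  'map' => [2 => 'filtered_text', 3 => 'html_text', 5 => 'plain_text', 6 => 'filtered_text', 7 => 'plain_text'],
];

$plugin = new SketchStaticMap($configuration);
var_dump($plugin->transform(3));  // string(9) "html_text"
var_dump($plugin->transform(1));  // NULL
```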

In this case, the debug plugin returns exactly the value it receives, so it doesn’t change anything. At a minimum, it prints the message parameter and dumps $value, showing what data is actually being passed and what form it is in. It is often useful to see the $row values as well; to dump them, add the parameter row: 1.
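One caveat about print_r(): it prints NULL, FALSE and an empty string identically (as nothing), so the dump alone cannot always tell you what form a value is in. A small helper like the hypothetical describe_shape() below (not part of Drupal) could be echoed next to the dump in transform(); switching to var_dump(), which prints types, is another option.

```php
<?php
// Hypothetical helper: a short description of a value's shape, intended to be
// printed alongside the print_r() dump.
function describe_shape($value) {
  if ($value === NULL) {
    return 'NULL';
  }
  if (is_scalar($value)) {
    // One of "integer", "double", "string" or "boolean".
    return gettype($value);
  }
  if (is_array($value)) {
    foreach ($value as $item) {
      if (is_array($item)) {
        return 'nested array';
      }
    }
    return 'array';
  }
  return is_object($value) ? 'object (' . get_class($value) . ')' : gettype($value);
}

echo describe_shape('html_text'), "\n";         // string
echo describe_shape(3), "\n";                   // integer
echo describe_shape(['a', 'b']), "\n";          // array
echo describe_shape([['value' => 'x']]), "\n";  // nested array
echo describe_shape(NULL), "\n";                // NULL
```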

For example, to see the result of the body/format mapping before it is passed along to the default_value plugin:

  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text'      # Filtered HTML "no links"
        3: 'html_text'          # HTML
        5: 'plain_text'         # PHP code
        6: 'filtered_text'      # Filtered HTML
        7: 'plain_text'         # Email Filter
    -
      plugin: debug
      message: 'body/format after static_map'
    -
      plugin: default_value
      default_value: 'html_text'

To also see what the input row values look like:

  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text'      # Filtered HTML "no links"
        3: 'html_text'          # HTML
        5: 'plain_text'         # PHP code
        6: 'filtered_text'      # Filtered HTML
        7: 'plain_text'         # Email Filter
    -
      plugin: debug
      message: 'body/format after static_map'
      row: 1
    -
      plugin: default_value
      default_value: 'html_text'

This produces output for every row processed, so it can get quite large. If you know which rows are causing problems, run the migration with the --idlist="<source keys>" option so that only those rows are processed. If not, redirect the output into a file and use an editor like vim to hunt through it for the problematic cases.

This idea can be extended: if there is some pattern to the problem, the plugin can test for it so that output is produced only for rows that trigger the issue. It’s also possible to create more than one debug plugin (each with its own plugin id, of course) for multiple special-purpose needs.
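A minimal sketch of that idea, assuming the interesting rows are those with an empty value or a known source ID. The check is written as plain PHP so it runs outside Drupal. ConditionalDebugSketch, the watch_ids option and the nid source key are all hypothetical; in a real module the same logic would sit in the transform() method of a second plugin modeled on Debug.php, registered under its own id.

```php
<?php
// Standalone sketch of a conditional debug step. The class name and the
// 'watch_ids' option are hypothetical, not part of Drupal or any contrib module.
class ConditionalDebugSketch {

  private $configuration;

  public function __construct(array $configuration) {
    $this->configuration = $configuration;
  }

  // $source stands in for the row's source values; inside Drupal these would
  // come from the Row object (for example $row->getSource()).
  public function transform($value, array $source) {
    if ($this->shouldReport($value, $source)) {
      $message = isset($this->configuration['message']) ? $this->configuration['message'] : '';
      echo "DEBUG: " . $message . "\n";
      print_r(['value' => $value, 'source' => $source]);
    }
    // Always hand the value on unchanged, exactly like the debug plugin.
    return $value;
  }

  private function shouldReport($value, array $source) {
    // Report anything PHP's empty() treats as empty (NULL, '', '0', 0,
    // FALSE, an empty array)...
    if (empty($value)) {
      return TRUE;
    }
    // ...plus any row whose (assumed) 'nid' source key is on the watch list.
    $watch = isset($this->configuration['watch_ids']) ? $this->configuration['watch_ids'] : [];
    return isset($source['nid']) && in_array($source['nid'], $watch);
  }

}

$debug = new ConditionalDebugSketch([
  'message' => 'body/format after static_map',
  'watch_ids' => [42],
]);
$debug->transform('html_text', ['nid' => 7, 'format' => 3]);    // silent
$debug->transform(NULL, ['nid' => 8, 'format' => 1]);           // dumped: empty value
$debug->transform('plain_text', ['nid' => 42, 'format' => 5]);  // dumped: watched nid
```

Only the second and third calls print anything, which keeps the output small enough to read directly.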