Importing Drupal Content with YAML

Mar 21 '14

When launching a new Drupal website, it is invaluable to have a means by which to import the new site’s content. This way it will be fully populated at launch, everyone will be happy, and we can all sit back and relax after all our hard work. At Third and Grove, whenever we do a release, or have to migrate large swaths of content for a release, we do it using a script.

Usually, clients have a surprisingly large repository of content that they would like to be on the site, but most of it is not in a format that can be put directly into Drupal (an odd assortment of PDF, .txt, and .doc/.docx files). A client will also almost always want to migrate articles from their existing website, which may be on an entirely different platform.

The first job in a content import, therefore, is to collect all the myriad articles, posts, pages, and documents and put them into a common markup format. Several options could work, with XML and JSON perhaps being the most common. We’ve found, however, that our favorite is YAML.

YAML originally stood for “Yet Another Markup Language”; its official expansion today is “YAML Ain’t Markup Language.” The lightheartedness of its name is indicative of its simplicity, and this is why we prefer it. It takes the form of indented lists and key-value pairs, and it can represent almost all the data structures we need: strings for content, and numeric, associative, and multidimensional arrays for taxonomy, menu entries, and the like. For more information on YAML and how to use it, see the official YAML documentation. The main reason we prefer it is that, of all the options, it makes syntax errors in the markup the easiest to see. Putting content from a document into YAML format is so simple that anyone in our company, with or without markup experience, can do it.

A sample of some YAML content follows.

title: Third and Grove Uses YAML.
content_type: page
categories:
  - Release Scripts
  - Content
body: |
  Isn't this fun….

We use a fantastic PHP YAML library called Spyc. It will parse a YAML file and return an associative array of its contents.
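To make the shape of that array concrete, here is a rough sketch of what a parser would be expected to hand back for a file with `title`, `content_type`, `categories`, and `body` keys. The array is written out by hand for illustration rather than produced by Spyc, so details such as trailing whitespace in the body may differ from real parser output.

```php
<?php
// Hand-written illustration (an assumption, not actual Spyc output) of the
// array a YAML parser would be expected to return for one content file.
$parsed = array(
  'title' => 'Third and Grove Uses YAML.',
  'content_type' => 'page',
  // A YAML list becomes a numerically indexed PHP array.
  'categories' => array('Release Scripts', 'Content'),
  // A block scalar (the "|" syntax) becomes one multi-line string.
  'body' => "Isn't this fun….",
);

// Top-level keys are read like any other associative array.
echo $parsed['title'], "\n";
foreach ($parsed['categories'] as $category) {
  echo '- ', $category, "\n";
}
```

Because the result is an ordinary PHP array, the import code needs no YAML-specific logic once the file has been parsed.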

All we have to do is put all of our YAML files, one per piece of content, into a directory in our release folder. Then, in our release script, we can run something like this.

// A very simple example.
// Include the library. The path is a placeholder; point it at your copy of Spyc.
require_once 'path/to/Spyc.php';
// Get the files from the content directory.
$files = file_scan_directory('../release-folder/content', '/\.yaml$/');
foreach ($files as $file) {
  // Parse the file into an associative array.
  $yaml = Spyc::YAMLLoad($file->uri);
  // Import our content into Drupal.
  $node = new stdClass();
  $node->type = $yaml['content_type'];
  $node->title = $yaml['title'];
  $node->language = LANGUAGE_NONE;
  // Fill in default node properties before saving.
  node_object_prepare($node);
  $node->body[$node->language][0]['value'] = $yaml['body'];
  foreach ($yaml['categories'] as $category) {
    // reset() takes its argument by reference, so store the result first.
    $terms = taxonomy_get_term_by_name($category);
    if ($term = reset($terms)) {
      $node->field_category[$node->language][]['tid'] = $term->tid;
    }
  }
  node_save($node);
}

And that’s it. Doing this, we can import hundreds of pieces of content in seconds, all the content can be reviewed by editors before release day, and a ton of needless effort is saved.
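Since the files are often written by people who are not developers, one optional safeguard is to check each parsed array for the keys the import relies on before building a node. The sketch below is only an illustration: `content_missing_keys()` and its default list of required keys are hypothetical names chosen for this example, not part of Drupal or Spyc.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal or Spyc): returns the required
 * keys that are missing or empty in one parsed content array.
 */
function content_missing_keys(array $parsed, array $required = array('title', 'content_type', 'body')) {
  $missing = array();
  foreach ($required as $key) {
    if (!isset($parsed[$key]) || $parsed[$key] === '') {
      $missing[] = $key;
    }
  }
  return $missing;
}

// Two hand-written arrays standing in for parsed YAML files.
$good = array('title' => 'About us', 'content_type' => 'page', 'body' => 'Hello.');
$bad = array('title' => 'No body yet', 'content_type' => 'page');

// A complete file yields an empty list; an incomplete one names what is missing.
echo count(content_missing_keys($good)), "\n";
echo implode(', ', content_missing_keys($bad)), "\n";
```

In a release script, a file that returns a non-empty list could be skipped and reported to the editors rather than saved as a half-empty node.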