Adventures in Content Migration

Date: 15 November 2021
Reading time: 10 mins

Photo by Annie Spratt on Unsplash, which shows a map laid out with a large camera, hand bag and the books on top with a pencil

Migration from one CMS to another gives many businesses pause for thought. You’ve invested a chunk of money in your current CMS but the time has come to modernise and move to a new platform. What adventures are you in for?

In this article, I’ll take you through the process of migrating content to a headless CMS, introduce the available options, and explain the reasoning behind every choice we made for our client.

Content Migration is inevitable

Photo by Danique Photography on Unsplash, a hand on a dashboard

A new platform means redevelopment which can be exciting and bring much-needed evolution and enhancement to your digital offering. But then you think about the content. You’ve built up a vast swathe of content over the years. Sure, some of it will be rewritten but you have a lot of content that needs moving. It’s a standard scenario in every re-platforming / rebuild project. Content migration is an unenviable task.

We had a client on a legacy CMS which no longer fulfilled their digital needs. They were due to move to a brand-new site powered by a headless CMS, Kontent by Kentico. As they had used their old CMS for a long time, they had, as you’d expect, generated a lot of content in it – most of which needed to be ported through to the new site.

Obviously, they could have manually copied the content over, but they needed the site in place to hit the deadline for an annual awards competition. Missing that deadline was not an option and the manpower for manual transition wasn’t available.

The base requirements

Photo by David Iskander on Unsplash, a female sitting down with a handbag on her lap holding a black book with the words 'THE MISSION JOURNAL' on it

Let’s start with the brief. As manual entry was off the table, I had to figure out the best option for getting legacy content into our Kontent instance. I had the following conditions and requirements:

  • It needs to be quick and easy.
  • Full custom tooling is not an option due to limited resources available.
  • There are 1000-2000 content items per content type.
  • There are 7 or 8 content types with different structures.
  • The data are not in a nice, structured format.
  • Data are in CSV files exported from the legacy CMS.

Where do we start?

Photo by Victoriano Izquierdo on Unsplash, a bearded man looking at refrigerated drinks

First, I always consider revisiting the content within the legacy CMS. A re-platform brings the opportunity to reimagine the digital offering, streamline the content and remove any unused and unnecessary items. I would strongly advise building out your Content Strategy before building the new website, carefully constructing a flexible and future-proof content architecture, roadmap, and plan.

My starting point was the Importing content overview page in the Kontent documentation, which covers many of the considerations you need to take on board when selecting the right import process for you. I broke our options down into:

  • Manual content entry by the client
  • Manual content entry by the agency
  • Using Kontent CLI
  • Using Google Sheets add-on
  • Using Management API
  • Using Migration tool built by Milan Lund

Next, I’ll explain the options and their (dis)advantages.

Manual content entry by the client

Yep, I know this had already been ruled out but in the interests of completeness, I threw this back on the table.

Pros:

  • No additional resource required on the agency side.
  • Opportunity to introduce content management training early on and with real-life content.
  • We can cleanse the data as they are entered.
  • No additional development/coding required.

Cons:

  • Client needs to have a dedicated resource / team to make this happen.
  • It’s very time intensive.
  • Big risk of manual errors and inconsistency between pages.

The disadvantages presented a real threat of not meeting the client’s deadline.

Manual content entry by the agency

A similar approach to the previous one although this time we’re shifting the burden of work onto the agency.

Pros:

  • No additional resource required on the client side.
  • We can cleanse the data as they are entered.
  • No additional development/coding required.
  • Zero stress for client – they hand the process off to the agency.

Cons:

  • Agency needs to have a dedicated resource / team to make this happen.
  • It’s very time intensive.
  • Big risk of manual errors and inconsistency between pages.

The disadvantages are similar to those of the client handling the manual entry. Shifting the work to the agency doesn’t necessarily solve the problem, as the agency also has finite resources and faces the same time pressures. So, this was off the table as well.

Using Kontent CLI to migrate content

Kontent has several Migration Tools. However, not all of them support the desired import process from the legacy CMS. The first suitable candidate was the Kontent CLI, a command line tool that runs migration scripts. It supports both content types and content items as part of the migration into a Kontent project via the Management API.

Pros:

  • No need to dedicate resource to early content management training.
  • Zero stress for client – they hand the process off to the agency.
  • Reduced risk of errors and inconsistencies between pages.
  • Code structure changes can be handled in code, which is quick.

Cons:

  • Agency needs to have a dedicated resource / team to make this happen.
  • Need to train developers as not all of them have experience with TypeScript.
  • Hard to reliably cleanse the data.
  • Need to create a lot of custom code (CSV file reader, script for each content type, exceptions, etc.).
  • Complicated error handling during the import process with the large amount of content.

Overall, there was a good chance we would have hit the deadline but my gut instinct was that this was a sledgehammer to crack a nut. It felt too heavyweight to solve the problem we had and if it went wrong, we wouldn’t be able to course-correct in time.

Using Google Sheets add-on to migrate the content

The Google Sheets add-on ingests a few settings and then imports the content from the sheet into Kontent. If you want to read more about this, check out this blog post by Eric Dugre.

Pros:

  • No need to dedicate resource to early content management training.
  • Zero stress for client – they hand the process off to the agency.
  • Ideal for bulk imports.
  • Reduced risk of errors and inconsistencies between pages.
  • Code structure changes can be handled in the sheet.
  • We’re working with CSV files in Google Sheets, which is also feasible for non-technical staff.
  • No additional development/coding required.
  • Possible to make mapping corrections.

Cons:

  • Not possible to cleanse the data automatically; it needs to be done manually.
  • Uncertainty over whether the approach had been battle-tested with such a large migration.
  • Uncertainty over the performance.


So far, so good. Yes, there were some concerns, but this approach was ticking a lot of boxes and gave us a great chance of meeting the deadline with time to spare.

Using Management API to migrate the content

This approach would have allowed us to utilise the powerful Management API and create the content items via code. It is similar to the Kontent CLI option above.
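As a rough illustration of what creating content items via code involves, the sketch below turns rows of a CSV export into the JSON bodies for a content item and its language variant. The CSV columns, the `article` type, the `title` and `body` element codenames, and the endpoint path mentioned in the comments are all assumptions for illustration and should be checked against the current Management API reference. Nothing is sent over the network here.

```python
import csv
import io
import json

# Hypothetical CSV export from the legacy CMS (column names are assumptions).
LEGACY_CSV = """id,title,body
news_id_1,Hello world,<p>First post</p>
news_id_2,Second post,<p>More news</p>
"""


def build_item_payload(row, type_codename):
    """Body for creating a content item.

    Assumed endpoint: POST /v2/projects/{project_id}/items
    """
    return {
        "name": row["title"],
        "type": {"codename": type_codename},
        "external_id": row["id"],
    }


def build_variant_payload(row, element_codenames):
    """Body for a language variant: a list of element/value pairs (assumed shape)."""
    return {
        "elements": [
            {"element": {"codename": codename}, "value": row[codename]}
            for codename in element_codenames
        ]
    }


rows = list(csv.DictReader(io.StringIO(LEGACY_CSV)))
payloads = [
    (build_item_payload(row, "article"), build_variant_payload(row, ["title", "body"]))
    for row in rows
]

# Show the first item body as it would be serialised for a request.
print(json.dumps(payloads[0][0], indent=2))
```

Even in this small form you can see where the effort goes: every content type needs its own mapping from CSV columns to element codenames, plus real error handling around each request.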

Pros:

  • No need to dedicate resource to early content management training.
  • Zero stress for client – they hand the process off to the agency.
  • Reduced risk of errors and inconsistencies between pages.
  • Code structure changes can be handled in code, which is quick.
  • Officially recommended approach for large imports with varying types of content.
  • Workload can be distributed among developers.

Cons:

  • Agency needs to have a dedicated resource / team to make this happen.
  • Need to train developers on the Management API.
  • Hard to reliably cleanse the data.
  • Complicated error handling during the import process with the large amount of content.
  • Uncertainty over the development time required.


On paper this was the recommended approach and surely the winner. However, I was concerned about not meeting the project deadline.

Using Migration Tool built by Milan Lund

I also looked into the Migration Tool created by Milan Lund, which can import content into the project from either JSON or CSV.

Pros:

  • No need to dedicate resource to early content management training.
  • Zero stress for client – they hand the process off to the agency.
  • Ideal for bulk imports.
  • Reduced risk of errors and inconsistencies between pages.

Cons:

  • Agency needs to have a dedicated resource / team to make this happen.
  • At the time of my investigation, the CSV import process didn’t work well.
  • Unclear error log outputs.
  • Hard to reliably cleanse the data.
  • Complicated error handling during the import process with the large amount of content.
  • Uncertainty over whether the approach had been battle-tested with such a large migration.
  • Concerns over the available support.

It’s a handy tool, but I had multiple concerns about its performance and support.

Making the right choice

Photo by Jake Ingle on Unsplash, a man standing on top of a hill with his arms in the air, wearing a rucksack, baseball hat and shorts.

I made the decision to opt for the Google Sheets add-on. There was very little time needed to get accustomed to it and I was confident it would be easy to get it working and see results. As we had very little time to invest in building something for the import process, it felt like the perfect tool for us. Yet, like all import processes, it wasn't all plain sailing. We stumbled upon these challenges:

Rich text format

Our initial challenge was to create a PoC to demonstrate how we could import 50-100 content items. Straight away we ran into an issue with Rich Text content. The export process from the legacy CMS didn't provide the content in a valid HTML format – essentially some missing closing HTML tags and invalid HTML markup. This was fixed on the export side with a few manual tweaks to the CSV.
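To find the affected rows before importing, a small checker can flag rich text with missing or stray closing tags. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are illustrative, and it is deliberately stricter than a browser (for example, it reports an omitted `</li>` even though HTML allows that).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Records tags that are opened but not closed, or closed but never opened."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.problems = []   # human-readable descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.open_tags:
            # Anything opened after the matching tag was left unclosed.
            while self.open_tags[-1] != tag:
                self.problems.append(f"unclosed <{self.open_tags.pop()}>")
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_html_problems(html):
    """Return a list of tag-balance problems found in an HTML fragment."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    # Whatever is still on the stack at the end was never closed.
    checker.problems.extend(
        f"unclosed <{tag}>" for tag in reversed(checker.open_tags)
    )
    return checker.problems


print(find_html_problems("<p>Hello <strong>world</p>"))
```

Running this over the rich text column of each CSV export gives a short list of rows to fix by hand, rather than discovering them one failed import at a time.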

Performance and batch processing

Our initial test used a sample of only 20 items, and even that was failing due to the requests being made. This was a big concern. If 20 items fail, what about 1000-2000? I reached out to the Kontent Support team and the conversation was picked up by Eric Dugre, the author of the Google Sheets add-on. He investigated the issue promptly and addressed it by increasing the batch processing, and went further by also improving the general usability of the tool itself. Both changes helped us a lot, as performance was the biggest issue for us.
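For readers unfamiliar with the idea, batch processing simply means sending the rows in groups rather than all at once or one by one. The sketch below is a generic illustration, not the add-on's actual implementation; `send_batch`, the batch size and the pause are hypothetical placeholders for whatever performs the real import.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def import_in_batches(rows, send_batch, batch_size=50, pause_seconds=0.0):
    """Send rows batch by batch and collect failures instead of stopping early."""
    failed = []
    for batch in batched(rows, batch_size):
        try:
            send_batch(batch)
        except Exception as error:  # keep going so one bad batch doesn't block the rest
            failed.append((batch, str(error)))
        time.sleep(pause_seconds)  # optional breathing room between batches
    return failed


def fake_sender(batch):
    """Stand-in for a real import call; fails on any batch containing row 3."""
    if 3 in batch:
        raise RuntimeError("boom")


failures = import_in_batches(list(range(5)), fake_sender, batch_size=2)
print(failures)
```

Collecting failed batches rather than aborting makes it possible to retry only the rows that did not go through.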

Note: Some of the adjustments required approval from Google – it is still a Google Sheets add-on. Even though the add-on author’s reaction was prompt, this was something we needed to factor into our timings.

Random GUID generated

Then, we looked at how to link content items together. For that, each content item needs a unique identifier, which ours didn’t have. We had to generate a GUID for every item ourselves within the spreadsheet. This is the function we used:

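Public Function CreateGUID(Prefix As String) As String
  Static isSeeded As Boolean
  Dim hexDigits As String

  ' Seed the generator once; otherwise Rnd repeats the same sequence in every new session
  If Not isSeeded Then
    Randomize
    isSeeded = True
  End If

  Do While Len(hexDigits) < 32
    If Len(hexDigits) = 16 Then
      ' 17th character holds the variant information (8, 9, A or B)
      hexDigits = hexDigits & Hex$(8 + Int(Rnd * 4))
    Else
      hexDigits = hexDigits & Hex$(Int(Rnd * 16))
    End If
  Loop
  ' Format as the prefix followed by 8-4-4-4-12 groups of hex digits
  CreateGUID = Prefix & Mid(hexDigits, 1, 8) & "-" & Mid(hexDigits, 9, 4) & "-" & Mid(hexDigits, 13, 4) & "-" & Mid(hexDigits, 17, 4) & "-" & Mid(hexDigits, 21, 12)
End Function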

Note: You can create this function in Excel under Tools -> Macro -> Visual Basic Editor.

This will allow you to generate a random GUID for any row that uses =CreateGUID("news_id_"). I added a 'Prefix' parameter to the function so that I could identify the type of content item and link it to another one. It’s good to keep related content items within the same Excel worksheet, as you can then create an Excel formula to bring the reference ID into the linked content item.
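If you are preparing the data with a script instead of a spreadsheet, the same idea can be sketched in Python with the standard `uuid` module. The `author` rows and the `author_id_` prefix below are hypothetical; only the `news_id_` prefix comes from the example above. The dictionary lookup plays the role of the spreadsheet formula that brings the reference ID into the linked item.

```python
import uuid


def create_prefixed_id(prefix):
    """Return the prefix followed by a random version-4 UUID."""
    return f"{prefix}{uuid.uuid4()}"


# Hypothetical rows: each article refers to its author by name.
authors = [{"name": "Ada"}, {"name": "Grace"}]
articles = [
    {"title": "First", "author": "Ada"},
    {"title": "Second", "author": "Grace"},
]

for author in authors:
    author["id"] = create_prefixed_id("author_id_")

# Lookup table from author name to generated ID.
author_id_by_name = {author["name"]: author["id"] for author in authors}

for article in articles:
    article["id"] = create_prefixed_id("news_id_")
    # Bring the referenced author's ID into the article row.
    article["author_id"] = author_id_by_name[article["author"]]

print(articles[0])
```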

Once all this was in place, we ran the different imports into the Kontent project. Then the client made the necessary last-minute amends and final touches to the content, and we didn’t really experience any other issues. Mission accomplished.

In conclusion…

Photo by Ian Stauffer on Unsplash, a man sitting on some large rocks with one arm in the air surrounded by clouds

I introduced six ways to migrate content to Kontent and explained the pros and cons of each approach. I hope my adventures in content migration will help you should you come across a situation like this. In our case, the Google Sheets add-on was the best approach. It allowed us to hit the deadline way in advance which was a massive success for us and the client.

And, once more for those in the back, my recommendation when moving to a new CMS is to always have a good content strategy in place. Consider revisiting the content you already have and streamlining it in the process.

If you’d like any more advice on the different content migration options, don’t hesitate to get in touch with me via email, Twitter, or just pop a question on the Kontent Discord.