Importing legacy content into the database of a CMS like Joomla or Drupal is no mean task. It's often a time-consuming and painstaking exercise.
Here we present a sneak peek into the export2web strategy for content migration.
Create an Information Architecture for the new website using the legacy content and old sitemap.
The legacy content may be disaggregated; if it is, the HTML content will need to be organized first. In doing so we will also map the legacy content to the new site architecture.
Determine the consistency and structure of the static HTML pages. It will also be imperative to demarcate clear design and content boundaries in the HTML pages to ensure a smooth import from the legacy site to the CMS database.
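As a rough illustration of what a clear content boundary buys you, here is a minimal sketch using Python's standard html.parser module. It assumes, purely for illustration, that every legacy page wraps its content in a div with id="content"; the marker name, the ContentExtractor class and the sample page are hypothetical and would change with the real data.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text found inside <div id="content"> (a hypothetical marker)."""

    def __init__(self, content_id="content"):
        super().__init__()
        self.content_id = content_id
        self.depth = 0      # nesting depth of <div> tags inside the content region
        self.chunks = []    # text fragments collected from the content region

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested <div> inside the content region.
            self.depth += 1
        elif dict(attrs).get("id") == self.content_id:
            # Entering the content region.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_content(html):
    """Return the content-region text, or "" if the page has no such region."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><body>
<div id="header">Site banner</div>
<div id="content"><h1>About us</h1><div class="body"><p>We build things.</p></div></div>
<div id="footer">Copyright</div>
</body></html>"""

print(extract_content(page))
```

A page for which extract_content returns an empty string has no recognizable content region, which makes it a candidate for manual cleanup before import.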
The legacy images and other digital assets will be uploaded directly to the new CMS media manager. The references to these assets in the static HTML pages will be taken care of during the content import to the database.
The next step is to devise an import path based on the aforementioned investigations. The migration might look like the following (it will change based on real data):
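One way the asset references could be rewritten during import is sketched below. The folder names in PATH_MAP and the /media/legacy/ target are assumptions made up for this example, not paths used by any particular CMS, and a regular expression is only adequate when the legacy markup is reasonably consistent.

```python
import re

# Hypothetical mapping from legacy asset folders to media-manager paths.
PATH_MAP = {
    "images/": "/media/legacy/images/",
    "docs/": "/media/legacy/docs/",
}

# Matches src="..." and href="..." attributes with either quote style.
ATTR_RE = re.compile(
    r'''(?P<attr>\b(?:src|href))=(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def migrate_links(html, path_map=PATH_MAP):
    """Rewrite relative asset references to their new media-manager paths."""

    def rewrite(match):
        url = match.group("url")
        # Drop leading "./" and "../" segments so relative paths compare equal.
        normalized = re.sub(r"^(?:\.{1,2}/)+", "", url)
        for old, new in path_map.items():
            if normalized.startswith(old):
                url = new + normalized[len(old):]
                break
        quote = match.group("q")
        return f"{match.group('attr')}={quote}{url}{quote}"

    return ATTR_RE.sub(rewrite, html)


print(migrate_links('<img src="../images/logo.png">'))
```

Absolute URLs and paths outside the mapped folders are left untouched, so external links survive the rewrite unchanged.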
Store static HTML.
Convert from static HTML to XML (if the legacy content is structured and consistent, it will support XML conversion).
Read XML values and store to database.
Store HTML content directly to a database.
Perform several operations on the data in the database, keeping in mind the following:
Extraction rules - Control what fields are extracted from source content.
Metadata rules - Control the metadata that is automatically applied to content.
Link migration rules - Control how links referenced by source content are handled. E.g., how are image paths translated?
Converting the database data to a delimited output
Performing more data manipulations using Perl/Python (if required)
Testing against small samples
Finally importing the resulting files into the CMS database.
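The database and delimited-output steps above can be tried on a small sample before touching the real CMS. The sketch below takes the "store HTML content directly to a database" route, using an in-memory SQLite database as a stand-in staging area; the legacy_pages table, its columns and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_pages(conn, pages):
    """Store (path, title, body) tuples in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS legacy_pages "
        "(path TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO legacy_pages (path, title, body) VALUES (?, ?, ?)",
        pages,
    )
    conn.commit()


def export_delimited(conn, out, delimiter="\t"):
    """Write the staging table as delimited text; return the number of data rows."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["path", "title", "body"])
    count = 0
    for row in conn.execute(
        "SELECT path, title, body FROM legacy_pages ORDER BY path"
    ):
        writer.writerow(row)
        count += 1
    return count


# A small test sample, including a body with an embedded newline.
sample = [
    ("about.html", "About us", "<p>We build things.</p>"),
    ("contact.html", "Contact", "<p>Write to us,\nany time.</p>"),
]

conn = sqlite3.connect(":memory:")
load_pages(conn, sample)

buffer = io.StringIO()
exported = export_delimited(conn, buffer)
print(exported, "rows exported")
```

Using the csv module rather than joining strings by hand means fields containing the delimiter or line breaks are quoted, so the file can be read back without corrupting the HTML bodies.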