Helping clients migrate business information into Office 365 SharePoint Online is one of Insight's core competencies, whether from SharePoint on-premises, tenant-to-tenant moves, file shares with metadata mapping or any other third-party system.
Our offices are constantly working on projects like these, involving content of all sizes, to help our clients move to the cloud. These projects are often under tight deadlines driven by business diversification, support changes or routine cost-cutting measures.
To ensure speed and accuracy, we rely on an enterprise-level migration tool that integrates with SharePoint, so content is moved consistently, with complete metadata attached, on a short timeline.
That said, there are times when the current third-party storage system relies on APIs the vendors no longer want to support, or the tools simply don't work as advertised. This happened during a large migration from Google Sites to SharePoint Online: only two vendors' tool sets purported to support the Google APIs, and neither would even connect to the Google Sites, let alone move the content.
That meant the team had no reports to review on the types or volume of documents stored in the file cabinets and attachments, and instead had to rely on visual inspection of content. With more than 1,500 sites to walk through, manually tracking sites and their content quickly became untenable.
Once the team realized that neither vendor's tool set would work against the Google Sites API, they decided to build custom Windows PowerShell scripts to query the Google Data API for each site's documents and metadata.
Right away, we found we had to discern whether the sites were Classic style (built before November 2016) or New Sites, as Google had moved the newer sites away from the GData API format. Luckily, most of the sites were built in Classic mode, so only one API format was needed.
Once we had an extract of Google Sites URLs to work from and knew which API to target, we created a PowerShell script to query a site for all of its file cabinets and attachments, collate the associated metadata (file name, updated date, author and Content Delivery Network (CDN) path) and write each file's metadata out to a CSV for the team to review.
That let everyone see the volume of content in each site. Later, the CSV output would feed the next PowerShell script, which pulls each file down from the CDN and pushes it into a target SharePoint site's document library.
To get started with this script, we had to break down connecting to the GData APIs into their core requirements and then merge them back together into a unified PowerShell script.
The first step is to create the credentials that will act as our broker for accessing all of the data. Navigate to the Google API dashboard, sign in with the Google admin credentials granted by the G Suite administrator and create a project that will hold the credentials.
Once the project is created on the dashboard, click the Credentials link in the left navigation, open the Create Credentials drop-down and select OAuth Client ID.
On the creation page, select the Web Application choice, give it a descriptive name and click on the Create button.
The screen will now reload, showing us our credentials. Click on the name to load the details screen and copy the Client ID and Client Secret for the next step.
With credentials in hand, we navigate to the Google Developers OAuth 2.0 Playground and use the Client ID and Secret to generate an authorization token for making API queries. Click the gear icon in the upper right, check the Use your own OAuth credentials box and update the OAuth 2.0 configuration with the Client ID and Secret.
Next, in the Step 1 Scopes box on the left, enter the Classic Sites API scope we need to authorize (https://sites.google.com/feeds/) and click the Authorize APIs button.
An Authorization screen will appear. Select the admin account you used above to create the credentials. The screen will refresh to allow the scopes to be managed, so click the Allow button to finish.
The OAuth page will now show Step 2 — Exchange Authorization Code for Tokens. Click the Auto-Refresh token checkbox and then click the Exchange Authorization Code for Tokens button. This will generate the Access Token we need to query Google APIs. Finally, copy the Access Token from the textbox to use in our PowerShell scripts.
With the Access Token captured, we move over to using a web request tool to see how the data is returned from the Google API. I prefer Postman, but others, such as SoapUI or Runscope.com, will work as well.
In our tool, we just need to put together our Google Sites URL and the headers to pass, which requires the Access Token we gathered from Google. After sending our GET request, we can see the returned body contains all of the individual entries from the query.
Although there's not a lot of metadata stored in Google Sites around documents, we note the core values we'll use to pull down the file and then, after uploading to SharePoint, set its metadata fields: the entry's title (the file name), the updated date, the author and the content src attribute, which holds the Google CDN URL for downloading the file.
Now that we know how the data is formatted by the API, we create our PowerShell script to query the Google Site for its documents and log them to a CSV.
The CSV file the script outputs has a header row of columns followed by rows of comma-separated values, in the same order we'll use for migrating the documents later. The key fields are the file name we'll upload, the last modified date, the user to set as modified by (if he or she still exists) and the Google CDN URL, which is required later to bring the file down for migrating.
With our Google Site documents pushed out into a CSV file, we can now use that file to feed a download/upload script that will pull the files down locally from Google and then push them to a targeted SharePoint document library.
Below is only the core download/upload script. It would normally be called by a master script that loops through each row of the CSV above with a ForEach, passing in the values for that file.
Although it seems like a large, ungainly amount of work, putting all of this together gives us a fully functional process for querying Google Sites for all of their documents and then migrating them to SharePoint Online.
The biggest hurdles to overcome were Google's authentication requirements and getting a listing of all of a Google Site's file locations so they could be migrated programmatically, including figuring out that recursion through a site's pages was needed to find them all.