Spludlow Web Header

Case study – Companies House


Video: Spludlow Framework - Data Case study - Companies House

Introduction

Working with the UK Companies House dataset titled “Free Company Data”.

I am documenting my experience with this dataset as it’s a pretty useful thing even if you only use it for test data. I will go through the whole process of getting this raw data into a database table using the Spludlow Framework to help where it can.

All the code used here is in the class “Spludlow.Data.BasicCompanyData” this is included in the core system. I recommend you look at the code as you run each bit.

Running the Import

See video.

Follow these step to create the database:

·         Ensure “SpludlowGroup” has the “sysadmin” role on the database, so it can do things like create databases: SSMS->Security->Logins->SpludlowGroup Properties->Server Roles->Tick sysadmin

·         Create a directory “C:\DatabaseShared” and ensure “SpludlowGroup” has full control (This directory must be accessible by the Spludlow Framework and the database server, if the database is remote then create a share on the remote host)

·         Download the BasicCompanyDataSupportFiles.7z archive from the downloads directory on the Spludlow Web

·         Extract the archive rename the directory and put here “C:\ProgramData\SpludlowV1\BasicCompany” (If you put anywhere else ensure that “SpludlowGroup” has full permissions on the directory

·         On the Intranet Call page search for “company” and click on the Spludlow.Data.BasicCompanyData.Run() method the parameter text boxes will appear.

·         workingDirectory: “C:\ProgramData\SpludlowV1\BasicCompany”

·         connectionString: “WSYS-HOME-PC”

·         databaseName: “BasicCompany”

·         databaseTempBackupDirectory: “C:\DatabaseShared”

·         Run on message queue and check the status.

·         Fix the error by editing the file “DataURLs.txt” in notepad.

·         Go to http://download.companieshouse.gov.uk/en_output.html in a web browser

·         You should see a list of 5 links to the 5 parts of the data.

·         Right click on the first part 1 link and select “Copy Link” (Or whatever you do in your browser)

·         Open the DataURLs.txt file in notepad

·         Paste the full URL to part 1 of the data from the previous step 5 times, pressing return after each, so you have 5 identical lines

·         Edit lines 2, 3, 4, and 5 changing “part1_5” to “part2_5”, “part3_5”, “part4_5”, and “part5_5”.

·         The file should now contain 5 lines with URLs to the 5 parts of the data, save and close notepad, like so:

o   http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part1_5.zip

o   http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part2_5.zip

o   http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part3_5.zip

o   http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part4_5.zip

o   http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part5_5.zip

·         Restart the stopped message queue to start processing again

·         Keep an eye on the Status page, Logs page, the working directory, and task manager. It may take a while.

·         When its finished you will get a log informing you

Examining the data

Have a look at some table data.

Create a database diagram in SSMS:

Create an Access front end and perform as query.

 


 
 
 
 
 
 
 
 
 
Spludlow Web Footer