2015 UK General Election using SQL Server
This tutorial assumes you have access to powershell and sqlcmd or Microsoft SQL Server Management Studio
Get to the powershell prompt and download the csv
Invoke-WebRequest http://researchbriefings.files.parliament.uk/documents/CBP-7979/hocl-ge2017-results-full.csv -OutFile ge2017.csv
Go into sqlcmd or SSMS and create a table for the results
Here you create a single flat table that can store all of the unnormalised data.
The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:
sqlcmd -S .\sqlexpress -E
We do not know the size of each column yet, too small and the import statement will give errors or warnings. It is best to be generous - memory is cheap.
The create table statement could be:
CREATE TABLE ge( ons_id VARCHAR(10), ons_region_id VARCHAR(10), constituency_name VARCHAR(50), county_name VARCHAR(50), region_name VARCHAR(50), country_name VARCHAR(50), constituency_type VARCHAR(10), party_name VARCHAR(50), party_abbreviation VARCHAR(50), firstname VARCHAR(50), surname VARCHAR(50), gender VARCHAR(6), sitting_mp VARCHAR(3), former_mp VARCHAR(3), votes INT, share FLOAT, change VARCHAR(20), PRIMARY KEY(ons_id,firstname,surname) )
Attempt import the csv file into your flat table
You can import the data using this line (from SQLCMD or SQL Server Management Studio)
BULK INSERT ge FROM 'C:\db\ge2015.csv' WITH (FIELDTERMINATOR=',', ROWTERMINATOR='\n', FIRSTROW=2) GO
- Each line is ended with carriage return \n
- The first row contains column headings not data so we start at row 2
You will most likely get error messages like this:
Msg 4863, Level 16, State 1, Server ME1C039-130368\SQLEXPRESS, Line 3 Bulk load data conversion error (truncation) for row 105, column 13 (sitting_mp). Msg 4863, Level 16, State 1, Server ME1C039-130368\SQLEXPRESS, Line 3 Bulk load data conversion error (truncation) for row 106, column 12 (gender). Msg 4863, Level 16, State 1, Server ME1C039-130368\SQLEXPRESS, Line 3 Bulk load data conversion error (truncation) for row 107, column 13 (sitting_mp). ...
Unfortunately BULK INSERT cannot deal with the CSV format so we need to preprocess the file. Some solutions:
- Use Excel - you can load CSV and save as TXT
- Use regular expressions: convert CSV to TXT
- Use python
- Find a converter online
With the conversion complete you can import the data - you will have too use the full path name and you may have permissions problems
BULK INSERT ge FROM 'c:\path\ge2017.txt' WITH (FIRSTROW=2)
Run some queries
Now let's look at some data. How many female candidates were there?
select count(1) from ge where gender='female';
Who stood in Edinburgh South?
select surname,votes from ge where constituency_name='Edinburgh South';
You can now move to the next stage: 2015 UK General Election Normalising Data