Difference between revisions of "2015 UK General Election using mysql"

From SQLZOO
Jump to navigation Jump to search
 
(4 intermediate revisions by the same user not shown)
Line 3: Line 3:
The command '''wget''' is part of Linux. You can use it to download web pages as files. (If you don't have wget you may have '''lwp-request''' or '''curl''' these do the same job)
The command '''wget''' is part of Linux. You can use it to download web pages as files. (If you don't have wget you may have '''lwp-request''' or '''curl''' these do the same job)
  wget http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv
  wget http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv
==Move the file to shared space==
(the 2017 general election results are at http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-results-by-candidate.csv)
 
==Move the file to shared space (Napier students can skip this step)==
The server process may not be able to read data in your home directory; you can move the file to a place that mysqld can read - '''/tmp''' is a handy place for this kind of thing.
The server process may not be able to read data in your home directory; you can move the file to a place that mysqld can read - '''/tmp''' is a handy place for this kind of thing.
  mv hocl-ge2015-results-full.csv /tmp
  mv hocl-ge2015-results-full.csv /tmp
Again, this is a linux command - nothing to do with mysql.
Again, this is a linux command - nothing to do with mysql.
[[File:putty1.png]]
[[File:putty1.png]]


==Go into mysql and create a table for the results==
==Go into mysql and create a table for the results==
Here you create a single flat table that can store all of the unnormalised data.
The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:
  mysql -u 40000036 -ptiger 40000036
  mysql -u 40000036 -ptiger 40000036
The create table statement could be:
The create table statement could be:
Line 32: Line 38:
   PRIMARY KEY(ons_id,firstname,surname)
   PRIMARY KEY(ons_id,firstname,surname)
  );
  );
==Import the csv file into your flat table==
You can import the data using this line
You can import the data using this line
  LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
  LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
 
The above line is complicated but it instructs MySQL about the format of the file.
*CSV stands for "comma separated values", each field is terminated by a comma
*Each line is ended with the newline, carriage return \r\n
*Sometimes a value includes a comma in such cases the field is enclosed by quotes.
**The constituency "Ayr, Carrick and Cumnock" is an example.
*The first row contains column headings not data and so we "IGNORE 1 LINES"
==Run some queries==
Now let's look at some data.
Now let's look at some data.
How many female candidates were there?
How many female candidates were there?
Line 40: Line 53:
Who stood in Edinburgh South?
Who stood in Edinburgh South?
  select surname,votes from ge where constituency_name='Edinburgh South';
  select surname,votes from ge where constituency_name='Edinburgh South';
You can now move to the next stage: [[2015 UK General Election Normalising Data]]

Latest revision as of 21:52, 1 October 2017

This tutorial assumes you have access to a Linux account with mysql. Napier students can use the machine inf08104.napier.ac.uk

Get to the linux prompt and download the csv (Napier students can skip this step)

The command wget is part of Linux. You can use it to download web pages as files. (If you don't have wget you may have lwp-request or curl these do the same job)

wget http://researchbriefings.files.parliament.uk/documents/CBP-7186/hocl-ge2015-results-full.csv

(the 2017 general election results are at http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-results-by-candidate.csv)

Move the file to shared space (Napier students can skip this step)

The server process may not be able to read data in your home directory; you can move the file to a place that mysqld can read - /tmp is a handy place for this kind of thing.

mv hocl-ge2015-results-full.csv /tmp

Again, this is a linux command - nothing to do with mysql.

Putty1.png

Go into mysql and create a table for the results

Here you create a single flat table that can store all of the unnormalised data.

The first line of the CSV file contains the column headings (ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change). We create a table with a column for each of these:

mysql -u 40000036 -ptiger 40000036

The create table statement could be:

CREATE TABLE ge(
  ons_id VARCHAR(10),
  ons_region_id VARCHAR(10),
  constituency_name VARCHAR(50),
  county_name VARCHAR(50),
  region_name VARCHAR(50),
  country_name VARCHAR(50),
  constituency_type VARCHAR(10),
  party_name VARCHAR(50),
  party_abbreviation VARCHAR(50),
  firstname VARCHAR(50),
  surname VARCHAR(50),
  gender VARCHAR(6),
  sitting_mp VARCHAR(3),
  former_mp VARCHAR(3),
  votes INT,
  `share` FLOAT,
  `change` VARCHAR(20),
  PRIMARY KEY(ons_id,firstname,surname)
);

Import the csv file into your flat table

You can import the data using this line

LOAD DATA INFILE '/tmp/hocl-ge2015-results-full.csv' INTO TABLE ge COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;

The above line is complicated but it instructs MySQL about the format of the file.

  • CSV stands for "comma separated values", each field is terminated by a comma
  • Each line is ended with the newline, carriage return \r\n
  • Sometimes a value includes a comma in such cases the field is enclosed by quotes.
    • The constituency "Ayr, Carrick and Cumnock" is an example.
  • The first row contains column headings not data and so we "IGNORE 1 LINES"

Run some queries

Now let's look at some data. How many female candidates were there?

select count(1) from ge where gender='female';

Who stood in Edinburgh South?

select surname,votes from ge where constituency_name='Edinburgh South';

You can now move to the next stage: 2015 UK General Election Normalising Data