Difference between revisions of "2015 UK General Election Normalising Data"

From SQLZOO
Jump to: navigation, search
(Created page with "The unnormalised data from the CSV file looks like this: File:ge2015.png Notice that it includes redundancy - constituency names are repeated for example. We can see that...")
(No difference)

Revision as of 20:29, 19 October 2016

The unnormalised data from the CSV file looks like this: Ge2015.png Notice that it includes redundancy - constituency names are repeated for example. We can see that link between the ons_id and the constituency_name is repeated over and over. Our first job is to identify a primary key. There are several options to consider:

  • First we notice that there is no single column that is unique and so we will need to use at least a pair of columns.
  • (firstname,surname) would be good
    • The candidates are unique. By law, no one is allowed to stand as a candidate in two constituencies
    • Unfortunately candidate names are not unique - there are two candidates called "Alan Johnson" for example. You can confirm this with a query such as
      SELECT firstname,surname COUNT(1) FROM ge HAVING COUNT(1)>1