Difference between revisions of "2015 UK General Election Normalising Data"
(Created page with "The unnormalised data from the CSV file looks like this: File:ge2015.png Notice that it includes redundancy - constituency names are repeated for example. We can see that...")
Revision as of 20:29, 19 October 2016
The unnormalised data from the CSV file looks like this: Notice that it includes redundancy - constituency names are repeated for example. We can see that link between the ons_id and the constituency_name is repeated over and over. Our first job is to identify a primary key. There are several options to consider:
- First we notice that there is no single column that is unique and so we will need to use at least a pair of columns.
- (firstname,surname) would be good
- The candidates are unique. By law, no one is allowed to stand as a candidate in two constituencies
- Unfortunately candidate names are not unique - there are two candidates called "Alan Johnson" for example. You can confirm this with a query such as
SELECT firstname,surname COUNT(1) FROM ge HAVING COUNT(1)>1