
    Review of 'A new attribute-linked residential property price dataset for England and Wales, 2011 to 2019'

    Very good description of a valuable housing dataset. I enjoyed the work very much!
    Average rating:
        Rated 4.5 of 5.
    Level of importance:
        Rated 4 of 5.
    Level of validity:
        Rated 4 of 5.
    Level of completeness:
        Rated 4 of 5.
    Level of comprehensibility:
        Rated 5 of 5.
    Competing interests:

    Reviewed article


    A new attribute-linked residential property price dataset for England and Wales, 2011 to 2019

    Current research on residential house price variation in the UK is limited by the lack of an open and comprehensive house price database that contains both transaction price alongside dwelling attributes such as size. This research outlines one approach which addresses this deficiency in England and Wales through combining transaction information from the official open Land Registry Price Paid Data (LR-PPD) and property size information from the official open Domestic Energy Performance Certificates (EPCs). A four-stage data linkage is created to generate a new linked dataset, representing 79% of the full market sales in the LR-PPD. This new linked dataset offers greater flexibility for the exploration of house price (/m²) variation in England and Wales at different scales over postcode units between 2011 and 2019. Open access linkage codes will allow for future updates beyond 2019.

      Review information

      This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

      data linkage, Land Registry Price Paid Data, Domestic Energy Performance Certificates, Built environment, Energy, England and Wales, Urban studies, Sustainable and resilient cities

      Review text

      This paper describes a sound and logical process that creates a valuable housing dataset. By linking LR-PPD and EPC data, countrywide housing transaction records are enriched with highly useful floor area attributes. Such a dataset, with its high matching rate and open access code, is very welcome. The benefits of this process are evident from the statistics provided in the summary.

      I found the paper well organized and well presented. It is easy to follow even though it describes a very complex process. The diagrams illustrate the logic behind the steps well, which makes the matching results justifiable.

      I have the following questions and suggestions, which I hope will help to improve it further:

      1. This paper, as its title indicates, is a data description summary. It would be great if it could be extended into a method paper that includes more details about the rules.
      2. Related to this, I find that the paper describes the logic of the data processing very well, but it provides few examples. For instance, on P5, 95 new variables in EPC and 180 variables in LR-PPD are mentioned as being included. It would be better to see a couple of examples. I also appreciate that there are 251 matching rules and that they are detailed and complex. It would be great to see some examples of these too, as this would make the logic clearer. At the moment the paper is rather conceptual.
      3. In terms of validation, it would be great if a manual random check of the matching results could be included. This would supplement the data with (a) examples of matching results and (b) accuracy descriptions at the end.
      4. After aggregating to the census unit, is it possible to compare the results with census housing figures, such as the distribution of house types or other commonly reported attributes?
      5. As mentioned, PPD data is updated regularly. It may be worth checking different versions of the data to see whether they result in different matching rates. This would serve as a reliability test.
      6. It may also be worthwhile to discuss more limitations at the end.
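
      Suggestions 3 and 5 can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not the authors' linkage code: the function names (`match_rate`, `sample_for_manual_check`, `manual_accuracy`), the transaction and certificate IDs, and the reviewer verdicts are all hypothetical toy values.

```python
import random


def match_rate(ppd_ids, linked_ids):
    """Share of PPD transaction IDs that also appear in the linked dataset."""
    ppd = set(ppd_ids)
    if not ppd:
        raise ValueError("ppd_ids is empty")
    return len(ppd & set(linked_ids)) / len(ppd)


def sample_for_manual_check(linked_pairs, n, seed=0):
    """Draw a reproducible random sample of linked pairs for manual review."""
    pairs = list(linked_pairs)
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(pairs, min(n, len(pairs)))


def manual_accuracy(verdicts):
    """Proportion of sampled pairs that a human reviewer judged correct."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no verdicts supplied")
    return sum(1 for v in verdicts if v) / len(verdicts)


# Hypothetical toy data: two PPD releases and the links produced from each.
ppd_v1 = ["t1", "t2", "t3", "t4"]
ppd_v2 = ["t1", "t2", "t3", "t4", "t5"]
linked_v1 = {"t1": "e9", "t2": "e4", "t3": "e7"}  # transaction ID -> EPC ID
linked_v2 = {"t1": "e9", "t2": "e4", "t3": "e7"}

# Suggestion 5: compare matching rates across data versions.
rate_v1 = match_rate(ppd_v1, linked_v1)
rate_v2 = match_rate(ppd_v2, linked_v2)

# Suggestion 3: sample linked pairs, check them by hand, report accuracy.
to_check = sample_for_manual_check(linked_v1.items(), n=2, seed=42)
accuracy = manual_accuracy([True, False])  # hypothetical reviewer verdicts
```

      A large gap between `rate_v1` and `rate_v2` would suggest that the linkage is sensitive to the PPD release used, and the manually checked sample gives an accuracy estimate that could be reported alongside the matching rate.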

      Thanks to Bin and the other authors for sharing this interesting article and generating a useful dataset. I hope more details of the method and its wider applications will be made available soon.


