Version: 19.0.3 | Published: 29 Jan 2026 | Updated: 26 days ago

Genomics England - Cancer

Dataset

Summary

Population Size:
-1
Publication Date:
30 March 2023

Documentation

Description:
Cancer data are presented either at the patient level (cancer diagnosis, or 'disease type') or at the tumour level (tumour-specific sample details) for participants in the Cancer arm of the 100,000 Genomes Project.

Data relating to cancer participants:
  • cancer_participant_disease: For each cancer participant in the 100,000 Genomes Project, this table includes data about their cancer disease type and subtype.
  • cancer_participant_tumour: For each cancer participant's tumour in the 100,000 Genomes Project, this table contains data that characterises the tumour, e.g. staging and grading; morphology and location; recurrence at time of enrolment; and the basis of diagnosis.
  • cancer_participant_tumour_metastatic_site: For each cancer participant in the 100,000 Genomes Project, this table contains the site of their metastatic disease in the body (if applicable) at diagnosis.
  • cancer_care_plan: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains information from their NHS cancer care plan on their treatment and care intent, in particular outcomes of MDT meetings and coded connected data (e.g. diagnoses from scans).
  • cancer_surgery: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains details of the surgical procedures performed, as well as the specific location of the intervention.
  • cancer_risk_factor_general: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains data on general cancer risk factors, namely smoking status, height, weight and alcohol consumption. This table was compiled with input from GeCIP members.
  • cancer_risk_factor_cancer_specific: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains data on specific risk factors related to particular cancer types. This table was compiled with input from GeCIP members.
  • cancer_invest_imaging: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains coded data on imaging investigations characterising the scan, its modality, anatomical site and outcome, as well as the outcome of the imaging report in free-text form.

Data derived from or relating to tumour samples:
  • cancer_invest_sample_pathology: For a proportion of cancer participants in the 100,000 Genomes Project, this table contains full pathology reports and other related data on and from their tumour samples around diagnosis and characterisation of the cancer. Please note that much of this information is also found in the clinic_sample and cancer_participant_tumour tables.
  • cancer_specific_pathology: For a proportion of tumours from cancer participants in the 100,000 Genomes Project, this table contains pathology data specific to that participant's cancer type. This may provide additional data to the cancer_invest_sample_pathology and cancer_participant_tumour tables.
  • cancer_systemic_anti_cancer_therapy: For a proportion of tumours from cancer participants in the 100,000 Genome

Coverage

Spatial:
UK
Typical Age Range Min:
0
Typical Age Range Max:
150
Material Type:
  • DNA
  • Tissue
Follow Up:
Other
Pathway:
Linked datasets cover secondary care.

Provenance

Origin

Purpose:
  • Care
  • Study
  • Other
Dataset Type:
Health and disease
Source:
  • EPR
  • Electronic survey
  • LIMS
  • Other
Collection Source:
  • Clinic
  • Secondary care - Outpatients
  • Secondary care - In-patients

Temporal

Publishing Frequency:
Quarterly
Distribution Release Date:
30 March 2023
Start Date:
01 January 2014
End Date:
01 January 2019
Time Lag:
2-6 months

Accessibility

Access

Access Service:
More information about the Genomics England Research Environment can be found here: https://www.genomicsengland.co.uk/research. Genomics England 100k participants have consented to longitudinal lifetime follow-up and to being recontacted safely through our clinical network. BRST (Bioinformatics Research Services) is a team of bioinformaticians who know the dataset inside out and provide consultancy projects on a case-by-case basis. Our network of clinical and medical experts can be made available on a case-by-case basis. Researchers have the opportunity to work with us and access the GeCIP network, a community of world-leading experts in specific cancers and rare diseases.
Access Request Cost:
Fees depend on the type of access required. Raw data is not eligible for export. Summary-level data may be exported provided that it is approved through the Genomics England Airlock Process.
Delivery Lead Time:
2-6 months
Data Controller:
GENOMICS ENGLAND
Data Processor:
GENOMICS ENGLAND
Jurisdiction:
Great Britain

Usage

Data Use Limitation:
General research use
Data Use Requirements:
  • Ethics approval required
  • Project-specific restrictions
  • Publication moratorium
Resource Creator:
The 100,000 Genomes Project Protocol v3, Genomics England. doi:10.6084/m9.figshare.4530893.v3. 2017. Publications that use the Genomics England Database should include an author as: Genomics England Research Consortium. Please see publication policy.

Format and Standards

Vocabulary Encoding Scheme:
  • OPCS4
  • READ
  • SNOMED CT
  • NHS NATIONAL CODES
  • ODS
  • ICD10
  • HPO
  • OTHER
Conforms To:
OTHER
Language:
English
Format:
Multiple Formats Available

Observations

Statistical Population | Population Description | Population Size | Measured Property | Observation Date
Findings | Rare Disease - Number of genomes | 73517 | Count | 30 March 2023
Findings | Cancer Tumour - Number of genomes | 17003 | Count | 30 March 2023
Findings | Cancer Germline - Number of genomes | 32753 | Count | 30 March 2023
Persons | Cancer Participants | 15624 | Count | 30 March 2023
Persons | Rare Disease Participants | 72874 | Count | 30 March 2023