Feedback

Do you have any questions about how the data has been calculated? Do you see something odd in our data, and would like us to take a closer look? Did you download our data a few months ago, and are not sure whether there have been improvements and new releases?

Please use the form below to contact us, or write to us at jaume.bertranpetit@upf.edu. We will answer you and, unless you prefer otherwise, we will keep the answer on this page, so other people will be able to check it.

 

19 thoughts on “Feedback”

  1. I am interested in downloading the Fst and iHH values between populations. Could you let me know how I can download them using your tool?

    • Hello Tes,
      you can download it using the UCSC-Table function.
      Go to our browser and click on the “Tables” link. Then, open the “Group” and “track” menus and select the tracks you want to download. Finally, click on “get output” to get the scores. Please also note that there are many options of interest: for example, you can download only the data for a given region, and also redirect the output directly to Galaxy or GREAT.

  2. Is there any way to learn more about the database you used? I’m interested in the technical details of how the data was processed from the source and how it was uploaded to the database.

    • Hi Sooraj,

      we downloaded the VCF files from the 1000 Genomes project, and on top of these we applied a pipeline developed in house to calculate all the tests for positive selection presented.

      The output of these tests was converted to the bigBed and Wiggle formats, which are defined in the UCSC web browser help page [1]. The Wiggle format is used for all the tests whose output is per SNP, while bigBed is used for those that give an output per region. For example, iHS is calculated for every SNP, and is thus stored as Wiggle; CLR is calculated by region, and is thus converted to bigBed. Actually, I think that with the latest versions of the UCSC browser you could use bigBed for all types of data.

      The browser we used is actually a mirror of the UCSC Genome Browser [2], running on one of our machines in Barcelona. If you need to produce something similar to our browser, but do not have the time to configure a UCSC browser mirror, I recommend trying a UCSC Track hub instead [3], which is easier to implement and also more flexible, as it can be exported to the main UCSC browser installation.

      [1] https://genome.ucsc.edu/FAQ/FAQformat.html#format6
      [2] https://genome.ucsc.edu/cgi-bin/hgGateway
      [3] https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
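
The split between per-SNP and per-region output described above can be sketched as follows. This is only an illustration, not the in-house pipeline: the function names and input layouts are hypothetical, the per-SNP scores are written as a Wiggle `variableStep` block, and the per-region scores are written as plain BED lines with the score stored in the name column (the standard BED score column expects an integer from 0 to 1000). Converting the BED text to bigBed would be a separate step.

```python
def to_wiggle(chrom, snp_scores):
    """Format per-SNP scores as a Wiggle variableStep block.

    snp_scores: iterable of (position, score), positions 1-based as in Wiggle.
    """
    lines = [f"variableStep chrom={chrom}"]
    for pos, score in sorted(snp_scores):
        lines.append(f"{pos} {score}")
    return "\n".join(lines)


def to_bed(chrom, region_scores):
    """Format per-region scores as BED lines (0-based start, exclusive end).

    The score is placed in the name column (column 4); this is an
    assumption made for the example, not the layout used by the browser.
    """
    lines = []
    for start, end, score in sorted(region_scores):
        lines.append(f"{chrom}\t{start}\t{end}\t{score}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_wiggle("chr1", [(200, 1.5), (100, 0.5)]))
    print(to_bed("chr1", [(0, 2000, 3.2)]))
```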

  3. Hi

    I wanted to use the data available here for some of our analysis. While looking at the Fst values calculated for SNPs, I noticed that the corresponding p-values for some of them go up to 2.5. Could you please elaborate on that?

    • Hi Vivek,
      the values presented as p-values in the data are the -log10(rank) of the scores, which is why you see values higher than one. We convert the ranks to -log10 because this makes them easier to visualize in the browser (high peaks for high scores). Moreover, this way any value higher than 3 is significant at the 0.001 level.

      More info on how the p-values are calculated is here:
      https://hsb.upf.edu/?page_id=594
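
A minimal sketch of a rank-based transformation of this kind, under stated assumptions: the rank is taken as a fraction of the number of scores (rank 1 is the highest score), ties are broken arbitrarily, and the function name is hypothetical. The exact procedure used for the browser is the one documented at the link above.

```python
import math


def neg_log10_rank(scores):
    """Return -log10(rank / N) for each score, where rank 1 is the highest score.

    Assumption: the empirical p-value of a score is its rank divided by the
    total number of scores; ties are broken by input order.
    """
    n = len(scores)
    # Indices of the scores, from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    result = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        result[idx] = -math.log10(rank / n)
    return result


if __name__ == "__main__":
    values = neg_log10_rank([0.1, 0.9, 0.5, 0.3])
    print([round(v, 3) for v in values])
```

With 1000 scores, the top-ranked one has an empirical p-value of 1/1000 and is plotted at about 3, which matches the threshold mentioned in the reply.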

  4. I’m using your data as part of my doctoral thesis. I wanted to use one of your examples as a positive control, but in nearly all of them the ‘Fay and Wu’s H’ score is not consistent with the other methods to detect positive selection.
    Does this mean that ‘Fay and Wu’s H’ is not a powerful method?
    What sequence did you use as the outgroup for this analysis?

    Best Regards

    • Hi Masyh,
      thank you very much for your question.
      We will reply to you shortly – these are busy days for us 🙂
      Gio

      • Hi Masyh,
        first of all, for a good discussion of the power and drawbacks of Fay and Wu’s H, as well as a comparison with other methods, I recommend the article by Zeng et al. (2006), and this resource. In general, when looking at Fay and Wu’s H results, it is useful to also turn on the Tajima’s D track, as these two tests provide complementary information. There is also a test called DH, proposed in the Zeng et al. (2006) article, which combines Fay and Wu’s H with Tajima’s D; however, we do not provide DH in our browser.

        Fay and Wu’s H is more powerful at detecting recent selective sweeps in which the final frequency of the selected allele is high. This may not be the case for every selective sweep, which is why we don’t see Fay and Wu’s H peaks in all the known cases of positive selection.

        In any case, I checked the examples posted in our page, and I think that at least in a few cases Fay and Wu’s H shows high p-values. For example, both SLC24A5 and SLC45A2 have a peak in Europeans, while OCA2 has a peak in Africans, and EDAR has high scores both in Europeans and Asians.

        • Thanks for the reply.
          Fay and Wu’s H has a signal on these genes, but the peaks are not as strong as with other methods, which was a little surprising.
          And one final question: in the XP-CLR method, scores are calculated for sequences with a 2 kb gap between nearby sequences. Could you explain the rationale behind this? (I’m not an expert in XP-CLR, and I couldn’t find the answer anywhere else.)

          • Hi Masyh,

            In general, Fay & Wu’s H is not a powerful method to detect selection, although in conjunction with Tajima’s D it can be. For this browser and for other projects carried out in our lab, we tested many simulated selection scenarios and compared them against a neutral distribution, and in general the methods based on Extended Haplotype Homozygosity (EHH) turned out to be much better at distinguishing events of selection. The methods based on the SFS, instead, are not as good at distinguishing events of selection in simulations.
            Overall, our suggestion is not to use Fay & Wu’s H as the reference method to detect positive selection, because methods based on EHH are much better.

            Regarding the outgroup used in the analysis, we used the ancestral alleles provided by the 1000 Genomes project. You can download them in this folder of the 1000 Genomes FTP, and you can get an idea of how they are calculated by looking at the README in the same folder, and at this FAQ.

            Regarding XP-CLR: please note that some of the tests for detecting positive selection are calculated by SNP, while others are calculated by window. For example, iHS and XP-EHH are calculated for every SNP; in contrast, all the methods based on the Site Frequency Spectrum (SFS), like CLR, XP-CLR, and Fay & Wu’s H, must be calculated by window. This depends on how the test is calculated: for example, you could not calculate XP-CLR for a single SNP, because the test is based on the allele frequencies of the SNPs in a region.
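
To illustrate why an SFS-based statistic needs a window, here is a minimal sketch of Fay and Wu’s H for one window, using the commonly cited form H = θπ − θH computed from the derived-allele counts of all segregating SNPs in that window. The function name and input layout are hypothetical, and this is not the pipeline used for the browser.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H for one window.

    derived_counts: derived-allele count of each SNP in the window.
    n: number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    pairs = n * (n - 1)
    theta_pi = 0.0
    theta_h = 0.0
    for i in derived_counts:
        if not 0 < i < n:
            continue  # skip sites that are not segregating in the sample
        theta_pi += 2 * i * (n - i) / pairs  # pairwise-difference estimator
        theta_h += 2 * i * i / pairs         # weights high-frequency derived alleles
    return theta_pi - theta_h


if __name__ == "__main__":
    # Mostly high-frequency derived alleles in a sample of 10 chromosomes.
    print(fay_wu_h([9, 9, 8], 10))
    # Mostly low-frequency derived alleles.
    print(fay_wu_h([1, 1, 2], 10))
```

An excess of high-frequency derived alleles pushes H negative, and the value only makes sense as a sum over several SNPs, so it is reported per window rather than per site.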

        • Hi Masyh,
          just another comment, after discussion with the other authors of this database.

          In general, Fay & Wu’s H is not a powerful method to detect selection, although in conjunction with Tajima’s D it can be. For this browser and for other projects carried out in our lab, we tested many simulated selection scenarios and compared them against a neutral distribution, and in general the methods based on Extended Haplotype Homozygosity (EHH) turned out to be much better at distinguishing events of selection. The methods based on the SFS, instead, are not as good at distinguishing events of selection in simulations.

          Regarding the outgroup used in the analysis, we used the ancestral alleles provided by the 1000 Genomes project. You can download them in this folder of the 1000 Genomes FTP, and you can get an idea of how they are calculated by looking at the README in the same folder, and at this FAQ.

          Overall, our suggestion is not to use Fay & Wu’s H as the reference method to detect positive selection, because methods based on EHH are much better.

  5. Hello – the download links you have provided appear to be broken. Is there another way to download your raw data for annotation purposes?

    Thank you!

    • Hi Will,

      We had a problem with the FTP server. Our IT team is fixing it. It may take one or two weeks.

      You can try to download the data directly from the Tables function in the Browser.

      Marc

  6. Hello. I would find it very helpful if I could upload custom tracks to your browser, but currently this does not seem possible, as I get the following error message: Couldn’t connect to database customTrash on sit-mysql2.b.upf.edu as UCSC_GBrowser. Access denied for user ‘UCSC_GBrowser’@’%’ to database ‘customTrash’. Is there any chance this functionality could be added to your browser? Thanks!

    Dan

    • Hi Dan,

      Yes, since this is a custom installation of the UCSC Browser, we can’t allow custom tracks to be uploaded. I’m sorry. However, the results for the Hierarchical Boosting method are uploaded on the fly to the official UCSC Browser, where it is possible to upload custom tracks.

      Marc

  7. Hello, can your browser be used to find the ancestral allele of a set of SNPs, searching by name (e.g. rs87834) or chromosomal position?

    Thanks very much!

    • Hello Lucas,

      It is not possible to obtain the ancestral allele from our database. We only provide information on its frequency per population. You should download the original 1KG VCF files and extract this information from there.

      Marc
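
As a rough sketch of the extraction step, the following reads the ancestral allele from one VCF data line. It assumes the INFO column carries an `AA=` entry, possibly followed by `|`-separated annotations, which should be checked against the header of the release you download; the function name is hypothetical.

```python
def ancestral_allele(vcf_line):
    """Return the ancestral allele from a VCF data line, or None if absent.

    Assumption: the INFO column (8th field) contains an entry such as
    "AA=G" or "AA=g|||"; only the part before the first "|" is returned.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        if entry.startswith("AA="):
            allele = entry[3:].split("|")[0]
            return allele or None
    return None


if __name__ == "__main__":
    line = "1\t100\trs1\tG\tA\t100\tPASS\tAA=g|||;AF=0.5"
    print(ancestral_allele(line))
```

To look up a set of SNPs, the same function can be applied to the lines whose ID column (third field) matches the rs identifiers of interest.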

Comments are closed.