[galaxy-user] Experience with Loading NGS data on standalone instance of galaxy
Abhishek Pratap
abhishek.vit at gmail.com
Fri Oct 2 15:24:31 EDT 2009
Hi Greg
Unfortunately it is not working for me. I made sure I cleared my
browser cache before re-viewing it.
I have set the option as suggested by you in the universe_wsgi.ini file.
-Abhi
On Fri, Oct 2, 2009 at 2:53 PM, Greg Von Kuster <ghv2 at psu.edu> wrote:
> Hello Abhishek,
>
> Add this to your universe_wsgi.ini file:
>
> allow_library_path_paste = True
>
> Then, clicking the down-arrow on the upload form
>
> Create new data library datasets ▼
>
> will give you 4 options, 1 of which is:
>
> Upload files from file system paths
>
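
A quick way to confirm that the option is being read from the expected place is a small check like the one below. It is only a sketch: it assumes the option sits in the [app:main] section of universe_wsgi.ini, and the function name library_path_paste_enabled is made up for illustration; it is not part of Galaxy.

```python
import configparser


def library_path_paste_enabled(ini_text, section="app:main"):
    """Return True if allow_library_path_paste is switched on in `section`.

    Assumption: the setting belongs in the [app:main] section of
    universe_wsgi.ini. Interpolation is disabled so that values containing
    '%' elsewhere in the file do not trip the parser.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # fallback=False covers both a missing section and a missing option.
    return parser.getboolean(section, "allow_library_path_paste", fallback=False)


if __name__ == "__main__":
    sample = "[app:main]\nallow_library_path_paste = True\n"
    print(library_path_paste_enabled(sample))
```

To check a real installation, pass in the file contents, e.g. library_path_paste_enabled(open("universe_wsgi.ini").read()). If this reports False even though the line was added, the line may have ended up under a different section header.
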
> Greg Von Kuster
> Galaxy Development Team
>
>
> Abhishek Pratap wrote:
>>
>> Hi Greg
>>
>> I have updated my Galaxy repository to changeset 2825. I don't see the
>> checkbox on the "Upload File" page. Am I missing something?
>>
>> Thanks,
>> -Abhi
>>
>> On Fri, Oct 2, 2009 at 10:21 AM, Greg Von Kuster <ghv2 at psu.edu> wrote:
>>>
>>> Change set 2812 will be included in a release to the distribution today -
>>> here are details of a new option that we're hoping will provide what is
>>> needed for most labs.
>>>
>>> Add a new option, 'allow_library_path_paste', that adds a new upload page
>>> ("Upload files from file system paths") to the admin-side library upload
>>> pages. This form contains a textarea that allows Galaxy admins to paste
>>> any number of file system paths (files or directories) from which Galaxy
>>> will import library datasets, saving the directory structure (if
>>> desired). Since such ability allows admins access to any file on the
>>> Galaxy server which is readable by Galaxy's system user, this option is
>>> disabled by default, and system administrators should take care in
>>> assigning Galaxy administrators when this feature is enabled. Controls on
>>> what files are accessible to this tool based on ownership or other
>>> properties can be added at a later date if there is sufficient interest
>>> for such features.
>>>
>>> This commit also includes a checkbox on the "Upload directory of files"
>>> page (as well as the new "Upload files from file system paths" page
>>> above) that will prevent Galaxy from copying data to its files directory
>>> (by default, 'database/files/'). This is useful for large library
>>> datasets that live in their own managed locations on the file system:
>>> it prevents the existence of duplicate copies of datasets (but means
>>> administrators must take care to manage data, since moving or removing
>>> the data from its Galaxy-external location will render these datasets
>>> invalid within Galaxy).
>>>
>>> One unique feature to be aware of: when using the "Copy data into
>>> Galaxy?" checkbox on the "Upload directory of files" page, any symbolic
>>> links encountered in the chosen import directory will be made absolute
>>> and dereferenced ONCE. This allows administrators to link large datasets
>>> to the import directory, rather than having to make full copies, while
>>> being able to delete such links after importing. Only the first symlink
>>> (the one in the import directory itself) is dereferenced; all others
>>> remain. See the following for an example:
>>>
>>> library_import_dir = /galaxy/import
>>>
>>> % ls -lR /galaxy/import
>>> /galaxy/import:
>>> total 6
>>> drwxr-xr-x   2 nate     nate         512 Oct  1 11:31 link/
>>>
>>> /galaxy/import/link:
>>> total 10
>>> lrwxrwxrwx   1 nate     nate          71 Oct  1 10:38 1.bed -> ../../../home/nate/galaxy/test-data/1.bed
>>> lrwxrwxrwx   1 nate     nate          60 Oct  1 10:38 2.bed -> /home/nate/galaxy/test-data/2.bed
>>> lrwxrwxrwx   1 nate     nate          11 Oct  1 10:38 3.bed -> ../../3.bed
>>> lrwxrwxrwx   1 nate     nate          35 Oct  1 11:30 4.bed -> ../../galaxy_symlink/test-data/4.bed
>>> lrwxrwxrwx   1 nate     nate          41 Oct  1 11:31 5.bed -> /galaxy/galaxy_symlink/test-data/5.bed
>>>
>>> % ls -l /galaxy/3.bed
>>> lrwxrwxrwx   1 nate     nate          60 Oct  1 10:39 /galaxy/3.bed -> /home/nate/galaxy/test-data/3.bed
>>>
>>> % ls -l /galaxy/galaxy_symlink
>>> lrwxrwxrwx   1 nate     nate          44 Oct  1 11:30 /galaxy/galaxy_symlink -> /home/nate/galaxy/
>>>
>>> In this example,
>>>
>>> 1.bed is a relative symbolic link to the real 1.bed.
>>>
>>> 2.bed is an absolute symlink to the real 2.bed.
>>>
>>> 3.bed is a relative symlink to ../../3.bed, aka /galaxy/3.bed, which
>>> itself is a symlink to the real 3.bed.
>>>
>>> 4.bed is a relative symlink which follows another symlink
>>> (/galaxy/galaxy_symlink) to the real 4.bed.
>>>
>>> 5.bed is an absolute symlink in the same fashion as 4.bed.
>>>
>>> If the 'link' server directory is chosen on the "Upload directory of
>>> files" page, and "Copy data into Galaxy?" is checked "No", the following
>>> files will be referenced by Galaxy:
>>>
>>> /home/nate/galaxy/test-data/1.bed
>>> /home/nate/galaxy/test-data/2.bed
>>> /galaxy/3.bed
>>> /galaxy/galaxy_symlink/test-data/4.bed
>>> /galaxy/galaxy_symlink/test-data/5.bed
>>>
>>> The Galaxy administrator may now safely delete /galaxy/import/link, but
>>> should take care not to remove the referenced symbolic links
>>> (/galaxy/3.bed, /galaxy/galaxy_symlink).
>>>
>>> Not all symbolic links are dereferenced because it is assumed that if an
>>> administrator links to a path in the import directory which itself is (or
>>> contains) links, that is the preferred path for accessing the data.
>>>
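
The "made absolute and dereferenced ONCE" behaviour described above can be sketched roughly as follows. This is not Galaxy's actual code, only an illustration under assumptions: the helper name dereference_once is hypothetical, and it normalizes ".." components textually, which matches the example only because /galaxy/import/link is a real directory rather than a symlink.

```python
import os


def dereference_once(path):
    """Follow only the first symlink at `path` and return an absolute path.

    Hypothetical helper for illustration. Any symlinks further along the
    chain (or inside the target path) are deliberately left unresolved.
    """
    path = os.path.abspath(path)
    if not os.path.islink(path):
        # Regular files and directories are returned as-is (made absolute).
        return path
    target = os.readlink(path)
    if not os.path.isabs(target):
        # A relative link target is relative to the directory holding the link.
        target = os.path.join(os.path.dirname(path), target)
    # Collapse "../" components textually; do not resolve further symlinks.
    return os.path.normpath(target)
```

With the layout above, dereference_once("/galaxy/import/link/3.bed") would give /galaxy/3.bed (itself still a symlink), which is why that link must not be removed afterwards.
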
>>>
>>>
>>> Oliver Hofmann wrote:
>>>>
>>>> Dear all,
>>>>
>>>>
>>>> to echo what Abhi said: we are also currently looking for ways to
>>>> automatically import data sets (libraries) into Galaxy without having to
>>>> manually trigger the import via the administration interface, and
>>>> ideally while keeping the data in the original place. The idea here is
>>>> to have multiple tools all point at the original 'source data' without
>>>> having to replicate terabytes of data.
>>>>
>>>> Not quite sure how feasible this is in practice, but it certainly would
>>>> be incredibly helpful.
>>>>
>>>> Best,
>>>>
>>>>    Oliver
>>>>
>>>>
>>>>
>>>>
>>>> On 28 Sep 2009, at 14:24, Abhishek Pratap wrote:
>>>>
>>>>> Hi Greg
>>>>>
>>>>> Thanks for a quick reply and for making some of the requested changes.
>>>>> However, I am still not sure whether importing NGS data will help in the
>>>>> long run.
>>>>>
>>>>> For centers generating NGS data, which could be 2-3 TB/week depending on
>>>>> the number of sequencers, I think importing another copy of the raw data
>>>>> into the Galaxy workspace will demand a lot of disk space. I understand
>>>>> it is a neat way of doing things, as it becomes agnostic of the raw data
>>>>> location, but it might not be the best way of handling huge data in the
>>>>> long run for centers like ours.
>>>>>
>>>>> Please correct me if I am wrong. I think we could also have a simple
>>>>> option to use the data for analysis from its current location without
>>>>> having to import it, also storing results at the same location. That
>>>>> way, even if the data set is moved in future, the analysis stays with it.
>>>>>
>>>>> Let me know what you feel. Just out of curiosity, I will be happy to
>>>>> know if there are any other good reasons for importing the data into
>>>>> the Galaxy workspace.
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> On Mon, Sep 28, 2009 at 9:28 AM, Greg Von Kuster <ghv2 at psu.edu> wrote:
>>>>> Hello Abhishek,
>>>>>
>>>>> The Galaxy distribution includes the enhancements to which I previously
>>>>> referred for uploading history files. Uploading files to a history now
>>>>> creates a Galaxy job just like any other tool, and can be run on a
>>>>> cluster node, allowing upload of very large files. The initial pass of
>>>>> this work is also completed for uploading to a Data Library, but this
>>>>> enhancement is still in test, so it should soon be available in the
>>>>> distribution.
>>>>>
>>>>> Do you want to avoid having to import at all (e.g. allow Galaxy to refer
>>>>> to datasets that live in their original locations)? This is not
>>>>> currently possible, but if this is what you are looking for, we can
>>>>> consider some additional options on the current upload form, or possibly
>>>>> a new, separate form.
>>>>>
>>>>>
>>>>> Greg Von Kuster
>>>>> Galaxy Development Team
>>>>>
>>>>>
>>>>> Abhishek Pratap wrote:
>>>>> Hi Greg, Anton and all
>>>>>
>>>>> Just wondering if there has been any progress on this end. I am sorry I
>>>>> was not able to follow up on Assaf's suggestion due to other things at
>>>>> work.
>>>>>
>>>>> I did try the latest version of Galaxy, and it looks like the files are
>>>>> still transferred over HTTP before they can be used in the Galaxy
>>>>> workspace. Also, I would again like to highlight that many labs might
>>>>> want to use a local instance of Galaxy and prefer to point to a local
>>>>> path where the file is stored. That way we will have the benefits of
>>>>> both using a cool GUI and processing data stored locally.
>>>>>
>>>>> Let me know if you guys need some feedback or have more questions. I
>>>>> will be happy to discuss them.
>>>>>
>>>>> best,
>>>>> -Abhi
>>>>>
>>>>> On Tue, Jul 21, 2009 at 4:26 PM, Greg Von Kuster <ghv2 at psu.edu> wrote:
>>>>>
>>>>>   Hello Abhishek,
>>>>>
>>>>>   We are currently in the process of significantly enhancing the
>>>>>   current Galaxy upload utilities, and the new version should
>>>>>   eliminate the issue you've raised about the time needed to upload
>>>>>   large files via HTTP ( though not the need to make an initial copy
>>>>>   of the file in the Galaxy environment ). However, it will probably
>>>>>   not be ready for release for a few more weeks, so if you can take
>>>>>   advantage of Assaf's script in the meantime, that's great. I can't
>>>>>   guarantee that all Galaxy features will function correctly if you do
>>>>>   this though.
>>>>>
>>>>>   Assaf, have you found that using your script breaks anything?
>>>>>
>>>>>   Also, if you upload a file to a library rather than a history,
>>>>>   multiple users can "import" the library dataset into their history
>>>>>   for analysis, but there is only 1 file on disk ( users are pointing
>>>>>   to it from their histories ). But uploading a file to a history
>>>>>   will create a new copy of the file each time it is uploaded.
>>>>>
>>>>>   Greg Von Kuster
>>>>>   Galaxy Development Team
>>>>>
>>>>>
>>>>>
>>>>>   Abhishek Pratap wrote:
>>>>>
>>>>>       Hi All
>>>>>
>>>>>       @Greg : Please find my comments below.
>>>>>
>>>>>       On Tue, Jul 21, 2009 at 10:44 AM, Greg Von Kuster <ghv2 at psu.edu> wrote:
>>>>>
>>>>>           Hello Abhi,
>>>>>
>>>>>           Can you clarify the steps you took that produced the
>>>>>           behavior? See my comments below.
>>>>>
>>>>>           Anton Nekrutenko wrote:
>>>>>
>>>>>               Abhishek:
>>>>>
>>>>>               Let's talk. This is an area of active current
>>>>>               development. We are looking at implementing a universal
>>>>>               fastq-like format or supporting multiple formats. Perhaps
>>>>>               we should join efforts in ironing out specifications.
>>>>>
>>>>>
>>>>>               anton
>>>>>               galaxy team
>>>>>
>>>>>
>>>>>               On Jul 20, 2009, at 5:18 PM, Abhishek Pratap wrote:
>>>>>
>>>>>                   Hi All
>>>>>
>>>>>
>>>>>                   I recently came to know about NGS analysis on Galaxy
>>>>>                   during ISMB. Getting excited, I tried a couple of
>>>>>                   things, basically to play with it.
>>>>>
>>>>>                   A few comments: I may have interpreted something
>>>>>                   described below in the wrong way. My apologies
>>>>>                   beforehand.
>>>>>
>>>>>
>>>>>
>>>>>                   On a standalone installation of Galaxy, I was trying
>>>>>                   to explore one FASTQ (sequence) file. It takes
>>>>>                   considerable time (> 20 min) for a FASTQ file (2 GB)
>>>>>                   to get uploaded.
>>>>>
>>>>>           Are you using the Galaxy upload utility to create an item in
>>>>>           your history that points to the dataset file on disk?
>>>>>
>>>>>
>>>>>       Yes, that is precisely correct. I am trying to upload a Solexa
>>>>>       FASTQ file, but on a standalone Galaxy installation, from my
>>>>>       local file system.
>>>>>
>>>>>           I am not sure what is the rationale
>>>>>
>>>>>                   behind that. Ideally I think there should be no need
>>>>>                   to upload such heavy files into the workspace.
>>>>>
>>>>>           A data file that originates from a place external to Galaxy
>>>>>           must be uploaded into Galaxy so that the disk file can be
>>>>>           placed in the location configured in the Galaxy config file.
>>>>>           Also, when data is uploaded to Galaxy ( either to a history
>>>>>           or a library ), several database table settings are created
>>>>>           that are used by various Galaxy features.
>>>>>
>>>>>           They could actually be used straight
>>>>>
>>>>>
>>>>>       Thanks for the clarification, but I am not sure this will help
>>>>>       a lot of people who are interested in installing and running
>>>>>       Galaxy locally, mainly for the following reasons. Maybe it is
>>>>>       just local to me.
>>>>>
>>>>>       A. We already have one instance of the data saved on the local
>>>>>          file system.
>>>>>       B. Making another copy via Galaxy will eat away a lot of space
>>>>>          in the long run.
>>>>>       C. The time needed to import the files into Galaxy space is huge.
>>>>>
>>>>>                   away by the path specified.
>>>>>
>>>>>           What do you mean by "the path specified"?
>>>>>
>>>>>
>>>>>
>>>>>       Well, what I meant was a way to specify the path of the file/run
>>>>>       on the local file system, so that Galaxy could pick it up
>>>>>       directly from there rather than uploading it into its own space.
>>>>>       Now I understand this might not work, based on the way the
>>>>>       system was designed.
>>>>>
>>>>>
>>>>>           Also is there any way to access the
>>>>>
>>>>>                   scripts for analysis on the command line? I know
>>>>>                   this undermines the main aim of working with Galaxy,
>>>>>                   but right now I am concerned about the
>>>>>                   performance/time.
>>>>>
>>>>>           You should be able to run any Galaxy tool from the command
>>>>>           line as long as you have all of the tool's required binaries
>>>>>           in your path. However, running a tool from within Galaxy
>>>>>           should generally not be any slower than running it outside
>>>>>           of Galaxy, depending, of course, on what you are doing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>       OK, I was under the impression that running from the shell would
>>>>>       eliminate the step of uploading them into Galaxy file space.
>>>>>
>>>>>
>>>>>       -Abhi
>>>>>
>>>>>                   I will be happy to discuss more about this in case
>>>>>                   you have some comments/questions for me.
>>>>>
>>>>>
>>>>>
>>>>>                   Best,
>>>>>                   -Abhi
>>>>>
>>>>>
>>>>>
>>>>>                   -----------------------------
>>>>>
>>>>>                   Abhishek Pratap
>>>>>
>>>>>                   Bioinformatics Software Engineer
>>>>>
>>>>>                   Institute for Genome Sciences
>>>>>
>>>>>                   School of Medicine, Univ of Maryland
>>>>>
>>>>>                   801, W. Baltimore Street, Baltimore, MD 21209
>>>>>
>>>>>                   Ph: (+1)-410-706-2296
>>>>>
>>>>>                   www.igs.umaryland.edu/
>>>>>
>>>>>                   _______________________________________________
>>>>>                   galaxy-user mailing list
>>>>>                   galaxy-user at bx.psu.edu
>>>>>                   http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
>>>>>
>>>>>               Anton Nekrutenko
>>>>>               http://nekrut.bx.psu.edu
>>>>>               http://galaxyproject.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Research Associate    Department of Biostatistics
>>>> Associate Director    Bioinformatics Core
>>>>                      Harvard School of Public Health
>>>> Skype: ohofmann       Phone: +1 (617) 365 0984
>>>>
>>>>
>>>>
>>>
>>
>>
>
>