[galaxy-dev] using Galaxy for map/reduce
james at jamestaylor.org
Tue Aug 2 12:43:01 EDT 2011
On Aug 2, 2011, at 10:12 AM, Andrew Straw wrote:
> 1) My first specific problem is that loading many datasets (e.g. 250)
> can be extremely slow.
What browser are you using?
> 2) My second specific problem is that applying a workflow with N steps
> to many datasets creates even more datasets (Nx250 additional datasets).
> issues I haven't diagnosed further, but the console in which I'm running
> run.sh indicates many errors of the type "Exception AssertionError:
> AssertionError('State <sqlalchemy.orm.state.MutableAttrInstanceState
> object at 0x7f5c18c47990> is not present in this identity map',) in
> <bound method MutableAttrInstanceState._cleanup of
> <sqlalchemy.orm.state.MutableAttrInstanceState object at
> 0x7f5c18c47990>> ignored". Furthermore the webserver gets slow and my
> nginx frontend proxy gives 504 gateway time-outs.
Yes, creating all the jobs and datasets for a workflow is relatively slow right now. We have some optimizations for this that are not in the mainline (they are not well tested); however, there is a limit to how fast it can be when so many new datasets and objects are being created.
The better solution is probably to move workflow creation into a background process. Starting the workflow would just save the initial state, and a background process would actually create all the datasets and jobs and get it running. The downside is that the history would not be completely populated by the time the page returns.
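To make the proposed split concrete, here is a minimal sketch of the idea, assuming an in-process queue and worker thread. Everything in it (`WorkflowInvocation`, `WorkflowScheduler`, `invoke`, `wait_until_idle`) is hypothetical and is not Galaxy's API. A deployment would presumably persist the initial state to the database rather than an in-memory queue so it survives restarts. The request path only records the invocation and returns; the worker then creates the N x 250 outputs.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class WorkflowInvocation:
    """Hypothetical record of a requested workflow run (not a Galaxy class)."""
    workflow_steps: int
    input_datasets: list
    state: str = "new"
    created_outputs: list = field(default_factory=list)


class WorkflowScheduler:
    """Hypothetical scheduler: the web request only enqueues, a background thread does the heavy work."""

    def __init__(self):
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def invoke(self, steps, inputs):
        # Called from the web request: save the initial state and return at once.
        invocation = WorkflowInvocation(workflow_steps=steps, input_datasets=list(inputs))
        self._pending.put(invocation)
        return invocation

    def _run(self):
        # Background loop: expand each pending invocation into its outputs.
        while True:
            invocation = self._pending.get()
            invocation.state = "scheduling"
            # One output per (step, input) pair: N x 250 objects for N steps and 250 inputs.
            for step in range(invocation.workflow_steps):
                for dataset in invocation.input_datasets:
                    invocation.created_outputs.append((step, dataset))
            invocation.state = "scheduled"
            self._pending.task_done()

    def wait_until_idle(self):
        # Block until every enqueued invocation has been fully expanded.
        self._pending.join()


scheduler = WorkflowScheduler()
inv = scheduler.invoke(steps=3, inputs=[f"dataset_{i}" for i in range(250)])
# Here inv.state may still be "new" or "scheduling": the history is not yet fully populated.
scheduler.wait_until_idle()
print(inv.state, len(inv.created_outputs))  # scheduled 750
```

The trade-off from the paragraph above shows up directly: `invoke` returns before any outputs exist, so a page rendered right after it would show a partially populated history.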
James Taylor, Assistant Professor, Biology / Computer Science, Emory University