Condor week summary

From Cheaha

Revision as of 14:56, 8 May 2012

Condor week 2012, UW-Madison, May 1 - May 5, 2012

Attendees: John-Paul Robinson, Poornima Pochana, Thomas Anthony

Website: http://research.cs.wisc.edu/condor/CondorWeek2012/

Condor Week is a four-day annual event that gives collaborators and users the chance to exchange ideas and experiences, to learn about the latest research, and to influence our short- and long-term research and development directions.


Day 1: Tutorials

Basic Introduction to using Condor: Karen Miller

Background and HTC definitions: job, ClassAds, matchmaking, central manager, submit host, execute host

What Condor does on submit: Condor bundles up the executable and input files, locates a machine, runs the job, and returns the output to the submit host.

Requirements (needs), Rank (preferences)

Condor ClassAds: used to describe aspects of each item outside Condor. Two kinds: job ClassAd and machine ClassAd.

Matchmaking: requirements, rank, and priorities (fair-share allocation)

Getting started: choose a universe, make the job batch-ready, write a submit file, run condor_submit

Universe: the execution environment

Batch-ready: the job runs without interaction (as if in the background); make input, output, and data files available

Submit description file: # starts a comment; commands on the left are not case sensitive, filenames are

Good advice: always have a log file

File transfer: Transfer_Input_Files, Transfer_Output_Files

Should_Transfer_Files: YES (no shared file system), NO (use shared FS), or IF_NEEDED

Emails: Notification = Complete, Never, Error, or Always

Job identifier: cluster.process, e.g. 20.1, 20.2, etc.

Multiple jobs: to give each job its own directory (named after the process ID), set InitialDir to run_0, run_1, etc. A single Queue statement queues all 1,000,000 jobs.

Queue 100000

Use macro substitution: $(Process) expands to the process ID, so one line covers every directory: InitialDir = run_$(Process)
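
The submit-file notes above can be pulled together in a short sketch. The Python below only builds the text of a submit description file; it does not talk to Condor. The helper name build_submit_file, the executable name analyze, and the input file params.txt are hypothetical, and the choice of the vanilla universe is an assumption made for illustration.

```python
def build_submit_file(executable, n_jobs, input_files):
    """Return the text of a submit description file (illustrative sketch only)."""
    lines = [
        "# one cluster, n_jobs processes, each in its own run_<Process> directory",
        "universe = vanilla",  # assumed universe for this example
        f"executable = {executable}",
        "initialdir = run_$(Process)",  # $(Process) expands to 0, 1, 2, ...
        "output = job.out",
        "error = job.err",
        "log = job.log",  # good advice from the talk: always have a log file
        "should_transfer_files = YES",  # assumes no shared file system
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        "notification = Error",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 3 processes of an executable called "analyze"
text = build_submit_file("analyze", 3, ["params.txt"])
print(text)
```

The printed text could be saved to a file and passed to condor_submit, provided the run_0, run_1, ... directories already exist.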


Condor and Workflows: Nathan Panike

Introduction: what is a workflow? A sequence of connected steps.

Launch and forget.

DAGMan: dependencies define the possible orders of job execution.
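
As a rough illustration of how dependencies constrain execution order, the sketch below describes a small diamond-shaped workflow, renders it with the JOB and PARENT ... CHILD lines of a DAGMan input file, and computes one valid execution order. The job names A to D and the submit file names (a.sub, etc.) are hypothetical, and the ordering is computed with Python's standard library, not by DAGMan itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on
parents = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def dag_file_text(parents):
    """Render the dependencies as DAGMan input-file lines."""
    lines = [f"JOB {job} {job.lower()}.sub" for job in sorted(parents)]
    for child in sorted(parents):
        if parents[child]:
            lines.append(f"PARENT {' '.join(sorted(parents[child]))} CHILD {child}")
    return "\n".join(lines) + "\n"


dag_text = dag_file_text(parents)
# One order that respects every dependency; B and C may appear in either order
order = list(TopologicalSorter(parents).static_order())
print(dag_text)
print(order)
```

A always runs first and D last; B and C have no dependency on each other, which is why the dependencies define possible orders and not a single one.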


Pegasus - A system to run, manage and debug complex workflows on top of Condor: Karan Vahi

Scientific workflows: large monolithic applications broken into smaller jobs

Why workflows: portability, scalability, reuse, reproducibility, and recovery by the workflow management system (WMS)

Pegasus: runs on a local desktop, a local Condor pool, or a campus cluster

Pegasus GUI

Mapping

Workflow monitoring: SQLite and MySQL, Python API to query; transfers executables as part of the workflow


Basic Condor Administration: Alan De Smet

Starting a job: condor_master runs on all machines and starts the other processes.

Central manager: master, negotiator, collector

Collector: the daemon that knows about the other daemons

Submit machine: master, schedd; schedd ---> shadow

Compute machine: master, startd; startd ---> starter ---> launches the job

condor_compile ---> relinks the job so that it calls the Condor syscall library
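
The role-to-daemon mapping above can be written down as a small lookup table. The sketch below turns a set of roles into a DAEMON_LIST configuration line; the dictionary ROLE_DAEMONS and the helper daemon_list are hypothetical names for this example. The shadow and starter are left out on the assumption, consistent with the arrows above, that the schedd and startd spawn them as needed.

```python
# Daemons that condor_master starts for each machine role (from the notes above)
ROLE_DAEMONS = {
    "central manager": ["MASTER", "COLLECTOR", "NEGOTIATOR"],
    "submit": ["MASTER", "SCHEDD"],
    "execute": ["MASTER", "STARTD"],
}


def daemon_list(roles):
    """Build a DAEMON_LIST line for a machine that plays the given roles."""
    daemons = []
    for role in roles:
        for daemon in ROLE_DAEMONS[role]:
            if daemon not in daemons:  # keep first occurrence, drop repeats
                daemons.append(daemon)
    return "DAEMON_LIST = " + ", ".join(daemons)


# A machine that both submits and executes jobs
line = daemon_list(["submit", "execute"])
print(line)
```

For a machine that both submits and runs jobs this prints DAEMON_LIST = MASTER, SCHEDD, STARTD.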

Configuration file

/etc/condor/condor_config

LOCAL_CONFIG_FILE (comma-separated list)

A long entry can be split across multiple lines with a trailing backslash (\).
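
To make the backslash continuation rule concrete, here is a minimal sketch of a reader for NAME = value lines. It is a simplification written for this summary, not Condor's actual configuration parser (it ignores macros and other features), and the host name and file paths in the sample are made up.

```python
def parse_config(text):
    """Parse NAME = value lines, joining lines that end in a backslash."""
    settings = {}
    pending = ""  # accumulates the pieces of a continued entry
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Drop the backslash and keep reading the same logical entry
            pending += line[:-1].rstrip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


# Hypothetical configuration text with one continued entry
sample = """\
# hypothetical local settings
CONDOR_HOST = cm.example.org
LOCAL_CONFIG_FILE = /etc/condor/a.local, \\
    /etc/condor/b.local
"""
config = parse_config(sample)
print(config)
```

The two physical lines of LOCAL_CONFIG_FILE come back as one comma-separated value.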

Policy: specified in condor_config; ends up in the slot ClassAd

Machine: one computer, managed by one startd

START policy

RANK: floating point; larger numbers are ranked higher

SUSPEND and CONTINUE

PREEMPT (polite), KILL (SIGKILL)

Slot states

Custom slot attributes: dynamic attributes are set with the STARTD_CRON_* settings

Job priorities

condor_userprio: a lower number means more machines.

Real priority and priority factor:

Priority factor: default is 1; assigned on a user-by-user basis

PREEMPTION_REQUIREMENTS = False  # no preemption
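
A small worked example may help with "lower number means more machines". The sketch assumes that a user's effective priority is the real priority multiplied by the priority factor, and that machines are shared in inverse proportion to effective priority. Both are simplifications of the negotiator's behavior as the author of this example understands it; the user names and numbers are made up.

```python
def effective_priority(real_priority, priority_factor=1.0):
    """Effective priority = real priority x priority factor (assumed model)."""
    return real_priority * priority_factor


def fair_shares(users, total_slots):
    """Split slots in inverse proportion to each user's effective priority."""
    weights = {
        name: 1.0 / effective_priority(real, factor)
        for name, (real, factor) in users.items()
    }
    total = sum(weights.values())
    return {name: total_slots * w / total for name, w in weights.items()}


# Hypothetical users: same real priority, bob has a priority factor of 2
users = {"alice": (10.0, 1.0), "bob": (10.0, 2.0)}
shares = fair_shares(users, 90)
print(shares)  # alice gets about twice as many slots as bob
```

Doubling bob's priority factor doubles his effective priority, so in this model he ends up with about half as many machines as alice.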

Tools

condor_config_val

condor_config_val -v CONDOR_HOST

condor_config_val -config


condor_status -master

condor_status -long (everything)

condor_status -format '%s' Arch -format '%s\n'

-constraint  -format

condor_q -analyze

Debug level: D_FULLDEBUG D_COMMAND

Day 2: Talks

Day 3: Talks

Day 4: Discussion Panels