DUNE Data Catalog: Data Samples

35-ton Prototype

Tips on how to access the data files

If you know the run number you are interested in, find the corresponding file (one file per run) in the list linked above, then look up its location in samweb. Here is a minimal set of commands. Setting up a dunetpc offline environment will also accomplish all of the setups needed.
 $ source /grid/fermiapp/products/dune/setup_dune.sh
 $ setup sam_web_client
 $ setup ifdhc
 $ samweb -e lbne get-file-access-url lbne_r004024_sr01_20151027T154934.root

which prints this out:

gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/lbne/test-data/lbne/raw/00/04/41/51/lbne_r004024_sr01_20151027T154934.root

 $ cd <a directory with a lot of free space>
 $ ifdh cp -D gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/lbne/test-data/lbne/raw/00/04/41/51/lbne_r004024_sr01_20151027T154934.root .

You can also access the file directly in dCache using the /pnfs file system, but the directory name in the url above should
have /pnfs/fnal.gov/usr/lbne/ replaced with /pnfs/lbne/.  Example:

 $ config_dumper /pnfs/lbne/test-data/lbne/raw/00/04/41/51/lbne_r004024_sr01_20151027T154934.root
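
The prefix substitution described above can be scripted. This is only a sketch: the function name url_to_pnfs is made up for illustration, and it assumes the access URL has the same gsiftp://host:port/pnfs/fnal.gov/usr/lbne/... shape as the example printed by samweb.

```shell
# Convert a samweb access URL into the locally mounted /pnfs path.
# ASSUMPTION: the URL looks like gsiftp://host:port/pnfs/fnal.gov/usr/lbne/...
# url_to_pnfs is a hypothetical helper name, not an existing tool.
url_to_pnfs() (
    url="$1"
    long_prefix="/pnfs/fnal.gov/usr/lbne/"
    short_prefix="/pnfs/lbne/"
    # Drop the scheme, host, and port; keep everything from /pnfs onward.
    path="/${url#*://*/}"
    # Swap the long dCache prefix for the short mount point, if present.
    case "$path" in
        "$long_prefix"*) path="${short_prefix}${path#"$long_prefix"}" ;;
    esac
    printf '%s\n' "$path"
)

url_to_pnfs "gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/lbne/test-data/lbne/raw/00/04/41/51/lbne_r004024_sr01_20151027T154934.root"
```

For the example URL this prints the same /pnfs/lbne/... path used in the config_dumper command above. A URL without the long prefix is passed through with only the scheme and host removed.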

To get SAM metadata for a file for which you know the full path:

 $ sam_metadata_dumper /pnfs/lbne/test-data/lbne/raw/00/04/41/51/lbne_r004024_sr01_20151027T154934.root

A more complete metadata printout can be had with this command:

 $ samweb -e lbne get-metadata lbne_r004024_sr01_20151027T154934.root

To list raw data files for a given run:

 $ samweb list-files "run_number=4024 and data_tier=raw"

The "=" signs are optional.  A second value of data_tier, "sliced", marks files that have been run through the splitter.
The files were split using the current 'default' parameters, which take all of every payload; ghost
triggers will therefore be present in this sample, and the size of each event is 15,000 TPC ticks.  Note -- sliced data with application
art, family daqag and version v00_00_01 are known to be buggy -- they have incorrect timestamp information for the Penn
Trigger Board and SSPs.  TPC data should be fine, however.  We will version the sliced data in the future with
the dunetpc version.


To create a SAM dataset definition:

 $ kx509   # first create a certificate from your Kerberos ticket
 $ samweb -e lbne create-definition rawdata_run_8257 'run_number=8257 and data_tier=raw'

To check that the definition contains the files you expect:

 $ samweb list-definition-files rawdata_run_8257
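
When several runs need definitions, the command lines can be generated rather than typed. The sketch below only prints the create-definition commands so they can be reviewed before anything is declared in SAM. The function name print_definition_commands is hypothetical; the naming pattern rawdata_run_<run> and the dimension string are copied from the example above.

```shell
# Print (do not run) one create-definition command per run number.
# print_definition_commands is a hypothetical helper name; the definition
# naming pattern and the dimensions follow the rawdata_run_8257 example.
print_definition_commands() (
    for run in "$@"; do
        defname="rawdata_run_${run}"
        dims="run_number=${run} and data_tier=raw"
        printf "samweb -e lbne create-definition %s '%s'\n" "$defname" "$dims"
    done
)

# Run numbers taken from the examples on this page.
print_definition_commands 8257 4024
```

Once the printed lines look right, they can be pasted into a shell where kx509 has already been run.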

Files can also be queried by contributing components.  There are sixteen RCEs (rce00 through rce15), seven SSPs (ssp01 through ssp07, though ssp07 is not
connected to a detector), and penn01.  This command lists the files that have RCE 0, SSP 2, and the Penn Trigger Board contributing:

 $ samweb -e lbne list-files  "(lbne_data.detector_type like %rce00%) and (lbne_data.detector_type like %penn01%) and (lbne_data.detector_type like %ssp02%)"

Here's an example requiring all components to be present (cut and paste to use):

 $ samweb -e lbne list-files "\
(lbne_data.detector_type like %rce00%) and \
(lbne_data.detector_type like %rce01%) and \
(lbne_data.detector_type like %rce02%) and \
(lbne_data.detector_type like %rce03%) and \
(lbne_data.detector_type like %rce04%) and \
(lbne_data.detector_type like %rce05%) and \
(lbne_data.detector_type like %rce06%) and \
(lbne_data.detector_type like %rce07%) and \
(lbne_data.detector_type like %rce08%) and \
(lbne_data.detector_type like %rce09%) and \
(lbne_data.detector_type like %rce10%) and \
(lbne_data.detector_type like %rce11%) and \
(lbne_data.detector_type like %rce12%) and \
(lbne_data.detector_type like %rce13%) and \
(lbne_data.detector_type like %rce14%) and \
(lbne_data.detector_type like %rce15%) and \
(lbne_data.detector_type like %ssp01%) and \
(lbne_data.detector_type like %ssp02%) and \
(lbne_data.detector_type like %ssp03%) and \
(lbne_data.detector_type like %ssp04%) and \
(lbne_data.detector_type like %ssp05%) and \
(lbne_data.detector_type like %ssp06%) and \
(lbne_data.detector_type like %penn01%) and \
data_tier=raw"
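
The long query above can also be assembled in a loop, which makes it easier to add or drop a component. This is a sketch, not a tested production script: the component names are the ones listed in the text (ssp07 is left out because it is not connected to a detector), and the variable names are arbitrary.

```shell
# Build the "all components present" dimension string with loops.
# Component names come from the text above; ssp07 is deliberately omitted.
components=""
for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15; do
    components="$components rce$i"
done
for i in 01 02 03 04 05 06; do
    components="$components ssp$i"
done
components="$components penn01"

# Start from the data tier and require each component in turn.
dims="data_tier=raw"
for c in $components; do
    dims="$dims and (lbne_data.detector_type like %$c%)"
done

echo "$dims"
# With the samweb environment set up, pass the string to list-files:
#   samweb -e lbne list-files "$dims"
```

The clauses are joined with "and", so the order of the terms should not affect which files are selected.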

Here's an example filtering on the DAQ configuration (the query must be quoted so that the shell does not interpret it):

 $ samweb -e lbne list-files "lbne_data.run_mode ssps_and_ptb_pd_filter_study"

Another useful command:

 $ samweb -e lbne locate-file lbne_r004024_sr01_20151027T154934.root

which gives a dCache directory and, if the file is on tape, a tape label enclosed in parentheses.  Accessing a file
that has migrated to tape and off of dCache should proceed as if it were still in dCache, but with a delay while the data are staged.

Specifying samweb -e dune and samweb -e lbne appears to do almost the same thing; the difference is the list of users
who are able to change the SAM database (declare metadata or dataset definitions).  If you are not
authorized to do either of these things, submit a Service Desk ticket at http://servicedesk.fnal.gov.
You can skip the -e lbne or -e dune by defining the environment variable GROUP to be lbne or dune.


Alex Himmel has a handy script

~ahimmel/bin/sam_query_paths 

which will go straight from a query to a list of dCache paths.

35-ton Useful Run Numbers

Useful Links

Data Quality Monitoring for the 35-ton run: lbne-dqm.fnal.gov

SAM query dimension syntax

SAM user guide

samweb command line reference

To Do: Add more instructions on defining and using datasets in SAM.