API
IRMA.DataSetEntry
IRMA.SHist
IRMA.Stopwatch
IRMA.Stopwatch
IRMA.Stopwatch
OnlineStats.Hist
IRMA.addFileToDataSetEntry
IRMA.analyzeInputFiles
IRMA.analyzeInputFiles
IRMA.analyzeInputFiles
IRMA.asNamedTuple
IRMA.chooseDataSets
IRMA.deserializeArray
IRMA.displayDataSetEntries
IRMA.makeGetStructureVisitor
IRMA.mergeStatsCollectionWithSHist
IRMA.mpiAllgatherSerialized
IRMA.mpiGatherSerialized
IRMA.mpi_shared_array
IRMA.partitionDS
IRMA.rankConfig
IRMA.rankTimings
IRMA.rankTotalTime
IRMA.stamp
IRMA.visitH5Contents
IRMA.DataSetEntry — Type
DataSetEntry is an object that represents an HDF5 dataset whose contents may be spread across many files.
IRMA.SHist — Method
SHist(h::Hist)
Create a SHist (Static Histogram) from an already filled Hist.
Note that the SHist is immutable.
IRMA.Stopwatch — Type
Stopwatch
Stopwatch is an object that keeps track of MPI.Wtime when asked.
At construction time, the "start" time is recorded.
Call `stamp(sw, stamp)` to record the current MPI.Wtime and label
it with "stamp".
Call `asNamedTuple(sw)` for transferring to other MPI ranks. The
resulting NamedTuple will be an `isbits` type.
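A minimal usage sketch (assuming the program has already set up MPI; the stamp labels here are illustrative):
```julia
using MPI
using IRMA

MPI.Init()

sw = Stopwatch()            # records the "start" time at construction
# ... read input ...
stamp(sw, "readInput")      # label this interval; returns the time since the previous stamp
# ... analyze ...
stamp(sw, "analyze")

nt = asNamedTuple(sw)       # isbits NamedTuple, ready for MPI transport (e.g. a gather)
```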
IRMA.Stopwatch — Method
Stopwatch(nt::NamedTuple)
Create a Stopwatch from a previously filled named tuple
IRMA.Stopwatch — Method
Stopwatch()
Create a Stopwatch. The "start" entry will automatically be made
and the MPI.Wtime will be filled in.
OnlineStats.Hist — Method
Hist(sh::SHist)
Create an OnlineStats.Hist from a SHist
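A hedged round-trip sketch; it assumes an `OnlineStats.Hist` built with explicit edges (the edge range and sample data are illustrative):
```julia
using OnlineStats
using IRMA

h = Hist(-5:0.1:5)            # mutable OnlineStats histogram with fixed edges
fit!(h, randn(10_000))        # fill it

sh = SHist(h)                 # immutable static histogram
h2 = Hist(sh)                 # back to an OnlineStats.Hist
```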
IRMA.addFileToDataSetEntry — Method
Add a file to the DataSetEntry. This also records which row numbers should come from this file.
IRMA.analyzeInputFiles — Function
analyzeInputFiles(path, outFileName="out.jld2")
Analyze input files and write dataset data to a jld2 file.
Use this one for a "CLI-like" experience. For example:
analyzeInputFiles( joinpath(ENV["CSCRATCH"], "irmaData2", "2C"), "2C_analyze.jld2")
IRMA.analyzeInputFiles — Method
analyzeInputFiles(inFiles::Vector{String}, groups, inDataSets)
Fill `groups` and `inDataSets` from a vector of file names to be read and analyzed.
A good way to get a list of file names is to use Glob.glob(path).
IRMA.analyzeInputFiles — Method
analyzeInputFiles(path, groups, inDataSets)
Fill `groups` and `inDataSets` from the files found at `path`, to be read and analyzed.
IRMA.asNamedTuple — Method
asNamedTuple(sw::Stopwatch)
Convert `sw` into a `NamedTuple` for MPI transport.
IRMA.chooseDataSets — Function
chooseDataSets(inDataSets, selectThese=[], group="")
Choose the datasets to use from the file. You need the DataSetEntry dictionary (`inDataSets`) and a
string vector (`selectThese`) of the dataset names you want. If many come from the same group, you can
set `group` to that group name and use relative names in `selectThese` (entries of `selectThese` that are
absolute paths do not get the group name applied).
A vector of the matching DataSetEntry elements is returned.
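A sketch of the selection logic; `inDataSets` is assumed to come from `analyzeInputFiles`, and the group and dataset names here are made up:
```julia
using IRMA

# inDataSets::Dict of DataSetEntry objects, e.g. produced by analyzeInputFiles.
# Relative names pick up the group prefix ...
selected = chooseDataSets(inDataSets, ["energy", "time"], "/ReconEastClusters")

# ... while an absolute path in selectThese ignores it.
mixed = chooseDataSets(inDataSets, ["/runInfo/runNumber", "energy"], "/ReconEastClusters")
```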
IRMA.deserializeArray — Method
deserializeArray(a, s)
When you use MPI.Gather, you get one long array with the contents from all of the ranks mushed together.
They need to be separated and deserialized.
Returns an array of deserialized objects.
`a` is the array of data, all mushed together.
`s` is an array of the serialized data size for each rank.
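A small sketch of what this undoes, assuming the payloads were produced with the standard library `Serialization` (as a serialized gather would):
```julia
using Serialization
using IRMA

# Simulate what a gather hands back: per-rank serialized payloads, concatenated.
payloads = map([(rank = 0, x = 1.0), (rank = 1, x = 2.0)]) do obj
    io = IOBuffer()
    serialize(io, obj)
    take!(io)
end
a = vcat(payloads...)           # one long byte array, all ranks together
s = length.(payloads)           # serialized size contributed by each rank

objs = deserializeArray(a, s)   # back to an array of the original objects
```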
IRMA.displayDataSetEntries — Method
displayDataSetEntries(inDataSets::Dict)
Print information about the DataSetEntry objects in the dictionary.
IRMA.makeGetStructureVisitor — Method
makeGetStructureVisitor(groups, datasets)
Make a visitor that populates the `groups` and `datasets` structures.
If the visited object is a group, its name is added to the `groups` list if it is not already there.
If the visited object is a dataset and this is the first time we've seen it, it is added to the DataSetEntry structure.
Then, every time we see this dataset, we add its size and a mapping to the input file to the structure.
IRMA.mergeStatsCollectionWithSHist — Method
mergeStatsCollectionWithSHist(s1::Series, s2::Series)
Because Static Histograms are immutable, we cannot use the standard `OnlineStats.merge`
function (actually, it is `OnlineStatsBase.merge`) because the underlying function
is `merge!`.
IRMA.mpiAllgatherSerialized — Method
mpiAllgatherSerialized(obj, comm)
Serializes the object, determines the size, calls MPI.Allgather on the sizes,
calls MPI.Gather on the serialized data, deserializes the data.
Returns an array of deserialized data from all of the ranks.
All of the ranks get the full data in an array.
obj is the object to serialize and send
comm is the MPI communicator
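A minimal sketch of an all-gather of arbitrary (serializable) objects; meant to be run under `mpiexec`:
```julia
using MPI
using IRMA

MPI.Init()
comm = MPI.COMM_WORLD

# Each rank contributes its own object ...
obj = (rank = MPI.Comm_rank(comm), payload = rand(3))

# ... and every rank receives the array of objects from all ranks.
everything = mpiAllgatherSerialized(obj, comm)
```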
IRMA.mpiGatherSerialized — Method
mpiGatherSerialized(obj, isroot, root, comm)
Serializes the object, determines the size, calls MPI.Gather on the sizes,
calls MPI.Gather on the serialized data, deserializes the data.
Returns an array of deserialized data from all of the ranks. Only the root rank
gets the full gathered array
obj is the object to serialize and send
isroot is a boolean which is true if this rank is the root rank
root is the root rank id
comm is the MPI communicator
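The gather-to-root counterpart, again a sketch meant to be run under `mpiexec`:
```julia
using MPI
using IRMA

MPI.Init()
comm = MPI.COMM_WORLD
root = 0
isroot = MPI.Comm_rank(comm) == root

obj = (rank = MPI.Comm_rank(comm), payload = rand(3))

gathered = mpiGatherSerialized(obj, isroot, root, comm)
isroot && println("gathered objects from $(length(gathered)) ranks")
```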
IRMA.mpi_shared_array — Method
mpi_shared_array(node_comm, Type, size; owner_rank)
From https://github.com/JuliaParallel/MPI.jl/blob/master/test/testsharedwin.jl
Create a shared array, allocated by the process with rank `owner_rank` on the `node_comm` provided
(i.e. when `MPI.Comm_rank(node_comm) == owner_rank`). Assumes all processes on `node_comm` are on the
same node, or, more precisely, that they can create/access a shared memory block between them.
Usage:
nrows, ncols = 100, 11
const arr = mpi_shared_array(MPI.COMM_WORLD, Int, (nrows, nworkers_node), owner_rank=0)
IRMA.partitionDS — Method
partitionDS(dsLength, nRanks)
Given the length of a dataset (or anything, really), determine and return partitions over
nRanks MPI ranks that are as close to the same size as possible. This is really just a wrapper
around Distributed.splitrange with some added error checking to produce nice messages.
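A quick sketch; the exact ranges come from `Distributed.splitrange`:
```julia
using IRMA

# Split a 10-row dataset across 4 ranks as evenly as possible.
parts = partitionDS(10, 4)
# parts is a vector of 4 ranges, e.g. something like 1:3, 4:6, 7:8, 9:10
```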
IRMA.rankConfig — Function
rankConfig(comm)
Determines the MPI configuration of this rank, in three spaces:
- Global space - space of all ranks
- Node space - The space of ranks on a particular node
- Among Node Roots space - The space of node-root ranks
This function determines:
* The global rank number (myRank)
* The number of global ranks (nprocs)
* The # of the root rank in global space (rootRank)
* True if this rank is global root (isRoot)
* The number of ranks on this node (nprocsOnNode)
* The node-space rank number (myRankOnNode)
* The # of the root rank in node-space (rootRankOnNode)
* True if this rank is a root rank in node-space (isRootOnNode)
* The number of nodes in use (nNodes)
* The # of the node this rank is on (myNode)
* If this rank is a node root rank, # of rank within that space (myRankAmongNodeRoots)
* If this rank is a node root rank, the # of ranks in that space (nprocsAmongNodeRoots)
For the last two, disregard the values if this rank is not a node-root rank (they then describe
the non-root ranks in the Among Node space, which isn't all that useful).
Returns a NamedTuple of the information above, along with the commOnNode communicator.
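A usage sketch, assuming the returned NamedTuple uses the field names listed above:
```julia
using MPI
using IRMA

MPI.Init()

cfg = rankConfig(MPI.COMM_WORLD)

cfg.isRoot && println("$(cfg.nprocs) ranks on $(cfg.nNodes) node(s)")

if cfg.isRootOnNode
    # Only node-root ranks participate meaningfully in the
    # "Among Node Roots" space (myRankAmongNodeRoots, nprocsAmongNodeRoots).
end
```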
IRMA.rankTimings — Method
rankTimings(arrayOfNamedTuples)
Process an array of NamedTuples (e.g. produced on each MPI rank by `asNamedTuple`,
then gathered and saved), turning them into an array of NamedTuples
of timing differences for each step.
IRMA.rankTotalTime — Method
rankTotalTime(arrayOfNamedTuples)
Return an array of the total time for each rank.
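A sketch tying the pieces together: stamp a Stopwatch on each rank, gather the NamedTuples to the root, and summarize (the labels and the use of `mpiGatherSerialized` are illustrative):
```julia
using MPI
using IRMA

MPI.Init()
comm = MPI.COMM_WORLD
isroot = MPI.Comm_rank(comm) == 0

sw = Stopwatch()
# ... work ...
stamp(sw, "step1")
# ... more work ...
stamp(sw, "step2")

# Gather one NamedTuple per rank to the root.
nts = mpiGatherSerialized(asNamedTuple(sw), isroot, 0, comm)

if isroot
    perStep = rankTimings(nts)     # per-rank timing differences for each step
    totals  = rankTotalTime(nts)   # per-rank total time
end
```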
IRMA.stamp — Method
stamp(sw::Stopwatch, stamp::String)
For a `Stopwatch` `sw`, record the MPI time and the stamp.
It returns the elapsed time from the previous stamp.
IRMA.visitH5Contents — Method
visitH5Contents(inH5, isMine, visitor)
Walk the contents of an H5 file, visiting each group and
dataset in the hierarchy.
inH5 is the opened HDF5 file object or group object to walk.
This function walks within HDF5 file and group objects and will
recursively dive into a hierarchy of groups. The visited object is passed
to the visitor function (it must handle whatever object is passed in).
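A hedged sketch using HDF5.jl; the file name is a placeholder, and since the role of `isMine` isn't spelled out above, the example simply passes a predicate that accepts every object:
```julia
using HDF5
using IRMA

# "data.h5" is a placeholder for one of your input files.
h5open("data.h5", "r") do f
    # visitor: print the path of every group/dataset we are handed
    visitH5Contents(f, _ -> true, obj -> println(HDF5.name(obj)))
end
```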