The BOINC test drive
Suppose we've solved the supply side of the problem; BOINC has 10 million users, supplying many ExaFLOPS. How do we get more scientists to use it?
The major conference and trade show for scientific computing is Supercomputing. Scientists who do HTC go there. Suppose BOINC had a booth at SC 2022. Scientists walk up, and we give them a flyer. What should it say? What "test drive" experience do we want them to have?
Ideally, in 10 or 15 minutes they'd be running jobs on ~100 CPUs, and there'd be a clear path to scaling up to millions.
The test drive can't include:
- reading any existing BOINC doc
- writing any XML
- doing sysadmin
- creating a web site
- recruiting volunteers
- building apps on Windows, Mac, or Android
- developing validators or assimilators
First, we create a "BOINC app library". It includes a number of widely-used apps (like Autodock, Charm, Rosetta, etc), compiled to run on BOINC (w/ the BOINC library). For each app, the library includes app versions for various platforms, CPU features, and GPUs. Each app version has an associated plan class specification. One of the apps is the VBox wrapper.
These apps are viewed as "secure": running them on a computer doesn't pose a security risk, regardless of the input files and cmdline parameters, even if the job was created by a malevolent hacker. That means we have to be careful about what we put in the library; we need to build the versions ourselves or vet the people who build them. And the apps themselves must not have - for example - the ability to run scripts in input files.
The app library exports a list of the app versions and their hashes. The BOINC client imports this list, so it can know if an app version is from the BOINC library.
In the BOINC client, an attachment to a project can be marked "restricted", in which case the client will only run apps for that project that are from the app library.
Notes:
- maintaining this library could be a lot of work!
- the library could be useful for other purposes; e.g. we could bundle Android app versions with the BOINC Android client
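The restricted-mode check described above can be sketched as follows. This is only an illustration: the use of SHA-256, the in-memory set of hashes, and the names `file_hash` and `may_run` are assumptions, not the BOINC client's actual code or data format.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of an app version's bytes."""
    return hashlib.sha256(data).hexdigest()


def may_run(app_bytes: bytes, restricted: bool, library_hashes: set[str]) -> bool:
    """Decide whether the client may run this app version for a project.

    An unrestricted attachment runs whatever the project sends.
    A restricted attachment runs only app versions whose hash
    appears in the list exported by the app library.
    """
    if not restricted:
        return True
    return file_hash(app_bytes) in library_hashes


# Hypothetical library list, as the client might hold it after import.
library_app = b"bytes of a vetted app version"
library_hashes = {file_hash(library_app)}
```

With this shape, an app version that is not in the library list is refused only when the attachment is restricted; the same file would still run for an unrestricted (trusted) attachment.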
Second, we create a "Demo grid": a set of computers willing to run jobs for anyone, in restricted mode. Could be volunteers, or cluster nodes somewhere, or Amazon spot instances. The BOINC client running on these nodes is attached to an account manager which lets us dynamically attach them to projects. This may as well be an enhanced version of Science United.
Third, we create a BOINC project that I'll call BOINC Central (the name doesn't matter, no one sees it). Its job is to dispatch jobs for users who don't or can't run their own BOINC server. It has all the apps in the library, and all versions, with the plan classes set up (these are the only app versions it has).
Finally, we use Science United as a "switchboard" for dynamically attaching hosts to projects. It knows which hosts are part of the Demo grid. For each project, it knows whether it is
- unvetted
- vetted (partial or full; see below)
This info is used in deciding what projects to attach each host to.
goal: quickly run batches of jobs on computers you don't own
User experience:
- create an account on BOINC Central, Recaptcha, verify email address
2 variants:
1) Command line interface (Condor-like)
install a package
make a "submit file" that specifies a batch of jobs
- app
- input files
- cmdline params
- possible resource usage estimates
run "boinc_submit"
other cmdline commands to
- wait for completion of batch
(or email notification)
- show pending jobs (condor_q)
- abort jobs
- get resource usage of completed jobs
(for use in later submissions)
- get output files of completed jobs
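The submit file format is not specified here. As one hypothetical shape, a batch could name an app, a command-line template, and a list of input files, and expand into one job per input file. The class names, field names, and the `{input}` placeholder below are all assumptions for illustration, not an existing BOINC or `boinc_submit` format.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    app: str
    input_file: str
    cmdline: str


@dataclass
class Batch:
    app: str                  # name of an app in the library
    cmdline_template: str     # "{input}" is replaced by each input file name
    input_files: list[str] = field(default_factory=list)

    def jobs(self) -> list[Job]:
        """Expand the batch into one job per input file."""
        return [
            Job(self.app, name, self.cmdline_template.format(input=name))
            for name in self.input_files
        ]


# The app name and arguments are placeholders, not real app options.
batch = Batch(
    app="example_app",
    cmdline_template="--input {input} --seed 1",
    input_files=["in_001.dat", "in_002.dat"],
)
jobs = batch.jobs()
```

A submit command would then hand each `Job` to the server; resource usage estimates could be added as optional fields on `Batch`.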
2) Web interface: go to BOINC Central
pick an application
specify (through a web interface) a set of cmdline args
and/or a range of input files
click submit
email notification option
web interfaces for showing status, aborting
download output files as zip
How to implement
- Use BOINC Central for dispatching jobs
use existing job-submission and file-management RPCs
- Use the Demo grid;
SU attaches all Demo nodes to BOINC Central
(in restricted mode, though apps coming from there are secure).
There are limits on
- how much computing you get per week
- size of input/output files
possible variant:
- you can pay to get more computing
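A rough sketch of how the limits above might be enforced at submission time. The units (CPU hours per week, bytes per file), the numbers, and the names `Limits` and `check_submission` are assumptions; the text does not say how limits would be measured.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    cpu_hours_per_week: float   # hypothetical weekly computing allowance
    max_file_bytes: int         # hypothetical cap on any input/output file


def check_submission(limits: Limits, used_cpu_hours: float,
                     batch_cpu_hours: float, file_sizes: list[int]) -> list[str]:
    """Return the reasons to reject a batch; an empty list means accept."""
    problems = []
    if used_cpu_hours + batch_cpu_hours > limits.cpu_hours_per_week:
        problems.append("weekly computing limit exceeded")
    if any(size > limits.max_file_bytes for size in file_sizes):
        problems.append("file too large")
    return problems


# Made-up numbers for a free account.
free_tier = Limits(cpu_hours_per_week=1000.0, max_file_bytes=100 * 1024 * 1024)
```

Under this sketch, the paid variant would just be a second `Limits` instance with larger values.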
This is similar to Open Science Grid but
- no vetting of job submitters.
- has the BOINC "polymorphic app" concept
This is the "test drive" experience. It gives anyone - scientist or not - sporadic access to a few hundred computers. This may be all that some scientists need.
One of the apps in the library is the VBox wrapper, so you can bring your own apps but they have to run in VMs. Use boinc2docker (and TACC's extensions) to automate converting any Linux/Intel app to a Docker image. Could also develop tools for managing a set of these images. (my earlier "tire-kicking" google doc describes this)
Notes:
- no result validation is done; Demo grid nodes are assumed to be reliable.
- you don't have to specify job sizes (CPU, RAM, disk). We could have a system that estimates these for you, based on past jobs
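One simple way to estimate job sizes from past jobs is to take the largest value seen so far for each resource and add a safety margin. The sketch below assumes that approach; the field names, the default values, and the factor of 1.5 are illustrative choices, not part of BOINC.

```python
def estimate_job_size(past_jobs: list[dict], default: dict,
                      safety_factor: float = 1.5) -> dict:
    """Estimate per-job resource needs from completed jobs of the same app.

    With no history, fall back to the default. Otherwise use the largest
    observed value for each resource, scaled by a safety factor.
    """
    if not past_jobs:
        return dict(default)
    return {
        key: safety_factor * max(job[key] for job in past_jobs)
        for key in default
    }


# Hypothetical defaults and history.
default = {"cpu_seconds": 3600.0, "ram_bytes": 1e9, "disk_bytes": 1e9}
history = [
    {"cpu_seconds": 100.0, "ram_bytes": 2e8, "disk_bytes": 1e7},
    {"cpu_seconds": 200.0, "ram_bytes": 4e8, "disk_bytes": 2e7},
]
estimate = estimate_job_size(history, default)
```

Using the maximum rather than the mean errs on the side of over-estimating, which costs some scheduling efficiency but avoids jobs failing for lack of memory or disk.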
Similar, but user has their own BOINC server;
avoids storage and bandwidth bottleneck of central server
Also lets you attach your own computers directly.
- get a Linux machine visible on Internet
could be Cloud node
- install BOINC server on that machine and create a project
could be from a package
could be BOINC server Docker
could be from a VM image
- BOINC server is a black box to user
- run commands to install apps from library
- submit jobs through same cmdline or web interface
- register your BOINC server with SU
no vetting
server is registered with SU as "unvetted project"
Implementation
Uses Demo grid hosts
Science United attaches Demo grid hosts to unvetted projects in restricted mode
Vetting:
partially vetted: we believe that
- your identity and affiliation are true
- you're doing the kind of computing you claim
(science area, location)
This gives you access to more computing but you can only use library apps
  fully vetted: partial vetting plus
- we believe that your apps are not malware
- we believe that you do code signing
This lets you use your own non-VM apps
Partially vetted
You can use either the central or distributed model.
Your apps run on all Science United hosts (currently about 5,000).
Fully vetted
Use with distributed model (your own server)
You can add your own apps and app versions.
May as well use the current BOINC tools for this;
requires logging in to your project server,
code-signing, maybe writing XML plan class specs
Your project is registered on Science United,
and it's attached to hosts based on science area
and computing resources (that's how SU currently works)
Your apps run on all Science United hosts in trusted mode
Your project is listed on the BOINC web site,
and in the project list in the client GUI,
so volunteers can attach to it explicitly.
Notes:
- result validation becomes an issue,
mostly because of possible credit cheating.
Need to figure out how to do this in a way that doesn't require
users to write validators.
Or get rid of credit
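The three vetting levels above can be summarized as a rule Science United might apply when deciding how to attach a host to a project. This sketch is one reading of the text: it assumes partially vetted projects are attached in restricted mode on all SU hosts (since they can only use library apps), and it leaves out SU's matching on science area and computing resources. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Vetting(Enum):
    UNVETTED = "unvetted"
    PARTIAL = "partially vetted"
    FULL = "fully vetted"


def attach_mode(vetting: Vetting, host_in_demo_grid: bool) -> Optional[str]:
    """Return how a host attaches to a project, or None if it should not.

    - unvetted: Demo grid hosts only, restricted mode
    - partially vetted: any SU host, restricted mode (library apps only)
    - fully vetted: any SU host, trusted mode (project's own apps allowed)
    """
    if vetting is Vetting.UNVETTED:
        return "restricted" if host_in_demo_grid else None
    if vetting is Vetting.PARTIAL:
        return "restricted"
    return "trusted"
```

For example, `attach_mode(Vetting.UNVETTED, host_in_demo_grid=False)` returns `None`, so an ordinary volunteer host is never attached to an unvetted project.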
How hard is this to implement?
Things I can do:
  BOINC library framework
BOINC Central
Changes to SU
Changes to BOINC client
Things I'd need help with:
Job submission interfaces
Things others would have to do:
build app versions for BOINC library