WCG restart delayed...until...13 May 2023

Vester

Well-Known Member
USA team member
I am running my one computer, Intel i9-10850K, at NFS@Home until it maxes in the Top Computers list. I may run Folding@home later, but FAH is boring. My first project was United Devices (backing off unable to connect) in January 2002. At this time I am unable to be enthusiastic about any project.
 

supdood

Well-Known Member
USA team member
I've been parked at Universe with two Androids on Einstein. While the Universe credit bump has been nice, I'm more than ready to get back to cancer and Covid work. Realistically for me, though, my computing has been and will continue to be on a slow decline. New hardware is too expensive and even if it wasn't, electricity around here is. We had a jump from regularly overpriced to extremely overpriced this winter with the increase in natural gas, and it doesn't appear that it will be going down for the next 6-month rate period. Donating to these projects is worthwhile, but I'd rather spend the money on other things that improve my experience or that I enjoy: switching to an induction range, getting an EV for that instant torque and silent ride, swapping the water heater for a heat pump version to harvest some heat from the basement, replacing the old steam boiler with a heat pump and backup heater (and it's looking like A/C is going to be more important, even in New England, going forward). All are going to require an awful lot of that overpriced electricity, and I've already maxed out my very limited roof space with 8 solar panels. The electricity company is going to love me. The NG company, not as much.
 

doneske

Well-Known Member
USA team member
After sleeping on it overnight, I think I have reached the acceptance phase. It is what it is. I'm fearful the research teams have started looking at alternatives or might decide to shut down with the data they have now. There hasn't been a lot of discussion concerning ARP other than that they are working to make the data public. They could decide to shut down and use the data they have now. I could see where the original shutdown period allowed the researchers an opportunity to consolidate, analyze, and clean up the work that has come in so far, but their funding doesn't last forever and nothing is being done while the clock continues to tick. If this outage drags on we may lose some projects. WCG was down to about 5 active projects and it wouldn't surprise me if ARP or HST or both ceased processing. LaJolla (OPN) might find an alternative like MIP. I suspect MCM would be the last to go for obvious reasons. After all this, I don't see Krembil wanting to onboard any additional work in the near term. I'm a goal-oriented person so I need goals to shoot for like standings, number of WUs per day, completion of projects or batches, etc. It's very difficult to just sit and watch 6 million WUs of the same thing run day in and day out. I can get enthusiastic about any project as long as there is a goal in sight. I just can't bring myself to support SiDock though. What I'm hearing, at least from the vocal members of the team, is that members are just focused on their project of choice or are slowly deciding to wind down for various reasons. I'm willing to apply my limited resources (until I get moved into my new place) to any project this team thinks it would like to work on.
 

Nick Name

Administrator
USA team member
I haven't followed the transition in great detail. I just figured they would return to production when they're ready. I'm not surprised they ran into problems, but I expected better support from IBM until they started. If that support isn't there, I think they should have moved to open-source solutions even if it took a lot longer to get things operational. They would be better off in the long run. I'm sure they took what they thought was the simplest route to get things going. I'm not ready to throw in the towel here yet; hopefully they get things worked out.
 

supdood

Well-Known Member
USA team member
Another delay, but it seems like they are finally making some progress.

WCG Twitter
"QA testing has finally successfully finished and all the bugs have been resolved. Production environment is being tested right now. Considering the unexpected issues we ran into, we prefer to test it a couple of days more. We will provide further details in the next few days."
 

Nick Name

Administrator
USA team member
I was about to post this myself, so thanks for the update. Hopefully no more showstoppers are found. :)
 

doneske

Well-Known Member
USA team member
They posted a slightly more detailed statement on the website:

"QA testing has finally successfully finished and all the bugs have been resolved. Production environment is being tested right now. Considering the unexpected issues we ran into with the QA system, we prefer to test it a couple of days further. The website and forum are ready to be relaunched, the production environment is successfully querying BOINC locally and we are going to restart BOINC slowly in the next few days. We will provide further details early next week."

If the website is ready, why not go ahead and launch it and let contributors get their information from there? Even if there isn't any work yet. Seems like an all or nothing approach. Wonder what "restart BOINC slowly" means?
 

Vester

Well-Known Member
USA team member
"If the website is ready, why not go ahead and launch it and let contributors get their information from there? Even if there isn't any work yet. Seems like an all or nothing approach. Wonder what "restart BOINC slowly" means?"
Restarting the forum is a good idea, even if there are complaints about delays.

Restarting BOINC slowly probably means limited tasks.
 

doneske

Well-Known Member
USA team member
If I were a Principal Investigator and hadn't been able to get any research done for 3 months going on 4, I would be looking for alternatives, especially when I was originally told two months. Undergrads graduate, postdocs move on, funding terminates. If I were working in a lab with nothing happening, I would be looking for work elsewhere. I see this affecting HST and ARP more than the other two projects. I'm still amazed by the fact that they spent 6 months (Sept - Feb) planning this migration with IBM and still don't have it working more than 3 months after shutdown.
 

Nick Name

Administrator
USA team member
It's not good and doesn't bode well for when production actually starts. We'll just have to wait and hope for the best.
 

Vester

Well-Known Member
USA team member
There have been members with expertise volunteering to help, but there has been no response from WCG on Facebook or Twitter. There was supposed to be an update last Monday and I hear crickets.
 

supdood

Well-Known Member
USA team member
I wonder at what point it is time to scrap the IBM setup and go with a standard BOINC server install? I'm not sure how it would handle the load, but other projects have proven that you can have many discrete applications running on a single project without issue. There is clearly a lack of familiarity and expertise with what IBM transferred, and not having a standard BOINC setup means that they likely can't seek help from the BOINC admin community. I'm unclear on whether this would require them to rebuild from scratch or if it would allow them to ditch some of the IBM-specific components for simplicity.

If they aren't already, Krembil's administrators are going to be chafing at the costs with absolutely zero return at the moment. How much longer do they let the tech team work on this until they decide to cut their losses and shut it down?
 

Jason Jung

Well-Known Member
USA team member
"How much longer do they let the tech team work on this until they decide to cut their losses and shut it down?"
I'm pretty sure they don't have a real tech team. It's literally the MCM researchers trying to go from zero to hero in a few months. I'd imagine whatever IBM handed over would be easier for them to work with than trying to start from scratch. I don't think a generic BOINC server installation is going to cut it for WCG. I'm sure they'll get it working eventually, but lacking staff who deal with these kinds of systems regularly makes me think onboarding new projects in the future may never happen.
 

doneske

Well-Known Member
USA team member
I was sort of thinking along the same lines as Supdood. At this point it might have been better to just set up a standard BOINC install and go from there. There are some large projects out there running on that configuration. However, I kind of go back and forth on that because I would assume they would want to transfer the 14 years of processing history to the new project if they were going to keep the WCG name. That is problem number 1. If they had chosen to start from scratch with a new name and not transfer the data, then what do you do with the active projects on WCG? That's problem number 2. I also think Jason is right that they don't have "IT" people and are trying to run this massive BOINC project with researchers. I'm starting to wonder how they even get research done based on their decision making to date. A real IT organization in this position would have negotiated a minimum 1 year support/knowledge transfer contract with IBM beginning when the IBM WCG contract ended (March). IBM would have provided support for all the open source software (Red Hat software, BOINC config, etc.) and the proprietary software (WebSphere, DB2, etc.) until Krembil got up to speed. I suspect that they could have negotiated a favorable contract based on the fact that IBM was wanting to unload WCG and Krembil was willing to take it. Part of that knowledge transfer would be to help them learn enough to take this thing over at the end of the contract, with milestones written into the contract. The contract could be structured as a fixed price contract or a time and materials contract.
 

Nick Name

Administrator
USA team member
It would make sense to stick with the IBM structure IF there was proper support during the transition period. I expected the former staff to at least get things up and running before handing over the reins but that doesn't seem to have happened. A standard BOINC / open source solutions setup makes even more sense if there isn't a full IT staff.
 

doneske

Well-Known Member
USA team member
From the prior update:

"The website and forum are ready to be relaunched"

Now I guess they aren't ready... 1 step forward and 2 steps back.
 

supdood

Well-Known Member
USA team member
The only good news I get from the update is it appears that they are still dedicated to getting everything up and running. I have the same concerns that many of you have expressed: if it is this difficult to launch, how will they handle the load and onboarding of new projects?
 