Ask Lemmy
A Fediverse community for open-ended, thought-provoking questions
These production clusters I have at work are a nightmare to (re)boot. They run in a rather hostile environment, so sometimes we need to take it all down due to external factors. The rule of thumb is that it takes an hour to shut down and two hours to start.
There are 6 servers, and they have to start (and stop) in the correct order. Each takes around 10 minutes to boot, so if all is to be done correctly, it's roughly 40 minutes. The rest of the startup procedure is checking internal stuff as well as interfacing with various robotics and misc.
It's possible to gamble a bit, though: start one, wait a bit, and then start the next one, hoping that they come online in the correct order. But sometimes they don't, and this gamble results in having to shut down everything and start over.
...If you follow procedure, that is. I know the system well enough that I can start all machines at the same time and just interrogate and sort out any misbehaving components, thus cutting down the startup time a lot.
So yeah, while the system takes a lot of time to start, it's mostly due to procedural reasons. In theory it could all be booted and ready in ~15 minutes if we made the startup sequence more forgiving.
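For anyone curious what a "more forgiving" sequence might look like, here is a rough sketch of the start-everything-then-interrogate approach described above. It is only an illustration: the server names and the `power_on`, `is_healthy`, and `repair` callables are hypothetical stand-ins, not anything from the real cluster's tooling.

```python
from typing import Callable, Iterable, List


def forgiving_startup(
    servers: Iterable[str],
    power_on: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    repair: Callable[[str], None],
) -> List[str]:
    """Start every server at once, then check them in the required order.

    Returns the names of the servers that needed repair.
    """
    ordered = list(servers)
    for name in ordered:
        power_on(name)  # kick off all boots without waiting in between

    repaired = []
    for name in ordered:  # interrogate in the order the procedure expects
        if not is_healthy(name):
            repair(name)  # fix only the misbehaving component
            repaired.append(name)
    return repaired


# Demo with fake callables; a real setup would talk to actual hardware here.
state = {f"srv{i}": "off" for i in range(1, 7)}  # six servers, hypothetical names
flaky = {"srv3"}  # pretend this one comes up in a bad state


def power_on(name: str) -> None:
    state[name] = "bad" if name in flaky else "ok"


def is_healthy(name: str) -> bool:
    return state[name] == "ok"


def repair(name: str) -> None:
    state[name] = "ok"


repaired = forgiving_startup(state, power_on, is_healthy, repair)
print(repaired)
```

The point of the design is that a server coming up out of order triggers a targeted fix rather than a full shutdown and restart.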
That's brutal. Is it clustered data storage of some sort? All the most offensive startup and shutdown sequences I've seen are on giant storage systems.
You nailed it. Each server has 36 hard drives forming three RAIDs. These 18 RAIDs form a disaster-tolerant BeeGFS volume of 1.6 PB.
On top of that, there's a bunch of highly specialized geophysical software, an Oracle database, and misc mundane services.
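As a quick sanity check on the storage numbers in this thread, the arithmetic can be sketched like this. The drives-per-RAID figure assumes the 36 drives are split evenly across the three RAIDs, and the per-RAID capacity is just the 1.6 PB volume divided evenly (decimal units), ignoring any mirroring or parity overhead; neither assumption is confirmed above.

```python
SERVERS = 6             # from the thread
DRIVES_PER_SERVER = 36  # from the thread
RAIDS_PER_SERVER = 3    # from the thread
VOLUME_PB = 1.6         # from the thread

total_drives = SERVERS * DRIVES_PER_SERVER  # all hard drives in the cluster
total_raids = SERVERS * RAIDS_PER_SERVER    # matches the 18 RAIDs mentioned

# Assumption: each server's drives are split evenly across its RAIDs.
drives_per_raid = DRIVES_PER_SERVER // RAIDS_PER_SERVER

# Assumption: the volume is spread evenly, 1 PB = 1000 TB, no redundancy overhead.
avg_tb_per_raid = VOLUME_PB * 1000 / total_raids

print(total_drives, total_raids, drives_per_raid, round(avg_tb_per_raid, 1))
```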