bkgoksel
Would definitely keep my All-City Gorilla Monsoon. It's a drop-bar steel-frame bike with mountain-bike-ish gearing, 2.1" tires (~53mm), and a fairly upright geometry. I can do non-technical singletrack or triple-digit miles on the road with it and enjoy everything in the spectrum in between. The geometry, the gearing, and the steel frame mean I can really pack it up with stuff for bikepacking. Hell, these days I use it for my commute and am thankful for those chonky tires given the state of the potholes on my local roads.
For me, loading the 8x7B @ 8.0bpw onto two cards takes around 30 minutes (using tabbyAPI, but I've also used https://github.com/pytorch-labs/gpt-fast and it took just as long to load; it could get 90 t/s inference but obviously has no HTTP server support etc.). TBH this is loading the model weights from an HDD on the Windows filesystem while using WSL, so there's a lot of room for non-GPU-related slowness. I don't mind, because I load the model once at boot and keep the server up constantly. If you're curious I can try copying the weights to an SSD on the Linux partition and measure how long loading takes.
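If anyone wants to check whether their disk (rather than the GPU path) is the bottleneck, here's a minimal sequential-read timer sketch. The file name in the usage comment is just a placeholder, not an actual shard name:

```python
import time
from pathlib import Path

def time_weight_read(path, chunk_mb=64):
    """Read a file sequentially in chunks and return throughput in MB/s.

    If this number is close to your raw HDD speed (~100-150 MB/s),
    model loading is almost certainly disk-bound, not GPU-bound.
    """
    size = Path(path).stat().st_size
    start = time.perf_counter()
    with open(path, "rb") as f:
        # read() returns b"" at EOF, which ends the loop
        while f.read(chunk_mb * 1024 * 1024):
            pass
    elapsed = time.perf_counter() - start
    return (size / (1024 * 1024)) / elapsed

# Example (placeholder path):
# print(time_weight_read("model.safetensors"), "MB/s")
```

Running this once against a weight shard on the HDD and once on the SSD would settle the question without reloading the whole model.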
I have 2x 3090 Ti on an old-ish board with x16 and x4 slots. I've found tabbyAPI to be a good backend for my purposes and am able to run Mixtral 8x7B at 8bpw with 32k context, getting around 40 t/s. Loading models onto the cards takes insanely long, but once they're there, inference seems reasonably fast (compared to numbers reported by people running 2x 3090 on better boards).
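Worth noting the x4 slot probably isn't what makes loading slow. A rough back-of-the-envelope (the ~47 GB figure for 8x7B at 8bpw and the per-lane bandwidth are my assumptions, PCIe 3.0 assumed):

```python
def pcie_transfer_estimate(weights_gb, lanes, per_lane_gbs=0.985):
    """Rough lower bound on host->GPU copy time over PCIe 3.0.

    ~0.985 GB/s of usable bandwidth per lane (assumed figure,
    after 128b/130b encoding and protocol overhead).
    """
    bandwidth = lanes * per_lane_gbs  # GB/s for the whole link
    return weights_gb / bandwidth     # seconds

# Assuming ~47 GB of weights for Mixtral 8x7B at 8 bpw:
# pcie_transfer_estimate(47, 4)  -> ~12 s over an x4 link
```

So even the x4 link should move the whole model in seconds, not half an hour; the 30-minute load times point at the HDD/WSL path instead.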