An EPiC Adventure!

In late October 2009, I formed Electric Pillow Company (aka EPiC), LLC (a Delaware company) as a home for intellectual property and business arising from my avocation of biomechanical R&D. Upon being RIF’d from Sun Microsystems in November of 2009, I immediately shifted to making EPiC my full-time endeavour. While the charter purpose for EPiC remains R&D, my primary activities for the foreseeable future will remain centered on the delivery of consulting and training in the area of computer systems performance and capacity, which I’ll deliver under the trade name of “EPiC Performance Associates”. Government customers can easily contract for my services via GSA Schedule GS-10F-0144J, and all other customers can inquire with me directly.

In my almost 13 years at Sun, I was blessed to learn quite a lot about what goes wrong with computer systems performance and with the processes and politics that arise when things don’t work. I played a major leadership role in Sun’s internal communities focused on performance and Oracle, and was the driving force in the creation of the Sun team that handles performance-related service calls. I cherish the many amazing – and sometimes confounding – colleagues and customers with whom I’ve had the privilege of working. I now feel like I’ve graduated — with honors! In my “post-grad” work, I will continue seeking opportunities to apply and extend my knowledge and skills in the areas of computer systems performance and capacity, and I will continue to be active in forums such as the Computer Measurement Group (CMG) and the annual Hotsos Symposium.

For years I’ve fought the notion that “premium talent is too expensive” in industry – though everyone of that opinion always seems to expect the very best medical care if they themselves ever happen to end up in an Emergency Room! When the stakes are high, experience counts! After years of “smoke jumping” for Sun, I’m now feeling like an IT analog to “Doctors Without Borders”. My business model for consulting will be simple: leverage premium experienced resources for triage, diagnosis, strategy, and training – and thereby empower IT organizations to succeed using their own wits and staff talent. Between myself and my broad network of expert associates, I believe I’ll be able to help some companies find their way out of their performance and capacity miseries, and onto smoother waters.

This entry was posted on January 16, 2010 at 4:25 pm and is filed under General, Performance & Capacity, Sun & Solaris.

8 Responses to “An EPiC Adventure!”

Cary Millsap Says:
January 16, 2010 at 8:05 pm
Bob, I believe you are going to help a lot of people and do very well for yourself. Best wishes. –Cary
- bobsneed Says:
  January 19, 2010 at 9:36 pm
  Thank you, Cary! My book of ready-to-go sermons includes a major pitch for Method R! I’ll be in touch.
stefan Parvu Says:
January 21, 2010 at 1:41 pm
Hi Bob,

All the best with the new company! I hope you can travel to the EU too 🙂
If we ever need a real expert in DB performance issues, you are at the
top of my list.

cheers
- bobsneed Says:
  January 21, 2010 at 2:53 pm
  @ Stefan – Yep, I’ll travel anywhere. I’m also hoping to have a couple of seminars on the road by this Autumn.
Ken Hemmerling Says:
March 17, 2010 at 4:19 pm
Hey Bob, I have a follow-up question to your “CPU QoS” presentation from the 2010 Hotsos Symposium. There’s no e-mail address on the slides so I’m posting this on your blog in hopes of reaching you.

On slide 22 you say to force LGWR into FX 60.

I’m running Solaris 10 on a 5220 server. When I run “ps -e -o pid,ppid,class,pri,args” I find that the priority of my lgwr process has dropped to 1 (one). My sys admin tells me we run the FSS and SYS schedulers:
> ps -ef -o pset,class | grep -v CLS | sort | uniq
> - FSS
> - SYS

My questions are: Can I set the log writer to FX 60 with the schedulers currently running on my server? If so, what is the command?

I enjoyed your presentation. Any help is greatly appreciated.

Ken
- bobsneed Says:
  March 17, 2010 at 8:29 pm
  Thanks for posing the question here, Ken; it has been a popular one! I’ve just passed my updated slides to Hotsos, in which I’ve added a slide about the appropriate priocntl command along with how it is derived and a few caveats. I believe they are already posted on the Hotsos site for attendees. I do not normally publish my Hotsos materials widely until well after the Symposium. That’s my way of encouraging more folks to pay up and show up for this outstanding annual event!
  
  Keep in mind that the real pain comes when LGWR suffers from significant scheduling latency or involuntary context switches, as observable with ‘prstat -mL -p <pid>’. On a system with lots of idle time, even low-priority processes may be serviced “well enough” – so the correct expectation is that the benefit will be primarily experienced at high utilization. The fair-share scheduler (FSS) does its work by manipulating process priorities. In that context, I like to say that LGWR deserves an “unfair share”.
  
  As to “what’s available” on your system, use ‘priocntl -l’ to see the scheduling classes. The ps command only shows what’s currently being used. The subject could benefit from a deeper discussion to clarify things like what global priority level 60 actually is and why it’s OK to use in conjunction with FSS, but that’s more than I can manage right now.
  
  There is yet another way to help assure LGWR CPU QoS when using FSS, and although I can’t recall seeing it done before, I’ll mention it here. That concept is to define a project for LGWR alone and assign a huge share to it. I like the FX 60 solution better. Besides, most of what I’ve seen with FSS usage has been customers running with default shares rather than purposefully assigning shares in relation to the relative business priorities of different projects. That’s a bit worrisome to me. OTOH, inasmuch as most shops are loath to ever cross the line of 80% utilization, and given that Solaris is pretty efficient in delivering low scheduling latency – there is a lot of room for accidentally succeeding in spite of such things!
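
For readers following along without the slides, here is a minimal sketch of what a priocntl invocation for this purpose might look like. It is not taken from the Hotsos slides. It assumes that the FX class shows up in ‘priocntl -l’, that the command is run with sufficient privilege, and that the LGWR process ID is already known. The helper name lgwr_fx_cmd and the PID 1234 are made up for illustration. The helper only prints the command, so it can be reviewed before anyone runs it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a priocntl command that would move
# one process into the fixed-priority (FX) class at user priority 60.
#   -s       set scheduling parameters
#   -c FX    target scheduling class
#   -m 60    FX user priority limit
#   -p 60    FX user priority
#   -i pid   interpret the trailing argument as a process ID
lgwr_fx_cmd() {
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            # Reject anything that is not a plain decimal PID.
            echo "usage: lgwr_fx_cmd <pid>" >&2
            return 1
            ;;
    esac
    echo "priocntl -s -c FX -m 60 -p 60 -i pid $pid"
}

# Example with a made-up PID; substitute the real LGWR process ID.
lgwr_fx_cmd 1234
```

If the change takes effect, the ‘ps -e -o pid,ppid,class,pri,args’ listing quoted earlier in this thread should show the process in class FX.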
Ken Hemmerling Says:
March 17, 2010 at 9:31 pm
Thanks Bob, I’ve grabbed the updated slides from Hotsos and see your addition. I was just watching the output of ‘prstat -mL’ and saw a max Latency of 0.5 and a max ICX of 87. That’s for all processes, not just lgwr. In the brief period I watched just the lgwr, the highest ICX I saw was 1 and LAT never got above 0.0. While not busy, my system is handling normal Wednesday afternoon throughput.

I’ll definitely keep this information at the back of my mind for the next time we have a burst of extra processing dumped on my system.

Thanks again for the info.
- bobsneed Says:
  March 17, 2010 at 9:47 pm
  Excellent; you are now proficient with the relevant secondary metrics!
  
  I have a longstanding interest in configuring systems to degrade gracefully under load. My longer-term interest is in having specific elements of heterogeneous workloads degrade under load in inverse proportion to their business importance. There’s a lot of room for innovation in that area!
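
The secondary metrics discussed in this thread (LAT and ICX from ‘prstat -mL’) can be watched with a small filter. The sketch below is only an illustration: the function name flag_sched_pain, the default thresholds, and the sample rows are all made up, and the column positions assume the usual ‘prstat -mL’ layout shown in the comment.

```shell
#!/bin/sh
# Hypothetical filter: print threads whose LAT or ICX exceeds a threshold.
# Assumed 'prstat -mL' column layout:
#   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
# so LAT is field 10 and ICX is field 12. Counts that prstat abbreviates
# (for example with a K suffix) would need extra handling.
flag_sched_pain() {
    # $1 = LAT threshold in percent (default 1.0), $2 = ICX threshold (default 50)
    awk -v lat_max="${1:-1.0}" -v icx_max="${2:-50}" '
        $1 ~ /^[0-9]+$/ && NF >= 15 &&
        ($10 + 0 > lat_max + 0 || $12 + 0 > icx_max + 0) {
            print $1, $15, "LAT=" $10, "ICX=" $12
        }'
}

# Example run on made-up sample rows (not real measurements).
flag_sched_pain 1.0 50 <<'EOF'
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  1234 oracle   0.3 0.1 0.0 0.0 0.0 0.0 100 0.0  12   1 140   0 oracle/1
  5678 oracle    45 3.0 0.0 0.0 0.0 0.0  50 2.5 200  87  1K   0 oracle/1
EOF
```

On a live system the same filter could be fed from something like ‘prstat -mL -p <pid> 5’, with the thresholds tuned to whatever the workload can tolerate.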

Bob Sneed's Blog