
“It was just the three of us.  And dad was a truck driver so he was gone most of the time.  It could be a lot of stress.  My mom was almost like a single mother.  On my third birthday we moved to a small house outside of Denver.  Next door there lived an older couple named Arlene and Bill, and they started talking to me through the fence.  My first memory is Arlene handing me strawberries from her garden.  It was a wonderful connection.  After a few months, I knocked on their door, sat down in their living room, and said: ‘Will you guys be my grandparents?’  It was so silly.  They could have laughed it off.  But instead they started crying.  They printed out an adoption certificate and hung it on their living room wall.  That certificate remained until I left for college.  They became so important to me.  Their house was a refuge.  Bill was the kind of grandfather that always smelled like oil.  He taught me to drive everything.  He was always fixing stuff.  But he’d stop anything to sit down with me and have a glass of tea.  Arlene was the type of grandmother that loved crafts, which was perfect for a kid.  We were always putting tiny sequins on things.  Both of them supported me in all my dreams.  Through all my phases.  They encouraged me to apply for college, even though I didn’t have the money to go.  And when I got accepted, they presented me with a fund.  They told me they’d been putting away money since the day I adopted them.  Since I’ve become an adult, I’ve learned more about my grandparents.  They both grew up poor.  Arlene struggled with alcoholism when she was young, and that’s why they never had children.  Their lives weren’t as perfect as they seemed through the fence.  My grandmother passed away in 2013.  It was two days before our adoption anniversary.  My grandfather gave her eulogy.  And at the end, he said: ‘Arlene leaves behind her husband Bill.  And the greatest joy of her life: her granddaughter, Katie.’”



A kindness


Dino works six days a week as a porter in my apartment building, cleaning walls and floors, removing trash, distributing recyclables. He’s one of those essential workers who are suddenly on the front lines. We’ve always been friendly.

I’ve been hibernating in my apartment for days, because it’s what we’re all supposed to do, and also because I have a bad cold. Today, when I ventured out of my apartment for 30 seconds to toss a trash bag down the chute, Dino was hard at work decontaminating the hallway. For the first time that I know of, he was wearing a respiratory face mask. I stood about twelve feet from him, smiled and waved, embarrassed to be in sleepwear in the middle of the day but glad to see a friendly pair of eyes.

Dino asked if I had a respiratory mask. I told him no—the stores have been sold out for months—but not to worry about me. He said he had an extra. I was, like, you need it more. He insisted. Won’t you take? For when you go shopping?

Finally I stopped being polite and guilty and class-conscious and embarrassed and allowed him to give me the mask. Finally we stopped being two players in an economic system and were just two souls in New York trying to survive the day and the next few months.

It has been eight hours since Dino’s act of kindness, and I’m still thinking about it, still thinking how I can pay it forward to someone who needs my help.

The post A kindness appeared first on Zeldman on Web & Interaction Design.



So excited to see everyone after my luxury cruise home from the World Handshake Championships!

Silo Launched Model Rocket Goes Thoomp


While rockets launched from silos are generally weapons of war, [Joe Barnard] of [BPS.Space] thought model rocketry could still do with a little more thoomp. So he built a functional tube launched model rocket.

Like [Joe]’s other rockets, it features a servo-actuated thrust vectoring system instead of fins for stabilization. The launcher consists of a 98 mm cardboard tube, with a pneumatic piston inside to eject the rocket out of the tube before it ignites its engine in mid-air. When everything works right, the rocket can be seen hanging motionlessly in the air for a split second before the motor kicks in.

The launcher also features a servo controlled hatch, which opens before the rocket is ejected and then closes as soon as the rocket is clear to protect the tube. The rocket itself is recovered using a parachute, and for giggles he added a tiny Tesla Roadster with its own parachute.

Projects as complex as this rarely work on the first attempt, and Thoomp was no exception. Getting the Signal flight computer to ignite the rocket motors at the correct instant proved challenging, and required some tuning on how the accelerometer inputs were used to recognize a launch event. The flight computer is also a very capable data logger, so every launch attempt, failed or successful, became a learning opportunity. Check out the second video after the break for a fascinating look at how all this data was analyzed.
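For a sense of what that launch-event tuning involves, here is a minimal sketch of threshold-plus-debounce detection on accelerometer samples. To be clear, the threshold values, sample handling, and function name here are our own illustrative assumptions, not [Joe]'s actual Signal firmware:

```python
def detect_launch(accel_samples_g, threshold_g=3.0, min_consecutive=5):
    """Return the sample index where a launch is recognized, or None.

    A single noisy spike above the threshold isn't enough; several
    consecutive high-g samples are required so that handling bumps
    don't falsely trigger the motor-ignition logic.
    Thresholds are illustrative, not real flight-computer values.
    """
    consecutive = 0
    for i, g in enumerate(accel_samples_g):
        if g >= threshold_g:
            consecutive += 1
            if consecutive >= min_consecutive:
                return i
        else:
            consecutive = 0
    return None
```

The tuning problem amounts to picking `threshold_g` and `min_consecutive` so that the pneumatic ejection jolt registers as a launch quickly enough to ignite the motor while the rocket is still hanging in the air, without triggering on pad handling.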

[Joe]’s willingness to fail quickly and repeatedly as part of the learning process is a true display of the hacker spirit. We’ll definitely be keeping a close eye on his work.


Layoffs Are Coming


It’s looking increasingly likely that the COVID-19 pandemic will cause a recession. I’m neither an epidemiologist nor an economist, but I have worked in the tech industry through two recessions, and I can say with some certainty: layoffs are coming.

It’s easy to think we might be immune from the effects of a global recession, but my experience is that tech companies are quick to cut staff, especially engineers, in the face of declining markets. I hope I’m wrong, but I don’t think I am. Either way, it’s not going to hurt to prepare. 

Who’s likely to get laid off?

This bit is for anyone who’s never been through a round of layoffs — they’re wild, and fairly unlike any sort of business-as-usual. This is going to seem very cold, but it’s important to be highly aware of who’s likely to get laid off, and how this works in practice. Layoffs are highly political: your organization will probably get a specific target — a specific number of people — that comes as a mandate with very little if any flexibility. Lots of horse-trading ensues. Because this is so political, layoffs don’t track very closely to objective business needs. 

Instead, people are more likely to get laid off if:

  • They’re significantly more junior. Juniors are more likely to get laid off because they haven’t had the time to build the relationships and connections they need to be immune, and because their jobs tend to be perceived as less important. 
  • They’re significantly more senior. Senior employees are paid more, often significantly more. This is a budget exercise, at its core, and management will be considering the “ROI” on a layoff. If they can cut two senior engineers instead of three or four mid-career ones, they may do so. This is particularly the case for anyone close to retirement: forcing early retirement is a classic layoff move.
  • They have a poor relationship with their boss, or their boss has a bad relationship with her boss, and so on up the chain. Your boss will end up in a room with her peers and their boss, and everyone will fight for their people. If your boss is weak politically, or she doesn’t care to fight for you, you’re in trouble.
  • They’ve gotten poor performance reviews recently. Performance reviews are usually fairly pro-forma, but they become critically important in a time of layoffs. Anyone who’s gotten a “below expectations” in the last year or so is likely to be first on the chopping block.

How to prepare for a layoff

Given the economic uncertainty, I’d argue that everyone should take the following steps to prepare for a layoff. But this is especially important for anyone who sees themselves in any of the above points.

Get your finances in order

If you are laid off, you’re going to need savings. Common advice is to have 6 months of living expenses in liquid assets (e.g. a savings account, cash). If this hits tech as hard as the .com bust — and I’m thinking it could — a year might be a better target. 

Hopefully this is something you’ve already got squared away. Savings aren’t something you can create quickly. But you do have some time; layoffs probably won’t start for a month or more. If your savings are thin, do whatever you can in the coming weeks and months to bulk them up.
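As a quick sketch of that arithmetic (the dollar figures in the example are hypothetical, not a budget recommendation):

```python
import math

def emergency_fund_target(monthly_expenses, months=6):
    """Liquid-savings target: a number of months of living expenses."""
    return monthly_expenses * months

def months_to_save(target, current_savings, monthly_surplus):
    """Whole months needed to close the gap at a given saving rate."""
    shortfall = max(0, target - current_savings)
    return math.ceil(shortfall / monthly_surplus) if shortfall else 0
```

For example, $3,000/month in expenses and a 6-month cushion means an $18,000 target; with $10,000 saved and $2,000/month to spare, you'd need four more months to get there.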

Update your resume

When layoffs happen, they hit fast. You probably won’t see it coming. You don’t want to be trying to get your resume in order while negotiating severance and figuring out COBRA! So in the next couple of weeks, get your resume updated and ready to send at a moment’s notice. Put a recurring reminder in your calendar/todo app so you remember to do this regularly (quarterly, perhaps).

If you need good advice on writing a strong resume, I recommend spending some quality time over at Ask a Manager.

If we know each other a little bit, I’d be happy to glance at your resume and give you some feedback. Get in touch!

Prepare for interviews

Interviewing is a skill: you can get better by practicing. Interview practices do vary widely, but there are usually some commonalities. For example, you’ll probably be asked about projects you’ve delivered, about your strengths and weaknesses, about how you resolve conflict with colleagues, and so on. Take some time to think through questions you might be asked and prepare notes. Better yet, find a friend who can ask you some questions and give you feedback on your responses.

For some sample questions, a good resource is 18F’s Engineering Hiring guide (bias alert: I wrote the first draft of this a couple years ago), particularly the technical interview and core values questions.

Refresh your professional network

If and when you need a new job, your professional network can be a huge asset. This’ll be especially true if layoffs are widespread, as you’ll be competing with tons of other out-of-work people. A personal connection, a strong reference, can make a huge difference.

So take some time to think through your professional network with an eye towards who could be helpful to you in a job hunt. Think about people who could be allies: folks you’ve worked with in the past who’ve seen your work and results, and could be called upon to give you a hand if needed. I’d recommend developing (and keeping updated) a little database including:

  • Former managers (and higher-ups). These are of course great references, and also you may have had former bosses who you’d want to work for again. (In fact, my current job came about in precisely this way: a former grandboss was hiring, so it was an easy call for me.)
  • People who’ve been professional sponsors (e.g. supported you for a promotion, given you wider job responsibility, etc.). These are particularly important. They did those things because they believe in you, and will probably go to bat for you again.
  • Peers who you’ve worked closely with and can speak to your work. Particularly those who work places that you’d like to work. A personal recommendation can cut through several layers of HR screening.
  • People who you think are just terrific and want to work for. We’ve all probably worked with folks who we’d love to work under some day. If you’re out of a job, why not tell them you want to work for them and see what happens?

Don’t worry if you haven’t talked to folks in a while; anyone reasonable should be happy to help if you reach out, even if it’s been a while. 

Consider your technical skills, and brush up if you can

I’ve left this for last, because it’s probably the least important. You’re unlikely to materially change your technical profile between now and a potential layoff. And, a few months of online classes on some new tech probably won’t move the needle much if and when you need to look. 

But, it is worth thinking about your technical skills and how they might play during a downturn. The skills that will be the most resilient to a downturn are going to be the older, more conservative technical choices. Larger companies will be more resilient, and will tend to use older technology. And, in a downturn, companies will prioritize keeping existing things running over new development.

To put a fine point on it: if we hit a downturn, Java’s going to get you a job far faster than Rust. If you’ve got some of those more old-school technologies — Java, C/C++, .NET, etc. — in your background, but haven’t touched them in a while, it could be worth your while to brush back up.

Further resources

Good luck in the weeks and months ahead. I’m afraid they’re going to be fairly rough. But hang in there; you can do this!




Finding a problem at the bottom of the Google stack


At Google, our teams follow site reliability engineering (SRE) practices to help keep systems healthy and users productive. There is a phrase we often use on our SRE teams: "At Google scale, million-to-one chances happen all the time." This illustrates the massive complexity of the system that powers Google Search, Gmail, Ads, Cloud, Android, Maps, and many more. That type of scale creates complex, emergent modes of failure that aren’t seen elsewhere. Thus, SREs within Google have become adept at developing systems to track failures deep into the many layers of our infrastructure. Not every failure can be automatically detected, so investigative tools, techniques, and most importantly, attitude are essential. Rare, unexpected chains of events happen often. Some have visible impact, but most don't.


This was illustrated in a recent incident that Google users would likely not have noticed. We consider these types of failures “within error budget” events. They are expected, accepted, and engineered into the design criteria of our systems. However, they still get tracked down to make sure they aren’t forgotten and don’t accumulate into technical debt—we use them to prevent this class of failures across a range of systems, not just the one that had the problem. This incident serves as a good example of tracking down a problem once the initial symptoms were mitigated, finding the underlying causes, and preventing the problem from happening again—without users noticing. This level of rigor and responsibility is what underlies the SRE approach to running systems in production.
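For readers unfamiliar with the term, an error budget is simply the complement of a service-level objective (SLO): a 99.9% availability target leaves a 0.1% budget for failures. A minimal sketch of the arithmetic (the SLO value below is illustrative, not an actual Google target):

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unavailability for an availability SLO
    over a rolling window, e.g. a 99.9% SLO over 30 days."""
    return (1.0 - slo) * window_days * 24 * 60
```

At a 99.9% SLO that works out to about 43 minutes per 30-day window; incidents that consume only a slice of that budget are expected and engineered for.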

Digging deep for a problem’s roots

In this event, an SRE on the traffic and load balancing team was alerted that some GFEs (Google front ends) in Google's edge network, which statelessly cache frequently accessed content, were producing an abnormally high number of errors. The on-call SRE was paged. They immediately removed ("drained") the machines from serving, thus eliminating the errors that might result in a degraded state for customers. This ability to rapidly mitigate an incident in this way is a core competency within Google SRE. Because we have confidence in our capacity models, we know that we have redundant resources to allow for this mitigation at any time.

At this point, our SRE had mitigated the issue with the drain, but they weren’t done yet. Based on previous similar issues, they knew this type of error is often caused by a transient network issue. After finding evidence of packet loss, isolated to a single rack of machines, our SRE got in touch with the edge networking team, which identified correlated BGP flapping on the router in the affected rack. However, the nature of the flaps hinted at a problem with the machines rather than the router. This indicated that the problem revolved around a particular machine or set of machines.

Further investigation uncovered kernel messages in the GFE machines' base system log. These errors indicated CPU throttling:

MMM DD HH:mm:ss xxxxxxx kernel: [3220998.149713] CPU16: Package temperature above threshold, cpu clock throttled (total events = 1596886)

The process on the machine responsible for BGP announcements showed higher-than-usual CPU usage, which perfectly correlated with both the onset of the errors and the CPU throttling. This confirmed the theory that the throttling was significant enough to be impactful and measurable by Google's monitoring system.
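Spotting and counting these events can be as simple as scanning syslog for the kernel message format shown above. The snippet below is a sketch of that idea, not Google's actual monitoring tooling:

```python
import re

# Matches the kernel message shown above, capturing the CPU number
# and the cumulative throttle-event count.
THROTTLE_RE = re.compile(
    r"CPU(?P<cpu>\d+): Package temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<count>\d+)\)"
)

def throttle_events(log_lines):
    """Map each CPU number to its latest cumulative throttle count."""
    events = {}
    for line in log_lines:
        m = THROTTLE_RE.search(line)
        if m:
            events[int(m.group("cpu"))] = int(m.group("count"))
    return events
```

A rising count for a CPU across successive log lines is what correlated here with the onset of errors and the elevated CPU usage of the BGP-announcement process.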


The SRE then checked on adjacent machines to find if there were any other similarly failing systems. Notably, the only machines that were affected were on a single rack. Machines on adjacent racks were not affected!

Why would a single rack be overheating to the point of CPU throttling when its neighbors were totally unaffected? What is it about the physical support for the machines that would cause kernel errors? It didn't add up.

The SRE then sent the machine to repairs, which means that they filed a bug in our company-wide issue tracking system. In this case, the bug was sent to the on-site hardware operations and management team.

This bug was clear and to the point:

Please repair the following:
Machines in XXXXXX are seeing thermal events in syslog:
MMM DD HH:mm:ss xxxxxxx kernel: [3220998.149713] CPU16: Package temperature above threshold, cpu clock throttled (total events = 1596886)
This throttling is ultimately causing user harm, so I've drained user traffic.

This bug, or ticket, clearly specified the machine(s) that were affected and described the symptoms and actions taken up to that point. At this point, the hardware team took over the investigation and determined the physical issue that resulted in this chain of events in the software. Google's 24x7 team is composed of many teams, working together to ensure problems are well-understood at all levels of the stack.

Finding the cause of a chain of events

So what was the problem?

Hello, we have inspected the rack. The casters on the rear wheels have failed and the machines are overheating as a consequence of being tilted.


The wheels (casters) supporting the rack had been crushed under the weight of the fully loaded rack. The rack had then physically tilted forward, disrupting the flow of liquid coolant and causing some CPUs to heat up to the point of being throttled.

Problem solved? Not quite. This looks alarmingly like a refrigerator about to tip over.

The casters were fixed and the rack was returned to proper alignment. But the bigger questions of "How did this happen?" and "How can we prevent it?" still needed to be addressed.

The hardware teams discussed potential options, ranging from distributing wheel repair kits to all locations to improving the rack-moving procedures to avoid damaging the wheels, and even considered improving the method of transporting new racks to data centers during initial build-out.

The team also considered how many existing racks were at risk of similar failures. This resulted in a systematic replacement of all racks with the same issue, while avoiding any customer impact.

Talk about deep analysis! The SRE tracked the problem all the way from an external, front-end system down to the hardware that holds up the machines. This type of deep troubleshooting happens within Google's production teams due to clear communication, shared goals, and a common expectation to not only fix problems, but prevent all future occurrences.  

Another phrase we commonly use here on SRE teams is "All incidents should be novel"—they should never occur more than once. In this case, the SREs and hardware operation teams worked together to ensure that this class of failure would never happen again.


This level of rigorous analysis and persistence is a great example of incident response using deep and broad monitoring and the culture of responsibility that keeps Google running 24x7. 

Google Cloud customers often ask how SRE can work in a hybrid, on-prem, or multi-cloud environment. SRE practices can be used to work across teams within an organization, across multiple environments. SRE helps teams work together during incidents like this, from traffic management to data center hardware operations.  

Find out more about the SRE approach to running systems and how your team can adopt SRE best practices.

