Site Reliability Engineering
Author: Niall Richard Murphy
Publisher: "O'Reilly Media, Inc."
ISBN: 1491951176
Category :
Languages : en
Pages : 552
Book Description
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient: lessons directly applicable to your organization. This book is divided into four sections:
- Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
- Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
- Practices: Understand the theory and practice of an SRE’s day-to-day work of building and operating large distributed computing systems
- Management: Explore Google’s best practices for training, communication, and meetings that your organization can use
Reliability Engineering in Systems Design and Operation
Author: Balbir S. Dhillon
Publisher: New York ; Toronto : Van Nostrand Reinhold
ISBN:
Category : Technology & Engineering
Languages : en
Pages : 350
Book Description
Good condition: no highlights, no markup, all pages intact. Slight shelf wear; corners may be slightly dented; may have slight color changes or a slightly damaged spine.
Database Reliability Engineering
Author: Laine Campbell
Publisher: "O'Reilly Media, Inc."
ISBN: 149192621X
Category : Computers
Languages : en
Pages : 309
Book Description
The infrastructure-as-code revolution in IT is also affecting database administration. With this practical book, developers, system administrators, and junior to mid-level DBAs will learn how the modern practice of site reliability engineering applies to the craft of database architecture and operations. Authors Laine Campbell and Charity Majors provide a framework for professionals looking to join the ranks of today’s database reliability engineers (DBRE). You’ll begin by exploring core operational concepts that DBREs need to master. Then you’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval. With a firm foundation in database reliability engineering, you’ll be ready to dive into the architecture and operations of any modern database. This book covers:
- Service-level requirements and risk management
- Building and evolving an architecture for operational visibility
- Infrastructure engineering and infrastructure management
- How to facilitate the release management process
- Data storage, indexing, and replication
- Identifying datastore characteristics and best use cases
- Datastore architectural components and data-driven architectures
Reliability Growth
Author: Panel on Reliability Growth Methods for Defense Systems
Publisher: National Academy Press
ISBN: 9780309314749
Category : Technology & Engineering
Languages : en
Pages : 235
Book Description
A high percentage of defense systems fail to meet their reliability requirements. This is a serious problem for the U.S. Department of Defense (DOD), as well as the nation. Those systems are not only less likely to successfully carry out their intended missions, but they also could endanger the lives of the operators. Furthermore, reliability failures discovered after deployment can result in costly and strategic delays and the need for expensive redesign, which often limits the tactical situations in which the system can be used. Finally, systems that fail to meet their reliability requirements are much more likely to need additional scheduled and unscheduled maintenance and to need more spare parts and possibly replacement systems, all of which can substantially increase the life-cycle costs of a system. Beginning in 2008, DOD undertook a concerted effort to raise the priority of reliability through greater use of design for reliability techniques, reliability growth testing, and formal reliability growth modeling, by both the contractors and DOD units. To this end, handbooks, guidances, and formal memoranda were revised or newly issued to reduce the frequency of reliability deficiencies for defense systems in operational testing and the effects of those deficiencies. "Reliability Growth" evaluates these recent changes and, more generally, assesses how current DOD principles and practices could be modified to increase the likelihood that defense systems will satisfy their reliability requirements. This report examines changes to the reliability requirements for proposed systems; defines modern design and testing for reliability; discusses the contractor's role in reliability testing; and summarizes the current state of formal reliability growth modeling. The recommendations of "Reliability Growth" will improve the reliability of defense systems and protect the health of the valuable personnel who operate them.
Building Secure and Reliable Systems
Author: Heather Adkins
Publisher: O'Reilly Media
ISBN: 1492083097
Category : Computers
Languages : en
Pages : 558
Book Description
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure. Two previous O’Reilly books from Google, Site Reliability Engineering and The Site Reliability Workbook, demonstrated how and why a commitment to the entire service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. In this latest guide, the authors offer insights into system design, implementation, and maintenance from practitioners who specialize in security and reliability. They also discuss how building and adopting their recommended best practices requires a culture that’s supportive of such change. You’ll learn about secure and reliable systems through:
- Design strategies
- Recommendations for coding, testing, and debugging practices
- Strategies to prepare for, respond to, and recover from incidents
- Cultural best practices that help teams across your organization collaborate effectively
The Site Reliability Workbook
Author: Betsy Beyer
Publisher: "O'Reilly Media, Inc."
ISBN: 1492029459
Category : Computers
Languages : en
Pages : 505
Book Description
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today, and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment. This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t. Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is. You’ll learn:
- How to run reliable services in environments you don’t completely control, like cloud
- Practical applications of how to create, monitor, and run your services via Service Level Objectives
- How to convert existing ops teams to SRE, including how to dig out of operational overload
- Methods for starting SRE from either greenfield or brownfield
Reliability Engineering
Author: Edgar Bradley
Publisher: CRC Press
ISBN: 149876584X
Category : Technology & Engineering
Languages : en
Pages : 425
Book Description
Reliability Engineering – A Life Cycle Approach is based on the author’s knowledge of systems and their problems from multiple industries, from sophisticated, first-class installations to less sophisticated plants often operating under severe budget constraints and yet having to deliver first-class availability. Taking a practical approach and drawing from the author’s global academic and work experience, the text covers the basics of reliability engineering, from design through to operation and maintenance. Examples and problems are used to embed the theory, and case studies are integrated to convey real engineering experience and to increase the student’s analytical skills. Additional subjects such as failure analysis, the management of the reliability function, systems engineering skills, project management requirements, and basic financial management requirements are covered. Linear programming and financial analysis are presented in the context of justifying maintenance budgets and retrofits. The book presents a stand-alone picture of the reliability engineer’s work over all stages of the system life cycle, and enables readers to:
- Understand the life-cycle approach to engineering reliability
- Explore failure analysis techniques and their importance in reliability engineering
- Learn the skills of linear programming, financial analysis, and budgeting for maintenance
- Analyze the application of key concepts through realistic case studies
This text will equip engineering students, engineers, and technical managers with the knowledge and skills they need, and the numerous examples and case studies included provide insight into their real-world application. An Instructor’s Manual and Figure Slides are available for instructors.
Engineering a Safer World
Author: Nancy G. Leveson
Publisher: MIT Press
ISBN: 0262297302
Category : Science
Languages : en
Pages : 555
Book Description
A new approach to safety, based on systems thinking, that is more effective, less costly, and easier to use than current techniques. Engineering has experienced a technological revolution, but the basic engineering techniques applied in safety and reliability engineering, created in a simpler, analog world, have changed very little over the years. In this groundbreaking book, Nancy Leveson proposes a new approach to safety—more suited to today's complex, sociotechnical, software-intensive world—based on modern systems thinking and systems theory. Revisiting and updating ideas pioneered by 1950s aerospace engineers in their System Safety concept, and testing her new model extensively on real-world examples, Leveson has created a new approach to safety that is more effective, less expensive, and easier to use than current techniques. Arguing that traditional models of causality are inadequate, Leveson presents a new, extended model of causation (Systems-Theoretic Accident Model and Processes, or STAMP), then shows how the new model can be used to create techniques for system safety engineering, including accident analysis, hazard analysis, system design, safety in operations, and management of safety-critical systems. She applies the new techniques to real-world events including the friendly-fire loss of a U.S. Blackhawk helicopter in the first Gulf War; the Vioxx recall; the U.S. Navy SUBSAFE program; and the bacterial contamination of a public water supply in a Canadian town. Leveson's approach is relevant even beyond safety engineering, offering techniques for “reengineering” any large sociotechnical system to improve safety and manage risk.
Advances in System Reliability Engineering
Author: Mangey Ram
Publisher: Academic Press
ISBN: 0128162724
Category : Technology & Engineering
Languages : en
Pages : 320
Book Description
Recent Advances in System Reliability Engineering describes and evaluates the latest tools, techniques, strategies, and methods in this topic for a variety of applications. Special emphasis is put on simulation and modelling technology, which is growing in influence in industry and presents challenges as well as opportunities for reliability and systems engineers. Several manufacturing engineering applications are addressed, making this a particularly valuable reference for readers in that sector.
- Contains comprehensive discussions on state-of-the-art tools, techniques, and strategies from industry
- Connects the latest academic research to applications in industry, including system reliability, safety assessment, and preventive maintenance
- Gives an in-depth analysis of the benefits and applications of modelling and simulation to reliability
Reliability, Maintainability, and Supportability
Author: Michael Tortorella
Publisher: John Wiley & Sons
ISBN: 1118858883
Category : Technology & Engineering
Languages : en
Pages : 464
Book Description
Focuses on the core systems engineering tasks of writing, managing, and tracking requirements for reliability, maintainability, and supportability that are most likely to satisfy customers and lead to success for suppliers. This book helps systems engineers lead the development of systems and services whose reliability, maintainability, and supportability meet and exceed the expectations of their customers and promote success and profit for their suppliers. The book is organized into three major parts: reliability, maintainability, and supportability engineering. Each part covers requirements development, quantitative modelling, statistical analysis, and best practices for that area. Heavy emphasis is placed on correct use of language. The author discusses the use of various sustainability engineering methods and techniques in crafting requirements that are focused on the customers’ needs, unambiguous, easily understood by the requirements’ stakeholders, and verifiable. Part of each major division of the book is devoted to the statistical analyses needed to determine when requirements are being met by systems operating in customer environments. To further support systems engineers in writing, analyzing, and interpreting sustainability requirements, this book also:
- Contains “Language Tips” to help systems engineers learn the different languages spoken by specialists and non-specialists in the sustainability disciplines
- Provides exercises in each chapter, allowing the reader to try out some of the ideas and procedures presented in the chapter
- Delivers end-of-chapter summaries of the current reliability, maintainability, and supportability engineering best practices for systems engineers
Reliability, Maintainability, and Supportability is a reference for systems engineers and graduate students hoping to learn how to effectively determine and develop appropriate requirements so that designers may fulfil the intent of the customer.