Author: Andy Petrella
Publisher: "O'Reilly Media, Inc."
ISBN: 1098133269
Category : Computers
Languages : en
Pages : 267
Book Description
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work. Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.
- Learn the core principles and benefits of data observability
- Use data observability to detect, troubleshoot, and prevent data issues
- Follow the book's recipes to implement observability in your data projects
- Use data observability to create a trustworthy communication framework with data consumers
- Learn how to educate your peers about the benefits of data observability
Fundamentals of Data Observability
Author: Andy Petrella
Publisher: O'Reilly Media
ISBN: 9781098133290
Category : Computers
Languages : en
Pages : 0
Book Description
Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enable data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer, or if the quality of your work depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work. Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.
- Learn the core principles and benefits of data observability
- Use data observability to detect, troubleshoot, and prevent data issues
- Follow the book's recipes to implement observability in your data projects
- Use data observability to create a trustable communication framework with data consumers
- Learn how to educate your peers about the benefits of data observability
Fundamentals of Data Engineering
Author: Joe Reis
Publisher: "O'Reilly Media, Inc."
ISBN: 1098108272
Category : Computers
Languages : en
Pages : 446
Book Description
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you:
- Get a concise overview of the entire data engineering landscape
- Assess data engineering problems using an end-to-end framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle
Data Observability for Data Engineering
Author: Michele Pinto
Publisher: Packt Publishing Ltd
ISBN: 180461209X
Category : Computers
Languages : en
Pages : 228
Book Description
Discover actionable steps to maintain healthy data pipelines to promote data observability within your teams with this essential guide to elevating data engineering practices.
Key Features:
- Learn how to monitor your data pipelines in a scalable way
- Apply real-life use cases and projects to gain hands-on experience in implementing data observability
- Instil trust in your pipelines among data producers and consumers alike
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
What you will learn:
- Implement a data observability approach to enhance the quality of data pipelines
- Collect and analyze key metrics through coding examples
- Apply monkey patching in a Python module
- Manage the costs and risks associated with your data pipeline
- Understand the main techniques for collecting observability metrics
- Implement monitoring techniques for analytics pipelines in production
- Build and maintain a statistics engine continuously
Who this book is for:
This book is for data engineers, data architects, data analysts, and data scientists who have encountered issues with broken data pipelines or dashboards. Organizations seeking to adopt data observability practices and managers responsible for data quality and processes will find this book especially useful to increase the confidence of data consumers and raise awareness among producers regarding their data pipelines.
Data Quality Fundamentals
Author: Barr Moses
Publisher: "O'Reilly Media, Inc."
ISBN: 1098112016
Category : Computers
Languages : en
Pages : 311
Book Description
Do your product dashboards look funky? Are your quarterly reports stale? Is the data set you're using broken or just plain wrong? These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner. If you answered yes to these questions, this book is for you. Many data engineering teams today face the "good pipelines, bad data" problem. It doesn't matter how advanced your data infrastructure is if the data you're piping is bad. In this book, Barr Moses, Lior Gavish, and Molly Vorwerck, from the data observability company Monte Carlo, explain how to tackle data quality and trust at scale by leveraging best practices and technologies used by some of the world's most innovative companies.
- Build more trustworthy and reliable data pipelines
- Write scripts to make data checks and identify broken pipelines with data observability
- Learn how to set and maintain data SLAs, SLIs, and SLOs
- Develop and lead data quality initiatives at your company
- Learn how to treat data services and systems with the diligence of production software
- Automate data lineage graphs across your data ecosystem
- Build anomaly detectors for your critical data assets
Data Curious
Author: Carl Allchin
Publisher: "O'Reilly Media, Inc."
ISBN: 1098143809
Category : Business & Economics
Languages : en
Pages : 140
Book Description
Data has been a missing part of most academic curriculums for a long time, and we're all being affected. During challenging times, creating a data-informed culture can help you pivot quickly or prevent expensive missteps. Developing a data curious organization will take advantage of the burgeoning data resources available as a result of increasing digitalization. With this book, author Carl Allchin shows today's business professionals how to become data empowered. These tech-savvy business professionals will learn data literacy fundamentals, from understanding the possibilities to asking the right questions. You'll discover how to make the right technology choices and avoid pitfalls that could put your career and company at risk.
- Discover what an agile, empowered, data-driven organization should look like
- Examine how to use data in new ways to help your business come to life
- Learn key terms and concepts around data management and analytics
- Understand the differences between spreadsheet analysis and a data analytics pipeline
- Get advice for working with data scientists and explore ways to mitigate the IT department's concerns
Fundamentals of Analytics Engineering
Author: Dumky De Wilde
Publisher: Packt Publishing Ltd
ISBN: 1837632111
Category : Computers
Languages : en
Pages : 332
Book Description
Gain a holistic understanding of the analytics engineering lifecycle by integrating principles from both data analysis and engineering.
Key Features:
- Discover how analytics engineering aligns with your organization's data strategy
- Access insights shared by a team of seven industry experts
- Tackle common analytics engineering problems faced by modern businesses
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
Written by a team of 7 industry experts, Fundamentals of Analytics Engineering will introduce you to everything from foundational concepts to advanced skills to get started as an analytics engineer. After conquering data ingestion and techniques for data quality and scalability, you’ll learn about techniques such as data cleaning transformation, data modeling, SQL query optimization and reuse, and serving data across different platforms. Armed with this knowledge, you will implement a simple data platform from ingestion to visualization, using tools like Airbyte Cloud, Google BigQuery, dbt, and Tableau. You’ll also get to grips with strategies for data integrity with a focus on data quality and observability, along with collaborative coding practices like version control with Git. You’ll learn about advanced principles like CI/CD, automating workflows, gathering, scoping, and documenting business requirements, as well as data governance. By the end of this book, you’ll be armed with the essential techniques and best practices for developing scalable analytics solutions from end to end.
What you will learn:
- Design and implement data pipelines from ingestion to serving data
- Explore best practices for data modeling and schema design
- Scale data processing with cloud based analytics platforms and tools
- Understand the principles of data quality management and data governance
- Streamline code base with best practices like collaborative coding, version control, reviews and standards
- Automate and orchestrate data pipelines
- Drive business adoption with effective scoping and prioritization of analytics use cases
Who this book is for:
This book is for data engineers and data analysts considering pivoting their careers into analytics engineering. Analytics engineers who want to upskill and search for gaps in their knowledge will also find this book helpful, as will other data professionals who want to understand the value of analytics engineering in their organization's journey toward data maturity. To get the most out of this book, you should have a basic understanding of data analysis and engineering concepts such as data cleaning, visualization, ETL and data warehousing.
Implementing Data Mesh
Author: Jean-Georges Perrin
Publisher: "O'Reilly Media, Inc."
ISBN: 1098156226
Category : Computers
Languages : en
Pages : 268
Book Description
As data continues to grow and become more complex, organizations seek innovative solutions to manage their data effectively. Data Mesh is one solution that provides a new approach to managing data in complex organizations. This practical guide offers step-by-step guidance on how to implement data mesh in your organization. In this book, Jean-Georges Perrin and Eric Broda focus on the key components of data mesh and provide practical advice supported by code. You'll explore a simple and intuitive process for identifying key data mesh components and data products, and learn about a consistent set of interfaces and access methods that make data products easy to consume. This approach ensures that your data products are easily accessible and the data mesh ecosystem is easy to navigate. With this book, you'll learn how to:
- Identify, define, and build data products that interoperate within an enterprise data mesh
- Build a data mesh fabric that binds data products together
- Build and deploy data products in a data mesh
- Establish the organizational structure to operate data products, data platforms, and data fabric
- Learn an innovative architecture that brings data products and data fabric together into the data mesh
About the authors: Jean-Georges "JG" Perrin is a technology leader focusing on building innovative and modern data platforms. Eric Broda is a technology executive, practitioner, and founder of a boutique consulting firm that helps global enterprises realize value from data.
Data Quality in the Age of AI
Author: Andrew Jones
Publisher: Packt Publishing Ltd
ISBN: 1835088562
Category : Computers
Languages : en
Pages : 50
Book Description
Unlock the power of data with expert insights to enhance data quality, maximizing the potential of AI, and establishing a data-centric culture.
Key Features:
- Gain a profound understanding of the interplay between data quality and AI
- Explore strategies to improve data quality with practical implementation and real-world results
- Acquire the skills to measure and evaluate data quality, empowering data-driven decisions
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
As organizations worldwide seek to revamp their data strategies to leverage AI advancements and benefit from newfound capabilities, data quality emerges as the cornerstone for success. Without high-quality data, even the most advanced AI models falter. Enter Data Quality in the Age of AI, a detailed report that illuminates the crucial role of data quality in shaping effective data strategies. Packed with actionable insights, this report highlights the critical role of data quality in your overall data strategy. It equips teams and organizations with the knowledge and tools to thrive in the evolving AI landscape, serving as a roadmap for harnessing the power of data quality, enabling them to unlock their data's full potential, leading to improved performance, reduced costs, increased revenue, and informed strategic decisions.
What you will learn:
- Discover actionable steps to establish data quality as the foundation of your data culture
- Enhance data quality directly at its source with effective strategies and best practices
- Elevate data quality standards and enhance data literacy within your organization
- Identify and measure data quality within the dataset
- Adopt a product mindset to address data quality challenges
- Explore emerging architectural patterns like data mesh and data contracts
- Assign roles, responsibilities, and incentives for data generators
- Gain insights from real-world case studies
Who this book is for:
This report is for data leaders and decision-makers, including CTOs, CIOs, CISOs, CPOs, and CEOs responsible for shaping their organization's data strategy to maximize data value, especially those interested in harnessing recent AI advancements.
Scaling Machine Learning with Spark
Author: Adi Polak
Publisher: "O'Reilly Media, Inc."
ISBN: 1098106776
Category : Computers
Languages : en
Pages : 323
Book Description
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals--allowing data and ML practitioners to collaborate and understand each other better. Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology. You will: explore machine learning, including distributed computing concepts and terminology; manage the ML lifecycle with MLflow; ingest data and perform basic preprocessing with Spark; explore feature engineering, and use Spark to extract features; train a model with MLlib and build a pipeline to reproduce it; build a data system to combine the power of Spark with deep learning; get a step-by-step example of working with distributed TensorFlow; use PyTorch to scale machine learning and its internal architecture.