Author: Benjamin Bengfort
Publisher: "O'Reilly Media, Inc."
ISBN: 1491913762
Category : Computers
Languages : en
Pages : 288
Book Description
Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib
Data Analytics with Hadoop
Author: Benjamin Bengfort
Publisher: "O'Reilly Media, Inc."
ISBN: 1491913762
Category : Computers
Languages : en
Pages : 288
Book Description
Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib
Publisher: "O'Reilly Media, Inc."
ISBN: 1491913762
Category : Computers
Languages : en
Pages : 288
Book Description
Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of deployment, operations, or software development usually associated with distributed computing, you’ll focus on particular analyses you can build, the data warehousing techniques that Hadoop provides, and higher order data workflows this framework can produce. Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data. Understand core concepts behind Hadoop and cluster computing Use design patterns and parallel analytical algorithms to create distributed data analysis jobs Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase Use Sqoop and Apache Flume to ingest data from relational databases Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib
Data Mesh
Author: Zhamak Dehghani
Publisher: "O'Reilly Media, Inc."
ISBN: 1492092363
Category : Computers
Languages : en
Pages : 387
Book Description
Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice. Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how. Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures Analyze the landscape's underlying characteristics and failure modes Get a complete introduction to data mesh principles and its constituents Learn how to design a data mesh architecture Move beyond a monolithic data lake to a distributed data mesh.
Publisher: "O'Reilly Media, Inc."
ISBN: 1492092363
Category : Computers
Languages : en
Pages : 387
Book Description
Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice. Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how. Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures Analyze the landscape's underlying characteristics and failure modes Get a complete introduction to data mesh principles and its constituents Learn how to design a data mesh architecture Move beyond a monolithic data lake to a distributed data mesh.
Designing Great Data Products
Author: Jeremy Howard
Publisher: "O'Reilly Media, Inc."
ISBN: 1449333680
Category : Computers
Languages : en
Pages : 25
Book Description
In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines like Amazon's. Prediction technology can be interesting and mathematically elegant, but we need to take the next step: going from recommendations to products that can produce optimal strategies for meeting concrete business objectives. We already know how to build these products: they've been in use for the past decade or so, but they're not as common as they should be. This report shows how to take the next step: to go from simple predictions and recommendations to a new generation of data products with the potential to revolutionize entire industries.
Publisher: "O'Reilly Media, Inc."
ISBN: 1449333680
Category : Computers
Languages : en
Pages : 25
Book Description
In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines like Amazon's. Prediction technology can be interesting and mathematically elegant, but we need to take the next step: going from recommendations to products that can produce optimal strategies for meeting concrete business objectives. We already know how to build these products: they've been in use for the past decade or so, but they're not as common as they should be. This report shows how to take the next step: to go from simple predictions and recommendations to a new generation of data products with the potential to revolutionize entire industries.
Data Privacy
Author: Nishant Bhajaria
Publisher: Simon and Schuster
ISBN: 1638357188
Category : Computers
Languages : en
Pages : 632
Book Description
Engineer privacy into your systems with these hands-on techniques for data governance, legal compliance, and surviving security audits. In Data Privacy you will learn how to: Classify data based on privacy risk Build technical tools to catalog and discover data in your systems Share data with technical privacy controls to measure reidentification risk Implement technical privacy architectures to delete data Set up technical capabilities for data export to meet legal requirements like Data Subject Asset Requests (DSAR) Establish a technical privacy review process to help accelerate the legal Privacy Impact Assessment (PIA) Design a Consent Management Platform (CMP) to capture user consent Implement security tooling to help optimize privacy Build a holistic program that will get support and funding from the C-Level and board Data Privacy teaches you to design, develop, and measure the effectiveness of privacy programs. You’ll learn from author Nishant Bhajaria, an industry-renowned expert who has overseen privacy at Google, Netflix, and Uber. The terminology and legal requirements of privacy are all explained in clear, jargon-free language. The book’s constant awareness of business requirements will help you balance trade-offs, and ensure your user’s privacy can be improved without spiraling time and resource costs. About the technology Data privacy is essential for any business. Data breaches, vague policies, and poor communication all erode a user’s trust in your applications. You may also face substantial legal consequences for failing to protect user data. Fortunately, there are clear practices and guidelines to keep your data secure and your users happy. About the book Data Privacy: A runbook for engineers teaches you how to navigate the trade-off s between strict data security and real world business needs. In this practical book, you’ll learn how to design and implement privacy programs that are easy to scale and automate. There’s no bureaucratic process—just workable solutions and smart repurposing of existing security tools to help set and achieve your privacy goals. What's inside Classify data based on privacy risk Set up capabilities for data export that meet legal requirements Establish a review process to accelerate privacy impact assessment Design a consent management platform to capture user consent About the reader For engineers and business leaders looking to deliver better privacy. About the author Nishant Bhajaria leads the Technical Privacy and Strategy teams for Uber. His previous roles include head of privacy engineering at Netflix, and data security and privacy at Google. Table of Contents PART 1 PRIVACY, DATA, AND YOUR BUSINESS 1 Privacy engineering: Why it’s needed, how to scale it 2 Understanding data and privacy PART 2 A PROACTIVE PRIVACY PROGRAM: DATA GOVERNANCE 3 Data classification 4 Data inventory 5 Data sharing PART 3 BUILDING TOOLS AND PROCESSES 6 The technical privacy review 7 Data deletion 8 Exporting user data: Data Subject Access Requests PART 4 SECURITY, SCALING, AND STAFFING 9 Building a consent management platform 10 Closing security vulnerabilities 11 Scaling, hiring, and considering regulations
Publisher: Simon and Schuster
ISBN: 1638357188
Category : Computers
Languages : en
Pages : 632
Book Description
Engineer privacy into your systems with these hands-on techniques for data governance, legal compliance, and surviving security audits. In Data Privacy you will learn how to: Classify data based on privacy risk Build technical tools to catalog and discover data in your systems Share data with technical privacy controls to measure reidentification risk Implement technical privacy architectures to delete data Set up technical capabilities for data export to meet legal requirements like Data Subject Asset Requests (DSAR) Establish a technical privacy review process to help accelerate the legal Privacy Impact Assessment (PIA) Design a Consent Management Platform (CMP) to capture user consent Implement security tooling to help optimize privacy Build a holistic program that will get support and funding from the C-Level and board Data Privacy teaches you to design, develop, and measure the effectiveness of privacy programs. You’ll learn from author Nishant Bhajaria, an industry-renowned expert who has overseen privacy at Google, Netflix, and Uber. The terminology and legal requirements of privacy are all explained in clear, jargon-free language. The book’s constant awareness of business requirements will help you balance trade-offs, and ensure your user’s privacy can be improved without spiraling time and resource costs. About the technology Data privacy is essential for any business. Data breaches, vague policies, and poor communication all erode a user’s trust in your applications. You may also face substantial legal consequences for failing to protect user data. Fortunately, there are clear practices and guidelines to keep your data secure and your users happy. About the book Data Privacy: A runbook for engineers teaches you how to navigate the trade-off s between strict data security and real world business needs. In this practical book, you’ll learn how to design and implement privacy programs that are easy to scale and automate. There’s no bureaucratic process—just workable solutions and smart repurposing of existing security tools to help set and achieve your privacy goals. What's inside Classify data based on privacy risk Set up capabilities for data export that meet legal requirements Establish a review process to accelerate privacy impact assessment Design a consent management platform to capture user consent About the reader For engineers and business leaders looking to deliver better privacy. About the author Nishant Bhajaria leads the Technical Privacy and Strategy teams for Uber. His previous roles include head of privacy engineering at Netflix, and data security and privacy at Google. Table of Contents PART 1 PRIVACY, DATA, AND YOUR BUSINESS 1 Privacy engineering: Why it’s needed, how to scale it 2 Understanding data and privacy PART 2 A PROACTIVE PRIVACY PROGRAM: DATA GOVERNANCE 3 Data classification 4 Data inventory 5 Data sharing PART 3 BUILDING TOOLS AND PROCESSES 6 The technical privacy review 7 Data deletion 8 Exporting user data: Data Subject Access Requests PART 4 SECURITY, SCALING, AND STAFFING 9 Building a consent management platform 10 Closing security vulnerabilities 11 Scaling, hiring, and considering regulations
Products and Services from ERS-NASS.
Next-Gen Digital Services. A Retrospective and Roadmap for Service Computing of the Future
Author: Marco Aiello
Publisher: Springer Nature
ISBN: 3030732037
Category : Computers
Languages : en
Pages : 282
Book Description
This book is a festschrift in honour of Mike Papazoglou’s 65th birthday and retirement. It includes 20 contributions from leading researchers who have worked with Mike in his more than 40 years of academic research. Topics are as varied as Mike’s and include service engineering, service management, services and human, IoT, and data-driven services.
Publisher: Springer Nature
ISBN: 3030732037
Category : Computers
Languages : en
Pages : 282
Book Description
This book is a festschrift in honour of Mike Papazoglou’s 65th birthday and retirement. It includes 20 contributions from leading researchers who have worked with Mike in his more than 40 years of academic research. Topics are as varied as Mike’s and include service engineering, service management, services and human, IoT, and data-driven services.
The Data Warehouse Toolkit
Author: Ralph Kimball
Publisher: John Wiley & Sons
ISBN: 1118082141
Category : Computers
Languages : en
Pages : 464
Book Description
This old edition was published in 2002. The current and final edition of this book is The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition which was published in 2013 under ISBN: 9781118530801. The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including: Retail sales and e-commerce Inventory management Procurement Order management Customer relationship management (CRM) Human resources management Accounting Financial services Telecommunications and utilities Education Transportation Health care and insurance By the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.
Publisher: John Wiley & Sons
ISBN: 1118082141
Category : Computers
Languages : en
Pages : 464
Book Description
This old edition was published in 2002. The current and final edition of this book is The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition which was published in 2013 under ISBN: 9781118530801. The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including: Retail sales and e-commerce Inventory management Procurement Order management Customer relationship management (CRM) Human resources management Accounting Financial services Telecommunications and utilities Education Transportation Health care and insurance By the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.
Data Jujitsu
Author: D. J. Patil
Publisher: "O'Reilly Media, Inc."
ISBN: 1449341152
Category : Data mining
Languages : en
Pages : 26
Book Description
Publisher: "O'Reilly Media, Inc."
ISBN: 1449341152
Category : Data mining
Languages : en
Pages : 26
Book Description
The Self-Service Data Roadmap
Author: Sandeep Uttamchandani
Publisher: "O'Reilly Media, Inc."
ISBN: 1492075205
Category : Computers
Languages : en
Pages : 297
Book Description
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can’t scale data science teams fast enough to keep up with the growing amounts of data to transform. What’s the answer? Self-service data. With this practical book, data engineers, data scientists, and team managers will learn how to build a self-service data science platform that helps anyone in your organization extract insights from data. Sandeep Uttamchandani provides a scorecard to track and address bottlenecks that slow down time to insight across data discovery, transformation, processing, and production. This book bridges the gap between data scientists bottlenecked by engineering realities and data engineers unclear about ways to make self-service work. Build a self-service portal to support data discovery, quality, lineage, and governance Select the best approach for each self-service capability using open source cloud technologies Tailor self-service for the people, processes, and technology maturity of your data platform Implement capabilities to democratize data and reduce time to insight Scale your self-service portal to support a large number of users within your organization
Publisher: "O'Reilly Media, Inc."
ISBN: 1492075205
Category : Computers
Languages : en
Pages : 297
Book Description
Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can’t scale data science teams fast enough to keep up with the growing amounts of data to transform. What’s the answer? Self-service data. With this practical book, data engineers, data scientists, and team managers will learn how to build a self-service data science platform that helps anyone in your organization extract insights from data. Sandeep Uttamchandani provides a scorecard to track and address bottlenecks that slow down time to insight across data discovery, transformation, processing, and production. This book bridges the gap between data scientists bottlenecked by engineering realities and data engineers unclear about ways to make self-service work. Build a self-service portal to support data discovery, quality, lineage, and governance Select the best approach for each self-service capability using open source cloud technologies Tailor self-service for the people, processes, and technology maturity of your data platform Implement capabilities to democratize data and reduce time to insight Scale your self-service portal to support a large number of users within your organization
Law of Raw Data
Author: Jan Bernd Nordemann
Publisher: Kluwer Law International B.V.
ISBN: 9403532815
Category : Law
Languages : en
Pages : 605
Book Description
Data, in its raw or unstructured form, has become an important and valuable economic asset, lending it the sobriquet of ‘the oil of the twenty-first century’. Clearly, as intellectual property, raw data must be legally defined if not somehow protected to ensure that its access and re-use can be subject to legal relations. As legislators struggle to develop a settled legal regime in this complex area, this indispensable handbook will offer a careful and dedicated analysis of the legal instruments and remedies, both existing and potential, that provide such protection across a wide variety of national legal systems. Produced under the auspices of the International Association for the Protection of International Property (AIPPI), more than forty of the association’s specialists from twenty-three countries worldwide contribute national chapters on the relevant law in their respective jurisdictions. The contributions thoroughly explain how each country approaches such crucial matters as the following: if there is any intellectual property right available to protect raw data; the nature of such intellectual property rights that exist in unstructured data; contracts on data and which legal boundaries stand in the way of contract drafting; liability for data products or services; and questions of international private law and cross-border portability. Each country’s rules concerning specific forms of data – such as data embedded in household appliances and consumer goods, criminal offence data, data relating to human genetics, tax and bank secrecy, medical records, and clinical trial data – are described, drawing on legislation, regulation, and case law. A matchless legal resource on one of the most important raw materials of the twenty-first century, this book provides corporate counsel, practitioners and policymakers working in the field of intellectual property rights, and concerned academics with both a broad-based global overview on emerging legal strategies in the protection of unstructured data and the latest information on existing legislation and regulation in the area.
Publisher: Kluwer Law International B.V.
ISBN: 9403532815
Category : Law
Languages : en
Pages : 605
Book Description
Data, in its raw or unstructured form, has become an important and valuable economic asset, lending it the sobriquet of ‘the oil of the twenty-first century’. Clearly, as intellectual property, raw data must be legally defined if not somehow protected to ensure that its access and re-use can be subject to legal relations. As legislators struggle to develop a settled legal regime in this complex area, this indispensable handbook will offer a careful and dedicated analysis of the legal instruments and remedies, both existing and potential, that provide such protection across a wide variety of national legal systems. Produced under the auspices of the International Association for the Protection of International Property (AIPPI), more than forty of the association’s specialists from twenty-three countries worldwide contribute national chapters on the relevant law in their respective jurisdictions. The contributions thoroughly explain how each country approaches such crucial matters as the following: if there is any intellectual property right available to protect raw data; the nature of such intellectual property rights that exist in unstructured data; contracts on data and which legal boundaries stand in the way of contract drafting; liability for data products or services; and questions of international private law and cross-border portability. Each country’s rules concerning specific forms of data – such as data embedded in household appliances and consumer goods, criminal offence data, data relating to human genetics, tax and bank secrecy, medical records, and clinical trial data – are described, drawing on legislation, regulation, and case law. A matchless legal resource on one of the most important raw materials of the twenty-first century, this book provides corporate counsel, practitioners and policymakers working in the field of intellectual property rights, and concerned academics with both a broad-based global overview on emerging legal strategies in the protection of unstructured data and the latest information on existing legislation and regulation in the area.