Data Mining: Concepts and Techniques
Author: Jiawei Han
Publisher: Elsevier
ISBN: 0123814804
Category : Computers
Languages : en
Pages : 740
Book Description
Data Mining: Concepts and Techniques presents the concepts and techniques for processing gathered data or information for use in various applications. Specifically, it explains data mining and the tools used to discover knowledge from collected data, a process also referred to as knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
- Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
- Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
- Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
Multidimensional Item Response Theory
Author: Wes Bonifay
Publisher: SAGE Publications
ISBN: 1506384234
Category : Social Science
Languages : en
Pages : 105
Book Description
Several decades of psychometric research have led to the development of sophisticated models for multidimensional test data, and in recent years, multidimensional item response theory (MIRT) has become a burgeoning topic in psychological and educational measurement. MIRT is considered a cutting-edge statistical technique, but its underlying methodology can be complex and therefore doesn’t receive much attention in introductory IRT courses. However, author Wes Bonifay shows how MIRT can be understood and applied by anyone with a firm grounding in unidimensional IRT modeling. His volume includes practical examples and illustrations, along with numerous figures and diagrams. Multidimensional Item Response Theory includes snippets of R code interspersed throughout the text (with the complete R code included on an accompanying website) to guide readers in exploring MIRT models, estimating the model parameters, generating plots, and implementing the various procedures and applications discussed throughout the book.
SQL Server 2019 Administrator's Guide
Author: Marek Chmel
Publisher: Packt Publishing Ltd
ISBN: 1789950333
Category : Computers
Languages : en
Pages : 522
Book Description
Use Microsoft SQL Server 2019 to implement, administer, and secure a robust database solution that is disaster-proof and highly available
Key Features
- Explore new features of SQL Server 2019 to set up, administer, and maintain your database solution successfully
- Develop a dynamic SQL Server environment and streamline big data pipelines
- Discover best practices for fixing performance issues, database access management, replication, and security
Book Description
SQL Server, developed by Microsoft, is one of the most popular relational database management systems. This second edition of the SQL Server Administrator's Guide will not only teach you how to administer an enterprise database, but also help you become proficient at managing and keeping the database available, secure, and stable. You’ll start by learning how to set up your SQL Server and configure new and existing environments for optimal use. The book then takes you through design aspects and delves into performance tuning by showing you how to use indexes effectively. You’ll understand certain choices that need to be made about backups, implement security policy, and discover how to keep your environment healthy. Tools available for monitoring and managing a SQL Server database, including automating health reviews, performance checks, and much more, will also be discussed in detail. As you advance, the book covers essential topics such as migration, upgrading, and consolidation, along with the techniques that will help you when things go wrong. Once you’ve got to grips with integration with Azure and streamlining big data pipelines, you’ll learn best practices from industry experts for maintaining a highly reliable database solution. Whether you are an administrator or are looking to get started with database administration, this SQL Server book will help you develop the skills you need to successfully create, design, and deploy database solutions.
What you will learn
- Discover SQL Server 2019’s new features and how to implement them
- Fix performance issues by optimizing queries and making use of indexes
- Design and use an optimal database management strategy
- Combine SQL Server 2019 with Azure and manage your solution using various automation techniques
- Implement efficient backup and recovery techniques in line with security policies
- Get to grips with migrating, upgrading, and consolidating with SQL Server
- Set up an AlwaysOn-enabled stable and fast SQL Server 2019 environment
- Understand how to work with Big Data on SQL Server environments
Who this book is for
This book is for database administrators, database developers, and anyone who wants to administer large and multiple databases single-handedly using Microsoft's SQL Server 2019. Basic awareness of database concepts and experience with previous SQL Server versions are required.
Elasticsearch: The Definitive Guide
Author: Clinton Gormley
Publisher: "O'Reilly Media, Inc."
ISBN: 1449358500
Category : Computers
Languages : en
Pages : 659
Book Description
Whether you need full-text search or real-time analytics of structured data, or both, the Elasticsearch distributed search engine is an ideal way to put your data to work. This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships. If you’re a newcomer to both search and distributed systems, you’ll quickly learn how to integrate Elasticsearch into your application. More experienced users will pick up lots of advanced techniques. Throughout the book, you’ll follow a problem-based approach to learn why, when, and how to use Elasticsearch features.
- Understand how Elasticsearch interprets data in your documents
- Index and query your data to take advantage of search concepts such as relevance and word proximity
- Handle human language through the effective use of analyzers and queries
- Summarize and group data to show overall trends, with aggregations and analytics
- Use geo-points and geo-shapes, Elasticsearch’s approaches to geolocation
- Model your data to take advantage of Elasticsearch’s horizontal scalability
- Learn how to configure and monitor your cluster in production
Mining of Massive Datasets
Author: Jure Leskovec
Publisher: Cambridge University Press
ISBN: 1107077230
Category : Computers
Languages : en
Pages : 480
Book Description
Now in its second edition, this book focuses on practical algorithms for mining data from even the largest datasets.
Building the Data Warehouse
Author: W. H. Inmon
Publisher: John Wiley & Sons
ISBN: 0471270482
Category : Computers
Languages : en
Pages : 434
Book Description
The data warehousing bible updated for the new millennium
Updated and expanded to reflect the many technological advances occurring since the previous edition, this latest edition of the data warehousing "bible" provides a comprehensive introduction to building data marts, operational data stores, the Corporate Information Factory, exploration warehouses, and Web-enabled warehouses. Written by the father of the data warehouse concept, the book also reviews the unique requirements for supporting e-business and explores various ways in which the traditional data warehouse can be integrated with new technologies, including near-line data storage techniques, to provide enhanced customer service, sales, and support, both online and offline.
Collect, Combine, and Transform Data Using Power Query in Excel and Power BI
Author: Gil Raviv
Publisher: Microsoft Press
ISBN: 1509307974
Category : Computers
Languages : en
Pages : 874
Book Description
Using Power Query, you can import, reshape, and cleanse any data from a simple interface, so you can mine that data for all of its hidden insights. Power Query is embedded in Excel, Power BI, and other Microsoft products, and leading Power Query expert Gil Raviv will help you make the most of it. Discover how to eliminate time-consuming manual data preparation, solve common problems, avoid pitfalls, and more. Then, walk through several complete analytics challenges, and integrate all your skills in a realistic chapter-length final project. By the time you’re finished, you’ll be ready to wrangle any data, and transform it into actionable knowledge.
Prepare and analyze your data the easy way, with Power Query
· Quickly prepare data for analysis with Power Query in Excel (also known as Get & Transform) and in Power BI
· Solve common data preparation problems with a few mouse clicks and simple formula edits
· Combine data from multiple sources, multiple queries, and mismatched tables
· Master basic and advanced techniques for unpivoting tables
· Customize transformations and build flexible data mashups with the M formula language
· Address collaboration challenges with Power Query
· Gain crucial insights into text feeds
· Streamline complex social network analytics so you can do it yourself
For all information workers, analysts, and any Excel user who wants to solve their own business intelligence problems.
Mastering pandas
Author: Ashish Kumar
Publisher: Packt Publishing Ltd
ISBN: 1789343356
Category : Computers
Languages : en
Pages : 658
Book Description
Perform advanced data manipulation tasks using pandas and become an expert data analyst.
Key Features
- Manipulate and analyze your data expertly using the power of pandas
- Work with missing data and time series data and become a true pandas expert
- Includes expert tips and techniques on making your data analysis tasks easier
Book Description
pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze their data. This book presents useful data manipulation techniques in pandas to perform complex data analysis in various domains. An update to our highly successful previous edition with new features, examples, updated code, and more, this book is an in-depth guide to getting the most out of pandas for data analysis. Designed for both intermediate users and seasoned practitioners, it teaches advanced data manipulation techniques, such as multi-indexing, modifying data structures, and sampling your data, which allow for powerful analysis and help you gain accurate insights from your data. With the help of this book, you will apply pandas to different domains, such as Bayesian statistics, predictive analytics, and time series analysis, using an example-based approach. You will also learn how to prepare powerful, interactive business reports in pandas using the Jupyter notebook. By the end of this book, you will have learned how to perform efficient data analysis using pandas on complex data, and become an expert data analyst or data scientist in the process.
What you will learn
- Speed up your data analysis by importing data into pandas
- Keep relevant data points by selecting subsets of your data
- Create a high-quality dataset by cleaning data and fixing missing values
- Compute actionable analytics with grouping and aggregation in pandas
- Master time series data analysis in pandas
- Make powerful reports in pandas using Jupyter notebooks
Who this book is for
This book is for data scientists, analysts and Python developers who wish to explore advanced data analysis and scientific computing techniques using pandas. Some fundamental understanding of Python programming and familiarity with basic data analysis concepts is all you need to get started with this book.
Spark: The Definitive Guide
Author: Bill Chambers
Publisher: "O'Reilly Media, Inc."
ISBN: 1491912294
Category : Computers
Languages : en
Pages : 594
Book Description
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.
- Get a gentle overview of big data and Spark
- Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
- Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
- Understand how Spark runs on a cluster
- Debug, monitor, and tune Spark clusters and applications
- Learn the power of Structured Streaming, Spark’s stream-processing engine
- Learn how you can apply MLlib to a variety of problems, including classification or recommendation
Data Warehousing in the Age of Big Data
Author: Krish Krishnan
Publisher: Newnes
ISBN: 0124059201
Category : Computers
Languages : en
Pages : 371
Book Description
Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies, and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data–ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory. Ultimately, this book will help you navigate through the complex layers of Big Data and data warehousing while showing you how to think effectively about using all these technologies and architectures to design the next-generation data warehouse.
- Learn how to leverage Big Data by effectively integrating it into your data warehouse.
- Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBASE, Hive, and other Big Data technologies
- Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements