Applications of AI and Big Data Analytics in M


Have you heard about the idea of monitoring health with the help of mobile devices?

It is related to the term m-Health, which makes use of m-Health apps along with AI and Big Data in healthcare. Owing to the surge in the usage of smartphones and other devices, people have started interacting with doctors and hospitals differently. You will realize there is an app for every task, right from managing doctor’s appointments to maintaining records.

At this juncture, where every business is fighting hard to appeal to the interests and goals of its customers, AI and big data are redefining the healthcare industry. In this blog, we will take a look at the applications of AI and big data and how they have revolutionized the entire healthcare system.

Let’s begin:

AI in Healthcare

AI in healthcare refers to the use of machine learning algorithms and software to mimic human cognition in the analysis, presentation, and understanding of complex data.

Right from detecting links between genetic codes to putting surgical robots to use and maximizing hospital efficiency, AI is a powerful tool for streamlining the healthcare industry. Let’s see what AI has to offer to healthcare:

1. AI Supports Decision Making

Healthcare developers and professionals must work through crucial pieces of information for app development and diagnosis, sifting through complicated, unstructured data in medical records. A single mistake can have huge implications.

AI in healthcare makes it convenient to narrow down these big chunks of information into relevant pieces.

It can store and organize large volumes of information and provide a knowledge database that can later facilitate inspection and analysis to draw meaningful conclusions. This way, it aids clinical decision support, where doctors can rely on it for detecting risk factors.

One such example is IBM’s Watson, which uses AI to predict heart failure.

2. Chatbots to Prioritize and Enhance Primary Care

People tend to book appointments for even the slightest medical issues, which often causes chaos and confusion; many of these issues later turn out to be ones that could have been handled with self-treatment. Here AI can be of great use, enabling a smooth flow and the automation that facilitates primary care. It helps doctors focus more on critical cases.

The best example is medical chatbots, which can save you from trips to the doctor that could easily be avoided. When incorporated with smart algorithms, chatbots can provide instant answers to patient queries and concerns.


3. Robotic Surgeries

A combination of AI in healthcare and collaborative robots has helped achieve the desired speed and depth in making delicate incisions. These procedures, known as robotic surgeries, eliminate the issue of fatigue and help in lengthy and critical medical procedures.

With the help of AI, one can develop new surgical methods from past operations, gaining more precision. This accuracy and precision will reduce accidental movements during surgery.

The best example of robotic surgeries is Vicarious Surgical, which combines virtual reality with AI-enabled robots. The purpose of developing such robots is to help surgeons perform minimally invasive operations.

Another great example of AI in robotic surgery is the Heartlander. It is a miniature mobile robot aimed at facilitating heart therapy. The robot was developed by the robotics department at Carnegie Mellon University.

4. Virtual nursing assistants

Virtual nursing assistants are another example of AI in healthcare that can help provide excellent healthcare services by performing a range of tasks. These tasks include addressing patient queries, directing patients to the most effective care unit, monitoring high-risk patients, assisting with admissions and discharge, and surveying patients in real time. The best part is that the services of these virtual nurses are available 24/7, giving you instant solutions to your problems.

When you explore the market, you will realize that many AI-powered virtual nursing assistant applications are already in use. They facilitate regular interactions between patients and care providers, saving patients from unnecessary hospital visits. Care Angel is the world’s first virtual nurse assistant that facilitates wellness checks through voice and AI.

5. Accurate Diagnosis of Diseases

AI in healthcare can surpass human efforts and help in the detection, prediction, and diagnosis of diseases quickly and accurately. In specialty-level diagnosis, AI algorithms have proven to be cost-effective in detecting diseases like diabetic retinopathy.

PathAI is a machine learning technology that helps pathologists make more accurate diagnoses. It aims to reduce errors in cancer diagnosis and develop methods for individualized medical treatment.


Big Data in Healthcare

Big data in healthcare is essential for handling the risks involved in hospital management and can improve the quality of patient care. Moreover, it can also organize and streamline the activities of the hospital staff. Apart from this, there’s a lot more that big data has to offer; let’s see how it can help:

1. Monitoring patient vitals

When it comes to the usage of big data in healthcare, it helps hospital staff monitor records and other vital information about patients and encourages them to work efficiently.

The best example is the use of sensors beside patient beds that keep an eye on the patient’s vitals like blood pressure, heartbeat, and respiratory rate. Any change in pattern is quickly recorded, and doctors and healthcare administrators are alerted immediately.

Apart from this, Electronic Health Records (EHRs) are also a part of big data in healthcare and include critical information about patients.

This includes medical history, demographics, lab test results, and more. The records consist of at least one modifiable file that the doctor can edit later on noticing any further changes or updates, without any danger of data duplication.

2. Streamline the Administration

Big data in healthcare has also helped administrative staff streamline their activities. It helps gain a realistic view of activities in real-time.

They get insights into how resources are used and allocated, which lets the administrative staff take substantive action. They can streamline activities such as overseeing surgery schedules, coordinate with more precision, cut down on wasted resources, and reduce the cost of care.

This helps hospital management provide the best clinical support and manage the population of at-risk patients. Moreover, doctors and other medical experts can also use big data for proper analysis and to identify deviations among patients so that they receive effective treatments.

3. Big Data for Fraud Prevention

We all know medical billing is prone to errors and waste owing to the complexity of medical procedures and the endless options available in healthcare services. These errors may include wrong medical billing codes, false claims, wrong dosages, wrong medicines, wrong cost estimates for the healthcare services provided, and more.


4. Offers Practical Healthcare Data solutions

Hospitals and other administrative staff can store a wide range of data systematically. The data provided is organized and facilitates further analysis.

This may include a healthcare dashboard that gives hospitals a big-picture view of what is going on. Right from the attendance of the hospital staff to the cost incurred on every treatment, you have access to all the crucial aspects.

Doctors and other healthcare practitioners can use the data to draw meaningful conclusions and reach an informed decision.

If we look at the bigger picture, AI and big data are going to have a vital role to play in the healthcare sector. Predictive analytics is one area the industry hasn’t explored much yet, but we can already see growth in more mundane areas like patient care, waste management, and inventory.

We are all expecting change, and AI and big data will be among the major forces that bring that change.


Top Big Data Tools Of 2023 For Data Analytics And Business Intelligence

To make colossal data talk intelligently, enterprises need big data frameworks

While we have all heard that data is the new oil, the question that grips enterprises is how to mine this valuable oil for business gains. Data resides in massive warehouses, pipelines, and lakes, and big data frameworks form the channel that bridges the gap between enterprises and business markets, helping businesses rise to the call and move towards a data-driven future. To address the data needs of the future, Analytics Insight compiles the top big data tools of 2023 for data analytics and business intelligence:

Apache Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides distributed storage and processing of big data using the MapReduce programming model. Hadoop is a highly scalable storage platform; it can store and distribute big data sets across hundreds of inexpensive servers, and users can increase the size of a cluster by adding new nodes as required without any downtime.

MongoDB is a next-generation, open-source document database and leading NoSQL database that helps businesses transform their operations by harnessing the power of data. Written in C++, MongoDB’s greatest strength is its robustness, offering far more flexibility than Hadoop.

Pentaho Big Data Analytics is a comprehensive, unified solution that supports an enterprise’s entire big data life-cycle. It offers a full array of analytics solutions, from data access and integration to data visualization and predictive analytics.

Apache Cassandra is the leading NoSQL, distributed database management system, well suited for hybrid and multi-cloud environments. Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data.

RapidMiner is a software platform for big data science teams that unites data preparation, machine learning, and predictive model deployment. RapidMiner is a free, open-source tool for data and text mining, with arguably the most powerful and intuitive graphical user interface for the design of analysis processes.

AI, Data Analytics Can Change The Face Of Healthcare Services In India

Technology can be transformative in healthcare services delivery, improving the quality of life even where the density of doctors is one per thousands of people. Since AI, data analytics, and machine learning, among others, have been transforming care services, India is one of the countries in the world with huge scope to improve medical treatment. With AI, data analytics, and all the technology out there, treatments can perhaps be done better in India as we go forward. Former Niti Aayog Vice-Chairman Arvind Panagariya said, “India’s health sector is still very much evolving and very informal as it is still largely dominated by the private sector and government’s role largely had been into setting up medical colleges.”

Tech Opportunity in India

As technologies can bring healthcare services closer to the community, India can benefit from an integrated health information system (HIS) across all states. With this system, both doctors and patients will have access to manage all aspects of healthcare planning, delivery, and monitoring, such as disease observation, patient medical records, planning for human resources, continuing medical education, facility registration, and telemedicine initiatives. Over the last decade, the country has seen rapid diffusion of the internet and smartphones, and it is now meeting the requirements for efficient delivery of digital care solutions. Government interest in innovation is at an all-time high at the central policy level as well as at the local level; at present, every state is seeking to surpass the others in adopting new technology that can assist and support overcoming old problems. Arvind says, “The biggest problem that India had was that in the rural areas and even in tier 2-3 cities, the qualified doctors just don’t go and much of the provision is done by people who have just kind of learned the job or somebody who have worked as an assistant with a doctor.” The convergence of technological solutions with cloud computing, data analytics, telecommunications, and wireless technologies will also enhance accessibility and help manage shortages of skilled doctors and physicians more efficiently in the healthcare sector.

Need to Ease Challenges

In India, AI can potentially leapfrog some other technologies, but to be used at any scale, digitalisation is a prerequisite. In many Indian healthcare centres, medical records are still kept in paper registers, and radiology still uses films. In other countries, this scenario is changing rapidly.

Another challenge that needs to be overcome is the cost of delivering medical services, which has been increasing steadily. When technological innovation is better incorporated with healthcare delivery, it can enable scale and minimise costs, stimulating adoption. This adoption will also be driven by the automation of critical processes in administration, finance, billing, patient records, and pharmacies.

Voxco: Boosting ROI And Efficiency With The Power Of Data & Analytics

The power of data and insight is a herculean tool for solving the business problems that every industry faces. By harnessing this power, Voxco helps organizations boost ROI and efficiency.

Insights about the Company

Voxco is a global leader in omnichannel cloud and on-premise enterprise feedback management solutions. The company provides market research organizations, governments and government agencies, universities, and global corporations with a platform to collect data anytime and anywhere via online and mobile surveys, over-the-phone interviews, or face-to-face offline surveys. Founded over 45 years ago and with offices around the world, Voxco services more than 450 clients in over 40 countries.  

Few Words about the CEO

Sumit brings strong leadership to Voxco with his experience of founding companies and working at the International Monetary Fund, Bank of America, Merrill Lynch, and Houlihan Lokey. Throughout his career, he has worked at the intersection of technology, finance, and data in parallel. Sumit founded Pivoton Capital (which acquired Voxco in 2023) with a passion to help lower middle-market companies unlock untapped value. He holds an MBA from Yale School of Management, an MS in Financial Engineering from Claremont Colleges Consortium, and a BE in Electrical Engineering from Punjab Engineering College.  

Experiences that Shaped the Entrepreneurial Journey

Despite his experience and skills from Yale and previous employments, it was his family that eventually propelled Sumit into running a technology business. His grandfather had caught the entrepreneurial bug and focused on the agriculture industry. In addition, after retiring from his government job as a civil engineer, Sumit’s uncle started a construction business. He saw the challenges they were up against and realized that running a business wasn’t easy or risk-free. But Sumit was also enamoured with the process of starting and growing a company, which outweighed the risks. “I wanted to know what it was like to take a business from point A to point B — to know what it was like to create something. After all, it’s one thing to sit in the background, enter some data into a model, and make a few guesses”, stated Sumit. “It’s a whole different ballgame to see success as a direct result of the work you do every day — and, of course, to demonstrate the opportunity to shareholders.” As he is currently the CEO of Voxco, Sumit is on the other side of the table, managing a company rather than dealing with transactions. Even with the effects of COVID-19, Voxco has grown its global family by 30% since May 2023 (when Pivoton Capital acquired Voxco). The company is investing heavily in virtually every department, and its subscription revenue is up around 25% year over year. It’s been an incredible ride so far, and Voxco aims at doing more incredible things in the future.  

Challenges that Paved the Road to Success

“I came to California from a Tier 2 city in India in 2007 and faced the same challenges as all immigrant students at the time: no money, visa issues while trying to find an initial job. All of this was intensified because of the financial crisis of 2008. It was not an easy time. The effects were felt all over the world”, said the CEO. At such a time, Sumit was working at the International Monetary Fund where he conducted macroeconomic research with other economists to assist countries such as the United States, the United Kingdom, and many others in the European Union in finding solutions to the financial crisis.  

Serving Customers with Innovation

Voxco is an omnichannel feedback platform, which strives to add value for all kinds of enterprises. As customer experience makes a huge impact on company revenue, brand affinity & growth, the company constantly keeps adding new features to help enterprises deliver exceptional customer experience across all their channels, with a determination to be accessible to small & medium businesses too. Therefore, to help them understand the impact Net Promoter Score (NPS) can make on the revenue, Voxco keeps building free customer experience offerings such as its

Leveraging the Might of Disruptive Technologies

Voxco provides organizations with the platform and tools needed to unlock the full potential of data. Artificial intelligence and predictive analytics are helping organizations go through heaps of data to derive actionable insights that will make their operations more efficient and improve profit margins.  

What does Voxco’s Crystal Ball say?

When asked about the future of the company, Sumit said, “Our industry is more than US$80 billion in size and growing at a rate of 15%+ year over year. As the importance of data and most importantly, actionable insights are becoming more prevalent every day, so are our company and industry. We have aggressive plans for growth over the next several years and are expanding our product offering as well. It’s a very exciting time for the business as well as the industry overall.”  

A Piece of Advice for Emerging Leaders

PySpark For Beginners – Take Your First Steps Into Big Data Analytics (With Code)

Overview

Big Data is becoming bigger by the day, and at an unprecedented pace

How do you store, process and use this amount of data for machine learning? There’s where Spark comes into play

Learn all about what Spark is, how it works, and what are the different components involved

Introduction

We are generating data at an unprecedented pace. Honestly, I can’t keep up with the sheer volume of data around the world! I’m sure you’ve come across an estimate of how much data is being produced – McKinsey, Gartner, IBM, etc. all offer their own figures.

Here are some mind-boggling numbers for your reference – more than 500 million tweets, 90 billion emails, 65 million WhatsApp messages are sent – all in a single day! 4 Petabytes of data are generated only on Facebook in 24 hours. That’s incredible!

This, of course, comes with challenges of its own. How does a data science team capture this amount of data? How do you process it and build machine learning models from it? These are exciting questions if you’re a data scientist or a data engineer.

And this is where Spark comes into the picture. Spark is written in Scala and it provides APIs to work with Scala, Java, Python, and R. PySpark is the Python API for Spark.

One traditional way to handle Big Data is to use a distributed framework like Hadoop, but these frameworks require a lot of read-write operations on the hard disk, which makes them very expensive in terms of time and speed. Computational power is a significant hurdle.

PySpark deals with this in an efficient and easy-to-understand manner. So in this article, we will start learning all about it. We’ll understand what is Spark, how to install it on your machine and then we’ll deep dive into the different Spark components. There’s a whole bunch of code here too so let’s have some fun!

Here’s a quick introduction to the world of Big Data in case you need a refresher. Keep in mind that the numbers have gone well beyond what’s shown there – and it’s only been 3 years since we published that article!

Table of Contents

What is Spark?

Installing Apache Spark on your Machine

What are Spark Applications?

Then, what is a Spark Session?

Partitions in Spark

Transformations

Lazy Evaluation in Spark

Data Types in Spark

What is Spark?

Apache Spark is an open-source, distributed cluster computing framework that is used for fast processing, querying and analyzing Big Data.

It is one of the most effective data processing frameworks in enterprises today. It’s true that the cost of Spark is high, as it requires a lot of RAM for in-memory computation, but it is still a hot favorite among Data Scientists and Big Data Engineers. And you’ll see why that’s the case in this article.

Organizations that typically relied on MapReduce-like frameworks are now shifting to the Apache Spark framework. Spark not only performs in-memory computing, it can be up to 100 times faster than MapReduce frameworks like Hadoop. Spark is a big hit among data scientists as it distributes and caches data in memory, helping them optimize machine learning algorithms on Big Data.

I recommend checking out Spark’s official page here for more details. It has extensive documentation and is a good reference guide for all things Spark.

Installing Apache Spark on your Machine

1. Download Apache Spark

One simple way to install Spark is via pip. But that’s not the method recommended by Spark’s official documentation, since the pip package is not intended to cover all use cases.

There’s a high chance you’ll encounter a lot of errors in implementing even basic functionalities. It is only suitable for interacting with an existing cluster (be it standalone Spark, YARN, or Mesos).

So, the first step is to download the latest version of Apache Spark from here. Unzip and move the compressed file:

tar xzvf spark-2.4.4-bin-hadoop2.7.tgz
mv spark-2.4.4-bin-hadoop2.7 spark
sudo mv spark/ /usr/lib/

2. Install JAVA

Make sure that JAVA is installed in your system. I highly recommend JAVA 8 as Spark version 2 is known to have problems with JAVA 9 and beyond:

sudo apt install default-jre
sudo apt install openjdk-8-jdk

3. Install Scala Build Tool (SBT)

When you are working on a small project that contains very few source code files, it is easier to compile them manually. But what if you are working on a bigger project that has hundreds of source code files? You would need to use build tools in that case.

SBT, short for Scala Build Tool, manages your Spark project and also the dependencies of the libraries that you have used in your code.

Keep in mind that you don’t need to install this if you are using PySpark. But if you are using JAVA or Scala to build Spark applications, then you need to install SBT on your machine. Run the below commands to install SBT:

sudo apt-get update
sudo apt-get install sbt

4. Configure SPARK

Next, open the configuration directory of Spark and make a copy of the default Spark environment template. This is already present there as spark-env.sh.template. Open this using the editor:

cd /usr/lib/spark/conf/
cp spark-env.sh.template spark-env.sh
sudo gedit spark-env.sh

Now, in the file spark-env.sh, add the JAVA_HOME path and assign a memory limit to SPARK_WORKER_MEMORY. Here, I have assigned it to be 4GB:

## add variables
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
SPARK_WORKER_MEMORY=4g

5. Set Spark Environment Variables

Open and edit the bashrc file using the below command. This bashrc file is a script that is executed whenever you start a new terminal session:

## open bashrc file
sudo gedit ~/.bashrc

Add the below environment variables in the file:

## add following variables
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SBT_HOME=/usr/share/sbt/bin/sbt-launch.jar
export SPARK_HOME=/usr/lib/spark
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PYSPARK_PYTHON=python3
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

Now, source the bashrc file. This will restart the terminal session with the updated script:

## source bashrc file
source ~/.bashrc

What are Spark Applications?

A Spark application is an instance of the Spark Context. It consists of a driver process and a set of executor processes.

The driver process is responsible for maintaining information about the Spark Application, responding to the code, and distributing and scheduling work across the executors. The driver process is absolutely essential – it’s the heart of a Spark Application and maintains all relevant information during the lifetime of the application.

The executors are responsible for actually executing the work that the driver assigns them. So, each executor is responsible for only two things:

Executing code assigned to it by the driver, and

Reporting the state of the computation, on that executor, back to the driver node

Then what is a Spark Session?

We know that a driver process controls the Spark Application. The driver process makes itself available to the user as an object called the Spark Session.

The Spark Session instance is the way Spark executes user-defined manipulations across the cluster. In Scala and Python, the Spark Session variable is available as spark when you start up the console:
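If you are working outside the pyspark shell (for example, in a plain Python script), you have to create the session yourself. Here is a minimal sketch, assuming a local installation; the application name and master URL are illustrative choices, not values from this article:

from pyspark.sql import SparkSession

# build (or reuse) a Spark Session; appName and master are illustrative
spark = (SparkSession.builder
         .appName("pyspark-intro")
         .master("local[*]")
         .getOrCreate())

# the lower-level Spark Context is available from the session
sc = spark.sparkContext
print(spark.version)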

Partitions in Spark

Partitioning means that the complete data is not present in a single place. It is divided into multiple chunks and these chunks are placed on different nodes.

If you have one partition, Spark will only have a parallelism of one, even if you have thousands of executors. Also, if you have many partitions but only one executor, Spark will still only have a parallelism of one because there is only one computation resource.

In Spark, the lower level APIs allow us to define the number of partitions.

Let’s take a simple example to understand how partitioning helps us to give faster results. We will create a list of 20 million random numbers between 10 and 1000 and will count the numbers greater than 200.

Let’s see how fast we can do this with just one partition:
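The timing screenshots from the original post are not reproduced here, but a minimal sketch of the single-partition version looks like this (it assumes the SparkContext is available as sc, as in the pyspark shell):

import random

# 20 million random numbers between 10 and 1000
my_large_list = [random.randint(10, 1000) for _ in range(20000000)]

# force everything into a single partition
one_partition_rdd = sc.parallelize(my_large_list, numSlices=1)

# count the numbers greater than 200; in a notebook you can time this cell with %%time
print(one_partition_rdd.filter(lambda x: x > 200).count())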

It took 34.5 ms to filter the results with one partition:

Now, let’s increase the number of partitions to 5 and check if we get any improvements in the execution time:
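A sketch of the same count with five partitions, reusing the list and the sc variable from the previous snippet:

# spread the same data across 5 partitions
five_partition_rdd = sc.parallelize(my_large_list, numSlices=5)

# the filter now runs on the partitions in parallel
print(five_partition_rdd.filter(lambda x: x > 200).count())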

It took 11.1 ms to filter the results using five partitions:

Transformations in Spark

Data structures are immutable in Spark. This means that they cannot be changed once created. But if we cannot change it, how are we supposed to use it?

So, in order to make any change, we need to instruct Spark on how we would like to modify our data. These instructions are called transformations.

Recall the example we saw above. We asked Spark to filter the numbers greater than 200 – that was essentially one type of transformation. There are two types of transformations in Spark:

Narrow Transformation: In narrow transformations, all the elements that are required to compute the result of a single partition live in a single partition of the parent RDD. For example, if you want to filter the numbers that are less than 100, you can do this on each partition separately. The transformed partition depends on only one parent partition to calculate the results.

Wide Transformation: In wide transformations, the elements that are required to compute the result of a single partition may live in more than one partition of the parent RDD. For example, if you want to calculate the word count, then your transformation is dependent on all the partitions to calculate the final result (see the sketch just below this list).
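The sketch below shows one transformation of each kind; numbers_rdd and text_rdd are placeholder names for any existing RDDs of numbers and of text lines:

# narrow transformation: filter() is applied to each partition independently
less_than_100 = numbers_rdd.filter(lambda x: x < 100)

# wide transformation: reduceByKey() has to shuffle data between partitions
word_counts = (text_rdd.flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))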

Lazy Evaluation

Let’s say you have a very large data file that contains millions of rows. You need to perform analysis on that by doing some manipulations like mapping, filtering, random split or even very basic addition or subtraction.

Now, for large datasets, even a basic transformation will take millions of operations to execute.

It is essential to optimize these operations when working with Big Data, and Spark handles it in a very creative way. All you need to do is tell Spark what are the transformations you want to do on the dataset and Spark will maintain a series of transformations. When you ask for the results from Spark, it will then find out the best path and perform the required transformations and give you the result.

Now, let’s take an example. You have a text file of 1 GB and have created 10 partitions of it. You also performed some transformations and, in the end, you requested to see how the first line looks. In this case, Spark will read the file only from the first partition and give you the results, as your requested result does not require reading the complete file.

Let’s take a few practical examples to see how Spark performs lazy evaluation. In the first step, we have created a list of 10 million numbers and created an RDD with 3 partitions:
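The original code is embedded as a gist; a minimal sketch of this first step, assuming sc is the active SparkContext, could look like this:

# a list of 10 million numbers, parallelized into 3 partitions
my_list = list(range(10000000))
my_rdd = sc.parallelize(my_list, numSlices=3)
print(my_rdd.getNumPartitions())  # 3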

Next, we will perform a very basic transformation, like adding 4 to each number. Note that Spark at this point in time has not started any transformation. It only records a series of transformations in the form of RDD Lineage. You can see that RDD lineage using the function toDebugString:
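A sketch of this step, continuing from the RDD created above:

# add 4 to each number; this is only recorded, not executed (lazy evaluation)
rdd_plus_4 = my_rdd.map(lambda x: x + 4)

# inspect the recorded lineage
print(rdd_plus_4.toDebugString())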

We can see that PythonRDD[1] is connected with ParallelCollectionRDD[0].  Now, let’s go ahead and add one more transformation to add 20 to all the elements of the list.

You might be thinking it would be better to add 24 in a single step instead of making an extra step. But check the RDD Lineage after this step:
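Continuing the sketch, the second map is chained on top of the first, and an action finally triggers the computation:

# add 20 on top of the previous transformation
rdd_plus_24 = rdd_plus_4.map(lambda x: x + 20)

# the lineage now shows both maps pipelined together
print(rdd_plus_24.toDebugString())

# an action such as take() triggers the actual computation
print(rdd_plus_24.take(5))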

We can see that it has automatically skipped that redundant step and will add 24 in a single step instead of how we defined it. So, Spark automatically defines the best path to perform any action and performs the transformations only when required.

Let’s take another example to understand the Lazy Evaluation process.

Suppose we have a text file and we created an RDD of it with 4 partitions. Now, we define some transformations like converting the text data to lower case, slicing the words, adding some prefix to the words, etc.

But in the end, when we perform an action like getting the first element of the transformed data, Spark performs the transformations on the first partition only as there is no need to view the complete data to execute the requested result:

View the code on Gist.
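The gist itself is not shown above; a minimal sketch of the described example, assuming a text file named sample.txt and the SparkContext sc, might be:

# read a text file into an RDD with 4 partitions (the file name is an assumption)
text_rdd = sc.textFile("sample.txt", minPartitions=4)

# lower-case every word and slice its first two characters
words = (text_rdd.flatMap(lambda line: line.split())
                 .map(lambda word: word.lower())
                 .map(lambda word: word[:2]))

# only the first partition has to be read to answer this
print(words.first())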

Here, we have converted the words to lower case and sliced the first two characters of each word (and then requested for the first word).

What happened here? We created 4 partitions of the text file. But according to the result we needed, it was not required to read and perform transformations on all the partitions, hence Spark only did that.

What if we want to count the unique words? Then we need to read all the partitions and that’s exactly what Spark does:
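A sketch of the distinct-word count, reusing the words RDD from the previous snippet:

# every partition must be read to count the unique (sliced) words
print(words.distinct().count())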

Data Types in Spark MLlib

MLlib is Spark’s scalable Machine Learning library. It consists of common machine learning algorithms like Regression, Classification, Dimensionality Reduction, and some utilities to perform basic statistical operations on the data.

In this article, we will go through some of the data types that MLlib provides. We’ll cover topics like feature extraction and building machine learning pipelines in upcoming articles.

Local Vector

MLlib supports two types of local vectors: dense and sparse. Sparse vectors are used when most of the numbers are zero. To create a sparse vector, you need to provide the length of the vector, the indices of the non-zero values (which should be strictly increasing), and the non-zero values themselves.

View the code on Gist.
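A minimal sketch of the two local vector types using pyspark.mllib:

from pyspark.mllib.linalg import Vectors

# dense vector: every value is stored explicitly
dense_vec = Vectors.dense([1.0, 0.0, 3.0, 0.0])

# sparse vector: (length, strictly increasing indices of non-zeros, non-zero values)
sparse_vec = Vectors.sparse(4, [0, 2], [1.0, 3.0])

print(dense_vec)
print(sparse_vec)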

Labeled Point

A labeled point is a local vector with a label assigned to it. You must have solved supervised problems where you have a target corresponding to some features; a labeled point is exactly that, a vector representing a set of features with a label associated with it.

View the code on Gist.
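A sketch of creating labeled points with dense and sparse feature vectors:

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

# a positive example (label 1.0) with dense features
pos = LabeledPoint(1.0, Vectors.dense([2.0, 5.0, 1.0]))

# a negative example (label 0.0) with sparse features
neg = LabeledPoint(0.0, Vectors.sparse(3, [0, 2], [1.0, 4.0]))

print(pos.label, pos.features)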

Local Matrix

View the code on Gist.
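The section above only links to the gist; as a reference, here is a sketch of local dense and sparse matrices (the sparse layout follows MLlib's column-major CSC format):

from pyspark.mllib.linalg import Matrices

# dense 3x2 matrix, values given in column-major order
dense_mat = Matrices.dense(3, 2, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# sparse 3x2 matrix: (rows, cols, column pointers, row indices, values)
sparse_mat = Matrices.sparse(3, 2, [0, 1, 3], [0, 2, 1], [9.0, 6.0, 8.0])

print(dense_mat)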

Distributed Matrix

Distributed matrices are stored in one or more RDDs. It is very important to choose the right format of distributed matrices. Four types of distributed matrices have been implemented so far:

Row Matrix

Each row is a local vector. You can store rows on multiple partitions

Algorithms like Random Forest can be implemented using Row Matrix as the algorithm divides the rows to create multiple trees. The result of one tree is not dependent on other trees. So, we can make use of the distributed architecture and do parallel processing for algorithms like Random Forest for Big Data

View the code on Gist.
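A minimal sketch of building a RowMatrix from an RDD of local vectors, assuming sc is the SparkContext:

from pyspark.mllib.linalg.distributed import RowMatrix

# each element of the RDD becomes one row of the distributed matrix
rows = sc.parallelize([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
row_matrix = RowMatrix(rows)

print(row_matrix.numRows(), row_matrix.numCols())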

Indexed Row Matrix

It is similar to the row matrix where rows are stored in multiple partitions but in an ordered manner. An index value is assigned to each row. It is used in algorithms where the order is important like Time Series data

It can be created from an RDD of IndexedRow

View the code on Gist.
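A sketch of an IndexedRowMatrix built from an RDD of IndexedRow objects:

from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# each row carries an explicit index, so row order is preserved
indexed_rows = sc.parallelize([
    IndexedRow(0, [1.0, 2.0]),
    IndexedRow(1, [3.0, 4.0]),
    IndexedRow(2, [5.0, 6.0]),
])
indexed_matrix = IndexedRowMatrix(indexed_rows)

print(indexed_matrix.numRows(), indexed_matrix.numCols())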

Coordinate Matrix

A coordinate matrix can be created from an RDD of MatrixEntry

We only use a Coordinate matrix when both the dimensions of the matrix are large

View the code on Gist.
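A sketch of a CoordinateMatrix created from an RDD of MatrixEntry objects:

from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

# each MatrixEntry is (row index, column index, value)
entries = sc.parallelize([
    MatrixEntry(0, 0, 1.2),
    MatrixEntry(1, 2, 3.4),
    MatrixEntry(5, 1, 2.5),
])
coord_matrix = CoordinateMatrix(entries)

print(coord_matrix.numRows(), coord_matrix.numCols())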

Block Matrix

In a Block Matrix, we can store different sub-matrices of a large matrix on different machines

We need to specify the block dimensions. Like in the below example, the block size is 3X3, and for each of the blocks we can specify a matrix by providing its coordinates

View the code on Gist.
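A sketch of a BlockMatrix with 3X3 blocks, where each block is a local matrix placed at a block coordinate:

from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix

# each element is ((block row index, block column index), local sub-matrix)
blocks = sc.parallelize([
    ((0, 0), Matrices.dense(3, 3, [1.0] * 9)),
    ((1, 0), Matrices.dense(3, 3, [2.0] * 9)),
])
block_matrix = BlockMatrix(blocks, rowsPerBlock=3, colsPerBlock=3)

print(block_matrix.numRows(), block_matrix.numCols())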

End Notes

We’ve covered quite a lot of ground today. Spark is one of the more fascinating frameworks in data science and one I feel you should at least be familiar with.

This is just the start of our PySpark learning journey! I plan to cover a lot more ground in this series with multiple articles spanning different machine learning tasks.


The Big New Updates To Alexa, And Amazon’s Pursuit Of Ambient AI

Alexa, Amazon’s ubiquitous voice assistant, is getting an upgrade. The company revealed some of these changes in a virtual event today. One of the most interesting developments is that users can now personalize their Alexa-enabled devices to listen for specific sound events in their homes.

Amazon also unveiled new features and products, including additional accessories for Ring and Halo devices, and access to invite-only devices such as the Always Home Cam and a cute home-roving, beat-boxing robot called Astro. 

Alexa’s latest capabilities are part of the Amazon team’s work on ambient computing—a general term that refers to an underlying artificial intelligence system that surfaces when you need it and recedes into the background when you don’t. This is enabled through the connected network of Amazon devices and services that interact with each other and with users. 

“Perhaps the most illustrative example of our ambient AI is Alexa,” Rohit Prasad, senior vice president and head scientist for Alexa at Amazon, tells Popular Science. “Because, it’s not just a spoken language service that you issue a bunch of requests. As an ambient intelligence that is available on many different devices around you, it understands the state of your environment and even acts perhaps on your behalf.”

Alexa already has the ability to detect what Prasad calls “global” ambient sounds or sound events. These are things like glass breaking, or a fire alarm or smoke alarm going off. Those are events that make your home safer while you’re away, he says. If anything goes wrong, Alexa can send you a notification. It can also detect more innocuous sounds like your dog barking or your partner snoring. 

[Related: How Amazon’s radar-based sleep tracking could work]

Now, Prasad and his team are taking this pre-trained model for global sound events that they built using thousands of real world sound samples, and are offering a way for users to create alerts for their own custom sound events by manually adding 5-10 examples for a specific sound that they would like Alexa to keep an ear out for at home. “All the other kinds of data that you’ve collected via us can be used to make the custom sound events happen with fewer samples,” he says. 

This could be something like the refrigerator door being left open by kids after school for more than 10 minutes. “The two refrigerators in my home, they both make different sounds when the door is left open by one of our kids,” says Prasad. That way, even if he’s not at home when his kids are, Alexa could send him a notification if someone didn’t shut the refrigerator door properly. 

You could set an alert for a whistling kettle, a washer running, a doorbell ringing, or an oven timer going off while you’re upstairs. “And if you have an elder person in the home who can’t hear well and is watching television, if it’s hooked to a Fire TV, then you can send a message on the TV that someone’s at the door, and that the doorbell rang,” says Prasad.  

Ring can tell you the specific items it sees out of place

In addition to custom sound events, Alexa could also alert you about certain visual events through its Ring cameras and some Ring devices. “What we found is that Ring cameras, especially for outdoor settings, [are] great for looking at the binary states of objects of interest in your homes,” Prasad says. For example, if you have a Ring camera facing an outdoor shed, you can teach it to check if the door has been left open or not by providing it some pictures for the open state, and for the closed state, and have it send you an alert if it’s open. 

[Related: Amazon’s home security drone may actually be less creepy than a regular camera]


“You’re now combining computer vision and few-shot learning techniques,” says Prasad. The team has collected a large sample of publicly available photos of garage and shed doors to help with the pre-training, just like with the audio component of the ambient AI. “But my shed door can look different from the shed you may have, and then the customization is still required, but now it can happen with very few samples.” 

Alexa will soon be able to learn your preferences

Last year, Amazon updated Alexa so that if it doesn’t recognize the concept in a customer’s request, it will come back to you and ask “what do you mean by that?” 

This could be a request like: set my thermostat to vacation mode, where “vacation mode” is a setting that isn’t known. Plus, your preference for the setting could be 70 degrees instead of 60 degrees. That’s where users can come in and customize Alexa through natural language. 

“Typically when you have these alien concepts, or unknown and ambiguous concepts, it will require some input either through human labelers [on the developer end] saying “vacation mode” is a type of setting for a smart appliance like a thermostat,” Prasad explains. 

This type of data is hard to gather without real-world experience, and new terms and phrases pop up all the time. The more practical solution was for the team to build out Alexa’s ability for generalized learning, or generalized AI. Instead of relying on supervised learning from human labelers at Amazon, Alexa can learn directly from end users, making it easier for them to adapt Alexa to their lives. 

In a few months, users will be able to use this capability to ask Alexa to learn their preferences, which is initiated by saying, “Alexa, learn my preferences.” They can then go through a dialogue with Alexa covering three areas of preferences to start with: food preferences, sports teams, and weather providers, like the Big Sky app. 

If you say, “Alexa, I’m a vegetarian,” when Alexa takes you through the dialogue, then, the next time you look for restaurants to eat nearby, it will remember and prioritize restaurants that have vegetarian options. And if you just ask for recipes for dinner, it would prioritize vegetarian options over others. 

For sports teams, if you’ve said that you like the Boston Red Sox for baseball, and the New England Patriots, and then you ask Alexa for sports highlights on Echo show, you’ll get more custom highlights for your preferred teams. And if another family member likes other teams, you can add that to the preference as well. 

[Related: The ‘artificial intelligence’ in your new smart gadget may not be what you think]

“We already know customers express these preferences in their regular interactions many times every day,” says Prasad. “Now we are making it very simple for these preferences to work.” You can go through the preset prompts with Alexa to teach it your preferences, or teach it in the moment. For example, if you ask Alexa for restaurants and it recommends steakhouses, you can say, “Alexa, I’m a vegetarian,” and it will automatically learn that for future encounters. 

“These three inventions that are making the complex simple are also illustrative of more generalized learning capabilities, with more self-supervised learning, transfer learning, and few-shot learning, and also deep learning, to make these kinds of interactive dialogues happen,” Prasad says. “This is the hallmark for generalized intelligence,” which is similar to how humans learn. 

Alexa learns and grows

These three new features—the custom sounds, custom visuals, and preferences— are not only coming together to improve AI, but also improve Alexa’s self-learning, self-service, and self-awareness of your ambient environment. “Alexa is just more aware of your surroundings to help you when you need it,” Prasad says. Along with a feature like Routines, or Blueprints, these new add-ons allow Alexa to give more custom responses, without requiring any coding proficiency. 

Alexa automatically learns how to improve itself as you use it more. In fact, Prasad says that Alexa is now able to automatically correct more than 20 percent of its defects without any human supervision. “If it did something wrong and you barge in and say no, Alexa, I meant that,” it will remember that for the next time you ask for something similar, he says. 

In his case, sometimes when he asks Alexa to play BBC, it registers what he says as BPC. “For an AI, simple is hard. Occasionally it recognizes ‘play BPC.’ But it recognizes usage patterns,” Prasad says. That way, it can fix the request automatically without asking every time, “did you mean BBC?” 

[Related: If you’re worried about Amazon Sidewalk, here’s how to opt out]

“This is the type of automatic learning based on context in both your personalized usage and cohort usage that Alexa is able to be much smarter worldwide and try to estimate defects and correct automatically without any human input,” says Prasad. “If you look at the old days of supervised learning, even with active learning, Alexa will say ‘this is the portion I am having trouble with, let’s get some human input.’ Now that human input comes directly from the end user.” 
