Siddharth Chandra · codekaro.hashnode.net · Dec 2, 2021 · Big Data Open Source Frameworks · Big Data is a term used to define large scale data sets that are too complex to be manipulated with basic DBMS. Handling Big Data requires sophisticated hardware and software technologies. Just as open-source has been the primary reason for the Big D... · 77 likes · 369 reads · Tags: Scala, big data
Sahil Negi · codereek.hashnode.net · Nov 25, 2020 · Big Data world · BigData World. What is Big data? Big data is also data but with an immense scale. Big Data is a concept used to characterize a data set that is immense in volume and yet exponentially increasing over time. In short, such data is so huge and complex th... · 10 likes · 35 reads · Tags: big data
Mayur Said · buildprojectswithmayur.hashnode.net · Mar 26, 2023 · Scaling KNN Using MapReduce · K-Nearest Neighbors (KNN), a non-parametric lazy learning technique, is considered one of the best techniques for classification. Unlike other classification algorithms like Logistic Regression, Naïve Bayes, etc, the biggest advantage of KNN is that ... · 10 likes · 49 reads · Tags: Hashnode
Vijay Urade · shuffleandsort.hashnode.net · Feb 24, 2023 · Hadoop Single node cluster quick setup · Installing Java: a. apt-get update; b. apt-get install openjdk-8-jre; c. apt-get install openjdk-8-jdk. Installing ssh (Secure Shell): a. sudo apt-get -y install openssh-server; b. ssh-keygen -t rsa; c. cd .ssh <press enter>; d. cp id_rsa.pub authorized_keys; e. s... · 1 like · 28 reads · Tags: hadoop
Atharva Shivaji Yemul · atharvayemul.hashnode.net · Feb 2, 2023 · Big Data: Understanding Its Significance in Today's World · Introduction to the Big Data. Big data refers to the large and complex sets of data generated by various sources in today's digital world. With the rise of connected devices and the internet, the amount of data generated every day has increased dramat... · Tags: big data
Aloysius Vidhun Mon · aloysius05.hashnode.net · Jan 23, 2023 · Introduction to Big Data (Hadoop) · Introduction to Big Data and Hadoop: Note: These are just study materials I made for myself for career development when I was learning BigData. Overview: Understand the concepts of Big Data. Explain Hadoop and how it addresses Big Data challenges. ... · 65 reads · Tags: big data
Uche Okoye · afewfigures.hashnode.net · Jan 15, 2023 · Big Data · Big data is a term used to describe the massive amount of data that organizations need to process and analyze in order to gain insights and make informed decisions. It can be anything from customer data, financial data, social media, healthcare recor... · 2 likes · Tags: data
padmanabha reddy · padmanabha.hashnode.net · Jan 13, 2023 · Transactional vs Analytical systems · What Is A Transactional Database? Transactional data is information captured from day-to-day business activities such as sales, discounts, payment methods, supplier purchase orders, customer support receipts, email confirmations, payment of employees... · Tags: Databases
Yash Srivastava · yash722.hashnode.net · Jan 9, 2023 · Shared variables in spark · Sometimes in a spark application, we need to share small data across all the machines for processing. For example, if you want to filter some set of words from a large dataset residing in a datalake. Or if we simply just want to know how many blank l... · 32 reads · Tags: big data
padmanabha reddy · padmanabha.hashnode.net · Jan 8, 2023 · Distributed computing framework - MapReduce · What is MapReduce? MapReduce is a software framework for processing large data sets that are distributed over several machines. MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks and processing them in para... · Tags: mapreduce
padmanabha reddy · padmanabha.hashnode.net · Jan 6, 2023 · HDFS Architecture and working · Hadoop Distributed File System (HDFS) is the world’s most reliable storage system. It is best known for its fault tolerance and high availability. What is Hadoop HDFS? HDFS stores very large files running on a cluster of commodity hardware. It works o... · Tags: hdfs
Yash Srivastava · yash722.hashnode.net · Dec 30, 2022 · Introduction to Hive · We cannot use an analytical storage system for transactional requirements and vice versa. But have you ever wondered why is that so? Transactional vs Analytical storage system: Transactional storage (ex - MySQL, Postgres, etc.) is used to work with da... · 43 reads · Tags: hadoop