Uche Okoye for A Few Figures (afewfigures.hashnode.net) · Jan 15, 2023
Big Data
Big data is a term used to describe the massive amount of data that organizations need to process and analyze in order to gain insights and make informed decisions. It can be anything from customer data, financial data, social media, healthcare recor...
2 likes · data

Renjitha K for renjithak.hashnode.net · Apr 7, 2023
Setting up Apache Spark
In this blog, I will be focusing on setting up the workspace for Windows so that we can get started with Apache Spark and do some hands-on in my upcoming series of Apache Kafka. If you haven't taken a look at it and wish to, here is the link https://...
1 like · 73 reads · spark

Priyank Patel for Developer Talks (priyank-1623299322987.hashnode.net) · Feb 18, 2023
Balancing Performance and Scalability with Elasticsearch Shards and Replicas
Disclaimer: The entire article is based on a Stack Overflow response about shards and replicas, and all credit goes to Javanna for providing an outstanding explanation. The explanation is so simple that even if you have no idea what the hell shard an...
51 reads · elasticsearch

Renjitha K for renjithak.hashnode.net · Mar 27, 2023
Understanding MapReduce: A Beginners Guide
Most of us have been hearing the term MapReduce for a long while now, and I have been wondering what this term means. Let's try to understand the basics. So, MapReduce is a powerful programming model and software framework for processing larg...
173 reads · BigData

Evan Chan for evanchan.hashnode.net · Jan 16, 2023
Windowing Operations in PySpark
(Note: this is adapted from my talk at 2021 Scale by the Bay, Location-Based Data Engineering for Good) If you are a data scientist, chances are you are coding Python and most likely using pandas. You might have heard of or are learning Apache Spark,...
115 reads · Python