Brian Roepke (broepke.hashnode.net) · Feb 4, 2023 · How to Setup a Simple ETL Pipeline with AWS Lambda for Data Science — Introduction to ETL with AWS Lambda: When it comes time to build an ETL pipeline, many options exist. You can use a tool like Astronomer or Prefect for orchestration, but you will also need somewhere to run the compute. With this, you have a few optio... · Python
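As a rough, hypothetical sketch of what a tiny ETL-style Lambda handler can look like (bucket names, the event shape, and the filter logic below are placeholders, not taken from the article):

```python
import csv
import io

import boto3  # available by default in the AWS Lambda Python runtime

s3 = boto3.client("s3")

# Hypothetical bucket names, purely for illustration.
SOURCE_BUCKET = "raw-data-bucket"
TARGET_BUCKET = "clean-data-bucket"


def handler(event, context):
    """Extract a CSV from S3, apply a trivial transform, and load the result back."""
    key = event["key"]  # assumes the invoking event carries the object key

    # Extract
    body = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read().decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(body)))

    # Transform: keep only rows with a positive "amount" column (placeholder logic)
    cleaned = [r for r in rows if float(r.get("amount", 0)) > 0]

    # Load: write the cleaned CSV to a separate bucket/prefix
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(cleaned)
    s3.put_object(Bucket=TARGET_BUCKET, Key=f"clean/{key}", Body=out.getvalue())

    return {"rows_in": len(rows), "rows_out": len(cleaned)}
```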
Harsh Daiya (hd.hashnode.net) · Jan 11, 2023 · Idempotency in Data pipelines - Overview — Idempotency is an important concept in data engineering, particularly when working with distributed systems or databases. In simple terms, an operation is said to be idempotent if running it multiple times has the same effect as running it once. This... · Python
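A quick illustration of that idea (table and column names are made up, not from the article): an upsert keyed on a primary key makes the load idempotent, so re-running the same batch leaves the table unchanged.

```python
import sqlite3  # needs a reasonably recent SQLite for ON CONFLICT upserts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")


def load(rows):
    # Upsert keyed on id: inserting the same rows again has no additional effect.
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
        rows,
    )
    conn.commit()


batch = [(1, 9.99), (2, 25.00)]
load(batch)
load(batch)  # re-running the load is a no-op in effect

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # -> 2
```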
Vipul Tripathi (itsvipul.hashnode.net) · Apr 2, 2023 · Snowflake: Pre-Requisite to Get Started — Snowflake is a cloud-based data warehousing platform that provides a fully managed, scalable, and secure solution for storing, managing, and analyzing large amounts of data. Whenever and wherever you'll hear of Snowflake,... · Zero to Snowflake · snowflake
Karl Bolinger (kbolinger.hashnode.net) · Apr 24, 2023 · Understanding ETL and ELT Workflows in Data Engineering: An Easy Guide with Examples — Data engineering is a complex field where many different technologies, frameworks, and techniques come into play. Two of the most common data processing workflows data engineers use are ETL and ELT. ETL stands for Extract, Transform, and Load, while ... · data-engineering
Martijn Sturm (martijn-sturm.hashnode.net) · Apr 23, 2023 · Defining ETL jobs as Infrastructure-as-Code — Using Infrastructure-as-Code (IaC) to deploy resources to the cloud is a no-brainer nowadays. The learning curve at the start is a bit steeper than with click-ops, but it will pay off in the long term. In this post I try to assist in getting ... · How to Data Engineering on AWS · ETL
Rajanand Ilangovan (rajanand.hashnode.net) · Apr 4, 2023 · Beginner's Guide to ETL: Extract, Transform and Load — ETL (Extract, Transform, Load) is a crucial process that enables organizations to extract data from various sources, transform it into a useful form, and then load it into a data warehouse system for further analysis. It involves three main steps: ex... · ETL
Eric Hartford (ehartford.hashnode.net) · Mar 26, 2017 · Uploading CSV to DynamoDB with Node JS — So I wanted to upload CSV to DynamoDB. Easy, right? Not so fast. It turns out you have to obey your provisioned write capacity. Unlike S3, "Simple Storage Service", where you simply upload a file, DynamoDB isn't "Simple". There's no "upload CSV" bu... · DynamoDB
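The article works in Node.js; for reference, the same idea rendered as a minimal Python/boto3 sketch (hypothetical table and file names, and assuming the CSV has no empty fields) looks roughly like this — the batch writer groups items into BatchWriteItem calls and retries unprocessed items, which helps you stay within provisioned write capacity:

```python
import csv

import boto3

# Hypothetical table and file names, for illustration only.
table = boto3.resource("dynamodb").Table("my-table")

with table.batch_writer() as batch, open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Each row is written as a string-attribute item; real data usually
        # needs type conversion (e.g. Decimal for numbers) first.
        batch.put_item(Item=row)
```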
Mayur Said (buildprojectswithmayur.hashnode.net) · Mar 9, 2023 · Building ETL Pipeline in Google Cloud Platform: A Project-Based Guide with PySpark and Airflow — ETL (Extract, Transform, Load) is a process of integrating data from various sources, transforming it into a format that can be analysed, and loading it into a data warehouse for business intelligence purposes. Building an ETL pipeline can be a daunt... · Google Cloud Platform
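For context, the transform stage of such a pipeline in PySpark can be as small as the sketch below (paths, column names, and the GCS connector setup are assumptions, not reproduced from the article, which also covers Airflow orchestration and loading into a warehouse):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: reading gs:// paths assumes the GCS connector is configured.
df = spark.read.option("header", True).csv("gs://example-bucket/raw/sales.csv")

# Transform: drop rows missing a key and cast a numeric column (placeholder logic).
clean = (
    df.dropna(subset=["order_id"])
      .withColumn("amount", F.col("amount").cast("double"))
)

# Load: write a curated Parquet copy; the article loads into a warehouse instead.
clean.write.mode("overwrite").parquet("gs://example-bucket/curated/sales/")
```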
Stephen David-Williams (spiritman7.hashnode.net) · Mar 9, 2023 · Using Power Query to clean data in Power BI — Although most of my data cleaning is conducted with Spark in Databricks, there are scenarios where my data doesn't need to undergo heavy transformations. If I can get my data to the output I need in just a few short steps, why not do that? In this bl... · PowerBI
Stephen David-Williams (spiritman7.hashnode.net) · Feb 27, 2023 · How I created a Postgres data warehouse with Python & SQL 🐘🐍 — Preface 🌟 Disclaimer: This was just for fun. In a real-world setting, there are more appropriate options that address modern data warehousing challenges for different businesses depending on the problem statement at hand, so consider your company's (... · PostgreSQL
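A bare-bones sketch of the Python-plus-SQL approach (connection details, schema, and table are placeholders, not the article's actual design, and a reachable Postgres instance is assumed):

```python
import psycopg2  # connection details below are placeholders

conn = psycopg2.connect(
    host="localhost", dbname="warehouse", user="postgres", password="postgres"
)

with conn, conn.cursor() as cur:
    # A minimal staging layout; real warehouses add fact/dimension layers on top.
    cur.execute("CREATE SCHEMA IF NOT EXISTS staging")
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS staging.customers (
            customer_id INT PRIMARY KEY,
            full_name   TEXT,
            signed_up   DATE
        )
        """
    )
    cur.execute(
        "INSERT INTO staging.customers VALUES (%s, %s, %s) "
        "ON CONFLICT (customer_id) DO NOTHING",
        (1, "Ada Lovelace", "2023-01-15"),
    )

conn.close()
```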
Mariano González (marianogg9.hashnode.net) · Feb 24, 2023 · Airflow in ECS with Redis - Part 2: Hands On — Previously, in How to set up a containerised Airflow installation in AWS ECS using Redis as its queue orchestrator, I gave an overview of the infrastructure and Airflow components. Now let's deploy all that. This deployment will incur charges!! Base... · Airflow in ECS · airflow
Elvis David (techml.hashnode.net) · Feb 23, 2023 · Apache Flink 101: Understanding the Architecture — Time = value. Introduction: Data is generated from many sources, including financial transactions, location-tracking feeds, measurements from Internet of Things (IoT) devices, and web user activity. Formerly, batch processing was used to manage these ... · apache-flink