About me

My name is Heitor Felix

I have a degree in Data Science from Uninter and work as a Data Engineer II at Sapiensia Tecnologia. Below you will learn about the tools and skills I possess to solve problems through Data Science, Data Analysis, Data Engineering, Artificial Intelligence and Infrastructure, as well as some professional experiences. Feel free to contact me through the links at the end of the page.

Skills

Data Engineering and Science:

Python for data engineering and science
Advanced feature creation with pandas
Databases: SQL Server, PostgreSQL, Google BigQuery and Snowflake
Data pipelines with Azure Data Factory and Databricks
dbt Core
Orchestration with Apache Airflow
Elasticsearch

Machine Learning and AI:

Regression, classification and clustering models
Creation of customized LLMs with RAG
Automated deployment in different environments (DevOps and DataOps)
Fundamentals of statistics and mathematics

Data Visualization and Communication:

Power BI, Tableau, Metabase
Python: Matplotlib, Seaborn, Plotly
Advanced Excel
Streamlit dashboards

Infrastructure and Cloud Computing:

Solid experience with Azure (certifications AZ-900, DP-900, DP-203)
Experience with AWS
Infrastructure as Code (IaC) with Bicep templates
API creation in Python and serverless deployment
CI/CD in different environments with Azure DevOps and GitHub Actions

Certifications

Microsoft Certified: Azure Data Engineer Associate

DP-203

Microsoft Certified: Azure Fundamentals

AZ-900

Microsoft Certified: Azure Data Fundamentals

DP-900

Astronomer Certification for Apache Airflow 3 Fundamentals

Astronomer

Professional Experience

Data Engineer II at Sapiensia

Implementation of data pipelines, dashboards, specialized LLMs, and automations with Python in Serverless architecture. Experience in Azure, including disaster recovery plans and certifications.

Data Science Intern at 027capital

Development of data processing software, churn prediction models, and data ingestion pipelines using Python and Google Cloud.

Projects

Brazilian Congress Deputies Data Pipeline

Complete data pipeline focused on data engineering, with automated ingestion of public information from all federal deputies. The data includes biography, mandates, expenses and parliamentary activity, extracted via official API. The architecture implements a modern ELT approach with Snowflake and dbt. Daily incremental ingestion is orchestrated with Airflow and stored in S3 in Parquet format. Transformations follow robust dimensional modeling patterns (SCD Type 2). The project ensures end-to-end scalability, automation and data quality.
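As an illustration of the SCD Type 2 pattern mentioned above, here is a minimal sketch in plain Python. It is not the project's code (the project does its transformations with dbt on Snowflake); the column names `deputy_id`, `party`, `valid_from` and `valid_to` are hypothetical and chosen only for the example.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension list with SCD Type 2 history applied.

    A changed record closes the currently open version (valid_to = today)
    and appends a new open version (valid_to = None).
    """
    result = [dict(row) for row in dimension]  # copy, so the input is not mutated
    current = {row[key]: row for row in result if row["valid_to"] is None}
    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(existing[c] == record[c] for c in tracked):
            continue  # nothing changed, keep the open version as it is
        if existing is not None:
            existing["valid_to"] = today  # close the previous version
        new_row = {key: record[key], "valid_from": today, "valid_to": None}
        new_row.update({c: record[c] for c in tracked})
        result.append(new_row)
        current[record[key]] = new_row
    return result


# Hypothetical data: deputy 1 changes party, deputy 2 is new.
dimension = [
    {"deputy_id": 1, "party": "A", "valid_from": date(2023, 2, 1), "valid_to": None},
]
incoming = [{"deputy_id": 1, "party": "B"}, {"deputy_id": 2, "party": "C"}]
updated = apply_scd2(
    dimension, incoming, key="deputy_id", tracked=["party"], today=date(2024, 1, 15)
)
```

The old version of deputy 1 stays in the table with a closing date, so queries can reconstruct what was true on any given day.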

Tools used

Python, Pandas and requests
Apache Airflow
Amazon S3 and SQS
Snowflake, Snowpipe
dbt Core
Streamlit and Jupyter notebook

Data Lakehouse: Olist

This project used the Databricks Data Lakehouse architecture to manage data in layers (Raw, Bronze, Silver, and Gold) and simulate ingestion scenarios with CDC (Change Data Capture). The data, from a Kaggle dataset, was enriched to create a complete pipeline, from ingestion to business analysis. I implemented data governance with Unity Catalog, orchestration with Databricks Workflows, and continuous integration via GitHub Actions. The project consolidated skills in data pipelines, automation, and analysis with the Medallion architecture, optimizing the use of data for insights and analytical applications.
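The CDC scenario described above can be sketched in a few lines of plain Python. This is only an illustration of how change events are folded into the latest state of a table; the actual project uses Spark and Delta Lake, and the event fields `seq`, `op`, `id` and `data` are assumptions made for the example.

```python
def apply_cdc(state, events):
    """Apply change events in sequence order and return the new table state.

    `state` maps a primary key to the latest version of its row.
    """
    new_state = dict(state)
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["op"] == "delete":
            new_state.pop(event["id"], None)
        else:
            # Inserts and updates are both treated as upserts.
            new_state[event["id"]] = event["data"]
    return new_state


# Hypothetical order rows and a batch of out-of-order change events.
silver = {"o1": {"status": "created"}, "o2": {"status": "created"}}
events = [
    {"seq": 3, "op": "delete", "id": "o2", "data": None},
    {"seq": 1, "op": "update", "id": "o1", "data": {"status": "shipped"}},
    {"seq": 2, "op": "insert", "id": "o3", "data": {"status": "created"}},
]
silver_after = apply_cdc(silver, events)
```

Sorting by sequence number before applying matters: the same batch applied in arrival order could resurrect a deleted row or overwrite a newer update.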

Tools used

Pandas
Git, GitHub, GitHub Actions
Azure Blob Storage, Parquet
Databricks, Unity Catalog
Spark, Delta Lake
Databricks Workflows

Learn more

Chatbot with GPT-4 and Azure

In this project, I explored Azure Artificial Intelligence tools to build a chatbot specialized in Azure using GPT-4. I copied the data from the Azure documentation on GitHub to the Storage Account, used Azure AI Search to perform embedding and indexing of the content, and Azure OpenAI to build the chatbot in an App on Azure. The goal is to provide accurate and contextualized answers about Azure services and functionalities.
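To show the retrieval step behind this kind of chatbot, here is a toy sketch that uses only the Python standard library. It stands in for what Azure AI Search and Azure OpenAI do in the project and does not call either service: the word-count "embedding", the sample documents and the prompt wording are all assumptions for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question, documents, k=1):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Azure Blob Storage stores unstructured objects",
    "Azure AI Search indexes content for retrieval",
    "Azure OpenAI hosts GPT models",
]
prompt = build_prompt("what does blob storage store", docs)
```

A real system replaces `embed` with a learned embedding model and `retrieve` with an indexed search, but the flow is the same: find the relevant passages first, then let the model answer from them.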

Tools used

Python
Azure Blob Storage
Azure AI Search
Azure OpenAI
Git, GitHub
Bicep template (IaC)

Learn more

IN PROGRESS: Telegram Bot: Text Recognition (Computer Vision)

In this project, I explored Azure Artificial Intelligence tools for optical character recognition (OCR), such as Azure Computer Vision and Azure AI Document Intelligence. I used Python to develop a Telegram bot that processes images sent by the user, returning the extracted text and the confidence score for each recognized word. I implemented dynamic settings in the bot, allowing users to adjust parameters such as the minimum confidence level for accepting words and whether pre-processing is applied. This project demonstrates skills in API integration, image processing and the creation of interactive interfaces with bots.
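The minimum-confidence setting described above can be pictured with a short sketch. The input format here (a list of dictionaries with `text` and `confidence` keys) is a hypothetical simplification and not the response shape of any Azure SDK.

```python
def filter_words(words, min_confidence=0.8):
    """Keep only the words whose confidence meets the threshold."""
    return [w for w in words if w["confidence"] >= min_confidence]


def format_reply(words, min_confidence=0.8):
    """Build the text a bot could send back, one word per line."""
    accepted = filter_words(words, min_confidence)
    return "\n".join(f"{w['text']} ({w['confidence']:.0%})" for w in accepted)


# Hypothetical OCR output for one image.
ocr_words = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "T0tal", "confidence": 0.41},
    {"text": "2024", "confidence": 0.87},
]
reply = format_reply(ocr_words, min_confidence=0.8)
```

Raising the threshold trades recall for cleanliness: fewer misread words reach the user, at the cost of dropping some correct ones.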

Tools used

Python
Telegram API
Azure Computer Vision
Azure AI Document Intelligence
Git, GitHub
Bicep template (IaC)

Learn more

Older Projects (2021 - 2022)

Stone Data Challenge 2022

I was a semifinalist in the Stone Data Challenge 2022. The task was to analyze historical data from a loan program covering 14,700 clients from 2019 to April 2022. The business problem concerned contacting clients who were behind on payments, and the question to be answered was: what is the ideal curve for the number of times we should contact a client? I answered it through data analysis with Python and Power BI.

Tools used

Git, GitHub, Git LFS
Python, Pandas, Seaborn, Plotly
Power BI

Sales Prediction

I used Python to create a Machine Learning model that predicts the sales of each of the 3,000 registered stores over the next 6 weeks. The model was put into production and can be queried through an API from Telegram, requiring only internet access. Its predictions reached about 90% of the real values, allowing the CFO to make decisions based on the future revenue of each store and thus make investments without losses.

Tools used

Git, GitHub
Python, Pandas, Seaborn, Boruta
Scikit-Learn and Scipy
Flask
Heroku Cloud
Telegram API

Learn more

Classification of customers most likely to buy

I used Python to create a Machine Learning model to rank the customers most likely to purchase a new product (cross-sell strategy). With an accuracy of 33.5% for the top 20,000 customers in the database, the sales team is able to reach interested parties with much less cost.
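A ranking model like this is usually judged by how many of its top-ranked customers actually buy. The sketch below computes that share (precision at k) under the assumption that the 33.5% figure above refers to this kind of measure; the scores and labels are made-up toy values.

```python
def precision_at_k(scores, labels, k):
    """Share of actual buyers among the k customers with the highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy example: model scores and whether each customer bought (1) or not (0).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
result = precision_at_k(scores, labels, k=2)
```

The value is most useful next to the overall purchase rate in the customer base, since the gain of the ranking is the ratio between the two.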

Tools used

Git, GitHub
Python, Pandas, Seaborn, Extra Trees Classifier
Scikit-Learn, Scipy and Scikit-Plot
Flask
Heroku Cloud
Google Sheets API with Google Scripts

Learn more

Customer loyalty with clustering

I used Python to create a Machine Learning model to find the "Insiders," the best customers of the company. The objective of this project was to group customers with similar behaviors so that the business team can build personalized actions, based on the characteristics of each cluster.

Tools used

Git, GitHub
Python, Pandas, Seaborn, GMM
Scikit-Learn, Scipy and Yellowbrick
SQLite
Metabase
Papermill

Learn more

House Rocket Data Analysis

I used Python and Power BI to perform exploratory data analysis and to confirm or reject hypotheses about the business, producing insights for better business performance. The analysis aimed to increase the revenue of the fictitious company House Rocket, which buys and sells real estate, by finding the best times to buy or sell each property.

Tools used

Git, GitHub
Python, Pandas, Seaborn, Plotly
Geopy API
Power BI
SQLite

Hello, welcome to my project portfolio!

Contact