About me

My name is Heitor Felix.

I have a degree in Data Science from Uninter and work as a Data Engineer II at Sapiensia Tecnologia. Below you will find the tools and skills I use to solve problems in Data Science, Data Analysis, Data Engineering, Artificial Intelligence, and Infrastructure, as well as some of my professional experience. Feel free to contact me through the links at the end of the page.

Skills

Data Engineering and Science:

  • Python for data engineering and science
  • Advanced feature creation with pandas
  • Databases: SQL Server, PostgreSQL, Google BigQuery and Snowflake
  • Data pipelines with Azure Data Factory and Databricks
  • dbt Core
  • Orchestration with Apache Airflow
  • Elasticsearch

Machine Learning and AI:

  • Regression, classification and clustering models
  • Creation of customized LLMs with RAG
  • Automated deployment in different environments (DevOps and DataOps)
  • Fundamentals of statistics and mathematics

Data Visualization and Communication:

  • Power BI, Tableau, Metabase
  • Python: Matplotlib, Seaborn, Plotly
  • Advanced Excel
  • Streamlit dashboards

Infrastructure and Cloud Computing:

  • Solid experience with Azure (certifications AZ-900, DP-900, DP-203)
  • Experience with AWS
  • Infrastructure as Code (IaC) with Bicep templates
  • API creation in Python and serverless deployment
  • CI/CD in different environments with Azure DevOps and GitHub Actions

Certifications

  • Microsoft Certified: Azure Data Engineer Associate (DP-203)
  • Microsoft Certified: Azure Fundamentals (AZ-900)
  • Microsoft Certified: Azure Data Fundamentals (DP-900)
  • Astronomer Certification for Apache Airflow 3 Fundamentals

Professional Experience

Data Engineer II at Sapiensia

Implementation of data pipelines, dashboards, specialized LLMs, and Python automations in a serverless architecture. Hands-on experience with Azure, including disaster recovery planning, backed by Azure certifications.

Data Science Intern at 027capital

Development of data processing software, churn prediction models, and data ingestion pipelines using Python and Google Cloud.

Projects

Deputies project diagram

Brazilian Congress Deputies Data Pipeline

Complete data pipeline focused on data engineering, with automated ingestion of public information about all federal deputies. The data includes biographies, mandates, expenses, and parliamentary activity, extracted via the official API. The architecture implements a modern ELT approach with Snowflake and dbt: daily incremental ingestion is orchestrated with Airflow, with the raw data landing in S3 in Parquet format, and transformations follow robust dimensional modeling patterns (SCD Type 2). The project ensures end-to-end scalability, automation, and data quality.
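As a rough illustration of the ingestion step, below is a minimal Airflow DAG sketch for a daily incremental pull from the open-data API into S3 as Parquet; the endpoint, bucket, and dataset names are placeholders rather than the project's actual configuration.

```python
# Minimal sketch of a daily incremental ingestion DAG (placeholder names throughout).
from datetime import datetime

import pandas as pd
import requests
from airflow.decorators import dag, task

API_URL = "https://dadosabertos.camara.leg.br/api/v2"  # official open-data API (illustrative)
BUCKET = "s3://deputies-data-lake/raw"                  # placeholder bucket


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def deputies_ingestion():

    @task
    def extract() -> list[dict]:
        # Fetch one page of deputies; the real pipeline paginates and also pulls expenses.
        resp = requests.get(f"{API_URL}/deputados", params={"itens": 100}, timeout=30)
        resp.raise_for_status()
        return resp.json()["dados"]

    @task
    def load_to_s3(records: list[dict], ds: str | None = None) -> str:
        # `ds` is the logical date Airflow injects; writing to s3:// requires s3fs + pyarrow.
        path = f"{BUCKET}/deputados/ingestion_date={ds}/deputados.parquet"
        pd.DataFrame(records).to_parquet(path, index=False)
        return path

    load_to_s3(extract())


deputies_ingestion()
```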

Tools used

  • Python, Pandas and requests
  • Apache Airflow
  • Amazon S3 and SQS
  • Snowflake, Snowpipe
  • dbt Core
  • Streamlit and Jupyter Notebook

Olist project diagram

Data Lakehouse: Olist

This project used the Databricks Data Lakehouse architecture to manage data in layers (Raw, Bronze, Silver, and Gold) and simulate ingestion scenarios with CDC (Change Data Capture). The data, from a Kaggle dataset, was enriched to create a complete pipeline, from ingestion to business analysis. I implemented data governance with Unity Catalog, orchestration with Databricks Workflows, and continuous integration via GitHub Actions. The project consolidated skills in data pipelines, automation, and analysis with the Medallion architecture, optimizing the use of data for insights and analytical applications.
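To give a flavor of the Bronze-to-Silver step, here is a short PySpark sketch of a CDC-style upsert with a Delta MERGE; the table names and CDC columns (op, updated_at) are assumptions, not the project's exact schema.

```python
# Hedged sketch: apply the latest CDC event per order to the Silver table via Delta MERGE.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Keep only the most recent change per order coming from the Bronze layer.
latest = (
    spark.table("bronze.olist_orders")  # placeholder table name
    .withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

silver = DeltaTable.forName(spark, "silver.olist_orders")  # placeholder table name

(
    silver.alias("t")
    .merge(latest.alias("s"), "t.order_id = s.order_id")
    .whenMatchedDelete(condition="s.op = 'D'")          # CDC delete events
    .whenMatchedUpdateAll(condition="s.op != 'D'")      # changed orders
    .whenNotMatchedInsertAll(condition="s.op != 'D'")   # new orders
    .execute()
)
```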

Tools used

  • Pandas
  • Git, GitHub, GitHub Actions
  • Azure Blob Storage, Parquet
  • Databricks, Unity Catalog
  • Spark, Delta Lake
  • Databricks Workflows

Chatbot project diagram

Chatbot with GPT-4 and Azure

In this project, I explored Azure Artificial Intelligence tools to build a chatbot specialized in Azure using GPT-4. I copied the Azure documentation from GitHub to a Storage Account, used Azure AI Search to embed and index the content, and used Azure OpenAI to build the chatbot as an App on Azure. The goal is to provide accurate and contextualized answers about Azure services and functionalities.
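A minimal sketch of the retrieve-then-answer flow is shown below; the index name, deployment name, field names, and environment variables are placeholders, not the app's actual configuration.

```python
# Hedged sketch: retrieve documentation chunks from Azure AI Search, then ground GPT-4 on them.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="azure-docs",  # placeholder index name
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
llm = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_ENDPOINT"],
    api_key=os.environ["OPENAI_KEY"],
    api_version="2024-02-01",
)


def answer(question: str) -> str:
    # Retrieve the most relevant documentation chunks for the question.
    hits = search.search(question, top=3)
    context = "\n\n".join(doc["content"] for doc in hits)  # assumes a 'content' field
    response = llm.chat.completions.create(
        model="gpt-4",  # the Azure OpenAI deployment name (placeholder)
        messages=[
            {"role": "system",
             "content": "Answer questions about Azure using only this context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(answer("What is an Azure Storage Account?"))
```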

Tools used

  • Python
  • Azure Blob Storage
  • Azure AI Search
  • Azure OpenAI
  • Git, GitHub
  • Bicep template (IaC)

Diagram of the OCR project

IN PROGRESS: Telegram Bot for Text Recognition (Computer Vision)

In this project, I explored Azure Artificial Intelligence tools for optical character recognition (OCR), such as Azure Computer Vision and Azure AI Document Intelligence. I used Python to develop a Telegram bot that processes images sent by the user, returning the extracted text and the confidence score for each recognized word. I implemented dynamic settings in the bot, allowing adjustment of parameters such as the minimum confidence level to accept words and the application of image pre-processing. This project demonstrates skills in API integration, image processing, and the creation of interactive interfaces with bots.
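As a small illustration of the configurable confidence threshold, the sketch below filters recognized words by a user-defined minimum confidence; the (text, confidence) pairs are assumed to have already been parsed from the OCR response, and the setting names are illustrative.

```python
# Sketch of the bot's configurable confidence filter (field names are illustrative).
from dataclasses import dataclass


@dataclass
class BotSettings:
    min_confidence: float = 0.80  # words below this threshold are dropped
    preprocess: bool = True       # whether to binarize/denoise the image first


def filter_words(words: list[tuple[str, float]], settings: BotSettings) -> str:
    """Keep only words whose OCR confidence meets the user's configured minimum."""
    accepted = [text for text, confidence in words if confidence >= settings.min_confidence]
    return " ".join(accepted)


# Example: per-word confidences as they might come back from the OCR service.
ocr_words = [("Invoice", 0.99), ("N°", 0.41), ("12345", 0.97)]
print(filter_words(ocr_words, BotSettings(min_confidence=0.8)))  # -> "Invoice 12345"
```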

Tools used

  • Python
  • Telegram API
  • Azure Computer Vision
  • Azure AI Document Intelligence
  • Git, GitHub
  • Bicep template (IaC)

Older Projects (2021 - 2022)

Stone Data Challenge 2022

I was a semifinalist in the Stone Data Challenge 2022. The task was to use historical data from a loan program, covering 14,700 clients from 2019 to April 2022. The business problem concerned contacting clients who were behind on payments, and the question to be answered was: what is the ideal curve for how often we should contact a client? I used Python and Power BI to answer it through data analysis.
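A simplified pandas cut of that kind of analysis might look like the sketch below, which computes the repayment rate per number of contact attempts; the file and column names are hypothetical, not the challenge's actual schema.

```python
# Illustrative analysis step: repayment rate as a function of contact attempts.
import pandas as pd

loans = pd.read_csv("overdue_loans.csv")  # placeholder file

contact_curve = (
    loans.groupby("contacts")["repaid"]
    .mean()  # share of overdue clients who ended up paying
    .rename("repayment_rate")
    .reset_index()
)
print(contact_curve)  # the point of diminishing returns suggests the ideal contact frequency
```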

Tools used

  • Git, GitHub, Git LFS
  • Python, Pandas, Seaborn, Plotly
  • Power BI

Sales Prediction

I used Python to create a Machine Learning model to predict the sales of each of the 3,000 registered stores over the next 6 weeks. The model was put into production and can be queried via an API through Telegram, requiring only internet access. Its predictions captured about 90% of the real values, allowing the CFO to make decisions based on the projected revenue of each store unit and to plan investments without losses.
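The serving side can be summarized by a minimal Flask sketch like the one below, assuming a scikit-learn pipeline persisted to model.pkl; the payload fields are placeholders.

```python
# Minimal sketch of the prediction endpoint (placeholder field and file names).
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # assumes a persisted scikit-learn pipeline
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # The Telegram bot posts one JSON record per store; the model returns the 6-week sales.
    data = pd.DataFrame([request.get_json()])
    prediction = model.predict(data)[0]
    return jsonify({"store": int(data.loc[0, "store"]), "sales_prediction": float(prediction)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```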

Tools used

  • Git, GitHub
  • Python, Pandas, Seaborn, Boruta
  • Scikit-Learn and Scipy
  • Flask
  • Heroku Cloud
  • Telegram API

Classification of customers most likely to buy

I used Python to create a Machine Learning model to rank the customers most likely to purchase a new product (cross-sell strategy). With 33.5% accuracy among the top 20,000 ranked customers in the database, the sales team can reach interested customers at a much lower cost.
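The ranking idea can be sketched as follows: score every customer with the classifier's predicted probability and measure the hit rate among the top 20,000 (precision@k); load_features is a hypothetical helper and the parameter values are illustrative.

```python
# Sketch of ranking customers by predicted purchase probability and evaluating the top-k.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = load_features()  # hypothetical helper returning customer features and purchase labels

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

model = ExtraTreesClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of buying the new product

k = 20_000
top_k = np.argsort(scores)[::-1][:k]                  # indices of the k highest-scored customers
precision_at_k = np.asarray(y_test)[top_k].mean()     # share of real buyers among the top-k
print(f"precision@{k}: {precision_at_k:.3f}")
```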

Tools used

  • Git, GitHub
  • Python, Pandas, Seaborn, Extra Tree Classifier
  • Scikit-Learn, Scipy and Scikit-Plot
  • Flask
  • Heroku Cloud
  • Google Sheets API with Google Scripts

Customer loyalty with clustering

I used Python to create a Machine Learning model to find the "Insiders," the best customers of the company. The objective of this project was to group customers with similar behaviors so that the business team can build personalized actions, based on the characteristics of each cluster.
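A minimal sketch of the clustering step with a Gaussian Mixture Model is shown below; the RFM-style features (recency, frequency, monetary) are an assumption about the real feature set, and the file name is a placeholder.

```python
# Sketch: cluster customers with a GMM and inspect which cluster looks like the "Insiders".
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customer_features.csv")  # placeholder file
features = customers[["recency", "frequency", "monetary"]]

scaled = StandardScaler().fit_transform(features)
gmm = GaussianMixture(n_components=8, random_state=42).fit(scaled)
customers["cluster"] = gmm.predict(scaled)

# Candidate "Insiders" cluster: the one with the highest average spend.
print(customers.groupby("cluster")["monetary"].mean().sort_values(ascending=False))
```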

Tools used

  • Git, GitHub
  • Python, Pandas, Seaborn, GMM
  • Scikit-Learn, Scipy and Yellowbrick
  • SQLite
  • Metabase
  • Papermill
House Rocket Data Analysis

I used Python and Power BI to perform exploratory data analysis, confirming or rejecting hypotheses about the business and producing insights for better performance. The analysis aimed to increase the revenue of House Rocket, a fictitious company that buys and sells real estate, by finding the best times to buy or sell each property.
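One illustrative slice of that analysis is the median sale price per region and season, which helps identify the best time to sell; the column names below follow the public King County housing dataset, which is an assumption about the data actually used.

```python
# Sketch: median sale price by zipcode and season (placeholder file and column names).
import pandas as pd

houses = pd.read_csv("kc_house_data.csv")  # placeholder path
houses["date"] = pd.to_datetime(houses["date"])
houses["season"] = houses["date"].dt.month.map(
    lambda m: "summer" if m in (6, 7, 8) else "winter" if m in (12, 1, 2) else "spring/fall"
)
print(houses.groupby(["zipcode", "season"])["price"].median().head())
```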

Tools used

  • Git, GitHub
  • Python, Pandas, Seaborn, Plotly
  • Geopy API
  • Power BI
  • SQLite

Contact