Name: Divyanshu Mittal

Current Job: Data Scientist at Adastra

Studies: Master's in AI & ML

Institute: Indian Institute of Information Technology, Lucknow

Skills

Python 90%
SQL 85%
Amazon Web Services (AWS) 70%
Large Language Models (LLMs) 60%
Machine Learning 90%

About

About Me

My name is Divyanshu Mittal. I am a Junior Data Scientist at Adastra, building scalable AI and data solutions with cloud and Generative AI technologies. My work focuses on developing data pipelines, AI-driven workflows, and GenAI applications that process and analyze large-scale datasets.

At Adastra, I work extensively with AWS services such as Athena, Glue, S3, and Bedrock to build and deploy AI pipelines for large patent datasets. This includes extracting and transforming data from sources like the USPTO API and Google BigQuery, converting complex JSON structures into analytics-ready formats for AWS Athena, and developing scalable pipelines for data processing and dashboard analytics. I have also optimized OCR pipelines to improve extraction of text, tables, and images from documents, and carried out prompt engineering and testing of Bedrock models for Generative AI applications.

Previously, I was a Data Engineer Intern at Heart It Out, where I built automated ETL pipelines using n8n and Make, managed data workflows, and handled data ingestion into PostgreSQL and MySQL databases. That work involved pipeline debugging, workflow automation, and data monitoring to keep data processing reliable and efficient.

I am particularly interested in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), chatbots, and AI pipelines, and I enjoy exploring how these technologies can be applied to build practical, scalable systems. Outside of work, I play table tennis and badminton, read books, and keep up with new developments in AI, data engineering, and cloud technologies.

  • Profile: Data Scientist, Gen AI Engineer & Data Engineer
  • Domain: AI & ML, Data Science & Statistics
  • Education: Master of Science in AI & ML
  • Language: English, Hindi
  • Programming Skills: Java, Python, SQL
  • Tools & Frameworks: AWS, Terraform, CI/CD Pipelines, Agentic AI, LangGraph, RAG, LLMs, LangChain, Git/GitHub
  • Interest: Traveling, Exploring, Badminton, Books

Kaggle

Resume

AI and Data Science professional currently working as a Junior Data Scientist at Adastra, with a strong background in Artificial Intelligence, Machine Learning, Statistics and Mathematics. Experienced in building scalable data pipelines, ML models and Generative AI solutions using Python, SQL, AWS and modern ML frameworks. Passionate about developing practical AI systems involving LLMs, RAG pipelines, and advanced models and platforms such as Claude, Gemini, GPT and Groq to solve real-world problems.

Experience


Dec 2024 - Present

Junior Data Scientist

Adastra

Adastra is a company dedicated to providing data-driven solutions and consulting.

  • Designed and implemented a scalable Retrieval-Augmented Generation (RAG) pipeline by indexing hierarchical patent documents into OpenSearch with hybrid search (BM25 + vector KNN using FAISS/HNSW), enabling context-aware LLM responses through AWS Bedrock and improving semantic search relevance.
  • Implemented infrastructure on AWS including Glue Crawlers, Athena, Terraform and storage pipelines to support scalable data querying and dashboard KPIs.
  • Built automated data pipelines by extracting patent datasets from the USPTO API and Google BigQuery, transforming JSON data into AWS Athena-compatible formats and enabling large-scale analytics.
  • Optimized OCR pipelines, reducing processing latency by 60% while improving extraction accuracy for text, tables, and images from structured and unstructured documents.
  • Developed GenAI features using prompt engineering and AWS Bedrock models, improving chatbot response quality and logging feedback for iterative model tuning.
  • Led and presented in client meetings and discussions.

Aug 2024 - Oct 2024

Data Engineer Intern

Heart It Out

Heart It Out is a company dedicated to making mental healthcare accessible to people all over the world.

  • Developed automated ETL pipelines using n8n and Make to collect, transform and load data from multiple sources into PostgreSQL.
  • Designed and managed data workflows for reliable data ingestion, improving data availability for analytics and internal tools.
  • Implemented workflow automation and monitoring to streamline data processing and reduce manual intervention.

June 2024 - July 2024

Generative AI Intern

Logiciel Analytics

Logiciel Analytics is a company dedicated to building real-world services using AI.

  • Worked with AI agents and RAG to fine-tune an LLM for an accounting chatbot.
  • Worked with MongoDB databases.
  • Converted JSON data into tabular format.



Education


2023-2025

Master of Science

(AI & ML)

Indian Institute of Information Technology, Lucknow

CGPA: 9.06

2020-2023

Bachelor of Science

(Statistics, Mathematics and Physics)

BSA College, Mathura

CGPA: 7.5

Projects

Below are my AI, Data Science and Data Analytics projects, covering LLMs (Llama-3, Gemini-1.5-Flash), Machine Learning, ARIMA, NLP, Excel, SQL and Power BI. 🌟

TripGenie AI

Developed an AI-powered multi-agent trip planner using LangGraph, the Gemini API, Tavily and Wikipedia to generate personalized itineraries with real-time flights, hotels, weather and activities; integrated a Streamlit UI and PDF export for end-to-end trip planning.

SmartCart Recommender

Developed a personalized shopping web app using Flask and Machine Learning with content-based, collaborative and hybrid recommendation models.

CodeLens AI

Designed a Flask-based code analysis app, integrating logic flow with LangChain to detect inefficiencies and style violations; applied few-shot chain-of-thought prompting, improving developer review speed by 40%.

UltraBot

Using LangChain, Llama-3 and Gemini-1.5-Flash, developed a chatbot that performs text-to-text and image-to-text tasks and offers many customization options.

ChefBot👨‍🍳

Using Groq's low-latency inference and Llama, built a user-centric Streamlit application that generates recipes for any type of dish.

Satellite Position Forecasting

Using LSTM and ARIMA techniques, forecast satellite positions to reduce the risk of collisions between LEO satellites.

Text Emotion Detection

Using NLP techniques and NeatText, built a user-centric Streamlit application to predict the sentiment of any text.

Pizza Sales Analysis Using SQL

Analysed pizza sales data by writing SQL queries against it, as a portfolio project.

Zomato Sales Dashboard Using Power BI

Analysed Zomato sales data and developed a Power BI dashboard to present and explain it.

Coffee Shop Sales Dashboard Using Excel

Analysed a coffee shop's sales data and developed an Excel dashboard to present and explain it.


More projects on Github

I thrive on unraveling AI and data science complexities, delving into LLM and GAN advancements to unearth profound data narratives.


GitHub

Contact

Contact Me

Below are the details to reach out to me!

Address

Mathura, India

Contact Number

+91-8272873098

Download Resume