Browse 186 ML and AI students looking for summer 2026 roles. Select who you like, and we'll make the intro.
Our highest AI-rated candidates this cohort. Standout profiles you'll want to see first.
IIT Madras AI researcher | Built Multi-HyDE RAG system that cut hallucinations 15% on financial 10-Ks → EMNLP '25
One of the hardest technical problems I worked on was making RAG truly trustworthy for financial QA over SEC 10-K filings at Inter-IIT Tech Meet 13.0, on Pathway's problem statement, which eventually became our EMNLP 2025 FinNLP paper. The core difficulty was that vanilla dense retrieval or single-step HyDE would miss or confuse critical sections that looked semantically similar but differed in key numbers or years, especially in long multi-year reports and tables. That meant the model either answered with incomplete evidence or hallucinated details, which is unacceptable in finance. My approach was to redesign the retrieval and reasoning stack end-to-end. First, we extended HyDE into Multi-HyDE, where for each question we generate multiple non-equivalent hypothetical answers and queries, and then combine their embeddings with a hybrid BM25 retriever specialized for both text and table segments in 10-Ks. This significantly improved coverage and helped disambiguate sections that differ only in subtle numerical or temporal details. Second, we wrapped retrieval inside an agentic workflow: a central LLM agent decomposes complex questions into smaller steps, calls Multi-HyDE, keyword search, and table-specific tools as needed, and maintains a unified state so it can iteratively pull more evidence instead of guessing. We also enforced grounding by requiring all final answers to be explicitly backed by retrieved spans, which reduced the model's tendency to fall back on latent knowledge. Finally, we validated the system on standard financial QA benchmarks and with human evaluation. Compared to strong RAG baselines using single-step retrieval, our pipeline improved accuracy by about 11.2% and cut hallucinations by roughly 15%, at similar token cost to HyDE. For me, the key learning was that fixing "hard" hallucination problems in high-stakes domains often means rethinking retrieval and agent orchestration, not just swapping in a bigger model.
We containerized the system with Docker, exposed it via FastAPI, and deployed on Azure so judges could hit a live endpoint and test arbitrary queries.
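The score-fusion idea behind Multi-HyDE can be sketched roughly like this (all names and weights are hypothetical, not the paper's actual implementation): each hypothetical answer produces a dense score per chunk, and the max over hypotheses is blended with a sparse BM25 score, so a chunk only needs to match one hypothesis well to stay retrievable.

```python
def fuse_scores(dense_scores_per_hypo, bm25_scores, alpha=0.6):
    """Combine max-pooled dense scores from multiple hypothetical answers
    with sparse BM25 scores. `alpha` weights dense vs. sparse evidence."""
    chunks = set(bm25_scores)
    for scores in dense_scores_per_hypo:
        chunks |= set(scores)
    fused = {}
    for chunk in chunks:
        # Max over hypotheses: a chunk only needs to match ONE hypothetical
        # answer well to be considered covered.
        dense = max((s.get(chunk, 0.0) for s in dense_scores_per_hypo), default=0.0)
        fused[chunk] = alpha * dense + (1 - alpha) * bm25_scores.get(chunk, 0.0)
    return sorted(fused, key=fused.get, reverse=True)

# Two non-equivalent hypothetical answers score the corpus differently;
# fusion surfaces the chunk that matches either hypothesis plus the keywords.
dense = [{"10k_2022_revenue": 0.9, "10k_2021_revenue": 0.3},
         {"10k_2022_revenue": 0.2, "risk_factors": 0.8}]
bm25 = {"10k_2022_revenue": 0.5, "10k_2021_revenue": 0.6}
ranking = fuse_scores(dense, bm25)
```

In a real pipeline the dense scores would come from embedding similarity against each generated hypothetical answer; here they are toy numbers to show the fusion behavior.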
CS @ Toronto | Built multi-objective energy grid optimizer with JIT+Ray achieving 10× speedup & 2 lead-author papers
The hardest technical problem I faced was turning a vague research goal ("use Python to model and optimize an energy grid" plus a list of technologies) into a full, research-grade optimization framework. The problem was beyond a typical ML task: I had to model multiple physical energy systems from first principles (wind, PV, SMRs, geothermal, batteries), couple engineering simulation with economics, lifecycle emissions, and a social-impact metric, and then solve a constrained multi-objective optimization problem over a huge nonlinear design space. Each candidate solution required expensive simulation and constraint checks, so a naive implementation took minutes per evaluation, making evolutionary optimization effectively infeasible. The core challenge became: how do I evaluate complex grid designs fast enough to optimize them reliably? I tackled it in layers. First, I built stable, composable models for each technology and objective (LCOE, emissions intensity, social disagreement). Next, I designed a two-stage optimization strategy: Differential Evolution to tailor technology parameters to site conditions, then NSGA-II to optimize the full grid mix under feasibility constraints. The biggest wall was performance, so I treated the simulator like an HPC system: JIT + vectorization + parallel evaluation with Ray + caching of expensive intermediate computations, which delivered a 10× speedup and made the search tractable. Finally, to improve convergence quality, I added a Bayesian warm-start seeding module (BoTorch/GPyTorch) that boosted Pareto hypervolume by 23%. The outcome was a modular framework strong enough to produce real-world results (e.g., ~50% lower cost and >90% lower emissions vs. diesel baselines in our case study), and it directly led to two peer-reviewed publications with me as lead author: an accepted conference paper and an under-review journal paper.
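One of the speedups mentioned above, caching expensive intermediate computations, can be sketched with a memoized evaluator (the cost and emissions formulas here are toy stand-ins, not the actual simulator): designs are rounded to a fixed grid so near-identical candidates produced by the optimizer hit the cache instead of re-running the simulation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def evaluate_design(design):
    """Stand-in for the expensive grid simulation; returns (lcoe, emissions)."""
    wind, pv, battery = design
    lcoe = 80 - 0.3 * wind - 0.2 * pv + 0.1 * battery   # toy cost model
    emissions = 900 - 6 * wind - 5 * pv                  # toy emissions model
    return lcoe, emissions

def evaluate_population(population):
    # Round to 2 decimals so near-duplicate candidates share a cache entry.
    return [evaluate_design(tuple(round(x, 2) for x in d)) for d in population]

pop = [(10.0, 20.0, 5.0), (10.0, 20.0, 5.0), (30.0, 15.0, 2.0)]
results = evaluate_population(pop)
```

Evolutionary optimizers frequently revisit similar designs, which is why even this simple cache can pay off before reaching for JIT or distributed evaluation.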
WPI | AI/ML engineer building FDA-compliant test automation with VLMs and RAG
The most challenging technical problem I faced was building a reliable AI-driven test automation system that could handle the unpredictable nature of web UIs while meeting FDA compliance requirements for pharmaceutical clients. Traditional test automation breaks constantly—a button moves, a class name changes, and suddenly your entire test suite fails. We needed something that could "see" and understand the UI the way a human does, not just match XPath selectors. But here's the catch: in regulated environments, you can't have a black box making decisions. Every action needs to be explainable and auditable. How I Solved It: I architected a system combining Vision-Language Models with a RAG pipeline: The perception layer used VLMs to understand UI state semantically—identifying elements by what they are ("the submit button," "the patient ID field") rather than brittle selectors. The knowledge layer was a RAG system that grounded every decision in documented test procedures. When the AI decided to click something, it could trace that decision back to specific validation requirements. The infrastructure challenge was real—GPU inference at scale isn't cheap. I designed the pipeline to batch intelligently and cache embeddings, getting inference costs down to a level that made sense for continuous validation. The result was a 90% reduction in manual validation effort while maintaining full audit trails for FDA compliance. Why it was hard: It wasn't just an ML problem or just an infrastructure problem—it was both, plus navigating regulatory constraints that most AI systems don't have to consider. The solution required thinking across the full stack.
CS @ Mizzou | Trained ML model to stabilize droplet formation, cutting printer setup from days to hours
The most difficult technical problem I faced was stabilizing droplet formation on the Dimatix printer for a new ionic-liquid ink. I had no starting parameters, so I designed a small experiment grid over voltage, frequency, meniscus, and temperature, logged high-speed videos for each point, and manually labeled droplet outcomes. Using that data I trained a simple decision-tree model that predicted good settings and removed most of the trial-and-error, cutting setup time from days to about an afternoon.
UC Berkeley hardware engineer | Rebuilt broken space payload in 2 weeks using breadboards, BART trains & speaker vibration tests
Last August, I was on a short deadline—just a couple of weeks from launch and one from shipping our payload—when our PCBs turned out to be broken, with no continuity across the copper connections, making them useless. In addition, we didn't have sufficient electromagnetic interference (EMI) testing. More bad news was to follow: our servos couldn't push our injectors into our payload, and it would take too long to ship a new one. All seemed lost, but I made up my mind to do the best we possibly could. I redid the entire circuit on breadboards, modifying the circuit to use fewer components; a blessing in disguise, this made it a lot easier to certify EMI since we were using off-the-shelf parts now. We used a different lubricant we found in Ace Hardware, which let our existing motors work. We repurposed huge, unused speakers in the physics building storeroom to do our vibration testing and took the payload with us on the BART for acceleration testing. Somehow, piece by piece, we managed to complete the payload in time. It taught me that sometimes the perfect solution isn't available, but creativity and persistence can make up for a lot.
CS @ Saint Louis University | Built AI agents analyzing drone crashes for 80+ researchers and real-time voice coaching for 60+ runners
Two problems, both live in production now. 1) Flight Assistant (flight-assistant.app): Drone crash analysis used to take days. Analysts manually dig through 200+ CSV files trying to correlate sensor data. I built an AI agent that understands flight logs and lets you ask things like "correlate motor output with voltage drops" in plain English. The hard part was building tools that actually understand flight data, not just an LLM wrapper. 80+ researchers use it now. 2) Stride AI (App Store): Real-time voice coaching that adapts to your body. The tricky part was coordinating live heart rate/pace data with an LLM while keeping latency low enough that coaching feels responsive, not delayed. Built native iOS/Android integrations, WebSocket pipelines, and a multi-agent system. 60+ runners actively using it. Both ship to real users. That's what I care about. Quick walkthrough: https://www.loom.com/share/ce03372ed2f6483987d1fe3284f87617
LibreOffice GSoC engineer | Built safe introspection for 10k+ UNO APIs without breaking BASIC runtime
One of the hardest problems I've worked on was during my LibreOffice GSoC project on the BASIC IDE Object Browser. The goal was to introspect and expose a huge symbol surface: tens of thousands of UNO APIs, application macros, and user document symbols without breaking the BASIC runtime or freezing the IDE. The problem wasn't just performance; it was safety. Some ways of reading module code could trigger library loading, side effects, or even modify runtime state while merely "looking." The first versions worked, but they were fragile. I kept seeing strange runtime behavior and regressions that didn't map cleanly to my changes. That's when I realized the issue wasn't a bug — it was the approach. I stepped back and traced how BASIC modules are loaded internally, how parsing is tied to execution, and where I could safely read code without triggering runtime effects. That led me to design a safe parsing path: reading raw module source directly from files and adding a bSafeParsing mode that explicitly prevents side effects during analysis. 2024 GSoC work: https://devanshvarshney.com/libreoffice-google-summer-of-code-final-report 2025 GSoC work: https://devanshvarshney.com/libreoffice-google-summer-of-code-final-report-basic-ide
UT Knoxville AI researcher | Built BCI music generator with adaptive harmonizer that transforms EEG signals into real-time jazz
When I built my BCI real-time music generator project, the most significant challenge was bridging the gap between noisy, real-time EEG probabilities and the strict mathematical constraints of functional jazz harmony. A simple direct mapping failed because static harmonic weights could not accommodate the full dynamic range of the bio-signal, often resulting in progressions that felt either musically incoherent or unresponsive. I solved this by engineering an adaptive tension harmonizer that implements a proportional feedback control loop. This system continuously calculates the error between the user's target mental state (derived from the BCI classifier) and the realized harmonic tension, dynamically auto-tuning the coefficients for chord quality, extension complexity, and circle-of-fifths distance in real-time. This transformed the project from a random chord generator into a cohesive instrument that maintains smooth voice leading while fluidly navigating complex dissonance based on live neural feedback.
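The proportional feedback loop described above can be sketched in a few lines (the gain range, coefficient, and toy tension model are hypothetical, not the project's actual values): each step nudges a harmonic-tension gain toward the target mental state in proportion to the error.

```python
def update_gain(gain, target_tension, realized_tension, kp=0.5, lo=0.0, hi=1.0):
    """One step of a P-controller: move the gain proportionally to the
    error between target and realized harmonic tension, clamped to range."""
    error = target_tension - realized_tension
    return min(hi, max(lo, gain + kp * error))

# Simulate convergence with a toy model where realized tension equals the
# current gain; the loop settles on the target within a few iterations.
gain = 0.2
for _ in range(20):
    realized = gain
    gain = update_gain(gain, target_tension=0.8, realized_tension=realized)
```

In the real system the realized tension would be computed from chord quality, extension complexity, and circle-of-fifths distance rather than this identity model.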
CS @ UT Dallas | Built Multi-Agent AI Research Engine with custom queuing to run 100% of queries to completion
Situation: While building a Multi-Agent AI Research Engine, I hit a major roadblock where the system would frequently crash with 429 Resource Exhausted errors and hit API quota limits during complex, multi-step research tasks. Task: I needed to ensure the agents could complete long-running research workflows without being throttled or losing progress mid-task. Action: I implemented a custom request-queuing mechanism with exponential backoff. I also refined the model selection logic—using lighter models for basic reasoning tasks and reserving the high-tier models only for final synthesis. Additionally, I debugged architectural issues where missing dependencies in the execution environment were causing silent failures in the agentic loops. Result: This stabilized the engine, allowing it to run 100% of multi-stage research queries to completion without manual restarts, and optimized cost by reducing unnecessary high-tier API calls.
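The request-queuing fix for 429 errors described above boils down to retry with exponential backoff; a minimal sketch (error class and delays hypothetical) looks like this:

```python
import time

class ResourceExhausted(Exception):
    """Stand-in for the API's 429 Resource Exhausted error."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `fn` on quota errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except ResourceExhausted:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...

# A fake endpoint that fails twice before succeeding.
attempts, delays = [], []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ResourceExhausted()
    return "ok"

result = call_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` as a parameter keeps the backoff testable; production code would also typically add jitter to avoid synchronized retries across agents.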
MBZUAI AI/ML researcher who debugged unstable multi-GPU training runs and scaled models reliably
One of the hardest problems I dealt with was getting a large model training run to behave reliably once we scaled it up. On a single machine, everything looked normal, but as soon as we moved to multiple GPUs, the run became unpredictable. Sometimes it would diverge, sometimes two runs with the same setup would end up with noticeably different results, and occasionally it would crash after running for hours, which was painful because it wasted a lot of compute. I stopped treating it like a hyperparameter tuning issue and approached it like a debugging problem. First, I improved the monitoring so I could see what was happening right before things went wrong. I tracked how quickly the model was changing, whether gradients were spiking, how the loss and other signals behaved over time, and whether certain data batches were consistently involved when failures happened. After that, I ran a series of small controlled experiments where I changed only one variable at a time, like the batch setup, the update size, and how strongly we regularized the updates. That process helped narrow it down to two main causes. The updates were occasionally too aggressive early in training, and there was also a mismatch between how we generated training data and how the distributed training loop consumed it, which created subtle instability. Once we fixed those and added a few safeguards, the training runs became stable and repeatable. We could finish jobs consistently and we saw clear improvements in model quality.
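The gradient-spike monitoring described above can be sketched as a trailing-window detector (the window size and threshold factor here are hypothetical): flag any step whose gradient norm jumps well above its recent baseline, which is the kind of signal that shows up right before a divergence.

```python
def spike_detector(norms, window=5, factor=3.0):
    """Return indices of steps whose gradient norm exceeds `factor` times
    the average of the previous `window` steps."""
    spikes = []
    for i in range(window, len(norms)):
        baseline = sum(norms[i - window:i]) / window
        if norms[i] > factor * baseline:
            spikes.append(i)
    return spikes

# A stable run with one aggressive update at step 5.
history = [1.0, 1.1, 0.9, 1.0, 1.0, 9.5, 1.0]
flagged = spike_detector(history)
```

Logging a flag like this alongside loss and data-batch identifiers makes it much easier to correlate failures with specific batches, as described above.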
U of Toronto robotics researcher who debugged a VLA blindly copying proprioceptive state instead of learning to move
When training a VLA to do common household tasks in sim for the BEHAVIOR-1k benchmark, we spent 3 weeks debugging a model that was scoring incredibly well on loss metrics but at inference would do absolutely nothing but stare at the wall. We were puzzled for a long time. Thankfully, my background is in hardware, so it wasn't the first time I'd been faced with a seemingly bottomless issue, so I knew the deal. Not to drone on, the fix came when I decided to stop poring over code and just sit down for a few days and analyze the rollout data. What was the robot doing? What was it trying to do? What was it doing during training? Eventually we ran a test where the robot would spend 10 seconds just performing ground truth actions. Then, we'd drop the policy in and let it take over. Funny enough, it fixed the issue. Why did it work? Through some more conjecture we realized the robot had been, well, cheating. At training it was taking 3 inputs: camera, language commands, and proprioceptive state. It had learned a cool trick to game loss — if I just continue the current trajectory, copying the last proprioceptive movement, I'll get it right most of the time. This was all well and good when movement was guided by ground truth (training time), but during rollout it had NO idea how to START moving. It just stood there, copying the previous proprioceptive vector delta (~0), staring at the wall. The solution was to hide the proprioceptive state from the model some percentage of the time (on a learning schedule) during training. Magically, it started moving.
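The scheduled proprioceptive masking described above can be sketched like this (the warmup length, maximum probability, and linear schedule are hypothetical): the mask probability ramps up during training so the policy is increasingly forced to act from vision and language alone.

```python
import random

def dropout_prob(step, warmup=1000, max_prob=0.5):
    """Linear schedule: no masking at step 0, `max_prob` after `warmup` steps."""
    return min(max_prob, max_prob * step / warmup)

def mask_proprio(proprio, step, rng=random.random):
    """Zero out the proprioceptive state vector with schedule probability."""
    if rng() < dropout_prob(step):
        return [0.0] * len(proprio)
    return proprio

state = [0.1, -0.2, 0.3]
masked = mask_proprio(state, step=2000, rng=lambda: 0.0)   # roll below prob
kept = mask_proprio(state, step=0, rng=lambda: 0.0)        # prob is 0 at step 0
```

Passing `rng` explicitly makes the masking deterministic in tests; in a real training loop this would be applied per batch before the inputs are concatenated.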
AI/ML researcher who cracked ViT training on small data, hitting 93.4% on CIFAR-10 with CNN backbones in 50 epochs
The hardest problem I faced was figuring out how to train ViTs on small datasets like CIFAR-10, since they're known to be data-hungry. I first identified the main bottleneck in the architectural design of ViTs and kept adding incremental modifications to improve accuracy until I concluded that ViTs lack the inductive bias found in CNNs, so I replaced the raw patching with a CNN backbone, and augmented the data to superficially add as much variance as possible and mimic data abundance. The end result sits on the Pareto frontier: the lightest ViT trained from scratch to reach ~93.4% top-1 accuracy on CIFAR-10 in only 50 epochs. Project can be found here: https://github.com/Brokttv/Vit-on-small-data
CS @ UW | Built a production-grade BPE tokenizer from scratch, optimized 11GB datasets with heap tuning
These days, I am really excited to be working on building a language model from scratch (motivated by Stanford's CS 336 course). I started by building a Tokenizer in Python from scratch using the Byte-Pair Encoding algorithm. It was awesome! I wrote a blog about it. Blog: https://vitthal-bhandari.github.io/blogs/experiments-with-tokenization.html Code: https://github.com/vitthal-bhandari/cs-336-assignment1-llms/tree/main I think the best part of writing something from scratch without using Claude Code/Cursor is the serendipity. Halfway through, I realized why I love coding. The findings that stuck most with me were: > Data structure optimization helps—until it doesn't. Heap-based selection is great when the heap stays "clean". On the Open Web Text dataset (11.92 GB), my heap exploded due to stale entries, and the algorithm slowed down (it took 3 hours to tokenize without heap and 6 hours with heap) > On smaller datasets (< 1 GB), pre-tokenization is the bottleneck, while on larger datasets (> 1 GB), merging is the bottleneck > Multiprocessing is a win (with guardrails) – Pre-tokenization parallelizes cleanly. But "max workers" is not the goal; "max throughput without memory death" is. > Surprisingly, cold cache gave a highly skewed approximation of the total tokenization time. I had to take an average of 2-5 runs to get a better idea! > Tokenization is a function of corpus size, algorithmic complexity, parallelization, and compute
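The "stale heap entries" problem from the first bullet is the classic lazy-deletion pattern: pair counts change after every merge, so the heap accumulates outdated entries, and the pop loop must skip any entry whose count no longer matches the live table. A simplified sketch:

```python
import heapq

def pop_best_pair(heap, counts):
    """Pop the most frequent pair, skipping stale entries left behind by
    earlier merges. Heap items are (-count, pair) to get max-heap behavior."""
    while heap:
        neg_count, pair = heapq.heappop(heap)
        if counts.get(pair) == -neg_count:   # entry still matches live count
            return pair
        # stale: this pair's count changed since the entry was pushed
    return None

counts = {("t", "h"): 5, ("h", "e"): 3}
heap = [(-5, ("t", "h")), (-3, ("h", "e"))]
heapq.heapify(heap)

counts[("t", "h")] = 1                      # a merge elsewhere changed the count
heapq.heappush(heap, (-1, ("t", "h")))      # fresh entry pushed alongside the stale one
best = pop_best_pair(heap, counts)          # skips the stale (-5, ...) entry
```

When most entries go stale, as on the 11.92 GB corpus above, this skip loop dominates and the heap stops paying for itself, which matches the 3h-without-heap vs. 6h-with-heap observation.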
UW full-stack engineer who cut automation workflow creation from 2 hours to 5 minutes with AI at UBS
The hardest problem I tackled was building AutoFlow at UBS, an AI system that converts natural language descriptions into executable automation workflows. I identified a real bottleneck: only a handful of people had the know-how and experience to create automations in our proprietary automation platform (built on top of Amelia), and they were spending 4+ hours per workflow. Engineers had to understand the exact syntax, map out all the logic branches, and test everything. It was tedious, error-prone, and blocking teams from automating their work. I wanted to make it conversational: just describe what you want automated, and the system builds it for you. The technical difficulty was multi-layered. First, I had to parse natural language that was often vague or ambiguous. Engineers would say things like "check if the server is healthy" without specifying what "healthy" means. I used GPT-4.1 for intent understanding, but raw LLM outputs weren't reliable enough for production code. Second, I needed to orchestrate multiple steps: understanding the request, breaking it into sub-tasks, generating the actual workflow code, and validating it wouldn't break anything. That's where LangGraph came in for multi-step orchestration. The breakthrough came when I stopped trying to make the AI perfect and instead built a feedback loop. The system would generate a workflow, show it to the engineer for validation, and learn from corrections. I also created a library of common patterns the AI could reference, which dramatically improved accuracy. The result? We cut down 95% of the time spent creating automations, from 2 hours to 5 minutes. It helped 10 teams ship automations they couldn't have built otherwise, and honestly, seeing engineers who used to dread the process actually excited to use AutoFlow made all the debugging worth it. The lesson I learned: sometimes the hardest technical problems aren't solved by making the technology more complex. 
They're solved by designing the right human-AI collaboration.
UMass Amherst | Built custom Spark parallelization for 8x faster ML training & DFS inference across 5M-node graphs at Morgan Stanley
At Morgan Stanley, I worked on a real-time ETA inferencing application that needed to handle queries across a graph with 5 million nodes simultaneously. The core challenge was training ML models for time series data—we had massive data volume combined with numerous custom datasets that each required individual handling. Training was taking far too long to be practical for production deployment. Each dataset had unique characteristics requiring custom preprocessing and feature engineering. We also needed real-time inference, so we couldn't just batch process everything offline. To solve this, I built custom Spark logic to parallelize the ML training across datasets and then aggregate the results, which gave us an 8x speedup. For the real-time inference piece, I implemented a custom DFS-based approach that could truncate at currently running jobs and parse values back up to the root node. This allowed us to get estimates without having to traverse the entire 5-million-node graph for every query, making real-time inference 25-30x faster.
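The DFS truncation idea described above can be sketched on a toy dependency tree (the structure and estimate model are hypothetical): the walk stops descending at any node whose job is still running and parses its live estimate back up, instead of traversing the whole subtree.

```python
def eta(node, running, live_estimates, children):
    """Return an ETA for `node`. Nodes in `running` are truncation points:
    their live estimate is used instead of recursing into the subtree."""
    if node in running:
        return live_estimates[node]
    kids = children.get(node, [])
    if not kids:
        return 0.0   # finished leaf contributes no further time
    # A parent finishes when its slowest child does.
    return max(eta(k, running, live_estimates, children) for k in kids)

children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
estimate = eta("root", running={"a"}, live_estimates={"a": 12.0}, children=children)
```

On a 5-million-node graph the win comes from never visiting the subtree under a running node, which is where the 25-30x speedup described above would come from.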
NYU AI/ML engineer | Cut 55% latency on Bank of America's real-time trader data platform serving multiple LOBs
Bank of America: One of the most difficult technical problems I faced was owning a centralized data management platform integrating multiple Line-of-Business systems with different schemas, refresh cycles, and latency constraints, where traders required near real-time consistency. The system was experiencing high latency and unreliable downstream analytics, so I approached it by first instrumenting the ETL pipeline to identify bottlenecks, which revealed heavy serialization overhead and inefficient query patterns. I redesigned the pipelines to use incremental processing instead of full batch refreshes, optimized database indexing and joins, and introduced asynchronous processing for non-blocking tasks. I also added monitoring dashboards, alerting, and automated data validation checks to improve reliability and observability. This reduced system latency by about 55%, improved downstream analytics accuracy, and significantly reduced operational overhead, reinforcing my approach of measuring first and solving problems at the architecture level rather than just optimizing code. DemoDay AI: The hardest challenge was building a real-time voice-first AI feedback system that could process speech, generate investor-style feedback, maintain conversation context, and respond with low enough latency to feel interactive. I designed a FastAPI-based orchestration layer that handled streaming voice input, transcription, LLM feedback generation, and text-to-speech output, while storing session context efficiently to maintain conversational continuity. To reduce latency, I parallelized transcription and context retrieval, cached YC knowledge embeddings, and optimized container startup times. I also implemented structured prompt engineering using investor personas and added conversation summarization to control token usage while preserving context quality. 
This enabled a real-time conversational feedback experience that scaled across concurrent users and delivered meaningful investor-style responses, teaching me that real-time AI products are fundamentally distributed systems problems as much as they are ML problems.
CS @ UW | Reverse-engineered Xbox One controller protocol from USB packets to ship macOS driver
The hardest technical problem I've faced was reverse-engineering the Xbox One controller's USB protocol with zero documentation. I wanted to build a macOS driver for game streaming, but the controller wouldn't respond to anything: it just sent the same 64-byte packet no matter what buttons I pressed. There was no existing way to use these controllers on macOS, and the implementations I could find on GitHub were all broken. After days of debugging, I discovered it needed a specific 5-byte initialization handshake before it would send real input data. I found it by capturing USB traffic from a working Linux driver and comparing packets. Once it started responding, I built a debug tool to map buttons to bytes. I'd press A, note which byte changed, press B, compare—slowly piecing together that buttons were bit flags, triggers were 16-bit integers, and analog sticks were signed values.
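The decoding described above can be sketched with `struct` (the byte offsets and button masks here are hypothetical; the real Xbox One report layout differs): buttons as bit flags, triggers as unsigned 16-bit values, sticks as signed 16-bit values, all little-endian.

```python
import struct

BUTTON_A, BUTTON_B = 0x01, 0x02

def parse_report(packet):
    """Decode a toy input report: button bitfield, two triggers, one stick."""
    buttons, lt, rt, lx, ly = struct.unpack_from("<HHHhh", packet, 0)
    return {
        "a": bool(buttons & BUTTON_A),   # bit flag
        "b": bool(buttons & BUTTON_B),
        "left_trigger": lt,              # unsigned 16-bit, 0..65535
        "right_trigger": rt,
        "stick_x": lx,                   # signed 16-bit, -32768..32767
        "stick_y": ly,
    }

# A pressed, left trigger half-pulled, stick pushed fully left.
packet = struct.pack("<HHHhh", BUTTON_A, 0x8000, 0, -32768, 0)
state = parse_report(packet)
```

The press-one-button-and-diff-the-bytes workflow described above is exactly how you discover which format string (`H` vs. `h`, offsets, endianness) each field needs.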
CS @ Bucknell | Rebuilt Prometric's export system with async queues to serve 1,000+ districts, eliminating 50+ weekly support tickets
The hardest technical problem I faced was building a reliable data export system for Prometric's EdPower platform that could generate large compliance reports for over 1,000 school districts without slowing down the production database. My first approach used a synchronous C#/.NET Core endpoint that ran heavy SQL Server queries directly, which caused timeouts and locking issues whenever multiple districts requested exports at the same time. To fix this, I redesigned the feature around background processing. I put each export request into a worker queue, had a dedicated service slowly ingest and process jobs in controlled batches, and tuned the queries and indexing so exports could complete without locking critical tables. The API now just validates the request, enqueues it, and returns a tracking ID, while the worker generates the file in the background and notifies the user when it is ready. This shift from synchronous queries to an asynchronous worker-queue architecture made the exports both scalable and safe for production traffic. It let districts self-serve their reports and eliminated over 50 recurring support tickets per week that used to come from failed or manual exports.
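The enqueue-and-track flow described above can be sketched in a few lines (names and the in-memory queue are hypothetical stand-ins for the real .NET worker service): the API side validates, enqueues, and returns a tracking ID immediately, while a worker drains jobs in the background.

```python
import queue
import uuid

jobs, status = queue.Queue(), {}

def request_export(district_id):
    """API side: validate, enqueue, and return immediately with a tracking ID."""
    job_id = str(uuid.uuid4())
    status[job_id] = "queued"
    jobs.put((job_id, district_id))
    return job_id

def run_worker_once():
    """Worker side: pull one job and generate its report file."""
    job_id, district_id = jobs.get()
    status[job_id] = f"done:report-{district_id}.csv"

tracking_id = request_export("district-042")
run_worker_once()
```

A production version would use a durable queue and notify the user on completion, but the shape is the same: the request path never touches the heavy queries.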
CS @ CCNY | Built Yamalverse from scratch—scraped, validated & monetized live soccer analytics for real users
One of the hardest technical problems I faced was finding and validating reliable data when I first started building Yamalverse, a soccer analytics website I built from scratch. Early on, there wasn't a single trusted source that had all the data I needed, and a lot of public soccer data online is incomplete, inconsistent, or poorly documented. I had to scrape data from multiple sources, each with different formats, naming conventions, and update frequencies. I used Python to build scraping and normalization scripts, then cross-checked the data across sources to catch discrepancies. When values conflicted, I prioritized the standards used by the most well-established and trusted data providers in football analytics, and I encoded those assumptions directly into the data pipeline so they were consistent and repeatable. I also added validation checks to flag outliers and missing fields before data ever reached the database. That forced me to treat data quality as a first-class problem, not something to fix later in the UI. What I'm most proud of is that this wasn't just a technical exercise. It was a project built around something I genuinely care about—soccer—and it ended up attracting real users and even generating my first dollar in revenue. That combination of personal interest, technical rigor, and real-world impact made it one of the most meaningful problems I've worked on.
Techno International New Town | Built multi-source research agent with cyclic graph architecture, cutting token waste 30% via stateful LLM orchestration
The Problem: While building my Multi-Source Research Agent, I faced state divergence. Orchestrating concurrent calls to Google, Bing, and Reddit caused the LLM to lose context because it couldn't reliably merge unstructured, disparate JSON schemas, leading to cyclic hallucination loops. The Solution: I re-architected the system into a stateful, cyclic graph using LangGraph. I implemented a global Pydantic state schema to enforce strict typing across nodes and built a "Review Node" to score context density. If the data threshold wasn't met, the graph triggered a recursive search refinement instead of proceeding to synthesis. The Result: This eliminated infinite loops and reduced token waste by 30%, resulting in a robust, multi-hop agent capable of handling high-latency research tasks with full state integrity.
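The Review Node loop described above can be sketched without LangGraph (the density scoring and recursion cap are hypothetical): below-threshold context routes back to another search pass instead of synthesis, with a round limit so the graph cannot loop forever.

```python
def run_graph(search, score, threshold=0.7, max_rounds=3):
    """Cyclic-graph sketch: search node -> review node -> refine or synthesize."""
    state = {"context": [], "rounds": 0}
    while state["rounds"] < max_rounds:
        state["context"] += search(state)          # search node
        state["rounds"] += 1
        if score(state["context"]) >= threshold:   # review node gate
            return {"route": "synthesize", **state}
    return {"route": "give_up", **state}

# Context density rises as sources are merged; two rounds suffice here.
result = run_graph(
    search=lambda s: [f"doc{s['rounds']}"],
    score=lambda ctx: 0.4 * len(ctx),
)
```

The hard cap on rounds is the piece that kills infinite loops; the typed shared state (Pydantic in the real system, a plain dict here) is what keeps the merge from diverging.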
CS @ Pimpri Chinchwad University | Built production LLM system with schema enforcement & retry logic to eliminate hallucinations at scale
During my remote AI engineering internship, the most difficult technical problem I faced was designing a reliable LLM-powered system that produced consistent, structured outputs despite highly variable user inputs. Early versions of the system suffered from hallucinations, inconsistent JSON outputs, and brittle prompt behavior—especially when handling edge cases or long contexts. Instead of adding ad-hoc fixes, I broke the problem down into model behavior, prompt design, and system constraints. I iteratively redesigned the prompt structure, enforced strict output schemas, added lightweight validation and retry logic, and introduced context chunking to control token usage. I also ran controlled experiments to isolate failure modes and adjusted parameters based on observed behavior rather than intuition. This approach significantly improved reliability and made the system production-usable. More importantly, it taught me how to treat LLMs as probabilistic systems that need engineering guardrails, not just APIs to call.
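The schema enforcement plus retry loop described above can be sketched like this (the schema, field names, and fake model are hypothetical): parse the LLM output, validate it against a strict schema, and feed the validation error back into the next attempt.

```python
import json

REQUIRED = {"title": str, "score": int}

def validate(raw):
    """Parse and type-check the LLM's JSON output against a strict schema."""
    data = json.loads(raw)   # raises on malformed JSON
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

def call_with_validation(generate, max_retries=3):
    """Retry generation, passing the previous validation error back in."""
    error = None
    for _ in range(max_retries):
        try:
            return validate(generate(error))
        except (ValueError, json.JSONDecodeError) as exc:
            error = str(exc)   # would be injected into the next prompt
    raise RuntimeError("LLM never produced a valid response")

# Fake model: fails once, then returns valid JSON once it "sees" the error.
def fake_llm(error):
    if error is None:
        return '{"title": "ok"}'                 # missing "score"
    return '{"title": "ok", "score": 7}'

result = call_with_validation(fake_llm)
```

This is the guardrail pattern in miniature: treat the model as probabilistic, and make the validator, not the prompt, the source of truth for what reaches production.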
CS @ Ahmadu Bello University | Built robust autonomous driving pipeline that handled noisy sensors at Shell Competition
One of the most difficult technical problems I faced was achieving reliable autonomy in a simulated self-driving environment during the Shell Autonomous Programming Competition. The challenge wasn't just controlling the vehicle, but integrating perception, localization, and planning in a way that remained stable under noisy sensor data and changing conditions. I solved this by breaking the system into modular ROS nodes, validating each component independently in simulation, and iteratively tuning parameters using logged data. When the vehicle behaved unpredictably, I introduced better state estimation and fail-safe logic rather than overfitting control gains. This systematic, test-driven approach allowed me to move from brittle behavior to a robust autonomous pipeline.
MIT CSAIL | Built auto-tuning Lambda concurrency system to deploy AI agents in 5 seconds at scale
I was working on infrastructure for deploying AI agents or MCP servers within 5 seconds from the CLI, and I hit a wall: I kept running into my AWS account's Lambda concurrency limits, and AWS was not raising my concurrency threshold. So I wrote a custom concurrency auto-tune cron job that adjusts each agent's and MCP Lambda function's concurrency allocation based on its popularity among users (for public deployments) and its RPH (requests per hour), preventing throttling and bottlenecks for heavy-traffic deployments. One of the most fun engineering problems I solved.
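The allocation logic in such a cron job can be sketched as follows (the formula, pool size, and floor are hypothetical; the real job would then apply the numbers via Lambda's reserved-concurrency setting): split a fixed account-wide pool proportionally to each deployment's recent requests per hour, with a small guaranteed floor.

```python
def allocate_concurrency(rph, pool=900, floor=2):
    """Split `pool` concurrent executions proportionally to each deployment's
    requests per hour, guaranteeing every function a small floor."""
    total = sum(rph.values()) or 1
    return {name: max(floor, round(pool * r / total)) for name, r in rph.items()}

# Heavier-traffic deployments get proportionally more of the pool.
usage = {"agent-a": 800, "agent-b": 150, "mcp-c": 50}
alloc = allocate_concurrency(usage)
```

Re-running this on a schedule is what makes the allocation track traffic shifts instead of being a one-time static setting.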
CS @ Huston-Tillotson | Built semantic-to-index translation layer making LLMs reliably edit Google Docs at scale
One of the most difficult technical problems I faced was making LLMs reliably edit Google Docs in my project Izzy Docs. The core issue: Google Docs API is purely index-based (character positions), but LLMs think semantically ("bold the Introduction section"). Every existing MCP server for Google Docs was broken because no one had bridged this gap properly. The problems compound quickly: The API has invisible paragraph boundaries. If you delete characters 89-178 and 178 happens to be a paragraph start, you silently delete the next section too. There's no "replace text" operation—only delete and insert, which must be sequenced correctly. Table cells have internal structure you can't see. Inserting at cell.startIndex corrupts the cell; you have to drill into the paragraph structure to find the actual insertion point. Google's errors are cryptic: "Invalid deleteContentRange: Index 178 must be less than the end index of the referenced segment, 178" tells you nothing actionable. How I solved it: Text→index mapping: I built a system that extracts all text with a segment map, so the LLM can say "bold Introduction" and I translate that to indices 45-57. Outline extraction with duplicate handling: I parse the document structure to identify all headings, track occurrence counts (for when "Methods" appears twice), and map semantic sections to character ranges. Deletion-safe boundaries: Every range calculation automatically subtracts 1 from the end index to avoid clipping into the next section's paragraph boundary. Error translation layer: I pattern-match Google's cryptic errors and return recovery paths—"try ending at 177 instead"—so the LLM can retry intelligently instead of failing. Format normalization: LLMs output table data inconsistently (2D arrays, 1D lists, CSV strings), so I normalize everything before touching the API. The lesson: the hard part of AI tooling isn't generation—it's building the reliability layer that makes AI outputs safe to apply to real systems.
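The semantic-to-index translation plus the deletion-safe boundary trick can be sketched on a flat text buffer (a simplification; the real Google Docs structure is nested, and these names are hypothetical): find the Nth occurrence of a heading and return a range whose end is trimmed so edits never clip the next paragraph's boundary character.

```python
def find_range(text, phrase, occurrence=1):
    """Map a semantic target ("the 2nd 'Methods' heading") to character
    indices, trimming the end so edits never swallow a paragraph boundary."""
    start = -1
    for _ in range(occurrence):
        start = text.find(phrase, start + 1)
        if start == -1:
            raise ValueError(f"occurrence {occurrence} of {phrase!r} not found")
    end = start + len(phrase)
    # Deletion-safe boundary: stop 1 short if the range ends on a newline,
    # so the paragraph marker of the next section survives.
    if text[end - 1:end] == "\n":
        end -= 1
    return start, end

doc = "Intro\nbody...\nMethods\nfirst\nMethods\nsecond\n"
rng = find_range(doc, "Methods\n", occurrence=2)
```

Tracking occurrence counts is what disambiguates duplicate headings, and the end-index trim is the flat-text analogue of the "subtract 1 from the end index" rule described above.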
CS & AI @ University of Plymouth | Debugged cancer classifier from data leakage to real-world generalization
The most difficult technical problem I've faced was a "too-good-to-be-true" model that collapsed in the real test. I was building a cancer subtype classifier using multi-omics data. My cross-validation metrics looked insane until I evaluated on a properly held-out split and the performance dropped close to random. That was a classic silent failure caused by data leakage + split mistakes. Here's how I solved it: Diagnosed leakage: I traced every pre-processing step and found I was doing things like scaling / PCA / feature selection on the full dataset before splitting. That lets information from the test fold bleed into training. Fixed the split logic: Some patients had multiple samples, so random splitting put related samples in both train and test. I switched to group-based splitting so all samples from the same patient stayed on one side. Made pre-processing leak-proof: I rebuilt everything using a strict pipeline so transformations were fit only on the training fold and then applied to validation/test. Validated honestly: I used nested CV for tuning and kept one final untouched hold-out set for the real score. The headline metric became lower, but the model became stable, reproducible, and actually generalizes, which is what matters in real ML work and what I wanted in the end.
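The leak-proof setup described above can be sketched with scikit-learn: all preprocessing lives inside a `Pipeline` so it is fit only on each training fold, and `GroupKFold` keeps every sample from one patient on the same side of the split. Data here is synthetic; model choice and dimensions are illustrative.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))        # 120 samples, 50 omics features
y = rng.integers(0, 2, size=120)      # binary subtype label
groups = np.repeat(np.arange(40), 3)  # 3 samples per patient

# Scaling and PCA are refit inside each training fold -- never on held-out data.
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Group-based splitting: no patient appears in both train and test.
scores = cross_val_score(model, X, y, groups=groups, cv=GroupKFold(n_splits=5))
```

Fitting the scaler or PCA on the full dataset before splitting is exactly the leak this structure rules out.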
CS @ University of Stuttgart | Built speaker-rate conditioning to control TTS duration across speakers in neural codec systems
The most difficult technical problem I faced was controlling duration in an autoregressive neural codec TTS system—getting the model to speak neither too fast nor too slow, and doing so reliably across different texts and speakers. To solve this, I designed a custom conditioning signal I call speaker rate, computed from the relationship between how much text is being spoken and how many acoustic (audio) tokens the model needs to generate for it. I embedded this speaker-rate signal and injected it into the model as an additional conditioning input—both into the attention mechanism and alongside other embeddings—so the decoder had an explicit, learnable handle on "how fast should this be spoken?" rather than relying on implicit timing cues. After integrating this into training, I was able to steer generation to produce more consistent speaking pace and directly control output length by adjusting the speaker-rate conditioning at inference time.
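The conditioning signal can be sketched as a learned embedding of the audio-tokens-per-text-token ratio, added alongside the decoder's other embeddings. Bucket count, edges, and dimensions here are invented for the sketch, not taken from the actual system.

```python
import torch
import torch.nn as nn

class SpeakerRateConditioner(nn.Module):
    """Illustrative: map a ratio of audio tokens per text token to a
    learned embedding the decoder can condition on."""
    def __init__(self, d_model=256, n_buckets=16, max_rate=8.0):
        super().__init__()
        # n_buckets - 1 edges partition [0, max_rate] into n_buckets bins
        self.edges = torch.linspace(0.0, max_rate, n_buckets - 1)
        self.embed = nn.Embedding(n_buckets, d_model)

    def forward(self, n_text_tokens, n_audio_tokens):
        rate = n_audio_tokens.float() / n_text_tokens.clamp(min=1).float()
        bucket = torch.bucketize(rate, self.edges)
        return self.embed(bucket)  # (batch, d_model)

cond = SpeakerRateConditioner()
# e.g. 100 audio tokens for 20 text tokens vs. 60 for 30
e = cond(torch.tensor([20, 30]), torch.tensor([100, 60]))
```

At inference time, overriding the ratio fed into `forward` is what gives an explicit handle on output pace and length.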
Duke ML researcher | Shipped real-time satellite tracker with <500μs latency using neuromorphic cameras at IISc
The most challenging technical problem I faced was during my time as a Machine Learning Research Intern at the Indian Institute of Science. I was working on an asynchronous satellite tracking algorithm using neuromorphic event camera data. The goal was to track satellites in real-time from events captured by event cameras attached to a telescope, with low latency, high accuracy, and optimized for edge performance. Initially, the problem didn't seem too difficult. I developed a simple clustering algorithm that grouped events into clusters representing stars or satellites, with centroids and velocities updated through a rolling window as events streamed in. I built a working prototype in Python and ran test cases, but to my surprise, it only worked 60% of the time. There were several issues: satellite trajectories would curve at certain points, throwing off my estimates; the latency was high because every event had to be compared to every other event to find the closest cluster; very sparse satellite trajectories were never captured; and the trajectory estimates were noisy and not smooth enough. I tackled each problem systematically. For curved trajectories, I modified the algorithm to calculate rolling updates over only the last 500 events instead of the entire history, allowing the estimates to adapt. I also tuned the hyperparameters to give more weight to recent events. To reduce computational complexity, I split the 2D grid into quadrants so events only needed to be compared within their local region. I used an Extended Kalman Filter to smooth out the trajectories, and I implemented dynamic hyperparameters based on cluster density to handle sparse trajectories. After these changes, the model performed really well, capturing almost all trajectories perfectly, with no parameters to learn. I thought I was done and just needed to convert the code to C for deployment on a Raspberry Pi. 
But when I did that, the latency was in the thousands of microseconds, which was way too high for real-time processing. That's when I realized the C implementation needed to be designed completely differently. I had to redesign the data structures, implement multithreading with separate load and process buffers running in parallel, simplify computations to avoid heavy math, and make strategic assumptions. This taught me that design is critical—if you invest time upfront in designing with hardware constraints in mind, everything else falls into place. I spent a few extra weeks redesigning the Python prototype with hardware efficiency as a priority, and when I converted it to C after that, it only took a few hours. Our final latency was well below 500 microseconds with an error rate lower than many baselines. This work was published at ICASSP 2026 and remains one of my proudest achievements.
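The rolling-window idea from the tracking story can be sketched as follows; the window size of 500 matches the write-up, but the centroid/velocity update rule here is a simplification of the real estimator.

```python
from collections import deque

class EventCluster:
    """Rolling-window cluster: centroid and velocity are re-estimated from
    only the most recent events, so curved trajectories stay tracked."""
    def __init__(self, x, y, t, window=500):
        self.events = deque(maxlen=window)  # old events fall off automatically
        self.events.append((x, y, t))

    def add(self, x, y, t):
        self.events.append((x, y, t))

    @property
    def centroid(self):
        n = len(self.events)
        return (sum(e[0] for e in self.events) / n,
                sum(e[1] for e in self.events) / n)

    def velocity(self):
        """Finite-difference velocity between the oldest and newest events
        in the window (the real system weighted recent events more)."""
        (x0, y0, t0), (x1, y1, t1) = self.events[0], self.events[-1]
        dt = (t1 - t0) or 1e-9
        return ((x1 - x0) / dt, (y1 - y0) / dt)
```

Because the deque caps its own length, estimates adapt as the trajectory curves instead of averaging over the full history.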
Ridge HS → UIUC | Won hackathon by hacking Ray-Ban glasses into real-time poker assistant via OBS pipeline
The most difficult technical problem I faced was during a hackathon where our team decided to build a real-time poker assistant using Meta Ray-Ban smart glasses. The core challenge was that there was no available SDK or direct camera access, yet our concept depended on capturing a live visual feed to analyze the player's hand, the table state, and opponents' facial expressions, while also supporting parallel tasks like automated homework solving. To work around the lack of an SDK, I engineered an unconventional but effective pipeline: I created a virtual camera stream using OBS, routed the glasses' output through WhatsApp video, and then ingested that stream on our backend for processing. This allowed us to bypass hardware limitations and still perform computer vision analysis in near real-time. Midway through the hackathon, the team was reduced to just me on the development side, which meant I handled all system design, integration, debugging, and deployment under extreme time pressure. Despite this, I stabilized the pipeline, delivered a working demo, and ensured the system performed reliably enough to showcase the concept. The project ultimately won the hackathon, largely due to the technical workaround and execution under constraints.
CS @ Cal Poly Pomona | Wrote custom CUDA kernels to cut model parameters 99.5% — shipped to PyPI with 300+ installs
Standard PyTorch was too slow and memory-bound for the runtime layer patching I needed for a PEFT library. I bypassed high-level abstractions and wrote custom compiled CUDA kernels to scale singular values of weight matrices directly on the GPU. This reduced trainable parameters by 99.5% while maintaining full-tuning accuracy. I productionized the tool, shipped it to PyPI (EigenTune), and gained 300+ installs in the first month.
CS @ Carleton | Rebuilt Coinbase's market data service from 2000ms to 50ms p99 while learning the stack
At Coinbase, I had to optimize a market data service for institutional clients that was riddled with bugs and duplicated code (7000+ LOC). Despite not knowing the language it was built in, the architecture, or background on trading systems, I locked in and decreased p99 latencies from 2000ms to 50ms. I approached the problem with a systems design mindset, finding all of the breaking points and architecting a more optimal system, all while teaching myself the language and how trading systems work.
High schooler who merged 5 fine-tuned LoRAs into one model at 85% specialist accuracy and 5x lower cost
I fine-tuned 5 separate LoRAs on different subjects (math, history, science, English, coding) using practice problems and essay examples. Naive merging destroyed performance across all tasks. I built a weighted merge algorithm that computed cosine similarity between adapter weight matrices and combined them proportionally. Deployed on Modal with automatic task routing based on input classification. The single model hit 85% of specialist accuracy at 5x lower inference cost. Multi-task merging beats training one giant model every time.
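A similarity-weighted merge can be sketched like this: each adapter's delta matrix is weighted by the cosine similarity of its flattened weights to the mean adapter, so outliers contribute less. This is a sketch of the idea, not the exact algorithm from the project.

```python
import torch
import torch.nn.functional as F

def weighted_merge(adapters):
    """Merge several LoRA delta matrices for the same layer,
    weighting each by cosine similarity to the mean adapter."""
    stack = torch.stack([a.flatten() for a in adapters])  # (n_adapters, d)
    mean = stack.mean(dim=0, keepdim=True)
    # Negative similarities are clamped so conflicting adapters are dropped,
    # not subtracted.
    sims = F.cosine_similarity(stack, mean, dim=1).clamp(min=0)
    weights = sims / sims.sum()
    merged = sum(w * a for w, a in zip(weights, adapters))
    return merged, weights
```

Naive uniform averaging corresponds to `weights = 1/n` for every adapter; the similarity weighting is what keeps one divergent subject adapter from dragging down the rest.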
CS @ Toronto | Built multi-objective energy grid optimizer with JIT+Ray achieving 10× speedup & 2 lead-author papers
CS @ UNM | Built COVID-19 simulator that reached 20k daily users in Bangladesh, shaping national policy
Problem: In March 2020, Bangladesh had no accessible COVID-19 modeling tools, leaving millions unable to understand transmission dynamics or policy impacts. Traditional academic models were locked behind paywalls or too technical for public consumption. Solution: I synthesized epidemiological research, adapted SIR/SEIR models, and built an interactive web-based simulator (https://alhridoy.github.io/bdcovid19/model.html) that let users explore intervention scenarios in real-time. I optimized for accessibility—ensuring it worked on low-bandwidth connections and mobile devices prevalent in Bangladesh. Result: 20,000 daily active visitors, coverage by national and international media, and it became a reference tool for policy discussions. More importantly, I demonstrated the ability to rapidly enter an unfamiliar domain (epidemiology), identify what excellence looks like, synthesize complex information, and deliver a product that reached exactly the right audience through the right channels.
CS @ Auburn | Published vision LLM research cited by OpenAI, Anthropic & DeepMind engineers
To better understand why popular vision-based LLMs kept failing in my projects, I designed a set of experiments that resulted in a published paper that has been cited by researchers from OpenAI, Anthropic, and DeepMind (https://vlmsareblind.github.io/). I enjoy getting into the weeds of hard technical problems to understand why systems break. Please don't hesitate to reach out! I would love to talk.
CS @ University of Indonesia | Won Kaggle silver by precomputing math functions to free RAM for deeper chess AI search
During the FIDE & Google Efficient Chess AI Kaggle Challenge, we got Stockfish Classic running under the constraints (5 MiB RAM, 64 KiB compressed size, single CPU). But we were stuck. We needed to free up more RAM so the engine could search deeper and play better. I found that a C math library was eating up a lot of memory at runtime. I realized we had space left in our 64 KiB binary size, but RAM was the real problem if we wanted the engine to play better. My solution was to precompute the math functions that were actually used and hardcode them into the binary instead of loading the library at runtime. This moved memory usage from RAM to storage. This freed up enough RAM for deeper search and helped us win a silver medal and rank 18th out of 1,127 teams.
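The precompute-and-hardcode trick can be sketched in Python: sample the needed function over its used range, then emit the values as a C initializer to bake into the binary, spending binary-size budget instead of runtime RAM. Function choice and table size here are illustrative.

```python
import math

def make_table(fn, lo, hi, steps):
    """Precompute fn over [lo, hi] so the runtime does a table lookup
    (plus optional interpolation) instead of calling libm."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [fn(x) for x in xs]

def emit_c_array(name, values):
    """Render the table as a C static array: the data lives in the
    binary (storage), not in heap-allocated RAM at runtime."""
    body = ", ".join(f"{v:.6f}f" for v in values)
    return f"static const float {name}[{len(values)}] = {{{body}}};"

table = make_table(math.log, 1.0, 100.0, 256)
c_code = emit_c_array("log_table", table)
```

The generated snippet gets compiled into the engine, so the C math library never needs to be loaded at runtime.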
Tomsk State University AI/ML engineer | Top 30 globally deploying INT8-quantized ensemble under 20MB for real-time market forecasting
Most difficult technical problem: The most difficult technical problem I faced was during the Wunder Fund RNN Challenge, where I had to deploy a machine learning model for high-frequency market state forecasting under extreme system constraints: single-core CPU execution and a strict <20MB memory limit, while still maintaining competitive predictive performance. How I solved it: Instead of relying on a single large model, I redesigned the solution from a systems perspective. I engineered a custom INT8 dynamic quantization pipeline in PyTorch, which reduced the model memory footprint by around 75%. This allowed me to deploy a 10-model ensemble within the resource budget. I also optimized the architecture itself by using lightweight GRU and LSTM variants, residual connections, and efficient activations to balance latency and accuracy. To avoid overfitting on small and noisy financial datasets, I implemented rigorous cross-validation and custom loss functions. As a result, the system achieved stable inference performance and earned a Top 30 global ranking in the competition.
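The INT8 dynamic quantization step maps directly onto PyTorch's built-in API. The model below is a toy stand-in for one ensemble member (the real architecture is not specified beyond GRU/LSTM variants), and sizes are measured by serializing the state dict.

```python
import io
import torch
import torch.nn as nn

class TinyForecaster(nn.Module):
    """Toy GRU head over market features, standing in for one ensemble member."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.head(out[:, -1])  # predict from the last time step

def serialized_mb(m):
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.tell() / 1e6

fp32 = TinyForecaster()
# Dynamic quantization: weights stored as INT8, activations quantized on the fly.
int8 = torch.quantization.quantize_dynamic(fp32, {nn.GRU, nn.Linear},
                                           dtype=torch.qint8)

before, after = serialized_mb(fp32), serialized_mb(int8)
y = int8(torch.randn(2, 10, 32))  # quantized model still runs on CPU
```

With weights going from 4 bytes to 1 byte each, the footprint reduction is roughly in line with the ~75% figure above, which is what makes a 10-model ensemble fit under 20MB.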
NYU AI researcher | Published VISTA-CLIP at CVPR 2025 for continual segmentation, built $3.5M health startup
I developed VISTA-CLIP, a framework published at CVPR 2025 for continual panoptic segmentation that mitigates catastrophic forgetting without expanding the model backbone. By injecting semantic priors from a frozen CLIP text encoder into the transformer decoder and utilizing visual prompt tuning, the model adapts to novel classes while strictly preserving base knowledge. This work was recognized by top deep tech product companies like Qualcomm's 3D vision team and Waymo for autonomous driving systems, and demonstrates that language-grounded priors are critical for building scalable, lifelong learning systems that can adapt to dynamic environments without retraining from scratch. I have also built a startup previously valued at $3.5M, with a couple of US and India patents in the healthcare domain using on-edge AI computer vision models for patient rehabilitation and recovery in orthopedic and cardiac surgeries.
UC Berkeley researcher who cracked register decompilation by deriving novel recursive formulas for n-bit circuits
During a research project with UCSB, a thorough literature review had led me to two potential research objectives: decompiling finite state machines or decompiling memory elements. State machines were well-studied, familiar to me, and more tractable overall than memory elements, but the latter seemed more interesting, and I impulsively decided to pursue it after discussion with my mentor. It seemed like leveraging some existing work in equality saturation and condensing netlist subgraphs could be a good starting point. But after days of careful analysis, diligently pursuing each lead, we discovered what I least expected to find: Nothing. An absolute impasse. I felt lost and unprepared, left completely to my own devices. Remembering what had appealed to me about this topic, I hunkered down and dug deeper. I went back to a tangential source—my mentor's most recent paper, which had little to do with registers and memory blocks. I really liked the section on a formal mathematical statement of the problem, so I tried to mathematically characterize my research question. I latched onto a recursive sequence representation of registers in a counter circuit, using Boolean operators instead of algebraic ones. Scribbling away, I came up with the general recursive formula for an n-bit counter circuit. But wait a minute… The existence of this general formula—doesn't it mean that every register bit has a connection to all the less significant bits belonging to the same register? This mathematical idea, which I frantically sent to my mentor, paved the way for a successful discovery. An obsessive curiosity coupled with patience and calculated risk-taking can pay off!
BITS Pilani full-stack & AI/ML engineer | Cut LLM latency 6x to 250ms with KV-cache reuse & speculative decoding
The hardest technical problem I faced was reducing end-to-end latency for GLM-4.7-Flash (a 30B open-source LLM) to feel instantaneous in a real-time UI, similar to on-the-fly interface generation demos. The main challenge was that raw model inference time was only part of the delay; token streaming, attention memory bandwidth, and scheduling overhead dominated latency at small time scales. I profiled the entire inference path and applied a combination of aggressive KV-cache reuse, FlashAttention-based kernels, continuous batching with prefill/decode separation, and speculative decoding using a smaller draft model. Although I couldn't reach the ~100 ms target, I reduced the average latency for the first 1,000 tokens to ~250 ms, a 6-fold improvement and close to the perceptual threshold. This optimization work reflects my deep understanding of LLM inference architecture.
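The draft-then-verify loop behind speculative decoding can be sketched with toy deterministic "models" (plain functions standing in for forward passes). This is the greedy variant; the key property it demonstrates is that the output is identical to plain greedy decoding with the target model, only reached in fewer target steps when the draft agrees.

```python
def speculative_decode(target, draft, prompt, max_new=8, k=4):
    """Greedy speculative decoding sketch: the cheap draft proposes k
    tokens, the target verifies them, and generation falls back to the
    target's token at the first disagreement."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        base = len(seq)
        proposal = list(seq)
        for _ in range(k):
            proposal.append(draft(proposal))  # draft runs autoregressively
        accepted = 0
        for i in range(k):
            # In a real system the target checks all k positions in one
            # forward pass; here each call simulates one position.
            if target(proposal[: base + i]) == proposal[base + i]:
                accepted += 1
            else:
                break
        seq = proposal[: base + accepted]
        seq.append(target(seq))  # the target always contributes one token
    return seq[len(prompt): len(prompt) + max_new]
```

When the draft matches the target often, each loop iteration emits up to k + 1 tokens for one "full-model" pass, which is where the latency win comes from.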
VIT Vellore | Cut 30B LLM latency 6x to 250ms with speculative decoding & KV-cache optimization
CS @ Boston University | Built full games platform for Daily Free Press serving thousands with 2FA, leaderboards & live persistence
The most difficult technical problem I have faced during the past two semesters has been developing a full game platform for the Daily Free Press (DFP) at Boston University to replace their outdated puzzle distribution system. The DFP is an independent student newspaper dedicated to informing and connecting the BU community, and their puzzle offerings historically served as a fun complement to their reporting. However, their existing "platform" consisted only of crossword puzzles shared through one-time links, created and emailed out individually. Engagement was extremely low, not only because the URLs changed every time, but also because there was no centralized hub where students could access past puzzles or discover new ones. This also prevented the DFP from using puzzles as a way to drive more readers to their news content, a strategic opportunity they were missing. Wanting to apply my software engineering skills to a real organization while creating something meaningful for the BU community, I reached out to the DFP's executive board and offered to build a complete, modern games platform. The site would enable front-end puzzle creation, deletion, and publishing; authenticated and anonymous puzzle solving; persistent gameplay; leaderboards; BU-affiliated two-factor authentication; and intentional linking pathways back to DFP news articles. They immediately saw how this could centralize their games, improve workflow, increase student engagement, and strengthen visibility for their reporting. Beyond building a software product, my goal was to cultivate community, giving BU students a place to compete with friends, share puzzle solutions, and stay connected to the DFP's journalism. To build the platform, I used Python's Django web framework to structure the application according to the MVC pattern. Django's robust ORM and built-in security features made it ideal for handling authentication, database transactions, and administrative tools. 
On the front end, I used HTML, CSS, JavaScript, and eventually TypeScript to implement the puzzle interfaces, creation tools, and interactive features. The back-end data was stored in SQLite during development and PostgreSQL in production to handle the write-heavy nature of puzzle interactions. The platform is deployed through PythonAnywhere, which integrates well with Django and allows scalable access for BU students. While the final architecture looks cohesive, developing it required several major refactors. One of the most significant decisions was transitioning major portions of the front-end logic from JavaScript to TypeScript. As features grew more complex, particularly the crossword interaction system and the on-screen keyboard, I found that TypeScript's static typing and class-based structure allowed me to reduce redundancy and create cleaner abstractions. For example, the core crossword logic—consisting of cell objects, clue synchronization, entry validation, navigation, and keyboard events—needed to behave consistently across anonymous solvers, authenticated users, and administrators. By building these features in TypeScript, I created maintainable classes and interfaces that could be reused across different puzzle modes, greatly improving scalability. Another major challenge involved the system's performance under frequent writes. Every time a user typed a letter into a crossword cell, that keystroke had to be saved immediately to ensure persistence on refresh or device change. With SQLite, these writes caused noticeable lag. Migrating to PostgreSQL, which is optimized for concurrent transactions and heavy data writes, immediately solved the problem and made gameplay feel smooth and responsive. This decision reinforced the importance of choosing technologies based not only on simplicity, but on the behavioral patterns of actual users. Security was also a significant design concern, especially for the leaderboard. 
The DFP wanted competition, but only among verified BU students. I explored multiple authentication approaches, beginning with BU's Shibboleth Duo integration, consulting BU IT, reviewing OAuth-like workflows, and ultimately implementing a two-factor authentication system linked to BU email addresses. This solution ensured that only legitimate BU students could participate in leaderboard features while maintaining usability. I also added anti-cheating mechanisms, including server-side verification of completion times, encrypted frontend solution data, and monitoring of input patterns for suspicious solving behavior, to preserve fairness. Beyond the engineering itself, I was responsible for designing a cohesive user experience across more than a dozen pages: the landing page, puzzle creation interface, puzzle play pages, previews, admin dashboards, authentication workflows, and leaderboards. A major goal was to support the DFP's broader mission of building community and increasing visibility for their news content. To that end, I added smart linking features that direct players from puzzles and leaderboards to the DFP news website, encouraging exploration of campus news and creating a feedback loop between casual puzzle players and the newspaper's reporting. As engagement grows, these puzzles become a playful gateway to the DFP's journalism, helping the organization reach a broader audience.
Penn State AI/ML engineer who shipped a production lead-gen system blending LLMs with guardrails for non-technical users
The hardest technical problem I faced was building a production lead-generation system during my first internship that non-technical users could interact with naturally, using tools I had never used before. Initially, the system was very technical: SQL-heavy, rigid filters, and brittle logic. It technically worked, but users had to think like engineers to get value out of it, which defeated the purpose. Around the same time, Snowflake released Cortex, and I saw an opportunity to let users describe what they wanted in natural language instead of navigating complex queries. The challenge was that I had no prior experience integrating LLMs into production systems, only tools used at a small scale like in classrooms, and early results were unreliable. Natural language queries were ambiguous, outputs were inconsistent, and in some cases the model confidently returned bad leads. To solve this, I treated the LLM as an assistant, not a source of truth. I constrained Cortex behind structured prompts, added deterministic filters, and built lightweight evaluation checks to catch obvious failures. I compared LLM-generated leads against rule-based baselines, manually reviewed edge cases, and iterated on prompts based on where the model felt wrong to users rather than just where it was technically incorrect. The result was a hybrid system: users could describe leads in plain English, but the backend enforced consistency, interpretability, and guardrails. That balance made the tool both powerful and trustworthy, and it significantly lowered the barrier for non-technical teammates to use it effectively.
CS @ Oklahoma Christian | Debugged 3D segmentation pipeline & containerized MLOps to hit 0.235 on Vesuvius leaderboard
While competing in the Vesuvius Surface Detection challenge, I faced repeated submission failures due to unstable 3D segmentation pipelines. The 3D TIFF data compression caused dependency conflicts in the inference environment, and I discovered critical bugs in my Test-Time Augmentation (TTA) logic where tensor strides were becoming negative during rotation, causing silent failures in PyTorch. I engineered a standardized, containerized inference workflow to match my local environment with the competition runner. I converted my PyTorch checkpoints to TorchScript to resolve the stride errors and optimize runtime. I also implemented a deterministic submission system with strict shape and type validation to catch artifacts before upload. This stabilized my pipeline, moving me from constant timeout errors to consistent, valid submissions. It allowed me to automate multi-GPU training experiments on my university's OSCER cluster via Slurm, establishing a reliable experimentation loop and achieving a verified public leaderboard score of 0.235.
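The negative-stride failure mode is easy to reproduce at the numpy/torch boundary: a rotation for test-time augmentation returns a view with a negative stride, which `torch.from_numpy` rejects. The project's actual failure sat inside the TTA path and was fixed via TorchScript; this sketch shows the same class of error and the contiguous-copy fix.

```python
import numpy as np
import torch

vol = np.arange(16, dtype=np.float32).reshape(4, 4)
rotated = np.rot90(vol)  # a view whose stride along one axis is negative

# torch.from_numpy rejects negative strides; if this error is swallowed
# upstream, the augmented branch of TTA fails silently.
try:
    torch.from_numpy(rotated)
    bridged = True
except ValueError:
    bridged = False

# Fix: materialize a contiguous copy before handing the array to torch.
fixed = torch.from_numpy(np.ascontiguousarray(rotated))
```

Shape and type validation before submission, as described above, is what turns this from a silent wrong answer into a loud pre-upload failure.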
Columbia AI researcher scaling RL in Minecraft to study emergent survival & long-horizon learning
Built a large-scale reinforcement learning system to explore whether complex survival behaviors can emerge from carefully shaped rewards in an open-world setting like Minecraft. Motivated by the challenge of training long-horizon agents in environments with sparse feedback, I reimplemented and extended the Phasic Policy Gradient (PPG) algorithm from OpenAI's Video PreTraining work to fine-tune foundation Minecraft models toward exploration and survival objectives. I engineered a multi-process, multi-threaded training pipeline with parallel environment orchestration, asynchronous rollout buffering, and a dedicated optimization thread that alternated PPO "wake" updates with PPG auxiliary phases for improved stability. To overcome catastrophic forgetting and retain prior competencies, I integrated KL-regularized policy updates while leveraging transfer learning from a diamond-pickaxe policy to accelerate adaptation. This framework produced agents that autonomously discovered new biomes and maintained basic survival strategies, yet it exposed a critical limitation: reinforcement learning, even on top of a pretrained model, could not reliably drive the discovery of entirely new, combinatorially rich mechanics or very long action sequences when the problem space is vast and rewards are extremely sparse, and the sheer compute required to meaningfully explore that space quickly became the limiting factor. Conducting this experiment was a great way for me to learn valuable engineering and research skills, from building scalable distributed training systems to designing reward functions that balance exploration and stability.
AI/ML researcher at CUNY building scalable pipelines that preserve experimental rigor
I independently refactored and abstracted a one-off multimodal data generation pipeline into a scalable system without breaking experimental validity. The hardest part wasn't performance but correctness: introducing abstractions (geometry, camera placement, difficulty axes) without changing the underlying data distribution used in earlier pilot human and model evaluations. I solved this by formalizing intermediate geometric computations, snapshotting configs, and building verification passes before scaling to new data generation. I was able to preserve model evaluation results within ~1% of the original pilot setup with the new pipeline.
Talladega College AI/ML engineer debugging medical imaging models in San Francisco
The most significant technical hurdle I encountered was during a deep learning research initiative where I was implementing a 3x3 factorial experiment to evaluate 3D medical image segmentation architectures, specifically comparing 3D U-Net, UNETR, and SegResNet on the BraTS and MSD Liver datasets. The critical failure occurred when the models consistently returned zero Dice scores during validation, effectively halting progress. The root cause was a subtle and persistent tensor dimension mismatch within the MONAI library's data transformation pipeline, which was difficult to trace because it didn't throw explicit runtime errors. I solved this by methodically debugging the entire data loading sequence, inspecting tensor shapes at each transformation step to pinpoint exactly where the spatial dimensions were being collapsed. Once I identified the mismatch in the loader's output, I refactored the preprocessing code to enforce correct dimensionality, which immediately resolved the scoring issue and allowed me to successfully benchmark the transformer-based models against standard CNNs.
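The shape-tracing technique that localized the collapsed dimensions is framework-agnostic; the sketch below uses plain numpy callables rather than MONAI transforms, and `bad_squeeze` is a hypothetical buggy step.

```python
import numpy as np

def trace_shapes(transforms, sample):
    """Run a sample through each transform, recording the shape after
    every step -- the cheap trick that pinpoints where spatial
    dimensions get collapsed without any explicit error."""
    log = [("input", sample.shape)]
    for t in transforms:
        sample = t(sample)
        log.append((t.__name__, sample.shape))
    return sample, log

# Hypothetical pipeline where one step silently drops the channel axis.
def add_channel(x): return x[np.newaxis]
def bad_squeeze(x): return x.squeeze()  # removes the size-1 channel dim

vol = np.zeros((64, 64, 64), dtype=np.float32)
out, log = trace_shapes([add_channel, bad_squeeze], vol)
```

Reading the log side by side makes the offending step obvious, which is exactly the kind of evidence a zero-Dice-score failure refuses to give you on its own.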
UT Dallas AI/ML researcher building RL agents that learn to play games from pixels alone
The most difficult technical problem I faced was getting a reinforcement learning agent to interact with a real game environment where I didn't have access to the game's internal state. I was working on training an agent to play Geometry Dash, which meant I had to first extract meaningful state information directly from the screen in real time. This introduced major challenges around latency, accuracy, and stability, especially since I was running everything on a Mac without access to a GPU. I solved this by redesigning the system end-to-end: using YOLO to detect and classify objects, compressing those detections into structured state vectors, and offloading inference so it wouldn't block the game loop. I also introduced imitation learning before reinforcement learning so the agent had a stable starting policy. This experience taught me how to debug complex systems where model performance, infrastructure limits, and algorithm design all interact, and how small architectural decisions can completely determine whether a system is usable or not.
Shiv Nadar University full-stack dev building intelligent code navigation with graph algorithms and RAG
The most difficult technical problem I faced was making ChatGIT reliable on large, messy GitHub repos. On paper it was simple—parse code into ASTs, build embeddings, run PageRank on a call graph, and use RAG to answer questions. In reality, two things broke everything: parsing produced a noisy/incomplete graph, and naive PageRank made unimportant "utility" files look critical, so retrieval fed the LLM the wrong context. To fix this, I first hardened the parsing layer: Python used the ast module, other languages went through a dedicated parser that normalized paths, ignored build/test artifacts, and surfaced parse errors instead of silently skipping files. Then I redesigned the graph and PageRank: separate graphs for files/functions/modules, weighted edges (cross-module calls > internal ones, tests down-weighted), and sanity checks against known repos to see if "core" modules ranked correctly. Finally, I combined semantic similarity with normalized PageRank in the RAG layer and grouped snippets by file with paths and line numbers in the prompt. After these iterations, ChatGIT started pointing to the right files and functions with accurate locations, and the "Top Files/Functions" view matched what human maintainers considered important. I later reused the same idea of combining semantic relevance + structural importance when building JARVIS's meeting-prep pipeline for financial advisors.
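The PageRank-plus-semantics ranking can be sketched with a small power iteration (self-contained rather than networkx) and a blended score; the alpha weight and dangling-row handling are illustrative choices, not ChatGIT's tuned values.

```python
import numpy as np

def pagerank(adj, d=0.85, iters=50):
    """Power-iteration PageRank over a weighted adjacency matrix
    (rows = caller, cols = callee); dangling rows become uniform."""
    n = adj.shape[0]
    rowsum = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, rowsum, out=np.full_like(adj, 1.0 / n),
                      where=rowsum > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (trans.T @ r)
    return r / r.sum()

def retrieval_score(semantic, rank, alpha=0.7):
    """Blend semantic similarity with min-max normalized PageRank;
    alpha = 0.7 is a placeholder weight, not a tuned value."""
    span = rank.max() - rank.min()
    rank = (rank - rank.min()) / (span if span else 1.0)
    return alpha * np.asarray(semantic) + (1 - alpha) * rank
```

Down-weighting test edges and boosting cross-module calls, as described above, would show up here as smaller or larger entries in `adj` before PageRank runs.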
UC San Diego CS | Built browser-based coding platform with shared filesystem using WebContainers
The hardest technical problem was giving my AI coding tutorial platform a real, interactive terminal backed by a shared, writable filesystem, so that users could run commands and see the same files the editor and tutorials used without anything getting out of sync. I first built, from scratch, both the PTY layer (node-pty with WebSockets for terminal I/O) and custom logic to keep the terminal's view of the filesystem in sync with the editor and file explorer. We hit scaling limits, security concerns, and a lot of operational complexity keeping terminal cwd, file events, and persistence all aligned. We pivoted to WebContainers so the whole runtime (filesystem & shell) simply lives in the browser. I designed a small WebContainer service that boots once and is shared by the terminal and the file layer. The program spawns shells with proper TTY dimensions and pipes them to xterm.js, and the same WebContainer instance provides the in-browser filesystem that the terminal, editor, and dev server preview all share. That served as the single source of truth for the workspace on the client and removed server-side terminal handling. The backend now focuses on persistence and sync instead of running terminals and serving live file state.
Backend engineer at Georgia Tech building production-scale search systems with hybrid AI retrieval
One of the hardest problems I faced was building an open-text hotel search system at Flipkart that had to interpret natural language queries under strict latency and reliability constraints. Early embedding-only approaches had good relevance but were unstable at production scale. I solved this in three steps. First, I profiled the pipeline and redesigned the index to reduce the candidate set earlier. Second, I introduced a hybrid lexical + vector retrieval approach with lightweight re-ranking to balance relevance and efficiency. Third, I added caching, fallbacks, and timeout handling to keep the system reliable during traffic spikes. This made the system stable enough for production and allowed it to handle millions of queries daily.
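A toy version of the hybrid lexical + vector idea, with plain term overlap standing in for BM25 and a tiny cosine re-rank; all names, weights, and vectors are illustrative assumptions, not Flipkart's production code.

```python
# Illustrative hybrid retrieval sketch: lexical score (term overlap as a BM25
# stand-in) blended with vector similarity, then a lightweight top-k re-rank.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_terms, query_vec, docs, k=2, w_lex=0.4, w_vec=0.6):
    scored = []
    for d in docs:
        lex = len(query_terms & d["terms"]) / len(query_terms)
        vec = cosine(query_vec, d["vec"])
        scored.append((w_lex * lex + w_vec * vec, d["id"]))
    # Lightweight re-rank: keep only the top-k candidates for downstream use.
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

hits = hybrid_search(
    {"beach", "hotel"}, [1.0, 0.0],
    [{"id": "h1", "terms": {"beach", "hotel"}, "vec": [0.9, 0.1]},
     {"id": "h2", "terms": {"city"}, "vec": [0.0, 1.0]}],
)
```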
CMU AI/ML engineer who slashed telco data pipeline from 4 days to 16 hours, unlocking $11M mobility analytics revenue
At Telkomsel (170M subscribers), our Mobility Data Pipeline processed terabytes of CDR data but took 4 days per run, making it viable only as a yearly product — a huge bottleneck for enterprise clients who needed monthly mobility insights. The root cause wasn't obvious. I profiled the pipeline on our Hadoop cluster and found three compounding issues: redundant full-table scans on a 2TB+ dataset, inefficient join operations that caused massive data shuffling across nodes, and poor partitioning that ignored actual query patterns. I led a team of 2 data engineers through a full re-architecture using PySpark on YARN. We replaced broad joins with broadcast joins for dimension tables under 100MB, redesigned partition keys to align with downstream access patterns, migrated repetitive transformations to Pandas UDF for vectorized execution, and switched from full reprocessing to incremental loads. The trickiest part was validating output consistency — telco mobility data has subtle edge cases with roaming subscribers and cell tower handoffs that could silently corrupt aggregations. Result: processing time dropped from 4 days to 16 hours (85% reduction), compute costs fell ~60%, and we upgraded the product from yearly to monthly delivery. This directly enabled our data monetization unit to sell mobility analytics to government and enterprise clients, contributing to $11M in annual revenue.
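The broadcast-join idea can be shown in plain Python (the real pipeline used PySpark on YARN): the small dimension table becomes an in-memory map shipped to every worker, so the large CDR fact table is never shuffled across nodes. Table contents here are invented for illustration.

```python
# Sketch of the broadcast-join idea in plain Python (production used PySpark).
# A small dimension table (< 100 MB) is materialized as a dict and "broadcast"
# to workers, so the multi-terabyte CDR fact table avoids a cluster shuffle.

cell_towers = {"T1": "Jakarta", "T2": "Bandung"}             # dimension table
cdr_rows = [("sub1", "T1"), ("sub2", "T2"), ("sub3", "T1")]  # fact table slice

def broadcast_join(rows, dim):
    # Each worker does a local hash lookup instead of shuffling the fact table.
    return [(sub, tower, dim.get(tower, "unknown")) for sub, tower in rows]

joined = broadcast_join(cdr_rows, cell_towers)
```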
AI/ML engineer at IIT building low-latency real-time speech systems through first-principles thinking
I built and optimized a real-time multi-speaker diarization and ASR inference pipeline under strict latency constraints. The core challenge was not model accuracy but system behavior under real-time load. GPU inference, audio chunking, and decoding competed for resources. Naive batching increased end-to-end latency, while naive parallelism caused speaker boundary errors and temporal inconsistency in diarization. I reduced the problem to first principles:
- Separated IO-bound audio ingestion from compute-bound GPU inference
- Profiled GPU utilization, kernel launch overhead, and memory transfers
- Redesigned the pipeline as asynchronous stages with bounded queues
- Enforced synchronization only where temporal correctness was mathematically required
This resulted in a stable, low-latency inference system that maintained diarization consistency while scaling efficiently on cloud GPUs. The key insight was that real-time ML failures are almost always architectural, not model-level.
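The bounded-queue staging idea can be sketched minimally as below; the queue size, sentinel protocol, and stand-in "inference" step are illustrative assumptions, not the production pipeline.

```python
# Minimal sketch of the staged design: IO-bound ingestion and compute-bound
# inference run as separate stages joined by a bounded queue, so a slow stage
# applies backpressure instead of growing memory without bound.
import queue
import threading

audio_q = queue.Queue(maxsize=4)  # bounded: ingestion blocks when inference lags
results = []

def ingest(chunks):
    for c in chunks:
        audio_q.put(c)            # blocks when the queue is full (backpressure)
    audio_q.put(None)             # sentinel: end of stream

def infer():
    while (chunk := audio_q.get()) is not None:
        results.append(f"transcript:{chunk}")  # stand-in for GPU inference

t1 = threading.Thread(target=ingest, args=(["a0", "a1", "a2"],))
t2 = threading.Thread(target=infer)
t1.start(); t2.start(); t1.join(); t2.join()
```

Because the consumer is the only writer to `results` and chunks flow through a FIFO queue, temporal order is preserved without any extra synchronization, which is the "synchronize only where required" point above.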
JIIT Noida | Building graph-constrained AI reasoning systems that don't hallucinate in production
One of the hardest problems I faced was designing a multi-agent scientific reasoning system that produced traceable outputs rather than hallucinated summaries. Early versions of my project (SciNets) generated plausible hypotheses but lacked structural grounding and reproducibility. I solved this by redesigning the architecture around graph-constrained reasoning. I built concept graphs from literature, enforced structured causal chains, and added evaluation metrics like grounding stability and symbolic depth to monitor reasoning collapse. I also addressed production issues including cross-user state leakage and auth failures under load by restructuring session isolation and streaming pipelines. The result was a deployed system capable of generating inspectable hypotheses, mechanistic chains, and experiment suggestions rather than opaque text summaries.
Virginia Tech AI researcher building reliable multi-agent systems that reason over scientific evidence without hallucinating
The hardest technical problem I've worked on was designing an agentic ML system that could reason reliably over noisy, partially conflicting scientific evidence, rather than just generating fluent outputs. In practice, this meant building a multi-agent pipeline where different agents handled retrieval, evidence grounding, verification, and uncertainty estimation for biomedical questions. Early versions hallucinated or over-trusted weak evidence. I fixed this by introducing explicit evidence anchoring, quote-level verification, selective prediction (abstain when uncertain), and structured intermediate representations shared across agents. I iteratively stress-tested the system on adversarial cases, added failure-mode logging, and enforced constraints (e.g., every claim must be traceable to a source). The result was a more reliable agent that knew when not to answer, which mattered more than raw accuracy. This taught me that building agentic systems is less about clever prompts and more about interfaces, contracts between agents, and evaluation of reasoning behavior.
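A toy version of the selective-prediction gate: abstain when confidence is low, and reject any answer without a traceable source. The threshold, tuple shapes, and example values are my own illustrative choices, not the system's actual interface.

```python
# Toy selective-prediction gate: the agent abstains when confidence in the
# best-supported answer falls below a threshold, and any claim without a
# traceable source is rejected outright. All values here are illustrative.

def answer_or_abstain(candidates, tau=0.75):
    """candidates: list of (answer, confidence, sources) tuples."""
    grounded = [c for c in candidates if c[2]]  # enforce source traceability
    if not grounded:
        return "ABSTAIN"
    best = max(grounded, key=lambda c: c[1])
    return best[0] if best[1] >= tau else "ABSTAIN"

out1 = answer_or_abstain([("Drug X inhibits Y", 0.9, ["PMID:123"])])
out2 = answer_or_abstain([("Weak claim", 0.5, ["PMID:456"])])
out3 = answer_or_abstain([("No sources", 0.99, [])])
```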
AI/ML researcher at Edinburgh engineering stable recursive architectures for uncertainty quantification
While working on recursive models for uncertainty quantification, I discovered a stability problem: naive training produced noisy, divergent learning curves that never converged to optimal performance (documented in my ICLR 2026 paper). To diagnose and solve this problem, I built a custom training framework from scratch with comprehensive logging of activations, gradients, weights, predictions, and metrics at each recursive depth. I then conducted extensive ablation studies across key hyperparameters: recursive block depth, total recursive depth, and truncated backpropagation windows. Through this, I identified a stable parameter window that enabled consistent convergence. The result was a general framework for converting standard architectures into recursive variants that achieve better performance, gain uncertainty quantification capabilities, and use approximately 50% fewer parameters. This work was accepted at ICLR 2026.
UCLA AI/ML engineer building scalable video processing systems for real-time sports analytics
In ShotVision, processing full tennis match videos (30+ minutes at 60fps) would take hours and crash the server due to memory constraints. I needed frame-by-frame pose estimation on 100,000+ frames while keeping the web app responsive. I implemented asynchronous video processing with a Flask worker queue system. Videos were chunked into 10-second segments, processed in parallel, then stitched back together. I added server-side caching for keypoint data and implemented a progress tracking system so users could see real-time updates. This reduced processing time from 4 hours to 15 minutes for a 30-minute video while keeping memory usage under 2GB.
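The chunk-and-stitch scheme can be sketched as follows; `pose_estimate` is a stand-in for the real pose model, and the worker count is illustrative.

```python
# Sketch of the chunk-and-stitch approach: a long video is split into fixed
# 10-second segments, segments are processed by a worker pool, and per-frame
# keypoints are stitched back in order. `pose_estimate` is a model stand-in.
from concurrent.futures import ThreadPoolExecutor

FPS, CHUNK_SECONDS = 60, 10

def chunk_ranges(total_frames):
    step = FPS * CHUNK_SECONDS
    return [(s, min(s + step, total_frames)) for s in range(0, total_frames, step)]

def pose_estimate(frame_range):
    start, end = frame_range
    return [f"kp{f}" for f in range(start, end)]    # fake per-frame keypoints

def process_video(total_frames, workers=4):
    ranges = chunk_ranges(total_frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(pose_estimate, ranges))  # map preserves order
    return [kp for chunk in chunks for kp in chunk]     # stitch in order

keypoints = process_video(1800)  # 30 seconds of 60 fps video
```

Because `Executor.map` yields results in submission order, stitching is a flat concatenation and no segment bookkeeping is needed; memory stays bounded by the chunk size rather than the full video.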
ETS Montreal researcher engineering cost-efficient synthetic datasets for visual reasoning at scale
The most difficult technical problem I faced was automating a pipeline to generate a large-scale synthetic dataset for image-based reasoning under a strict budget constraint (≈$100). The objective was to produce thousands of images paired with reasoning-intensive questions, accurate answers, and reliable Chain-of-Thought annotations. To reduce costs, we avoided direct image generation and instead generated LaTeX and Python code, which we then compiled into images. This drastically lowered expenses, as text generation is significantly cheaper than image synthesis. We also allocated API credits strategically across different models, proportionally to their likelihood of producing non-compiling code, which helped maintain both cost efficiency and dataset quality. The full implementation is available here: https://github.com/AI-4-Everyone/Visual-TableQA-v2
WPI | AI/ML engineer building FDA-compliant test automation with VLMs and RAG
Full-stack dev building cross-platform jewelry management system | Solved concurrency nightmares and data integrity puzzles at scale
I'm building a Jewelry Management System from scratch, a fully cross-platform application using Flutter, Go, and SQL Server. The two most difficult challenges I faced were the Trial Balance TCP connection crisis and the Stone Stock Report balance discrepancy. The Trial Balance was a concurrency nightmare as my Go backend was spawning 80+ simultaneous database connections through worker goroutines, causing TCP timeouts on the remote SQL Server. After multiple iterations with semaphores and retry logic, I made the pragmatic call to strip out concurrency entirely, then tackled a SQL parameter limit issue on top of that. The Stock Report had a 0.55 carat difference between one day's closing and the next day's opening. I had to trace data flow across 7+ tables, fix sign convention inconsistencies, eliminate double counting from approval to purchase conversions, and completely rewrite the stored procedure's CTE structure.
UF researcher engineering smarter AI inference—prunes chain-of-thought sampling to cut cost without losing accuracy
For my NeurIPS paper, the hardest problem I solved was making Best of N chain-of-thought sampling much cheaper without losing most of the accuracy gains. The challenge was that early pruning is irreversible, and the signals you can observe during decoding (KL divergence, entropy, confidence) are noisy and can spike for reasons unrelated to correctness. I designed a progressive branch-and-prune algorithm that scores branches at each step, stabilizes the signals with windowed aggregation and smoothing, and prunes on a controlled schedule that preserves diversity early and commits later. I validated it on math reasoning benchmarks and measured both accuracy and compute savings.
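A toy rendition of the branch-and-prune loop: per-step signals are smoothed over a short window, and the number of surviving branches shrinks on a schedule (diverse early, committed late). The window size, schedule, and scores are illustrative, not the paper's actual hyperparameters.

```python
# Toy progressive branch-and-prune: noisy per-step signals are stabilized with
# a sliding-window average, and pruning follows a schedule that keeps many
# branches early and commits to one late. All numbers are illustrative.
from collections import deque

def smooth(window):
    return sum(window) / len(window)

def branch_and_prune(branch_signals, schedule):
    """branch_signals: {branch_id: [per-step scores]}.
    schedule: max branches allowed after each step."""
    alive = set(branch_signals)
    windows = {b: deque(maxlen=3) for b in branch_signals}
    steps = len(next(iter(branch_signals.values())))
    for t in range(steps):
        for b in alive:
            windows[b].append(branch_signals[b][t])
        keep = schedule[min(t, len(schedule) - 1)]
        # Drop the lowest smoothed scores once over the allowed branch count.
        alive = set(sorted(alive, key=lambda b: smooth(windows[b]),
                           reverse=True)[:keep])
    return alive

survivors = branch_and_prune(
    {"b1": [0.9, 0.8, 0.9], "b2": [0.2, 0.9, 0.1], "b3": [0.5, 0.4, 0.3]},
    schedule=[3, 2, 1],
)
```

Note how the windowed average protects `b2` from being pruned on its single spiky step: decisions are made on the smoothed signal, not the raw one.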
AI/ML engineer at DTU | Built real-time perception systems for DARPA disaster response challenge
One of the most difficult technical problems I faced was during the DARPA Triage Challenge, where I led the perception pipeline for our autonomous system. We had to run multiple deep learning models (person detection, re-identification, tracking, and decision modules) in real-time on limited GPU hardware, while maintaining robustness in unpredictable disaster environments. The main challenge was scheduling and optimizing these models so they could operate concurrently without exceeding memory or latency constraints. Initially, naive parallel execution caused GPU memory spikes, unstable frame rates, and bottlenecks in downstream decision modules. To solve this, I redesigned the pipeline as a staged, asynchronous system. I prioritized critical inference paths, decoupled modules using message queues (ZeroMQ), and implemented dynamic batching and conditional execution (e.g., running heavier models only when required). I also optimized models using mixed precision, ONNX/TensorRT acceleration, and careful memory management to reduce redundant tensor allocations. This restructuring reduced latency significantly, stabilized GPU usage, and allowed the full perception–decision loop to run reliably under strict hardware constraints. It taught me how to think in terms of systems optimization, not just model accuracy.
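The conditional-execution pattern (running heavier models only when required) reduces to a simple gate; the model calls below are stand-ins for the real detectors, and the data shapes are invented.

```python
# Sketch of conditional execution: a cheap detector runs on every frame, and
# the expensive re-identification model runs only when a person is detected.
# The model calls are stand-ins; the gating pattern is the point.

def cheap_detector(frame):
    return frame["has_person"]       # fake lightweight detection result

def heavy_reid(frame):
    return f"reid:{frame['id']}"     # fake expensive model call

def perceive(frames):
    results, heavy_calls = [], 0
    for f in frames:
        if cheap_detector(f):        # gate the heavy model behind the cheap one
            results.append(heavy_reid(f))
            heavy_calls += 1
        else:
            results.append(None)
    return results, heavy_calls

out, calls = perceive([
    {"id": 0, "has_person": True},
    {"id": 1, "has_person": False},
    {"id": 2, "has_person": True},
])
```

On frames with nothing to re-identify, the heavy model never runs, which is what keeps GPU memory and frame rates stable under load.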
Cornell AI/ML researcher who builds software by watching how people actually work, not how engineers think they do
The hardest technical challenge I faced wasn't a single algorithm, but turning messy, real-world operations into a CRM that non-technical staff could use confidently without training. Instead of guessing from requirements, I went onsite and worked directly alongside the actual users, observing how they processed leads and cases, where they hesitated, what steps they repeated every day, and which tasks were truly automatable versus needing human judgment. That discovery work let us redesign the entire UX logic around their natural workflow rather than our assumptions. We rebuilt the experience with workflow-first navigation, progressive disclosure, and opinionated defaults across core flows like pipelines, queue/claim, and public application intake with approval gates. On the backend, we enforced invariants—status transition rules, approvals, and audit logging—so that the UI could stay simple without risking incorrect states, and we kept the full stack coherent through typed API contracts, consistent validation, and predictable loading/error patterns. The outcome was higher adoption and fewer "how do I do X?" moments, because the product matched how the team actually works day-to-day.
Full-stack engineer building ML tools for rare disease diagnosis | Turning messy medical data into life-saving predictions
The hardest problem in the MEN2 Predictor project was the data. MEN2 is rare. There is no clean dataset online. I had to read hundreds of research papers with my teammate and manually extract data for 152 confirmed RET mutation carriers. Every paper reported things differently. Units were inconsistent. Some values were missing. Some cases were incomplete. The toughest issue was missing CEA values. CEA is an important biomarker along with calcitonin for predicting medullary thyroid cancer. Many papers reported calcitonin but not CEA. I did not want to drop those patients because the dataset was already small. So I used MICE, Multiple Imputation by Chained Equations, with Predictive Mean Matching. I used calcitonin and other available clinical features to estimate realistic CEA values. PMM helped because it does not just predict a number from a formula. It picks values from similar real patients. That kept the data grounded and reduced unrealistic imputation. After cleaning and imputing, the next challenge was model behavior. In medical screening, recall matters more than accuracy. Missing a cancer case is worse than over-flagging someone. Some early models had good accuracy but lower recall. That was not acceptable. I tuned XGBoost and SVM to prioritize sensitivity. On the real clinical dataset, both reached 100 percent recall in hold-out testing. That meant zero missed documented cancer cases. The biggest lesson was simple. Think about the real-world cost of mistakes first. Then design the model around that.
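A hand-rolled sketch of predictive mean matching on one variable (the project used standard MICE/PMM tooling); it shows why PMM stays realistic: the imputed value is copied from a similar observed patient rather than read off a regression line. All data values below are invented.

```python
# Minimal predictive-mean-matching sketch. A regression predicts the missing
# CEA from calcitonin, but the imputed value is the *observed* CEA of the
# donor patient whose prediction is closest, keeping values realistic.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def pmm_impute(calcitonin, cea):
    obs = [(c, v) for c, v in zip(calcitonin, cea) if v is not None]
    slope, intercept = fit_line([c for c, _ in obs], [v for _, v in obs])
    pred_obs = [(slope * c + intercept, v) for c, v in obs]
    filled = []
    for c, v in zip(calcitonin, cea):
        if v is None:
            p = slope * c + intercept
            # Donor = observed patient whose *predicted* CEA is closest.
            v = min(pred_obs, key=lambda t: abs(t[0] - p))[1]
        filled.append(v)
    return filled

cea = pmm_impute([10, 20, 100, 12], [1.0, 2.0, 10.0, None])
```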
AI/ML Engineer at KSR College | Building multi-agent systems that turn subjective design critique into quantifiable, automated audits
The most difficult technical challenge I faced was building an AI agent that could autonomously audit subjective website design quality (evaluating whether CTAs are "effective," themes are "consistent," and layouts match a client's "vibe") with the reliability and reproducibility of traditional automated testing tools. The core problem was that design evaluation is inherently qualitative, yet I needed quantitative, defensible output. To solve this, I architected a multi-agent orchestration system with three key innovations. First, I combined BFS web crawling with stateful browser automation (Playwright + browser-use) to systematically discover and analyze pages viewport by viewport, simulating real user scrolling behavior while capturing screenshot evidence at each step. Second, I implemented a dual-model LLM pipeline: Gemini 2.5 Flash Lite extracts structured design intent from natural language (website_type, tone, audience, primary_goal), which then constrains a more powerful Gemini 2.0 Pro agent during live analysis to prevent hallucinations and ensure every finding aligns with the specified criteria. Third, I built a real-time event-stream architecture using Flask-SocketIO that intercepts raw agent logs, parses them into semantic events (thoughts, actions, results), and streams them to the React frontend, creating a transparent audit trail where users watch the AI "think" through each design decision. The result turns subjective design critique into a scored, screenshot-backed, PDF-exportable report that scales automated design QA in a way no existing tool does.
AI researcher at Manipal using representation projection to fight hallucinations in LLMs
The most difficult technical problem I faced recently was during my independent research on hallucinations in large language models and methods to mitigate them. My initial experiments involved using activation steering on a small Qwen-1.7B model to shift its behavior from a hallucinatory response space toward honest refusal. However, these attempts consistently failed. After investigation, I realized that smaller models may not contain a cleanly separable "hallucination subspace," making targeted steering unreliable. I then considered selectively removing neurons that consistently activated during hallucinated outputs. This approach also proved unsuitable because of neuron polysemanticity in LLMs; individual neurons encode multiple overlapping behaviors, so pruning them risked degrading unrelated capabilities. Then, after digging through several arXiv papers, I came across a fascinating implementation that attaches a projection matrix during the forward pass to selectively remove undesired directions in the hidden representation rather than deleting parameters. The method works by:
- Defining a retain set of behaviors and computing a projection matrix P over them.
- Applying PCA to obtain a basis W, followed by QR decomposition to produce an orthonormal matrix Q.
- During inference, subtracting the forbidden subspace from the hidden state: h_final = h_out − (h_out @ Q^T @ Q)
This effectively erases the targeted representation directions while preserving the rest of the model's knowledge. When combined with light fine-tuning, you end up with a model that has unlearned the nasty stuff. Since the projection operation is mathematically irreversible, MRP (Metamorphosis Representation Projection) does a pretty good job of shaping model capabilities.
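The subtraction step can be checked numerically. This numpy sketch builds an orthonormal basis Q for two forbidden directions via QR and verifies that the component of the hidden state in that subspace vanishes; dimensions and directions are toy choices, not the paper's setup.

```python
# Numeric sketch of the projection step: build an orthonormal basis Q for the
# unwanted directions (via QR), then subtract each hidden state's component in
# that subspace:  h_final = h_out - h_out @ Q.T @ Q
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # toy hidden size
W = rng.normal(size=(2, d))        # directions to forbid (e.g. top PCA axes)
Q, _ = np.linalg.qr(W.T)           # columns span the forbidden subspace
Q = Q.T                            # shape (2, d), rows orthonormal

h_out = rng.normal(size=(3, d))    # batch of hidden states
h_final = h_out - h_out @ Q.T @ Q  # erase the forbidden subspace
```

Applying the projection twice changes nothing (it is idempotent), and the erased component cannot be recovered from `h_final` alone, which is the sense in which the operation is irreversible.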
CS + AI/ML researcher at Obafemi Awolowo building systems from scratch—lossless compression suite in pure C++ with zero dependencies
I built a lossless compression benchmark suite from scratch in C++ using five algorithms, a CLI, a streaming API, and a full benchmark harness with zero external libraries. It's the project I'm most proud of because it forced me to go deep on bit-level data structures, algorithm tradeoffs, and systems-level performance work all in one codebase. I'd been working with C++ in my systems engineering role (order processing, memory management) and wanted to tackle something where the algorithms and the systems work were equally hard. Compression fit perfectly: the algorithms require real CS depth (entropy coding, dictionary methods, block framing), but making them fast requires systems thinking: cache-aware memory access, SIMD, threading, and careful benchmarking methodology. I also wanted something I could benchmark rigorously, not just "it works." I wanted to know *how well* it works, on what data, and why. The suite implements five compression algorithms, each written from scratch:
- Huffman — canonical Huffman coding with frequency analysis and optimal prefix codes
- LZ77 — sliding-window compression with bounded hash chains for near-linear performance
- DEFLATE — my own block-framed implementation combining LZ77 tokenization with Huffman coding (stored, fixed, and dynamic Huffman blocks). Not RFC 1951 wire-compatible, but architecturally faithful to how DEFLATE works
- RLE — run-length encoding for highly repetitive data
- LZW — dictionary-based compression (the algorithm behind GIF/TIFF)
On top of the algorithms, I built:
- A streaming API — you can compress in chunks, which matters for real-world use where you don't have the entire file in memory.
- A self-describing container format — every compressed file has a header with a magic number, algorithm ID, original size, and a CRC32 checksum computed over the original data. Decompression verifies the checksum, so corruption is caught automatically.
- Multi-threaded DEFLATE — DEFLATE blocks can be compressed independently, so I added a `--threads N` flag that parallelizes block compression. This was a good exercise in partitioning work and managing thread synchronization without introducing correctness bugs.
- AVX2 SIMD acceleration — for LZ77, the inner loop that extends byte matches (once a hash chain finds a candidate) is the hot path. I added an AVX2-accelerated version that compares 32 bytes at a time, which measurably speeds up compression on repetitive data. It compiles conditionally based on the target architecture.
- A benchmark harness with proper methodology — configurable warmup iterations (to prime CPU caches and frequency scaling), multiple measurement iterations, and median reporting (resistant to outliers). It collects compression ratio, compress/decompress speed in MB/s, peak memory delta, CPU utilization, and token-level stats (match count, literal count for LZ77/DEFLATE). Results can be output as terminal tables, HTML reports, or CSV for further analysis.
- A 14-file test corpus spanning four categories — text (books, logs, source code), binary (zeros, random data, repeated payloads), structured (CSV, JSON, XML, SQL), and synthetic edge cases (worst-case inputs). This matters because compression algorithms have wildly different performance characteristics depending on the data: Huffman is great on natural text, RLE dominates on runs of zeros, and LZW handles structured streams well. The benchmark exposes all of that.
I started with Huffman because it's the most self-contained: you can get a working compressor in a day and verify correctness trivially. Then I built LZ77, which introduced the sliding-window and hash-chain data structures. DEFLATE was the hardest because it combines both: you tokenize with LZ77, then entropy-code the tokens with Huffman, and you need to decide block boundaries and whether to use fixed or dynamic Huffman tables per block. I built RLE and LZW last; they're simpler but round out the suite for comparison. The benchmark harness came next. I wrote it to be usable as both a CLI tool and a C++ library so that other projects could link against it. The CLI supports `compress`, `decompress`, `benchmark` (run all algorithms across a dataset), and `compare` (side-by-side algorithms on a single file with optional HTML output). I wrote correctness tests covering edge cases (empty files, single-byte files, all-zeros, random data, files that don't compress at all) and set up CI with GitHub Actions so that every push runs the test suite and a benchmark.
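The container-format idea (magic number, algorithm ID, original size, CRC32, verified on decompression) can be sketched in a few lines; the field layout and magic value here are illustrative rather than the suite's actual format, and zlib stands in for the custom C++ codecs.

```python
# Hedged sketch of a self-describing container header: magic number, algorithm
# ID, original size, and CRC32 of the original bytes, verified on unpack.
# The field layout and magic value are illustrative, not the suite's format.
import struct
import zlib

MAGIC = 0x434D5052                 # "CMPR" (illustrative)
HEADER = struct.Struct(">IBII")    # magic, algo id, original size, crc32

def pack(algo_id, original):
    crc = zlib.crc32(original) & 0xFFFFFFFF
    payload = zlib.compress(original)   # stand-in for the custom codecs
    return HEADER.pack(MAGIC, algo_id, len(original), crc) + payload

def unpack(blob):
    magic, algo_id, size, crc = HEADER.unpack_from(blob)
    assert magic == MAGIC, "not a container file"
    data = zlib.decompress(blob[HEADER.size:])
    # Checksum verification catches silent corruption automatically.
    assert len(data) == size and zlib.crc32(data) & 0xFFFFFFFF == crc
    return algo_id, data

algo, data = unpack(pack(3, b"hello hello hello"))
```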
University of Mumbai AI/ML researcher scaling graph-based retrieval and building distributed training systems from scratch
(1) Work: Reduced multi-hop query latency from 42s to 10–12s with graph-based recommendation (attribute bucketing + weighted seed expansion) on an Agentic Knowledge Graph. (2) Projects: Yuntun—Implemented Megatron-style tensor parallelism with custom autograd for column/row/vocab-sharded layers and correct gradient flow. Weigou—Built a 4D-parallel training stack (TP/CP/PP/DP) with ring-attention CP and pipeline parallelism; unified process groups and bucketed gradient sync kept training correct.
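The column-parallel linear layer at the heart of Megatron-style tensor parallelism can be sanity-checked in numpy on a single process: each "rank" holds a column shard of the weight, computes its slice of the output, and concatenating the slices matches the full matmul. Shapes and the two-rank split are toy choices.

```python
# Toy check of Megatron-style column parallelism: split a linear layer's
# weight column-wise across "ranks", compute local matmuls, and verify that
# concatenation recovers the single-device result. Pure numpy, one process.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 16))          # batch of activations
W = rng.normal(size=(16, 32))         # full weight matrix

shards = np.split(W, 2, axis=1)       # column shards, one per "rank"
partials = [x @ w for w in shards]    # each rank's local matmul
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W                        # reference single-device result
```

The same identity is what makes the backward pass tractable: each rank only ever needs the gradient of its own output slice, which is why the custom autograd in Yuntun can keep gradient flow correct shard by shard.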
Fordham AI/ML engineer who debugged a SCORM integration by building a feature-flagged test harness in prod
During my Samsara internship, I faced a debugging scenario with our SCORM video player integration. Our system used launch links from Rustici (generated from a GraphQL mutation) that expired after 2 minutes if not accessed, and these links only worked in prod due to API credential restrictions. Due to these constraints, I was not able to test the feature in staging. To make it even worse, certain browsers handled Rustici's third-party cookies differently (so video worked in Chrome, but not Safari). What made this difficult was that there were no error signals (the code seemed correct, and when testing manually, the links worked). My solution was to build a test harness directly in prod (I put a feature flag in place) where I could verify the integration end-to-end across different browsers. This simple (and unorthodox) approach yielded pretty promising results, as I was finally able to verify the entire feature without unknown errors or timeouts. I learned that, with proper guardrails, it is okay to test in prod!
CS @ McMaster | Orchestrated 5-server architecture with Redis sync for ML-powered SMS system at scale
The hardest challenge was coordinating multiple concurrent servers: frontend, backend, database, SMS send/receive servers, and a separate ML server using Cohere with Tavily, so everything stayed synchronized. I owned the system architecture, built the SMS servers, and synced all services using Redis, which I had never used before. We split tasks across the team, but I handled the integration layer end-to-end. I worked through it by learning from documentation, YouTube resources, extensive debugging, and occasional mentor input.
CS @ UC Santa Barbara | Refactored Cisco's LDAP auth agent, cutting packet loss from 35% to 2% at scale
The most difficult technical problem I faced came when I started refactoring the LDAP authentication agent during my Cisco internship. I was new to the codebase and had been shown a simple driver that a colleague had built years earlier during an abandoned refactoring attempt; I used it to quickly compare my refactored authentication agent against the monolithic, fully featured process inside the main LiNA codebase. After I made some changes to my auth agent (more threads, pipelining, and a simplified internal FSM), I ran the driver, and every single connection from my auth agent failed, whereas the monolithic process failed on only 35% of them. I was really confused: I thought I had done everything right, and a total failure seemed impossible given the changes I had made. I suspected I had over-threaded it, so I experimented with reducing the number of threads, but that fixed nothing. I then suspected my pipelining implementation was losing requests somehow, but neither I nor my manager could find any issue with it. After inspecting my code every possible way, I realized I had removed the timeout limitation (I thought that for debugging I didn't want requests to time out), which caused a default value from elsewhere in the codebase to be applied, a value so low that no connections could be made. Restoring the original timeout brought my auth agent down to losing only 2% of packets on average. Even though the resolution was simple, this was probably the most difficult technical problem I've faced, because of how far the cause was abstracted from where I was working and how much time I spent debugging it.
CS @ UC Santa Cruz | Built PromoPilot with 4 AI agents orchestrating cross-protocol marketing campaigns
I built PromoPilot, an autonomous marketing loop that uses four specialized agents to generate and schedule multi-modal campaigns for resource-constrained startups. The most difficult technical problem was orchestrating the state and reliable handoffs between four disparate agents operating on different protocols. I had a Content Agent using Claude, a Media Agent using AWS, and a Scheduling Agent on Fetch.ai's decentralized network. Wiring them together was a nightmare because I had to manage state across a distributed system where agents could time out or fail silently. For instance, the media handoff was particularly brittle; getting the AWS-based video generator to pass asset data reliably to the decentralized scheduling uAgent required me to implement a direct uAgent-to-uAgent communication protocol. On top of that, I faced severe dependency hell trying to get the uagents library, boto3 (for AWS), and Flask to coexist, as they had conflicting version requirements that kept breaking the build. I had to methodically isolate these components and wrap the social posting logic inside a uAgent carefully to handle Twitter's strict rate limits, which made end-to-end testing of the autonomous loop incredibly slow and fragile.
Emory researcher who debugged noisy sensor data to hit R² of .87 by fixing preprocessing, not the model
The most difficult technical problem I faced was during undergrad AI research. The core challenge wasn't the model; it was getting usable signal from extremely noisy, real-world data. I was working on a project that used sensor-based data for classification. It was proven to work in controlled environments, but real life was a different story: lots of noise and inconsistencies. I spent a long time trying to tweak the model or adjust inputs, and none of it worked. I solved this by switching from trying to fix the model to understanding the data: I ran various feature analyses and ablations and refined the preprocessing and collection methods. Once I cleaned that up, the model stabilized and I reached an R² of .87. I learned that many AI issues are much less about the actual model and more about the data and assumptions behind it.
Northeastern AI/ML engineer who built real-time bike safety AR system fusing radar + webcam at 25 FPS, winning 2nd in capstone
I built a real-time system that fuses webcam object detection with radar detections to determine hazards to bicycle riders and displays them on an AR HUD. I solved it with optimized edge computing for real-time object detection and worked out the fusion algorithm's coordinate transforms to align radar and camera data in 3D space. The system achieved 25+ FPS at ~100 ms latency, winning 2nd place in our senior capstone.
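The radar-to-camera alignment reduces to a rigid transform plus a pinhole projection; the rotation, translation, and intrinsics below are illustrative placeholders, not the capstone's calibrated values.

```python
# Sketch of the fusion coordinate transform: a radar detection (x, y, z in the
# radar frame) is rotated/translated into the camera frame, then projected
# through a pinhole model to a pixel. R, t, and intrinsics are illustrative.
import numpy as np

R = np.eye(3)                      # radar-to-camera rotation (toy: aligned axes)
t = np.array([0.0, 0.1, 0.0])      # radar mounted 10 cm from the camera
fx = fy = 800.0                    # focal lengths in pixels
cx, cy = 320.0, 240.0              # principal point

def radar_to_pixel(p_radar):
    p_cam = R @ p_radar + t        # rigid transform into the camera frame
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)  # pinhole projection

u, v = radar_to_pixel(np.array([0.0, 0.0, 5.0]))  # hazard 5 m dead ahead
```

Once a radar return lands on a pixel, it can be matched against camera detections at that location, which is what lets the HUD draw a hazard marker over the right object.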
NYU full-stack engineer who debugged a production race condition in multi-threaded C++ under high concurrency
I debugged a production crash caused by a race condition in a multi-threaded C++ component that only showed up under high concurrency. I traced it using logs and thread-level instrumentation, then fixed it by tightening synchronization and ownership boundaries.
CS @ Arizona State | Built a deterministic Vedic astrology engine from scratch with zero tolerance for error
I built the entire Vedic astrology calculation engine by myself, and that was easily the hardest technical problem I've tackled. Vedic astrology is extremely unforgiving. If your time conversions, ayanamsa, planetary positions, or house calculations are even slightly off, the whole chart is wrong and nothing built on top of it matters. I could not rely on existing libraries because I did not trust them blindly, so I implemented everything from scratch and validated it chart by chart against known references and professional tools. Whenever something did not match, I traced it back to the underlying astronomical assumptions instead of patching the code. I rebuilt parts multiple times until the engine became deterministic, accurate, and scalable. In the end, I had a foundation I fully trust because I know exactly why every number exists.
Oxford AI researcher who built uncertainty-quantified fusion reactor models hitting 92% accuracy on plasma predictions
I was tasked with predicting plasma shape parameters from videos of plasma in a fusion reactor (tokamak). An important requirement for the model was quantifying uncertainty in its predictions. I was using a deep learning-based segmentation model (Meta's SAM) and had to figure out how to map from the shape of the plasma region to the shape parameters—elongation and triangularity (which are traditionally predicted using magnetic probes). To solve this, I ended up using a Gaussian Process for this mapping, which also gave nice uncertainty bounds. Over 92% of the true parameter values were in the 2-sigma confidence interval of the model. Here's the link to a nice demo of the final model: https://cvprojectapp-wcvsmvztvb52thgbk998o8.streamlit.app/ You can see my CV for some other cool projects: https://drive.google.com/file/d/1jU7xYtWZAhXqUDVbyz7MxfULIa4rIz8N/view?usp=sharing
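A minimal numpy-only Gaussian Process regression sketch of the mapping idea (the real system mapped segmented plasma shapes to elongation and triangularity); the kernel, noise level, and 1-D toy data here are my own choices for illustration.

```python
# Minimal GP regression: RBF kernel, exact posterior mean and per-point sigma,
# which is what yields the 2-sigma confidence intervals described above.
import numpy as np

def rbf(A, B, length=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(Xs, X), rbf(Xs, Xs)
    mean = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0, None))  # mean, sigma

X = np.array([[0.0], [1.0], [2.0]])   # toy 1-D "shape feature"
y = np.sin(X).ravel()                 # toy target (stand-in for elongation)
mean, sigma = gp_predict(X, y, np.array([[1.0]]))
```

The coverage claim in the profile is exactly this machinery at scale: check what fraction of held-out true values fall inside `mean ± 2 * sigma`.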
UC Riverside AI/ML engineer who built a multi-agent AutoML system that researches domains and auto-generates features at scale
I designed and implemented an orchestrated workflow in which multiple AI agents could analyze natural-language machine learning use cases, research external sources for relevant domain factors and strategies specific to the use case, propose and generate features, and then assess their impact on model performance. The system was built as a submodule of a larger AutoML project that Finarb (the company I was working for) was developing. I implemented the entire pipeline, from prompt design and guardrails to execution logic, error handling, and evaluation. Because the system relied on LLMs, it was inherently non-deterministic, which made testing and validation significantly more challenging than in a typical software engineering project. Making the system reliable enough for practical use required careful design choices around constraints, verification, and failure modes, as well as extensive testing. I would say this is the most difficult technical problem I have faced so far.
CS @ UChicago | Built Hostess after shipping vLLM + full-stack K8s apps from scratch—Docker Compose for production
Starting with zero Kubernetes experience, I took a first-principles approach to learning by building: first deploying a vLLM instance on GKE, followed by a full-stack Next.js, FastAPI, and Postgres application on a raw K8s cluster. Once I grasped the underlying abstractions, I formalized them into Hostess—a "Docker Compose for production." It automates the entire lifecycle (CLI → API → Docker → K8s) from a single hostess.yml, handling service discovery, secrets, and observability for the full stack. This methodology—shipping the leanest possible system to identify core patterns, then distilling them into reusable primitives—is the foundation of how I build.
CS @ Chhattisgarh Swami Vivekanand | Built MTL-PORL framework cutting catastrophic forgetting in molecular AI to near-zero
Problem: Catastrophic forgetting during sequential molecular property prediction—maintaining performance on earlier tasks while training on new tasks. My role: Lead implementer and co-author of the MTL-PORL framework (refresh-learning + Pareto optimization). I designed episodic training pipelines, implemented refresh (unlearn + relearn) strategies, and integrated Pareto-optimal multi-task gradient aggregation with hyper-gradient-based unlearning into ChemBERTa-based models. Solution highlights: Built robust episodic training and evaluation pipelines, added hyper-gradient unlearning modules, and implemented Pareto gradient aggregation to balance stability–plasticity tradeoffs. Results: Significant reduction in forgetting and strong anytime/test accuracies on multiple molecular datasets (Anytime Avg. Accuracies ≈ 91.63%, 94.89%, 92.67%; Test Accuracies ≈ 92.48%, 96.48%, 96.86%; Forgetting measures ≈ –0.0048, –0.0045, –0.0063).
Alexandria University | Built open-source AI agent matching SOTA on multi-hunk code understanding for bug testing at UIUC
I built techniques to evaluate state-of-the-art AI agents on specialized software testing tasks and designed an open-source agent with enhanced understanding of multiple code hunks simultaneously, achieving results comparable to state-of-the-art agents. I also improved agents' understanding of bug reports, resulting in a paper submission to ISSTA 2026. This work was conducted as a Research Assistant at the University of Illinois under Prof. Darko Marinov.
CS @ Auburn Montgomery | Architected real-time WebSocket gateway handling thousands of concurrent sessions with zero latency spikes
The Problem: I needed to build a real-time WebSocket gateway for a "Smart Support" system that could handle thousands of concurrent state-heavy sessions without significant latency or memory leaks. The Solution: I architected a custom synchronization layer using asynchronous processing in Python (FastAPI). I implemented an event-driven model to manage socket heartbeats and state persistence, significantly reducing overhead per connection. This ensured that even under high throughput, data consistency across the dashboard remained near-instant.
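One way such a gateway keeps per-connection overhead low is to track heartbeats as shared state reaped by one periodic task, rather than one watchdog per socket. A minimal sketch (names like `SessionRegistry` are hypothetical; timestamps are injected so the logic stays deterministic, where a real gateway would call `reap` from a periodic asyncio task):

```python
class SessionRegistry:
    """Per-connection heartbeat state; one periodic reaper replaces a
    watchdog task per socket, keeping per-connection overhead tiny."""
    def __init__(self, timeout=30.0):
        self.timeout = timeout    # seconds of silence before a session is dropped
        self.last_seen = {}

    def ping(self, session_id, now):
        """Record a heartbeat for this session at time `now`."""
        self.last_seen[session_id] = now

    def reap(self, now):
        """Drop sessions whose heartbeat went stale; return the dropped ids."""
        dead = sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)
        for s in dead:
            del self.last_seen[s]
        return dead

reg = SessionRegistry(timeout=30.0)
reg.ping("a", now=0.0)
reg.ping("b", now=0.0)
reg.ping("a", now=20.0)     # "a" stays fresh, "b" goes quiet
print(reg.reap(now=35.0))   # ['b']
```

The single-reaper design is what keeps memory and latency flat as session counts grow: the cost per heartbeat is one dict write.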
UMass Amherst AI/ML researcher who built a modular framework cutting document AI delivery from 18 days to 3
One of the most difficult technical problems I faced was at my previous company. The issue was slow turnaround time for clients asking for different problem statements around document understanding, QA, and RAG. Creating a novel MVP each time took around 7 days of dev, 4-5 days of testing, and 7 days of back and forth with the client. I led the team in creating our own framework at the company, reducing turnaround time to clients to 3 days (dev+test). The architecture was simple to understand, with "modularization" as the key focus. I added support for local model deployments, syncing between SGLang, vLLM, etc., as well as online models (whatever the client demanded).
CS @ K J Somaiya | Built RAG system parsing complex tables with 40% better accuracy using Docling
The most difficult technical problem I faced was while building a RAG product for internal documents at a previous internship. The challenge was finding local-first tools that could run on the company's servers, which was hard because their documents contained many tables that were difficult to parse. After almost a week of research, I landed on Docling by IBM as my parser instead of a general PDF parser, which gave me almost 30-40% better accuracy on table data.
CS @ UIUC | Built automation to isolate multi-agent failures in 1M+ word transcripts for cleaner AI research
Working on multi-agent system research, I was stuck on the problem of causal attribution - my codebase generated transcripts that were over 1 million words, which made it incredibly difficult to attribute performance failures to specific causes. To solve it, I came up with an automation mechanism which hard-coded one side of the agent interaction, cleanly isolating agent issues into competence failures (the agent isn't capable) and cooperative failures (the agent doesn't cooperate), which helped push our research forward!
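The competence-vs-cooperation split can be sketched in miniature (entirely hypothetical names and log format; the real harness works by hard-coding one agent's side of the interaction, so any remaining failure is attributable to the live agent):

```python
def classify_failure(agent_turns, task_solved):
    """With the partner side scripted, any breakdown must come from the live
    agent: either it cooperated but couldn't solve the task (competence),
    or it ignored/refused the scripted partner's moves (cooperative)."""
    cooperated = all(t["action"] in ("share", "ack") for t in agent_turns)
    if task_solved:
        return "success"
    return "competence_failure" if cooperated else "cooperative_failure"

print(classify_failure([{"action": "share"}, {"action": "share"}], task_solved=False))
# competence_failure: the agent played along but still failed
print(classify_failure([{"action": "share"}, {"action": "ignore"}], task_solved=False))
# cooperative_failure: the agent stopped cooperating
```

Fixing one side of the interaction removes the combinatorial ambiguity of two simultaneously varying agents, which is what makes million-word transcripts attributable at all.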
NC State | Built self-supervised vision system to classify 50K-res cancer scans with just 250 images
One client project I worked on involved a dataset of images with huge resolution (40,000-50,000), and the task was unsupervised binary classification. The images were histopathological (skin tissue) scans, and they were divided based on whether a new cancer drug was effective or not. There were only ~250 such images in the dataset. Problems here were: low data volume, very high dimensionality, and prohibited use of labels. I tried to solve this by splitting up the huge images into patches using windows, and then trained an encoder on them using self-supervised learning. Using clustering on the generated embeddings from this encoder, I could categorize patches into smaller groups. At the end, we did use the labels and saw some decent classification performance, but we did not have access to experts who could tell us more about how to interpret the scans. What we did succeed in showing was that it was possible to train an embedding model to recognize patterns in parts of these gigantic scans.
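The patching step described above can be sketched as a window-coordinate generator; this is a generic sliding-window sketch, not the project's code, and `patch_grid` with its parameters is an assumption:

```python
def patch_grid(width, height, patch, stride):
    """Top-left coordinates of windows tiling a huge scan.
    Extra windows are appended so the right/bottom edges are covered."""
    xs = list(range(0, max(width - patch, 0) + 1, stride))
    ys = list(range(0, max(height - patch, 0) + 1, stride))
    if xs[-1] != width - patch:
        xs.append(width - patch)
    if ys[-1] != height - patch:
        ys.append(height - patch)
    return [(x, y) for y in ys for x in xs]

# One 50,000 x 40,000 scan becomes thousands of trainable 512 x 512 patches.
coords = patch_grid(50_000, 40_000, patch=512, stride=512)
print(len(coords))
```

Turning ~250 giant images into thousands of patches is also what makes self-supervised pretraining feasible at this data volume: the effective sample count comes from the patches, not the scans.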
BITS Pilani backend engineer | Scaled LightGBM to 10k concurrent requests with 99.8% cache hit rate via Redis + DuckDB
Handling 10k concurrent inference requests without slowdown was the toughest problem, so I optimized the LightGBM serving path and added request deduplication plus Redis and DuckDB caching to reach a 99.8% hit rate.
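Request deduplication under concurrency is often implemented as a "single-flight" cache, where identical in-flight requests share one computation. A hedged in-process sketch with a dict standing in for Redis (all names hypothetical, not the actual serving code):

```python
import threading

class DedupCache:
    """Single-flight cache: identical concurrent requests share one computation.
    A dict stands in here for the Redis tier; the pattern is what matters."""
    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}    # finished answers
        self._inflight = {}   # key -> Event for computations in progress

    def get(self, key, compute):
        with self._lock:
            if key in self._results:
                return self._results[key]
            ev = self._inflight.get(key)
            leader = ev is None
            if leader:
                ev = self._inflight[key] = threading.Event()
        if leader:
            value = compute(key)          # only one thread pays this cost
            with self._lock:
                self._results[key] = value
                del self._inflight[key]
            ev.set()
            return value
        ev.wait()                         # followers block until the leader finishes
        with self._lock:
            return self._results[key]

cache = DedupCache()
calls = []

def slow_model(key):
    calls.append(key)                     # stands in for a LightGBM inference
    return f"score:{key}"

threads = [threading.Thread(target=cache.get, args=("req-42", slow_model))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))   # 1 -- the other 99 identical requests were deduplicated
```

With highly repetitive traffic, this is how hit rates in the 99%+ range become achievable: most requests never reach the model at all.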
CS @ Madhav Institute | Built outreach automation at scale using CRM-as-database hack to track 1000s of prospects without added infrastructure
The most difficult technical problem I faced was while building an internal outreach-automation tool (a side-panel browser extension) at a startup where I was interning as a Growth Engineer. The requirement was to track every person contacted through the tool and every message sent to them, so we could tell whether a prospect was a duplicate, i.e., already in the CRM. However, the team didn't want a separate database, since that would have introduced one more component to maintain, so I needed a hacky solution. I used the CRM we were already running as the database itself: I created a custom field called "prospect_metadata" and stored all of a prospect's metadata inside it. Each time someone was added to a sequence, the tool checked that field; if it was populated, it told the user that the prospect already existed. The user could then decide whether to add them again, and if so, the previously sent messages could optionally be included as context for generating new ones, which were then appended back to the metadata. It was a genuinely challenging problem that forced me to think creatively and take an unconventional approach.
CS @ Pimpri Chinchwad College of Engineering | Built YouTube chat SaaS with smart routing that handles both pinpoint Q&A and full-context video analysis
As I was developing my AI SaaS platform, which lets you chat with any YouTube video, I discovered that a pure RAG approach failed when users asked questions that needed the entire context of the video. After brainstorming with Claude and ChatGPT and scribbling on a whiteboard, I redesigned my backend around a router component that dispatches each request according to its context requirements. Users can still ask pinpoint questions about the video and receive accurate answers, and they can now also ask questions that need the whole video as context, such as listing the key topics it covers.
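A toy version of such a router can be as simple as cue matching; this is purely illustrative (the real router may well be an LLM-based classifier, and the cue list here is invented):

```python
FULL_CONTEXT_CUES = ("summari", "key topics", "overall", "entire video", "main points")

def route(question):
    """Crude cue-based router: full-context questions bypass chunk retrieval
    and get the whole transcript; everything else goes through RAG."""
    q = question.lower()
    if any(cue in q for cue in FULL_CONTEXT_CUES):
        return "full_transcript"
    return "rag_retrieval"

print(route("List the key topics covered in the video"))          # full_transcript
print(route("What year does the speaker mention for the merger?")) # rag_retrieval
```

The design point is that the two paths have opposite failure modes: retrieval misses global structure, while full-context stuffing wastes tokens on pinpoint questions, so routing between them beats either alone.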
CS @ University of Lagos | Built emergency response backend for Vital Aid with optimized hospital search and AI-driven first aid at scale
One of the most difficult technical problems I faced was building a backend system that needed to respond quickly and reliably during emergency scenarios while working on the Vital Aid project. The challenge was balancing speed, accuracy, and reliability, especially when handling location-based hospital searches and AI-driven first-aid responses under different conditions. I approached this by breaking the problem down into smaller parts. I optimized database queries to reduce response time, simplified API request flows, and added validation and fallback mechanisms to handle incomplete data or service failures gracefully. I also restructured parts of the backend to separate core logic from external integrations, which made the system easier to maintain and more reliable.
CS @ AKTU | Built RAG chatbot with cross-encoder reranking to fix LLM hallucinations on technical docs
PROJECT: RAG Chatbot for Chatting with arXiv Documents
While building a RAG agent (chatting with documents) for querying dense technical documentation, I faced a significant issue with the "lost in the middle" phenomenon, where the model would hallucinate answers because the retrieved context chunks were not ranked by relevance. To solve this, I moved beyond simple cosine similarity and engineered a two-stage retrieval pipeline: first a vector store (FAISS/Chroma) for broad semantic search, then a Cross-Encoder reranking step to strictly filter and re-order the retrieved chunks before feeding them to the LLM context window. This improved answer accuracy and significantly reduced hallucinations on specific technical queries.

PROJECT: Note-Taking Tools
Problem: PDF rendering with lazy loading, pinch-to-zoom, and memory-safe image handling. I built a continuous-scroll PDF viewer (pdf_viewer.py) solving three critical challenges:
1. Memory-safe PyMuPDF-to-Qt conversion: PyMuPDF's pix.samples buffer gets invalidated on garbage collection, causing silent crashes. Fixed by calling .copy() immediately after QImage construction to decouple Qt's pixel buffer from PyMuPDF's memory.
2. Viewport-aware lazy loading: loading all pages caused massive memory usage. Implemented a placeholder-based system where pages render only when visible (plus a one-page buffer) and distant pages unload back to placeholders. Required coordinating scroll events, geometry calculations, and re-render cycles.
3. Cross-platform pinch-to-zoom: handled both QGestureEvent and QNativeGestureEvent for trackpad support, tracked base zoom at gesture start, applied incremental scaling, and throttled re-renders (150ms QTimer) to prevent flicker while re-rendering only visible pages.
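The two-stage retrieve-then-rerank pipeline described above can be sketched generically. The scoring functions below are crude stand-ins for the embedding model and the cross-encoder, not the actual components:

```python
def first_stage(query, corpus, k=4):
    """Broad recall: cheap bag-of-words overlap stands in for vector search."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query, candidates, k=2):
    """Precision stage: a cross-encoder would score (query, chunk) pairs
    jointly; a length-penalized overlap score stands in for that model."""
    def score(doc):
        terms = set(query.lower().split())
        return len(terms & set(doc.lower().split())) / (1 + len(doc.split()))
    return sorted(candidates, key=score, reverse=True)[:k]

corpus = [
    "attention is all you need introduces transformers",
    "transformers use self attention layers",
    "recipes for sourdough bread",
    "attention attention attention filler filler filler filler",
]
top = rerank("self attention transformers",
             first_stage("self attention transformers", corpus))
print(top[0])
```

The shape is the important part: a fast, recall-oriented first pass over the whole corpus, then an expensive, precision-oriented second pass over only the shortlist, so the LLM sees few but well-ordered chunks.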
IISER Bhopal | Built CodeShield to validate AI code at scale using static analysis over LLM calls
One difficult technical problem was building CodeShield from nothing. I had to design a system that could analyze, validate, and clean up AI-generated code without burning through tokens or falling apart on edge cases. The solution came from a lot of low-level profiling, rewriting modules that behaved badly, and creating a pipeline that relied on static analysis first instead of throwing everything at an LLM. It took patience, but the system eventually became fast, stable, and predictable. A completely separate challenge was getting into the WorldQuant Brain environment with zero background in quantitative finance. I had no clue about factor models, alphas, or market structure, so I had to teach myself the entire workflow while competing with people who had years of experience. I solved it by studying successful alphas, running small controlled tests, reading research papers in plain English until they finally made sense, and building intuition through failed attempts. It was slow at first, but the trial-and-error approach paid off.
CS @ Yeshiva University | Built rare-disease AI system with source-grounded retrieval to eliminate hallucination at scale
I built an end-to-end rare-disease AI research system spanning data ingestion, normalization, semantic chunking, retrieval, evidence-grounded generation, confidence scoring, and human review. The hardest problem was hallucination under sparse data. I solved it by enforcing source-linked outputs, uncertainty thresholds, and iterative failure logging, making the system reliable for researchers.
Waterloo CS | Built GooseDoor to scale salary data across universities with secure, real-time architecture
One of the most difficult technical problems I faced was designing GooseDoor to scale reliably while handling sensitive, user-submitted salary data. Early on, I realized that a naïve backend setup would struggle with spikes in traffic, slow queries, and data integrity issues as the platform expanded beyond a single university. I tackled this by redesigning the backend around Supabase with PostgreSQL, carefully normalizing schemas for offers, companies, and users, and adding indexes to keep queries fast as data volume grew. I also implemented server-side validation and row-level security to ensure only verified university users could submit or view certain data, which was critical for trust and privacy. On the frontend, I optimized data fetching and caching to reduce redundant requests and keep dashboards responsive. I approached the problem iteratively by profiling slow endpoints, stress-testing with realistic data, and refining the architecture until latency dropped to near real-time levels. This experience taught me how to think holistically about scalability, performance, and security rather than just getting something working.
Independent AI/ML researcher who solved sparse 3D printer defect detection by stitching image streams into rich temporal composites
The most difficult technical problem I faced was ensuring consistent data flow for our cloud-based 3D printer defect detection system. Images from user printers were often too sparse and irregular for accurate nozzle clump detection, creating gaps that undermined the model. To solve this, I led a collaboration with the backend team to innovate within the system's constraints. We optimized by reducing individual image size and, most critically, developed a method to stitch consecutive images into single, information-rich composites. This approach fed the model richer temporal data without exceeding processing limits. The solution restored data consistency, allowing the inference pipeline to run uninterrupted. This significantly improved detection accuracy and reliability, enhancing the overall system's ability to prevent waste and hardware damage.
Lingaya's Vidyapeeth AI/ML engineer | Built real-time speech-to-image system with noise suppression & optimized diffusion
During my Speech-to-Image Live Conversion using Deep Learning project, the most challenging technical problem I faced was synchronizing real-time audio transcription with accurate and fast image generation. Speech input is unpredictable—background noise, variable speed, and accent differences often caused Whisper to produce unstable transcripts, which resulted in inconsistent or completely unrelated images from the diffusion model. To overcome this, I switched to a chunk-based audio streaming approach to reduce latency, added noise suppression and voice-activity detection to clean the input, and implemented a semantic stabilization layer that preserved important keywords across chunks so the prompt didn't keep changing. I then optimized the diffusion pipeline by using FP16 precision, caching text embeddings, and reducing inference steps during live mode. Together, these improvements allowed the system to process speech smoothly, maintain contextual accuracy, and generate coherent images within a few seconds.
CS @ Stony Brook | Fixed hidden race conditions in multi-client servers under load with stress tests & tighter sync
The most difficult problem I faced was debugging an intermittent concurrency issue in a multi-client server where behavior looked random under load. Requests would occasionally stall or arrive out of order, but only when many clients connected at once. I fixed it by reproducing it with a stress test, adding structured logs around shared state and thread handoffs, and then tightening the synchronization strategy (narrower critical sections, safer message queue usage, and eliminating a few racy reads). After that, I wrote regression tests to confirm stability at high concurrency and monitored latency and error rates to ensure the fix didn't create bottlenecks.
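The "narrower critical sections" fix boils down to guarding exactly the racy read-modify-write on shared state and nothing else. A minimal illustration (not the original server code):

```python
import threading

class Counter:
    """Shared state with a narrow critical section: only the
    read-modify-write is locked, not any surrounding work."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:      # without this, `value += 1` is a racy RMW
            self.value += 1

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.value)   # 8000, deterministically, with the lock in place
```

Keeping the lock scope tight is what lets the fix hold up under load without becoming a bottleneck: threads serialize only for the few instructions that actually touch shared state.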
CS @ Rice | Optimized MySQL at 191k+ recipes, cutting latency to instant scroll with keyset pagination
On BranchBite (project), I optimized a MySQL database with 191k+ recipes to speed up features after noticing worse performance at scale due to heavy joins and poor indexing. I tested different queries, redesigned indexes, and switched from offset to keyset pagination to avoid large row scans. This drastically reduced latency and made infinite scroll basically instant. It taught me how critical database design and overall optimizations are for scalability.
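Keyset pagination seeks directly past the last-seen key instead of scanning and discarding OFFSET rows, which is why latency stays flat deep into the result set. A small illustration using sqlite3 in place of MySQL (table and names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO recipes (id, title) VALUES (?, ?)",
                 [(i, f"recipe {i}") for i in range(1, 101)])

def page_after(last_id, size=20):
    # Keyset: WHERE id > ? uses the primary-key index to seek,
    # instead of OFFSET-scanning and discarding rows.
    return conn.execute(
        "SELECT id, title FROM recipes WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

first = page_after(0)
second = page_after(first[-1][0])   # resume from the last key of the prior page
print(second[0])                    # (21, 'recipe 21')
```

The trade-off is that you can only step forward from a known key (fine for infinite scroll), whereas OFFSET allows arbitrary page jumps at the cost of scanning everything it skips.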
Northumbria University | Built fault-tolerant multi-agent AI system with state machines & 100% reliability
The most difficult technical problem I faced was in a hackathon project where I built a multi-agent AI onboarding system using Power Automate. I had three agents that needed to coordinate: one for welcome setup, one for training recommendations, and one for progress tracking. The core issue was agent coordination with unreliable data. Agent 2 was triggering before Agent 1 finished, flows crashed on null values, and I was getting duplicate actions. I solved it in three steps: First, I implemented a state machine pattern using status flags—Agent 1 sets 'OnboardingStatus = Complete', which triggers Agent 2, which then sets 'TrainingRecommendationsSent = Yes' to prevent re-triggering. Second, I used the coalesce() function throughout to handle null values gracefully: coalesce(item()?['DaysSinceAssigned'], 0) provides a default when data is missing. Third, I built comprehensive error handling with try-catch scopes, retry policies, and created 23 test cases covering edge cases. The result: Zero duplicate actions, 100% reliability even with incomplete data, and proper sequencing across all agents. What I learned: In distributed systems, you can't assume data is complete or that events happen in order. Defensive programming and systematic testing are critical—I learned to test each component independently, then together, to isolate where issues occur.
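The coalesce() pattern translates directly outside Power Automate; a hypothetical Python equivalent of the null-guard described above:

```python
def coalesce(*values, default=None):
    """First non-None value, mirroring Power Automate's coalesce() expression."""
    for v in values:
        if v is not None:
            return v
    return default

# Mirrors coalesce(item()?['DaysSinceAssigned'], 0) from the flow above.
item = {"DaysSinceAssigned": None, "OnboardingStatus": "Complete"}
days = coalesce(item.get("DaysSinceAssigned"), 0)
print(days)   # 0 -- missing data no longer crashes the downstream step
```

The same defensive idea generalizes: every read of upstream data gets an explicit default, so a missing field degrades to a sensible value instead of a crashed flow.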
CS @ National Open University of Nigeria | Digitized federal agency workflows, cutting report gen from days to hours
The most difficult problem I faced was designing a complete governance workflow system for a federal government agency from zero documentation. When I joined, there was no existing design system, no user documentation, and the legacy process was 100% manual. Staff were generating reports by hand, which took days. I needed to digitize this for users who ranged from field officers to senior executives, each with different permission levels and data access. I couldn't do traditional user research (restricted environment), so I conducted stakeholder interviews and co-design workshops to map the end-to-end workflow. I discovered that the core problem wasn't just making it digital—it was that different departments had completely siloed processes with overlapping data dependencies. What I did was create a unified information architecture that mapped integration points between departments. For the interface, I designed role-based progressive disclosure: field officers see a simplified view, executives see aggregated dashboards. The hardest part was handling edge cases: what happens when a case crosses departmental boundaries? I designed state transitions with audit trails so every action was traceable. We replaced 100% of manual processes. Report generation went from days to hours. The design passed compliance review on the first submission because I'd documented every design decision with its rationale.
Brown CS building ontology mapping systems to harmonize messy clinical datasets for graph ML drug discovery at scale
During my research on graph deep learning for drug discovery, I was attempting to validate my model on experimentally obtained clinical data. However, the dataset did not map onto the structured dataset used for my model and previous experiments: disease names varied, drugs appeared under different names, and several ambiguous terms seemed mappable to multiple downstream terms, which made the data very difficult to use at all. To use it, I needed a way to harmonize its structure and content. I tried all the regular steps: normalize and match, fuzzy matching, embedding similarity, and even an SLM. The error rate remained too high for any of these methods to be reliable, so I reframed the question as finding the nearest match rather than an exact match. That reframing enabled a much clearer approach to harmonizing heterogeneous datasets while maintaining mapping accuracy, and it let us map at a much larger scale, which was invaluable for the research.
UC Berkeley AI/ML engineer who shipped loan pricing models across 1,000+ branches for $40M impact
Led development and rollout of an AI-powered loan pricing platform for a $22B portfolio. Solved it by engineering 1,400+ features, training a CatBoost model, building a COBYLA optimizer, and shipping to 1,000+ branches, driving ~$40M annual profit uplift.
UC Irvine AI/ML researcher who builds resilient data pipelines that fail gracefully
FRED's dataset had stricter-than-ideal rate limits. In addition, problems existed with preexisting scripts to scrape and pull together data for my research group's training run, so I rewrote a more robust implementation that we used to finish our data collection. Off the top of my head, the scripts had a habit of overwriting and deleting already-pulled data, failing to resume sessions (also poorly overwriting and deleting data, presumably due to some strange race condition), and at the same time being both slower than the rate limit and occasionally running up against the rate-limit timeout checks. While perhaps not the most glamorous technical problem upfront, what made the problem slightly more interesting was that some of our team had already pulled in a significant amount of data, and all of us had pulled in some data. Any implemented solution to our wonky script would have to both save time and patch holes made by the previous script to be worth implementing. Simply "starting over" wouldn't have been a better solution. I remember it not being the most trivial problem—a developer whom I respected took a short crack at implementing a parallelized solution, which, while an improvement, was still not without its faults. My implementation, which admittedly wasn't the most pretty, ended up being the final solution we used for the rest of our dataset because, even when it failed, it failed gracefully. It was a fun problem to work on and taught me a lot about working in environments where time trade-offs were a consideration when shipping solutions.
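The "fails gracefully" property in a scraper like this usually comes down to two habits: write-then-rename persistence (so a crash can never clobber finished data) and skip-if-present resumption (so reruns patch holes instead of starting over). A hedged sketch; function names and file layout are my assumptions, not the actual scripts:

```python
import json
import os
import tempfile

def save_chunk(path, rows):
    """Write to a temp file, then rename: a crash mid-write can never
    clobber or truncate data that was already pulled successfully."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)
    os.replace(tmp, path)   # atomic rename on POSIX

def fetch_series(series_id, out_dir):
    """Resume-safe pull: anything already on disk is skipped, so reruns
    fill gaps left by earlier failures instead of re-pulling everything."""
    path = os.path.join(out_dir, f"{series_id}.json")
    if os.path.exists(path):
        return "skipped"
    save_chunk(path, {"id": series_id, "observations": []})  # stand-in for the API call
    return "fetched"

out = tempfile.mkdtemp()
print(fetch_series("GDP", out))   # fetched
print(fetch_series("GDP", out))   # skipped
```

The skip-if-present check is also what makes the "don't start over" constraint tractable: data pulled by the old broken scripts stays on disk and is simply never re-requested.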
CS @ Swarthmore | Built Congress Alerts to text 1k users live vote updates at scale
I built a real-time notification system for congressional votes (Congress Alerts). My stack was Telnyx, Google Sheets/Apps Script, and Google Forms. One of the harder things to work around was rate limits on Google Sheets. I hit a reliability wall with Google Apps Script because it's easy to blow execution limits on the free tier when you have lots of users (~1k). The fix was splitting Congress Alerts into two phases: enqueue and send. Enqueue is fast: fetch new votes, write compact message rows to a queue sheet. Send is a separate trigger that processes, say, 50 messages per run. That keeps each run under quotas, lets me throttle Telnyx calls, and makes throughput scale just by increasing trigger frequency instead of rewriting the whole system. Also, anticipating and addressing edge cases in user behavior when they were texting the service was a pain.
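The enqueue/send split can be sketched with an in-memory queue standing in for the queue sheet (the 50-message batch comes from the description above; everything else is invented):

```python
from collections import deque

queue = deque()   # stands in for the queue sheet

def enqueue(new_votes, users):
    """Phase 1, fast: write compact message rows; no sending happens here."""
    for vote in new_votes:
        for user in users:
            queue.append((user, f"Vote result: {vote}"))

def send_batch(batch_size=50):
    """Phase 2: a separate trigger drains up to batch_size rows per run,
    keeping each run under Apps Script execution quotas."""
    sent = []
    for _ in range(min(batch_size, len(queue))):
        sent.append(queue.popleft())   # a real run would call the SMS API here
    return sent

enqueue(["H.R. 1234 passed"], [f"user{i}" for i in range(120)])
runs = 0
while queue:
    send_batch(50)
    runs += 1
print(runs)   # 3 -- 120 messages drained in three quota-sized runs
```

Because throughput is runs-per-hour times batch size, scaling up means raising the trigger frequency, with no change to either phase.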
UC Davis engineer who migrated high-traffic systems to microservices at EPAM | Backend, full-stack & AI/ML
One of the most difficult technical problems I faced was during a production migration from a monolithic backend to a microservices architecture during my time as a SWE at EPAM. The biggest challenge was preserving system stability while breaking apart tightly coupled services that handled high-traffic APIs. I solved this by first identifying clear service boundaries, introducing API contracts, and adding comprehensive Postman-based and automated tests before each rollout. I also monitored latency and error rates closely after deployment and iterated quickly on failures. This approach allowed us to migrate incrementally without downtime and significantly improve system scalability and maintainability.
UT Austin AI researcher stabilizing LLM convergence through PPO and retrieval optimization at IDEAL Lab
In my research at the IDEAL Lab, the most significant challenge was optimizing the learning convergence of LLMs when integrating autonomous search capabilities into the recommendation process. Using Proximal Policy Optimization (PPO) with retrieved token masking initially led to high variance and unstable training cycles on our large-scale CUDA experiments. I solved this by systematically redesigning the ablation datasets and fine-tuning the reward shaping to better align the model's chain-of-thought reasoning with the retrieval actions. This iterative refinement, performed on the TACC supercomputer, ultimately stabilized the policy and significantly improved the model's ability to autonomously query metadata for informed recommendations.
UT Austin undergrad researching self-organizing AI systems and neural network representations toward AGI
First-authored the paper "Neural Cellular Automata for ARC-AGI" as an undergraduate, implementing gradient-trained Neural Cellular Automata for the ARC-AGI benchmark from scratch, demonstrating efficient few-shot generalization and identifying design factors that influence self-organizing system performance. Now working on my undergraduate thesis, analyzing the Fractured Entangled Representation Hypothesis in neural networks and identifying potential methods of addressing it.
Penn AI researcher who bootstrapped distributed GPU infrastructure to crack multi-trigger interpretability on a shoestring budget
While working on my paper on multi-trigger mechanistic interpretability (related to the Anthropic Sleeper Agents work), I hit a hard academic compute wall. The research required training and probing models on a scale that my local setup couldn't handle. I attempted to secure compute resources from Stanford, but they denied my request. I was effectively locked out of the necessary infrastructure to prove my hypothesis, facing a deadline with no budget for a standard H100 cluster. Instead of scaling down the project, I bootstrapped a distributed training and probing pipeline using fragmented, lower-cost compute resources (e.g., spot instances or disparate GPUs). I engineered a custom pipeline to shard the model and activation data across multiple, cheaper consumer-grade GPUs rather than relying on a monolithic enterprise cluster. The main technical bottleneck was the communication overhead between these disjointed devices. To solve this, I implemented aggressive gradient accumulation and optimized the data transfer protocols to minimize the bandwidth bottleneck, effectively simulating a larger cluster on a shoestring budget. Since I was using less reliable instances, I built robust checkpointing and auto-recovery scripts to ensure the long-running interpretation jobs wouldn't fail if a single node went down. This infrastructure allowed me to run the necessary multi-trigger analysis and complete the paper, proving that resource constraints could be overcome with superior engineering.
AI/ML engineer at Amrita building agentic systems and debugging the messy reality of LLM tooling integration
I worked on adding observability using Langfuse and enabling seamless model switching through LiteLLM for our organization's agentic ecosystem. On paper, both tools were straightforward to integrate, and they worked fine independently. However, once we connected them in our actual codebase, we ran into a strange issue—traces were showing up in Langfuse, but all the values were null. There weren't any obvious errors, which made it more challenging. I spent a significant amount of time debugging the integration, double-checking configurations, environment variables, and tracing logic. I went through GitHub issues for both projects and reached out in community channels to see if anyone had faced something similar. Eventually, I discovered the root cause was a version incompatibility between the LiteLLM version we were using and Langfuse v3. When we downgraded Langfuse to v2, the traces immediately started working properly. However, that downgrade caused several other dependency conflicts in our environment. To fix that, I carefully reviewed our dependency tree and reconciled package versions to produce a stable and conflict-free requirements setup. This experience taught me a lot about dependency management and the importance of version management, which often gets overlooked.
UC Cincinnati AI/ML engineer building scalable IoT infrastructure for 10K+ node systems
The Problem: Designing a synchronization protocol for GridPilot that could handle real-time state changes for 10,000+ IoT nodes (ESP32) without causing database locking or massive latency spikes in the user dashboard. The Solution: I architected a "V2" solution that decoupled the hardware logic from the UI. Instead of direct writes, I implemented a Python Gateway Bridge that batches inputs and uses a modularized service layer (db.js) to handle Firestore state syncing. I utilized an agentic workflow (using Claude/Gemini) to refactor the entire monolithic codebase into a scalable single-page application (SPA), effectively using AI to accelerate the refactoring of the auth and database modules by 400%.
High school ML researcher tackling quantum error correction with neural networks
Over the past few months, I worked on a research project simulating a set of quantum circuits, simulating them again with "noise" added (interference from the environment), and training a neural network to map the noisy states back to their clean counterparts. Training on 5 qubits worked okay, but when I scaled up to 8 qubits the models failed completely, for reasons I couldn't understand: their predictions were further from the ground truth than their inputs. Even a huge model with millions of parameters trained on just 100 states failed to overfit. I spent a couple of hours just staring at the raw data, and then I realized the noisy states were much smaller in magnitude than the clean ones. This is somewhat obvious once you really think about it, because these noise channels by definition squeeze the space of inputs into a smaller region, but it was hard for me to see because I had no physics background and went in thinking I could take some data, chuck it into a model, and get sensible predictions out. I didn't understand my data well enough, and once I understood my data, the solution was obvious.
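A toy illustration of that scale mismatch, with a simple contraction standing in for the real noise channel (the shrink factor and dimensions are invented): comparing the average norms of inputs and targets is exactly the diagnostic that would have exposed the problem immediately.

```python
import math
import random

random.seed(0)

def depolarize(vec, p):
    """Toy noise channel: contract the state toward zero by (1 - p),
    mimicking the way a depolarizing channel shrinks the state space."""
    return [(1 - p) * x for x in vec]

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

clean = [[random.gauss(0, 1) for _ in range(8)] for _ in range(100)]
noisy = [depolarize(v, 0.3) for v in clean]

# The diagnostic: average norm of inputs vs targets.
ratio = sum(norm(n) for n in noisy) / sum(norm(c) for c in clean)
print(round(ratio, 2))  # noisy states are systematically ~30% smaller
```

A model asked to map a small-norm region onto a large-norm one has to learn an expansion its inputs barely encode, so normalizing (or at least inspecting) the input scale first is the cheap fix.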
Full-stack dev building secure blockchain + ML systems for peer-to-peer healthcare
The toughest problem I faced was designing a secure trust layer for a peer-to-peer medicine platform where blockchain integrity, QR verification, and encrypted off-chain medical data had to work together without creating performance bottlenecks. I solved it by separating critical proofs on-chain from sensitive data off-chain and building a hash-linked verification system plus an ML-based matching engine to ensure both security and efficient fulfillment.
UMD researcher breaking AI watermark detectors through adversarial transfer attacks
Standard attacks failed against high-perturbation image watermarks like TreeRing. The challenge was breaking a black-box detector without access to its weights. I solved this by training a substitute classifier to mimic the target, then generating adversarial examples against the proxy that successfully transferred to fool the real detector.
NYU engineer solving infrastructure challenges at scale—from DNS root servers to Terraform resilience
While working at Infoblox on their Terraform provider, I found that the legacy API had an issue: whenever an object was modified, its API reference would change. This was identified by a major client (they handle root DNS servers in Australia). Since the product was at a mature stage, it wasn't feasible to rebuild the API at the time, so we had to fix it on the client side (Terraform). The solution the team and I came up with was a fallback search in Terraform: the first lookup attempt goes through the object reference; if the object is not found, we search by extensible attribute (metadata that can be attached to any object in the Infoblox server). Extensible-attribute search was not made the default method due to its higher latency (these attributes are not indexed in the database). This was developed by the team, tested by me, and deployed by the customer.
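The fallback logic can be sketched roughly like this; the dict-based store and the field names are hypothetical stand-ins for the Infoblox API, not its real data model.

```python
def find_object(store, ref, ea_key, ea_value):
    """Look up by object reference first; if the ref went stale (the legacy
    API changes refs on modify), fall back to a slower search by extensible
    attribute. Hypothetical data model: store is {ref: object_dict}."""
    obj = store.get(ref)
    if obj is not None:
        return obj
    # Fallback: linear scan over extensible attributes.
    # Slower (not indexed server-side), so it is never the default path.
    for candidate in store.values():
        if candidate.get("extattrs", {}).get(ea_key) == ea_value:
            return candidate
    return None

store = {
    "record:host/NEW": {"name": "ns1.example.com", "extattrs": {"tf_id": "host-42"}},
}
# Terraform still holds the old ref, which changed after a modify:
obj = find_object(store, "record:host/OLD", "tf_id", "host-42")
print(obj["name"])
```

The extensible attribute acts as a stable identity the client controls, so Terraform state survives server-side ref churn without an API rebuild.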
ETH Zurich researcher who trained a Clash Royale world model on 1 GPU via vision-augmented inputs
Trained a world model to play Clash Royale with access to 1 L4 GPU. Solved this by using a vision pipeline to augment the pixel inputs to the model and achieving faster convergence. Coming up with a cool research question for my thesis was also quite challenging (solved by reading many papers and tweets).
CS researcher @ TU Dresden | Solving continual learning with dynamic neuron-level learning rates and topological memory compression
I have been tackling the problem of continual learning for a while; it is hard due to architectural and algorithmic limitations. I settled on two solutions: an algorithm that uses local learning rules to set dynamic learning rates for different neurons, paired with a growing architecture, and a topological memory for AI systems that stores memories in a compressed graphical form.
CS @ GGS Indraprastha University | Built autograd engine from scratch and trained RNNs like PyTorch internals
Built an autograd engine (like PyTorch) and trained RNNs and deep neural networks from scratch.
CS PhD @ UVM | Built infinite procedural city generator in Unity to teach K-12 students lottery odds
I might be tackling harder problems during my PhD, but I vividly remember that the hardest one was my undergraduate thesis project. I had to build a small video game demo for K-12 students showing the low probabilities of winning the lottery in order to prevent them from gambling. I employed Unity to create the demo. The layout was a city where there were supposed to be an infinite number of buildings. You could enter a building to find a floor full of bookshelves with N rows; in each row there were M books, and in each book there were Y pages. The chances of picking up the right building, bookshelf, book, and page are astronomically low. The complexity is that you cannot load an infinite number of such buildings due to resource constraints, but the player had to have the feeling of being able to walk indefinitely around the city. Thus, buildings had to spawn, and there had to be consistency in the numbers that were spawned with each building. Therefore, I had to come up with my own procedural generation algorithm (much like Minecraft) where there was a visualization horizon that the player could observe. I had to keep a constant process of computing the local and relative location of the player and update the environment (city) as the player was moving, all while respecting the physics and playability to make it realistic and to allow it to be playable on another computer with lesser resources. I remember breaking down the generation into a grid and computing a playable radius around the player with identification numbers that were used to load buildings consistently. Anyone who has ever worked with Unity knows that spawning objects and ensuring that everything works properly (roads and pavement connected, doors of buildings that work, transitions loading, etc.) is hard to do. That was it. It was challenging due to time, compute, and other resource constraints, but I managed to deliver it. 
I would say the research problems I tackle during my PhD might be orders of magnitude harder on the cognitive side, but this was a full start-to-end engineering problem that I had to solve.
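The consistency trick described above, deterministic generation per grid cell, can be sketched like this; the spatial-hash constants and building attributes are illustrative, not taken from the actual project.

```python
import random

def cell_seed(cx, cy, world_seed=1337):
    # Classic spatial-hash mix: each grid cell maps to a stable integer seed.
    return (world_seed * 73856093) ^ (cx * 19349663) ^ (cy * 83492791)

def building_at(cx, cy):
    """Re-derive the same building every time its cell re-enters view,
    so nothing persistent ever needs to be stored for unseen cells."""
    rng = random.Random(cell_seed(cx, cy))
    return {"shelves": rng.randint(10, 50), "books_per_shelf": rng.randint(20, 100)}

def visible_cells(px, py, radius=2):
    """Grid cells inside the playable radius (visualization horizon)
    around the player's current cell."""
    return {(px + dx, py + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}

# Walk away and come back: the same cell yields an identical building.
first = building_at(10, -4)
again = building_at(10, -4)
print(first == again, len(visible_cells(0, 0)))
```

Only the cells inside the horizon are ever instantiated, which is what makes an "infinite" city fit in constant memory on a low-resource machine.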
NYU AI/ML engineer who built an agentic injury voice assistant with dynamic LangGraph flows
An agentic AI-based injury voice assistant built on LangGraph. Earlier I had hardcoded the questions inside the ask nodes; I then changed my approach, saving the questions in the database and letting the database drive the question flow dynamically.
Pune University | Built smart inventory system from scratch with zero web dev experience—optimized search with binary search & designed transfer logic
I was given a task to build a full-stack project: a smart inventory management system designed to track inventory in local stores of the same brand and recommend stock transfers either from nearby stores or from the warehouse based on the demand level of each store. At that time, I had zero knowledge of web development, but I didn't panic and built the entire project on my own by reading documentation and utilizing online resources. I wrote the entire code myself and used ChatGPT only for a final review and for modifying it according to industry standards so that it would be readable for everyone. I faced many challenges, but I never gave up, and in the end, I successfully completed the project. It was also the first project where I applied my knowledge of data structures and algorithms; I implemented a search feature that was initially slow and optimized it using binary search, making it significantly faster. I also designed the transfer recommendation logic myself.
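The binary-search optimization might look something like this with Python's stdlib `bisect`; the SKU naming and stock values are invented for illustration, not from the actual project.

```python
import bisect

# Sort the inventory by SKU once; every lookup is then O(log n)
# instead of the original O(n) linear scan.
inventory = sorted([("SKU-%04d" % i, i * 3 % 17) for i in range(10000)])
skus = [sku for sku, _ in inventory]   # parallel sorted key list for bisect

def find_stock(sku):
    i = bisect.bisect_left(skus, sku)
    if i < len(skus) and skus[i] == sku:
        return inventory[i][1]
    return None

print(find_stock("SKU-0042"), find_stock("SKU-9999"), find_stock("NOPE"))
```

The one-time sort pays for itself as soon as lookups dominate inserts, which is typical for an inventory search feature.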
AI/ML engineer who optimized RL collision detection at scale and built production boolean parsers with custom logic
I'm going to be honest, I don't know how to answer this question. There are some fun problems I've worked on, like building an in-production custom boolean-logic parsing system that respects operator precedence and parentheses, but in the world of language parsers that's not particularly difficult. Optimizing collision detection algorithms in RL training environments was technical, but the underlying logic was pretty simple. There was a particular use case for a static factory factory that was another interesting problem, but all of these fall under the same category: look up the underlying structure, break it down into comprehensible chunks, and once you understand the underlying principles, build up until solved. I don't know if any of these are actually technically impressive, though. I would say anything I would label a difficult technical problem is one I haven't been able to solve yet, so learning Agda + HoTT for better proof writing and language creation is probably the hardest, but I haven't finished it yet.
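A boolean parser respecting precedence and parentheses is usually a small recursive-descent evaluator; this is a generic sketch (the NOT/AND/OR token names are assumed), not the production system described above.

```python
import re

def evaluate(expr, env):
    """Recursive-descent evaluator: NOT binds tightest, then AND, then OR;
    parentheses override precedence. 'env' maps variable names to booleans."""
    tokens = re.findall(r"\(|\)|\w+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        if peek() == "(":
            eat()
            v = or_expr()
            eat()                  # consume ')'
            return v
        if peek() == "NOT":
            eat()
            return not atom()
        return env[eat()]          # variable lookup

    def and_expr():
        v = atom()
        while peek() == "AND":
            eat()
            rhs = atom()           # always consume the operand, then combine
            v = v and rhs
        return v

    def or_expr():
        v = and_expr()
        while peek() == "OR":
            eat()
            rhs = and_expr()
            v = v or rhs
        return v

    return or_expr()

env = {"A": True, "B": False, "C": True}
print(evaluate("A AND (B OR NOT C) OR C", env))  # parentheses beat precedence
```

The grammar layering (atom inside AND inside OR) is what encodes precedence; each level loops over its own operator and delegates everything tighter to the level below.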
CMU AI researcher who built incentive-compatible protocols for LLM agents using program equilibrium to prevent manipulation
To ensure strategic stability in multi-agent systems, I tackled the challenge of incentive compatibility in LLM-to-LLM interactions, where natural-language agents often deviate from traditional rational behavior through manipulation or hallucinated preferences. I solved this by developing a framework that maps high-dimensional LLM outputs into structured utility functions, applying bilinear optimization to minimize computational overhead, and using program-equilibrium concepts to let agents "verify" mutual cooperation protocols via prompt transparency. This approach bridges the gap between the unpredictability of large language models and the formal guarantees of mechanism design.
Waterloo engineer who built Rizz Glasses in 48hrs — AI agent + Meta Ray-Bans giving you real-time pickup lines
Building the Rizz Glasses within 48 hours - integrating voice transcription through the Meta Ray-Bans and passing it to a Rizz Agent, giving you the best responses to rizz up the girl you are talking to. https://www.youtube.com/watch?v=lH4nAysbcm4
CS @ Galgotias University | Built deforestation tracker processing satellite imagery to measure land degradation at hectare-level precision
The most difficult technical problem I faced was accurately integrating satellite data to measure the exact number of hectares of land affected by deforestation in the tracker app. Initially, handling real-time geospatial data and ensuring precision was challenging. I solved this by sourcing verified satellite imagery and processing it through geospatial APIs, using Google and Jio mapping tools to analyze land degradation and automate accurate calculations.
CS @ York University | Built real-time inventory system with deterministic services that slashed errors across pricing and contracts
The hardest problem I faced was designing a real-time inventory and pricing system where physical yard data, contracts, and unit conversions all had to stay consistent under constant change. I solved it by breaking the system into deterministic services with strict schemas, automated tests, and continuous validation, which cut errors dramatically while improving speed and reliability.
Dropout building AI quiz apps | Debugged hours of broken Gemini code down to one outdated model string
Technical problem as in the most difficult one? Honestly nothing major, but I do have a story about how I faced a problem and fixed it. I am basically a vibecoder who kind of understands code. I can read through code, and if there is a visual or functional issue, I can usually track it down and fix it. Once, I was building an AI-powered quiz maker app. The idea was simple: you upload your notes, handwritten or digital, and the app turns them into a test. I used the Gemini API for the AI part. One common issue with AI code generators is that they often produce outdated code. I was aware of this, but I still missed it. I was generating code using the latest Gemini model, it had internet access, and my prompt always included a line asking it to verify that all generated code was updated and current. Despite that, the API name in the code was constantly set to an outdated model, gemini_1.5_pro_preview, while the actual latest model was gemini_3.0_pro_preview. This turned into a real headache. For hours, I kept digging through the JS code, trying different fixes, checking every possible angle. Even when I saw the error clearly in the inspect tab, I dismissed it because I assumed the model name could not be the issue. Eventually, after going back and forth enough times, I realized the entire problem came down to that single outdated model reference. Once I updated it, everything worked perfectly.
Thadomal Shahani Engineering College | Priced satellite insurance using live NOAA data & Monte Carlo risk models
I priced insurance for a satellite using real-time data from NOAA. I first integrated and processed the data from NOAA, which was updated every 30 minutes. Based on this data, I calculated the chance of a geomagnetic storm, which was then used to perform Monte Carlo simulations that calculated the risk, from which the option was priced.
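The pricing loop can be sketched like this; the storm probability, loss distribution, and loading factor are invented placeholders, not the actual NOAA-derived figures.

```python
import random

random.seed(42)

def price_premium(p_storm, loss_dist, n=100_000, loading=1.2):
    """Monte Carlo premium sketch: each trial simulates whether a
    geomagnetic storm hits and, if so, draws a loss. The premium is the
    expected loss times a safety loading. 'loss_dist' is a callable
    returning one sampled loss (hypothetical)."""
    total = 0.0
    for _ in range(n):
        if random.random() < p_storm:
            total += loss_dist()
    expected_loss = total / n
    return expected_loss * loading

# Assume a 5% storm chance per period and losses uniform in [1M, 10M];
# analytically the premium should be near 0.05 * 5.5M * 1.2 = 330k.
premium = price_premium(0.05, lambda: random.uniform(1e6, 1e7))
print(round(premium))
```

In the real pipeline, `p_storm` would be refreshed from the 30-minute NOAA feed before each re-pricing run.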
Ball State AI researcher | Slashed ML energy consumption via pruning & quantization without sacrificing accuracy
One of the toughest technical challenges I faced during my research on Green AI was finding a way to reduce the energy consumption of large-scale machine learning models without compromising their performance. Training these models typically requires massive amounts of energy, but making them more efficient often led to poorer results. To tackle this, I explored techniques like model pruning, quantization, and distillation, which helped reduce the model size and energy use without losing accuracy. I also worked on optimizing the hardware used for training to make it more energy-efficient. By combining these strategies, I managed to significantly cut down on energy consumption while keeping performance high, which became the foundation for my research on making AI more sustainable. This experience taught me the value of balancing innovation with practical solutions.
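Magnitude pruning plus naive int8 quantization, the simplest instances of two of the techniques mentioned, can be sketched as follows (the weight values are invented, and real pipelines prune per layer and calibrate quantization scales more carefully):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights: the classic
    observation is that near-zero weights barely affect the output."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.01, -0.8, 0.05, 1.2, -0.02, 0.3, -0.07, 0.9]
pruned = magnitude_prune(w, 0.5)

# Naive symmetric int8 quantization: map the max magnitude to 127.
scale = max(abs(x) for x in pruned) / 127
quant = [round(x / scale) for x in pruned]
print(pruned)
print(quant)
```

Both steps shrink compute and memory (and hence energy) while the large-magnitude weights that carry most of the signal survive, which is why accuracy can be largely preserved.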
Brown AI/ML researcher who built deep learning models at CERN predicting particle positions from detector data
As a research intern at CERN, working on deep learning models to predict particle positions from particle detectors' voltage data, I was tasked with both designing the models and choosing, designing, and implementing classical models to verify the effectiveness of the deep learning alternatives. In the span of a few days, I read several papers in order to figure out which architecture would be best, learn how to implement analytical methods like matrix inversion and charge-sharing methods, and test different methods for choosing hyperparameters to yield the best results. I've become very good at learning complex architectures and techniques quickly, both in deep learning and in general, with the tools available to me, using close paper-reading, textbooks, and asking strong clarifying questions.
New Horizon College engineer who debugged and refactored a full-stack AI search system across React, API routes, and Supabase at scale
Actually, for me the most difficult technical problem I faced was stabilizing and refactoring the AI search system in my VerifyAI project. The system initially had multiple failures: the chat UI showed empty responses, search results were duplicated, and user chat history and bookmarks were breaking because the Supabase database schema did not properly match the Clerk auth integration. This took a lot of my time. To solve it, I worked end to end, prompting Copilot across the frontend, backend, and database. I did a proper end-to-end system design and gave Copilot precise instructions to refactor the main /api/verifyai/search route, breaking one very complex function into smaller, readable modules, which reduced cognitive complexity and made debugging easier. I then fixed the database by normalizing all user_id fields to TEXT, repairing foreign keys, and correcting row-level security policies so each user could only access their own data. Finally, for the UI issues, I mapped out how the text-streaming flow should look and gave instructions to rebuild parts of the React chat UI so responses streamed correctly, duplicate messages were removed, and chat history and bookmarks synced reliably with the backend. The key lesson was that real engineering problems are usually system-level mismatches, not single bugs, and solving them requires old-school methods: thinking and writing things down, refining design choices, and reasoning across the entire stack.
CS @ Mumbai University | Built File Transfer Hub serving shareable links with AWS S3 at scale
I think one of my most interesting and enjoyable projects is File Transfer Hub, a file-sharing web app I built during my learning phase that turned out very well. File Transfer Hub is an open-source, free-to-use web application where you can upload any file and get a shareable link, so you don't need to send large files from your own storage. We manage it all, and we don't access your files, as they're stored in AWS S3 buckets. I first thought through the design and came up with a simple architecture, taking some help from YouTube and LLMs for deeper insights. Once the architecture was clear, I started with the backend using Node.js and Express.js, with AWS S3 for storage. I divided it into components (models, controllers, and routers) and ran into many errors during integration and routing, but solved them. Then I moved to the frontend, building it with Vite and React with help from LLMs, and integrated the backend with proper responsiveness. During production deployment I hit many errors, mostly CORS-related, but managed to solve them too. Through this project I learned a lot about these technologies and about debugging.
CS @ Nagpur University | Retrained and compressed local LLM from 13B to 7B params for faster, more accurate inference
I wanted to increase inference speed for my local LLM agent, so I retrained it on similar projects covering features from my roadmap, then reduced the model size from 13B to 7B parameters. It ran much faster and more accurately.
CS @ University of Mumbai | Built FinetuneX LLM framework and debugged Flash Attention v2 kernels at GPU level
The most difficult errors I have faced in my journey so far are compilation errors, runtime errors, and CUDA OOM (out-of-memory) errors. Recently, while integrating a Flash Attention v2 implementation into FinetuneX (an LLM finetuning framework), I hit both compilation and runtime errors. I debugged them by tracing kernel launches and validating the assumptions inside the GPU kernel, and eventually realized that the Q, K, V tensors being passed were incompatible with the kernel's expected dtype, layout, and tiling block sizes: a logical mismatch between the kernel design and its input contracts. Earlier, I encountered a classic but frustrating issue: NaN loss values. I debugged it with torch.autograd.set_detect_anomaly(True), which pointed to the specific operation in the forward pass producing the NaNs; the root cause turned out to be a precision mismatch. I also faced CUDA OOM errors; enabling gradient checkpointing (gradient_checkpointing_enable()) in my training setup eliminated them.
Princeton PhD | Cracked pure math thesis by computing thousands of examples to build problem-solving intuition
Solved my PhD thesis in pure math by computing lots of examples to gain unmatched intuition, which ultimately helped crack the problem.
Delhi Skill & Entrepreneurship University | Backend & AI/ML engineer who mastered CORS to connect React frontends with secure, production-ready APIs
One of the most challenging technical problems I faced was handling CORS errors while connecting a React frontend with a backend API. I debugged the issue by understanding browser security policies, identifying missing headers, and correctly configuring CORS on the backend. I resolved it by explicitly allowing origins, HTTP methods, and credentials, which helped me gain a strong understanding of client-server communication.
CS @ San Jose State | Taught Gemini 2.0 Flash spatial understanding by converting 3D meshes into relationship graphs
The most difficult technical problem was giving Gemini 2.0 Flash spatial understanding of a 3D object. Giving it information like the mesh and the object is easy, but providing structural information about an arbitrary 3D object is really difficult because the user could give me any 3D object, and sometimes it's all one mesh so you can't extract much info. One thing that worked was treating the mesh not like geometry but more like a graph of relationships. Deriving higher-level components like curvature clusters, symmetry axes, and an adjacency graph gave me a better solution. It took me about a day to accurately extract the info with a hardcoded solution, but this solved the problem I was having at the time.
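The graph-of-relationships idea can be sketched with a face-adjacency graph, the simplest such structure; curvature clusters and symmetry detection build on the same machinery. The triangle data here is invented for illustration.

```python
from collections import defaultdict

def face_adjacency(faces):
    """Build a face-adjacency graph from triangle faces: two faces are
    neighbors iff they share an edge. This treats the mesh as a graph of
    relationships rather than raw geometry, which survives even when the
    whole object is a single undifferentiated mesh."""
    edge_to_faces = defaultdict(list)
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces[tuple(sorted(e))].append(fi)  # canonical edge key
    adj = defaultdict(set)
    for fs in edge_to_faces.values():
        for i in fs:
            for j in fs:
                if i != j:
                    adj[i].add(j)
    return adj

# A square split into two triangles sharing the diagonal edge (0, 2):
faces = [(0, 1, 2), (0, 2, 3)]
adj = face_adjacency(faces)
print(sorted(adj[0]), sorted(adj[1]))
```

Once the adjacency graph exists, higher-level structure (connected components, approximate symmetry, curvature clusters) can be described to an LLM as relationships instead of raw vertex soup.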
IIT Bombay AI researcher who built a black-box framework to reconstruct LLM prompts from outputs alone
Recently, I worked on the use of inversion in the post-training stage of large language models. Specifically, I proposed a data-free, black-box LLM inversion framework using previous-token prediction, aimed at reconstructing prompts from model outputs.
CS @ Manipal Institute of Technology | Debugged broken federated learning repos from scratch to ship original AI research
When my peers and I were coding the idea we had for our research project, we ran into the issue that the repository we were basing ours on was in bad shape: multiple errors and some version-control issues. Paired with the fact that there are generally few well-written repositories in federated learning, it was very challenging to get the code to a working state before we could even begin implementing our own idea. I tackled it a step at a time: I set up debug prints everywhere and slowly resolved each error as it came. Once the code ran, I went over it to find any high-level implementation errors. I feel this works better, as it is easier to deal with logical errors when you are not also fighting major syntax-based ones.
Sathyabama engineer | Built YC-backed coding platform with K8s at scale, now shipping real-time video LLMs & agentic browser automation
Kubernetes - Built a coding platform under AlgoUniversity (YC). Had to learn and integrate Kubernetes to the backend for secure code execution. The Kubernetes and Redis queue combo messed up my backend a lot, and I had to redo the whole thing (though faster this time). The whole project took around 3 months to build. While rebuilding, I used feature-driven development — building one feature at a time, not leaving any tiny bugs during dev, clear technical diagrams. All this was learned the hard way and will be used for the rest of my life. https://www.algo-zen.dev/ https://www.algo-zen.dev/login Use the password to login: Name - 43110443 Password - 123 Uni - Sathyabama Year - 2023-27 There may be some visual bugs. I will fix them after some months once my exams are over. Currently working on: - sharingan-core — Python library enabling LLMs to understand videos in real time (research paper in progress) - AgentFox - Open source agentic browser (aiming to be better than OpenAI Operator / OpenCLAW-style automation) - Built a production-grade real-time bus tracking system Stack: FastAPI, Redis, WebSockets, Flutter Links to my projects: https://007k.framer.ai/projects/algozen https://007k.framer.ai/projects/faculty-tracker-app https://007k.framer.ai/projects/marin https://007k.framer.ai/projects/sist-transit Resume: https://drive.google.com/drive/folders/1I4DCwo148of-ltz9VYGcawA2nnBODRNF?usp=sharing My first time applying on X. Thanks for reading! I'm looking for paid internships. I love building stuff.
CS @ ITBA | Built Vision Transformers from scratch in 2022 by reading raw papers when resources were scarce
Hardware: Power supply stopped working, so I had to test different possibilities to determine why it wasn't working. Software: Programmed a ViT from scratch (2022) before AI chatbots were as massive as they are today. Had to read through the papers many times, as well as reading the only 4 posts about them.
CS @ San Jose State | Debugged RLM inference with MLflow tracing, solved REPL failures at scale
Off the top of my head, I can think of a recent technical problem I faced while implementing RLM (Recursive Language Model) through DSPy. DSPy is a prompt-engineering framework, and RLM is a new inference strategy that helps language models persist through long-context problems. Although RLM was available as a module in DSPy, there was no official documentation online except for one post on Twitter. I got an error when I ran the implementation on Google Colab. To solve it, I set up an observability dashboard with MLflow and ngrok. After reviewing a couple of traces, I figured out the issue: the RLM was failing on every query because it couldn't find a REPL (Read-Eval-Print Loop) environment in which to perform its recursive approach. To make it work, I installed Deno, which supports REPL environment initialization, and reran the code. It worked, and I learned something new that day.
CS @ USC | Cracked a legacy Google PDF rendering problem that stumped engineers for years with novel abstraction layer
Internship work at Google: the core of the project was the most challenging part. Many engineers at Google and other firms had tried to solve it before without success. On top of the technical complexity, I had to work in legacy code, which added to the difficulty: understanding a huge, hard-to-read, infrequently maintained, and complex code structure, and understanding the geometry behind how PDFs render text. Solving the task involved many things: weekly meetings with the TL, product area lead, and the team to analyze approaches, discuss recent changes around code modularity, and plan code abstraction. The final solution was to build an abstraction layer over the current structure and relay the logic from the lower level to the higher level. I had a working prototype by the end of the internship, which earned appreciation from my managers and the team.
CS @ PES University | Built nano-modal from scratch, reverse-engineering Modal's gRPC container orchestration
Prime Intellect Bounty program - solved through in-depth discussion with team members. Working on nano-modal (minimal implementation of Modal platform). The hardest part was working with gRPC and making containers execute code. Solved by understanding how the actual Modal platform works and taking inspiration from some of their blogs, talks, and tweets regarding the implementation.
CS @ Purdue | Debugged and fixed a torch.compile memory leak in SAM optimizer through VRAM profiling and PyTorch internals deep-dive
Detected a memory leak when using torch.compile with mode=max-autotune with the Sharpness Aware Minimization (SAM) optimizer. I solved it through extensive debugging and VRAM monitoring, trying different SAM implementations, reviewing PyTorch documentation, and investigating open GitHub issues. The problem was assigning to tensor.data inside the optimizer step function, which the max-autotune mode does not support.
Columbia AI/ML engineer who built batched speculative decoding with jagged sequences for 2x+ inference speedup
Implemented batched speculative decoding inference engine using simple PyTorch and Hugging Face APIs. Primary challenge: Most batched speculative decoding approaches prune acceptance length to the minimum in the batch to keep sequence lengths in sync. I handled jagged sequence lengths and corresponding KV-cache by proposing two approaches to overcome the limitations of cache implementation in Hugging Face APIs. Implemented two approaches with tradeoffs to achieve an inference speedup. Currently working on a scheduler to dynamically switch between these approaches.
Full-stack dev @ MMMUT | Optimized 3D dataset rendering for smooth browser performance at scale
TL;DR: One of the hardest problems I solved was optimizing performance in a large dataset visualization project so it remained smooth and usable. While working on a 3D visualization project, I had to render large datasets in the browser while maintaining smooth interaction. Initially, performance dropped significantly, making the application difficult to use. To solve this, I researched rendering techniques, optimized data loading using streaming approaches, reduced unnecessary re-renders, and adjusted how assets were loaded and displayed. I also profiled the application to identify bottlenecks and improved how components updated. This experience taught me how to approach performance problems methodically—measure first, identify the bottleneck, test improvements, and iterate until the system became stable and responsive.
UC Berkeley engineer who built a bi-directional sync engine with custom conflict resolution for offline-first apps
Implementing a bi-directional sync engine between a local NoSQL store and a cloud PostgreSQL database. The primary technical hurdle was resolving write conflicts and lost updates that occurred when users edited data while offline. I solved this by implementing a custom LWW conflict resolution strategy paired with a versioned synchronization protocol to ensure deterministic state convergence across all clients. Also, figuring out systematic trading strategies on Kalshi was extremely difficult and required more creativity than math.
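An LWW merge with deterministic tiebreaking might look like this; the record shape and the client-id tiebreak are assumptions for illustration, not necessarily the author's exact protocol.

```python
def lww_merge(local, remote):
    """Last-writer-wins merge keyed by (timestamp, client_id); the
    client_id tiebreak keeps convergence deterministic even when two
    clients write at the same timestamp.
    Hypothetical record shape: {field: (value, timestamp, client_id)}."""
    merged = {}
    for key in local.keys() | remote.keys():
        a, b = local.get(key), remote.get(key)
        if a is None or (b is not None and (b[1], b[2]) > (a[1], a[2])):
            merged[key] = b
        else:
            merged[key] = a
    return merged

local  = {"title": ("Draft v2", 100, "A"), "body": ("offline edit", 120, "A")}
remote = {"title": ("Draft v3", 110, "B"), "body": ("cloud edit", 120, "B")}
m = lww_merge(local, remote)
print(m["title"][0], m["body"][0])
```

Because the comparison is a total order over (timestamp, client_id), merging in either direction yields the same result, which is the deterministic-convergence property the sync protocol needs.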
Weizmann Institute researcher who shipped tool use to Databricks' LLM engine serving Fortune 500s at scale
I implemented tool use in Databricks' LLM inference engine, used by dozens of Fortune 500 companies. To make this fast, we had to integrate a trie data structure with a finite state machine into our LLM inference engine.
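The generic version of this technique, a trie walked as a finite state machine to constrain decoding, can be sketched as follows (the token granularity and tool names are invented; the production engine operates on real tokenizer IDs):

```python
def build_trie(sequences):
    """Trie over the allowed token sequences (e.g., valid tool-call names)."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node["<end>"] = {}   # sentinel: a complete sequence may stop here
    return root

def allowed_next(trie, prefix):
    """FSM step: walk the trie along the generated prefix; the returned set
    is the only tokens the engine may emit next (all others get masked)."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return set(node)

tools = [["get", "_weather"], ["get", "_stock"], ["search"]]
trie = build_trie(tools)
print(sorted(allowed_next(trie, [])), sorted(allowed_next(trie, ["get"])))
```

At each decoding step the logits outside `allowed_next` are masked to negative infinity, so the model can only ever emit a syntactically valid tool call, and the trie walk is O(prefix length) rather than a scan over all tools.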
MBZUAI researcher inventing multimodal fusion architectures from first principles to preserve untrained LM capabilities
Inventing a new multimodal fusion architecture paradigm to preserve the native capability of the language model (untrained). Currently still solving it, but mostly by going to first principles and coding from scratch layer by layer to make sure architecture design, gradient flow, training setup, and evaluation significance are valid and well-built.
CS @ Azim Premji | Built drone pathfinding on edge compute to recreate Anduril Lattice
During my time at Trishul, I spent most of it integrating path-finding and mapping for rough terrain while managing the physics and movement of drones on an extremely limited edge-compute budget. We were trying to recreate the Anduril Lattice system.
CS @ Purdue | Built custom small language model architecture optimized for real-time on-device inference
I developed a custom architecture for proactive small language models optimized for on-device inference. I started by synthesizing recent research papers to identify gaps in current efficiency methods. Using Gemini as a sounding board, I validated my logic against existing benchmarks before prototyping in Google Colab. The first version struggled with latency and failed to run on limited hardware like mobile devices, so I refactored the system by optimizing vector operations and stripping redundant layers. Through iterative testing, I ended up with a streamlined, efficient system capable of proactive task execution.
CS @ UCLA | Built causal neural networks for ECG analysis using multiple instance learning on 24-hour recordings
I struggled with analyzing 24-hour ECG recordings. The initial approach was to use downsampling, but I pivoted to multiple instance learning. Rather than analyze the entire recording at once, I had my model generate representations for each segment first, then aggregate these representations. My first aggregation approach effectively saw all segments at once, which is incorrect for ECG interpretation, as future segments do not influence past ones. I then used a Causal Neural Network to perform aggregation.
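A minimal sketch of causal aggregation over per-segment representations, using a running mean as a stand-in for the actual causal network (the real model presumably used learned causal layers): the representation at segment t only ever sees segments 0..t.

```python
import numpy as np

def causal_aggregate(segments: np.ndarray) -> np.ndarray:
    """segments: (T, D) array of per-segment embeddings.
    Returns a (T, D) array where row t is the mean of rows 0..t,
    so no future segment ever influences a past representation."""
    cumsum = np.cumsum(segments, axis=0)
    counts = np.arange(1, len(segments) + 1)[:, None]
    return cumsum / counts
```

The same causality constraint is what masked (causal) attention or causal convolutions enforce in a learned model; the cumulative mean just makes the idea visible in two lines.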
CS @ Waterloo/Cerebras | Built custom RTOS kernel with sub-ms context switching and EDF scheduling on bare metal
Built a custom RTOS kernel running on an STM32 board. Implemented low-latency context switching and multi-threading (requiring extensive work with interrupt types and alternatives), deadline-based EDF scheduling, and low-latency memory allocation and deallocation.
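The EDF policy itself is simple to state: at every scheduling point, run the ready task with the earliest absolute deadline. A sketch of the selection logic (the real kernel would do this in C inside interrupt context; the `Task` fields are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_us: int   # absolute deadline in microseconds
    ready: bool = True

def edf_pick(tasks):
    """Earliest-Deadline-First: return the ready task with the
    smallest absolute deadline, or None if nothing is runnable."""
    ready = [t for t in tasks if t.ready]
    return min(ready, key=lambda t: t.deadline_us, default=None)
```

In a real kernel the ready queue would be kept sorted (or in a heap) so this pick is O(1) at the scheduling point, rather than a linear scan.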
CS @ UT Dallas | Building game engines from scratch and scaling AI/ML systems
The most difficult technical problem I ever faced was implementing an acceleration structure for physics in my from-scratch 3D game engine. I solved it by standardizing volume checks on each world object's hull, which made the code that contains and updates the structure much easier to handle.
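One common way to standardize such volume checks is axis-aligned bounding boxes; a sketch under that assumption (the engine's actual hull type isn't specified):

```python
from dataclasses import dataclass

@dataclass
class AABB:
    """Axis-aligned bounding box: a standardized hull every world
    object exposes, so the acceleration structure only ever needs
    one kind of containment test."""
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float

    def contains(self, other: "AABB") -> bool:
        """True if `other` fits entirely inside this box: the check
        run when inserting or updating objects in the structure."""
        return (self.min_x <= other.min_x and other.max_x <= self.max_x and
                self.min_y <= other.min_y and other.max_y <= self.max_y and
                self.min_z <= other.min_z and other.max_z <= self.max_z)
```

With a single standardized check like this, structures such as octrees or BVHs reduce to "does the child node's box contain the object's box", which keeps the containment and update code uniform.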
Data science grad at UT Dallas building AI systems and foolproof web apps for real-world problems
I was building a personal web application for a food business. The requirements were that it had to be robust, handling different quantity metrics from different users, yet simple enough that most admin work could be done without understanding databases or JSON structures. Meeting all of that in the backend while keeping the UI extremely simple took a lot of careful thought. I used Appwrite as the backend-as-a-service for data, but designed custom UIs so the admin page could be controlled with simple toggles. Because the system needed to be robust, I also added checks that ask the admin to visually confirm certain changes, preventing technical errors.
Full-stack dev at MAIT | Building secure auth systems and tackling JWT challenges head-on
The most difficult problem I faced was during the BackendXpress project, where I implemented a JWT-based authentication system with access and refresh tokens. My first attempts didn't work out well, so I settled on a dual-token strategy: refresh tokens stored securely in the database and middleware for token verification.
AI/ML researcher at University of Indonesia writing custom CUDA kernels to push past Python's limits
Writing a custom CUDA kernel. I was doing research with NVIDIA, and the Python for-loop implementation was too slow, so I had to write a custom CUDA kernel. I solved it by going back and forth with an LLM (it was early 2025, so LLMs weren't as good).
Montana State | Built a 250M LLM from scratch using my own deep learning library
Trained and instruction-tuned a 250M LLM with the deep learning library I wrote. On the DL library side, I was very pedantic with tests, validating gradients/activations to a tight tolerance over every case I could think of. This made it somewhat easy to make a bunch of very fast and small steps forward. On the decoder LM side, I spent a lot of time reading the foundational papers (GPT-2, BPE, attention). With a solid understanding of these, faithfully re-implementing them was quite smooth.
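Gradient validation of that kind is typically a finite-difference check: compare each hand-written backward pass against a numerical estimate to a tight tolerance. A minimal sketch (the test function and tolerance are illustrative, not the library's actual test suite):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, element by element.
    Slow, but a trustworthy oracle for checking analytic gradients."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        bump = np.zeros_like(x)
        bump.flat[i] = eps
        grad.flat[i] = (f(x + bump) - f(x - bump)) / (2 * eps)
    return grad

# Check a hand-written gradient against the numerical estimate.
f = lambda x: (x ** 2).sum()   # toy loss
analytic = lambda x: 2 * x     # its known gradient
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(numerical_grad(f, x), analytic(x), atol=1e-4)
```

Running a check like this over every layer and activation is what makes "many fast, small steps forward" safe: any broken backward pass is caught immediately instead of surfacing as a mysteriously stalled training run.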
17-year-old AI/ML engineer at University of Lagos building smarter models through data optimization and deep learning
As a 17-year-old self-taught developer, one of the most difficult technical problems I faced was improving the accuracy of an image classification model I was building with TensorFlow. The model was underperforming due to data imbalance and overfitting. To solve it, I cleaned and restructured the dataset, applied data augmentation, tuned hyperparameters (learning rate, batch size, epochs), added dropout layers to reduce overfitting, and compared different architectures to select the most efficient one. This process helped me improve the model's accuracy significantly while reducing training time. It also strengthened my debugging skills and understanding of model behavior.
AI/ML researcher at VIT who cut their teeth on Rust compiler internals at GSoC
While working on the Rust compiler during Google Summer of Code, I had to spawn separate processes and talk to them through FFI. At the time, I didn't understand the computer hierarchy well and found it mind-bending. So after the GSoC project, I created another project called typ-browser, with its core written in Rust and its UI in SwiftUI, similar to Ghostty. This was a deliberate way to practice and understand FFI and process communication.
AI/ML researcher at Nigeria Maritime University applying structured problem-solving from wireless power to machine learning systems
During my wireless power transmission project, I initially couldn't achieve efficient energy transfer because the coils weren't resonating at the same frequency. I solved it by recalculating circuit parameters, redesigning the coils, and running iterative tests until efficiency improved. It taught me structured troubleshooting.
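For reference, the resonance condition behind that recalculation: a coil with inductance $L$ and capacitance $C$ resonates at

```latex
f_0 = \frac{1}{2\pi\sqrt{LC}}
```

so efficient transfer requires tuning both coils' $LC$ products until their resonant frequencies match, which is exactly what recalculating the circuit parameters and redesigning the coils accomplishes.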
Marketing strategist who created pre-launch hype for Kimi K2, Gemini 3, and GPT-5 using zero-budget methods
AI companies spend millions on ad campaigns and still fail; I developed my own method to fix that. I've tested it for free, creating pre-launch hype for Kimi K2, Gemini 3, and GPT-5 (obviously unpaid), and it was successful.
CS @ University of Minnesota | AI/ML researcher who sees every problem as a unique learning opportunity
I don't have a specific problem that's "huge enough" to mention, but I believe each problem I come across is unique in its own way, and I always grow a step after solving it. :)
PhD @ UCD | Building self-supervised speech models that work in low-resource languages
My research is focused on self-supervised speech representations for low-resource speech models. I am a 4th year PhD student at University College Dublin.
Gannon hardware engineer | Built autonomous vehicles to monitor electric power grids at scale
I built autonomous vehicles designed to observe electric power grids.
UCC student engineering AI/ML systems with production-ready deployment experience
CS @ UT Dallas | Published IEEE research optimizing redundancy vs deduplication in distributed systems
My main work was on the optimization trade-off between redundancy and deduplication: https://ieeexplore.ieee.org/abstract/document/10454894 I have also worked on various other research problems: https://scholar.google.com/citations?user=N30jT7EAAAAJ&hl=en
CS @ Khwaja Moinuddin Chishti Language University | Modernizing legacy codebases with AI-assisted tooling
Updating a two-to-three-year-old repository with the latest documentation and updates. I achieved it with AI-assisted coding using the Context7 MCP.
Hunter College student mastering ML fundamentals through Calc, Stats, and Python—ready to build production systems
I haven't really solved that many technical problems yet, but I am eager to learn and help as much as I can, even if that means getting coffee for the team! I know other candidates will be stronger, but I will do whatever it takes. Paid or unpaid, I would just like a chance. So far I have taken Calc 1 and 2, Stats 213, Matrix Algebra, Intro to Python, C++, and Computer Architecture.
CS @ IIT BHU Varanasi | Building AI text editors with precision selection tracking in React
I was building an AI assignment for a text editor: the problem was letting the AI work on only a selected section of text. Tracking the selection took a while; thankfully, Tiptap's React bindings had something to help.
CS @ Faculty of Sciences of Tunis | Modernizing legacy SCORM packages from 2004 for today's browsers
Making SCORM packages from 2004 work on modern browsers
CS @ Vivekananda Global | Backend & AI/ML engineer who debugged AI-generated code at scale
Honestly, the most technical problem I've faced was debugging AI-written code during a project. When I took help from AI, it sometimes changed my files without my noticing a log of what it did, which made the resulting bugs hard to find and fix. I solved it by watching closely how and where the AI made changes, and by instructing it to output a full log at the end showing every change it made.
CS @ University of Michigan | Building AI agents with shared context across Instagram, Discord, iMessage & web platforms
One of the most difficult problems I'm working on is developing AI agents with shared context across different platforms: Instagram, Discord, iMessage, internet forums, and more.
CS @ University of Memphis | Building AI lesson planner for legacy ed-tech platform with messy WordPress/PHP stack
The biggest challenge I faced was during my capstone project, where I am developing an AI lesson planner for a client's legacy educational platform. The site used an outdated stack (WordPress, PHP) and a poorly structured database, and the client lacked a clear feature roadmap. I took initiative to structure the project by establishing biweekly design meetings to break down requirements and implementing a Kanban board to track progress. I am currently executing this plan.
CS @ IIT Madras | Uncovered cyclicity patterns across Indian economy sectors using ML-driven GDP sensitivity analysis
The hardest part of my first project was finding the right kind of data. I eventually found data on different sectors of the Indian economy, which let me uncover the cyclicity and GDP sensitivity of each sector.
Frontend dev at Royal College building full-stack apps who debugged complex third-party API integrations to production
In one of my recent projects, I encountered a major roadblock while integrating a third-party API. Initially, I couldn't get the API to respond at all, and once I established a connection, the calls were failing due to incorrect parameter mapping.
CS @ PES University | Debugged erratic real-time AI system integrating frontend, backend & ML at scale
I was working on a project that required real-time communication between the frontend, backend, and an ML/logic component. When tested separately, each module functioned flawlessly. However, after integration, the system displayed erratic behavior, including data inconsistencies, slow responses, and sporadic, difficult-to-replicate failures.
Cummins College AI/ML intern who debugged 100k+ row fintech pipeline by ditching LLMs for concat logic
In June 2025, I was working as a Gen AI intern at a startup and was given a fintech project involving CSV files with huge datasets. The task was to form sentences from rows of CSV data (e.g., "Ishwari Shekade, age 18, living in Pune"). Having just started as a Gen AI intern, I was fascinated with LLMs and decided to use one for the task. I wrote the code and tested it on a small slice of the rows to gauge its effectiveness, but because the data was huge, it took far too long and my laptop overheated. I tried many optimizations, but nothing worked and I was stuck. I discussed it with my co-intern, but we couldn't reach anything conclusive. I then explored alternatives to LLMs and came across fuzzy-logic libraries, but learned that companies already use those and they aren't very accurate, which is why the project existed in the first place. Finally, I tried a simpler approach: plain Python string operations and concatenation, and it worked, producing exactly the expected output. That slight change in approach led to significant results. This, to me, was the most difficult technical problem I've faced, and I managed to solve it eventually.
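The simple approach that won can be sketched like this (the column names are guessed from the example sentence; the real schema isn't shown):

```python
import csv
import io

def rows_to_sentences(csv_text: str) -> list[str]:
    """Turn each CSV row into a templated sentence with plain string
    formatting: no LLM involved, so it handles huge files in seconds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{row['name']}, age {row['age']}, living in {row['city']}"
        for row in reader
    ]

data = "name,age,city\nIshwari Shekade,18,Pune\n"
print(rows_to_sentences(data))  # ['Ishwari Shekade, age 18, living in Pune']
```

For a file on disk, the same function works with `open(path).read()`, or `csv.DictReader` can stream the file directly so memory stays flat even at 100k+ rows.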
CS @ Federal University of Technology Ikot Abasi | Building ML-powered attendance systems for live classroom deployment
Trying to build a model for my department to submit attendance during classes.
HSE Moscow data scientist who vibe-coded an entire master's project in Python and crushed it with top grades
I didn't know Python properly, so I vibe-coded my whole master's project and passed with an excellent grade.
AI/ML researcher at University of Pavia diving into discrete geometry for robotics applications
Needed to study convex polyhedra (discrete geometry) for a side project and had to read difficult math papers.
AI/ML researcher at Georgia Tech building audio ML models that decode F1 telemetry from engine sounds
The task of predicting the speed, engine RPM, gear, and throttle% of a Formula 1 car by listening to its engine audio! Here is my solution demo: https://www.youtube.com/watch?v=ZsDxqnzAOLk Check out my other Audio ML projects here: https://govindamadhava.dev/
Banking developer at BITS Pilani solving complex stored procedure bugs in production systems
One difficult technical issue I faced was in a banking application where a specific scenario was not working because some logic was missing in a stored procedure. I analyzed the requirement, reviewed the stored procedure, identified the missing condition, and added the correct logic. After testing it in different scenarios, the issue was resolved successfully.
Remote AI/ML engineer at Orange Business tackling stable diffusion image generation challenges
Working on Stable Diffusion image generation; I haven't found the solution yet.
AI/ML researcher at University of Alberta tackling representation learning challenges in offline RL
Poor representation learning in low-coverage offline RL
Wayne County CC explorer demystifying multi-model deployment through trial, error, and persistence
Approaching the task of running multiple models on the same hardware without having a clue what I was doing. Consistently improving my strategies and learning from the failed ones led me to swap models in and out like CDs. This has changed my workflow enormously, and I consider it solved.
AI/ML researcher at IIIT Delhi leading teams through large-scale experiments and ICML submissions
I recently led a research project that was submitted to ICML 2026. It was a lot of fun: I managed a team of four people, organizing and dividing the work, and ran and calibrated over 1,000 experiments.
McMaster AI/ML engineer who learned React the hard way at hackathons before AI could help
The most difficult technical problem I faced was probably going in blind with React at one of my first hackathons. AI wasn't at the level it is now, and I entered a vicious circle where the AI couldn't solve the problem at all; it just produced more errors and hallucinated heavily. I tried the documentation but couldn't make heads or tails of it, and the mentors I asked had no clue either. It was one of the most stressful moments of my life, and I had to use some workarounds to get it half-working. It's still the hardest hackathon I've ever participated in. I later learned how to solve the error, and now I never run into the same issue again. :)
Data Engineer @ Rutgers building AI-powered web apps with vibe coding and rock-solid backend infrastructure
Currently building a web application for a project, coming from a data engineering background. I only know data movement and workflow building, so I am using AI and vibe coding for the application while building the backend deployment and the pipeline integration on my own.