Inspiration

As software engineers, we know how important good documentation is, but writing quality docstrings is tedious and time-consuming. We wanted to automate this critical but boring task. Undocumented code is especially hard for new developers to understand.

What it does

DocPilot integrates with GitHub to automatically generate docstrings when functions are added or updated. It analyzes each function's signature and body using Claude to produce high-quality, informative docstrings.
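
As a rough illustration of the "signature and body" analysis, here is a minimal sketch using Python's standard `ast` module. The function name `describe_functions` and the dictionary layout are hypothetical choices for this example, not DocPilot's actual code, and the sketch assumes Python 3.9 or later (for `ast.unparse`).

```python
import ast


def describe_functions(source: str) -> list[dict]:
    """Collect basic facts about every function defined in `source`."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append({
                "name": node.name,
                # Regular positional-or-keyword parameters only.
                "args": [a.arg for a in node.args.args],
                # Return annotation as source text, or None if absent.
                "returns": ast.unparse(node.returns) if node.returns else None,
                # True when the function already has a docstring.
                "has_docstring": ast.get_docstring(node) is not None,
                # Exact source text of the function, for sending to a model.
                "body": ast.get_source_segment(source, node),
            })
    return found


if __name__ == "__main__":
    sample = "def add(x: int, y: int = 0) -> int:\n    return x + y\n"
    for info in describe_functions(sample):
        print(info["name"], info["args"], info["returns"])
```

Functions where `has_docstring` is false would be the natural candidates to pass on for docstring generation.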

How we built it

The core logic is written in Python, with Claude's API providing the NLP capabilities. We have a GitHub app and bot that receives webhook events and calls the docstring generation code. The bot then creates a pull request with the new docstrings.
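
The webhook step might look roughly like the sketch below. It assumes the bot receives GitHub `push` events, whose payload lists `added` and `modified` paths per commit, and that deliveries are signed with a shared secret in the `X-Hub-Signature-256` header. The helper names are hypothetical, and the web framework and pull request creation are left out.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check a GitHub-style `sha256=<hex>` HMAC signature of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header)


def changed_python_files(push_payload: dict) -> list[str]:
    """Return the sorted set of .py paths added or modified in a push."""
    paths = set()
    for commit in push_payload.get("commits", []):
        for key in ("added", "modified"):
            paths.update(p for p in commit.get(key, []) if p.endswith(".py"))
    return sorted(paths)
```

Only the files returned by `changed_python_files` would need to be fetched and scanned for functions, which keeps the work per event small.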

Challenges we ran into

  • Parsing complex Python function signatures and structures such as nested functions, variable arguments, and type annotations. We built custom parsers to handle diverse edge cases.
  • Accounting for different coding conventions and styles - our logic had to be flexible enough to analyze different naming schemes and code organization.
  • Generating coherent docstring text that flows well and explains technical concepts in simple terms. This needed iterative refinement of our NLP templates.
  • Performance optimization was required to meet GitHub webhook timeouts - we implemented caching, concurrent processing and other improvements.
  • Testing rigorously across different Python projects revealed edge cases that needed exception handling.
  • Building a great user experience within the constraints of pull requests and the GitHub UI.
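
The caching and concurrent processing mentioned above could be sketched as follows. This is an assumption about one possible approach, not the project's actual implementation: results are cached under a hash of the function's source, and a thread pool handles several functions at once. `fake_generate` is a hypothetical stand-in for the model call.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# In-memory cache: SHA-256 of function source -> generated docstring.
_cache: dict[str, str] = {}


def fake_generate(source: str) -> str:
    """Hypothetical placeholder for the call to the model."""
    return f"Docstring for {len(source)} characters of code."


def cached_docstring(source: str, generate: Callable[[str], str] = fake_generate) -> str:
    """Return a cached docstring, generating it only for unseen source."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = generate(source)
    return _cache[key]


def docstrings_for(sources: list[str], generate: Callable[[str], str] = fake_generate) -> list[str]:
    """Generate docstrings for many functions concurrently, in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda s: cached_docstring(s, generate), sources))
```

Identical sources submitted in the same batch may still be generated more than once, since two threads can miss the cache at the same moment; a lock per key would prevent that at the cost of extra complexity.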

Accomplishments we're proud of

  • The ability to handle a wide variety of Python functions, including functions with complex signatures, variable arguments, default parameters, and nested functions, and generate consistent PEP 257-compliant docstrings for all of them.
  • Smooth integration with GitHub via webhook events that automatically trigger docstring generation on push events and pull requests. This makes the system seamless and unobtrusive for developers using DocPilot.
  • Docstrings contain a rich set of useful details extracted from the function - descriptions summarized from the function body, data types for arguments and return values, exceptions raised, sample usage code, and more.
  • The docstring content is written in a natural language style with consistent formatting and structure following best practices. This increases readability and makes the documentation easy to understand.
  • Careful handling of edge cases, including functions with no parameters or return values, optional arguments, and multiple return types.
  • A flexible configuration system allows customization of docstring templates to match project conventions and enables generation of company- or product-specific examples and content.
  • Robust testing framework with high coverage ensures quality and reliability of docstring generation across diverse codebases.
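
To make the signature cases above concrete (default parameters, variable arguments, keyword-only arguments), here is a minimal sketch that turns a parsed function into lines for an "Args" section. The function name `argument_lines` and the output format are hypothetical, and the `TODO` text marks where a generated description would go.

```python
import ast


def argument_lines(func: ast.FunctionDef) -> list[str]:
    """Build one 'Args' line per parameter, marking those with defaults."""
    a = func.args
    lines = []

    def entry(arg: ast.arg, optional: bool) -> str:
        parts = []
        if arg.annotation is not None:
            parts.append(ast.unparse(arg.annotation))
        if optional:
            parts.append("optional")
        suffix = f" ({', '.join(parts)})" if parts else ""
        return f"{arg.arg}{suffix}: TODO"

    positional = a.posonlyargs + a.args
    # Defaults belong to the LAST len(a.defaults) positional parameters.
    first_default = len(positional) - len(a.defaults)
    for i, arg in enumerate(positional):
        lines.append(entry(arg, i >= first_default))
    if a.vararg:
        lines.append(f"*{a.vararg.arg}: TODO")
    # kw_defaults holds None for keyword-only parameters without a default.
    for arg, default in zip(a.kwonlyargs, a.kw_defaults):
        lines.append(entry(arg, default is not None))
    if a.kwarg:
        lines.append(f"**{a.kwarg.arg}: TODO")
    return lines


if __name__ == "__main__":
    tree = ast.parse("def f(a, b: int = 1, *args, key=None, **kw): pass")
    print("\n".join(argument_lines(tree.body[0])))
```

A function with no parameters simply yields an empty list, so the "Args" section can be omitted in that case.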

What we learned

  • The value of good documentation and the difficulty developers face maintaining it.
  • Techniques for analyzing code structures and extracting key information.
  • Challenges of generating natural language from technical concepts.

What's next for DocPilot

  • Support for additional languages like JavaScript, Java, and Go.
  • Customization of docstring templates.
  • Auto-updating outdated docstrings when functions are modified.
  • AI-assisted writing of longer form documentation.
