from Hacker News

Helium: Lighter Web Automation with Python

by mherrmann on 12/11/24, 12:11 PM with 49 comments

  • by hugs on 12/12/24, 12:00 AM

    Selenium project founder here. (Hi!) Thanks for all your work on this project. Lots of negativity around here these days, but just wanted to say thanks. The functional style of Helium's API reminds me a lot of Selenium's original API when it was 100% JavaScript (aka Selenium 1 aka Selenium Core) back in 2004.

    (Functional style: "method(thing)" vs object oriented style: "thing.method()")

    We mostly abandoned the functional style when we merged with the WebDriver project (aka Selenium 2), but that functional style still lives on in the Selenium IDE record/playback tool.

    That is all to say, there are fans of many different styles for automation APIs. No single API will please everyone. (But I personally like the simpler, functional style, fwiw!)

    Side-note: This is also why I'm a fan of the Nim programming language. "method(thing)" and "thing.method" are supported syntax for literally the same thing. For others new to the idea, the fancy term for this is "Uniform Function Call Syntax".
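
A minimal sketch of the two call styles contrasted in the comment above. The `Button` class and the free `click` function are hypothetical stand-ins, not Helium's or Selenium's real API. Python has no Uniform Function Call Syntax, but calling a method through its class (`Button.click(save)`) comes close to the "method(thing)" form.

```python
# Illustrative stand-ins only; not Helium's or Selenium's real classes.
class Button:
    def __init__(self, label):
        self.label = label
        self.clicks = 0

    # Object-oriented style: thing.method()
    def click(self):
        self.clicks += 1
        return self.clicks


# Functional style: method(thing)
def click(button):
    return button.click()


save = Button("Save")
save.click()        # object-oriented call
click(save)         # functional call, same effect
Button.click(save)  # the method looked up on the class and called as a plain function
```

All three calls reach the same method; only the spelling at the call site differs.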

  • by languagehacker on 12/11/24, 2:49 PM

    Importing * is discouraged by most Python linters and best-practice docs. You can always "import helium as h" if you're looking to type less.

    This looks largely like the common workarounds that most people end up writing when doing Python-based browser automation. Most of the time, we accept that those capabilities aren't there by default because they are not explicit enough and can result in bugs and undefined behavior even when the elements that we expect to be on the page are actually there.

    Given the adage "explicit is better than implicit", I worry that a layer like this might create more trouble than it's worth for the sake of readability. When we get into the nitty-gritty of browser automation, it might just make it harder to debug than going straight to Selenium or Playwright.
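
One way to see the star-import concern from the comment above, sketched with the standard library rather than Helium: `math` and `cmath` stand in for any two modules that export the same name. The `exec` into a dict is only there so the merged namespace can be inspected; it is not something you would write in a real script.

```python
import math as m    # aliased import, in the spirit of "import helium as h"
import cmath as cm

# Two star imports into one namespace: the later module silently wins
# for any name both modules define (here, sqrt).
star_ns = {}
exec("from math import *\nfrom cmath import *", star_ns)
shadowed_sqrt = star_ns["sqrt"]

# The bare name now refers to cmath's version, not math's.
sqrt_is_from_cmath = shadowed_sqrt is cm.sqrt

# With aliases, each call site says which module it means.
real_root = m.sqrt(4)
```

With the aliased form, a reader (or linter) can always tell where `sqrt` came from.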

  • by nkrisc on 12/11/24, 3:10 PM

    Having done some ad-hoc, temporary automation with Selenium in the past (to help fellow, less technically-inclined designers) I wish I had this at the time.

    Looks like a nice, almost natural language-like API around what is otherwise a quite cumbersome API.

  • by wokwokwok on 12/11/24, 3:50 PM

    How can a wrapper around selenium be lighter than it?

    A wrapper around an API is by definition heavier (more code, more functions) than using the lower level api.

    It’s not using fewer resources.

    It’s not faster (it has implicit waiting).

    It’s not less code; it’s literally a superset of selenium?

    Feels like a “selenium framework” is more accurate than lightweight web automation?

    Anyway, there’s no fixing automation tests with fancy APIs.

    No matter what you try to do, if people are only interested in writing quick dirty scripts, you’re doomed to a pile of stupid spaghetti no matter what system or framework you have.

    If you want sustainable automation, you have to do Real Software Engineering and write actual composable modules; and you can do that in anything, even raw selenium.

    So… I’d be more interested if this was pitched as “composable lego for building automation” …

    …but, personally, as it stands all I can really see is “makes easy things easier with sensible defaults”.

    That’s nice for getting started; but getting started is not the problem with automation tests.

    It’s maintaining them.
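
On the implicit-waiting point raised in the comment above: a generic polling helper, sketched under the assumption that implicit waiting amounts to "retry a check until it passes or a timeout expires". `poll_until` and `element_present` are hypothetical names for illustration and are not taken from Helium's or Selenium's source.

```python
import time


def poll_until(condition, timeout_secs=10.0, interval_secs=0.5):
    """Call condition() repeatedly until it is truthy, else raise TimeoutError."""
    deadline = time.monotonic() + timeout_secs
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_secs}s")
        time.sleep(interval_secs)


# Hypothetical page state: the "element" only shows up on the third check.
polls = []


def element_present():
    polls.append(time.monotonic())
    return len(polls) >= 3


found = poll_until(element_present, timeout_secs=2.0, interval_secs=0.01)
```

The trade-off is visible in the loop: a script that waits this way tolerates slow pages, but a missing element costs the full timeout before the failure surfaces.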

  • by wslh on 12/11/24, 2:11 PM

    How does it compare with the "usual suspects"? I mean Playwright, Selenium, Cypress, and Puppeteer.

  • by fermigier on 12/11/24, 1:30 PM

    "We shut down the company at the end of 2019 and I felt it would be a shame if Helium simply disappeared from the face of the earth."

    I appreciate the effort. Thank you M. Herrmann.

  • by giis on 12/11/24, 4:57 PM

    Looks nice. Is it possible to call start_chrome() with a specific Chrome profile name, or to reuse an existing open Firefox/Chrome session and open a new tab at a specific domain?

  • by bilater on 12/11/24, 6:16 PM

    Nice - I can see some cool agentic flows created using this. A thing I want to look into is creating a sandbox instance (Ubuntu?) and letting an agent do its thing. Could be collecting data or answering questions and I can pull up the window to check in from time to time. It'll be like having an assistant.

  • by bryanrasmussen on 12/11/24, 7:34 PM

    How easy is it to detect that this is automation as opposed to a real user? I suppose probably pretty easy, so I'm not sure it is useful for automating things I do on the web every day: I would be running the risk of losing access to those things if the sites determined I was automating them.

  • by edm0nd on 12/11/24, 3:58 PM

    Very neat!

    Roll in a captcha-solving service like DeathByCaptcha or AntiCaptcha and you've got yourself a quick and easy script that can do anything on any website regardless of captchas.

  • by quickvi on 12/11/24, 3:00 PM

    for lightweight automation outside the browser:

    https://github.com/elyase/screenium

  • by Havoc on 12/12/24, 12:35 PM

    That looks useful. How does it know which box is the user field? Does it just read the label and assume the field is the one below it or to its right?

  • by slt2021 on 12/11/24, 4:42 PM

    Thank you for sharing this project, this is really good

  • by bg24 on 12/11/24, 4:34 PM

    Nice work! I looked at the cheatsheet, and it is not obvious to me how to go through two-factor authentication during login.

  • by __mharrison__ on 12/11/24, 3:44 PM

    Thanks for posting. All this AI has been interested in scraping personal sites.

  • by crazymoka on 12/11/24, 3:31 PM

    Can it be headless?

  • by Byte64 on 12/11/24, 6:57 PM

    This is so cool