from Hacker News

I made a simple geolocation service

by maxko on 7/24/20, 7:17 AM with 176 comments

  • by tastytacos on 7/24/20, 12:56 PM

    I applaud your efforts but it's disingenuous to title this how you "made a geolocation service" when you are just reading a field in someone else's geolocation service.
  • by sudhirj on 7/24/20, 8:50 AM

    Google App Engine had this quite a while ago (almost a decade, I think) and a very generous free tier. Also has approx lat-long, city, state and country.

    https://blip.runway7.net https://github.com/runway7/blip

    I lost track of the consumer apps I've used this on and still haven't received a $1+ bill.

  • by skuhn on 7/24/20, 9:45 AM

    Since most CDNs are already doing GeoIP lookups (for request headers or log entries), you can leverage that to provide the data back in the response body via origin, worker or even CDN edge config.

    Programmatically populating the response body, as in the Cloudflare worker example from the post, is better than going to the origin just to echo some headers back in the response. To me, something like Fastly's VCL config language is even simpler. It directly executes on every CDN edge node worldwide upon request.

    For example, I just whipped this up on Fastly using VCL. It returns GeoIP as json data for your IP at the root path:

    http://geo.zombe.es

    Or if you want a particular IP, just append it to the path:

    http://geo.zombe.es/2a04:4e42:600::313

    You could do the same via query params, headers, etc. Have URL endpoints that only return some of the data, and so forth.
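
    A minimal sketch of that routing rule (own IP at the root path, explicit IP when one is appended), written in Python rather than VCL. The function name `target_ip` is hypothetical and this is not the actual Fastly configuration:

```python
import ipaddress
from typing import Optional


def target_ip(path: str, client_ip: str) -> Optional[str]:
    """Pick the IP to geolocate: the one in the path, else the caller's own."""
    candidate = path.strip("/")
    if not candidate:
        # Root path: look up the requester's address.
        return client_ip
    try:
        # ip_address() accepts both IPv4 and IPv6 literals.
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Not an IP literal; a real service might answer 400 here.
        return None
```

    For example, `target_ip("/2a04:4e42:600::313", "203.0.113.7")` selects the address from the path, while `target_ip("/", "203.0.113.7")` falls back to the caller.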

    The VCL syntax gets a little gross when you handle quoting strings and assembling json and testing if the string is empty, but it gets the job done.

    Of course what you might want from GeoIP data may not be what you get. It's really kind of a useful kludge that gets treated sometimes as a panacea.

    This dataset right now thinks that I'm about 5 miles east of my location, but when subnets are repurposed it could be much more significant. And the data sources are always changing, so who knows what it will think tomorrow.

  • by jasonlingx on 7/24/20, 8:30 AM

    This one I made a couple of years ago (and haven’t checked in years) is still running for free thanks to Heroku and Cloudflare: https://github.com/jlxw/geoip
  • by itsjloh on 7/24/20, 8:39 AM

    I (like many others it seems!) have also built a geolocation service, however mine is built on top of MaxMind's DBs that are mentioned in the post. It's on a few boxes running OpenResty and now handles 130m+ requests/day. Was really fun to build!

    https://github.com/jloh/geojs https://www.geojs.io/open/

  • by waffl on 7/24/20, 8:55 AM

    I had to tackle this problem as well for a small project and found that I could just send a GET request to https://www.cloudflare.com/cdn-cgi/trace and parse the response with a simple regex: /loc=(\w*)$/gm

    The response time averages 10ms and is free, granted I'm not sure if there is a ToS or anything attached to this endpoint, I've only found non-conclusive discussion here https://community.cloudflare.com/t/what-are-the-terms-of-use...
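
    A rough Python equivalent of the regex approach. The `SAMPLE` body is illustrative, not captured output, and the leading `^` is an addition to the pattern from the comment so that only a line starting with `loc=` matches:

```python
import re
import urllib.request
from typing import Optional

TRACE_URL = "https://www.cloudflare.com/cdn-cgi/trace"

# Same idea as /loc=(\w*)$/gm, anchored to the start of the line as well.
LOC_PATTERN = re.compile(r"^loc=(\w*)$", re.MULTILINE)


def parse_loc(trace_body: str) -> Optional[str]:
    """Extract the country code from a key=value trace body."""
    match = LOC_PATTERN.search(trace_body)
    return match.group(1) if match else None


def fetch_loc(timeout: float = 5.0) -> Optional[str]:
    """Fetch the trace endpoint and parse it (performs a network call)."""
    with urllib.request.urlopen(TRACE_URL, timeout=timeout) as response:
        return parse_loc(response.read().decode("utf-8"))


# Illustrative body only; the real response has more fields.
SAMPLE = "h=www.cloudflare.com\nip=203.0.113.7\nloc=DE\ntls=TLSv1.3\n"
```

    `parse_loc(SAMPLE)` picks out the `loc` line and ignores the rest.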

  • by speedgoose on 7/24/20, 8:17 AM

    Well that's a bit more than 2 requests per second on average. Sure you may have some bursts once in a while, but nothing a 3€ VM can't manage IMHO.
  • by ignoramous on 7/24/20, 2:15 PM

    Full marks to Cloudflare for engineering a radically effective and simple alternative to AWS Lambda (+API Gateway): It is simply a fantastic Serverless offering for low-latency network-bound workloads. For paying customers, they even throw in freemium access to their globally distributed KV store and a forever-free Zonal Cache to sweeten the already good enough deal. That said, good luck with their Support team in case you discover undocumented limits (in production) like these [0].

    I got sent a (frustratingly incorrect) bot reply to a ticket and a reminder that Enterprise / Business / Pro customers are priority (in that order) even though I pay for Workers. It has been an uphill battle to get someone to take a look at the ticket so far. Thankfully, we haven't gone to production yet, but as a consequence, we now need to plan mitigations for scenarios where Workers blacks out our traffic (but Support can't be of immediate help because "free customer").

    [0] https://community.cloudflare.com/t/workers-and-sub-requests/...

  • by rightbyte on 7/24/20, 11:21 AM

    The cheapest alternative has to be to ask the users where they are and store it on the user device? I never understood the need for geoip unless you ship spyware.

    If the exact location is important geoip is not accurate enough anyway. Forwarding to regional sites automatically is just annoying when it doesn't work properly or someone is traveling abroad.

  • by walrus01 on 7/24/20, 8:56 AM

    since this article mentions maxmind, this reminded me of when geolocation by IP address goes horribly wrong, and non-technical persons interpret the results as something to be relied upon as factual:

    https://arstechnica.com/tech-policy/2016/08/kansas-couple-su...

    https://www.theguardian.com/technology/2016/aug/09/maxmind-m...

    https://mashable.com/2016/08/11/ip-addresses-kansas/

    ISP perspective here: Geolocation by granular /24 to /20 sized block of ipv4 space is often wildly inaccurate on a regional basis. It's entirely possible for the ARIN registration (used by maxmind) to be a street address in Seattle, but serving end user ISP customers a 4.5 hour drive away in a far eastern corner of WA state.
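
    To put the block sizes in perspective, a quick calculation of how many IPv4 addresses share a single registration record at each prefix length (the helper name is made up for illustration):

```python
def block_size(prefix_len: int) -> int:
    """Number of IPv4 addresses covered by a block with this prefix length."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_len)


for prefix in (24, 20):
    print(f"/{prefix}: {block_size(prefix)} addresses")
```

    A /24 covers 256 addresses and a /20 covers 4,096, all of which can end up pinned to the same registered street address.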

  • by efesak on 7/24/20, 9:07 AM

    Also please see https://github.com/analogic/ipgeo, a daily updated IP/country database with an open license (shameless ad)
  • by wingi on 7/24/20, 8:24 AM

    I had a geo-location service running on Google App Engine, for free. After tuning the caching, the daily traffic of 400,000 requests/day fit within the free tier.

    https://www.united-coders.com/christian-harms/detailed-perfo...

  • by fs2 on 7/24/20, 9:01 AM

    I've used the free MaxMind database for a while, but for the last year I've been using iplist.cc. It supports IPv4 and IPv6, and shows whether an IP is Tor or spam, which ASN it belongs to, and a lot more.

    It's also free and fast but with services like these I always wonder how long they manage to stay free.

  • by bellwether on 7/24/20, 8:11 AM

    Love the ingenuity and thank you for the performance comparison on Cloudflare Workers vs AWS Lambda. I personally wouldn’t consider what you built a Geolocation service, but glad it solved your use case!
  • by gitgud on 7/24/20, 9:42 AM

    Side note:

    6 million requests per month is only 2.2 requests per second; a Raspberry Pi could do this (technically).
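
    The arithmetic, as a quick check. The average depends on the assumed month length and says nothing about bursts:

```python
def requests_per_second(requests_per_month: int, days_in_month: int = 30) -> float:
    """Average request rate, assuming traffic is spread evenly over the month."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    return requests_per_month / seconds_in_month


for days in (30, 31):
    rate = requests_per_second(6_000_000, days)
    print(f"{days}-day month: {rate:.2f} requests/second")
```

    That works out to roughly 2.2 to 2.3 requests per second on average, depending on whether the month has 31 or 30 days.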

  • by jamesponddotco on 7/24/20, 4:03 PM

    I built a simple API using pure NGINX to get the IP address of a client, for times when I needed to ask a customer for his IP address; it was easier sending them a URL for a service I control, than explaining how they could get that information another way.

    Now I have been thinking about opening it up for more people because while there are a variety of these services out there, one more does not hurt — it already exists, and works, so who knows, maybe more people would like to use it. The code is open, and access logs go to /dev/null; I could probably add a read-only SSH user for people to confirm that for themselves.

    Geolocation could easily be added to it, but then comes my question: is there any use-case for geolocation APIs that does not involve tracking users for shitty purposes?

    I was excited about adding this feature because I am using pure NGINX for it, and it was a fun learning experience, but I asked myself that question when I started writing the documentation for the website, and I still do not have an answer. Marketing material for other APIs that offer geolocation usually have user tracking as a selling point.

    Personally, I have no use for geolocation, and if all use-cases involve tracking users without consent and breaking their privacy, I want no part in that.

  • by whatl3y on 7/24/20, 8:34 AM

    So this finds the country based on some cloudflare-populated info of an incoming request (which sounds like it solved OP’s problem which is good), but if you want to use MaxMind’s database to find the country of any public IP I built the following a few months back:

    https://github.com/Risk3sixty-Labs/geoapi

  • by sradman on 7/24/20, 12:13 PM

    TECHNIQUE: use the proprietary HTTP Request headers available through CDN/Cloud providers like Cloudflare Workers' cf-country [1], Amazon CloudFront's CloudFront-Viewer-Country [2], and Google App Engine's X-Appengine-Country/-Region/-City/... [3] to get client Geolocation data.

    [1] https://developers.cloudflare.com/workers/reference/apis/req...

    [2] https://docs.aws.amazon.com/AmazonCloudFront/latest/Develope...

    [3] https://cloud.google.com/appengine/docs/standard/go/referenc...
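
    A minimal sketch of the technique as a provider-agnostic helper. Assumptions: `CF-IPCountry` is used as the Cloudflare header name here (inside a Worker the same value is exposed as `request.cf.country`), and `XX` / `ZZ` are treated as "unknown" placeholders. Check each provider's documentation before relying on either:

```python
from typing import Mapping, Optional

# Header names checked in order; the Cloudflare entry is an assumption,
# the other two are the names given in the comment above.
COUNTRY_HEADERS = (
    "CF-IPCountry",
    "CloudFront-Viewer-Country",
    "X-Appengine-Country",
)

# Placeholder values assumed to mean "country unknown".
UNKNOWN_VALUES = {"XX", "ZZ"}


def country_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return an upper-case country code from CDN-populated headers, if any."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in COUNTRY_HEADERS:
        value = lowered.get(name.lower(), "").strip().upper()
        if value and value not in UNKNOWN_VALUES:
            return value
    return None
```

    For example, `country_from_headers({"cloudfront-viewer-country": "us"})` yields `"US"`, and a request with none of these headers yields `None`.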

  • by ignoramous on 7/24/20, 6:05 PM

    One of the main challenges of building on Serverless platforms is rate-limiting.

    There's nothing stopping a script-kiddie from thundering away at the Serverless endpoint, resulting in an unexpectedly high bill [0].

    As for Workers specifically, Cloudflare's rate-limiting plan makes the whole thing 10x more expensive at $10 for 2,000,000 good-requests [1] + $1 for 2,000,000 Workers requests. I don't think other cloud providers fare any differently.

    [0] https://community.cloudflare.com/t/how-to-protect-cloudflare...

    [1] https://support.cloudflare.com/hc/en-us/articles/11500027224...

  • by contravariant on 7/24/20, 1:07 PM

    I'm a bit confused. This looks like a few hundred lines of code to read a value from a hardcoded dictionary. Even as a proof of concept it would be more sensible to just add two numbers or something, at least that gives the impression that you could also make the API do something useful.
  • by EE84M3i on 7/24/20, 8:43 AM

    What are the runtime limitations that prevent the maxmind database from working inside cloudflare workers?
  • by scottndecker on 7/24/20, 4:43 PM

    MaxMind data is accurate to the zip code level only about 30-60% of the time (as compared with what Google's Geolocation service will provide, which is based off more data points than just IP address). Only use MaxMind if you're looking for region or country level accuracy.
  • by mcculley on 7/24/20, 10:57 AM

    Regarding latency of AWS Lambda:

    > on average the response took somewhere between from 200ms to 500ms

    I'm getting latency of 66ms to 126ms with some simple Java code running on AWS Lambda using provisioned concurrency. I find the latency is just fine for most use cases.

  • by csunbird on 7/24/20, 8:40 AM

    Interesting, so basically you are making Cloudflare's geoip service public for free.
  • by grizzles on 7/24/20, 8:37 AM

    How does MaxMind prevent someone from releasing an open source version of their database? If you are about to answer "copyright", remember - you can't copyright facts. This has been upheld in the courts many times.
  • by ColdHeat on 7/24/20, 2:25 PM

    I've been trying to build open source MaxMind alternatives for a bit. (Mostly so I could distribute them with my open source projects)

    IP to country is fairly easy and I open sourced all the scripts and the database itself [0].

    But IP to city is much harder, I'm not actually sure it's viable for anyone to do that without relying on some other 3rd party service.

    I'd be very interested to hear if anyone knows how to pull that off in an open sourceable manner.

    [0] https://github.com/geoacumen/geoacumen-country

  • by Nextgrid on 7/24/20, 8:56 AM

    Why would you need to depend on a third-party service and network being available for such a basic task? MaxMind provides their GeoLite database for free and it's extremely easy to embed it in your app.
  • by GiantSully on 7/24/20, 2:59 PM

    Though a little bit off-topic, before the service http://freegeoip.net/json/ ceased, I used it a lot in testing. I built one myself with the simplified source code from https://github.com/voyagin/freegeoip since I wanted to minimize the response time. BTW, the listed repo isn't mine.
  • by layoutIfNeeded on 7/24/20, 10:06 AM

    Can you make a geolocation service from static files? 2^32 IPv4 addresses with 2 floats per address would take only 34 gigabytes of storage. Put 256 addresses in a given file, and turn the other three octets into folders. E.g. https://example.com/192/168/0.txt would contain the location for addresses 192.168.0.0-192.168.0.255.

    Would this be cheaper than running these services?
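
    A sketch of the storage arithmetic and the path scheme described above, assuming two 32-bit floats (latitude, longitude) per address. The function names are made up for illustration:

```python
import ipaddress
from typing import Tuple

BYTES_PER_ADDRESS = 2 * 4  # two 32-bit floats per address (assumption)


def total_bytes() -> int:
    """Raw storage needed for every IPv4 address."""
    return 2 ** 32 * BYTES_PER_ADDRESS


def static_path(ip: str) -> Tuple[str, int]:
    """Map an IPv4 address to its file path and its entry index in that file."""
    a, b, c, d = ipaddress.IPv4Address(ip).packed
    # First three octets become folders and file name; the last octet
    # selects one of the 256 entries inside the file.
    return f"{a}/{b}/{c}.txt", d


print(total_bytes() / 1e9)  # size in gigabytes
print(static_path("192.168.0.17"))
```

    This gives about 34.4 GB of raw data split across 2^24 (roughly 16.8 million) files, before any filesystem or text-encoding overhead.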

  • by sheeshkebab on 7/24/20, 11:31 AM

    This only works if the geocoder is accessed directly from the browser. It will not work if you need to geocode IP addresses received in other ways on the server side.

    I guess it’s ok for a basic consumer website, although it’s not exactly equivalent to ip geocoder databases/services - they allow passing ip addresses as part of geocoding request.

  • by freelancercv on 7/24/20, 3:08 PM

  • by pereiratr on 7/24/20, 8:21 AM

    Nice! Have you tried AWS Lambda@Edge?
  • by tuananh on 7/24/20, 8:59 AM

  • by fasteo on 7/24/20, 9:21 AM

    Slightly off-topic.

    AFAIK, an unresolved problem is a proper geolocation service - at least for city level resolution - for mobile IP addresses. There are some services in this field (digital element), but they are very unreliable.

  • by stereo on 7/24/20, 9:21 AM

    That country list doesn't have Kosovo (2008) or South Sudan (2011).
  • by ing33k on 7/24/20, 12:23 PM

    doing a geo based redirection was my first golang project that got deployed to prod.

    Nginx + GeoIP2 module sets a HTTP request header and proxies the request to the golang app.

    The go app does a lookup from redis, based on a combination of the country header and some url params, and responds back with a redirect header.

    Hosted it on a t2 medium in EC2 and I have seen it easily handle ~1500 requests / second without any issue.
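
    The lookup step could look roughly like this. It is a Python sketch, not the original Go code: an in-memory dict stands in for Redis, and the header name `X-Country-Code`, the `campaign` parameter and all URLs are hypothetical:

```python
from typing import Mapping, Tuple

# Stand-in for the Redis table: (country, campaign) -> redirect target.
REDIRECTS = {
    ("DE", "summer"): "https://example.com/de/summer",
    ("US", "summer"): "https://example.com/us/summer",
}
DEFAULT_URL = "https://example.com/"


def redirect_for(headers: Mapping[str, str], params: Mapping[str, str]) -> Tuple[int, str]:
    """Return (status, Location) for a request, falling back to a default URL."""
    # The proxy in front is assumed to have set the country header already.
    country = headers.get("X-Country-Code", "").upper()
    campaign = params.get("campaign", "")
    location = REDIRECTS.get((country, campaign), DEFAULT_URL)
    return 302, location
```

    Requests with an unknown country or campaign simply fall through to `DEFAULT_URL`.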

  • by hluska on 7/24/20, 3:44 PM

    All possible criticisms aside, this is really good and useful. Great job friend!!! It's a beautiful day for hacking!! :)
  • by priyaaank on 7/24/20, 11:39 AM

  • by llaolleh on 7/24/20, 6:18 PM

    This title is so misleading. I got to the end of the article and thought - wait this is it?!
  • by sigmonsays on 7/24/20, 1:25 PM

    I read the title and got interested, glad I skimmed. This is very misleading. You didn't "make" a service, you used someone else's.
  • by crispyporkbites on 7/24/20, 8:07 AM

    Are you charging for this?
  • by teknopaul on 7/24/20, 11:44 AM

    That's two per second if you don't have a calculator handy.
  • by PaywallBuster on 7/24/20, 8:09 AM

    tl;dr: uses Cloudflare Workers as an API endpoint and returns the country name based on a header in the CF request.