Data Streams

Bitly NSQ Data Streams allow services or customers to receive real-time traffic for a large number of links via NSQ. Most commonly, customers will use data streams to access a full firehose of clicks to links on their account, but other flavors of data stream are available depending on your needs.

To learn more about the specific data streams available and their pricing, or to get access set up, contact api@bit.ly or your customer success manager.

Data Stream vs HTTP API

Accessing Bitly data over a data stream offers some trade-offs in comparison to our traditional HTTP API. Below are some of the pros and cons of each access method:

Data Stream

  • Per-Click Resolution - With data stream access, you will receive a message describing each click event that your account has access to. This can allow more complex and varied forms of analysis than are possible with the pre-aggregated data returned by our HTTP API.
  • Real-Time Updates - Click events get pushed out over the data stream as fast as possible. While there aren't currently any latency guarantees on our data stream offerings, you will usually receive an event within seconds of a click happening.
  • Raw Data - While we still remove or obfuscate some data to protect the privacy of our users, a number of data points are provided in raw form over our data stream that are otherwise only available in more heavily processed forms in the HTTP API. Examples include user agent strings, Accept-Language headers, and more detailed GeoIP data.

HTTP API

  • Pre-Aggregated Data - Counting at a certain scale can be hard; let us do the hard work for you.
  • Historic Data Availability - Our data streams only provide data as it is collected. Accordingly, to access historic data you must use our HTTP API.
  • Selective Querying - Our data streams are a literal firehose of your data. Every click that we see on your account gets sent to you. This can be very useful but it can also be overwhelming. If you are looking to get data about a specific bitlink or some other smaller subset of data, our HTTP API is likely a better fit.
  • Less Tooling Required - To call our HTTP API, you can use any commonly available HTTP client. To consume our data streams, you need to use an NSQ client, which takes a bit more time to set up and learn.

How Do Bitly Data Streams Work?

Once your access has been provisioned in our systems (email api@bitly.com or your success manager), you can point an NSQ client at our servers and we'll start pushing events to you.

For the nitty-gritty details of how NSQ works, check out the NSQ docs.

The short version is that NSQ is a queue-based messaging system. For each click that we see on one of your bitlinks, we'll send a JSON object that describes the click. If your client disconnects or has trouble keeping up with the stream for some reason, we'll queue messages until you are able to catch up.

Below is an example of a click event message that you may receive over our data streams:

{
  "h": "nbsMyD",
  "g": "baMsDr",
  "l": "chauncey",
  "hh": "my.bsd.co",
  "u": "https://example.com/",
  "r": "http://lm.facebook.com/lsr.php?u=https%3A%2F%2Fexample.com%2F",
  "a": "Mozilla/5.0 (Linux; Android 4.4.3; KFTHWA Build/KTU84M) AppleWebKit/537.36 (KHTML, like Gecko) Silk/49.3.1 like Chrome/49.0.2623.105 Safari/537.36",
  "i": "",
  "t": 1463954895,
  "k": "",
  "nk": 0,
  "hc": 1463750168,
  "_id": "dde88210-97bc-6cbd-738d-1c6a7beb77be",
  "al": "en-US,en;q=0.8",
  "c": "US",
  "tz": "America/New_York",
  "gr": "ME",
  "cy": "Topsham",
  "mc": 500,
  "ll": [
    43.9602,
    -69.9654
  ],
  "pc": "04086"
}
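
As a rough illustration of working with a message like the one above, the sketch below decodes the JSON body and converts the two Unix timestamps (`t` and `hc`) into UTC datetimes. The function name `click_times` is hypothetical, and the sample body is a trimmed copy of the example message.

```python
import json
from datetime import datetime, timezone


def click_times(body):
    """Return (clicked_at, link_created_at) as UTC datetimes for one message body."""
    event = json.loads(body)
    clicked_at = datetime.fromtimestamp(event["t"], tz=timezone.utc)
    created_at = datetime.fromtimestamp(event["hc"], tz=timezone.utc)
    return clicked_at, created_at


# Trimmed copy of the example click event (only the fields used here).
sample = '{"h": "nbsMyD", "t": 1463954895, "hc": 1463750168}'
clicked_at, created_at = click_times(sample)
link_age = clicked_at - created_at  # how old the Bitlink was when it was clicked
print(clicked_at.isoformat(), link_age.total_seconds())
```

Working in UTC avoids ambiguity when events are later grouped by hour or day; the `tz` field in the message, when present, can be used to localize afterwards.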

Data Stream Field Reference

Below is a reference describing the various fields that messages in our data streams may include.

NOTE: Not all of these fields will be available in all data streams and are subject to filtering based on expected use and contract terms.

Key | Name | Description
h | User Hash | Unique hash that we generate for each combination of shortened long URL and login. Useful for referencing specific Bitlinks.
g | Global Hash | Unique hash that we generate and share for all instances of a long URL. Useful for referencing documents/URLs.
l | Login | User who originally shortened the link. There are two special users, "Annonymous" and "Bitly", that are used for anonymous shortening and global hash generation.
hh | Host Header | Host header of this redirect request (some Bitlinks are valid on multiple short domains).
u | URL | Long URL that the user was redirected to.
r | Referrer | Referrer header of the Bitlink request.
a | User Agent | User Agent header of the Bitlink request.
i | IP Address | Will always be empty to protect user privacy.
t | Timestamp | Unix timestamp of when the decode occurred.
k | Cookie | Will always be empty to protect user privacy.
nk | Known Cookie | Will be 0 if this cookie has not been seen before, 1 if it has.
hc | Hash Creation | Unix timestamp of when the Bitlink was shortened.
_id | ID | UUID unique to this decode.
al | Accept Language | Accept-Language header of the decode request.
c | Country | Two-letter country code based on the MaxMind GeoIP dataset.
gr | Geo Region | Based on the MaxMind GeoIP dataset (optional).
cy | City | Based on the MaxMind GeoIP dataset (optional).
mc | MetroCode/DMA Code | Based on the MaxMind GeoIP dataset (optional, lookup table).
pc | Postal Code | Based on the MaxMind GeoIP dataset (optional).
tz | Timezone | Based on the MaxMind GeoIP dataset (optional).
ll | Latitude, Longitude | Based on the MaxMind GeoIP dataset (optional).
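
Because fields may be filtered or absent, a consumer probably should not assume that any given key is present. The sketch below maps a few of the short keys from the reference above onto descriptive names and tolerates missing fields. The `Click` class and `parse_click` function are hypothetical names chosen for illustration, not part of any Bitly or NSQ library.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Click:
    user_hash: Optional[str] = None      # "h"
    global_hash: Optional[str] = None    # "g"
    login: Optional[str] = None          # "l"
    long_url: Optional[str] = None       # "u"
    timestamp: Optional[int] = None      # "t"
    country: Optional[str] = None        # "c"
    city: Optional[str] = None           # "cy" (optional GeoIP field)
    lat_lon: Optional[Tuple[float, float]] = None  # "ll" (optional GeoIP field)


def parse_click(body):
    """Decode one message body, leaving any absent field as None."""
    raw = json.loads(body)
    ll = raw.get("ll")
    return Click(
        user_hash=raw.get("h"),
        global_hash=raw.get("g"),
        login=raw.get("l"),
        long_url=raw.get("u"),
        timestamp=raw.get("t"),
        country=raw.get("c"),
        city=raw.get("cy"),
        lat_lon=(ll[0], ll[1]) if ll and len(ll) == 2 else None,
    )


click = parse_click('{"h": "nbsMyD", "g": "baMsDr", "c": "US", "ll": [43.9602, -69.9654]}')
print(click.global_hash, click.city, click.lat_lon)
```

Only a subset of the fields is mapped here to keep the sketch short; the remaining keys can be added in the same way.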

Consuming Data Streams

There are a few options for consuming Bitly Data Streams. Depending on your needs and preferred tooling, some options may work better for you than others.

NSQ Client Libraries

The NSQ community has developed a number of native clients for various languages and platforms. You can find the full list of known client libraries and their capabilities in the NSQ documentation.

NOTE: To use an NSQ client library with Bitly Data Streams, the client must support AUTH and TLS. Not all client libraries support these features, so be sure to check.

Client libraries allow you to build custom stream consumers that directly consume the data stream and take actions on each message as it comes in (increment metrics, write data to a BI system, etc). If a client library is available for your preferred platform, this is usually the preferred approach.
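
As one possible shape for such a consumer, the sketch below is a message handler that increments a per-country click counter. It assumes pynsq's convention that the handler receives an object whose `body` attribute holds the raw message, and that returning True finishes the message while returning False requeues it. The names `handler` and `clicks_by_country` are illustrative only.

```python
import json
from collections import Counter

# In-memory metric for illustration; a real consumer would likely push to a metrics system.
clicks_by_country = Counter()


def handler(message):
    """Count one click per message and tell the client library the message is done."""
    try:
        event = json.loads(message.body)
    except ValueError:
        # Malformed payload: finish it anyway so it is not redelivered forever.
        return True
    if not isinstance(event, dict):
        return True
    clicks_by_country[event.get("c") or "unknown"] += 1
    return True  # True finishes the message; False would requeue it
```

A function like this could be passed as the `message_handler` argument when constructing an `nsq.Reader`.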

NSQ Utilities

If there isn't an NSQ client library available for your preferred platform or if you have relatively simple processing needs (e.g. archiving the stream to disk), NSQ includes a number of built-in utilities that you can use to consume our data streams.

  • nsq_to_file - Consumes an NSQ stream and writes the stream to disk. This is often a useful tool if you are periodically ingesting stream data into a data warehouse or BI tool.
  • nsq_to_http - Consumes an NSQ stream and makes an HTTP call with the contents of the message to a configured endpoint. This can be a useful way to ingest a data stream in real time without using an NSQ client library.
  • nsq_to_nsq - Consumes an NSQ stream and re-publishes the stream to a target nsqd instance. This is useful if you have multiple internal consumers for a data stream.
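
For the archive-to-disk case, a periodic ingest job might read the archived file back roughly as follows. This is a minimal sketch that assumes the archive is uncompressed and contains one JSON message per line; the function names and the file path passed in are hypothetical.

```python
import json


def iter_archived_clicks(path):
    """Yield one decoded click event per non-empty line of an archive file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def clicks_per_link(path):
    """Count archived clicks per user hash (the "h" field)."""
    counts = {}
    for event in iter_archived_clicks(path):
        key = event.get("h", "")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Reading lazily with a generator keeps memory use flat even for large archive files.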

Data Stream Related APIs

/v3/nsq/lookup

This is an API endpoint for connecting to Bitly-provided NSQ data streams.

go-nsq usage

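// Assumes the go-nsq package is imported as nsq and that topic and channel are set.
cfg := nsq.NewConfig()
cfg.TlsV1 = true
cfg.AuthSecret = "$ACCESS_TOKEN"
cfg.MaxInFlight = 1000
c, err := nsq.NewConsumer(topic, channel, cfg) // NewConsumer also returns an error
if err != nil {
	log.Fatal(err)
}
c.AddHandler(....) // named AddHandler in current go-nsq releases
lookup := "https://api-ssl.bitly.com/v3/nsq/lookup?access_token=$ACCESS_TOKEN"
if err := c.ConnectToNSQLookupd(lookup); err != nil {
	log.Fatal(err)
}
<-c.StopChan // block until the consumer stops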

pynsq usage

import nsq

# handler, topic and channel are supplied by you.
lookup = "https://api-ssl.bitly.com/v3/nsq/lookup?access_token=$ACCESS_TOKEN"
r = nsq.Reader(
    message_handler=handler,
    lookupd_http_addresses=[lookup],
    auth_secret="$ACCESS_TOKEN",
    tls_v1=True,
    max_in_flight=1000,
    topic=...,
    channel=...,
)
nsq.run()

Built-in NSQ Command Line Utilities usage (nsq_to_file, nsq_tail, etc.)

./nsq_tail --lookupd-http-address="https://api-ssl.bitly.com/v3/nsq/lookup?access_token=$ACCESS_TOKEN" \
   --reader-opt="auth_secret,$ACCESS_TOKEN" \
   --reader-opt="tls_v1,true" \
   --max-in-flight=1000 \
   --topic=.... --channel=....

ERRORS

/v3/nsq/stats

This endpoint lists NSQ topic and channel message information, along with connection state, for a topic.

To calculate the size of the message processing queue, sum the channel depth and channel deferred_count across all producers.

Parameters

  • topic - NSQ Data Stream Topic

Return Values

  • topics - A list of authorized NSQ topics.
  • channels - A list of channels that each get a copy of messages.
    • producer - The producer (nsqd instance) that these channel stats are for.
    • depth - Current number of messages that have not yet been sent to a consumer for processing.
    • in_flight_count - Current number of messages sent to a consumer that have not yet finished processing. This number is not counted towards depth.
    • deferred_count - Current number of messages requeued after a consumer attempted processing which have not yet been re-processed.
    • timeout_count - Lifetime total messages that were requeued due to timeout waiting for consumer response.
    • message_count - Lifetime total messages seen since the producer (nsqd instance) started.
    • paused - Whether message flow on this channel has been administratively paused.
  • clients - The individual connections to a channel over which a consumer receives messages.
    • requeue_count - Connection lifetime total of messages requeued.
    • message_count - Connection lifetime total of messages processed.
    • connect_ts - Timestamp of when this connection was initiated.
    • remote_address - Address from which this connection was initiated.
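
One practical use of the clients list is spotting producers that a consumer is not connected to, since messages on those producers will queue up unread. The sketch below assumes the response nests these values as data, then topics, then channels, then clients, as in the example response in this section. The function name `producers_without_clients` is hypothetical.

```python
def producers_without_clients(stats, channel_name):
    """Return the producers on which the named channel has no connected clients."""
    missing = []
    for topic in stats["data"]["topics"]:
        for channel in topic["channels"]:
            if channel["channel_name"] == channel_name and not channel["clients"]:
                missing.append(channel["producer"])
    return missing


# Trimmed stand-in for a stats response (only the keys used above).
stats = {"data": {"topics": [{"channels": [
    {"channel_name": "bitlyapioauthdemo", "producer": "publicnsq04.env.bitly.net:4151",
     "clients": []},
    {"channel_name": "bitlyapioauthdemo", "producer": "publicnsq05.env.bitly.net:4151",
     "clients": [{"client_id": "hostname"}]},
]}]}}
print(producers_without_clients(stats, "bitlyapioauthdemo"))
```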

Example Request

API Address: https://api-ssl.bitly.com
GET /v3/nsq/stats?topic=data_stream&access_token=ACCESS_TOKEN
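
A request like this could be issued from Python's standard library roughly as follows. This is a sketch, not an official client: the helper names `stats_url` and `fetch_stats` are hypothetical, and the status check assumes the `status_code` and `status_txt` keys that appear in the example response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ADDRESS = "https://api-ssl.bitly.com"


def stats_url(topic, access_token):
    """Build the /v3/nsq/stats URL with properly encoded query parameters."""
    query = urlencode({"topic": topic, "access_token": access_token})
    return f"{API_ADDRESS}/v3/nsq/stats?{query}"


def fetch_stats(topic, access_token, timeout=10):
    """Fetch and decode the stats response, raising if the API reports a failure."""
    with urlopen(stats_url(topic, access_token), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status_code") != 200:
        raise RuntimeError(f"stats request failed: {payload.get('status_txt')}")
    return payload


print(stats_url("data_stream", "ACCESS_TOKEN"))
```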

Example Response

{
  "data": {
    "topics": [
      {
        "channels": [
          {
            "backend_depth": 140910, 
            "channel_name": "bitlyapioauthdemo", 
            "clients": [], 
            "deferred_count": 0, 
            "depth": 165911, 
            "in_flight_count": 0, 
            "message_count": 214456483, 
            "paused": false, 
            "producer": "publicnsq04.env.bitly.net:4151", 
            "requeue_count": 7, 
            "timeout_count": 7
          }, 
          {
            "backend_depth": 768709, 
            "channel_name": "bitlyapioauthdemo", 
            "clients": [
              {
                "auth_identity": "bitlyapioauthdemo", 
                "client_id": "hostname", 
                "connect_ts": 1404395711, 
                "deflate": false, 
                "finish_count": 1143, 
                "hostname": "hostname.local", 
                "in_flight_count": 1, 
                "message_count": 1144, 
                "name": "hostname", 
                "ready_count": 0, 
                "remote_address": "127.0.0.1:63890", 
                "requeue_count": 0, 
                "sample_rate": 0, 
                "snappy": true, 
                "state": 3, 
                "tls": true, 
                "user_agent": "nsq_tail/0.2.29-alpha go-nsq/1.0.0-alpha", 
                "version": "V2"
              }
            ], 
            "deferred_count": 0, 
            "depth": 793710, 
            "in_flight_count": 1, 
            "message_count": 251043535, 
            "paused": false, 
            "producer": "publicnsq05.env.bitly.net:4151", 
            "requeue_count": 54, 
            "timeout_count": 54
          }
        ], 
        "producers": [
          "publicnsq04.env.bitly.net:4151", 
          "publicnsq05.env.bitly.net:4151"
        ], 
        "topic_name": "data_stream"
      }
    ]
  }, 
  "status_code": 200, 
  "status_txt": "OK"
}
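
The queue-size calculation described for this endpoint (channel depth plus channel deferred_count, summed across all producers) can be sketched as below. The function name `backlog_by_channel` is hypothetical, and the sample data is trimmed from the example response to the keys the calculation needs.

```python
from collections import defaultdict


def backlog_by_channel(stats):
    """Sum depth + deferred_count per channel name across all producers."""
    totals = defaultdict(int)
    for topic in stats["data"]["topics"]:
        for channel in topic["channels"]:
            totals[channel["channel_name"]] += channel["depth"] + channel["deferred_count"]
    return dict(totals)


# Values taken from the example response; other keys omitted.
example = {"data": {"topics": [{"topic_name": "data_stream", "channels": [
    {"channel_name": "bitlyapioauthdemo", "producer": "publicnsq04.env.bitly.net:4151",
     "depth": 165911, "deferred_count": 0},
    {"channel_name": "bitlyapioauthdemo", "producer": "publicnsq05.env.bitly.net:4151",
     "depth": 793710, "deferred_count": 0},
]}]}}
print(backlog_by_channel(example))  # {'bitlyapioauthdemo': 959621}
```

Grouping by channel name matters because each producer reports the same channel separately, so a single entry understates the backlog.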

ERRORS