Blue Light Special: Ensuring fast global configuration changes

CloudFlare operates a huge global network of servers that proxy our customers' web sites, operate as caches, inspect requests to ensure they are not malicious, deflect DDoS attacks and handle one of the largest authoritative DNS systems in the world. And where there's software there's configuration information.

CloudFlare is highly customisable. Each customer has a unique configuration consisting of DNS records, all manner of settings (such as minification, image recompression, IP-based blocking, which individual WAF rules to execute) and per-URL rules. And the configuration changes constantly.

Warp speed configuration

We offer almost instant configuration changes. If a user adds a DNS record it should be globally resolvable in seconds. If a user enables a CloudFlare WAF rule it should happen very, very fast to protect a site. This presents a challenge because those configuration changes need to be pushed across the globe very quickly.

We've written in the past about the underlying technology we use: Kyoto Tycoon and how we secured it from eavesdroppers. We also monitor its performance.

DNS records are currently changing at a rate of around 40 per second, 24 hours a day. All those changes need to be propagated in seconds.

So we take propagation times very seriously.

Keep a close eye on this light of mine

For this we need to keep a close eye on how long it takes a change to reach every one of our data centers. Whilst we have in-depth metrics for our operations team to look at, it's sometimes useful (and fun) to have something more visceral.

We also want developers and operations people to be equally aware of certain critical metrics, without developers having to spend their time watching the dashboards and alerts aimed at operations.

On some rare occasions, perhaps due to routing problems on the wider Internet, we may find that our ability to push changes at the required velocity becomes impractical. To ensure that we know about this as soon as possible and know when to take action we've built a custom alert system that everyone in the office can see.

From an external global collection of machines we monitor propagation time for DNS records and trigger an alert if propagation time exceeds a pre-set threshold. The alert comes in the form of a blue rotating 'police light'.
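
To make the idea concrete, here is a minimal sketch of such a probe (not our actual monitoring code): it polls for a canary record it expects to change and reports how long the new value took to appear. The record name, expected value and use of the system resolver are all illustrative; a real probe would query the authoritative servers directly from many locations.

package main

import (
    "errors"
    "log"
    "net"
    "time"
)

// waitForTXT polls DNS until the TXT record at name contains want, and
// returns how long that took. In a real probe the record would have just
// been changed via the API, and the lookup would go straight to the
// authoritative nameservers rather than through the local resolver.
func waitForTXT(name, want string, timeout time.Duration) (time.Duration, error) {
    start := time.Now()
    for time.Since(start) < timeout {
        txts, err := net.LookupTXT(name)
        if err == nil {
            for _, txt := range txts {
                if txt == want {
                    return time.Since(start), nil
                }
            }
        }
        time.Sleep(500 * time.Millisecond)
    }
    return timeout, errors.New("propagation timed out")
}

func main() {
    // Hypothetical canary record and value.
    elapsed, err := waitForTXT("canary.example.com", "deadbeef", 30*time.Second)
    if err != nil {
        log.Fatal(err) // this is what would trigger the alert
    }
    log.Printf("propagated in %s", elapsed)
}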

We had joked about having a "red alert" alarm when we fall behind on propagation and so I turned that joke into reality.

Hawaii Pi-O

A Raspberry Pi hidden in an old hard drive case connects to our global monitors and obtains the current propagation time (as measured from outside our network). The Pi is connected (via a transistor acting as a switch) to a cheap mini police light that's visible throughout the office.
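
For the curious, here is a minimal sketch of what the Pi's side of this could look like, using the Linux sysfs GPIO interface. The pin number, threshold and status endpoint are made-up placeholders, not our actual setup.

package main

import (
    "encoding/json"
    "net/http"
    "os"
    "time"
)

// All of these values are hypothetical.
const (
    gpioPin   = "18"                                  // pin driving the transistor
    threshold = 30.0                                  // seconds of propagation delay before we alert
    statusURL = "http://monitor.internal/propagation" // returns JSON like {"seconds": 12.3}
)

func writeFile(path, value string) {
    os.WriteFile(path, []byte(value), 0644)
}

func main() {
    // Export the pin and configure it as an output via sysfs.
    writeFile("/sys/class/gpio/export", gpioPin)
    writeFile("/sys/class/gpio/gpio"+gpioPin+"/direction", "out")

    for {
        var status struct {
            Seconds float64 `json:"seconds"`
        }
        resp, err := http.Get(statusURL)
        if err == nil {
            json.NewDecoder(resp.Body).Decode(&status)
            resp.Body.Close()
        }

        // Light on when propagation is too slow, or when we can't tell.
        value := "0"
        if err != nil || status.Seconds > threshold {
            value = "1"
        }
        writeFile("/sys/class/gpio/gpio"+gpioPin+"/value", value)

        time.Sleep(10 * time.Second)
    }
}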

PS: All the puns in this post were added by John Graham-Cumming. I disclaim all responsibility.


The Internet is a cooperative system: CNAME to Dyn DNS outage of 6 July 2015

Today, shortly after 21:00 UTC, on our internal operations chat there was a scary message from one of our senior support staff: "getting DNS resolution errors on support.cloudflare.com", at the same time as automated monitoring indicated a problem. Shortly thereafter, we saw alarms and feedback from a variety of customers (but not everyone) reporting "1001 errors", which indicated a DNS resolution error on the CloudFlare backend. Needless to say, this got an immediate and overwhelming response from our operations and engineering teams, as we hadn't changed anything and had no other indications of anomaly.

In the course of debugging, we were able to identify common characteristics of affected sites—CNAME-based users of CloudFlare, rather than domains hosted entirely on CloudFlare—which, ironically, included our own support site, support.cloudflare.com. When users point (via CNAME) to a domain instead of providing us with an IP address, our network resolves that name, and is obviously unable to connect if the DNS provider has issues. (Our status page https://www.cloudflarestatus.com/ is off-network and was unaffected.) We then investigated why only certain domains were having issues—was the issue with the upstream DNS? Testing whether their domains were resolvable on the Internet (which they were) added a confounding data point.
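
For illustration, the kind of quick check we ran while debugging boils down to resolving the origin hostname that a CNAME-based zone gives us, separately from the zone itself; the hostname below is a placeholder.

package main

import (
    "fmt"
    "net"
)

func main() {
    // Placeholder: the hostname a CNAME-based customer configures as their
    // origin, whose DNS is hosted by a third-party provider.
    origin := "origin.example.net"

    addrs, err := net.LookupHost(origin)
    if err != nil {
        // This is the failure mode we saw: our edge cannot connect to an
        // origin it cannot resolve, so affected sites return 1001 errors.
        fmt.Println("origin does not resolve:", err)
        return
    }
    fmt.Println("origin resolves to:", addrs)
}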

Ultimately, the outage was identified as Dyn, another major DNS operator, having issues with their own DNS configuration. (https://www.dynstatus.com/incidents/4sbm48rdsdbq)

The Internet is made up of many networks, operated by companies, organizations, governments, and individuals around the world, all cooperating using a common set of protocols and agreed policies and behaviors. These systems interoperate in a number of ways, sometimes entirely non-obviously. The mutual goal is to provide service to end users, letting them communicate, enjoy new services, and explore together. When one provider has a technical issue, it can cascade throughout the Internet and it isn’t obvious to users exactly which piece is broken.

Fortunately, even when companies are competitors, the spirit of the Internet is to work together for the good of the users. Once we identified this issue, we immediately contacted Dyn and relayed what we knew, and worked with them to resolve the issue for everyone’s benefit. We have temporarily put in workarounds to address this issue on our side, and hope the underlying difficulties will be resolved shortly.

Increasing Cache Hit Rates with Query String Sort

Optimized Performance: Increasing Cache Hit Rate

At CloudFlare, we care a lot about serving requests as fast as possible. Files can be served much faster when they are already in CloudFlare’s cache. Skipping the trip to the customer’s web server eliminates the latency of that connection, saves bandwidth on the link between CloudFlare and the customer’s origin, and allows us to use the full speed of our ultra-fast servers.

By default, CloudFlare only caches static files. However, Page Rules can be utilized to set more files as cacheable. For more information on Page Rules, please see the Page Rules section of our knowledge base.

Items are cached by their full URL, including the query string. However, due to how query strings work, this can lead to unnecessary cache misses. No RFC defines the order of query string arguments as significant, though in some (rare) cases it is. Thus, by default, CloudFlare caches the following two requests separately:

https://example.com/a?color=red&word=hi
https://example.com/a?word=hi&color=red

Introducing Query String Sort

With a newly available Enterprise-level feature called Query String Sort, CloudFlare will first sort the query string arguments in a URL into a deterministic order before checking the cache for the resource or requesting it from the origin, meaning that the URLs:

https://example.com/a?color=red&day=friday&word=hi
https://example.com/a?color=red&word=hi&day=friday
https://example.com/a?day=friday&color=red&word=hi
https://example.com/a?day=friday&word=hi&color=red
https://example.com/a?word=hi&color=red&day=friday
https://example.com/a?word=hi&day=friday&color=red

will all be cached and be requested from the origin as:
https://example.com/a?color=red&day=friday&word=hi

This feature is especially useful for API servers, which often rely heavily on multiple query string arguments. Remember, we chose examples with only two and three arguments, but the number of permutations of a set of n elements is n!.
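
A minimal sketch of the normalization idea (not CloudFlare's implementation), using Go's net/url: sort the raw query arguments before using the URL as a cache key.

package main

import (
    "fmt"
    "net/url"
    "sort"
    "strings"
)

// sortQueryString rewrites a URL so its query arguments appear in a
// deterministic (sorted) order, collapsing all the permutations above
// into a single cache key.
func sortQueryString(raw string) (string, error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", err
    }
    if u.RawQuery != "" {
        args := strings.Split(u.RawQuery, "&")
        sort.Strings(args)
        u.RawQuery = strings.Join(args, "&")
    }
    return u.String(), nil
}

func main() {
    key, _ := sortQueryString("https://example.com/a?word=hi&day=friday&color=red")
    fmt.Println(key) // https://example.com/a?color=red&day=friday&word=hi
}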

Turning on Query String Sort

To turn on the feature, Enterprise customers can visit the Caching app in the CloudFlare dashboard.

Customers who are interested in learning more about our Enterprise plan can get in touch with us here.

Fighting Cancer: The Unexpected Benefit Of Open Sourcing Our Code

Recently I was contacted by Dr. Igor Kozin from The Institute of Cancer Research in London. He asked about the optimal way to compile CloudFlare's open source fork of zlib. It turns out that zlib is widely used to compress the SAM/BAM files that are used for DNA sequencing. And it turns out our zlib fork is the best open source solution for that file format.

CC BY-SA 2.0 image by Shaury Nash

The files used for this kind of research reach hundreds of gigabytes and every time they are compressed and decompressed with our library many important seconds are saved, bringing the cure for cancer that much closer. At least that's what I am going to tell myself when I go to bed.

This made me realize that the benefits of open source go much farther than one can imagine, and you never know where a piece of code may end up. Open sourcing makes sophisticated algorithms and software accessible to individuals and organizations that would not have the resources to develop them on their own, or the money to pay for a proprietary solution.

It also made me wonder exactly what we did to zlib that makes it stand out from other zlib forks.

Recap

Zlib is a compression library that supports two formats: deflate and gzip. Both formats use the same algorithm, called DEFLATE, but with different headers and checksum functions. The deflate algorithm is described here.

Both formats are supported by the vast majority of web browsers, and we at CloudFlare compress all text content on the fly using the gzip format. Moreover, DEFLATE is also used by the PNG file format, and our fork of zlib also accelerates our image optimization engine, Polish. You can find the optimized fork of pngcrush here.
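
To see the relationship between the two formats, Go's standard library (stock Go, not our fork) exposes both wrappers around the same DEFLATE stream:

package main

import (
    "bytes"
    "compress/gzip"
    "compress/zlib"
    "fmt"
)

func main() {
    data := bytes.Repeat([]byte("hello cloudflare "), 100)

    // deflate (zlib) format: a 2-byte header and an Adler-32 checksum.
    var zBuf bytes.Buffer
    zw := zlib.NewWriter(&zBuf)
    zw.Write(data)
    zw.Close()

    // gzip format: a larger header and a CRC-32 checksum, wrapping the
    // same DEFLATE-compressed stream.
    var gBuf bytes.Buffer
    gw := gzip.NewWriter(&gBuf)
    gw.Write(data)
    gw.Close()

    fmt.Printf("input %d bytes, zlib %d bytes, gzip %d bytes\n",
        len(data), zBuf.Len(), gBuf.Len())
}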

Given the amount of traffic we must handle, compression optimization really makes sense for us. Therefore we included several improvements over the default implementation.

First of all it is important to understand the current state of zlib. It is a very old library, one of the oldest that is still used as-is to this day. It is so old it was written in K&R C. It is so old that USB had not been invented yet. It is so old that DOS was still a thing. It is so old (insert your favorite "so old" joke here). More precisely, it dates back to 1995, back when 16-bit computers with 64KB of addressable space were still in use.

Still, it represents one of the best pieces of code ever written, and even modernizing it gives only a modest performance boost, which shows the great skill of its authors and how far compilers have come since 1995.

Below is a list of some of the improvements in our fork of zlib. This work was done by me, my colleague Shuxin Yang, and also includes improvements from other sources.

  • uint64_t as the standard type - the original code used 16-bit types.
  • Using an improved hash function - we use the iSCSI CRC32 (Castagnoli) function as the hash function in our zlib. This specific function is implemented as a hardware instruction on Intel processors. It is very fast and has better collision properties (see the sketch after this list).
  • Searching for matches of at least 4 bytes, instead of the 3 bytes the format suggests. This leads to fewer hash collisions, and less effort wasted on insignificant matches. It also improves the compression rate a little bit in the majority of cases (but not all).
  • Using SIMD instructions for window rolling.
  • Using the hardware carry-less multiplication instruction PCLMULQDQ for the CRC32 checksum.
  • Optimized longest-match function. This is the most performance-demanding function in the library. It is responsible for finding the (length, distance) matches in the current window.
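
As a rough illustration of that hash choice (using the Go standard library, not the actual C code from the fork), the same Castagnoli/iSCSI polynomial is available in hash/crc32 and is computed with the SSE4.2 CRC32 instruction on CPUs that have it:

package main

import (
    "fmt"
    "hash/crc32"
)

// The Castagnoli (iSCSI) polynomial; on x86-64 CPUs with SSE4.2, Go
// computes checksums over this table with the hardware CRC32 instruction.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// hash4 sketches using CRC32C as a match-finder hash over a 4-byte window,
// similar in spirit to what the fork does inside its deflate implementation.
func hash4(window []byte) uint32 {
    return crc32.Checksum(window[:4], castagnoli)
}

func main() {
    fmt.Printf("%08x\n", hash4([]byte("abcd")))
    fmt.Printf("%08x\n", hash4([]byte("abce")))
}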

In addition, we have an experimental branch that implements an improved version of the linked list used in zlib. It has much better performance for compression levels 6 to 9, while retaining the same compression ratio. You can find the experimental branch here.

Benchmarking

You can find independent benchmarks of our library here and here. In addition, I performed some in-house benchmarking, and put the results here for your convenience.

All the benchmarks were performed on an i5-4278U CPU. The compression was performed from and to a ramdisk. All libraries were compiled with gcc version 4.8.4 with the compilation flags: "-O3 -march=native".

I tested the performance of the zlib master branch, the optimized implementation by Intel, our own master branch, and our experimental branch.

Four data sets were used for the benchmarks. The Calgary corpus, the Canterbury corpus, the Large Canterbury corpus and the Silesia corpus.

Calgary corpus

Performance:
Compression rates:
For this benchmark, Intel only outperforms our implementation for level 1, but at the cost of 1.39X larger files. This difference is far greater than even the difference between levels 1 and 9, and should probably be regarded as a different compression level. CloudFlare is faster on all other levels, and outperforms significantly for levels 6 to 9. The experimental implementation is even faster for those levels.

Canterbury corpus

Performance:
Compression rates:
Here we see a similar situation. Intel at level 1 gets 1.44X larger files. CloudFlare is faster for levels 2 to 9. On level 9, the experimental branch outperforms the reference implementation by 2X.

Large corpus

Performance:
Compression rates:
This time Intel is slightly faster than the CloudFlare implementation for levels 5 and 6. The experimental CloudFlare implementation is faster still on level 6. The compression rate for Intel at level 1 is 1.58X lower than CloudFlare's. On level 9, the experimental fork is 7.5X(!) faster than the reference.

Silesia corpus

Performance:
Compression rates:
Here again, CloudFlare is the fastest on levels 2 to 9. On level 9 the difference in speed between the experimental fork and the reference fork is 2.44X.

Conclusion

As evident from the benchmarks, the CloudFlare implementation outperforms the competition in the vast majority of settings. We put great effort into making it as fast as possible on our servers.

If you intend to use our library, you should check for yourself that it delivers the best balance of performance and compression for your dataset, as performance can vary between different file formats and sizes.

And if you like open source software, don't forget to give back to the community by contributing your own code!

CloudFlare Lands a New Office in Singapore

After months of preparation, my teammates Algin, Marty, Adam, Jono and I touched down in Singapore and were greeted by skyscrapers, malls, Singlish, chili crab, and Marty’s special sweet and sour chicken. It immediately hit us that we were no longer in San Francisco.

The Internet never sleeps, which means it is crucial for us to have a presence in Asia to operate our globally distributed network. Singapore was a natural choice for us given the thriving tech community, the business friendliness of the country, the delicious hawker stalls, and our harbor view rooftop hangout:


Since we are new in town, if there are meetups or groups in Singapore that you think we should be part of (or any good restaurants we should try) – let us know. We will be at RSA Asia Pacific & Japan on Friday July 24 here in Singapore. Come meet us in person and learn more about CloudFlare during Nick Sullivan’s session on The New Key Management - Unlocking the Safeguards of Keeping Keys Private.

As one global company, we took team members from both our San Francisco and London offices to be the foundation for the local team. We are actively looking to expand our Singapore team. We are hiring for several positions including systems reliability engineers, technical support engineers, solutions engineers, network engineers, systems engineers, and customer development sales functions. If you are interested in joining a growing company that is making the Internet better for more than a billion web surfers each month - check out our open positions here.

Special thank you to everyone in the Singapore Government and The Working Capitol for helping make the CloudFlare Singapore office a reality.

Celebrating the 14th of July with Marseille, CloudFlare's 36th Point of Presence

What better day than the 14th of July (Bastille Day) to announce the latest addition to our network in Marseille, France? Our data center in the southern city of Marseille is our 2nd in France, 12th in Europe and 36th globally.

Why Marseille?

Marseille, France’s second largest city following Paris, is home to 2 million Internet users across the surrounding metropolitan area. It also serves as another point of redundancy to our Paris data center, one of our most trafficked facilities in the whole of Europe.

However, the true importance of Marseille is not just redundancy or its size. Marseille’s southern location makes it a major Internet gateway for networks throughout the Mediterranean, including many African and Middle Eastern countries. This is reflected by the substantial number of submarine cables carrying Internet traffic that are routed through Marseille (7 to be exact, for those fastidious followers of our blog).

Marseille: a key interconnection point for traffic throughout the Mediterranean

These undersea cables are the principal means by which many countries are able to access the rest of the Internet—that is to say, all of the other global networks that make up this big interconnected network we refer to as the Internet. In the case of Algeria, this latest deployment has halved the latency to the more than 2 million Internet applications using CloudFlare. In other words, all web sites and applications behind CloudFlare are now 2x as fast to access in Algeria (and in many other countries) now that Marseille is online.

If you’ve followed our previous blog posts, you’ll know that peering—the act of interconnecting with other regional networks—is critical to the fast, secure and local delivery of Internet traffic. And Marseille is no different. We’re proud to be the most recent participant on FranceIX’s Marseille Internet exchange, through which we are now able to serve traffic to networks throughout the Mediterranean. However, many large ISPs in the region, including some in Morocco and Tunisia, are not yet participants. If you are a customer of one of these ISPs, please reach out and encourage them to join!

Finally, as a result of our Marseille deployment, France is now the second country in Europe (following Germany after our Düsseldorf deployment) to host multiple CloudFlare data centers. However, it won’t be the last! Hint Hint….

— Joyeux 14 juillet de la part de toute l’équipe CloudFlare!
(Happy Bastille Day from the CloudFlare team! for all non-french speaking blog readers)

Source: TeleGeography

Introducing Partner Analytics

CloudFlare has over 5,000 hosting provider partners. Every day, thousands of our partners' customers take advantage of CloudFlare to help them be faster and more secure. The benefits to our partners aren't just happier customers, they also translate into real savings. In the last month, for instance, we saved our partners more than 25 Petabytes in aggregate bandwidth. In addition to bandwidth savings, in that same period, we stopped more than 65 billion malicious requests that would have otherwise impacted our partners' infrastructure. Now we've broken out the bandwidth and performance data by partners so they can see the savings and protection we're delivering.

Back when we launched the CloudFlare Partner Program four years ago, we periodically distributed these figures as high level summaries of bandwidth saved, threats blocked, and number of domains protected and accelerated via each partnership. Our partners knew anecdotally from their own logs and operating expenditures that CloudFlare was reducing their costs and greatly improving their customers’ experiences, but we did not yet have the tools to help demonstrate these benefits on a repeatable and granular basis.

It wasn’t that we didn’t want to provide this data, it was that our tremendous growth rate had stretched our analytics backend to its limits. Capturing hundreds of terabytes of logs each day would take a toll on any logging and analytics infrastructure, let alone one serving 400 billion monthly page views and growing 450% annually! Thankfully this April, our data team came to the rescue and executed an upgrade of our Customer Analytics and request logging backend. As a result of this architectural overhaul, we now have the capacity (and then some) to deliver dynamic analytics data to all of our 5,000+ partners, aggregated across all of their respective clients in rolling 7 and 30 day periods.

Partner Portal

We are excited to announce that the Analytics section of the Partner Portal is now live. Next time you log in, you will be presented with a new ‘Analytics’ icon in your navigation bar which, when clicked, will help answer the common questions we hear from partners such as How much is CloudFlare saving me on my monthly bandwidth bill?, What types of threats are you protecting my customers from? and How many unique visitors and requests are my servers really handling?

Bandwidth

Let’s start with that first question, bandwidth savings. It’s an obvious focus area for our hosting partners, especially those buying bandwidth at a small scale or in areas of the world where bandwidth costs are high such as Asia, Australia, and Latin America.

The chart below, taken from a real partner, shows CloudFlare saved 234.8 terabytes of bandwidth over a 30 day period. Using figures from leading infrastructure-as-a-service companies, the savings can be quantified at upwards of $20,000/month for low cost regions and more than $40,000/month in higher cost regions!

Bandwidth - Total Bandwidth is the sum of all responses (in bytes) sent back to visitors. The responses that did not require a trip to the origin – because they were already available on CloudFlare’s servers – are considered Cached Bandwidth, while those that did require a trip are Uncached Bandwidth. Cached responses are not seen by your upstream provider and can be thought of as direct savings on your bandwidth bill.

Threats

The second most common question we hear is about the types and quantities of threats we block from reaching our partners’ infrastructure. Each request intended for a CloudFlare protected site is examined in sub-millisecond time and compared against our L3 reputation and L7 threat databases. Even zero day vulnerabilities are regularly identified and blocked before software vendors and system administrators have a chance to distribute and apply patches.

The requests that do not pass our checks and blacklists are logged, categorized, and available for review. In the chart below we see the barrage of illegitimate requests that were intended for this partner, but never made it past CloudFlare’s servers. Our unique presence on the Internet – processing over 5% of all requests while observing (and scoring) the behavior of over 1 billion unique IPs per month – puts us in an unparalleled position to automatically stop these attacks at the source.

Threats - This chart shows you the Total Threats seen in the period, the Top Country they originated from, and the Top Threat Type. Each threat type is charted on a separate line, and hovering over any given day pops up a table showing the respective counts.

Requests & Unique Visitors

While saving bandwidth and blocking threats are two critical roles CloudFlare plays, our partners also want to know how many requests their web servers are processing, and from how many different visitors around the world. Understanding these two metrics is critical to properly scaling server and network capacity and quantifying the expedited experience provided to their customers' visitors.

As shown in the charts below, over 50% of requests to this specific partner were delivered directly from one of CloudFlare’s highly optimized servers in one of 36 global data centers. Each of these 71 million unique visitors enjoyed an accelerated browsing experience thanks to responses returned without a round trip to the partner’s server.

Requests - This chart shows the number of Total Requests made by visitors’ browsers to your customers’ sites, such as GET requests for assets like HTML, CSS, and JS. Of these, Cached Requests are those that CloudFlare was able to respond to without contacting the origin (i.e., your servers) while Uncached required a trip back to the origin before a response could be sent.

Unique Visitors - Each unique IP address – whether IPv4 or IPv6 – that we see make a request to your customers’ sites is counted towards the Total Unique Visitors. This value is further broken down by the Minimum and Maximum Unique Visitors seen on any given day in the measured period (previous 7 or 30 days).

Parting Thoughts

Going forward, we plan to add additional layers of insights, building off the new backend system that has made distributed queries easy and highly performant. We have a wealth of data at the edge and want to know what else you’d like to see. Please log in to the Partner Portal today to start reviewing your data in the new Analytics section and let us know your thoughts. Or if you're not yet a partner but interested in becoming one, please sign up and we'll get you started.

CloudFlare headed to HostingCon 2015. Thanks for the memories and let’s create some more!

The CloudFlare team is heading to HostingCon 2015 in San Diego next week. We are excited to meet colleagues from the industry, reconnect with partners, and make new friends.

This year’s conference marks a milestone of sorts. It’s our fifth time at HostingCon and we’ve come full circle - our first HostingCon took place in San Diego. Here are some fun facts on what we’ve accomplished since our first HostingCon in 2011:

  • 25 new data centers expanded our network to a total of 36 worldwide
  • 2M+ customers served
  • 800+ conference attendees transported in our signature limo service. If you haven’t already, sign up to arrive in style.
  • 2,500+ Nerf guns delivered. Check out the new models this year in celebration of the Railgun 5.0 launch
  • 3,000+ CloudFlare t-shirts bringing smiles to our partners

Today, CloudFlare is trusted by over 5,000 partners who offer performance and security to millions of customers, accelerating and protecting websites, APIs, and mobile apps. We work hard to deliver real savings for our partners. For example, over the past month we saved our partners more than 25 petabytes in aggregate bandwidth (roughly equivalent to 350 hours of HDTV video) and stopped more than 65 billion malicious attacks that would have impacted our partners' infrastructure. To find out detailed analytics for your own customers, log into the CloudFlare Partner Portal and check out the Partner Analytics blog post.

The past 12 months have been particularly productive. We’ve added a number of partners, some geographically far flung, such as Hostnet (in Brazil) and Web Africa (in South Africa). We continued to innovate our technology to serve partners better. We’ve made it incredibly easy to offer CloudFlare across all your customers, launched ServerShield Plus (our security-focused offering for Plesk users), enabled Universal SSL for partners using full DNS, rolled out Virtual DNS to strengthen partners’ infrastructure, and released Railgun 5.0 (the latest version of our dynamic content accelerator).

There’s lots to celebrate. If you’re in San Diego for HostingCon, join us! Here’s what to expect:

  • Complimentary limo transfers from San Diego International Airport to the Marriott Marquis and Marina Hotel on Sunday, July 26. Reserve your spot.
  • CloudFlare-sponsored breakfast each morning with music from local artists
  • Our signature Nerf Railguns (limited quantity). Be sure to visit booth #307 early.
  • CloudFlare t-shirts

Sunday, July 26

  • Limo transfers from San Diego International Airport to the Marriott Marquis

Monday, July 27

  • 3:30pm-4:30pm: Cybersecurity panel, featuring our own Nick Sullivan

Tuesday, July 28

  • 9:00am-9:50am: ‘Clicks, Not Cash: Give Users a Free SEO boost’, presented by Maria Karaivanova
  • 11:00am-6:00pm: Exhibit hall
  • 4:00pm-6:00pm: Happy hour! Visit our booth while sipping your preferred beverage

Wednesday, July 29

  • 11:00am-4:00pm: Exhibit Hall

Bucharest, Romania: CloudFlare's 37th data center

Our global expansion continues in Bucharest, Romania, the 6th largest city in the European Union* following London, Berlin, Madrid, Rome, and Paris (nearly all of which feature a CloudFlare PoP!). From Bucharest, our latest data center will serve all 11 million Romanian Internet users, as well as users throughout the Balkans and Eastern Europe.

In good company

Romania is geographically situated between Bulgaria, Hungary, Moldova, Serbia, and Ukraine, making it an ideal destination to attract additional Internet traffic throughout much of Eastern Europe. Of course, geographic reality is rarely a mirror of Internet reality. Adding a new point of presence doesn't automatically mean that traffic from surrounding areas (or even traffic in the very same country) will route to that particular data center. This entirely depends on the interconnection of International carriers with local Internet service providers (ISPs) and large networks like CloudFlare.

It is for this precise reason that we place even more emphasis on our interconnection within a particular PoP as opposed to the absolute number of dots we add to our network map. Of course, the combination of the two (expanding wide and deep) is even better, and is why CloudFlare is blazing fast around the world.

In Romania alone we are connected to nearly every single Tier 1 Internet provider with a network in-country (TeliaSonera, Cogent, GTT/Tinet, NTT, and Telecom Italia). And we are further connected (or nearly connected!) to nearly all local peering exchanges, including Interlan, NetIX, RoNIX and BalcanIX. Direct connections are also in progress with many of the largest local ISPs. #zoomzoom

It wouldn't be a blog post about Romania without...

Dracula and his supposed home—Bran Castle (commonly known as Dracula's Castle)—in Transylvania! The creation of Irish author Bram Stoker, Dracula recounts the story of vampire Count Dracula's attempt to move from Transylvania (in Romania) to England. Although there is no record that Stoker was aware of Bran Castle at the time of the book's writing, it remains entwined with the legend to this day.

While our Bucharest PoP is not located in a castle (but rather a modern Tier-3 data center), one of our upcoming sites will be (that's right, a castle turned into a data center). If you can guess the location in the comments, we'll have some CloudFlare gear sent your way!

— Mulţumesc din echipa CloudFlare (Thank you from the CloudFlare team)

* By population within city limits.
Photo source: Marc Biarnès; images used under creative commons license.

Quick and dirty annotations for Go stack traces

CloudFlare’s DNS server, RRDNS, is entirely written in Go and typically runs tens of thousands of goroutines. Since goroutines are cheap and Go I/O is blocking, we run one goroutine per file descriptor we listen on and queue new packets for processing.

CC BY-SA 2.0 image by wiredforlego

When there are thousands of goroutines running, debug output quickly becomes difficult to interpret. For example, last week I was tracking down a problem with a file descriptor and wanted to know what its listening goroutine was doing. With 40k stack traces, good luck figuring out which one is having trouble.

Go stack traces include parameter values, but most Go types are (or are implemented as) pointers, so what you will see passed to the goroutine function is just a meaningless memory address.

We have a couple of options to make sense of the addresses: get a heap dump at the same time as the stack trace and cross-reference the pointers, or have a debug endpoint that prints a goroutine/pointer -> IP map. Neither is seamless.

Underscore to the rescue

However, we know that integers are shown in traces, so what we did is first convert IPv4 addresses to their uint32 representation:

// addrToUint32 takes a TCPAddr, UDPAddr or IPAddr and converts its IP to a
// uint32. If the IP is v6, 0xffffffff is returned.
func addrToUint32(addr net.Addr) uint32 {
    var ip net.IP
    switch addr := addr.(type) {
    case *net.TCPAddr:
        ip = addr.IP
    case *net.UDPAddr:
        ip = addr.IP
    case *net.IPAddr:
        ip = addr.IP
    }
    if ip == nil {
        return 0
    }
    ipv4 := ip.To4()
    if ipv4 == nil {
        return math.MaxUint32
    }
    return uint32(ipv4[0])<<24 | uint32(ipv4[1])<<16 | uint32(ipv4[2])<<8 | uint32(ipv4[3])
}

And then pass the IPv4-as-uint32 to the listening goroutine as an _ parameter. Yes, as a parameter with name _; it's known as the blank identifier in Go.

// PacketUDPRead is a goroutine that listens on a specific UDP socket and reads
// in new requests
// The first parameter is the int representation of the listening IP address,
// and it's passed just so it will appear in stack traces
func PacketUDPRead(_ uint32, conn *net.UDPConn, ...) { ... }

go PacketUDPRead(addrToUint32(conn.LocalAddr()), conn, ...)

Now when we get a stack trace, we can just look at the first bytes, convert them back to dotted notation, and know on what IP the goroutine was listening.

goroutine 42 [IO wait]:
    [...]
    /.../request.go:195 +0x5d
rrdns/core.PacketUDPRead(0xc27f000001, 0x2b6328113ad8, 0xc20801ecc0, 0xc208044308, 0xc208e99280, 0xc208ad8180, 0x12a05f200)
    /.../server.go:119 +0x35a
created by rrdns/core.PacketIO
    /.../server.go:230 +0x8be

0xc27f000001 -> remove alignment byte -> 0x7f000001 -> 127.0.0.1
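
A tiny helper for the reverse conversion (just a convenience, not part of RRDNS) saves doing that in your head:

package main

import (
    "fmt"
    "net"
)

// uint32ToIP is the inverse of addrToUint32: it turns the integer seen in
// a stack trace (after dropping the leading byte) back into a dotted quad.
func uint32ToIP(n uint32) net.IP {
    return net.IPv4(byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
}

func main() {
    fmt.Println(uint32ToIP(0x7f000001)) // 127.0.0.1
}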

Obviously you can do the same with any piece of information you can represent as an int.

Are you interested in taming the goroutines that run the web? We're hiring in London, San Francisco and Singapore!

A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected

Last week ISC published a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.

CC BY 2.0 image by Ralph Aversen

The public summary tells us that a mistake in handling of queries for the TKEY type causes an assertion to fail, which in turn crashes the server. Since the assertion happens during the query parsing, there is no way to avoid it: it's the first thing that happens on receiving a packet, before any decision is made about what to do with it.

TKEY queries are used in the context of TSIG, a protocol DNS servers can use to authenticate to each other. They are special in that unlike normal DNS queries they include a “meta” record (of type TKEY) in the EXTRA/ADDITIONAL section of the message.

CC BY 2.0 image by Ralph Aversen

Since the exploit packet is now public, I thought we might take a dive and look at the vulnerable code. Let's start by taking a look at the output of a crashing instance:

03-Aug-2015 16:38:55.509 message.c:2352: REQUIRE(*name == ((void*)0)) failed, back trace
03-Aug-2015 16:38:55.510 #0 0x10001510d in assertion_failed()+0x5d
03-Aug-2015 16:38:55.510 #1 0x1001ee56a in isc_assertion_failed()+0xa
03-Aug-2015 16:38:55.510 #2 0x1000bc31d in dns_message_findname()+0x1ad
03-Aug-2015 16:38:55.510 #3 0x10017279c in dns_tkey_processquery()+0xfc
03-Aug-2015 16:38:55.510 #4 0x100016945 in ns_query_start()+0x695
03-Aug-2015 16:38:55.510 #5 0x100008673 in client_request()+0x18d3
03-Aug-2015 16:38:55.510 #6 0x1002125fe in run()+0x3ce
03-Aug-2015 16:38:55.510 exiting (due to assertion failure)
[1]    37363 abort (core dumped)  ./bin/named/named -f -c named.conf

This is extremely helpful (after all, this is a controlled crash caused by a failed assertion) and tells us what failed and where: message.c:2352. Here's the excerpt.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

    isc_result_t
    dns_message_findname(dns_message_t *msg, dns_section_t section,
                 dns_name_t *target, dns_rdatatype_t type,
                 dns_rdatatype_t covers, dns_name_t **name,
                 dns_rdataset_t **rdataset)
    {
        dns_name_t *foundname;
        isc_result_t result;

        /*
         * XXX These requirements are probably too intensive, especially
         * where things can be NULL, but as they are they ensure that if
         * something is NON-NULL, indicating that the caller expects it
         * to be filled in, that we can in fact fill it in.
         */
        REQUIRE(msg != NULL);
        REQUIRE(VALID_SECTION(section));
        REQUIRE(target != NULL);
        if (name != NULL)
==>         REQUIRE(*name == NULL);

    [...]

What we have here is a function "dns_message_findname" that searches for an RRset with the given name and type in the given message section. It employs a really common C API: to get the results the caller passes pointers that will be filled in (dns_name_t **name, dns_rdataset_t **rdataset).

CC BY 2.0 image by Ralph Aversen

As the big comment ironically acknowledges, it's really strict when validating these pointers: if they don't point to (dns_name_t *)NULL the REQUIRE assertion will fail and the server will crash with no attempt at recovery. Code calling this function must take extra care to pass a pointer to a NULL dns_name_t *, which the function will fill in to return the found name.

In non-memory-safe languages it is not uncommon to crash when a programmer assertion is violated, because the program might not be able to clean up its own memory after something that is not supposed to happen happens.

So we continue our investigation by climbing up the stack trace to find the illegal call. Next step is dns_tkey_processquery. Here is a simplified excerpt.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/tkey.c

isc_result_t
dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
              dns_tsig_keyring_t *ring)
{
    isc_result_t result = ISC_R_SUCCESS;
    dns_name_t *qname, *name;
    dns_rdataset_t *tkeyset;

    /*
     * Interpret the question section.
     */
    result = dns_message_firstname(msg, DNS_SECTION_QUESTION);
    if (result != ISC_R_SUCCESS)
        return (DNS_R_FORMERR);

    qname = NULL;
    dns_message_currentname(msg, DNS_SECTION_QUESTION, &qname);

    /*
     * Look for a TKEY record that matches the question.
     */
    tkeyset = NULL;
    name = NULL;
    result = dns_message_findname(msg, DNS_SECTION_ADDITIONAL, qname,
                      dns_rdatatype_tkey, 0, &name, &tkeyset);
    if (result != ISC_R_SUCCESS) {
        /*
         * Try the answer section, since that's where Win2000
         * puts it.
         */
        if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
                     dns_rdatatype_tkey, 0, &name,
                     &tkeyset) != ISC_R_SUCCESS) {
            result = DNS_R_FORMERR;
            tkey_log("dns_tkey_processquery: couldn't find a TKEY "
                 "matching the question");
            goto failure;
        }
    }

[...]

There are two dns_message_findname calls here. Since we are looking for the one that passes a dirty name, we can ignore the first one, which is preceded by an explicit name = NULL;.

The second call is more interesting. The same dns_name_t *name is reused without resetting it to NULL after the previous dns_message_findname call. This must be where the bug is.

CC BY 2.0 image by Ralph Aversen

Now the question is: when would dns_message_findname set name but not return ISC_R_SUCCESS (so that the if is satisfied)? Let's have a look at the full function body now.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
             dns_name_t *target, dns_rdatatype_t type,
             dns_rdatatype_t covers, dns_name_t **name,
             dns_rdataset_t **rdataset)
{
    dns_name_t *foundname;
    isc_result_t result;

    /*
     * XXX These requirements are probably too intensive, especially
     * where things can be NULL, but as they are they ensure that if
     * something is NON-NULL, indicating that the caller expects it
     * to be filled in, that we can in fact fill it in.
     */
    REQUIRE(msg != NULL);
    REQUIRE(VALID_SECTION(section));
    REQUIRE(target != NULL);
    if (name != NULL)
        REQUIRE(*name == NULL);
    if (type == dns_rdatatype_any) {
        REQUIRE(rdataset == NULL);
    } else {
        if (rdataset != NULL)
            REQUIRE(*rdataset == NULL);
    }

    result = findname(&foundname, target,
              &msg->sections[section]);

    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXDOMAIN);
    else if (result != ISC_R_SUCCESS)
        return (result);

    if (name != NULL)
        *name = foundname;

    /*
     * And now look for the type.
     */
    if (type == dns_rdatatype_any)
        return (ISC_R_SUCCESS);

    result = dns_message_findtype(foundname, type, covers, rdataset);
    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXRRSET);

    return (result);
}

As you can see, dns_message_findname first uses findname to match the records with the target name, and then dns_message_findtype to match the target type. In between the two calls... *name = foundname! So if dns_message_findname finds a record with name == qname in DNS_SECTION_ADDITIONAL but that record turns out not to have type dns_rdatatype_tkey, name will be filled in and a failure returned. The second dns_message_findname call will trip on the dirty name and... boom.

CC BY 2.0 image by Ralph Aversen

Indeed, the patch just adds name = NULL before the second call. (No, we couldn't have started our investigation from the patch; what's the fun in that!?)

diff --git a/lib/dns/tkey.c b/lib/dns/tkey.c
index 66210d5..34ad90b 100644
--- a/lib/dns/tkey.c
+++ b/lib/dns/tkey.c
@@ -654,6 +654,7 @@ dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
          * Try the answer section, since that's where Win2000
          * puts it.
          */
+        name = NULL;
         if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
                      dns_rdatatype_tkey, 0, &name,
                      &tkeyset) != ISC_R_SUCCESS) {

To recap, here is the bug flow:

  • a query for type TKEY is received, dns_tkey_processquery is called to parse it
  • dns_message_findname is called a first time on the EXTRA section
  • a record with the same name as the query is found in the EXTRA section, causing name to be filled, but it's not a TKEY record, causing result != ISC_R_SUCCESS
  • dns_message_findname is called a second time to look in the ANSWER section, and it is passed the now dirty name reference
  • the REQUIRE(*name == NULL) assertion fails, and BIND crashes

This bug was found with the amazing american fuzzy lop fuzzer by @jfoote_. A fuzzer is an automated tool that keeps feeding automatically mutated inputs to a target program until it crashes. You can see how it eventually stumbled upon the TKEY query + non-TKEY EXTRA RR combo and found this bug.

Virtual DNS customers have always been protected

Good news! CloudFlare Virtual DNS customers have always been protected from this attack, even if they run BIND. Our custom Go DNS server, RRDNS, parses and sanitizes all queries before forwarding them to the origin servers if needed.

Since Virtual DNS does not support TSIG and TKEY (which are meant to authenticate server-to-server traffic, not recursive lookups) it has no reason to relay EXTRA section records in queries, so it doesn't! That reduces the attack surface and indeed makes it impossible to exploit this vulnerability through Virtual DNS.

No special rules are in place to protect from this specific vulnerability: RRDNS always validates incoming packets for sanity, making sure they look like regular queries, and strips them down to the most simple form possible before relaying them.
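
As a rough sketch of what that kind of sanitization looks like with github.com/miekg/dns (RRDNS's internal code differs), a relayed query can be rebuilt from scratch rather than forwarded verbatim:

package sanitize

import "github.com/miekg/dns"

// sanitizeQuery rebuilds an incoming query in its simplest relayable form:
// the question itself, plus EDNS0 if present. Anything in the ANSWER,
// AUTHORITY and ADDITIONAL sections of the original query is dropped.
func sanitizeQuery(req *dns.Msg) *dns.Msg {
    q := new(dns.Msg)
    if len(req.Question) == 0 {
        return q
    }
    q.SetQuestion(req.Question[0].Name, req.Question[0].Qtype)
    q.RecursionDesired = req.RecursionDesired
    if opt := req.IsEdns0(); opt != nil {
        q.SetEdns0(opt.UDPSize(), opt.Do())
    }
    return q
}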

Dublin, Ireland: CloudFlare's 38th data center

Top of the morning to our users and readers from Ireland! Our latest PoP in Dublin is our 38th globally, and 14th in Europe following our Bucharest deployment last week. As of yesterday, traffic from Ireland's 3.6 million Internet users will now be routed through Dublin as opposed to our London PoP (which will still serve as a point of redundancy).

Silicon Docks

By now you've heard of Silicon Valley, Silicon Alley, and possibly even Silicon Prairie, but across the pond there's another tech hub making quite a name for itself. Silicon Docks, the Dublin neighborhood bordering the Grand Canal Docks, is home to the European headquarters of Google, Facebook, Twitter, Dropbox, Airbnb, LinkedIn, and CloudFlare customer Yelp, just to name a few. While our own European headquarters is in London, Dublin's exploding tech scene made it an obvious choice for a new PoP.

Clearly our focus was more on helping #savetheweb than on the photo itself...

Dublin is also near to our hearts as the home of CloudFlare customers Web Summit and F.ounders, two of the world's premier tech conferences. Visitors to the 2012 Web Summit and F.ounders events may even remember being greeted by CloudFlare (see photo above) as they passed through the Dublin Airport.

Nope, not in a castle

Readers of our last post were asked to guess which future CloudFlare PoP will be housed in a castle-turned-data center. It isn't our Dublin facility, a modern tier-4 data center, which means 1) the "contest" continues and 2) more expansion is soon to come. But before we get to our castle, we have a big announcement coming very soon. Stay tuned!

— The CloudFlare team

Photo source: Miguel Mendez; images used under creative commons license.

Up and to the Right: Forrester Research Ranks CloudFlare as a “Leader” for DDoS Services Providers

Forrester Research, Inc. has released The Forrester Wave™: DDoS Services Providers, Q3 2015 report which ranks CloudFlare as a leader. How do you get placed “up and to the right”? The leaders in this Wave, including CloudFlare, demonstrated effective portals, good client and revenue growth, and a focus on customer service. They also all have the ability to defend against the largest amplification attacks and the most pernicious application attacks.

Here are some of the criteria CloudFlare received the highest possible scores for:

  • Attack types defended
  • Data/scrubbing center geographic presence
  • Detection tactics
  • SSL traffic inspection

The DDoS Services Providers Wave also notes that CloudFlare boasts fast mitigation times, and that our customers gave us high marks for service delivery. The report cited CloudFlare’s excellent capabilities to deliver hybrid DDoS solutions as well.

So how does the report evaluate vendors? It evaluates vendors based on three major categories, each with specific criteria:

  • Current offering: The strength of vendors’ current DDoS product offering is based on evaluation categories including: business description, amplification attack defense, attack types defended, customer portal features, customer references, data/scrubbing center geographic presence, SSL traffic inspection, and standard mitigation times.

  • Strategy: Vendors’ position on the horizontal axis of the Wave graphic in the report indicates the strength of their DDoS strategy, specifically as it relates to their customer communication process, hybrid implementation availability, system integration partners, technical development, professional services, and sales staff.

  • Market Presence: Vendors’ market presence was measured based on their client base, revenue, revenue growth, and the number of years the company has offered DDoS services. One thing to note here: CloudFlare does not share financial information; this is our policy as a private company. That said, our scores for revenue and revenue growth in the market presence category were determined without sharing this information. Because of our policy, we could only share information about our client base and the number of years we have offered DDoS services.

Forrester evaluated nine participating vendors for this report. Inclusion criteria used for vendor selection for this evaluation were a mix of the breadth of each vendor’s offerings, its global presence, and its market relevance.

To check out the full report, download your complimentary copy here.

We are honored to be included and ranked as a leader in The Forrester Wave™: DDoS Services Providers, Q3 2015 report.

Source: The Forrester Wave™: DDoS Services Providers, Q3 2015, Forrester Research, Inc., July 22, 2015

DNS parser, meet Go fuzzer

Here at CloudFlare we are heavy users of the github.com/miekg/dns Go DNS library, and we make sure to contribute to its development as much as possible. Therefore, when Dmitry Vyukov published go-fuzz and started to uncover dozens of bugs in the Go standard library, our task was clear.

Hot Fuzz

Fuzzing is the technique of testing software by continuously feeding it inputs that are automatically mutated. For C/C++, the wildly successful afl-fuzz tool by Michał Zalewski uses instrumented source coverage to judge which mutations pushed the program into new paths, eventually hitting many rarely-tested branches.

go-fuzz applies the same technique to Go programs, instrumenting the source by rewriting it (like godebug does). An interesting difference between afl-fuzz and go-fuzz is that the former normally operates on file inputs to unmodified programs, while the latter asks you to write a Go function and passes inputs to that. The former usually forks a new process for each input, the latter keeps calling the function without restarting often.

There is no strong technical reason for this difference (and indeed afl recently gained the ability to behave like go-fuzz), but it's likely due to the different ecosystems in which they operate: Go programs often expose well-documented, well-behaved APIs which enable the tester to write a good wrapper that doesn't contaminate state across calls. Also, Go programs are often easier to dive into and more predictable, thanks obviously to GC and memory management, but also to the community's general aversion to unexpected global state and side effects. On the other hand, many legacy C code bases are so intractable that the easy and stable file-input interface is worth the performance tradeoff.

Back to our DNS library. RRDNS, our in-house DNS server, uses github.com/miekg/dns for all its parsing needs, and it has proved to be up to the task. However, it's a bit fragile on the edge cases and has a track record of panicking on malformed packets. Thankfully, this is Go, not BIND C, and we can afford to recover() panics without worrying about ending up with insane memory states. Here's what we are doing:

func ParseDNSPacketSafely(buf []byte, msg *old.Msg) (err error) {
    defer func() {
        panicked := recover()

        if panicked != nil {
            err = errors.New("ParseError")
        }
    }()

    err = msg.Unpack(buf)

    return
}

We saw an opportunity to make the library more robust so we wrote this initial simple fuzzing function:

func Fuzz(rawMsg []byte) int {
    msg := &dns.Msg{}

    if unpackErr := msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if _, packErr := msg.Pack(); packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    return 1
}

To create a corpus of initial inputs we took our stress and regression test suites and used github.com/miekg/pcap to write a file per packet.

package main

import (
    "crypto/rand"
    "encoding/hex"
    "log"
    "os"
    "strconv"

    "github.com/miekg/pcap"
)

func fatalIfErr(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func main() {
    handle, err := pcap.OpenOffline(os.Args[1])
    fatalIfErr(err)

    b := make([]byte, 4)
    _, err = rand.Read(b)
    fatalIfErr(err)
    prefix := hex.EncodeToString(b)

    i := 0
    for pkt := handle.Next(); pkt != nil; pkt = handle.Next() {
        pkt.Decode()

        f, err := os.Create("p_" + prefix + "_" + strconv.Itoa(i))
        fatalIfErr(err)
        _, err = f.Write(pkt.Payload)
        fatalIfErr(err)
        fatalIfErr(f.Close())

        i++
    }
}

CC BY 2.0 image by JD Hancock

We then compiled our Fuzz function with go-fuzz, and launched the fuzzer on a lab server. The first thing go-fuzz does is minimize the corpus by throwing away packets that trigger the same code paths, then it starts mutating the inputs and passing them to Fuzz() in a loop. The mutations that don't fail (return 1) and expand code coverage are kept and iterated over. When the program panics, a small report (input and output) is saved and the program restarted. If you want to learn more about go-fuzz watch the author's GopherCon talk or read the README.

Crashes, mostly "index out of bounds", started to surface. go-fuzz becomes pretty slow and ineffective when the program crashes often, so while the CPUs burned I started fixing the bugs.

In some cases I just decided to change some parser patterns, for example reslicing and using len() instead of keeping offsets. However these are potentially disruptive changes—I'm far from perfect—so I adapted the Fuzz function to keep an eye on the differences between the old and the new, fixed parser, and crash if the new parser started refusing good packets or changed its behavior:

func Fuzz(rawMsg []byte) int {
    var (
        msg, msgOld = &dns.Msg{}, &old.Msg{}
        buf, bufOld = make([]byte, 100000), make([]byte, 100000)
        res, resOld []byte

        unpackErr, unpackErrOld error
        packErr, packErrOld     error
    )

    unpackErr = msg.Unpack(rawMsg)
    unpackErrOld = ParseDNSPacketSafely(rawMsg, msgOld)

    if unpackErr != nil && unpackErrOld != nil {
        return 0
    }

    if unpackErr != nil && unpackErr.Error() == "dns: out of order NSEC block" {
        // 97b0a31 - rewrite NSEC bitmap [un]packing to account for out-of-order
        return 0
    }

    if unpackErr != nil && unpackErr.Error() == "dns: bad rdlength" {
        // 3157620 - unpackStructValue: drop rdlen, reslice msg instead
        return 0
    }

    if unpackErr != nil && unpackErr.Error() == "dns: bad address family" {
        // f37c7ea - Reject a bad EDNS0_SUBNET family on unpack (not only on pack)
        return 0
    }

    if unpackErr != nil && unpackErr.Error() == "dns: bad netmask" {
        // 6d5de0a - EDNS0_SUBNET: refactor netmask handling
        return 0
    }

    if unpackErr != nil && unpackErrOld == nil {
        println("new code fails to unpack valid packets")
        panic(unpackErr)
    }

    res, packErr = msg.PackBuffer(buf)

    if packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    if unpackErrOld == nil {

        resOld, packErrOld = msgOld.PackBuffer(bufOld)

        if packErrOld == nil && !bytes.Equal(res, resOld) {
            println("new code changed behavior of valid packets:")
            println()
            println(hex.Dump(res))
            println(hex.Dump(resOld))
            os.Exit(1)
        }

    }

    return 1
}

I was pretty happy about the robustness gain, but since we used the ParseDNSPacketSafely wrapper in RRDNS I didn't expect to find security vulnerabilities. I was wrong!

DNS names are made of labels, usually shown separated by dots. In a space-saving effort, labels can be replaced by pointers to other names, so that if we know we encoded example.com at offset 15, www.example.com can be packed as www. + PTR(15). What we found is a bug in the handling of pointers to empty names: when encountering the end of a name (0x00), if no labels had been read, "." (the empty name) was returned as a special case. The problem is that this special case was unaware of pointers, and it would instruct the parser to resume reading from the end of the pointed-to empty name instead of the end of the original name.

For example, if the parser encountered at offset 60 a pointer to offset 15, and msg[15] == 0x00, parsing would then resume from offset 16 instead of 61, causing an infinite loop. This is a potential Denial of Service vulnerability.

A) Parse up to position 60, where a DNS name is found

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | ->15 |      |

------------------------------------------------->

B) Follow the pointer to position 15

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | ->15 |      |

         ^                                        |
         ------------------------------------------

C) Return an empty name ".", triggering the special case

D) Erroneously resume from position 16 instead of 61

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | ->15 |      |

                 -------------------------------->

E) Rinse and repeat
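
To make the failure mode concrete, here is a minimal sketch of a pointer-aware name parser with the resume offset handled correctly and a loop guard added. This is illustrative code of our own, not the github.com/miekg/dns implementation; the 10-pointer limit is an arbitrary safety margin chosen for the example, and only the standard errors package is assumed.

// unpackName is a simplified, illustrative DNS name parser. It returns the
// decoded name and the offset at which parsing of the message should resume.
// The key detail is that once a compression pointer has been followed, the
// resume offset is frozen just past that first pointer, never past the
// pointed-to data, which is exactly what the buggy empty-name special case
// got wrong.
func unpackName(msg []byte, off int) (name string, resumeOff int, err error) {
    const maxPointers = 10 // arbitrary guard against pointer loops
    pointers := 0
    resumeOff = -1

    for {
        if off >= len(msg) {
            return "", 0, errors.New("offset beyond message")
        }
        b := int(msg[off])
        switch {
        case b == 0x00: // end of name
            if resumeOff == -1 {
                resumeOff = off + 1
            }
            if name == "" {
                name = "." // the empty (root) name
            }
            return name, resumeOff, nil
        case b&0xC0 == 0xC0: // compression pointer (two bytes, 14-bit offset)
            if off+1 >= len(msg) {
                return "", 0, errors.New("truncated compression pointer")
            }
            pointers++
            if pointers > maxPointers {
                return "", 0, errors.New("too many compression pointers")
            }
            if resumeOff == -1 {
                resumeOff = off + 2
            }
            off = (b&0x3F)<<8 | int(msg[off+1])
        default: // ordinary label of length b
            if off+1+b > len(msg) {
                return "", 0, errors.New("truncated label")
            }
            name += string(msg[off+1:off+1+b]) + "."
            off += 1 + b
        }
    }
}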

We sent the fixes privately to the library maintainer while we patched our servers and we opened a PR once done. (Two bugs were independently found and fixed by Miek while we released our RRDNS updates, as it happens.)

Not just crashes and hangs

Thanks to its flexible fuzzing API, go-fuzz lends itself nicely not only to the search for crashing inputs, but also to exploring any scenario where edge cases are troublesome.

Useful applications range from checking output sanity by adding crashing assertions to your Fuzz() function, to comparing the two ends of an unpack-pack chain, and even to comparing the behavior of two different versions or implementations of the same functionality.

For example, while preparing our DNSSEC engine for launch, I faced a weird bug that would happen only in production or under stress tests: NSEC records that were supposed to have only a couple of bits set in their types bitmap would sometimes look like this:

deleg.filippo.io.  IN  NSEC    3600    \000.deleg.filippo.io. NS WKS HINFO TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC TLSA HIP TYPE60 TYPE61 SPF

The catch was that our "pack and send" code pools []byte buffers to reduce GC and allocation churn, so the buffers passed to dns.Msg.PackBuffer(buf []byte) can be "dirty" from previous uses.

var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)

    if data, err = r.Response.PackBuffer(data); err != nil {

However, buf not being an array of zeroes was not handled by some github.com/miekg/dns packers, including the NSEC rdata one, which would simply OR in the bits that should be present, without clearing the ones that should be absent.

case `dns:"nsec"`:
    lastwindow := uint16(0)
    length := uint16(0)
    for j := 0; j < val.Field(i).Len(); j++ {
        t := uint16((fv.Index(j).Uint()))
        window := uint16(t / 256)
        if lastwindow != window {
            off += int(length) + 3
        }
        length = (t - window*256) / 8
        bit := t - (window * 256) - (length * 8)

        msg[off] = byte(window) // window #
        msg[off+1] = byte(length + 1) // octets length

        // Setting the bit value for the type in the right octet
--->    msg[off+2+int(length)] |= byte(1 << (7 - bit))

        lastwindow = window
    }
    off += 2 + int(length)
    off++
}

The fix was clear and easy: we benchmarked a few different ways to zero a buffer and updated the code like this:

// zeroBuf is a big buffer of zero bytes, used to zero out the buffers passed
// to PackBuffer.
var zeroBuf = make([]byte, 65535)

var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)
    copy(data[0:cap(data)], zeroBuf)

    if data, err = r.Response.PackBuffer(data); err != nil {

Note: a recent optimization turns zeroing range loops into memclr calls, so once Go 1.5 lands that will be much faster than copy().
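
For reference, a benchmark along these lines is easy to write with the standard testing package. This is an illustrative sketch, not our internal benchmark; the names are made up, the 2048-byte buffer simply matches the pool's initial capacity above, and the code goes in a _test.go file with "testing" imported.

var (
    sink   = make([]byte, 2048)  // stands in for a pooled buffer
    zeroes = make([]byte, 65535) // plays the role of zeroBuf above
)

func BenchmarkZeroWithCopy(b *testing.B) {
    for i := 0; i < b.N; i++ {
        copy(sink[0:cap(sink)], zeroes)
    }
}

func BenchmarkZeroWithRangeLoop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // The pattern the compiler turns into a memclr call from Go 1.5 on.
        for j := range sink {
            sink[j] = 0
        }
    }
}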

But this was a boring fix! Wouldn't it be nicer if we could trust our library to work with any buffer we pass it? Luckily, this is exactly what coverage-based fuzzing is good for: making sure all code paths behave in a certain way.

What I did then was write a Fuzz() function that would first parse a message, and then pack it to two different buffers: one filled with zeroes and one filled with ones. Any difference between the two results signals a case where the underlying buffer is leaking into the output.

func Fuzz(rawMsg []byte) int {
    var (
        msg         = &dns.Msg{}
        buf, bufOne = make([]byte, 100000), make([]byte, 100000)
        res, resOne []byte

        unpackErr, packErr error
    )

    if unpackErr = msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if res, packErr = msg.PackBuffer(buf); packErr != nil {
        return 0
    }

    for i := range res {
        bufOne[i] = 1
    }

    resOne, packErr = msg.PackBuffer(bufOne)
    if packErr != nil {
        println("Pack failed only with a filled buffer")
        panic(packErr)
    }

    if !bytes.Equal(res, resOne) {
        println("buffer bits leaked into the packed message")
        println(hex.Dump(res))
        println(hex.Dump(resOne))
        os.Exit(1)
    }

    return 1
}

I wish I could show a PR fixing all the bugs here too, but go-fuzz did its job almost too well and we are still triaging and fixing what it finds.

Anyway, once the fixes are done and go-fuzz falls silent, we will be free to drop the buffer zeroing step without worry, with no need to audit the whole codebase!

Do you fancy fuzzing the libraries that serve 43 billion queries per day? We are hiring in London, San Francisco and Singapore!

Now serving the Middle East: 4 new data centers, partnerships


Our last embarkation into a new geography coincided with a significant milestone: our 30th data center (and first in Africa) in Johannesburg, South Africa. And as we march past number 40, we’re proud to announce yet another. Introducing CloudFlare’s latest points of presence (PoPs) in Doha, Qatar; Dubai, United Arab Emirates; Kuwait City, Kuwait; and Muscat, Oman. These data centers are the first wave in our MENA (Middle East/North Africa) expansion, and the 39th, 40th, 41st and 42nd data centers, respectively, to join our global network.

Up to this point all CloudFlare traffic delivered to the MENA region was served from our London, Frankfurt, Marseille, Paris and/or Singapore data centers, with round trip latency of up to 200-350ms. As in Africa, local bandwidth in MENA is notoriously expensive, making it cost-prohibitive to deliver content locally. That is (once again), until now! We're proud to announce the first of a series of agreements with regional carriers including Etisalat, Omantel, Ooredoo, and Zain to help build a better Internet in the region.

How to build a better Internet

A few of the necessary ingredients for building a better Internet include international connectivity (often in the form of submarine cable systems) that connects to national backbones which, in turn, deliver the Internet to the access networks that bridge the last mile to Internet users themselves—all of which must be in place and optimally utilized to meet increasing demand in the most cost-effective way. Fortunately, in the MENA region, there is already good international connectivity, with the vast majority of Internet traffic transferred internationally through submarine cables (something we referenced in our Marseille blog post), and substantial investments already made into national backbones and last mile infrastructure (e.g. broadband and mobile networks) to the end user.

All MENA countries are connected with at least two submarine cables; the UAE alone has 15+!

The problem is that the international submarine cables built in the region were designed to link individual countries in MENA with Europe and Asia, and not to provide regional connectivity within MENA. The number of undersea cables that pass through the region without interconnection is staggering. Consequently, there is patchy submarine connectivity between the Middle Eastern parts and the North African parts of MENA (with Egypt playing a pivotal role). What this means for CloudFlare (and you) is that it takes a lot of individual PoPs to achieve broad, local coverage of broadband and mobile users in the region. It is for this reason that we’ve announced a record-setting four new PoPs deployed in a single month, with even more on the way.

An additional complexity is that, even with all of the right infrastructure-related ingredients, the majority of Internet content accessed in MENA is hosted far, far away in North America, Europe and Asia. That is, of course, unless you are a CloudFlare user. As of now, the collective content of all 2 million+ CloudFlare users is locally accessible through the networks of our partners in the region. No additional configuration required. This re-routing of traffic is now automatically in effect for all CloudFlare users worldwide.

It's not a mirage

As the latency measurements below demonstrate, our data centers across the Middle East are already delivering significant performance improvements. In the United Arab Emirates, latency to the closest CloudFlare PoP has decreased from as high as 200ms to 10ms, a 20x improvement.

In Qatar (just in time for the upcoming World Cup) latency to Ooredoo (the incumbent operator) has decreased from 150ms to 1ms (a 150x improvement!).

The benefits also extend to other countries and networks in the region. Networks in Saudi Arabia, for example, have seen latency decline from 150-200ms to less than 50ms (a 4x improvement).

Source: Measurements from RIPE Atlas

As a whole, our first wave of MENA deployments serve traffic originating not only from Kuwait, Oman, Qatar, and the United Arab Emirates, but also traffic from networks in Saudi Arabia, Jordan, Iran, Pakistan and the Sudan, among others. Additional PoPs are in the works to expand and amplify these benefits throughout the region.

If you are curious about how this network expansion can help your organization, get in touch with our team.

—from the CloudFlare team شكرا



P.S. No castle was touched in the deployment of these PoPs; keep guessing! We still don't have a winner in our contest announced here.

Map source: Undersea Cable Map of the Middle East from Greg's Cable Map.


Dear Internet, Send Us Your Videos


CloudFlare turns 5 years old this September. It's been an amazing ride since our launch. Before we launched at TechCrunch Disrupt on September 27, 2010, we'd signed up about 1,000 beta customers. It took us nine months to get those first customers. (By comparison, today we typically sign up 1,000 customers every 3 hours.)

Those first beta customers were instrumental. They put up with us when we had only one data center (in Chicago). They put up with us as we brought traffic online in our next facilities in Ashburn, Virginia and San Jose, California — and worked through the routing challenges that came with running a distributed network for the first time. They sent us bug reports, provided us with feature requests, and helped build the foundation that grew into what CloudFlare is today.

Archival Footage

When we launched, we wanted to feature their stories and experiences with CloudFlare, so we had them submit their stories by video. Here's the video we included as part of our launch presentation.



I'm proud of the fact that more than 80% of those original 1,000 customers are still using CloudFlare five years later.

Send Us Your Stories

As we get ready to celebrate our fifth birthday, we wanted to reach out to our customers again — now well over two million of you. Our ask is the same as the first time: make a short video and tell us what you love about CloudFlare.

We'll put together a montage of some of the videos as part of our birthday celebration. And, for the users who send us the best submissions, we'll fly you to San Francisco and put you up to be a part of our birthday celebration on Thursday, September 24th.

Thank you for helping us build CloudFlare over the last five years. We look forward to watching your videos, hearing your stories about CloudFlare, and celebrating our fifth birthday together in about a month. Until then, stay tuned for a number of announcements over the coming weeks. We're pretty sure we'll be able to top our birthday surprises of years past.

Visit our video submissions page for instructions on how to enter.

Ensuring the web is for everyone


This is the text of an internal email I sent at CloudFlare that we thought worth sharing more widely. I annotated it a bit with links that weren't in the original.

"Tim Berners-Lee- Mosaic by Sue Edkins at Sheen Lane Centre" by Robert Smith - Own work. Licensed under CC BY-SA 4.0 via Commons


Subject: Days of future past

Folks,

One of the exciting things about working at CloudFlare is our continual push to stay on top of what's new for our customers. We've pushed things like IPv6 and SPDY in the past; and we'll soon be giving the world DNSSEC and HTTP/2. In the world of SSL we've stayed on top of changes in recommended cipher suites and offer the latest signature algorithms, such as SHA-2, to our customers.

But as we do this we must not forget the old protocols. Because we serve a truly global audience, we serve everyone on the planet. It's easy inside a Silicon Valley bubble to think that everyone is on a 1Gbps Internet connection with the latest version of Chrome on a new Mac, but the worldwide reality is far different.

We see every type of machine and browser out there. And by machine I mean computers old and new, smartphones, dumb phones, command-line clients, every type of proxy server. And we see them on satellite connections from ships at sea, 3G connections in developing countries, fiber connections to the home and more.

As we keep pushing for the future we also have to look to the past and make sure we support everyone. Supporting everyone means that all CloudFlare sites are accessible to everyone who uses the web and when someone asks "Can you handle X?" we can simply answer "Yes" without any caveats. And X can be something created 15 years ago or 15 months ago.

So, when making technical decisions we need to ask ourselves "Who are we excluding if we do this?" and, if we are excluding some portion of the Internet's users, really push ourselves to come up with solutions that don't compromise speed and security.

At the 2012 Olympics in London the creator of the web, Tim Berners-Lee, appeared in the opening ceremony and tweeted "This is for everyone". Let's make sure we keep the web available, secure and fast for everyone.

John.

Introducing a Powerful Way to Purge Cache on CloudFlare: Purge by Cache-Tag


Today CloudFlare is introducing a new way to purge the cache using Cache-Tags. Cache-Tags are assigned to cached content via a Cache-Tag response header, and are stored as metadata with cached objects so that global purges take only seconds, targeting a granular, finite set of cached objects.

For example, an e-commerce website can use Cache-Tags to purge all of their catalog images at once, without affecting any of their other assets. A blog can use Cache-Tags to update their JavaScript files in cache, without forcing a cache miss on their CSS. A business can use Cache-Tags to purge cache of all four hundred pages of their blog without purging any of the pages from their core platform.

The CloudFlare Cache

With 42 data centers around the world, web pages served directly from CloudFlare’s cache are guaranteed to be just a few hops away from any visitor, anywhere. With a little bit of fine tuning, many websites succeed in delivering most of their content from cache, saving a majority of bandwidth on their origin servers. One website even managed to reduce their AWS bill by 96% when they started caching assets behind CloudFlare.

CloudFlare’s cache is powerful, but when a developer pushes out changes to their website, it can be frustrating to wait for the content to update in CloudFlare’s cache. Long cache TTLs are great for speed, but when a developer updates their website, they want the newest version to be available immediately, globally.

CloudFlare now offers three different ways to purge the cache.

  1. Purge individual files in the cache
  2. Purge all files in the cache
  3. Cache-Tags for granular control

Using Cache-Tags

To set up a Cache-Tag for an item, simply add a Cache-Tag response header to that object. A Cache-Tag header can have multiple tags, separated by commas.

Cache-Tag: tag1,tag2,tag3

CloudFlare will automatically assign any tags in the response header as metadata to that object when it is cached. Before the response is proxied to the website visitor, CloudFlare will remove the Cache-Tag header so that end users of your website cannot see the Cache-Tags assigned to an object.
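
As an illustration, here is how an origin written with Go's net/http package might attach tags to its responses. This is our own sketch rather than a required setup; the route and tag names are invented for the example, and any web server or framework that can set a response header will work just as well.

// Hypothetical origin handler: every asset under /catalog/ is tagged so that
// the whole catalog can later be purged with a single Cache-Tag purge.
// CloudFlare stores the tags and strips the header before visitors see it.
http.HandleFunc("/catalog/", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Cache-Tag", "catalog,images")
    http.ServeFile(w, r, "./static"+r.URL.Path)
})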

To purge content by its Cache-Tag, use the CloudFlare dashboard or purge programmatically with the API.

As an illustration, a curl call to purge cache by a Cache-Tag called ‘cat-tag’ might look like the following (this sketch assumes the v4 purge_cache endpoint; the zone ID and credentials are placeholders, and the support article linked below has the authoritative form):
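
# Illustrative only: endpoint and method per the v4 API; replace the
# placeholders with your own zone ID and API credentials.
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/YOUR_ZONE_ID/purge_cache" \
     -H "X-Auth-Email: you@example.com" \
     -H "X-Auth-Key: YOUR_API_KEY" \
     -H "Content-Type: application/json" \
     --data '{"tags":["cat-tag"]}'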

Purging by Cache-Tag is fast. Purges take effect globally in just a few seconds, even if the tag purged is associated with hundreds or thousands of cached files.

To read more about how to use Cache-Tags, check out our support article here.

Cache Tag Availability

Purge by Cache-Tag is enabled automatically for all Enterprise plan websites. All a developer has to do to get started is add the Cache-Tag HTTP response header to items on their website. If you are not yet an Enterprise customer, get in touch with our team here.

Mombasa, Kenya: CloudFlare's 43rd data center


Only two weeks after the announcement of our four new points of presence (PoPs) in the Middle East, it is with much hullabaloo that we announce our 43rd PoP, and second in Africa following Johannesburg, in Mombasa, Kenya (a.k.a. “The Castle”). In a challenge that vexed many of our readers, Mombasa is our first PoP to be located in a real-life castle-turned-data center (see above). From this castle CloudFlare is already serving networks in every country across East Africa, with reach to many of the region's 30 million+ Internet users.

Building a better Internet in Eastern Africa

While today it feels as if Internet access is ubiquitous, this is most certainly not the case everywhere. The continent of Africa was connected relatively late to the Internet and, in the first years, access was limited to a small segment of the population due to lackluster investment and growth in underlying infrastructure, and high access costs. Most Africans were also without access to broadband Internet, and were largely limited to viewing content created and hosted half a world away—for the same reason there was little access, there was also no local hosting industry to speak of. By now, if you’ve followed our blog, you know that locating content and applications far away from users is a recipe for latency (i.e. the Internet in slow motion). With CloudFlare, your content and applications are automatically served from the closest PoP in our global network directly to your users.

Fortunately, over the past several years, Africa in general, and East Africa in particular, has seen tremendous growth in the underlying Internet infrastructure, including the arrival of many submarine cable systems. International Internet bandwidth in the region has multiplied twenty-fold in the past 5 years alone. At the same time, an increasing number of Africans now have access to the Internet, reaching a milestone of 20% Internet penetration, a level that is considered the minimum to enable a country to get significant economic benefit from an Internet economy. This is something we’re fortunate to witness (and facilitate) first hand. Entrepreneurs throughout Africa, many of whom are CloudFlare customers, are beginning to seize on the opportunities afforded by Internet access.

Our castle in the sky cloud

To build on this growing momentum, and to continue to foster the development of the African Internet ecosystem, we’re pleased to announce our first deployment in East Africa. Prior to now, traffic to CloudFlare customers from East African countries was largely served from our London and Amsterdam data centers with round trip latency of 150-350ms or, in some cases, with better latency from our data center in South Africa. As of now, our Mombasa site is serving traffic to networks in Kenya, Tanzania, Rwanda, Uganda, Ethiopia and even Mauritius (among many others).

CloudFlare's new home in the SEACOM cable landing station in Mombasa, Kenya

As the latency measurements below demonstrate (note: all measurements represent round trip time to the closest CloudFlare PoP in milliseconds), latency in Uganda has decreased from as high as 150ms to 20ms, a 7.5x improvement.

The same can be seen in Tanzania where latency has decreased from 150ms to just under 15ms (another 7.5x improvement!).

Source: Measurements from RIPE Atlas

Even though access costs (i.e. the cost of Internet bandwidth) remain extremely high across Africa, we continue to work with local partners to expand and amplify these benefits throughout the continent. If you are an Internet operator that would like to bring the benefits of CloudFlare to your network—whether in Africa or globally—or are simply curious about how this network expansion can help your organization, get in touch with our team.

—The CloudFlare team

The CloudFlare network today: (Mombasa pin soon to be added!)

SXSW Interactive 2016: Vote for CloudFlare’s Submissions


It’s that time of year again, when the end of summer is in sight, students are back in school, football is on TV again, and your social feeds are flooded with “vote for my panel at SXSW” updates. While it feels like our team was just at SXSW, it’s already time to start planning for SXSW ‘16. If these topics interest you, please take a minute to vote for them!

How to vote:

  1. Create an account here
  2. Enter your name & email address, then confirm your account
  3. Log-in with your new account and go to the “PanelPicker”
  4. Click “search/vote” and search for presentations by title
  5. VOTE!!

    *Voting ends on Friday, September 4th!

Just like last year, PanelPicker voting counts for 30% of a panel/presentation’s acceptance to SXSW. Check out the previews of our sessions below. Every vote counts!

CloudFlare's SXSW 2016 Submissions:


1) They’re Coming for our Internet: We can fight back
Join Matthew Prince, CloudFlare’s co-founder and CEO, for a presentation focused on Internet censorship and global security issues. Matthew will share how online censorship varies globally, and how tech giants should collaborate to expand the Internet’s reach, not divide it. He will also cover what your own personal rights are as an online user, and how you can better preserve them. If you're reading this blog post, this is a presentation you won’t want to miss!

Speaker:
Matthew Prince, CloudFlare

2) Innovating Like the “Early Days” 5+ years Later
Innovating is easy in the early days--especially without the legacy systems, prior customer commitments, or formal internal processes that come with time. Fast forward and you have more employees, customers, commitments, internal silos, and business goals than ever before. How do you maintain the agile innovation pace you had early on? This panel of builders and visionaries will share how they stay laser focused on what’s over the horizon, avoiding incrementalism. They’ll share how they keep their teams paving the way for others to follow.

Speakers:
Dane Knecht, CloudFlare
Charise Flynn, Dwolla
Marc Boroditsky, Twilio

3) PR for Startups: Low to No Budget Tips for Today
Learn how to drive PR for your startup--no matter how big/small you are or what your current role is. Join a former tech journalist and PR leaders from growth-stage and unicorn startups--across the enterprise, on-demand, and consumer technology industries--for a candid discussion on navigating the media landscape. Walk away with tips and tools (even free ones!) to drive awareness and take your company to the next level.

Speakers:
Daniella Vallurupalli, CloudFlare
Johnny Brackett, Shyp
Michelle Masek, Imgur
Ryan Lawler, 500 Startups

Please vote and help CloudFlare get to SXSW Interactive 2016! I can already taste the BBQ...
