I’m really excited to be publishing a real live technical blog post! I have fun blogging about speaking, travel for speaking, and racing, but I’m looking forward to my new professional opportunities lending themselves to more technical blogs on a variety of data platform topics. I’m really looking forward to blogging on some more cutting-edge data platform topics as well, but for today I have a bit of a blog about a SQL Server feature that doesn’t get a lot of love. We can debate all the reasons for why stretch database hasn’t seen a lot of use, but it’s still there and still an interesting feature to play with.
Recently I had the pleasure of putting together a proof-of-concept for a customer who was looking into using Stretch Database to offload a large amount of archive data into Azure rather than their own datacenter(s). This particular application had some tables that were created nearly 15 years ago (a fact that will become relevant shortly). Enabling Stretch Database is covered well in the Microsoft document here so I’ll skip straight to the error I was getting.
I noticed when I enabled Stretch Database (via the wizard) some tables were starting to migrate completely normally and some were not. I began querying sys.dm_db_rda_migration_status (a DMV used for monitoring Stretch Database operations) and saw that all of the tables that were not moving were registering error 7320 in the error_number column of that DMV. A quick search on the error revealed the following text of the generic error: Cannot execute the query “%ls” against OLE DB provider “%ls” for linked server “%ls”. %ls. Not very helpful, is it? It was at this point we contacted Microsoft support because we thought we’d found a bug – some tables were migrating correctly and some weren’t, even though they’d all been configured for stretch the same way via the wizard. Our support ticket went between Azure SQL Database support and SQL Server support before we were connected with a solid engineer who assisted in troubleshooting but was unable to find anything incorrect in the tables or the way Stretch Database was configured.
We were at a loss but he recommended another review of the SQL Server error log to see if there was anything untoward there at all that we had missed in our previous reviews. What I noticed in that final review was that we were occasionally seeing errors relating to ANSI_NULLS settings in the error log. Cross-referencing that with the times the Stretch Database migration process was attempting to kick off on the erroring tables revealed that something to do with ANSI_NULLS was breaking Stretch Database migration. But what?
I relayed this information to the support engineer and he contacted the development team to ask what ANSI_NULLS had to do with Stretch Database migration throwing an error related to linked servers – an odd chain of errors. They replied and said that if a table was created with the ANSI_NULLS setting set to OFF then the Stretch Database migration process will throw that linked server error. These tables, due to their age I mentioned earlier, were created with that setting set to OFF. Problem solved! (Although changing the ANSI_NULLS setting for hundreds of related tables is its own set of challenges that should be its very own blog post). In better news, the support engineer submitted a request to the docs team to clearly document this previously undocumented issue.
I realize, given the minimal adoption of Stretch Database, this blog post may be targeted towards a limited audience. That said, if it helps just one person avoid the issues that I ran into and save hours or days of work to figure this out, it is a huge help to somebody! If that somebody is you, cheers!
As always, thanks for reading and come back soon – there will be much more content here as 2019 rolls on.
Last week, IDERA was kind enough to invite me to host a Geek Sync Webinar for them. I presented this particular Geek Sync as a webinar version of my Top 5 Tips to Keep Always On Always Humming session that I’ve done at a few SQL Saturdays in the past. Although brevity forces the slightly cutesy title, the session is specific to Always On Availability Groups and my top tips to keep them running after you’ve set them up. Essentially, it’s a list of mistakes I’ve made (or colleagues of mine I’ve made), and I don’t want you to trip over the same things that we did!
It was a good, interactive audience and I received a few questions that I simply couldn’t handle in the allotted hour for the Geek Sync. While I plan an additional blog early in the New Year to discuss those, I did get a few specific questions on the basics of setting up a read-only routing list and I committed to those people that I would get a blog out before my vacation began that covered the basics of setting up a read-only routing list.
Before we dive in, this blog assumes that you have setup a functional Always On Availability Group with a listener. If you are struggling with that, feel free to reach out to me via Twitter and we’ll take a look at it and see what the best way is for me to help you. If that’s all working for you, though, all that is left is to follow the instructions below and you’ll have a functional read-only routing list.
We need to first identify the read-only routing URL that we will be using in our script. That URL follows this format: ‘TCP://<FQDN of your server>:port # used by SQL instance’. For example, a read-only routing URL for one of my test servers may look like ‘TCP://TESTSERVER01.MGTEST.DEMOS:1433’. If your server is setup according to defaults (or you know that the instance is only listening on one port and you know that port number), these instructions will be sufficient. If you have an instance whose setup is more complicated than this, I have always found the script contained in this Microsoft blog handy. That script, in very detailed fashion, walks you through how to calculate your read-only routing URL in a wide variety of configurations.
Now that we’ve determined our read-only routing URL, we’re ready to create our read-only routing list. To keep this blog simple and readable, I’ll create a read-only routing list for a two-node Always On Availability Group (meaning the read-only connections will be pointed at the secondary under normal operations). Since, in this blog, we’re doing all of this via T-SQL, I’ve pasted a sample script below that 1) ensures that the secondary replica allows read-only connections, 2) establishes the read-only routing URL, and 3) creates the read-only routing list. My sample script (which sets up routing lists for scenarios when either 01 or 02 is the primary) is pasted below. Following the script, I have a short note about setting up round-robin load balancing via a read-only routing list (this is a SQL 2016(+) only feature) and a holiday sendoff!
ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER01′ WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)); ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER01′ WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N’TCP://TESTSERVER01.MGTEST.DEMOS:1433′)); GO
ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER02′ WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)); ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER02′ WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N’TCP://TESTSERVER02.MGTEST.DEMOS:1433′)); GO
ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER01′ WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=((‘TESTSERVER02’, ‘TESTSERVER01’)))); GO
ALTER AVAILABILITY GROUP [TESTAG01] MODIFY REPLICA ON N’TESTSERVER02′ WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=((‘TESTSERVER01’, ‘TESTSERVER02’)))); GO
If you’ve made it this far, thanks for reading! As I mentioned, the examples above are simple and meant to provide a basic understanding of how to setup a read-only routing list. That said, SQL Server 2016 and later gives you the ability to provide round-robin load balancing via a read-only routing list. If I were to extend my sample AG above to four nodes and I wanted three of them to be load-balanced, it might look something like this: READ_ONLY_ROUTING_LIST = ((‘TESTSERVER01′,’TESTSERVER02’, ‘TESTSERVER03’), ‘TESTSERVER04’). If you note the extra parentheses to the left of TESTSERVER01 and to the right of TESTSERVER03, that is all it takes for the listener to understand that it is supposed to load balance read-only connections among those three nodes.
As I speak about, Always On Availability Groups can be a very complex topic to write about and discuss. This post is, by no means, meant to be an exhaustive exploration of read-only routing lists or availability groups. It is meant to be, as requested by the Geek Sync attendees, a quick view into setting up a basic read-only routing list. Hopefully it achieved that goal!
My blog will be quiet over the next couple weeks as I celebrate the holidays with family and friends. I sincerely hope all of you are able to relax and recharge as well this holiday season. We’ll be speaking a lot more in 2018, I believe! Cheers and Happy New Year!
As a consultant, I spend a lot of time with customers whose most significant pain point is what they’re spending on SQL Server licensing. In general, they’re all facing a similar scenario: they’ve found an architecture that works for them and as they scale that out for new clients or new users they continue purchasing the same servers (with the same version and edition of SQL Server) that’s always worked. While there’s nothing wrong with that, eventually management starts asking some questions:
Why do we need all these servers when IT says they’re barely using any CPU?
What do all these servers do?
Why we are using X-year-old software?
As DBAs (especially those of us who wear the architect hat as well), we’re in a constant battle between priorities 1 and 1A: ensuring maximum uptime for our customers and spending the least amount of money to achieve that uptime. Settling for an older architecture on an old version of SQL Server does a great job fulfilling priority 1 but, generally, a poor job fulfilling priority 1A. The more money we spend on licensing, the less we have to spend on training, new hardware, etc.
It’s incumbent on us to keep abreast of the evolution in the SQL Server universe. As we’ve seen, Microsoft has massively accelerated the pace of their development in the SQL Server space, whether we’re talking about the database engine itself or Azure SQL Database or something in-between.
Can your company save money and provide required uptime by a move to Azure? Do you need to upgrade to SQL Server 2016 SP1 but downgrade to Standard now that in-memory OLTP, advanced compression, and greater partitioning functionality no longer require Enterprise Edition? Do you need to use something like ScaleArc to ensure you’re leveraging your complete Always On availability group investment?
This blog would be thousands of words long if I delved into every single option, but my point is a simple one. As things in the SQL Server universe change by the month rather than by the year, we all need to keep up with the latest developments and think about how they might make our job easier and/or our architecture less expensive to license and maintain so our company can spend more money on their most valuable resource – us!
Read blogs, follow SQL Server experts on Twitter, attend SQL Saturdays, and make plans to attend PASS Summit so you can stay on the cutting edge of cost-saving developments. If regular operations and maintenance keep you from having the time to reevaluate your architecture, engage a Microsoft data platform consultant (like me!) to help you in that evolution. We all know old habits die hard, but they can cost you and your company valuable resources as well. Engage with the community to help break out of those old habits (and learn cool things too)!
If you work with AlwaysOn Availability Groups in a production environment long enough, you will inevitably encounter a situation that forces you to failover your AG. If you’ve just started working with AGs, that failover is likely followed by a developer or application administrator calling you in the middle of the night cheerfully telling you “something is not right with the database”.
I know this because it happened to me in my early days of supporting AlwaysOn AGs in production, and the reason was never because the database itself was not alive and available. Thankfully, while AlwaysOn is absolutely a marketing term, it’s also an accurate description of a properly configured environment. Fortunately, Microsoft provides us a lot of helpful information covering basic setup and configuration of an AlwaysOn AG. That information can be found starting here.
What isn’t centrally provided and what would have prevented a few late night phone calls to my phone over the years was a list of tips to configure the other things an application needs from an AlwaysOn AG configuration to smoothly operate after a failover.
Tip #1 Quorum Votes
While this is more of a server administrator than DBA tip, since it’s likely the DBA’s phone that will ring if this is configured incorrectly, we need to take an interest in what’s happening across our operations center. Correctly establishing server quorum votes, in most environments, is the responsibility of a server administrator. Rest assured, though, when the on-call phone rings at 4 am you’re the one waking up if this is not configured correctly.
While a deep dive into this topic is a blog in and of itself, the short version of the tip is this: if your AG spans datacenters, make sure that each datacenter has an even number of quorum votes in the cluster and that the cluster has a file share witness configured at a third, separate site. Basically, a typical Windows cluster needs a majority of voting cluster nodes online to maintain normal cluster operations. If the votes are evenly spread between two datacenters and you have a witness at a third site breaking the tie, losing just one datacenter will not bring your cluster down. While you can force start a cluster after it loses quorum, that is going to create extra, manual work for you both you and your server administrators and that work will be done under the utmost pressure because your database will be inaccessible to the application via the AG listener even though the servers themselves may be healthy. Trust me when I say it’s better to learn this lesson reading a blog than being woken up in the middle of the night with somebody yelling “we’re down!”
Tip #2 Read Only Routing Lists
An important, but often overlooked tip is to configure a read-only routing list for your AG listener. I typically do this as part of the initial build-out of the AG because, without this, you lose one of the coolest features of AlwaysOn availability groups: the ability to route read-only connections to replicas. That feature can free up the primary to provide much more stable performance to the application(s) it serves. It also greatly decreases the possibility that your phone rings with somebody complaining about read-only queries slowing down (or worse yet, blocking) read/write queries.
Tip #3 Copy User Logins
This tip continues our trip up the stack and (hopefully) solves a problem that is easy to solve but can be very difficult to figure out under pressure. Many of the applications I’ve worked on over the years had a variety of database logins (using both Windows authentication and SQL Server authentication). While this is not ideal, there are often technical and/or business reasons why there are multiple logins (using both types of authentication) that need access to the database. In an AlwaysOn AG, particularly one where you have added replicas after the initial establishment of the AG, the replica addition may not have included setting up all the logins the application needs. In that scenario the failover will appear to have worked fine and led to happy application admins and DBAs…until components of the application begin malfunctioning because they don’t have access to the replica (now primary) database. Trying to fix this under a lot of pressure may lead you on a wild goose chase or three (as the application errors or complaining users may not always point you in the right direction), but if you follow the steps found here it will walk you through the process of synchronizing logins between two servers. This process should be one of the first things you do when setting up a new AG replica instance unless you really and truly enjoy off-hours phone calls about broken applications!
Tip #4 Regular Transaction Log Backups
The last two tips are more focused on keeping your AG happy, healthy, and running smoothly which should help keep you, the DBA, happy and healthy as well. Tip #4 is a general SQL Server maintenance best practice but can take on a bit of added importance when supporting AlwaysOn availability groups. Make sure you have a transaction log backup strategy in place and being actively monitored on all databases, but especially the databases participating in the AG. While the discussion of whether or not you should perform these backups on your primary or replica(s) (AlwaysOn gives you the option to choose the location of these backups) is an in-depth topic best left for another blog entry, regular transaction log backups help make maintenance and recovery of an AG-participating database much easier. While shrinking a transaction log is rarely, if ever, recommended, it’s a much more complicated operation in an AlwaysOn AG. Beyond that, a transaction log whose size is out of control can make database recovery a much longer process simply because of the size of the files you are moving/restoring – which means you are spending many more hours with your phone ringing off the hook with unhappy users. That is something that we’d all like to avoid.
Tip #5 Stagger Index Maintenance
My final tip, #5, should keep your network administrators off your back and your AGs as synchronized as possible. It’s also a fairly simple thing to do. If you have a large database within your AG, make sure you stagger the index maintenance that is performed during your regular index maintenance. Whether you rely on a maintenance plan within SQL Server, homebuilt index maintenance scripts, or one of the many solid index maintenance solutions available within the SQL Server community, ensure that the transaction log traffic your index maintenance generates (which can be surprisingly large) is not causing your remote replicas to fall behind your primary database. If your index maintenance is causing those remote replicas to fall behind by seconds or, worse yet, minutes, simply stagger your index maintenance jobs throughout the off-hours or at least make sure the index maintenance on your large tables is spread throughout the time allotted to you for index maintenance. This means that not only are the users who are using the replicas for ad-hoc querying and/or reports likely receiving up-to-the-second data but, in the event of a failover, your exposure to data loss is as minimal as your network and server infrastructure will allow.
I hope these five tips do indeed help keep your AlwaysOn availability groups, application administrators, and users happy. That should mean some happier DBAs as well! I look forward to taking deeper dives into a couple of these topics here on the blog, but if you enjoyed this post I encourage you to sign up for my December 1 webinar here. It will go deeper into some of these topics and I’ll be able to take questions as well.