Tuesday, August 14, 2018

SOLVED - How do I query Active Directory using ADSI / LDAP Linked Server with over 1,000 rows

I need to extract account information from Active Directory (AD)/LDAP into a SQL Server table and compare it to other tables for account verification and to address a hole host of synchronization issues between various systems.  Easy enough, right?   Let me just go google that for a quick answer.

Tons of examples!  Here are a few examples of what I found (in case you didn't click the link above):
  • https://blog.sqlauthority.com/2016/03/30/sql-server-query-active-directory-data-using-adsi-ldap-linked-server/
  • https://www.mssqltips.com/sqlservertip/2580/querying-active-directory-data-from-sql-server/
  • https://docs.microsoft.com/en-us/windows/desktop/adsi/distributed-query
I'm golden!  Right?  I copy and paste some code and off I go.  Except it doesn't work. This is the error I get:

I mean, it does work until you get 1000 rows.  I google some more.  It turns out that lots of people have run into this issue.  What a pain!  I look at the solution suggested.  Basically, I cycle through the alphabet selecting records that start with A, then B, etc. I copy and past, I give it a try.  Again it fails.  It turns out that I have over 1000 people in my AD that have an account that starts with A.

Hmmm... What if I cycle through everyone with AA, AB, ... ,ZZ?  I write the code.  It works... up until I get to AL.  Turns out, I have over 1000 people who's accounts start with AL.  Okay, how about I use 3 letters in my search?  AAA, AAB, ... ,ZZZ.  Run it, it's going slow... and FAILS on ALE.  Too many people have account names that start with ALE.  Okay... 4 characters?  It seems to be running!  It's taking a really long time, but it's running!  I think we're good!  And F&**, it failed on MICH. What the hell people!  Why do you all have the same name?  This is the code I created to do this search.  (I'm going to warn you, it has a Cartesian join.)

Let's do some math. I have 29 characters that users are allowed to use in the AD accounts.  If I do a search with 4 characters each and cycle through all 29 characters, that's 707,281 searches (can someone check my math?).  I have just over 93,000 records in the one AD.  707,000 searches to find 93,000 records ins't a good strategy and it's slow and it'll probably crash something. That's not good.

What do I do? CRAP! I guess I have to actually write a program.  I don't really want to do that.  More google searches.  Lots more.  Days go by.  Weeks go by. Months go by. And slowly I realize that google is failing me.  No amount of googling for the answer will write that code for me.  Maybe someone already did something similar or so better, but I couldn't find it.

And so I wrote the below.  It's not perfect.  I'm willing to take suggestions.  But it works.  And I think you'll find it'll work for you. In about 2 minutes it returned over 93,000 records.

The code shown creates a temp table so I can see what it's doing.  You can strip this out, but I left it in so you can see as well.  It also includes a few other pieces that aren't needed for this post, but I left it there.  In addition, this doesn't show you how to do the setup.  If you need to know how to do the setup, use one of the links above from the other articles. 

How it works
I provide a list of all valid characters in the LDAP (case isn't important).  It then cycles through the list and tries to query against the LDAP.  It it fails, it adds a character to the search string and tries again.  It'll then add/remove each characters as needed until it cycles through the entire list.

The result is a program that extracts 93,000 LDAP records in just over 2 minutes.  It did this in 1914 tries.

Here is an example of the attempted tries.  I hide a bunch of rows. You can see how it cycles from A, AA, AB, AC, ... ,B, BA, ... ,Z, ETC.




Next Steps
This code is only the first step.  I don't need to run this daily.  I have another code snippet that I run periodically to extract the changes within the past 24 hours or so.  This way I can keep the table in-sync without querying the entire ldap daily.

It's important to note that this uses open query against a linked server called ADSI.  If your LDAP linked server is different, you'll need to modify the code.

Code Start
/*
Populate LDAP info into table LDAP_ALL. 
Loop through characters you indicate
are valid ldap characters and add/remove characters to 
account for LDAP for row limit.
JaredACarter 8/31/2018
*/

--Set to 1 to show temp table '#ExtractHistory'
DECLARE @TrackExtractMode INT = 1
--Set to 1 to rebuild tables, set to 0 to insert into existing
DECLARE @RebuildTables INT = 1
--Valid Characters in your LDAP
DECLARE @ValidChars CHAR(255) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ.-_'
--Your LDAP path, for Example 'LDAP://ldap.yourdomain.com' 
DECLARE @ldappath VARCHAR(50) = 'LDAP://ldap.mydomain.com' 

DECLARE @sAMAccountName VARCHAR(50) = ''
DECLARE @tsql VARCHAR(5000)
DECLARE @ChrPos INT = 1
DECLARE @ValidCharLen INT = LEN(@ValidChars)
DECLARE @SearchChar CHAR(255) = SUBSTRING(@ValidChars,@ChrPos,1)
DECLARE @SearchCharLen INT = 0
DECLARE @LastChar CHAR(1) = ''
DECLARE @LastCharPos INT = 0

IF OBJECT_ID('tempdb..#ExtractHistory') IS NOT NULL
DROP TABLE #ExtractHistory

CREATE TABLE #ExtractHistory(
ChrPos INT
,ValidCharLen INT
,SearchChar CHAR(255)
,SearchCharLen INT
,LastChar CHAR(1)
,LastCharPos INT
,QryRowCount INT
,QryError INT)

IF @RebuildTables = 1 BEGIN
IF OBJECT_ID('dbo.LDAP_ALL') IS NOT NULL
DROP TABLE dbo.LDAP_ALL
END

IF OBJECT_ID('dbo.LDAP_ALL') IS NULL
CREATE TABLE [dbo].[LDAP_ALL](
       [ldappath] [varchar](50) NOT NULL,
       [cn] [nvarchar](250) NULL,
       [displayName] [nvarchar](250) NULL,
       [userPrincipalName] [nvarchar](250) NULL,
       [mailNickname] [nvarchar](250) NULL,
       [msExchHideFromAddressLists] [bit] NULL,
       [name] [nvarchar](250) NULL,
       [givenName] [nvarchar](250) NULL,
       [sn] [nvarchar](250) NULL,
       [mail] [nvarchar](250) NULL,
       [title] [nvarchar](250) NULL,
       [department] [nvarchar](250) NULL,
       [company] [nvarchar](250) NULL,
       [ADsPath] [nvarchar](500) NULL,
       [sAMAccountName] [nvarchar](250) NULL,
       [employeeID] [nvarchar](250) NULL,
       [extensionAttribute1] [nvarchar](250) NULL,
       [whenChanged] [datetime] NULL
) ON [PRIMARY]

GOTO TryExtract

TryExtract:
BEGIN TRY
       SET @SearchCharLen = LEN(RTRIM(@SearchChar))
       SET @LastChar = SUBSTRING(RTRIM(@SearchChar),@SearchCharLen,1)
       SET @LastCharPos = CHARINDEX(@LastChar,@ValidChars)

/*START ADD/MODIFY YOUR LOGIC TO EXTRACT
LDAP DATA INTO TEMP TABLE HERE*/
       SET @sAMAccountName = RTRIM(@SearchChar)
       SET @tsql =
              'INSERT INTO LDAP_ALL
              SELECT ''' + @ldappath + ''' as ldappath,
                      cn,
                      displayName,
                      userPrincipalName,
                      mailNickname,
                      msExchHideFromAddressLists,
                      name,
                      givenName,
                      sn,
                      mail,
                      title,
                      department,
                      company,
                      ADsPath,
                      sAMAccountName,
                      employeeID,
                      extensionAttribute1,
                      whenChanged
              FROM OPENQUERY(ADSI,' + '''SELECT cn,
                      displayName,
                      userPrincipalName,
                      mailNickname,
                      msExchHideFromAddressLists,
                      name,
                      givenName,
                      sn,
                      mail,
                      title,
                      department,
                      company,
                      ADsPath,
                      sAMAccountName,
                      employeeID,
                      extensionAttribute1,
                      whenChanged
              FROM ''''' + @ldappath + '''''
              WHERE objectClass = ''''User''''
              AND objectCategory = ''''Person''''
              AND sAMAccountName=''''' + @sAMAccountName +'*'''''')'
/* END ADD/MODIFY YOUR LOGIC TO
EXTRACT LDAP DATA INTO TEMP TABLE HERE */

       EXEC (@tsql)
       INSERT INTO #ExtractHistory(ChrPos, ValidCharLen, SearchChar,
                      SearchCharLen, LastChar, LastCharPos, QryRowCount, QryError)
       SELECT @ChrPos, @ValidCharLen, @SearchChar,
                      @SearchCharLen, @LastChar, @LastCharPos, @@ROWCOUNT, @@ERROR
       GOTO CalcNextChar
END TRY
BEGIN CATCH
       INSERT INTO #ExtractHistory(ChrPos, ValidCharLen, SearchChar,
              SearchCharLen, LastChar, LastCharPos, QryRowCount, QryError)
       SELECT @ChrPos, @ValidCharLen, @SearchChar,
              @SearchCharLen, @LastChar, @LastCharPos, @@ROWCOUNT, @@ERROR
       IF CHARINDEX('Cannot fetch a row from OLE DB provider',ERROR_MESSAGE()) > 0
              BEGIN GOTO AddLetter END
       ELSE BEGIN GOTO ExitAll END
END CATCH

CalcNextChar:  --Calculate the next character to use
       SET @SearchCharLen = LEN(RTRIM(@SearchChar))
       SET @LastChar = SUBSTRING(RTRIM(@SearchChar),@SearchCharLen,1) 
       SET @LastCharPos = CHARINDEX(@LastChar,@ValidChars)
       IF @SearchChar = ''
              BEGIN GOTO ExitAll END
       --IF the last character is the last letter of available characters, then remove character
       ELSE IF @LastCharPos = @ValidCharLen
              BEGIN
              SET @SearchChar = SUBSTRING(@SearchChar,1,@SearchCharLen-1)
              GOTO CalcNextChar
              END
       --Get the next character
       ELSE
              BEGIN
              SET @SearchChar = SUBSTRING(@SearchChar,1,@SearchCharLen-1)+SUBSTRING(@ValidChars,@LastCharPos+1,1)
              GOTO IsEndTest
              END

AddLetter:  --ADD A LETTER TO THE STRING
       SET @SearchChar = RTRIM(@SearchChar) + SUBSTRING(@ValidChars,1,1)
       GOTO IsEndTest

IsEndTest:
       IF (@ChrPos > @ValidCharLen)
              BEGIN GOTO ExitAll   END
       ELSE
              BEGIN GOTO TryExtract END

ExitAll:
IF @TrackExtractMode = 1 BEGIN
SELECT * FROM #ExtractHistory
END
RETURN

--SELECT * FROM #ExtractHistory
--SELECT * FROM LDAP_ALL

--DELETE FROM LDAP_ALL

5 comments:

  1. This is absolutely fantastic!
    It works perfectly, just be careful to change the ldappath VARCHAR(50) 50 value as well.(twice found)

    Many thanks!

    ReplyDelete
    Replies
    1. I can see only one row, can you please help me with it??

      Delete
  2. Very good job! I think more people need to see this.

    ReplyDelete
  3. This is great. Do you mind sharing the code that you run periodically to extract the changes within the past 24 hours?

    ReplyDelete

SOLVED - How do I query Active Directory using ADSI / LDAP Linked Server with over 1,000 rows

I need to extract account information from Active Directory (AD)/LDAP into a SQL Server table and compare it to other tables for account ver...