Friday, October 27, 2023

SailPoint reverse tokenization challenge

SailPoint's original XML Exporter was released with the Standard Services Build (SSB) as Java source, so that users could customize it if they needed to.  I ran into several issues with the original code and published fixes for them on Compass.


This was in 2018.  After that I tackled reverse tokenization, because the XML Exporter used a simple text replace while the IIQDA used an XPath method.  I incorporated the XPath reverse tokenization into the XML Exporter Java source and deployed that to several clients.

SailPoint has since taken those concepts and built some of those features into their XML Exporter Plugin.  At the same time, I developed my own plugin from my original code and have expanded it.

But at one particular client, I realized that there are times when a simple replace reverse tokenization is still needed.  It is needed in two places: 1) When Java code is tokenized, the XPath cannot reach inside it, so a simple substitution is needed.  2) In the Profiles inside IT roles, there is no way to adequately describe every XML element's XPath in order to tokenize the entitlements inside those profiles.  This matters for roles that reference an LDAP domain, where you want the LDAP domain tokenized.  That was my first challenge.

To accomplish this, I reactivated the simple reverse tokenization of the original code, which I had literally just coded around, and added a second file called the simple reverse tokenization file.  When that file is read in, a simple replace operation is applied to all of the exported content.

One challenge with this is that the original code expected the tokens to be described like this:

%%TOKEN%%=Pattern

This is backwards: with the token as the key, multiple patterns cannot reverse tokenize to the same value.  So I added the ability to list the tokens the right way around, like this:

Pattern1=%%TOKEN%%
Pattern2=%%TOKEN%%

This allows both patterns to create the same token, for example:

dc=example,dc=com=%%AD_DOMAIN%%
dc=test,dc=local=%%AD_DOMAIN%%
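
One detail worth noting in the example above is that the pattern itself contains `=` characters, so a line cannot simply be split at the first `=`. The following is a minimal sketch of how both layouts could be told apart, assuming tokens always start and end with `%%`. The class name `TokenLineParser` and its `parse` method are hypothetical and are not part of the XML Exporter code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TokenLineParser {

  /**
   * Parses one line of a simple reverse tokenization file.
   * Returns an entry whose key is the search pattern and whose value is the token.
   * Hypothetical helper for illustration only.
   */
  public static Map.Entry<String, String> parse(String line) {
    String trimmed = line.trim();
    if (trimmed.startsWith("%%")) {
      // Legacy layout: %%TOKEN%%=Pattern
      // Find the "%%=" that closes the token, skipping the opening "%%".
      int close = trimmed.indexOf("%%=", 2);
      if (close < 0) {
        throw new IllegalArgumentException("Not a token line: " + line);
      }
      String token = trimmed.substring(0, close + 2);
      String pattern = trimmed.substring(close + 3);
      return new SimpleEntry<>(pattern, token);
    }
    // New layout: Pattern=%%TOKEN%%
    // The pattern may contain '=' itself, so split at the LAST "=%%".
    int sep = trimmed.lastIndexOf("=%%");
    if (sep < 0 || !trimmed.endsWith("%%") || trimmed.length() - sep <= 5) {
      throw new IllegalArgumentException("Not a token line: " + line);
    }
    String pattern = trimmed.substring(0, sep);
    String token = trimmed.substring(sep + 1);
    return new SimpleEntry<>(pattern, token);
  }
}
```

Collecting the parsed entries into a map keyed by pattern (for example a `LinkedHashMap`, to keep file order) lets any number of patterns share one token.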

The replacement itself is done by the following method, which tries every upper and lower case combination of the pattern until one matches:

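/**
 * Tries every upper/lower case combination of word against replaceIn and
 * replaces the first combination that matches with token.
 * Returns replaceIn unchanged if no combination matches.
 * Note: 1L << word.length() only works for words shorter than 63 characters.
 */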
private String combAllCasePatterns(String word, String token, String replaceIn) {
  log.debug("XML-400 Trying "+word+" on "+replaceIn);
  String replaceOut=replaceIn;
  word = word.toLowerCase();
  long combinations = 1L << word.length();
  for (long i = 0L; i < combinations; i++) {
    char[] result = word.toCharArray();
    for (int j = 0; j < word.length(); j++) {
      if (((i >> j) & 1) == 1 ) {
        result[j] = Character.toUpperCase(word.charAt(j));
      }
    }
    log.debug("XML-400 Trying combination "+i+" of "+combinations+" :"+new String(result));
    replaceOut=replaceIn.replace(new String(result), token);
    if(!replaceOut.equals(replaceIn)) return replaceOut;
  }
  return replaceOut;
}

Credit to the Code Review Stack Exchange post "Finding all upper/lower case combinations of a word" for the start of the comb method.  That code actually wasn't 100% correct, but I got it to work.

But then here is the real challenge: what if the data looks like this:

<String>CN=Employee,OU=Example Users,DC=example,DC=com</String>

String.replace is case-sensitive, so xml.replace("dc=example,dc=com","%%AD_DOMAIN%%") will not match the upper case DC= in this data.  There is no way to do a case-insensitive replace with it, unless you want to translate the search string to regex.
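
For comparison, the regex route mentioned above can be sketched as follows. This is an alternative shown only for illustration, and the class and method names are hypothetical. It relies on `Pattern.quote` to treat the search string as a literal and on `Matcher.quoteReplacement` so that any `$` or `\` in the token is not read as a group reference. Note that `Pattern.CASE_INSENSITIVE` on its own only folds case for US-ASCII characters. The rest of this post sticks with plain `String.replace`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveReplace {

  /**
   * Replaces every occurrence of search in text with token, ignoring ASCII case.
   * Hypothetical helper, not part of the XML Exporter code.
   */
  public static String replaceIgnoreCase(String text, String search, String token) {
    // Quote the search string so '=', ',' and '.' are matched literally.
    Pattern p = Pattern.compile(Pattern.quote(search), Pattern.CASE_INSENSITIVE);
    // Quote the replacement so the token is inserted exactly as written.
    return p.matcher(text).replaceAll(Matcher.quoteReplacement(token));
  }
}
```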

In order to tokenize any capitalization version of the key, you literally have to try every combination of upper and lower case letters.

Do you see the issue here?  The number of combinations doubles with every character in the search string, so a 20 character value takes over a million replace attempts (2^20 = 1,048,576).  There is another complication: the search string often contains non-alphabetic characters, which have no case.  The example is a 17 character string, but only 14 of the characters are alphabetic.  If you skip those 3 non-alphabetic characters, you reduce the work from 131,072 (2^17) to 16,384 (2^14) iterations.  Here is my logic to accomplish that:

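  /**
   * Same as the first version, but only the ASCII letters in wordIn are
   * varied; digits, punctuation and spaces keep their position unchanged.
   * Assumes wordIn is plain ASCII (one byte per character).
   * Requires: import java.nio.charset.StandardCharsets;
   */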
  private String combAllCasePatterns(String wordIn, String token, String replaceIn) {
    log.debug("XML-400 Trying "+wordIn+" on "+replaceIn);
    String replaceOut=replaceIn;
    String word=wordIn;
    int wordlen=word.length();
    log.debug("XML-401 word length is "+wordlen);
    byte[] wordchars=word.getBytes(StandardCharsets.UTF_8);
    byte[] packedchars=new byte[wordlen];
    boolean[] isalphachar=new boolean[wordlen];
    int packedlen=0;
    for(int ipack=0; ipack<wordlen; ++ipack) {
      byte chb=wordchars[ipack];
      if((chb>=65 && chb<=90) || (chb>=97 && chb<=122)) {
        packedchars[packedlen]=chb;
        isalphachar[ipack]=true;
        packedlen++;
      }
      else {
        isalphachar[ipack]=false;
      }
    }
    byte[] newpack=new byte[packedlen];
    for(int ipack=0; ipack<packedlen; ++ipack) {
      newpack[ipack]=packedchars[ipack];
    }
    word = new String(newpack, StandardCharsets.US_ASCII);
    log.debug("XML-402 word length after removing non-letters:"+packedlen);
    log.debug("XML-403 word after removing non-letters:"+word);
    word = word.toLowerCase();
    long combinations = 1L << word.length();
    for (long i = 0L; i < combinations; i++) {
      char[] result = word.toCharArray();
      for (int j = 0; j < word.length(); j++) {
        if (((i >> j) & 1) == 1 ) {
          result[j] = Character.toUpperCase(word.charAt(j));
        }
      }
      log.debug("XML-404 Trying combination "+i+" of "+combinations
        +" :"+new String(result));
      // Rebuild the word from the packed characters
      packedlen=0;
      for(int ipack=0; ipack<wordlen; ++ipack) {
        if(isalphachar[ipack]) {
          packedchars[ipack]=(byte)(result[packedlen]);
          packedlen++;
        }
        else {
          packedchars[ipack]=wordchars[ipack];
        }
      }
      log.debug("XML-405 Trying combination "+i+" of "+combinations
        +" :"+new String(packedchars,StandardCharsets.US_ASCII));
      replaceOut=replaceIn.replace(new String(packedchars,StandardCharsets.US_ASCII), token);
      // Stop on any replace
      if(!replaceOut.equals(replaceIn)) return replaceOut;
    }
    return replaceOut;
  }

This accomplishes the task.  Challenge solved.  One more thing: the case-insensitive replace is opt-in.  To trigger it, the user adds an extra % to the token.  I also caution users to keep the search string as short as possible and to apply it only to IT roles, or whichever particular objects need it, because otherwise the computation time can be excessive.
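
To judge whether a search string is short enough before opting in, the worst-case number of replace attempts can be estimated up front. The sketch below counts only ASCII letters, matching the letter test used in the method above. The class name `CaseCombinationCount` is hypothetical.

```java
public class CaseCombinationCount {

  /**
   * Returns the number of upper/lower case combinations for the ASCII
   * letters in word, i.e. 2 raised to the number of letters.
   * Hypothetical helper for estimating the worst-case iteration count.
   */
  public static long combinations(String word) {
    int letters = 0;
    for (int i = 0; i < word.length(); i++) {
      char c = word.charAt(i);
      if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
        letters++;
      }
    }
    if (letters > 62) {
      // 1L << 63 or more would overflow a signed long.
      throw new IllegalArgumentException("Too many letters to enumerate: " + letters);
    }
    return 1L << letters;
  }
}
```

For `dc=example,dc=com` this gives 16,384, in line with the figure above.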