ural 1517. Freedom of Choice (后缀数组,最长公共子串)

ural1517
题意:给出两个字符串,求最长的公共字串(要求出具体的字符串是什么)

思路:依然是后缀数组,在更新长度 的时候记录起始位置即可,1A。以及,发现多开了一个完全没有必要的数组w[],这次已删。

 

20160730update:模板已更正,lrj的模板的rk[i]为0 的时候会出现re的问题…已特判。

poj 2774 Long Long Message (最长公共字串,后缀数组模板题)

poj2774

题意:给出两个字符串,问最长的公共连续字串。

思路:后缀数组模板题。

具体可以参考两篇国集论文(09,04)
topcoder中的讲解
codechef上的讲解
还有一篇讲 dc3算法的论文:SuffixArrays_dc3
这里不谈具体的后缀数组的学习内容,说说大概的学习过程。

首先要理解后缀,后缀数组(sa[]),名次数组(rk[]),height数组,lcp 这些概念

先从定义入手,得到sa数组的n2logn的求法…

由于复杂度爆炸,所以有了两个算法来优化求sa的过程,一个是nlogn的倍增,还有一个是O(n)的dc3。。。

倍增的算法中用到了radix sort

上面这些,都是在说如何求sa,但是如果只有sa一个数组的话,就没有办法很好感受 后缀数组的power.

于是引入了height数组。

定义 height[i]=suffix(sa[i-1])和 suffix(sa[i])的最长公
共前缀,也就是排名相邻的两个后缀的最长公共前缀。那么对于 j 和 k,不妨设
rank[j]<rank[k],则有以下性质:
suffix(j) 和 suffix(k) 的 最 长 公 共 前 缀 为 height[rank[j]+1],
height[rank[j]+2], height[rank[j]+3], … ,height[rank[k]]中的最小值。

有很多问题都是基于height数组的,慢慢感受。

 

再说这道题:我们可以把两个字符串中间用一个特殊符号连接起来。

那么两个字符串的最长公共字串,就变成了求合并后的字符串的所有后缀的最长公共前缀。(原因是字符串的任何一个字串都可以看成是该字符串的某个后缀的前缀

那么容易知道,该最长公共前缀的长度一定是某个height值(原因是,height[i]表示的是排名相邻的两个后缀的最长公共前缀的长度,如果不相邻,那么取的是他们排名之间所有height的最小值,只会越来越小。

还需要注意的是,必须满足得到该height的两个后缀分别出现在原来的两个字符串中…

要怎么办到呢? 其实很容易,由于sa[i]数组存放的是排名第i的后缀是后缀几(定义从第x个字符开始的后缀就是后缀x)

设初始第一个字符串的长度为len1,那么如果是第一个字符串的后缀,一定有sa[i]<len1,如果是第二个字符串的后缀,就一定有sa[i]>len1  (sa[i]==len1的是插入的特殊符号开始的后缀)

 

还有一些细节可以参考代码注释

 

UD20160730:改正了lrj书中的错误。。对于rk[i]==0的情况进行了特判。。不然会re…

 

 

 

 

hdu 1280 前m大的数 (计数排序)

hdu1280

题意:给出n(3000)个数,两两求和,输出最大的m(5000)个和。

思路:由于数据有限,想到计数排序。。。以及,m个可能刚好某个数据没有全部输出,要在while里判断一下。。

。。。好好的人。。怎么开始刷水了。。。。。

其实是因为最近在看.suffix array…然后里面涉及到了radix sort..算法课的时候到是写过。。。不过快忘了orz。。所以打算找几道题目。。。? 然而这是计数排序不是基数排序啊摔/!

 

suffix array (转自 codechef)

原文链接:链接

讲了后缀数组的概念,然后从最暴力的O(n*n*logn )的复杂度(O(n)用来比较字符串,O(nlogn)是排序的复杂度)逐步优化,依据各个串之间的关系,大概讲了倍增算法,以及给出了一篇The Skew Algorithm 的论文。

文中实现的倍增算法的复杂度是O(Nlog2N)的。。是因为作者不会基数排序23333。

 

This text will focus on the construction of Suffix Arrays, it will aim to explain what they are and what they are used for and hopefully some examples will be provided (it will be mainly simple applications so that the concepts don’t get too attached to the theoretical explanation).

As usual, this follows my somewhat recent series of tutorials in order to make the reference post with links as complete as possible!

  • What is a Suffix Array?

In simple terms, a suffix array is just a sorted array of all the suffixes of a given string.

As a data structure, it is widely used in areas such as data compression, bioinformatics and, in general, in any area that deals with strings and string matching problems, so, as you can see, it is of great importance to know efficient algorithms to construct a suffix array for a given string.

Please note that on this context, the name suffix is the exact same thing as substring, as you can see from the wikipedia link provided.

A suffix array will contain integers that represent the starting indexes of the all the suffixes of a given string, after the aforementioned suffixes are sorted.

On some applications of suffix arrays, it is common to paddle the string with a special character (like #, @ or $) that is not present on the alphabet that is being used to represent the string and, as such, it’s considered to be smaller than all the other characters. (The reason why these special characters are used will hopefully be clearer ahead in this text)

And, as a picture it’s worth more than a thousand words, below is a small scheme which represents the several suffixes of a string (on the left) along with the suffix array for the same string (on the right). The original string is attcatg$.

The above picture describes what we want to do, and our goal with this text will be to explore different ways of doing this in the hope of obtaining a good solution.

We will enumerate some popular algorithms for this task and will actually implement some of them in C++ (as you will see, some of them are trivial to implement but can be too slow, while others have faster execution times at the cost of both implementation and memory complexity).

  • The naive algorithm

We shall begin our exploration of this very interesting topic by first studying the most naive algorithm available to solve our problem, which is also the most simple one to implement.

The key idea of the naive algorithm is using a good comparison-based sorting algorithm to sort all the suffixes of a given string in the fastest possible way. Quick-sort does this task very well.

However we should remind ourselves that we are sorting strings, so, either we use the overloaded < sign to serve as a “comparator” for strings (this is done internally in C++ for the string data type) or we write our own string comparison function, which is basically the same thing regarding time complexity, with the former alternative consuming us more time on the writing of code. As such, on my own implementation I chose to keep things simple and used the built-in sort() function applied to a vector of strings. As to compare two strings, we are forced to iterate over all its characters, the time complexity to compare strings is O(N), which means that:

On the naive approach, we are sorting N strings with an O(N log N) comparison based sorting algorithm. As comparing strings takes O(N) time, we can conclude that the time complexity of our naive approach is O(N2 log N)

After sorting all the strings, we need to be able to “retrieve” the original index that each string had initially so we can actually build the suffix array itself.

[Sidenote: As written on the image, the indexes just “come along for the ride”.

To do this, I simply used a map as an auxiliary data structure, such that the keys are the strings that will map to the values which are the original indexes the strings had on the original array. Now, retrieving these values is trivial.]

Below, you can find the code for the naive algorithm for constructing the Suffix Array of a given string entered by the user as input:

As you can see by the above code snippet, the implementation of the naive approach is pretty straightforward and very robust as little to virtually no space for errors is allowed if one uses built-in sorting functions.

However, such simplicity comes with an associated cost, and on this case, such cost is paid with a relatively high time complexity which is actually impractical for most problems. So, we need to tune up this approach a bit and attempt to devise a better algorithm.

This is what will be done on the next section.

  • A clever approach of building the Suffix Array of a given string

As noted above, Suffix Array construction is simple, but an efficient Suffix Array construction is hard.

However, after some thinking we can actually have a very defined idea of why we are performing so badly on such construction.

The reason why we are doing badly on the construction of the SA is because we are NOT EXPLOITING the fact that the strings we are sorting, are actually all part of the SAME original string, and not random, unrelated strings.

However, how can this observation help us?

This observation can help us greatly because now we can actually use tuples that contain only some characters of the string (which we will group in powers of two) such that we can sort the strings in a more ordered fashion by their first two characters, then we can improve on and sort them by their first four characters and so on, until we have reached a length such that we can be sure all the strings are themselves sorted.

With this observation at hand, we can actually cut down the execution time of our SA construction algorithm from O(N2 log N) to O(N log2 N).

Using the amazing work done by @gamabunta, I can provide his explanation of this approach, along with his pseudo-code and later improve a little bit upon it by actually providing an actual C++ implementation of this idea:

@gamabunta‘s work

Let us consider the original array or suffixes, sorted only according to the first 2 character. If the first 2 character is the same, we consider that the strings have the same sort index.

Now, we wish to use the above array, and sort the suffixes according to their first 4 characters. To achieve this, we can assign 2-tuples to each string. The first value in the 2-tuple is the sort-index of the respective suffix, from above. The second value in the 2-tuple is the sort-indexof the suffix that starts 2 positions later, again from above.

If the length of the suffix is less than 2 characters, then we can keep the second value in the 2-tuple as -1.

Now, we can call quick-sort and sort the suffixes according to their first 4 characters by using the 2-tuples we constructed above! The result would be:

Similarly constructing the 2-tuples and performing quick-sort again will give us suffixes sorted by their first 8 characters.

Thus, we can sort the suffixes by the following pseudo-code

The above algorithm will find the Suffix Array in O(N log2 N).

end of @gamabunta‘s work

Below you can find a C++ implementation of the above pseudo-code:

 

 

This concludes the explanation of a more efficient approach on building the suffix array for a given string. The runtime is, as said above, O(N log2N).

  • Constructing (and explaining) the LCP array

The LCP array (Longest Common Prefix) is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes between pairs of consecutive suffixes in the suffix array.

So, if one has built the Suffix Array, it’s relatively simple to actually build the LCP array.

In fact, using once again @gamabunta‘s amazing work, below there is the pseudo-code which allows one to efficiently find the LCP array:

We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two prefixes.

The LCP Array is the array of Longest Common Prefixes between the ith suffix and the (i-1)th suffix in the Suffix Array. The above algorithm needs to be called N times to build the LCP Array in a total of O(N log N) time.

  • Moving on from here

This post was actually the first long post I wrote about a subject which I’m not familiar with, AT ALL. This is always a risk I am also taking, but I tried to adhere only to the sub-topics I considered I mastered relatively well myself (at least, in theory, as I still don’t think I could implement this correctly on a live contest or even a pratice problem… But, as I said many times, I’m here to work as hard as I can to learn as much as I can!)

I hope that what I wrote is, at least, decent and I did it basically as a good way of gathering information which is very spread over many papers and websites online, so that when people read this post they will be able to grasp the ideas for the naive solution as well as for the improvement presented as a better solution.

There are many interesting linear algorithms to “attack” this problem, with one of the most famous being the Skew Algorithm, which is lovely described on the link I provide here.

Besides this, there are several other algorithms which are also linear that exploit the relationship between Suffix Trees and the Suffix Array and that use linear sorting algorithm like radix sort, but, which I sadly don’t yet understand which makes me unable to discuss them here.

However, I hope this little text does its job by at least gathering some useful information on a single post 🙂

I am also learning as I write these texts and this has helped me a lot on my evolution as a coder and I hope I can keep contributing to give all my best to this incredible cmmunity 😀

Best regards,

Bruno Oliveira

 

bzoj 1874: [BeiJing2009 WinterCamp]取石子游戏 (sg函数,要求输出第一步具体方案)

 

1874: [BeiJing2009 WinterCamp]取石子游戏

Time Limit: 5 Sec  Memory Limit: 162 MB
Submit: 726  Solved: 296
[Submit][Status][Discuss]

Description

小H和小Z正在玩一个取石子游戏。 取石子游戏的规则是这样的,每个人每次可以从一堆石子中取出若干个石子,每次取石子的个数有限制,谁不能取石子时就会输掉游戏。 小H先进行操作,他想问你他是否有必胜策略,如果有,第一步如何取石子。

Input

输入文件的第一行为石子的堆数N 接下来N行,每行一个数Ai,表示每堆石子的个数 接下来一行为每次取石子个数的种类数M 接下来M行,每行一个数Bi,表示每次可以取的石子个数,输入保证这M个数按照递增顺序排列。

Output

输出文件第一行为“YES”或者“NO”,表示小H是否有必胜策略。 若结果为“YES”,则第二行包含两个数,第一个数表示从哪堆石子取,第二个数表示取多少个石子,若有多种答案,取第一个数最小的答案,若仍有多种答案,取第二个数最小的答案。

Sample Input

4
7
6
9
3
2
1
2
 

Sample Output

YES
1 1
Hint
样例中共有四堆石子,石子个数分别为7、6、9、3,每人每次可以从任何一堆石子中取出1个或者2个石子,小H有必胜策略,事实上只要从第一堆石子中取一个石子即可。

数据规模和约定
数据编号 N范围 Ai范围 数据编号 N范围 Ai范围
1 N=2 Ai≤10 6 N≤10 Ai≤10
2 N=2 Ai≤1000 7 N≤10 Ai≤100
3 N=3 Ai≤100 8 N≤10 Ai≤1000
4 N≤10 Ai≤4 9 N≤10 Ai≤1000
5 N≤10 Ai≤7 10 N≤10 Ai≤1000
对于全部数据,M≤10,Bi≤10

HINT

Source

 

思路:有没有解直接sg函数搞。关键在于输出具体方案。

如果存在某个i,在第i堆取走了ok[j]个石头后,恰好使得sum的值变成0(也就是使得从N态到P态的值)

取这样的i和ok[j]中最小的(i是第一关键字,ok[j]是第二关键字)

需要注意的是,在第i堆取走ok[j]之后的sg值 应该为 sum^sg[a[i]]^sg[a[i]-ok[j]],而不是直接sum^sg[a[i]-ok[j]]

可以理解成先撤销了之前计算的a[i]的操作,换成qu

 

codeforces 429 B. Working out (dp)

cf429 b 题目链接
题意:

n*m个格子,每个格子有一个人value a[i][j]>0,连个人,一个从左上角到右下角,每次只能向下或者向右移动,一个从左下到右上,每次只能向上或者向右移动,现在要求两个人恰好相遇一次,相遇点的a不算数,问在满足这样的条件下使得两个人的a最大。。。(很坑的一点是。。这里相遇并不考虑时间。。就是说,不在同一时间都到达过某一格子,也认为相遇。所以我认为,题目含义更准确的说法是,路径只有一个交点)

思路:很巧妙。先预处理4个二维数组,分别表示点(i,j)到四个角落的最大值。(这个处理很容易,类似数字三角形)

然后枚举相遇的点,对于确定的相遇的点(x,y),我们可以认为是四个角落各连一条线到点(i,j)

由于只能相遇一次,所以连接方式只有两种情况。

 

 

hdu 2050 折线分割平面 (找规律,递推)

hdu 2050题目链接

题意:n条折线。。最多能把平面分成几部分。。
思路:联想到m条直线,最多能把平面分成m*(m+1)/2+1部分。。

画图发现。。。 f[2*n-1]==g[n]。。

 

 

 

hdu 2049 不容易系列之(4)——考新郎 (错排公式,注意精度)

hdu 2049 题目链接
题意:n个妹子和n个汉子对应。。然后让每个汉子取选一个妹子,不能重复,问恰好有m个汉子选错妹子的可能的方案数。

思路:从n个中选n-m个,然后做剩余m个错排即可。

答案就是c[n][n-m]*d[m]  c[]为组合数公式,d为错排公式。
然而wa到死。。。

因为我用了double….有毒。。。

double表示整数也是会丢失精度的!!!

double表示整数也是会丢失精度的!!!

double表示整数也是会丢失精度的!!!

double表示整数也是会丢失精度的!!!

double表示整数也是会丢失精度的!!!

Screenshot from 2016-07-27 19-49-40

我自杀去了,世界再见(x

 

 

 

 

hdu 2048 神、上帝以及老天爷 (错排公式)

hdu2048 题目链接

题意:n个人不放回的从一个有n个每个人对应id的卡片的盒子取一张卡片,取的正好和自己的对应就算中奖。求所有人都没有中奖的概率。

思路:错排。。。

复习了一下错排公式。。。d[n] = (n-1)*(d[n-1]+d[n-2])  (d[1]=0,d[2]=1)

然后求概率的时候。。惊讶得发现概率稳定在了36.79%(1/e)附近。。。

这是因为。。。错排还有一个公式:D(n) = n! [(-1)^2/2! + … + (-1)^(n-1)/(n-1)! + (-1)^n/n!].

求概率每次把n!除掉了。。剩下的。。其实就是e的泰勒展开,当x=-1时的值。

因为当n越大时。。这个概率越接近1/e

这道题里。。。在保留百分数的小数点两位的精度的条件下。。当n为7的时候。。答案就已经是36.79保持不变了。。。

 

 

 

 

hdu 2047 阿牛的EOF牛肉串 (递推)

hdu 2047 题目链接

 

题意:一个由’E’ ‘O’ ‘F’ 组成的长度为n的字符串。‘O’不能相邻。。问方案数。。

思路:递推。。。蒙对了(误

考虑第n位,如果为’E’或者‘F’,此时对n-1位没有限制,答案为f[n-1],所以 一共是2×f[n-1]

如果为’O’,此时第n-1位为’E’或者’F’,此时对n-2位没有限制,答案为f[n-2],一共是2×f[n-2]

因此 f[i] = 2*(f[i-1]+f[i-2])

 

hdu 2045 不容易系列之(3)—— LELE的RPG难题 (递推)

hdu2045 题目链接

 

题意:

一串 方格,每个格子可以涂三种颜色,要求相邻的格子颜色不能相同,首尾格子也不能相同。

思路:递推。没推出来23333 我好菜啊.jpg.

考虑有n个格子。。那么假设第n-1个格子的颜色是a,根据第一个格子和第n-1个格子颜色是否相同分为两种情况。

如果不相同,那么第n个格子不能和第1个格子颜色相同,也不能和第n-1个格子颜色相同,所以只有一种选择。

如果相同,那么第n个格子有两种选择。

 

 

hdu 2018 母牛的故事 (基础dp,记忆化搜索)

hdu2018题目链接

题意:第1年有1头奶牛,每年生一头奶牛,新生的奶牛从生下来的第四年(包括生下来那年)也开始每年一头奶牛。
问第n年有多少头奶牛。
思路:最容易想到的。。递推一下。。。dp[i] = dp[i-1] + dp[i-3] (注意初始化不是一个dp[1]=1,而是dp[1..4]=1..4)

递推代码:

 

也可以考虑记忆化嘛

理论上应该是所有的dp都可以写成记忆化。。。?

记忆化的好处是不用考虑什么边界的问题。。。坏处是。。可能爆栈。。。。。

 

记忆化代码:

 

hdu 2084 数塔 (基础dp)

hdu2084题目链接

题意:dp入门题。。。数字三角形。。

思路:

昨天看mit公开课。。。讲到dp的精髓是sub-problem+ reuse…

为什么自底向上呢。。。

初始化dp[n][i] = a[n][i]其实是在处理只有最后一行的子问题。。。

需要特别强调的是。。处于某个子问题的时候。。。其他部分就好像不存在一样。。。

每一个点只能向下或者向右下两条路可走。。。

那么对于这一点的最大值。。。一定是取后来可走的两点的最大值加上自身。。。

 

 

 

hdu 4283 You Are the One (区间dp)

hdu 4283题目链接

题意:有N个人按顺序排成一排上台表演,每个都有一个num[]值,若在他是第k个上场的人,则会有num[]*(k-1)的unhappiness。台下有一个黑屋(stack),对每一个人,可以选择让他先进屋子或者直接上台。现在让你找到一个最优方案使得所有人的unhappiness之和最小。

思路:

我想对了的:

无。状态设计就错了。。。转移方程也就不可能对。。。

错的一塌糊涂。。。嗯。。基础题。。完全不会2333

参考题解:参考博客1
参考博客2