When memory is burred by erosion of time...
1. To work around NP problem by chopping the solution sequence. As all certificate can be generated as a sequence of actions or states, the total solution set is factorial of the number of vertices in sequence which is non-polynomial. However, by someway, somehow, we know some positions are already fixed by a subset of vertices. This will help greatly reduce the size of solution set by only trying all permutations of subsequences between these critical positions. When the number of these critical positions increase to some extent, the non-polynomial problem reduced to polynomial problem.
There is always doubts for me in definition of "algorithm". In NP-complete problem, we try not only recognize the problem, but also have to solve the problem instance by a certain polynomial algorithm. Suppose we have a machine which can give a general solution by recognizing the problem pattern, then we don't need the algorithm, or in other words, the machine itself is the algorithm. The only problem is the pattern is too unspecific and the searching takes too long.
I take chem205 this term and it probably the most stupid course I have to take. But there is no choice for me as it is regarded as a deficiency for me. Deficiency? What should a computer science student should know about in sense of chemical reaction, assuming he won't be interested in biological application of computer science?
However, the first class turns out to be not so bad. Loyola is a good campus for study, even though you have to spend more than an hour to reach there. The teacher is fast-speaker whose style is the one I appreciate. In Concordia I did like some young, smart professors who are quick-responding, provided they are qualified by my standard. (I am joking.)
The point I want to make is that there is a knowledge gap between education systems of China and Canada. As I quoted, "knowledge is not understanding", the converse is also true here: "understanding is not knowledge". Even though I already have the big picture of this world in sense of atoms and molecules, still I have no bridge to translate them in correct English terms. (It is bad English expression. Should I say bridge two system by correct terms? ) People talk about common senses when they pronounce complicated chemical formulas. However, to an ear of non-English speaker, it takes more response time to map them to their counterparts. Maybe we can borrow the term "average seeking time" (AST) from database field to describe this phenomenon. The listening training in Concordia did have a great influence to me. Now I feel much comfortable when the instructor talks in a higher "bit-rate" (in unit of bps?). So, this course may not be useless as it is expected. At least it shrinks the gap of two education systems in terms of commonsense, or everyday chemical which is a secret name for chem205.
Experience is priceless since your mind is established through continual interaction towards enclosing environment. You can even say without experience, you are dumb. The more experience you have, the more complicated you may become. Here the experience is in a sense of new challenging one provide you can survive long enough to make your mind to some kind of perfect, or uselessly slow in some cases like windows XP with service pack 2. :)
Claim: Every well-ordered set can be mapped to natural number in a isomorphic manner.
Proof: Well-ordered set must have a least element m and let it be associated with 1. Since all subset of well-ordered set must have a least element. We can extract the least element m and get a subset which still has a least element. Make this least element associated with 2. Recursively (how do we define recursively?) extract the least element from the current subset and create a new subset and assign the least element to next natural number. Thus we establish the isomorphism between a well-ordered set with natural number.
Claim: A set with equivalent relation keeps its equivalent relation if new element is introduced by equivalent to any one of its element.
Proof: x in A which is a set with equivalent relation. y not in A and x '=' y which means xRy <==>yRx, xRx and yRy, xRa and yRx ==> yRa. Then for all a in A, x '=' a. Since x '=' y, by transitivity ...
(Actually it is an interrupted idea and what I can recall later is that I found I was late for my class. Then I
rushed out of door with no idea left. What a pity.)
(Now when I read these almost meaningless stuff again, I tried to retrieve what I want to do before. It seems it is an idea that a certain "equivalent relation" is like a bridge to connect two "equivalent set" when we can find this "relation". i.e. if we know set A,B are both equivalent relation with some relation R, and we want to "relate" these two sets by "equal" or whatever. Then we can pick up one element from each set and see if equivalent relation R can be applied to these two elements.
Say x in A, y in B. By definition for all a in A, a)xRa, aRx; 2)aRa,xRx; 3) for all b(xRb ==> aRb) because aRx, similarly for all b(aRb ==> xRb) because xRa;
Similarly we can get the three properties of "y" for all b in B.
If we can find the same relation R is also applicable to x and y, then we can claim this "R" is also applicable to set A and B which means for all a in A and for all b in B, R is applicable to a and b.
Therefore I means R is like a bridge between two sets which allow us to "compare" two sets.
These above idea is what I can imagine in this meaningless summer vacation in China.)
The pattern analysis:
This is always lingering in my mind since I studied discrete mathematics. Originally I had an impression that there are total 16 different relation matrix between two elements. Now I realized that I am wrong. It is much simpler and interesting.
Assume we are just recovering from blackout and only understand this outside world by observing and recognizing particular some elements, say object A and B. Their sequence of appearance makes no sense to our dreamy mind since we only open our eyes occasionally at glance of the world. Let's use 1 to represent that we observe the existence of an object and 0 for fail to discover it at a certain instance.
Then all the possible choices of observing element A,B is:
A B
0 0 -------- denoted as 0 (or !A, !B)
1 0 -------- denoted as 1 (or A, !B)
0 1 -------- denoted as 2 (or !A, B)
1 1 -------- denoted as 3 (or A, B)
i) Observe: !A,!B is an trivial case and we can always assume it exists. (Imagine that something never happened is like a dream and I am not living in a dream world. )
ii) Observe: Our conclusion regarding relationship between A and B is possibly changing along the accumulation of our observation. The more we observe, the more accurate, the more reliable our conclusion becomes.
iii) Observe: Every time we open our eyes, we only recognize A or no A, recognize B or no B. There is no further information such as which comes, which leaves. Or in one word, we are an observer of a "state" machine.
iv) Observe: For "single-pattern" (this is what I call them.), there are three patterns and we always assume the trivial case "!A,!B" is coming with them:
a) (A,!B) + (!A,!B) : This pattern is translated as "B =/=> A" or in English "it is not true that B implies A." Because A appears and doesn't appear with absence of B.
b) (!A, B) + (!A,!B) : This is symmetric of case a). Similarly, "A=/=> B" or "A doesn't imply B".
c) (A,B) + (!A,!B) : This is interpreted as "A=B" as A appears when B appears and A doesn't appear when B doesn't.
v) Observe: There are also three case for "double-pattern".
a)
...
(The following is added long after the above is written and they are all seemingly strange to me. I know as much as every reader does when reading the above.
I tried to retrieve what is lingering in my mind in this fruitless summer in China and this is what I can do.
...
v) Observe: Here comes logic deduction from pattern.
a) (A,!B)+(A,B)+(!A,!B) : This means A ==> B because there is no case such that B exists and A doesn't exist.
b) (!A,B)+(A,B)+(!A,!B) : Similarly this means B ==> A because whenever A appears, B must exist. But not visa versa. Or in other words, B is not depended on A and A seems to be depended on B.
c) (!A,B)+(A,!B)+(!A,!B) : This means that "A=/=B" which can be interpreted as "A appears whenever B doesn't appear" and "B appears whenever A doesn't appear". (Here I am in a little puzzle since we can never eliminate the case of (!A,!B) as we don't know if this case happens or not.)
vi) The last case is meaningless since we can call it "chaoes" which gives us no clue of what is going on.
(A,B)+(!A,B)+(A,!B)+(!A,!B) : nothing special between A and B or there is no particular interesting relation between A and B.
Does this above make any sense? I doubt it. It seems that I want to deduct "logic" from "pattern". Or logic and pattern are originally same??? )
#include <stdlib.h>
#include <stdio.h>
class Base
{
public:
	virtual void doit(int num ){ printf("base num=%d\n", num);}
};
typedef void (Base::*MemFuncType)(int);
//#define Call_Member(object, memfunc) (object.*(memfunc))
class Derived: public Base
{
public:
	virtual void doit(int num){ printf("derived num=%d\n", num);}
};
void myTest(Base* pBase, MemFuncType memFunc, int num)
{
	//Call_Member(*pBase, memFunc)(5);
	(pBase->*memFunc)(num);
}
int main()
{
	//the member function pointer
	MemFuncType pFunc;
	Base* pBase= new Base;
	Derived* pDerived=new Derived;
	//we make the member function pointer pointing to Base class method
	pFunc=pBase->doit;
	//this is predictable
	printf("this is predictable: base class with its method pointer\n");
	myTest(pBase, pFunc, 3);
	//how about this? what do you expect a derived class with base method pointer
	printf("how about this? what do you expect a derived ");
	printf("class with base method pointer\n");
	myTest(pDerived, pFunc, 4);
	//typedef void (Derived::*DerivedMemFuncType)(int);
	printf("\nNow let's try to make method pointer pointing a derived class method:\n");
	pFunc=(MemFuncType)(pDerived->doit);
	printf("what do you think? a base class pointer with its derived class method ");
	printf("pointer\n");
	myTest(pBase, pFunc, 5);
	printf("And a derived class pointer with its derived class method pointer\n");
	myTest(pDerived, pFunc,6);
	return 0;
}
The running result:
this is predictable: base class with its method pointer
base num=3
how about this? what do you expect a derived class with base method pointer
derived num=4
Now let's try to make method pointer pointing a derived class method:
what do you think? a base class pointer with its derived class method pointer
base num=5
And a derived class pointer with its derived class method pointer
derived num=6
Press any key to continue
A function pointer or a struct?
It is extremely surprising to me when I realized there is an alternative way to instantiate a template "set" class with a struct instead of a function pointer. See below the declaration of template class set from MSDN:
template<class Key, class Pred = less<Key>, class A = allocator<Key> >
    class set {
public:
...
explicit
set(const Pred& comp = Pred(), const A& al = A());
   
set(const set& x);
   
set(const value_type *first, const value_type *last,
        const Pred& comp = Pred(), const A& 
al = A());
...
};
How to use this class? The most straight-forward way 
would be like to define a function pointer type and then declare a function 
pointer which points to an implemented function and pass this function pointer 
variable as parameter for "constructor" to instantiate the "set" variable after 
you pass the function pointer type as parameter of type. See the following 
common declaration part and note both the struct declaration and function 
pointer declaration:
class Int
{
	//the function usually is a friend function to access member variable
	friend bool IntComp(const Int& first, const Int& second){return first.i<second.i;}
	friend ostream& operator<<(ostream& os, Int& it){ os<<it.i; return os;}
	friend struct IntCompStruct;
protected:
	int i;
public:
	Int(int myInt):i(myInt){;}
	//bool operator<(Int other){return i<other.i;}
};
//this is the unusual way by declaring a struct which overloads the operator()
struct IntCompStruct
{
	bool operator()(const Int& first, const Int& second) const
	{
		return first.i<second.i;
	}
};
//This is the usual way by define the function pointer type and declaring the pointer which points to implemented function.
typedef bool (*IntCompFuncType)(const Int& , const Int&);
IntCompFuncType pIntComp=IntComp;  
//the function pointer is pointing the real implemented function
 
A) This is the most straight-forward usage of function 
pointer. Please note you need pass the "function pointer type" as parameter for 
type declaration and also the real parameter for "constructor":
set<Int, IntCompFuncType> intSet(pIntComp);
int main()
{	
	set<Int, IntCompFuncType> intSet(pIntComp);
	for (int i=0; i<20; i++)
	{
		Int myInt((int)(rand()%30));
		intSet.insert(myInt);
	}
	cout<<intSet.size()<<endl;
	
	set<Int, IntCompFuncType>::iterator it;
	for(it=intSet.begin(); it!=intSet.end(); it++)
	{
		cout<<*it<<endl;
	}
	return 0;
}
B) The second way for usage is rather surprising for me. Instead of declaring function pointer, you declare a struct which overloads 
the operator(). This means the compiler will automatically try to plug in two variables of type of "Int" to get the default "comparison"
function. And amazingly you DON'T have to pass parameter to constructor. 
(Originally I do like this:
...
typedef struct IntCompStruct{...}intCompStruct; //declare type and declare a variable
...
int main()
{
set<Int, IntCompStruct> intSet(intCompStruct); //This is wrong! As compile thought it is a function declaration!!!
)
The correct way is like following:
int main()
{
	set<Int, IntCompStruct> intSet;
	for (int i=0; i<20; i++)
	{
		Int myInt((int)(rand()%30));
		intSet.insert(myInt);
	}
	cout<<intSet.size()<<endl;
	set<Int, IntCompStruct>::iterator it;
	for(it=intSet.begin(); it!=intSet.end(); it++)
	{
		cout<<*it<<endl;
	}
	return 0;
}
Both way give you the same running result:
14
1
2
4
5
6
10
11
14
17
18
22
25
27
29
Press any key to continue
The following example is from chapter one of <<C++ template>>. I found it particular interesting because it makes me realize that "typedef" has some difference than straight-forward replacement. See,
typedef char* Chars;
typedef const Chars CChars; //constant pointer to char*
But see the following:
typedef const char* CChars; //pointer to constant chars.
And it is equivalent to: typedef char* const CChars;
(After several months later when I read the above words, it takes me quite a time to get the meaning. You see, out of
context, out of mind. The point here is that there is so-called "associativity law" involved here and this is usually
predefined in a language which cannot be overloaded by method of operator-overloading or type-definition. Does it imply
something? It may ring me a bell that there are some law more fundamental beyond common programming language.)
A riddle
If you smile, she smiles; If you are sad, she is sad.
It is a mirror.
It is an isomorphic mapping if vice versa.
You have some belief and I have some belief, then how can I convince you or how can you let me believe in what you believe.
The simple common-sense is a proof. But on what basis? Suppose we have no common concept at all, where does the proof
starts. Should I use a kind of linear transformation to transform all your idiomatic concept into my domain? Do we trust
this transformation at the first place? Or do you even understand what I am talking to you? Are we living in two different
galaxies?
AI?
The day before yesterday, when I waited for Metro in Guy-Concordia, it suddenly puzzled me with a basic question for AI. Are we supposed to do this research? Or in other words, is it the highest priority for AI developer to develop a system which even surpasses human brain so much? For example, the zebra puzzle is usually beyond human brain for searching. But does this fact denies our status as a member of human society? Am I supposed to have an IQ as high as Einstein?
I have the basic learning ability, namely induction and deduction, why don't we concentrate on this? Of course, I know the arguments even before they are spoken. Without thoroughly understanding the basic searching technique, all is empty talk. But I still want to argue that before robots are built at least we need to have a visualize concept of what kind of robots we are building. Is it a common labour with average intelligence or a born Nobel prize winner? The second choice seems to be more important or more attractive to computer science since we have more than too many first choices. This planet is full of human beings. However, can we jump to the second without reaching first? I highly suspect it. Then why don't we concentrate on anglicizing morons, idiots, jerks instead of elites? Maybe I can voluntarily become the first sample for research.
Sleeping Talking!!!
If a problem is proved to be NP-complete then what else can you do for it? Approximate algorithms? Or so-called genetic algorithms which seems to me like rational randomization algorithms? All this is true provided we are talking about the same dimension. What if we can transform the problem to a high order dimension. I know this sounds perfectly like nonsense wild illusion of science fiction or insane garbage. However, don't we observe the similar thing happened before? For example, how do we calculate the time complexity of an algorithm for finding all factors of a NUMBER? Do we regard the N = NUMBER? I have been told from time to time that the N should be the size of length of encoding of the problem. So, actually this is a game of how we encode the problem. We are lucky that we find a clever encoding to represent number such that the encoding is logarithm of its encoding length after millions of evolution. Is this a kind of high-order function for knowledge representation? What if in distant future, all common people feel comfortable in communicating with matrix or state machine patterns? What if our computer is designed to calculate high-order logic than using this rather uncivilized boolean representation of logic? Just imagine number encoding is a typical knowledge compression or efficient representation? Can't we find a more efficient way to represent logical equivalence instead of A<=>(BvC) ^ A<=> !(BvC) which leads to inconsistence of CNF when number of variable is three?
(When I had my hair-cut, my brain got clearer and I realized that I am dreaming when I wrote above because it is exact the
opposite! Encoding can never reduce the time complexity!)
my understanding of neural network
using vector represent objects already implicitly declare all properties are 
equally weighted.
If we have a set and we are interested in the binary relation of all binary 
relations, for example we 
want the positive difference of all possible pairs in ascending order, and the 
most compact mannar. In
other words, the Golocm ruler problem. If we want all binary relation in a 
consecutive mannar, or Hamilton
path. If we want find the smallest of all possible Hamilton path, we are asking 
travel salesman problem.
All these are related with binary relation of binary relations.
How to fast compare a set of elements?
The simplest situation is well order set because we can use natural number to 
quantify them. Or even we use
real number to measure, still we can convert them into natural number mapping 
which is always the simplest
representation of relations, or degree of binary relations.
That's why in neural network they use weight to represent each connection, by 
using real number of weight
we actually express the binary relation. This is the empirical of the 
algorithms. And what's more, the 
network topology is a high order relation representation among modules. While 
weight represents the relation
within module. Still the ...
What is neural network:
1. property weight within one object are represented with real number
2. possible binary relations among objects are represented by network which is 
essentially using graph.
3. using feedback to adjust property weight to mimic learning, here the property 
weight must be viewed 
recursively where connections between models can be viewed as internal weight 
within a bigger model.
Therefore algorithm is universal such that everything is within ONE GENERAL 
MODEL. So, the model is
defined recursively.
After I have this idea established, I feel I can relax myself for a lunch 
because this is the biggest 
work today even it is through half hour reading.
the way of all ways, the name of all names
generalization function:
1. we need a powerful recognition function:
recognize(X) where X is anything and this "recognize" function is so power that 
it can even partially 
recognize in which case it also gives the difference, it can also compare.
2. then do we need anything else? an object must be recognized, a pattern must 
be recognized, a decision-
making algorithm is just based on this recognizition. for example, in supervised 
learning a correct example
is given and when we are given a new case to study, we must recognize or 
partially-recognize to see the 
invariant of data format in both "premise" and "result". 
When I say "partially-recognize", it means the premise is not singleton or 
terminal or simple and a more
refined recognizition process must be done on basis of decomposition of object.
A typical decision-tree algorithms is one simple solution. A neural-network 
model is also applicable in
the sense that all possible binary relation degree among elements in premise can 
be well-formalized.
(define "all possible binary relation degree" as a set of real numbers mapped to 
all possible binary relations
of a set after all binary relation has been measured with a common base degree.)
(define process "well-formalized": a set is establised with a certain 
well-formed relation.)
About "inductive bias" problem, I have a simple solution based on Darwinian 
theory: those fitted survive.
In computer science implementation: hyposthesis and algorithms must be able to 
be generated automatically.
No matter by permutation or mutation or randomization, something must be 
generated and tested. And because
of resource limit in computer system, something must be forgotten and all 
program pieces are competing to
survive. (Or do they need to? I mean the selection system forces the survivals 
are stronger which is measured
by degree of usage. i.e. at top of priority queue. If we can mimic OS 
architechture to "cache" our resources
we can mimic long-term memory and short-term memory.)
All programs are like virus who are competing for survival, or pretending to 
competing because only the 
result is like competition. It is the selection mechenism which makes them look 
like competing.
The more recognized, the more used, the higher priority it is given. Those fails 
to be used for long time,
gradually forgotten from system.
The ability of forgetting is essential in evolution of program.
The ability of generating algorithms is essential to all other tasks.
The ability of recognizition is essential among all abilities.
Which one should we start on?
 
Maybe I can use it in my future "go" game which is always in my mind
Alex,
...
Integrating All Together
Expert system is just the memory part of AI. It doesn't necessarily mean a dead end of AI development because in my mind a
real AI system is a super combination of all AI research areas: using neural network for architecture, using genetic
algorithms to evolve all algorithms (maybe including itself), using strong method or expert system as memory, using
weak method as deduction system, and combine data-driven approach with expert system for induction to generate hypothesis,
using both supervised and unsupervised training to refine its learning process (it should combine with hypothesis-
generation-system), ...
all in all, everything must be combined and only by that we may replicate a real human. However, a fundamental question is
Do we really need to replicate human when we already have too many of them? In other words, a real human has a lot of parts
we don't want our AI to have, like laziness, error-prone in deduction .... Then why should we replicate a 100% human? What
we really appreciate for AI is its part of property of machine which is hard for human to be. Therefore we might want to
lower the weight which represents more human like "neural network, genetic algorithms" etc. (This is just an example.)
Anyway it must be very interesting to integrate all system together. Maybe I can do this because this doesn't need too much
intelligence. :)
In the game of "go"
There are two themes in system: experience and reasoning. The first needs little explanations and will be applied when no
explicit or satisfactory reasoning can be given, or a certain special situations are recognized. The second is purely
deductions from rule-based system or data-driven pattern discover system. To modify the first is easy since they are data
feedback. For the second it is a bit difficult to modify its algorithms since this may require some fundamental change in
axiom system and we need to argue with machine. :)
In the game of "go", there is a good way for learning. It is called replay when two player replay their steps and explain,
query each step. Such a training mechanism is essential in our system.
Induction <==unification classification===> Deduction
These two operation maybe represented in a single operation with its inverse operation. i.e. When we represent concept of
ball, we may use generalized formula shape(X, round)^color(X, Y) ^ size(X, Z), then later deduction is just unification
process by replacing variable with ground symbol. And in contrast, the induction is to classify clustered data to possible
variable. i.e. a learning data set is like following:
1. shape(obj1, round)^color(obj1, red) ^ size(obj1, big)
2. shape(obj2, round)^color(obj2, blue) ^ size(obj2, small)
3. shape(obj3, round)^color(obj3, red) ^ size(obj3, middle)
...
The process to discover the invariance among dataset is induction when dataset is already abstracted at level of symbols.
And only by abstraction dataset can we exploit the algorithms available for us.
I have a hutch that these two operations can be implemented like function and inverse function. A hash table is a typical
example.
In many occasions, unification is called specialization while classification is called generalization.
optimistic assumption and pessimistic assumption
It is said that in Prolog anything that is not proved to be true is assumed to be false. While in human world anything that
is not proved to be false is assumed to be true. This seems to be pessimistic view and optimistic view about world which
can be used to estimate both upper bound and lower bound of a changing, uncertain world when incomplete information is
supplied. If in game "go", we can use both view to estimate our human opponent and get an upper bound and lower bound of
estimation, it can be used in our heuristic selection like alpha-beta-prune.
Sometimes philosophy can be biggest heuristic in all scientific researches.
Two types of learning
In deductive learning, nothing is really new except that you need to discover where it is hidden. But in inductive learning, it is possible that you may never discover it because they may never exist. And what you discover are all hypothesis which need to be verified by testing.
How to learn
We try to discover a number x by discovering what it is bigger than and what it is smaller than: i.e. x>=a and x<=b. Then
we declare that x is in range of [a ,b]. If we are lucky or we are doing very well, we find a=b and we can declare x=a=b.
That is to say, we discover an object by discovering its upper-bound and lower-bound. Similarly we define an object by
defining what an object is and what an object is not which corresponds to the outer space and inner space of the concept
of the object.
Then when...
Concept clustering or classification is the first step of recognition. Without this ability of comparison and contrast, we
can never be promoted to the stage of recognition.
Frame Representation vs. Semantic Network Representation
In textbook, it lists some problems of Frame representation such as representational inadequacy, lack of negation, disjunction, quantification etc. However, frame is more or less a sub-dialect of Object-Oriented design for representation of world which is more and more popular since it was born. So, the obvious conclusion is that the problems above is not so serious. But why?
My idea is that one is optimistic view and the other is pessimistic view about world. In frame or OO we assume everything is true unless it is explicitly claimed to be false. Then why do we bother to have a disjunction if we assume all are always true? As for negation, is it really necessary if we don't have disjunction since if we declare something then we already implicitly declare its negation form and we don't need it. This representation is an optimistic view of world. Just like we assume everyone is innocent before he is proved otherwise. On the contrary, semantic network takes a pessimistic view of world or sceptical view of world which always assume nothing is true unless it is proved to be true.
In my stereotypical view, one is the upper bound, the other is the lower bound of representation.
What to represent?
This is the most fundamental question in knowledge representation. What on earth are we going to represent? Concept or relation between concept? Which is more important? Or what is more useful?
There are two views about world. One is static view and the other is dynamic view. In traditional knowledge representation, it emphasis more on the static view. Therefore they have the so-called semantic network, conceptual graph etc. They are pretty good when you use microscope to study each concepts casually in a sunshine afternoon before you go off nap. So what? Do we represent knowledge for just representing them as they are? We are using representation to do our deduction job and in a changing world this network doesn't work properly because it doesn't grasp the nature of the world. The nature of world is the binary relationship between two objects or concepts. No matter how you view the single object or concept, you are required to get a big picture of the movement of all objects in a system (closed world at some time point). That is why relational database can satisfy most of needs of industry because it grasp the relation between entities.
So, the dynamic view is at an upper wind against static view.
Backtrack Algorithms
In the pseudo code of algorithms, the most important part is the "while loop" for backtrack and it is the first time for me
to notice it. It is like an pushdown automaton to check the top of stack to make sure if the parent of node or the sibling
node is encountered. (Indeed it is a pushdown automaton.) This is how backtrack is done.
Searching Tree
We deliberately want to change a general search graph into a tree to avoid looping while in most cases all possible binary
relations consists a general graph. And the reason I don't feel happy about the algorithm in textbook is that it seems to
me that there is little difference between backtrack and depth-first-search. The reason we use backtrack rather than DFS
maybe the case when we are interested more in the path when goal state is reached. In purely DFS, path is not maintained
by the algorithm explicitly and it is the responsibility of programmer who has to use certain data structure to bookkeep
the path. Then why should you classify them into two classifications to confuse readers? Unless the author can give a
reason other than above I will not feel happy. I mean "state-searching" algorithm and "path-searching" algorithm.
A Reminder of a Lesson I learned in Lecture
When this question is asked, I take it for granted that there is no difference. However, it does prove that most smart
classmates in my class are smarter than me. It is like this: To identify if you are descendant of an ancient famous person,
say George Washington, which direction of searching would you use? From you and up to all your ancestors, or from George
Washington down to all his descendants until now? At first glance, I think there is no difference. Then the analysis of
professor and other classmates teaches me a lesson. From bottom-up, you always have a branching factor of two, your
parents. And from top-down, you may have a branching factor bigger than two. Taking the fact into consideration that people
have more children than today, we know probably bottom-up is a wise choice. Just remember this lesson.
Best-First-Search
In the explanation of algorithm, the author emphasis that each state must maintain extra information to achieve two thing:
1. The information of its ancestor so that when goal state is reached the solution path can be established.
2. The information of its ancestor so that the length of path can be compared when it is reached from different path.
And this is exact the problem I encountered during my 15 puzzle implementation. First I use parent pointer to maintain
parent of each state. Later I have the concern that STL vectors, sets, lists are copied by value and some of them maintain
internal sorted by STL. Therefore the pointer might be invalid after each expansion of structure. So, I have to store path
information in each state instead of storing partial path in each state. And anther headache issue is that the sorting must
be done whenever a new state is added. Even though I take advantage that "set" in STL is always sorted internally, still in
15 puzzle, most children generated are new states and the list of candidate states becomes so huge that the server
"alamanni" is seriously affected and my account was suspended. This is the story of 15 puzzle.
The implementation of Best-First-Search has some details to be considered.
Neuron Network
After lecture I talked with Alejandro for a while about his interesting question: How do we adjust threshold in Perceptron?
And I have an idea that we don't even have to design a perceptron with threshold because we can use a high-layer perceptron
to mimic the threshold of current perceptron by adjust the weight between it and high-layer perceptron. It is like using a
high-order function to manipulate some operations in low-order function. He also mentioned that he uses MatLab for his
pattern-recognition course and he doesn't found the way how to adjust threshold. So, the question arises from here.
During the lecture I almost fell in asleep not only because I don't like the subject but I also only slept for five hours
last night. However, I have some idea about this neural network. It is like this: I am always thinking about a possible
universal, basic, fit-for-all algorithms which can generate all other algorithms and evolutes, adapts to environment. Now I
have a slightest idea of what it looks like. It is just a pattern which is a mirror of data. The algorithms is represented
by input data because only by representing algorithm with input data can you achieve the requirement. But the question is
how? And in neural network it is nothing but logic pattern or simply pattern because logic itself is just another
representation of pattern. My algorithm is as simple as enough so that it can cope everything and it is selected by
environment like Darwin's law. You just adjust yourself to be like a mirror to reflect the outside world pattern. This is
the big idea of algorithm of algorithm.
Another big idea is about probability which I always have difficulty to grasp its nature. Now I change my mind to the fact
that a probability is simply a measurement of binary relation between entity. As long as you can get a mapping for the
relation you grasp the probability. And the format of real number of shape of probability is a disguise of its discrete
pattern and can be uncovered by mapping real number to integer.
As for time, can we imagine that it is another universal measurement for well-order-set because we just need such a
measurement to establish such a well ordered set in our world. One entity may not have shape or size, but it always has
time for mark its order. Naturally it leads another question, if we are measure time in different time coordinate, we may
not have a well ordered set unless the time in different coordinates are transformed.
What is a model?
At beginning, I have a wrong impression that a model is a tautology like p(X) v !p(X). Then I realized that a model is an
interpretation which needs not to be a tautology. For example, forall X [p(X) -> q(X)] and if the interpretation is like
following then the interpretation is a model:
let p(X) = {X<3 where X is an integer}; q(X)={ X<5 where X is an integer}
However, what is a tautology? It seems that it is a super model because no matter what interpretation you have, the
tautology is always able to satisfy all variable assignment. For example, forall X [p(X) v !p(X)].
And according to textbook, this expression is also called valid since it satisfy all possible interpretation.
Inference Rule
I don't like the way in text book about completeness of inference rule. It is said that an inference rule is complete if
it can produce every possible logically followed expression from original set of logical expressions. Then later it
introduce another concept of strategy which is said to combine with inference rule to make it complete. Then which one is
complete? Inference rule or combination of inference rule and strategy? And to make things worse, it also introduces a
concept of algorithms. So, should we regard algorithms as an alias of strategy? Certainly not because strategy is a even
high level scheme than algorithm.
philosophy is the biggest heuristic of computer science
In lecture of neural network, professor shows some idea about how neural network feed forward signal and back propagate
error. This is a quite sophisticate algorithm and if this is true in brain then there may be some, quote, "genius hardware"
in brain.
However, do I believe there is some built-in algorithm in human? Do I believe there is born genius who has unique brain
structure? Yes and no.
The answer is yes because people look like different and personalities are different. The answer is no because I don't
believe there is built-in algorithm to some extent. Let us imagine the brain is constructed with countless perceptrons
which are in a chaos. By chaos I mean these perceptrons have random weight, random number of levels, random so-called bias
which is just a default initial value. Then we can claim that some of these random combinations will be a match with some
logical patterns. And these are just the states when a child was born. After he/she starts contacting outside world, the
brain is automatically "searching" among those random perceptron combinations for fittest match to "reflect" image of
outside stimulus. And this search is very efficient because you are NOT constructing patterns by "error-and-trial" model,
instead the models are already constructed before searching starts. However, when matched model is found, the feedback
effect will reinforce it by frequent usage. Note that this is exactly like random algorithm. Computer scientists try very
hard to find sophisticate algorithms and just wonder how human can do it by intuition in brain. It is just done by random
algorithm which can work in very short time provided the number of perceptrons are large enough. Imagine that you have
countless perceptron combinations and you want to find one pattern to fit your input-output. Feed all of them
simultaneously and just check the output. The catch is still using space for time to reduce computation complexity.
If this wild guessing is true about our brain, then we may claim that genius is possible because there is some random
chance that someone's random perceptron combinations are very matching most of our scientific logic which means that most
of his brain can be used. While for some other people, maybe most of their perceptron combinations are like contradictions,
invalid, inconsistency etc. However, please note I don't introduce any difference in algorithms between genius and common
people because algorithms is like hard coding and God dislike hard coding. We only need a simple matching mechanism which
can be implemented at very low level of "neural reflection".
This is the definition I found in internet:
(one of the major branches of philosophy, also known as philosophy of knowledge. It concerns the forms, nature, preconditions, sources, types and limits of knowledge.)
And the following is some interesting definition of knowledge I found in internet:
(The objects, concepts and relationships that are assumed to exist in some area of interest. A collection of knowledge, represented using some knowledge representation language is known as a knowledge base and a program for extending and/or querying a knowledge base is a knowledge-based system.
Knowledge differs from data or information in that new knowledge may be created from existing knowledge using logical inference. If information is truthful data plus meaning then knowledge is information plus justification/explanation.)
The above statements reminds me a famous quote in my textbook of comp445 "computer networking":
(Data is not information; Information is not knowledge; Knowledge is not understanding; Understanding is not wisdom.)
(During the process of searching above quote in my textbook, I found another quote of knowledge: Knowledge is of two kinds.
We know a subject ourselves, or we know where we can find information upon it.
And is it beautiful that in our system we either maintain the most-frequently-used algorithm or data in memory and also
a pointer to a table in storage disk where we store all our known algorithms and data, a database. And I wouldn't feel
surprised that current or future AI systems use an operating-system like system because an AI system is essentially a kind
of operating system.)
All of my dictionaries fail to show this word and I have to search it by internet and find some philosopher's website. There is a so-called tripartite definition of knowledge:
S knows that p iff
1) p is true 2) S believes that p 3) S is justified in believing that p
Only 1) is objective and 2) and 3) are both related with subjective involvement. Are they important in knowledge acquirement
process? No, because proof is a way to dispel personal belief and communicate knowledge with no personal bias if the proof
is well-done in scientific manner.
Turing's proof
It has been quite long since I took comp335 in which Turing proves that some problems cannot be solved by computer.
Since then till now, the proof is always seemed to me like magic. Now I see a slightest direction of what it says. It is
probably a similar paradox in proposition like: this sentence is a lie. Observe, there are two kinds of data in Turing's
proof and here. One is the data used to represent a pattern which we use as operations like the program code for
computation or the contents in sentence for structure of logic. The other kind of data is just purely data which is like
as it is. It is a pain that we have no other way to represent our meta data by other kind of data. Like compiler, the
grammar is also written in formats of a data file and if you don't make distinction on some symbols used purely as meta
symbol, you will mess up.
The insight here is that we might need to distinguish meta data from raw data.
the law of the excluded middle and the law of the excluded contradiction
I found out a phenomenon that the older the book is, the better the book is. I borrowed a book from library in 1965 which
is quite good on theory of set and for the first time I know there is a name for some rules. For example, in our assignment
you have to use these laws for deduction:
Consider the following story:
Jones, Smith, and Clark hold the jobs of programmer, knowledge engineer, and 
manager (not necessarily in that order!). Jones owes the programmer $10.  The 
manager's spouse prohibits borrowing money. Smith is not married.
Your task is to figure out which person has which job.
 
For most of us, the intuition is an illusion. There is too much information in very first sentence and if you fail to express them in logic, surely you cannot proceed on. How do you express the idea that one man has exact one job and one job is done by exact one man and no man has two job?
1. forall X ¬owe(X,X)
2. forall X programmer(X) → owe(jones, X)
3. forall X manager(X) → ¬owe(X, Y) ^ married(X)
4. ¬married(smith)
5. programmer(smith) v manager(smith) v engineer(smith)
6. programmer(clark) v manager(clark) v engineer(clark)
7. programmer(jones) v manager(jones) v engineer(jones)
8. programmer(smith) v programmer(jones) v programmer(clark)
9. manager(smith) v manager(jones) v programmer(clark)
10.engineer(smith) v engineer(jones) v engineer(clark)
11. forall X programmer(X) → ¬engineer(X) ^ ¬manager(X)
12. forall X engineer(X) →¬programmer(X) ^ ¬manager(X)
13. forall X manager(X) → ¬programmer(X) ^ ¬programmer(X)
The catch here is that it is not so obvious for all of us to realize the real meaning in following two statement:
Within a certain context,
Every statement is true or false.
No statement is both true and false.
The first one does NOT imply that the truth or falsehood of every statement must be determined or even decidable. And the
name of two laws for both statements are the law of the excluded middle and the law of the excluded contradiction.
In the elementary-hood axiom, we implicitly apply another basic axiom which is more basic: recognition of an element.
CNF SAT vs. Linearly Independence
During waiting shuttle bus, I have an idea about analogy between CNF SAT and linearly independence.
Let B1,B2...Bn be column vector of matrix M and then the linear system by M*A where A is column vector will have solution
if and only if B1, B2...Bn can be expressed as linear combination with coefficients as tulps of A.
And for CNF SAT, it is similar.
Pattern Recognition
All searching problem can be eventually reduced to be a string pattern searching problem because the target can always be
represented as a state in a state machine and the target context is a stream of states of a state machine. In paradigm
a searching can be divided into two main kinds. One kind is that we can do some pre-processing with both the pattern x with
|x|=m>0 and the context t with |t|>0. We can call this kind of search as knowledge intensive, or strong method in AI. The
other one is that we cannot do pre-processing with either x or t and it is typically the weak method or classical searching
as it has no pre-requested knowledge about the searching pattern and context. This is a universal method. However, I think
most practical searching problem must be between these two extreme. That is to say, even we have not much information about
the context t, we still have some time to study the pattern x. This assumption is reasonable since most pattern is not very
big compared with t. And most algorithm of string searching is based on this idea like KMP etc.
AI Grade
What are you going to do about it if you did very bad in a thing which you love very much, for example, AI? Does it mean I am not suitable for this subject? Anyway, I don't believe too much in the grades of Concordia just like many people in other universities don't believe the grading system of Concordia too much. However, it passes to me one message: you can do fairly well in most of undergraduate courses without ever reading the textbook. However, it is not the case for most of graduate courses even you read the textbooks.
Notes in Reading Weiser's Paper on Program Slicing
It is a painful experience for me to read something because for time to time I am encountering ambiguity of natural language. I think my brain is more and more like a naively-implemented compiler which shares no robustness of human brain in understanding other people's idea in English. Or a more probable explanation is that my English reading/writing ability is as low as what my Engl212 reflects.
The idea of slicing window is exactly what a programmer observes in a debugging tool window.
The way of program slicing is by deleting statements such that the group of statements has only a single immediate successor. This ensures that "no statement increases its number of immediate successors as a result of statement deletion". This reminds me something. It is an algorithms of simplification of DFA which was sent by miss Sunny about a year ago and I still owe her an implementation.
The description of dataflow graph reminds me the representation of "sufficient condition and necessary condition" in data flow. Suppose a node in graph has m incoming edges and n outgoing edges, then the union of all m incoming edges are sufficient conditions and all n outgoing edges are necessary conditions. This matches our intuition that necessary conditions are often used as post-inspection methods while sufficient conditions are used as construction methods for a certain object.
In this section it also mentions about the determination of termination of a slicing program which is impossible by Turing's theorem. And as an alternative, it uses dataflow graph instead of accurate knowledge of original program. But how rough is this inaccurate representation? Is dataflow model rough enough so that it can avoid the non-termination puzzle?
The author also skipped the impossibility of finding smallest possible slice by asserting that evaluation of the functional equivalence of two pieces of code is impossible. But why? How to prove?
It mentioned a paper by Graham and Wegman(1976) about the algorithms of dataflow graph which calculates a node at
O(e * log(e)).
When REF(N)=DEF(N)={x}, it is called identity function on x. My understanding is like c++ statements of x++ or ++x;.
Notes on Li's paper
It mentioned another paper (Dumais and Li 2002) on distributed predicate detection.
It seems statements or actions of a program are in some kinds of total order. A distributed program make it mess. The total order becomes partial order. A slice is simply a subset of the partial order. If we can detect the invariable pattern, we find the predicates.
Is the dependency relation similar to functional dependency in database?
Equality is NOT defined by negative information, like empty set
Assume M,N,O are regular languages.
M ^ N ^ O = M ^ N is obviously false. However, to prove it is false, you cannot use concrete symbol replacement test.
i.e. replacing M,N,O with sigma {a},{b},{c}, LHS=empty set, RHS=empty set. Does NULL=NULL give you any information? NO, it is undefined. So, it is not the fault of concrete symbol replacement test. It is the problem of NULL or empty set.
Time is a Measurement of Order.
So does Predicate.
Context Free Language is a description of linear algorithm and the entity of each part is in a total order. If we discovered a subset of partial order, we discovered parallelism.(??)
my view of pattern
I believe there is a lost ring of chain in "pattern-recognition" because pattern itself needs to be discovered automatically. Or in other words, we need a more generic definition of pattern. WH doesn't agree at all. When scan a word, there are only three cases. word is a prefix, or a suffix, or a subset of a pattern. And my goal is to discover a "Regular Language" for pattern alphabets. In other words, all "pattern" phrases or atoms can span the input by "regular language" operations. I believe this is the simplest definition. With this first-dimension pattern definition, we are possible to advance to more dimensions.
So, discovery of pattern becomes a simple test of